Detecting Fraudulent Interviewers by Improved Clustering Methods – The Case of Falsifications of Answers to Parts of a Questionnaire

Falsified interviews represent a serious threat to empirical research based on survey data. The identification of such cases is important to ensure data quality. Applying cluster analysis to a set of indicators helps to identify suspicious interviewers when a substantial share of all of their interviews are complete falsifications, as shown by previous research. This analysis is extended to the case when only a share of questions within all interviews provided by an interviewer is fabricated. The assessment is based on synthetic datasets with a priori set properties. These are constructed from a unique experimental dataset containing both real and fabricated data for each respondent. Such a bootstrap approach makes it possible to evaluate the robustness of the method when the share of fabricated answers per interview decreases. The results indicate a substantial loss of discriminatory power in the standard cluster analysis if the share of fabricated answers within an interview becomes small. Using a novel cluster method which allows imposing constraints on cluster sizes, performance can be improved, in particular when only few falsifiers are present. This new approach will help to increase the robustness of survey data by detecting potential falsifiers more reliably.
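The idea of clustering interviewers under a cluster-size constraint can be illustrated with a minimal sketch. It is not the authors' algorithm: the indicator values, the interviewer labels, the function names `within_ss` and `constrained_two_clusters`, the squared-distance objective and the exhaustive search are all assumptions made for illustration, and the search is only feasible for a handful of interviewers.

```python
from itertools import combinations


def within_ss(points):
    """Sum of squared distances of the points to their centroid."""
    if not points:
        return 0.0
    dim = len(points[0])
    centroid = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    return sum((p[d] - centroid[d]) ** 2 for p in points for d in range(dim))


def constrained_two_clusters(indicators, max_small):
    """Split interviewers into a small cluster and a remainder cluster.

    indicators: dict mapping a (hypothetical) interviewer id to a tuple of
                indicator values.
    max_small:  assumed upper bound on the size of the small cluster.
    Returns (ids in the small cluster, total within-cluster sum of squares).
    """
    ids = sorted(indicators)
    # Keep at least one interviewer in the remainder cluster.
    limit = min(max_small, len(ids) - 1)
    best_group, best_cost = None, float("inf")
    for k in range(1, limit + 1):
        for group in combinations(ids, k):
            rest = [i for i in ids if i not in group]
            cost = (within_ss([indicators[i] for i in group])
                    + within_ss([indicators[i] for i in rest]))
            if cost < best_cost:
                best_group, best_cost = set(group), cost
    return best_group, best_cost


# Made-up indicator values: four similar interviewers and two that deviate.
demo = {
    "H1": (0.10, 0.20), "H2": (0.12, 0.22),
    "H3": (0.09, 0.18), "H4": (0.11, 0.21),
    "F1": (0.80, 0.90), "F2": (0.78, 0.88),
}
small, cost = constrained_two_clusters(demo, max_small=2)
print(sorted(small), round(cost, 4))
```

Bounding the size of one cluster encodes the prior expectation that only a few interviewers falsify, which is the situation in which the abstract reports the largest gain from the constrained method.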

Language: English

Publication frequency: 4 times per year
Journal subjects: Mathematics, Probability and Statistics



Samuel De Haas

Peter Winker

Published online: 23 Sep 2016

Pages: 643 - 660

Received: 01 Apr 2015

Accepted: 01 Nov 2015

DOI: https://doi.org/10.1515/jos-2016-0033

Keywords: Survey data falsifications, partial falsifications, cluster analysis, constraint cluster analysis, bootstrap

© 2016 Samuel De Haas et al., published by De Gruyter Open

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.
