Adams, R. J., Wilson, M., & Wang, W. (1997). The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21(1), 1-23. https://doi.org/10.1177/0146621697211001
Alnahdi, G. H., & Yada, A. (2020). Rasch analysis of the Japanese version of Teacher Efficacy for Inclusive Practices Scale: Scale unidimensionality. Frontiers in Psychology, 11: 1725. https://doi.org/10.3389/fpsyg.2020.01725
American Educational Research Association (AERA), American Psychological Association (APA), & National Council on Measurement in Education (NCME). (2014). Standards for educational and psychological testing. American Educational Research Association.
André, Q. (2022). Outlier exclusion procedures must be blind to the researcher’s hypothesis. Journal of Experimental Psychology: General, 151(1), 213–223. https://doi.org/10.1037/xge0001069
Andrich, D., & Marais, I. (2014). Person proficiency estimates in the dichotomous Rasch model when random guessing is removed from difficulty estimates of multiple choice items. Applied Psychological Measurement, 38(6), 432-449. https://doi.org/10.1177/0146621614529646
Andrich, D., Marais, I., & Humphry, S. (2016). Controlling guessing bias in the dichotomous Rasch model applied to a large-scale, vertically scaled testing program. Educational and Psychological Measurement, 76(3), 412-435. https://doi.org/10.1177/0013164415594202
Artner, R. (2016). A simulation study of person-fit in the Rasch model. Psychological Test and Assessment Modeling, 58(3), 531–563.
Bond, T. G., & Fox, C. M. (2007). Applying the Rasch model: Fundamental measurement in the human sciences (2nd ed.). Lawrence Erlbaum Associates.
Briz-Redón, A. (2021). Respondent burden effects on item non-response and careless response rates: An analysis of two types of surveys. Mathematics, 9(17), 2035. https://doi.org/10.3390/math9172035
Burchell, B., & Marsh, C. (1992). The effect of questionnaire length on survey response. Quality and Quantity, 26(3), 233-244. https://doi.org/10.1007/BF00172427
Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1-29. https://doi.org/10.18637/jss.v048.i06
Conijn, J. M., Emons, W. H. M., & Sijtsma, K. (2014). Statistic lz-based person-fit methods for noncognitive multiscale measures. Applied Psychological Measurement, 38(2), 122-136. https://doi.org/10.1177/0146621613497568
Crişan, D. R., Tendeiro, J. N., & Meijer, R. R. (2017). Investigating the practical consequences of model misfit in unidimensional IRT models. Applied Psychological Measurement, 41(6), 439–455. https://doi.org/10.1177/0146621617695522
Curtis, D. D. (2001). Misfits: People and their problems. What might it all mean? International Education Journal, 2(4), 91-99.
Curtis, D. D. (2004). Person misfit in attitude surveys: Influences, impacts and implications. International Education Journal, 5(2), 125-144.
Drasgow, F., Levine, M. V., & Williams, E. A. (1985). Appropriateness measurement with polychotomous item response models and standardized indices. British Journal of Mathematical and Statistical Psychology, 38(1), 67-86. https://doi.org/10.1111/j.2044-8317.1985.tb00817.x
Du, J., Wang, Y., Wu, A., Jiang, Y., Duan, Y., Geng, W., Wan, L., Li, J., Hu, J., Jiang, J., Shi, L., & Wei, J. (2023). The validity and IRT psychometric analysis of Chinese version of Difficult Doctor-Patient Relationship Questionnaire (DDPRQ-10). BMC Psychiatry, 23: 900. https://doi.org/10.1186/s12888-023-05385-5
Egberink, I. J. L., Meijer, R. R., Veldkamp, B. P., Schakel, L., & Smid, N. G. (2010). Detection of aberrant item score patterns in computerized adaptive testing: An empirical example using the CUSUM. Personality and Individual Differences, 48(8), 921-925. https://doi.org/10.1016/j.paid.2010.02.023
Emons, W. H. M., Sijtsma, K., & Meijer, R. R. (2005). Global, local, and graphical person-fit analysis using person-response functions. Psychological Methods, 10(1), 101-119. https://doi.org/10.1037/1082-989X.10.1.101
Felt, J. M., Castaneda, R., Tiemensma, J., & Depaoli, S. (2017). Using person fit statistics to detect outliers in survey research. Frontiers in Psychology, 8: 863. https://doi.org/10.3389/fpsyg.2017.00863
Ferrando, P. J. (2015). Assessing person fit in typical-response measures. In S. P. Reise & D. A. Revicki (Eds.), Handbook of item response theory modeling: Applications to typical performance assessment (pp. 128–155). Routledge/Taylor & Francis Group.
Ferrando, P. J., Vigil-Colet, A., & Lorenzo-Seva, U. (2016). Practical person-fit assessment with the linear FA model: New developments and a comparative study. Frontiers in Psychology, 7: 1973. https://doi.org/10.3389/fpsyg.2016.01973
Haberman, S. J., Sinharay, S., & Chon, K. H. (2013). Assessing item fit for unidimensional item response theory models using residuals from estimated item response functions. Psychometrika, 78(3), 417–440. https://doi.org/10.1007/s11336-012-9305-1
Hayat, B., Rahayu, W., Putra, M. D. K., Sarifah, I., Puri, V. G. S., & Isa, K. (2023). Metacognitive Skills Assessment in Research-Proposal Writing (MSARPW) in the Indonesian university context: Scale development and validation using multidimensional item response models. Jurnal Pengukuran Psikologi dan Pendidikan Indonesia, 12(1), 31-47. https://doi.org/10.15408/jp3i.v12i1.31679
Hong, S. E., Monroe, S., & Falk, C. F. (2020). Performance of person-fit statistics under model misspecification. Journal of Educational Measurement, 57(3), 423-442. https://doi.org/10.1111/jedm.12207
International Test Commission (ITC). (2014). ITC guidelines on quality control in scoring, test analysis, and reporting of test scores. International Journal of Testing, 14(3), 195-217. https://doi.org/10.1080/15305058.2014.918040
Jones, E. A., Wind, S. A., Tsai, C-L., & Ge, Y. (2023). Comparing person-fit and traditional indices across careless response patterns in surveys. Applied Psychological Measurement, 47(5-6), 365-385. https://doi.org/10.1177/01466216231194358
Karabatsos, G. (1998). Analyzing nonadditive conjoint structures: Compounding events by Rasch model probabilities. Journal of Outcome Measurement, 2(3), 191-221.
Karabatsos, G. (2000). A critique of Rasch residual fit statistics. Journal of Applied Measurement, 1(2), 152-176.
Karabatsos, G. (2003). Comparing the aberrant response detection performance of thirty-six person-fit statistics. Applied Measurement in Education, 16(4), 277-298. https://doi.org/10.1207/S15324818AME1604_2
Levine, M. V., & Rubin, D. B. (1979). Measuring the appropriateness of multiple-choice test scores. Journal of Educational Statistics, 4(4), 269–290. https://doi.org/10.2307/1164595
Linacre, J. M. (2010). When to stop removing items and persons in Rasch analysis? Rasch Measurement Transactions, 23(4), 1241.
Li, M.-n. F., & Olejnik, S. (1997). The power of Rasch person–fit statistics in detecting unusual response patterns. Applied Psychological Measurement, 21(3), 215–231. https://doi.org/10.1177/01466216970213002
Liu, Y., & Maydeu-Olivares, A. (2014). Identifying the source of misfit in item response theory models. Multivariate Behavioral Research, 49(4), 354-371. https://doi.org/10.1080/00273171.2014.910744
Liu, Y., & Liu, H. (2021). Detecting noneffortful responses based on a residual method using an iterative purification process. Journal of Educational and Behavioral Statistics, 46(6), 717-752. https://doi.org/10.3102/1076998621994366
Liu, T., Lan, T., & Xin, T. (2019a). Detecting random responses in a personality scale using IRT-based person-fit indices. European Journal of Psychological Assessment, 35(1), 126-136. https://doi.org/10.1027/1015-5759/a000369
Liu, T., Sun, Y., Li, Z., & Xin, T. (2019b). The impact of aberrant response on reliability and validity. Measurement: Interdisciplinary Research and Perspectives, 17(3), 133-142. https://doi.org/10.1080/15366367.2019.1584848
Lundgren, E., & Eklöf, H. (2023). Questionnaire-taking motivation: Using response times to assess motivation to optimize on the PISA 2018 student questionnaire. International Journal of Testing, 23(4), 231-256. https://doi.org/10.1080/15305058.2023.2214647
Maroqi, N. (2018). Uji validitas konstruk pada instrumen Rosenberg Self-Esteem Scale dengan metode confirmatory factor analysis (CFA) [Testing the construct validity of the Rosenberg Self-Esteem Scale using the confirmatory factor analysis (CFA) method]. Jurnal Pengukuran Psikologi dan Pendidikan Indonesia, 7(2), 92-96. https://doi.org/10.15408/jp3i.v7i2.12101
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47(2), 149–174. https://doi.org/10.1007/BF02296272
Maydeu-Olivares, A. (2013). Why should we assess the goodness of fit of IRT models? Measurement: Interdisciplinary Research and Perspectives, 11(3), 127-137. https://doi.org/10.1080/15366367.2013.841511
Meijer, R. R. (1996). Person fit research: An introduction. Applied Measurement in Education, 9(1), 3-8. https://doi.org/10.1207/s15324818ame0901_2
Meijer, R. R. (2003). Diagnosing item score patterns on a test using item response theory-based person-fit statistics. Psychological Methods, 8(1), 72–87. https://doi.org/10.1037/1082-989X.8.1.72
Meijer, R. R., & Sijtsma, K. (1995). Detection of aberrant item score patterns: A review of recent developments. Applied Measurement in Education, 8(3), 261–272. https://doi.org/10.1207/s15324818ame0803_5
Meijer, R. R., & Sijtsma, K. (2001). Methodology review: Evaluating person fit. Applied Psychological Measurement, 25(2), 107–135. https://doi.org/10.1177/01466210122031957
Meijer, R. R., & Tendeiro, J. N. (2012). The use of the lz and lz* person-fit statistics and problems derived from model misspecification. Journal of Educational and Behavioral Statistics, 37(6), 758-766. https://doi.org/10.3102/1076998612466144
Meijer, R. R., Niessen, A. S. M., & Tendeiro, J. N. (2016). A practical guide to check the consistency of item response patterns in clinical research through person-fit statistics: Examples and a computer program. Assessment, 23(1), 56-62. https://doi.org/10.1177/1073191115577800
Moshagen, M., & Bader, M. (2024). semPower: General power analysis for structural equation models. Behavior Research Methods, 56(4), 2901-2922. https://doi.org/10.3758/s13428-023-02254-7
Ogihara, Y., & Kusumi, T. (2020). The developmental trajectory of self-esteem across the life span in Japan: Age differences in scores on the Rosenberg Self-Esteem Scale from adolescence to old age. Frontiers in Public Health, 8: 132. https://doi.org/10.3389/fpubh.2020.00132
Olson, J. F., & Fremer, J. (2013). TILSA test security guidebook: Preventing, detecting, and investigating test security irregularities. Council of Chief State School Officers.
Panayides, P., & Tymms, P. (2012). Is aberrant response behavior a stable characteristic of students in classroom math tests? Rasch Measurement Transactions, 26(3), 1382-1383.
Panayides, P., & Tymms, P. (2013). Investigating whether aberrant response behaviour in classroom maths tests is a stable characteristic of students. Assessment in Education: Principles, Policy & Practice, 20(3), 349-368. https://doi.org/10.1080/0969594x.2012.723610
Pina, J. A. L., & Montesinos, M. D. H. (2005). Fitting Rasch model using appropriateness measure statistics. The Spanish Journal of Psychology, 8(1), 100-110. https://doi.org/10.1017/S113874160000500X
R Core Team. (2015). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Danish Institute for Educational Research.
Reise, S. P., & Flannery, W. P. (1996). Assessing person-fit on measures of typical performance. Applied Measurement in Education, 9(1), 9–26. https://doi.org/10.1207/s15324818ame0901_3
Rolstad, S., Adler, J., & Rydén, A. (2011). Response burden and questionnaire length: Is shorter better? A review and meta-analysis. Value in Health, 14(8), 1101-1108. https://doi.org/10.1016/j.jval.2011.06.003
Rosenberg, M. (1965). Rosenberg Self-Esteem Scale (RSES) [Database record]. APA PsycTests. https://doi.org/10.1037/t01038-000
Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1–36. https://doi.org/10.18637/jss.v048.i02
Sijtsma, K., & Meijer, R. R. (2001). The person response function as a tool in person-fit research. Psychometrika, 66(2), 191–207. https://doi.org/10.1007/BF02294835
Smith, R. M. (1986). Person fit in the Rasch model. Educational and Psychological Measurement, 46(2), 359–372. https://doi.org/10.1177/001316448604600210
Spoden, C., Fleischer, J., & Frey, A. (2020). Person misfit, test anxiety, and test-taking motivation in a large-scale mathematics proficiency test for self-evaluation. Studies in Educational Evaluation, 67: 100910. https://doi.org/10.1016/j.stueduc.2020.100910
Tay, L., Meade, A. W., & Cao, M. (2015). An overview and practical guide to IRT measurement equivalence analysis. Organizational Research Methods, 18(1), 3-46. https://doi.org/10.1177/1094428114553062
Tesio, L., Caronni, A., Kumbhare, D., & Scarano, S. (2024a). Interpreting results from Rasch analysis 1. The “most likely” measures coming from the model. Disability and Rehabilitation, 46(3), 591–603. https://doi.org/10.1080/09638288.2023.2169771
Tesio, L., Caronni, A., Simone, A., Kumbhare, D., & Scarano, S. (2024b). Interpreting results from Rasch analysis 2. Advanced model applications and the data-model fit assessment. Disability and Rehabilitation, 46(3), 604–617. https://doi.org/10.1080/09638288.2023.2169772
Turner, K. T., & Engelhard, G., Jr. (2024). Using functional clustering to diagnose person misfit. Journal of Experimental Education, 92(2), 377–397. https://doi.org/10.1080/00220973.2022.2161088
van der Linden, W. J., & van Krimpen-Stoop, E. M. L. A. (2003). Using response times to detect aberrant responses in computerized adaptive testing. Psychometrika, 68(2), 251–265. https://doi.org/10.1007/BF02294800
Wanders, R. B. K., Meijer, R. R., Ruhé, H. G., Sytema, S., Wardenaar, K. J., & de Jonge, P. (2018). Person-fit feedback on inconsistent symptom reports in clinical depression care. Psychological Medicine, 48(11), 1844-1852. https://doi.org/10.1017/S003329171700335X
Wang, W.-C., Chen, P.-H., & Cheng, Y.-Y. (2004). Improving measurement precision of test batteries using multidimensional item response models. Psychological Methods, 9(1), 116–136. https://doi.org/10.1037/1082-989X.9.1.116
Wind, S. A., & Schumacker, R. E. (2017). Detecting measurement disturbances in rater-mediated assessments. Educational Measurement: Issues and Practice, 36(4), 44-51. https://doi.org/10.1111/emip.12164
Wright, B. D., & Stone, M. (1999). Measurement essentials (2nd ed.). Wide Range, Inc.
Yekutieli, D., & Benjamini, Y. (1999). Resampling-based false discovery rate controlling multiple test procedures for correlated test statistics. Journal of Statistical Planning and Inference, 82(1-2), 171-196. https://doi.org/10.1016/S0378-3758(99)00041-5
Zahra, N. S., & Wirawan, H. (2024). Empowering digital transformation: Developing and validating a Digital Leadership Scale through Rasch model analysis. Measurement: Interdisciplinary Research and Perspectives. Advance online publication. https://doi.org/10.1080/15366367.2024.2334591
Zou, D., & Bolt, D. M. (2023). Person misfit and person reliability in rating scale measures: The role of response styles. Measurement: Interdisciplinary Research and Perspectives, 21(3), 167-180. https://doi.org/10.1080/15366367.2022.2114243