Adams, R. J., Wilson, M., & Wang, W. (1997). The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21(1), 1-23. https://doi.org/10.1177/0146621697211001
Alnahdi, G. H., & Yada, A. (2020). Rasch analysis of the Japanese version of Teacher Efficacy for Inclusive Practices Scale: Scale unidimensionality. Frontiers in Psychology, 11: 1725. https://doi.org/10.3389/fpsyg.2020.01725
American Educational Research Association (AERA), American Psychological Association (APA), & National Council on Measurement in Education (NCME). (2014). Standards for educational and psychological testing. American Educational Research Association.
André, Q. (2022). Outlier exclusion procedures must be blind to the researcher’s hypothesis. Journal of Experimental Psychology: General, 151(1), 213–223. https://doi.org/10.1037/xge0001069
Andrich, D., & Marais, I. (2014). Person proficiency estimates in the dichotomous Rasch model when random guessing is removed from difficulty estimates of multiple choice items. Applied Psychological Measurement, 38(6), 432-449. https://doi.org/10.1177/0146621614529646
Andrich, D., Marais, I., & Humphry, S. (2016). Controlling guessing bias in the dichotomous Rasch model applied to a large-scale, vertically scaled testing program. Educational and Psychological Measurement, 76(3), 412-435. https://doi.org/10.1177/0013164415594202
Artner, R. (2016). A simulation study of person-fit in the Rasch model. Psychological Test and Assessment Modeling, 58(3), 531–563.
Bond, T. G., & Fox, C. M. (2007). Applying the Rasch model: Fundamental measurement in the human sciences (2nd ed.). Lawrence Erlbaum Associates.
Briz-Redón, A. (2021). Respondent burden effects on item non-response and careless response rates: An analysis of two types of surveys. Mathematics, 9(17), 2035. https://doi.org/10.3390/math9172035
Burchell, B., & Marsh, C. (1992). The effect of questionnaire length on survey response. Quality and Quantity, 26(3), 233-244. https://doi.org/10.1007/BF00172427
Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1-29. https://doi.org/10.18637/jss.v048.i06
Conijn, J. M., Emons, W. H. M., & Sijtsma, K. (2014). Statistic lz-based person-fit methods for noncognitive multiscale measures. Applied Psychological Measurement, 38(2), 122-136. https://doi.org/10.1177/0146621613497568
Crişan, D. R., Tendeiro, J. N., & Meijer, R. R. (2017). Investigating the practical consequences of model misfit in unidimensional IRT models. Applied Psychological Measurement, 41(6), 439–455. https://doi.org/10.1177/0146621617695522
Curtis, D. D. (2001). Misfits: People and their problems. What might it all mean? International Education Journal, 2(4), 91-99.
Curtis, D. D. (2004). Person misfit in attitude surveys: Influences, impacts and implications. International Education Journal, 5(2), 125-144.
Drasgow, F., Levine, M. V., & Williams, E. A. (1985). Appropriateness measurement with polychotomous item response models and standardized indices. British Journal of Mathematical and Statistical Psychology, 38(1), 67-86. https://doi.org/10.1111/j.2044-8317.1985.tb00817.x
Du, J., Wang, Y., Wu, A., Jiang, Y., Duan, Y., Geng, W., Wan, L., Li, J., Hu, J., Jiang, J., Shi, L., & Wei, J. (2023). The validity and IRT psychometric analysis of Chinese version of Difficult Doctor-Patient Relationship Questionnaire (DDPRQ-10). BMC Psychiatry, 23: 900. https://doi.org/10.1186/s12888-023-05385-5
Egberink, I. J. L., Meijer, R. R., Veldkamp, B. P., Schakel, L., & Smid, N. G. (2010). Detection of aberrant item score patterns in computerized adaptive testing: An empirical example using the CUSUM. Personality and Individual Differences, 48(8), 921-925. https://doi.org/10.1016/j.paid.2010.02.023
Emons, W. H. M., Sijtsma, K., & Meijer, R. R. (2005). Global, local, and graphical person-fit analysis using person-response functions. Psychological Methods, 10(1), 101-119. https://doi.org/10.1037/1082-989X.10.1.101
Felt, J. M., Castaneda, R., Tiemensma, J., & Depaoli, S. (2017). Using person fit statistics to detect outliers in survey research. Frontiers in Psychology, 8: 863. https://doi.org/10.3389/fpsyg.2017.00863
Ferrando, P. J. (2015). Assessing person fit in typical-response measures. In S. P. Reise & D. A. Revicki (Eds.), Handbook of item response theory modeling: Applications to typical performance assessment (pp. 128–155). Routledge/Taylor & Francis Group.
Ferrando, P. J., Vigil-Colet, A., & Lorenzo-Seva, U. (2016). Practical person-fit assessment with the linear FA model: New developments and a comparative study. Frontiers in Psychology, 7: 1973. https://doi.org/10.3389/fpsyg.2016.01973
Haberman, S. J., Sinharay, S., & Chon, K. H. (2013). Assessing item fit for unidimensional item response theory models using residuals from estimated item response functions. Psychometrika, 78(3), 417–440. https://doi.org/10.1007/s11336-012-9305-1
Hayat, B., Rahayu, W., Putra, M. D. K., Sarifah, I., Puri, V. G. S., & Isa, K. (2023). Metacognitive Skills Assessment in Research-Proposal Writing (MSARPW) in the Indonesian university context: Scale development and validation using multidimensional item response models. Jurnal Pengukuran Psikologi dan Pendidikan Indonesia, 12(1), 31-47. https://doi.org/10.15408/jp3i.v12i1.31679
Hong, S. E., Monroe, S., & Falk, C. F. (2020). Performance of person-fit statistics under model misspecification. Journal of Educational Measurement, 57(3), 423-442. https://doi.org/10.1111/jedm.12207
International Test Commission (ITC). (2014). ITC guidelines on quality control in scoring, test analysis, and reporting of test scores. International Journal of Testing, 14(3), 195-217. https://doi.org/10.1080/15305058.2014.918040
Jones, E. A., Wind, S. A., Tsai, C-L., & Ge, Y. (2023). Comparing person-fit and traditional indices across careless response patterns in surveys. Applied Psychological Measurement, 47(5-6), 365-385. https://doi.org/10.1177/01466216231194358
Karabatsos, G. (1998). Analyzing nonadditive conjoint structures: Compounding events by Rasch model probabilities. Journal of Outcome Measurement, 2(3), 191-221.
Karabatsos, G. (2000). A critique of Rasch residual fit statistics. Journal of Applied Measurement, 1(2), 152-176.
Karabatsos, G. (2003). Comparing the aberrant response detection performance of thirty-six person-fit statistics. Applied Measurement in Education, 16(4), 277-298. https://doi.org/10.1207/S15324818AME1604_2
Levine, M. V., & Rubin, D. B. (1979). Measuring the appropriateness of multiple-choice test scores. Journal of Educational Statistics, 4(4), 269–290. https://doi.org/10.2307/1164595
Linacre, J. M. (2010). When to stop removing items and persons in Rasch analysis? Rasch Measurement Transactions, 23(4), 1241.
Li, M.-n. F., & Olejnik, S. (1997). The power of Rasch person–fit statistics in detecting unusual response patterns. Applied Psychological Measurement, 21(3), 215–231. https://doi.org/10.1177/01466216970213002
Liu, Y., & Maydeu-Olivares, A. (2014). Identifying the source of misfit in item response theory models. Multivariate Behavioral Research, 49(4), 354-371. https://doi.org/10.1080/00273171.2014.910744
Liu, Y., & Liu, H. (2021). Detecting noneffortful responses based on a residual method using an iterative purification process. Journal of Educational and Behavioral Statistics, 46(6), 717-752. https://doi.org/10.3102/1076998621994366
Liu, T., Lan, T., & Xin, T. (2019a). Detecting random responses in a personality scale using IRT-based person-fit indices. European Journal of Psychological Assessment, 35(1), 126-136. https://doi.org/10.1027/1015-5759/a000369
Liu, T., Sun, Y., Li, Z., & Xin, T. (2019b). The impact of aberrant response on reliability and validity. Measurement: Interdisciplinary Research and Perspectives, 17(3), 133-142. https://doi.org/10.1080/15366367.2019.1584848
Lundgren, E., & Eklöf, H. (2023). Questionnaire-taking motivation: Using response times to assess motivation to optimize on the PISA 2018 student questionnaire. International Journal of Testing, 23(4), 231-256. https://doi.org/10.1080/15305058.2023.2214647
Maroqi, N. (2018). Uji validitas konstruk pada instrumen Rosenberg Self-Esteem Scale dengan metode confirmatory factor analysis (CFA) [Testing the construct validity of the Rosenberg Self-Esteem Scale using the confirmatory factor analysis (CFA) method]. Jurnal Pengukuran Psikologi dan Pendidikan Indonesia, 7(2), 92-96. https://doi.org/10.15408/jp3i.v7i2.12101
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47(2), 149–174. https://doi.org/10.1007/BF02296272
Maydeu-Olivares, A. (2013). Why should we assess the goodness of fit of IRT models? Measurement: Interdisciplinary Research and Perspectives, 11(3), 127-137. https://doi.org/10.1080/15366367.2013.841511
Meijer, R. R. (1996). Person fit research: An introduction. Applied Measurement in Education, 9(1), 3-8. https://doi.org/10.1207/s15324818ame0901_2
Meijer, R. R. (2003). Diagnosing item score patterns on a test using item response theory-based person-fit statistics. Psychological Methods, 8(1), 72–87. https://doi.org/10.1037/1082-989X.8.1.72
Meijer, R. R., & Sijtsma, K. (1995). Detection of aberrant item score patterns: A review of recent developments. Applied Measurement in Education, 8(3), 261–272. https://doi.org/10.1207/s15324818ame0803_5
Meijer, R. R., & Sijtsma, K. (2001). Methodology review: Evaluating person fit. Applied Psychological Measurement, 25(2), 107–135. https://doi.org/10.1177/01466210122031957
Meijer, R. R., & Tendeiro, J. N. (2012). The use of the lz and lz* person-fit statistics and problems derived from model misspecification. Journal of Educational and Behavioral Statistics, 37(6), 758-766. https://doi.org/10.3102/1076998612466144
Meijer, R. R., Niessen, A. S. M., & Tendeiro, J. N. (2016). A practical guide to check the consistency of item response patterns in clinical research through person-fit statistics: Examples and a computer program. Assessment, 23(1), 56-62. https://doi.org/10.1177/1073191115577800
Moshagen, M., & Bader, M. (2024). semPower: General power analysis for structural equation models. Behavior Research Methods, 56(4), 2901-2922. https://doi.org/10.3758/s13428-023-02254-7
Ogihara, Y., & Kusumi, T. (2020). The developmental trajectory of self-esteem across the life span in Japan: Age differences in scores on the Rosenberg Self-Esteem Scale from adolescence to old age. Frontiers in Public Health, 8: 132. https://doi.org/10.3389/fpubh.2020.00132
Olson, J. F., & Fremer, J. (2013). TILSA test security guidebook: Preventing, detecting, and investigating test security irregularities. Council of Chief State School Officers.
Panayides, P., & Tymms, P. (2012). Is aberrant response behavior a stable characteristic of students in classroom math tests? Rasch Measurement Transactions, 26(3), 1382-1383.
Panayides, P., & Tymms, P. (2013). Investigating whether aberrant response behaviour in classroom maths tests is a stable characteristic of students. Assessment in Education: Principles, Policy & Practice, 20(3), 349-368. https://doi.org/10.1080/0969594x.2012.723610
Pina, J. A. L., & Montesinos, M. D. H. (2005). Fitting Rasch model using appropriateness measure statistics. The Spanish Journal of Psychology, 8(1), 100-110. https://doi.org/10.1017/S113874160000500X
R Core Team. (2015). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Danish Institute for Educational Research.
Reise, S. P., & Flannery, W. P. (1996). Assessing person-fit on measures of typical performance. Applied Measurement in Education, 9(1), 9–26. https://doi.org/10.1207/s15324818ame0901_3
Rolstad, S., Adler, J., & Rydén, A. (2011). Response burden and questionnaire length: Is shorter better? A review and meta-analysis. Value in Health, 14(8), 1101-1108. https://doi.org/10.1016/j.jval.2011.06.003
Rosenberg, M. (1965). Rosenberg Self-Esteem Scale (RSES) [Database record]. APA PsycTests. https://doi.org/10.1037/t01038-000
Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1–36. https://doi.org/10.18637/jss.v048.i02
Sijtsma, K., & Meijer, R. R. (2001). The person response function as a tool in person-fit research. Psychometrika, 66(2), 191–207. https://doi.org/10.1007/BF02294835
Smith, R. M. (1986). Person fit in the Rasch model. Educational and Psychological Measurement, 46(2), 359–372. https://doi.org/10.1177/001316448604600210
Spoden, C., Fleischer, J., & Frey, A. (2020). Person misfit, test anxiety, and test-taking motivation in a large-scale mathematics proficiency test for self-evaluation. Studies in Educational Evaluation, 67: 100910. https://doi.org/10.1016/j.stueduc.2020.100910
Tay, L., Meade, A. W., & Cao, M. (2015). An overview and practical guide to IRT measurement equivalence analysis. Organizational Research Methods, 18(1), 3-46. https://doi.org/10.1177/1094428114553062
Tesio, L., Caronni, A., Kumbhare, D., & Scarano, S. (2024a). Interpreting results from Rasch analysis 1. The “most likely” measures coming from the model. Disability and Rehabilitation, 46(3), 591–603. https://doi.org/10.1080/09638288.2023.2169771
Tesio, L., Caronni, A., Simone, A., Kumbhare, D., & Scarano, S. (2024b). Interpreting results from Rasch analysis 2. Advanced model applications and the data-model fit assessment. Disability and Rehabilitation, 46(3), 604–617. https://doi.org/10.1080/09638288.2023.2169772
Turner, K. T., & Engelhard, G., Jr. (2024). Using functional clustering to diagnose person misfit. Journal of Experimental Education, 92(2), 377–397. https://doi.org/10.1080/00220973.2022.2161088
van der Linden, W. J., & van Krimpen-Stoop, E. M. L. A. (2003). Using response times to detect aberrant responses in computerized adaptive testing. Psychometrika, 68(2), 251–265. https://doi.org/10.1007/BF02294800
Wanders, R. B. K., Meijer, R. R., Ruhé, H. G., Sytema, S., Wardenaar, K. J., & de Jonge, P. (2018). Person-fit feedback on inconsistent symptom reports in clinical depression care. Psychological Medicine, 48(11), 1844-1852. https://doi.org/10.1017/S003329171700335X
Wang, W.-C., Chen, P.-H., & Cheng, Y.-Y. (2004). Improving measurement precision of test batteries using multidimensional item response models. Psychological Methods, 9(1), 116–136. https://doi.org/10.1037/1082-989X.9.1.116
Wind, S. A., & Schumacker, R. E. (2017). Detecting measurement disturbances in rater-mediated assessments. Educational Measurement: Issues and Practice, 36(4), 44-51. https://doi.org/10.1111/emip.12164
Wright, B. D., & Stone, M. (1999). Measurement essentials (2nd ed.). Wide Range, Inc.
Yekutieli, D., & Benjamini, Y. (1999). Resampling-based false discovery rate controlling multiple test procedures for correlated test statistics. Journal of Statistical Planning and Inference, 82(1-2), 171-196. https://doi.org/10.1016/S0378-3758(99)00041-5
Zahra, N. S., & Wirawan, H. (2024). Empowering digital transformation: Developing and validating a Digital Leadership Scale through Rasch model analysis. Measurement: Interdisciplinary Research and Perspectives. Advance online publication. https://doi.org/10.1080/15366367.2024.2334591
Zou, D., & Bolt, D. M. (2023). Person misfit and person reliability in rating scale measures: The role of response styles. Measurement: Interdisciplinary Research and Perspectives, 21(3), 167-180. https://doi.org/10.1080/15366367.2022.2114243