Azen, R., & Budescu, D. V. (2006). Comparing predictors in multivariate regression models: An extension of dominance analysis. Journal of Educational and Behavioral Statistics, 31, 157–180. https://doi.org/10.3102/10769986031002157
Bachman, L. F., Lynch, B. K., & Mason, M. (1995). Investigating variability in tasks and rater judgments in a performance test of foreign language speaking. Language Testing, 12, 239–257. https://doi.org/10.1177/026553229501200206
Bachman, L. F., & Palmer, A. S. (2010). Language testing in practice: Designing and developing useful language tests. Oxford Applied Linguistics. Oxford: Oxford University Press.
Becker, S., Spinath, B., Ditzen, B., & Dörfler, T. (2020). Der Einfluss von Stress auf Prozesse beim diagnostischen Urteilen – eine Eye Tracking-Studie mit mathematischen Textaufgaben. [The influence of stress on processes of diagnostic judgement: An eye-tracking study based on mathematical word problems]. Unterrichtswissenschaft, 48, 531–550. https://doi.org/10.1007/s42010-020-00078-4
Brookhart, S. M. (2013). How to create and use rubrics for formative assessment and grading. Alexandria, VA: ASCD.
Brown, A., Iwashita, N., & McNamara, T. (2005). An examination of rater orientations and test-taker performance on English-for-Academic-Purposes speaking tasks. Princeton, NJ: Educational Testing Service.
Caban, H. L. (2003). Rater group bias in the speaking assessment of four L1 Japanese ESL students. Second Language Studies, 21, 1–44.
Carey, M. D., Mannell, R. H., & Dunn, P. K. (2011). Does a rater’s familiarity with a candidate’s pronunciation affect the rating in oral proficiency interviews? Language Testing, 28(2), 201–219. https://doi.org/10.1177/0265532210393704
Chuang, Y.-Y. (2009). Foreign language speaking assessment: Taiwanese college English teachers’ scoring performance in the holistic and analytic rating methods. Asian EFL Journal, 11, 150–173.
Cortina, K. S., & Thames, M. H. (2013). Teacher education in Germany. In M. Kunter, J. Baumert, W. Blum, U. Klusmann, S. Krauss, & M. Neubrand (Eds.), Cognitive activation in the mathematics classroom and professional competence of teachers (pp. 49–62). Boston: Springer.
Council of Europe (2020). Common European Framework of Reference for Languages: Learning, teaching, assessment – Companion volume. Strasbourg: Council of Europe Publishing. Retrieved from http://www.coe.int/lang-cefr
Davis, L. (2016). The influence of training and experience on rater performance in scoring spoken language. Language Testing, 33, 117–135. https://doi.org/10.1177/0265532215582282
Duijm, K., Schoonen, R., & Hulstijn, J. H. (2018). Professional and non-professional raters’ responsiveness to fluency and accuracy in L2 speech: An experimental approach. Language Testing, 35, 501–527. https://doi.org/10.1177/0265532217712553
Eckes, T. (2015). Introduction to many-facet Rasch measurement: Analyzing and evaluating rater-mediated assessments (2nd rev. ed.). Language Testing and Evaluation: Vol. 22. Frankfurt a.M.: Peter Lang.
Fulcher, G. (2015). Re-examining language testing: A philosophical and social inquiry. Abingdon, London, New York: Routledge. https://doi.org/10.4324/9781315695518
Harsch, C., & Martin, G. (2013). Comparing holistic and analytic scoring methods: Issues of validity and reliability. Assessment in Education: Principles, Policy & Practice, 20, 281–307. https://doi.org/10.1080/0969594X.2012.742422
Herppich, S., Praetorius, A.-K., Förster, N., Glogger-Frey, I., Karst, K., Leutner, D., & Südkamp, A. (2018). Teachers’ assessment competence: Integrating knowledge-, process-, and product-oriented approaches into a competence-oriented conceptual model. Teaching and Teacher Education, 76, 181–193. https://doi.org/10.1016/j.tate.2017.12.001
Hinger, B., & Stadler, W. (2018). Testen und Bewerten fremdsprachlicher Kompetenzen. [Testing and evaluation of foreign language skills]. Narr Studienbücher. Tübingen: Narr Francke Attempto.
Hochstetter, J. (2011). Diagnostische Kompetenz im Englischunterricht der Grundschule: Eine empirische Studie zum Einsatz von Beobachtungsbögen. [Diagnostic competence in primary school English teaching: An empirical study on the use of observation sheets]. Giessener Beiträge zur Fremdsprachendidaktik. Tübingen: Narr.
Hoppe, T., Renkl, A., Seidel, T., Rettig, S., & Rieß, W. (2020). Exploring how teachers diagnose student conceptions about the cycle of matter. Sustainability, 12, 4184. https://doi.org/10.3390/su12104184
Hsieh, C.-N., & Davis, L. (2019). The effect of audiovisual input on academic listen-speak task performance. In S. Papageorgiou & K. M. Bailey (Eds.), Global research on teaching and learning English: Vol. 6. Global perspectives on language assessment: Research, theory, and practice (pp. 96–107). London: Routledge.
Iwashita, N., Brown, A., McNamara, T., & O’Hagan, S. (2008). Assessed levels of second language speaking proficiency: How distinct? Applied Linguistics, 29, 24–49. https://doi.org/10.1093/applin/amm017
Jacob, A. (2012). Examining the relationship between student achievement and observable teacher characteristics: Implications for school leaders. International Journal of Educational Leadership Preparation, 1–13.
Keller, S. D., Jansen, T., & Vögelin, C. (2019). Can an instructional video increase the quality of English teachers’ assessment of learner essays? RISTAL, 2(1), 140–161. https://doi.org/10.23770/rt1829
Kim, Y.-H. (2009). Exploring rater and task variability in second language oral performance assessment. In A. Brown & K. Hill (Eds.), Tasks and criteria in performance assessment (pp. 91–110). Peter Lang.
Knoch, U. (2011). Diagnostic writing assessment: The development and validation of a rating scale. Language Testing and Evaluation. Frankfurt a.M.: Peter Lang.
Kolb, A. (2011). Kontinuität und Brüche: Der Übergang von der Primar- zur Sekundarstufe im Englischunterricht aus der Perspektive der Lehrkräfte. [Continuity and breaks: The transition from primary to secondary English teaching from the teachers’ perspective]. Zeitschrift für Fremdsprachenforschung, 22, 145–175.
Lenske, G. (2016). Schülerfeedback in der Grundschule: Untersuchung zur Validität. [Pupil feedback in primary school: A study on validity]. Münster, New York: Waxmann.
Loibl, K., Leuders, T., & Dörfler, T. (2020). A framework for explaining teachers’ diagnostic judgements by cognitive modeling (DiaCoM). Teaching and Teacher Education, 91, 103059. https://doi.org/10.1016/j.tate.2020.103059
Lumley, T., & McNamara, T. F. (1995). Rater characteristics and rater bias: Implications for training. Language Testing, 12, 54–71. https://doi.org/10.1177/026553229501200104
Metruk, R. (2018). Comparing holistic and analytic ways of scoring in the assessment of speaking skills. Journal of Teaching English for Specific and Academic Purposes, 6, 179–189. https://doi.org/10.22190/JTESAP1801179M
Porsch, R., & Wilden, E. (2017). The introduction of EFL in primary education: Challenges for EFL teachers in Germany. In E. Wilden & R. Porsch (Eds.), The professional development of primary EFL teachers: National and international research (pp. 59–75). Münster: Waxmann.
Rieu, A., Loibl, K., & Leuders, T. (2020). Förderung diagnostischer Kompetenz von Lehrkräften bei Aufgaben der Bruchrechnung. [Promotion of diagnostic competence of teachers in tasks dealing with fractions]. Herausforderung Lehrer*innenbildung – Zeitschrift zur Konzeption, Gestaltung und Diskussion, 3, 492–509. https://doi.org/10.4119/HLZ-3167
Shaw, S. D. (2007). Modelling facets of the assessment of writing within an ESM environment. Research Notes, 27, 14–19. Retrieved from https://www.cambridgeenglish.org/images/23146-research-notes-27.pdf
Südkamp, A., Kaiser, J., & Möller, J. (2012). Accuracy of teachers’ judgments of students’ academic achievement: A meta-analysis. Journal of Educational Psychology, 104(3), 743–762. https://doi.org/10.1037/a0027627
Sundqvist, P., Wikström, P., Sandlund, E., & Nyroos, L. (2018). The teacher as examiner of L2 oral tests: A challenge to standardization. Language Testing, 35, 217–238. https://doi.org/10.1177/0265532217690782
Winke, P., Gass, S., & Myford, C. (2013). Raters’ L2 background as a potential source of bias in rating oral performance. Language Testing, 30, 231–252. https://doi.org/10.1177/0265532212456968
Witzigmann, S., & Sachse, S. (2020). Verarbeitung von Hinweisreizen beim Beurteilen von mündlichen Sprachproben von Schülerinnen und Schülern durch Hochschullehrende im Fach Französisch. [Processing of cues by university teachers when judging pupils’ oral language samples in French]. Unterrichtswissenschaft, 48, 551–571. https://doi.org/10.1007/s42010-020-00076-6
Yan, X., & Ginther, A. (2018). Listeners and raters: Similarities and differences in evaluation of accented speech. In O. Kang & A. Ginther (Eds.), Assessment in second language pronunciation (pp. 67–88). Abingdon, New York: Routledge.
Zhang, Y., & Elder, C. (2011). Judgments of oral proficiency by non-native and native English speaking teacher raters: Competing or complementary constructs? Language Testing, 28, 31–50. https://doi.org/10.1177/0265532209360671