Matthias von Davier

Matthias von Davier is a psychometrician, academic, inventor, and author. He is the executive director of the TIMSS & PIRLS International Study Center in the Lynch School of Education and Human Development and the J. Donald Monan, S.J., University Professor in Education at Boston College.

von Davier's research focuses on developing advanced psychometric models and methodologies for analyzing complex educational and survey data. He has authored or co-authored more than 130 research articles, chapters, and research reports, along with six books, including Advancing Human Assessment, part of Methodology of Educational Measurement and Assessment, a series he co-edits. His awards include the 2006 ETS Research Scientist award, the 2012 National Council on Measurement in Education (NCME) Bradley Hanson Award for Contributions to Educational Measurement, and the AERA Division D 2017 Award for Significant Contribution to Educational Measurement and Research Methodology for his book Handbook of International Large-Scale Assessment.

von Davier has been a Fellow of the American Educational Research Association (AERA) since 2021 and an elected member of the National Academy of Education since 2022. He has served as the editor of two leading scientific journals, the British Journal of Mathematical and Statistical Psychology and Psychometrika, and is a founding co-editor of the Springer journal Large-Scale Assessments in Education, a joint publication of the IEA and ETS. He has given invited keynote addresses at the Anne Anastasi lecture at Fordham University, the 9th IEA International Research Conference in Dubai, the Cross Straits Conference on Educational Measurement in Nanchang, China, the International Meeting of the Psychometric Society, the University of Connecticut, Ludwig Maximilian University of Munich, and the Organisation for Economic Co-operation and Development.

Education and Early Career
von Davier obtained a master's degree in psychology with honors from the Faculty of Mathematics and Science (Mathematisch-Naturwissenschaftliche Fakultät) at Kiel University (CAU) in 1993. He completed a doctoral degree (Dr. rer. nat.) in psychology at the same faculty in 1996.

von Davier began his career as an Assistant Research Scientist at the Institute for Science Education (IPN) at Kiel University. He was then awarded a Postdoctoral Fellowship at Educational Testing Service (ETS) in Princeton, NJ, where he developed item fit measures for complex IRT models. From November 2000 to April 2004, he served as a Research Scientist in the Center for Global Assessment at ETS, Princeton.

Career
In 2004, von Davier became a Senior Research Scientist at the Center for Global Assessment in Princeton, where he led initiatives focused on evaluating outcomes-based models. In subsequent roles at ETS, he worked as a Senior Research Scientist in 2007 while also serving as Technical Director for the National Assessment of Educational Progress (NAEP) Task Order Component and managing the Virtual Research Laboratory at the ETS/IEA Research Institute. He was appointed Principal Research Scientist in May 2007.

von Davier was appointed Director of Research at the Center for Global Assessment in June 2011, overseeing international survey assessment research and co-leading the ETS Research Initiative. From September 2013, he served as co-director of the Center for Global Assessment, and from October 2014 he concurrently held the position of Senior Research Director. In January 2017, he became a Distinguished Research Scientist at the National Board of Medical Examiners (NBME) in Philadelphia. Since September 2020, he has been the executive director of the TIMSS & PIRLS International Study Center in the Lynch School of Education and Human Development at Boston College, alongside his role as the J. Donald Monan, S.J., University Professor in Education at the same institution.

Methodological Research
von Davier's areas of study include item response theory (IRT), latent class analysis, and diagnostic classification models, with a broader emphasis on classification and mixture distribution models, computational statistics, person-fit, item-fit, model checking, and hierarchical model extensions for categorical data analysis.

von Davier's quantitative methodological research, which focuses on psychometric methodologies, has resulted in several patents.

Contributions to Psychometric Theory
von Davier's work in psychometrics has centered on model development, model fit, and estimation methods, including parallel computation and the estimation of latent variable models in complex data collection designs. Key examples include his extensions of the Rasch model, such as conditional maximum likelihood estimation of various polytomous Rasch models, extensions of mixture distribution Rasch models, and polytomous HYBRID models. He has worked on fit assessment in latent variable models, encompassing person, item, and model fit. His General Diagnostic Model is a flexible diagnostic classification model for both binary and polytomous data, as well as for binary and polytomous ordinal attributes. His work also includes the Parallel-E Parallel-M algorithm. He has developed models that integrate information on achievement, non-response, and process data, including extensions of the speed-accuracy model. His research has also examined the use of artificial intelligence in automated item generation and automated scoring.

Applied Research
von Davier's applied research has focused on the use of psychometric methods in international large-scale assessment. In his roles at ETS and Boston College, he led the psychometric work on transitioning PIAAC 2012, PISA 2015, TIMSS 2019, and PIRLS 2021 from a paper-based to a computer-based trend line, using mode effect models with data from studies designed to align results from paper-based and computer-based assessments.

Another line of his research has concerned the more general issue of linking in large-scale educational assessments.

A third line of von Davier's research has addressed response styles and the correction of survey response bias in self-reports. The applications range from mixture models for personality data to the pitfalls of attempts to correct response bias with anchoring vignettes. More recently, his research has focused on the use of process data in assessment to improve achievement estimation and contextualize assessment results.

Publications
von Davier has authored and co-authored over 150 publications in peer-reviewed journals, edited books, monographs, and research report series. His h-index is 52. He co-edited several books on topics ranging from Latent Variable Models in Psychometrics to International Large-Scale Assessments and NLP in Assessment.

von Davier's first book, Multivariate and Mixture Distribution Rasch Models: Extensions and Applications, explored advanced applications and extensions of the Rasch model across various disciplines, including education, psychology, and health sciences. Allan S. Cohen commented in the Journal of the American Statistical Association, "This book, published in honor of the retirement of Jürgen Rost, is an edited volume of 22 invited chapters written by eminent researchers in the field of item response theory (IRT)." His next book, The Role of International Large-Scale Assessments: Perspectives from Technology, Economy, and Educational Research, published in 2012, discussed the significance of large-scale international assessments as catalysts for change in understanding the role of human capital distribution, impacting policy, education, and research. In 2013, he co-edited the Handbook of International Large-Scale Assessment: Background, Technical Issues, and Methods of Data Analysis with Leslie Rutkowski and David Rutkowski, which explored the methodology, technical details, and policy implications of International Large-Scale Assessments (ILSA) in education. Terry Ackerman remarked, "This book is an excellent resource and guide to international large-scale assessments or ILSAs. The three editors have done an excellent job identifying a group of prominent scholars whose expertise ranges from international testing and behavioral statistics to educational policy."

Alongside Randy E. Bennett, von Davier published Advancing Human Assessment: The Methodological, Psychological and Policy Contributions of ETS in 2017, detailing the advancements in human assessment made by ETS, covering measurement and statistics, education policy, psychology, and the development of widely used educational surveys and methodologies. Building upon this exploration of assessment methodologies, he co-edited Advancing Natural Language Processing in Educational Assessment in 2023 with Victoria Yaneva, which looked into the implementation, benefits, and challenges of using NLP in educational testing and assessment. In addition, his book, Handbook of Diagnostic Classification Models: Models and Model Extensions, Applications, Software Packages, provided an overview of diagnostic classification models (DCMs), discussing their development, application, and advantages in offering detailed evaluations of test taker performance across multiple skill domains compared to traditional assessment models. Yu Bao reviewed the book and stated, "The Handbook of Diagnostic Classification Models serves as a reference book that consists of a comprehensive collection of the majority of research topics and a summary of the influential publications within recent decades."

In his highly cited studies, von Davier described practices researchers can use for analyzing and reporting data from large-scale international assessments, addressing common issues and statistical complexities to ensure unbiased results. He emphasized the importance of correctly using plausible values in large-scale survey data analysis to avoid biased estimates and underscored the need to follow established procedures and guidelines. He also presented a diagnostic model for multidimensional skill profiles estimated with maximum likelihood techniques, demonstrated its application with simulated and real data, and introduced general diagnostic models (GDMs) for estimating skill profiles, suitable for polytomous data and missing responses, with a focus on TOEFL Internet-based testing (iBT) field test data. In related research, he showed that the G-DINA and LCDM approaches to diagnostic modeling are special cases of the GDM. Some of his later research has focused on large language models, recurrent neural networks, and other so-called AI methods and how they can be used in automated item generation, automated scoring, and other applications in large-scale educational assessment.
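
As a rough illustration of the plausible-values practice mentioned above, the sketch below pools a statistic that has been computed once per plausible value, using the combining rules commonly applied in large-scale assessments (Rubin's rules for multiply imputed data). It is a minimal sketch and not code from the cited publications: the function name `combine_plausible_values` and all numbers are hypothetical, and the per-value sampling variances are taken as given rather than derived from a replication method such as the jackknife.

```python
import math
from statistics import mean, variance


def combine_plausible_values(estimates, sampling_variances):
    """Pool one statistic computed separately for each plausible value.

    estimates: the statistic (e.g. a group mean) from each plausible value.
    sampling_variances: the sampling variance of each of those estimates.
    Returns (pooled_estimate, standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(sampling_variances):
        raise ValueError("need at least two estimates and matching variances")

    pooled = mean(estimates)            # final point estimate
    within = mean(sampling_variances)   # average sampling variance
    between = variance(estimates)       # variance across plausible values (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


# Hypothetical inputs: a country mean computed from each of five plausible values.
pv_means = [500.0, 502.0, 498.0, 501.0, 499.0]
pv_sampling_vars = [4.0, 4.0, 4.0, 4.0, 4.0]

estimate, se = combine_plausible_values(pv_means, pv_sampling_vars)
print(f"pooled estimate = {estimate:.1f}, standard error = {se:.3f}")
```

The design point is that the analysis is run once per plausible value and the results are combined afterwards. Averaging the plausible values for each respondent before the analysis is a common shortcut that tends to understate variability.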

Awards and honors

 * 2006 – ETS Scientist Award, ETS
 * 2012 – Bradley Hanson Award for Contributions to Educational Measurement, NCME
 * 2017 – Award for Significant Contribution to Educational Measurement and Research Methodology, AERA

Books

 * Multivariate and Mixture Distribution Rasch Models: Extensions and Applications (2007) ISBN 978-0387329161
 * The Role of International Large-Scale Assessments: Perspectives from Technology, Economy, and Educational Research (2012) ISBN 978-9400797116
 * Handbook of International Large-Scale Assessment: Background, Technical Issues, and Methods of Data Analysis (2013) ISBN 978-1439895122
 * Advancing Human Assessment: The Methodological, Psychological and Policy Contributions of ETS (2017) ISBN 978-3319586878
 * Handbook of Diagnostic Classification Models: Models and Model Extensions, Applications, Software Packages (2019) ISBN 978-3030055837
 * Advancing Natural Language Processing in Educational Assessment (2023) ISBN 978-1032244525

Selected articles

 * Von Davier, M. (2008). A general diagnostic model applied to language testing data. British Journal of Mathematical and Statistical Psychology, 61(2), 287–307.
 * Von Davier, M., Gonzalez, E., & Mislevy, R. (2009). What are plausible values and why are they useful. IERI Monograph Series, 2(1), 9–36.
 * Rutkowski, L., Gonzalez, E., Joncas, M., & Von Davier, M. (2010). International large-scale assessment data: Issues in secondary analysis and reporting. Educational Researcher, 39(2), 142–151.
 * Von Davier, M. (2018). Automated item generation with recurrent neural networks. Psychometrika, 83(4), 847–857.
 * Ulitzsch, E., von Davier, M., & Pohl, S. (2020). A hierarchical latent response model for inferences about examinee engagement in terms of guessing and item‐level non‐response. British Journal of Mathematical and Statistical Psychology, 73, 83–112.
 * Bezirhan, U., & von Davier, M. (2023). Automated reading passage generation with OpenAI's large language model. Computers and Education: Artificial Intelligence, 5, 100161.
 * Jung, J. Y., Tyack, L., & von Davier, M. (2024). Combining machine translation and automated scoring in international large-scale assessments. Large-scale Assessments in Education, 12(1), 10.