
Title:
Accurate Assessment via Process Data.
Authors:
Zhang S; University of Illinois at Urbana-Champaign, Champaign, IL, USA.
Wang Z; Citadel Securities, New York, NY, USA.
Qi J; Columbia University, New York, NY, USA.
Liu J; Columbia University, New York, NY, USA.
Ying Z; Columbia University, New York, NY, USA.
Source:
Psychometrika [Psychometrika] 2023 Mar; Vol. 88 (1), pp. 76-97. Date of Electronic Publication: 2022 Aug 13.
Publication Type:
Journal Article; Research Support, U.S. Gov't, Non-P.H.S.
Language:
English
Imprint Name(s):
Original Publication: Research Triangle Park, NC : Psychometric Society
MeSH Terms:
Psychometrics*
Academic Success*
Humans
References:
AERA, APA, & NCME. (2014). Standards for educational and psychological testing. American Educational Research Association.
Attali, Y., & Burstein, J. (2006). Automated essay scoring with e-rater V.2. The Journal of Technology, Learning and Assessment, 4(3). Retrieved from https://ejournals.bc.edu/index.php/jtla/article/view/1650.
Bejar, I. I., Mislevy, R. J., & Zhang, M. (2016). Automated scoring with validity in mind. In A. A. Rupp & J. P. Leighton (Eds.), The Wiley handbook of cognition and assessment (pp. 226–246). https://doi.org/10.1002/9781118956588.ch10.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 397–479). Addison-Wesley.
Blackwell, D. (1947). Conditional expectation and unbiased sequential estimation. The Annals of Mathematical Statistics, 18(1), 105–110.
Bolsinova, M., & Tijmstra, J. (2018). Improving precision of ability estimation: Getting more from response times. British Journal of Mathematical and Statistical Psychology, 71(1), 13–38. (DOI: 10.1111/bmsp.12104; PMID: 28635139)
Casella, G., & Berger, R. L. (2002). Statistical inference (2nd ed.). Duxbury.
Clauser, B. E., Harik, P., & Clyman, S. G. (2000). The generalizability of scores for a performance assessment scored with a computer-automated scoring system. Journal of Educational Measurement, 37(3), 245–261. (DOI: 10.1111/j.1745-3984.2000.tb01085.x)
Evanini, K., Heilman, M., Wang, X., & Blanchard, D. (2015). Automated scoring for the TOEFL Junior® comprehensive writing and speaking test. ETS Research Report Series, 2015(1), 1–11. (DOI: 10.1002/ets2.12052)
Fife, J. H. (2013). Automated scoring of mathematics tasks in the common core era: Enhancements to m-rater in support of CBAL™ mathematics and the common core assessments. ETS Research Report Series, 2013(2), i–35. (DOI: 10.1002/j.2333-8504.2013.tb02333.x)
Foltz, P. W., Laham, D., & Landauer, T. K. (1999). Automated essay scoring: Applications to educational technology. In B. Collis & R. Oliver (Eds.), Proceedings of EdMedia + Innovate Learning 1999 (pp. 939–944). Association for the Advancement of Computing in Education (AACE).
Frey, A., Spoden, C., Goldhammer, F., & Wenzel, S. F. C. (2018). Response time-based treatment of omitted responses in computer-based testing. Behaviormetrika, 45(2), 505–526. (DOI: 10.1007/s41237-018-0073-9)
He, Q., Veldkamp, B. P., Glas, C. A., & de Vries, T. (2017). Automated assessment of patients’ self-narratives for posttraumatic stress disorder screening using natural language processing and text mining. Assessment, 24(2), 157–172. (DOI: 10.1177/1073191115602551; PMID: 26358713)
He, Q., Veldkamp, B. P., Glas, C. A., & Van Den Berg, S. M. (2019). Combining text mining of long constructed responses and item-based measures: A hybrid test design to screen for posttraumatic stress disorder (PTSD). Frontiers in Psychology, 10, 2358. (DOI: 10.3389/fpsyg.2019.02358; PMID: 31695647; PMCID: PMC6817621)
He, Q., & von Davier, M. (2016). Analyzing process data from problem-solving items with N-grams: Insights from a computer-based large-scale assessment. In Y. Rosen, S. Ferrara, & M. Mosharraf (Eds.), Handbook of research on technology tools for real-world skill development (pp. 750–777). IGI Global. https://doi.org/10.4018/978-1-4666-9441-5.ch029.
Hoerl, A. E., & Kennard, R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1), 55–67. (DOI: 10.1080/00401706.1970.10488634)
Kendall, M. G. (1938). A new measure of rank correlation. Biometrika, 30(1/2), 81–93. (DOI: 10.2307/2332226)
Kim, J. K., & Nicewander, W. A. (1993). Ability estimation for conventional tests. Psychometrika, 58(4), 587–599. (DOI: 10.1007/BF02294829)
LaMar, M. M. (2018). Markov decision process measurement model. Psychometrika, 83(1), 67–88. (DOI: 10.1007/s11336-017-9570-0; PMID: 28447309)
Lehmann, E. L., & Romano, J. P. (2005). Testing statistical hypotheses (3rd ed.). Springer.
Liu, H., Liu, Y., & Li, M. (2018). Analysis of process data of PISA 2012 computer-based problem solving: Application of the modified multilevel mixture IRT model. Frontiers in Psychology, 9, 1372. (DOI: 10.3389/fpsyg.2018.01372; PMID: 30123171; PMCID: PMC6085588)
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Routledge.
Messick, S. (1989). Meaning and values in test validation: The science and ethics of assessment. Educational Researcher, 18(2), 5–11. (DOI: 10.3102/0013189X018002005)
Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. ETS Research Report Series, 1992(1), i–30.
OECD. (2012). Literacy, numeracy and problem solving in technology-rich environments: Framework for the OECD Survey of Adult Skills. OECD Publishing.
Page, E. B. (1966). The imminence of grading essays by computer. The Phi Delta Kappan, 47(5), 238–243.
Qiao, X., & Jiao, H. (2018). Data mining techniques in analyzing process data: A didactic. Frontiers in Psychology, 9, 2231. (DOI: 10.3389/fpsyg.2018.02231; PMID: 30532716; PMCID: PMC6265513)
Rasch, G. (1960). Probabilistic models for some intelligence and achievement tests. Danish Institute for Educational Research.
Rose, N., von Davier, M., & Nagengast, B. (2017). Modeling omitted and not-reached items in IRT models. Psychometrika, 82(3), 795–819. (DOI: 10.1007/s11336-016-9544-7)
Rudner, L. M., Garcia, V., & Welch, C. (2006). An evaluation of IntelliMetric™ essay scoring system. The Journal of Technology, Learning and Assessment, 4(4). Retrieved from https://ejournals.bc.edu/index.php/jtla/article/view/1651.
Rupp, A. A. (2018). Designing, evaluating, and deploying automated scoring systems with validity in mind: Methodological design decisions. Applied Measurement in Education, 31(3), 191–214. (DOI: 10.1080/08957347.2018.1464448)
Rupp, A. A., Templin, J., & Henson, R. A. (2010). Diagnostic measurement: Theory, methods, and applications. Guilford Press.
Schleicher, A. (2008). PIAAC: A new strategy for assessing adult competencies. International Review of Education, 54(5–6), 627–650. (DOI: 10.1007/s11159-008-9105-0)
Tang, X., Wang, Z., Liu, J., & Ying, Z. (2021a). An exploratory analysis of the latent structure of process data via action sequence autoencoders. British Journal of Mathematical and Statistical Psychology, 74(1), 1–33.
Tang, X., Zhang, S., Wang, Z., Liu, J., & Ying, Z. (2021b). ProcData: An R package for process data analysis. Psychometrika, 86(4), 1058–1083.
Tang, X., Wang, Z., He, Q., Liu, J., & Ying, Z. (2020). Latent feature extraction for process data via multidimensional scaling. Psychometrika, 85(2), 378–397.
Tikhonov, A. N. & Arsenin, V. Y. (1977). Solutions of ill-posed problems (pp. 1–30). New York.
Ulitzsch, E., von Davier, M., & Pohl, S. (2020). A hierarchical latent response model for inferences about examinee engagement in terms of guessing and item-level non-response. British Journal of Mathematical and Statistical Psychology, 73, 83–112. (DOI: 10.1111/bmsp.12188; PMID: 31709521)
van der Linden, W. J. (2007). A hierarchical framework for modeling speed and accuracy on test items. Psychometrika, 72(3), 287. (DOI: 10.1007/s11336-006-1478-z)
von Davier, M., Sinharay, S., Oranje, A., & Beaton, A. (2006). The statistical procedures used in National Assessment of Educational Progress: Recent developments and future directions. Handbook of Statistics, 26, 1039–1055. (DOI: 10.1016/S0169-7161(06)26032-2)
Wainer, H., Dorans, N. J., Flaugher, R., Green, B. F., & Mislevy, R. J. (2000). Computerized adaptive testing: A primer. Routledge.
Xu, H., Fang, G., Chen, Y., Liu, J., & Ying, Z. (2018). Latent class analysis of recurrent events in problem-solving items. Applied Psychological Measurement. Advance online publication. https://doi.org/10.1177/0146621617748325.
Zumbo, B. D., & Hubley, A. M. (2017). Understanding and investigating response processes in validation research (Vol. 26). Springer.
Contributed Indexing:
Keywords: Process data; Rao–Blackwellization; ability estimation; automated scoring
Entry Date(s):
Date Created: 20220813 Date Completed: 20230303 Latest Revision: 20230303
Update Code:
20240104
DOI:
10.1007/s11336-022-09880-8
PMID:
35962849
Academic journal
Accurate assessment of a student's ability is the key task of a test. Assessments based on final responses are the standard. As the testing infrastructure advances, substantially more information is observed. One such instance is process data, which are collected by computer-based interactive items and contain a student's detailed interaction process. In this paper, we show, both theoretically and with simulated and empirical data, that appropriately including such information in the assessment substantially improves assessment precision.
(© 2022. The Author(s) under exclusive licence to The Psychometric Society.)
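The keyword "Rao–Blackwellization" and the Blackwell (1947) and Casella & Berger (2002) entries above point to the classical variance-reduction argument behind the claimed precision gain. As a minimal sketch of that general argument, not of the paper's specific estimator, with \(\theta\), \(\hat{\theta}\), and \(T\) as generic placeholders:

\[
\tilde{\theta} = \mathbb{E}\!\left[\hat{\theta} \mid T\right], \qquad
\mathbb{E}\,\tilde{\theta} = \mathbb{E}\,\hat{\theta}, \qquad
\operatorname{Var}(\tilde{\theta}) \le \operatorname{Var}(\hat{\theta}),
\]

where \(\hat{\theta}\) is an estimator of the ability \(\theta\) based on final responses alone and \(T\) is a statistic of the richer (process) data that is sufficient for \(\theta\), so that the conditional expectation does not depend on the unknown \(\theta\). The inequality follows from the law of total variance, \(\operatorname{Var}(\hat{\theta}) = \mathbb{E}[\operatorname{Var}(\hat{\theta} \mid T)] + \operatorname{Var}(\tilde{\theta})\): conditioning a response-based estimator on information extracted from the full interaction record cannot increase its variance, which is the sense in which process data can improve assessment precision.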
