
Psychometrics for adaptive learning: The Urnings algorithm (Maria Bolsinova)

One of the most promising ambitions in educational technology is the move towards large-scale personalized learning enabled by the development of online adaptive learning systems (ALS, a.k.a. adaptive learning environments, computer adaptive practice systems). These systems dynamically adjust the level of practice and instructional material based on the individual student's performance, with the goal of improving both the learning process and the learning outcomes. To optimize feedback, instruction, and learning material, one needs continuously updated, accurate, and reliable measures of the students' changing abilities. This makes measurement one of the central issues in ALS. Measuring change is in itself not a trivial problem, and it is made even more challenging by the adaptive nature of ALS and by the fact that these systems usually operate at a large scale, with thousands of students needing continuous updates of their ability estimates. These features of ALS pose challenges for traditional psychometric models and algorithms. A possible solution is provided by the Elo Rating System (ERS), which tracks student abilities and item difficulties by updating the ratings of learners and items after every response. However, ERS does not provide a measure of uncertainty about the ratings, which makes it impossible to evaluate the reliability of measurement or to make statistical inferences based on the ratings. Furthermore, adaptive item selection in ALS leads to systematic bias in the ratings. A new urn-based rating system called Urnings has been developed to solve these issues. Here, every student and item is represented by an urn filled with a combination of green and red balls. The urns are updated after every response, such that the proportion of green balls represents the student's ability or the item's difficulty.
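To make the urn mechanism concrete, the following is a minimal sketch of one urn-based update in the spirit described above: a response is simulated from the current urns, and the difference between the observed and simulated responses is moved between the student and item urns. This is an illustrative simplification, not the exact published Urnings algorithm; in particular, the correction for adaptive item selection is omitted, and the function signature and names are my own.

```python
import random

def urnings_update(u, n, v, m, x, rng):
    """One simplified urn update after observing response x (1 = correct).

    u of the n balls in the student urn are green (rating u/n);
    v of the m balls in the item urn are green (rating v/m).
    """
    # Simulate a response from the current urns: draw one ball from each
    # urn until the colours differ; "student green, item red" counts as
    # a simulated correct response.
    while True:
        s = rng.random() < u / n   # green ball drawn from the student urn?
        t = rng.random() < v / m   # green ball drawn from the item urn?
        if s != t:
            break
    x_sim = 1 if s else 0
    # Move the observed-minus-simulated difference between the urns; the
    # total number of green balls across the two urns is conserved.
    u_new, v_new = u + x - x_sim, v + x_sim - x
    if not (0 <= u_new <= n and 0 <= v_new <= m):
        return u, v  # reject updates that would leave the urn bounds
    return u_new, v_new
```

A correct response tends to add a green ball to the student's urn and remove one from the item's urn, so student ratings drift up and item ratings drift down when the student outperforms the current ratings.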
The main advantages of this approach are that the standard errors of the urn-based ratings are known, and that adaptive item selection can be explicitly accounted for so that no bias occurs. In this presentation, I will highlight features of the Urnings rating system using both simulated and empirical data. Additionally, I will discuss possibilities for improving the precision of the urn-based ratings by incorporating information from the process data (e.g., response times) collected in the system.
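The known standard errors can be illustrated with a short sketch. Assuming the urn rating has an approximately binomial stationary distribution (an assumption made here for illustration), the standard error of the proportion of green balls takes the familiar binomial form:

```python
import math

def urn_rating_se(u, n):
    """Approximate standard error of the urn-based rating u/n,
    assuming a binomial stationary distribution for the urn.
    """
    p = u / n  # proportion of green balls = rating
    return math.sqrt(p * (1 - p) / n)
```

This also makes the design trade-off visible: a larger urn gives a smaller standard error but a rating that adapts more slowly to genuine change in ability, so the urn size balances precision against responsiveness.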