Volume 12

May 2011

Special Issue on Adaptive Testing: Welcome and Introduction by Nathan A. Thompson, Assessment Systems Corporation 

Special Issue: Article #1

Creating a K-12 Adaptive Test: Examining the Stability of Item Parameter Estimates and Measurement Scale by G. Gage Kingsbury and Steven L. Wise, Northwest Evaluation Association


Development of adaptive tests used in K-12 settings requires the creation of stable measurement scales to measure the growth of individual students from one grade to the next, and to measure change in groups from one year to the next. Accountability systems like No Child Left Behind require stable measurement scales so that accountability has meaning across time. This study examined the stability of the measurement scales used with the Measures of Academic Progress. Difficulty estimates for test questions from the reading and mathematics scales were examined over a period ranging from 7 to 22 years. Results showed high correlations between item difficulty estimates from the time at which they were originally calibrated and the current calibration. The average drift in item difficulty estimates was less than .01 standard deviations. The average impact of change in item difficulty estimates was less than the smallest reported difference on the score scale for two actual tests. The findings of the study indicate that an IRT scale can be stable enough to allow consistent measurement of student achievement.

Special Issue: Article #2

Computer adaptive testing for small scale programs and instructional systems by Lawrence M. Rudner & Fanmin Guo, Graduate Management Admission Council


This study investigates measurement decision theory (MDT) as an underlying model for computer adaptive testing when the goal is to classify examinees into one of a finite number of groups. The first analysis compares MDT with a popular item response theory model and finds little difference in terms of the percentage of correct classifications. The second analysis examines the number of examinees needed to calibrate MDT item parameters and finds accurate classifications even with calibration sample sizes as small as 100 examinees.
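The classification idea behind MDT can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the usual decision-theoretic setup, in which each item has a probability of a correct response for each group and responses are treated as conditionally independent given the group. The function names (`mdt_posterior`, `classify`), the group labels, and all numbers are hypothetical and chosen only for illustration.

```python
from math import prod


def mdt_posterior(priors, p_correct, responses):
    """Return the posterior probability of each group given scored responses.

    priors    -- dict mapping group name to prior probability (assumed values)
    p_correct -- dict mapping group name to a list of P(correct | group),
                 one entry per administered item (assumed calibrations)
    responses -- list of 0/1 item scores, in the same item order
    """
    unnormalized = {}
    for group, prior in priors.items():
        # Likelihood of the response pattern under conditional independence.
        likelihood = prod(
            p if x == 1 else 1.0 - p
            for p, x in zip(p_correct[group], responses)
        )
        unnormalized[group] = prior * likelihood
    total = sum(unnormalized.values())
    return {group: value / total for group, value in unnormalized.items()}


def classify(priors, p_correct, responses):
    """Assign the examinee to the group with the highest posterior."""
    posterior = mdt_posterior(priors, p_correct, responses)
    return max(posterior, key=posterior.get)


# Hypothetical two-group example with three items.
priors = {"master": 0.5, "nonmaster": 0.5}
p_correct = {
    "master": [0.8, 0.9, 0.7],
    "nonmaster": [0.4, 0.5, 0.3],
}
responses = [1, 1, 0]

posterior = mdt_posterior(priors, p_correct, responses)
decision = classify(priors, p_correct, responses)
```

In this sketch the only item parameters are the per-group probabilities of a correct response. An adaptive version would also need a rule for choosing the next item and a stopping rule, neither of which is shown here.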

Special Issue: Article #3

National Tests in Denmark – CAT as a Pedagogic Tool by Jakob Wandall, Danish National School Agency


Testing and test results can be used in different ways. They can be used for regulation and control, but they can also be a pedagogic tool for assessment of student proficiency in order to target teaching, improve learning and facilitate local pedagogical leadership. To serve these purposes the test has to be used for low stakes purposes, and to ensure this, the Danish National test results are made strictly confidential by law. The only test results that are made public are the overall national results. Because of the test design, test results are directly comparable, offering potential for monitoring added value and developing new ways of using test results in a pedagogical context. This article gives the background and status for the development of the Danish national tests, describes what is special about these tests (e.g., Information Technology [IT]-based, 3 tests in 1, adaptive), how the national tests are carried out, and what is tested. Furthermore, it describes strategies for disseminating the results to the pupil, parents, teacher, headmaster and municipality; and how the results can be used by the teacher and headmaster.

Special Issue: Article #4

Design of a Computer-Adaptive Test to Measure English Literacy and Numeracy in the Singapore Workforce: Considerations, Benefits, and Implications by Jared Jacobsen, Senior Research Consultant, CASAS; Richard Ackermann, TOPSpro and CASAS eTests Manager Team Code; Jane Egüez, Director, Program Development, CASAS; Debalina Ganguli, Director, Research and Analysis, CASAS; Patricia Rickard, President, CASAS; Linda Taylor, Director, Assessment Development, CASAS


A computer adaptive test (CAT) is a delivery methodology that serves the larger goals of the assessment system in which it is embedded. A thorough analysis of the assessment system for which a CAT is being designed is critical to ensure that the delivery platform is appropriate and addresses all relevant complexities. As such, a CAT engine must be designed to conform to the validity and reliability of the overall system. This design takes the form of adherence to the assessment goals and objectives of the adaptive assessment system. When the assessment is adapted for use in another country, consideration must be given to any necessary revisions including content differences. This article addresses these considerations while drawing, in part, on the process followed in the development of the CAT delivery system designed to test English language workplace skills for the Singapore Workforce Development Agency. Topics include item creation and selection, calibration of the item pool, analysis and testing of the psychometric properties, and reporting and interpretation of scores. The characteristics and benefits of the CAT delivery system are detailed as well as implications for testing programs considering the use of a CAT delivery system.