Volume 13

November 2012

Issue #3

The Contribution of Constructed Response Items to Large Scale Assessment: Measuring and Understanding their Impact by Robert W. Lissitz and Xiaodong Hou, University of Maryland, and Sharon Cadman Slater, Educational Testing Service


This article investigates several questions regarding the impact of different item formats on measurement characteristics. Constructed response (CR) items and multiple choice (MC) items obviously differ in their formats and in the resources needed to score them. As such, they have been the subject of considerable discussion regarding the impact of their use and the potential effect of ceasing to use one or the other item format in an assessment. In particular, this study examines the differences in constructs measured across different domains, changes in test reliability and test characteristic curves, and interactions of item format with race and gender. The data for this study come from the Maryland High School Assessments, high-stakes state examinations that students must pass in order to obtain a high school diploma. Our results indicate that there are subtle differences in the impact of CR and MC items. These differences are demonstrated in dimensionality, particularly for English and Government, and in ethnic and gender differential performance with these two item types.

October 2012

Issue #2

Applying Multidimensional Item Response Theory Models in Validating Test Dimensionality: An Example of K-12 Large-scale Science Assessment by Ying Li, American Institutes for Research, Washington D.C. and Hong Jiao & Robert W. Lissitz, University of Maryland, College Park, MD.


This study investigated the application of multidimensional item response theory (IRT) models to validate test structure and dimensionality. Multiple content areas or domains within a single subject often exist in large-scale achievement tests. Such areas or domains may cause multidimensionality or local item dependence, both of which violate the assumptions of the unidimensional IRT models currently used in many statewide large-scale assessments. An empirical K–12 science assessment was used as an example of dimensionality validation using multidimensional IRT models. The unidimensional IRT model was also included as the most commonly used model in current practice. The procedures illustrated in this real example can be utilized to validate the test dimensionality for any testing program once item response data are collected.

April 2012

Issue #1

Evaluating the Content Validity of Multistage-Adaptive Tests by Katrina Crotts, Stephen G. Sireci, and April Zenisky, University of Massachusetts Amherst


Validity evidence based on test content is important for educational tests to demonstrate the degree to which they fulfill their purposes.  Most content validity studies involve subject matter experts (SMEs) who rate items that comprise a test form.  In computerized-adaptive testing, examinees take different sets of items and test “forms” do not exist, which makes it difficult to evaluate the content validity of different tests taken by different examinees.  In this study, we evaluated content validity of a multistage-adaptive test (MST) using SMEs’ content validity ratings of all items in the MST bank.  Analyses of these ratings across the most common “paths” taken by examinees were conducted.  The results indicated the content validity ratings across the different tests taken by examinees were roughly equivalent.  The method used illustrates how content validity can be evaluated in an MST context.