Volume 7 
July 2005 
Issue #3

Evaluating Computer Automated Scoring: Issues, Methods, and an Empirical Illustration. Yongwei Yang, The Gallup Organization, Chad W. Buckendahl, Buros Center for Testing, University of Nebraska-Lincoln, Piotr J. Juszkiewicz, The Gallup Organization, Dennison S. Bhola, James Madison University, July 2005

Abstract

With the continual progress of computer technologies, computer automated scoring (CAS) has become a popular tool for evaluating writing assessments. Research on applications of these methodologies to new types of performance assessments is still emerging. While research has generally shown high agreement between CAS-generated scores and those produced by human raters, concerns have been raised about appropriate analyses and about the validity of decisions and interpretations based on those scores. In this paper we expand the emerging discussion of validation strategies for CAS by illustrating several analyses that can be accomplished with available data. These analyses compare the degree to which two CAS systems accurately score data from a structured interview, using the original scores provided by human raters as the criterion. Results suggest key differences between the two systems, as well as differences among the statistical procedures used to evaluate them. The use of several statistical and qualitative analyses is recommended for evaluating contemporary CAS systems.
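As a concrete illustration of the kind of human-machine agreement analyses the abstract describes, the sketch below computes three statistics commonly used when human rater scores serve as the criterion: percent exact agreement, Pearson correlation, and Cohen's kappa. It is a minimal sketch only; the score vectors are hypothetical and none of the names or values come from the paper itself.

import numpy as np

def exact_agreement(a, b):
    """Proportion of cases where the two score vectors match exactly."""
    a, b = np.asarray(a), np.asarray(b)
    return float(np.mean(a == b))

def pearson_r(a, b):
    """Pearson product-moment correlation between the two score vectors."""
    return float(np.corrcoef(a, b)[0, 1])

def cohen_kappa(a, b, categories):
    """Cohen's kappa: agreement corrected for chance, for categorical scores."""
    a, b = np.asarray(a), np.asarray(b)
    observed = np.mean(a == b)
    # Chance agreement expected if the two sets of scores were independent.
    expected = sum(np.mean(a == c) * np.mean(b == c) for c in categories)
    return float((observed - expected) / (1 - expected))

# Hypothetical data: human rater scores (the criterion) and scores from one
# CAS system, both on a 1-5 rating scale.
human = [3, 4, 2, 5, 3, 4, 1, 2, 4, 3]
cas = [3, 4, 3, 5, 3, 3, 1, 2, 4, 3]

print("exact agreement:", exact_agreement(human, cas))  # 8 of 10 match: 0.8
print("Pearson r:      ", pearson_r(human, cas))
print("Cohen's kappa:  ", cohen_kappa(human, cas, range(1, 6)))

When score categories are ordered, as with most rating scales, weighted variants of kappa (e.g., quadratic weights) are also commonly reported alongside these statistics.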

Volume 7 
April 2005 
Issue #2

Some Useful Cost-Benefit Criteria for Evaluating Computer-based Test Delivery Models and Systems. Richard M. Luecht, University of North Carolina at Greensboro, April 2005

Abstract

Computer-based testing (CBT) is typically implemented using one of three general test delivery models: (1) multiple fixed testing (MFT); (2) computer-adaptive testing (CAT); or (3) multistage testing (MST). This article reviews some of the real cost drivers associated with CBT implementation, focusing on item production costs, test administration costs, and system development costs, and elaborates three classes of cost-benefit factors useful for evaluating CBT models: (1) real measurement efficiency; (2) testing system performance; and (3) provision for data quality control/assurance.

Volume 7 
April 2005 
Issue #1

Strategies to Assess the Core Academic Knowledge of English Language Learners. Stanley Rabinowitz, Sri Ananda, & Andrew Bell, WestEd

Abstract

This paper focuses on a central assessment question: How can the validity of assessments of English language learner (ELL) student performance on core academic content be increased? We begin by exploring NCLB expectations for ELL assessments and an increasingly popular approach to meeting these requirements proposed by some states: translation of assessments into students' native languages. We then present key research findings on attempts to increase the accessibility and validity of assessments for ELLs. We conclude by proposing a comprehensive strategy for assessing ELL students' performance in core academic content.