
Evaluating Computer Automated Scoring:  Issues, Methods, and an Empirical Illustration, Yongwei Yang, The Gallup Organization, Chad W. Buckendahl, Buros Center for Testing, University of Nebraska-Lincoln, Piotr J. Juszkiewicz, The Gallup Organization, Dennison S. Bhola, James Madison University

Abstract

With the continual progress of computer technologies, computer automated scoring (CAS) has become a popular tool for evaluating writing assessments. Research on applications of these methodologies to new types of performance assessments is still emerging. While research has generally shown high agreement between CAS system generated scores and those produced by human raters, concerns and questions have been raised about appropriate analyses and the validity of decisions and interpretations based on those scores. In this paper we expand the emerging discussions on validation strategies for CAS by illustrating several analyses that can be accomplished with available data. These analyses compare the degree to which two CAS systems accurately score data from a structured interview, using the original scores provided by human raters as the criterion. Results suggest key differences across the two systems as well as differences in the statistical procedures used to evaluate them. The use of several statistical and qualitative analyses is recommended for evaluating contemporary CAS systems.

Keywords: automated scoring, computerized testing, structured interviews, validity


Some Useful Cost-Benefit Criteria for Evaluating Computer-based Test Delivery Models and Systems, Richard M. Luecht, University of North Carolina at Greensboro, April, 2005

Abstract

Computer-based testing (CBT) is typically implemented using one of three general test delivery models: (1) multiple fixed testing (MFT); (2) computer-adaptive testing (CAT); or (3) multistage testing (MST). This article reviews some of the real cost drivers associated with CBT implementation (focusing on item production costs, the costs associated with administering the tests, and system development costs) and elaborates three classes of cost-benefit-related factors useful for evaluating CBT models: (1) real measurement efficiency; (2) testing system performance; and (3) provision for data quality control/assurance.


Strategies to Assess the Core Academic Knowledge of English Language Learners, Stanley Rabinowitz, Sri Ananda, & Andrew Bell, WestEd

Abstract

With the goal of eliminating the achievement gap between advantaged and disadvantaged students, the No Child Left Behind Act of 2001 (NCLB) requires that all students achieve proficiency in English language arts and mathematics by 2014. Even subgroups considered at risk must demonstrate continuous progress towards proficiency in the core academic areas of English language arts and mathematics, as measured by their performance on state assessments. Failure to do so results in serious consequences for schools, districts and states.

Creating Better Tests for Everyone Through Universally Designed Assessments, Sandra Thompson and Martha Thurlow, University of Minnesota, David B. Malouf, U.S. Department of Education, May, 2004

Abstract

Universally designed assessments are designed and developed to allow participation of the widest possible range of students, in a way that results in valid inferences about performance on grade-level standards for all students who participate in the assessment. This paper explores the development of universal design and considers its application to large-scale assessments. Building on universal design principles presented by the Center for Universal Design (1997), seven elements of universally designed assessments are identified and described. These elements were derived from a review of literature on universal design, assessment and instructional design, and research on topics such as assessment accommodations (Thompson, Johnstone, & Thurlow, 2002). The seven elements are:

  1. Inclusive assessment population

  2. Precisely defined constructs

  3. Accessible, non-biased items

  4. Amenable to accommodations

  5. Simple, clear, and intuitive instructions and procedures

  6. Maximum readability and comprehensibility

  7. Maximum legibility

Each of the elements is explored in this paper. Numerous resources relevant to each of the elements are identified, with specific suggestions for ways in which assessments can be designed to meet the needs of the widest range of students possible. Challenges and opportunities arising from the application of universally designed assessments are identified.



The Ideal Role of Large-Scale Testing in a Comprehensive Assessment System, Charles A. DePascale, National Center for the Improvement of Educational Assessment, July 2003

Abstract

The role of large-scale assessment in public education has grown tremendously since the mid-1980s and unquestionably will continue to grow with the implementation of the assessment and accountability requirements of the No Child Left Behind Act.  In the rush to meet the demand to measure validly and reliably the performance of all students, however, it must not be forgotten that large-scale assessment is only one component of a comprehensive assessment system.  The factors that led to the predominance of large-scale assessment are reviewed and the appropriate role of large-scale assessment in a comprehensive assessment system is discussed.


Ensuring Fair Testing Practices: The Responsibilities of Test Sponsors, Test Developers, Test Administrators, and Test Takers in Ensuring Fair Testing Practices, Barbara S. Plake, Buros Center for Testing, University of Nebraska-Lincoln, Patrick Jones, Excelsior College, July, 2002

Abstract

The focus of tests today oftentimes centers on ways to provide good quality tests to test takers in a cost-effective manner.  Test sponsors are concerned about the policy issues related to test use; test developers must prepare a test that meets both the purpose and specifications articulated by the test sponsor and the technical standards for quality tests.  Test administrators are responsible for test delivery in ways that protect the integrity of the test scores and the security of the test product.  Test takers often have limited options in when, how, or why they are taking the test, and may feel victimized in the process.  The purpose of this paper is to focus on the test taker and to consider how all parties in the test process (test sponsor, test developer, test administrator, and test taker) have a role to play in ensuring fair testing practices and valid test results.

Megatrends in Personnel Testing: A Practitioner’s Perspective,  John W. Jones, Ph.D., Kelly D. Higgins, M.A., NCS Pearson, January 2001

Abstract

This paper briefly reviewed current personnel testing trends as documented in the literature. Yet research literature often misses fast-moving megatrends that will ultimately change the face of personnel testing practices. Therefore, the primary purpose of this paper was to list and describe, from a practitioner’s perspective, 10 dominant megatrends that are impacting the personnel testing industry. Five megatrends were classified as being "technocentric," and five were classified as being "content-specific." Technocentric themes were related to virtual career centers, integrated assessment platforms, a number of Internet-age access concerns, media-rich assessments, and data warehousing and mining. Content-specific trends were related to certification testing, 21st century test constructs, human resource lifecycle assessments, technology-friendly tests, and bottom-line impact and return on investment (ROI) studies. A review of these 10 megatrends suggests that the personnel testing industry is keeping pace with rapid technological innovations.

Send correspondence to: Dr. John Jones, NCS Pearson, Suite 770, 9701 W. Higgins Road, Rosemont, Illinois 60018 or jwjones@ncs.com.

Promoting Stakeholder Acceptance of CBT, J. Patrick Jones, Professional Examination Service, September 2000

Abstract

This article describes the major elements of a communication plan for the implementation of a computer-based testing (CBT) program. The major benefits and potential drawbacks of a CBT program are reviewed, and the information needs of various stakeholder groups are identified. The article concludes with an overview of communication strategies and evaluation techniques that can facilitate the transition to CBT.

To comment on this article, send us an e-mail at jones@testpublishers.org


Increasing the Validity of Adapted Tests: Myths to be Avoided and Guidelines for Improving Test Adaptation Practices, Ronald K. Hambleton and Liane Patsula, University of Massachusetts at Amherst, August 1999

Abstract

Adapting or translating achievement, aptitude, and personality tests and questionnaires from one language and culture to others has been done for a long time. Unfortunately, there is substantial evidence to suggest that often these adapted tests are problematic because of a failure to do the test adaptation work correctly. The purposes of this paper are to describe five myths about test adaptation that need to be discarded and to offer a set of steps to follow in test adaptation projects. The International Test Commission guidelines for adapting tests are also presented in the paper.

To comment on this article, send us an e-mail at hambleton@testpublishers.org



The Journal of Applied Testing Technology ("JATT") is posted by the Association of Test Publishers with the understanding that the content of the Journal does not constitute the rendering of legal, accounting or other professional opinions. All opinions expressed by authors are solely those of the individuals themselves and are not representative of the viewpoints of the Association of Test Publishers or the JATT editorial board. If legal advice or other expert assistance is required, the services of a competent professional should be sought.
 

©1997 - 2003 Association of Test Publishers

Saturday February 04, 2006