JATT is a peer-reviewed journal published by the Association of Test Publishers.

Co-Editors
Chad W. Buckendahl, Ph.D., William G. Harris, Ph.D.

Volume 10

April 2008

Issue#1

Examining Panelist Data from a Bilingual Standard Setting Study. Elaine M. Rodeck, Tzu-Yun Chin, Susan L. Davis, Barbara S. Plake, Buros Center for Testing, University of Nebraska-Lincoln

Abstract

This study examined the relationships between the evaluations obtained from standard setting panelists and changes in ratings between different rounds of a standard setting study that involved setting standards on different language versions of an exam. We investigated panelists’ evaluations to determine if their perceptions of the standard setting were related to adjustments they made in their recommended cut scores across rounds of the process. The standard setting was conducted for a high school mathematics test composed of multiple-choice and constructed-response items. The test was designed for a population of students who speak and receive primary instruction in either English or French. Results indicated panelists’ ratings of their ratings and their comfort with the process were related to how their ratings changed across sequential rounds of the process. Differences in the degree to which the evaluations influenced the standard setting judgments were observed across the English and French panelists, with the French group reporting increasing comfort across rounds in contrast to the English group that had relatively higher comfort at the beginning of the process. The results illustrate how standard setting evaluation data can provide insight into factors that affect panelists’ ratings.

Volume 9

April 2008

Issue#3

Examining Panelist Data from a Bilingual Standard Setting Study. Elaine M. Rodeck, Tzu-Yun Chin, Susan L. Davis, Barbara S. Plake, Buros Center for Testing, University of Nebraska-Lincoln

Abstract

This study examined the relationships between the evaluations obtained from standard setting panelists and changes in ratings between different rounds of a standard setting study that involved setting standards on different language versions of an exam. We investigated panelists’ evaluations to determine if their perceptions of the standard setting were related to adjustments they made in their recommended cut scores across rounds of the process. The standard setting was conducted for a high school mathematics test composed of multiple-choice and constructed-response items. The test was designed for a population of students who speak and receive primary instruction in either English or French. Results indicated panelists’ ratings of their ratings and their comfort with the process were related to how their ratings changed across sequential rounds of the process. Differences in the degree to which the evaluations influenced the standard setting judgments were observed across the English and French panelists, with the French group reporting increasing comfort across rounds in contrast to the English group that had relatively higher comfort at the beginning of the process. The results illustrate how standard setting evaluation data can provide insight into factors that affect panelists’ ratings.

Volume 8

February 2007

Issue#1

Does Quantity Equal Quality? The Relationship Between Length of Response and Scores on the SAT Essay. Jennifer L. Kobrin, Hui Deng, and Emily J. Shaw, The College Board

Abstract

This study was designed to address two frequent criticisms of the SAT essay -- that essay length is the best predictor of scores, and that there is an advantage in using more "sophisticated" examples as opposed to personal experience. The study was based on 2,820 essays from the first three administrations of the new SAT. Each essay was coded for number of words, number of paragraphs, whether or not the response included first-person, and whether or not the response went to the second page. Analyses included descriptive statistics and group comparisons on the essay response features, correlations between essay length and scores, and hierarchical multiple regression to examine the contribution of each essay feature variable to the prediction of essay scores. The number of words in the essay explained 39% of the variance of essay scores. Whether or not the essay reached the second page explained an additional 1.5%, and whether or not the essay was written in first person explained an additional 1.1%. An examination of these features potentially affecting SAT essay scores is essential to maintain that the SAT writing section promotes valid interpretations of students’ writing skills. The research described in this paper may benefit other testing programs that include essay assessments. The careful analysis of response features and the identification of potential construct-irrelevant features in essay assessments are important for evaluating the content and construct validity of writing assessments.

Volume 7

July 2005

Issue#3

Evaluating Computer Automated Scoring: Issues, Methods, and an Empirical Illustration. Yongwei Yang, The Gallup Organization, Chad W. Buckendahl, Buros Center for Testing, University of Nebraska-Lincoln, Piotr J. Juszkiewicz, The Gallup Organization, Dennison S. Bhola, James Madison University, July 2005

Abstract

With the continual progress of computer technologies, computer automated scoring (CAS) has become a popular tool for evaluating writing assessments. Research on applications of these methodologies to new types of performance assessments is still emerging. While research has generally shown a high agreement of CAS system generated scores with those produced by human raters, concerns and questions have been raised about appropriate analyses and validity of decisions/interpretations based on those scores. In this paper we expand the emerging discussions on validation strategies for CAS by illustrating several analyses that can be accomplished with available data. These analyses compare the degree to which two CAS systems accurately score data from a structured interview using the original scores provided by human raters as the criterion. Results suggest key differences across the two systems as well as differences in the statistical procedures used to evaluate them. The use of several statistical and qualitative analyses is recommended for evaluating contemporary CAS systems.

Volume 7

April 2005

Issue#2

Some Useful Cost-Benefit Criteria for Evaluating Computer-based Test Delivery Models and Systems. Richard M. Luecht, University of North Carolina at Greensboro, April, 2005

Abstract

Computer-based testing (CBT) is typically implemented using one of three general test delivery models: (1) multiple fixed testing (MFT); (2) computer-adaptive testing (CAT); or (3) multistage testing (MST). This article reviews some of the real cost drivers associated with CBT implementation (focusing on item production costs, the costs associated with administering the tests, and system development costs) and elaborates three classes of cost-benefit-related factors useful for evaluating CBT models: (1) real measurement efficiency; (2) testing system performance; and (3) provision for data quality control/assurance.

Volume 7

April 2005

Issue#1

Strategies to Assess the Core Academic Knowledge of English Language Learners. Stanley Rabinowitz, Sri Ananda, & Andrew Bell, WestEd

Abstract

This paper focuses on this assessment issue: How do you increase the validity of assessments of ELL student performance on core academic content? We begin by exploring NCLB expectations for ELL assessments and an increasingly popular approach to meeting these requirements proposed by some states—translation of assessments into students’ native languages. Then, we present key research findings on attempts to increase access to and validity of assessment for ELLs. We conclude by proposing a comprehensive strategy for the assessment of ELL students’ performance in core academic content.

Volume 6

May 2004

Issue#1

Creating Better Tests for Everyone Through Universally Designed Assessments, Sandra Thompson and Martha Thurlow, University of Minnesota, David B. Malouf, U.S. Department of Education, May, 2004

Abstract

Universally designed assessments are designed and developed to allow participation of the widest possible range of students, in a way that results in valid inferences about performance on grade-level standards for all students who participate in the assessment. This paper explores the development of universal design and considers its application to large-scale assessments. Building on universal design principles presented by the Center for Universal Design (Center for Universal Design, 1997), seven elements of universally designed assessments are identified and described. These elements were derived from a review of literature on universal design, assessment and instructional design, and research on topics such as assessment accommodations (Thompson, Johnstone, & Thurlow, 2002). The seven elements are:

  1. Inclusive assessment population

  2. Precisely defined constructs

  3. Accessible, non-biased items

  4. Amenable to accommodations

  5. Simple, clear, and intuitive instructions and procedures

  6. Maximum readability and comprehensibility

  7. Maximum legibility

Each of the elements is explored in this paper. Numerous resources relevant to each of the elements are identified, with specific suggestions for ways in which assessments can be designed to meet the needs of the widest range of students possible. Challenges and opportunities arising from the application of universally designed assessments are identified.

Volume 5

July 2003

Issue#1

The Ideal Role of Large-Scale Testing in a Comprehensive Assessment System, Charles A. DePascale, National Center for the Improvement of Educational Assessment, July 2003

Abstract

The role of large-scale assessment in public education has grown tremendously since the mid-1980s and unquestionably will continue to grow with the implementation of the assessment and accountability requirements of the No Child Left Behind Act.  In the rush to meet the demand to measure validly and reliably the performance of all students, however, it must not be forgotten that large-scale assessment is only one component of a comprehensive assessment system.  The factors that led to the predominance of large-scale assessment are reviewed and the appropriate role of large-scale assessment in a comprehensive assessment system is discussed.

Volume 4

July 2002

Issue#1

Ensuring Fair Testing Practices: The Responsibilities of Test Sponsors, Test Developers, Test Administrators, and Test Takers in Ensuring Fair Testing Practices, Barbara S. Plake, Buros Center for Testing, University of Nebraska-Lincoln,  Patrick Jones, Excelsior College, July, 2002

Abstract

The focus of tests today oftentimes centers on ways to provide good quality tests to test takers in a cost-effective manner.  Test sponsors are concerned about the policy issues related to test use; test developers must prepare a test that meets both the purpose and specifications articulated by the test sponsor and the technical standards for quality tests.  Test administrators are responsible for test delivery in ways that protect the integrity of the test scores and the security of the test product.  Test takers often have limited options in when, how, or why they are taking the test, and may feel victimized in the process.  The purpose of this paper is to focus on the test taker and to consider how all parties in the test process (test sponsor, test developer, test administrator, and test taker) have a role to play in ensuring fair testing practices and valid test results.

Volume 3

January 2001

Issue#1

Megatrends in Personnel Testing: A Practitioner’s Perspective,  John W. Jones, Ph.D., Kelly D. Higgins, M.A., NCS Pearson, January 2001

Abstract

This paper briefly reviewed current personnel testing trends as documented in the literature. Yet research literature often misses fast-moving megatrends that will ultimately change the face of personnel testing practices. Therefore, the primary purpose of this paper was to list and describe, from a practitioner’s perspective, 10 dominant megatrends that are impacting the personnel testing industry. Five megatrends were classified as being "technocentric," and five were classified as being "content-specific." Technocentric themes were related to virtual career centers, integrated assessment platforms, a number of Internet-age access concerns, media-rich assessments, and data warehousing and mining. Content-specific trends were related to certification testing, 21st century test constructs, human resource lifecycle assessments, technology-friendly tests, and bottom-line impact and return on investment (ROI) studies. A review of these 10 megatrends suggests that the personnel testing industry is keeping pace with rapid technological innovations.

Volume 2

September 2000

Issue#1

Promoting Stakeholder Acceptance of CBT, J. Patrick Jones, Professional Examination Service, September 2000

Abstract

This article describes the major elements of a communication plan for the implementation of a computer-based testing (CBT) program. The major benefits and potential drawbacks of a CBT program are reviewed, and the information needs of various stakeholder groups are identified. The article concludes with an overview of communication strategies and evaluation techniques that can facilitate the transition to CBT.

Volume 1

August 1999

Issue#1

Increasing the Validity of Adapted Tests: Myths to be Avoided and Guidelines for Improving Test Adaptation Practices, Ronald K. Hambleton and Liane Patsula, University of Massachusetts at Amherst, August 1999

Abstract

Adapting or translating achievement, aptitude, and personality tests and questionnaires from one language and culture to others has been done for a long time. Unfortunately, there is substantial evidence to suggest that often these adapted tests are problematic because of a failure to do the test adaptation work correctly. The purposes of this paper are to describe five myths about test adaptation that need to be discarded and to offer a set of steps to follow in test adaptation projects. The International Test Commission guidelines for adapting tests are also presented in the paper.