
WEAC's Position on Standardized Achievement Testing

By Russ Allen, PhD
Research and Professional Development Consultant

(This paper appears in the Wisconsin Council on Children and Families special report: "Standardized Testing: One Size Fits All?")

A standardized achievement test is one that is administered under the same conditions and scored in the same way for all students. Standardized achievement tests tend to be predominantly multiple choice in format, although one can have standardized tests in any format (e.g., standardized writing tests or standardized performance tasks). Most standardized achievement tests are norm-referenced, meaning that their primary purpose is to compare the performance of a student, or group of students, with another group (often the so-called "national average").

WEAC's position on the use of standardized achievement tests is based on the classroom experiences of its members and on resolutions passed by the WEAC Representative Assembly. With a few additions, it is consistent with the recommendations of a 2001 report entitled Building Tests that Support Instruction and Accountability: A Guide for Policymakers.

Among the organizations supporting this report were the following: the American Association of School Administrators, the National Association of Elementary School Principals, the National Association of Secondary School Principals, the National Education Association, and the National Middle School Association. WEAC is on record in support of this commission's recommendations.

Building Tests that Support Instruction and Accountability is directed primarily at state-mandated testing programs. However, the observations and recommendations are relevant for assessment programs at all levels, including district and national. The central conclusion of the report is that ". . . federally mandated and state-administered tests seem to have little instructional utility, thus bringing into question their usefulness in an accountability system that assumes that information obtained from tests will result in appropriate changes in instruction." The report includes numerous recommendations for improving state level assessment programs, which, if followed, may bring about improvements in student learning.

WEAC supports the recommendations of this report and hopes that it will encourage policymakers to think carefully about all the issues associated with the development and implementation of assessment programs. This type of reflection is important because far too often, policymakers and members of the public focus almost entirely on the need for simple and inexpensive accountability measures, with little or no regard for how testing programs will improve student learning.

This paper contains WEAC's position on the use of standardized achievement tests, and also gives specific attention to the testing requirements of the 2001 Elementary and Secondary Education Act.

WEAC believes that:

  • Assessment programs must be based on clearly defined content standards. Furthermore, these standards must be prioritized. Many states and districts simply purchase off-the-shelf tests from one of the major testing companies with little regard given to how well they align with existing content standards. (In many cases, content standards may be nonexistent; Wisconsin has Model Academic Standards at selected grade levels). In addition, standards often are defined vaguely, causing teachers, students, and parents to lack a clear understanding of what is expected of them.

    The need to set priorities regarding what is to be assessed simply reflects the fact that we cannot teach everything, that students cannot learn everything, and that test developers cannot assess everything that someone believes that students ought to know or be able to do.

  • Reasonable performance standards need to be established. Some of the worst abuses in testing can occur during the establishment of performance goals. For example, in 1997, Wisconsin invited 185 people to Spring Green to establish performance levels on the state's 4th, 8th, and 10th grade Knowledge & Concepts Examinations in English/language arts, mathematics, science, and social studies. Four levels were established: Minimal Performance, Basic, Proficient, and Advanced (1). Participants were directed to identify standards based on expectations of what students should know and be able to demonstrate. They were not told that the scores would be used to compare schools and groups of students or to identify schools in need of improvement.

    The Proficient level, the "goal" for all students in Wisconsin, tends to be lower in grade four and progressively higher in grades eight and ten. One of the consequences of this is that even though students continue to achieve at high levels in grades eight and ten, this type of reporting suggests that they are losing ground. Scores also tend to vary across subject areas (generally being highest in mathematics and lowest in social studies). (2)

  • Assessment programs at all levels need to make greater use of authentic measures of student achievement. WEAC believes that assessment programs need to move beyond the almost exclusive use of multiple choice tests, which do not do a good job of measuring problem-solving, creative thinking, or other higher order thinking skills. We recognize that these types of assessment are more costly to develop and score; however, we simply must have more authentic measures of what students know and are able to do.

  • Assessment results should not be used for inappropriate reasons, including (1) making decisions about graduation or promotion exclusively on the basis of test scores, and (2) comparing districts and schools solely on the basis of test scores. As for (1), we believe that a single piece of evidence, such as a test score, should never be the only criterion used to make high stakes decisions related to graduation or promotion. As for (2), we believe that there is too much emphasis given to district and school comparisons based on test scores and that these comparisons divert our attention away from what must be done to improve student learning.

  • Assessments should be (a) appropriate for the accountability purposes for which they are used, (b) appropriate for determining whether students have attained standards, (c) appropriate for enhancing instruction, and (d) not the cause of negative consequences. This requirement reflects the fact that assessment programs do not always have the consequences that were intended. In particular, we need to track the amount of time taken from teaching and learning for test preparation or actual test-taking.

    A recent analysis of high-stakes testing in 18 states by Amrein and Berliner (2002) found that ". . . if the intended goal of high-stakes testing policy is to increase student learning, then that policy is not working. While a state's high-stakes test may show increased scores, there is little support in these data that such increases are anything but the result of test preparation and/or the exclusion of students from the testing process." The authors also note that there are numerous cases of unintended consequences associated with high-stakes testing, including "increased drop-out rates, teachers' and schools' cheating on exams, teachers' defection from the profession. . ."

  • States, school districts, and schools need to monitor the breadth of the curriculum so that policymakers can determine how much instructional attention is given to all content standards and subject areas, including those not assessed. Assessments sometimes have unintended consequences--subjects or content not tested often tend to be judged as less significant. This is particularly true when assessment results are used to make decisions related to promotion or graduation. Many of those who teach electives in Wisconsin's public schools (e.g., art, band, physical education, foreign languages, computer science, etc.) fear that when cuts are made, their courses may be the ones that are eliminated simply because they are not assessed by the Department of Public Instruction.

  • Educators must receive professional development to help them use the results of assessments to improve instruction and learning. Because the primary purpose of assessment must be to improve student learning, teachers need to know how to use the results in their day-to-day work. Unfortunately, the State of Wisconsin provides almost no professional development for teachers on ways to use assessment data in order to accomplish this purpose. Districts vary as to the quantity and quality of professional development that is provided.

  • Testing programs at all levels need to have adequate resources (including time, money, and staff). The consequences of poorly designed assessment programs can be devastating for students, parents, teachers, and schools.

Concerns and Observations About ESEA 2001:

The 2001 Elementary and Secondary Education Act represents a significant change in the role of the federal government in U.S. education. Beginning in 2005-2006, this law requires that each state test students annually in reading and mathematics in grades 3-8 and once in grades 10-12. Beginning in 2007-2008, students must be tested in science, at least once in grades 3-5, once in grades 6-8, and once in grades 10-12. States also will be required to test statewide samples of students every other year on tests administered by the National Assessment of Educational Progress (NAEP). Results of NAEP testing will be used as an "external audit" of state-level testing programs.

In addition, states are required to define "proficient" on each test and then to identify the level of improvement that is sufficient each year to demonstrate "Adequate Yearly Progress" (AYP) toward meeting the requirement of having all students proficient in 12 years.
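As a rough illustration of the arithmetic behind AYP, the sketch below computes yearly targets for the percentage of students who are proficient. It assumes equal annual increments and a hypothetical starting point of 40% proficient; the law leaves the actual schedule to each state, and the function name and figures are illustrative only.

```python
def ayp_targets(baseline_pct, years=12, goal_pct=100.0):
    """Yearly percent-proficient targets, ASSUMING equal annual steps.

    baseline_pct is a hypothetical starting percentage; real state
    schedules need not rise by the same amount each year.
    """
    step = (goal_pct - baseline_pct) / years
    return [baseline_pct + step * year for year in range(1, years + 1)]


# Hypothetical school that starts with 40% of students proficient.
targets = ayp_targets(40.0)
for year, target in enumerate(targets, start=1):
    print(f"Year {year:2d}: {target:5.1f}% proficient")
```

Under these assumptions, the lower a school's starting point, the larger the gain it must show every year for 12 consecutive years.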

Schools failing to meet AYP are subject to a variety of sanctions, including "corrective action" in the early stages and ultimately "restructuring" beginning after six years (e.g., re-opening as a charter school, privatization, state takeover, or other major changes).

WEAC has the following concerns about the 2001 ESEA:

  • It is unlikely that the testing requirements of ESEA will improve education for children. In fact, they may cause significant harm because the new ESEA testing requirements will narrow the curriculum and cause teachers to spend excessive time on preparation for taking the machine-scorable, standardized ESEA achievement tests. As a result, other important curriculum, skills, and knowledge will be de-emphasized.

  • The ESEA violates Wisconsin's history of local control, which gives citizens a significant voice in the way public schools are run.

  • ESEA test scores will become the predominant measure used to judge the quality of education offered by public schools and school districts. It is probable that everything else that's important in schools will be perceived as less important (including acquiring skills and knowledge in areas not tested, being a good citizen, learning how to get along with others, etc.).

  • A significant proportion of schools, especially those in poor and urban areas, will be unable to meet the requirements for Adequate Yearly Progress, opening the door to school improvement plans, "reconstitution," and charters. This will make it more difficult to attract and retain quality teachers and support staff, even though this is where they are needed most.

  • One of the eight national goals for education in 1999 called for every school to work with parents to increase parental involvement and participation in the social, emotional, and academic growth of children. This goal was considered critical because there is a wealth of research showing that the family is the child's first and most important teacher. Despite this body of research, ESEA seems to assume that the family has almost no role to play in the education of children by holding schools entirely accountable for student success or failure.

  • ESEA will be a bonanza for the large testing companies because state departments of education (including Wisconsin) will have no choice but to contract for test development, scoring, and reporting with the large private testing companies that dominate the market. During the past several years many of these companies have experienced serious problems related to scoring and reporting (errors in the reports themselves, delays in reporting, and delays in developing new tests). Because of the volume of tests that must be developed within a short period of time, we should be concerned about the quality of the products and services that are to be delivered.

  • ESEA testing will be a "low stakes" activity for students since they have little or no incentive to do well. In contrast, ESEA testing will be "high stakes" for teachers, principals, other school personnel, parents, and community residents because the future of local schools will be at stake.

  • Each state is required to develop proficiency standards for each content area tested. Definitions will vary across states, and state-to-state comparisons should not be made. Nevertheless, comparisons already have been made and are likely to continue.

  • It is likely that each state will lower its proficiency standards to meet the requirements for Adequate Yearly Progress. Proficiency will have to be set at a level that is realistic for all students to attain. This means that one effect of ESEA 2001 may be a "race to the bottom" in which standards are lowered throughout the country. (Note that the Wisconsin Department of Public Instruction intends to re-define proficiency in February 2003, based on testing this fall. This redefinition is necessary for two reasons: (1) testing will take place at the beginning of the school year, not at the end and (2) the purposes of testing are not the same as they were when the current standards were established in 1997). Most certainly, the DPI will be criticized for "lowering" standards.

  • Beginning in 2002-2003 states will be required to test statewide samples of approximately 2,000 students every other year on NAEP tests in reading and mathematics at grades 4 and 8. NAEP tests will serve as an external "audit" to validate the results of state testing programs (e.g., to determine if progress or lack of progress is, in fact, "real"). It is inevitable that the standards for what is proficient will differ between NAEP and the individual states. This will create problems for states because the percent proficient on state tests will most certainly exceed the percent proficient on NAEP.

  • NAEP testing will be a "low stakes" activity for students and schools (largely because the sampling procedures used by NAEP do not allow for student or school reports). Because the stakes will be low, test scores may not be valid and reliable. The extent to which the NAEP results can/should be used to validate the results of state testing also is a function of the degree to which the content assessed by NAEP overlaps with the content measured by the state tests. When there is considerable overlap, the use of NAEP tests to validate the results of state testing will be more legitimate than in cases where the state and NAEP tests measure different content domains.

  • ESEA requires that all state tests be aligned with state standards. However, Wisconsin does not have standards for mathematics and reading at grades 3, 5, 6, and 7. These will have to be developed. Furthermore, Wisconsin has 501 standards in English/language arts, mathematics, science, and social studies in grades 4, 8, and 10. This is far too many to test. This number must be reduced.

  • ESEA requires that 95% of students be tested. This 95% rule holds even if school officials believe that more than 5% of their students should be excluded from testing because of special needs or language deficiencies. For example, ESEA requires that Level 3 English Language Learners be tested (these are students who are not yet proficient in English). Many fear that this will force students to take tests before they are ready. Currently, Level 3 students are excluded from state testing in Wisconsin.

  • Adequate Yearly Progress will be very difficult for all schools to attain. The goal of ESEA, to have 100% of students proficient in reading and mathematics in grades 3 through 8 within 12 years, is laudable but not practical. It would be like requiring every student to run a five-minute mile. One hundred percent efficiency in any activity, from business to government, is an unreasonable goal.

    In addition, AYP will need to take measurement error into account. It also is probable that schools will demonstrate erratic patterns of change, improving in one year and falling back in the next (or vice versa). What this means is that sustained progress toward meeting the goal of having all students proficient in 12 years is not likely to be the norm. It's also possible to see improvements in some of the subgroup populations, but not in all. The final goal of having 100% of students proficient is unattainable except in very small schools with few students.
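The point about measurement error and erratic year-to-year change can be made concrete with a simplified sketch. It treats each tested group as a random sample and uses the usual standard error of a proportion; this is an assumption made for illustration, not the method any state uses to compute AYP, and the group sizes and percentages are hypothetical.

```python
import math


def proficiency_margin(pct_proficient, n_students, z=1.96):
    """Approximate 95% margin of error, in percentage points.

    ASSUMES the tested students behave like a simple random sample,
    which is a simplification made only for this illustration.
    """
    p = pct_proficient / 100.0
    standard_error = math.sqrt(p * (1.0 - p) / n_students)
    return 100.0 * z * standard_error


# Hypothetical groups, each with 60% of students proficient.
small_group = proficiency_margin(60.0, 25)
large_group = proficiency_margin(60.0, 400)
print(f"25 students:  +/- {small_group:.1f} percentage points")
print(f"400 students: +/- {large_group:.1f} percentage points")
```

Under these assumptions, the smaller the group tested, the more its percent proficient can swing from one year to the next by chance alone, which is one reason a single year's gain or loss may not reflect real change.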

Endnotes:

  1. For an explanation of the four levels see "Understanding Proficiency Scores," available online at: http://www.weac.org/resource/1997-98/jan98/proficnt.html

  2. For example, on the 4th grade mathematics test a student who scores at the 46th percentile (four points below the national average -- the 50th percentile) is Proficient, whereas a 10th grader who scores at the 57th percentile or less is classified as Minimal Performance. On the 10th grade mathematics test a student must score at the 79th percentile or higher to be classified as Proficient. This compares with 10th grade social studies, in which a student has to score only at the 50th percentile to be Proficient. Likewise, when NAEP results are reported, the public believes that any student who is not at the proficient level or higher is failing! If we really want to use test results to improve teaching and learning, perhaps the setting of these levels should be discontinued. This would require that we look at strengths and weaknesses within a content area, and not just refer to the arbitrary performance levels.
