WEAC's Position on Standardized Achievement Testing
By Russ Allen, PhD
Research and Professional Development Consultant
(This paper appears in the Wisconsin Council on Children and Families
special report: "Standardized Testing: One Size Fits All?")
A standardized achievement test is one
that is administered under the same conditions and scored in the
same way for all students. Standardized achievement tests tend
to be predominantly multiple choice in format, although one can
have standardized tests in any format (e.g., standardized writing
tests or standardized performance tasks). Most standardized achievement
tests are norm-referenced, meaning that their primary purpose
is to compare the performance of a student, or group of students,
with another group (often the so-called "national average").
WEAC's position on the use of standardized achievement tests is based
on the classroom experiences of its members and on resolutions passed
by the WEAC Representative Assembly. With a few additions, it is
consistent with the recommendations of a 2001 report entitled Building
Tests that Support Instruction and Accountability: A Guide for Policymakers.
Among the organizations supporting this report were the following:
the American Association of School Administrators, the National Association
of Elementary School Principals, the National Association of Secondary
School Principals, the National Education Association, and the National
Middle School Association. WEAC is on record in support of the report's
recommendations.
Building Tests that Support Instruction and Accountability is directed
primarily at state-mandated testing programs. However, the observations
and recommendations are relevant for assessment programs at all levels,
including district and national. The central conclusion of the report
is that ". . . federally mandated and state-administered tests
seem to have little instructional utility, thus bringing into question
their usefulness in an accountability system that assumes that information
obtained from tests will result in appropriate changes in instruction."
The report includes numerous recommendations for improving state level
assessment programs, which, if followed, may bring about improvements
in student learning.
WEAC supports the recommendations of this report and hopes that it
will encourage policymakers to think carefully about all the issues
associated with the development and implementation of assessment programs.
This type of reflection is important because far too often, policymakers
and members of the public focus almost entirely on the need for simple
and inexpensive accountability measures, with little or no regard for
how testing programs will improve student learning.
This paper contains WEAC's position on the use of standardized achievement
tests, and also gives specific attention to the testing requirements
of the 2001 Elementary and Secondary Education Act.
WEAC believes that:
- Assessment programs must be based on clearly defined content
standards. Furthermore, these standards must be prioritized. Many
states and districts simply purchase off-the-shelf tests from one
of the major testing companies with little regard given to how well
they align with existing content standards. (In many cases, content
standards may be nonexistent; Wisconsin has Model Academic Standards
at selected grade levels). In addition, standards often are defined
vaguely, causing teachers, students, and parents to lack a clear understanding
of what is expected of them.
The need to set priorities regarding what is to be assessed simply
reflects the fact that we cannot teach everything, that students cannot
learn everything, and that test developers cannot assess everything
that someone believes that students ought to know or be able to do.
- Reasonable performance standards need to be established.
Some of the worst abuses in testing can occur during the establishment
of performance goals. For example, in 1997, Wisconsin invited 185
people to Spring Green to establish performance levels on the state's
4th, 8th, and 10th grade Knowledge & Concepts Examinations in
English/language arts, mathematics, science, and social studies. Four
levels were established: Minimal Performance, Basic, Proficient, and
Advanced (1). Participants were directed to identify
standards based on expectations of what students should know and be
able to demonstrate. They were not told that the scores would be used
to compare schools and groups of students or to identify schools in
need of improvement.
The score required to reach the Proficient level, the "goal" for all
students in Wisconsin, tends to be lower in grade four and progressively
higher in grades eight and ten. One consequence is that even though
students continue to achieve at high levels in grades eight and ten,
this type of reporting suggests that they are losing ground. Required
scores also tend to vary across subject areas (generally being highest in
mathematics and lowest in social studies). (2)
- Assessment programs at all levels need to make greater use of
authentic measures of student achievement. WEAC believes that
assessment programs need to move beyond the almost exclusive use of
multiple choice tests, which do not do a good job of measuring problem-solving,
creative thinking, or other higher order thinking skills. We recognize
that these types of assessment are more costly to develop and score;
however, we simply must have more authentic measures of what students
know and are able to do.
- Assessment results should not be used for inappropriate reasons,
including (1) making decisions about graduation or promotion exclusively
on the basis of test scores, and (2) comparing districts and schools
solely on the basis of test scores. As for (1), we believe that
a single piece of evidence, such as a test score, should never be
the only criterion used to make high stakes decisions related to graduation
or promotion. As for (2), we believe that there is too much emphasis
given to district and school comparisons based on test scores and
that these comparisons divert our attention away from what must be
done to improve student learning.
- Assessments should be (a) appropriate for the accountability
purposes for which they are used, (b) appropriate for determining
whether students have attained standards, (c) appropriate for enhancing
instruction, and (d) not the cause of negative consequences. This
requirement reflects the fact that assessment programs do not always
have the consequences that were intended. In particular, we need to
track the amount of time taken from teaching and learning for test
preparation or actual test-taking.
A recent analysis of high-stakes testing in 18 states by Amrein and
Berliner (2002) found that ". . . if the intended goal of high-stakes
testing policy is to increase student learning, then that policy is
not working. While a state's high-stakes test may show increased scores,
there is little support in these data that such increases are anything
but the result of test preparation and/or the exclusion of students
from the testing process." The authors also note that there are numerous
cases of unintended consequences associated with high-stakes testing,
including "increased drop-out rates, teachers' and schools' cheating
on exams, teachers' defection from the profession. . ."
- States, school districts, and schools need to monitor the breadth
of the curriculum so that policymakers can determine how much instructional
attention is given to all content standards and subject areas, including
those not assessed. Assessments sometimes have unintended consequences--subjects
or content not tested often tend to be judged as less significant.
This is particularly true when assessment results are used to make
decisions related to promotion or graduation. Many of those who teach
electives in Wisconsin's public schools (e.g., art, band, physical
education, foreign languages, computer science, etc.) fear that when
cuts are made, their courses may be the ones that are eliminated simply
because they are not assessed by the Department of Public Instruction.
- Educators must receive professional development to help them
use the results of assessments to improve instruction and learning.
Because the primary purpose of assessment must be to improve student
learning, teachers need to know how to use the results in their day-to-day
work. Unfortunately, the State of Wisconsin provides almost no professional
development for teachers on ways to use assessment data in order to
accomplish this purpose. Districts vary as to the quantity and quality
of professional development that is provided.
- Testing programs at all levels need to have adequate resources
(including time, money, and staff). The consequences of poorly
designed assessment programs can be devastating for students, parents,
teachers, and schools.
Concerns and Observations About ESEA 2001:
The 2001 Elementary and Secondary Education Act represents a significant
change in the role of the federal government in U.S. education. Beginning
in 2005-2006, this law requires that each state test students annually
in reading and mathematics in grades 3-8 and once in grades 10-12. Beginning
in 2007-2008, students must be tested in science, at least once in grades
3-5, once in grades 6-8, and once in grades 10-12. States also will
be required to test statewide samples of students every other year on
tests administered by the National Assessment of Educational Progress
(NAEP). Results of NAEP testing will be used as an "external audit"
of state-level testing programs.
In addition, states are required to define "proficient" on
each test and then to identify the level of improvement that is sufficient
each year to demonstrate "Adequate Yearly Progress" (AYP)
toward meeting the requirement of having all students proficient in
12 years.
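As a rough illustration of the arithmetic involved, the sketch below assumes that a state chooses equal annual increments from its starting point to 100% proficient over 12 years. The 40% baseline, the straight-line schedule, and the function name are assumptions made here for illustration; they are not taken from the law or from any state's plan.

```python
def linear_ayp_targets(baseline_pct, years=12):
    """Return yearly percent-proficient targets rising in equal steps to 100%.

    baseline_pct is a hypothetical starting percentage of proficient students.
    """
    step = (100.0 - baseline_pct) / years  # percentage points required per year
    return [round(baseline_pct + step * year, 1) for year in range(1, years + 1)]


# Hypothetical school or state starting at 40% proficient.
targets = linear_ayp_targets(40.0)
for year, target in enumerate(targets, start=1):
    print(f"Year {year:2d}: {target:5.1f}% proficient")
```

Under these assumptions the target rises five percentage points every year without interruption; a school starting from a lower baseline would need larger annual gains to stay on schedule.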
Schools failing to meet AYP are subject to a variety of sanctions,
including "corrective action" in the early stages and ultimately
"restructuring" beginning after six years (e.g., re-opening
as a charter school, privatization, state takeover, or other major changes).
WEAC has the following concerns about the 2001 ESEA:
- It is unlikely that the testing requirements of ESEA will improve
education for children. In fact, they may cause significant
harm because they will narrow the curriculum
and cause teachers to spend excessive time on preparation for
the machine-scorable, standardized ESEA achievement tests. As a result,
other important curriculum, skills, and knowledge will be de-emphasized.
- The ESEA violates Wisconsin's history of local control, which
gives citizens a significant voice in the way public schools are run.
- ESEA test scores will become the predominant measure used to
judge the quality of education offered by public schools and school
districts. It is probable that everything else that's important
in schools will be perceived as less important (including acquiring
skills and knowledge in areas not tested, being a good citizen, learning
how to get along with others, etc.).
- A significant proportion of schools, especially those in poor
and urban areas, will be unable to meet the requirements for Adequate
Yearly Progress, opening the door to school improvement plans, "reconstitution,"
and charters. This will make it more difficult to attract and
retain quality teachers and support staff, even though this is where
they are needed most.
- One of the eight national goals for education in 1999 called
for every school to work with parents to increase parental involvement
and participation in the social, emotional, and academic growth of
children. This goal was considered critical because there is a
wealth of research showing that the family is the child's first and
most important teacher. Despite this body of research, ESEA seems
to assume that the family has almost no role to play in the education
of children by holding schools entirely accountable for student success
or failure.
- ESEA will be a bonanza for the large testing companies because
state departments of education (including Wisconsin) will have no
choice but to contract for test development, scoring, and reporting
with the large private testing companies that dominate the market.
During the past several years many of these companies have experienced
serious problems related to scoring and reporting (errors in the reports
themselves, delays in reporting, and delays in developing new tests).
Because of the volume of tests that must be developed within a short
period of time, we should be concerned about the quality of the products
and services that are to be delivered.
- ESEA testing will be a "low stakes" activity for students
since they have little or no incentive to do well. In contrast, ESEA
testing will be "high stakes" for teachers, principals,
other school personnel, parents, and community residents because the
future of local schools will be at stake.
- Each state is required to develop proficiency standards for
each content area tested. Definitions will vary across states,
and state-to-state comparisons should not be made. Nevertheless, comparisons
already have been made and are likely to continue.
- It is likely that each state will lower its proficiency standards
to meet the requirements for Adequate Yearly Progress. Proficiency
will have to be set at a level that is realistic for all students
to attain. This means that one effect of ESEA 2001 may be a "race
to the bottom" in which standards are lowered throughout the
country. (Note that the Wisconsin Department of Public Instruction
intends to re-define proficiency in February 2003, based on testing
this fall. This redefinition is necessary for two reasons: (1) testing
will take place at the beginning of the school year, not at the end,
and (2) the purposes of testing are not the same as they were when
the current standards were established in 1997.) Most certainly, the
DPI will be criticized for "lowering" standards.
- Beginning in 2002-2003 states will be required to test statewide
samples of approximately 2,000 students every other year on NAEP tests
in reading and mathematics at grades 4 and 8. NAEP tests will
serve as an external "audit" to validate the results of
state testing programs (e.g., to determine if progress or lack of
progress is, in fact, "real"). It is inevitable that the
standards for what is proficient will differ between NAEP and the
individual states. This will create problems for states because the
percent proficient on state tests will most certainly exceed the percent
proficient on NAEP.
- NAEP testing will be a "low stakes" activity for students
and schools (largely because the sampling procedures used by NAEP
do not allow for student or school reports). Because the stakes
will be low, test scores may not be valid and reliable. The extent
to which the NAEP results can/should be used to validate the results
of state testing also is a function of the degree to which the content
assessed by NAEP overlaps with the content measured by the state tests.
When there is considerable overlap, the use of NAEP tests to validate
the results of state testing will be more legitimate than in cases
where the state and NAEP tests measure different content domains.
- ESEA requires that all state tests be aligned with state standards.
However, Wisconsin does not have standards for mathematics and reading
at grades 3, 5, 6, and 7. These will have to be developed. Furthermore,
Wisconsin has 501 standards in English/language arts, mathematics,
science, and social studies in grades 4, 8, and 10. This is far too
many to test. This number must be reduced.
- ESEA requires that 95% of students be tested. This 95% rule
holds even if school officials believe that more than 5% of their students
should be excluded from testing because of special needs or language
deficiencies. For example, ESEA requires that Level 3 English Language
Learners be tested (these are students who are not yet proficient
in English). Many fear that this will force students to take tests
before they are ready. Currently, Level 3 students are excluded from
state testing in Wisconsin.
- Adequate Yearly Progress will be very difficult for
all schools to attain. The goal of ESEA, to have 100% of students
proficient in reading and mathematics in grades 3 through 8 within
12 years, is laudable but not practical. It would be like requiring
every student to run a five-minute mile. One hundred percent efficiency
in any activity, from business to government, is an unreasonable goal.
In addition, AYP will need to take measurement error into account.
It also is probable that schools will demonstrate erratic patterns
of change--improving in one year and falling back in the next (or vice
versa). What this means is that sustained progress toward meeting
the goal of having all students proficient in 12 years is not likely
to be the norm. It's also possible to see improvements in some of
the subgroup populations, but not in all. The final goal of having
100% of students proficient is unattainable except in very small
schools with few students.
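One reason to expect the erratic year-to-year patterns described above is that a school's percent proficient is computed from a limited number of students. The sketch below is an illustration only: it treats the tested students as a simple random sample and uses the standard binomial formula for the standard error of a proportion. It captures variation in which students happen to be tested, not the measurement error of the test itself, and the 60% figure and function name are hypothetical.

```python
import math


def percent_proficient_margin(true_pct, n_students):
    """Approximate one-standard-error margin, in percentage points, for a
    school's percent proficient.

    Assumes the tested students behave like a simple random sample, which is
    a simplification made for illustration.
    """
    p = true_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n_students)


# Hypothetical schools whose underlying rate is 60% proficient.
for n in (25, 100, 400):
    margin = percent_proficient_margin(60.0, n)
    print(f"{n:3d} students tested: about +/- {margin:.1f} points")
```

Under these assumptions the margin shrinks only with the square root of the number of students tested, so small schools and small subgroups can show swings of several points from one year to the next without any real change in achievement.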
Endnotes:
(1) For an explanation of the four levels see "Understanding
Proficiency Scores," available online at: http://www.weac.org/resource/1997-98/jan98/proficnt.html
(2) For example, on the 4th grade mathematics test
a student who scores at the 46th percentile (four points below the
national average -- the 50th percentile) is Proficient, whereas a
10th grader who scores at the 57th percentile or less is classified
as Minimal Performance. On the 10th grade mathematics test a student
must score at the 79th percentile or higher to be classified as Proficient.
This compares with 10th grade social studies, in which a student has
to score only at the 50th percentile to be Proficient. Likewise, when
NAEP results are reported, the public believes that any student who
is not at the proficient level or higher is failing! If we really
want to use test results to improve teaching and learning, perhaps
the setting of these levels should be discontinued. This would require
that we look at strengths and weaknesses within a content area, and
not just refer to the arbitrary performance levels.