Assessment & Testing Terms
There are several glossaries available online that define the language
of assessment and testing. Most definitions in this paper were written
by others; in these cases, the source of the definition is given. Definitions
not followed by an online source were written by the author. Definitions
that are central to understanding assessment and testing in Wisconsin
are marked with an asterisk.
Accommodations: Changes in the administration of an assessment
[for a child with disabilities], such as setting, scheduling, timing,
presentation format, response mode, or others, including any combination
of these that does not change the construct intended to be measured
by the assessment or the meaning of the resulting scores. Accommodations
are used for equity, not advantage, and serve to level the playing field.
To be appropriate, assessment accommodations must be identified in the
student’s Individualized Education Plan (IEP) or Section 504 plan
and used regularly during instruction and classroom assessment (Glossary
of Assessment Terms and Acronyms).
Accountability (Accountability System): The demand by a community
(public officials, employers, and taxpayers) for school officials to
prove that money invested in education has led to measurable learning.
"Accountability testing" is an attempt to sample what students
have learned, or how well teachers have taught, and/or the effectiveness
of a school principal's performance as an instructional leader. School
budgets and personnel promotions, compensation, and awards may be affected
(Assessment Terminology: A Glossary of Useful Terms).
Action Research: School and classroom-based studies initiated
and conducted by teachers and other school staff. Action research involves
teachers, aides, principals, and other school staff as researchers who
systematically reflect on their teaching or other work and collect data
that will answer their questions. It offers staff an opportunity to
explore issues of interest to them in an effort to improve classroom
instruction and educational effectiveness (Assessment Terminology: A
Glossary of Useful Terms).
Alignment: The “No Child Left Behind” law requires
that states align their assessment programs with their state’s
academic standards (which define what students should know and be able
to do). Supporters of alignment maintain that student learning is enhanced
when there is alignment among curriculum, instruction, and assessment.
An important question to address in any discussion of alignment is how
much alignment is enough.
Alternate Assessment: Alternate assessments measure the performance
of a relatively small population of students who are unable to participate
in the general assessment system, with or without accommodations as
determined by the IEP Team (Glossary of Assessment Terms and Acronyms).
American College Test (ACT): The ACT is a college admissions
exam used primarily by Midwestern and Southern colleges. The test consists
of four sections (English, Math, Reading, and Science) and an optional
Writing section. Test lengths vary (English: 45 minutes; Mathematics: 60
minutes; Reading: 35 minutes; Science: 35 minutes; Writing
(optional): 30 minutes). Students are given a score of 1-36 on each
of the four required sections. The average of these scores is then calculated
and rounded to the nearest whole number. This is referred to as the
"composite ACT score." The essay is not included in the composite
score.
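The composite calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the rule that an average ending in exactly .5 rounds up is an assumption, since the text says only "rounded to the nearest whole number."

```python
def act_composite(english: int, math: int, reading: int, science: int) -> int:
    """Average the four required section scores and round to a whole number."""
    sections = (english, math, reading, science)
    for score in sections:
        if not 1 <= score <= 36:
            raise ValueError("each section score must be between 1 and 36")
    total = sum(sections)
    # Adding 2 before integer division by 4 rounds halves up (assumed tie rule).
    # Python's built-in round() sends ties to the nearest even number instead.
    return (total + 2) // 4


print(act_composite(24, 27, 25, 23))  # average 24.75
print(act_composite(20, 21, 22, 23))  # average 21.5
```

The optional Writing score is deliberately not a parameter, because the essay is not included in the composite.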
Assessment Literacy: The possession of knowledge about the basic
principles of sound assessment practice, including terminology, the
development and use of assessment methodologies and techniques, and familiarity
with standards of quality in assessment. Increasingly, [it includes]
familiarity with alternatives to traditional measurements of learning
(Assessment Terminology: A Glossary of Useful Terms).
Authentic Assessment: (1) An assessment that requires students
to generate a response to a question rather than choose from a set of
responses provided to them. Exhibitions, investigations, demonstrations,
written or oral responses, journals, and portfolios are examples of
the assessment alternatives we think of when we use the term "alternative
assessment." Ideally, alternative assessment requires students
to actively accomplish complex and significant tasks, while bringing
to bear prior knowledge, recent learning, and relevant skills to solve
realistic or authentic problems (Glossary of Useful Terms).
(2) Evaluating by asking for the behavior the learning is intended
to produce. The concept of model, practice, feedback in which students
know what excellent performance is and are guided to practice an entire
concept rather than bits and pieces in preparation for eventual understanding.
A variety of techniques can be employed in authentic assessment.
The goal of authentic assessment is to gather evidence that students
can use knowledge effectively and be able to critique their own efforts.
Authentic tests can be viewed as "assessments of enablement,"
in Robert Glaser's words, ideally mirroring and measuring student performance
in a "real-world" context. Tasks used in authentic assessment
are meaningful and valuable, and are part of the learning process (Assessment
Terminology: A Glossary of Useful Terms).
Cohort: A group whose progress is followed by means of measurements
at different points in time (Assessment Terminology: A Glossary of Useful
Terms).
Competency Test: A test intended to [determine whether or not]
a student has met established minimum standards of skills and knowledge
and is thus eligible for promotion, graduation, certification, or other
official acknowledgment of achievement (Assessment Terminology: A Glossary
of Useful Terms).
*Constructed Response Item: An exercise for which examinees
must create their own responses or products (performance assessment)
rather than choose a response from an enumerated set (multiple choice)
(Glossary of Useful Terms). [Examples of constructed response items
are written answers to questions or writing prompts].
Constructivist Theory: Constructivist theory . . . posits that
people build new information onto pre-existing notions and modify their
understanding in light of new data. In the process, their ideas gain
in complexity and power. Constructivist theorists dismiss the idea that
students learn by absorbing information through lectures or repeated
rote practice. [Instead, students are taught to] . . . create their
own meaning and achieve their own goals by interacting actively with
objects and information and by linking new materials to existing cognitive
structures (Glossary of Useful Terms).
*Cut Score: A specified point on a score scale, such that scores
at or above that point are interpreted or acted upon differently from
scores below that point (Glossary of Useful Terms). [On Wisconsin’s
Knowledge & Concepts Examinations, cut scores separate the four levels
of performance from one another: Advanced, Proficient, Basic, and Minimal].
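As a rough sketch of how cut scores divide a score scale into performance levels, the Python below maps a scale score to one of the four levels. The three cut-score values are hypothetical placeholders chosen for illustration; they are not Wisconsin's actual cut scores.

```python
from bisect import bisect_right

# Hypothetical cut scores, for illustration only (not Wisconsin's actual values).
CUT_SCORES = [400, 450, 500]
LEVELS = ["Minimal", "Basic", "Proficient", "Advanced"]


def performance_level(scale_score: int) -> str:
    """Return the level for a score; scores at or above a cut move up a level."""
    # bisect_right counts how many cut scores are less than or equal to the score.
    return LEVELS[bisect_right(CUT_SCORES, scale_score)]


for score in (399, 400, 450, 620):
    print(score, performance_level(score))
```

A score exactly equal to a cut score lands in the higher level, matching the "at or above" wording of the definition.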
*Criterion-Referenced Test (CRT): A test in which the results
can be [are] used to determine a student's progress toward mastery of
a content area. [Each student’s] performance is compared to an
expected level of mastery in a content area rather than to other students'
scores. . . The "criterion" is the standard of performance
established as the passing score for the test. Scores have meaning in
terms of what the student knows or can do, rather than how the test-taker
compares to a reference or norm group. Criterion-referenced tests can
have norms, but comparison to a norm is not the purpose of the assessment
(Assessment Terminology: A Glossary of Useful Terms). [Wisconsin’s
Knowledge & Concepts examinations are criterion-referenced. Students’
scores are reported as performance levels, ranging from Advanced to
Minimal].
*English Language Learner (ELL): In Wisconsin, English language
learners who are beginning to acquire English language proficiency,
meaning that they have been identified as having an English language
proficiency level of 1 or 2, must be assessed with WAA-ELL (Wisconsin
Alternate Assessment for English Language Learners). ELLs with an English
language proficiency level of 3 or above must participate in the regular
WKCE-CRT (Wisconsin Department of Public Instruction).
*ESEA: The Elementary and Secondary Education Act (ESEA) was
first enacted in 1965. This act’s foundational principle of providing
educational opportunities to our most disadvantaged youth has remained
strong. The No Child Left Behind Act of 2001 (NCLB), a major reform
of the ESEA, was passed by Congress and signed into law on January 8,
2002. NCLB redefines the federal role in K-12 education and will help
close the achievement gap between disadvantaged and minority students
and their peers. NCLB encompasses numerous programs across ten titles,
totaling approximately $22 billion in 2004-05. Wisconsin's total funding
amount for 2004-05 under NCLB is approximately $292 million, consisting
of 20 different programs, 14 of which have been approved for funding
by the U.S. Department of Education (Wisconsin Department of Public
Instruction).
High Stakes Testing: Any testing program whose results have
important consequences for students, teachers, schools, and/or districts.
Such stakes may include promotion, certification, graduation, or denial/approval
of services and opportunity. High stakes testing can corrupt the evaluation
process when pressure to produce rising test scores results in "teaching
to the test" or making tests less complex (Assessment Terminology:
A Glossary of Useful Terms).
Holistic Scoring: In assessment, assigning a single score based
on an overall assessment of performance rather than by scoring or analyzing
dimensions individually. The product is considered to be more than the
sum of its parts and so the quality of a final product or performance
is evaluated rather than the process or dimension of performance. A
holistic scoring rubric might combine a number of elements on a single
scale. Focused holistic scoring may be used to evaluate a limited portion
of a learner's performance (Assessment Terminology: A Glossary of Useful
Terms).
Individualized Education Program (IEP): A document that reflects
the decisions made by an interdisciplinary team, including the parent
and the student when appropriate. During an IEP meeting for a student
with a disability (SWD), the team will identify the student’s
abilities and disabilities (Glossary of Assessment Terms and Acronyms).
Individuals with Disabilities Education Act (IDEA): See Student with Disabilities.
Inter-rater Reliability: The consistency with which two or more
judges rate the work or performance of test takers. Inter-rater reliability
is used when constructed responses [such as a written essay] are judged
by two or more judges who differ in scores awarded (Glossary of Useful
Terms).
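One simple way to put a number on inter-rater consistency is the share of papers on which two judges award exactly the same score. The sketch below assumes that index (exact agreement); it is only one of several measures in use, and the function name and sample scores are hypothetical.

```python
def exact_agreement(rater_a, rater_b):
    """Proportion of papers on which two raters gave the identical score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of papers")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


# Hypothetical rubric scores from two judges on the same five essays.
judge_1 = [4, 3, 5, 2, 4]
judge_2 = [4, 3, 4, 2, 4]
print(exact_agreement(judge_1, judge_2))
```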
Longitudinal Measurement: The comparison of measurements of
the same groups of students collected at two or more points in time
(Glossary of Assessment Terms and Acronyms Used in Assessing Special
Education Students).
*No Child Left Behind (NCLB): The 2001 Elementary and Secondary
Education Act (ESEA), often called the “No Child Left Behind”
law, requires increased testing, along with sanctions for schools whose
students do not meet specific targets. Beginning in 2005-06, Wisconsin’s
students had to be tested annually in reading and mathematics in grades
3-8, and once in high school. Beginning in 2007-08, testing also is
required in science at least once in elementary school (4th grade),
middle school (8th grade), and high school (10th grade). Participation
in testing by the National Assessment of Educational Progress (NAEP)
also is required. [See discussion under AYP, ESEA, and NAEP].
Mean: One of several ways of representing a group with a single,
typical score. It is figured by adding up all the individual scores
in a group and dividing them by the number of people in the group. [The
mean] can be affected by extremely low or high scores (Assessment Terminology:
A Glossary of Useful Terms).
Median: When the numbers in some set (such as test scores of
students in a class) are arranged in rank order, the median divides
the scores into two equal subgroups (one-half of scores above and one-half
below).
Mode: The mode is the most frequently occurring value in a data
set. For example, suppose a class of students is tested and the mode
is 70%. This tells us that more students earned a score of 70% than
any other single score. Note that a data set may have several modes.
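The three measures defined above (mean, median, and mode) can be compared on a small set of made-up test scores. The sketch uses Python's standard statistics module; the helper name and the scores are hypothetical.

```python
from statistics import mean, median, multimode


def describe(scores):
    """Return (mean, median, list of modes) for a list of scores."""
    return mean(scores), median(scores), multimode(scores)


scores = [60, 70, 70, 80, 95]
print(describe(scores))

# One extremely low score pulls the mean down but leaves the median unchanged.
print(describe(scores + [5]))

# A data set can have more than one mode.
print(describe([60, 60, 70, 70, 80]))
```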
Norm-Referenced Test (NRT): A test in which a student or a group's
performance is compared to that of a norm group. [Scores on norm-referenced
tests are reported as percentiles; a student with a percentile score
of 70 scored higher than 70% of the students who were in the norm group].
Often used to measure and compare students, schools, districts, and
states on the basis of norm-established scales of achievement (Assessment
Terminology: A Glossary of Useful Terms). Also see Percentile Rank.
Normal Curve Equivalent (NCE): The NCE measures where a student
falls along the normal distribution (ranging from 1 to 99). Unlike percentile
ranks, NCE scores can be averaged. [NCEs can be used] to compare different
tests for the same student or group of students and between different
students on the same test. For those who want more technical detail:
an NCE is a normalized test score with a mean of 50 and a standard deviation
of 21.06. NCEs should be used instead of percentiles for comparative
purposes. Required by many categorical funding agencies, e.g., Chapter
I or Title I (Assessment Terminology: A Glossary of Useful Terms).
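For readers who want to see the technical detail at work, the sketch below converts a percentile rank to an NCE using the mean of 50 and standard deviation of 21.06 given above. It assumes the usual conversion: find the standard normal z-score for the percentile, then rescale. The function name is hypothetical.

```python
from statistics import NormalDist

NCE_MEAN = 50.0
NCE_SD = 21.06


def percentile_to_nce(percentile_rank: float) -> float:
    """Convert a percentile rank (between 0 and 100, exclusive) to an NCE."""
    if not 0 < percentile_rank < 100:
        raise ValueError("percentile rank must be strictly between 0 and 100")
    # z-score on the standard normal curve that has this share of scores below it
    z = NormalDist().inv_cdf(percentile_rank / 100)
    return NCE_MEAN + NCE_SD * z


for pr in (1, 25, 50, 75, 99):
    print(pr, round(percentile_to_nce(pr), 1))
```

NCEs and percentile ranks coincide at 1, 50, and 99 but differ in between, which is one reason the two scales are not interchangeable.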
Normal Curve: The numbers in some collection or data set (for
example, test scores of 100 students) are normally distributed if their
distribution follows a bell-shaped curve. In a normal distribution, the
curve is symmetrical in shape, with most values in the center and fewer
on either side. In a normal distribution the mean, median, and mode
are the same. When data are normally distributed, about 68% of numbers will
be within one standard deviation of the mean. About 95% will
be within 2 standard deviations, while about 99.7% will fall within 3
standard deviations. [Also see Normal Curve Equivalent and Standard
Deviation].
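The percentages quoted above are rounded. The exact shares for a normal distribution can be checked with Python's standard library; the function name below is hypothetical.

```python
from statistics import NormalDist


def share_within(k: float) -> float:
    """Share of a normal distribution lying within k standard deviations of the mean."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(share_within(k) * 100, 2))
```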
Opportunity to Learn: In terms of testing, opportunity to learn
means that before a student is tested, he or she is given adequate and
timely instruction in the knowledge and skills measured by a test.
Percent Correct: When the raw score is divided by the total
number of questions and the result is multiplied by 100, the percent-correct
score is obtained. Like raw scores, percent-correct scores have little
meaning by themselves. They tell what percent of the questions a student
got right on a test, but unless we know something about the overall
difficulty of the test, this information is not very helpful (Iowa Testing
Programs).
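The arithmetic is simple enough to show directly. In this sketch the function name and the test lengths are hypothetical; it shows how the same raw score yields different percent-correct scores on tests of different lengths.

```python
def percent_correct(raw_score: int, total_questions: int) -> float:
    """Raw score divided by the number of questions, multiplied by 100."""
    if total_questions <= 0 or not 0 <= raw_score <= total_questions:
        raise ValueError("raw score must be between 0 and the number of questions")
    return raw_score / total_questions * 100


# The same raw score of 10 on a 20-question test and on a 40-question test.
print(percent_correct(10, 20))
print(percent_correct(10, 40))
```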
*Percentile Rank: A student's percentile rank is a score that
tells the percent of students in a particular group that got lower raw
scores on a test than the student did. It shows the student's relative
position or rank in a group of students who are in the same grade and
who were tested at the same time of year (fall, midyear, or spring)
as the student. Thus, for example, if Toni earned a percentile rank
of 72 on the Language test, it means that she scored higher than 72
percent of the students in the group with which she is being compared.
Of course, it also means that 28 percent of the group scored higher
than Toni. Percentile ranks range from 1 to 99.
A student's percentile rank can vary depending on which group is used
to determine the ranking. A student is simultaneously a member of many
different groups: all students in her classroom, her building, her school
district, her state, and the nation (Iowa Testing Programs).
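A minimal sketch of the idea, assuming the simplest definition given above (the percent of the comparison group with lower scores) and clamping the result to the 1 to 99 range. Published tests use their own norm tables and rounding rules, so the function below is illustrative only and its name is hypothetical.

```python
def percentile_rank(score, group_scores):
    """Percent of the comparison group scoring lower, limited to 1 through 99."""
    if not group_scores:
        raise ValueError("the comparison group must not be empty")
    below = sum(1 for s in group_scores if s < score)
    pr = round(100 * below / len(group_scores))
    return min(99, max(1, pr))


# Hypothetical comparison group of 100 students with raw scores 1 through 100.
group = list(range(1, 101))
print(percentile_rank(73, group))  # 72 students scored lower
```

Passing a different comparison group (a classroom, a district, a national sample) changes the result for the same raw score, which is the point made in the entry above.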
Performance Assessment: Performance assessment is a form of
testing that requires students to perform a task rather than select
an answer from a ready-made list [e.g., multiple choice, true-false,
matching, etc.]. Performance assessment is an activity that requires
students to construct a response, create a product, or perform a demonstration.
Usually there are multiple ways that an examinee can approach a performance
assessment and more than one correct answer (Glossary of Useful Terms).
*Performance Standards: 1. A statement or description of a set
of operational tasks exemplifying a level of performance associated
with a more general content standard; the statement may be used to guide
judgments about the location of a cut score on a score scale; the term
often implies a desired level of performance. 2. Explicit definitions
of what students must do to demonstrate proficiency at a specific level
on the content standards . . . (Glossary of Useful Terms). [The Wisconsin
Knowledge & Concepts Examinations have four levels of performance,
ranging from minimal to advanced].
Portfolio: A systematic and organized collection of a student's
work that exhibits to others the direct evidence of a student's efforts,
achievements, and progress over a period of time. The collection should
involve the student in selection of its contents, and should include
information about the performance criteria, the rubric or criteria for
judging merit, and evidence of student self-reflection or evaluation.
It should include representative work, providing a documentation of
the learner's performance and a basis for evaluation of the student's
progress. Portfolios may include a variety of demonstrations of learning
and have been gathered in the form of a physical collection of materials,
videos, CD-ROMs, reflective journals, etc. (Assessment Terminology:
A Glossary of Useful Terms).
Portfolio Assessment: Portfolios may be assessed in a variety
of ways. Each piece may be individually scored, or the portfolio might
be assessed merely for the presence of required pieces, or a holistic
scoring process might be used and an evaluation made on the basis of
an overall impression of the student's collected work. It is common
that assessors work together to establish consensus of standards or
to ensure greater reliability in evaluation of student work. Established
criteria are often used by reviewers and students involved in the process
of evaluating progress and achievement of objectives (Assessment Terminology:
A Glossary of Useful Terms).
Primary Trait Scoring: A type of rubric scoring constructed to
assess a specific trait, skill, behavior, or format, or the evaluation
of the primary impact of a learning process on a designated audience
(Assessment Terminology: A Glossary of Useful Terms).
*Proficiency Scores: Students’ scores on the Wisconsin
Knowledge & Concepts Examinations (WKCE) are reported as percentiles,
scale scores, and proficiency levels. There are four proficiency levels
(also referred to as performance levels):
- Advanced: in-depth understanding of knowledge and skills in the
content area.
- Proficient: a competent level of achievement.
- Basic: some weaknesses that should be addressed. Basic does not
mean the child is failing in the content area.
- Minimal Performance: limited academic knowledge and skills in the
area tested.
A proficiency score answers the question, “How does the achievement
of my child on this test compare with established expectations for academic
success?” Wisconsin’s proficiency levels were set in February
of 2003 by a group of 240 citizens, including educators, government
leaders, and representatives of business and labor. It took them three
days to set the standards on all the tests. [Also see Performance Standards].
Raw Score: The number of questions a student gets right on a
test is the student's raw score (assuming each question is worth one
point). By itself, a raw score has little or no meaning. The meaning
depends on how many questions are on the test and how hard or easy the
questions are. For example, if Kati got 10 right on both a math test
and a science test, it would not be reasonable to conclude that her
level of achievement in the two areas is the same. This illustrates
why raw scores are usually converted to other types of scores for interpretation
purposes (Iowa Testing Programs).
Reliability: [Reliability tells us] the degree to which the
results of an assessment are dependable and consistently measure particular
student knowledge and/or skills. Reliability is an indication of the
consistency of scores across raters, over time, or across different
tasks or items that measure the same thing. Thus, reliability may be
expressed as (a) the relationship between test items intended to measure
the same skill or knowledge (item reliability), (b) the relationship
between two administrations of the same test to the same student or
students (test/retest reliability), or (c) the degree of agreement between
two or more raters (rater reliability). An unreliable assessment cannot
be valid (Glossary of Useful Terms). [Also see Inter-rater Reliability].
*Safe Harbor: The State, school districts, schools, and each
subgroup of 40 or more students [50 for students with disabilities in
Wisconsin] must reach the performance targets for increasing proficiency
in reading and math to make AYP. However, there is an exception to that
requirement. The State, school districts and schools may still make
AYP if each subgroup that fails to reach its proficiency performance
targets reduces its percentage of students not meeting standards by
10% of the previous year's percentage, plus the subgroup must meet the
attendance rate or graduation rate targets (Illinois State Board of
Education).
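The exception can be expressed as a short check. This sketch follows the wording above: the share of students not meeting standards must fall by at least 10% of the previous year's share, and the subgroup must also meet its attendance or graduation target. The function and parameter names are hypothetical, and real AYP determinations involve additional rules not modeled here.

```python
def makes_safe_harbor(prev_pct_not_meeting: float,
                      curr_pct_not_meeting: float,
                      met_other_indicator: bool) -> bool:
    """Apply the two safe-harbor conditions described in the text."""
    # A 10% reduction means this year's share is at most 90% of last year's.
    required = prev_pct_not_meeting * 0.90
    return curr_pct_not_meeting <= required and met_other_indicator


# Last year 40% of a subgroup did not meet standards, so the target is 36% or lower.
print(makes_safe_harbor(40.0, 35.0, True))
print(makes_safe_harbor(40.0, 37.0, True))
```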
Scholastic Assessment Test (SAT): The Scholastic Assessment
Test is taken by high school students for admission to many undergraduate
college programs. The SAT measures a student’s knowledge and skills
in Critical Reading, Mathematics, and Writing. Each section of the SAT
is scored on a scale of 200-800, so the composite score (the sum of the
three section scores) ranges from 600 to 2,400.
*Scale Scores: These are test scores based on a scale ranging
from 001 to 999. Scale scores are useful in comparing performance in
one subject area across classes, schools, districts, and other large
populations, especially in monitoring change over time. Scores on the
Wisconsin Knowledge & Concepts Examinations are reported as scale
scores (also as percentiles and performance levels). A scale score on
a test is similar to the score given in certain sports such as skating
and diving. For example, in figure skating, a participant’s score
typically is based on two factors—the degree of difficulty and
the quality of the performance.
On the state tests, a child’s score is based primarily on the
number and difficulty of questions answered correctly. Because challenging
test questions are given more “weight,” two students both
could answer 25 questions correctly yet end up with different scale
scores. This is because one student could have answered 25 relatively
“easy” questions, while the second student might have answered
some very difficult questions along with those that are less difficult.
By knowing a student’s score in any of the subjects tested, you
can tell if he or she scored at the lower, middle, or upper ends of
the scale. The higher your child’s scale score, the better he
or she did on the test.
Scale scores on Wisconsin’s tests can be used to measure progress
in the same subject area over time. However, because each subject has
its own scale, the scores in different subjects don’t have the
same meaning.
Scoring Rubric: Specific sets of criteria that clearly define
for both student and teacher what a range of acceptable and unacceptable
performance looks like. Criteria define descriptors of ability at each
level of performance and assign values to each level. Levels referred
to are proficiency levels which describe a continuum from excellent
to unacceptable product (Glossary of Useful Terms).
*Selected Response Item: This is an exercise in which examinees
must choose a response from an enumerated set [multiple choice, true
false, matching] rather than create their own responses or products
(e.g., performance assessment such as a written response to a question)
(Glossary of Useful Terms).
Standard Error of Measurement (SEM): Whenever a student is tested,
the score he or she receives (the observed score) is an estimate
of the student’s “true score” (what the student really
knows and is able to do). Suppose that a student receives a test score
of 75 with a standard error of measurement of 4. This tells us that
the student’s true score probably falls between 71 and 79 (one SEM below
and one SEM above the observed score). The Standard Error of Measurement
reminds us that any test score is an estimate of what a student knows
and is able to do.
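The band around an observed score is a one-line calculation. The sketch below uses a hypothetical function name; the optional third argument widens the band to more than one SEM, which is an addition for illustration and not something the entry above describes.

```python
def score_band(observed: float, sem: float, n_sems: float = 1.0):
    """Return (low, high): the observed score minus and plus n_sems * SEM."""
    margin = n_sems * sem
    return (observed - margin, observed + margin)


print(score_band(75, 4))     # one SEM on either side
print(score_band(75, 4, 2))  # a wider band, two SEMs on either side
```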
Standardization: A consistent set of procedures for designing,
administering, and scoring an assessment. The purpose of standardization
is to assure that all students are assessed under the same conditions
so that their scores have the same meaning and are not influenced by
differing conditions. Standardized procedures are very important when
scores will be used to compare individuals or groups (Glossary of Useful
Terms).
*Standardized Achievement Test: An objective test that is given
and scored in a uniform manner. Standardized tests are carefully constructed
and items are selected after trials for appropriateness and difficulty.
Tests are issued with a manual giving complete guidelines for administration
and scoring. The guidelines attempt to eliminate extraneous interference
that might influence test results. Scores are often norm-referenced.
(They can be criterion-referenced also). A test designed to be given
under specified, standard conditions to obtain a sample of learner behavior
that can be used to make inferences about the learner's ability. Standardized
testing allows results to be compared statistically to a standard such
as a norm or criterion. If the test is not administered according to
the standard conditions, the results are invalid (Assessment Terminology:
A Glossary of Useful Terms). [Also see standardization].
Standard Deviation: Standard deviation is a measure of how spread
out, or bunched together, the numbers are in some data set. For example,
if test scores are bunched close together (meaning all students score
about the same), the standard deviation will be small. Conversely, if
the data points are spread out (meaning that many are far from the mean),
then the standard deviation will be large. In this example, small and
large are relative terms.
For purposes of illustration, let’s assume that a group of students
is tested and that the scores are normally distributed with a mean of
77 and a standard deviation of 5. When the data points in a set are
normally distributed, approximately two-thirds of the scores (68%) are
within plus or minus one standard deviation of the mean. In this example,
we can say that about two-thirds of students had a test score between 72%
and 82% correct (plus or minus one standard deviation, or 5 percentage
points). Two standard deviations from the mean (67% to 87%) account for
approximately 95% of scores, while three standard deviations (62% to 92%)
account for about 99.7%. [Also see Normal Curve].
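Both ideas in this entry can be tried out in a few lines of Python: first, that bunched scores give a small standard deviation and spread-out scores a large one; second, the score ranges for the mean-77, standard-deviation-5 illustration. The two score lists and the helper name are made up for the example.

```python
from statistics import mean, pstdev

# Two hypothetical classes with the same mean (77) but different spread.
bunched = [75, 76, 77, 78, 79]
spread = [57, 67, 77, 87, 97]
print(mean(bunched), round(pstdev(bunched), 2))
print(mean(spread), round(pstdev(spread), 2))


def sd_range(mean_score, sd, k):
    """Scores lying within k standard deviations of the mean, as (low, high)."""
    return (mean_score - k * sd, mean_score + k * sd)


for k in (1, 2, 3):
    print(k, sd_range(77, 5, k))
```

Here pstdev treats each list as the whole group being described; statistics.stdev would be the choice if the scores were a sample from a larger group.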
*Student with Disabilities: In the Individuals with Disabilities
Education Act (IDEA), a student with disabilities is defined as “a
child evaluated in accordance with §§300.530-300.536 as having
mental retardation, a hearing impairment including deafness, a speech
or language impairment, a visual impairment including blindness, serious
emotional disturbance (hereafter referred to as emotional disturbance), an
orthopedic impairment, autism, traumatic brain injury, an other health
impairment, a specific learning disability, deaf-blindness, or multiple
disabilities, and who, by reason thereof, needs special education and
related services.” Section 504 of the Rehabilitation Act of 1973
includes the following definitions: (j) Handicapped person – (1) Handicapped
person means any person who (i) has a physical or mental impairment
which substantially limits one or more major life activities, (ii) has
a record of such an impairment, or (iii) is regarded as having such an
impairment. (2) As used in paragraph (j)(1) of this section, the phrase:
(i) Physical or mental impairment means (A) any physiological disorder
or condition, cosmetic disfigurement, or anatomical loss affecting one
or more of the following body systems: neurological; musculoskeletal;
special sense organs; respiratory, including speech organs; cardiovascular;
reproductive, digestive, genito-urinary; hemic and lymphatic; skin;
and endocrine; or (B) any mental or psychological disorder, such as
mental retardation, organic brain syndrome, emotional or mental illness,
and specific learning disabilities. (ii) Major life activities means
functions such as caring for one’s self, performing manual tasks,
walking, seeing, hearing, speaking, breathing, learning, and working.
(iii) Has a record of such an impairment means has a history of, or has
been misclassified as having, a mental or physical impairment that substantially
limits one or more major life activities. (iv) Is regarded as having
such an impairment means (A) has a physical or mental impairment that
does not substantially limit major life activities but that is treated
by a recipient as constituting such a limitation; (B) has a physical or
mental impairment that substantially limits major life activities only
as a result of the attitudes of others toward such impairment; or (C) has
none of the impairments defined in paragraph (j)(2)(i) of this section
but is treated by a recipient as having such an impairment (Glossary
of Assessment Terms and Acronyms).
*Subgroup: A well-defined group of students. It is important
in this context because of the requirements of No Child Left Behind. No
Child Left Behind identifies the following specific subgroups that must
achieve Adequate Yearly Progress: students of racial or ethnic minority,
students with disabilities, gender, limited-English-proficient (LEP)
students, and economically disadvantaged students (Glossary of Assessment
Terms and Acronyms).
[In Wisconsin, scores for subgroups are reported only if there are
at least 40 students (50 for students with disabilities). Thus, if a
school has 20 students in a subgroup, the results would not be reported
(and the school would not be penalized if these students failed to meet
AYP requirements). However, if the subgroup has at least 40 students across
the district, then the results would be reported and AYP requirements would
have to be met for the district as a whole].
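The minimum-size rule in the bracketed note can be written as a small check. The thresholds of 40 and 50 come from the text; the function and constant names are hypothetical.

```python
MIN_SUBGROUP_SIZE = 40      # general minimum stated in the text
MIN_SUBGROUP_SIZE_SWD = 50  # minimum for students with disabilities


def is_reported(n_students: int, students_with_disabilities: bool = False) -> bool:
    """True if a subgroup of this size is large enough for its scores to be reported."""
    minimum = MIN_SUBGROUP_SIZE_SWD if students_with_disabilities else MIN_SUBGROUP_SIZE
    return n_students >= minimum


print(is_reported(20))        # a school-level subgroup of 20
print(is_reported(40))        # the same subgroup counted district-wide
print(is_reported(45, True))  # students with disabilities need at least 50
```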
Test bias: A test item or test that is biased is one in which
there is differential performance for students from different groups
who have the same ability levels. Bias can be the result of any number
of factors, including use of unfamiliar language, use of an unfamiliar
test format or item structure, use of stereotypes, etc. It’s important
to recognize that even though two groups score differently on a test
item or entire test, this does not mean there is bias. The groups may
have different levels of ability that account for the differences in
scores.
Test security: Established procedures to ensure current or future
confidentiality, fidelity, and integrity of a test whereby public access
is limited and strictly monitored, with clearly outlined consequences
for breaches in test security (Glossary of Assessment Terms and Acronyms).
*Title I: Title I is part of the Elementary and Secondary Education
Act that was passed in 1965. This law established a number of programs
that distribute federal funds to schools and school districts with high
percentages of low income students. Schools and districts that receive
Title I funds must meet federal rules and guidelines, including the
requirements of No Child Left Behind. [Also see ESEA and Student with
Disabilities].
Validity: The extent to which an assessment measures what it
is supposed to measure and the extent to which inferences and actions
made on the basis of test scores are appropriate and accurate. For example,
if a student performs well on a reading test, how confident are we that
that student is a good reader? A valid standards-based assessment is
aligned with the standards intended to be measured, provides an accurate
and reliable estimate of students' performance relative to the standard,
and is fair. An assessment cannot be valid if it is not reliable (Glossary
of Useful Terms). [Also see Reliability].
Value-added Assessment (Value-added Measurement): Value-added
assessment is a method of analyzing and reporting student test results
based on improvement (“growth”) in standardized test scores
over two or more points in time. This procedure contrasts with more
traditional approaches, which analyze and report test results at a single
moment in time. Both methods use standardized achievement tests, but
value-added measurement compares each student’s latest test score
with the same student’s past test score to determine growth or
improvement. Within the community of measurement experts there is considerable
debate about value-added assessment.
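The contrast between the two approaches can be sketched with a simple gain
score in Python. The student names and scale scores below are invented,
and the function name growth is hypothetical. A plain difference between
two scores is only the basic comparison described above; it should not be
read as a complete value-added model.

```python
# Illustrative sketch only: a simple gain score, not a full value-added model.
def growth(past_score: float, latest_score: float) -> float:
    """Growth is the latest score minus the same student's past score."""
    return latest_score - past_score


# Hypothetical scale scores for three students: (past, latest).
scores = {
    "student_a": (410, 440),
    "student_b": (520, 525),
    "student_c": (380, 370),
}

# Traditional approach: report the latest scores at a single point in time.
latest_only = {name: latest for name, (past, latest) in scores.items()}

# Growth-based approach: compare each student with his or her own earlier score.
gains = {name: growth(past, latest) for name, (past, latest) in scores.items()}
```

In this invented example, student_b has the highest latest score but a
small gain, while student_a scores lower but shows the most growth. The
two ways of reporting can therefore rank the same students differently.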
*Wisconsin Knowledge & Concepts Examinations: Beginning in
the 2005-06 school year, the federal No Child Left Behind Act requires
all states to test all students in reading and mathematics in grades
3 through 8 and once in high school (grade 10 under Wisconsin law s.118.30).
These tests are referred to as the Wisconsin Knowledge and Concepts
Examination - Criterion-Referenced Tests (WKCE-CRT) and replace the
WKCE reading and mathematics tests beginning in the Fall of 2005. Student
performance on these assessments is reported in proficiency categories
and used to determine the adequate yearly progress of students at the
school, district and state levels.
These standardized tests include commercially-developed questions used
in schools across the country and questions developed specifically for
Wisconsin in order to improve coverage of Wisconsin academic standards.
The WKCE-CRT measures achievement in reading, language applications,
mathematics, science, and social studies using multiple-choice and short-answer
questions. Students also provide a rough draft writing sample (Wisconsin
Department of Public Instruction).
*Wisconsin Student Assessment System (WSAS): The Wisconsin Department
of Public Instruction defines the WSAS as follows: One way that students
demonstrate their progress toward achieving the academic standards in
English language arts, mathematics, science, and social studies is through
participation in the Wisconsin Student Assessment System (WSAS). At
present the WSAS includes both regular assessments taken by nearly all
students and alternate assessments taken by certain students with limited
English proficiency or disabilities.
Beginning in the 2005-06 school year, the federal No Child Left Behind
Act required all states to test all students in reading and mathematics
in grades 3 through 8 and once in high school (grade 10 under Wisconsin
law s.118.30). These tests are referred to as the Wisconsin Knowledge
and Concepts Examination Criterion-Referenced Tests (WKCE-CRT) and replace
the WKCE reading and mathematics tests beginning in Fall 2005. Student
performance on these assessments is reported in proficiency categories
and used to determine the adequate yearly progress of students at the
school, district and state levels. WSAS regular assessments also include
DPI-approved, locally-adopted and locally-scored supplemental assessments.
WSAS alternate assessments are alternatives to WSAS regular assessments
and consist of DPI-approved protocols and rubrics for the local collection
and local scoring of student work (Wisconsin Department of Public Instruction).
*The Wisconsin Alternate Assessment for Students with Disabilities
(WAA-DIS): This is a test given to students in Wisconsin whose disabilities
do not allow them to take the regular Knowledge & Concepts Examinations.
The scores are shown for Pre-requisite skill levels, ranging from Pre-Minimal
to Pre-Advanced.
*Wisconsin Alternate Assessment for English Language Learners (WAA-ELL):
The WAA-ELL is a test given to students in Wisconsin whose English
skills are not adequate to take the regular Knowledge & Concepts
Examinations. The proficiency levels on the WAA for Limited English
Proficient Students are equivalent to those for the regular tests.
Developed by Russ Allen, PhD,
Teaching and Learning, WEAC
Sources:
Posted October 29, 2006