Introduction
For the past few years critics of current assessment practices have called for dramatic changes in how we assess what students know and are able to do. Most of the criticism has been directed at the widespread use of standardized achievement tests in our schools; however, many teacher-made tests and tests found in textbooks have similar weaknesses and limitations.1 Those who propose changes in assessment rest their argument on the premise that what we assess and how we assess it affects both what is taught and the way it is taught. Critics of current assessment practices argue that the goal should be to have students who can create, reflect, solve problems, collect and use information, and formulate interesting and worthwhile questions. Thus, it is argued, our assessments - whether they are developed by teachers, writers of textbooks, or large corporations - must measure the extent to which students have mastered these types of knowledge and skills.
This is not to say that concepts, facts, definitions, dates, names, and locations have no place in education. However, as these critics point out, many of our assessment practices place too much emphasis on assessing content and give far too little attention to the skills and knowledge listed above. They also argue that we must no longer treat assessment (testing) as fundamentally separate from instruction. If curriculum, instruction, and assessment are integrated, the assessment itself becomes a valuable learning experience. Their conclusion is that by requiring students to complete high quality performance tasks we have the potential to bring about significant and positive changes in instruction and learning.2
This paper is intended to provide an introduction to some of the important ideas associated with the concepts of performance assessment, authentic assessment, authentic instruction, performance criteria, and portfolios. It first summarizes the criticisms made of standardized achievement tests and of curriculum and instruction organized for the purpose of teaching subject matter content. Following that is a discussion of performance assessment, authentic assessment, and authentic instruction and learning. Attention then is directed to performance criteria and portfolios. The paper concludes by suggesting some ideas for getting started and by offering an example of a performance task.
Because the discussion of these topics is relatively brief, those who wish to develop and implement performance assessments and portfolios will need to read further and also to consult with experienced practitioners.
The Popularity of Standardized Achievement Tests
Ten or fifteen years ago, few persons questioned the widespread use of standardized achievement tests in our schools. After all, standardized achievement tests take relatively little time to administer and are inexpensive. In addition, the results are simple to report and understand. Often a single score, such as a percentile rank, standard score, or grade equivalent is reported for each student, and aggregate scores are reported for a classroom, school, or school district. Finally, and very significantly, standardized achievement tests are promoted as "objective" measures of achievement, meaning that the results are not affected by the personal values or biases of the person who scores the test.
For many individuals, an assessment system relying on objective measures of achievement appears entirely appropriate. Standardized achievement tests are promoted as scientifically-developed instruments which are valid and reliable measures of what a student knows and is able to do. They originated at a time when it seemed both necessary and logical to teach students a given body of subject matter content. Furthermore, many learning theorists believed that teaching and learning were most effective when concepts and ideas were broken into smaller and smaller components. Standardized achievement tests reflected these assumptions and practices, for they were specific to each discipline and typically used a set of multiple choice items to "sample" the scope of a particular discipline. Advocates of standardized testing assumed that a student who had a command of the pieces (e.g., specific knowledge and facts) also would have a good understanding of the larger content domain.
The results of standardized achievement tests served, and continue to serve, a variety of purposes. Unfortunately, many of these purposes are not justified. Test scores are used to compare students with other students, to place pupils into groups or programs, and to guide and counsel students. The results also are used to evaluate teachers, administrators, and even the quality of a school district's entire curricular and instructional program.
To a certain extent, most teachers require their students to demonstrate competency by having them perform or develop projects. However, this practice seldom extends to school- or district-wide testing programs. Instead, many district and school level testing programs are based primarily, if not exclusively, on the use of one or more batteries of commercially-produced, norm-referenced, standardized achievement tests.
Consequently, each year in many school districts students at several grade levels are tested using standardized achievement tests; invariably, the results show that most students perform far above the "national average."
Criticisms of Standardized Achievement Tests
There is no shortage of critics or criticisms of standardized achievement tests. Examples include the following:
Although this list of criticisms is directed specifically at standardized achievement tests, many also apply to teacher-made tests and tests supplied by textbook publishers.
Criticism of Content-Based Curriculum and Instruction
Along with those who criticize the excessive use of standardized achievement tests in our schools are others who maintain that too much of curriculum and instruction is organized for the purpose of teaching content. Although the critics of assessment and of instruction have a different focus, their conclusions are the same. Both groups maintain that we fail to teach and assess the skills and knowledge which are highly valued.
Critics of current instructional practices state that in too many places instruction is teacher-dominated and that students are expected to be passive learners. Glickman (1991), for example, asserts that little has changed in classroom teaching over the past half century: "The majority of classroom time is spent on teachers lecturing, students listening, students reading textbooks, or students filling out worksheets. To observe classrooms now is to observe them 50 years ago . . ." (p. 5).
Similar criticisms were made a few years ago by the National Assessment of Educational Progress in its summary of twenty years of national testing: "Across the past 20 years little seems to have changed in how students are taught. Despite much research suggesting better alternatives, classrooms still appear to be dominated by textbooks, teacher lectures, and short-answer activity sheets" (Mullis et al., 1990, p. 10).
One person who is especially critical of curriculum and instruction organized around subject matter is Grant Wiggins. In a 1989 article entitled "The Futility of Trying to Teach Everything of Importance," Wiggins criticizes those who seek to teach everything of importance because doing so reduces education to trivia, forgettable verbalisms, or lists. The alternative, he argues, is to teach students to know and do a few things well. Specifically, he states that we should seek to develop "habits of mind and high standards of craftsmanship." Wiggins further states that we should seek to develop in students a ". . . disgust for thoughtless, superficial, and shoddy academic work." If this is a goal, Wiggins asserts that curriculum design can finally ". . . be liberated from the sham of typical scope and sequence whereby it is assumed that a logical outline of all adult knowledge is translatable into complete lessons, and where a fact or theory encountered once in the 8th grade as a spoken truism is somehow to be recalled and intelligently used in the 11th" (p. 45).
As an alternative, Wiggins argues, curriculum should be organized to accomplish four purposes: (1) to equip students with the ability to further their superficial knowledge through careful questioning; (2) to enable them to turn those questions into warranted, systematic knowledge; (3) to develop in students high standards of craftsmanship; and (4) to engage students so thoroughly in important questions that they learn to take pleasure in seeking important knowledge.
Theodore Sizer, founder of the Coalition of Essential Schools, advocates a similar message, for he states that all students should be required to demonstrate competency with performances or exhibitions. Further, Sizer maintains that all decisions about a school's curriculum should flow from the devising of a "culminating exhibition" at graduation. Sizer maintains that schools should seek to graduate students who have the ability to synthesize information, to practice cross-disciplinary inquiry, to formulate and answer questions, and to judge the quality of evidence. Thus, maintains Sizer, we must design courses and activities that engage students directly in these kinds of matters (Performances and Exhibitions, p. 3).
Performance Assessment, Authentic Assessment, Authentic Instruction and Learning
Those who propose that we change assessment (and instructional) practices use terms and concepts which, although different, mean much the same. These terms include performance assessment, authentic assessment, and authentic instruction and learning.
Performance Assessment
In its simplest terms, a performance assessment is one which requires students to demonstrate that they have mastered specific skills and competencies by performing or producing something.3 Advocates of performance assessment call for assessments of the following kinds: designing and carrying out experiments; writing essays which require students to rethink, to integrate, or to apply information; working with other students to accomplish tasks; demonstrating proficiency in using a piece of equipment or a technique; building models; developing, interpreting, and using maps; making collections; writing term papers, critiques, poems, or short stories; giving speeches; playing musical instruments; participating in oral examinations; developing portfolios; developing athletic skills or routines, etc.
Authentic Assessment
Similar to performance assessment is the concept of authentic assessment. Meyer (1992) notes that performance and authentic assessments are not the same, and that a performance is "authentic" to the extent it is based on challenging and engaging tasks which resemble the context in which adults do their work. In practical terms, this means that an authentic task or assessment is one in which students are allowed adequate time to plan, to complete the work, to self-assess, to revise, and to consult with others. Meyer also contends that authentic assessments must be judged by the same kinds of criteria (standards) which are used to judge adult performance on similar tasks.
A more elaborate definition of authenticity is offered by Wiggins (1990, CLASS), who suggests that three factors determine the authenticity of an assessment: the task, the context, and the evaluation criteria. An authentic task is one which requires the student to use knowledge or skills to produce a product or complete a performance. Based on this definition, memorizing a formula would not be an authentic task; however, using the formula to solve a practical problem would be.
As for context, Wiggins suggests that there be as much realism as is possible. He maintains that the setting (including the time allowed to complete the task) should mimic or duplicate the context faced by professionals, citizens, and consumers. An examination in which the student has almost no prior knowledge of what will be asked, little time to complete the activity, and no opportunity to reflect or consult appropriate resources would not be authentic.
Finally, Wiggins states that an authentic assessment should be judged using criteria which are similar to those used to judge adults who perform or produce. As an example, authentic criteria used to evaluate a written paper would give primary consideration to the paper's organization and ideas; mechanical errors (such as spelling, punctuation, grammar) would not be the primary focus.
What is to be made of the distinction between performance and authentic assessments? Fortier (1993) notes that authenticity is always a relative concept and that it is unrealistic to expect that an assessment will be completely authentic. For example, he points out that a driving test, even though most would define it as authentic when compared with a paper and pencil test, can never be completely such because drivers do not ordinarily have a law officer seated next to them while they drive.
In short, as the term is used in the literature, an authentic performance assessment requires students to demonstrate skills and competencies which realistically represent those needed for success in the daily lives of adults. Authentic tasks are worth repeating and practicing. They require students to apply what they know, not merely to recall or recognize information. Finally, authentic tasks are those which are judged by criteria or standards similar to those used to evaluate the efforts of adults.
Authentic Instruction and Learning
Similar to performance or authentic assessment is the term authentic learning and instruction. Although this term refers to instruction and learning, it is appropriate to discuss it within the framework of assessment because those who call for changes in either assessment or instruction maintain that assessment and instruction must be integrated. In a 1993 article in Educational Leadership, Newmann and Wehlage use the concept "authentic instruction" to describe instruction which results in significant and meaningful student achievement, in contrast with that which is trivial and useless.4
In particular, Newmann and Wehlage maintain that instruction is authentic if it helps students achieve three broad goals:
To help the reader understand the concept of authentic instruction, the authors offer five standards or criteria, each based on a five-point scale, which can be used to evaluate the extent to which a lesson is authentic. These criteria, with explanations in parentheses, are as follows:
Performance Criteria 5
Advocates of performance assessments maintain that every task must have performance criteria for at least two reasons: (1) the criteria define for students and others the type of behavior or attributes of a product which are expected, and (2) a well-defined scoring system allows the teacher, the students, and others to evaluate a performance or product as objectively as possible. If performance criteria are well defined, another person acting independently will award a student essentially the same score. Furthermore, well-written performance criteria will allow the teacher to be consistent in scoring over time.
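The claim that an independent scorer should award essentially the same score can be checked by comparing two raters' scores for the same set of papers. The following is a minimal sketch in Python, not a procedure taken from the sources cited in this paper. The function name `agreement_rates`, the sample scores, and the choice of exact and adjacent (within one point) agreement as the measures are all assumptions made for illustration.

```python
def agreement_rates(scores_a, scores_b):
    """Return (exact, adjacent) agreement between two raters.

    exact: share of papers given the same score by both raters.
    adjacent: share of papers whose two scores differ by at most one point.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both raters must score the same papers")
    if not scores_a:
        raise ValueError("at least one paper is required")
    n = len(scores_a)
    pairs = list(zip(scores_a, scores_b))
    exact = sum(a == b for a, b in pairs) / n
    adjacent = sum(abs(a - b) <= 1 for a, b in pairs) / n
    return exact, adjacent


# Hypothetical scores on a 0-4 scale for the same five papers.
teacher = [4, 3, 2, 3, 1]
colleague = [4, 3, 3, 3, 0]

exact, adjacent = agreement_rates(teacher, colleague)
print(f"exact agreement: {exact:.0%}, within one point: {adjacent:.0%}")
```

If exact agreement is low while adjacent agreement is high, the score points may need sharper wording or accompanying examples of student work, so that neighboring levels are easier to tell apart.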
Stiggins (1991) notes that if a teacher fails to have a clear sense of the full dimensions of performance, ranging from poor or unacceptable to exemplary, he or she will not be able to teach students to perform at the highest levels or help students to evaluate their own performance.
In developing performance criteria, Stiggins maintains that one must both define the attribute(s) being evaluated and also develop a performance continuum. For example, one attribute in the evaluation of writing might be writing mechanics, defined as the extent to which the student correctly uses proper grammar, punctuation, and spelling. As for the performance dimension, it can range from high quality (well-organized, good transitions with few errors) to low quality (so many errors that the paper is difficult to read and understand).
The key to developing performance criteria, asserts Stiggins, is to place oneself in the hypothetical situation of having to give feedback to a student who has performed poorly on a task. Stiggins suggests that a teacher should be able to tell the student exactly what must be done to receive a higher score. If performance criteria are well defined (with examples provided whenever possible), the student then will understand what he or she must do to improve.
It is possible, of course, to develop performance criteria for almost any of the characteristics or attributes of a performance or product. However, experts in developing performance criteria warn against evaluating aspects of a performance or product merely because they are easily measured (such as counting mechanical errors) and against failing to distinguish between quality and quantity. Ultimately, it is asserted, performances and products must be judged on those attributes which are most crucial.
Portfolios
Invariably, proponents of performance assessment also advocate the use of student portfolios. In doing so, they also remind us that a portfolio is more than a folder stuffed with student papers, video tapes, progress reports, or related materials. It must be a purposeful collection of student work that tells the story of a student's efforts, progress, or achievement in a given area over a period of time. If it is to be useful, specific design criteria also must be used to create and maintain a portfolio system.
Typically, proponents of portfolios suggest two reasons for their use. The first reason reflects dissatisfaction with the kind of information typically provided to students, parents, teachers, and members of the community about what students have learned or are able to do. As examples, we are reminded that traditional grading systems ("A's", "B's", etc.) or test scores (percentile scores or percent correct) tell us almost nothing about what a student has learned or is able to do.
Second, it is argued that a well-designed portfolio system, which requires students to participate in the selection process and to think about their work, can accomplish several important purposes: it can motivate students; it can provide explicit examples to parents, teachers, and others of what students know and are able to do; it allows students to chart their growth over time and to self-assess their progress; and it encourages students to engage in self-reflection.
Frazier and Paulson (1992) argue that the primary worth of portfolios is that they allow students the opportunity to evaluate their work. Further, ". . . portfolio assessment offers students a way to take charge of their learning; it also encourages ownership, pride, and high self-esteem" (p. 64).
Vavrus (1990) notes that several decisions must be addressed prior to establishing a portfolio system. The decisions, with some of her recommendations, follow.
As an example of a well-designed portfolio system, one need only look to the Walden III Alternative High School in Racine, Wisconsin. This school requires that all students complete a portfolio before graduation and also demonstrate mastery of the sixteen topics addressed in the portfolio by giving presentations before a committee consisting of staff, another student, and an outside adult.7 Among the purposes of the portfolio is to require graduating seniors to look at themselves in depth. Although students are encouraged to be creative and unique, the portfolio has several basic requirements regarding its form and content. For example, it must be well written, with a title page, table of contents, and specific headings. Students also are encouraged to include photographs, charts, drawings, and appendices along with samples of their work.
Developing Performance Tasks
Developing performance tasks or performance assessments seems reasonably straightforward, for the process consists of only three steps.8 The reality, however, is that quality performance tasks are difficult to develop. With this caveat in mind, the three steps, with a brief discussion of each, follow.
Step 1. List the skills and knowledge you wish to have students learn as a result of completing a task.
As tasks are designed, one should begin by identifying the types of knowledge and skills students are expected to learn and practice. These should be of high value, worth teaching to, and worth learning. In order to be authentic, they should be similar to those which are faced by adults in their daily lives and work.
Herman, Aschbacher, and Winters (1992, pp. 25-26) suggest that educators need to ask themselves five questions as they identify what is to be learned or practiced by completing a performance task. Their questions, with examples, follow:
Step 2. Design a performance task which requires the students to demonstrate these skills and knowledge. The performance tasks should motivate students. They also should be challenging, yet achievable. That is, they must be designed so that students are able to complete them successfully. In addition, one should seek to design tasks with sufficient depth and breadth so that valid generalizations about overall student competence can be made.
Herman, Aschbacher, and Winters (p. 31) have a list of questions which are helpful in guiding the process of developing performance tasks. Those questions, with their recommendations, follow:
Step 3. Develop explicit performance criteria which measure the extent to which students have mastered the skills and knowledge.
It is recommended that there be a scoring system for each performance task. The performance criteria consist of a set of score points which define in explicit terms the range of student performance. Well-defined performance criteria will indicate to students what sorts of processes and products are required to show mastery and also will provide the teacher with an "objective" scoring guide for evaluating student work. The performance criteria should be based on those attributes of a product or performance which are most critical to attaining mastery. It also is recommended that students be provided with examples of high quality work, so they can see what is expected of them.
Additional Recommendations for Developing Performance Tasks
An Example of a Performance Task
The last part of this paper presents an example of a performance task which requires students to interview adults and develop written and oral reports. Although the task is intended for use at the secondary level, the format is appropriate for younger children. Thus, one could modify the sample task to have elementary students study a community's history or investigate important community issues or problems.
Along with this task are two examples of performance criteria which could be used to evaluate the student's written assignment. Similar performance criteria would have to be developed if the other skills involved in this task were to be evaluated, such as interviewing, speaking, and working cooperatively with others.
The sample performance task, The Effect of the Great Depression on the Lives of Average People, was developed for use at the secondary level.9 In addition to having students learn more about the 1930's Depression, the task is designed to help students learn and practice the following kinds of skills: developing questionnaires; interviewing, taking notes and transcribing them; working with other students; analyzing data (questionnaire responses); developing conclusions, generalizations, and hypotheses; giving an oral presentation; and writing a report. This task also brings students into contact with members of the community.
The task consists of six steps:
Performance Criteria
Performance criteria for the written report, as described in Step 6, follow. The criteria are of two kinds: one for writing mechanics, the other for content.10
Scoring Criteria (Mechanics)
4 = The paper is easy to read and uses appropriate format. It is carefully proofread to correct spelling, capitalization, punctuation and usage errors. It is written in complete sentences and uses paragraphs correctly.
3 = The paper is generally well proofread and uses appropriate format but has occasional minor lapses.
2 = The paper may lack the appropriate format. It is proofread but may display errors in spelling, capitalization, punctuation and usage. It is written in complete sentences but may not be paragraphed correctly.
1 = The paper is poorly presented, indicating the author is unaware of the requirements of written communications. It will have a significant number of proofreading errors, sentence fragments, and/or flaws in usage.
0 = The student failed to attempt the paper.
Scoring Criteria (Content)
4 = The paper is written in a style appropriate to the genre being assessed. It is well organized, clearly written, and meets the needs of the author and reader. It will contain sufficient details, examples, descriptions and insights to engage the reader. The author will bring closure through a resolution of a problem or a summary of the topic.
3 = The paper is written in an appropriate style and format. It may appear to be well organized and clearly written but may demonstrate minor lapses in the communication to the reader. It may be missing some details and/or examples, and offer incomplete descriptions and fewer insights into the characters and/or topics. The author may not sufficiently close the piece of writing and may leave the reader "hanging" or may offer the reader an inappropriate closing or ending.
2 = The paper may demonstrate an incomplete or inadequate knowledge of the skills assessed. Significant flaws may be evident as the author fails to address the prompt in an appropriate manner, ideas may be conveyed in a random method, and very little is given in proof, details, facts, examples or descriptions. Closure is often missing.
1 = The paper will barely attempt the task. The general idea may be conveyed, but there will be a definite lack of understanding on the part of the author regarding the appropriate format or procedures.
0 = The student failed to attempt the paper.
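For readers who keep scores electronically, the two scoring guides above can be represented as simple lookup tables. The following is a minimal sketch in Python under stated assumptions: the names `MECHANICS`, `CONTENT`, `RUBRICS`, and `report` are hypothetical, the descriptions are abbreviated paraphrases of the score points above, and the decision to report the two scores separately rather than sum them is an illustrative choice, not a recommendation from the sources cited.

```python
# Abbreviated paraphrases of the score points in the two guides above.
MECHANICS = {
    4: "easy to read, appropriate format, carefully proofread",
    3: "generally well proofread, occasional minor lapses",
    2: "may lack appropriate format, some errors",
    1: "poorly presented, significant number of errors",
    0: "paper not attempted",
}
CONTENT = {
    4: "well organized, sufficient detail, brings closure",
    3: "minor lapses, some details missing, weak closing",
    2: "significant flaws, little supporting detail",
    1: "barely attempts the task",
    0: "paper not attempted",
}
RUBRICS = {"mechanics": MECHANICS, "content": CONTENT}


def report(scores):
    """Return one feedback line per criterion for a dict of scores.

    The two scores are kept separate so that a paper strong in content
    but weak in mechanics (or the reverse) remains visible as such.
    """
    lines = []
    for criterion, rubric in RUBRICS.items():
        score = scores[criterion]
        if score not in rubric:
            raise ValueError(f"{criterion} score must be one of {sorted(rubric)}")
        lines.append(f"{criterion}: {score} ({rubric[score]})")
    return lines


# Hypothetical paper: strong content, weaker mechanics.
for line in report({"mechanics": 2, "content": 4}):
    print(line)
```

Pairing each score with its description gives the student the kind of explicit feedback that the performance criteria are meant to provide.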
Conclusion
This paper began with a discussion of the criticisms made of current assessment and instructional practices. It was noted that critics maintain that we often fail to teach and assess the kinds of skills and knowledge which have lasting value for students.
Following this was a discussion of the kinds of weaknesses ascribed to standardized achievement tests. However, it was pointed out that many tests provided by publishers of textbooks and teacher-developed tests have similar weaknesses and limitations. As for instruction, it was noted that too much of instruction remains teacher-dominated (with lectures), and that students all too frequently are taught subject matter content at the expense of important skills.
At the end of this paper an example of a performance task was offered. This task requires students to learn and demonstrate a variety of skills and knowledge, ranging from developing questionnaires and interviewing adults to analyzing and reporting data.
This exercise was presented in order to show how teachers might design performance-based instruction and assessment for use in their classroom. This type of performance task has several positive features. It engages the learner, rather than having the teacher dominate the learning process and tell students what is important. It also illustrates how instruction and assessment can be integrated. In addition, this type of performance task meets the definition of authenticity, for it replicates the kind of work done by many students. Finally, and perhaps most important, it teaches students the kinds of skills and knowledge which we want them to master.
Selected Bibliography
Archbald, Doug A. and Newmann, Fred M. Beyond Standardized Testing. Reston, Virginia: National Association of Secondary School Principals, 1988.
Baron, Joan Boykoff, et al. "Toward a New Generation of Student Outcome Measures: Connecticut's Common Core of Learning Assessment." Paper presented at the Annual Meeting of the American Educational Research Association, March 27-31, 1989. San Francisco, CA.
Baron, Joan Boykoff. "Performance Assessment: Blurring the Edges among Assessment, Curriculum, and Instruction." This Year in School Science. Washington, D.C.: American Association for the Advancement of Science, 1990.
Barth, Patti and Mitchell, Ruth. Smart Start: Elementary Education for the 21st Century. Golden, Colorado: North American Press, 1992.
Cronin, John F. "Four Misconceptions about Authentic Learning." Educational Leadership (April 1993): 78-81.
Fortier, John. The Wisconsin Road Test as an Empirical Example of a Large-Scale, High-Stakes, Authentic Performance Assessment. Madison, Wisconsin: Wisconsin Department of Public Instruction, 1993.
Frazier, Darlene M. and Paulson, F. Leon. "How Portfolios Motivate Reluctant Writers." Educational Leadership (May 1992): 62-65.
From Gatekeeper to Gateway: Transforming Testing in America. Boston College, Chestnut Hill, Massachusetts: National Commission on Testing and Public Policy, 1990.
Glickman, Carl. "Pretending Not to Know What We Know." Educational Leadership (May 1991): 4-10.
Herman, Joan L., Aschbacher, Pamela R., and Winters, Lynn. A Practical Guide to Alternative Assessment. Alexandria, Virginia: Association for Supervision and Curriculum Development, 1992.
Lipman, M. "Some Thoughts on the Formation of Reflective Education." In Teaching-Thinking Skills: Theory and Practice, pp. 151-161. Edited by J. B. Baron and R. J. Sternberg. New York: W. H. Freeman, 1987.
Meyer, Carol. "What's the Difference Between Authentic and Performance Assessment?" Educational Leadership (May 1992): 39-42.
Mislevy, Robert J. Foundations of a New Test Theory. Princeton, New Jersey: Educational Testing Service, 1989.
Mullis, Ina V.S., Owen, Eugene H., and Phillips, Gary W. Accelerating Academic Achievement: A Summary of Findings from 20 Years of NAEP. Princeton, New Jersey: Educational Testing Service, 1990.
Newmann, Fred M. and Wehlage, Gary G. "Five Standards of Authentic Instruction." Educational Leadership (April 1993): 8-12.
"Performances and Exhibitions: The Demonstration of Mastery." Horace (March 1990): 1-12.
Resnick, L.B., and Resnick D.P. Assessing the Thinking Curriculum: New Tools for Educational Reform. Pittsburgh, Pennsylvania: Learning Research and Development Center: University of Pittsburgh and Carnegie Mellon University, 1989.
Stiggins, Richard J. "Assessment Literacy." Phi Delta Kappan (March 1991): 534-539.
Classroom Assessment Based on Observation and Judgment: A workshop in the NWREL Classroom Assessment Training Program. Portland, Oregon: Northwest Regional Educational Laboratory, 1991.
Testing: Where We Stand. Arlington, Virginia: American Association of School Administrators, 1989.
Vavrus, Linda. "Put Portfolios to the Test." Instructor (August 1990): 48-52.
UCLA Graduate School of Education. Proceedings of the 1992 CRESST Conference, "What Works in Performance Assessment?" Los Angeles, CA: 1993.
Wiggins, Grant. "Creating Tests Worth Taking." Educational Leadership (May 1992): 26-35.
"The Futility of Trying to Teach Everything of Importance." Educational Leadership (November 1989): 44-59.
"Standards, Not Standardization: Evoking Quality Student Work." Educational Leadership (February 1991): 18-25.
"Toward More Instructionally-Appropriate and Effective Testing: Authentic Assessment." Published by the Center for Research on Evaluation, Standards, and Student Testing, UCLA (1990).
"A True Test: Toward More Authentic and Equitable Assessment." Phi Delta Kappan (May 1989): 703-713.
Wolf, Dennie Palmer. "What Works in Performance Assessment?" Proceedings of the 1992 CRESST Conference. UCLA Graduate School of Education, Los Angeles, CA: 1993.
This paper was prepared by Russ Allen, research consultant in the WEAC Instruction and Professional Development Division.