Empirical Evidence on Achievement

Introduction

Some critics of American public education would have us believe that today's students are far less knowledgeable and skilled than students of twenty, thirty, or forty years ago. The "evidence" to support this conclusion is largely anecdotal, based on limited personal experiences, or is a consequence of selective interpretation of test scores. The reality is that there are no empirical data to support such a bleak picture of America's students or public schools.

This does not mean that the public schools are without weaknesses. There are serious problems which need to be addressed; however, there is no evidence to suggest that the public schools as a group are as bad as some critics suggest.

For example, consider student performance on the Iowa Tests of Basic Skills and the Iowa Tests of Educational Development, two widely used, commercially developed, norm-referenced tests. Scores on these tests, which can be tracked back to the 1930's, were at record highs in the 1990's (Bracey, 1995).

  • 68% of teachers would like to see greater use of instructional techniques which focus on critical thinking and problem-solving, with less emphasis given to mastery of content.
  • 95% of teachers would like kindergarten through 3rd grade classes to have no more than 20 students.
  • 62% of teachers would like each teacher to have a ten-minute break each morning and afternoon.

Educational attainment in Wisconsin over the past half century also shows remarkable change. In 1950, only one-third of Wisconsin's adults had a high school diploma or more. By 1990, nearly 80% of adults in Wisconsin had at least a high school diploma. Likewise, in 1950, only 12.9% of Wisconsin's adults aged 25 or older had formal education beyond high school. By 1990, this figure had increased more than three-fold, to 41.5%.

Berliner and Biddle in The Manufactured Crisis (1995) challenge those who argue that today's students are not as intelligent or able as students of the past. They offer the following points:

  • ". . . since 1932 the mean IQ for white Americans aged two to seventy-five has risen about .3 points per year" (p. 43). Scores for other groups are not available.
  • "In the United States, today's youth probably average about 15 IQ points higher than did their grandparents and 7.5 points higher than did their parents on the Stanford-Binet and Wechsler tests" (p. 43).
  • "Or to put this another way, the number of students expected to have IQ's of 130 or higher--the typical cut-off point for defining giftedness in many school districts throughout the nation--is now about seven times greater than it was for the generation now retiring from leadership positions in the country and often complaining about the poor performance of today's youth. Now that is something to contemplate" (p. 44).

This section of the paper briefly summarizes six kinds of information regarding student achievement and competency: (1) National Assessment of Educational Progress (NAEP), (2) SAT scores, (3) ACT scores, (4) Wisconsin's high school graduation rate, (5) the Wisconsin Student Assessment System, and (6) International Assessments. There also are a few comments about the Sandia Report.

1. National Assessment Results

Since 1969, the National Assessment of Educational Progress (NAEP) has tested national samples of students ages nine, thirteen, and seventeen. In general, the scores of students in reading and mathematics have been stable over the past two decades, whereas scores in science are down slightly.

Berliner and Biddle note that ". . . evidence from the NAEP also does not confirm the myth of a recent decline in American student achievement. Instead, it indicates a general pattern of stable achievement combined with modest growth in achievement among students from minority groups and from 'less advantaged' backgrounds" (pp. 25-26).

There is no evidence to suggest that students of twenty, thirty, or forty years ago were any more knowledgeable or skilled. The strengths and weaknesses of today's students are essentially the same as those of their parents and grandparents.

Scores of Wisconsin's students on National Assessment Tests have been very positive:

· In 1994, Wisconsin's fourth graders ranked third among the 44 states and other jurisdictions which participated in the National Assessment of Educational Progress assessment in reading. The average score for Wisconsin's fourth graders was 225. Scores ranged from Maine's high of 229 to Guam's low of 183. The national average was 213.

· In 1992, Wisconsin's fourth grade students tied for second, while eighth graders tied for fourth on the NAEP mathematics assessment.

· In 1992, Wisconsin's fourth graders ranked sixth on the NAEP reading assessment.

· In 1990, Wisconsin's eighth grade students ranked sixth in the first state-by-state comparisons of mathematics performance on the National Assessment of Educational Progress.

2. SAT Scores

The Scholastic Aptitude Test (now called the Scholastic Assessment Test) was originally normed in 1941 on a population of 10,654 white males who primarily attended private eastern universities. The test measures student knowledge in two areas, verbal and mathematical, and is designed to predict academic success in college. Scores on the SAT are not reported as the number or percent of correct answers (there are 138 questions), but as a scale score, ranging from 400 to 1,600.

Bracey (1995) points out the following: "Ever since the 1970's, when the College Board sponsored a study of the decline in SAT scores, the minuscule annual score changes have been front-page, prime-time news. For the last three years, scores have been edging upward, with the 1995 gains the largest in a decade. When the scores were in decline, the New York Times and the Washington Post positioned the results on page 1. In 1993 and 1994, the New York Times buried the news of the upticks in scores deep in Section A, while the Post relegated the outcome to the Metro section, which contains news of local interest. This year, the Post continued its policy of placing the SAT results in the Metro Section, while on the morning of the release the New York Times ignored the story altogether. The Washington Times did put the story on page 1, but implied the gains occurred because the new SAT is easier" (p. 153).

From approximately 1963 to 1975, aggregate SAT scores declined by 60 to 90 scale points. Many argued that this decline was proof of a serious and significant deterioration in America's schools. In reality, a decrease of 60 to 90 points on a 1,200-point scale represented a drop of approximately 5% in the number of questions answered correctly.
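The arithmetic behind this comparison can be sketched as follows. This is only a rough illustration: it assumes, as the paragraph above does, that a drop in scale points can be read linearly against the 1,200-point range between the lowest (400) and highest (1,600) combined scores. The function and constant names are ours, not the College Board's.

```python
# Sketch: express an SAT score decline as a share of the 1,200-point range.
# Assumption: scale points are compared linearly against the full range of
# combined scores (400 to 1,600); real score scaling is not strictly linear.

SCALE_MIN = 400
SCALE_MAX = 1600


def decline_as_share_of_range(points_dropped: float) -> float:
    """Return the decline as a fraction of the full scale range."""
    scale_range = SCALE_MAX - SCALE_MIN  # 1,200 points
    return points_dropped / scale_range


low = decline_as_share_of_range(60)
high = decline_as_share_of_range(90)
print(f"60-point decline: {low:.1%} of the range")
print(f"90-point decline: {high:.1%} of the range")
```

Read this way, the decline amounts to between 5% and 7.5% of the scale range, depending on which end of the 60-to-90-point estimate is used.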

Furthermore, measurement experts who have investigated the drop in SAT scores have concluded that the most important reason for the decline was that a greater number of students, especially those with weaker high school records, began to take the SAT. In short, beginning in the mid-1960's, takers of the SAT became a less elite population of high school students. In recent years, more than one million students have taken the SAT annually. Compare this figure with the 10,654 who originally took the SAT in 1941.

Critics also fail to acknowledge that in recent years SAT scores have increased. In 1995, for example, SAT scores had their largest increase in a decade. This growth was largely ignored by the popular media.

Berliner and Biddle make an additional point: "So although critics have trumpeted the 'alarming' news that aggregate national SAT scores fell during the late 1960's and the early 1970's, this decline indicates nothing about the performance of American schools. Rather, it signals that students from a broader range of backgrounds were then getting interested in college, which should have been cause for celebration, not alarm" (Berliner and Biddle, p. 21).

Some critics now charge that the recent improvements in SAT scores are due to the fact that the test is easier. Representatives of SAT, however, maintain that the test has essentially the same difficulty level as in previous years. In fact, current scores (and those for 1996-97 when a new scale will be used) will still be "anchored" to the original 1941 performance levels. Thus, if one feels compelled to compare the performance of today's students with the original norming population of nearly sixty years ago, he or she will be able to do so.

SAT Scores in Wisconsin

Wisconsin's students have consistently outscored students throughout the nation on the SAT over the past two decades. However, a minority of Wisconsin's graduating seniors take the SAT. In 1995, about 9% of 12th grade students (4,998) took the SAT. As these figures are considered, keep in mind the important conclusion by Powell and Steelman (1996). In their study of state SAT scores, Powell and Steelman report that more than 80% of the variation in state SAT averages is attributable to the participation rate. That is, the fewer students tested in a state, the higher SAT scores tend to be.

SAT scores: Wisconsin and the nation, 1975-1995

              Wisconsin                Nation
Year    Verbal   Math   Total    Verbal   Math   Total
1975     492     544    1036      434     472     906
1980     472     533    1005      424     466     890
1985     478     536    1014      431     475     906
1990     466     514     980      422     474     896
1995     501     572    1073      428     482     910
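The size of Wisconsin's lead in each year can be read directly from the totals in the table. The short sketch below simply subtracts the national total from the Wisconsin total; the variable names are ours, and the result should be read alongside the participation-rate caution from Powell and Steelman noted above.

```python
# SAT totals from the table above: year -> (Wisconsin total, national total)
sat_totals = {
    1975: (1036, 906),
    1980: (1005, 890),
    1985: (1014, 906),
    1990: (980, 896),
    1995: (1073, 910),
}

# Gap in scale points between Wisconsin and the nation for each year
gaps = {year: wisconsin - nation for year, (wisconsin, nation) in sat_totals.items()}

for year, gap in gaps.items():
    print(f"{year}: Wisconsin exceeds the national total by {gap} points")
```

The gap ranges from 84 points (1990) to 163 points (1995).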

3. ACT Scores

Wisconsin has placed first or tied for first on the ACT (American College Test) for the past eleven years. Overall, the ACT is the predominant college admissions test in 28 states, including Wisconsin. Scores are reported on a scale, ranging from 1 to 36. Approximately two-thirds (64%, or 37,194) of Wisconsin's graduating seniors took the ACT in 1995.

ACT scores: Wisconsin and the nation, 1986, 1990 and 1995

Year    Wisconsin   Nation
1986       492        434
1990       472        424
1995       478        431

4. Wisconsin's High School Graduation Rate

Wisconsin's dropout rate has declined over the past decade, with some year-to-year fluctuation. In 1985 the annual dropout rate was 3.65%; in 1995 it declined to its lowest level ever--2.63%. (Note: A dropout rate of 2.63% means that 2.63% of the state's students in grades 9-12 dropped out of school during the school year. This percent represents approximately 6,800 students.)

Percent of Wisconsin students who dropped out of school, 1985-1995

Year Percent
1985 3.65%
1986 3.49%
1987 3.24%
1988 3.30%
1989 3.11%
1990 3.13%
1991 3.26%
1992 3.00%
1993 3.15%
1994 2.93%
1995 2.63%

This means that at the current time about 87-88% of all 9th grade students graduate from high school "on time." Others graduate after their original class (a few return to school; others pass the GED).
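One way to see how an annual dropout rate relates to an on-time graduation rate is to compound it over the four years of high school. The sketch below is our own back-of-the-envelope illustration, not the method used to produce the 87-88% figure: it assumes a cohort faces the same annual rate in each of grades 9-12 and ignores transfers, grade retention, and students who return to school.

```python
def four_year_retention(annual_dropout_rate: float, years: int = 4) -> float:
    """Share of a cohort still enrolled after `years`, assuming a constant
    annual dropout rate (a simplifying assumption, not a reported method)."""
    return (1.0 - annual_dropout_rate) ** years


# Annual rates taken from the table above (1985, 1992, and 1995)
for rate in (0.0365, 0.0300, 0.0263):
    retained = four_year_retention(rate)
    print(f"annual rate {rate:.2%} -> about {retained:.1%} retained after four years")
```

Under these assumptions, annual rates between 2.63% and 3.00% imply that roughly 88% to 90% of a cohort remains after four years, which is in the neighborhood of the figure cited above.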

National graduation rates are considerably lower, as shown in the table below.

National graduation rates
for selected years

Year Percent who graduate
1929-30 29%
1939-40 50%
1949-50 59%
1959-60 70%
1969-70 77%
1979-80 71%
1989-90 72%
1994-95 73%

Note: We were not able to obtain graduation rates for Wisconsin for earlier years. In addition, graduation rates of forty and fifty years ago, however calculated, are somewhat suspect simply because the "compulsory attendance laws" of this period were not enforced and/or did not require school attendance beyond the eighth grade.

5. The Wisconsin Student Assessment System

Beginning with the 1993-94 school year, the DPI's Wisconsin Student Assessment System (WSAS) has tested eighth and tenth grade students in language, reading, mathematics, science, social studies, and writing. These tests are known as the Knowledge And Concepts Examinations.

The latest assessment of students (October, 1995) included 30 multiple-choice questions in each of the areas listed above; also included were two writing samples and a survey of students' career interests and educational plans.

Student performance over the three years of testing has improved steadily. For example, the average Grand Composite Scores (calculated by adding the scores of each specific subtest) are 162 for both eighth and tenth grade students during the 1995-96 school year. The eighth and tenth grade average Grand Composite Scores in 1993-94 were 155 and 154, respectively.

Average Grand Composite Scores,
1993-94 to 1995-96*

Year Eighth Tenth
1993-94 155 154
1994-95 159 158
1995-96 162 162

Student performance on the state assessment tests is not the same for all subpopulations. For example, females outperform males, while among the various ethnic groups, white students have the highest levels of performance.

Average Grand Composite Scores by ethnicity and gender, 1995-96

Group                Eighth   Tenth
All students          159      160
Native-American       134      135
Asian-American        146      152
African-American      119      122
Hispanic-American     134      138
White                 163      163
Mixed ethnic          153      155
Females               161      162
Males                 156      158

National Percentile Scores

National comparisons also are available for the Knowledge and Concepts Examinations. This makes it possible to compare the performance of students in Wisconsin with students throughout the country.

Except for writing, the 1995-96 national percentile scores for Wisconsin students were above the national averages on all of the Knowledge and Concepts Examinations. Performance in writing is mixed; tenth graders compare favorably with the national average, whereas eighth grade students score slightly below the national average.

Wisconsin's average percentile scores for eighth and tenth grade students, 1995-96

                      National Percentile Scores*
Subject               Eighth   Tenth
Reading                 59      66
Mathematics             72      73
Language                56      62
Science                 65      68
Social Studies          64      64
Battery Total           70      74
Imaginative Writing     49      62
Expressive Writing      45      56

*The national average is the 50th percentile.

6. International Assessments

Critics of American education often argue that the results of domestic assessments are no longer relevant because the United States is now part of a highly competitive, global economy. They call attention to the relatively poor performance of U.S. students on international assessments in mathematics and science. The same critics usually fail to mention that the performance of U.S. students in reading has been very favorable.

There are at least three problems associated with international assessments that need to be understood by anyone who uses or reports the results: (1) the selection of samples, (2) the practice of rank-ordering countries, and (3) the use of a single statistic to describe a country's quality of education.

The Selection of Samples

Rotberg (1990) alerts us to the fact that in many international assessments the performance of representative national samples of U.S. students has been compared with that of elite populations of students in other countries. She also points out that in some assessments of 12th grade students, countries which tested a greater percentage of their 12th grade students had the weakest overall performance. Conversely, countries which tested smaller percentages did the best. For example, on an eighth-grade mathematics assessment, Japan was top-ranked, whereas Hong Kong was in the middle. By 12th grade, however, Hong Kong was top-ranked, and Japan was second. This apparent "decline" in the performance of Japanese students was a consequence of the difference in the size of the populations from which the samples were drawn. A much smaller and more select group of students was tested in Hong Kong (only 3% of the students take mathematics in grade 12), compared with a much larger group of students in Japan.

There have been so many problems associated with testing senior high school level students that there have been no international assessments in mathematics and science at the secondary level since 1987 (Bracey, 1996, p. 5).

Other studies, comparing the performance of smaller groups of younger students, also have created headlines about the poor relative performance of U.S. students. In a 1996 article written for Educational Researcher, Bracey is especially critical of the research by Stevenson, Stigler, and others who have compared U.S. elementary students with students from China, Japan, and Taiwan. In general, these studies suggest that the best U.S. elementary school students would be only average students in Japan or China.

Bracey offers the following comments about these small scale studies:

"The various articles (studies) do not reveal how the schools were selected or how representative they are. It would be naive in the extreme to believe that a nation as closed, a nation as obsessed with its public image as The People's Republic of China . . . would give an American researcher free access to a random sample of schools" (p. 7).

Furthermore, "over 20% of the Chicago children did not speak English at home. The Chicago sample was thus not a representative sample of the United States, nor was it comparable to the Beijing sample on many important demographic variables. The Chicago sample is heavily weighted with variables associated with low achievement" (p. 8).

Rank-ordering of Countries

In addition to problems associated with sampling, the results themselves are frequently misunderstood. Whenever results are reported by the media, average scores for an entire country are reduced to a single statistic--a rank among all countries. Average scores for participating countries tend to be closely bunched, but when countries are ranked from top to bottom, the small differences in scores tend to become large differences in ranks. For example, if the scores of U.S. nine- and thirteen-year-olds on the 1992 Second International Assessment of Educational Progress had been only slightly different, their ranks would have varied considerably. "If U.S. 13-year-olds had scored 72% correct in science, instead of 67, they would have finished 5th rather than 13th. Similarly, if the third-ranked 9-year-olds had scored 60 instead of 65, they would have finished 12th. Most countries score close together such that small differences in scores make large differences in ranks" (Bracey, 1996, p. 6).
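The effect Bracey describes can be illustrated with a small sketch. The scores below are hypothetical: they were invented so that the example behaves like the case in the quotation and are not the actual results of the international assessment. The function name `rank_of` is likewise ours.

```python
# HYPOTHETICAL percent-correct scores for 14 other countries, invented for
# illustration only; these are NOT the actual assessment results.
HYPOTHETICAL_OTHER_SCORES = [78, 76, 74, 73, 71, 70, 70, 69, 69, 68, 68, 68, 64, 62]


def rank_of(score: float, other_scores: list[float]) -> int:
    """Rank (1 = best) of `score` among `other_scores`.

    A country's rank is one more than the number of countries that scored
    strictly higher, so tied countries share the better rank.
    """
    return 1 + sum(1 for other in other_scores if other > score)


for score in (67, 72):
    rank = rank_of(score, HYPOTHETICAL_OTHER_SCORES)
    print(f"score {score}% correct -> rank {rank} of {len(HYPOTHETICAL_OTHER_SCORES) + 1}")
```

Because most of the hypothetical scores fall within a few points of one another, a five-point change moves the country from 13th to 5th place, even though the underlying difference in performance is small.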

The Use of a Single Statistic

Use of a single score (a ranking) to summarize the entire U.S. system of education is simplistic and ignores the variation which exists among the fifty states, as well as the differences found among school systems within each state.

This is especially critical in a country such as the United States which is extremely diverse and has great variation in the quality of its public schools. For example, in the 1992 international assessment of mathematics, U.S. 13-year-olds ranked 13th among 15 nations. However, if other reporting categories are used, a far different picture emerges. In this instance, Asian-American students scored the highest on this assessment, while students from Iowa and North Dakota tied with Korea for third.

Asian students, U.S. schools (287)
Taiwan (285)
Korea, Iowa, North Dakota (283)
Advantaged urban students, U.S. (283)
White students, U.S. schools (277)
Hungary, Wisconsin (277)

In contrast, the lowest ranked categories were as follows:

Jordan (246)
Mississippi (246)
Hispanic students, U.S. schools (245)
Disadvantaged urban students, U.S. (239)
Black students, U.S. (236)
District of Columbia (234)

7. The Sandia Report

In February, 1990, at the request of the Bush Administration, the Strategic Studies Center at Sandia National Laboratories in New Mexico began a comprehensive review of the effectiveness of K-12 education in the United States. The request was apparently made in the belief that the Laboratories would find a system of failing K-12 schools, thus providing a rationale for a national school voucher system.

The researchers at Sandia gave a positive evaluation of U.S. public education in April, 1992: "Our most detailed analyses to date have focused on popular measures used to discuss the status of education in America. We looked at data over time to put performance of the current system in proper perspective. To our surprise, on nearly every measure we found steady or slightly improving trends" (Carson, Huelskamp, and Woodall, p. 259).

Conclusions

The general conclusions of the Sandia Report were as follows:

  • Educational data are generally incomplete and sometimes inaccurate.
  • The data that are available indicate serious problems in American education. However, they do not support popular headlines nor indicate system-wide failure. The educational system has never performed better.
  • The evidence of decline used to justify system-wide reform is based on misinterpretations or misrepresentations of the data.
  • Based on these conclusions, we believe that the national debate is not focused on the most pressing problems.

Reforms

The Sandia Report also identified the serious educational problems facing the nation, concluding that "these challenges do not call for a system-wide revolution." Among the suggested reforms were the following:

  • Improving the performance of disadvantaged students.
  • Meeting the educational and training needs of immigrants.
  • Upgrading the quality of educational data available to policymakers.
  • Improving the status of K-12 educators.

Barriers to Improvement

Finally, the Sandia Report identified the impediments to educational improvement:

  • The crisis rhetoric which hinders reform by claiming system-wide failure.
  • The misuse of simplistic measures with dubious value, e.g., the use of declining average SAT scores or unfavorable international comparisons.
  • The preoccupation with the link to economic competitiveness.
  • The excessive focus on projected shortfalls in technical expertise, which distracts attention from the basic reforms identified above.