Test score data have appropriate uses and limitations

Steve Glazerman recently criticized a report by the Economic Policy Institute (EPI), saying EPI’s report misused NAEP data in a practice he dubbed “misnaepery.” Elaine Weiss of EPI sent us this response.

Steve Glazerman is right that, for specific purposes, longitudinal data would be an ideal tool for assessing student growth in response to policy changes. That would be true, however, only if the data existed to conduct the study at hand, were well constructed, controlled for within-school policy choices that lead to bias, and employed a scale that reflects appropriate levels of student growth from one year to the next. As the vast majority of scholars will attest, such datasets are extremely rare; that is why most rely on cross-sectional data.

Moreover, scholars using longitudinal data would want to be sure of their validity — that they really measure what they claim to measure. In this case, the DC Comprehensive Assessment System (DC-CAS) is seriously deficient, due largely to the high stakes that have been attached to it through the IMPACT teacher evaluation system.

As John Merrow recently reminded us, at Noyes Elementary School, which was awarded Blue Ribbon status, test scores rose from 44% proficient in reading in 2007 to an astounding 84% proficient in 2009. The incoming principal, concerned that her students were very poor readers and could not have made such progress, instituted strict security measures for the 2010 test. Reading proficiency fell to 32%, far below the 2007 level. Given such problems, determining how DCPS policies and teachers have affected student learning will be very difficult.

With respect to the report from the Broader, Bolder Approach to Education that Glazerman criticizes, however, no longitudinal data are available to answer the important questions we sought to explore. Our report assesses how much benefit market-oriented reforms, like those instituted in Washington, DC, New York City, and Chicago, provide to students and schools, compared with the different policies pursued in comparable districts.

Reform leaders advance these policies using state test score data to prove how much students have gained. The only possible way to assess the validity of these claims, then, is to compare their reported successes with reliable data from the National Assessment of Educational Progress (NAEP). Indeed, this is what respected scholars, from government, academia, and the private sector, have long done; NAEP data are considered far superior to state test score data, for several reasons.

First, NAEP tends to assess more complex skills and more in-depth knowledge than most state tests. This makes it harder to “teach to the test”; NAEP scores are more likely to represent actual student knowledge and ability with respect to the full 4th or 8th grade curriculum, rather than the ability to take tests. (Indeed, test-prep strategies are widely used by teachers as a key means of raising DC-CAS scores.)

Second, NAEP is consistent over time. While the specific questions vary from year to year, the level of material a student must master to achieve a given score does not. This is why scholars rely on NAEP to gauge student progress over time, whether at the national, state, or urban district level, or among subgroups of students.

State test scores promise no such consistency. Indeed, states often simplify content and/or shift “proficiency” standards to suggest progress when none has been made. A study by the Consortium on Chicago School Research made this point explicitly. Comparing changes in NAEP scores to those on the Illinois Standards Achievement Test, it found that reported improvements were the result of test preparation and changes in test content, not actual learning:

Many of the findings in this report contradict trends that appear in publicly reported data [public reports show large math and reading gains where this study finds much smaller ones]. … The discrepancies are due to myriad issues with publicly reported data—including changes in test content and scoring—that make year-over-year comparisons nearly impossible without complex statistical analyses, such as those undertaken for this report. This leads to another key message in this report: The publicly reported statistics used to hold schools and districts accountable for making academic progress are not accurate measures of progress.

Finally, and perhaps most importantly, NAEP scores are far less subject to the “gaming” increasingly used to manipulate state test scores. Because state test scores have been attached to high stakes at both the individual teacher level and the school level, incentives to manipulate them are widespread, whether by narrowing the curriculum, teaching to the test, or, apparently, strategic suspensions and test-score erasures.

As such, we can derive little, and increasingly less, understanding from these scores, especially when taken in a longitudinal context: as the stakes have increased, we can expect that ways of boosting test scores have increased with them. Because NAEP is administered to a different, anonymous sample of students every two years, no stakes are attached to it, and there are no incentives for gaming.

Test scores reflect every aspect of a child’s life — home and community much more than school — but tell us only a sliver of what he or she knows or is capable of. They do not tell us anything about a student’s physical or emotional health, or about school environment or leadership.

That is why we go far beyond NAEP scores in our report. We assess the impact of market-oriented reforms on teacher recruitment and turnover and examine the systemic effects of increased charter school access and of school closures. To the degree possible, we report on changes in teacher, student, and parent morale, and on other aspects of daily reality that affect student well-being and achievement.

It is not clear what Mr. Glazerman would have had us do. There are no longitudinal data that we could have used. Presumably, then, he would rather let the false claims made by reformers stand, and let parents, students, and teachers be misled, than have us use NAEP for one of the purposes for which it was intended.