David Catania. Photo from the DC Council website.

DC Council Education Committee Chair David Catania has alleged that testing officials inflated the percentage of students reported as “proficient” on standardized tests given earlier this year.

Officials say they were just trying to ensure this year’s scores could be compared with those from previous years. But according to multiple sources, the real story has to do with inappropriate questions from DC’s testing vendor.

In recent years DC schools have begun teaching more rigorous content aligned with the Common Core standards, and this past school year students took a revised version of the DC CAS designed to test that content. Because the test had changed, some OSSE officials were working with the District’s testing vendor, CTB McGraw Hill, to change the grading scale. The new scale would have used different minimum scores, or “cut scores,” to define levels like “proficient” and “advanced.”

In June of this year, responsibility for testing was transferred from the Director of Assessments to Director of Data Management Jeff Noel. Noel says he was surprised to learn that testing officials had expected to implement the new grading scale this year.

Using the new scale would have made it impossible to compare this year’s proficiency rates to the levels reported in previous years, a fact no one outside OSSE had been made aware of. Only six days after taking over responsibility for testing, Noel decided to switch back to the old grading scale, with the support of others at OSSE.

Catania alleged in a hearing last Thursday that OSSE chose to switch to the old grading scale at the last minute in order to ensure gains in both reading and math. If OSSE had used the new grading scale, with its new cut scores, math scores would have been 3.6 points lower and reading scores would have been 6.6 points higher.

These scores would not have been comparable to previous scores, since the grading scales were different. But observers might have missed that point. When other states, like New York, adopted new Common Core-aligned grading scales, they saw dramatic drops in scores. These states made it clear that comparisons to previous years would not be possible, but the decline in scores led to a public outcry nonetheless.

Test vendor’s question

At the hearing Catania accused OSSE officials of manipulating the grading scale to produce gains in both reading and math scores—gains that Mayor Vincent Gray declared “historic” when they were released in July.

But two people involved in the process told Greater Greater Education that Noel was also concerned about the test vendor’s approach to setting new cut scores. Those individuals said that at a June 17 meeting, a CTB executive asked OSSE officials: “What growth do you think makes sense for the state?”

In addition, CTB gave OSSE a form to guide the cut score process that allowed officials to explicitly indicate where the scores would be expected to end up. Choosing lower cut scores would have allowed them to report greater improvement in proficiency rates.

DC CAS Reflection Form provided by CTB to OSSE

The two individuals, who asked to remain anonymous, said that the decision to return to the old cut scores was partly motivated by concerns about CTB’s process and a desire to distance OSSE from it. CTB spokesperson Brian Belardi said CTB McGraw Hill has no comment.

Catania also made another allegation, with some justification: OSSE didn’t reveal that even under the grading scale it ended up using, this year’s scores are not as comparable to prior years’ scores as they have been in the past. In most years the content covered by the test is the same as in previous years, but between 2012 and 2013 some of the content changed. Emily Durso, interim State Superintendent of Education, said that OSSE’s failure to mention this qualification was simply an oversight.

While the true motives of OSSE officials in switching back to the old grading scale may be different from those Catania alleged, they are no less concerning.

Catania told the Washington Post that this controversy has him questioning whether so many high-stakes decisions should be made based on scores involving so much “subjectivity.” Many advocates have been saying as much for years.

Others have argued that the problem isn’t high-stakes testing generally, but rather that fundamental changes are necessary to restore faith in the testing system.

OSSE needs more independence

The first of these changes would be to give OSSE independence from the Mayor, similar to the autonomy conferred on DC’s Chief Financial Officer. In fact, Catania has proposed legislation providing that the Mayor could dismiss the State Superintendent only for cause, like the CFO.

Mayor Gray is the only head of a public school system who also hires and fires the state superintendent in charge of testing and auditing the schools. Some observers feel that the Gray administration must have pressured OSSE to switch to the old grading system, although there’s no hard evidence to support that conclusion.

Even if no such pressure was applied, the testing vendor’s apparent willingness to base scoring decisions on expected improvements in proficiency rates creates a temptation from which political officials must be insulated.

The CAS controversy also demands a fresh look at the testing measures we rely on. It’s precisely because proficiency metrics are so subjective that they are unreliable and open to political manipulation.

Measure growth, not “proficiency”

Instead of focusing on the “percent proficiency” metrics that are at the heart of this controversy, we need to use measures of growth—the ability of a classroom teacher to increase students’ educational attainment.

When test results are based on a proficiency cut score, they indicate the percentage of test-takers who scored at or above that minimum. The advantage of this approach is that it gives the public a sense of what an acceptable score is. But it tends to magnify small changes near the cut score and reveals little about changes in scores that are either well above or well below “proficiency.”

Tests that are scored for “growth,” on the other hand, use averages based on all participants’ scores and compare them to previous years’ averages. This method allows all changes throughout the test-taking group to be reflected in the final results. It’s also objective: the calculation doesn’t require making any year-to-year judgment calls about how to interpret the results.

Measures of growth would also highlight varying changes at different skill levels. Teachers would have an incentive to raise all scores, not simply to push students who are slightly below proficiency to being slightly above.
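To make the difference concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the ten scores, the two cut scores, and the function name `percent_proficient` are invented, not DC CAS data or OSSE's method, and the simple comparison of averages stands in for "growth" as described above rather than for any official growth metric.

```python
from statistics import mean


def percent_proficient(scores, cut_score):
    """Percentage of scores at or above the cut score."""
    return 100 * sum(s >= cut_score for s in scores) / len(scores)


# Hypothetical scale scores for the same ten students in two years.
# Only two students change, and each gains a single point (49 -> 50).
last_year = [30, 35, 40, 48, 49, 49, 52, 60, 70, 80]
this_year = [30, 35, 40, 48, 50, 50, 52, 60, 70, 80]

# Two hypothetical cut scores that officials might choose between.
for cut in (50, 45):
    before = percent_proficient(last_year, cut)
    after = percent_proficient(this_year, cut)
    print(f"cut score {cut}: {before:.0f}% -> {after:.0f}% proficient")

# The average uses every score and needs no cut score at all.
mean_change = mean(this_year) - mean(last_year)
print(f"change in average score: {mean_change:.1f} points")
```

With a cut score of 50, the proficiency rate in this made-up example jumps from 40 to 60 percent. With a cut score of 45, it stays flat at 70 percent. The average score rises by 0.2 points in both cases, because it does not depend on where anyone draws the line.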

OSSE actually does report a metric of growth, or value added, known as MGP. But, with the notable exception of the IMPACT system for assessing teachers, few evaluations are based on such scores. Principals, for example, are assessed on their ability to increase percent-proficient numbers, even though such numbers can be affected more by demographic changes or students transferring between schools than by instructional quality. And DCPS, OSSE, and the Public Charter School Board continue to present “percent proficiency” numbers most prominently in school profiles, leading parents to believe these are the best indicators of school quality.

Do you think the credibility of the testing regime in DC can be restored through these changes? Or do you think the problem is high-stakes testing itself, and that test scores should figure less prominently in school decisions?

Ken Archer is CTO of a software firm in Tysons Corner. He commutes to Tysons by bus from his home in Georgetown, where he lives with his wife and son. Ken completed a master’s degree in philosophy at The Catholic University of America.

Rahul Sinha was born in DC and grew up nearby in Bethesda. He now lives in Kalorama Triangle. He has a Master of Public Policy from the University of Maryland and moonlights as a macroeconomist.