Statistical Crap

Speaking of data (as in the previous post), the members of the American Statistical Association (ASA) probably know something about that topic. And they recently released a statement about the Value-Added model (VAM) of teacher evaluation, a very popular reform among those top-level education experts we all love and respect.1

In theory, VAM is supposed to measure how much value a teacher has added to the learning of their students, using standardized test scores and complex mathematics intended to exclude other “non-relevant” factors from the final numbers. Many districts and states (including Virginia, to a small degree) are using or planning to use some variation of VAM as a teacher evaluation tool and to determine continued employment, pay raises, tenure, even whether to close schools.

The ASA statement is, as you might expect, very academic in its assessment of VAM, but they are still quite clear in their conclusion that this system is… how shall we put this? – statistical crap (my very non-academic interpretation of their seven-page report).

A few very relevant statements from the executive summary:

VAMs are complex statistical models, and high-level statistical expertise is needed to develop the models and interpret their results.

Expertise that is quite lacking in most schools, not to mention in pronouncements from supporters of this concept.2

Estimates from VAMs should always be accompanied by measures of precision and a discussion of the assumptions and possible limitations of the model. These limitations are particularly relevant if VAMs are used for high-stakes purposes.

Too many advocates of VAM consider the numbers as fact, not “estimates”, and are not open to any “possible limitations”.

And my favorites,

VAMs typically measure correlation, not causation: Effects — positive or negative — attributed to a teacher may actually be caused by other factors that are not captured in the model.

Under some conditions, VAM scores and rankings can change substantially when a different model or test is used, and a thorough analysis should be undertaken to evaluate the sensitivity of estimates to different models.

Most VAM studies find that teachers account for about 1% to 14% of the variability in test scores, and that the majority of opportunities for quality improvement are found in the system-level conditions. [my emphasis]

In other words, teacher quality, while important, is only one factor in the very complex process of student learning, a process we assess with the far-less-than-perfect instrument of standardized test scores.

That “majority of opportunities for quality improvement” will only come from making systemic changes to educational policies at the district, state, and national levels.

By All Means, Argue With It!

Tomorrow here in the overly-large school district, we will be electing a new school board, and since more than half of the incumbents are not running again, it really will be new. Maybe.

One block of candidates is basically running on a platform that begins with the assumption that the system is doing a good job and only requires a few tweaks, with one candidate even declaring in an interview that “you can’t argue with success”.

However, what if that “success” is based on faulty or outdated measures?

Of course, one of the primary evaluations for our schools (and pretty much every other school in this country) is scores on the variety of tests students take every year, from the state SOLs to AP/IB to whatever. But are those many tests really valid assessments of student learning, especially of the skills they will need in their life after our schools? It’s a question that needs to be addressed more often, here and elsewhere.

Our district also likes to boast that something like 95% of our graduates go on to “post secondary” programs. But how well prepared are they to succeed in those programs? While that 95% number is found in many places on the website and other publications (including places like the Chamber of Commerce and real estate brochures), any follow-up information on alumni is sparse to nonexistent. I wonder if anyone even tries to collect it.

And then most high schools also like to trumpet their numbers on meaningless lists like the Washington Post’s “challenge” index, one of the most superficial measures of high school quality ever invented. Oh, but it does make for good headlines.

So, not only is it possible to argue with our district’s past successes, more people running the show, as well as those who want to, should be challenging many aspects of what we do as a school system.

Instead of spending lots of valuable time tossing around all the trivial, cliched crap that usually passes for serious discussion of education issues these days.

Failing to Make The Connection

I’m not sure there is one, but Valerie Strauss, at the Washington Post’s Answer Sheet blog, is seeking a connection between mobile technology and school reform.

It’s a tenuous link at best, but no more far-fetched than the one at the foundation of national education policy connecting college attendance for the vast majority of students to the country’s economic health.

William J. Mathis, managing director of the nonprofit National Education Policy Center at the University of Colorado at Boulder’s School of Education, wrote recently on this blog that 70 percent of U.S. jobs require only on-the-job training, 10 percent require technical training, and 20 percent require a college education.

He wrote further that while the Obama administration insists that future jobs will require much higher and universal skills, the Washington-based Brookings Institution says that the country’s job structure profile is likely not to change much in the near future, and the proportion of middle skill jobs (plumbers, electricians, health care, police officers, etc.) will remain robust.

Then there’s the larger misconception that education quality (in the form of international standardized test scores) is directly connected to America’s economic success.

America’s reclaimed dominance in mobile technology — and its ability to economically compete — don’t have much to do with international tests, or, for that matter, school reform that is obsessed with measuring schools, students and teachers on standardized tests that weren’t designed for such assessment.

It’s time that our leaders stop saying otherwise.

Unfortunately, it doesn’t seem as if that’s going to happen any time soon.


Aiming For a Higher Level

In his Monday morning Post education column, Jay Mathews relates the story of a disagreement between a teacher and his principal over the issue of student cheating.

During his evaluation conference, the teacher, an AP US History instructor in DC, explained the steps he took to discourage copying during tests, which included creating multiple versions of the exam and printing the pages in a smaller font.

His principal was not especially impressed.

“You are creating an expectation that students will cheat,” Martel [the teacher] recalls Cahall [the principal] saying. “By creating that expectation, they will rise to your expectation.”

When I asked Cahall about it, he did not deny that he said it. His intention, he said, was not to prohibit Martel’s methods but to urge him to consider another perspective.

“I am not opposed to multiple versions of a test or quiz; it is standard operating procedure for every type of testing program,” the principal said in an e-mail to me. “Instead, I would prefer that teachers use more rigorous assessments when possible, that require written responses and higher levels of thinking. In addition to being more challenging and requiring a sophisticated skill set, these types of assessments are also more difficult for students to copy.”

Mathews sides with the teacher in the dispute since “questioning a teacher’s approach to cheating may be going too far”.

Especially when dealing with an AP classroom, since, of course, that program is the golden salvation of high school education.

However, in this case the principal makes the better point.

We should be asking more of students than just copying back material they’ve been given or making rudimentary connections between the facts, stuff that’s easy to rip off without detection since it doesn’t ask for any value-add from the individual.

In the larger context, we should consider that if a test, or any other assignment, is easy to cheat on, it’s likely a poor or invalid assessment of student learning.

For Better Test Scores, Use Better Tests

I guess the problem with international assessments, the ones that show US kids doing poorly compared to their peers in other countries, is that we’re using the wrong ones.

At least according to the expert Jay Mathews interviewed.

He says the PISA (Programme for International Student Assessment) is a bad one because it doesn’t fit “the way U.S. students are taught” and aligns to the “losing” side in the debate over how to teach math.

Which is to say that the designers of the PISA expect that schools are using curriculums that “make math instruction more relevant to the real world, and emphasize mathematical reasoning more than calculation”.

How dreadful to expect that kids should actually be able to understand and apply math concepts!

PISA includes questions like this one:

For a rock concert a rectangular field of size 100 m by 50 m was reserved for the audience. The concert was completely sold out and the field was full with all the fans standing. Which one of the following is likely to be the best estimate of the total number of people attending the concert?

A. 2000
B. 5000
C. 20000
D. 50000
E. 100000
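For what it’s worth, the arithmetic the question is fishing for is simple area-times-density estimation. A minimal sketch, assuming a packed standing crowd runs roughly four people per square meter (a commonly cited rule of thumb, not a figure from the test itself):

```python
# Estimating attendance at the PISA rock concert item.
# Assumption: ~4 people per square meter for a tightly packed
# standing crowd (a rough rule of thumb, not from the test).

field_length_m = 100
field_width_m = 50
people_per_sq_m = 4  # assumed crowd density

area_sq_m = field_length_m * field_width_m   # 5000 square meters
estimate = area_sq_m * people_per_sq_m       # 20000 people

# Pick the answer choice closest to the estimate.
choices = [2000, 5000, 20000, 50000, 100000]
best = min(choices, key=lambda c: abs(c - estimate))
print(best)  # 20000, i.e. choice C
```

Even halving or doubling the assumed density still lands closest to the same choice, which is presumably the point of an estimation item: the reasoning matters more than the precise number.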

Mathews thinks this is a bad question because it involves too many variables, such as the fact that “some people don’t like to get close” at concerts.

Certainly it’s a lousy item when students are expected to locate the one and only “right” answer, according to the test writers who have been programmed to create just the right kinds of distracters (been there, done that :-).

However, it’s an excellent question when you want them to consider all those different, and sometimes messy, factors that clutter up problems here in the real world.

It’s even better if we also expect students to justify their answer, explain the logic they used to arrive at it, and make that explanation part of their assessment.

So, is the problem that we’re giving kids the wrong test?

Or that we’re not teaching to the right assessment?