Playful Assessment

[Photo: the main room at the MIT Media Lab]

Returning to the general maker topic, when you bring that whole concept into school, how do you assess the work students do for a project? Because we know that anything done in the classroom must be assessed.

That’s one of the questions researchers from the Massachusetts Institute of Technology’s Playful Journey Lab1 wanted to answer.

Advocates of maker education have a lot of student success stories to share but not a lot of data. Measurable results could help convince cautious administrators and skeptical parents that kids should spend more time on open-ended, creative pursuits rather than reading more books or memorizing the formulas and facts that burnish grade-point averages and standardized test scores. Plus, evidence-based assessments could improve the overall quality of project-based learning by helping educators tailor projects to specific skills and vet a lesson’s overall effectiveness.

To address that lack of measurable results, researchers created what they call “playful assessment” tools and worked with a few teachers in two different schools to see how the tools held up in practice.

The term describes gamelike measures of knowledge and abilities, as well as ways of tracking skill development during playful learning activities. The approach was piloted over the past year by middle-school teachers at Corte Madera and at the Community Public Charter School in Charlottesville, also known as Community Middle. The goal is to blend mini-evaluations into learning activities, collecting evidence about student choices and behaviors throughout the process, rather than focusing on just the final result.

According to the writer of this article, the tools were largely successful at one school but not so much at the other. The reason was not a difference in students or teachers but a difference in the schools’ overall cultures.

MIT’s assessment tools were a great fit at Community Middle, which is an experimental lab school and already steeped in interdisciplinary, project-based learning. But most schools are more like Corte Madera — governed by schedules, academic standards, report cards and other ties to traditional measures of student achievement — and there, the pilot was a mix of triumph and struggle.

Plus lots of pushback from parents who believed teachers were abandoning instruction in the traditional areas of reading and writing.

However, nothing in this story is surprising. We hear educators and political leaders talk about transforming schools using the maker concept, along with its cousins STEM/STEAM, coding, PBL, and others, but few are willing to make the necessary changes to the traditional structure.

Maker in most schools is usually done in a “space” – outside both the classroom and “regular” work. Students work on maker projects during lunch, participate in pull-out programs, are given the time as a reward for completing their academic tasks early, or drop in after school hours.

All of those “open ended, creative pursuits” are not included in the standard curriculum, are not officially assessed (playfully or otherwise), and are not an integral part of the school culture. Reading more books, memorizing formulas, and passing tests still dominate students’ time during the school day.

But I’m just not sure our society really wants an educational system built around playful assessments: one where classrooms look very different from the ones we’re used to, where students have some autonomy to work on projects of their own choosing, and where learning cannot be described using those “traditional measures of student achievement”.

The kids may be ready, but most adults, including their teachers and parents, are not.


The picture shows the main room at the MIT Media Lab when I visited about ten years ago. For me, that’s what a classroom should look like: lots of open space with flexible work areas and plenty of toys. Especially for high schools, most of which still expect students to sit still and listen for anywhere from 45 to 90 minutes at a stretch.

1. Let’s face it, MIT has the coolest names in all of academia. By far!

Statistical Crap

Speaking of data (as in the previous post), the members of the American Statistical Association (ASA) probably know something about that topic. And they recently released a statement about the value-added model (VAM) of teacher evaluation, a very popular reform among those top-level education experts we all love and respect.1

In theory, VAM measures how much value a teacher has added to their students’ learning, using standardized test scores and some complex mathematics intended to exclude other “non-relevant” factors from the final numbers. Many districts and states (including Virginia, to a small degree) are using or planning to use some variation of VAM as a teacher evaluation tool and to determine continued employment, pay raises, tenure, even whether to close schools.
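For readers curious what that “complex mathematics” looks like, here is a minimal sketch of one common flavor of value-added model: regress this year’s scores on last year’s scores plus an indicator for each teacher, then read the teacher coefficients as the “value added.” Everything here is simulated for illustration; the effect sizes, sample sizes, and model form are my assumptions, not any district’s actual formula.

```python
# A minimal, illustrative value-added model (VAM) sketch, NOT any
# district's actual formula. All data is simulated.
import numpy as np

rng = np.random.default_rng(0)

n_teachers, students_per = 20, 25
n = n_teachers * students_per
teacher = np.repeat(np.arange(n_teachers), students_per)

prior = rng.normal(500, 50, n)               # last year's scores
true_effect = rng.normal(0, 5, n_teachers)   # hidden "true" teacher effects
noise = rng.normal(0, 30, n)                 # everything the model ignores
current = 120 + 0.8 * prior + true_effect[teacher] + noise

# Design matrix: intercept, prior score, and a dummy per teacher
# (teacher 0 serves as the reference category).
X = np.column_stack([
    np.ones(n),
    prior,
    (teacher[:, None] == np.arange(1, n_teachers)).astype(float),
])
coef, *_ = np.linalg.lstsq(X, current, rcond=None)

# Estimated teacher effects, relative to teacher 0.
estimated = np.concatenate([[0.0], coef[2:]])
true_relative = true_effect - true_effect[0]

# With only 25 students per teacher, the estimates are noisy;
# that imprecision is exactly the ASA's point below.
print("correlation, estimated vs. true effects:",
      round(np.corrcoef(estimated, true_relative)[0, 1], 2))
```

Even in this best case, where the model exactly matches how the simulated data was generated, small classes leave the estimates noticeably noisy. Real VAMs face the harder problem that the model almost never matches reality.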

The ASA statement is, as you might expect, very academic in its assessment of VAM, but the authors are still quite clear in their conclusion that this system is… how shall we put this? – statistical crap (my very non-academic interpretation of their seven-page report).

A few very relevant statements from the executive summary:

VAMs are complex statistical models, and high-level statistical expertise is needed to develop the models and interpret their results.

Expertise that is quite lacking in most schools, not to mention in pronouncements from supporters of this concept.2

Estimates from VAMs should always be accompanied by measures of precision and a discussion of the assumptions and possible limitations of the model. These limitations are particularly relevant if VAMs are used for high-stakes purposes.

Too many advocates of VAM treat the numbers as fact, not “estimates”, and are not open to any “possible limitations”.

And my favorites,

VAMs typically measure correlation, not causation: Effects — positive or negative — attributed to a teacher may actually be caused by other factors that are not captured in the model.

Under some conditions, VAM scores and rankings can change substantially when a different model or test is used, and a thorough analysis should be undertaken to evaluate the sensitivity of estimates to different models.

Most VAM studies find that teachers account for about 1% to 14% of the variability in test scores, and that the majority of opportunities for quality improvement are found in the system-level conditions. [my emphasis]

In other words, teacher quality, while important, is only one factor in the very complex process of student learning, a process we measure with a far-less-than-perfect instrument: standardized test scores.
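To make that concrete, here is a toy simulation (with made-up effect sizes, not data from any real study) showing how genuine teacher effects can still account for only a small slice of score variance once student-level variation enters the picture:

```python
# Toy variance decomposition with assumed (made-up) effect sizes.
import numpy as np

rng = np.random.default_rng(1)

n_teachers, students_per = 50, 30
teacher_sd, student_sd = 5.0, 25.0   # assumed standard deviations

# Each teacher's effect applies to all of their students; everything
# else that moves a score is lumped into student-level noise.
teacher_part = rng.normal(0, teacher_sd, n_teachers).repeat(students_per)
scores = teacher_part + rng.normal(0, student_sd, teacher_part.size)

share = teacher_part.var() / scores.var()
print(f"share of score variance attributable to teachers: {share:.1%}")
# Analytically: teacher_sd**2 / (teacher_sd**2 + student_sd**2)
# = 25 / 650, or about 4%, inside the ASA's reported 1% to 14% range.
```

The exact share depends entirely on the standard deviations you assume, but the structure of the arithmetic is why even real, measurable teacher quality explains only a modest fraction of test-score variability.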

That “majority of opportunities for quality improvement” will only come from making systemic changes to educational policies at the district, state, and national levels.

By All Means, Argue With It!

Tomorrow here in the overly-large school district, we will be electing a new school board, and since more than half of the incumbents are not running again, it really will be new. Maybe.

One block of candidates is basically running on a platform that begins with an assumption that the system is doing a good job, only requiring a few tweaks, with one even declaring in an interview that “you can’t argue with success”.

However, what if that “success” is based on faulty or outdated measures?

Of course, one of the primary evaluations for our schools (and pretty much every other school in this country) is the collection of scores on the variety of tests students take every year, from the state SOLs to AP/IB to whatever. But are those many tests really valid assessments of student learning, especially of the skills students will need in their lives after our schools? It’s a question that needs to be addressed more often, here and elsewhere.

Our district also likes to boast that something like 95% of our graduates go on to “post secondary” programs. But how well prepared are they to succeed in those programs? While that 95% number is found in many places on the district’s website and in other publications (including places like the Chamber of Commerce and real estate brochures), any follow-up information on alumni is sparse to nonexistent. I wonder if anyone even tries to collect it.

And then most high schools also like to trumpet their numbers on meaningless lists like the Washington Post’s “challenge” index, one of the most superficial measures of high school quality ever invented. Oh, but it does make for good headlines.

So, not only is it possible to argue with our district’s past successes; more of the people running the show, as well as those who want to, should be challenging many aspects of what we do as a school system.

Instead of spending lots of valuable time tossing around all the trivial, cliched crap that usually passes for serious discussion of education issues these days.

Failing to Make The Connection

I’m not sure there is one, but Valerie Strauss at the Washington Post’s Answer Sheet blog is seeking a connection between mobile technology and school reform.

It’s a tenuous link at best, but no more far-fetched than the one at the foundation of national education policy, which connects college attendance for the vast majority of students to the country’s economic health.

William J. Mathis, managing director of the nonprofit National Education Policy Center at the University of Colorado at Boulder’s School of Education, wrote recently on this blog that 70 percent of U.S. jobs require only on-the-job training, 10 percent require technical training, and 20 percent require a college education.

He wrote further that while the Obama administration insists that future jobs will require much higher and universal skills, the Washington-based Brookings Institution says that the country’s job structure profile is likely not to change much in the near future, and the proportion of middle skill jobs (plumbers, electricians, health care, police officers, etc.) will remain robust.

Then there’s the larger misconception that education quality (in the form of international standardized test scores) is directly connected to America’s economic success.

America’s reclaimed dominance in mobile technology — and its ability to economically compete — don’t have much to do with international tests, or, for that matter, school reform that is obsessed with measuring schools, students and teachers on standardized tests that weren’t designed for such assessment.

It’s time that our leaders stop saying otherwise.

Unfortunately, it doesn’t seem as if that’s going to happen any time soon.

 

Aiming For a Higher Level

In his Monday morning Post education column, Jay Mathews relates the story of a disagreement between a teacher and his principal over the issue of student cheating.

The teacher, an AP US History instructor in DC, explained during his evaluation conference the steps he took to discourage copying during tests, which included creating multiple versions of the exam and printing the pages in a smaller font.

His principal was not especially impressed.

“You are creating an expectation that students will cheat,” Martel [the teacher] recalls Cahall [the principal] saying. “By creating that expectation, they will rise to your expectation.”

When I asked Cahall about it, he did not deny that he said it. His intention, he said, was not to prohibit Martel’s methods but to urge him to consider another perspective.

“I am not opposed to multiple versions of a test or quiz; it is standard operating procedure for every type of testing program,” the principal said in an e-mail to me. “Instead, I would prefer that teachers use more rigorous assessments when possible, that require written responses and higher levels of thinking. In addition to being more challenging and requiring a sophisticated skill set, these types of assessments are also more difficult for students to copy.”

Mathews sides with the teacher in the dispute since “questioning a teacher’s approach to cheating may be going too far”.

Especially when dealing with an AP classroom, since, of course, that program is the golden salvation of high school education.

However, in this case the principal makes the better point.

We should be asking more of students than just copying back material they’ve been given or making rudimentary connections between the facts, stuff that’s easy to rip off without detection since it doesn’t ask for any value-add from the individual.

In the larger context, we should consider that if a test, or any other assignment, is easy to cheat on, it’s likely a poor or invalid assessment of student learning.