Tuesday, October 6, 2015

A response regarding an assay

I responded to a colleague's email this morning and thought it worthy of a post:

Hi Ted--

Our accountability tests are necessarily uni-dimensional. That is, in order to generate consistent results over time we pick a single "construct," find the "average" within the target population on that construct, and then measure out from that point to determine how far above or below average a student is (of course we don't use those terms, replacing them with numbers that sound less judgmental). The construct has to be uni-dimensional because "average" is the basis of consistent scores; otherwise the scores won't be stable over time. Thus a standardized test could be viewed as a sort of assay, since it attempts to detect the degree to which a "thing" varies within a population.

However, a standardized test--contrary to much popular belief--is only designed to measure the distance from average for a tested subject, which means it never has provided, and never can provide, an indicator of the actual learning or achievement that took place. It is not a graduated cylinder that measures volume, but rather a narrow measuring device that estimates relative distances from a stable central point. Those distances identify differences within a population without ever identifying what is actually in those distances (if only I could help policy makers understand this).
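The distance-from-average idea can be made concrete with a small sketch. This is an illustration only, not any state's actual scoring model: the function, the center of 500, and the spread of 100 are all invented for the example. Note that the scaled score is computed purely from each raw score's distance from the group mean; nothing about what a student knows, or why, ever enters the calculation.

```python
# A minimal sketch of distance-from-average scoring.
# The center/spread values are illustrative, not any real test's scale.
from statistics import mean, stdev

def scaled_scores(raw_scores, center=500, spread=100):
    """Map raw scores to scaled scores: distance from average, rescaled.

    The result encodes only how far each score sits from the population
    mean -- it says nothing about what produced the raw score.
    """
    mu = mean(raw_scores)
    sigma = stdev(raw_scores)
    return [round(center + spread * (x - mu) / sigma) for x in raw_scores]

population = [40, 50, 60, 70, 80]
print(scaled_scores(population))
```

A student sitting exactly at the group average always lands exactly at the center of the scale, no matter how much or how little anyone in the group actually learned, which is the point of the post: the instrument reports position, not substance.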

As for the notion of a single target entity, such as a standardized test, being the point of an assay, the idea is spot on.

As to "analysis," you point out its most relevant meaning and how far gone it is in today's schools. This occurs, exactly as you say, when schools fail to realize that they aren't paying attention to all the elements of a student's learning in both quantity and quality. In fact, when it comes to both quantity and quality our accountability tests are absolutely silent, by design, which then leads to the massive inefficiencies that plague our systems today when we treat tests as saying something light years beyond their design. (It is nearly as bad as if we'd asked a thermometer to tell time by dumping out the mercury and trying to read it like tea leaves.)

The assay effect (a fine essay title) is greatly exacerbated by uni-dimensional tests that schools and policy makers pretend have something to do with what was actually learned. This misunderstanding hurts the opportunity for actual analysis by replacing it with an assay and then pretending analysis is actually occurring. That leaves schools thinking that they have an analytical environment that can support learning, when very little of that sort of thing actually exists given the lack of understanding regarding the choice of instrumentation.

Wednesday, August 19, 2015

Gene Glass Resigns

A few days ago Gene Glass, one of the greats in the field of measurement, announced that he would no longer offer his intellect or his talents to the measurement community. It's worth reading why.

Thursday, February 12, 2015

The prevailing wind: the only thing worse than having standardized tests is not having them

I ran into an Ed Week editorial this morning (click here) which again exemplifies how few people actually know what a standardized test is, what it was designed to do, and as a result what the limits are regarding possible interpretations. The editorial starts this way: "What's worse than annual standardized testing? Not having it at all."

In response I posted the following:

What would truly be helpful is for folks to understand what a standardized test is, what it was designed to do, and what the limits are to the interpretive range of a test score. That understanding would quash most of the argument here.

A standardized test is designed to offer a rank ordering of students that can then be used for comparative purposes. In order to provide a consistent rank ordering the design of the test requires that we sacrifice any ability to answer questions regarding why a student lands at a particular score.

Thus a standardized test score, by design, is incapable of offering any information as to what caused it to come to be. Good teaching? A high or low SES? A rock solid curriculum? Quality of any kind? Or a lack thereof? The test scores from any of the standardized tests offered by any of the fifty states and used for accountability are designed to shrug when such questions are asked, point to their offer of a comparative lens, and be forever silent on everything else.
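The "shrug" can be shown in a few lines of code. This is an invented illustration, not any state's reporting method: a rank ordering is a pure function of the scores, so two cohorts whose scores happen to match are assigned identical ranks no matter what caused those scores, which is exactly why the score cannot answer "why."

```python
# Illustrative sketch: percentile rank is computed from scores alone.
# The cohorts and their back-stories here are entirely invented.
def percentile_ranks(scores):
    """Percent of scores at or below each score -- a comparative lens only."""
    n = len(scores)
    return [sum(s <= x for s in scores) / n * 100 for x in scores]

cohort_a = [51, 60, 60, 72, 85]   # imagine: strong curriculum, low SES
cohort_b = [51, 60, 60, 72, 85]   # imagine: weak curriculum, high SES

# Identical scores yield identical rankings; cause never enters in.
assert percentile_ranks(cohort_a) == percentile_ranks(cohort_b)
print(percentile_ranks(cohort_a))
```

Whatever produced the numbers (teaching, curriculum, SES, luck) is invisible to the ranking function, because it was never an input.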

And lest we split hairs here, every state utilizes the same test-building methodologies used in standardized tests prior to state standards, regardless of whether it names the result a criterion-referenced exam, a standards-based exam, or whatever. Every state offers a technical manual on its instruments, and the statistical underpinnings by which those instruments are judged rest on the methodology used to produce a rank ordering, which by design renders them useless for any purpose beyond presenting a comparative lens.

Pretending that such tests can transmogrify into something that was never a part of their design is just plain dangerous. For as long as I can remember, hardly anyone has cared about that fact; instead, people think these things can say anything and everything we might ever want them to say. In the meantime, large numbers of policy makers, and even a good number of educators, remain blind as to how we might actually make things better, thinking they have answers when in fact they don't.

Friday, January 30, 2015

In response to a reporter about stagnant test scores in Texas...

The big questions to me are at the systems level. We intended to build a system that would foment excellence and now everyone is scratching their heads and wondering why we don't seem to be getting there. Implicit in that is the notion that our tests and our standards are themselves systems capable of fomenting excellence and so the problem must be elsewhere. Perhaps instruction? Teacher quality? etc. I noticed that the commissioner blames teaching in one of your articles, but I don't buy that.

What is missing is an honest conversation about what the tools we call tests and standards were actually designed to do. Were they designed to foment excellence or were they designed for some other purpose? And if they were designed for some other purpose and we adopt them and don't see the excellence we want, wouldn't it behoove us to lift the hood and see if somehow we have a mismatch?

What you've identified in your work is evidence of the mismatch. Our standards consist of hundreds of behavioral statements whose design is to control and constrain, and our tests are designed to give us an empirical reflection of the status quo so we can see where we fit. Slavish adherence to those two things risks constraining the system of education and repeating the status quo from one year to the next, rather than moving ahead of it. What your analysis suggests is that the system as it is constituted is performing perfectly against its design. That's why the gaps continue and why so little seems to change from one year to the next.

I don't want to stick my neck out too far on suggesting why the whole system seems so stagnant compared to previous iterations of the standards and exams, but one major policy change is likely contributing: the inclusion of test scores in teacher evaluations. Teachers who are told that the test is the basis for their evaluation are much more likely to limit their teaching to the test, making it much more likely that they will merely repeat the status quo from one year to the next. I'd be interested to see the correlation between tying test scores to teacher evaluations and score stagnation, to see if that hypothesis holds, but I'll bet it does.

Bottom line is that if excellence is the goal we need a profoundly different system than the one in which we work. My work shows that excellence was never even included in the present system and yet our policies insist that teachers adhere slavishly to the present system. The contradictions in that are huge: teachers are told to foment excellence and adhere to the system, but you can't do both.

Here's where that leaves us: the more we hold teachers' feet to the fire for test scores, the more likely we are to see the stagnation continue, and the more likely we are to blame teachers for the problem. The hope I see in that is this: we gave teachers a lousy definition of success, insisted at the expense of their livelihoods that they get there, and to a large degree they've done it. What teachers are hungry for is a proper definition of success, one worthy of actually standing as a goal. I truly believe teachers are capable of meeting whatever definition of success we place in front of them, and now it's up to us to offer them a new and improved definition, one actually in line with the goals of education.

If building codes were built by the same people who make educational policy...

I find it unremarkable, simply a given, that when establishing policy in virtually every professional field (medicine, engineering, construction, and so on), policy makers and regulators turn to experts who understand the field. This helps ensure that the technical components are properly addressed, and in turn that the policies will have the desired effects.

I wish policy makers would offer the same courtesy to the field of education.

The lone qualification for someone to make education policy in this country is the completion of a high school education, as if having been in a school imparts expertise. I walk through buildings every day and yet that in no way qualifies me to build them. I go to my doctor when I'm sick but that doesn't qualify me to practice medicine. I use a computer but that doesn't qualify me to write code.

But education seems to be different, treated as something any idiot can understand, and so no expertise is required to write the laws that will govern one of the most critical components of our society.

The analogy I want to make is this: if our building codes were given the same professional courtesy as our education policy, we'd count ourselves lucky every time a building didn't fall down. That is worth thinking about.