Thursday, December 26, 2013

A proper analogy

No pilot is allowed to fly a plane unless they can demonstrate a thorough knowledge of each instrument and the information it provides. In education, however, we find it perfectly acceptable to allow someone who has no understanding of what a test instrument is, how it was made, or what it was designed to tell us to use that instrument however they see fit.

Such use is akin to a pilot flying a plane without regard to altitude, the amount of fuel left in the tanks, or whether the wheels were lowered for landing. No amount of good can come from it. Strange that we now run schools under a similar scenario and then accuse those willing to work in that environment of being responsible for whatever failures occur.

Friday, November 1, 2013

Raising expectations

In a recent Ed Week editorial Marc Tucker at the National Center on Education and the Economy argues that the Common Core standards offer a way out of a twenty-year trend of declining expectations for students. He points to the notions of grade inflation and fewer hours of study per college course as evidence for these declining expectations, noting that students now receive credit for coursework that would have been deemed sub-standard twenty years ago.

I find those points entirely valid.

But in order for the Common Core—or any other set of statements regarding what students should know and be able to do as a result of schooling—to contribute to a solution we first need to examine the motivational system in which such a thing operates. By “motivational system” I mean the processes by which we attempt to induce behavior in teachers and students such that it moves us towards a better state.

The motivational system in schools is pretty straightforward: students take standardized tests designed to show where a school or student stands relative to others. Schools and students that rank towards the top are deemed to be doing well, while those that rank towards the bottom are doomed to be doing poorly. Schools at the bottom adjust instruction in the direction of the tested content. Repeat.

The selection of such an instrument as the yardstick for judging education risks rendering moot Tucker’s assertion that the Common Core stands a chance of fixing what he sees as real problems. If the Common Core standards do offer a remedy regarding our expectations for students, the selection of a standardized test as the method for determining compliance renders the new expectations unnecessary for success.

The real risk is this: if Tucker is right about the Common Core standards, and people fail to understand the limits of what even the best standardized test can tell us, we risk success on the test being misinterpreted as a positive step regarding our rising expectations. We risk looking back in five years and again wondering how we might actually, finally, raise expectations for our students.

Friday, October 18, 2013

What if you built an educational system and it didn’t work as planned?

That question is one that we absolutely must ask ourselves in 2013. Policy makers adopted an educational formula that imposes behavioral statements as educational standards, standardized tests as the basis for all quality determinations, and accountability to those tests as if they capture the bulk of what students should have learned over the year.

The system cannot be said to be working effectively by anyone examining the overwhelming evidence to the contrary. Our standing internationally continues to head in the wrong direction, curriculum winds up limited by the tested material, and the goal of a K-12 education to produce students ready to face the worlds of college and work seems ever further away.

The response, however, has been that the system of standards, standardized tests, and accountability is itself just fine—in spite of the limited evidence that the system is moving us closer to the overall goals—and that we just need to do two things: fine-tune a few of the parts, and really hold teachers' feet to the fire.

There is, however, an additional set of responses. Perhaps the formula itself is flawed, or one or more of the pieces within the formula was never designed to play its assigned role, or there exists a better way. Almost no attention is paid to these additional responses.

Why is that? It could be because education is very difficult for outsiders to understand. It could be that the idea of reducing education to a single number or metric is appealing in the face of not really being able to understand. It may be that the political capital to put standards, standardized tests, and accountability programs in place came at such a cost that no one feels up to actually making another set of changes.

I think it is far simpler than that.

I think it has to do with fear. Somehow, we seem to think, anything different from the current path brings with it the possibility that we might just mess things up. The old medical admonition to “first, do no harm” seems to have found its way into educational circles, creating a paralysis around change of any kind beyond what we have at present.

But what if the present system already violates the rule? What if the intent was certainly commendable but the result is now anything but? What if in hindsight we had applied that “first do no harm” rule? Might we have made very different decisions than what we did?

It's time to get a little creative; it's time to see that the system in its present form fails the test of being a rational response to a major need; it's time we started to seek some answers beyond the dated mantra of test-based accountability, because the present system may well be the cause of many of the problems it purports to want to change. It may be the single greatest source of the harm it hopes to avoid.

Thursday, October 17, 2013

Defining personalized learning

At a conference this week I heard a panel attempt to define what personalized learning is. It was interesting, all over the board, and inconclusive. I felt for the educators given the challenges they faced and their willingness to try something new.

What was missing from the conversation was a defined rationale. The panel agreed it was the right thing to do, but at no point did I hear them offer a reason that was compelling. It was like they sensed it and yet weren't quite sure how to give voice to their logic.

What I hoped to hear was the idea that personalized learning is about moving from the current state of things, where time is the constant and achievement varies widely, to one where the level of proficiency is the constant and time, effort, and the instructional path by which each student arrives at that point are the variables. The path would be determined explicitly against each student's needs.

The lack of a rationale for any educational endeavor--whether a good one or a bad one--leaves us vulnerable to the risk of making change for the sake of change. That must not be the case, especially for something as powerful as the idea of personalized learning.

Friday, September 27, 2013

Data-less decisions

(Reading the previous post first may help—this one follows from it)

A data-less decision in education is just that: a decision made absent supporting data. Data-less decisions are bad for the simple reason that whatever decisions are made tend to be in support of an existing bias. Such bias can be positive or negative, very fair and objective or extremely unfair and subjective. Sometimes the bias is based in what is actually true, but just as often it is based on an untruth or a stereotype. All this is why the mantra of data-driven decision-making has been established as a proper goal for educators.

The problem is that if I look at a student in a particular situation and I possess no meaningful data, I am highly likely to let any number of my biases enter into my view of the student. This can include but is certainly not limited to my views on gender, race, socioeconomic status, whether the school is in an urban, suburban, or rural setting, and perhaps the quality of the football team or the student’s status as a star athlete.

For example, the data are quite clear that suburban schools tend to outperform urban schools by a large margin for any number of reasons. Thus if the only thing I know about a student is that the school of record is in an urban setting, it would be a fairly natural thing to presume that from an achievement perspective the student might be expected to under-perform against suburban peers. If I acted upon such a supposition when assigning that student to classes, I would be making a data-less decision heavily colored by that bias.

Having done so, I may be guilty of promulgating a status quo I dislike. It may be that the student has the capacity to be the next Einstein, and yet having assigned the student to remedial classes I helped preserve a stereotype rather than shut it down. Most of the time data-less decisions are made against a huge combination of different types of bias that are manifest in far subtler ways, but the pattern almost always seems to be toward the preservation of the status quo and not the other way around.

This isn’t to suggest that people are by nature racist or evil or mean. Many data-less decisions are made with the best of intentions. The point is that we are each products of a history that is anything but neutral, and the simple truth is that much of our bias has some basis in fact—e.g., urban schools as of this particular moment in time do in fact under-perform against their suburban counterparts—or stems from a historical precedent that can be tough to shake. The promise of data-driven decisions is that such bias can be removed from our decisions.

The most dangerous data-less decisions are those that appear to be supported by data. Those decisions risk reinforcing whatever societal norms exist under the false pretense that the data suggested that as the proper thing to do. The data, in that situation, act as a Rorschach blot, allowing you to think you see something that puts forth an argument for your approach when the empirical reality may be otherwise. Data-less decisions that appear to have the support of data risk justifying a bias that need not in fact exist. Such decisions help solidify such bias rather than disrupt it.

Nowhere is the data-less decision more prevalent than in the use of test data in schools. Standardized test data have a very limited range of potential uses by design. Included in that design is the ability to compare schools and students to each other as an aid to identifying which schools have solutions that are working that can then be applied elsewhere.

Not included in that design is pretty much everything else. Those comparisons are silent as to their cause, so any assumptions about the practices that produced the comparisons have to come from a place outside of the test data. So too with judgments regarding the quality of the school, the nature of the curriculum, and whether or not a teacher did or did not do his or her job properly. (Policy makers continue to assume otherwise much to the detriment of our students and schools, but bad policy cannot make test scores magically perform an act for which they were never designed.)

The majority of judgments being made about schools and teachers from test data are then data-less judgments, and any decisions made from such judgments are themselves data-less.

Due to the nature of test data, however, the data-less-ness is almost impossible to see. Remember that a set of standardized test data offers a statistical representation for how things are at the moment, and the test itself is designed to show where each student and school falls within that overall representation at that moment in time. Test data, then, actually reflect whatever biases happen to exist as of a given moment in time. Test data are neutral when it comes to what those biases are, in that they don’t care what biases exist. Test data will reflect them regardless.

That means it is a very tempting thing to look at the rank ordering of schools and conclude that those that rank near the bottom are lousy schools and those at the top are great schools, because in many cases that may well be true. However, any such judgment from the test data alone is a data-less judgment, since test data are silent as to their cause and are not designed to make judgments regarding quality.

That is so hard to see through. If we take a slice-in-time measure, it will show the effect of whatever bias exists in the world. If at that same moment we add several additional slice-in-time measures designed to answer the additional questions we have, we may be able to make some fairly accurate statements regarding the quality of a school and those working inside it.

But having taken those slice-in-time measures, our goal must be to take a set of actions that remedy the shortcomings and advance the cause of education. If those remedies are successful, then a new set of slice-in-time measures should provide evidence of our progress.

Instead, what we now do is make one of those original slice-in-time measures—a test, which by definition is very limited and incapable of comment regarding quality—the basis for our remedies. We take an instrument selected and created for its ability to show us what a rank ordering looked like yesterday and use it as the basis for defining tomorrow.

Here is where we need to really pause and ask ourselves a very tricky question: if we are basing tomorrow upon a measure designed to show yesterday's rank ordering, might we in fact be guilty of preserving the bias that existed when that original rank ordering took place? Rather than allowing education to progress and designing a new instrument that showed us the results of that progress, by using the original instrument over and over and placing the quality determination squarely within it, might we in fact be guilty of further entrenching ourselves in an old status quo when all we really want to do is escape it?

As we heap data-less judgments regarding the quality of the teacher and the school on to the system two very clear and very contradictory messages emerge. The first is the altruistic insistence that teachers and schools advance the cause of education for their students and serve them well. The second is the very pragmatic demand that success is about teaching to a test designed to reveal the biases of yesterday. Accountability will be measured not by the altruistic message, but by how well you perform against a definition of reality that should now be out of date.

Lots of metaphors apply here: running on ice, running in circles, shooting yourself in the foot, you name it.

Having operated under such a scenario, should we really be surprised that our test-based culture has failed to produce the transformations it promised? Or should we finally realize that believing the false promise of a test-based culture to magically transform the status quo is perhaps one of the greatest barriers to seeing that actually happen?

Saturday, September 21, 2013

What reliability doesn’t say

A standardized test is at its simplest a data collection tool. It only works if the data collected meet a certain standard in terms of statistical reliability. Reliability is all about the consistency of the measure or observation, and generating a sufficient level of reliability to allow for reasonable inferences to be made requires both skill and planning. In the process of achieving that reliability, however, you impose a whole series of limitations on what the resulting data can say in the name of allowing it to say a few things well.

To make the idea of reliability more concrete, imagine that you have two observers of a rat moving through a maze and you ask each observer to record their observations by writing down what the rat is doing during the experiment, with no other tools than a pen and a piece of paper. Odds are the two observers will offer a related but very different narrative, which means that from a research perspective the observations would be of limited use.

The reason for the limited use is that some of the inferences drawn from one observation risk being refuted or not supported by the other, and so a researcher making any inference risks that inference not being supported by data.

Statistical reliability can be obtained if a certain amount of discipline is introduced to the observations. Adding a stopwatch would of course help because both observers would be likely to agree upon the amount of time it took. So too would limiting the observations being recorded to a list that included only the salient points of the hypothesis being studied.

Under this second scenario the agreement would offer an indication of the reliability of the judgments, and with a high degree of reliability a researcher has an increased level of confidence that their inferences can be based on good data.
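To make the arithmetic of agreement concrete, here is a small sketch in Python. The numbers are my own toy data, not from any actual study: two observers each record, for ten trials, whether the rat took a wrong turn, and we compute their raw agreement along with Cohen's kappa, which discounts the agreement the two observers would reach by chance alone.

```python
# Toy illustration: two observers code the same ten maze trials against a
# fixed checklist entry -- "did the rat take a wrong turn?" -- as 1 or 0.
observer_a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
observer_b = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]

# Raw agreement: the share of trials on which the two coders concur.
agreement = sum(a == b for a, b in zip(observer_a, observer_b)) / len(observer_a)

# Cohen's kappa corrects that figure for agreement expected by chance.
p_yes_a = sum(observer_a) / len(observer_a)
p_yes_b = sum(observer_b) / len(observer_b)
p_chance = p_yes_a * p_yes_b + (1 - p_yes_a) * (1 - p_yes_b)
kappa = (agreement - p_chance) / (1 - p_chance)

print(f"agreement = {agreement:.2f}, kappa = {kappa:.2f}")
# -> agreement = 0.80, kappa = 0.60
```

Note how the discipline that makes the computation possible at all is the restriction to a single yes/no checklist item; the free-form narratives from the first scenario would admit no such calculation.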

What must be remembered, however, is that while such reliability enables a great deal in terms of making valid inferences, it also conceals a great deal in what goes unobserved or unrecorded. If, for example, our focus is on the number of wrong turns the rat takes under a number of different scenarios we would make note of those particular phenomena, and not on whether the rat was brown or white or tended to make its wrong turns more often to the right or to the left. It isn’t that those things are unimportant or irrelevant in a broader context, but rather, they are not a part of the question being asked and thus are not included as part of the observational milieu.

No set of reliable observations is therefore ever complete—the fact that they can be made to be reliable is an artifact of the manner in which the observations are controlled. They aren’t controlled for some nefarious purpose, but in order to create a limited number of powerful observations that can lead to increased understanding. Once those controls are put in place, however, a researcher, by definition, draws a very firm line in the sand that limits the range of possible inferences. Having done so, the vast majority of the universe of inferences is removed as a possibility, leaving only the few that are the focus of the research.

The price of reliability, then, is that we must pick and choose, and in doing so we always leave most of the universe outside our gaze.

Standardized tests are a type of observation, with the data being collected in the form of a student’s responses. What that means is that the test itself is necessarily limited in its scope and the vast majority of the universe in which the tested content is contained is external to the tested material.

For the purposes behind a standardized test those limitations don’t pose a problem, because the purpose is pretty straightforward: show the rank ordering of students in a manner that allows students to be compared to each other, and one group of students to be compared to another. To do that you need only test items that behave in a very narrow way: roughly half the students need to answer each of them correctly, and half incorrectly. Those are the ideal items for answering such a question, since 25-30 such items are generally enough to provide the data needed to establish where each student and group of students rank. Items within that narrow range do a very nice job of spreading students across every possible number of correct responses.
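The arithmetic behind that "roughly half" claim can be sketched in a few lines of Python (a simplified illustration, not how any testing program actually assembles its forms): a right/wrong item answered correctly by a proportion p of students has score variance p(1 - p), which peaks at p = 0.5, so mid-difficulty items are the ones that spread students apart.

```python
# Variance of a single right/wrong test item, where p is the proportion
# of students who answer it correctly. This is the item's capacity to
# separate students from one another.
def item_variance(p):
    return p * (1 - p)

for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"p = {p:.2f}  variance = {item_variance(p):.4f}")

# Items every student answers correctly (p = 1.0) or incorrectly (p = 0.0)
# contribute nothing to the spread -- which is why the easiest and hardest
# of the taught material never makes it onto such a test.
```

The point of the sketch is only this: the selection pressure toward p ≈ 0.5 follows from the rank-ordering purpose itself, not from any judgment about what content matters.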

Within the universe of a domain such as reading the items on a reading test needed to show the rank ordering of students represent a tiny sliver of the domain.

Reliability is built into the instruments so that they allow for a high level of confidence in the inference regarding where a student ranks against the tested material. What they don’t allow for are any additional inferences that were not part of what was observed. That’s just plain logic: if you choose to narrow the focus of your observation to achieve a reliable set of data, what would make you think that inferences outside and beyond what was observed are suddenly available?

So now let's think about what lies outside the “observations” being made by a standardized test. That would include everything that isn’t in those items. It includes the most challenging of the material that was taught and the most basic, since none of that is on the test. That content consists of material that by the end of the year every student would have answered correctly, or perhaps the vast majority would still answer incorrectly, and thus that material fails to help answer the question regarding rank ordering. You could include it, but it wouldn’t contribute to the reliability of the measure—in fact, it hurts the reliability because it contributes nothing to the purpose of the measure.

Such tests don’t include any observations as to the quality of the teacher or the school. This often comes as a surprise to most people since the entire world now seems content to assign a quality judgment to a school based on test scores. But where within the limits of the observational lens is any question as to school or teacher quality? Rank orderings are good for the purpose of comparisons—in fact, for the purpose of offering up meaningful comparisons, they are ideal—but the placement of a student or a school within a ranking says absolutely nothing as to what caused a student or school to land at that point.

Filling in the silence beyond the student or school ranking with statements as to the quality of the school or the teachers may seem on the surface to be justified—and such judgments may in fact correlate somewhat with the reality regarding quality schools—but such statements are themselves entirely unsupported by the data. Inferences about quality made from any standardized test are in the same category as speculating that the faster mice in the maze ate a better breakfast than their counterparts without having a speck of data as to whether or not that is true.

Finally, the answers that students provide offer no advice as to what should be changed in the curriculum to support better instruction the following year, and yet policy requires that state test scores be returned in order for schools to use them for just this purpose. This is perhaps the greatest crime we commit with test scores, and takes us so far beyond the observational lens offered by a test score that it really is both shameful and laughable at the same time.

The best way I can show the paucity of instructional value from such a test is to point out the types of observations that would be needed in order to identify the candidates for improvement from one year to the next. Consider the following as a representative yet incomplete list of the kinds of questions that need to be answered, and compare that to the only question a standardized test is designed to answer regarding the rank ordering of students:
  • Did a teacher teach a rich curriculum and teach it well? 
  • Did a teacher differentiate instruction according to the needs of individual students?
  • Were students underprepared coming into school this year and thus in need of extra support to catch up to their peers?
  • When learning failed to occur, what was the cause? Discipline issues? Personality conflicts? Novice teachers?
  • Did teachers re-teach concepts and ideas that were not understood in ways that represented a “once more, but louder” approach, or did they attempt to teach the same thing through a different lens or approach?

What should be obvious is that the data-gathering efforts that would help provide answers to these and other questions like them would require an entirely different set of observations than those required to identify the rank ordering of students. What should be just as obvious is that no matter how well we answer the question regarding how students and schools rank we cannot suddenly ask the observations to extend their reach into areas that the test doesn’t cover.

We need reliability in social science research and certainly in testing, but we need to understand that imposing it upon our observations—whatever their form—means that our opportunity to make inferences narrows to the observational target. If we think otherwise we are quite likely making inferences that lack any actual support.

If I am right in this last point—and I would argue I am—then much of what currently passes for data-driven decision-making is actually data-less decision making that we pretend is valid.

Tuesday, September 17, 2013

Why I don’t hate the Common Core

Multiple sources have accused me on multiple occasions of hating the Common Core and thereby the Common Core assessments. This is understandable. I am one of the few people to criticize the overall selection of behavioral statements as the paradigm for what we call an educational standard, and the Common Core follows that trend.

By “behavioral statement” I mean that our standards in education tell students what behaviors they should engage in: understand this, comprehend that, multiply two-digit numbers, etc. As a curriculum guide such statements are extremely useful, since that is precisely what a teacher attempts to do every day: get students to behave in ways that further a student’s learning.

As the basis for a standardized test such statements are also more than appropriate, since such instruments are designed to allow for inferences about student performance relative to such behaviors and to other tested students—when used properly, which is another issue for another time.

But should such statements be allowed to serve at the elevated level of a standard? And, if we answer that question in the affirmative—which we have for nearly twenty years now—what are the consequences for having done so?

The easiest way to answer this question is to ask about the purpose of standards in industry and government. Their ability to transform broken industries, improve the quality of our air and water, create real and meaningful competition, and reduce the price of goods and services to the end consumer was the reason education decided it too could experience a similar transformational benefit from a rich set of standards.

But industry standards are most notably precise. They have to do with making the world more efficient (the size of gas nozzles), safer (clean air standards), cheaper (35 mm film as the standard that helped make cameras affordable for everyone), or even just better (minimum EPA highway mileage for new cars). These standards care little about the behaviors that cause them to be met, but leave that up to those with the expertise to achieve such things. Rather, the standard allows for an infinite number of behaviors to lead to the standard being met, with the benefit to follow.

It is interesting that education chose the behaviors as the standards system. Had we chosen a standards system more in line with those that had created the types of changes we had hoped for, we might well have seen the types of transformations that industry and government experienced when they adopted such standards. Instead, we chose behavioral standards and yet we expect them to produce the exact same result as the more precise industry standards.

And what—to close out this entry—might a precise educational standard look like? Here are just a few examples:
  • All students must write well at least once in order to matriculate to the next grade, with the difference being the level of effort and time, but not the expectation.
  • In each year of schooling, a student’s teachers will select an assignment, project, or area of study that the student struggled with and reassign the work, with the additional requirement that the work be completed to an A standard, and provide the supports and scaffolding necessary to see that happen.
  • Over each three-year period of service teachers must present a paper, a research project, or a content-area project (such as a play or a novel) to their peers.

For a school to do any of these would require a whole range of behaviors that would differ by student and school. Differentiated curricular and instructional decisions would have to be made against need, and the system of schooling would have to organize itself very differently than at present for these types of standards to be met. Instead, we now have a system that attempts to align the behaviors and then generate a similar outcome for everyone.

In education (and in virtually any field or endeavor) you have to pick: align the behaviors and you all but guarantee a differentiated outcome, since students will respond differently to those behaviors, or align the outcomes and allow the behaviors to differentiate against need.

The behavioral statements we position as educational standards bear no resemblance to the standards that created the desired level of transformation elsewhere. Education now aligns the behaviors and demands a similar outcome, when transformational standards define an outcome and leave the behavioral piece to those who truly understand how to achieve the outcome.

That is why I don’t hate the Common Core. As a guide to generate a rich curriculum it may be more than adequate or even amazing—I am not a curriculum person with the ability to make that determination and so I won’t try. But as a guide the opportunity still exists to differentiate instruction in anticipation of a similar outcome; as a standard the message is for instruction to standardize but then the guarantee is that the outcomes will differ.

I am disappointed in our inability to see what we have done: to repeat, we attempted to replicate a standards environment with real transformational power but then failed to adopt the type of standards that had actually produced such a transformation. The result is that we now standardize the wrong pieces.

We are left with the expectation of transformation when we failed to include any transformational tools anywhere in the educational package.

Monday, September 9, 2013

Pictures of "rigor"

If one Googles "rigor" and selects the image option, the result offers clear evidence that the term has a huge variety of meanings in education--many of which are incommensurate with each other--strengthening my argument that its use as an educational term is more about the creation of a community around a set of terms than about serving as a useful adjudicator of what should comprise a quality education. Consider the following:

1. The dictionary definition clearly has not yet caught up with the current use of the term as "the adjudicator for everything that is to be valued in a quality education." The definition below is from a recent unabridged dictionary, and even dictionaries on the internet have not yet posted anything resembling the new meanings. Perhaps that is because nothing approaching actual agreement exists in the term's current usage. More likely it is the severity of the semantic offense: the new possible meanings are so far removed from the old denotative meanings that they are unrecognizable as having anything to do with the historical term.

2. See below for evidence of the claim in #1 that it's all about rigor.

3. The quote below suggests that the "rigor" in the term "rigor" and the "rigor" in the Latin "rigor mortis" are not the same thing, even though every dictionary I have so far found says that they are. It's a folksy sentiment, but that doesn't make it accurate.

4. There is a head banger band called Rigor Mortis (at least they look like head bangers), and now, apparently, a clown.

5. I've seen this Venn diagram many times and find it hard to reconcile which of the new meanings of rigor are being referenced here, and not surprisingly none of those contained in the dictionary seems to fit either. Not even "a sudden feeling of cold with shivering accompanied by a rise in temperature, often with copious sweating."

6. Rigor, of course, must be applied in an equitable fashion. These students all seem to be smiling a lot considering that the education they are advocating for will be one that is severe, strict, and unyielding in its harshness.

7. I find it remarkable in the next image how the terms "flexible" and "rigor" are used in the same sentence, given that they are listed in nearly every thesaurus I can find as antonyms.

In attempting a bit of humor here, my point is this: I am often accused of splitting hairs on this topic, but I assure you that every image above (except the clown) was created by someone who had something meaningful to say, and their message now fails because of imprecise language. They chose a popular term with a meaning far removed from whatever message they were actually trying to send, and now the actual meaning is lost, or at best confused.

That is what I object to: educating 55,000,000 children each year is tough enough without putting confusing vocabulary in the way. It makes us look like we don't actually know what we're talking about, or like we're trying to hide something through language that obfuscates. Neither is good.

A contrarian article and talk

In October 2011 an article I wrote appeared in Educational Leadership in which I compared teaching to the test in schools to studying for an eye exam. You can read it (and see a horrible picture that seems to have deteriorated with time) here.

The same comparison will be made in a book I have coming out this fall on the pitfalls of education reform, and Peg Tyre used it in her book The Good School.

I also spoke at the AASA conference the following spring and you can see an article on the talk here and see a brief interview here.

Skills assessment

Lots of attention is now being given to the notion that a skills-based education is a good thing and in our test-obsessed culture many are starting to look for assessments that indicate the presence or absence of such skills. That’s likely to lead to a whole bunch of inauthentic behaviors if we aren’t careful.

Consider that researchers are quite adept at finding ways to identify the presence or absence of such skills, but in order to do so they must take a fairly circuitous route. Questions, observations, and a host of other data-gathering efforts that distill information into usable chunks are extremely valuable in allowing a researcher to make statements regarding skill attainment in schools, but the data elements almost always represent a correlation that in turn enables the inference.

In order for the inference to be valid, however, the correlation must be to the desired behavior. The instant any sort of accountability is tied to the correlation the correlate will substitute for the desired behavior—the system will follow the formal definition of success, even if it differs from what is actually desired.

If our history with standardized testing offers any lessons, policy makers—who have a long history in education of confusing correlation with cause and effect—will at some point legislate success on the substitutes. They then run the risk of presuming that as the substitute measure climbs and falls it represents something real and meaningful, when odds are, like a test score in the current system, it is as much an indicator of the degree of manipulation the system has undergone as it is a measure of the presence or absence of a desired behavior.

The fact that systems sway when accountability is placed on some component of them should come as no surprise—that is the intent. Educational policy makers have up to now shown not one bit of hesitancy in placing accountability on the thing researchers used as the correlate to something larger and more important rather than on the thing itself, and then deluding themselves into thinking that the correlate and the thing are one and the same.

The trouble for the skills movement is that the correlates often don't really even resemble the desired skill. A simple commonsense look should suggest that the patterns of responses in a researcher's survey, or the presence of certain traits in an observer's checklist, may correlate with the presence or absence of a desired skill without themselves being a representation of the skill. Such things are easily manipulated by anyone who realizes that a right answer exists, and even those looking to answer honestly will skew towards any right answers they know to exist. Just knowing that a right answer exists will have that effect when the results will be used as a basis for judgment.

Accounting for a skills-based environment is critical and doable, but we are going to have to do so using a very different set of tools than anything in the current educational toolbox.

Monday, August 26, 2013

The trouble with "rigorous standards"

(This was adapted from the most-read post on another blog I wrote; it seems to fit the theme of Ed Contrarian.)

The educational environment suffers from imprecise language about the most important elements of our activity and the lack of clarity harms us in subtle but significant ways.

The word “rigor” refers to the quality of being thorough, exhaustive, or precise. Its secondary meaning is severity or strictness. Only in its noun form (rigors) does it take on the idea of being demanding, but this refers to things like “the rigors of the harsh winter.” The etymology of the word comes from Latin and literally means “stiffness”: think rigor mortis. Nowhere in the history of the word has it meant what we seem to think it means when used today in education.

Strange that in education today we hear a great deal about the need for rigorous standards, rigorous tests, rigorous passing scores on those tests, and rigorous accountability standards. Google "rigor" and "education" and the hits are in the millions.

In our talk about education we should excise the term "rigor" from our vocabulary and be a little more rigorous in our word choice when describing what we want—seriously. Suggesting that a "rigorous education" is a good thing means an education that is exhaustive, precise, harsh, and strict—a nineteenth-century notion that lacks creativity, treats students and teachers as automatons, and isn't going to serve twenty-first-century students well. Its misuse in a modern context is embarrassing.

If you want tough standards or standards that are more precise or based on high levels of performance when compared internationally, say it. Don't use a term that fails the test of having an application in today's educational environment.

"Standard" suffers from the same malady. We attach a variety of things to the term, including content statements that describe what students should know and be able to do, and passing scores on a state test. "Meeting" the standard is now the goal for students, teachers, schools, and districts, and yet the goal is defined differently for each.

We have to get a handle on vocabulary. Standards, content expectations, cut scores, and passing rates are NOT the same thing.

However, the idea of "higher standards" is now interpreted to mean better content expectations, higher cut scores, increased passing rates, and the goal of a quality education. These are fundamentally different from each other, and furthermore, all fail the test of ensuring that success at any of them will produce more numerate, literate citizens ready to hit the ground running in the twenty-first century. A term with great power, "standard," has been co-opted to mean what it never meant, and it confuses a lot of very smart people. We need to get back to basics on this one if standards are going to have a real role in education.

Precision around "rigor" and "standards" will clarify a great deal.

The other side of understanding

One way to create real understanding of something is to shine a new light on it in the hope that something new will be revealed. Due to the way the human brain works this is actually harder to do than it sounds.

Our brains seem wired to pay attention to the smallest amount of material possible in the construction of meaning. It isn't that they are lazy; rather, we seem to have limited amounts of memory and processing power, and natural selection seems to have made us very efficient in this regard. We take what we need to generate meaning and then we're on to the next thing.

That’s why metaphor works. I can read “my love is like a red, red rose” and imagine the vibrant color and the rich fragrance, and my associations are positive—yet that ignores the fact that roses have thorns and gloves have to be worn to pick them. Or take the Christian image that Jesus’ followers are like sheep: our brains go to the caring part of the image, in which the shepherd will spend considerable time going after and caring for a wandering lamb. This, in turn, ignores the fact that shepherding is an economic function and that many of those lambs will soon be turned into roasts and chops.

The field of education is particularly susceptible to this phenomenon. As just one example, standards, assessments, and accountability, on their own, are all powerful terms that most reasonable people would want to be a part of the educational package. But most people—and this includes most of those in a position to address the policy issues in education—stop there and leave the rest of whatever that means to others.

What if what we call educational standards aren’t actually standards? How do standardized tests actually work? What are they actually designed to tell us? Are they the best tool for determining the quality of a school? Is the premise that rising test scores are always an indicator of good things actually true? Is the accountability system actually designed to produce excellence?

We owe it to ourselves to ask and thoroughly explore such questions.

At present, the educational conversation seems to stop at the pronouncement that standards, standardized tests, and accountability are all necessary parts of the system. We will never know the veracity of that statement until we force our brains to a level of understanding well beyond the level of utterance.