Sunday, May 31, 2020

What happens if we remember we're all actually related?

We are all of us cousins. Every human being on the planet. It may not be through a long-lost aunt or a great-grandfather, but it’s probably not much more than a great-great-great-grandparent. That’s remarkable. If you believe in science, we all come from a common ancestor who lived some 200,000 years ago, and our ancestors’ paths have probably crossed multiple times since. If you believe the world came into being six or seven thousand years ago—I don’t, but I’ll grant it for the moment—then our common ancestor is even more recent, and we’re more cousins than ever.

We need to acknowledge this, that we’re all related, all one family. Millions of our cousins around the world are sick. Millions more are impoverished and living in slums. Millions are governed by cousin tyrants who don’t seem to care about their extended family, or forgot they are part of one. And many millions more continue to wonder why bombs and militaries are more important to some than food, healthcare, education, and children. A few thousand of our wealthiest cousins control most of the world’s resources and could change the course of history if they wanted, but they’ve shown few signs that’s what they intend to do.

From within this big collection of cousins we have some who are just flat out terrible, and they deserve a spanking and then some. But we also have a ton of our cousins willing to do something, even though it’s hard. These are the cousins who see a wreck in the middle of the night and without hesitation risk their lives to pull another cousin to safety. Or who put on last week’s soiled surgical mask to help a cousin overcome COVID. Or who teach the future generation of cousins and give a hungry child their lunch, because that’s just what you do when you have a little more than someone who has nothing. Or who protest against racism, sexism, despotism, bias of all kinds, and generally mean people. All these cousins and more need to pull together now more than ever. The future depends on it. Our youngest cousins may not have a future if we don’t.

I live in America, by the simple fact of birth. I’m white, male, and while I was raised by loving parents who struggled financially it was never a question whether or not I would make it. And I did. I’m not rich, but squarely middle class, and I have what I need. And here’s the truth. I worked really hard, but I didn’t work any harder than a million others who didn’t make it. I legitimately tried, but I can’t say I tried harder than all the rest. And yet I’m here and so many aren’t.

I always had an invisible advantage, an unseen leg up on the part of America that didn’t look like me. I have never walked down a street worried about being singled out and silenced or harmed. I have never walked by a police officer hoping they weren’t one of the few bad ones and that today wasn’t about to be my unlucky day. I walk out my door every day expecting I’ll get a fair shake. I never worried I was being underpaid. No one has ever crossed the street to avoid me or been fearful just by being in my presence. My bet is that if I ever commit a crime, I’ll be given the chance to turn myself in, and probably even negotiate on the terms of my surrender. If I do go to prison, my invisible advantage is likely to get me the benefit of the doubt when it comes to my sentencing.

And let me be clear about something—I get a lot of attention when I walk down any street. I’m a one-armed man. I lost an arm nearly to my shoulder in an accident forty-nine years ago when I was six, so people—cops included—have been staring at me my whole life. I got stared at when I was six and it happened a few days ago on my fifty-fifth birthday, and most days in between. And while a few who don’t know me may feel sorry for me (shame on them for judging—they should get to know me first), not one time did I come under suspicion for my difference. Not one time was I ever mistreated by an authority for being who I was. Not one time was I ever presumed guilty for being born. I used to think that made me lucky, but that’s wrong. To claim I’m lucky to be born white presumes it’s best to be white, when it should be best to be who you are. It has never been the case in this country that everyone is given the chance to do that.

Way too many of us have forgotten that 200,000 or six thousand years ago, take your pick, we would have called the same people grandma and grandpa. They wouldn’t look exactly like us or communicate like us, but that’s not the point. The point is we’re all connected.

Imagine explaining to this grandma and grandpa slavery, and describing how the tiniest of genetic differences, the pigmentation in one’s skin, led to the notion that a certain color made some cousins worth not very much as humans, but a great deal as property. Imagine explaining the massive scope of slavery in America’s history, and the fact that much of our country was built on their uncompensated backs. I imagine these original grandparents would express outrage and fury at that sort of treatment of their family members, and then relief that it was outlawed a hundred and fifty years ago. I can also imagine them expressing even more outrage when they learned that it took a century for the country to finally admit a bit of wrongdoing and extend some basic civil rights to those descendants of former slaves who had been denied even that. And even more outraged when they discovered the number of cousins who had their fingers crossed when the admission was made. And even more if they could see the number of people who act as if they’re sorry it was even said.

Imagine telling them that a lot of people in the wealthiest country that their great-great-great-great-great-grandchildren had ever created now regularly apply bias to practically everyone with non-white skin, and accept as good and right treatment of those with dissimilar pigmentations that they condemn and punish harshly when done to the similarly pigmented. And now it’s not just pigmentation, but language as well. Who knows what will be next? Our grandparents would likely wonder why so many of the cousins always seem to need someone to pick on, or even hate, and how that could possibly make a person feel better.

Last week some of my cousins who I don’t know thought it was a good idea to gather their semi-automatic weapons and march into the Michigan Senate. Their pigmentation happened to be white, and so they were kindly escorted out and given a scolding. This week an unarmed black cousin, one I also didn’t know, who had far fewer opportunities than me and may or may not have tried to buy something with a counterfeit twenty-dollar bill, died when a cop decided handcuffs and compliance with his demands weren’t enough and knelt on his neck until he was dead. I’m just glad that the cousins with their semi-automatics weren’t black and didn’t meet that cop, because I don’t think they would have been kindly escorted anywhere, except to prison, via tear gas, handcuffs, and some knees on some necks.

I’ve had it. I haven’t always understood the advantage I have of being a white guy because from my angle it looked invisible. Just part of the status quo. But it isn’t invisible to lots of my cousins. It’s not innocent or innocuous. And the more it gets ignored the more likely it is to become malignant, to justify violence against those who do see it by those who refuse to. I’m learning to see it. I’ve been learning for a long time, and I’ll be learning it for the rest of my life if that’s what it takes. I hope someday we’re all able to see it, maybe even at the same time, because at that exact moment suddenly there won’t be anything to see. We’ll all just be cousins again.

Let’s get there. It’s time.

Monday, May 18, 2020

Being accountable to a test result...

Being accountable to any test result is being accountable to the wrong thing. Right now, the most important test in the world is for the Coronavirus. The information it provides is immensely useful, and yet to treat that information as more than information about the presence or absence of the virus is a mistake.

Neither outcome tells us anything about a person’s overall health. Neither outcome signals anything about what has happened or what will happen. And both outcomes come with a caveat—there is a small possibility of the result being wrong, of suggesting you have it when you don’t, or that you don’t when you do. To treat either outcome as more than it is, absent context, details, and a whole lot of additional information, renders any next step invalid: likely to be unhelpful, or even harmful.
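To see how little a single result can carry on its own, consider a back-of-the-envelope calculation. The sketch below is illustrative only: the sensitivity, specificity, and prevalence figures are invented for the example and are not the characteristics of any actual Coronavirus test.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Bayes' rule: the chance a positive result is truly positive (PPV)
    and a negative result truly negative (NPV)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)

# Invented numbers: a test that is right 95-98% of the time per person,
# given to a population where 1% actually has the virus.
ppv, npv = predictive_values(sensitivity=0.95, specificity=0.98, prevalence=0.01)
print(f"chance a positive result is real: {ppv:.0%}")  # roughly a third
print(f"chance a negative result is real: {npv:.1%}")  # nearly certain
```

Even with a highly accurate test, a positive result in a population where the virus is rare is wrong most of the time. That is not a flaw in the test; it is a reminder that the result is one piece of information, not a verdict.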

All tests suffer from this limitation. It is a consequence of trying to squeeze as much precision as possible out of a single result, and the necessary price we pay for needing and trying to do so. More accurate results provide confidence that studies of the contexts, details, and any applicable information can be more expertly applied. But really, all any result does is move us a step or two away from chaos. It does not, as is so commonly and wrongly presumed, put us a step or two away from surety. And while that is still so much better than having no information at all, it is no more than one piece of a much larger puzzle.

What would be terrible for all of us is a lockstep approach that failed to consider context, that applied a generic solution to a result, or that refused to consider the unique conditions of an individual. Medicine would be reduced to a simple decision tree and we would be infinitely worse off than we are. It would be like thinking we’re through with a puzzle after the first two pieces come together.

Educational testing based on a specific methodology—the variety used in state testing programs, or the norm-referenced tests sold commercially, such as the Iowa Test of Basic Skills or NWEA’s MAP—is now guilty of encouraging that exact sort of behavior. These too are tests that produce a narrow result that moves us a step or two from chaos but no further. The results are nothing more than points on a continuum (some of which will be wrong) based on a moment in time that lacks context, cause, or professional interpretation. Yet to sell more product or to support bad educational policy, the declaration gets made that the results are more than they are, that they can directly inform teaching and learning, indicate quality or effectiveness, and replace professionalism.

This is as false and misleading and harmful as thinking that a diagnosis equates with a solution. All test results require interpretation through the broader technical lens of a professional equipped with the full context of the individual’s situation and current best practices. And they require the ability to question that lens, to recognize it as always incomplete and able to be improved upon. Only then is the professional capable of determining an optimal path forward for that student or patient while at the same time being responsible for making that path better for the next time.

I used to be kinder to the test publishing world—especially when I was in it and it was paying my bills and I still believed we were capable of staying within the limitations of what a test is—but the field has strayed way too far from its original usefulness, putting tools in the hands of researchers, and has instead become something else altogether.

We would never tolerate straying so far from what a thing is in the tools that will help us through the pandemic because the consequences would be unthinkable. We shouldn’t tolerate it in the education of our nation’s children for the exact same reason.

Thursday, April 23, 2020

On Teacher Evaluation During a Crisis

The headline in my email this morning from Ed Week asked whether it was appropriate to do teacher evaluations in light of the Coronavirus. I wish they would ask the more honest question: is it appropriate to beat up on teachers during the Coronavirus, or should we give it a rest for a year?

If the evaluation systems were based on a true accountability, this question wouldn’t exist. The fact that it does, that accountability and teacher evaluation in schools are in fact being put on hold, means that we don’t have anything even close to an effective accountability or evaluation environment. I continue to argue that if it can be put on hold you have to stop calling it accountability, because it isn’t. I would argue the same for evaluations.

Education’s myopic, autopsy-based approach to everything inserts a punishment-and-punishment-avoidance mentality into the process, not the "how can we be great?" mindset that is at the heart of great organizations. As I study accountability in those organizations, I find they operate along the lines of that mindset.

Here’s how they do that.
  1. They imagine themselves standing in front of a stakeholder at some point in the future. They ask, “what will I need to say at that moment to prove my effectiveness? To show that I’ve done something great and that my work matters?”
  2. They ask, “what would count as evidence of effectiveness or movement towards greatness?”
  3. They look at the current state of things and figure out what needs to be done so that at that future moment they can state that they have indeed been effective and done something great, with the evidence to show it.
  4. They get to work with a shared understanding of what effectiveness and greatness look like.
  5. When they hit snags—and they always will—they think about that accountability moment in the future and how best to get back on track.
  6. They seek help and support early, when it can make a difference.
  7. The goal in the organization is for every person to have the highest evaluation marks possible, because that would mean the organization is highly effective and ready for whatever comes next.
It really is that simple.

In a crisis, nothing changes. In fact, a crisis is when this system is most effective. It is when we most need to develop a clear understanding of what greatness at some point in the not too distant future needs to look like, of what would pass for evidence of that greatness, and what needs to be done between now and then to make it happen.

Millions of educators have already answered that first question in this crazy new environment no teacher could have prepared for: what will greatness look like? They don’t have to ask if their work matters—it does.

They are at this very moment in the process of getting to that new definition of greatness. And as they hit snags—and they have hit a ton of them and aren’t even close to the end of it—they don’t punt the moment of greatness down the road, but adjust, and figure out a way around them. And part of that figuring is seeking answers from others and asking for help when they need it.

They get—although they may not use these words—that evaluation should never be a gotcha at some point down the road, based on a day’s worth of test scores from last year, but rather a summary of what I’m doing right now. Which means my days aren’t spent trying to avoid trouble in the future but figuring out ways to do great things.

If I were in the classroom, I would beg for someone to evaluate me in exactly this way. I would deserve it, because it would finally show the truth. It would show where I was effective during a terrible time, where I was challenged and needed to adjust, and whether or not I accomplished what I set out to do. All of that would be shared with my principal and teacher leaders who are now rooting for me, not staring over my shoulder trying to catch me at something, and it wouldn’t come as a surprise at the end of the process whether I had been effective or needed to rethink things going forward.

And if that system can work well in a crisis, imagine what it could do when some sense of normalcy returns.

So, should we put the current teacher evaluation programs on hold during the Coronavirus? No. We should simply end them altogether. In their place we should have an evaluation system based on the "how can we be great?" mindset, which is how it works in effective organizations, rather than the punishment-avoidance nonsense we’ve had for years, which has never worked to make any organization better than it was.

We do that and maybe something good can come from this mess.

Thursday, April 16, 2020

A chance to rethink accountability

In this age of the Coronavirus and its overwhelming impact on literally everything, a bright spot in an otherwise ominous cloud is the way we are thinking differently about old problems, rethinking our relationships with each other, and reflecting on what is actually important.

We should do the same with educational accountability. And we have a window in which to do it.

Of all the problems to rethink, educational accountability should be at the top. For the past two decades (longer in some places) educational accountability has followed the "better autopsy" method for improvement, which will always fail. At the end of a school year the state performs an autopsy (and a partial one at that) and then forces schools to ask, "what could we have done last year to have had a better autopsy?" and then, whatever the response, do that this year.

The better-autopsy accountability is nonsensical for lots of reasons, but none more so than that it forces schools not to change with the times. It presumes that whatever conditions existed last year and the year before will continue (forget that the world is changing faster than we can ever imagine). It makes our job in education to get kids ready for a world that does not yet exist by getting them ready in a world that hasn’t existed for years. In other words, the closer we can align ourselves to a definition of things that was developed years ago but doesn’t exist any more (if it ever did), the more likely we are to be declared successful in what is arguably a dumb system. And the more successful we are in that world, the less prepared our students will be for the one that is surely coming.

But it is also nonsensical because it isn’t actually accountability. Accountability in effective organizations is about the future. It is about ensuring that those in the organization are the right people to take it forward, or that the organization is prepared to do the work we need it to do. Accountability is about what we do in answer to the question, "will my child be safe in school today, and tomorrow, and the next day?" A business that substituted the better-autopsy approach for actual accountability would, like schools have for years, find it difficult to change, impossible to adjust to new circumstances without tremendous amounts of energy better spent elsewhere, and in the meantime risk stagnating itself into oblivion.

The better-autopsy mindset existed in education long before what passes for educational accountability put it on steroids, which helps explain why education looks surprisingly similar to what it looked like when I was in school in the 1970s. And now it’s time to knock it off, and we have an opportunity to do just that.

We have some things planned over the next few months, so stay tuned. And if you're interested drop me a note at john.tanner@brave-ed.com (new email--new organization will be announced shortly) and we'll get you on the list for announcements.

Wednesday, January 15, 2020

The gross misunderstanding in educational accountability


For a word used with ease in educational policy circles, accountability is a term that is surprisingly misunderstood and misused.

Seeing this is relatively simple. Ask an audience to brainstorm a list of terms they associate with accountability and a pattern will quickly emerge. Many of the words will be positive such as:
  • Transparency
  • Effectiveness
  • Responsibility
  • Outcomes
And many of the words and phrases will be negative, such as:
  • Feet to the fire
  • Testing due to lack of trust
  • Blame
  • Shame
If you list these words in two columns on a sheet of paper, what you will be observing are the two sides of accountability.

The negative terms represent what happens when an organization refuses to be accountable and/or is perceived as failing. In that case, accountability is something imposed on that organization by outside stakeholders for the purpose of bringing the organization in line. Such an accountability focuses the organization on failure prevention at the expense of everything else.

The positive terms represent what happens in effective organizations. These are organizations that internalize the principles behind these terms and attempt to exemplify them in their efforts.

This type of accountability focuses the organization on how best to sustain itself long-term, and how best to communicate its effort to its stakeholders.

Both types of accountability are perfectly valid depending on the circumstance.

What should be clear is that the objective for any organization should be an accountability focused on long-term sustainable excellence. This properly aligns the organization with its long-term goals and the idea of continuous improvement.

What should also be clear is that imposing an accountability of failure prevention by stakeholders must be performed thoughtfully. Its intent is not long-term sustainable excellence, but just the opposite: an immediate, short-term failure correction. The intent of an imposed accountability is to focus the organization and its resources on correcting the failure at the earliest possible moment or the organization’s existence may well be at risk.

An imposed accountability’s purpose is thus temporary: to force an immediate correction after which the organization can turn its focus towards long-term sustainable excellence. When an organization is having its feet held to the fire its job is not long-term sustainable excellence but something else. The sooner it can correct its errors and turn its attention towards long-term sustainable excellence, the sooner it can return to a state of effectiveness.

It would be deeply illogical and harmful for any organization to be required to operate in the perpetual shadow of an imposed accountability when the goal is long-term effectiveness. The reason for this is simple: it would make failure prevention the formal focus of the organization, and thus attempts at long-term effectiveness would be perceived as secondary.

Even if the organization’s leaders recognized they were in an illogical system and attempted to focus stakeholders on their long-term approach, the fact that the imposed accountability was at the behest of stakeholders while the long-term approach was not means the imposed accountability is likely to triumph. At best any positive message would be diluted; at worst it would be ignored or not believed.

Getting the balance right is always a challenge, as organizations consist of lots of moving parts and it will regularly be the case that some of those parts are deserving of an imposed accountability. So long as such accountabilities are temporary, that part of the organization can correct itself and return to a focus on the long term. In that case the overall accountability system will be seen as contributing to the overall well-being of the organization.

The objective must be for any organization to spend the majority of its existence in an accountability focused on long-term sustainable excellence, and as little time as possible under the pressure of an imposed accountability. Only then will it be in a position to deliver effectively for its stakeholders.

Sunday, January 5, 2020

How standardized tests do what they do (which isn’t what most people think)

Standardized test is the name most people assign to the tests used in state accountability systems, commercially available norm-referenced tests, and college admittance tests such as the ACT and SAT. I have long encouraged folks to drop the term “standardized,” since that merely refers to the conditions under which tests can be administered, rather than what this narrow family of tests is and does.

Instead, I prefer to call them predictive tests. This describes what they are intended to do.

I have also strongly encouraged a more critical use of vocabulary regarding predictive testing. This is because of the massive confusion that results from the plethora of terms now applied to testing that don’t mean what most people think, such as standards-based, or criterion-referenced.

What sets a predictive test apart from all other forms of testing is its ability to produce predictive scores. Simply (and crudely) put, if I am slightly above average this year you can predict that I will probably be slightly above average next year. If I am not, if I am well-above or below average, you can note it and begin the search for causes. Perhaps there are lessons to be learned or perhaps not, but as a signal for where to look such test scores have some use.

Confusion is created when people presume that their names for testing, such as standards-based or criterion-referenced, are parallel forms of testing to a predictive test. This is inaccurate. If the tests produce consistent results across administrations, they are first and foremost predictive tests. You may have drawn the content from a state’s written standards and labeled it a standards-based test, or drawn a line in the sand and assigned it a label, in which case you created a criterion (as you have assigned a score meaning that is external to the test). Or you may have conducted a comparative study after the fact that allowed you to apply norms. Regardless, the style of tests in which you are operating is predictive.

And, by the way, creating this narrow sort of instrument requires real specialization and training, as the sorting function will only occur in a consistent fashion with test items that perform within a narrow set of statistical criteria, and that combine to create a specific effect. This is a far cry from a teacher building a test to understand the effectiveness of their teaching or whether students learned a lesson—that isn’t even in the same ballpark. The last thing a teacher should care about regarding learning is whether their items sort kids into a curve, while that concern is first and foremost in order for a predictive test to work.

The greatest mistake people make with a predictive test is to presume that the consistency in the results has more meaning than it does, when the fact is that the meaning is surprisingly limited.

The consistency is created by first finding the average and then calculating how far from average each test taker is. Since averages are reasonably consistent over time, as is a student’s relationship to average, the results will be as well.
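In statistical terms this is just a z-score: subtract the average, divide by the spread. A minimal sketch of the idea, with invented raw scores:

```python
from statistics import mean, stdev

def z_scores(raw):
    # distance from average, in standard-deviation units
    mu, sigma = mean(raw), stdev(raw)
    return [round((s - mu) / sigma, 2) for s in raw]

# The same five students on two administrations: the raw scores shift,
# but each student's distance from average barely moves.
print(z_scores([48, 61, 55, 72, 39]))
print(z_scores([52, 66, 58, 75, 44]))
```

That stability in each student’s relationship to average is the whole of the consistency; nothing about cause or quality is added along the way.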

The usefulness in this is that a student’s position is predictive as described above, and movement can be explored for potential lessons. The resulting orderings are also useful in that they show broad patterns behind them, often regarding socioeconomics, gender, race, etc. As researchers identify these and policies and procedures are put in place, future parallel instruments can be used to understand the effectiveness of those policies and procedures by noting whether or not negative patterns dissipate.

A perfect ordering on an entire domain is simply not possible—that would result in a test that was thousands of items long. Instead, test makers locate a few items that will order students about the same as if the ordering were done on the entire domain. This makes the test a proxy for the domain, and still useful in spite of the fact that it is not a statistically representative sample of it. So long as the ordering on the limited selection of content will be roughly the same as on the entire body of content it is still useful in the hands of a thoughtful researcher who understands how the tested content was derived.

The fact that such tests are proxies for the larger domain adds another limitation to the scores: they are estimates only, with some amount of imprecision in each. That just means that while a majority of the time students taking similar tests on consecutive days will score similarly, some will not, and some will have scores that differ a great deal. Again, in the hands of a researcher who understands these limitations and that the scores are simply a broad signal for where to look for patterns and causes, these limitations don’t render the results useless. While they are limited, they can be useful so long as that use can tolerate the fact that scores are estimates based on a proxy and nothing more.
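Both limitations, the proxy and the imprecision, are easy to see in a simulation. The sketch below is mine and is illustrative only: the ability distribution, the item counts, and the assumption that a student answers any item correctly with probability equal to a fixed “true ability” are all invented for the example.

```python
import random

random.seed(1)
N_STUDENTS, DOMAIN_ITEMS, PROXY_ITEMS = 200, 2000, 40

# Invented model: each student answers any given item correctly
# with probability equal to a fixed "true ability".
abilities = [random.uniform(0.2, 0.9) for _ in range(N_STUDENTS)]

def score(ability, n_items):
    return sum(random.random() < ability for _ in range(n_items)) / n_items

domain = [score(a, DOMAIN_ITEMS) for a in abilities]  # the impossible thousands-of-items test
proxy = [score(a, PROXY_ITEMS) for a in abilities]    # the short proxy test
retest = [score(a, PROXY_ITEMS) for a in abilities]   # the same proxy, "the next day"

def ranks(xs):
    order = sorted(range(len(xs)), key=lambda i: xs[i])  # ties broken arbitrarily
    r = [0] * len(xs)
    for position, i in enumerate(order):
        r[i] = position
    return r

def rank_agreement(x, y):
    # Spearman's rank correlation: 1.0 would mean identical orderings
    n, rx, ry = len(x), ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

print("proxy ordering vs. full-domain ordering:", round(rank_agreement(proxy, domain), 2))
print("proxy ordering vs. next-day retest:", round(rank_agreement(proxy, retest), 2))
```

Both agreements come out high but short of perfect: the short proxy orders students roughly as the full domain would, and a retest wobbles. That wobble is the imprecision in each estimate, and the reason a score is a signal for where to look rather than a fact about a student.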

The primary confusion comes because the predictive test methodology produces reasonably consistent scores over time even though the test is based on a proxy for the entire domain. The resulting estimates (scores) are still sufficiently consistent over time to allow for researchers to find some value in them. But that doesn’t magically transform them into something they are not, opening up a world of uses beyond their design. Any use that assumed so would be silly.

Which is why the use of state test scores can rightfully be called silly. They are derived from the predictive test methodology yet are treated not as proxies, but as representative of an entire domain, worthy of teaching to and guiding learning when that cannot be the case. They are treated not as estimates useful for research, but as absolutes for making judgments. And worst of all, they are treated as signals of quality when that was never in their design.

This last point has been particularly disastrous for schools that serve students from historically marginalized communities. It is a fact that if you order students as of a day on a domain such as literacy—whether via proxy or a more complete measure—and some aspect of society contributes heavily to students’ ability to acquire knowledge within that domain, the ordering will reflect that. But as of that moment no judgment is available to be made. Some set of students may be behind because of real failure in their efforts or those of the school, in which case remedies for failure should be available and applied. But they may just as well be behind due to a lack of opportunity. In that case a failure judgment and remedy would be wrong, even unethical, as it would be the wrong remedy.

Rather, a different remedy should be applied, one that addresses being behind as being behind, not as failure. Mislabeling the problem would be a huge mistake as it would create perceptions that may not be real, force actions that run counter to need, and justify historical biases. Even worse, labeling being behind as failure risks converting being behind into failure, in which case the current system of test-based accountability could be said to have been a contributing cause to the further suffering of those who can least afford it, to the detriment of our nation as a whole.

In short, every role educational policy asks predictive tests to play is outside and beyond their design, with a profound number of ill effects that come from their bad assumptions. Predictive tests cannot be used to judge quality or effectiveness, guide or drive instruction, or indicate the effectiveness of policy.

So, there you have it: predictive tests work by being predictive, but in order to be predictive they can’t be much else, and they certainly cannot be used as the primary tool in school accountability. The sooner we all realize that fact the better.

Monday, November 25, 2019

Response to a common set of questions on how best to use tests in an accountability system

I received a note the other day with an inquiry. It contained four questions. I took the opportunity to craft a response I’ll share below, since I get these sorts of questions a lot.

Here were the questions:

1. How can a standards based adaptive assessment used throughout the year be one tool used for accountability purposes?

2. If an assessment covering a set range of standards is used throughout the school year, what other factors need to be considered to more effectively determine if students are reaching developmentally appropriate learning targets?

3. Content mastery and student progress on state standards measure student proficiency towards specific items. How should student work samples, portfolios, or other student level artifacts be used as an indication of a school’s ability to develop independent young adults?

4. In terms of accountability, what value is there in communities creating annual measurable goals aligned to a 5 year strategic plan and progress towards those goals being the basis of accountability?

These questions are similar to those I get almost every day from people understandably trying to fit square pegs into round holes. There are multiple layers to a response.

First, accountability over the years has become synonymous with test scores and objective data. When trying to gather information about learning, proficiency, or progress, test scores are presumed to be the best, and often the only, source for answers. Even when other sources are considered, test scores tend to occupy the primary position in the conversation.

Second, coverage is now the dominant paradigm in learning. Coverage is now a common goal regarding a state’s content standards, and most other educational targets such as development, mastery, and progress are presumed to relate to the amount of content consumed. This is due almost entirely to the fact that the tests are said to cover a broad swath of content, and given that success is defined by those tests, success and coverage are presumed to be one and the same.

“Success” in such a system is in fact anything but, due entirely to the design of that system. Consider that tests that produce predictive results over time yield far less interpretive information than state accountability systems presume. The assumption on the part of the state is that a predictive test score is capable on its own of signaling success or failure, both of the student and the school. But that assumption belies the design. Predictive tests produce scores that indicate where a thoughtful educator or researcher may want to explore further, but they cannot contain within them the causes behind the indicator—in fact, that ability to make direct causal connections is removed during the design process in order to create the stability in the results over time.

Once a cause is understood it may be worthy of judgment, but until it is explored any judgments (whether good or bad) are premature, made without evidence. Any judgments made prior to an exploration of causes will make an organization less, not more, effective, because absent an understanding of cause any change is a shot in the dark at best. If an effect does occur, the shot-in-the-dark actions will be presumed to have caused it, and those actions will be repeated or discarded without understanding whether they actually contributed to the effect.

Any accountability that fails to allow for the identification of causes prior to judgments will do this. I know of no other field with an accountability that commits such an egregious mistake, as it is a recipe for confusion and inefficiency.

And please know, what I describe above is baked into the current design of educational accountability, which is why the questions you pose are so common. Underlying each question is a deep desire for effective teaching, deep learning, and preparing children for their lives, as well as the need to build long-term sustainable solutions. But that isn’t what the current system was designed to do, which is where the misfit comes from.

The best way to see this is to recognize that there are two sides to accountability. The first is easily understood if an audience is asked to list all of the terms they associate with accountability. Most will offer up things such as responsibility, transparency, effectiveness, outcomes, and success against mission. These are all positive and any effective leader includes all of them in their leadership practice.

But there is another side to accountability that we do to organizations that refuse to be accountable. In this case accountability is imposed upon these organizations. When it is necessary to impose an accountability, the positive terms are presumed to be absent and it becomes necessary to hold people to account, to motivate through blame and shame, to test claims due to mistrust, and to inflict punishment or sanctions when compliance does not occur.

The objectives in these two accountabilities are different. In the first the goal is a long-term sustainable effect. In the latter the goal is failure-avoidance. If the goal is failure avoidance there isn’t time to think about long-term sustainable effects as you aren’t yet there. First you need to prove you can avoid failure, then you can think about doing great things.

This is why in every case other than education, imposed accountabilities are temporary, meant to resolve a crisis in the short term so the organization can get back to long-term sustainable thinking. It would be folly to think that an imposed accountability can focus on long-term excellence as that is not in its purpose nor in its design.

It is this difference that defines the tension in the questions you propose. Those questions each contain the desire for a long-term effectiveness, and yet they are being asked from within an imposed accountability environment designed to promote failure avoidance (the coverage paradigm of our current standards environment is a perfect example, as it is about control in support of failure avoidance, not long-term excellence). Our policies use language that aligns with long-term effectiveness while imposing a system designed as a short-term response to failure.

All of which is exacerbated by the selection of a predictive testing methodology that policymakers assume can do things for which it was never designed, most notably signaling on its own the success of a school or the quality of a student’s performance without actually knowing the cause.

With that as a context let me now start to address the questions you pose a bit more directly.

Any test score, be it a predictive test score with its underlying psychometrics, or a classroom quiz, is a form of evidence. But in order to serve as evidence for a thing you must first have a sense of what that thing is. Evidence is necessary to answer critical questions such as: who is learning? What are they learning? Who is not learning? What is preventing learning from happening?

None of these on their own are answerable through a single evidentiary source, and each question requires sources other than test data to create a sufficient understanding regarding what to do next. Any action that attempts to treat any data source as absolute risks a decision based on incomplete evidence, which makes the decision invalid, even if by luck it happens to be the right decision. In any case it makes the organization less, not more effective, by creating dissonance between the effects that can be observed and their causes. This in turn risks promoting the wrong causes for observed effects, which is never a good thing.

Finally, accountability in effective organizations occurs at a level that both the technical experts within a profession and the amateurs outside it can relate to and understand. Think about a visit to the doctor and you'll understand what I mean. Those of us who are not doctors can stare at a battery of test results for hours and still not understand what they mean. We may go on WebMD and attempt to view each indicator in isolation, but a meaningful interpretation requires a doctor with a much broader and deeper understanding than those of us who have not been through medical school possess.

The doctor does not start by taking us one by one through each of the dozens of tests, but rather, at a level we can both relate to as an amateur and a professional: the relative health of the patient. From there, the doctor can take a patient into the weeds for a deeper conversation where technical understanding is required, but through a lens appropriate to those of us without medical training.

The same is true for any profession that requires technical understanding: engineering, mechanics, computer programming, education, etc. In each of these there exists a level at which professionals and amateurs can have meaningful conversations about the work, and it is at that level that organizational accountability must occur.

It would be difficult, if not impossible, for outsiders to engage in a meaningful way with the technical part of an organization. The nature of technical information is such that the further into it you go the more likely you are to identify contradictions, counterintuitive thinking, and a lack of absolutes, which requires a technical understanding to work through and still be effective. Someone without that technical understanding is at risk of seeing the contradictions, counterintuitive thinking, and lack of absolutes as negative, as evidence of something other than what they had hoped to see.

It would be naïve to think that the non-technical person could dictate a meaningful response based on their limited understanding, which is why it isn't done—it would make the organization less, not more, effective. I don’t argue with an engineer over how far his or her beams can be cantilevered over an open space, but rather start at a point we both understand—what I want the building to look like—and let the professional then do their job.

Test scores represent technical information, especially predictive test scores with their psychometric underpinnings. As such they require technicians to interpret them properly given that those interpretations will often run counter to what an untrained eye might see. For example, an untrained eye may equate a low test score with failure and insist a school act accordingly. But a technician who understands such scores would first look to causes and other evidence before arriving at any conclusion.

It may be that the evidence suggests some amount of genuine failure exists, in which case the remedies for overcoming failure should be applied. But it may also be the case that the evidence suggests the student is simply behind his or her peers because their exposure to academic content outside of school is limited. In that case the remedies for being behind should be applied, which are very different from the remedies for failure. To apply the wrong remedy would make the school and the student less, not more, effective.

Starting with test scores as the basis for any accountability, absent a technical interpretive lens, creates this very risk. Test scores, contrary to popular opinion, are not simple to understand, do not produce immediately actionable results, and should not be interpreted bluntly. They are always in the weeds of an organization, part of the technical environment in which professionals work. While we should never be afraid of sharing them broadly, it is imperative that we take our outside stakeholders into them through an interpretive lens appropriate to both the amateur and the professional. The failure to do this will result in misunderstandings and frustration on all sides.

The answer to all four questions that started this off is this: educational decisions require a rich evidentiary environment that goes well beyond traditional data sources to understand the educational progress of a child. Tests can certainly be a part of that evidentiary environment, and better tests and assessments are useful in that regard and we should encourage their production. But better tests or better assessment vehicles do not solve the accountability problem.

That problem is only solved once we can shift from an imposed accountability focused on failure avoidance to a true accountability focused on long-term sustained excellence. Continuing to treat testing as our primary accountability source mires us in the technical weeds and as a result is highly likely to create misunderstandings regarding school effectiveness.

My advice: ask the right questions, treat test scores as one evidentiary source but never the only evidentiary source, question the interpretations alongside other professionals so that the best possible conclusions can be reached, and define success in any long-term plan by answering the question: what is it we hope to accomplish? rather than: what should we measure? That latter question will tie you up in knots as what is measurable empirically represents only a small percentage of what matters in a child's life and to a school.

Evidence is the proper term, as we can gather evidence on anything we need to accomplish so long as we can observe it. Focus at that level and you’ll arrive at meaningful answers to each of your questions.

Best,
John