Posts

Showing posts from May, 2018

Why state tests make lousy instructional tools

Items are selected for tests according to the test’s purpose. A teacher building a unit test from scratch will write items that reflect what students needed to learn, and the expectation is that most students who paid attention at all will answer the majority of them at least partially correctly: it would be rare for a student who was at least partially present to score a zero. We say that those items signal the learned/not learned moment. The statistics behind those items are not particularly important: an important item that all students answered correctly would signal good teaching and learning, while one they all answered incorrectly would signal the opposite. The point here is to assess learning. Researchers interested in analyzing people have long known that if you can order human beings on a human trait or characteristic you can detect patterns to explore. In the case of negative patterns, such as an ordering that shows women generally mak...

A testing glitch on STAAR, and a response to a question

Texas experienced some computer glitches today while administering the state testing program (the system shut down for a little more than an hour). A superintendent friend wrote me a note asking about the potential impact on the reliability of the results. Below is what I wrote to him: Reliability refers to (among other things) the kids doing about the same on parallel tests, but that assumes similar conditions for each test. Given the interruption, the conditions for this test would differ from those of another, so it is reasonable to suspect that the results would differ. For example, kids who tried hard before the break may conclude that the grownups don’t care enough to create a system that works, and so not take the part after the break seriously. You could check for that by comparing scores before and after the break to see whether effort decreased. If it did, then that definitely affects the reliability of the scores, since the kids would very likely perform differently on a ...