Saturday, April 1, 2017

Inactive Data

So about that actionable data...

One of the frequently offered reasons for the Big Standardized Tests is that they are supposed to provide information that will allow classroom teachers to "inform instruction," to tweak our instruction to better prepare for the test (I mean, to better educate our students). Let me show you what that really means in Pennsylvania.

Our BS Tests are called the Keystones (we're the Keystone State-- get it?). They are not a state graduation requirement yet-- the legislature has blinked a couple of times now and kicked that can down the road. Because these tests are norm-referenced, aka graded on a curve, using them as a graduation requirement is guaranteed to result in the denial of diplomas for some huge number of Pennsylvania students. However, many local districts, like my own, make them a local graduation requirement in anticipation of the day when the legislature has the nerve to pull the trigger (right now 2019 is the year it all happens). The big difference with a local requirement is that we can offer an alternative assessment; our students who never pass the Keystones must complete the Binder of Doom-- a huge collection of exercises and assessment activities that allow them to demonstrate mastery. It's no fun, but it beats not getting a diploma because you passed all your classes but failed one bad standardized test.
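The arithmetic behind that guarantee is worth spelling out. Here's a minimal sketch (all numbers hypothetical) of why a norm-referenced cut score manufactures failures: the cut is defined by rank, not by what students actually know, so a fixed share of test-takers lands below it by construction.

```python
# Toy illustration of a norm-referenced cut. Scores are made up.
scores = [55, 60, 62, 70, 71, 75, 80, 84, 90, 95]  # hypothetical scaled scores

CUT_PERCENTILE = 0.30  # fail the bottom 30% of test-takers, by definition

cut_rank = int(len(scores) * CUT_PERCENTILE)
cut_score = sorted(scores)[cut_rank]          # whatever score sits at that rank
failing = [s for s in scores if s < cut_score]

# Note: if every student improved by 10 points, the cut score would simply
# move up 10 points with them, and the same share would still fail.
```

That last comment is the whole problem with using such a test as a graduation requirement.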

Why do local districts attach stakes to the Keystones? Because our school rating and our individual teacher ratings depend upon those test results.

So it is with a combination of curiosity and professional concern that I try to find real, actionable data in the Keystone results, to see if there are things I can do, compromises I can make, even insights I can glean from breaking that data down.

The short answer is no. Let me walk you through the long answer. (We're just going to stick to the ELA results here.)

The results come back to the schools from the state in the form of an enormous Excel document. It has as many rows as there are students who took the test, and the column designations run from A to FB. The file comes with a key identifying what each column contains; creating a document that you can easily read requires a lot of column hiding (the columns with the answer to "Did this student pass the test?" are BP, BQ, and BR).
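For what it's worth, the column hiding itself is easy to script. Here's a minimal sketch in Python; the filename and the choice of a student ID column are my assumptions, but the letter-to-index conversion matches how Excel labels columns, and A through FB works out to 158 columns.

```python
def col_index(letters: str) -> int:
    """Convert an Excel-style column label ('A', 'BP', 'FB') to a 0-based index."""
    n = 0
    for ch in letters.upper():
        n = n * 26 + (ord(ch) - ord("A") + 1)  # base-26, but with no zero digit
    return n - 1

# Columns run A through FB, so the export is 158 columns wide.
total_columns = col_index("FB") + 1

# Keep only an (assumed) student ID in column A plus the pass/fail columns.
keep = [col_index(c) for c in ("A", "BP", "BQ", "BR")]

# With pandas installed, the actual hiding is one line (filename hypothetical):
# import pandas as pd
# trimmed = pd.read_excel("keystone_results.xlsx").iloc[:, keep]
```

Of course, trimming 158 columns down to four readable ones is exactly the kind of work the state could have done before shipping the file.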

Many of the columns are administrivia-- did this student use braille, did the student use paper or computer, that sort of thing. But buried in the columns are raw scores and administrative scores for each section of the test. There are two "modules," and each "module" includes two anchor standards segments. The key gives an explanation of these.

I can also see raw scores broken down by multiple choice questions and "constructed" answers. The constructed answers can get a score of 1 through 10.

Annnnnnnnd that's it.

You might think that a good next step would be to look at student results broken down by question, with those questions tagged to the particular sub-standard they purport to measure. That's not happening. In fact, not only are these assessment anchors not broken down, but if you go to the listing of Pennsylvania Core Standards (because we are one of those states that totally ditched-- sorry, renamed-- Common Core), you will see that L.F.1 and the rest only sort of correspond to specific Core Standards.

You might also think that being able to see exactly what questions the students got wrong would allow me to zero in on what I need to teach more carefully or directly, but of course, I am forbidden to so much as look at any questions from the test, and if I accidentally see one, I should scrub it from my memory. Protecting the proprietary materials of the test manufacturer is more important than giving me the chance to get detailed and potentially useful student data from the results.

You'll also note that "reading for meaning" is assessed based on no more than six or seven questions (I don't know for a fact that it's one point per question, but the numbers seem about right based on student reports of test length-- not that I've ever looked at a copy of the test myself, because that would be a Terrible Ethical Violation).

So that's it. That's my actionable data. I know that my students got a score by answering some questions covering one of four broad goals. I don't know anything about those questions, and I don't know anything about my students' answers. I can compare how they do on fiction vs. non-fiction, and for what it's worth, only a small percentage shows a significant gap between the two scores. I can see if students who do well in my class do poorly on the test, or vice-versa. I can compare the results to our test prep test results and see if our test prep test is telling us anything useful (spoiler alert-- it is not).
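The fiction-versus-non-fiction comparison is about the only computation the data supports, and even that is trivial. A sketch, with made-up scores and an arbitrary cutoff for what counts as a "significant" gap:

```python
# Hypothetical (fiction, non-fiction) module raw scores for five students;
# the real values would come from the raw-score columns in the export.
scores = [(18, 17), (12, 20), (15, 15), (19, 11), (16, 18)]

GAP = 5  # arbitrary cutoff, in raw-score points, for a "significant" gap

flagged = [pair for pair in scores if abs(pair[0] - pair[1]) >= GAP]
share = len(flagged) / len(scores)  # fraction of students with a big gap
```

And that one number is roughly the full extent of the "instruction-informing" the spreadsheet allows.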

But if you are imagining that I look at test results and glean insights like "Man, my students need more work on interpreting character development through use of symbolism or imagery" or "Wow, here's a list of terms I need to cover more carefully" or "They're just now seeing how form follows function in non-fiction writing structures"-- well, that's not happening.

In the world of actionable data, the Keystones, like most of the Big Standardized Tests, are just a big fat couch potato, following a design that suggests their primary purpose is to make money for the test manufacturing company. Go ahead and make your other arguments for why we need to subject students to this annual folly, but don't use my teaching as one of your excuses, because the BS Test doesn't help me one bit.


  1. I was thinking, it's sort of like you're a mechanic, and all you have to help you make a diagnosis is that the check engine light is on, but you don't have the diagnostic readout that tells you cylinder six is misfiring or the oxygen sensor doesn't work.

  2. Same in Florida, but it's an even bigger joke in mathematics. I can look at my current students' results in the three Algebra 1 strands, but what has that to do with Geometry?

  3. I met up with school data early in my 1st year of teaching--in NY City. I was required to enter ethnicity of every student on my rolls. When I pointed out that some students had never appeared, I was told to make "an educated guess."

    Since Lance O'Brien, a student who attended every day, was African-American, I wondered how to go about such guesses based on student names.

  4. Not to mention that there is NO information about what any given student has "learned." But wow! You can COMPARE how this year's students did to last year's students...but of course, the questions probably changed, so that's not true either. Inform instruction? Yeah, right!

  5. I remember how frustrating this was when I taught in PA, though it was the biology Keystone that I had to use to "inform" my teaching. It was utterly useless but we were still required to go through the charade of "data analysis" every year to prove that we were doing our jobs based on "real data". Complete nonsense and the biggest waste of time I've ever had to pursue as a teacher (and that's saying something.)

  6. EVERY teacher who has ever tried to find something they can use from these tests could tell their own version of your story, Peter.

    My state (WI) used to require this test called the Wisconsin Knowledge and Concepts Test. Same damn thing. It was a norm-referenced test that we were supposed to use for a graduation requirement, which meant that if our kids were average, a whole pile of them would have to not graduate, without an alternative demonstration of proficiency (like your binder of doom).

    One year our administration finally got frustrated with our lousy scores and came down really heavily on my science department to do something to get the scores up. So we totally violated the ethics of BS testing and actually copied tests and sat down to analyze them. What we found was that the tests were science factoid trivia contests. They weren't testing our kids' knowledge of concepts or ability to think scientifically at all. There was no possible way to prepare them for that. So even if you do cheat the system, it doesn't help.

  7. The tests are a fraud. We all know the tests are a fraud. That's why teachers and parents are not allowed to see the test items. It's all about ripping off tax money and has nothing whatever to do with useful teaching or accountability. Our teacher-made tests and diagnostics are readily available, useful and part of the teaching profession. Let teachers do their jobs!!

  8. New York State CC ELA tests continue to use the objective MC format to ask highly subjective questions that purport to test for reading comprehension. The MC items in the test recently administered were a joke. In my opinion, as a former consultant test writer (science), most of the 2017 MC test items were neither valid nor reliable.