In my previous post, “Playing TLEopoly,” I discussed some of the silliness surrounding the current implementation of the Oklahoma Teacher and Leader Effectiveness system.
The theory of action behind the TLE model of supervision and evaluation is that the process will improve teacher effectiveness, which in turn will boost student achievement. This assumption seems logical, but what if the theory is wrong?
This year, as we begin to contemplate the enormous administrative requirements of fully complying with the multiple qualitative and quantitative components of TLE, many of you are starting to think out loud that this process makes very little sense. And you would be correct. Current practices will not make sense if you assume that the purpose of the evaluations is to actually evaluate teachers accurately and effectively.
A good evaluation system should give teachers clear and useful feedback—a snapshot of what they do well, and a plan for what and how they can improve. An effective system also provides principals and district leaders with insight into their school’s strengths and weaknesses. The national trend in teacher evaluation serves neither of these purposes.
National reformers and accountabullies have already decided that our schools are failing, and that we are failing because we are filled to the rafters with terrible, horrible, no good, very bad teachers. For that reason, we really don’t need to create evaluations to answer the question, “How are we doing?” Reformers have already made up their minds on this question. We’re failing. Therefore, what THEY need is an evaluation system that “confirms” what they already know.
The sheer deluge of new acronyms and terms brought to shore in the flotsam of TLE would be laughable if the stakes were not so high. Other Academic Measures (OAMs), Student Learning Objectives (SLOs), Student Outcome Objectives (SOOs), Value Added Models (VAMs), teachers of tested subjects, teachers of non-tested subjects, linkages, growth models—it’s all enough to make your head spin and your stomach turn.
But, back to the original question: Will any of these things actually improve teacher effectiveness?
I have written earlier about the fallacy of value-added models (“Why VAM Must Die”), but I’ll include a quick review here. VAMs are statistical tools that have now been as thoroughly discredited as Janet Barresi’s political astuteness. But some folks still insist that if we take very narrow standardized test results and run them through an incoherent number-crunching formula, the numbers we end up with represent useful, objective data. Why? Well…they’re numbers.
But they don’t. We start with standardized tests, which are not objective, and filter them through various inaccurate variable-adjusting programs (which are not objective either), leaving us with a number that is little more than crap.
It’s not just me who feels this way. Even smart mathy people tend to agree. On April 8th, the American Statistical Association (ASA) issued a cautionary statement about the use of VAMs for assessing teacher effectiveness. In the executive summary, the ASA makes the following points:
VAMs are generally based on standardized test scores, and do not directly measure potential teacher contributions toward other student outcomes.
VAMs typically measure correlation, not causation: Effects – positive or negative – attributed to a teacher may actually be caused by other factors that are not captured in the model.
Most VAM studies find that teachers account for about 1% to 14% of the variability in test scores, and that the majority of opportunities for quality improvement are found in the system-level conditions. Ranking teachers by their VAM scores can have unintended consequences that reduce quality.
Doesn’t that make you feel all warm and fuzzy about using VAM to make high-stakes decisions about teachers? Remember that these VAMs will count for 35% of next year’s evaluations for Oklahoma teachers of tested subjects, mostly math and reading teachers.
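To make the ASA’s 1%-to-14% figure concrete, here is a toy simulation. To be clear, this is my own sketch, not any state’s actual formula: the number of teachers, the variance split, and the “bottom 10%” flagging rule are all assumptions chosen for illustration. It ranks teachers two years in a row on a rating where the teacher’s stable effect explains roughly 10% of the variability and everything else is noise:

```python
import random

random.seed(1)

N_TEACHERS = 100

# Assumed numbers: each teacher's stable "true effect" has variance 1,
# while everything the model can't see (cohort makeup, test-day luck,
# roster churn) adds noise with variance 9 -- so the teacher explains
# about 10% of the variability, the middle of the ASA's 1%-14% range.
true_effect = [random.gauss(0, 1) for _ in range(N_TEACHERS)]

def vam_score(effect):
    """One year's VAM-style rating: the true effect buried in noise."""
    return effect + random.gauss(0, 3)

def bottom_decile(scores):
    """Indices of the 10 lowest-rated teachers."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return set(ranked[:len(scores) // 10])

year1 = [vam_score(e) for e in true_effect]
year2 = [vam_score(e) for e in true_effect]

repeat = bottom_decile(year1) & bottom_decile(year2)
print(f"Flagged in the bottom 10% both years: {len(repeat)} of 10")
```

Run it with different seeds and the pattern holds: most teachers flagged as “bottom 10%” one year escape the label the next—not because they improved, but because the noise rolled differently.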
So, what about these “new” Student Learning Objectives (SLOs) that school districts are currently being trained on by our own state department? This is the measure that the majority of teachers of non-tested subjects will be forced to swallow. Here is the “evidence” of SLO effectiveness shared in the OSDE’s own presentation at this summer’s Vision 2020 Conference:
Some positive correlations have been found between the quality of SLOs and student achievement and between the number of objectives met by teachers and student achievement, but mixed results point to a need for more research (Austin Independent School District, 2010; Community Training and Assistance Center, 2013).
Wow! If that’s not a powerful endorsement worthy of an investment of hundreds of thousands of dollars and untold hours in implementation, I don’t know what is! Seriously, is this meant as a joke? I can substitute almost anything for “quality of SLOs” and keep this an accurate statement. How about “phase of the moon,” “day of the week,” or “number of Lexuses in the parent pickup lane?” I submit the latter is probably the most accurate indicator in the group.
Teacher evaluation systems only work if the system actually measures what it purports to measure. The current “new” systems in place across the country, including TLE, do not do this. Linkage to student data is spectacularly weak. We start with tests that claim to measure the full breadth and quality of students’ education. They do not. Then we attempt to create a link between those test results and teacher effectiveness, and that simply hasn’t happened yet. VAM was supposed to make us fall in love with our standardized test data, but turning small amounts of bad data into heaping piles of bad data is not particularly helpful.
Again, we must remember that the goal of this system is not accuracy. Don’t be fooled. As we can observe in other states farther down this road to oblivion, this type of quantitative, test-based data is simply being used to “prove” the existence of a vast throbbing pool of teacher awfulness. When administrators’ evaluations don’t match the data spit out by the machines, the administrators are labeled lazy or ineffective. Because the state controls the development of the tests and the setting of cut scores, it has the capacity to work this data any way it wishes.
I originally said this would be a three-part series. Thinking about it now, it might be four, maybe five. Whatever it takes to get the message out to as many people as I can.
In closing, here are a few things I know to be true:
1. You do not recruit and retain great teachers by making their continued pay and employment dependent on an evaluation system that is no more reliable than a monkey with a random number generator.
2. No evaluation system will ever be administrator-proof. If your principal hates you and is out to get you, no system in the world can keep him from finding a way to game your evaluation to hurt you. On the other hand, if she’s a reputable and respectful person who is trying to do the best for her people, no system can keep her from doing so.
3. Attempting to provide more oversight will actually reduce effectiveness, because more oversight = more paperwork, and more paperwork = less time. This shifts the job from providing meaningful feedback and support to teachers to “filling out the forms correctly and on time.” Most administrators are pretty good at this.
4. The Tulsa Model, which comprises the 50% qualitative component of most teachers’ evaluations, is not objective. It is a subjective list of twenty indicators of teacher behavior. This list is itself a reflection of the bias of the people who made it, and the observer’s own biases will affect what he does or doesn’t see during any observation. There is no such thing as an objective measure of teacher quality. It does not exist. It has never existed. It will never exist. To present a system and claim that it is objective is in and of itself a demonstration of subjective biases about teaching.
To illustrate, why does the Tulsa Model use twenty indicators instead of 23 or 31? Simple: it was a subjective decision designed to keep the math easy. Why is the indicator “student relations” given the same weight as “closure”? Again, even though we all know forming positive relations with students is critical to success as a teacher, it gains no more value than summing up the daily objectives. And why do a disproportionate number of teachers in many districts, including Tulsa, earn an even 3.0 on their evaluations? Because awarding any number other than a three for any indicator on the rubric requires additional documentation. Giving threes is easy and takes little time.
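The arithmetic behind that even 3.0 is worth spelling out. A minimal sketch, assuming only what’s described above (a twenty-indicator rubric where every indicator counts equally; the specific marks below are hypothetical):

```python
# Toy illustration of an equal-weight rubric like the Tulsa Model's
# twenty-indicator list. The marks are made up for demonstration.
INDICATORS = 20

def evaluation_score(marks):
    """Simple mean: every indicator counts the same, whatever it measures."""
    assert len(marks) == INDICATORS
    return sum(marks) / INDICATORS

# The path of least resistance: a 3 on everything needs no extra
# documentation, and the average lands on exactly 3.0.
all_threes = [3] * INDICATORS
print(evaluation_score(all_threes))  # 3.0

# A single 5 on "student relations" moves the total exactly as much as
# a single 5 on "closure" would -- the equal weighting IS the judgment
# that those two things matter equally.
one_five = [3] * (INDICATORS - 1) + [5]
print(evaluation_score(one_five))  # 3.1
```

Nothing in the formula distinguishes the indicators; the decision that they are interchangeable was made before any observer ever walked into a classroom.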
The worst aspect of the rubric is that it makes no provision for teachers of widely varied subjects. A first grade teacher is evaluated with exactly the same rubric as a middle school PE teacher, a high school forensics teacher, an AP chemistry teacher, or an in-house supervisor. This would be akin to football scouts evaluating all players for the NFL using the same rubric. One size does NOT fit all!
The current evaluation model being perpetrated on Oklahoma teachers is based on a flawed theory of action and will do little to change what teachers do in their classrooms during the 99.7% of the time the principal is not there. What will drive high student achievement in the future is teacher teams working collaboratively toward common curriculum expectations and using formative assessments to continuously improve teaching and address students who are not successful.
To achieve this, we must create a system that supports this foundation. There needs to be a shift away from a process owned exclusively by school administrators and centered on the inherently time-consuming evaluation of individual teacher lessons, toward a more dynamic, informal process owned by teacher leaders and teams.
For now, you’ll have to wait and see what comes ashore with the next wave of flotsam!