Tabitha

By Tom

In my house we have two television sets. There’s a small one in the family room. Its sole purpose is to broadcast baseball games. I control that set. Then there’s the other one. The big one in the living room. My wife controls that set.

So it was when I walked into the living room the other day to eat my lunch. My wife was watching a show called Tabitha’s Salon Takeover. Although there had to be Something Better On, I was doomed to watch it. It’s a reality show featuring a famous hairdresser (Tabitha) who goes around the country fixing up failing beauty salons.

Let me explain at this point that I have absolutely no interest in hair. My personal regimen entails a daily shampooing and a weekly, self-administered trim, using a set of $17.00 clippers set to the shortest length. I treat my hair to no more attention than I give my toenails.

As you can surmise, I’m probably not among the target audience for Tabitha’s show. Nevertheless, I found myself captivated.

You see, it was really not about hair. It was about analyzing a dysfunctional system and fixing it.


I watched Tabitha streamline the check-in process by introducing a new software platform and adjusting the time allotment for each service. That part was somewhat interesting.

But then she turned her attention to the employees. And that’s when the action started. She noticed that the guy charged with cleaning the floor was lazy. She noticed that the manager wasn’t really supervising the stylists. She noticed that one young man, Drew, wasn’t shampooing correctly, which I didn’t think was even possible. When she told Drew what he was doing wrong, he cried and ran into the back room.

She pointed out these flaws to the owner, telling him that he needed to decide whether he wanted his business to take off and fly or if he wanted it to remain a mediocre place where second-tier hairdressers get their start.

By this point I was riveted. This might work in schools. We could get our own Tabithas to come in, walk through the schools and determine the degree to which teachers are using best practices.

As far as I know, there is no reliable data on haircut quality. It is, however, entirely possible to watch someone cut a client’s hair and see if he’s doing it right.

Likewise, it’s entirely possible to walk into a school, spend time watching individual teachers teach and tell whether or not they’re using best practices. In fact, it seems like someone at the district level should already be doing that: going from school to school, making thorough, in-depth observations and reporting back to the building principal.

They could take it a step further. They could film the teachers who were effective and show those lessons to the teachers who were less so. They could set up workshops for the teachers who seem to lack certain skills, taught by teachers who are strong in those skills.

And if certain teachers prove to be both untrained and un-trainable (like Drew) they should be dismissed. Let go. Fired.

Teachers might not like this. They might not trust the observers. They might not trust the whole process.

Fair enough. But given the choice, I’d much rather be evaluated by an objective person, using a set of commonly accepted best practices, than be judged based on my students’ test scores. I can control what I do in my classroom. I can decide whether or not to begin today’s math lesson with a review of yesterday’s lesson. I can decide whether or not to address the behavior of the kid in the back who’s off-task. I can decide whether and how to use formative assessments and how to report feedback to students and parents. That stuff is completely under my control. Student test scores aren’t.

Test results are random, but teachers' classroom performance can be controlled. By teachers. We have research that identifies best education practices. We should be held accountable for using them. And there should be an objective way of determining whether we are or aren’t.

There should be a Tabitha.

 

9 thoughts on “Tabitha”

  1. Tom

    Great discussion!
    A few comments:
    1. Getting back to Tabitha, what makes that style of evaluation powerful is that it focuses holistically on the entire establishment. Each stylist was observed, but the focus of the evaluation was on the entire salon and the role each person played in its mission. That seems to be lacking in our evaluation system, at least where I work. And Jason, if VAM can achieve that, bring it on.
    2. The focus of any evaluation system needs to be on improvement, not termination. As much as we try to deny it, ours is a profession that loathes firing one another.
    3. Student tests in Washington were designed to test students. They do a fair job of that. They’ve since found a side job as teacher tests. They don’t do so well at that.

  2. Kristin

    Wow – it took me a long time to get through the comments. Great challenges on both sides.
    I love the idea of a Tabitha striding around my building and whipping everyone into shape.
    In fact, I kind of have one. She’s our executive director, and she’s my principal’s boss. Her background is in charter schools, and whatever else you’ll say against them, the ones that perform well have teachers who do not make excuses, who believe every child can learn and succeed, and who are willing to work really, really hard.
    Since she’s been popping in and out of classrooms, my principal in her wake, things have started to change. I believe my principal is actually beginning the difficult process of firing some terrible teachers. He’s starting to step up and do his job. We’re actually having staff meetings, and he’s actually telling us to teach well.
    Putting pressure on the administration. Walking into classrooms, taking copious notes on what teachers are doing, and starting to put some pressure on classroom instruction. What a concept.
    And I fully believe it will raise our test scores.

  3. Jen R

    Tom-
    Your blog post took me back to my days in a Reading First school, where we had a consultant come through a few times a year and walk through classrooms and provide structured feedback for teachers. As a teacher, honestly I actually looked forward to the feedback. I had no relationship with this person, so her feedback was completely honest. I was a good teacher. But I wanted to be better. Her feedback pushed my thinking and improved my teaching.
    I agree that we as teachers need “real” feedback that will impact our teaching and student learning. Some teachers may not be ready for it, but our kids need it.
    Thanks for your post.

  4. Mark

    I don’t think everyone on this blog is opposed to quantitative assessment. What many of us are opposed to is the manner in which that quantitative assessment is presently being employed… it lacks all the nuance that you describe and thus cannot honestly be an “impact” assessment as you describe because there is no baseline established within the teacher’s classroom. In my case, the only “baseline” score would be a child’s 7th grade reading and writing scores to compare with their 10th grade reading and writing scores (when they are in my class). A heck of a lot can (or might not) happen between 7th and 10th grade that is unrelated to the impact I have on the child’s learning from September to March. Specifically, the current test model lacks the value-added piece. Currently, in the state of Washington, all assessments are single snapshot assessments, not impact-over-time value-added style assessments.
    I have no problem if my performance is partly assessed by my students’ test data, provided that there is a context for this test data and the opportunity for me to show longitudinal growth (on my students’ part) rather than a single test in March.

  5. Jason

    “Jason, I have a hard time understanding your position (whether you are opposed to observer-based assessment), but I certainly see that any time anyone criticizes testing, you are a willing advocate for the pro-testing side…”
    I am strongly pro-observation. I’ve never met anyone who isn’t. I advocate for testing here mostly because the group that writes on this webpage is practically hostile to quantitative measurement. I think we need both quantitative and qualitative, process and outcome information in evaluation and improvement plans.
    “it is good to hear from both sides, as I lean toward the anti-test side of the fence. However, you make these two criticisms of the observer-based assessment of teacher effectiveness:
    “Different material may excite one set of students but put another set to sleep by little fault of your own. … The quality of your students’ lives outside of your classroom will affect their behavior within the classroom on any given day.”
    Couldn’t the same be said about testing as a measure of teacher effectiveness? Probably every teacher in a tested discipline has a story of a kid who showed up the day of the test sick, on an empty stomach, or having just had some trauma in their lives.”
    Absolutely. The point I was making is that one of the most commonly pointed out flaws with “testing” is also a flaw of observations. This is all a component of measurement error (which, by the way, is quantifiable on tests).
    “The other issue I see with testing is that it actually doesn’t assess what I did with my students. Here’s what I mean: the Reading and Writing tests are skills tests, not content tests. They assess my students’ ability to read a passage and answer questions and to write two short essays. These are not skills exclusive to the English classroom–History, Science, Art, even Math all require students to read and write.”
    I’m firmly convinced reading (beyond the earliest years… but I’m starting to be convinced that even then this applies) cannot be described as a discrete skill which is transferable. So you won’t hear me advocate for the type of tests many states have when it comes to ELA. They measure something, and they do measure something which can and should happen across multiple disciplines, but I’d be the first to say we need far better tests for ELA.
    “There are some students who enter my classroom as good writers because their SCIENCE teacher demanded high level thinking in their writing. Should I be assessed as a better teacher than I am because another previous teacher had high standards?”
    If it was a previous teacher, we can control for that. If it’s concurrent, it’s much harder. There is no doubt that the “teacher of record” problem is real. I don’t have a good answer for this one– not many do. Other than hoping and advocating that curriculum should inform our course structure which should inform our assessments, I think this is a complex issue. That’s why in a fair environment a portion of writing should be something a science teacher is held accountable for. Goes both ways.
    “What about those kids who were tremendously well prepared by a solid 9th grade English teacher, only to get to lazy me who assigns one paragraph every semester and shows films all day–and then the kids do well on the test because of what they did last year not what they learned from me, the awful teacher who ends up with a stellar group of kids (honors, AP, etc.) who excel in spite of, not because of, their teacher. This is more common than people want to believe… I’ve seen it first hand in at least two situations.”
    These are some of the more trivial and less complex things to control for with value-added. Prior achievement, and even prior growth, can be accommodated so that we have a priori expectations of where students at their level with their achievement level typically end up at the end of the year. This is one reason why value-added is far more complex and far more fair than simply taking the difference of year 2 and year 1.
    “Testing is ONE measure, I agree. However, I will never be satisfied with a teacher-evaluation system which does not include frequent, deep observation of actual practice.”
    Agreed.
    “Test results may be all that the powers care about,”
    Strongly disagree.
    “but the reality is that these scores are a shallow and ineffective measure of actual teacher effectiveness. Let’s all think back to our assessment design classes from grad school: the assessment should only be credited for measuring what it is intended to measure. If the test is to assess a student’s learning at that snapshot point in their academic career…then that is ALL it should be assessing.”
    1) I made the point earlier we need to structure our assessments better to monitor growth, so we have agreement there.
    2) You’re confusing assessment and impact evaluation. It’s common that these two bodies of work are compared but they are different. This is not the equivalent of asking a “double barreled” question. Student learning is the desired outcome and that’s what the test is measuring. Using student learning as an outcome in impact evaluation is not changing the use of the test– it’s still solely being used to measure student outcomes. We’re just able to then start to parse out the mechanics of what leads to those outcomes.
    “Those assessments were not designed to measure teacher effectiveness… that would be like measuring doctor effectiveness by tracking my weight gain, despite what my doc keeps telling me to do… that scale isn’t measuring my doc’s effectiveness, it is measuring my weight and my weight only.”
    I hear this complaint all the time and I don’t understand it at all from a technical standpoint. I would say that it’s far more like assessing your personal trainer based upon his ability to lower the BMI of 30-150 of his clients (depending on the “level”) as compared to several tens of thousands of other personal trainers who worked with their clients for the same amount of time, while taking into account many factors that their clients bring with them (income, prior BMI, prior changes in BMI, gender, race, etc).
    BMI is not designed to measure a personal trainer’s success specifically, but it is one of the most important and comprehensive, snap shot measures of overall health and should be the top goal of nearly everyone who goes to a personal trainer looking to increase health. An outcome does not have to be designed specifically for the a posteriori impact evaluation.
    In fact, to suggest that’s the case is somewhat bonkers to me.
    Where I am consistent with your belief is that if we’re gonna start using BMI for important decisions, we may want to use a better method for determining BMI. Right now we may use those handheld, electrical-current devices. A full water tank is too expensive and impractical on a wide scale and the increase in precision isn’t worth it compared with implementing caliper tests which are much more precise and relatively inexpensive.

  6. Mark

    Jason, I have a hard time understanding your position (whether you are opposed to observer-based assessment), but I certainly see that any time anyone criticizes testing, you are a willing advocate for the pro-testing side… it is good to hear from both sides, as I lean toward the anti-test side of the fence. However, you make these two criticisms of the observer-based assessment of teacher effectiveness:
    “Different material may excite one set of students but put another set to sleep by little fault of your own. … The quality of your students’ lives outside of your classroom will affect their behavior within the classroom on any given day.”
    Couldn’t the same be said about testing as a measure of teacher effectiveness? Probably every teacher in a tested discipline has a story of a kid who showed up the day of the test sick, on an empty stomach, or having just had some trauma in their lives.
    The other issue I see with testing is that it actually doesn’t assess what I did with my students. Here’s what I mean: the Reading and Writing tests are skills tests, not content tests. They assess my students’ ability to read a passage and answer questions and to write two short essays. These are not skills exclusive to the English classroom–History, Science, Art, even Math all require students to read and write. There are some students who enter my classroom as good writers because their SCIENCE teacher demanded high level thinking in their writing. Should I be assessed as a better teacher than I am because another previous teacher had high standards?
    What about those kids who were tremendously well prepared by a solid 9th grade English teacher, only to get to lazy me who assigns one paragraph every semester and shows films all day–and then the kids do well on the test because of what they did last year not what they learned from me, the awful teacher who ends up with a stellar group of kids (honors, AP, etc.) who excel in spite of, not because of, their teacher. This is more common than people want to believe… I’ve seen it first hand in at least two situations.
    Testing is ONE measure, I agree. However, I will never be satisfied with a teacher-evaluation system which does not include frequent, deep observation of actual practice. Test results may be all that the powers care about, but the reality is that these scores are a shallow and ineffective measure of actual teacher effectiveness. Let’s all think back to our assessment design classes from grad school: the assessment should only be credited for measuring what it is intended to measure. If the test is to assess a student’s learning at that snapshot point in their academic career…then that is ALL it should be assessing. Those assessments were not designed to measure teacher effectiveness… that would be like measuring doctor effectiveness by tracking my weight gain, despite what my doc keeps telling me to do… that scale isn’t measuring my doc’s effectiveness, it is measuring my weight and my weight only.

  7. Jason

    Of course lots of people can’t use test data because there is no test data– why even make that argument?
    Of course if you teach 3rd grade and that’s the first test you don’t fall under a group for which they could currently assign a value-added score.
    If your value-added measure was so insensitive to clear differences in the baseline ability of your classes changing over time that your value-added fluctuated for that reason alone, then you face a technical problem of a poorly specified VAM. You know this because you admit that your own class has no baseline and you would never be assigned a value-added. So why make the case that you can’t control for prior ability when it’s clearly possible?
    The results are not random– if the teacher matters at all then you are responsible for some component of how much your students learn each year.
    There is a difference between containing error and randomness. Right now, for many folks, value-added has a low signal to noise ratio. It takes too many years to get data which is clearly signal and not noise. This is a very common problem, but it is also one with many solutions.
    One of the best solutions is to use additional data which provides an answer to the same question through a completely different means. The observations you’re talking about are one of the major ways we do this. Observations themselves are not occurring under controlled conditions. Your more advanced class of students also adjusts how you teach (if you’re a good teacher, and I suspect you are) just as it adjusts the test results that come back at the end of the year. Different days affect what techniques you use. Different material may excite one set of students but put another set to sleep by little fault of your own. Different observers will see different things in your classroom. The quality of your students’ lives outside of your classroom will affect their behavior within the classroom on any given day.
    While you may feel more in control of what you do than the results you get, nearly everything that affects your outcomes will undoubtedly affect your process. The beauty of observations is they are more desirable to repeat than testing for value-added (at least right now). But the beauty of both is that they can reinforce each other so that the signal to noise ratio is increased!
    There are lots of issues with value-added, not the least of which is that it’s not widely applicable to all teachers. Some of these problems can be worked through by using tests that are designed a lot better than the current ones for the purpose of evaluating growth (one of the major problems as I see it). Another is through continued research to refine and better specify value-added modeling. The use of multiple assessments that already exist within the structure of the school can add signal and downplay noise. A meaningful system of observations can add quite a bit of information to the picture and should be at the center of any strong evaluation system. And of course, the best technique is to use multiple years of all these kinds of data. That’s one of the reasons I can’t imagine giving teacher tenure (if it should even exist) in fewer than 5 years. Even without a formalized data structure with stuff like value-added, I find it exceedingly hard to believe that you can tell if a teacher deserves lifetime appointment based on two or three years of success or failure.
    One final thing to gnaw on: Great Clips is in business because they successfully cut hair at a level perceived to be valued at least the amount they charge. If that wasn’t the case, they’d be closed. But of course, I’ll remember to strongly caution myself whenever using someone who is up against a very different set of market-pressures than a public good. If my parents could pay fewer taxes than they did for schooling by choosing somewhere that did the same or similar job for less money they probably would have. I suspect some folks would still pay lots for boutique services for various reasons– just look at how many private schools there are in a place like Connecticut where rich white kids do amazingly well in public schools but folks will still increase their school bill 4x to go somewhere to get a product that’s 90% the same (maybe even 95%).

  8. Tom

    Let me clarify:
    I would define “best practices” as those teacher behaviors which, when tested under controlled conditions, result in increased learning.
    When I teach, conditions are not controlled. The third graders I have this year are far more advanced than last year’s class. Two years ago, my class was far higher. If I were measured by student test scores, my evaluation would have fluctuated wildly, although my teaching performance, what I actually did in the classroom, stayed about the same. Results, Jason, are random.
    And don’t give me “value-added.” In Washington, we don’t start testing until third grade. My students’ test scores are the baseline.
    Another reason why test scores are a lame way to measure teacher performance is the fact that over half of the teachers don’t even teach students or subjects that get tested. Yet they could easily be measured by the extent to which they used those practices that were shown by research to yield elevated student learning.
    Something else to gnaw on: my wife recently took the kids to Great Clips, the cheapest, walk-in hair salon we have out here. Their hair looked just as good as, if not better than, it did after the $30 fancy-schmancy, over-booked salon they went to last time. Their hair looked good because it was cut correctly.

  9. Jason

    You do realize that pretty much everything you mentioned is involved in pretty much every serious incarnation of teacher evaluation I’ve ever read, except that those evaluations will also look at what happens in the end to your kids?
    Because the truth is if, “Test results are random”, then why teach? Why are some things “best-practices” other than they seem to make sense if they don’t result in systematically deeper learning?
    That’s not to say that the test results are completely indicative of your success in the classroom– that’s why all the other things you mention should be a part of teacher evaluation. But it is to say that the test results are not PURELY random, which is what ignoring them completely would suggest.
    I don’t know anything about the Washington State tests, nor do I know what you do in the classroom. But I find it hard to believe that the test is constructed in such a way that a student who knows what they’re supposed to have learned in a particular course is still unlikely to do well on that exam. I also find it hard to believe that you don’t construct your own formalized assessments in the classroom, some of which take the same form as the test, and that these assessments would have looked similar to the state tests even if the state test didn’t exist.
    Tabitha is precisely what teacher evaluation is supposed to be about. And while there is subjectivity in determining what makes a good hair cut, there is little doubt that we can make some good measurements about quality. It’s pretty unlikely that the hairdresser who is most often requested and is always booked is worse at cutting hair than the folks who almost exclusively do walk-ins because they’re rarely requested. It’s pretty unlikely that a salon which charges the same amount as another salon but receives 2 times as much business is producing worse haircuts. It’s pretty unlikely that we would say a salon which has twice as much annual profit is worse.
    Everything has its place.
