After I read Tom's pungent post about the bill that proposes to change Reduction in Force criteria from seniority to evaluation, I spent Friday at a workshop sponsored by ESD 114 and WEA-Olympic Uniserv, thinking about how to create the new evaluation system (mandated by last year's SB 6696 and required to be in place statewide by 2013-2014) that will put a number on an evaluation. Tom's post sparked a lively debate, but to debate seniority vs. effectiveness we first have to agree that we can even come up with a number from a teacher's evaluation. That is a huge assumption, and it requires some critical thinking. We also have to think about what we want the purpose of an evaluation system to be. If we want the system to promote growth, rather than just verify compliance with state law, will applying numbers help?
First, here are the key changes in teacher evaluation:
Old Teacher Evaluation Categories
- Instructional Skill
- Classroom Management
- Professional Preparation and Scholarship
- Effort Toward Improvement When Needed
- Handling of Student Discipline and Attendant Problems
- Interest in Teaching Pupils
- Knowledge of Subject Matter
Proposed Teacher Evaluation Categories
- Centering instruction on high expectations for student achievement
- Demonstrating effective teaching practices
- Recognizing individual student learning needs and developing strategies to address those needs
- Providing clear and intentional focus on subject matter content and curriculum
- Fostering and managing a safe, positive learning environment
- Using multiple student data elements to modify instruction and improve student learning
- Communicating with parents and school community
- Exhibiting collaborative and collegial practices focused on improving instructional practice and student learning
Each of the new categories must be judged on a four-tier system, which may or may not end up resembling the scale used to award National Board Certification (that's a local decision). To become National Board Certified, a candidate must reach an aggregate score of 275 (the cut line) when the four entries and six assessments are weighted and combined. Each entry and assessment is scored on a scale of 1 to 4.25. The first three entries are worth 16% each, entry 4 is worth 12%, and each of the six assessments is worth 6.67%.
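To see how those weights combine, here is a minimal back-of-the-envelope sketch in Python. Only the weights and the 1-to-4.25 scale come from the paragraph above; the individual scores are made up, and the rescaling that turns a weighted result into the 275-point cut score isn't covered here, so the sketch stops at the weighted average.

```python
# A back-of-the-envelope check of the NBPTS weighting described above.
# Only the weights and the 1-4.25 scale come from the post; the scores are hypothetical.
weights = {"entry_1": 0.16, "entry_2": 0.16, "entry_3": 0.16, "entry_4": 0.12}
weights.update({f"assessment_{i}": 0.0667 for i in range(1, 7)})

print(f"weights sum to {sum(weights.values()):.4f}")    # ~1.0002, i.e. about 100%

scores = {name: 3.0 for name in weights}                # hypothetical scores on the 1-4.25 scale
scores["entry_1"] = 4.25                                # say one entry is very strong

weighted_average = sum(weights[n] * scores[n] for n in weights)
print(f"weighted average: {weighted_average:.2f} on the 1-4.25 scale")   # 3.20
```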
Under the current law (the Legislature has a way of creating moving targets), local districts can determine their own cut line and their own weighting of the categories. Some districts may decide certain categories are more important than others; others may weight all eight equally at 12.5% apiece. The district and the local association will bargain the overall number a teacher must achieve to avoid probation. Teachers with evaluations below that number will be put on a plan of improvement, which can lead to non-renewal of their contract.
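And here is a minimal sketch of the arithmetic a district scheme like that implies, assuming the equal 12.5% weighting mentioned above. The shorthand category names, the scores, and the 2.5 cut line are all hypothetical, since the real weights and cut line would be bargained locally.

```python
# Hypothetical district scheme: eight categories scored on the 1-4 tier scale,
# equal 12.5% weights, and a locally bargained cut line (the 2.5 here is made up).
CATEGORIES = [
    "high_expectations", "effective_practices", "individual_needs", "content_focus",
    "learning_environment", "student_data", "communication", "collaboration",
]
WEIGHTS = {c: 1 / len(CATEGORIES) for c in CATEGORIES}   # 12.5% each
CUT_LINE = 2.5                                           # hypothetical bargained number

def aggregate(scores: dict[str, float]) -> float:
    """Weighted aggregate of the eight category scores (each on the 1-4 scale)."""
    return sum(WEIGHTS[c] * scores[c] for c in CATEGORIES)

def triggers_improvement_plan(scores: dict[str, float]) -> bool:
    """Under an aggregate-only rule, only the overall number matters."""
    return aggregate(scores) < CUT_LINE
```

Whatever numbers get bargained, the structure is the same: the whole evaluation collapses into one comparison against a cut line.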
But once your evaluation has a number, it could be put to other uses. I may be paranoid, but when I heard President Obama say he wanted to reward good teachers, I'm pretty sure he was thinking of merit pay. An evaluation number could be used for that. Or for Reduction in Force criteria. Like my friend Vince says: It's never about what it's about.
One problem is that under the current system, in my district at least, a teacher who is unsatisfactory in one area can be placed on probation. Let's say a teacher is pretty competent at everything but handling of student discipline. That teacher would get a plan of improvement and 60 days of probation. Under the new system it might be possible to get a 1 in fostering a safe learning environment but 3s and 4s in the other categories. The overall aggregate number might be above the cut line, but there should still be an intervention to help that teacher.
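To put made-up numbers on that scenario, using the equal weighting and the hypothetical 2.5 cut line from the sketch above:

```python
# One category at 1, the other seven averaging about 3.5 (a mix of 3s and 4s).
scores = [1.0] + [3.5] * 7
aggregate = sum(scores) / len(scores)    # equal weights, so just a simple average
print(aggregate)                         # 3.1875 -- well above a hypothetical 2.5 cut line,
                                         # so nothing in the aggregate flags the problem area
```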
Another huge problem is consistency among administrator evaluators. When different evaluators are assigning numbers to the categories, the rubrics will have to be very well written, and using them will take practice. I haven't noticed that the administrators in my building have a lot of extra time for training to become better evaluators.
And my biggest fear? It's number 6: Using multiple student data elements to modify instruction and improve student learning. To be competitive in our Race to the Top application, our state had to add student test scores as a component of the evaluation instrument. That is as far as they went last year. But it can easily be changed to: Student growth as evidenced by state test scores.
I think the evaluation system should be better. It should create and foster growth and opportunities for risk-taking. It should not just be a compliance check. But I don't see how trying to put numbers into the system helps.
The satisfactory/unsatisfactory form is easily criticized, but it's also easily understood. We know good teaching when we see it, and when we don't. Numbers won't help that; they may make it worse instead of better. Evaluation should be constantly formative, not summative. What can I do with a number in June? Give me feedback in October, so I can do better this year.
Every school district will need to have the new evaluation system in place by September 2013. Are you thinking about what you want yours to look like? Because it's like the future: if you don't work for the one you want, you'll get the one you deserve.
NBPTS’ score is an average of three evaluators. If the stakes of evaluations are increasing (merit, RIFs, etc.) then the variability of evaluations among administrators is problematic. I’d appreciate seeing a teacher evaluator, a principal, and a content area specialist all average their scores to determine my evaluation.
Similar to other education initiatives, 6696 has some very good intentions, but the potential for misuse, the training that will be needed, and the difficulty of proper implementation could knock it off what might otherwise be a good direction.
You had me until here:
“The satisfactory/unsatisfactory form is easily criticized, but it’s also easily understood. We know good teaching when we see it, and when we don’t. Numbers won’t help that; they may make it worse instead of better. Evaluation should be constantly formative, not summative. What can I do with a number in June? Give me feedback in October, so I can do better this year.”
An easy-to-understand rating is definitely important, but a binary rating is the easiest rating with the least information. Surely there’s a better balance somewhere on the spectrum between “unintelligible” and “contains no information whatsoever.”
“We know good teaching when we see it, and when we don’t.” The percentage of evaluations that end up with a “satisfactory” rating in this country makes this a moot point. Even if it were true that there is simply “good teaching” and “bad teaching” and this is all a simple sniff test (I would disagree), the truth is that under nearly all the evaluation systems across the country, almost no one receives an unsatisfactory rating. Seeing it and writing it in an evaluation are two different things.
“Numbers…”
Again, it’s hard for me to accept the idea that a simple, binary, overall ranking is useful and that a more complex ranking is likely to be worse. Your own example of a teacher who has needs in student discipline but does well in all other areas demonstrates that increased complexity is required for true formative, actionable feedback.
“What can I do…”
A few things: first, the end of the year is when management can make personnel decisions without disrupting students’ education. Second, it will take multiple observations to provide good, honest, and accurate feedback. Third, I don’t understand why the gap between June and September can’t be spent attempting to improve. It sounds to me like the time when you’ll have the most opportunity to self-assess, plan, seek out further education and training, and so on.
I just have to add that, when I leave comments peppered with capitalization and punctuation errors, it means I’m on my husband’s iPad. I hate it. If you’ve been feeling sad you don’t have one, be happy instead. It’s easier to write on my phone.
I did get a number, but that number was connected to a specific skill. Being “satisfactory” twice a year for fifteen years tells me not very much. I equate “satisfactory” with “Congratulations! You haven’t sold drugs from your room, slept with, or killed a student!”
I agree that you can’t quantify teaching entirely, but you can do a lot better than satisfactory. What I like about the detailed rubric for classroom instruction Seattle has is that it really spells out good teaching. Even reading it made me change my practice.
Brian, your story about the STPs is funny and sad. If we don’t identify the superb teachers, no one has the credibility necessary to have that hard conversation over the copier. We have failed to be courageous, out of fear of being arrogant, and we haven’t done enough to expect good teaching from each other.
“I think qualitative scoring with narrative feedback is better than pretending that we can quantify teaching practice.”
EXACTLY.
Kristin, I had the advantage of bargaining our evaluation form (with a really smart principal, Tim Madden), and we created four tiers 15 years ago. We called them: Unsatisfactory; Needs Improvement; Meets Professional Expectations (MPE); and Superior Teaching Performance (STP). I thought teachers would appreciate getting some acknowledgement beyond just satisfactory, like you said. But some administrators were reluctant to give an STP, and teachers got their feelings hurt. Teachers started asking each other how many STPs they had received. (Growing old is mandatory; growing up is optional.)
We worked with the administrators, and eventually we arrived at a system that gave teachers feedback and accolades for what they were doing well. I think qualitative scoring with narrative feedback is better than pretending that we can quantify teaching practice.
And about the “detailed scoring” you got from the NBPTS: did you really get anything but a number?
The criteria overlap, so the rubric must be written carefully. Doing one thing well should not earn a teacher credit twice, and doing one thing poorly should not count against a teacher twice.
I also think the entire system will rely on two pieces: 1) a professional development system based entirely on the evaluation model and tools, and 2) consistent, reliable training so that evaluators assess the same way. A teacher must be able to take a series of classes/trainings to help move up the rating system, and a teacher should not receive two widely varying scores from the same observation.
Plus, we may have to move beyond the notion that a classroom observation is the end-all, be-all in teacher evaluation. Why not include the plan book, the assignments given, the collaborative nature of the teacher, the teacher as leader, and so on?
I like the evaluation criteria; you can’t argue against any of those bullet points. The old ones aren’t bad either, for that matter.
But more important than WHAT is evaluated is HOW it’s evaluated. Right now my administrator barely has time to visit three or four times a year. I think he thinks I’m doing all right, but based on what?
Kristin has a good point; maybe we should be evaluating each other.
I too like the idea of feedback in October.
I don’t think principals should do all the evaluating. There are other resources.
The regional executive director who is overseeing schools in my area is amazing. Everything she says about education indicates she was an excellent teacher and a highly capable principal. It doesn’t take her long to examine a classroom and pinpoint what needs to be done. So maybe we need to speak up and ask that people like her, or qualified teacher-leaders (like my department head, whom I really respect), be given a role in evaluating?
I think the struggle to assign numbers is about having a more detailed evaluation than the old thumbs-up/thumbs-down. I used to be really frustrated that “satisfactory” was as good as you could do; I wanted to be better than satisfactory, but there was no “exceptional” target to shoot for. The best part about getting my National Board certification was the detailed scoring: where was I strong, and where was I weak? I liked that.
Overall, I like the direction things are going. I like a detailed instructional rubric.