
What is the model of evaluating teaching effectiveness?
I sometimes like to think about how my employers evaluate my teaching "effectiveness". Before I say anything, I must put a disclaimer first: what follows is not a view of my current employment. Rather, it is based on my observations and reflections from a long involvement with engineering education. I also want readers to take this in the "pitfalls to avoid" spirit. Back to the post.
What is their model, I ask myself? Surely they cannot rely on the exam performance of thirty to forty students; statistically that makes little sense. My academic supervisor would shout (and laugh) at me if I were to build a model on such a small sample. The sample is simply too small given everything that affects how students perform on an exam on a given day. Just look at Facebook: to change a word, a colour or the font of a heading they consider thousands of variables, constantly feeding data back into the algorithm and fine-tuning the model on user feedback and experience. It takes a massive computer to change a text colour from "blue" to "red". Now think about the challenge of building an "effectiveness" evaluation model that also accounts for variation in the socio-economic background of the students. It has to, because every semester a teacher teaches a different group of students. And that is only one of the many variables I would have to put into such a model. Suffice it to say, the task is far from trivial.
Then there is the question of sample size. If I were teaching a class of one lakh students, one could make a case for a basic statistical model with random sampling. Even then, a feedback loop would be missing to tell how far off-track the model is. So how can my employers ever know whether my teaching is "effective"? What is their model? Where is their feedback loop? I personally think decision makers would benefit from enrolling in a statistics course. It is easy to be swayed by opinions when the facts say something different. I have also seen many people in decision-making positions who are unaware of how poor their evaluations and generalizations really are. Judging a teacher's "effectiveness" from the poor exam performance of thirty or so students is a clear demonstration of a weak grasp of statistics. On a side note, could such a decision be overturned in court by putting forward a "poor use of statistics" argument? If not, how could that be ensured in the future? Food for thought for the legal minds out there.
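To make the sample-size point concrete, here is a minimal simulation sketch. All the numbers in it (the "true" class average, the spread, the class size) are my own hypothetical assumptions, not data from any institution; the point is only to show how much a thirty-five-student class average can drift from semester to semester purely by chance, even when the teaching never changes.

```python
import random

# Hypothetical numbers: simulate the average exam score of a 35-student
# class over several semesters, with the "teaching effect" held constant.
random.seed(1)

TRUE_MEAN = 60      # assumed underlying effect of the teaching, out of 100
STUDENT_SD = 15     # assumed spread from background, preparation, exam-day luck
CLASS_SIZE = 35
SEMESTERS = 10

for semester in range(1, SEMESTERS + 1):
    scores = [random.gauss(TRUE_MEAN, STUDENT_SD) for _ in range(CLASS_SIZE)]
    class_mean = sum(scores) / CLASS_SIZE
    print(f"Semester {semester:2d}: class average = {class_mean:5.1f}")

# The averages wander several points above and below 60. With a standard
# error of roughly STUDENT_SD / sqrt(CLASS_SIZE) ~ 2.5, a five-point "drop"
# between semesters says almost nothing about the teacher.
```

Running this a few times makes it obvious why a single semester's class average is a noisy signal of anything, let alone of "effectiveness".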
All this might sound silly to some, but I assure you it happens far too often, and not just in Nepal. Teachers are praised (perhaps even rewarded with bonuses) if most students do well in the exams of their subject, and are asked to "teach better" otherwise. Such evaluation schemes can easily lead (and have led) to artificial inflation of scores. They incentivize "behaviour modification" rather than promoting "effective teaching". For example, if I know that my performance appraisal weighs students' marks in internal exams heavily, I will simply start giving high marks to all students.
I have just presented an example where a poor model led to a poor evaluation of teaching effectiveness. The model can be improved by asking students, the HoD and the principal to rate the teacher against objective criteria. They might also assess the effort put into lectures, assignments and practicals, from both a pedagogy and a learning-goals perspective. Many other considerations would be needed for a fair assessment. There are things we humans can learn from good computer-based modelling practices. Performance assessment should rest on continuous evaluation, much like the constant, repeated testing of computer models; that puts in place a mechanism akin to fine-tuning. A continuous feedback loop matters because an imperfect assessment model can get an employee wrongly fired. The stakes are far higher than a shopping website recommending the wrong item, yet the recommendation engine quickly learns from its mistakes. "Teaching effectiveness" evaluation needs an equally robust approach.
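As a thought experiment, here is a small sketch of what such a continuous, multi-rater evaluation loop might look like. The rater groups, weights, rating scale and the minimum number of review cycles are all illustrative assumptions of mine, not a description of any real appraisal system; the only point it demonstrates is refusing to draw conclusions before enough evidence has accumulated.

```python
from dataclasses import dataclass, field
from statistics import mean

# Illustrative assumptions only: three rater groups, fixed weights,
# ratings on a 0-10 scale, and a minimum number of review cycles
# before any verdict is offered.
WEIGHTS = {"students": 0.5, "hod": 0.25, "principal": 0.25}
MIN_CYCLES = 4  # don't judge on fewer than four review cycles

@dataclass
class TeacherRecord:
    cycles: list = field(default_factory=list)  # one combined score per review cycle

    def add_cycle(self, ratings: dict) -> None:
        """Combine one cycle's ratings from all groups into a single weighted score."""
        combined = sum(WEIGHTS[group] * mean(scores)
                       for group, scores in ratings.items())
        self.cycles.append(combined)

    def verdict(self) -> str:
        if len(self.cycles) < MIN_CYCLES:
            return "insufficient evidence - keep collecting feedback"
        return f"running average over {len(self.cycles)} cycles: {mean(self.cycles):.1f}/10"

record = TeacherRecord()
record.add_cycle({"students": [7, 8, 6, 9], "hod": [7], "principal": [8]})
print(record.verdict())  # -> insufficient evidence - keep collecting feedback
```

Even a toy loop like this embodies the two things missing from the scheme I criticized above: multiple sources of feedback, and an explicit check that there is enough data before anyone's job is put on the line.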