Friday, November 28, 2014

8 Years Ago: Student Evaluations.

One of the 5 most popular links in RYS History:

Today I woke up with a knot in my stomach, and I was out of sorts all day. I was giving my students the evaluation instrument my college uses. As soon as the large white envelope came out of my bag the students started their energetic twittering. I even heard the same comments I always hear: "Yeah, now we get to give the grades," etc.

I always read the preamble that my college gives me to read, about anonymity, about how grateful we all are to gather comments. How we're eager to find ways in which to teach the courses better. There's even a line that reads, "Your instructor welcomes your criticism."

And of course it's all complete bullshit.



  1. Evaluations were, for me, a complete waste of time. Students used them to say things about and to me that they didn't have the courage to say to my face. Although they were supposed to be anonymous, I often quickly figured out who wrote what about me in the comments.

    As a way of improving teaching, they were ineffective. I remember having some difficulties with an administrator about my teaching method, and an associate dean sat in on some of my lectures. He quickly spotted certain things that I wasn't aware of and which certainly were never mentioned in the evaluations.

    The real value evaluations had was for administrators. They used them to gather evidence against someone they didn't like, and my department head and the assistant DH were only too willing to use them for that purpose against me. I found that I could present my material effectively, grade and return assignments and exams quickly, make all sorts of useful comments, and maintain discipline, and this showed up in the evaluations. But let there be something trivial, such as my not being overjoyed to see the kiddies each time, and I'd end up in deep you-know-what.

    As for the evaluations being confidential and only between students and instructors, that was also malarkey. Somehow, my DH and ADH always knew what was said about me, even though those two administrators didn't conduct their own surveys (which they had the right to do, of course). I long suspected that either they snooped through the envelope or someone in the IT department, who processed them, leaked that information.

  2. There's a paper that compares teacher evaluations with student grades and student outcomes in subsequent courses. It was based on research done at one of the military academies, if my memory serves. The data indicated that students graded their teachers based on their expected grade in the class. Also, students of teachers who got the better evaluations fared worse in subsequent classes. I'll try to dig up that paper, but maybe from the above description someone can beat me to it; I may even have been introduced to it on this blog.

    So, my recollection of that paper's conclusion reinforces my view of the worth of 'customer' opinion in the educational setting, which was cogently summarized in this comic strip [link]. I encountered the strip when it originally ran in my local urban weekly alternative newspaper, whereupon I promptly cut it out and mounted it on the wall of my lab. (To avoid antagonizing the 'customers', I didn't display it in my office, where I ran extra help sessions.) The timing couldn't have been more perfect: my research collaborator had just dismissed a student from his office after refusing to accept an alternative answer on a multiple-choice question, and I had just suffered the indignity of one of my questions being dropped from a (different) test because 2/3 of the class had bombed it and the course coordinator didn't want to endure the complaints.

    When my lab moved, I lost the strip, but fortunately it lived on in cyberspace. When I was trying to find it today, I first found this one [link], which amused. I've got the 2004 one bookmarked now.

    1. I suspect this is the study you're remembering (conducted in West Point math classes): [link].

      From the abstract:

      In the postsecondary environment, scores on student evaluations of professors are typically used to measure teaching quality. We possess unique data that allow us to measure relative student performance in mandatory follow-on classes. We compare metrics that capture these three different notions of instructional quality and present evidence that professors who excel at promoting contemporaneous student achievement teach in ways that improve their student evaluations but harm the follow-on achievement of their students in more advanced classes.

      I'm open to the idea that consistently very low student evaluations can serve as a sort of canary in the coal mine/red flag for instructors who are having serious trouble interacting effectively with a significant portion of their students, and I also suspect that they accurately identify some very good teachers (though perhaps mixed in with some merely very entertaining ones). For somewhere between 75 and 90% of the faculty, however, differences in evals probably have more to do with factors such as personality, strictness or leniency in grading, gender, age, race, appearance, class composition, timing, and location than with the actual quality of instruction (which students are, as a rule -- and as we have noted many times here before -- ill-equipped to measure). I'm not sure how well measuring instructor quality by student outcome would work in an environment less selective (and less regimented) than West Point; the idea has, after all, been tried at the secondary level, with considerably less-than-ideal outcomes. But it strikes me, if measurable (and that may be the rub), as at least equally valid (or invalid, as the case may be).

    2. I'm pretty sure that's the one. 2010 pub date matches my recollection, and I should have it at the office to confirm, but I'll probably forget to look for it tomorrow, so let's just assume you got it.

      I agree that bad evals may be a red flag, though they may instead indicate a rigorous instructor. Likewise, good evals may indicate a good instructor, or merely an easy one. So we're down to them being just one component in what must be a holistic review of instructor performance. Strictly numerically, I favor measures based on scores in subsequent courses, for those courses amenable to such treatment.