Friday, August 21, 2015

Dr. Amelia wants to know why assessment is often BS.

The words accreditation and assessment seem to strike serious fear in the hearts of proffies everywhere. I get that it's a giant timesuck, and often seems like meaningless hoop jumping. But what I don't understand is why, at every school I know, smart people who know how to answer questions don't seem to be able to use a technique that will lead to reliable information.

Here's an example: You want to know if students are learning hamster fur fashion history, so you use their test scores in fur weaving technologies class. Here's another: you want to know if a sequence of courses leads to a desired learning outcome of the major, so your sole measurement is asking students at the end of the intro course if they feel well prepared for the rest of their major.

In the words of my students, #srsly.


  1. Might some of the problem be disciplinary? Not as in "these people need to be disciplined with the Cane of Quantitative Reasoning," as tempting as the thought may be, but as in "my discipline is in the humanities and I haven't taken a class that requires objective measurement since my undergrad gen-ed science class, and maybe not even then because it was a 500-student survey course"?

  2. I suspect the problem is with Assessment Evangelists rather than with assessment. All my colleagues are willing to talk about trying to figure out how we are and are not preparing our students for the futures they want, but many of them (and I) do not care for the revival tent atmosphere that some people bring to telling us that we should be assessing: all hellfire, hallelujah and trite generalities masquerading as advice.

  3. Because university administrators and other statistical innumerates have been using the arbitrary and often completely bogus numerical scores from anonymous student evaluations as a bludgeoning weapon for many years now, so why should we expect assessment to be any different? One of the great things about knowing some stats is that it can be quite easy to run rings around university administrators, if they still pretend to follow logic, which sadly they too often don't.

  4. A commenter over on the CHE forum observed that problems arise when people act like the purpose of assessment is to prove that 100% of students are successful 100% of the time. This kicks in especially hard when the data point to the need to do things (smaller classes, higher entrance standards, reining in the Office of Student Retention and Appeasement) that the administration has already decided it doesn't want to do.

  5. I teach in a rather large department, and so we sometimes have 20 instructors teaching 20 sections of Algebra (as an example). Let's say we decide to discuss the following learning objective: Determine the slope of a line, given the coordinates of two points on the line.

    In my honest opinion, the best use of our time in figuring out how to teach the topic well is to have instructors get together, discuss the learning objective, and talk about how we cover the topic, what hurdles the students face, and how to help them overcome those hurdles. Any teacher who's paying even half the attention in class that they should will be able to participate actively in this discussion, and a discussion like this might help some of the instructors who don't cover the material well learn some new (and hopefully better) techniques.

    Unfortunately, the accreditation commission requires hard data. We could use a simple, unreliable assessment (a single shared final exam problem that students may skip for lack of time) or a much more encompassing one (looking at each student's progress on the homework and exams for every question related to the learning objective). Either way, we will get the same generic, generally unhelpful data. We will learn that most of the students (about 60-70%) were fully competent at the learning objective, some (another 15-20% or so) were partially competent, and some (the last 15-20% or so) showed no competence at all. The percentages shift a bit with the difficulty of the particular learning objective, but our data sets are large enough that high scores from the great teachers offset the low scores from the bad teachers, and we end up with average results across the board. We have never learned anything useful enough to justify the amount of time we have spent on assessment data.

    So when we have to decide what method to use to collect this data, many of us are against the more comprehensive approaches: even though the data they produce will be more reliable, collecting it will take three times as much work, and we won't learn anything more than if we used a single shared test question or asked the students to raise their hands if they feel they really understand that particular objective.

    With so many instructors teaching the same classes in slightly different ways, what we need is a way to see which instructors are adequately preparing their students for subsequent courses, and which instructors aren't. And the instructors who aren't preparing their students should have conversations with those instructors who are to figure out what they can do to improve how they teach particular courses and/or topics.

    Or if we had data comparing the success rates of students in various courses, we might be able to learn things like "Students who move from Algebra I to Algebra II are doing great in Algebra II, but students who move from Algebra II to Precalculus are really struggling in Precalculus." Knowing that the jump from Algebra II to Precalculus is much harder for students than the jump from Algebra I to Algebra II might cause us to push our Algebra II students just a bit harder so that the difficulty increase when they start Precalculus would be a bit easier to handle.

    Why don't we do this? Because the accreditors are so focused on Learning Objectives that we have to spend much of our free time worrying about them, and collecting data to prove that we worry about them, and there's no time left to do work that might actually improve the way we educate our students.

  6. I agree with all of the hypotheses/descriptions above, and suspect the answer involves varying degrees of each at varying institutions and in various departments. I could certainly use a statistics course tailored for humanities proffies who need to better understand and design assessment (sans revival elements, please), and would probably take one if offered. Anybody seen a relevant MOOC floating around? This is exactly the sort of thing for which that format might actually be useful, especially if the class also served as an opportunity for experienced proffies to band together, identify problems with the present system, brainstorm solutions, etc.

    I especially like Mathew's point that the sort of informal, local assessment that arises when the great majority of instructors have service (i.e. time for talking to each other about the curriculum, what's working and not, and how students' needs are changing) built into their jobs can actually be more effective than the "big data" approaches being touted on my (and, I'm sure, others') campuses. (And, of course, I'm intrigued that it's coming from someone with "mathy" inclinations -- though perhaps that makes Mathew all the more qualified to comment on the limits of mathematical modeling.)

    That said, my program is actually doing this stuff pretty well -- no thanks to me, since my position doesn't include service or designing assessment, but thanks to a lot of hard work by several members of my program, combined with a member or two of the institutional data-gathering office who are actually dedicated to making assessment work for faculty, and (perhaps most crucial) some money that came into the budget thanks to an accrediting-body requirement and has been cleverly deployed by some of the people mentioned above to support well-designed assessment (including paying people quite nicely for the labor-intensive work of sitting down with instructor materials, student work, and a rubric or three, and doing their/our best to translate the messy realities of instruction and learning into numbers that might actually mean or measure something).
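The course-to-course comparison described in comment 5 can be sketched in a few lines of code. This is purely a hypothetical illustration -- the course names and pass/fail records below are invented, not real data from anyone's registrar:

```python
from collections import defaultdict

# Invented records for illustration only.
# Each tuple: (previous course, current course, passed current course?)
records = [
    ("Algebra I",  "Algebra II",  True),
    ("Algebra I",  "Algebra II",  True),
    ("Algebra I",  "Algebra II",  True),
    ("Algebra I",  "Algebra II",  False),
    ("Algebra II", "Precalculus", True),
    ("Algebra II", "Precalculus", False),
    ("Algebra II", "Precalculus", False),
    ("Algebra II", "Precalculus", False),
]

# Tally attempts and passes for each course-to-course transition.
totals = defaultdict(int)
passes = defaultdict(int)
for prev, curr, passed in records:
    transition = f"{prev} -> {curr}"
    totals[transition] += 1
    if passed:
        passes[transition] += 1

for transition in totals:
    rate = passes[transition] / totals[transition]
    print(f"{transition}: {rate:.0%} passing")
# prints:
#   Algebra I -> Algebra II: 75% passing
#   Algebra II -> Precalculus: 25% passing
```

A result like the one above is exactly the "jump from Algebra II to Precalculus is much harder" signal the commenter describes -- the point being that this kind of tally takes an afternoon with registrar data, not a semester of learning-objective paperwork.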

