Several sources have pointed me to this neat web site of spurious correlations, showing graphically how, for example, the age of Miss America correlates with the number of murders by steam, hot vapours and hot objects, or my favorite:
Though spurious correlations can be dangerous (and hilarious), it’s often useful to look for correlations in data that might reveal interesting (and perhaps testable) hypotheses. I like to do this when I give exams in my classes. I gave one example nearly a year ago, on the correlation between exam performance and sitting in the front vs. the back of a long, narrow lecture hall. Here’s another:
There are several types of assignments in my “Physics of Life” class this term. I was curious which of these best correlates with performance on my midterm exam. The winner: The score on the “reading quizzes” — little clicker-based quizzes I give at the start of most classes that ask questions about the pre-class reading, intended to be simple if one has done the reading and challenging if one has not. Here’s the graph:
The diamonds give the mean values from partitioning the class into two bins: students with reading quiz scores of at most 60%, and students with scores above 60%. The low group has a mean exam score of 49% (D-), more than a full letter grade lower than the high group, whose mean was 80% (mid B). The line is a linear fit (slope 0.7 ± 0.1); r² = 0.44.
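For anyone who wants to try the same thing on their own gradebook, here is a minimal sketch of the two calculations behind a graph like this one: the two-bin means and an ordinary least-squares line with its r². The scores below are made up purely for illustration (they are not the class data), and the helper names binned_means and linear_fit are hypothetical. The sketch does not estimate the uncertainty on the slope.

```python
from statistics import mean


def binned_means(quiz, exam, threshold=60.0):
    """Mean exam score for students at or below, and above, a quiz-score threshold."""
    low = [e for q, e in zip(quiz, exam) if q <= threshold]
    high = [e for q, e in zip(quiz, exam) if q > threshold]
    return mean(low), mean(high)


def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept, plus r squared."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)  # square of the Pearson correlation
    return slope, intercept, r_squared


# Made-up scores in percent, for illustration only (NOT the class data).
quiz = [20, 40, 55, 60, 70, 80, 90, 100]
exam = [35, 50, 45, 60, 70, 65, 85, 90]

low_mean, high_mean = binned_means(quiz, exam)
slope, intercept, r2 = linear_fit(quiz, exam)

print(f"mean exam score, quiz <= 60%: {low_mean:.1f}")
print(f"mean exam score, quiz >  60%: {high_mean:.1f}")
print(f"fit: slope = {slope:.2f}, intercept = {intercept:.1f}, r^2 = {r2:.2f}")
```

Note that r² here only says how much of the spread in exam scores the straight line accounts for; it says nothing about which way the causation runs.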
Of course, as the class correctly noted, there are two plausible non-exclusive interpretations: (1) doing the readings causes people to understand the material better, and they then do better on the exam, and (2) doing the readings and doing well on the exam are both driven by the underlying mechanism of being a good student. I suppose we might also attribute this to the age of Miss America, but I haven’t made the graph yet. In the class-seating case, as I noted earlier, controlled studies do point to a causal relationship between seat location and class performance. For the readings, I’m not inclined to force randomly chosen people to read or not read in order to study the results! I might, however, at the end of the term look at students whose reading quiz scores changed after seeing the graph above, and see if that correlates with better overall class performance. (It wouldn’t be a perfect test, of course, but that’s fine.)
Though my students seem to be properly awed by these graphs (seating location, reading performance, etc.), I find it interesting that a good number don’t bother changing their habits. One would think: what’s to lose by sitting near the front and reading what I’m supposed to read? But of course, as we all know, statistics only apply to other people.