How to prove anything with statistics: gender bias edition

“Students see male professors as brilliant geniuses, female professors as bossy and annoying.” This is a widely touted conclusion based on frequency analysis of words in student evaluations. “Brilliant” was used more often to describe men and “annoying” more often to describe women. Hence gender bias against women, QED. So the argument goes.

In reality this doesn’t prove much of anything. As always with statistics, you can “prove” whatever you like by cherry-picking whatever supports your preferred conclusion. Let’s say, for example, that I wanted to prove the opposite: student evaluations are biased against men. Easy. Take the very same data set used to establish the conclusion above and search for other evaluative words instead. Sure enough, it turns out that, for instance, “idiot,” “fool,” “crap,” “junk,” and “insensitive” are used more often in evaluations of male instructors than of female ones. Meanwhile female instructors are much more often called “terrific,” “splendid,” “lovely,” “loved,” and “wonderful,” which doesn’t exactly square with the narrative of patriarchal oppression. Perhaps the strongest bias at work here is the one determining which conclusions are fashionable enough to be featured in the New York Times.
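
To make the cherry-picking mechanism concrete, here is a minimal sketch in Python. The rates in `TOY_RATES` are invented purely for illustration and are not taken from the evaluation data set; the `POSITIVE` word list and the helpers `favored_gender` and `tally` are likewise hypothetical names of my own. The only point is that the same table yields opposite "findings" depending on which words you choose to look up.

```python
# Hypothetical rates, invented for illustration only (NOT real data):
# uses per million words of review text, split by instructor gender.
TOY_RATES = {
    "brilliant": {"male": 80.0, "female": 40.0},
    "annoying": {"male": 20.0, "female": 35.0},
    "idiot": {"male": 12.0, "female": 6.0},
    "wonderful": {"male": 150.0, "female": 210.0},
}

# Assumed sentiment labels; every other word is treated as negative.
POSITIVE = {"brilliant", "wonderful"}


def favored_gender(word, rates):
    """Return the gender that the word's skew makes look better."""
    r = rates[word]
    more_used_for = "male" if r["male"] > r["female"] else "female"
    if word in POSITIVE:
        # Praise flatters whoever receives more of it.
        return more_used_for
    # An insult flatters whoever receives less of it.
    return "female" if more_used_for == "male" else "male"


def tally(words, rates=TOY_RATES):
    """Count how many of the chosen words favor each gender."""
    counts = {"male": 0, "female": 0}
    for w in words:
        counts[favored_gender(w, rates)] += 1
    return counts


# Same table, three different word selections.
pick_a = tally(["brilliant", "annoying"])
pick_b = tally(["idiot", "wonderful"])
all_words = tally(list(TOY_RATES))

print(pick_a)     # {'male': 2, 'female': 0}
print(pick_b)     # {'male': 0, 'female': 2}
print(all_words)  # {'male': 2, 'female': 2}
```

With the first pair of words the toy table "shows" bias against women, with the second pair it "shows" bias against men, and with every word included the tally is even. Nothing about the numbers changed between runs; only the selection did.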