A note on “Exposing Gender Bias in Student Ratings of Teaching”

This 2015 study is often cited as evidence of bias against women in student evaluations of teaching. The authors studied an online course, which allowed them to test the gender effect on teaching evaluations by experimentally varying the screen name of the instructor. Students rated their instructor’s overall quality. The results showed no significant gender difference at the conventional .05 level. The abstract of the paper states the exact opposite, however:

> Students rated the male identity significantly higher than the female identity, … demonstrating gender bias. (291)

How can this be? Simple: the authors were so determined to prove gender bias that they decided to cheat and move the goalposts, as they admit in a footnote:

> While we acknowledge that a significance level of .05 is conventional in social science and higher education research, … we have used a significance level of .10 for some tests where: 1) the results support the hypothesis and we are consequently more willing to reject the null hypothesis of no difference; 2) our hypothesis is strongly supported theoretically and by empirical results in other studies that use lower significance levels; 3) our small n may be obscuring large differences; and 4) the gravity of an increased risk of Type I error is diminished in light of the benefit of decreasing the risk of a Type II error. (288)

In other words: We decided to call things significant even when they’re not if: 1) it agrees with what we already decided in advance that the results of the study should be; 2) everyone already knows we’re right anyway so we don’t need any of that pesky “scientific evidence” stuff (even though finding such evidence is ostensibly the whole purpose of our paper); 3) our study is so ridiculously small that the results could mean anything; and 4) we may be dead wrong but on the other hand maybe we’re not.

Only with this sham, unprecedented definition of significance did the authors manage to find a so-called “significant” pro-male bias.
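To make concrete what loosening the threshold from .05 to .10 costs, here is a minimal simulation sketch. It does not use the paper's data. Everything in it is an assumption for illustration: the group size of 20, the normally distributed ratings, the known standard deviation, the z-test, and the function names. Both groups are drawn from the same distribution, so every "significant" result is a false positive by construction.

```python
import math
import random


def two_sided_p(group_a, group_b, sigma=1.0):
    """Two-sided z-test p-value for a difference in means (sigma assumed known)."""
    n_a, n_b = len(group_a), len(group_b)
    diff = sum(group_a) / n_a - sum(group_b) / n_b
    standard_error = sigma * math.sqrt(1 / n_a + 1 / n_b)
    z = diff / standard_error
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rates(trials=20000, n=20, seed=0):
    """Share of null experiments declared significant at .05 and at .10.

    n=20 per group is a hypothetical size, not the study's actual n.
    """
    rng = random.Random(seed)
    hits_05 = hits_10 = 0
    for _ in range(trials):
        # Both groups come from the same distribution: there is no true effect.
        group_a = [rng.gauss(0, 1) for _ in range(n)]
        group_b = [rng.gauss(0, 1) for _ in range(n)]
        p = two_sided_p(group_a, group_b)
        hits_05 += p < 0.05
        hits_10 += p < 0.10
    return hits_05 / trials, hits_10 / trials


rate_05, rate_10 = false_positive_rates()
print(f"false positives at alpha = .05: {rate_05:.3f}")
print(f"false positives at alpha = .10: {rate_10:.3f}")
```

Under these assumptions the two rates should land near 5% and 10% respectively: moving the threshold to .10 roughly doubles the chance of announcing an effect that does not exist.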

In any honest universe, the title of the paper would be “Failure to Expose Gender Bias in Student Ratings of Teaching” and the abstract would say “Students did not rate the male identity significantly higher than the female identity, … demonstrating that no gender bias can be inferred.”

The study does, however, demonstrate one clear and undeniable bias, namely that of the authors in favour of their preconceived hypothesis.

Actually, the authors were not even content with this amount of cheating. They lie even more when they say:

> These findings support the argument that male instructors are often afforded an automatic credibility in terms of their … expertise. (300)

In reality, they specifically asked the students to rate how “knowledgable” their instructor was, and the results (299) showed no significant gender effect even with the authors’ sham definition of significance. Since this didn’t stop them from concluding the exact opposite, one wonders why they bothered gathering any data at all.