More on Rankings

I made one mistake in my note on Leiter’s M&E rankings yesterday: Texas is actually fifth, behind Notre Dame, which is outright fourth. My apologies for that.

As Aidan notes in the comments on that post, the rankings do come with the following disclaimer attached.

bq. This measure obviously favors large departments (which can cover more areas) and does not discriminate between the relative importance and prestige of sub-fields within the metaphysics and epistemology category.

There’s a suggestion that this makes up for some of my criticisms. I rather think it doesn’t; it still looks like a junk stat to me. Much more on why below the fold.

First, a few disclaimers.

I don’t think the overall departmental ratings are junk stats – I think they’re incredibly useful. And I don’t think the individual ratings are junk stats – they’re really useful too. Nor am I displeased with where Cornell ended up. I’m not sure there is anything meaningful to be measured around here, but if there is, about 13th doesn’t sound off by an order of magnitude to me. And as I said, there are good people at the departments that are overrated on this list.

Having said all that, here are eight worries about the stat.

The first thing to say is that if you need a disclaimer like that attached to a list, it is probably best not to publish the list in the first place. Everyone knows that lists like this will be discussed over email, in department lounges, etc., without the disclaimer. It’s a bad idea to publish a list prominently unless you’re confident it can stand on its own, because within minutes of publication it will have to stand on its own in conversations around the world.

Second, the disclaimer doesn’t really address the obviously bizarre result, which is that _Princeton_ is ranked so low. It isn’t that Princeton is a particularly small department. If someone chooses not to go to Princeton because of this list, that would be a really unfortunate result, I think. (MIT is obviously a different case here, because the disclaimer, when it is attached, does apply there.) I’d also note the oddness of ranking Michigan 17th and U-Mass 18th. Again, these can’t be written off as artefacts of the large-department bias.

(Berkeley is a little low at 13th as well, but I think that has a different explanation. I had actually expected Berkeley to be helped by the odd way of drawing the boundaries, since it is strong in philosophical logic and, I thought, in philosophy of action. Getting 0, or close to it, in philosophy of religion, however, really makes a difference. Going from 0 to 3.5 in one category, say philosophy of religion, lifts a department’s average by 0.5 points, which here is the difference between 5th and 17th. That seems excessive to me. Whether that’s exactly what happened to Berkeley is hard to tell without the raw data.)
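To make the arithmetic concrete, here’s a quick sketch in Python. The scores are made up, but the size of the lift isn’t: adding 3.5 points in any one of seven equally weighted categories raises the average by 3.5/7 = 0.5, whatever the other scores are.

bc. # Made-up scores for the seven sub-categories, in the order: metaphysics,
# epistemology, mind, language, philosophical logic, religion, action.
scores = [4.5, 4.5, 4.0, 4.0, 4.5, 0.0, 4.0]
before = sum(scores) / len(scores)
scores[5] = 3.5                     # suppose religion goes from 0 to 3.5
after = sum(scores) / len(scores)
print(round(after - before, 2))     # 0.5, i.e. 3.5 / 7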

Third, because ‘metaphysics & epistemology’ is not really a natural kind, the stat ends up being sensitive to what seem to me like the wrong things. I don’t know how well Princeton did on philosophy of action, but if I were rating it, one of the large considerations would be Michael Smith’s work on moral psychology. Now I think Michael’s work on moral psychology is really very good, and very important, but it isn’t obvious to me that its value should play a particularly large role in determining how good his department is in metaphysics & epistemology. As things stand, it plays a pretty large role. That’s because having one person who does important work on action theory suffices to make a department pretty good in philosophy of action. Since the categories are weighted equally, such a person may make as much difference to the overall stat as three or four good epistemologists make through the epistemology ranking. But this is absurd.

Fourth, there is a striking bias here in favour of people who work across categories. If someone writes a great book on the metaphysics of free will, and how this relates to the problem of evil, that will improve their department’s ranking (perhaps considerably) on three of the seven rankings that go into this stat: metaphysics, philosophy of religion, and philosophy of action. But if someone writes a great book on scepticism, that only affects the epistemology ranking, one-seventh of the stat. And it is absurd that work on the metaphysics of free will should be *three times* as important to metaphysics and epistemology as work on scepticism.

Note that this worry is independent of worries about department size. The point is that the ranking favours people whose work cuts across the different categories, and small departments can be helped in this way as much as large ones.

Fifth, the stat is incredibly dependent on just which categories are measured. For various reasons, action theory and philosophy of religion are included. But there’s no category for, say, philosophy of perception. (Should it be folded into philosophy of mind or epistemology?) Including it would make a difference. (Probably helping Texas, as it turns out.) So there’s a fair amount of arbitrariness in the choice of categories.

Sixth, the stat leads to obviously crazy results in some hypothetical situations. Consider the three hypothetical departments A, B and C in the following table.

|_. |_. A |_. B |_. C |
| Metaphysics | 5 | 3 | 1.5 |
| Epistemology | 5 | 3 | 1 |
| Mind | 4 | 3 | 2 |
| Language | 4 | 3 | 3 |
| Philosophical Logic | 3 | 3 | 5 |
| Religion | 0 | 4 | 5 |
| Action | 0 | 3 | 5 |
If we take the average of the seven categories, they come out ranked C, B, A. Now this is nonsense. One might object that C is obviously a fictional department. (For one thing, a department that good in action theory and philosophy of religion would already get a metaphysics ranking higher than 1.5, because of the double counting mentioned above.) But A and B seem not particularly unrealistic to me. There are two distinct problems this example brings out, and they lead (finally!) to our last two objections.
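Before moving on, here’s a quick check, in Python, that a straight seven-way average really does put them in the order C, B, A:

bc. # Scores from the table above, in the order: metaphysics, epistemology,
# mind, language, philosophical logic, religion, action.
departments = {"A": [5, 5, 4, 4, 3, 0, 0],
               "B": [3, 3, 3, 3, 3, 4, 3],
               "C": [1.5, 1, 2, 3, 5, 5, 5]}
averages = {d: sum(s) / len(s) for d, s in departments.items()}
print(averages)                                          # A: 3.0, B: ~3.14, C: ~3.21
print(sorted(averages, key=averages.get, reverse=True))  # ['C', 'B', 'A']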

Seventh, the five-point scales that are used are not linear. Consider what it takes, in terms of adding people or publications or citations, to get from 4 to 5 in a category. Now consider what it takes to get from 1 to 2, or from 2 to 3. When you have a set of non-linear measures like this, averaging them is a bad idea. (That’s why the A-B comparison looks so silly.)
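To illustrate, here’s a toy calculation. The mapping from scores to underlying strength is pure invention on my part (each extra point counts twice as much as the last), but any convex mapping makes the same point: A comes out well ahead of B, even though B wins on the raw average.

bc. # Raw scores for hypothetical departments A and B from the table above.
A = [5, 5, 4, 4, 3, 0, 0]
B = [3, 3, 3, 3, 3, 4, 3]
def strength(score):
    # Invented convex mapping: each extra point is worth twice the last.
    return 2 ** score
print(sum(A) / 7, sum(B) / 7)                        # raw averages: B beats A
print(sum(map(strength, A)), sum(map(strength, B)))  # strength: A well ahead of B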

Eighth, even if you are going to average them (despite the worries about double counting and non-linear scales), treating all these categories the same is a little absurd, given what the stat is intuitively trying to capture. (This is what the A-C comparison brings out.) There should be some weighting of the categories by their relative importance to M&E broadly construed.

Which weighting? Good question! I think there’s no right answer here; differences about the appropriate weighting are really differences of taste. Here’s one natural suggestion, however, that is at least sorta kinda objective. Different categories in the Leiter report have different numbers of people contributing rankings, and to a first approximation the number of rankers is proportional to a category’s importance to the field. So we could use the ranker counts as weights. This would give a slightly more sensible number, but it still wouldn’t deal with the double-counting and non-linearity problems.
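Here’s roughly what the ranker-count weighting would look like. Both the scores and the ranker counts below are invented for illustration; I don’t have the actual numbers from the report.

bc. # Made-up scores and made-up ranker counts for the seven sub-categories.
scores  = {"metaphysics": 4.0, "epistemology": 4.5, "mind": 4.0, "language": 3.5,
           "philosophical logic": 3.0, "religion": 2.0, "action": 2.5}
rankers = {"metaphysics": 30, "epistemology": 30, "mind": 25, "language": 25,
           "philosophical logic": 15, "religion": 10, "action": 10}
unweighted = sum(scores.values()) / len(scores)
weighted = sum(scores[c] * rankers[c] for c in scores) / sum(rankers.values())
print(round(unweighted, 2), round(weighted, 2))   # 3.36 unweighted vs 3.67 weighted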

Short version of my advice to undergraduates: Don’t use these “broad category” measures to make any decisions at all! Stick to the overall rankings and the speciality rankings.