Accuracy measures on conditional probabilities

I just proved a result about probability aggregation that I found rather perplexing. The proof, and even the result, is a little too complicated to put in HTML, so here it is in PDF.

What started me thinking about this was Sarah Moss’s excellent paper Scoring Rules and Epistemic Compromise, which is about aggregating probability functions. Here’s, roughly, the kind of puzzle that Moss is interested in. We’ve got two probability functions Pr1 and Pr2, and we want to find some probability function Pr that ‘aggregates’ them. Perhaps that’s because Pr1 is my credence function, Pr2 is yours, and we need to find some basis for making choices about collective action. Perhaps it is because the only thing you know about a certain subject matter is that one expert’s credence function is Pr1, another’s function is Pr2, and each expert seems equally likely, and you want to somehow defer equally to the two of them. (Or, perhaps, it is because you want to apply the Equal Weight View of disagreement. But don’t do that; the Equal Weight View is false.)

It seems there is an easy solution to this. For any X, let Pr(X) = (Pr1(X) + Pr2(X))/2. But as Barry Loewer noted many years ago, this solution has some costs. Let’s say we care about two propositions, p and q, and Boolean combinations of them. And say that p and q are probabilistically independent according to both Pr1 and Pr2. Then this linear mixture approach will not in general preserve independence. So there are some costs to it.
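To make the Loewer point concrete, here is a minimal Python sketch. The particular numbers (0.2 and 0.8) and the helper names are made up for illustration; nothing here comes from Loewer or Moss. It builds two functions on which p and q are independent, averages them, and compares Pr(p ∧ q) with Pr(p) · Pr(q) under the mixture.

```python
from itertools import product

def independent_joint(pr_p, pr_q):
    """Joint distribution over (p, q) truth values on which p and q are independent."""
    return {
        (p, q): (pr_p if p else 1 - pr_p) * (pr_q if q else 1 - pr_q)
        for p, q in product([True, False], repeat=2)
    }

def linear_mixture(pr1, pr2):
    """Pr(X) = (Pr1(X) + Pr2(X)) / 2 for each cell X."""
    return {w: (pr1[w] + pr2[w]) / 2 for w in pr1}

def marginal_p(pr):
    return pr[(True, True)] + pr[(True, False)]

def marginal_q(pr):
    return pr[(True, True)] + pr[(False, True)]

# Illustrative values only: both functions treat p and q as independent.
pr1 = independent_joint(0.2, 0.2)
pr2 = independent_joint(0.8, 0.8)
mix = linear_mixture(pr1, pr2)

print(mix[(True, True)])                   # Pr(p ∧ q) under the mixture
print(marginal_p(mix) * marginal_q(mix))   # Pr(p) · Pr(q) under the mixture
```

With these inputs the first number comes out at about 0.34 and the second at about 0.25, so p and q are no longer independent according to the mixture.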

One of the things Moss does is come up with an independent argument for using linear mixtures. Her argument turns on various accuracy measures, or what are sometimes called scoring rules, for probability functions. (Note that I’m leaving out a lot of the interesting stuff in Moss’s paper, which goes into a lot of detail about what happens when we get further away from the Brier scores that are the focus here. Anyone who is at all interested in these aggregation issues, which are pretty central to current epistemological debates, should read her paper.)

Thanks to Jim Joyce’s work there has been an upsurge of interest among philosophers in accuracy measures of probability functions. Here’s how the most commonly used scoring rule, called the Brier score, works. We start with a partition of possibility space, the partition that we’re currently interested in. In this case it would be {p ∧ q, p ∧ ¬q, ¬p ∧ q, ¬p ∧ ¬q}. For any proposition X, say V(X, w) is 1 if X is true in w, and 0 if X is false in w. Then we ‘score’ a function Pr in world w by summing (Pr(X) – V(X, w))², as X takes each value in the partition. This is a measure of how inaccurate Pr is in w: the higher this number is, the more inaccurate Pr is. Conversely, the lower it is, the more accurate it is. And accuracy is a good thing obviously, so this gives us a kind of goodness measure on probability functions.
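Here is a minimal sketch of that scoring rule in Python, assuming worlds are represented as pairs of truth values for (p, q), so that each world is also one cell of the partition. The function name and the sample credences are hypothetical.

```python
from itertools import product

# Each world fixes truth values for (p, q); the four worlds are the four partition cells.
WORLDS = list(product([True, False], repeat=2))

def brier_inaccuracy(pr, actual_world):
    """Sum of (Pr(X) - V(X, w))^2 as X ranges over the cells of the partition."""
    return sum(
        (pr[cell] - (1.0 if cell == actual_world else 0.0)) ** 2
        for cell in WORLDS
    )

# Illustrative credences over p∧q, p∧¬q, ¬p∧q, ¬p∧¬q.
pr = {(True, True): 0.4, (True, False): 0.3, (False, True): 0.2, (False, False): 0.1}

print(brier_inaccuracy(pr, (True, True)))     # the world where p∧q is true
print(brier_inaccuracy(pr, (False, False)))   # the world where ¬p∧¬q is true
```

The score is lower (about 0.5) in the world to which Pr gives the most credence, and higher (about 1.1) in the world to which it gives the least.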

Now in the aggregation problem we’re interested in here, we don’t know what world we’re in, so this isn’t directly relevant. But instead of looking at the actual inaccuracy measure of Pr, we can look at its expected inaccuracy measure. ‘Expected’ according to what, you might ask. Well, first we look at the expectation according to Pr1, and then the expectation according to Pr2, then we average them. That gives a fair way of scoring Pr according to each of Pr1 and Pr2.

One of the things Moss shows is that this average of expected inaccuracy is minimised when Pr is the linear average of Pr1 and Pr2. And she offers good reasons to think this isn’t a quirk of the scoring rule we’re using. It doesn’t matter, that is, that we’re using squares of distance between Pr(X) and V(X); any ‘credence-eliciting’ scoring rule will plausibly have the same result.
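For the Brier score specifically, the minimisation claim can be checked by a short calculation. What follows is a sketch of one way to see it, not a reconstruction of Moss’s own argument; the symbol \bar{P} for the linear average and \mathrm{AvgExp} for the averaged expectation are notation introduced just for this sketch.

```latex
% \bar{P}(X) = (Pr_1(X) + Pr_2(X)) / 2 is notation assumed for this sketch.
% Expected inaccuracy of Pr according to Pr_i, with X ranging over partition cells:
\[
  E_i[\mathrm{Pr}] \;=\; \sum_X \Big[ \mathrm{Pr}_i(X)\,(\mathrm{Pr}(X) - 1)^2
      + (1 - \mathrm{Pr}_i(X))\,\mathrm{Pr}(X)^2 \Big]
\]
% Averaging over i = 1, 2 replaces Pr_i with \bar{P}:
\[
  \mathrm{AvgExp}(\mathrm{Pr})
  \;=\; \sum_X \Big[ \mathrm{Pr}(X)^2 - 2\,\bar{P}(X)\,\mathrm{Pr}(X) + \bar{P}(X) \Big]
  \;=\; \sum_X \big(\mathrm{Pr}(X) - \bar{P}(X)\big)^2
      \;+\; \sum_X \bar{P}(X)\,\big(1 - \bar{P}(X)\big)
\]
% The second sum does not depend on Pr, and the first is zero exactly when
% Pr(X) = \bar{P}(X) for every cell X.
```

Since the linear average of two probability functions is itself a probability function, it is available as a candidate, and so it is the minimiser.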

But I was worried that this didn’t really address the Loewer concern directly. The point of that concern was that linear mixtures get the conditional probabilities wrong. So we might want instead to measure the accuracy of Pr’s conditional probability assignments. Here’s how I thought we’d go about that.

Consider the four values Pr(p | q), Pr(p | ¬q), Pr(q | p), Pr(q | ¬p). In any world w, two of the four ‘conditions’ in these conditional probabilities will be met. Let’s say they are p and ¬q. Then the conditional inaccuracy of Pr in that world will be (Pr(q | p) – V(q, w))² + (Pr(p | ¬q) – V(p, w))². In other words, we apply the same formula as for the Brier score, but we use conditional rather than unconditional probabilities, and we just look at the conditions that are satisfied.
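A minimal Python sketch of this measure, under the assumption that Pr is given as a joint distribution over the four (p, q) worlds and that every condition has positive probability (otherwise the division fails). The function names and sample credences are hypothetical.

```python
def conditional(pr, target, given):
    """Pr(target | given), where target and given are predicates on worlds (p, q)."""
    joint = sum(v for w, v in pr.items() if target(w) and given(w))
    total = sum(v for w, v in pr.items() if given(w))
    return joint / total  # assumes the condition has positive probability

def conditional_inaccuracy(pr, world):
    """Score only the two conditional probabilities whose conditions hold in `world`."""
    p_val, q_val = world
    is_p = lambda w: w[0]
    is_q = lambda w: w[1]
    p_condition = lambda w: w[0] == p_val   # whichever of p, ¬p holds in `world`
    q_condition = lambda w: w[1] == q_val   # whichever of q, ¬q holds in `world`
    q_term = (conditional(pr, is_q, p_condition) - float(q_val)) ** 2
    p_term = (conditional(pr, is_p, q_condition) - float(p_val)) ** 2
    return q_term + p_term

# Illustrative credences over p∧q, p∧¬q, ¬p∧q, ¬p∧¬q.
pr = {(True, True): 0.4, (True, False): 0.3, (False, True): 0.2, (False, False): 0.1}

# The world from the text, where p and ¬q are the satisfied conditions:
# (Pr(q | p) - 0)^2 + (Pr(p | ¬q) - 1)^2
print(conditional_inaccuracy(pr, (True, False)))
```

With these credences Pr(q | p) = 4/7 and Pr(p | ¬q) = 3/4, so the score in that world is (4/7)² + (1/4)².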

From then on, I thought, we could use Moss’s technique. We’ll look for the value of Pr that minimises the expected conditional inaccuracy, and call that the compromise, or aggregated, function. I guessed that this would be the function we got by taking the linear mixtures of the original conditional probabilities. That is, we would want to have Pr(p | q) = (Pr1(p | q) + Pr2(p | q))/2. I thought that, at least roughly, the same reasoning that implied that linear mixtures of unconditional probabilities minimised the average expected unconditional inaccuracy would mean that linear mixtures of conditional probabilities minimised the average expected conditional inaccuracy.

I was wrong.

It turns out that, at least in the case where p and q are probabilistically independent according to both Pr1 and Pr2, the function that does best according to this new rule is the same linear mixture as does best under the measures Moss looks at. This was extremely surprising to me. We start with a whole bunch of conditional probabilities. We need to aggregate them into a joint conditional probability distribution that satisfies various nice constraints. Notably, these are all constraints on the resultant conditional probabilities, and conditional probabilities are, at least relative to unconditional probabilities, fractions. Normally, one does not get nice results for ‘mixing’ fractions by simply averaging numerators and denominators. But that’s exactly what we do get here.
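The result can be spot-checked numerically. The sketch below is only a sanity check on one made-up pair of inputs, not a substitute for the proof. It assumes the reading of conditional inaccuracy given above, and compares the linear mixture against nearby rival distributions obtained by shifting a little probability from one world to another. All names and numbers are illustrative.

```python
from itertools import product

WORLDS = list(product([True, False], repeat=2))  # truth values for (p, q)

def independent_joint(pr_p, pr_q):
    """Joint distribution on which p and q are independent."""
    return {
        (p, q): (pr_p if p else 1 - pr_p) * (pr_q if q else 1 - pr_q)
        for p, q in WORLDS
    }

def conditional_inaccuracy(pr, world):
    """Squared error of Pr(q | p-condition) and Pr(p | q-condition) for the conditions true in `world`."""
    p_val, q_val = world
    pr_q_given = pr[(p_val, True)] / (pr[(p_val, True)] + pr[(p_val, False)])
    pr_p_given = pr[(True, q_val)] / (pr[(True, q_val)] + pr[(False, q_val)])
    return (pr_q_given - float(q_val)) ** 2 + (pr_p_given - float(p_val)) ** 2

def average_expected_inaccuracy(pr, pr1, pr2):
    """Average of the Pr1-expectation and the Pr2-expectation of Pr's conditional inaccuracy."""
    return sum(0.5 * (pr1[w] + pr2[w]) * conditional_inaccuracy(pr, w) for w in WORLDS)

def perturbed(pr, i, j, eps):
    """Rival distribution: move eps of probability from world j to world i."""
    out = dict(pr)
    out[WORLDS[i]] += eps
    out[WORLDS[j]] -= eps
    return out

# Illustrative inputs: p and q are independent on both functions.
pr1 = independent_joint(0.2, 0.3)
pr2 = independent_joint(0.7, 0.6)
mix = {w: (pr1[w] + pr2[w]) / 2 for w in WORLDS}

rivals = [
    perturbed(mix, i, j, eps)
    for i in range(4) for j in range(4) if i != j
    for eps in (0.01, 0.05)
]
mix_score = average_expected_inaccuracy(mix, pr1, pr2)
best_rival = min(average_expected_inaccuracy(r, pr1, pr2) for r in rivals)

print(mix_score, best_rival)  # the mixture's score should be the smaller of the two
```

On these inputs every rival scores worse than the linear mixture, which is what the result predicts.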

I don’t have a very good sense of why this result holds. I sort of do understand why Moss’s results hold, I think, though perhaps not well enough to explain! But just why this result obtains is a bit of a mystery to me. But it seems to hold. And I think it’s one more reason to think that the obvious answer to our original question is the right one; if you want to aggregate two probability functions, just average them.

Only Knowledge is Evidence

Juan Comesaña and Holly Kantin have a paper forthcoming in PPR that argues against Williamson’s E=K thesis. (UPDATE: Actually it’s not forthcoming, it’s in the March 2010 edition. My apologies.)

Now I don’t believe E=K. But I do sorta believe that only knowledge is evidence, and that’s the target of most of their arguments. They call the thesis that only knowledge is evidence E=K 1.

They argue that certain Gettier cases are impossible given E=K 1. Here’s one such Gettier case.

Coins. You are waiting to hear who among the candidates got a job. You hear the secretary say on the telephone that Jones got the job. You also see Jones empty his pockets and count his coins: he has ten. You are, then, justified in believing that Jones got the job and also that Jones has ten coins in his pocket. From these two beliefs of yours, you infer the conclusion that whoever got the job has ten coins in his pocket. Unbeknownst to you, the secretary was wrong and Jones did not get the job; in fact, you did. By chance, you happen to have ten coins in your pocket.

Now this seems like an easy case to me. “You” have two pieces of evidence. First, that Jones has ten coins in his pocket. Second, that the secretary said that Jones got the job. Those bits of evidence justify a belief that whoever got the job has ten coins in his pocket. But Comesaña and Kantin think this isn’t a good enough explanation of the story. They insist that the intermediate conclusion that Jones got the job is also part of the evidence. I’m not sure quite why they think that; it seems contradictory to me to say that p is part of someone’s evidence, but ¬p. They do offer this consideration.

And there is no argument that we can think of to the effect that your belief that Jones got the job plays no part whatsoever in justifying you in thinking that whoever got the job has ten coins in his pocket.

That seems like a failure of imagination to me. In general, there’s always an argument to the effect that p. Namely God knows that p, therefore p. Now the first premise might occasionally be false, but still, it’s an argument.

A little more seriously, here’s one such argument.

  1. Only knowledge justifies.
  2. “You” do not know that Jones got the job.
  3. So the (false!) proposition that Jones got the job plays no part whatsoever in justifying you in thinking that whoever got the job has ten coins in his pocket.

I know that Comesaña and Kantin don’t believe the first premise. Indeed, that’s the conclusion of their paper. But to use the falsity of that premise in an argument against it seems somewhat circular.

Learning and Knowing

I used to think the following was a nice little analytic truth.

  • If immediately prior to t, S does not know that p, and at t she does know that p, then at t, S learns that p.

But now I’m convinced there are counterexamples to it. Here are four putative counterexamples, some of which might be convincing.

A few months ago, Alice learned that President McKinley was assassinated. Soon after, she forgot this. Just now, she was reminded that President McKinley was assassinated. So she now knows that President McKinley was assassinated, and just before now she didn’t. But she didn’t just learn that President McKinley was assassinated, she was reminded of it.

Bob starts our story in Fake Barn Country. He is looking straight at a genuine barn on a distant hill, and forms the belief that there is a barn on that hill. Since he’s in fake barn country, he doesn’t know there is a barn on the hill. At t, while Bob is still looking at the one genuine barn, all the fake barns are instantly destroyed by a visiting spaceship, from a race which doesn’t put up with nonsense like fake barns. After the barns are destroyed, Bob’s belief that there is a barn on that hill is knowledge. So at t he comes to know, for the first time, that there is a barn on that hill. But he doesn’t learn that there is a barn on that hill at t; if he ever learned that, it was when he first laid eyes on the barn.

Carol is trapped in Gilbert Harman’s dead dictator story. She has read the one newspaper that correctly (and sensitively) reported that the dictator has died. She hasn’t seen the copious other reports that the dictator is alive, but the existence of those reports defeats her putative knowledge that the dictator has died. At t, all the other news sources change their tune, and acknowledge the dictator has died. So at t, Carol comes to know for the first time that the dictator has died. But she doesn’t learn this at t; if she ever learns it, it is when she reads the one true newspaper.

Ted starts our story believing (truly, at least in the world of the story) that Bertrand Russell was the last analytic philosopher to win the Nobel Prize in literature. The next day, the 2011 Nobel Prize in literature is announced. A trustworthy and reliable friend of Ted’s tells him that Fred has won the Nobel Prize in literature. Ted believes this, and since Fred is an analytic philosopher, Ted reasonably infers that, as of 2011 at least, Bertrand Russell was not the last analytic philosopher to win the Nobel Prize in literature. This conclusion is true, but not because Fred won. In fact, Ed, who is also an analytic philosopher, won the 2011 Nobel Prize in literature. At t, Ted is told that it is Ed, not Fred, who won the prize. Since Ted knows that Ed is also an analytic philosopher, this doesn’t change his belief that Bertrand Russell was not the last analytic philosopher to win the Nobel Prize in literature. But it does change that belief from a mere justified true belief into knowledge. But arguably it is not at t that Ted learns that Bertrand Russell was not the last analytic philosopher to win the Nobel Prize in literature, since just like in the last two cases, Ted’s evidence for this conclusion does not improve.

Lewis, Meaning and Naturalness

I spent last weekend at the excellent OPW@25 conference at UMass. Philip Bricker and the students there did a really great job of putting together a wonderful conference. My primary role there was to comment on Laurie Paul’s paper on mereological bundle theory. I possibly wasn’t the most helpful commentator, since Laurie’s project is to try to do metaphysics with as few categories as possible (ideally, one), and I think having lots of categories in one’s theory is often a good thing. But I think the audience at least provided more helpful feedback than I did!

John Hawthorne presented a paper on, among other things, the role of naturalness in Lewis’s philosophy, and this touched on some issues about the role of naturalness in Lewis’s theory of meaning. In particular, he raised some objections to the idea that meaning might, in some sense, be a function of use and Lewisian naturalness. I pushed back a little on this, mostly by arguing that we could avoid some problems John raised by adding more into the notion of use.

On the train home, I tried to write up exactly what I meant by ‘use’ that could make my arguments at the conference work. This got more complicated than I expected, and by the time I was done, I had a short paper on naturalness in Lewis’s theory of meaning.

The paper is incredibly drafty, even by my standards. (Though I’m very happy that my current work setup means that my zeroth draft papers have full bibliographies with hyperlinked DOIs.) And it owes a lot to Wolfgang Schwarz’s Lewisian Meaning without Naturalness.

The short version of the paper is that when thinking about Lewisian approaches to meaning, we have to distinguish metasemantics, or the giant project of locating linguistic meaning in the pattern of noises we find in nature, from applied semantics, or the project of working out the meaning of one particular term in a language about which we know a lot. Naturalness matters to both projects. That’s because naturalness matters to rationality, and rationality matters to assignments of mental content, and linguistic meaning is ultimately reducible to mental states, in much the way described in “Languages and Language”. But when we’re doing metasemantics, there’s just no way to disentangle the role naturalness plays from whatever we might mean by ‘use’; roughly, in the sense relevant to metasemantics, use is what it is in virtue of naturalness. On the other hand, in applied semantics, we can say somewhat more clearly what we mean by use. And when we do that, it will fall out of a broader Lewisian theory that (predicate) meaning is given by use (in that sense) plus naturalness.

Obviously a lot more of interest happened at the conference, much of which hopefully we’ll see in print in the near future. The only paper I found an online draft of after a quick search was Cian Dorr’s How to be a Modal Realist, but I’m sure I missed some. Anyway, it was a great conference, and thanks to everyone at UMass for inviting me, and for putting on such a good event!