Equal Weight and Asymmetric Uncertainty

Not for the first time, I’m unsure what the Equal Weight View says about a case. Here it is.

bq. *Jack and Jill and Microsoft Excel*

bq. Jack and Jill both have the same evidence; they know that a particular table has 83 rows and 97 columns. They both try to figure out how many cells are in the table. They know that the number of cells equals the number of rows times the number of columns. Jill concludes that it has 8051 cells. Jack can’t conclude anything about the number of cells. He thinks it might be 8051, but it might be something else. When they are each told about the other’s conclusions (or lack thereof), what should their final conclusions be?

A position half-way between Jack’s view and Jill’s view is presumably something like a view that the table probably has 8051 cells, but that it is a serious possibility that the table has a different number of cells. (And Jill’s arithmetic is right: 83 × 97 = 8300 − 249 = 8051.)

But surely Jill shouldn’t move her views in that way, should she? If she should hold firm here, is this just a counterexample to (some versions of) the Equal Weight View?

Cian Dorr on Imprecise Credences

In the latest Philosophical Perspectives, Cian Dorr has a very interesting paper about a puzzle he calls the “Eternal Coin”:http://users.ox.ac.uk/~sfop0257/papers/CoinPuzzle.pdf. I hope to write more about the particular puzzle in future posts, but I wanted to mention one thing that comes up in passing about imprecise probabilities. In the course of rejecting a solution to one puzzle in terms of imprecise probabilities, he says

bq. My main worries about this response are worries about the unsharp credence framework itself. In my view, there is no adequate account of the way unsharp credences should be manifested in decision-making. As Adam Elga has recently compellingly argued, the only viable strategies which would allow for someone with an unsharp credential state to maintain a reasonable pattern of behavioural dispositions over time involve, in effect, choosing a particular member of the representor as the one that will guide their actions. (The choice might be made at the outset, or might be made by means of a gradual process of narrowing down over time; the upshot is much the same.) And even though crude behaviourism must be rejected, I think that if this is all we have to say about the decision theory, we lack an acceptable account of what it is to be in a given unsharp credential state—we cannot explain what would constitute the difference between someone in a sharp credential state given by a certain conditional probability function, and someone in an unsharp credential state containing that probability function, who had chosen it as the guide to their actions. Unsharp credential states seem to have simply been postulated as states that get us out of tricky epistemological dilemmas, without an adequate theory of their underlying nature. It is rather as if some ethicist were to respond to some tricky ethical dilemma—say, whether you should join the Resistance or take care of your ailing mother—by simply postulating a new kind of action that is stipulated to be a special new kind of combination of joining the Resistance and taking care of your mother which lacks the objectionable features of obvious compromises (like doing both on a part-time basis or letting the outcome be determined by the roll of a dice). It would be epistemologically very convenient if there was a psychological state we could rationally be in in which we neither regarded P as less likely than HF, regarded HF as less likely than P, nor regarded them as equally likely. But we should be wary of positing psychological states for the sake of epistemological convenience.

I actually don’t think that imprecise (or unsharp) credences are the solution to the particular problem Cian is interested in here; I think the solution is to say the relevant credences are undefined, not imprecise. But I don’t think this is a compelling objection to imprecise credences either.

It is, I think, pretty easy to say what the behavioural difference is between imprecise credences and sharp credences, even if we accept (as I do!) what Adam and Cian have to say about decision making with imprecise credences. The difference comes up in the context of _giving advice_ and _evaluating others’ actions_. Let’s say that my credence in _p_ is imprecise over a range of about 0.4 to 0.9, and that I make decisions as if my credence is 0.7. Assume also that I have to make a choice between two options, _X_ and _Y_, where _X_ has a higher expected return iff _p_ is more likely than not. So I choose _X_. And assume that you have the same evidence as me, and face the same choice.

On the sharp credences framework, I should advise you to do _X_, and should be critical of you if you don’t do _X_. On the imprecise credences framework, I should say that you could rationally make either choice (depending on what other choices you had previously made), and shouldn’t criticise you for making either choice (unless it was inconsistent with other choices you’d previously made).
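Here is a minimal sketch of that functional difference, in Python. Everything in it is an illustrative assumption: the representor members, the choice of 0.7 as the member guiding action, and the crude rule that _X_ is better iff the relevant credence exceeds 0.5.

  # Toy model of an imprecise agent (hypothetical numbers throughout).
  representor = [0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
  acting_credence = 0.7          # the member I have (arbitrarily) settled on

  def my_choice():
      # X has the higher expected return iff p is more likely than not
      return "X" if acting_credence > 0.5 else "Y"

  def permissible_choices():
      # A choice is evaluable as rational if some member of the representor
      # endorses it (setting aside diachronic consistency constraints)
      return {"X" if c > 0.5 else "Y" for c in representor}

  print(my_choice())             # X  -- what I do
  print(permissible_choices())   # {'X', 'Y'} -- what I can't criticise you for

For a sharp agent, permissible_choices would be the singleton of their own choice; for the imprecise agent it is not, and that difference shows up in advice and evaluation even though the two agents act alike.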

I don’t want to argue here that it makes sense to separate out the role of credences in decision making from the role they play in advice and evaluation. All I do want to argue here is that once we move beyond decision making, and think about advice and evaluation as well, there is a functional difference between sharp and unsharp credences. So the functionalist argument that there is no new state here collapses.

One other note about this argument. I don’t think of sharp and unsharp credences as different kinds of states, or as states that need to be separately postulated and justified. What I think are the fundamental states are comparative credences. The claim that all credences are sharp then becomes the (wildly implausible) claim that all comparative credences satisfy certain structural properties that allow for a sharp representation. The claim that all credences should be sharp becomes the (still implausible, but not crazy) claim that all comparative credences should satisfy those structural properties. Either way, there’s nothing new about unsharp credences that needs to be justified. What needs to be justified is the ruling out of some structural possibilities that look _prima facie_ attractive.

What is the Equal Weight View of Disagreement?

I often find it hard to apply the Equal Weight View (EWV) in practice. This makes my task of generating counterexamples a little harder than I feel it should be. I can come up with all sorts of cases where I *think* EWV gets the wrong result, but then I get worried that EWV doesn’t actually say what I think it says about that case. Here’s one example I was working with.

bq. A and B are peers in the salient sense. They have a long track record of checking each other’s work, and they both get things right a high and equal proportion of the time. There is no external evidence that B is in any way epistemically compromised right now. They both try to work out 14 times 27, and A gets 378, while B gets 368. What should A’s credence be that the right answer is 368?

I think the EWV is committed to the answer being 0.5 or thereabouts. After all, A and B are peers, they are just as likely to get the answer right, and probably one of them did get the answer right. So the EWV-endorsed probability distribution, I would think, is that the answers 378 and 368 both get probability nearly 0.5, and the remainder goes to the possibility that they were both wrong.

This strikes me as implausible, since it is easy for A to see that 368 is the wrong answer by using the rule I’ll call D9.

bq. D9. A number is a multiple of 9 iff the sum of the digits of its base-10 representation is a multiple of 9.
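For concreteness, here is the check as a small Python sketch (my own illustration, not part of the case). Since 27 is a multiple of 9, so is 14 × 27, and D9 then rules out 368 immediately:

  def digit_sum(n):
      return sum(int(d) for d in str(n))

  def passes_d9(n):
      # D9: n is a multiple of 9 iff its digit sum is a multiple of 9
      return digit_sum(n) % 9 == 0

  print(14 * 27)           # 378
  print(passes_d9(378))    # True  (3 + 7 + 8 = 18)
  print(passes_d9(368))    # False (3 + 6 + 8 = 17)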

So I think this is a case where EWV is wrong: A shouldn’t assign equal weight to 378 and 368 being the correct answer. I can imagine some people denying this, and saying that 378 and 368 should be given equal weight. But I can also imagine some people denying that EWV really has that consequence.

If you’re an EWV-theorist, do you think EWV entails in this case that A should give equal credence to 378 and 368 being the correct answer? If the case is too vaguely described to answer that, consider some of the following variations.

Variation 1. A doesn’t commit to an answer before checking that it is consistent with D9. So the fact that 378 is consistent with D9 is part of her reason for believing the answer is 378. That means, I think, that Christensen’s Independence principle would rule out her going on to use D9 to conclude that B must have made a mistake.

Variation 2. B has never heard of D9. Perhaps this means A and B aren’t peers, because D9 is some evidence that A has and B lacks.

Variation 3. B doesn’t believe D9. Perhaps that’s because he thinks A is misremembering the rule (it’s really a rule for multiples of 11, not multiples of 9), or perhaps because he thinks there are restrictions on the rule (e.g., it is only guaranteed to work for numbers with an even number of digits).

Variation 4. B denies that all multiples of 27 are multiples of 9.

Variation 5. B denies that his answer is inconsistent with D9, since 3+6+8 = 18, while 3+7+8 = 19, so D9 actually supports his answer, not A’s.

I can sort of see how an EWV theorist would deny that EWV applies in variations 2 and 4, but in all the other cases, it seems to me that EWV implies, incorrectly, that A should give equal credence to 368 and 378 being the correct answer. But maybe that’s just because I haven’t understood EWV correctly. Anyone want to correct my understanding?

Philosophy Compass, Volume 6, Issue 5

May 2011

Volume 6, Issue 5, Pages 300–373

Continental

Transcendental Arguments About Other Minds and Intersubjectivity (pages 300–311)
Matheson Russell and Jack Reynolds
Article first published online: 4 MAY 2011 | DOI: 10.1111/j.1747-9991.2011.00394.x

Epistemology

Bayesianism I: Introduction and Arguments in Favor (pages 312–320)
Kenny Easwaran
Article first published online: 4 MAY 2011 | DOI: 10.1111/j.1747-9991.2011.00399.x

Bayesianism II: Applications and Criticisms (pages 321–332)
Kenny Easwaran
Article first published online: 4 MAY 2011 | DOI: 10.1111/j.1747-9991.2011.00398.x

History of Philosophy

Reidian Metaethics: Part I (pages 333–340)
Terence Cuneo
Article first published online: 4 MAY 2011 | DOI: 10.1111/j.1747-9991.2011.00393.x

Reidian Metaethics: Part II (pages 341–349)
Terence Cuneo
Article first published online: 4 MAY 2011 | DOI: 10.1111/j.1747-9991.2011.00392.x

Logic & Language

Technical Modal Logic (pages 350–359)
Marcus Kracht
Article first published online: 4 MAY 2011 | DOI: 10.1111/j.1747-9991.2011.00396.x

Metaphysics

The Open Future, (pages 360–373)
Stephan Torre
Article first published online: 4 MAY 2011 | DOI: 10.1111/j.1747-9991.2011.00395.x

Experimental Philosophy Month

If you go to “this link”:http://www.yale.edu/cogsci/XM/Home.html and click on the large picture of a brain on the left-hand side, you can participate in a bunch of experiments through the Yale Experimental Philosophy Month program. If you don’t like the experiment that comes up, just go back to the first page and click on the brain again!

Accuracy measures on conditional probabilities

I just proved a result about probability aggregation that I found rather perplexing. The proof, and even the result, is a little too complicated to put in HTML, so here it is in PDF.

What started me thinking about this was Sarah Moss’s excellent paper Scoring Rules and Epistemic Compromise, which is about aggregating probability functions. Here’s, roughly, the kind of puzzle that Moss is interested in. We’ve got two probability functions Pr1 and Pr2, and we want to find some probability function Pr that ‘aggregates’ them. Perhaps that’s because Pr1 is my credence function, Pr2 is yours, and we need to find some basis for making choices about collective action. Perhaps it is because the only thing you know about a certain subject matter is that one expert’s credence function is Pr1, another’s function is Pr2, and each expert seems equally likely to be right, and you want to somehow defer equally to the two of them. (Or, perhaps, it is because you want to apply the Equal Weight View of disagreement. But don’t do that; the Equal Weight View is false.)

It seems there is an easy solution to this. For any X, let Pr(X) = (Pr1(X) + Pr2(X))/2. But as Barry Loewer noted many years ago, this solution has some costs. Let’s say we care about two propositions, _p_ and _q_, and Boolean combinations of them. And say that _p_ and _q_ are probabilistically independent according to both Pr1 and Pr2. Then this linear mixture approach will not in general preserve independence. So there are some costs to it.
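A quick numerical illustration of Loewer’s point, with toy numbers of my own choosing:

  # Two joint distributions over the partition {p&q, p&~q, ~p&q, ~p&~q},
  # each making p and q independent; their pointwise average does not.
  pr1 = {"pq": 0.25, "p~q": 0.25, "~pq": 0.25, "~p~q": 0.25}  # p, q indep at 0.5, 0.5
  pr2 = {"pq": 0.64, "p~q": 0.16, "~pq": 0.16, "~p~q": 0.04}  # p, q indep at 0.8, 0.8
  mix = {k: (pr1[k] + pr2[k]) / 2 for k in pr1}

  p = mix["pq"] + mix["p~q"]   # 0.65
  q = mix["pq"] + mix["~pq"]   # 0.65
  print(mix["pq"], p * q)      # 0.445 vs 0.4225: independence is lost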

One of the things Moss does is come up with an independent argument for using linear mixtures. Her argument turns on various accuracy measures, or what are sometimes called scoring rules, for probability functions. (Note that I’m leaving out a lot of the interesting stuff in Moss’s paper, which goes into a lot of detail about what happens when we get further away from the Brier scores that are the focus here. Anyone who is at all interested in these aggregation issues, which are pretty central to current epistemological debates, should read her paper.)

Thanks to Jim Joyce’s work there has been an upsurge of philosophical interest in accuracy measures for probability functions. Here’s how the most commonly used scoring rule, the Brier score, works. We start with a partition of possibility space, the partition that we’re currently interested in. In this case it would be {p ∧ q, p ∧ ¬q, ¬p ∧ q, ¬p ∧ ¬q}. For any proposition X, say V(X, w) is 1 if X is true at w, and 0 if X is false at w. Then we ‘score’ a function Pr in world w by summing (Pr(X) – V(X, w))², as X takes each value in the partition. This is a measure of how inaccurate Pr is in w: the higher the number, the more inaccurate Pr is. Conversely, the lower it is, the more accurate it is. And accuracy is obviously a good thing, so this gives us a kind of goodness measure on probability functions.
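In code, that definition looks like this (a minimal sketch, with the four cells of the partition indexed 0–3 and probability functions given as lists):

  def brier(pr, w):
      # inaccuracy of Pr in world w: sum of (Pr(X) - V(X, w))^2 over the partition
      return sum((pr[x] - (1 if x == w else 0)) ** 2 for x in range(len(pr)))

  print(brier([0.25, 0.25, 0.25, 0.25], 0))  # 0.75
  print(brier([1.0, 0.0, 0.0, 0.0], 0))      # 0.0: perfectly accurate in world 0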

Now in the aggregation problem we’re interested in here, we don’t know what world we’re in, so this isn’t directly relevant. But instead of looking at the actual inaccuracy measure of Pr, we can look at its expected inaccuracy measure. ‘Expected’ according to what, you might ask. Well, first we look at the expectation according to Pr1, and then the expectation according to Pr2, then we average them. That gives a fair way of scoring Pr according to each Pr1 and Pr2.

One of the things Moss shows is that this average of expected inaccuracy is minimised when Pr is the linear average of Pr1 and Pr2. And she offers good reasons to think this isn’t a quirk of the scoring rule we’re using. It doesn’t matter, that is, that we’re using squares of distance between Pr(X) and V(X); any ‘credence-eliciting’ scoring rule will plausibly have the same result.
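Here is a brute-force spot-check of that result with the same toy numbers as above (a sketch of mine, not Moss’s proof): randomly sampled rival functions never do better than the linear average.

  import random

  pr1 = [0.25, 0.25, 0.25, 0.25]
  pr2 = [0.64, 0.16, 0.16, 0.04]
  mix = [(a + b) / 2 for a, b in zip(pr1, pr2)]

  def brier(pr, w):
      return sum((pr[x] - (1 if x == w else 0)) ** 2 for x in range(4))

  def avg_expected_brier(pr):
      # average of the two expected inaccuracies, one per expert
      e1 = sum(pr1[w] * brier(pr, w) for w in range(4))
      e2 = sum(pr2[w] * brier(pr, w) for w in range(4))
      return (e1 + e2) / 2

  best = avg_expected_brier(mix)
  for _ in range(100_000):
      r = [random.random() for _ in range(4)]
      rival = [x / sum(r) for x in r]
      assert avg_expected_brier(rival) >= best - 1e-12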

But I was worried that this didn’t really address the Loewer concern directly. The point of that concern was that linear mixtures get the conditional probabilities wrong. So we might want instead to measure the accuracy of Pr’s *conditional* probability assignments. Here’s how I thought we’d go about that.

Consider the four values Pr(p | q), Pr(p | ¬q), Pr(q | p), Pr(q | ¬p). In any world _w_, two of the four ‘conditions’ in these conditional probabilities will be met. Let’s say they are p and ¬q. Then the *conditional* inaccuracy of Pr in that world will be (Pr(q | p) – V(q, w))² + (Pr(p | ¬q) – V(p, w))². In other words, we apply the same formula as for the Brier score, but we use conditional rather than unconditional probabilities, and we just look at the conditions that are satisfied.

From then on, I thought, we could use Moss’s technique. We’ll look for the value of Pr that minimises the expected conditional inaccuracy, and call that the compromise, or aggregated, function. I guessed that this would be the function we got by taking the linear mixtures of the original conditional probabilities. That is, we would want to have Pr(p | q) = (Pr1(p | q) + Pr2(p | q))/2. I thought that, at least roughly, the same reasoning that implied that linear mixtures of unconditional probabilities minimised the average expected unconditional inaccuracy would mean that linear mixtures of conditional probabilities minimised the average expected conditional inaccuracy.

I was wrong.

It turns out that, at least in the case where _p_ and _q_ are probabilistically independent according to both Pr1 and Pr2, the function that does best according to this new rule is the same linear mixture as does best under the measures Moss looks at. This was extremely surprising to me. We start with a whole bunch of conditional probabilities. We need to aggregate them into a joint conditional probability distribution that satisfies various nice constraints. Notably, these are all constraints on the resultant _conditional_ probabilities, and conditional probabilities are, at least relative to unconditional probabilities, fractions. Normally, one does not get nice results for ‘mixing’ fractions by simply averaging numerators and denominators. But that’s exactly what we do get here.
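Here is the numerical check that surprised me, sketched in Python with the same toy numbers as above (both making _p_ and _q_ independent; the joint is ordered [p ∧ q, p ∧ ¬q, ¬p ∧ q, ¬p ∧ ¬q]):

  import random

  pr1 = [0.25, 0.25, 0.25, 0.25]   # p, q independent at 0.5, 0.5
  pr2 = [0.64, 0.16, 0.16, 0.04]   # p, q independent at 0.8, 0.8
  mix = [(a + b) / 2 for a, b in zip(pr1, pr2)]

  def conds(pr):
      p, q = pr[0] + pr[1], pr[0] + pr[2]
      return {"p|q": pr[0] / q, "p|~q": pr[1] / (1 - q),
              "q|p": pr[0] / p, "q|~p": pr[2] / (1 - p)}

  def cond_brier(pr, w):
      # score only the two conditionals whose conditions hold in world w
      c = conds(pr)
      if w == 0:  # p & q
          return (c["q|p"] - 1) ** 2 + (c["p|q"] - 1) ** 2
      if w == 1:  # p & ~q
          return (c["q|p"] - 0) ** 2 + (c["p|~q"] - 1) ** 2
      if w == 2:  # ~p & q
          return (c["q|~p"] - 1) ** 2 + (c["p|q"] - 0) ** 2
      return (c["q|~p"] - 0) ** 2 + (c["p|~q"] - 0) ** 2  # ~p & ~q

  def avg_exp_cond_brier(pr):
      e1 = sum(pr1[w] * cond_brier(pr, w) for w in range(4))
      e2 = sum(pr2[w] * cond_brier(pr, w) for w in range(4))
      return (e1 + e2) / 2

  best = avg_exp_cond_brier(mix)
  beaten = False
  for _ in range(100_000):
      r = [random.random() for _ in range(4)]
      rival = [x / sum(r) for x in r]
      if avg_exp_cond_brier(rival) < best - 1e-9:
          beaten = True
  print(beaten)   # expected: False -- no rival beats the plain linear mixture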

I don’t have a very good sense of _why_ this result holds. I sort of do understand why Moss’s results hold, I think, though perhaps not well enough to explain! But just why this result obtains is a bit of a mystery to me. But it seems to hold. And I think it’s one more reason to think that the obvious answer to our original question is the right one; if you want to aggregate two probability functions, just average them.

Only Knowledge is Evidence

Juan Comesaña and Holly Kantin have a paper forthcoming in PPR that argues against Williamson’s E=K thesis. (UPDATE: Actually it’s not forthcoming, it’s in the March 2010 edition. My apologies.)

Now I don’t believe E=K. But I do sorta believe that only knowledge is evidence, and that’s the target of most of their arguments. They call the thesis that only knowledge is evidence E=K1.

They argue that certain Gettier cases are impossible given E=K1. Here’s one such Gettier case.

bq. _Coins_. You are waiting to hear who among the candidates got a job. You hear the secretary say on the telephone that Jones got the job. You also see Jones empty his pockets and count his coins: he has ten. You are, then, justified in believing that Jones got the job and also that Jones has ten coins in his pocket. From these two beliefs of yours, you infer the conclusion that whoever got the job has ten coins in his pocket. Unbeknownst to you, the secretary was wrong and Jones did not get the job; in fact, you did. By chance, you happen to have ten coins in your pocket.

Now this seems like an easy case to me. “You” have two pieces of evidence. First, that Jones has ten coins in his pocket. Second, that the secretary said that Jones got the job. Those bits of evidence justify a belief that whoever got the job has ten coins in his pocket. But Comesaña and Kantin think this isn’t a good enough explanation of the story. They insist that the intermediate conclusion that Jones got the job is also part of the evidence. I’m not sure quite why they think that; it seems contradictory to me to say that _p_ is part of someone’s evidence when ¬_p_ is true. They do offer this consideration.

bq. And there is no argument that we can think of to the effect that your belief that Jones got the job plays no part whatsoever in justifying you in thinking that whoever got the job has ten coins in his pocket.

That seems like a failure of imagination to me. In general, there’s always an argument to the effect that _p_. Namely: God knows that _p_; therefore _p_. Now the first premise might occasionally be false, but still, it’s an argument.

A little more seriously, here’s one such argument.

  1. Only knowledge justifies.
  2. “You” do not know that Jones got the job.
  3. So the (false!) proposition that Jones got the job plays no part whatsoever in justifying you in thinking that whoever got the job has ten coins in his pocket.

I know that Comesaña and Kantin don’t believe the first premise. Indeed, that’s the conclusion of their paper. But to use the falsity of that premise in an argument against it seems somewhat circular.

Learning and Knowing

I used to think the following was a nice little analytic truth.

  • If immediately prior to _t_, _S_ does not know that _p_, and at _t_ she does know that _p_, then at _t_, _S_ learns that _p_.

But now I’m convinced there are counterexamples to it. Here are four putative counterexamples, some of which might be convincing.

A few months ago, Alice learned that President McKinley was assassinated. Soon after, she forgot this. Just now, she was reminded that President McKinley was assassinated. So she now knows that President McKinley was assassinated, and just before now she didn’t. But she didn’t just learn that President McKinley was assassinated; she was reminded of it.

Bob starts our story in Fake Barn Country. He is looking straight at a genuine barn on a distant hill, and forms the belief that there is a barn on that hill. Since he’s in Fake Barn Country, he doesn’t know there is a barn on the hill. At _t_, while Bob is still looking at the one genuine barn, all the fake barns are instantly destroyed by a visiting spaceship, from a race which doesn’t put up with nonsense like fake barns. After the barns are destroyed, Bob’s belief that there is a barn on that hill is knowledge. So at _t_ he comes to know, for the first time, that there is a barn on that hill. But he doesn’t learn that there is a barn on that hill at _t_; if he ever learned that, it was when he first laid eyes on the barn.

Carol is trapped in Gilbert Harman’s dead dictator story. She has read the one newspaper that correctly (and sensitively) reported that the dictator has died. She hasn’t seen the copious other reports that the dictator is alive, but the existence of those reports defeats her putative knowledge that the dictator has died. At _t_, all the other news sources change their tune, and acknowledge the dictator has died. So at _t_, Carol comes to know for the first time that the dictator has died. But she doesn’t learn this at _t_; if she ever learns it, it is when she reads the one true newspaper.

Ted starts our story believing (truly, at least in the world of the story) that Bertrand Russell was the last analytic philosopher to win the “Nobel Prize in literature”:http://nobelprize.org/nobel_prizes/literature/laureates/. The next day, the 2011 Nobel Prize in literature is announced. A trustworthy and reliable friend of Ted’s tells him that Fred has won the Nobel Prize in literature. Ted believes this, and since Fred is an analytic philosopher, Ted reasonably infers that, as of 2011 at least, Bertrand Russell was not the last analytic philosopher to win the Nobel Prize in literature. This conclusion is true, but not because Fred won. In fact, Ed, who is also an analytic philosopher, won the 2011 Nobel Prize in literature. At _t_, Ted is told that it is Ed, not Fred, who won the prize. Since Ted knows that Ed is also an analytic philosopher, this doesn’t change his belief that Bertrand Russell was not the last analytic philosopher to win the Nobel Prize in literature. But it does change that belief from a mere justified true belief into knowledge. But arguably it is not at _t_ that Ted learns that Bertrand Russell was not the last analytic philosopher to win the Nobel Prize in literature, since just like in the last two cases, Ted’s evidence for this conclusion does not improve.

Lewis, Meaning and Naturalness

I spent last weekend at the excellent “OPW@25”:http://people.umass.edu/jcowling/opw25.html conference at UMass. Philip Bricker and the students there did a really great job of putting together a wonderful conference. My primary role there was to comment on Laurie Paul’s paper on mereological bundle theory. I possibly wasn’t the most helpful commentator, since Laurie’s project is to try to do metaphysics with as few categories as possible (ideally, one), and I think having lots of categories in one’s theory is often a good thing. But I think the audience at least provided more helpful feedback than I did!

John Hawthorne presented a paper on, among other things, the role of naturalness in Lewis’s philosophy, and this touched on some issues about the role of naturalness in Lewis’s theory of meaning. In particular, he raised some objections to the idea that meaning might, in some sense, be a function of use and Lewisian naturalness. I pushed back a little on this, mostly by arguing that we could avoid some problems John raised by adding more into the notion of use.

On the train home, I tried to write up exactly what I had to mean by ‘use’ for my arguments at the conference to work. This got more complicated than I expected, and by the time I was done, I had a short paper on naturalness in Lewis’s theory of meaning.

The paper is incredibly drafty, even by my standards. (Though I’m very happy that my current work setup means that my zeroth draft papers have full bibliographies with hyperlinked DOIs.) And it owes a lot to Wolfgang Schwarz’s “Lewisian Meaning without Naturalness”:http://www.umsu.de/words/magnetism.pdf.

The short version of the paper is that when thinking about Lewisian approaches to meaning, we have to distinguish metasemantics, or the giant project of locating linguistic meaning in the pattern of noises we find in nature, from applied semantics, or the project of working out the meaning of one particular term in a language about which we know a lot. Naturalness matters to both projects. That’s because naturalness matters to rationality, and rationality matters to assignments of mental content, and linguistic meaning is ultimately reducible to mental states, in much the way described in “Languages and Language”. But when we’re doing metasemantics, there’s just no way to disentangle the role naturalness plays from whatever we might mean by ‘use’; roughly, in the sense relevant to metasemantics, use is what it is in virtue of naturalness. On the other hand, in applied semantics, we can say somewhat more clearly what we mean by use. And when we do that, it will fall out of a broader Lewisian theory that (predicate) meaning is given by use (in that sense) plus naturalness.

Obviously a lot more of interest happened at the conference, much of which hopefully we’ll see in print in the near future. The only paper I found an online draft of after a quick search was Cian Dorr’s “How to be a Modal Realist”:http://users.ox.ac.uk/~sfop0257/papers/ModalRealism.pdf, but I’m sure I missed some. Anyway, it was a great conference, and thanks to everyone at UMass for inviting me, and for putting on such a good event!

Ross and Schroeder on Belief

This is a short post where I jot down my initial impressions of Jake Ross and Mark Schroeder’s interesting paper “Belief, Credence, and Pragmatic Encroachment”:http://www-bcf.usc.edu/~jacobmro/ppr/Belief_Credence_and_Pragmatic_Encroachment.pdf, and in particular how it compares to my views on belief and credence. I’m not going to summarise the paper, so this post won’t make a lot of sense unless you’ve read their paper too.