# Untuned Probability

In the latest edition of Nous (subscription required), Cory Juhl replies to Roger White’s discussion of the fine-tuning argument. While I am sympathetic to Juhl’s side of the debate, I think there are some missteps in his application of probability theory here. Many more details below the fold.

Juhl starts off with three cases.

Case 1: Suppose that we are told that a fair coin will be flipped. If the result
of the flip is heads, a roulette wheel will be spun exactly once on that night. If the result is tails, the wheel will be spun hundreds of times on that night. We then learn that the result of some spin was 3 (i.e., that there exists a spin whose result was 3). We are asked to bet on whether many spins were made (whether the coin landed tails), or whether the wheel was spun only once.

For concreteness, let’s say the wheel will be spun exactly 136 times if the coin lands tails. Juhl argues, correctly, that in this case we should conclude that the coin probably landed tails. In fact, our posterior credence that the coin landed tails should be almost exactly 37/38. (For the sake of numerical simplicity, I’ll pretend that the probability of at least one spin landing 3 in 136 tries is 37/38. In fact it’s closer to 73/75. We can ignore these complications in what follows.) So far so good. Now for the second example.
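As a sanity check on these numbers, here is a minimal exact Bayes calculation in Python (the 136-spin figure and the 1/38 chance per spin are taken from the setup above):

```python
from fractions import Fraction

# Case 1: fair coin; heads -> 1 spin, tails -> 136 spins.
# Each spin lands 3 with probability 1/38.
p3 = Fraction(1, 38)

like_heads = p3                      # one chance for a 3
like_tails = 1 - (1 - p3) ** 136     # at least one 3 in 136 spins

# Equal priors of 1/2 cancel in Bayes' theorem.
posterior_tails = like_tails / (like_tails + like_heads)

print(float(like_tails))       # ≈ 0.9734, close to 73/75
print(float(posterior_tails))  # ≈ 0.9737, almost exactly 37/38
```

The exact posterior differs from 37/38 only in the fourth decimal place, which is why the approximation in the text is harmless.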

Let us suppose that after the gambler is shown a spin resulting in 3, he is shuffled out of the room and told that a fair coin was tossed, and if it came up heads, there was only one spin that night, whereas if it came up tails, many spins would take place. He is then asked to bet on whether one or many spins would take place (whether they might have taken place before, took place during, or would occur afterwards will be left unspecified). The gambler reasons as before: a 3 result would be unlikely if only one spin occurs that night, but highly probable otherwise, so he infers that many spins take place. But the reasoning here seems fallacious. What the gambler witnessed was not merely that some spin resulted in a 3, but that a particular spin, call it s, resulted in 3. We can stipulate (and have the gambler be given the information) that whether s results in 3 is probabilistically independent of whether other spins take place. What this comes to is that

(1) P(s3 | M) = P(s3 | O)

where M is the ‘many-spins’ hypothesis, O is the ‘one-spin’ hypothesis, and s3 is the event that spin s results in a 3. When we stipulate this, it is clear that the posteriors of M and O (conditionalizing on s3) remain the same as their priors, namely 1/2.

This is already too quick, for reasons we’ll get back to below. But let’s wait and see what follows.

Now here is a really counterintuitive (initially) result. The gambler is first told, as in case 1, that a 3 resulted on some spin. He correctly infers that many spins took place (that is, the inference is not a fallacy; it could of course turn out nevertheless that the conclusion of the inference is false). Then someone says, ‘You want to see one? We have it on video.’ He is shown a video in which s results in 3. This extra information s3 defeats his first inference, so to speak, given his commitment to (1) (we are ignoring complications as to whether (1) simply has to be true or whether the gambler must know that (1), be justified in believing that (1), and so on). The gambler can no longer justifiably assign greater posterior weight to hypothesis M than he had prior to any further data, if he takes (1) to hold. (For simplicity, we are taking the (relevant) information acquired to be s3, and not, for example, that the gambler was shown a video showing that s3.)

Surprisingly, intuition is correct here. (This is surprising because in tricksy probabilistic cases, intuition is normally reliably counter-indicative.) Being shown the video can’t make a difference to the case. What has gone wrong? One hint is in the last line.

Juhl wants to assume that the gambler being shown the video is not relevant information. But this assumption simply can’t be made, at least without changing earlier parts of the case. It is usually crucial to bear in mind the difference between your evidence being that p, and your evidence being that someone is telling you (in a particular situation) that p, and this is no exception. We can’t just assume this away, any more than we could assume away the fact that 38 is not prime. What evidence is and isn’t relevant is not a free variable in a story that we can just adjust to suit our purposes.

This distinction (between finding out that p, and finding out that someone is telling you that p) is fairly obvious to folk epistemology, but it is worth noting how striking its probabilistic consequences are. Here’s an old example (not one of mine) illustrating this.

A certain card game consists of Jill being dealt two cards from a deck consisting of just three cards, the ace of hearts, the ace of diamonds and the two of clubs. Jack is trying to figure out what cards Jill has. They have the following conversation.

Jill: I have at least one ace.
Jack: I knew that already – every possible hand includes at least one ace.
Jill: I can show you if you like.
Jack: It won’t tell you much.
Jill: That’s what you think.
and Jill shows Jack the ace of hearts she is holding.

We need a dictionary to keep track of all the relevant propositions.

A1 = Jill has at least one ace
A2 = Jill has two aces
AH = Jill has the ace of hearts
AD = Jill has the ace of diamonds
SAH = Jill shows the ace of hearts
SAD = Jill shows the ace of diamonds

The relevant probabilities go as follows:

Pr(A1) = 1
Pr(A2) = 1/3
Pr(A2 | A1) = 1/3
Pr(A2 | AH) = 1/2

Hopefully that should all be self-explanatory. So when Jack sees the ace of hearts, should his credence in A2 go up to 1/2? No, because the relevant probability is not Pr(A2 | AH), but Pr(A2 | SAH). And that can be calculated as follows (assuming Jill is going to show an ace one way or the other, and will pick an ace at random to show if she has both aces).

Pr(SAH | AH & ~A2) = 1
Pr(SAH | AD & ~A2) = 0
Pr(SAH | A2) = 1/2
Pr(A2) = Pr(AH & ~A2) = Pr(AD & ~A2) = 1/3
Hence, Pr(A2 | SAH) = (1/2 · 1/3) / (1/2 · 1/3 + 1 · 1/3 + 0 · 1/3) = 1/3

So although AH moves Jack’s credences, SAH does nothing at all to them. It would be hard to know what to make of this story if I told it and added as an afterthought, “Assume the only relevant evidence is AH, and not SAH.” I think if I did that I’d be contradicting myself, and there’s not much to say (in this context) about contradictory stories.
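The card game yields to the same exact treatment. Here is a short Python sketch, assuming (as stipulated) that Jill shows a random ace when she holds both:

```python
from fractions import Fraction

# Jill's possible hands from {AH, AD, 2C}, each with prior 1/3.
hands = [('AH', 'AD'), ('AH', '2C'), ('AD', '2C')]
prior = Fraction(1, 3)

# Pr(A2 | AH): restrict to the hands containing the ace of hearts.
ah_hands = [h for h in hands if 'AH' in h]
p_a2_given_ah = Fraction(sum(1 for h in ah_hands if 'AD' in h), len(ah_hands))

# Pr(SAH | hand): the chance Jill shows the ace of hearts from each hand.
show_ah = {('AH', 'AD'): Fraction(1, 2),  # both aces: picks one at random
           ('AH', '2C'): Fraction(1),     # must show her only ace
           ('AD', '2C'): Fraction(0)}     # can't show a card she doesn't hold

p_sah = sum(prior * show_ah[h] for h in hands)
p_a2_given_sah = prior * show_ah[('AH', 'AD')] / p_sah

print(p_a2_given_ah)   # 1/2
print(p_a2_given_sah)  # 1/3
```

The two conditional probabilities come apart exactly as the text says: AH would raise Jack’s credence to 1/2, but SAH leaves it at the prior 1/3.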

Is this what is going on in Juhl’s case? It’s part of the story, though I suspect there’s also something deeper going on. Let’s get back to the second case. Why exactly is what the gambler sees not meant to be evidence for tails? Before we answer that, we need to be told a little more about the story. Compare the following story.

The gambler is told the initial setup – heads means one spin, tails means many spins. He has a 0.5 credence that there has been at least one spin already, and this is probabilistically independent of heads or tails. He is then shown a spin of the wheel, which happens to land on 3. What should his credence be that the coin landed tails?

Here’s the argument that the answer is 2/3. Let H = heads, T = tails, E = there was an earlier spin. Here are the priors.

Pr(H&E) = Pr(T&E) = Pr(H&~E) = Pr(T&~E) = 0.25.

When the gambler sees the spin, no matter where it lands, he sees that H&E is not true, because then there wouldn’t be another spin to see. So he should conditionalise on ~(H&E), which gives Pr(T | ~(H&E)) = 2/3.
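A minimal table-based check of this update (the four cells and priors are exactly those above):

```python
from fractions import Fraction

# Four equiprobable cells: coin result x whether an earlier spin occurred.
priors = {('H', 'E'): Fraction(1, 4), ('H', 'noE'): Fraction(1, 4),
          ('T', 'E'): Fraction(1, 4), ('T', 'noE'): Fraction(1, 4)}

# Watching a spin now rules out H&E: heads means exactly one spin,
# so if it had already happened there would be nothing left to see.
survivors = {c: p for c, p in priors.items() if c != ('H', 'E')}
norm = sum(survivors.values())

p_tails = sum(p for (coin, _), p in survivors.items() if coin == 'T') / norm
print(p_tails)  # 2/3
```

Striking out one of the four equiprobable cells and renormalising is all the work Bayes’ theorem does here.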

Juhl says he wants to

stipulate (and have the gambler be given the information) that whether s results in 3 is probabilistically independent of whether other spins take place.

I think this is ambiguous. In one sense, it might mean that the objective chance of the spin resulting in 3 is probabilistically independent of whether other spins take place. But that’s not particularly relevant to the way the gambler should update his credences. What Juhl needs is that whether s results in 3, and hence whether s exists, is independent of whether other spins take place. And by a generalisation of the previous case, we can see that that’s only true if the gambler’s prior credence in E is 0.

So the stipulation amounts to the claim that the gambler is told that this is the first spin of the wheel. And given that information, it’s true that the gambler shouldn’t change his credences in H and T.

The same goes for the spin the gambler is shown in case 3. Unless the gambler’s prior credence that the spin is the first spin is 1, the fact that the spin exists is evidence (potentially strong evidence) in favour of T. Juhl later says that we should “Suppose, for simplicity, that s occurs no matter what”, so that s’s occurrence has probability 1, but how on earth this is supposed to be possible unless we are told in advance that s is the first spin is a mystery.

So as far as I can tell, there is no way to support Juhl’s contentions about these cases unless (a) we ignore the distinction between the evidence’s existing and our being shown it, and (b) we are told explicitly that s is the first spin of the wheel. Without those stipulations, the intuitive reaction is correct.

This generalises to his discussion of the fine-tuning argument. We only get out of the argument this way if (a) we ignore the anthropic argument point that there is an important epistemic difference between the universe being fine-tuned and our perceiving that it is fine-tuned, and (b) we are told this is the first universe that was created. Given those constraints, I’m sure most people will agree the fine-tuning argument doesn’t work.

Having said all that, I don’t think much of the fine-tuning argument. It seems to me to turn on some symmetry principles of dubious coherence, and dubious plausibility given coherence. But that’s a story for a different day.

## 5 Replies to “Untuned Probability”

1. Kenny Easwaran says:

I think your analogy between Juhl’s “counterintuitive” case and your example yielding 2/3 is a little bit hasty. But I think you’re right that being shown the video can’t by itself change much. I think your example doesn’t focus on the fact that you’re being told that E, but rather gives a different piece of information altogether, namely that at least one spin happened after the time at which one already had a prior of 1/2 that a spin had already occurred.

Here’s an example that seems to me to better capture the distinction you’re after. Consider the following scenario:

(1) I will flip a fair coin, and if it comes up heads, I’ll spin the wheel once, and if it comes up tails, I’ll spin the wheel 100 times. After finishing with the spins, I’ll take some number x (uniformly at random) that came up at least once and tell you “at least one x was spun”.

and compare it to the following:

(2) I will flip a fair coin, and if it comes up heads, I’ll spin the wheel once, and if it comes up tails, I’ll spin the wheel 100 times. After finishing with the spins, if the number 3 came up at all, I’ll tell you “at least one 3 was spun”. Otherwise, I’ll tell you nothing.

It seems obvious to me that in scenario (1), hearing “at least one 3 was spun” gives no evidence for or against the coin having come up tails, while in scenario (2) it gives very strong evidence in favor of tails. Different things happen if we consider (3), where I tell you the lowest number that came up, or have some other prior distribution over how I decide which number to tell you. None of these cases are changed if I then proceed to show you a video of the relevant spin.

In a sense, the relevant information isn’t either E, or that you’re being told that E, but rather, that you are learning E as opposed to the various alternatives that you could have learned otherwise. If you don’t know what the relevant alternatives are (so you don’t know if it’s scenario (1) or (2) or (3)), then I’m not sure exactly what you should do.
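Kenny’s two reporting protocols are easy to simulate. A rough Monte Carlo sketch (the pocket labels 1–38 stand in for a real wheel’s 0/00 layout, and the trial counts are arbitrary choices of mine):

```python
import random

random.seed(0)
POCKETS = range(1, 39)  # 38 pockets, labels simplified to 1..38

def trial():
    # Fair coin: heads -> 1 spin, tails -> 100 spins.
    tails = random.random() < 0.5
    return tails, random.choices(POCKETS, k=100 if tails else 1)

def scenario1(n=50_000):
    # Protocol (1): announce a uniformly random number that came up.
    hits = [tails for tails, spins in (trial() for _ in range(n))
            if random.choice(sorted(set(spins))) == 3]
    return sum(hits) / len(hits)

def scenario2(n=50_000):
    # Protocol (2): announce "at least one 3" only if a 3 came up.
    hits = [tails for tails, spins in (trial() for _ in range(n))
            if 3 in spins]
    return sum(hits) / len(hits)

print(round(scenario1(), 1))  # ≈ 0.5: no evidence about the coin
print(round(scenario2(), 2))  # ≈ 0.97: strong evidence for tails
```

Same announcement, different selection procedure, opposite evidential force — which is exactly the point about learning E as opposed to the alternatives one could have learned.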

2. Darren Bradley says:

What matters is the likelihood of the evidence given the hypotheses, and this depends on the decision procedure by which the evidence was found. If, as in Kenny’s (1), we are simply given a number that came up, we get no support for Many. The possible pieces of evidence are ‘At least one 1 was spun’, ‘At least one 2 was spun’, etc. Each of these has a 1/38 probability given either Many spins or One. But in (2), we are searching for a particular number, 3, so the possible pieces of evidence are ‘At least one 3 was spun’ and ‘No 3 was spun’. The former piece of evidence strongly confirms the Many spins hypothesis.

The reason it confirms the Many spins hypothesis is that the 3 had many more chances to land, given all the extra throws on the Many spins hypothesis. The complication arises when the 3 landed, but not on one of these extra throws – what if it landed on, say, the first throw, which would have existed given either Many spins or One? In this case, the confirmation of Many spins gets undercut. It only got confirmed because the 3 may have landed on one of the throws that only exists given Many spins. If it turns out it landed on one that exists either way, the confirmation disappears. So Juhl is right that if P(s exists | Many) = P(s exists | One), we get no support for Many spins. His mistake is that he introduces this as a ‘simplifying assumption’ when in fact it is the crux of his argument and needs defending.
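Bradley’s undercutting point can be put numerically. A sketch, with an illustrative 100 spins under the Many hypothesis:

```python
from fractions import Fraction

p3 = Fraction(1, 38)  # chance any given spin lands 3
n_many = 100          # illustrative spin count under Many

def posterior_many(like_many, like_one):
    # Equal 1/2 priors on Many and One cancel in Bayes' theorem.
    return like_many / (like_many + like_one)

# Evidence 'some spin landed 3': far more chances under Many.
some3 = posterior_many(1 - (1 - p3) ** n_many, p3)

# Evidence '3 landed on spin s', where s exists on either hypothesis:
# the likelihoods match, so the priors are left untouched.
s3 = posterior_many(p3, p3)

print(float(some3))  # ≈ 0.97: Many strongly confirmed
print(s3)            # 1/2: the confirmation is undercut
```

Everything turns on whether P(s exists | Many) = P(s exists | One), which is why treating that equality as a mere simplifying assumption hides the crux.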

I think in the cosmology case it is a reasonable assumption that our universe is more likely to exist given Many universes than Few.

This is the position I defended at the Formal Epistemology Workshop last month.

3. Brian Weatherson says:

Test

4. Cory Juhl says:

Thanks for the interesting comments on my recent Nous paper. I hope that the following remarks will help to clarify a few points.

First, I intended to argue primarily for the claim that Hacking and White are wrong to treat an inference to many universes from ‘improbable’ data as fallacious in general. (The first sentence of the second paragraph is supposed to make my main purpose clear.) Roughly, I took the dialectic to go something like this. Hacking: ‘You guys who infer the existence of many universes from fine-tuning are committing a fallacy, at least in the sequential-universe case.’ White: ‘Yeah, that’s right. Not only that, Hacking is too generous. It’s fallacious more generally, including even the non-sequential universe case.’ In response, I suggest that Hacking’s supposedly analogous case of the fallacious gambler is not analogous to all possible many-universe inferences. It depends, I claim, on a variety of background assumptions such as assumptions about conditional probabilities.

I do not deny that there can be differences between the evidential relevance of ‘s3’ and ‘I’ve been shown a video that shows s coming up 3’ in some possible cases. I merely focus on one difference between a class of cases in which the many-spins inference would be fallacious, and others in which the many spins inference would not be fallacious. The difference that I focus on has nothing to do with possible differences between learning that p and learning that someone has shown you that p.

It may be that Weatherson takes me to be attempting more (or something different) from my modest claim of showing that Hacking and White are wrong to claim that such inferences are fallacious in general. He says that he is sympathetic to ‘my side of the debate’, but I’m not sure which debate he has in mind. I don’t take a stand on whether there is a plausible fine-tuning argument for God, or for many universes. I take a stand only on the claim that many-universe inferences are generally fallacious. If someone argues for a conclusion, and another claims that the argument form is fallacious, a possible response is to show how the initial argument is not fallacious if appropriate implicit premises are brought to light. It is then a further question whether the argument (or argument form) is sound, or ‘cogent’, or what have you.

I hope that it makes sense, given the dialectical background I’m presupposing, why I think that I can stipulate various structural features of the scenarios I’m envisaging. The cases are supposed to be counterexamples (to claims that these types of inferences would invariably be fallacious), so I get to say what features they have. All I need to show in the dialectical context is that there is a way of supplying further specifications to the case such that the inference in that case is not fallacious. In this sense I disagree with Weatherson’s assertion, ‘What evidence is and isn’t relevant is not a free variable in a story that we can just adjust to suit our purposes.’ Now my example is not spelled out in complete detail, and it could be that I’ve made inconsistent specifications. But I don’t yet see why my example cannot be filled out consistently. Suppose, for example, that the gambler simply has no idea what the conditional probabilities are involving his being given a video on the many-spins and one-spin hypotheses, for example. Then he may nevertheless conditionalize on what he sees in the video, and leave aside for the time being what the relevance of being handed a video is. Alternatively, I might stipulate that the gambler is told at the outset that the probability that he will be handed (or shown, or what have you) a video of spin s, is independent of whether or not many spins or one spin has taken place. I do not yet see any inconsistency in my required stipulations about the cases. If I were trying to do more, such as giving an argument of this type for the actual existence of many universes, I would not be free to stipulate such features of the relevant cases.

Weatherson states, ‘This generalizes to his discussion of the fine-tuning argument. We only get out of the argument this way if (a) we ignore the anthropic argument point that there is an important epistemic difference between the universe being fine-tuned and our perceiving that it is fine-tuned, and (b) we are told this is the first universe that was created. Given those constraints, I’m sure most people will agree the fine-tuning argument doesn’t work.’ To respond briefly: 1. I’m not trying to ‘get out of’ any fine-tuning argument. I’m objecting to Hacking’s and White’s claims that many-universe inferences are generally fallacious. 2. My remarks are not pertinent to all possible inferences from fine-tuning to many universes. I grant that for all I have shown, there may be arguments worth considering that crucially depend on differences between two distinct bits of evidence, that our universe is fine-tuned and that we perceive that it is fine-tuned. 3. Weatherson brings temporal relations to bear, but these seem structurally irrelevant. Returning to the spins case, the case can be filled out such that all spins occur simultaneously, or that spin s occurs at 2am no matter what, whereas others (if any) precede 2am. What seems crucial is, rather, the probabilistic independence of the existence of s to whether many or one spins have occurred. I agree with White’s view that Hacking is in error when he treats temporally sequential many-universe scenarios as essentially different from the non-sequential cases. I worry that Weatherson’s reintroduction of what I (and White) take to be a red herring may be a step backwards.

I will refrain for the present from commenting further on Weatherson’s interesting remarks. I will instead turn briefly to Bradley’s remark that ‘… Juhl is right that if P(s exists | Many) = P(s exists | One), we get no support for Many spins. His mistake is that he introduces this as a ‘simplifying assumption’ when in fact it is the crux of his argument and needs defending.’ As far as I can tell from his brief remarks (and a quick read of his paper a few weeks ago), Bradley and I are in nearly complete agreement. It would appear, though, that Bradley wants to pursue a more ambitious goal, that of arguing that there really is a good argument that there are actually many universes, from something like fine-tuning. I am currently agnostic on this question, and maybe Bradley will eventually convince me. I made various ‘simplifying assumptions’ in order to bring to light a particular structural difference between some fallacious and some ‘probabilistically sound’ arguments in the vicinity. I did not need to defend their actual truth or actual plausibility, given my purpose.

I think that I mostly agree with Easwaran’s remarks. Like Weatherson, Easwaran notes possible dependencies that I leave out of my discussion. Although one could say that the proposition that the sentence ‘A 3 was spun’ is uttered provides no evidence for tails (or ‘Many’), the content of the sentence does provide evidence for tails (that tails was the result of the coin toss), it seems to me (contrary to Bradley’s remark that in case 1 we ‘get no support for Many’). We are told in the description of the case that the sentence uttered is true, and this provides us with the proposition that a 3 came up, and when we conditionalize on that proposition the posterior of Many is boosted. It is not clear to me whether Easwaran would agree with this. In any case, I would respond similarly to Easwaran’s comments, if they are intended as objections. If one could never justifiably conditionalize on a proposition p without first considering what the probability was of learning precisely that p as opposed to other propositions, this would seem to make probabilistic inference either impossible on pain of regress, or perhaps infinitarily complex. I had intended the ‘video’ case to be a picturesque way of showing how, under some possible probabilistic background assumptions (including that P(s exists | M) = P(s exists | O)), learning that spin s in particular results in a 3 could undercut a many-spins inference that was made on the basis of learning merely that some spin results in a 3. I inadvertently triggered, it appears, worries among expert readers that I had illicitly neglected dependencies of the sort mentioned by Easwaran and Weatherson. I hope that the above remarks assuage those worries to some extent.

5. A Scott Crawford says:

I think the logic of this example would be clearer if it used the actual vigorish tables for a single-zero roulette wheel. A roulette wager on one number landing on a 37-number wheel pays odds for a 36-number wheel, and the house pockets the statistical difference. In the case where there’s only one spin of the wheel, the vig is dramatically lower than in the case where there are a hundred-plus turns.

The point is that the analogy shouldn’t use a roulette wheel, which doesn’t offer true odds as a game, because it’s the vigorish that determines betting (reward) strategy in such a case and not the pure range of statistical probabilities.