# Updating Vague Probabilities

I’ve been thinking recently about the following position. I can’t see any obvious reason why it is incoherent, so I’m wondering whether (a) it might be true or (b) I’m missing something obvious about why it is incoherent. Feedback on it is more than welcome.

Many Bayesians model rational agents using the following two principles:

• At any moment, the agent’s credal states are represented by a probability function.
• From moment to moment, the agent’s credal states are updated by conditionalisation on the evidence received.

Of course these are idealisations, and many other people have been interested in relaxing them. One relaxation that has got a lot of attention in recent years is the idea that we should represent agents not by single probability functions, but by sets of probability functions. We then say that the agent regards q as more probable than r iff for all probability functions Pr in the set, Pr(q) > Pr(r). This allows that the agent need not hold that q is more probable than r, or r more probable than q, or that q and r are equally probable, for arbitrary q and r. And that’s good because it isn’t a rationality requirement that agents make pairwise probability judgments about all pairs of propositions.
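To make the comparative judgment concrete, here is a minimal sketch. The representation is an assumption made for illustration, not something the proposal depends on: worlds are strings, a probability function is a dict from worlds to probabilities, and a proposition is a set of worlds. The helper `equally_probable` is also an assumed definition, obtained by analogy with the rule for "more probable than".

```python
from fractions import Fraction as F

def prob(pr, prop):
    """Pr(prop): total probability of the worlds in prop."""
    return sum((pr[w] for w in prop), F(0))

def more_probable(S, q, r):
    """The agent regards q as more probable than r iff every Pr in S does."""
    return all(prob(pr, q) > prob(pr, r) for pr in S)

def equally_probable(S, q, r):
    """Assumed analogue: every Pr in S gives q and r the same probability."""
    return all(prob(pr, q) == prob(pr, r) for pr in S)

# Two probability functions that disagree about w1 versus w2.
pr1 = {"w1": F(1, 2), "w2": F(1, 4), "w3": F(1, 4)}
pr2 = {"w1": F(1, 4), "w2": F(1, 2), "w3": F(1, 4)}
S = [pr1, pr2]

q, r = {"w1"}, {"w2"}
# None of the three comparative judgments holds for q and r.
no_judgment = not (more_probable(S, q, r)
                   or more_probable(S, r, q)
                   or equally_probable(S, q, r))
# Both functions agree that {w1, w2} is more probable than {w3}.
agreed = more_probable(S, {"w1", "w2"}, {"w3"})
```

On this toy set the agent has no view on how q compares with r, yet still judges that {w1, w2} is more probable than {w3}.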

Now what effect does this relaxation have on the principle about updating? The standard story (one that I’ve appealed to in the past) is that the ideal agent updates by conditionalising all the functions in the set. So if we write PrE for the function such that PrE(x) = Pr(x | E), and S is the set of probability functions representing the agent’s credal state before the update, then {PrE: Pr is in S} is the set we get after updating.

Here’s the option I now think should be taken seriously. Sometimes getting evidence E is a reason for the agent to have more determinate probabilistic opinions than she previously had. (I’m using ‘determinate’ in a sense such that the agent represented by a single probability function has maximally determinate probabilistic opinions, and the agent represented by the set of all probability functions has maximally indeterminate opinions.) In particular, it can be a reason for ‘culling’ the set down a little, as well as conditionalising on what remains. So we imagine that updating on E involves a two-step process.

• Replace S with U(S, E)
• Update U(S, E) to {PrE: Pr is in U(S, E)}

In this story, U is a function that takes two inputs, a set of probability functions and a piece of evidence, and returns a set of probability functions that is a subset of the original set. (The last constraint might need to be weakened for some purposes.) Intuitively, it tells the agent that she needn’t have worried that certain probability functions were the ones she should be using. We can put forward formal proposals for U, such as the following:

Pr is in U(S, E) iff Pr is in S and there is no Pr* in S such that Pr*(E) > 2Pr(E)
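As a rough sketch of the two-step update with this rule, assuming a finite set S over a finite set of worlds (the dict-of-worlds representation and the function names `cull`, `conditionalise` and `update` are illustrative choices, not part of the proposal):

```python
from fractions import Fraction as F

# Assumed representation: a probability function is a dict from worlds to
# probabilities; a proposition or piece of evidence is a set of worlds.

def prob(pr, prop):
    """Pr(prop): total probability of the worlds in prop."""
    return sum((pr[w] for w in prop), F(0))

def cull(S, E):
    """U(S, E): keep Pr unless some Pr* in S has Pr*(E) > 2 * Pr(E)."""
    highest = max(prob(pr, E) for pr in S)
    return [pr for pr in S if highest <= 2 * prob(pr, E)]

def conditionalise(pr, E):
    """PrE: Pr conditionalised on E (requires Pr(E) > 0)."""
    pe = prob(pr, E)
    if pe == 0:
        raise ValueError("cannot conditionalise on probability-zero evidence")
    return {w: (pr[w] / pe if w in E else F(0)) for w in pr}

def update(S, E):
    """Two-step update: cull with U, then conditionalise what remains."""
    return [conditionalise(pr, E) for pr in cull(S, E)]

E = {"a", "b"}
pr1 = {"a": F(1, 2), "b": F(1, 4), "c": F(1, 4)}    # Pr(E) = 3/4
pr2 = {"a": F(1, 4), "b": F(1, 4), "c": F(1, 2)}    # Pr(E) = 1/2
pr3 = {"a": F(1, 5), "b": F(1, 10), "c": F(7, 10)}  # Pr(E) = 3/10, culled: 3/4 > 2 * 3/10
S = [pr1, pr2, pr3]

survivors = cull(S, E)    # pr1 and pr2
posterior = update(S, E)  # pr1 and pr2, each conditionalised on E
```

Because `cull` only ever removes members of S, its output is a subset of S, as the constraint on U requires.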

That’s just an illustration, but it’s one kind of thing I have in mind. (I’m particularly interested in theories where U is only knowable a posteriori, so that it isn’t specifiable by an abstract rule of this kind, which isn’t particularly responsive to empirical evidence. So don’t take that example too seriously.) The question is, what could we say against the coherence of such an updating policy?

One thing we certainly can’t say is that it is vulnerable to a Dutch Book. As long as U(S, E) is always a subset of S, it is easy to prove that there is no sequence of bets such that the agent regards each bet as strictly positive when it is offered and such that the sequence ends in sure loss. In fact, as long as U(S, E) overlaps S, this is easy to show. Perhaps there is some way in which such an agent turns down a sure gain, though I can’t myself see such an argument.

In any case, the original Dutch Book argument for conditionalisation always seemed fairly weak to me. As Ramsey pointed out, the point of the Dutch Book argument was to dramatise an underlying inconsistency in credal states, and there’s nothing inconsistent about adopting any old updating rule you like. (At the Dutch Book symposium last August this point was well made by Colin Howson.) So the threshold for endorsing a new updating rule might be fairly low.

It might be that the particular version of U proposed above is non-commutative. Even if that’s true, I’m not 100% sure it’s a problem, and in any case I’m sure there are other versions of U that are commutative.
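A toy check suggests that the rule is order-sensitive on at least one natural reading. The sketch below assumes a finite set of worlds, a dict representation of probability functions, and that at each step the cull is applied to the agent's current (already conditionalised) set. The functions A and B and the evidence E1 and E2 are made up for illustration.

```python
from fractions import Fraction as F

def prob(pr, prop):
    """Pr(prop): total probability of the worlds in prop."""
    return sum((pr[w] for w in prop), F(0))

def update(S, E):
    """Cull by the illustrative rule, then conditionalise the survivors on E."""
    highest = max(prob(pr, E) for pr in S)
    kept = [pr for pr in S if highest <= 2 * prob(pr, E)]
    return [{w: (pr[w] / prob(pr, E) if w in E else F(0)) for w in pr}
            for pr in kept]

E1 = {"a", "b", "c"}
E2 = {"a", "b", "d"}
A = {"a": F(2, 5), "b": F(2, 5), "c": F(1, 10), "d": F(1, 20), "e": F(1, 20)}
B = {"a": F(1, 10), "b": F(3, 10), "c": F(0), "d": F(1, 5), "e": F(2, 5)}
S = [A, B]

# E1 first: B is culled at once, since A(E1) = 9/10 > 2 * B(E1) = 4/5.
e1_first = update(update(S, E1), E2)
# E2 first: B survives both culls, so two functions remain.
e2_first = update(update(S, E2), E1)
```

Here `e1_first` contains one function and `e2_first` contains two, so the two orders leave the agent in different credal states. This only concerns the factor-of-two rule under the stated assumptions; it says nothing about other choices of U.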

In the absence of better arguments, I’m inclined to think that this updating proposal is perfectly defensible. Below the fold I’ll say a little about why this is philosophically interesting because of its connection to externalist epistemologies and to dogmatism.

On the standard Bayesian model, the ideal agent assigns probability x to p after receiving evidence E iff she assigns probability x to p given E a priori. That is, she knows a priori what credences are justified by which evidence. But it isn’t obvious, to say the least, that these relations are always knowable a priori. Arguably, getting evidence E teaches us, among other things, what is justified by evidence E. At least there needs to be an argument that this isn’t the case, and the Bayesian literature isn’t exactly overflowing with such arguments.

The proposal above, while still obviously an idealisation, suggests a model for a theory of learning on which evidential relations are sometimes only learned a posteriori. The idea is that a priori, the set S is the set of all probability functions. After receiving evidence E, we have reason to be less uncertain in Keynes’s sense. So we eliminate many functions, and the resulting credences we assign to propositions are supported by E, as is the knowledge that E supports just those credences.

That’s the first, and I think biggest, reason for taking this model seriously. The other concerns an objection to certain kinds of dogmatism. Let E be the evidence I actually have, and p something I know on this basis that is not entailed by E. (A wide variety of people, from sceptics to Williamson, believe there is no such p, but this is a very counterintuitive position that I’d like to avoid.) Consider now the material implication If E then p, which can be written as Not E or p. The dogmatist, as I’m using that term, thinks we can know p on the basis of E even though we didn’t know this conditional a priori.

Now there’s a Bayesian objection to this kind of dogmatism. The objection surfaces in John Hawthorne’s “Deeply Contingent A Priori Knowledge” and is directly connected to dogmatism in Roger White’s recent “Problems for Dogmatism”. The point is that for all q and r, and all probability functions Pr with Pr(q) > 0, the following is true, with strict inequality whenever Pr(q) < 1 and Pr(r | q) < 1.

Pr(~q or r | q) ≤ Pr(~q or r)

So in particular, as long as E is not certain a priori and does not make p certain, we can prove that

Pr(~E or p | E) < Pr(~E or p)
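The derivation behind this comparison is short, and spelling it out shows exactly when the gap is non-zero. It assumes only that Pr(q) > 0, so that the conditional probabilities are defined:

```latex
\begin{align*}
\Pr(\lnot q \lor r \mid q) &= \Pr(r \mid q) = 1 - \Pr(\lnot r \mid q) \\
\Pr(\lnot q \lor r) &= 1 - \Pr(q \land \lnot r) = 1 - \Pr(q)\,\Pr(\lnot r \mid q) \\
\Pr(\lnot q \lor r) - \Pr(\lnot q \lor r \mid q) &= \Pr(\lnot r \mid q)\,\bigl(1 - \Pr(q)\bigr) \ge 0
\end{align*}
```

The difference is strictly positive exactly when Pr(q) < 1 and Pr(r | q) < 1. Substituting E for q and p for r gives the instance about ~E or p.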

Now with one extra premise, say that evidence that decreases the probability of a proposition can never be the basis for knowing that proposition, we conclude that if we didn’t know the conditional a priori we don’t know it now on the basis of E. The theory of updating I’m sketching here (though not the particular proposal about U above) provides a way out of this argument. Here’s one way it might go. (I’d prefer an ‘interest-relative’ version of this, but I’ll leave that out because it complicates the argument.)

• It is a necessary condition for an agent to know p on the basis of E that the ideal agent with evidence E assigns a credence greater than some threshold t to p
• An agent represented by a set of probability functions S assigns a credence greater than some threshold t to p iff for all Pr in S, Pr(p) > t

Now it is possible, given the right choice of U, that for some Pr in S, Pr(~E or p) < t, but for all Pr in U(S, E), Pr(~E or p | E) > t. That’s to say that evidence E does suffice for the ideal agent having a credence in ~E or p greater than the threshold, even though the agent’s credence in ~E or p was not greater than the threshold a priori. Of course, it wasn’t lower than the threshold or equal to it before getting the evidence; it was simply incomparable to the threshold. So as well as having a way to model a posteriori epistemology, we have a way to model dogmatist epistemology. Those seem like philosophical reasons to be interested in this approach to updating.
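Here is a toy instance of that possibility, as a sketch only. The three-world model, the threshold t = 9/10, the two probability functions, and above all `toy_U` are assumptions made up for illustration; `toy_U` is a hard-coded stand-in for whatever culling E actually warrants, not a proposal for U.

```python
from fractions import Fraction as F

def prob(pr, prop):
    """Pr(prop): total probability of the worlds in prop."""
    return sum((pr[w] for w in prop), F(0))

def conditionalise(pr, E):
    """Pr conditionalised on E (assumes Pr(E) > 0)."""
    pe = prob(pr, E)
    return {w: (pr[w] / pe if w in E else F(0)) for w in pr}

def credence_above(S, prop, t):
    """The set-valued agent's credence in prop exceeds t iff every Pr in S agrees."""
    return all(prob(pr, prop) > t for pr in S)

def credence_below(S, prop, t):
    return all(prob(pr, prop) < t for pr in S)

E = {"E&p", "E&~p"}
COND = {"~E", "E&p"}  # the material conditional ~E or p
t = F(9, 10)

pr_high = {"E&p": F(19, 40), "E&~p": F(1, 40), "~E": F(1, 2)}  # Pr(COND) = 39/40
pr_low = {"E&p": F(1, 4), "E&~p": F(1, 4), "~E": F(1, 2)}      # Pr(COND) = 3/4
S = [pr_high, pr_low]

def toy_U(S, E):
    """Hypothetical cull: drop functions on which p is no likelier than ~p given E."""
    return [pr for pr in S if pr["E&p"] > pr["E&~p"]]

posterior = [conditionalise(pr, E) for pr in toy_U(S, E)]

incomparable_before = (not credence_above(S, COND, t)
                       and not credence_below(S, COND, t))
above_after = credence_above(posterior, COND, t)
```

A priori the agent's credence in ~E or p is neither above nor below t. After the update it is above t, even though the one surviving function lowers its own probability for ~E or p from 39/40 to 19/20, just as the inequality requires.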

## 2 Replies to “Updating Vague Probabilities”

1. Kenny Easwaran says:

I’ll have to come back to this later to look at the stuff you mention about conditionals – it definitely sounds interesting, but I’ll want more time to think about it. It definitely seems to me that it might give some motivation for this proposal. But the initial motivation you give (which definitely seems more compelling at first) I think isn’t necessarily going to be very strong.

> Sometimes getting evidence E is a reason for the agent to have more determinate probabilistic opinions than she previously had. (I’m using ‘determinate’ in a sense such that the agent represented by a single probability function has maximally determinate probabilistic opinions, and the agent represented by the set of all probability functions has maximally indeterminate opinions.)

I agree with all of this. However, there are still several ways to measure “more determinate”. The one initially suggested by this phrasing would be a measure of the cardinality of the set of probability functions – but this just isn’t going to be a very fine-grained measure of determinateness at all, because presumably the set of functions will have at least continuum-many functions most of the time, as long as there is some proposition whose probability is spread out over an interval.

Perhaps a better way to measure the determinateness of a set of functions is to fix a proposition and measure the difference between the highest and lowest probabilities assigned by functions in the set. (The reason I want to fix the proposition first is that I’m pretty sure that for any two distinct probability functions, you can probably find a proposition that one assigns probability 1 and the other assigns probability 0 – at least, if one considers conditional probabilities as well as unconditional ones.)

Now if we consider this “spread” on any particular proposition as the measure of indeterminateness of a set of probability functions, we might not ever need to eliminate any of the functions in order to increase determinateness as we update. I haven’t reviewed the limit theorems for Bayesian updating, but I think they say that any two probability functions, with probability 1, will eventually converge (in probability?) to the truth, given reasonable evidence. This eventual pairwise convergence certainly doesn’t guarantee eventual convergence of an entire infinite set (after all, starting with the complete set of all functions, each of the updated versions will have maximal spread after any set of data that doesn’t logically entail the proposition in question).

However, if the initial set of probability functions has some bounded spread on a suitably large class of propositions, I conjecture that this spread will gradually decrease as they conditionalize on the same evidence, with no need to drop any functions.
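A toy case of the kind of shrinkage conjectured here, offered as an illustration and not as evidence for the general claim. The setup is assumed: two hypotheses about a coin (H1: heads with chance 3/4, H2: heads with chance 1/4), three priors for H1, and a run of heads as the shared evidence.

```python
from fractions import Fraction as F

HEADS_GIVEN_H1 = F(3, 4)  # assumed chance of heads under H1
HEADS_GIVEN_H2 = F(1, 4)  # assumed chance of heads under H2

def posterior_h1(prior_h1, flips):
    """Pr(H1 | flips), conditionalising on each flip ('H' or 'T') in turn."""
    p = prior_h1
    for flip in flips:
        like1 = HEADS_GIVEN_H1 if flip == "H" else 1 - HEADS_GIVEN_H1
        like2 = HEADS_GIVEN_H2 if flip == "H" else 1 - HEADS_GIVEN_H2
        p = p * like1 / (p * like1 + (1 - p) * like2)
    return p

def spread(priors, flips):
    """Highest minus lowest posterior for H1 across the set of priors."""
    posteriors = [posterior_h1(p, flips) for p in priors]
    return max(posteriors) - min(posteriors)

priors = [F(1, 4), F(1, 2), F(3, 4)]
# Spread on H1 after 0, 1, 2 and 3 heads in a row.
spreads = [spread(priors, "H" * n) for n in range(4)]
```

In this case the spread falls from 1/2 to 18/205 with no function dropped. Other cases behave differently: with priors 1/10 and 1/2, a single head widens the spread from 2/5 to 1/2, so any shrinkage need not be monotone.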

Of course, there may be other reasons to drop functions, for instance to model the fact that E can cause us to believe P even if we don’t antecedently believe E->P.

2. Brian Weatherson says:

I’m not sure that we need a measure of determinacy in order to say that one credal state is more determinate than another. I was simply taking it as a premise that if S1 is a subset of S2, then a credal state modelled by S1 is more determinate than one modelled by S2. I don’t think that commits me to the existence of a measure of determinacy, largely because I don’t think it commits me to ‘more determinate than’ being a linear order. This could be wrong, though; I certainly haven’t worked through the details in any depth.

Now there is one complication here, namely that contracting and conditionalising in general won’t result in a set that’s a subset of the original. But it can be broken down into two parts, namely contraction and conditionalisation, and the first of these seems to be a way to get to a more determinate state.