# Distributivity of More Probable Than

I’ve thought for a long time that the relation *more probable than* is not a linear order. That is, it is possible to have propositions A and B such that none of the following three claims holds.

• A is more probable than B.
• B is more probable than A.
• A and B are equally probable.

This isn’t a particularly original idea of mine; it goes back at least as far as Keynes’s *Treatise on Probability* (which is where I got the idea from).

I’ve also thought for a long time that there was a nice way to model failures of linearity using sets of probability functions. Say there is some special set S of functions, each of which satisfies the probability calculus, and then define the relations considered above as follows.

• A is more probable than B =df For all Pr in S, Pr(A) > Pr(B).
• B is more probable than A =df For all Pr in S, Pr(B) > Pr(A).
• A and B are equally probable =df For all Pr in S, Pr(A) = Pr(B).

This isn’t particularly new either; the idea goes back at least to the 1960s, perhaps earlier. I did have one idea to contribute, namely that this sets-of-probability-functions approach to understanding comparative probability claims was a good way of modelling Keynes’s somewhat hard-to-model ideas. But now I’m starting to worry that this was a mistake, or at least undermotivated in a key respect.
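The three definitions can be sketched directly. Everything concrete here is illustrative, not from the post: a two-element set S of probability assignments over two proposition labels.

```python
# A minimal sketch, assuming S is a small set of probability assignments
# over two proposition labels (all numbers are made up for illustration).
# Each Pr is a dict from proposition labels to values in [0, 1].
S = [
    {"A": 0.6, "B": 0.3},
    {"A": 0.2, "B": 0.5},
]

def more_probable(a, b, S):
    """a is more probable than b: Pr(a) > Pr(b) for every Pr in S."""
    return all(pr[a] > pr[b] for pr in S)

def equally_probable(a, b, S):
    """a and b are equally probable: Pr(a) = Pr(b) for every Pr in S."""
    return all(pr[a] == pr[b] for pr in S)

# None of the three claims holds, so the order is not linear.
print(more_probable("A", "B", S))     # False
print(more_probable("B", "A", S))     # False
print(equally_probable("A", "B", S))  # False
```

Since the two functions in S disagree about which of A and B is likelier, all three universally quantified claims fail at once.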

Note that on the sets of probability functions approach, we can identify probabilities with functions from each Pr in S to a real in [0, 1]. Call such functions X, Y, etc, and we’ll define X(Pr) in the obvious way. Then there is a natural ordering on the functions, namely X >= Y iff for all Pr in S, X(Pr) >= Y(Pr). This ordering will be reflexive and transitive, but not total.
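That ordering can be checked concretely. In this sketch X and Y are represented as tuples of values, one coordinate per Pr in a two-element S (the numbers are illustrative):

```python
# A sketch of the pointwise ordering on probabilities, with X and Y given
# as tuples of values indexed by the (here two-element) set S.
X = (0.6, 0.2)
Y = (0.3, 0.5)

def geq(x, y):
    """x >= y iff x(Pr) >= y(Pr) for every Pr in S."""
    return all(xi >= yi for xi, yi in zip(x, y))

print(geq(X, X))               # True: the order is reflexive
print(geq(X, Y) or geq(Y, X))  # False: X and Y are incomparable, so it is not total
```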

What I hadn’t thought about until today was that there is a natural meet and join on probabilities that we can define. So the meet of X and Y will be the function Z such that Z(Pr) is the minimum of X(Pr) and Y(Pr), and the join of X and Y will be the function Z such that Z(Pr) is the maximum of X(Pr) and Y(Pr). This isn’t too surprising – it might be a little sad if probabilities didn’t form a lattice.

What’s surprising is that given this definition, they form a distributive lattice. That is, for any X, Y, Z, if we write XMY for the meet of X and Y, and XJY for the join of X and Y, we have (XMY)JZ = (XJZ)M(YJZ). (Or at least I think we do; I might just be making an error here.) That’s surprising because there’s no obvious reason, once you’ve given up the idea that probabilities form a linear order, to believe in distributivity.
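A numerical spot-check of the distributivity claim (a sketch, not a proof): taking meet and join pointwise, the identity reduces to the fact that min and max distribute over each other on the reals, and it holds whichever of min and max plays the role of meet. The grid values below are illustrative.

```python
from itertools import product

def meet(x, y):
    """Pointwise greatest lower bound under the pointwise ordering."""
    return tuple(min(a, b) for a, b in zip(x, y))

def join(x, y):
    """Pointwise least upper bound."""
    return tuple(max(a, b) for a, b in zip(x, y))

# Exhaustively check (X M Y) J Z == (X J Z) M (Y J Z) for all two-coordinate
# probabilities drawn from a small grid of values in [0, 1].
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
pairs = list(product(grid, repeat=2))
for x, y, z in product(pairs, repeat=3):
    assert join(meet(x, y), z) == meet(join(x, z), join(y, z))
print("distributivity holds on the grid")
```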

Open question: What other interesting lattice properties does *more probable than* have?

I know that it isn’t a Boolean lattice. There’s no way to define a negation operation N on probabilities such that (a) X > Y iff NY > NX and (b) XMNX is always the minimal element. I think that’s because the only way to define N satisfying condition (a) is to set NX(Pr) = 1 – X(Pr) for all Pr in S, and that definition doesn’t guarantee that XMNX is minimal. But as for other properties, I’m not really sure.
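The failure of condition (b) can be seen with a two-coordinate example (illustrative numbers, taking the meet pointwise): setting NX = 1 – X, the meet of X and NX sits strictly above the bottom element.

```python
# A sketch: with NX(Pr) = 1 - X(Pr), X M NX need not be the minimal
# element (the constant-0 function). Numbers chosen to be float-exact.
X = (0.5, 0.75)
NX = tuple(1 - v for v in X)                         # (0.5, 0.25)
meet_x_nx = tuple(min(a, b) for a, b in zip(X, NX))
print(meet_x_nx)  # (0.5, 0.25): not the bottom element (0.0, 0.0)
```

The meet of X and NX is the constant-0 function only when every X(Pr) is already 0 or 1, so condition (b) fails for any non-extremal probability.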

When I was working on *truer*, I spent a lot of time worrying about whether it generated a distributive lattice. Eventually I came up with an argument that it did, but it was very speculative. (Not that it was a particularly original argument; everything about lattice theory in that paper I borrowed from Greg Restall’s *Introduction to Substructural Logics*, which now seems to be out in a Kindle version.) It feels bad to assume that *more probable than* generates a distributive lattice simply because the easiest way to model it implies distributivity.

## One Reply to “Distributivity of More Probable Than”

1. Michael Kremer says:

Brian,

A couple of quick comments that may or may not be helpful, ending with a possibly critical question. What you’ve mentioned so far (distributive lattice, not Boolean) has nothing really to do with the fact that you’re dealing with probabilities here. That is, suppose we have an arbitrary set S (which need not be a set of functions, let alone probability functions). Now consider the set P of all functions from S into [0,1]. You can then define an ordering on the set P and show that it is a distributive lattice as you’ve done here. (For any f, g in P, f >= g iff for all x in S, f(x) >= g(x); meet(f, g) is the function h such that h(x) = min(f(x), g(x)), etc.)

Anything more interesting must surely exploit the additional structure you get from the probability functions. But (see end of this note) if you take this seriously you might not get your meet and join after all. Or you’ll at least need more argument.

I note that at one point you say “Note that on the sets of probability functions approach, we can identify probabilities with functions from each Pr in S to a real in [0, 1]. Call such functions X, Y, etc, and we’ll define X(Pr) in the obvious way.” Initially I was puzzled by this: X is a function from probability functions in S into [0,1] — what do you mean by “defining X(Pr) in the obvious way”? Then, I think, I understood: for each proposition A, we’ll define the probability of A to be the function XsubA: S —> [0,1] given by XsubA(Pr) = Pr(A).

So let’s assume that the set of probabilities is the set of all such functions XsubA where A is a proposition in the domain of the original probability functions Pr. (If instead you take the set of probabilities to be all functions from S into [0,1] I want to ask whether some of those functions might not be inappropriate to call “probabilities.”)

So we have a set PROB = {XsubA | A is a proposition}

and XsubA: S —> [0,1]

and XsubA(Pr) = Pr(A).

Is that right?

Then all I mean to say is that if there is anything interesting to establish about the order on PROB that you’ve introduced, it will need to derive from the structure of the set of propositions and the structure of the probability functions in the original set S (that they are probability functions).

Actually, on reflection, if we restrict ourselves only to those functions XsubA where A is a proposition, it is not obvious that your meet and join are defined. That is, for any two propositions A and B, it is not immediately obvious that the function Z such that Z(Pr) = min(XsubA(Pr), XsubB(Pr)) is the function XsubC for some proposition C. But if you don’t restrict yourself to such functions XsubA, then do you want to consider arbitrary functions from S into [0,1]? Would it be reasonable to allow all such arbitrary functions to count as “probabilities”?
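This closure worry can be made concrete with a toy example (the worlds, the two probability functions in S, and all the numbers are invented for illustration): over a two-world space, the pointwise min of XsubA and XsubB need not be XsubC for any proposition C.

```python
# Two worlds; propositions are sets of worlds; S holds two probability
# functions, each given by its values on the two singleton worlds.
worlds = ["w1", "w2"]
S = [
    {"w1": 0.6, "w2": 0.4},
    {"w1": 0.3, "w2": 0.7},
]
propositions = [frozenset(), frozenset({"w1"}), frozenset({"w2"}), frozenset(worlds)]

def X(prop):
    """X_A as a tuple: one coordinate Pr(A) per Pr in S."""
    return tuple(sum(pr[w] for w in prop) for pr in S)

realized = {X(p) for p in propositions}  # every proposition-induced function
candidate = tuple(min(a, b) for a, b in
                  zip(X(frozenset({"w1"})), X(frozenset({"w2"}))))
print(candidate)              # (0.4, 0.3)
print(candidate in realized)  # False: no proposition induces the pointwise min
```

So if “probabilities” are only the functions XsubA, the pointwise min and max are not automatically available, just as the comment suggests.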