I’ve thought for a long time that the relation more probable than was not a linear order. That is, it is possible to have propositions A and B such that none of the following three claims hold.
- A is more probable than B.
- B is more probable than A.
- A and B are equally probable.
This isn’t a particularly original idea of mine; it goes back at least as far as Keynes’s Treatise on Probability (which is where I got the idea from).
I’ve also thought for a long time that there was a nice way to model failures of linearity using sets of probability functions. Say there is some special set S of functions, each of which satisfies the probability calculus, and then define the relations considered above as follows.
- A is more probable than B =df For all Pr in S, Pr(A) > Pr(B).
- B is more probable than A =df For all Pr in S, Pr(B) > Pr(A).
- A and B are equally probable =df For all Pr in S, Pr(A) = Pr(B).
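The definitions above can be sketched in code. Here is a minimal illustration, where S is modelled as a list of probability functions over a toy three-world space (the world names and numbers are purely illustrative):

```python
# A set S of probability functions, each a dict from worlds to probabilities.
# These particular functions and worlds are made up for illustration.
S = [
    {"w1": 0.6, "w2": 0.3, "w3": 0.1},
    {"w1": 0.2, "w2": 0.5, "w3": 0.3},
]

def pr(fn, prop):
    """Probability of a proposition (a set of worlds) under one function."""
    return sum(p for w, p in fn.items() if w in prop)

def more_probable(a, b):
    """A is more probable than B: Pr(A) > Pr(B) for every Pr in S."""
    return all(pr(fn, a) > pr(fn, b) for fn in S)

def equally_probable(a, b):
    """A and B are equally probable: Pr(A) = Pr(B) for every Pr in S."""
    return all(pr(fn, a) == pr(fn, b) for fn in S)

A = {"w1"}
B = {"w2"}
# None of the three relations holds, so the order is not linear.
print(more_probable(A, B), more_probable(B, A), equally_probable(A, B))
# -> False False False
```

Here A is more probable than B according to the first function, but less probable according to the second, so the two propositions are incomparable.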
This isn’t particularly new either; the idea goes back at least to the 1960s, perhaps earlier. I did have one idea to contribute to this, namely to suggest that this sets of probability functions approach to understanding comparative probability claims was a good way of modelling Keynes’s somewhat hard to model ideas. But now I’m starting to worry that this was a mistake, or at least undermotivated in a key respect.
Note that on the sets of probability functions approach, we can identify probabilities with functions that map each Pr in S to a real in [0, 1]. Call such functions X, Y, etc., and define X(Pr) in the obvious way. Then there is a natural ordering on the functions, namely X >= Y iff for all Pr in S, X(Pr) >= Y(Pr). This ordering is reflexive and transitive, but not total.
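If we represent such a function X as a tuple with one coordinate per Pr in S, the ordering is just coordinatewise comparison, and its failure of totality is easy to exhibit (the particular values below are illustrative):

```python
# A "probability" X: S -> [0, 1] as a tuple, one coordinate per Pr in S.
def geq(x, y):
    """X >= Y iff X(Pr) >= Y(Pr) for every Pr in S."""
    return all(xi >= yi for xi, yi in zip(x, y))

X = (0.7, 0.2)
Y = (0.4, 0.5)
# Reflexive, but not total: X and Y are incomparable.
print(geq(X, X), geq(X, Y), geq(Y, X))
# -> True False False
```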
What I hadn’t thought about until today was that there is a natural meet and join on probabilities that we can define. The meet of X and Y is the function Z such that Z(Pr) is the minimum of X(Pr) and Y(Pr), and the join of X and Y is the function Z such that Z(Pr) is the maximum of X(Pr) and Y(Pr). This isn’t too surprising; it might be a little sad if probabilities didn’t form a lattice.
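With the tuple representation, meet and join are just pointwise minimum and maximum, so that the meet is the greatest lower bound and the join the least upper bound under the ordering above (values again illustrative):

```python
def meet(x, y):
    """Greatest lower bound: pointwise minimum."""
    return tuple(map(min, x, y))

def join(x, y):
    """Least upper bound: pointwise maximum."""
    return tuple(map(max, x, y))

X = (0.7, 0.2)
Y = (0.4, 0.5)
print(meet(X, Y))  # -> (0.4, 0.2)
print(join(X, Y))  # -> (0.7, 0.5)
```

Note that the meet and join of two incomparable probabilities need not be either of the original two functions.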
What’s surprising is that, given this definition, they form a distributive lattice. That is, for any X, Y, Z, if we write XMY for the meet of X and Y, and XJY for the join of X and Y, we have (XMY)JZ = (XJZ)M(YJZ). (Or at least I think we do; I might just be making an error here.) That’s surprising because there’s no obvious reason, once you’ve given up the idea that probabilities form a linear order, to believe in distributivity.
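The distributivity claim can at least be spot-checked by brute force. Since min and max distribute over each other on the reals, and meet and join here act coordinatewise, the law should hold at every coordinate; this sketch checks it exhaustively over a small grid of values:

```python
from itertools import product

def meet(x, y):
    return tuple(map(min, x, y))

def join(x, y):
    return tuple(map(max, x, y))

vals = [0.0, 0.25, 0.5, 0.75, 1.0]

# Check (X M Y) J Z == (X J Z) M (Y J Z) for all pairs over the grid.
ok = all(
    join(meet(x, y), z) == meet(join(x, z), join(y, z))
    for x in product(vals, repeat=2)
    for y in product(vals, repeat=2)
    for z in product(vals, repeat=2)
)
print(ok)  # -> True
```

This is only a finite check, not a proof, but the underlying reason it works is that each coordinate is a chain, and min/max on a chain always distribute.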
Open question: What other interesting lattice properties does more probable than have?
I know that it isn’t a Boolean lattice. There’s no way to define a negation operation N on probabilities such that (a) X > Y iff NY > NX and (b) XMNX is always the minimal element. I think that’s because the only way to define N to satisfy condition (a) is to set NX(Pr) = 1 – X(Pr) for all Pr in S, and that definition doesn’t guarantee that XMNX is minimal. But as for other properties, I’m not really sure.
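A concrete counterexample for condition (b), under the candidate negation NX(Pr) = 1 – X(Pr), is the probability that is 0.5 everywhere (again in the tuple representation, with illustrative values):

```python
def meet(x, y):
    """Pointwise minimum, as above."""
    return tuple(map(min, x, y))

def neg(x):
    """The candidate negation: NX(Pr) = 1 - X(Pr)."""
    return tuple(1 - xi for xi in x)

X = (0.5, 0.5)
bottom = (0.0, 0.0)  # The minimal element: 0 at every Pr in S.
print(meet(X, neg(X)))  # -> (0.5, 0.5), not the bottom element
```

So XMNX sits strictly above the bottom of the lattice, and the lattice cannot be Boolean with this negation.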
When I was working on truer, I spent a lot of time worrying about whether it generated a distributive lattice. Eventually I came up with an argument that it did, but it was very speculative. (Not that it was a particularly original argument; everything about lattice theory in that paper I borrowed from Greg Restall’s Introduction to Substructural Logics, which now seems to be out in a Kindle version.) It feels bad to assume that more probable than generates a distributive lattice simply because the easiest way to model it implies distributivity.