In posts here, here, and here, I discuss possible responses to a counterexample to the Law of Likelihood based on Borel’s paradox that stay within Kolmogorov’s theory of regular conditional distributions. In this post I discuss the alternative option of abandoning the Kolmogorov theory in favor of the theory of coherent conditional probabilities developed by de Finetti (1974) and Dubins (1975).

## Recap of the Counterexample

Take a sphere with circumference 360 miles equipped with an arbitrary system of latitudes and longitudes. Let $E$ be the datum that a point $P$ randomly selected from a uniform distribution on the surface of the sphere lies within one mile of the intersection of the equator and the prime meridian. Let $H_1$ be the hypothesis that $P$ lies on the equator. Let $H_2$ be the hypothesis that $P$ lies on the union of the prime meridian and its opposite (the “prime meridional circle”).^{1}

The Law of Likelihood^{2} implies that $E$ favors $H_2$ over $H_1$, regardless of which great circle is designated the equator and which the prime meridional circle.
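The size of the favoring can be sketched numerically. The following is a minimal illustration, not a reproduction of the calculation in the earlier posts. It assumes the standard regular conditional distributions in latitude-longitude coordinates (longitude uniform given latitude, latitude density $\cos(\phi)/2$ given longitude), and it assumes that the two meridians making up the prime meridional circle receive weight 1/2 each. The variable names are hypothetical.

```python
import math

# Sphere with circumference 360 miles: one mile of arc subtends one degree.
HALF_WIDTH_DEG = 1.0

# H1 (equator): given latitude 0, longitude is uniform over 360 degrees,
# and E picks out the longitudes within one degree of the prime meridian.
pr_e_given_h1 = 2 * HALF_WIDTH_DEG / 360.0

# H2 (prime meridional circle): given a longitude, latitude phi has density
# cos(phi)/2 on [-90, 90] degrees. Integrating over [-w, w] gives sin(w).
# Assumption: each of the two meridians in the circle gets weight 1/2, and
# only the prime meridian passes near the intersection point in E.
half_width_rad = math.radians(HALF_WIDTH_DEG)
pr_e_on_prime_meridian = math.sin(half_width_rad)
pr_e_given_h2 = 0.5 * pr_e_on_prime_meridian

likelihood_ratio = pr_e_given_h2 / pr_e_given_h1

print(f"Pr(E|H1) = {pr_e_given_h1:.6f}")
print(f"Pr(E|H2) = {pr_e_given_h2:.6f}")
print(f"Pr(E|H2) / Pr(E|H1) = {likelihood_ratio:.4f}")
```

Under these assumptions the ratio comes out close to $\pi/2$, so the Law of Likelihood would count $E$ as favoring $H_2$ over $H_1$ even though nothing but the labeling distinguishes the two circles.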

**This result is wrong** because the distinction between the equator and the prime meridional circle is (by stipulation) arbitrary. Thus, the Law of Likelihood needs to be abandoned, restricted, or revised.

## Possible Remedy 3

This counterexample arises from the fact that regular conditional distributions for a probability-zero hypothesis depend on the $\sigma$-field in which that hypothesis is embedded.

One possible response is to adopt the alternative theory of *coherent conditional probabilities.* This theory agrees with Kolmogorov’s in requiring $\mbox{Pr}(A|C)=\mbox{Pr}(A\mbox{&}C)/\mbox{Pr}(C)$ when $\mbox{Pr}(C)>0$. It differs from Kolmogorov’s in requiring all and only the following when $\mbox{Pr}(C)=0$:

1. *Propriety*: $\mbox{Pr}(C|C)=1$
2. *Finite Additivity*: $\mbox{Pr}(A \vee B|C)=\mbox{Pr}(A|C)+\mbox{Pr}(B|C)$ if $A \mbox{&} B=\varnothing$
3. *Non-Negativity*: $\mbox{Pr}(A|C)\geq 0$
4. *“Nesting”* (my term): $\mbox{Pr}(A \mbox{&} B|C)=\mbox{Pr}(A|B\mbox{&}C)\mbox{Pr}(B|C)$ if $B\mbox{&}C \neq \varnothing$

Requirements 1-3 are conditional counterparts to Kolmogorov’s axioms of unconditional probability, and Requirement 4 is a “doubly conditional” counterpart to the formula $\Pr(A\mbox{&}B)=\Pr(A|B)\Pr(B)$.
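As a sanity check on how the four requirements relate to the ratio formula, here is a small sketch on a hypothetical four-outcome space in which every nonempty event has positive probability. It reads finite additivity as a claim about the union of two disjoint events, and it checks only the positive-probability case, where the ratio formula applies. The weights and function names are illustrative assumptions, not part of either theory.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy space: four outcomes with strictly positive weights.
WEIGHTS = {
    "a": Fraction(1, 2),
    "b": Fraction(1, 4),
    "c": Fraction(1, 8),
    "d": Fraction(1, 8),
}


def pr(event):
    """Unconditional probability of a set of outcomes."""
    return sum((WEIGHTS[w] for w in event), Fraction(0))


def pr_given(a, c):
    """Ratio formula Pr(A|C) = Pr(A & C) / Pr(C); requires Pr(C) > 0."""
    return pr(a & c) / pr(c)


def all_events():
    """Every subset of the outcome space, as frozensets."""
    items = sorted(WEIGHTS)
    return [
        frozenset(combo)
        for r in range(len(items) + 1)
        for combo in combinations(items, r)
    ]


def check_requirements():
    events = all_events()
    for c in events:
        if not c:
            continue  # Pr(C) = 0: the ratio formula says nothing here
        assert pr_given(c, c) == 1  # 1. Propriety
        for a in events:
            assert pr_given(a, c) >= 0  # 3. Non-Negativity
            for b in events:
                if not (a & b):  # 2. Finite Additivity (disjoint A, B)
                    assert pr_given(a | b, c) == pr_given(a, c) + pr_given(b, c)
                if b & c:  # 4. Nesting
                    assert pr_given(a & b, c) == pr_given(a, b & c) * pr_given(b, c)
    return True


print(check_requirements())
```

The check says nothing about the probability-zero case, which is exactly where the two theories part ways.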

Kolmogorov’s regular conditional distributions also satisfy Requirements 2-4. Unlike coherent conditional probabilities, they are required to be countably as well as finitely additive. They are also required to satisfy a continuous generalization of the Law of Total Probability $\mbox{Pr}(A \mbox{&} B)= \sum_{B_i} \mbox{Pr}(A| B_i)\mbox{Pr}(B_i)$, where $\{B_i\}$ is a partition of $B$. This requirement ensures that regular conditional distributions are unique up to almost-everywhere equivalence relative to a $\sigma$-field. This uniqueness comes at a cost: conditional probabilities become relative to a $\sigma$-field, which is the source of Borel’s paradox, and they sometimes violate Requirement 1 (see Seidenfeld et al. 2001, but also Easwaran 2008, 93-4). In addition, some probability distributions do not admit regular conditional distributions relative to a particular $\sigma$-field (see Billingsley 1995, Exercise 33.11).
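For reference, the continuous generalization of the Law of Total Probability is usually written as follows. This is a sketch in standard measure-theoretic notation; the symbols $\mathcal{G}$ (a sub-$\sigma$-field) and $G$ (one of its members) are introduced here for illustration and do not appear elsewhere in this post. The conditional probability $\mbox{Pr}(A|\mathcal{G})$ is a $\mathcal{G}$-measurable function satisfying

$$\mbox{Pr}(A \mbox{&} G)=\int_{G} \mbox{Pr}(A|\mathcal{G})\, d\mbox{Pr} \quad \text{for every } G \in \mathcal{G}.$$

When $\mathcal{G}$ is generated by a countable partition $\{B_i\}$ whose elements have positive probability, the integral reduces to the sum $\sum_{B_i \subseteq G} \mbox{Pr}(A|B_i)\mbox{Pr}(B_i)$, which is the discrete formula above.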

## The Two Theories of Conditional Probability Applied to Borel’s Paradox

The theory of coherent conditional probabilities allows one to adopt any probability distributions one likes conditional on events of probability zero, subject to Requirements 1-4. Thus, it does not *require* us to assign uniform probability over every great circle in the version of Borel’s paradox presented above. However, it does *allow* us to do so.

One might think that we make no progress by adopting Remedy 3 because the theory of regular conditional distributions *also* allows us to assign uniform probability to every great circle. It just does not allow us to do so in the $\sigma$-field corresponding to any coordinate system of latitudes and longitudes.

Nevertheless, Remedy 3 has the advantage that **it does not require hunting for a $\sigma$-field that generates the “right” probabilities conditional on probability-zero hypotheses.** It allows us simply to adopt those probabilities directly.

A likelihoodist who is an advocate of Kolmogorov’s theory could reply that assessing evidential favoring for probability-zero hypotheses requires choosing the appropriate $\sigma$-field. If there is no preferred $\sigma$-field, then there is no degree of evidential favoring *simpliciter*. He or she may or may not wish to say that there are degrees of evidential favoring relative to $\sigma$-fields (see this post).

I see no knockdown argument against this view. However, it would at least be nice to be able to assess probabilities conditional on probability-zero hypotheses directly in light of symmetries in the problem, for instance, without the intermediate step of specifying a $\sigma$-field that respects those symmetries. For this reason if for no other, the theory of coherent conditional probabilities is worth considering.

However, this theory faces other difficulties, most notably the fact that it can violate *conglomerability.*

## Violations of Conglomerability in the Theory of Coherent Conditional Distributions

Conglomerability is the requirement that an unconditional probability lie within the range of the corresponding probabilities conditional on the elements of a partition.

In simple finite cases, it is intuitively obvious that probability distributions should be conglomerable. If the probability that I finish this blog post today given that I don’t develop a headache is .8, and the probability that I finish it today given that I do develop a headache is .6, then the unconditional probability that I finish it today must be between .6 and .8. The claim that the unconditional probability is .9 or .5 would seem to manifest serious confusion given these conditional probabilities.
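The headache example can be checked directly. The sketch below assumes only the two conditional probabilities given above and sweeps over hypothetical values for the probability of a headache, which the example leaves unspecified. Whatever that probability is, the Law of Total Probability makes the unconditional probability a weighted average of .6 and .8.

```python
from fractions import Fraction

P_FINISH_GIVEN_HEADACHE = Fraction(6, 10)
P_FINISH_GIVEN_NO_HEADACHE = Fraction(8, 10)


def pr_finish(p_headache):
    """Law of Total Probability over the partition {headache, no headache}."""
    return (
        P_FINISH_GIVEN_HEADACHE * p_headache
        + P_FINISH_GIVEN_NO_HEADACHE * (1 - p_headache)
    )


# Sweep hypothetical headache probabilities 0, 0.1, ..., 1.
for k in range(11):
    p = Fraction(k, 10)
    value = pr_finish(p)
    # Conglomerability in this partition: the value never leaves [.6, .8].
    assert P_FINISH_GIVEN_HEADACHE <= value <= P_FINISH_GIVEN_NO_HEADACHE
    print(f"Pr(headache) = {float(p):.1f}  ->  Pr(finish) = {float(value):.2f}")
```

No choice of headache probability yields .9 or .5, which is why those values would signal confusion.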

When we require that probability distributions be finitely additive but not that they be countably additive, the formula $P(A|C)=P(A\mbox{&}C)/P(C)$, on which everyone agrees, leads to failures of conglomerability (see Seidenfeld et al. 2013, 3). But when we require countable additivity, regular conditional distributions can never violate conglomerability in any countable or uncountable partition. Coherent conditional probabilities, by contrast, always violate conglomerability in some uncountable partition unless they happen to be fully additive, which for good reasons they are not required to be.

## A Likelihoodist Worry About Remedy 3

The Likelihood Principle says that the evidential meaning of $E$ with respect to a partition of the hypothesis space depends only on the probability of $E$ conditional on the elements of that partition as a function of those elements. Is this claim tenable in a partition that violates conglomerability? Such partitions violate many of our intuitions. Why not the intuitions that lead to the Likelihood Principle?^{3} And if the Likelihood Principle fails, then what about the Law of Likelihood?

If this worry can be developed into a proper argument, then all three of the remedies I have considered for the counterexample from Borel’s paradox require restricting the scope of the Law of Likelihood. Whether it can be so developed is a topic for further investigation.

1. As in “Borel’s Paradox as a Counterexample to the Law of Likelihood”, one should remove from each hypothesis one of the two points at which the two circles intersect so that one cannot avoid the counterexample by restricting the Law of Likelihood to mutually exclusive hypotheses. I omit this point here for ease of exposition.
2. The Law of Likelihood says that $E$ favors $H_1$ over $H_2$ if and only if $\mbox{Pr}(E|H_1)/\mbox{Pr}(E|H_2)=k>1$, and $k$ measures the degree of that favoring.
3. Thanks to Teddy Seidenfeld for raising this worry.
