## The Counterexample

Suppose one were to draw a point *P* at random from a uniform distribution over the surface of a sphere that has a circumference of 360 miles and is equipped with lines of longitude and latitude. Consider the following pair of hypotheses:

- $H_1$: *P* lies on the equator (0 latitude), omitting the point at which the equator intersects the line of 180 degrees longitude.
- $H_2$: *P* lies on the “prime meridional circle,” that is, the great circle consisting of the prime meridian (0 longitude) and its opposite (180 degrees longitude), omitting the point at which the prime meridian intersects the equator.

Now suppose one learns the following:

*E*: *P* lies within a one-mile radius of the intersection of the equator and the prime meridian.

**Intuitively, *E* is evidentially neutral between $H_1$ and $H_2$.** After all, neither the datum *E* nor the setup of the problem distinguishes between the equator and the prime meridional circle.

**But according to the Law of Likelihood, *E* favors $H_2$ over $H_1$** to a degree that is small but not negligible: the relevant likelihood ratio is about 1.6. “An Explanation of Borel’s Paradox That You Can Understand” explains how this result arises. To my knowledge, it has not been presented as a counterexample to the Law of Likelihood before.
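The figure of about 1.6 can be reproduced with a short calculation. The sketch below is an illustration, not the author's own derivation. It assumes the standard results for the latitude-longitude parameterization: conditioning on latitude 0 leaves longitude uniform over the equator, while conditioning on longitude in {0, 180} gives each meridian weight 1/2 and gives latitude a density of cos(φ)/2. Because the circumference is 360 miles, one mile corresponds to one degree of arc. The function and variable names are hypothetical.

```python
import math

def likelihood_ratio(radius_deg=1.0):
    """Return P(E|H1), P(E|H2) and their ratio for a cap of the given radius.

    Assumes the Kolmogorov conditional distributions induced by the
    latitude-longitude coordinates (an assumption of this sketch).
    """
    theta = math.radians(radius_deg)

    # H1 (equator): longitude is uniform on the circle, and E covers an
    # arc of 2*theta out of the full 2*pi.
    p_e_given_h1 = (2 * theta) / (2 * math.pi)

    # H2 (prime meridional circle): the prime meridian carries weight 1/2,
    # and latitude has density cos(phi)/2, so
    # P(E|H2) = 1/2 * integral_{-theta}^{theta} cos(phi)/2 dphi = sin(theta)/2.
    p_e_given_h2 = 0.5 * math.sin(theta)

    return p_e_given_h1, p_e_given_h2, p_e_given_h2 / p_e_given_h1

p1, p2, ratio = likelihood_ratio()
print(f"P(E|H1) = {p1:.6f}, P(E|H2) = {p2:.6f}, ratio = {ratio:.4f}")
```

Under these assumptions the ratio is (π/2)·sin(θ)/θ, which for a one-degree radius is roughly 1.57 and rounds to the 1.6 quoted above. The omitted points have measure zero and do not enter the calculation.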

## Three Red Herrings

There are three seemingly promising places to look for a response to this counterexample that in fact lead nowhere.

### Red Herring 1: The Omitted Points

Contrary to what one might suspect, **the fact that one point is omitted from each hypothesis plays no role in generating the likelihood ratio of 1.6.** Because those points have no length, the same likelihood ratio arises no matter how they are handled.

If anything, one would think that the omitted points would lead to the result that *E* favors $H_1$ over $H_2$ because $H_2$ omits the point at the center of the region in which *E* says that *P* lies, while $H_1$ omits the farther point from that region on the surface of the sphere. But the Law of Likelihood says just the opposite.

I omit one point from each hypothesis only so that **one cannot avoid the counterexample by restricting the Law of Likelihood to mutually exclusive hypotheses** as I advocate in “New Responses to Three Counterexamples to the Likelihood Principle.”

### Red Herring 2: The Small Likelihood Ratio

The misleading likelihood ratio is *only* 1.6, which is far short of the value of 8 that likelihoodists conventionally require in order to declare a result “fairly strong” evidence (Royall 2000, 761). One might think that this fact somehow excuses the Law of Likelihood. **But the fact that the likelihood ratio is small does not help, for two reasons:**

- Regardless of what the Law of Likelihood says about the *degree* of evidential favoring in this case, **it still implies the incorrect qualitative claim** that the result favors the prime meridional circle hypothesis over the equator hypothesis.
- **One can produce an analogous but more dramatic result by using a strange coordinate system.** As I explain in “An Explanation of Borel’s Paradox That You Can Understand,” the fact that the likelihood ratio in this case is greater than one comes from the fact that lines of longitude are farther apart at the equator than near the poles, while lines of latitude are spaced equally all the way around the sphere. One could get a larger likelihood ratio (as large as one likes, I suspect) by using a system of “pseudo-longitudes” that exaggerates this effect around the prime meridian.

### Red Herring 3: Circles with No Width and the Finite Precision of Measurement Techniques

Borel himself points out that actual methods of observation do not allow you to learn that a particular point on a sphere lies on a particular great circle (1909, 102-3). From a position on the prime meridian, you might be able to use astronomical observations and a chronometer to determine that your longitude is between 0.1″ East and 0.1″ West, but you would not be able to determine that your longitude is exactly 0.

In the standard (Kolmogorov) theory, the conditional probability for the location of *P* given that it lies on the prime meridional circle is just the conditional probability for the location of *P* given that the point lies within ε degrees of longitude of that circle, in the limit as ε goes to 0. This distribution is thus appropriate if it is supposed to reflect one’s uncertainty about the point’s latitude upon learning that it lies on the prime meridional circle from a technique that has a small but finite margin of error.

**This line of reasoning is sufficient to defend Kolmogorov’s theory of conditional probability as a suitable tool for updating one’s uncertainty about *P*’s latitude given any real measurement of its longitude. But it is not sufficient to defend the use of Kolmogorov’s theory of conditional probability in the Law of Likelihood.** There the probability-zero hypotheses upon which one conditions are not idealizations of facts learned by observation, but rather hypotheses one wishes to evaluate in light of some *other* fact (such as *E*) learned by observation. Those hypotheses are not associated with any particular measurement technique, so it is appropriate to regard them not as limits of latitude or longitude measurements but simply as great circles.
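The limiting construction described in this section can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original argument. It thickens the equator into a band of ±ε degrees of latitude and the prime meridional circle into a band of ±ε degrees of longitude (around both the 0 and 180 degree meridians), then computes the probability of *E* given each band by midpoint-rule integration over a latitude-longitude grid. For contrast, it also thickens the prime meridional circle by geodesic distance, the alternative raised in the comments below. The function name, the grid size, and the choice ε = 0.1° are all hypothetical.

```python
import math

def thickened_likelihoods(theta_deg=1.0, eps_deg=0.1, n=400):
    """Return P(E | band) for three thickenings of the two great circles.

    E is the cap of geodesic radius theta around (lat 0, lon 0).
    The grid covers the box [-theta, theta]^2, which contains the cap.
    """
    theta = math.radians(theta_deg)
    eps = math.radians(eps_deg)
    h = 2 * theta / n                      # grid cell width in radians
    cos_theta = math.cos(theta)
    area_lat = area_lon = area_dist = 0.0

    for i in range(n):
        lat = -theta + (i + 0.5) * h
        for j in range(n):
            lon = -theta + (j + 0.5) * h
            # Skip cells whose midpoint lies outside the cap E.
            if math.cos(lat) * math.cos(lon) < cos_theta:
                continue
            d_area = math.cos(lat) * h * h  # area element on the unit sphere
            if abs(lat) <= eps:             # latitude band around the equator
                area_lat += d_area
            if abs(lon) <= eps:             # longitude band around the meridian
                area_lon += d_area
            # Geodesic distance from the great circle through the poles.
            if math.asin(abs(math.cos(lat) * math.sin(lon))) <= eps:
                area_dist += d_area

    sphere = 4 * math.pi
    # Band areas as fractions of the sphere: sin(eps) for a latitude band
    # (and, by rotation, for a geodesic band around any great circle);
    # 2*eps/pi for the two longitude lunes of half-width eps.
    p_h1 = area_lat / (sphere * math.sin(eps))
    p_h2_longitude = area_lon / (sphere * 2 * eps / math.pi)
    p_h2_distance = area_dist / (sphere * math.sin(eps))
    return p_h1, p_h2_longitude, p_h2_distance

p_h1, p_lon, p_dist = thickened_likelihoods()
print("longitude-band ratio:", p_lon / p_h1)
print("distance-band ratio: ", p_dist / p_h1)
```

With these settings the longitude-band ratio comes out close to π/2, while the distance-band ratio comes out close to 1. The regions of *E* captured by the two kinds of band are nearly identical; the difference comes from the normalization, because the longitude lunes pinch toward the poles and so have less total area than a band of uniform width.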

## The Real Culprit: The Standard Theory of Conditional Probability for Events of Probability Zero

The likelihood ratio of 1.6 arises from the fact that **in the standard (Kolmogorov) theory, probability conditional on an event of probability zero is relative to the σ-field in which that hypothesis is embedded** (see “An Explanation of Borel’s Paradox That You Can Understand”). To characterize the blue circle in the figure above as an equator and the red circle as a union of two meridians is to impose on the sphere a system of latitudes and longitudes that encodes a particular σ-field that is not given by the problem itself. One would get the opposite result by reversing those characterizations. This feature of the Kolmogorov theory is the source of the counterexample.

## Possible Remedies

I can see **three possible remedies** for the Law of Likelihood:

- Deny that the Law of Likelihood applies to $H_1$ and $H_2$.
- Maintain that evidential favoring in cases in which Borel’s paradox arises is relative to the σ-field imposed on the hypothesis space.
- Deny that the likelihoods that appear in the Law of Likelihood are conditional probabilities in Kolmogorov’s sense.

None of these remedies provides a cheap and easy fix.

**One challenge for Remedy 1** is to provide a principled basis for excluding $H_1$ and $H_2$ from the scope of the Law of Likelihood without excluding too much. **One challenge for Remedy 2** is that it’s not clear what the Law of Likelihood is supposed to be doing if we adopt it: I for one do not seem to have an informal notion of “evidential favoring relative to a σ-field” for it to explicate. **One challenge for Remedy 3** is to explain what the likelihoods that appear in the Law of Likelihood are, if not conditional probabilities in Kolmogorov’s sense. One option is to claim that they are conditional probabilities in the sense of de Finetti’s alternative to Kolmogorov’s theory (1974), but this approach gives rise to other serious difficulties that would need to be addressed (see e.g. this blog post by Larry Wasserman and Chapter 3 of Kadane 2011).^{1}

I will discuss these remedies further in future posts.

1. The linked ebook is indeed free for non-commercial purposes.

Thanks to Teddy Seidenfeld for suggesting that Borel’s paradox might be a problem for likelihoodism. Thanks to Branden Fitelson for directing my attention to Kenny Easwaran’s treatment of Borel’s paradox in Chapter 8 of his dissertation. The idea for Remedy 2 came from Easwaran’s thesis that conditional probability is relative to a partition.


Bryan says

I’m confused. All smooth curves of dimension 1 on the 2D surface of a sphere have Lebesgue volume 0. So, both the prime meridian and the equator would correspond to probability 0 if the probability measure is uniform on the surface, in which case the Law of Likelihood would not distinguish between them. Am I missing something? Going to check the Borel post now…

Michael K says

Cute problem!

I agree that this is a paradox, but it doesn’t seem to be a contradiction. The trick is that being “within epsilon degrees” of the equator and being “within epsilon degrees” of the international date line mean different things geometrically.

What happens if instead you say that you’re on a great circle whenever you’re within epsilon feet of it? It seems like here the likelihood ratio should be 1 again.

(Of course things are a bit fuzzy at the north pole, but I’m not concerned about that.)

Greg Gandenberger says

Yes, that’s right. If you’re conditioning on *data*, then you can avoid the problem by taking into account the uncertainty/imprecision of your measurement technique. It’s not so clear what you should do when you’re conditioning on a *hypothesis* that isn’t associated with any particular measurement technique.

Greg Gandenberger says

As I’ve said elsewhere, because I’m traveling this weekend it will be a few days before I can give a thoughtful response. But I can say quickly that the relevant quantity is the likelihood ratio $P(E|H_1)/P(E|H_2)$, so the problem involves conditioning on hypotheses that do indeed have probability zero.

Alexander R Pruss says

Neat argument!

But… “In the standard (Kolmogorov) theory, the conditional probability for the location of P given that it lies on the prime meridional circle is just the conditional probability for the location of P given that the point lies within ε degrees of longitude of that circle, in the limit as ε goes to 0.”

Standard Kolmogorov theory leaves the conditional probability of a null-probability event undefined. You can define it using an appropriate limiting procedure. But then you need to give a justification for using the particular limiting procedure. If you use the longitude limit, you get your violation of likelihoodism. But why not go with Michael’s distance measurement, in which case the problem disappears?

On independent grounds, the longitude limit is intuitively inappropriate when one thinks about errors of measurement. For a fixed ε>0, to get a longitude measurement to precision ε near the poles requires a much more precise apparatus than to get a longitude measurement to precision ε near the equator. For instance, it’s easy to measure longitude to within one degree at the equator. Good luck getting that sort of precision a picometer from the north pole!

Perhaps worse, “being within ε in longitude of the prime meridian” is simply undefined at the poles.

I’m not advocating Michael’s distance-based fattening of the sets in the limiting procedure. That has its own problems. E.g., consider a uniformly chosen number in [0,1], and consider the conditional probability that it’s a rational number with denominator a power of seven (i.e., x=n/7^m) given that it’s a rational number. Every number in [0,1] is within ε of a rational number with denominator that’s a power of seven, so the distance-fattening procedure gives 1 as the conditional probability–surely wrong. There just isn’t a good general limiting procedure for defining probabilities conditionally on a null set. Is this a problem for likelihoodism? I don’t know.

Greg Gandenberger says

Well said. The problem is that likelihoodists have not (to my knowledge) discussed how their claim that $P(E|H_1)/P(E|H_2)$ measures the degree to which $E$ favors $H_1$ over $H_2$ applies (or doesn’t apply) when $H_1$ and $H_2$ are probability-zero hypotheses and the setup of the problem provides no justification for favoring some particular limiting procedure over other possible alternatives. They have at least three options, as I discuss in the post, but each one comes at a cost.

Michael Lew says

Are you sure that H1 and H2 are part of a single likelihood function? I suspect that they are not because the blue line is one of a set of lines that do not intersect (latitude lines) whereas the red line is a latitude line that is one of a family that intersect at each pole. H1 and H2 are parameter values of distinct models.

If H1 and H2 cannot be made to lie on a single likelihood function then the ratio of their likelihoods is meaningless because of the presence of two unknown proportionality constants.

If you changed the problem so that the two lines were both within the same family of great circles (they’d have to both be latitude lines so the region in question would be at a pole instead of the equator) then the likelihood ratio would be valid and it would be one!

Greg Gandenberger says

Good point. Standard formulations of the Law of Likelihood require only that $P(E|H_1)$ and $P(E|H_2)$ be well defined, which they are here within the standard Kolmogorov theory of conditional probability (modulo some fussy details) because I’ve stipulated that the sphere is equipped with (possibly arbitrary) lines of longitude and latitude. But I think you’re right to ask for a single partition of the hypothesis space that includes both $H_1$ and $H_2$ and gives rise to the purportedly anomalous likelihood ratio. In response, I offer a partition into zonal circles that excludes a tiny band of meridional circles around the prime meridional circle. I’ll need to work out the details, but I think this approach will work.

Michael Lew says

Whoops: the red line is a longitude line (and there needs to be a mechanism for editing these comments!).

Greg Gandenberger says

Yes, sorry about that. At some point I’d like to move to a self-hosted site so that I can use a more robust commenting system.

Alexander R Pruss says

Greg:

That’s a good way to put it, and, no, I haven’t seen this discussed.

But:

1. Isn’t the problem of how to condition on null-probability stuff just as much a problem for any probability-centric account of confirmation?

2. Also, it’s plausible that any limiting procedure should, as much as possible, be done in a way that’s symmetric with respect to the natural symmetries in the problem. The fattening by spherical distance method is invariant under all isometries of the sphere. The fattening by longitude method is not invariant under isometries of the sphere. Once one chooses a method not invariant under isometries of the sphere, the sky is the limit–you can get any result you want in the problem by just choosing a limiting procedure. Of course, we’ve got the problem of figuring out which symmetries are the relevant ones (there are lots of wacky families of symmetries), but that’s a problem that every probabilistic story that doesn’t want to go completely subjective has.

Greg Gandenberger says

1. How to condition on probability zero events/hypotheses is a bigger problem for likelihoodists than for Bayesians in at least one respect. Bayesians typically condition on data that arise within an observational context that naturally suggests a preferred partition of the hypothesis space (see Ch. 8 of Easwaran’s dissertation). It’s at least less clear that likelihoodist applications typically take place within a context that provides a preferred partition. Thus, likelihoodists seem to have more incentive than Bayesians to abandon the Kolmogorov theory of conditional probability in favor of the de Finetti theory. But that theory raises other difficulties such as nonconglomerability that may be particularly acute from a likelihoodist perspective. (That’s a topic for a future post.)

2. Yes. Again, the likelihoodist seems to have a bigger problem here than the Bayesian because it’s less clear for applications of the Law of Likelihood than for Bayesian conditioning where the relevant symmetries could come from.

arpruss says

Greg:

Why can’t the likelihoodist just say: “There is no preferred way of taking the limit here, and so likelihoodism is silent on which hypothesis is more confirmed by E than which, and rightly so. But likelihoodism is non-silent on the comparative evidence for H_{1,epsilon1} and H_{2,epsilon2} for any choice of epsilon1 and epsilon2, and any decent thickening procedures.”

Greg Gandenberger says

I’ve now addressed that response to the counterexample in my next post. It seems to be a viable position, but I’m not sure it’s the best position available.

Bryan says

Is there any reason why Likelihoodists *must* condition on measure-zero events? In this example, you’re taking a quantity that’s undefined for the Lebesgue measure, i.e. P(E|H)=P(E&H)/P(H)=0/0, assigning it a value, and doing so in a way that breaks the symmetry of a sphere. There are at least two ways to assign such a value — you can do it so that P(meridian)>P(equator), or such that P(equator)>P(meridian) — I wouldn’t be surprised if there’s an infinitude of different ways. But why would a Likelihoodist accept any such extension? It seems to me that the following is their most reasonable option:

4. Say that P(E|H) is undefined when H has zero volume.

If a Likelihoodist is a Bayesian, then don’t Bayesians typically represent probabilities using a Lebesgue measure? In that case, they would seem to get Option 4 for free, and no symmetries are broken.

Greg Gandenberger says

See the reply I just posted to arpruss above: