*…at least, you can understand it if you understand the notion of a limit. Or so I think. If not, then I’m to blame!

## Borel’s Paradox

Consider a sphere equipped with lines of latitude (red) and longitude (blue):

Suppose we take a point at random from a uniform distribution over the surface of that sphere (i.e., a distribution that makes the probability that the point lies within a particular region proportional to that region’s area).

Now suppose we learn that the point lies on the union of the “prime meridian” (0 longitude) and its opposite (180 degrees longitude):

You might think that the probability distribution for the point over this circle would be uniform, but in fact it is greater around the equator than around the poles.

This result comes from a procedure that amounts to conditioning on the statement that the point’s longitude is between -ε and ε for some ε (illustrated for the case ε=π/36)…

…and taking the limit as ε goes to zero.

The resulting region is wider, and thus bigger, near the equator…

…than near the poles…

…and this inequality persists in the limit:
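The heuristic above can be checked numerically. The following is a minimal Monte Carlo sketch, not part of the original argument: it draws points uniformly from the sphere, keeps those whose longitude lies within ε of the prime meridian, and measures how often the kept points fall within 30 degrees of the equator. The function names, the sample size, and the 30-degree cutoff are illustrative assumptions. If the conditional distribution along the meridian were uniform in arc length, that fraction would be about 1/3; a density proportional to the cosine of the latitude gives sin(30°) = 1/2.

```python
import math
import random

def sample_uniform_on_sphere(rng):
    """Return (latitude, longitude) in radians for a point drawn uniformly from the unit sphere."""
    # Normalizing a 3D standard Gaussian vector gives a uniformly distributed direction,
    # so no coordinate system is built into the sampling step.
    while True:
        x, y, z = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        r = math.sqrt(x * x + y * y + z * z)
        if r > 0:
            break
    lat = math.asin(max(-1.0, min(1.0, z / r)))  # clamp guards against rounding error
    lon = math.atan2(y, x)
    return lat, lon

def fraction_near_equator(eps, n_samples=400_000, seed=0):
    """Among points with |longitude| < eps, return the fraction with |latitude| < 30 degrees."""
    rng = random.Random(seed)
    kept = 0
    near = 0
    for _ in range(n_samples):
        lat, lon = sample_uniform_on_sphere(rng)
        if abs(lon) < eps:
            kept += 1
            if abs(lat) < math.pi / 6:
                near += 1
    return near / kept

# eps = pi/36 matches the wedge illustrated in the post (hypothetical choice for this sketch).
meridian_fraction = fraction_near_equator(math.pi / 36)
print(f"fraction within 30 degrees of the equator: {meridian_fraction:.3f}")
```

The printed fraction should come out close to 0.5 rather than 0.33, and because it does not depend on ε, shrinking the wedge does not remove the excess near the equator.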

Now suppose that instead of learning that the point lies on the prime meridian, we had learned that it lies on the equator:

The probability distribution for the point over *this* circle is in fact uniform.

This result arises from the same kind of asymptotic reasoning as above:
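As a rough numerical check of the equator case, here is a small simulation sketch under assumptions of my own choosing (the function names, the sample size, and the six longitude bins are all illustrative). It conditions on the latitude lying within ε of zero and then tabulates how the surviving points are spread over longitude. A uniform conditional distribution would put roughly 1/6 of the points in each of six equal bins.

```python
import math
import random

def sample_lat_lon(rng):
    """Return (latitude, longitude) in radians for a point drawn uniformly from the unit sphere."""
    # A normalized 3D Gaussian vector is uniformly distributed over directions.
    while True:
        x, y, z = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        r = math.sqrt(x * x + y * y + z * z)
        if r > 0:
            break
    lat = math.asin(max(-1.0, min(1.0, z / r)))  # clamp guards against rounding error
    lon = math.atan2(y, x)
    return lat, lon

def longitude_histogram(eps, n_bins=6, n_samples=400_000, seed=1):
    """Among points with |latitude| < eps, return the fraction falling in each longitude bin."""
    rng = random.Random(seed)
    counts = [0] * n_bins
    for _ in range(n_samples):
        lat, lon = sample_lat_lon(rng)
        if abs(lat) < eps:
            # Map longitude from [-pi, pi] to a bin index in 0..n_bins-1.
            idx = int((lon + math.pi) / (2 * math.pi) * n_bins)
            counts[min(idx, n_bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts]

# Same eps as the meridian illustration in the post (hypothetical choice for this sketch).
equator_fractions = longitude_histogram(math.pi / 36)
print([round(f, 3) for f in equator_fractions])
```

Each entry should be near 0.167. The band around the equator has the same width at every longitude, so nothing favors one stretch of the equator over another.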

## Why It’s a Paradox

The fact that the conditional probability distribution over the prime meridian is different from the conditional probability distribution over the equator is **puzzling** because **a prime meridian in one coordinate system is an equator in another coordinate system**, and vice versa:
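To make the change of coordinates concrete, here is a short sketch (the helper names and the choice of rotation axis are assumptions made for illustration). It takes points on the great circle formed by the prime meridian and its opposite, applies a quarter-turn rotation about the x-axis, and confirms that every rotated point has latitude zero, that is, that it lies on the equator of the rotated coordinate system.

```python
import math

def meridian_circle_point(t):
    """Point on the great circle through both poles at longitudes 0 and 180 degrees."""
    # t is the angle along the circle, measured from the point (1, 0, 0).
    return (math.cos(t), 0.0, math.sin(t))

def rotate_about_x_quarter_turn(p):
    """Rotate a point by 90 degrees about the x-axis: (x, y, z) -> (x, -z, y)."""
    x, y, z = p
    return (x, -z, y)

def to_lat_lon(p):
    """Convert a unit vector to (latitude, longitude) in radians."""
    x, y, z = p
    return math.asin(max(-1.0, min(1.0, z))), math.atan2(y, x)

# Twelve evenly spaced points around the meridian circle.
angles = [2 * math.pi * k / 12 for k in range(12)]
rotated = [to_lat_lon(rotate_about_x_quarter_turn(meridian_circle_point(t))) for t in angles]

max_abs_latitude = max(abs(lat) for lat, _ in rotated)
print("largest |latitude| after rotation:", max_abs_latitude)
```

Because a rotation preserves areas on the sphere, the uniform distribution looks the same in both coordinate systems; only the labels "meridian" and "equator" change.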

Borel’s paradox also gives rise to a previously overlooked **counterexample** to the **Law of Likelihood** that I will present in my next post.

## Upshot

We have to choose between learning to live with Borel’s paradox and developing an alternative theory of conditional probability on which it does not arise. There is such a theory, developed primarily by Bruno de Finetti (1974), but it gives rise to other serious difficulties (see e.g. this blog post by Larry Wasserman and Chapter 3 of Kadane 2011).^{1} I will discuss these issues more in future posts.

1. The linked ebook is indeed free for non-commercial purposes.

Thanks to Teddy Seidenfeld and Satish Iyengar for directing my attention to Borel’s paradox.


Ted says

I’m not a mathematician, so I apologize if this is naive. But why isn’t the paradox just a result of the difference between “longitudinal” and “latitudinal” lines? For the purposes of this discussion, suppose we say that a “line” is a continuous series of points that encircles the entire sphere. Then, note that all longitudinal lines intersect at two separate points on the sphere, whereas latitudinal lines do not. That is true regardless of how you rotate the sphere.

Then, consider that for any two longitudinals A and B, the number of longitudinal lines between A and B is the same, no matter whether you’re counting at the equator or near one of the poles. Yet the distance between A and B is different at the equator versus the two poles. So a natural result is that the lines between them are “fatter” as they approach the equator. (And since A and B are arbitrary, the point holds for any longitudinal line.) I couldn’t tell you what “getting fatter” means exactly. But the point is that “getting fatter,” whatever it means, is a natural consequence of all the longitudinals varying their distances between each other.

In contrast, the latitudinals do not get “fatter” since given any two examples C and D, the distance between them is the same, no matter where on the sphere you measure.

So again, the point is that the difference re: “getting fatter” just seems owed to the difference between how the longitudinals are collectively related, vs. how the latitudinals are collectively related.

Hope that makes sense. (It’s not sounding as clear to me as I would like, but hopefully it communicates something.)

Greg Gandenberger says

Yes, you’ve got it. The point is that Kolmogorov’s theory does not allow you to condition on a great circle as such: the results you get depend on what coordinate system you use. Whether this fact is a feature or a bug is debatable. I’ll discuss this question more in future posts.

Tom G says

I’m not an expert, but I suspect you will find Tarantola’s unfinished “Mapping of probabilities” of interest.

http://www.ipgp.fr/~tarantola/Files/Professional/Books/MappingOfProbabilities.pdf

On p. 140 he mentions how Borel’s paradox arises as a result of careless handling of conditional probability density when taking the limit.

Greg Gandenberger says

Thanks, Tom!

Bryan says

Hi Greg: This is interesting. I had never heard of “Borel’s paradox.” But I got stuck trying to understand your conclusions here. Here are some places where I got stuck.

(1) You said the probability is proportional to the area (I assume you mean the Lebesgue volume) on the surface of the sphere. The equator and the prime meridian are both smooth curves of one dimension less than the surface. This implies that they have Lebesgue volume 0, and therefore probability 0 according to your definition. But you say that the probability on the equator is greater than zero. Isn’t this false? What am I missing?

(2) What justifies the claim that the property you’re describing is “retained in the limit”? I can see that at each finite stage in the limit, one of the areas you describe is greater than the other. But this does not imply that it is greater in the infinite limit (indeed it isn’t — see the previous point above). The great John D. Norton has a very clear argument about the danger of assuming otherwise, http://www.pitt.edu/~jdnorton/papers/Ideal_Approx.pdf — is the Borel example any different than that?

(3) It is true that the prime meridian is related to the equator by an isometric coordinate transformation, the rotation through pi/2. This is not problematic, because the areas of both 1-dimensional curves are 0 (point (1) above). It would be paradoxical if the areas at each finite stage were related by an isometry. But they are not. The easiest way to see this is to notice that the boundary lines for your vertical areas are geodesics (great circles on the sphere), while the boundary lines for your horizontal areas are not (except for the ones with one boundary on the equator). But the property of “being a geodesic” is preserved under isometries. So, there is no isometric coordinate transformation that relates these two areas. So, why do we think there is a paradox here?

Greg Gandenberger says

Thanks for the comments! I’m traveling, so a thoughtful reply will have to wait until next week. As a quick reply, you’re definitely right that the heuristic argument I give here is not rigorous. The result is well established, so I don’t think there’s any doubt that there is a paradox here. Nevertheless, you raise interesting points, and I will attend to them as soon as I am able.

Greg Gandenberger says

(1) Yes, “area” means Lebesgue measure. The relevant quantities are $P(E|H_1)$ and $P(E|H_2)$, which are non-zero in Kolmogorov’s theory even though $P(H_1)$ and $P(H_2)$ are zero.

(2) My aim here is not to establish Borel’s paradox–that has been done elsewhere, e.g. Kolmogorov 1933, 50-1. My aim is instead to provide some intuition about how it arises. You’re right to point out that those intuitions are not entirely reliable.

(3) You’re right to point out that there is no paradox for regions of the sphere that have positive area. The question is what a likelihoodist should say about regions of measure zero.

Richard Libby says

The moment we talk about a uniform distribution on the sphere, we bring in the sphere’s symmetry group O(3) whenever we want to talk about any point “within ε of” any subset of the sphere. The points within ε of any subset are equal to the union of translates under a specific subgroup of O(3) applied to the set of points within ε of any one point of our choosing in the subset, where the subgroup “fixes” the subset (i.e., all of the subgroup’s members map the subset to itself). The collection of points within ε of a line of longitude therefore doesn’t “pinch” at the north or south pole. The subgroup of O(3) in this case behaves like two copies of the circle, one copy for each orientation. If you don’t like orientation flipping maps, you can carry out the same argument using the group SO(3), in which case the subgroup is just the circle group.

Greg Gandenberger says

Thanks for the comment, Richard! I’m sorry to say that I don’t know group theory well enough to follow your remarks. Would you be able either to translate them for me or to point me to a resource that would bring me up to speed efficiently?

Bryan says

>> The relevant quantities are $P(E|H_1)$ and $P(E|H_2)$, which are non-zero in Kolmogorov’s theory even though $P(H_1)$ and $P(H_2)$ are zero.

Does “Kolmogorov’s theory” = the Kolmogorov axioms on a Lebesgue measure space? Sorry, I’m new to this problem, and am a little slow. If I use the definition $P(E|H_1):=P(E\cap H_1)/P(H_1)$, then I get a ratio of two quantities, each of which has zero Lebesgue volume. But no such ratio exists (i.e. 0/0 is undefined). So, since $P$ is proportional to the Lebesgue measure by assumption, I get that $P(E|H_1)$ is an undefined quantity. Can’t we avoid all problems by just stopping here and saying this conditional probability is undefined? Why do we have to define it?

I can now see that it is possible to *extend* the Lebesgue measure $P$ to $P'$ in such a way that $P'(E|H_1)$ gets assigned a value, and do it in a way that breaks the symmetry of the sphere. If there is one way, then there are ways to do this. But then it will no longer be a Lebesgue measure, since the Lebesgue measure preserves the sphere’s symmetries. Maybe your extended measure still provides a probability space, in that it still satisfies the Kolmogorov axioms. But I don’t see any a priori grounds for choosing one extension of $P$ over another. That is, we can extend the Lebesgue measure $P$ so that $P'(\text{equator})>P'(\text{meridian})$ or vice versa, but why would someone (the likelihood-ist, or whoever) think that any such extended measure is an appropriate way to represent likelihoods?

Greg Gandenberger says

In Chapter 5 of his seminal (1933), Kolmogorov provides a theory of probability conditional on a sub-σ-field that allows one to extend the notion of conditional probability to include conditioning on events of probability zero. But the results of such conditioning can depend on the sub-σ-field chosen. Kolmogorov’s own response to Borel’s paradox is as follows (51):

It is indeed an abuse of the Kolmogorov theory to treat $\mbox{Pr}(E|H_1)/\mbox{Pr}(E|H_2)$ as a measure of the degree to which $E$ favors $H_1$ over $H_2$ in the absence of an argument for breaking the symmetry of the sphere by imposing on it an arbitrary system of latitudes and longitudes in terms of which $H_1$ and $H_2$ are formulated. But it’s an abuse that the Law of Likelihood as standardly formulated allows, hence the need for one of the three remedies I mention in the post.

Bryan says

p.s. I use the QuickLaTeX wordpress plugin to render LaTeX in comments. Just install the plugin and go to its preferences > advanced > Use LaTeX Syntax Sitewide

Greg Gandenberger says

Thanks, Bryan! I can’t use plugins at this point because I’m using a hosted wordpress.com site rather than a self-hosted wordpress.org site. I’m planning to make the switch to self-hosted fairly soon.

~~See here for how to use LaTeX on this site and other hosted wordpress.com sites, including in comments.~~ Update: I have made the switch to wordpress.org. You can now use $\LaTeX$ in comments simply by surrounding your code in single dollar signs for inline mode or in double dollar signs for display mode.

Jack says

Hi Greg,

I came to your site after reading this really good article about the life of Kolmogorov and the Paradox of the Great Circle.

My friend and I still don’t fully understand the paradox. Is there a way you could try to explain in an even simpler way (we are not stupid, but not familiar with a lot of the statistical concepts anymore). Or is that really impossible without knowing the maths?

Cheers from the Netherlands.

Jack

http://nautil.us/issue/4/the-unlikely/the-man-who-invented-modern-probability

Greg Gandenberger says

Thanks for the message, Jack! Sorry it took me so long to respond.

Have you already read the heuristic explanation of Borel’s paradox that I give? If so, where do I lose you or what more would you like to know?

Terry Moore says

From a practical point of view, sets with measure zero do not exist. If you can only measure your position to the nearest mm (or even the nearest micrometre) that means knowing the latitude is 0 really means it is between ±90/10^9 (or 90/10^12) degrees. But the same deal for the longitude, for exactly the reason you give in your explanation, cannot be expressed uniformly. While an error in longitude of 90/10^12 degrees can be measured at the equator, near the poles it represents more precision than we have supposed we can achieve. All the paradox shows is that perfect precision is a myth–a useful one, but a myth nevertheless.

Greg Gandenberger says

That’s true if you’re talking about measurement outcomes, but perhaps not if you’re talking about parameter values (e.g. the mass of an electron).

Terry Moore says

Whether the mass of an electron is a parameter or a measurement is, I suppose, dependent on the particular analysis you are interested in. But either way, it is not known exactly.

Perhaps I can expand on why the myth of precision can be useful in probability. It allows us to turn a multiple integral into repeated integrals for the purpose of finding the probability of a set with measure large enough to be useful in practice. So there is no actual conflict between the Borel paradox and practicality.

Greg Gandenberger says

> There is no actual conflict between the Borel paradox and practicality.

I think that’s right, but it’s not obviously right given that likelihoodists appeal to probabilities conditional on hypotheses.