An Explanation of Borel’s Paradox That You Can Understand*

*…at least, you can understand it if you understand the notion of a limit.  Or so I think.  If not, then I’m to blame!

Borel’s Paradox

Consider a sphere equipped with lines of latitude (red) and longitude (blue):

Latitude and Longitude

Suppose we take a point at random from a uniform distribution over the surface of that sphere (i.e., a distribution that makes the probability that the point lies within a particular region proportional to that region’s area).
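
To make "uniform over the surface" concrete, here is a minimal sketch of one standard way to draw such a point: normalize three independent Gaussian coordinates. The function name `sample_uniform_sphere` and the 30-degree check are my own illustrative choices, not part of the original argument. The check uses the fact that the cap north of 30 degrees latitude covers a quarter of the sphere's area, so about a quarter of the sampled points should land there.

```python
import math
import random


def sample_uniform_sphere(rng):
    """Return (latitude, longitude) in radians for a point uniform by area."""
    # Three independent Gaussians give a rotation-invariant direction in space.
    x, y, z = (rng.gauss(0.0, 1.0) for _ in range(3))
    r = math.sqrt(x * x + y * y + z * z)
    # Clamp guards against tiny floating-point overshoot before asin.
    latitude = math.asin(max(-1.0, min(1.0, z / r)))
    longitude = math.atan2(y, x)
    return latitude, longitude


rng = random.Random(0)  # fixed seed so the run is repeatable
points = [sample_uniform_sphere(rng) for _ in range(100_000)]

# Share of points north of 30 degrees latitude; the cap's area share is 1/4.
cap_fraction = sum(lat > math.pi / 6 for lat, _ in points) / len(points)
print(f"share of points north of 30 degrees: {cap_fraction:.3f}")
```

The printed share should come out close to 0.25, which is what "probability proportional to area" predicts for that region.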

Now suppose we learn that the point lies on the union of the “prime meridian” (0 degrees longitude) and its opposite (180 degrees longitude):

Prime Meridian

You might think that the probability distribution for the point over this circle would be uniform, but in fact its density is greater near the equator than near the poles: it is proportional to the cosine of the latitude.

This result comes from a procedure that amounts to conditioning on the statement that the point’s longitude is between -ε and ε for some ε (illustrated for the case ε=π/36)…

Within ε of the Prime Meridian

…and taking the limit as ε goes to zero.

Limit Approaching Prime Meridian

The resulting region is wider, and thus bigger, near the equator…

Region Near Equator

…than near the poles…

Near the North Pole

…and this inequality persists in the limit:

Limit of Meridional Region Near Equator

Limit of Meridional Regions Near North Pole
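
The limiting procedure can be checked numerically. The sketch below is only an illustration under my own assumptions: the sampler `sample_uniform_sphere`, the sample size, and the 30-degree cutoff are hypothetical choices, and the band includes the points within ε of 180 degrees longitude so that it surrounds the whole great circle. The area of the part of the band between latitudes $\phi_1$ and $\phi_2$ is proportional to $\sin\phi_2-\sin\phi_1$ for every ε, so the share of the band within 30 degrees of the equator should be about $\sin(\pi/6)=1/2$, rather than the 1/3 that a uniform distribution over the circle would give.

```python
import math
import random


def sample_uniform_sphere(rng):
    """Return (latitude, longitude) in radians for a point uniform by area."""
    x, y, z = (rng.gauss(0.0, 1.0) for _ in range(3))
    r = math.sqrt(x * x + y * y + z * z)
    return math.asin(max(-1.0, min(1.0, z / r))), math.atan2(y, x)


EPS = math.pi / 36  # half-width of the band; the value illustrated in the post
rng = random.Random(1)  # fixed seed so the run is repeatable

# Keep only the latitudes of points within EPS of longitude 0 or 180 degrees.
band_lats = []
for _ in range(200_000):
    lat, lon = sample_uniform_sphere(rng)
    if abs(lon) < EPS or abs(lon) > math.pi - EPS:
        band_lats.append(lat)

# Share of the kept points that lie within 30 degrees of the equator.
near_equator = sum(abs(lat) < math.pi / 6 for lat in band_lats) / len(band_lats)
print(f"kept {len(band_lats)} points")
print(f"share within 30 degrees of the equator: {near_equator:.3f}")
```

Shrinking `EPS` (and raising the sample size to compensate) should leave the share near 0.5, which is the sense in which the inequality persists in the limit.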

Now suppose that instead of learning that the point lies on the prime meridian, we had learned that it lies on the equator:

Equator

The probability distribution for the point over this circle is in fact uniform.

This result arises from the same kind of asymptotic reasoning as above:

Limit of Equatorial Regions
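
A numerical check of the equatorial case, again as a sketch under my own assumptions (the sampler `sample_uniform_sphere`, the sample size, and the 30-degree windows are hypothetical choices). It conditions on the latitude being within ε of zero and asks what share of the kept points has a longitude within 30 degrees of 0 or of 180 degrees. Those two windows cover a third of the circle, so a uniform conditional distribution predicts a share of about 1/3. The analogous share for a band around the prime meridian is $\sin(\pi/6)=1/2$.

```python
import math
import random


def sample_uniform_sphere(rng):
    """Return (latitude, longitude) in radians for a point uniform by area."""
    x, y, z = (rng.gauss(0.0, 1.0) for _ in range(3))
    r = math.sqrt(x * x + y * y + z * z)
    return math.asin(max(-1.0, min(1.0, z / r))), math.atan2(y, x)


EPS = math.pi / 36  # half-width of the equatorial band
rng = random.Random(2)  # fixed seed so the run is repeatable

# Keep only the longitudes of points within EPS of the equator.
band_lons = []
for _ in range(200_000):
    lat, lon = sample_uniform_sphere(rng)
    if abs(lat) < EPS:
        band_lons.append(lon)

# Share of kept points within 30 degrees of longitude 0 or 180 degrees.
window = math.pi / 6
near_crossings = sum(
    abs(lon) < window or abs(lon) > math.pi - window for lon in band_lons
) / len(band_lons)
print(f"kept {len(band_lons)} points")
print(f"share within 30 degrees of longitude 0 or 180: {near_crossings:.3f}")
```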

Why It’s a Paradox

The fact that the conditional probability distribution over the prime meridian differs from the conditional probability distribution over the equator is puzzling because a prime meridian in one coordinate system is an equator in another coordinate system, and vice versa:

Meridian

Equator
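
The change of coordinates can be made explicit. The sketch below assumes the usual convention $x=\cos\phi\cos\lambda$, $y=\cos\phi\sin\lambda$, $z=\sin\phi$ for latitude $\phi$ and longitude $\lambda$; the helper names are hypothetical. Rotating the sphere a quarter turn about the x-axis sends every point of the prime meridian to the new equator, and its old latitude becomes (minus) its new longitude.

```python
import math


def to_xyz(lat, lon):
    """Convert latitude/longitude in radians to a unit vector."""
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def to_lat_lon(x, y, z):
    """Convert a unit vector back to (latitude, longitude) in radians."""
    return math.asin(max(-1.0, min(1.0, z))), math.atan2(y, x)


def rotate_quarter_turn_about_x(x, y, z):
    """Rotate by 90 degrees about the x-axis: (x, y, z) -> (x, -z, y)."""
    return x, -z, y


# Points on the prime meridian (longitude 0) at several latitudes.
latitudes = [math.radians(d) for d in (-60, -30, 0, 30, 60)]
results = []
for lat in latitudes:
    new_lat, new_lon = to_lat_lon(*rotate_quarter_turn_about_x(*to_xyz(lat, 0.0)))
    results.append((lat, new_lat, new_lon))
    print(f"old latitude {math.degrees(lat):6.1f} -> "
          f"new latitude {math.degrees(new_lat):6.1f}, "
          f"new longitude {math.degrees(new_lon):6.1f}")
```

Every new latitude is zero, so the same great circle is a meridian in the old coordinates and the equator in the new ones. The two limiting procedures nevertheless assign it different conditional distributions.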

Borel’s paradox also gives rise to a previously overlooked counterexample to the Law of Likelihood that I will present in my next post.

Upshot

We have to choose between learning to live with Borel’s paradox and developing an alternative theory of conditional probability on which it does not arise.  There is such a theory, developed primarily by Bruno de Finetti (1974), but it gives rise to other serious difficulties (see e.g. this blog post by Larry Wasserman and Chapter 3 of Kadane 2011).1  I will discuss these issues more in future posts.


1.  The linked ebook is indeed free for non-commercial purposes.

Thanks to Teddy Seidenfeld and Satish Iyengar for directing my attention to Borel’s paradox.


Comments

  1. says

    I’m not a mathematician, so I apologize if this is naive. But why isn’t the paradox just a result of the difference between “longitudinal” and “latitudinal” lines? For the purposes of this discussion, suppose we say that a “line” is a continuous series of points that encircle the entire sphere. Then, note that all longitudinal lines intersect at two separate points on the sphere, whereas latitudinal lines do not. That is true regardless of how you rotate the sphere.

    Then, consider that for any two longitudinals A and B, the number of longitudinal lines between A and B is the same, no matter whether you’re counting at the equator or near one of the poles. Yet the distance between A and B is different at the equator versus the two poles. So a natural result is that the lines between them are “fatter” as they approach the equator. (And since A and B are arbitrary, the point holds for any longitudinal line.) I couldn’t tell you what “getting fatter” means exactly. But the point is that “getting fatter,” whatever it means, is a natural consequence of all the longitudinals varying their distances between each other.

    In contrast, the latitudinals do not get “fatter” since given any two examples C and D, the distance between them is the same, no matter where on the sphere you measure.

    So again, the point is that the difference re: “getting fatter” just seems owed to the difference between how the longitudinals are collectively related, vs. how the latitudinals are collectively related.

    Hope that makes sense. (It’s not sounding as clear to me as I would like, but hopefully it communicates something.)

  2. says

    Yes, you’ve got it. The point is that Kolmogorov’s theory does not allow you to condition on a great circle as such: the results you get depend on what coordinate system you use. Whether this fact is a feature or a bug is debatable. I’ll discuss this question more in future posts.

  3. says

    Hi Greg: This is interesting. I had never heard of “Borel’s paradox.” But I got stuck trying to understand your conclusions here. Here are some places where I got stuck.

    (1) You said the probability is proportional to the area (I assume you mean the Lebesgue volume) on the surface of the sphere. The equator and the prime meridian are both smooth curves of one dimension less than the surface. This implies that they have Lebesgue volume 0, and therefore probability 0 according to your definition. But you say that the probability on the equator is greater than zero. Isn’t this false? What am I missing?

    (2) What justifies the claim that the property you’re describing is “retained in the limit”? I can see that at each finite stage in the limit, one of the areas you describe is greater than the other. But this does not imply that it is greater in the infinite limit (indeed it isn’t — see the previous point above). The great John D. Norton has a very clear argument about the danger of assuming otherwise, http://www.pitt.edu/~jdnorton/papers/Ideal_Approx.pdf — is the Borel example any different than that?

    (3) It is true that the prime meridian is related to the equator by an isometric coordinate transformation, the rotation through pi/2. This is not problematic, because the areas of both 1-dimensional curves are 0 (point (1) above). It would be paradoxical if the areas at each finite stage were related by an isometry. But they are not. The easiest way to see this is to notice that the boundary lines for your vertical areas are geodesics (great circles on the sphere), while the boundary lines for your horizontal areas are not (except for the ones with one boundary on the equator). But the property of “being a geodesic” is preserved under isometries. So, there is no isometric coordinate transformation that relates these two areas. So, why do we think there is a paradox here?

    • says

      Thanks for the comments! I’m traveling, so a thoughtful reply will have to wait until next week. As a quick reply, you’re definitely right that the heuristic argument I give here is not rigorous. The result is well established, so I don’t think there’s any doubt that there is a paradox here. Nevertheless, you raise interesting points, and I will attend to them as soon as I am able.

    • says

      (1) Yes, “area” means Lebesgue measure. The relevant quantities are $P(E|H_1)$ and $P(E|H_2)$, which are non-zero in Kolmogorov’s theory even though $P(H_1)$ and $P(H_2)$ are zero.

      (2) My aim here is not to establish Borel’s paradox–that has been done elsewhere, e.g. Kolmogorov 1933, 50-1. My aim is instead to provide some intuition about how it arises. You’re right to point out that those intuitions are not entirely reliable.

      (3) You’re right to point out that there is no paradox for regions of the sphere that have positive area. The question is what a likelihoodist should say about regions of measure zero.

  4. Richard Libby says

    The moment we talk about a uniform distribution on the sphere, we bring in the sphere’s symmetry group O(3) whenever we want to talk about any point “within ε of” any subset of the sphere. The points within ε of any subset are equal to the union of translates under a specific subgroup of O(3) applied to the set of points within ε of any one point of our choosing in the subset, where the subgroup “fixes” the subset (i.e., all of the subgroup’s members map the subset to itself). The collection of points within ε of a line of longitude therefore doesn’t “pinch” at the north or south pole. The subgroup of O(3) in this case behaves like two copies of the circle, one copy for each orientation. If you don’t like orientation flipping maps, you can carry out the same argument using the group SO(3), in which case the subgroup is just the circle group.

    • says

      Thanks for the comment, Richard! I’m sorry to say that I don’t know group theory well enough to follow your remarks. Would you be able either to translate them for me or to point me to a resource that would bring me up to speed efficiently?

  5. says

    >> The relevant quantities are $P(E|H_1)$ and $P(E|H_2)$, which are non-zero in Kolmogorov’s theory even though $P(H_1)$ and $P(H_2)$ are zero.

    Does “Kolmogorov’s theory” = the Kolmogorov axioms on a Lebesgue measure space? Sorry, I’m new to this problem, and am a little slow. If I use the definition $P(E|H_1):=P(E\cap H_1)/P(H_1)$, then I get a ratio of two probabilities of sets that each have zero Lebesgue volume. But no such ratio exists (i.e. 0/0 is undefined). So, since $P$ is proportional to the Lebesgue measure by assumption, I get that $P(E|H_1)$ is an undefined quantity. Can’t we avoid all problems by just stopping here and saying this conditional probability is undefined? Why do we have to define it?

    I can now see that it is possible to extend the Lebesgue measure $P$ to $P'$ in such a way that $P'(E|H_1)$ gets assigned a value, and do it in a way that breaks the symmetry of the sphere. If there is one way, then there are many ways to do this. But then it will no longer be a Lebesgue measure, since the Lebesgue measure preserves the sphere’s symmetries. Maybe your extended measure still provides a probability space, in that it still satisfies the Kolmogorov axioms. But I don’t see any a priori grounds for choosing one extension of $P$ over another. That is, we can extend the Lebesgue measure $P$ so that $P'(\text{equator})>P'(\text{meridian})$ or vice versa, but why would someone (the likelihood-ist, or whoever) think that any such extended measure is an appropriate way to represent likelihoods?

    • says

      In Chapter 5 of his seminal (1933), Kolmogorov provides a theory of probability conditional on a sub-σ-field that allows one to extend the notion of conditional probability to include conditioning on events of probability zero. But the results of such conditioning can depend on the sub-σ-field chosen. Kolmogorov’s own response to Borel’s paradox is as follows (51):

      This shows that the concept of a conditional probability with regard to an isolated given hypothesis whose probability equals 0 is inadmissible. For we can obtain a probability distribution for $\Theta$ on the meridian circle only if we regard this circle as an element of the decomposition of the entire spherical surface into meridian circles with the given points.

      It is indeed an abuse of the Kolmogorov theory to treat $\mbox{Pr}(E|H_1)/\mbox{Pr}(E|H_2)$ as a measure of the degree to which $E$ favors $H_1$ over $H_2$ in the absence of an argument for breaking the symmetry of the sphere by imposing on it an arbitrary system of latitudes and longitudes in terms of which $H_1$ and $H_2$ are formulated. But it’s an abuse that the Law of Likelihood as standardly formulated allows, hence the need for one of the three remedies I mention in the post.

  6. says

    p.s. I use the QuickLaTeX wordpress plugin to render LaTeX in comments. Just install the plugin and go to its preferences > advanced > Use LaTeX Syntax Sitewide

    • says

      Thanks, Bryan! I can’t use plugins at this point because I’m using a hosted wordpress.com site rather than a self-hosted wordpress.org site. I’m planning to make the switch to self-hosted fairly soon.

      See here for how to use LaTeX on this site and other hosted wordpress.com sites, including in comments.

      Update: I have made the switch to wordpress.org. You can now use $\LaTeX$ in comments simply by surrounding your code in single dollar signs for inline mode or in double dollar signs for display mode.
