In “Borel’s Paradox as a Counterexample to the Law of Likelihood,” I present a new counterexample to the Law of Likelihood and briefly describe three possible remedies for it. Here I motivate the first of those remedies and raise a few worries for it. Those worries do not seem to be fatal, but they provide some motivation for considering other possible responses.

## Recap of the Counterexample

Take a sphere equipped with an arbitrary system of latitudes and longitudes. Let *E* be the datum that a point *P* randomly selected from a uniform distribution on the surface of the sphere lies within some distance ε of the intersection of the equator and the prime meridian, where ε is small relative to the sphere’s circumference. Let $H_1$ be the hypothesis that *P* lies on the equator. Let $H_2$ be the hypothesis that *P* lies on the union of the prime meridian and its opposite (the “prime meridional circle”).

The Law of Likelihood implies that *E* favors $H_2$ over $H_1$.^{1}

**This result is wrong** because the two great circles in question differ only in that they were arbitrarily assigned different labels.
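To make the asymmetry concrete, here is a small Monte Carlo sketch (my own illustration, not part of the original argument). It reproduces the coordinate-based conditioning that drives the paradox: it conditions on a thin band of *latitudes* around the equator for $H_1$ and a thin band of *longitudes* around the prime meridional circle for $H_2$, then estimates how much of each band’s probability lies within ε of the intersection point. The likelihood ratio comes out near π/2 ≈ 1.57, the “roughly 1.6” figure that comes up in the comments below.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 2_000_000

# Uniform points on the unit sphere, in latitude/longitude coordinates.
lon = rng.uniform(-np.pi, np.pi, N)
lat = np.arcsin(rng.uniform(-1.0, 1.0, N))

eps = 0.10    # angular radius of E around the equator/prime-meridian intersection
delta = 0.01  # half-width of the conditioning bands (delta << eps)

# Angular distance from each point to the intersection point (lat = 0, lon = 0).
dist = np.arccos(np.clip(np.cos(lat) * np.cos(lon), -1.0, 1.0))
in_E = dist < eps

eq_band = np.abs(lat) < delta                                     # thin latitude band (~H1)
mer_band = (np.abs(lon) < delta) | (np.pi - np.abs(lon) < delta)  # thin longitude band (~H2)

p_E_given_H1 = in_E[eq_band].mean()   # approaches eps / pi as delta -> 0
p_E_given_H2 = in_E[mer_band].mean()  # approaches (sin eps) / 2 as delta -> 0
ratio = p_E_given_H2 / p_E_given_H1
print(ratio)  # close to pi/2 ~ 1.57
```

The asymmetry is a pure coordinate artifact: a band of constant longitude-width narrows toward the poles, so its conditional mass piles up near the equator, while a band of constant latitude-width spreads its mass uniformly along the equator. Shrinking δ and ε drives the ratio toward π/2.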

## Remedy 1

One possible remedy for this counterexample is to **restrict the Law of Likelihood so that it does not apply to $H_1$ and $H_2$.** The following proposal accomplishes this task in a natural way.

**Proposal 1.** Restrict the Law of Likelihood so that it does not apply to *hypotheses with prior probability zero*.

An adequate defense of this proposal would accomplish the following **tasks:**

- Show that it excludes all problematic cases.
- Show that it excludes only problematic cases.
- Argue that it is not *ad hoc* and does not limit the scope of likelihoodism too severely.

I will address each of these tasks in turn.

## Task 1: Exclude All Problematic Cases

Borel’s paradox arises only for probability-zero hypotheses (see “An Explanation of Borel’s Paradox That You Can Understand”), so **this proposal does exclude all problematic cases**.

One might think that Proposal 1 does not go far enough because it does not exclude $H_1$ and $H_2$ in the absence of a prior probability distribution on the surface of the sphere. What one should really exclude, one might think, are lower-dimensional subsets of a hypothesis space. But in the absence of a prior probability distribution over the surface of the sphere, $H_1$ and $H_2$ do not imply simple statistical hypotheses, so they lie outside the scope of the Law of Likelihood already.

In the absence of a prior probability distribution, Proposal 1 would allow one to apply the Law of Likelihood to simple statistical hypotheses $H_1^*$ and $H_2^*$ such that $H_1^*$ posits a particular probability distribution over the equator and $H_2^*$ posits a particular probability distribution over the prime meridional circle. This fact is not a problem for Proposal 1 because Borel’s paradox does not arise for $H_1^*$ and $H_2^*$: the probability of *E* conditional on each of $H_1^*$ and $H_2^*$ is well defined without reference to an arbitrary choice of coordinate system.
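For concreteness, here is a minimal illustration (my own, with uniform distributions chosen for the starred hypotheses): if $H_1^*$ puts the uniform distribution on the equator and $H_2^*$ puts the uniform distribution on the prime meridional circle, each likelihood is just the fraction of that circle’s arc length falling within ε of the intersection point, and no coordinate system enters anywhere.

```python
import math

eps = 0.10  # angular radius of E on the unit sphere, small relative to 2*pi

# Each great circle passes through the center of E, so the arc it contributes
# inside E has length 2*eps; under a uniform distribution on the circle the
# likelihood is that arc's share of the full circumference 2*pi.
p_E_given_H1_star = 2 * eps / (2 * math.pi)  # uniform on the equator
p_E_given_H2_star = 2 * eps / (2 * math.pi)  # uniform on the prime meridional circle

ratio = p_E_given_H2_star / p_E_given_H1_star
print(ratio)  # 1.0: no favoring either way, as symmetry demands
```

With these particular choices the ratio is 1, matching the symmetry intuition; other choices of $H_1^*$ and $H_2^*$ give other well-defined ratios, but none of them depends on an arbitrary labeling of the two circles.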

## Task 2: Exclude *Only* Problematic Cases

**Unfortunately, Proposal 1 does exclude some cases that are not problematic.**

**Example.** Suppose one were to draw a number *r* at random from a uniform distribution on the unit interval and then flip a coin with the corresponding bias *r* for heads. Suppose the coin lands heads, and consider that datum as evidence in relation to the hypotheses *H*: *r* = .25 and *H′*: *r* = .75.

Proposal 1 excludes this case from the scope of the Law of Likelihood, but the Law of Likelihood delivers the intuitively correct verdict that the coin’s landing heads favors *H’* over *H*.
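In numbers, that verdict is a one-line computation:

```python
# Likelihood of the datum "heads" under each point hypothesis about the bias r.
p_heads_given_H = 0.25        # H:  r = .25
p_heads_given_H_prime = 0.75  # H': r = .75

likelihood_ratio = p_heads_given_H_prime / p_heads_given_H
print(likelihood_ratio)  # 3.0: heads favors H' over H to degree 3
```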

**The source of the trouble in the sphere example is that the setup of the problem does not specify a preferred coordinate system for the hypothesis space** (see “An Explanation of Borel’s Paradox That You Can Understand”), but one gets different answers depending on which of the permissible coordinate systems one specifies. More generally, Borel’s paradox arises where the result of conditioning on a probability-zero event or hypothesis depends on the sub-σ-field in which that event or hypothesis is embedded. **No such problem arises in the coin example.**

**These considerations suggest the following revised proposal.**

**Proposal 2.** Restrict the Law of Likelihood so that it does not apply *when the relevant likelihood ratio depends upon an arbitrary choice of sub-σ-field*.

The class of cases to which this restriction applies just is the class of cases in which Borel’s paradox arises, so **Proposal 2 accomplishes tasks 1 and 2 quite nicely**.

## Task 3: Argue That the Restriction Is Not *Ad Hoc* and Does Not Limit the Scope of Likelihoodism Too Severely

The most promising way to try to accomplish Task 3 for Proposal 2 seems to be to argue that **hypotheses of probability zero are not of genuine scientific interest.** One could argue for the narrower claim that probability-zero hypotheses are not of genuine scientific interest when the conditional probabilities they imply depend on which of the permissible sub-σ-fields one imposes on the hypothesis space, but it is not clear that this narrower claim is any easier to motivate.

One possible argument for this claim is that **probability-zero hypotheses will still have probability zero after updating** by Bayesian conditioning. Why worry about hypotheses that can never be belief-worthy?^{2}

This argument faces the objection that **scientists often perform significance tests of point null hypotheses** in continuous hypothesis spaces that should be regarded as having probability zero. Such cases can nevertheless be of tremendous scientific interest: take, for instance, the Michelson-Morley experiments, which one can regard as testing the claim that the vector velocity of the aether wind on the earth’s surface is zero. It would be a substantial blow to likelihoodism for the Law of Likelihood not to apply to such cases.

**A likelihoodist can take any of the following three lines about each case of this kind:**^{3}

- The point null hypothesis has positive probability.
- The point null hypothesis is an oversimplified representation of what is implicitly a range null hypothesis (i.e., that the true parameter value lies within ε of the value given by the point null hypothesis for some ε).
- The scientists are doing something that is not of genuine scientific interest.

Of course, a likelihoodist could also concede that the Law of Likelihood does not apply to some cases of genuine scientific interest.

Option 1 takes care of the Michelson-Morley experiment: in retrospect at least, we would want to assign nonzero probability to the hypothesis that there is no aether and thus no aether wind. Option 1 also takes care of other cases in which the null hypothesis has genuine plausibility, as in tests for extra-sensory perception.

Scientists do often test point null hypotheses that are not plausibly regarded as having positive prior probability. Perhaps a likelihoodist can maintain in all such cases either that the hypothesis being tested is not of genuine scientific interest or that it is best regarded as a range null hypothesis, such as that the null hypothesis is “close enough to the truth for practical purposes.” This possibility seems plausible to me. I plan to argue for it in a more formal treatment of this topic.

**Likelihoodists still face an additional problem:** the Law of Likelihood is supposed to “come into its own” in the absence of a prior probability distribution (Sober 2008, 38), but in that case it does not apply to range null hypotheses.

Fortunately for the likelihoodist, on any prior distribution the likelihood of a range hypothesis is a weighted average of the likelihoods of its constituents. Thus, it must lie within the range of the likelihoods of its constituents, which will typically be fairly small. As a result, a likelihoodist can pin down the relevant likelihood ratio to a degree of precision that will often be adequate for his or her purposes without being committed to any particular prior probability distribution.
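The bracketing claim is easy to check in miniature with the coin example from Task 2 (my own illustration; the grid and the two priors are arbitrary choices): whatever prior one puts on the range null $r \in [r_0 - \varepsilon, r_0 + \varepsilon]$, the likelihood of heads is a prior-weighted average of *r* and so lies between the endpoint likelihoods.

```python
import numpy as np

r0, eps = 0.25, 0.02
lo, hi = r0 - eps, r0 + eps
rs = np.linspace(lo, hi, 100_001)  # grid over the range null [r0 - eps, r0 + eps]

def likelihood_of_heads(prior_weights):
    """P(heads | range null): the prior-weighted average of the bias r."""
    w = prior_weights / prior_weights.sum()  # normalize the prior over the grid
    return float((rs * w).sum())

# Two very different priors over the range give likelihoods that are both
# trapped between the endpoint likelihoods lo = 0.23 and hi = 0.27.
flat = likelihood_of_heads(np.ones_like(rs))
skewed = likelihood_of_heads(np.exp(5.0 * rs))
print(flat, skewed)  # both within [0.23, 0.27]
```

Here the likelihood of heads is pinned down to within ±0.02 without committing to any particular prior over the range.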

## An Assessment of Remedy 1

> A logical construction is such in so far as it is a whole in which “tout se tient” [“everything hangs together”]. Questions which are seemingly completely otiose and insignificant can have, and do have, interconnections with all the rest, and are essential for an understanding of them. To ignore them, or merely mention them in passing, is dangerous. – Bruno de Finetti (1974, 116)

Assuming that I can make good on the claim that likelihoodists can handle tests of point null hypotheses, it seems that **Remedy 1 for Borel’s paradox as a counterexample to the Law of Likelihood yields a tenable position.** However, it is not clear that it yields the best available position.

Another possibility within the Kolmogorov theory of conditional probability is that *evidential favoring itself is relative to a sub-σ-field* (or perhaps some associated structure such as a partition) in the cases that give rise to Borel’s paradox. Thus, for instance, *E* favors the hypothesis that *P* lies on the red circle over the hypothesis that it lies on the blue circle relative to a coordinate system in which the former is a meridional circle and the latter is an equator, but does the opposite relative to a coordinate system that reverses those roles. For what it is worth, **this view seems somewhat tidier and less *ad hoc* than the view Remedy 1 provides.**

A third option is to abandon the Kolmogorov theory of conditional probability. Some Bayesians have worried that Kolmogorov’s theory **“merely avoid[s] the logical content of the problem”** (Hill 1970, 45) of conditioning on probability-zero hypotheses by making conditional probability relative to a sub-σ-field. Such problems seem to be well posed, and the fact that the de Finetti theory of conditional probability can handle them directly provides some motivation for adopting it.

I will discuss these alternative remedies further in future posts.

- The Law of Likelihood says that *E* favors $H_1$ over $H_2$ if and only if $\mbox{Pr}(E|H_1)/\mbox{Pr}(E|H_2) = k > 1$, and $k$ measures the degree of that favoring.

- Thanks to Wayne Myrvold for suggesting this response independently.
- I have in mind here substantive null hypotheses, e.g. hypotheses of the form “X does not cause Y,” as opposed to statistical null hypotheses that simply assert directly that the random variable being observed has a particular probability distribution.


arpruss says

Very nice move.

1. A terminological point: “Topology” isn’t the right term. You want to talk about choice of gauge or coordinate system. Topology is too crude (you can change a metric in many ways and still have the same topology).

2. One dimensional contexts can also be weird. One can generate a uniform distribution on [0,1] in such a manner that intuitively the probabilities of different points are different ( http://alexanderpruss.blogspot.com/2013/06/a-funny-uniform-distribution.html ). You can make this more precise using infinitesimals or Popper functions. In such cases, a different limiting procedure from the obvious one may well be appropriate.

3. And even without such weird distributions, endpoint cases can come up in one dimension. Let E = [0,0.25] union [0.5,0.75]. Let H1 = {0, 0.3}, H2 = {0.4, 0.6}. Does E support H1 or H2 more? Intuitively, the two hypotheses get equal support–each hypothesis includes one point in E and one point outside E. But using the kind of limiting procedure you did, E supports H2 more. For when we apply your limiting procedure to H1, we get a hypothesis H1(e) = [0,e] union [0.3-e,0.3+e], and if we apply it to H2, we get H2(e) = [0.4-e, 0.4+e] union [0.6-e, 0.6+e]. Then P(E|H1(e))=1/3 (for small e) and P(E|H2(e))=1/2 (for small e). And so taking the limit we get that E supports H2 over H1, which doesn’t seem right.

4. It seems to me–and here I am on shaky ground, and just reporting a weak seeming–that sometimes when we specify a case there is a natural limiting procedure and sometimes there isn’t. If there is, all may be well. If there isn’t, things aren’t well. And we cannot formalize when there is and when there isn’t a natural limiting procedure.

Greg Gandenberger says

1. Good, thanks! I’ve changed the text. I worried about this point, not having yet worked out the details, but decided to stick my neck out rather than playing it safe by being evasive or referring to sub-σ-fields.

2. Nice example! Could your suggestion be fleshed out as follows? Suppose we draw a number *n* from a random variable with a uniform distribution on [0,1], but instead of learning *n* directly we learn the result of drawing a number *r* from a random variable *R* with a normal distribution with mean *n* and variance 1. The appropriate probability distribution for *R* given *n* depends on whether *n* comes from a variable like *X* or a variable like *Z* in your example. Thus, the Law of Likelihood may yield a different conclusion about, for instance, how the result .5 bears on the hypotheses $H_1$: *n* = .25 and $H_2$: *n* = .75 depending on whether *n* comes from a variable like *X* or a variable like *Z* in your example.

If I have the idea right, then I would give my standard response to worries about applying the Law of Likelihood to data from continuous probability distributions: real measurement techniques have finite precision and bounded range, so those worries are artifacts of a convenient but inessential idealization.

Does that response seem adequate to you?

3. I’m not following. How are you getting your $P(E|H_1)$ and $P(E|H_2)$?

4. I agree.

Michael Lew says

You do not consider the restriction that comes naturally from the likelihood principle and the nature of likelihood: the law of likelihood applies only to likelihoods that fall on a common likelihood function. That restriction is neither ad hoc nor evasive, but it nonetheless disarms many alleged counter-examples to the likelihood principle.

I suspect that the reason that this simple point is missed by philosophers is their preference for statements of the problems as a choice between H1 and H2. That simple dichotomy seems to invite comparisons of hypotheses from distinct models. The two models for your Borel’s problem (I don’t see it as a paradox) are a model where circumference lines are drawn parallel to each other and a model where circumference lines are great circles and intersect at the poles. By considering only one line from each it seems that they should be commensurable whereas they clearly are not (and therein lies the alleged paradox).

Edwards’s statement of the likelihood principle begins like this: “Within the framework of a statistical model…” Now I understand that one could choose to read that “statistical model” as a reference to Kolmogorov’s axioms of probability, but it should be read more broadly. For your problem it refers to the spatial distribution of probability. That distribution differs for the latitude line family and the longitude line family. Thus the law of likelihood cannot sensibly apply to a comparison of the hypotheses relating to the latitude and longitude lines as presented in Borel’s problem. By simply choosing an appropriate viewpoint, either treating both lines are latitudes or both as longitudes, the lines can be treated as being parts of the same model and the problem disappears.

Recognition of the need for likelihoods to coexist on a single likelihood function (which may have arbitrarily many dimensions) allows for clarity on the issue of point null hypotheses. The point null is no different to any other arbitrarily specified point on the likelihood function: their probability mass or density is zero but their likelihoods need not be zero. Discard the dichotomous description of hypothesis testing and replace it with estimation-friendly significance testing, and you will see that likelihoodists need not be concerned by discussions based on an assumption that hypotheses with zero probability are in any way problematical or in need of some sort of exclusion.

Greg Gandenberger says

Thanks, Michael! I can see two ways to formulate the restriction you propose:

- **Restriction 1.** The Law of Likelihood applies only to simple statistical hypotheses that lie on a common likelihood function.
- **Restriction 2.** The Law of Likelihood applies only to hypotheses that imply simple statistical hypotheses that can be included in a common likelihood function.

It seems to me that Restriction 1 is stronger than we need and Restriction 2 is weaker than we need.

Restriction 1 would rule out applying the Law of Likelihood to substantive hypotheses that merely imply simple statistical hypotheses (see “New Responses to Three Counterexamples to the Likelihood Principle”, p. 7). This restriction is stronger than my Proposal 2, and at present I don’t see any need for it. It would rule out my $H_1$ and $H_2$, but it would also rule out hypotheses of scientific interest that my Proposal 2 allows, e.g. “the subject of this ESP test is merely guessing.”

Restriction 2 fails to block the counterexample. Consider the likelihood function that arises from applying the Kolmogorov theory of conditional probability to a partition of the sphere’s surface into a thin band of latitudes around an arbitrarily designated equator and meridional circles that exclude that thin band. I believe that for a sufficiently small band of latitudes, you would still get the result that *E* favors the hypothesis that *P* lies on the prime meridional circle over the hypothesis that it lies on the equator, to roughly degree 1.6.

Am I missing anything?

Michael Lew says

Greg, you write about this as if it were a matter of simply drawing a good restriction out of the air, but it should not be like that. The restriction of the law of likelihood to likelihoods arising from a single likelihood function that I propose is a more natural restriction, one that is an inevitable result of the nature of likelihood.

Think about “likelihood” as being similar to “large” or “nasty”. The Australian bull-ant is a very large ant, and it has a very nasty sting. The US government is very large, and it has a very nasty response to whistle-blowers. Can you meaningfully compare the largeness of the bull-ant to the largeness of the US government? Compare their nastinesses? Not without some specified scaling principle, and there is no single natural universal scaling system. Likelihoods are like that.

The bull-ant is large among ants. The function of largeness among ants may have units of mass, or volume, lineal dimension, or the like, but it can never have units of number of employees, or annual budget etc. The “among ants” is equivalent to “arising from a single likelihood function”.

Your latitude band is the bull-ant and the longitude band is the US government.

The law of likelihood has natural limits that result from the fact that likelihoods are only defined up to a proportionality constant. The limits mean that it can only apply to cases where the likelihoods are on the same scale, and while normalising methods that impose a consistent scale on likelihoods can be imagined for some special cases, a consistent scale can only be generally guaranteed by having the likelihoods originate on a single likelihood function.

If the restriction that comes from these considerations is stronger than we need, then bad luck, it is a natural restriction. If it is weaker (I don’t think so), then maybe there is another natural restriction that also applies.

Alexander R Pruss says

Greg:

Ad 2: Here’s a different way of fleshing out my example.

I’ve been friendly to the imprecision-in-measurement move. But still isn’t there something disquieting about relying on the fact that real measurement techniques have imprecision? Intuitively, we should be in better epistemic shape if we had access to perfect measurement techniques. (Say, we lived in a world where an angel would give us perfect measurements whenever we asked.) But yet these kinds of worries make one think that in such a world, a lot of standard probabilistic techniques would break down. That’s weird. Maybe we just need to be thankful that our measurement techniques are imprecise! I suppose it’s not so crazy to think lack of precision could epistemically help (I’m vaguely reminded of how randomness enables stochastic optimization methods).

Back to likelihood and Bayesianism. I take it that one way to put your worry is this. For Bayesians, zero probability worries come in when one conditions on evidence that has zero probability, and that doesn’t happen in practice. For likelihoodists, zero probability worries come in when one conditions on theories that have zero probability, and that seems rather more common. That’s a nice distinction. (But I think one could challenge the claim about the Bayesian side. The best example I can think of is that one of our observations is that space has only three large dimensions. What’s the prior probability of that observation? Plausibly (actually, I am not sure about this, but it’s plausible) it’s zero or worse–undefined!)

Ad 3: I’m assuming that P(Hi|E) = lim P(Hi(e)|E) as e goes to zero.

—-

Here’s a thought, and it touches on things other people have said about families of models. I think we sometimes conflate scientific models too much with theories, i.e., sets of propositions. Perhaps what the likelihoodist is evaluating aren’t just theories, but models. And a model in addition to having associated with it a set of propositions has some other stuff. One of these other things a model may have would be a way of generating probabilistic predictions from the model, i.e., a measure. This measure is an integral part of the model. A not unreasonable way to think of QM is to think that, well, here’s some equations about wavefunctions, and then here’s the Born rule which gives us probabilistic predictions from the model. (Some cosmologists think of the measure on a multiverse as an integral part of the model. I have some worries about that application. But my worries may not generalize beyond the cosmological case.)

OK, now if we think in this way, then it’s a mistake in the Borel case to think of the likelihoodist as simply confirming H1 versus H2. Rather, she is confirming a model that includes H1 versus a model that includes H2. And in addition to including H1 or H2, the model must include a method for generating predictions, in this case the method being a probability measure on the relevant great circle. Thus, there are actually FOUR somewhat natural models in the vicinity of the Borel paradox: H1+metric thickening, H1+angular thickening, H2+metric thickening, H2+angular thickening. And we get the result that E confirms H1+angular thickening over H1+metric thickening as well as over H2+metric thickening, and likewise E confirms H2+angular thickening over H1+metric thickening as well as over H2+metric thickening. (Well, to be honest, I am still not convinced that the angular thickening models are as natural as the metric thickening ones, but let me concede that for the sake of the argument.) But E is neutral between the two models whose measure is given by the metric thickening limit, and it’s neutral between the two models whose measure is given by the angular thickening limit. And that is exactly how it should be.

If this is right, then we’ve learned something deep and interesting from your argument: it is crucial for likelihoodists to think of models as not just theories.

Greg Gandenberger says

I like your proposal that the Law of Likelihood applies to a substantive hypothesis only in the presence of a method for generating predictions from that hypothesis. I’d prefer to maintain this line if possible rather than restricting the Law of Likelihood to simple statistical hypotheses. It does fit well with the views of philosophers of science like Ronald Giere who emphasize the fact that scientific theories in the informal sense include more than just sets of sentences.

Relating this proposal to my example, you could read $H_1$ and $H_2$ as providing an implicit model, in which case we should perhaps accept the claim that *E* favors $H_2$ over $H_1$. Or you could read them as model-free, in which case one needs to choose between denying that the Law of Likelihood applies to $H_1$ and $H_2$ (Remedy 1) and claiming that it applies to them only relative to a model (Remedy 2). I’m not sure there’s basis for choosing between these two approaches other than verbal preference.

Greg Gandenberger says

On further thought, I don’t think we can say that the Law of Likelihood applies to models in your sense. Instead, we should say that it applies to a pair of hypotheses either relative to a model or within an appropriate model (take your pick). Comparisons between hypotheses embedded in different coordinate systems seem quite inappropriate, as Michael has been emphasizing.

arpruss says

“Comparisons between hypotheses embedded in different coordinate systems seem quite inappropriate, as Michael has been emphasizing.”

I don’t see why. It’s not so much a matter of coordinate system, as that the different models make probabilistic predictions in some way or other. Their probabilistic predictions might be based on a coordinate system or on something else.

Greg Gandenberger says

2. Nice example, cleanly presented. I’m adding your posts on conditional probability to my reading queue.

I’m with you in thinking that it is odd that our probabilistic tools would break down if we tried to apply them to a datum from a perfectly precise measurement technique applied to a real-valued quantity. I find it vaguely comforting that (1) such techniques seem to be physically impossible even for reasons that have nothing to do with quantum phenomena, namely that they would have to be able to generate infinitely long output streams, and (2) if such a measurement technique did exist, it would typically generate a datum such that fully processing that datum in finite time would be a supertask.

It would be nice to be more than vaguely comforted, especially given the point de Finetti makes in the passage I quote in the post (1974, 116).

It’s not clear to me whether putting aside the question of what one would do if we had perfectly precise measurement techniques is dangerous in this way or not. Frankly, it’s hard to see how it could be. The question of whether probability should be countably or merely finitely additive seems to be a stronger candidate for a “seemingly completely otiose and insignificant” issue that actually is dangerous to ignore.

One thing we could do if we had perfectly precise measurement techniques would be to introduce imprecision by rounding off their outputs at some point. In fact, we would have to do so in the absence of a solution to the supertask problem. It might still seem a little odd that our current best theory would break down if we could find a way around the supertask problem and that there seem to be difficult if not intractable obstacles to developing a theory that would not break down. However, this fact seems less odd than the idea that we are lucky that our techniques are imprecise.

The example of the observation that space has only three dimensions is nice because one could want to consider how that observation bears on some cosmological theory that assigns it a positive probability. I don’t think that the possibility of such (plausibly) probability-zero observations is a problem for Bayesianism because an observation can’t have zero probability if a theory assigns it positive probability unless that theory also has zero probability, in which case Bayes’s theorem either yields a posterior probability of zero for that theory (if the datum has positive probability) or breaks down (if the datum has zero probability). I’m inclined to regard the fact that Bayes’s theorem breaks down for probability-zero data predicted by a probability-zero theory as excusable.

I’ll respond to your other points separately to try to keep the length of each comment within reasonable bounds.

arpruss says

Greg:

” I don’t think that the possibility of such (plausibly) probability-zero observations is a problem for Bayesianism because an observation can’t have zero probability if a theory assigns it positive probability unless that theory also has zero probability, in which case Bayes’s theorem either yields a posterior probability of zero for that theory (if the datum has positive probability) or breaks down (if the datum has zero probability).”

Right, but here we are precisely questioning whether Bayesianism works. Maybe we do have observations that have absolutely prior zero probability and they support theories that have absolutely prior zero probability, and so we do have breakdown? If that’s the only case of such breakdown (or if the number of cases is rare), this isn’t a big deal. Instead of working with absolutely prior probabilities (which are an abstraction anyway, of course), the Bayesian can just work with probabilities conditioned on there being three big dimensions.

That said, on reflection I am with you on the dimensions, because I think simple metaphysically possible theories shouldn’t be assigned prior probability zero, and there are simple metaphysically possible theories with three dimensions.

A different zero-probability worry occurs to me, though. Suppose consciousness is analog in nature and that evidence is constituted by conscious experiences. Then the probability of having this precise experience that I am now having may well be zero. Of course, any finite description of the experience will have a non-zero probability of being satisfied. But the evidence, perhaps, is the experience itself, and that has probability zero. Crazy?

Greg Gandenberger says

3. I think I must have misread something the first time because your point seems clear to me now. It is well taken. I’ll have to think more about how to address it. It does not seem to be an instance of Borel’s paradox, but it raises similar issues.

Michael Lew says

Greg, it seems that you have now got it, but I don’t think that it will really come as news to “likelihoodists”. Notice how Edwards’s statement of the likelihood principle and the likelihood axiom (which contains the law of likelihood) begin: “Within the framework of a statistical model”. The law of likelihood only tells us how to quantitate the evidence for hypotheses that are parameter values within a model, not evidence regarding distinct theories.

I have to say that the restriction to comparisons of likelihoods to those that lie on a single likelihood function encapsulates the requirement that the likelihoods relate to parameters within a model, but you can think of it in the other direction if you like.

People are much less likely to try to compare theories by way of likelihoods if we can avoid presenting problems in terms of H1 versus H2, and instead deal with continua of hypotheses. The hypotheses lie along axes of the likelihood function.

Greg Gandenberger says

We seem to be making progress!

Here’s another way to put my point. Restricting the Law of Likelihood to hypotheses that lie on a single likelihood function solves the problem presented by Borel’s paradox if it is understood to mean restricting it to simple statistical hypotheses themselves; it does not solve the problem if it is understood to mean restricting it to substantive hypotheses that merely *imply* simple statistical hypotheses that can be included in a common likelihood function. (I would require that the substantive hypotheses be mutually exclusive, for reasons I explain in Section 3 of “New Responses to Three Counterexamples to the Likelihood Principle”.)

The first restriction is natural and does conform to Edwards’s statements, but the second restriction also seems sufficient to account for the fact that likelihood functions are only defined up to proportionality constants. Moreover, unlike the first restriction it allows one to use the Law of Likelihood to address directly such questions as whether our evidence favors evolutionary theory over intelligent design or vice versa, a la Sober 2008.

Is there any reason to think that such applications are misguided? It’s just as true for substantive hypotheses as it is for statistical hypotheses that a likelihood ratio is the ratio of the posterior odds to the prior odds for the pair of hypotheses in question on the observed datum under Bayesian updating.
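The claim in the last sentence is just the odds form of Bayes’ theorem: posterior odds = prior odds × likelihood ratio, for any pair of hypotheses, statistical or substantive. A small check with hypothetical numbers (the priors and likelihoods below are made up for illustration):

```python
# Hypothetical inputs: priors P(H1) = 0.2, P(H2) = 0.8;
# likelihoods P(E | H1) = 0.9, P(E | H2) = 0.3.
p_h1, p_h2 = 0.2, 0.8
lik_h1, lik_h2 = 0.9, 0.3

# Unnormalized posteriors; the normalizing constant cancels in the odds.
post_h1 = p_h1 * lik_h1
post_h2 = p_h2 * lik_h2

prior_odds = p_h1 / p_h2
posterior_odds = post_h1 / post_h2
likelihood_ratio = lik_h1 / lik_h2

# The likelihood ratio is exactly the factor by which updating on E
# shifts the odds: posterior_odds / prior_odds == likelihood_ratio.
print(posterior_odds / prior_odds, likelihood_ratio)
```

Nothing in this identity requires H1 and H2 to be parameter values within a statistical model; it only requires that the conditional probabilities of E be well defined.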

arpruss says

Michael:

The reason why restriction to a parametrized family of hypotheses can work is presumably that there will then often be a natural way of defining the conditional probabilities for a fixed value of the parameter.

Often, but not always. Conditional probabilities with respect to a parameter are normally only defined up to almost-everywhere equivalence. In nice cases, there will be natural continuity constraints that will let one define the conditional probability not just almost everywhere, but everywhere. But still, that’s only in nice cases…
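The dependence on the conditioning procedure can be seen numerically. In the following Monte Carlo sketch (sample size and band width chosen for illustration), points are drawn uniformly on the sphere and conditioned on lying in a narrow longitude band around the prime meridian. The resulting latitude distribution is proportional to cos(latitude) rather than uniform in arc length, which is exactly the Borel–Kolmogorov phenomenon: a different limiting procedure for the same great circle would give a different answer.

```python
import math
import random

random.seed(0)

def uniform_sphere_point():
    # Uniform on the unit sphere: longitude uniform on (-pi, pi),
    # sin(latitude) uniform on (-1, 1).
    lon = random.uniform(-math.pi, math.pi)
    lat = math.asin(random.uniform(-1.0, 1.0))
    return lat, lon

eps = 0.05  # half-width of the longitude band around the prime meridian
near_meridian = [
    lat
    for lat, lon in (uniform_sphere_point() for _ in range(200_000))
    if abs(lon) < eps
]

# Fraction of conditioned points with |latitude| < pi/6. If the limiting
# distribution on the meridian were uniform in arc length this would be
# about 1/3; the cos(latitude) density concentrates mass near the equator,
# giving about 1/2.
frac = sum(abs(lat) < math.pi / 6 for lat in near_meridian) / len(near_meridian)
print(frac)
```

Conditioning instead on a narrow latitude band around the equator would leave the longitude distribution uniform, so the two great circles, although geometrically identical, inherit different conditional distributions from the two procedures.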

Greg Gandenberger says

Alexander:

Not crazy, I think. De Finetti seems to have had a similar thought (1972, 203):

There seem to be exactly two possibilities:

I suspect one could make a good case for 1, but I don’t know. 2 is not too troubling given that the full content of our conscious experience isn’t available for the kind of explicit reasoning for which we need a theory of inductive inference anyway.

Still, it does seem that a fully satisfactory theory should in principle be able to handle the full content of our conscious experience as evidence whether that content is analog or not.

Michael Lew says

Greg, you say that one version of restriction “allows one to use the Law of Likelihood to address directly such questions as whether our evidence favors evolutionary theory over intelligent design or vice versa, a la Sober 2008.” and then you ask “Is there any reason to think that such applications are misguided?” My response to your question is yes, there is a reason, and it might be a ‘big’ reason.

Evidence relating to evolutionary theory consists partly of observational measurements and partly of experimental results. The evidence that relates to intelligent design consists of observational measurements (presumably), argument from the principle of “I don’t understand how… without a designer” (definitely), and argument from authority. Thus the bodies of evidence relevant to a decision between those two theories overlap at most a little (the observations). I don’t think it would be possible to devise a scaling of the importance of the non-overlapping bits of evidence that would yield any meaningful summary of the evidence in terms of one or more likelihood ratios. I would personally weigh experimental results quite heavily; someone else might find argument from authority more compelling.

(Yes, I notice that I may have just used an argument in the form of “I don’t understand…”. Hoist on my own petard?)

Your example of competing theories is an extreme one, but it serves my purposes very well indeed. There are decisions that we may like to make for which it would be nice to have some sort of objective index of evidence but for which the nature of the evidence precludes any objective analysis.

Greg Gandenberger says

I’m not seeing the difficulty here. For any phenomenon, we can ask, “How probable is this phenomenon on the assumption that evolutionary theory is true?” and likewise on the assumption that intelligent design is true, and look at the resulting likelihood ratio. To the extent that the likelihood ratio is reasonably precise and objective, it tells us how learning the relevant datum would shift the opinions of a Bayesian agent. To the extent that it is vague and/or subjective, it tells us that an argument for evolution or ID based on that phenomenon will be vague and/or subjective to the same degree.

Sober analyzes design arguments in this way in *Evidence and Evolution* and in this paper. What he’s doing seems to me legitimate, with the caveat that I think likelihoodist arguments need to be understood as either robust Bayesian arguments or “choose your own priors” Bayesian argument templates.

I don’t see why the fact that evolutionists and ID theorists use different kinds of arguments is particularly problematic. When an ID theorist says “I don’t understand how… without a designer,” for instance, the thing to consider is how probable the “…” is according to evolutionary theory and according to intelligent design. Insofar as one can pin down the relevant probabilities, I see no problem with applying the Law of Likelihood to such a case.

Michael Lew says

Experience CANNOT be truly analogue: all perceptions are encoded by neural activity which consists of many types of information encoding, but a very important part of that encoding consists of all-or-none action potential events. You cannot sense anything in a truly analogue manner, and even if you could you wouldn’t ‘know’ about it because memories are stored by molecular mechanisms which have finite capacity.

Greg Gandenberger says

Good, thanks. That is the sort of argument I had in mind. It does leave some room for worries. For instance, for all I know my conscious experiences could depend in an analogue manner on the volume of blood in my amygdala.

Greg Gandenberger says

Alexander:

I’m proposing that the Law of Likelihood allows you to compare hypotheses within a coordinate system and an associated limiting procedure, but not to compare a hypothesis with one coordinate system and associated limiting procedure to another hypothesis with a different coordinate system and associated limiting procedure. The coordinate systems and associated limiting procedures are not part of the things being compared, but rather the backdrop against which the comparison takes place.

Does that seem right to you? Obviously there are details to work out, but I’d like to focus on the basic intuition for the moment.