One possible response to “Borel’s Paradox as a Counterexample to the Law of Likelihood” that I did not mention in my post introducing the counterexample is to **restrict the Law of Likelihood to simple statistical hypotheses**, as opposed to substantive hypotheses that merely give rise to simple statistical hypotheses when conditioned upon. I’ll call this response **Remedy 1′** because it resembles Remedy 1 but proposes a stronger restriction.

An obvious objection to Remedy 1′ is that it is **unnecessary,** at least for the purpose of addressing Borel’s paradox: it proposes a stronger restriction than Remedy 1, which is sufficient to block the counterexample. Moreover, its additional strength is apparently costly because it would prevent us from applying the Law of Likelihood in many cases in which we would like to be able to evaluate the degree to which some data favor one hypothesis over another. For instance, it would prevent Sober from applying the Law to the debate between advocates of evolutionary theory and advocates of intelligent design (see Sober 2008, Ch. 2 and Sober 2004).

For what it is worth, Remedy 1′ returns likelihoodism to its roots in (Hacking 1965) and (Edwards 1972). Hacking states the Law in terms of “simple joint propositions” (59), which are of the form “the distribution of chances on trials of kind $K$ on set-up $X$ is $D$; outcome $E$ occurs on trial $T$ of kind $K$” (57). Edwards begins his statement of the Law (his “Likelihood Axiom”) with the words “Within the framework of a statistical model…” (31).^{1} For better or for worse, prominent likelihoodists have since abandoned this restraint (see e.g. Royall 1997, 3 and Sober 2008, 32).

Beyond an appeal to the authority of Hacking and Edwards, I can see **two arguments** for adopting Remedy 1′ rather than Remedy 1:

- Remedy 1′ allows you to deny that likelihood functions are conditional probabilities at all, thereby avoiding other difficulties in addition to Borel’s paradox.
- Remedy 1′ follows from the well-motivated requirement that the hypotheses to which the Law of Likelihood applies be arguments of a common likelihood function.

I will consider these arguments in turn. I claim that they are insufficient to warrant preferring Remedy 1′ to Remedy 1. Nevertheless, Remedy 1′ may be a viable fallback option if Remedy 1 fails.

## Argument 1

Restricting the Law of Likelihood to simple statistical hypotheses allows one to deny that the likelihoods that appear in it are conditional probabilities. Instead of $\mbox{Pr}(E|H_1)/\mbox{Pr}(E|H_2)$, one could express the Law of Likelihood in terms of $\mbox{Pr}(E;H_1)/\mbox{Pr}(E;H_2)$, where $\mbox{Pr}(E;H)$ is the probability of $E$ “under the assumption that $H$ is true,” to paraphrase Mayo (2005, 102).^{2}

Some frequentists, including Mayo, say that the likelihoods to which they appeal are to be understood in this way. This view is necessary for someone who takes $\mbox{Pr}(E|H)=\mbox{Pr}(E \wedge H)/\mbox{Pr}(H)$ to *define* $\mbox{Pr}(E|H)$ and who agrees with the orthodox frequentist claim that $\mbox{Pr}(H)$ must be either zero or one in typical cases: on those assumptions, $\mbox{Pr}(E|H)$ is either undefined or trivial, so the likelihoods must be something else.

**For those who regard conditional probability as primitive, I see no advantage in denying that likelihood functions are conditional probabilities.** The debate about whether or not conditional probability should be regarded in this way is ongoing and cannot be adequately summarized here; suffice it to say that the view that it should be regarded as primitive seems to have strong arguments in its favor (see e.g. Hajek 2003). As a result, Argument 1 is far from compelling, although this issue cannot be regarded as closed.

## Argument 2

The Likelihood Principle says that the evidential meaning of a datum *E* with respect to a set of hypotheses depends only on its probability as a function of those hypotheses, up to a constant of proportionality. One can eliminate the clause “up to a constant of proportionality” from this formulation by stipulating that likelihood functions are only *defined* up to a constant of proportionality. This move could lead one to say that likelihoods from different likelihood functions are incommensurable because their proportionality constants may differ, and thus that the Law of Likelihood does not apply to hypotheses drawn from different likelihood functions. One possible interpretation of this claim would require that the relevant hypotheses be simple *statistical* hypotheses, as Remedy 1′ proposes.
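The “up to a constant of proportionality” clause can be made concrete with a textbook example (not from this post): observing $x$ successes in $n$ Bernoulli trials yields a binomial likelihood, while observing that the $x$-th success arrived on trial $n$ yields a negative-binomial likelihood. The two functions of $\theta$ differ only by a constant factor, so the Likelihood Principle treats the two observations as evidentially equivalent. A minimal sketch of the arithmetic:

```python
from math import comb

def binomial_lik(theta, n, x):
    # Probability of x successes in n fixed trials.
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

def neg_binomial_lik(theta, n, x):
    # Probability that the x-th success occurs on trial n.
    return comb(n - 1, x - 1) * theta**x * (1 - theta)**(n - x)

n, x = 10, 3
ratios = [binomial_lik(t, n, x) / neg_binomial_lik(t, n, x)
          for t in (0.2, 0.5, 0.8)]
print(ratios)  # the same constant for every theta: comb(10, 3) / comb(9, 2)
```

The ratio is the same for every value of $\theta$, which is exactly what it means for the two likelihood functions to agree up to a constant of proportionality.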

**This argument has three major steps, all of which are problematic.** First, the fact that the Likelihood Principle says that evidential meaning depends on a likelihood function only up to a constant of proportionality does not require us to regard likelihood functions as defined only up to a constant of proportionality. Second, the claim that likelihood functions are defined only up to a constant of proportionality does not require us to regard likelihoods from different likelihood functions as incommensurable.

Take, for instance, an observation $X=0$ for some random variable $X$. One possible likelihood function for this variable says that $X$ is normally distributed with unknown mean $\mu$ and known standard deviation $\sigma=1$. Another possible likelihood function says that $X$ is normally distributed with unknown mean $\mu$ and known standard deviation $\sigma=2$. Why can’t we use the Law of Likelihood to say that $X=0$ favors $\mu=0$ within the first likelihood function over $\mu=1$ within the second likelihood function? We can simply regard both of those hypotheses as belonging to a more comprehensive likelihood function that treats both $\mu$ and $\sigma$ as unknown.
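For concreteness, the two likelihoods in this example can be computed directly from the normal density; within the comprehensive model that treats both $\mu$ and $\sigma$ as unknown, the degree of favoring is just their ratio. A minimal sketch of the arithmetic (the numbers are particular to this toy example):

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

x = 0.0
lik_h1 = normal_pdf(x, mu=0.0, sigma=1.0)  # first hypothesis: mu = 0, sigma = 1
lik_h2 = normal_pdf(x, mu=1.0, sigma=2.0)  # second hypothesis: mu = 1, sigma = 2
ratio = lik_h1 / lik_h2
print(lik_h1, lik_h2, ratio)  # roughly 0.399, 0.176, and a ratio of about 2.27
```

On the Law of Likelihood, the observation $X=0$ would then favor the first hypothesis over the second by a factor of roughly 2.27.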

Finally, the claim that the relevant likelihoods must come from a single likelihood function could be taken to allow applying the Law of Likelihood to substantive hypotheses that, when conditioned upon, give rise to simple statistical hypotheses belonging to a common likelihood function. This interpretation seems to satisfy the motivation for the single-likelihood-function requirement, but it does not imply Remedy 1′.

## Conclusion

Arguments 1 and 2 are the only arguments I can see for preferring Remedy 1′ to Remedy 1. They are insufficient. Thus, it seems reasonable to prefer Remedy 1 to Remedy 1′ given that Remedy 1 allows what seem to be sensible applications of the Law of Likelihood that Remedy 1′ rules out.

- Thanks to Michael Lew for pointing out that Edwards restricts his Likelihood Axiom in this way in this comment on a previous post. ↩
- It is not clear to me what the notion of “probability under an assumption” is supposed to amount to in general. However, it is clear what the probability of $E$ is under the assumption that $H$ is true when $H$ is a simple statistical hypothesis with respect to a random variable of which $E$ is a possible value, which is all that I need for present purposes. ↩

Thanks to Michael Lew for the comment that led me to consider Remedy 1′.
