R.A. Fisher (1922, 310) and A.W.F. Edwards (1972, 9), among others, characterize likelihood functions as defined only up to constants of proportionality. **If those constants are necessary, arbitrary, and model-specific, then the Law of Likelihood applies only to hypotheses that belong to a common model.** For if a pair of hypotheses $H$ and $H'$ belong to different models, then their likelihood ratio on some datum $A$ is not given by $\Pr(A|H)/\Pr(A|H')$, but rather by $[c\cdot\Pr(A|H)]/[c'\cdot\Pr(A|H')]$ for some pair of constants $c$ and $c'$ that do not stand in any definite relation to one another.

**If the two hypotheses I consider in my counterexample to the Law of Likelihood do not belong to a common model, then this restriction on the scope of the Law of Likelihood suffices to block the counterexample.** The claim that they do not belong to a common model has some plausibility because one concerns a zonal circle while the other concerns a meridional circle. I will discuss this potential application of the claim that the Law of Likelihood only applies within a model in a future post. Here my focus is on the claim itself.

**I can see no argument beyond an appeal to authority for the claim that likelihood functions must be defined only up to model-specific constants of proportionality.** It might be convenient to multiply a likelihood function by a constant so that, for instance, the maximum of the function is always 1. But the claim that normalizing the likelihood function in this way is useful does not require defining “likelihood function” in such a way that a normalized likelihood function is itself a likelihood function.

Similarly, the fact that the Likelihood Principle says that two experimental outcomes are evidentially equivalent with respect to a set of hypotheses if their likelihood functions over those hypotheses are proportional does not require calling proportional likelihood functions the same likelihood function. It only requires giving those likelihood functions the same evidential interpretation. And the Law of Likelihood does so without any restrictions: if $\Pr(E_1|H)=c\cdot\Pr(E_2|H)$ for some constant $c$ and all $H$ in some partition, then $\Pr(E_1|H_1)/\Pr(E_1|H_2)=\Pr(E_2|H_1)/\Pr(E_2|H_2)$ for all $H_1$ and $H_2$ in that partition.
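This invariance is easy to check numerically. The sketch below uses made-up likelihood values over a hypothetical three-hypothesis partition and an arbitrary proportionality constant; it confirms that proportional likelihood functions yield identical likelihood ratios.

```python
# Illustrative (made-up) likelihood values over a three-hypothesis partition.
lik_E2 = {"H1": 0.4, "H2": 0.1, "H3": 0.2}  # Pr(E_2 | H) for each H

# A likelihood function proportional to the first: Pr(E_1|H) = c * Pr(E_2|H).
c = 0.5
lik_E1 = {H: c * v for H, v in lik_E2.items()}

# The likelihood ratios over any pair of hypotheses agree, so the Law of
# Likelihood gives both outcomes the same evidential interpretation.
print(lik_E1["H1"] / lik_E1["H2"])  # 4.0
print(lik_E2["H1"] / lik_E2["H2"])  # 4.0
```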

### Objection

There might be arguments against applying the Law of Likelihood across models that do not depend on the claim that likelihood functions must be defined only up to model-specific constants of proportionality. For instance, **consider the following example**, which generalizes an example Michael Lew presented to me in personal correspondence.

**Example.** Let $E$ be a report that a certain coin with unknown bias $p$ for heads landed heads $h$ times in $n$ tosses ($n\geq h$), where it is known that the tosses were independent and that either the number of heads or the number of tosses was fixed in advance, but it is not known which one. Let $H_1$ be the hypothesis that $p=p_0$ for some $0< p_0< 1$ and that the number of tosses was fixed at $n$ in advance. Let $H_2$ be the hypothesis that $p=p_0$ and that the number of heads was fixed at $h$ in advance.

It might seem that $E$ contains no information about whether the number of heads or the number of tosses was fixed in advance. But applying the Law of Likelihood to this example yields the conclusion that $E$ favors $H_1$ over $H_2$ to the degree $n/h$ for any $p_0$. For $$\frac{\Pr(E|H_1)}{\Pr(E|H_2)}=\frac{\binom{n}{h}p_0^h(1-p_0)^{n-h}}{\binom{n-1}{h-1}p_0^h(1-p_0)^{n-h}}=\frac{n}{h}.$$ It also yields the conclusion that $E$ favors the claim that the number of tosses was fixed at $n$ in advance over the claim that the number of heads was fixed at $h$ in advance to the degree $n/h$ if we assume that evidential favoring obeys the following attractive constraint (an analogue of conglomerability): if $E$ favors $A\mbox{&}C$ over $B\mbox{&}C$ for all $C$ in some partition, then it also favors $A$ over $B$.^{1} (Let $A$ be the hypothesis that $n$ is fixed, $B$ the hypothesis that $h$ is fixed, and let $C$ range over the hypotheses of the form $p=p_0$ for $0<p_0<1$.) Thus, **this example seems to speak against the claim that the Law of Likelihood applies to hypotheses from different models**.
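The algebra can be spot-checked numerically. This sketch takes $n=11$ and $h=4$ as illustrative values (the numbers from Michael Lew's coin in the comments below) and confirms that the likelihood ratio equals $n/h$ regardless of $p_0$.

```python
from math import comb

def lik_fixed_n(h, n, p):
    """Binomial: Pr(h heads in n tosses) when n was fixed in advance."""
    return comb(n, h) * p**h * (1 - p)**(n - h)

def lik_fixed_h(h, n, p):
    """Negative binomial: Pr(stopping after n tosses) when h was fixed
    in advance; the h-th head must land on toss n."""
    return comb(n - 1, h - 1) * p**h * (1 - p)**(n - h)

n, h = 11, 4  # illustrative values
for p0 in (0.1, 0.5, 0.9):
    ratio = lik_fixed_n(h, n, p0) / lik_fixed_h(h, n, p0)
    print(p0, ratio)  # the ratio is n/h = 2.75 for every p0
```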

### Response

The example just presented is not as compelling a counterexample to the Law of Likelihood across models as it might seem. The Law of Likelihood does *not* yield the conclusion that $E$ favors the hypothesis $H_1^*$ that the number of tosses was fixed in advance over the hypothesis $H_2^*$ that the number of heads was fixed in advance. Instead, it says that $E$ favors the hypothesis $H_1$ that the number of tosses was fixed at $n$ over the hypothesis $H_2$ that the number of heads was fixed at $h$, where $n$ and $h$ are the number of tosses and the number of heads reported, respectively.

This difference is important, because it would seem to be problematic if the Law of Likelihood said that the data from an experiment would favor some non-random hypothesis over another non-random hypothesis regardless of the outcome of that experiment. But $H_1$ is a random hypothesis if $H_2$ is true, and $H_2$ is a random hypothesis if $H_1$ is true: each is formulated in terms of a quantity ($n$ or $h$) that is determined by the random outcome of the experiment if the other is true. The fact that you can *after the fact* formulate some hypothesis according to which the number of tosses was fixed and some hypothesis according to which the number of heads was fixed such that the data reported favor the former over the latter is not obviously troubling.

Even assuming the analogue of conglomerability stated above, the Law of Likelihood does not say whether $E$ favors $H_1^*$ over $H_2^*$ or vice versa without a prior probability distribution over $h$ on the assumption that $n$ is fixed and a prior probability distribution over $n$ on the assumption that $h$ is fixed. Given such distributions, what the Law of Likelihood says about whether $E$ favors $H_1^*$ over $H_2^*$ or vice versa depends on $h$ and $n$, as it should.

It might seem wrong to say that any $(n,h)$ pair favors the corresponding $H_1$ over the corresponding $H_2$, but **I find this claim acceptable upon reflection**. To evoke clearer intuitions, consider an extreme case such as $n=1000$ and $h=1$. There are 1000 ways this result could have arisen if $n=1000$ was fixed in advance: the head could have occurred on the first toss, the second toss, …, or the thousandth toss. But there is only one way it could have arisen if $h=1$ was fixed in advance: the head must have occurred on the thousandth toss. Considering this fact makes it seem rather plausible to me that the datum $n=1000$, $h=1$ does in fact favor the hypothesis that $n=1000$ was fixed over the hypothesis that $h=1$ was fixed to the degree $n/h=1000$.
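The counting argument can be verified by brute-force enumeration on a scaled-down version of the example, taking $n=5$ and $h=1$ in place of $n=1000$ and $h=1$.

```python
from itertools import product

n, h = 5, 1  # scaled-down version of the n=1000, h=1 case

# All toss sequences of length n with exactly h heads.
seqs = [s for s in product("HT", repeat=n) if s.count("H") == h]

# Fixed-n sampling: the single head can fall on any of the n tosses.
print(len(seqs))  # 5 ways

# Fixed-h sampling: tossing stops at the h-th head, so the head
# must be the final toss.
stopped = [s for s in seqs if s[-1] == "H"]
print(len(stopped))  # 1 way
```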

### Conclusion

So far, I have seen no reason I find convincing to think that the Law of Likelihood cannot be applied to hypotheses that belong to different models. **The claim that likelihood functions are defined only up to constants of proportionality does not convince me, nor does the purported counterexample considered above.**

**Question:** *Are there reasons to restrict the Law of Likelihood to hypotheses that belong to a common model that I am missing?*

Thanks to Michael Lew for pressing the argument considered here in comments on this blog and in personal correspondence, which is not to say that he would agree with the way I have presented it or with any of my responses to it!

To **share your thoughts about this post**, comment below or send me an email. To use $\LaTeX$ in comments, surround mathematical expressions with single dollar signs for inline mode or double dollar signs for display mode.

The image of R. A. Fisher at the top of this post is in the public domain.

1. This claim can be proven if there is a probability distribution over the relevant partition, regardless of what that distribution is, but it seems eminently plausible even when that partition is a hypothesis space over which many likelihoodists would deny that there can be a meaningful probability distribution. ↩
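The provable case is easy to check numerically. The sketch below uses illustrative likelihood values and assumes a single prior over the partition (the same under $A$ as under $B$); given cellwise favoring, the prior-weighted likelihoods preserve the ordering whatever that prior is.

```python
import random

# Illustrative values: Pr(E | A & C_i) exceeds Pr(E | B & C_i)
# in every cell C_i of a three-cell partition.
lik_A = [0.30, 0.12, 0.50]
lik_B = [0.20, 0.10, 0.45]

for _ in range(5):
    # An arbitrary prior over {C_i}, assumed independent of A and B.
    w = [random.random() for _ in lik_A]
    prior = [x / sum(w) for x in w]
    marg_A = sum(p * l for p, l in zip(prior, lik_A))
    marg_B = sum(p * l for p, l in zip(prior, lik_B))
    assert marg_A > marg_B  # E favors A over B under any such prior
print("ordering preserved for every sampled prior")
```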

Michael Lew says

Greg, a belated thought: what happens if the example is changed slightly so that rather than specifying H1 and H2 it simply says that the stopping rule is unknown?

If one is allowed to hypothesise about the models then there is no reason to restrict the models to H1: stop at n tosses and H2: stop at h heads. Thus I propose H3 which is the set of any stopping rule other than those implied by H1 and H2. As [not (H1 or H2)] it is a perfectly well-formed hypothesis despite its non-specificity.

Imagine I came to you with a result of n and h and refused to tell you the stopping rule (I might actually do that just to make this real: I’ve tossed my 50 cent coin 11 times with a result of 4 heads!). You would be foolish to restrict yourself to H1 and H2 when I might well have stopped for entirely arbitrary idiosyncratic reasons. H1: I tossed the coin 11 times and counted heads; H2: I tossed the coin until I observed 4 heads; H3: I did neither H1 nor H2.

In suggesting that likelihoods can be used to quantify support of hypotheses across different models you have to be able to deal with H3, and I don’t think that your rationalisation can be stretched far enough to do so.

Greg Gandenberger says

$H_3$ does imply a distribution for $X$, so I would have to specify one somehow, presumably subjectively. But whether you want to allow subjective likelihoods or not is a separate issue from whether the Law of Likelihood can be applied across models, given the relevant likelihoods. I don’t see how $H_3$ is problematic for the latter claim.