### Theorem

**Theorem.** At least one of the following claims is false:$^1$

- **C1.** $E$ favors $H_1$ over $H_2$ (where $H_1$ and $H_2$ are mutually exclusive) if and only if $\Pr(E|H_1)/\Pr(E|H_2)>1$, with $\Pr(E|H_1)/\Pr(E|H_2)$ measuring the degree of that favoring.
- **C2.** The degree to which one is justified in believing $H_1$ rather than $H_2$ is the degree to which one's total evidence favors $H_1$ over $H_2$.
- **C3.** If the degree to which one is justified in believing $H_1$ is strictly positive, and $H_1$ and $H_2$ are mutually exclusive, then the degree to which one is justified in believing $H_1\cup H_2$ rather than $H_3$ is strictly greater than the degree to which one is justified in believing $H_2$ rather than $H_3$.
- **C4.** If $H_1$ is logically equivalent to $H_2$ and $H_3$ is logically equivalent to $H_4$, then the degree to which one is justified in believing $H_1$ over $H_3$ is the same as the degree to which one is justified in believing $H_2$ over $H_4$.
- **C5.** If one is justified in believing $H_1$ over $H_3$ and $H_4$ over $H_2$, then the degree to which one is justified in believing $H_1$ over $H_3$ is not the same as the degree to which one is justified in believing $H_2$ over $H_4$.

### Proof

The proof of this claim is a variant of von Mises' water-wine example (1951, 77-8). Suppose you are told that $1/2<r\leq 2$, where $r$ is the ratio of water to wine in a particular bottle. You have no further information about $r$ or how it was determined. A three-sided die is to be rolled that has been weighted according to $r$ so that for any $1/2<r_0\leq 2$, $\Pr(d=1|r=r_0)=1/2$, $\Pr(d=2|r=r_0)=-r_0/3+2/3$, and $\Pr(d=3|r=r_0)=r_0/3-1/6$.
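As a quick sanity check on the setup (mine, not part of the original post), these three conditional probabilities are nonnegative and sum to one for every $r_0$ in the stated range:

```python
# Sanity check (mine, not part of the original post): the stipulated die
# probabilities form a valid distribution for every ratio r0 in (1/2, 2].

def die_probs(r0):
    """Return (Pr(d=1), Pr(d=2), Pr(d=3)) given water-to-wine ratio r = r0."""
    return (1/2, -r0/3 + 2/3, r0/3 - 1/6)

for r0 in [0.51, 0.75, 1.0, 1.5, 2.0]:
    p = die_probs(r0)
    assert all(x >= 0 for x in p), f"negative probability at r0={r0}"
    assert abs(sum(p) - 1) < 1e-12, f"probabilities do not sum to 1 at r0={r0}"
```

Note that $\Pr(d=1|r=r_0)$ is the same ($1/2$) for every $r_0$, which is what makes $d=1$ evidentially inert in what follows.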

Suppose $d=1$. Then C1 implies that the outcome of the die roll is neutral among $1/2<r\leq 1$, $1<r\leq 3/2$, and $3/2<r\leq 2$, because $d=1$ has the same probability ($1/2$) under each of those hypotheses. We can simply stipulate that $d=1$ is our total evidence with respect to these hypotheses, so that C2 implies that one is not justified in believing any of them over any of the others. Thus, C3 implies that the degree to which one is justified in believing $1<r\leq 2$ rather than $1/2<r\leq 1$ is strictly positive.

But now consider the ratio $r'$ of wine to water instead of the ratio $r$ of water to wine. You know that $1/2\leq r'<2$ and have no further information about $r'$ or how it was determined. In terms of $r'$, the three-sided die described above has the characteristics $\Pr(d=1|r'=r_0)=1/2$, $\Pr(d=2|r'=r_0)=-1/(3r_0)+2/3$, and $\Pr(d=3|r'=r_0)=1/(3r_0)-1/6$.
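The $r'$ formulas are simply the $r$ formulas re-expressed via $r'=1/r$; a quick check of that correspondence (mine, not the post's):

```python
# Check (mine, not part of the original post) that the wine-to-water
# description of the die is the water-to-wine description with r' = 1/r.

def probs_given_r(r0):
    """Pr(d=1..3 | r = r0), water-to-wine ratio, 1/2 < r0 <= 2."""
    return (1/2, -r0/3 + 2/3, r0/3 - 1/6)

def probs_given_r_prime(r0):
    """Pr(d=1..3 | r' = r0), wine-to-water ratio, 1/2 <= r0 < 2."""
    return (1/2, -1/(3*r0) + 2/3, 1/(3*r0) - 1/6)

for r0 in [0.6, 1.0, 1.7]:
    same_bottle = probs_given_r_prime(1/r0)  # the bottle with r = r0 has r' = 1/r0
    for x, y in zip(probs_given_r(r0), same_bottle):
        assert abs(x - y) < 1e-12
```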

When $d=1$, C1 implies that the outcome of the die roll is neutral among $1/2\leq r'<1$, $1\leq r'<3/2$, and $3/2\leq r'<2$. We can stipulate that $d=1$ is our total evidence with respect to these hypotheses, so that C2 implies that one is not justified in believing any of them over any of the others. Thus, C3 implies that the degree to which one is justified in believing $1\leq r'<2$ rather than $1/2\leq r'<1$ is strictly positive.

**But now we arrive at a contradiction.** For $H_1: 1<r\leq 2$ is equivalent to $H_2: 1/2\leq r'< 1$, and $H_3: 1/2<r\leq 1$ is equivalent to $H_4: 1\leq r'< 2$, yet we have said that we are justified in believing $H_1$ over $H_3$ and $H_4$ over $H_2$. By C4, it follows that the degree to which we are justified in believing $H_1$ over $H_3$ and the degree to which we are justified in believing $H_2$ over $H_4$ are the same. By C5, it follows that they are not the same.
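The equivalences rest only on $r'=1/r$ and the fact that taking reciprocals reverses order; for example, for $H_1$ and $H_2$:

```latex
1 < r \leq 2
\iff \tfrac{1}{2} \leq \tfrac{1}{r} < 1
\iff \tfrac{1}{2} \leq r' < 1 .
```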

**Therefore, at least one of C1-C5 is false.**

### Discussion

In my previous post, I showed that for any rule of the form “believe hypothesis $H$ to degree $k$ upon acquiring evidence $E$ if the likelihood ratio $\Pr(E|H)/\Pr(E|\sim H)$ exceeds $a$,” where $a>1/2$, there is a hypothesis $H$ and a possible piece of evidence $E$ such that this rule would lead one to believe both $H$ and its negation to degree $k$ upon acquiring $E$.

A comment from Michael Lew led me to realize that my proof of this result does not work if $H$ is required to be a statistical hypothesis with respect to $E$ (that is, a hypothesized probability distribution for the process that produced $E$). Thus, the result leaves open the possibility that some rule for fixing degrees of belief based on likelihood functions alone (presumably with some kind of prior ignorance or total evidence requirement) would provide useful guidance.

I take the theorem demonstrated here to speak against this possibility by showing that **a likelihoodist must reject C2.** A likelihoodist who rejected C1 would no longer be a likelihoodist because C1 is just the Law of Likelihood. I have argued for C1 here and here. C3-C5 seem innocuous. C3 is a qualitative superadditivity assumption. It would hold in both probabilistic theories and important alternatives to probabilistic theories such as Dempster-Shafer theory, if those theories were adapted in natural ways to the case of relative degrees of belief. C4 essentially says that justification for belief is invariant under substitution of logical equivalents. Abiding by C4 would require a kind of logical omniscience, but that fact is no objection to C4 as an ideal standard of rationality. C5 follows from the obvious structural assumption that the degrees to which one could be justified in believing $H_1$ over $H_2$ or vice versa lie along a one-dimensional continuum, passing monotonically from arbitrarily high justification for $H_1$ over $H_2$, to a neutral point, to arbitrarily high justification for $H_2$ over $H_1$.

**It is hard to see how likelihoodism could stand on its own without C2.** C2 seems to be the only *prima facie* plausible way to connect the notion of evidential favoring to that of justified belief without appealing to prior degrees of belief or the frequentist properties of one’s procedure. Likelihoodism simply does not work as a theory of inference or decision, even for statistical hypotheses and even when one abides by the Principle of Total Evidence.

**Question:** How could a likelihoodist respond to this argument?


1. C1–C5 are implicitly quantified over all $E$, $H_1$, etc. for which the relevant quantities are well defined.

Branden Fitelson says

Another nice post, Greg!

Likelihoodists (e.g., Royall, Sober) generally (explicitly) reject your C2. They maintain only that the likelihood ratio is a measure of some sort of comparative or relational confirmation (i.e., a way to gauge the degree to which E — evidence generated by a particular sort of statistical experiment and which is not one’s total evidence — constitutes better evidence for H1 than H2). They offer no story about how to connect these “degrees of favoring” with anything like degrees of belief. Perhaps that is the upshot of your current complaint?

Branden Fitelson says

In defense of Likelihoodists (and as you know I am not one of them), they are looking for a principled way to guide the interpretation of particular experimental findings — which, to be fair, are the sorts of things that are generally reported in particular scientific papers. Scientists generally design and implement particular experiments with particular statistical designs, etc. And, these yield particular evidential outcomes. In general, these are not the sorts of things that reasonably sanction recommendations about “relative degrees of belief”, since those generally depend on the totality of all scientific evidence relevant to a comparison of hypotheses. And, generally, no author of a particular scientific paper is in a position to reasonably offer such advice. This, it seems to me, is the proper context in which to situate the Likelihoodist and their project (which is indeed a rather modest one, but that — in itself — doesn’t seem like such a fair criticism of them).

Greg Gandenberger says

Yes, good. Royall and Sober say quite clearly that likelihoodism addresses only questions about evidence and not questions about belief or action. I’m asking what good such a measure is without a Bayesian or frequentist bridge to connect it to belief or action.

A tempting response (which neither Royall nor Sober gives) is that you can at least sometimes simply identify relative belief with relative evidence. The point of the present post is that this approach won’t work even in the best-case scenario in which one takes into account all of the available evidence. Sober does say that likelihoodists accept the Principle of Total Evidence (2008, p. 41), although you’re right that it’s typically applied to particular experimental findings.

Branden Fitelson says

Thanks, Greg. That’s helpful. In a sense, this result shows that one *must* interpret Likelihoodism (or perhaps any notion of relative incremental confirmation, for that matter?) as being intrinsically “local” or “component” in nature. Some very substantive machinery will be required to form a bridge between these “component evidential forces” (if you will) and the “resultant epistemic force of the total evidence”. Neat stuff!

Greg Gandenberger says

The component forces/resultant forces analogy is potentially misleading because it is at least sometimes possible to combine likelihood ratios from multiple pieces of evidence to get a likelihoodist characterization of the evidential import of the total evidence. When the pieces of evidence are independent, for instance, one can simply multiply the likelihood ratios.
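For instance (my own illustration, with made-up likelihoods): if $E_1$ and $E_2$ are independent given each hypothesis, the likelihood ratio of the conjoined evidence is the product of the individual likelihood ratios:

```python
# Illustration (mine, with made-up likelihoods): when the pieces of evidence
# are independent given each hypothesis, likelihood ratios multiply.

pr_e1 = {"H1": 0.8, "H2": 0.4}   # hypothetical Pr(E1 | H)
pr_e2 = {"H1": 0.3, "H2": 0.6}   # hypothetical Pr(E2 | H)

lr1 = pr_e1["H1"] / pr_e1["H2"]  # likelihood ratio from E1 alone
lr2 = pr_e2["H1"] / pr_e2["H2"]  # likelihood ratio from E2 alone

# Under conditional independence, Pr(E1 & E2 | H) = Pr(E1 | H) * Pr(E2 | H),
# so the likelihood ratio of the combined evidence factors:
lr_total = (pr_e1["H1"] * pr_e2["H1"]) / (pr_e1["H2"] * pr_e2["H2"])

assert abs(lr_total - lr1 * lr2) < 1e-12
```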

To extend the analogy, a Bayesian would say that the problem is that a correct characterization of the resultant force (the import of the total evidence) only tells us about the acceleration (warranted change in belief). What we really want to know is the final velocity (warranted posterior belief), but we can’t get that without an initial velocity (warranted prior belief).

That’s not the argument I give here, though. I don’t think there is a physics analogy for that argument. It would have to involve forces and velocities transforming differently across reference frames, or something like that.

One could say that acceleration (change in belief) is interesting in itself, or that it might have uses that we cannot foresee. Maybe. I grant that measures of confirmation/favoring could have uses in psychology, the history of science, and science journalism. What I doubt is that reporting a likelihood function provides information that is useful for evaluating rival hypotheses in a way that is not parasitic on some kind of Bayesian or frequentist argument.

Branden Fitelson says

Yes, of course, I’m inclined to agree, since I prefer a unified Bayesian approach where confirmation relations are measured via Bayes factors, which integrate in the usual way with priors to yield posteriors; thus, providing a unified approach to both confirmation as increase in firmness and confirmation as firmness (and “favoring” as well, although for me that is a derivative concept).
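The integration Fitelson mentions is just the odds form of Bayes' theorem; a minimal sketch with hypothetical numbers:

```python
# Minimal sketch (hypothetical numbers) of how a Bayes factor combines with
# prior odds: posterior odds = prior odds * Bayes factor.

prior_odds = 0.25        # assumed Pr(H1)/Pr(H2) before seeing E
bayes_factor = 8.0       # assumed Pr(E|H1)/Pr(E|H2)

posterior_odds = prior_odds * bayes_factor

# If H1 and H2 are also mutually exclusive and exhaustive, the posterior
# odds convert to a posterior probability for H1:
posterior_prob_h1 = posterior_odds / (1 + posterior_odds)

assert abs(posterior_odds - 2.0) < 1e-12
assert abs(posterior_prob_h1 - 2/3) < 1e-12
```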

Michael Lew says

Greg, perhaps you should be looking at the hypothesis scale for estimation of parameter values rather than making interval hypotheses. (As always, plot a likelihood function and see if your treatment makes sense.)

The problem goes away entirely if you deal with only point hypotheses.

Michael Lew says

A likelihoodist would respond to the argument by pointing out that the data support all values of r between 1/2 and 2 equally, and all values of r’ between 2 and 1/2 equally. There is no problem or contradiction in that.

Greg Gandenberger says

Likelihoodism claims to be able to address both interval hypotheses and point hypotheses as long as the likelihoods are well defined, as they are in this case.

I agree with the likelihoodist characterization of the evidential import of the data (supporting all values of $r$ and all values of $r'$ equally). But what are we to do with that characterization, if not (at least an informal version of) Bayesian updating or frequentist testing? As far as I can tell, likelihoodists have no adequate answer to that question. The failure of C2 rules out one option that has some *prima facie* plausibility.

Isabela says

Michael Lew: OK, I’ve read your paper and, I think, understood much of it. I wonder if the problem might vanish if one were to insist that p-values be calculated only assuming a fixed sample size. The other calculations, as my previous comment suggests, are specified for the purposes of fixing ‘pre-data’ error rates (what I consider to be ‘global’ error rates) rather than for the purposes of inference. Neyman and Pearson (1933) were quite explicit in denying the notion that experiments could support inductive inference, so I think that attempting to apply inferential principles to p-values that have been adjusted to conform to their system is probably unhelpful. If the results in example 4 are determined as for a fixed sample size in both cases, then the process that you call ‘Birnbaumization’ would not lead to any contradiction. (It would be silly to apply Birnbaumization to two results that were identical, but that is a different issue.)

Claims:

1. Error rates within the Neyman-Pearson error-decision framework exist in a different logical space from the likelihood principle et al., which apply to considerations of experimental evidence, and so any claims of compliance or non-compliance are valueless.
2. This critique of Birnbaum’s argument relies, at least in part, on the non-compliance with the conditionality principle of results calculated according to the Neyman-Pearson framework.

I cannot say that I understand your paper fully and I have only read it once, so my claims may be wrong. In particular, I’m assuming that your phrases ‘the sampling theorist’ and ‘sampling theory statistics’ denote a frequentist system that includes the Neyman-Pearson error-decision framework. However, even if I have made a mistake, be it minor or a howler, I’m fairly sure that there is something worth discussing in my claims.

Michael Lew says

I’m not sure that likelihoodist methods do allow consideration of intervals, and I suspect that they should not. Do you have a source for your first assertion?

The reason that I think that they should not allow consideration of interval hypotheses is the fact that P-values for two-sided interval nulls can be shown to be incoherent [whatever that means], whereas P-values for point nulls and P-values from one-sided significance tests are entirely coherent (Schervish 1996). Given that P-values index likelihood functions, it seems plausible to me that likelihood-based inference is also affected by interval hypotheses.

(As a counter-argument to my own ideas, the conventional approach to dealing with continuous measures with finite precision is to treat them as intervals.)

Schervish, M. J. (1996). “P-values: What they are and what they are not,” *The American Statistician*, 50, 203-206.

Greg Gandenberger says

Hypotheses of the form $a< r\leq b$ and $a\leq r'< b$ imply definite probabilities for $d=1$ ($1/2$ for any $a$ and $b$), so they seem to qualify as statistical hypotheses with respect to $d=1$ on any account of what it is to be a statistical hypothesis. Moreover, the interval vs. point distinction is representation-dependent in a way that the applicability of the Law of Likelihood is not. (You could simply replace $r$ with a variable that takes three values according to whether $1/2<r\leq 1$, $1<r\leq 3/2$, or $3/2<r\leq 2$.) I'll get back to you about the Schervish point after I've had a chance to look at his paper again with this issue in mind.