### Theorem

Assume that if $H_1$ and $H_2$ are logically equivalent given one’s evidence, then one believes them to the same degree.

Theorem: For any rule of the form “believe hypothesis $H$ to degree $k$ upon acquiring evidence $E$ if the likelihood ratio $\Pr(E|H)/\Pr(E|\sim H)$ exceeds $a$” where $a>1/2$, there is a hypothesis $H$ and a possible piece of evidence $E$ such that this rule would lead one to believe both $H$ and its negation to degree $k$ upon acquiring $E$.

This result implies as a special case that any likelihood ratio cutoff for full belief can lead one to accept fully both a hypothesis and its negation, assuming that full acceptance is a degree of belief (e.g. $k=1$ on the assumption that degrees of belief are probabilities).

### Proof

Fix $k$ and $a$. Let $X_1$ and $X_2$ be Bernoulli random variables. Let $E$ be the evidence $X_1=1$, $H_1$ the hypothesis $X_1=X_2=1$, and $H_2$ the hypothesis $X_2=1$. There is a joint distribution for $X_1$ and $X_2$ such that the likelihood ratio of $H_1$ against $\sim H_1$ on $E$ and the likelihood ratio of $\sim H_2$ against $H_2$ on $E$ both exceed $a$. Thus, the rule under consideration would lead one to believe both $H_1$ and $\sim H_2$ to degree $k$. Yet given $E$, $\sim H_1$ and $\sim H_2$ are equivalent. Thus, one believes them to the same degree. Therefore, **the rule under consideration would lead one to believe both $H_1$ and $\sim H_1$ to degree $k$.**

The following joint distribution does the trick:

|             | $X_2=0$   | $X_2=1$   |
|-------------|-----------|-----------|
| $X_1=0$     | $(1-b)/4$ | $b$       |
| $X_1=1$     | $(1-b)/4$ | $(1-b)/2$ |

This distribution can be constructed as follows:

- Assign some fraction $0<b<1$ of the total probability mass (1) to $X_1=0,X_2=1$.
- Assign half of the remaining probability mass to $X_1=X_2=1$.
- Split the remaining probability mass evenly between $X_1=X_2=0$ and $X_1=1,X_2=0$.

The likelihood ratios are $2(1+b)/(1-b)$ for $H_1$ against $\sim H_1$ and $\frac{1}{2}(1+b)/(1-b)$ for $\sim H_2$ against $H_2$. Both ratios can be made greater than $a$ by choosing $b>\frac{2a-1}{2a+1}$. (Note that this bound yields a value of $b$ strictly between 0 and 1 for any $a>1/2$.)
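The construction is easy to check numerically. Here is a short Python sketch (my own illustration, not part of the original argument; the function names are mine) that builds the joint distribution for a given $b$ using exact rational arithmetic and verifies that both likelihood ratios exceed a chosen cutoff $a$:

```python
from fractions import Fraction

def joint(b):
    """Joint distribution over (X1, X2) from the construction above:
    mass b on (0, 1), half the remainder on (1, 1), and the rest
    split evenly between (0, 0) and (1, 0)."""
    return {(0, 1): b,
            (1, 1): (1 - b) / 2,
            (0, 0): (1 - b) / 4,
            (1, 0): (1 - b) / 4}

def likelihood_ratios(b):
    """Return (LR of H1 against ~H1 on E, LR of ~H2 against H2 on E),
    where E is X1 = 1, H1 is X1 = X2 = 1, and H2 is X2 = 1."""
    p = joint(b)
    pr_E_given_H1 = Fraction(1)                       # H1 entails E
    pr_E_given_not_H1 = p[(1, 0)] / (1 - p[(1, 1)])
    pr_E_given_H2 = p[(1, 1)] / (p[(1, 1)] + p[(0, 1)])
    pr_E_given_not_H2 = p[(1, 0)] / (p[(1, 0)] + p[(0, 0)])
    return (pr_E_given_H1 / pr_E_given_not_H1,
            pr_E_given_not_H2 / pr_E_given_H2)

a = Fraction(10)                                  # any cutoff a > 1/2 works
b = (2 * a - 1) / (2 * a + 1) + Fraction(1, 100)  # just above the bound
r1, r2 = likelihood_ratios(b)
assert r1 > a and r2 > a                          # the rule fires for both
assert r1 == 2 * (1 + b) / (1 - b)                # matches the closed forms
assert r2 == Fraction(1, 2) * (1 + b) / (1 - b)
```

Because the arithmetic is exact, the assertions confirm both the closed-form ratios and the claim that any $b$ above the bound makes both ratios exceed $a$.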

### Discussion

One might think that it is a proper constraint on measures of evidential favoring that no rule of the form “believe hypothesis $H$ to degree $k$ upon acquiring evidence $E$ if the degree to which $E$ evidentially favors $H$ over its negation exceeds $a$” would, for any $E$ and $H$, lead one to assign a high degree of belief to both $H$ and its negation. In the presence of this constraint, the theorem presented here implies that the Law of Likelihood is false.

To my mind, this constraint seems much less compelling than the axiom sets that entail the Likelihood Principle and the arguments from the Likelihood Principle to the Law of Likelihood (see Section 2 of the linked document). Thus, **I take the result shown here to be a point against the idea that the Law of Likelihood can provide useful guidance for belief in the absence of prior probabilities and frequentist test procedures, but not against the idea that the Law of Likelihood is a good explication of the informal notion of (incremental) evidential favoring.**

**Restricting the Law of Likelihood to mutually exclusive hypotheses (as I do) does not help**, even though $H_1$ and $H_2$ are not mutually exclusive, because the Law is applied not to the comparison between $H_1$ and $H_2$ but to the comparison between each of those hypotheses and its own negation.

Note that **there is no problem for Bayesianism here**: the posterior probabilities of $H_1$ and $H_2$ given $E$ are equal. (They are both $2/3$.) $H_1$ is the conjunction of $H_2$ and $E$, so $H_1$ starts with a lower prior probability than $H_2$ ($(1-b)/2$ versus $(1+b)/2$) but gets more of a “boost” from $E$. It is these boosts that the likelihood ratios measure.

It is obvious that no rule of the form considered here would satisfy a Bayesian, because such a rule can lead one to accept $H$ in cases in which its posterior probability is as small as one likes. (Just make the prior probability sufficiently small.) The theorem given here is a useful supplement to this Bayesian argument against such rules because it makes no reference to subjective prior probabilities.
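This point amounts to a two-line calculation, sketched below in Python (my own illustration): posterior odds are prior odds times the likelihood ratio, so holding the likelihood ratio fixed well above any cutoff while shrinking the prior drives the posterior toward zero.

```python
from fractions import Fraction

def posterior(prior, lr):
    """Pr(H | E) from the prior Pr(H) and the likelihood ratio
    Pr(E | H) / Pr(E | ~H), via posterior odds = lr * prior odds."""
    odds = lr * prior / (1 - prior)
    return odds / (1 + odds)

lr = Fraction(100)  # comfortably above any fixed cutoff a
# The rule fires in every case below, yet the posterior can be made tiny.
for prior in (Fraction(1, 10), Fraction(1, 10**4), Fraction(1, 10**8)):
    print(float(posterior(prior, lr)))
```

With a prior of $10^{-8}$, the posterior is on the order of $10^{-6}$ even though the likelihood ratio is 100.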

**Restricting the rule so that it only applies when one is completely ignorant of the probability of $H$ prior to acquiring $E$ would not help**, assuming that one can be completely ignorant of both the probability of $H_1$ and the probability of $H_2$ prior to acquiring $E$ in a case of the kind constructed in the proof above.

An obvious response to this theorem is that of course likelihood ratios measure the *change* in relative degree of belief that the evidence warrants rather than the *posterior* relative degree of belief that it warrants. That is exactly my point. Given that point, **it is hard to see how a pure likelihoodist methodology could be a viable alternative to Bayesian and frequentist methodologies**. How are we to use likelihood ratios to guide our beliefs or actions without reference to prior probabilities or the pre-experimental operating characteristics of our procedures? And of what use is a method that we cannot use to guide our beliefs or actions?

Seidenfeld (1985, 265-6) uses a similar approach to show that a likelihood ratio cutoff of 19/9 for full belief can lead to inconsistencies.

To **share your thoughts about this post**, comment below or send me an email. To use $\LaTeX$ in comments, surround mathematical expressions with single dollar signs for inline mode or double dollar signs for display mode.

Michael Lew says

This problem seems to me to be entirely a consequence of comparing hypotheses for which the likelihoods have different meanings. If H1 and H2 are parameter points along the axis of a likelihood function, then the worst that can happen is that they are located on either side of the most likely parameter value and have equal likelihoods. So what?

If your H1 is a point or a restricted range of parameter values, then ~H1 is a composite hypothesis, i.e., a hypothesis for which the likelihood is integrated over a larger range of parameter values. Why would anyone want to compare them?

If your H1 is a complicated hypothesis like “the sun revolves around the earth” then ~H1 is an ill-defined hypothesis which will in many places have different dimensions than H1 and thus a ratio of their likelihoods has no sensible meaning.

Your problem may point to natural restrictions on the scope of the law of likelihood, but it doesn’t invalidate it.

Greg Gandenberger says

Thanks for the comment, Michael!

How are the scenarios you’re considering relevant to the example I gave? I tell you what $H_1$ and $H_2$ are in terms of a pair of Bernoulli random variables. They’re neither parameter points nor complicated hypotheses like “the sun revolves around the earth.” $\sim H_1$ and $\sim H_2$ seem perfectly well-defined and have definite prior probabilities and likelihoods.

Maybe your idea is that $H_1$ and $H_2$ are inappropriate because they do not posit parameter points. Put another way, they are hypotheses about the *outcome* of a random process rather than about the (stochastic properties of the) process itself. Perhaps the Law of Likelihood should be restricted to what Hacking calls “hypothes[es] about the distribution of outcomes from trials of kind $K$ on some set-up $X$” (1965, 27). How does that restriction relate to what you have in mind?

That restriction would save me some headaches, but I’ve yet to be convinced that it’s necessary. Again, I don’t think the example given here is a counterexample to the Law of Likelihood.

I am interested in the idea that perhaps likelihoodist methods are viable alternatives to Bayesian and frequentist methods when applied to hypotheses about the distribution of outcomes from trials of kind $K$ on some set-up $X$. (Some kind of prior ignorance is obviously needed as well.) The argument given here doesn’t rule out that possibility, and I don’t think it can be tweaked a little bit to rule out that possibility either. The example can’t be adapted for a case in which $H_1$ and $H_2$ are hypotheses about the probability distribution from which the data come because the argument requires $H_1$ and $H_2$ to be logically equivalent given the data.

Branden Fitelson says

Nice post, Greg! I basically agree with what you say here, but I guess I wouldn’t have been inclined to attribute anything over and above a “favoring” explication to Likelihoodists in the first place.

Having said that, I think it’s also worth mentioning here that your spin on these sorts of examples is a bit different than Teddy’s. He seems to think these examples show that “confirmation as increase in firmness” (as Carnap would have called it) is in some sense “useless” (he wonders at the end of the passage you cite “what one can do” with this concept). On the contrary, I think these examples show how useful the concept is. It allows us to say that E is positively relevant to one of the hypotheses and negatively relevant to the other (which, intuitively, is what’s going on in such examples, right?). And, it seems to me, such assessments of evidential relevance can be useful in epistemology.

More generally, I would favor a more pluralistic attitude (a la Carnap). There are many useful epistemic concepts out there (e.g., evidential relevance, rational degree of confidence, etc.), and various formal explications of them can be given. They are not “in competition” in any interesting sense that I can see.

Greg Gandenberger says

Thanks for commenting, Branden!

I’m inclined to agree with Teddy. I agree that the claim that $E$ is positively relevant to $H_1$ and negatively relevant to $H_2$ is intuitively right. But I don’t see how such a characterization of data as evidence is useful except insofar as it can be used as an input to some kind of (possibly informal) inference or decision procedure.

Branden Fitelson says

Thanks for the reply, Greg. I’m not sure what you mean here. Surely, relations of evidential relevance can be illuminating and explanatory in various ways.

For instance, in the context of the “base rate fallacy” or the “conjunction fallacy”, these notions can be useful for explaining why people tend to make errors in probability judgment. That is, when relevance relations and total probability relations radically diverge, this (it seems to me) can often lead people to make mistakes in probabilistic reasoning. See, for instance, the following paper which is a nice survey of this sort of approach to the conjunction fallacy.

http://www.ncbi.nlm.nih.gov/pubmed/22823498

Similarly, the concept of evidential relevance (and the more general concept of expected confirmational power) can be illuminating in explaining what might be going on in the Wason selection task. See, for instance:

http://fitelson.org/wason.pdf

Surely, these are useful, psychological-explanatory applications of the concept, no? And, of course, other historical examples of useful applications are well-known (e.g., Good and Turing’s use of the “weight of evidence” in the code-breaking activities at Bletchley Park). Not to mention the various philosophical explanations in the literature on Bayesian confirmation theory, etc. So I’m not sure I understand exactly what you’re skeptical about here.

Greg Gandenberger says

I’m willing to believe that likelihoodist explications of evidential notions can be useful for describing people’s behavior in certain psychological experiments and that this fact has important implications for how those experiments should be interpreted.

My focus is on normative uses. Are there justified norms for belief or action that take only likelihood functions as inputs? If not, then of what use are likelihoodist explications for the methodology of science (other than as preliminary steps in or approximations to methods that use other information)?

I take it that Good and Turing were using (log-)likelihood ratios as a kind of shorthand for Bayesian updating with a flat prior. Good essentially says as much on p. 252 here.

Branden Fitelson says

Here’s a PDF of the conjunction fallacy paper I mentioned above:

http://www.vincenzocrupi.com/website/wp-content/uploads/2013/02/TentoriCrupiRusso2013.pdf

Branden Fitelson says

In the Wason paper, we discuss various normative evaluations of agents who exhibit the typical Wason responses. These (negative) normative evaluations seem (to us) to be illuminating, and they make essential use of the notion of expected confirmational power. More generally, the notion of expected confirmational power can be useful in guiding our evidence-gathering strategies (or, our design of experiments), since it can explain why certain experiments will be more probative than others, vis-a-vis testing a given hypothesis (relative to a given context of background information and knowledge). So, yes, I think there are plenty of useful normative (at least, evaluative) applications of the concept of evidential relevance (or confirmational power). To require that there be “justified norms for belief or action that take only likelihood functions as inputs” seems to beg the question here — i.e., to set a bar for “usefulness” that perhaps cannot be met. But, I would say, this just means that this meta-epistemological requirement (viz., putative desideratum) is too demanding for (i.e., not appropriate to) the explicandum in question.

Branden Fitelson says

Moreover, in the conjunction fallacy literature (as in the Wason task literature), normative (evaluative) considerations are quite important. So, it’s unfair to characterize that genre as “purely descriptive”. This crucial descriptive/evaluative interface is explicitly discussed in the papers I cited.

Greg Gandenberger says

This is a useful discussion. Thanks! I’ll read the relevant papers and get back to you.

Greg Gandenberger says

I agree that the Law of Likelihood is useful for two kinds of endeavors that have a normative dimension:

Thus, my appeal to the normative/descriptive distinction was not helpful.

What I want to say is that the Law of Likelihood does not provide a useful alternative to Bayesian and frequentist methods for evaluating hypotheses. There are two parts to this argument:

This post is part of my case for 2.

Does that help?

Branden Fitelson says

Thanks, Greg. I would really like the opportunity to do a hangout/skype sometime to chat about all these things. Your project is super-interesting, and it would be great to discuss it with you!

Greg Gandenberger says

That would be great. I’ll send you an email about finding a good time.

Michael Lew says

Greg

You suggested “Perhaps the Law of Likelihood should be restricted to what Hacking calls “hypothes[es] about the distribution of outcomes from trials of kind K on some set-up X” (1965, 27)” and asked if that was what I have in mind. Yes, it turns out that such a restriction does match what I am thinking.

That restriction follows directly from Fisher’s 1922 definition of likelihood: “The likelihood that any parameter (or set of parameters) should have any assigned value (or set of values) is proportional to the probability that if this were so, the totality of the observations should be that observed.” (As long as “set of parameters” is read as meaning a vector of parameters rather than a range of parameter values, which is how I understand his meaning.)

The restriction also solves your problem here because changing the scaling of the parameter scale to the reciprocal is a change in the “set up”.

Greg Gandenberger says

I agree that you can avoid the problem given in this post by restricting the Law of Likelihood to statistical hypotheses in Hacking’s sense. I also agree that Fisher seems to have had statistical hypotheses in that sense in mind.

I don’t agree that you can avoid the problem presented in the next post in the same way. It doesn’t matter whether using $r$ and using $r’$ yield the same or different setups. All I need is that e.g. $1\leq r’ < 2$ and $1/2 < r \leq 1$ are each statistical hypotheses with respect to $d=1$.