This post is part of a series (introduced here) in which I present objections to possible responses to the claim that likelihoodism is not a viable alternative to Bayesian and frequentist methodologies because it does not address questions about what to believe or do. I am currently considering the response that likelihoodism is a viable alternative methodology because characterizing one’s data as evidence is valuable in itself. I addressed two arguments for this claim here and here.

A likelihood ratio is a ratio of posterior odds to prior odds under Bayesian updating. Likelihoodists agree with frequentists that prior probabilities that merely represent some individual’s degrees of belief are not appropriate for use in science. Unlike frequentists, they maintain that a likelihood ratio “means the same” as a measure of evidential favoring regardless of whether prior probabilities they regard as legitimate are available or not. But what is a measure of evidential favoring in the likelihoodist’s sense, if not a measure of the shift in one’s degrees of belief that the data warrant?
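The first sentence above can be spelled out with the odds form of Bayes' theorem. This is only a sketch of the standard identity, written with conditional-probability notation, which assumes the Bayesian setting in which priors over the hypotheses are available. The symbols *H*_{1}, *H*_{2}, and *E* follow the usage later in this post.

```latex
\underbrace{\frac{\Pr(H_1 \mid E)}{\Pr(H_2 \mid E)}}_{\text{posterior odds}}
  = \underbrace{\frac{\Pr(E \mid H_1)}{\Pr(E \mid H_2)}}_{\text{likelihood ratio}}
    \times
    \underbrace{\frac{\Pr(H_1)}{\Pr(H_2)}}_{\text{prior odds}}
\qquad\Longrightarrow\qquad
\frac{\Pr(E \mid H_1)}{\Pr(E \mid H_2)}
  = \frac{\Pr(H_1 \mid E)\,/\,\Pr(H_2 \mid E)}{\Pr(H_1)\,/\,\Pr(H_2)}
```

On this reading the likelihood ratio is the factor by which the data shift the odds, whatever the prior odds happen to be. The likelihoodist's difficulty is to say what the ratio measures when no prior odds are on the table.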

It seems that the likelihoodist claim that characterizing data as evidence is valuable in itself requires an answer to this question. However, no likelihoodist to my knowledge has attempted to provide one. Edwards (1972) and Royall (1997) each address it by arguing in effect that no answer is needed. I responded to Royall’s argument in my previous post. Unfortunately, I do not have access to my copy of Edwards’s book at the moment, so instead of responding to his argument I will consider an objection to the claim that a likelihood ratio “means the same” across different contexts that is due to Ian Hacking.

Hacking coined the phrase “Law of Likelihood” in his (1965), but he used it to refer only to the qualitative claim that *E* favors *H*_{1} over *H*_{2} if Pr(*E*;*H*_{1}) > Pr(*E*;*H*_{2}). He did not include the quantitative claim that the likelihood ratio Pr(*E*;*H*_{1})/Pr(*E*;*H*_{2}) is a measure of the degree to which *E* favors *H*_{1} over *H*_{2}. Edwards (1972) seems to have been the first to combine the qualitative and quantitative claims into a single principle, which he called the Likelihood Axiom (31). Today it is standard to follow Royall in using Hacking’s phrase “Law of Likelihood” for the conjunction of the qualitative and quantitative claims.

In a review of (Edwards 1972), Hacking expresses doubts about both the quantitative and the qualitative parts of the Law of Likelihood and argues specifically against the assumption of the quantitative part that a likelihood ratio “means the same” in different contexts (1972, 136). He starts this argument by saying that he “know[s] of no compelling argument” for this assumption. In this respect, a likelihood ratio is (at least *prima facie*)[1] different from a physical probability: if two independent, repeatable event types have the same physical probability, then they tend to occur equally often, roughly speaking. Of course, likelihood ratios are commensurable for a Bayesian, but what a likelihoodist needs is a kind of commensurability that does not depend on the availability of prior probabilities.

After saying that he sees no justification for assuming that likelihood ratios are commensurable across experiments, Hacking attempts to use a pair of examples to argue that in fact they are not commensurable. One of those examples he calls the “tank problem.” Suppose we capture an enemy tank at random and note its serial number. The serial numbers start at 0001. The tank we captured has serial number 2176. What would be the most reasonable estimate of the number of tanks the enemy made? According to the Law of Likelihood, the observation of serial number 2176 favors over all other possibilities the hypothesis that the total number of tanks is 2176.
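To see why the Law of Likelihood singles out 2176, here is a minimal sketch of the likelihood function for the tank problem. It assumes that serial numbers run consecutively from 1 and that the captured tank is drawn uniformly at random from all tanks made. The function name `tank_likelihood` and the upper bound of 5000 on the candidate totals are hypothetical choices for illustration only.

```python
from fractions import Fraction


def tank_likelihood(total_tanks: int, observed_serial: int) -> Fraction:
    """Pr(observed_serial; total_tanks) under uniform random capture.

    Assumes serials are 1..total_tanks with no gaps (an illustrative assumption).
    """
    if total_tanks < observed_serial:
        # A total smaller than the observed serial is ruled out by the data.
        return Fraction(0)
    # Each of the total_tanks serials is equally likely to be the one captured.
    return Fraction(1, total_tanks)


observed = 2176
# Hypothetical range of candidate totals; the upper bound is arbitrary.
candidates = range(1, 5001)

# The likelihood is 0 below the observed serial and strictly decreasing above it,
# so the maximum sits exactly at the observed serial.
best = max(candidates, key=lambda n: tank_likelihood(n, observed))
print(best)
```

Under these assumptions the likelihood is 1/*N* for every total *N* of at least 2176 and zero otherwise, so every larger total is less favored than 2176 and every smaller total is excluded outright.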

Hacking’s point is not merely that this result is counterintuitive. A likelihoodist can square it with intuition by taking into account the disutility of underestimating the enemy’s forces and the fact that 2176, though the most favored single estimate, is almost surely an underestimate. But Hacking asks us to compare this situation to one in which we are measuring, say, the width of a grating, using a technique whose errors have a normal distribution with known variance. We can find two hypotheses about the width that have the same likelihood ratio as, say, the hypotheses 2176 and 3000 from the tank problem. Hacking reports that he has “no inclination” to say that the relative support is the same in the two cases, even though the likelihood ratios are the same.

I see two ways to read this argument. On neither reading does it do anything more than add perhaps some vividness and drama to the basic challenge of the question I posed near the beginning of this post: what is a measure of evidential favoring in the likelihoodist’s sense, if not a measure of the shift in one’s degrees of belief that the data warrant?

On one reading of Hacking’s argument, it is supposed to be intuitively clear that the observation of a tank marked 2176 does not favor the hypothesis that there are 2176 total tanks over the hypothesis that there are 3000 total tanks to the same degree that a particular measurement supports one hypothesis about grating widths over another where the same likelihood ratio arises. I do not have this intuition.

I find that filling in some of the details of Hacking’s examples helps dispel the intuition he wants to evoke. The likelihood ratio of the hypothesis that there are 2176 total tanks to the hypothesis that there are 3000 total tanks is 3000/2176, or about 1.4, which any conventional likelihoodist would classify as extremely weak evidence. It would be quite a coincidence if the tank we captured was the last one manufactured, but it is no more improbable that we would capture the 2176^{th} tank out of 2176 than that we would capture the 2176^{th} out of 3000. In fact, the former is somewhat more probable, as the likelihood ratio faithfully reports. We must not allow the fact that the former outcome is more “striking” to lead us to think that it is less probable.

In the grating example, suppose that the known standard deviation were 1.0 cm and the observed measurement were 100 cm. Then the Law of Likelihood says that the tank observation favors the hypothesis that there are 2176 total tanks over the hypothesis that there are 3000 total tanks to the same degree that the width measurement favors the hypothesis that the width is 100 cm over the hypothesis that it is 99.2 cm. Given that the degree of evidential favoring is extremely weak in the first case, and 100 cm and 99.2 cm are separated by less than one standard deviation, I do not find this claim hard to accept.
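The arithmetic behind this comparison can be checked with a short sketch. It assumes uniform random capture in the tank problem and a normal measurement model with a spread of 1.0 cm read as the standard deviation, which is what the "less than one standard deviation" remark requires. The helper `normal_pdf` and the variable names are illustrative, not anything from Hacking or Royall.

```python
import math


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


# Tank problem: Pr(serial 2176; N total tanks) = 1/N under uniform capture.
tank_lr = (1 / 2176) / (1 / 3000)

# Grating problem: one measurement of 100 cm, assumed standard deviation 1.0 cm.
sd = 1.0
measurement = 100.0
grating_lr = normal_pdf(measurement, 100.0, sd) / normal_pdf(measurement, 99.2, sd)

# For a normal model with the first hypothesis at the observed value,
# LR = exp(gap**2 / (2 * sd**2)), so the gap that reproduces the tank ratio is:
matching_gap = sd * math.sqrt(2.0 * math.log(tank_lr))

print(f"tank LR      = {tank_lr:.3f}")
print(f"grating LR   = {grating_lr:.3f}")
print(f"matching gap = {matching_gap:.3f} cm")
```

Both ratios come out at roughly 1.38, and the gap that matches the tank ratio exactly is roughly 0.80 cm, which is why 99.2 cm serves as the comparison hypothesis.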

Perhaps Hacking’s point is merely that it would typically be unreasonable to use 2176 as an estimate of the number of enemy tanks, whereas it would not be unreasonable to use 100 cm as an estimate of the grating width. The same likelihood ratio does not “mean the same” in terms of its implications for our beliefs and actions. But if that is Hacking’s point, then he is subject to Royall’s response (discussed in my previous post) that doubts about the commensurability of likelihood ratios across contexts “come from failure to distinguish between the strength of the evidence, which is constant, and its implications, which vary according to the context of each application” (1997, 12).

It seems to me that the proper rejoinder for Hacking to give would be that likelihoodists have provided no clear non-Bayesian account of what it is that they take to be constant across applications with the same likelihood ratio. We are back to the same question with which we started: what is a measure of evidential favoring in the likelihoodist’s sense, if not a measure of the shift in one’s degrees of belief that the data warrant? Hacking’s examples fail to advance the dialectic. But the burden of proof remains on the likelihoodist.

**Citations**

Edwards, Anthony WF. *Likelihood*. Cambridge UP, 1984.

Hacking, Ian. *Logic of Statistical Inference*. Cambridge University Press, 1976.

Hacking, Ian. “Likelihood.” *The British Journal for the Philosophy of Science* 23.2 (1972): 132-137.

Hájek, Alan. ““Mises redux”—Redux: Fifteen arguments against finite frequentism.” *Erkenntnis* 45.2-3 (1996): 209-227.

Hájek, Alan. “Fifteen arguments against hypothetical frequentism.” *Erkenntnis* 70.2 (2009): 211-235.

Royall, Richard M. *Statistical Evidence: A Likelihood Paradigm*. Chapman & Hall/CRC, 1997.

[1] This claim about physical probabilities faces its share of challenges, of course (see e.g. Hájek 1996 and 2009), but let it stand; the point of invoking physical probabilities here is simply to evoke some intuitions about the sort of property that Hacking takes likelihood ratios to lack.

Michael Lew says

A proper response to Hacking’s tank example might be to say that if you want the best available estimator of the number of tanks it is provided by the maximum of the likelihood function, 2176, but if you want the best number to advise the generals about enemy tank numbers you may wish to use an additional criterion. That is not a weakness of the maximum likelihood estimator.

To put it another way, if you wanted to bet on a single number of tanks with a payoff for only exactly correct guesses, then 2176 is the best possible bet. That is what the likelihood function provides in this case. If the payoff relates to non-underestimation, or to being close enough, then 2176 may not be the best bet, but the law of likelihood does not address loss functions…

Greg Gandenberger says

I basically agree. I wouldn’t say that the maximum likelihood estimate is “best” without qualification, but it is best in the sense that it is evidentially favored over all other estimates according to the Law of Likelihood. As you point out, it does not follow that it is the best number to report because it fails to take utilities into account.

Michael Lew says

You are right, but also wrong. Right: utilities can affect what is the “best” estimate. Wrong when you say “evidentially favored over all other estimates according to the Law of Likelihood” because that makes it sound like there is an alternative whereby it is not favoured. There is not. The most likely single value for the total number of tanks is 2176. ANY other estimate can only be “better” on the basis of utilities rather than on the basis of evidence.

To imply that inference based on likelihoods is in some way faulty because it doesn’t take utilities into account is nonsensical because taking utilities into account can happen AFTER the evidence is quantitated by the likelihood function.

(There is a great deal of resistance to likelihood-based approaches, particularly among frequentists, that is not based on a fair evaluation of the real properties of likelihoods. Hacking’s discussion of the tank problem is an example where likelihood is denigrated for lacking a built-in utility function (although I can’t call Hacking a frequentist and I have no idea why he changed his mind on likelihood). You should look very critically at the alleged counter-examples to the likelihood principle before settling your thesis topic.)

Greg Gandenberger says

I don’t understand how saying that 2176 is “evidentially favored over all other estimates according to the Law of Likelihood” “makes it sound like there is an alternative whereby it is not favoured.” Could you clarify?

In general, I don’t see any conflict between what I wrote and what you wrote. I am not endorsing Hacking’s argument.

Michael Lew says

(No, we are mostly in agreement. I’m trying to redirect your attention to what I think are the details that matter. Feel free to ignore that redirection!)

I guess I don’t like the implication of the “according to the Law of Likelihood” bit of your first response. If the likelihood principle is true (and I am unconvinced by alleged disproofs) then the evidence points to the maximally likely estimate. Other estimators differ on the basis of other considerations, not on the basis of what the evidence is. If you like, they process the evidence differently, but they cannot yield different evidence.