A draft of my paper “Why I Am Not a Methodological Likelihoodist” is here.

Abstract. Methodological likelihoodism says that it is possible to provide an adequate self-contained methodology for science on the basis of likelihood functions alone. I am not a methodological likelihoodist because I contend that an adequate self-contained methodology for science must provide good norms of commitment vis-à-vis hypotheses and that such norms cannot be given in terms of likelihood functions alone. Likelihoodists typically grant that likelihood functions are not sufficient for good norms of commitment, but they provide no adequate account of what we are to do with likelihood functions if not to use them to inform our commitments. They could instead attempt to formulate norms of commitment on the basis of likelihood functions alone, but I argue that no norm of that kind is acceptable. Alternatively, likelihoodists could maintain that likelihoodism is the appropriate methodology for science despite not being self-contained because it goes as far toward guiding commitments as norms of scientific objectivity allow. Likelihoodists who take this line owe us an account of how the likelihood function of a reported datum is to be used that vindicates the claim that reporting likelihood functions is a useful alternative or addition to reporting the outcomes of Bayesian and frequentist procedures. No such account is currently available. The common idea that scientists should report likelihood functions for the recipients of their reports to plug into Bayes’s theorem is problematic in principle as well as in practice.

To **share your thoughts about this post**, comment below or send me an email at greg@gandenberger.org. To use $\LaTeX$ in comments, surround mathematical expressions with single dollar signs for inline mode or double dollar signs for display mode.

mayod says

You say you deny Cox’s claim that Armitage’s optional stopping example is a counterexample to the LP, but what he argues is simply that it contradicts the weak repeated sampling principle, i.e., the requirement that a method should not be wrong with maximally high probability. Royall, with whom I discussed this at length, has no answer to it, but it is one of the big reasons likelihoods alone fail as an account of evidence. (As Armitage shows, the same thing happens for Bayesians.) The second reason is that likelihood measures are radically incomparable. Finally, likelihoods alone do not allow self-correcting model testing as the error statistical account does. You don’t mention this. The error statistical approach is not vindicated by its long-run properties but by its being capable of evaluating how well tested statistical hypotheses are, while self-correcting the underlying model.

“Why you cannot be just a little bit Bayesian” http://www.phil.vt.edu/dmayo/personal_website/EGEKChap10.pdf

It’s also discussed quite a bit on my blog: errorstatistics.com

Greg Gandenberger says

I discuss the Armitage example more fully in Section 4 of this paper. It serves a very specific purpose in the paper given here.

Cox says that the Armitage example “is enough to refute the strong likelihood principle” in this book, p. 66.

What’s the reference?

Michael Lew says

I’ve played around a little with Armitage’s optional stopping example (as described in Greg’s footnote 10) and it seems to me that it is vastly overblown in standard reports. The fact that the routine will stop with finite n is not very interesting as soon as the finite n becomes large. How large is large? Well, if you set k to 2 (a very low strength of evidence) then it seems that you would need about 10^10 observations from a normal population to be nearly certain of stopping. With k set to 3 you would need about 10^100.

Those are numbers large enough to be infinity for practical purposes. Saying that the “sampling stops after finite n” is technically true but misleading in that the finite n value is impossibly large for the real world. Thus while it is possible to devise sampling schemes that maximise the probability of the evidence being misleading, there seems to be a _practical_ upper bound on either how strongly misleading the evidence can be or the probability that any particular result will be misleading.
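For readers who want to probe numbers like these themselves, here is a minimal simulation sketch of a simplified Armitage-style rule: sample from N(0, 1) with the null hypothesis true, and stop once the likelihood ratio of the best-supported alternative (mu equal to the sample mean) over the null reaches k; that ratio works out to exp(n·x̄²/2). The function name, parameters, and cap on n are my own choices for illustration, not anything from the paper or the comments.

```python
import math
import random

def stopping_fractions(ks, n_max, trials, seed=0):
    """Estimate, for each k in ks, the fraction of runs in which the
    stopping rule terminates within n_max draws even though the null
    hypothesis (mu = 0) is true.

    Rule: draw x_i ~ N(0, 1); stop once the likelihood ratio of the
    best-supported alternative (mu = sample mean) over mu = 0 reaches k.
    That ratio is exp(n * xbar^2 / 2), so we stop as soon as
    total^2 / (2n) >= ln(k), where total is the running sum.
    """
    rng = random.Random(seed)
    thresholds = {k: math.log(k) for k in ks}
    counts = {k: 0 for k in ks}
    for _ in range(trials):
        total = 0.0
        remaining = set(ks)  # all k values share the same sample path
        for n in range(1, n_max + 1):
            total += rng.gauss(0.0, 1.0)
            stat = total * total / (2.0 * n)  # equals n * xbar^2 / 2
            for k in [k for k in remaining if stat >= thresholds[k]]:
                counts[k] += 1
                remaining.discard(k)
            if not remaining:
                break
    return {k: counts[k] / trials for k in ks}

# Because every k is checked against the same sample paths, a larger k
# (stronger required "evidence") can only make stopping slower, so the
# estimated fractions are non-increasing in k.
print(stopping_fractions((2, 8, 32), n_max=2000, trials=200))
```

Raising `n_max` shows how slowly the stopping fraction creeps toward 1 as k grows, which is the practical point at issue.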

mayod says

You’re missing the point. The problem is that no difference shows up for any number whereas, intuitively (at least for an error statistician), it ought to. The exact same problem occurs without sequential trials but with data-dependent alternatives and numerous other gambits that fraudbusters are keen to rein in. I discuss this at length in many places. For convenience, search my blog, if interested.

Greg Gandenberger says

Good point. I’m not surprised that the expected value of $N$ is huge for moderate $k$, but I hadn’t done the calculations. That point might warrant a footnote in my counterexamples paper (with credit to you, of course). It would warrant more than that if I were writing primarily for statisticians, but philosophers (for better or for worse) are typically not satisfied with in-practice solutions to in-principle problems.

Michael Lew says

Mayo: On the contrary: you’re missing my point. (Or maybe you’re avoiding it.) The allegation against likelihoodism at issue here must be the inflated risk of a false inference when unconventional sampling schemes are used. If there is no increase in errors then an error statistician would not be so bothered. The inflation is conventionally presented as being extreme (probability 1 of a false positive result with a probability 1 finite n), but simple simulations suggest that the probability of a false positive result is far less than 1 for practical purposes because the ‘finite n’ might as well be infinite.

It is also notable that the rate of false negative results falls towards zero for any reasonable true effect size much, much faster than the false positive rate climbs towards 1. Thus if I were inclined to interpret the repeated sampling principle in terms of total errors (I am), then compliance with that principle does not require monomaniacal attention to false positive errors. Couple that with the fact, usually not mentioned, that the effect sizes of the experiments that end with a false positive quite quickly decline towards zero as the sample size increases. A simple report of the observed effect would mean that a false positive ‘significant’ outcome would be seen to be unimportant.
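The last point can be made quantitative under one simplified reading of the stopping rule (stop once the likelihood ratio of the best-supported alternative over the null reaches $k$; this is my gloss, not necessarily Lew's exact setup). Stopping at sample size $n$ means $n\bar{x}^2/2 = \ln k$, so the observed effect at stopping is

$$|\bar{x}| = \sqrt{\frac{2\ln k}{n}},$$

which shrinks like $1/\sqrt{n}$. For instance, with $k = 8$ and $n = 10^6$, the observed effect is $|\bar{x}| \approx 0.002$.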

For those reasons I conclude that the real-world consequences of even undeclared ‘uncorrected’ sequential sampling schemes are quite minimal.

mayod says

Well, your first paragraph makes sense. I get it. So long as the impact on error probabilities isn’t too bad in the case at hand, we needn’t worry. So all we have to do is check in each case where a selection effect, stopping rule, violated assumption, etc. could adversely affect error probabilities to see if it’s really so bad. The worry that likelihoods alone fail to pick up on features that adversely influence error probabilities is well taken care of because we separately check whether error probabilities are altered (assuming, of course, we can tell how bad the impact is, which itself depends on satisfying probabilistic assumptions). We supplement our likelihood account with a rather different one—one that is at odds with it—and all is well.