Neyman and Pearson (e.g. 1933) treat the problem of choosing the best rejection region for a simple-vs.-simple hypothesis test as what computer scientists call a 0/1 knapsack problem. Standard examples of 0/1 knapsack problems are easier to grasp than hypothesis testing problems, so thinking about Neyman-Pearson test construction on analogy with those examples is helpful for developing intuitions. It is also illuminating to think about points of disanalogy between those scenarios and hypothesis testing scenarios, which give rise to possible objections to the Neyman-Pearson approach.
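To make the analogy concrete, here is a minimal sketch of my own (not taken from the post): treat each point of a finite sample space as an item whose "weight" is its probability under the null hypothesis and whose "value" is its probability under the alternative. A rejection region is then a knapsack whose total weight (the test's size) must stay within the significance-level budget α and whose total value is the test's power, and the Neyman-Pearson construction amounts to the greedy rule of packing points in decreasing order of likelihood ratio.

```python
# Illustrative sketch only (my own toy example, not from the post).
# Items = sample points; weight = P(x | H0); value = P(x | H1).
# Budget = significance level alpha; packed value = power of the test.

def greedy_rejection_region(p0, p1, alpha):
    """Greedily build a rejection region from points with the highest likelihood ratios.

    p0, p1: dicts mapping each sample point to its probability under H0 and H1.
    """
    ordered = sorted(p0, key=lambda x: p1[x] / p0[x], reverse=True)
    region, size, power = [], 0.0, 0.0
    for x in ordered:
        if size + p0[x] <= alpha:      # add a point only if it fits the alpha budget
            region.append(x)
            size += p0[x]
            power += p1[x]
    return region, size, power

# Toy example: a six-sided die that is fair under H0 and loaded toward 6 under H1.
p0 = {i: 1 / 6 for i in range(1, 7)}
p1 = {1: 0.05, 2: 0.05, 3: 0.10, 4: 0.10, 5: 0.20, 6: 0.50}
print(greedy_rejection_region(p0, p1, alpha=0.20))
# -> ([6], 0.1666..., 0.5): reject only on a 6; size 1/6, power 0.5
```

One point of disanalogy worth noting: greedy packing is exactly optimal for the fractional relaxation of the knapsack problem but not in general for the 0/1 version when the budget cannot be filled exactly, which is part of why frequentists sometimes resort to randomized tests in discrete settings.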
The Standard Bayesian-Frequentist Debate About Stopping Rules
Bayesians generally reject the frequentist view that inference and decision procedures should be sensitive to differences among “stopping rules”—that is, the (possibly implicit) processes by which experimenters decide when to stop collecting the data that will be fed into those procedures—outside of unusual cases in which the stopping rule is “informative” in a technical sense.
Frequentists often argue for their position by claiming that ignoring differences among noninformative stopping rules would allow experimenters to produce systematically misleading results. For instance, Mayo and Kruse (2001) consider the case of a subject who claims to be able to predict draws from a deck of ESP cards. On a frequentist approach, if a 5% significance level is used and the data are treated as if the sample size had been fixed in advance, then even when the “null hypothesis” that the subject has no extrasensory abilities is true, the probability that the results reach significance at some point within the first 1000 observations is 53%, and the probability that they do so within some finite number of observations is one. Accordingly, Mayo and Kruse claim that whether the experimenter had planned to stop after 1000 trials all along or had planned to stop as soon as a statistically significant outcome occurred must be reported and must be taken into account in inference and decision. [Read more…]
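Here is a rough simulation sketch of the optional-stopping phenomenon. The details are my own assumptions rather than Mayo and Kruse's exact setup (a 1/5 chance of a correct guess under the null hypothesis and a one-sided normal-approximation test applied after every trial as if the sample size had been fixed), so the resulting rate will not match their 53% figure exactly, but it illustrates how far the actual error rate can drift from the nominal 5%.

```python
import random
from math import sqrt

# Rough illustration (assumptions mine): each guess is correct with probability
# 1/5 under the null, and after each trial we compute a one-sided z-statistic
# as if the sample size had been fixed, stopping as soon as it exceeds 1.645
# (the nominal 5% critical value).

rng = random.Random(1)

def rejects_within(n_max, p0=0.2, z_crit=1.645, n_min=30):
    """Simulate one experiment under the null; return True if 'significance'
    is ever reached within the first n_max trials."""
    successes = 0
    for n in range(1, n_max + 1):
        successes += rng.random() < p0
        if n >= n_min:  # wait for the normal approximation to be reasonable
            z = (successes - n * p0) / sqrt(n * p0 * (1 - p0))
            if z > z_crit:
                return True
    return False

n_sims = 2000
hits = sum(rejects_within(1000) for _ in range(n_sims))
print(f"'significant' within 1000 trials in about {hits / n_sims:.0%} of simulated runs")
```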
Where We’ve Been
I have argued for the Likelihood Principle, which says that the evidential meaning of a datum with respect to a partition of hypotheses depends only on the probabilities that the elements of that partition ascribe to that datum, up to a constant of proportionality. (Here)
From the Likelihood Principle, I have argued for the Law of Likelihood, which says that the degree to which a datum favors one element of a partition over another is given by the ratio of the probabilities that those hypotheses ascribe to that datum. (Here and here)
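To make these two principles concrete with a toy example of my own (not drawn from the posts linked above): suppose a coin lands heads 7 times in 10 independent tosses. The hypothesis that the heads probability is 0.7 ascribes a probability proportional to 0.7^7 × 0.3^3 ≈ 0.0022 to that outcome, while the hypothesis that the coin is fair ascribes a probability proportional to 0.5^10 ≈ 0.0010, so by the Law of Likelihood the datum favors the first hypothesis over the second by a factor of roughly 2.3. The Likelihood Principle adds that nothing about the datum beyond these probabilities (up to a common constant of proportionality, here the binomial coefficient) bears on its evidential meaning.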
I have argued against methodological likelihoodism, which says that characterizing data as evidence in accordance with the Law of Likelihood is an adequate self-contained methodology for science (at least as a fallback option in cases in which prior probabilities are “not available”). (Here)
Where We’re Going: The Methodological Likelihood Principle
The next claim I want to consider is the Methodological Likelihood Principle, which says that an adequate methodology for science respects evidential equivalence as characterized by the Likelihood Principle. [Read more…]
The Problem for Methodological Likelihoodism
I have argued that methodological likelihoodism is false by arguing that an adequate self-contained methodology for science provides good norms of commitment vis-à-vis hypotheses, articulating minimal requirements for a norm of this kind, and proving that no purely likelihood-based norm satisfies those requirements.
The Solution? Appeal to Long-Run Operating Characteristics
One might attempt to rescue methodological likelihoodism by lowering one’s standards. Perhaps no purely likelihood-based norms of commitment are among the canons of rationality, but such norms are nevertheless useful in practice when deployed judiciously. [Read more…]
Likelihoodists admit that their methods are not useful for guiding belief and action directly. One could maintain that characterizing data as evidence is valuable in itself, apart from any possible use in guiding belief or action, but such a view is remarkably indifferent to practical considerations. Beyond pointing out that indifference, I do not know how to argue against it, but I do know how to respond to various likelihoodist attempts to make it seem plausible (as I have done in posts here, here, and here).
Royall (2000) seems to provide some reason to believe that likelihoodist methods can in fact reasonably be used to guide belief and action when prior probabilities are not available. He claims that “a paradigm [for statistics] based on [the Law of Likelihood] can generate a frequentist methodology that avoids the logical inconsistencies pervading current methods while maintaining the essential properties that have made those methods into important scientific tools” (31). In other words, the suggestion is that likelihoodist methods are warranted by their long-run operating characteristics in the same way that frequentists take their methods to be, but without being subject to the many objections that frequentist methods face (such as that they violate the Likelihood Principle).
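One concrete example of the kind of operating characteristic at issue is the bound on the probability of misleading evidence: if a hypothesis is true, the probability of obtaining a likelihood ratio of at least k in favor of a specified false alternative is at most 1/k. The following minimal simulation sketch uses assumptions of my own choosing (a single observation that is Normal(0, 1) under the true hypothesis, with the false alternative placing the mean at 1) rather than Royall's own examples.

```python
import random
from math import exp

# Illustrative sketch (assumptions mine): X ~ Normal(0, 1) under the true
# hypothesis H0; the false alternative H1 says the mean is 1. Under H0, the
# probability that the likelihood ratio favors H1 by a factor of at least k
# is at most 1/k, so strong misleading evidence is rare in the long run.

rng = random.Random(0)

def likelihood_ratio(x, mu0=0.0, mu1=1.0):
    """Ratio of the Normal(mu1, 1) density to the Normal(mu0, 1) density at x."""
    return exp(-0.5 * (x - mu1) ** 2 + 0.5 * (x - mu0) ** 2)

k = 8
n_sims = 100_000
misleading = sum(likelihood_ratio(rng.gauss(0.0, 1.0)) >= k for _ in range(n_sims))
print(f"P(LR >= {k} | H0) ≈ {misleading / n_sims:.4f}  (bound: 1/k = {1 / k:.4f})")
```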
In fact, likelihoodist methods are not warranted by their operating characteristics in the same way that frequentists take their methods to be. [Read more…]
Evidence-Tracking: If two pieces of evidence are evidentially equivalent with respect to a partition of the space of hypotheses under consideration, then a method should give the same outputs over that partition given either of those pieces of evidence, all else being equal.
Objective Reliability: A method should have good operating characteristics regardless of which of the hypotheses under consideration is true.
In her 1996 book Error and the Growth of Experimental Knowledge, Professor Mayo argued against the Likelihood Principle on the grounds that it does not allow one to control long-run error rates in the way that frequentist methods do. This argument seems to me to be the kind of response a frequentist should give to Birnbaum’s proof. It does not require arguing that Birnbaum’s proof is unsound: a frequentist can accommodate Birnbaum’s conclusion (two experimental outcomes are evidentially equivalent if they have the same likelihood function) by claiming that respecting evidential equivalence is less important than achieving certain goals for which frequentist methods are well suited.
More recently, Mayo has shown that Birnbaum’s premises cannot be reformulated as claims about what sampling distribution should be used for inference while retaining the soundness of his proof. It does not follow that Birnbaum’s proof is unsound, however, because Birnbaum’s original premises are not claims about what sampling distribution should be used for inference but rather sufficient conditions for experimental outcomes to be evidentially equivalent. [Read more…]
My prospectus may be taking shape. Last spring I was planning to use proofs of the Likelihood Principle to argue against the use of frequentist methods. Over the summer, I discovered a problem in my argument. [Read more…]
I’m interested in candidate necessary and/or sufficient conditions for a particular instance of a “good” rule or method to “inherit” the “goodness” of that rule or method. I am particularly interested in this issue as it arises in the philosophy of scientific method, where many frequentists take specific conclusions to be licensed by the fact that they come from methods that have good long-run operating characteristics. However, I suspect that there are some discussions of the more general question in epistemology (where it’s relevant to debates between reliabilists and evidentialists) and in ethics (where it’s relevant to debates between act consequentialists and anyone who thinks that individual acts are justified by conforming to or “flowing from” good rules, maxims, character traits, social practices, etc.). I am familiar with some of the most important statements of reliabilist views and with Michael Thompson’s discussion of “transparency” (which is roughly the same as what I call “inheritance”) in his book. I’d love to know about other potentially relevant readings.
I’m inclined to regard “inheritance” as a default assumption that is subject to defeating conditions, e.g. that the instance in question belongs to a narrowest prospectively identifiable class of cases over which the characteristics in virtue of which the method is considered good are known not to hold (or something like that—I’d be grateful for help in improving this formulation). Is the absence of any such defeating conditions sufficient for inheritance?
Proofs of the Likelihood Principle have convinced me that frequentist methods fail to respect evidential equivalence—or, better, that they fail to respect some strong intuitions that I and many others have about evidential equivalence. On the other hand, it’s not clear to me that this failure is a strong argument against their use. There is an important strand of frequentist thinking according to which frequentist methods should not be interpreted epistemically and are justified solely by their long-run operating characteristics. I am sympathetic to this perspective because it seems to me that what ultimately matters is not whether our methods gratify our intuitions, but rather how well they help us achieve our epistemic and practical goals. At the same time, the fact that a method has good frequentist properties is not sufficient to ensure that it works well in a more general sense. [Read more…]