The Standard Bayesian-Frequentist Debate About Stopping Rules
Bayesians generally reject the frequentist view that inference and decision procedures should be sensitive to differences among “stopping rules”—that is, the (possibly implicit) processes by which experimenters decide when to stop collecting the data that will be fed into those procedures—outside of unusual cases in which the stopping rule is “informative” in a technical sense.
Frequentists often argue for their position by claiming that ignoring differences among noninformative stopping rules would allow experimenters to produce systematically misleading results. For instance, Mayo and Kruse (2001) consider the case of a subject who claims to be able to predict draws from a deck of ESP cards. On a frequentist approach, if a 5% significance level is used and the data are treated as if the sample size had been fixed in advance, then, when the “null hypothesis” that the subject has no extrasensory abilities is true, the probability of rejecting it within the first 1000 observations is 53%, and the probability of rejecting it within some finite number of observations is one. Accordingly, Mayo and Kruse claim that whether the experimenter had planned all along to stop after 1000 trials or instead to stop as soon as a statistically significant outcome occurred must be reported and taken into account in inference and decision.
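The inflation Mayo and Kruse describe is easy to reproduce by simulation. The sketch below is a rough illustration rather than their exact setup: it uses a one-sided normal-approximation test and an arbitrary minimum sample size of 30 before testing begins. It draws Zener-card guesses under the null hypothesis of a 1-in-5 hit rate and tests for significance after every draw up to 1000:

```python
import random

random.seed(0)
P0, N_MAX, Z_CRIT = 0.2, 1000, 1.645  # 5 Zener card types; one-sided 5% level

def z_score(hits, n):
    """Normal-approximation z statistic for a hit rate above P0."""
    return (hits / n - P0) / (P0 * (1 - P0) / n) ** 0.5

def sequential_rejects():
    """Test after every draw; stop at the first 'significant' result."""
    hits = 0
    for n in range(1, N_MAX + 1):
        hits += random.random() < P0  # the null is true: hit probability is P0
        if n >= 30 and z_score(hits, n) > Z_CRIT:
            return True
    return False

def fixed_rejects():
    """Test once, after a sample size fixed in advance."""
    hits = sum(random.random() < P0 for _ in range(N_MAX))
    return z_score(hits, N_MAX) > Z_CRIT

sims = 2000
seq_rate = sum(sequential_rejects() for _ in range(sims)) / sims
fixed_rate = sum(fixed_rejects() for _ in range(sims)) / sims
print(seq_rate, fixed_rate)  # sequential rate far exceeds the nominal 5%
```

The exact rejection rate depends on the approximations chosen, but the qualitative point is robust: testing after every observation inflates the type I error rate far beyond the nominal level, while the fixed-sample design keeps it near 5%.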
Bayesians have responded to this argument by pointing out that their approach does not allow such “reasoning to a foregone conclusion” as long as the prior probability distributions that are used are countably additive (Kadane et al. 1996). In fact, the probability that a given experiment will produce a result that would lead a Bayesian agent to increase his or her odds on a particular hypothesis $H_a$ against a different hypothesis $H_0$ by a factor of $k$ is at most $1/k$ when $H_0$ is true, regardless of the experiment’s stopping rule.
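For simple hypotheses, the odds increase by a factor of $k$ exactly when the likelihood ratio reaches $k$, so the bound can be checked numerically. The sketch below uses hypothetical simple hypotheses ($H_0$: a coin lands heads with probability 0.5; $H_a$: probability 0.7), generates data under $H_0$, and tries as hard as possible to reach a likelihood ratio of $k = 10$ by stopping the moment it is reached:

```python
import math
import random

random.seed(1)
P0, PA, K = 0.5, 0.7, 10.0  # H0: heads prob 0.5; Ha: 0.7; target odds factor k

def ever_reaches_k(n_max=3000):
    """Sample under H0; return True if the likelihood ratio ever reaches K."""
    log_lr = 0.0
    for _ in range(n_max):
        heads = random.random() < P0  # data generated under H0
        log_lr += math.log((PA if heads else 1 - PA) / (P0 if heads else 1 - P0))
        if log_lr >= math.log(K):
            return True
    return False

sims = 5000
rate = sum(ever_reaches_k() for _ in range(sims)) / sims
print(rate)  # stays below 1/K = 0.1, as the bound requires
```

Even with this maximally opportunistic stopping rule, the fraction of runs that ever reach the target ratio stays below $1/k$.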
Mayo and Kruse claim that this response misses the point of their objection, which does not require that the probability of being misled can be arbitrarily high, but only that it can be increased if stopping rules are ignored. Even on a fully Bayesian approach, disingenuous experimenters can tilt the odds in favor of their preferred hypothesis through their choice of stopping rule.
Bayesians often respond to this objection by saying that the probability that an experiment will produce a misleading result is an issue of experimental design only and is thus irrelevant to questions about inference or decision in light of the data.
A New Objection to the Standard Bayesian Position?
There seems to be a good response to this Bayesian claim that I have not yet encountered: issues of inference or decision cannot be separated from issues of experimental design when choices about the former may influence choices about the latter and the interests of those making the two kinds of choices are not aligned. For instance, consider the position of a government regulatory agency such as the FDA. The FDA has reason to adopt more or less explicit and consistent inference or decision rules regarding, for instance, when to approve a drug. If the FDA foresees that a certain policy would lead pharmaceutical companies to choose stopping rules that the FDA regards as undesirable in order to tilt the odds of getting desired decisions in their favor, then that fact is a reason for it not to adopt that policy. In this kind of case, issues of inference or decision and issues of experimental design are conceptually distinct but decision-theoretically entangled and thus cannot be treated separately.
Consider a simplified case in which a scientist can perform either a test with a fixed sample size of $n$ or a test that continues until either the likelihood ratio of $H_a$ against $H_0$ exceeds some number $l$ or some maximum sample size $m>n$ is reached. A regulator has to decide what likelihood ratio $l_f$ would suffice for rejecting $H_0$ if the fixed-sample experiment is performed and what likelihood ratio $l_t$ would suffice if the target-likelihood-ratio procedure is performed. If there are no concerns about the regulator’s choice influencing the experimental design, then the regulator should set $l_f=l_t$, in accordance with the fact that the difference between the noninformative stopping rules in question does not affect the evidential import of the data according to the Likelihood Principle and does not affect the posterior probabilities under Bayesian conditioning. However, if the scientist can take the regulator’s choices of $l_f$ and $l_t$ into account in designing his experiment, and the regulator prefers to reject $H_0$ only if it is false while the scientist prefers for it to be rejected no matter what, then under typical circumstances the regulator maximizes her expected utility by setting $l_t>l_f$ to avoid incentivizing undesirable behavior by the scientist. (A demonstration of this result is available upon request.) Thus, under these circumstances Bayesian principles entail that the regulator should act in accordance with the frequentist idea that differences among noninformative stopping rules are relevant to inference or decision.
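The full demonstration is not reproduced here, but the asymmetry driving it can be illustrated by simulation: under $H_0$, the optional-stopping design reaches any given likelihood ratio more often than the fixed-sample design does, so a regulator who wants the same error rate from both designs must demand a higher ratio from the sequential one. The sketch below uses arbitrary illustrative numbers ($H_0$: $p=0.5$, $H_a$: $p=0.7$, $n=50$, $m=200$, $l=5$):

```python
import math
import random

random.seed(2)
P0, PA = 0.5, 0.7                  # hypothetical simple hypotheses: H0 vs Ha
N_FIXED, M_MAX, L = 50, 200, 5.0   # fixed n, sequential cap m, target ratio l

def log_lr_step():
    """Log likelihood-ratio increment for one draw generated under H0."""
    heads = random.random() < P0
    return math.log((PA if heads else 1 - PA) / (P0 if heads else 1 - P0))

def fixed_design_rejects():
    """Fixed-sample design: is the ratio at least L after exactly N_FIXED draws?"""
    return sum(log_lr_step() for _ in range(N_FIXED)) >= math.log(L)

def sequential_design_rejects():
    """Optional-stopping design: stop as soon as the ratio reaches L, up to M_MAX draws."""
    total = 0.0
    for _ in range(M_MAX):
        total += log_lr_step()
        if total >= math.log(L):
            return True
    return False

sims = 4000
fixed_rate = sum(fixed_design_rejects() for _ in range(sims)) / sims
seq_rate = sum(sequential_design_rejects() for _ in range(sims)) / sims
print(fixed_rate, seq_rate)  # the sequential design reaches L more often
```

Because the sequential design triggers whenever the ratio crosses $l$ at any point up to $m$, its rejection event under $H_0$ strictly contains the fixed design’s, which is why a single shared threshold would reward the scientist for choosing the sequential design.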
This result is not an idle curiosity: government regulators, scientific journal editors, science journalists, scientific societies, evidence-based practitioners, and even the general public can greatly affect decisions about experimental design through their choices of inferential and decision-making practices. Nor is it an objection to Bayesianism per se: Bayesian arguments for the irrelevance of stopping rules to inference and decision all assume that issues of experimental design can be treated separately from issues of inference and decision, and that assumption breaks down in the kinds of cases in question. In fact, the result is useful for defending basic Bayesian principles because it shows that those principles recover frequentist intuitions about stopping rules in precisely the kinds of cases in which those intuitions are most plausible. On the other hand, the result indicates that proposals for more widespread use of Bayesian methods in scientific practice would need to be implemented with care to avoid adopting inference and decision rules that would encourage scientists to “game the system” by using undesirable experimental designs.
To share your thoughts about this post, comment below or send me an email.
Comments support $\LaTeX$ mathematical expressions: surround with single dollar signs for in-line math or double dollar signs for display math.
Zener cards image was created by Mikhail Ryazanov. It is used here under the Creative Commons Attribution-Share Alike 3.0 Unported license. Its use here does not imply that its creator endorses any of the positions taken here.