Reporting Likelihood Functions Is No Silver Bullet¹
Reports of scientific research results can take many forms, including frequentist test outcomes, Bayesian posterior probability distributions, and likelihoodist characterizations of data as evidence. The idea that scientific research reports should in principle take the form of likelihoodist characterizations of data as evidence is particularly appealing. Unlike frequentist test outcomes, such characterizations conform to the Likelihood Principle. Unlike Bayesian posterior probability distributions, they are typically objective. Moreover, they seem to interface well with Bayesian procedures: their recipients can use them to update their individual degrees of belief by plugging them into the appropriate form of Bayes’s theorem. In the odds form of Bayes’s theorem, for instance, a likelihood ratio is the ratio of the posterior odds to the prior odds for the pair of hypotheses in question: $\Pr(E|H_1)/\Pr(E|H_2)=[\Pr(H_1|E)/\Pr(H_2|E)]/[\Pr(H_1)/\Pr(H_2)]$. The practice of reporting likelihood ratios thus seems to provide the best of both frequentist and Bayesian worlds.
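For concreteness, the updating recipe just described can be sketched in a few lines of code. The numbers below are made up for illustration:

```python
def update_odds(prior_odds, likelihood_ratio):
    """Odds form of Bayes's theorem: posterior odds = likelihood ratio x prior odds."""
    return likelihood_ratio * prior_odds

# Hypothetical numbers: the recipient's prior odds for H1 against H2 are 1:4,
# and the reported likelihood ratio Pr(E|H1)/Pr(E|H2) is 8.
posterior_odds = update_odds(0.25, 8.0)
print(posterior_odds)  # 2.0, i.e. posterior odds of 2:1 for H1 against H2
```

On this picture the reporter supplies only the objective likelihood ratio, and each recipient supplies his or her own prior odds.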
One problem with this case for reporting likelihood ratios is that plugging information about the likelihood function of a reported datum into an appropriate form of Bayes’s theorem typically violates the Principle of Total Evidence in ways that can be highly misleading. The evidential meaning of a reported datum typically differs from the evidential meaning of the information that can be gleaned from a report of that datum. A likelihoodist characterization of the reported datum as evidence captures the former, but the Principle of Total Evidence requires updating on the latter.
Example. Suppose I told you that I had correctly predicted the outcome of a coin toss sixty times in one hundred trials. Assuming that my correct guesses were independent and identically distributed Bernoulli events, the Law of Likelihood says that the datum I reported to you favors the hypothesis that I am able to guess with 60% accuracy over the hypothesis that I am able to guess with 50% accuracy to the degree 8. But if you know that I ran ten sets of one hundred trials and reported only the largest of my ten success counts, then you are able to glean from my report not just that I guessed correctly sixty times out of one hundred in some set of trials, but that the maximum number of correct guesses across my ten sets was sixty. The likelihood ratio for that datum is not 8, but 1/30. Taking into account the total evidence you get from my report, rather than just the evidence provided by the reported datum, reverses the direction in which the evidence points.
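The arithmetic behind this reversal can be checked directly. The sketch below assumes one natural reading of the selection procedure: ten independent runs of one hundred Bernoulli trials each, with only the largest success count reported. The helper functions are mine, not the post's, and the exact figures depend on how the selection is modeled, so they may differ somewhat from the numbers quoted above; the sign of the reversal is the point.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    """Probability of at most k successes in n Bernoulli(p) trials."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

def max_pmf(k, n, p, runs):
    """Probability that the largest of `runs` independent success counts equals k."""
    return binom_cdf(k, n, p) ** runs - binom_cdf(k - 1, n, p) ** runs

# Likelihood ratio for the reported datum alone: 60 successes in 100 trials,
# comparing 60% accuracy against 50% accuracy. Comes out well above 1,
# favoring 60% accuracy.
lr_datum = binom_pmf(60, 100, 0.6) / binom_pmf(60, 100, 0.5)

# Likelihood ratio for what the report actually conveys under the assumed
# selection model: the maximum of ten success counts was 60. Comes out well
# below 1, favoring 50% accuracy.
lr_total = max_pmf(60, 100, 0.6, 10) / max_pmf(60, 100, 0.5, 10)
```

Running this reproduces the reversal: the datum taken in isolation favors 60% accuracy, while the total information gleaned from the report favors 50% accuracy.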
In this example you should in principle update on the total information you are able to glean from my report, in accordance with the Principle of Total Evidence. Doing so would result in your personal odds for 60% accuracy against 50% accuracy decreasing, as the likelihood ratio 1/30 indicates. It would be a violation of the Principle of Total Evidence to update those odds by plugging the likelihood ratio of 8 for the reported datum into the odds form of Bayes’s theorem, as the hybrid likelihoodist-Bayesian approach under consideration would recommend. Doing so would result in your personal odds for 60% accuracy against 50% accuracy increasing rather than decreasing.
A likelihoodist-Bayesian approach may be viable in cases that do not involve the kind of cherry-picking present in this example. Unfortunately, however, cherry-picking is ubiquitous. Even if scientists report all of their data, journals generally prefer to publish striking results, and the news media report on only the most striking of the results that make it through the publication filter. Just as in the toy example above, so too in real scientific research: we will be misled in evaluating the significance of data if we do not take into account the filters through which it passes before reaching us.
So What Should Scientists Report?
I have shown that one argument for reporting likelihood functions fails because plugging a likelihood ratio for a reported datum into the appropriate form of Bayes’s theorem is an incorrect updating procedure. It does not of course follow that reporting likelihood functions is a bad practice. It is difficult to mount an argument either for or against that practice given that we lack an adequate account of how reported likelihood functions are to be used.
Cherry-picking is also a problem for the practice of reporting posterior probability distributions and the practice of reporting frequentist test procedures. Much has been written about the difficulty of publishing papers that report results that are not statistically significant and the problems that this “statistical significance filter” creates (see e.g. Gelman and Weakliem 2009). The performance characteristics that are used to justify frequentist methods concern the scientist’s procedure for generating and drawing conclusions from data. They are not directly relevant to the consumer of a scientific research report, whose procedure for encountering and drawing conclusions from the data is quite different and may have entirely different error characteristics.
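The distorting effect of a significance filter is easy to demonstrate with a toy simulation (the numbers are illustrative, not drawn from any real study): when many noisy studies of a small true effect are run and only the statistically significant ones survive, the surviving estimates systematically overstate the effect.

```python
import random
import statistics

random.seed(0)
TRUE_EFFECT, SE = 0.1, 0.1  # a small true effect measured with considerable noise

# Simulate many studies, each producing a noisy estimate of the true effect.
estimates = [random.gauss(TRUE_EFFECT, SE) for _ in range(100_000)]

# The "statistical significance filter": only estimates more than 1.96
# standard errors from zero get published.
published = [e for e in estimates if abs(e) / SE > 1.96]

# The average over all studies is close to the true effect, but the average
# over published studies substantially overstates it.
print(statistics.mean(estimates))
print(statistics.mean(published))
```

The consumer who sees only the published estimates, and updates as if they were an unfiltered sample, will be systematically misled in just the way the coin example suggests.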
Of course, reporting posterior probability distributions in the presence of a “posterior probability filter” would be problematic as well. And even without filtering it is difficult for the recipient of a scientific research report to know what to do with a reported posterior probability distribution. That distribution reflects not only the evidential import of the data, but also a prior probability distribution that was chosen either according to some formal rule or to represent someone else’s opinion prior to receiving the data. A likelihoodist characterization of data as evidence is appealing precisely because it separates the objective, non-arbitrary part of Bayesian updating from the part that is either subjective or somewhat arbitrary.
The ideal solution to this problem would be to eliminate, or at least greatly reduce, cherry-picking in the reporting of scientific research results. Under those circumstances, reporting likelihood functions for the purposes of Bayesian updating would indeed be highly appealing (at least in principle; there are serious problems for this approach in practice that I haven’t mentioned). In the meantime, a Bayesian approach does have some advantages. A scientist’s posterior probability distribution should reflect a career’s worth of experience rather than just his or her latest datum. The use of a subjective prior probability distribution would thus reduce the importance of publication bias toward dramatic results. The use of objective prior probability distributions would generally have a similar “regularizing” effect, helping to “keep things unridiculous” even in the presence of publication bias. On many questions, I for one would rather know the honest opinions of experienced scientists than try to formulate my own opinion on the basis of a likelihoodist characterization of the latest data set.
I do not claim to have given a strong argument for reporting posterior probability distributions. There are many difficulties for this practice that I have not even mentioned.² My primary goal for this post is to invite readers’ comments on a vitally important question that I have only begun to address:
Question: What should scientists report?
To share your thoughts about this post, comment below or send me an email. To use $\LaTeX$ in comments, surround mathematical expressions with single dollar signs for inline mode or double dollar signs for display mode.
- Thanks to Satish Iyengar, Michael Lew, Edouard Machery, John Norton, Teddy Seidenfeld, and Jim Woodward for discussions that contributed to the development of the ideas presented in this post. ↩
- One difficulty that I have not mentioned is that it is not in general in one’s best interests to report one’s honest opinions. For instance, it may be in the best interests of a climate scientist who wishes to influence policy on global warming to either overstate or understate his or her degree of belief that catastrophic global warming will occur without government action. ↩