### Reporting Likelihood Functions Is No Silver Bullet^{1}

Reports of scientific research results can take many forms, including frequentist test outcomes, Bayesian posterior probability distributions, and likelihoodist characterizations of data as evidence. The idea that scientific research reports should in principle take the form of likelihoodist characterizations of data as evidence is particularly appealing. Unlike frequentist test outcomes, such characterizations conform to the Likelihood Principle. Unlike Bayesian posterior probability distributions, they are typically objective. Moreover, they seem to interface well with Bayesian procedures: their recipients can use them to update their individual degrees of belief by plugging them into the appropriate form of Bayes’s theorem. In the odds form of Bayes’s theorem, for instance, a likelihood ratio is the ratio of the posterior odds to the prior odds for the pair of hypotheses in question: $\Pr(E|H_1)/\Pr(E|H_2)=[\Pr(H_1|E)/\Pr(H_2|E)]/[\Pr(H_1)/\Pr(H_2)]$. **The practice of reporting likelihood ratios thus seems to provide the best of both frequentist and Bayesian worlds.**
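The updating recipe this hybrid approach envisions is just multiplication of odds by a reported likelihood ratio. Here is a minimal sketch; the numbers are hypothetical, chosen only for illustration:

```python
# Sketch of the updating recipe described above: plug a reported likelihood
# ratio into the odds form of Bayes's theorem. All numbers are hypothetical.

def update_odds(prior_odds: float, likelihood_ratio: float) -> float:
    """Posterior odds for H1 against H2 = likelihood ratio * prior odds."""
    return likelihood_ratio * prior_odds

# A reader indifferent between H1 and H2 (prior odds 1) who receives a
# reported likelihood ratio of 8 would move to posterior odds of 8.
print(update_odds(1.0, 8.0))  # -> 8.0
```

The appeal is that the scientist reports only the objective factor (the likelihood ratio), while each reader supplies his or her own prior odds.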

One problem with this case for reporting likelihood ratios is that **plugging information about the likelihood function of a reported datum into an appropriate form of Bayes’s theorem typically violates the Principle of Total Evidence** in ways that can be highly misleading. The evidential meaning of a *reported datum* typically differs from the evidential meaning of *the information that can be gleaned from a report of that datum*. A likelihoodist characterization of the reported datum as evidence captures the former, but the Principle of Total Evidence requires updating on the latter.

**Example.** Suppose I told you that I had correctly predicted the outcome of tosses of a particular coin sixty times in one hundred trials. Assuming that my correct guesses were independent and identically distributed Bernoulli events, the Law of Likelihood says that the datum I reported to you favors the hypothesis that I am able to guess with 60% accuracy over the hypothesis that I am able to guess with 50% accuracy to the degree 8. But if you know that I counted my number of successes in one hundred trials ten times and reported the *largest* result, then you are able to glean from my report not just that I guessed correctly sixty times out of one hundred in some trial, but that the *maximum* number of times I guessed correctly in ten trials was sixty. The likelihood ratio for *that* datum is not 8, but 1/30. Taking into account the total evidence you get from my report rather than just the evidence provided by the reported datum reverses the direction in which the evidence points.
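The two likelihood ratios in the example can be checked with a short computation. This is a sketch under one way of modeling the selection (ten independent runs of one hundred trials each, with only the largest count reported); the precise value of the selection-adjusted ratio is sensitive to those modeling choices, so it may differ somewhat from the figure quoted above, but on any reasonable version it falls well below 1, reversing the direction of the evidence:

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k: int, n: int, p: float) -> float:
    """Probability of at most k successes in n Bernoulli(p) trials."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

n = 100

# Likelihood ratio for the reported datum itself: 60 correct in 100 trials.
lr_datum = binom_pmf(60, n, 0.6) / binom_pmf(60, n, 0.5)

# Likelihood ratio for what the report reveals under the selection story:
# the maximum of ten independent counts was exactly 60.
def pr_max_is_60(p: float, runs: int = 10) -> float:
    return binom_cdf(60, n, p) ** runs - binom_cdf(59, n, p) ** runs

lr_max = pr_max_is_60(0.6) / pr_max_is_60(0.5)

print(round(lr_datum, 2))  # about 7.5, i.e. roughly the "degree 8" in the text
print(lr_max)              # well below 1: the evidence now favors 50% accuracy
```

The binomial coefficients cancel in the first ratio, so `lr_datum` depends only on the success probabilities; the second ratio does not simplify in the same way, which is why the selection process changes the answer.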

In this example you should in principle update on the total information you are able to glean from my report, in accordance with the Principle of Total Evidence. Doing so would result in your personal odds for 60% accuracy against 50% accuracy decreasing, as the likelihood ratio 1/30 indicates. It would be a violation of the Principle of Total Evidence to update those odds by plugging the likelihood ratio of 8 for the reported datum into the odds form of Bayes’s theorem, as the hybrid likelihoodist-Bayesian approach under consideration would recommend. Doing so would result in your personal odds for 60% accuracy against 50% accuracy increasing rather than decreasing.

A likelihoodist-Bayesian approach may be viable in cases that do not involve the kind of cherry-picking present in this example. Unfortunately, however, cherry-picking is ubiquitous. Even if scientists report all of their data, journals generally prefer to publish striking results, and the news media report on only the most striking of the results that make it through the publication filter. Just as in the toy example above, so too in real scientific research: we will be misled in evaluating the significance of data if we do not take into account the filters through which it passes before reaching us.

### So What *Should* Scientists Report?

I have shown that **one argument for reporting likelihood functions fails** because plugging a likelihood ratio for a reported datum into the appropriate form of Bayes’s theorem is an incorrect updating procedure. **It does not of course follow that reporting likelihood functions is a bad practice.** It is difficult to mount an argument either for or against that practice given that we lack an adequate account of how reported likelihood functions are to be used.

**Cherry-picking is also a problem for the practice of reporting posterior probability distributions and the practice of reporting frequentist test procedures.** Much has been written about the difficulty of publishing papers that report results that are not statistically significant and the problems that this “statistical significance filter” creates (see e.g. Gelman and Weakliem 2009). The performance characteristics that are used to justify frequentist methods concern the *scientist’s* procedure for generating and drawing conclusions from data. They are not directly relevant to the *consumer* of a scientific research report, whose procedure for encountering and drawing conclusions from the data is quite different and may have entirely different error characteristics.

Of course, reporting posterior probability distributions in the presence of a “posterior probability filter” would be problematic as well. And even without filtering it is difficult for the recipient of a scientific research report to know what to do with a reported posterior probability distribution. That distribution reflects not only the evidential import of the data, but also a prior probability distribution that was chosen either according to some formal rule or to represent someone else’s opinion prior to receiving the data. A likelihoodist characterization of data as evidence is appealing precisely because it separates the objective, non-arbitrary part of Bayesian updating from the part that is either subjective or somewhat arbitrary.

The ideal solution to this problem would involve eliminating or at least greatly reducing cherry-picking in the reporting of scientific research results. Reporting likelihood functions for the purposes of Bayesian updating would indeed be a highly appealing approach under those circumstances (at least in principle; there are serious practical problems for this approach that I haven’t mentioned). In the meantime, **a Bayesian approach does have some advantages.** A scientist’s posterior probability distribution should reflect a career’s worth of experience rather than just his or her latest datum. The use of a subjective prior probability distribution would thus reduce the impact of publication bias toward dramatic results. The use of objective prior probability distributions would generally have a similar “regularizing” effect, helping to “keep things unridiculous” even in the presence of publication bias. On many questions, I for one would rather know the honest opinions of experienced scientists than try to form my own opinion on the basis of a likelihoodist characterization of the latest data set.

I do not claim to have given a strong argument for reporting posterior probability distributions. There are many difficulties for this practice that I have not even mentioned.^{2} **My primary goal for this post is to invite readers’ comments on a vitally important question that I have only begun to address:**

**Question:** What should scientists report?

To **share your thoughts about this post**, comment below or send me an email. To use $\LaTeX$ in comments, surround mathematical expressions with single dollar signs for inline mode or double dollar signs for display mode.

- Thanks to Satish Iyengar, Michael Lew, Edouard Machery, John Norton, Teddy Seidenfeld, and Jim Woodward for discussions that contributed to the development of the ideas presented in this post. ↩
- One difficulty that I have not mentioned is that it is not in general in one’s best interests to report one’s honest opinions. For instance, it may be in the best interests of a climate scientist who wishes to influence policy on global warming to either overstate or understate his or her degree of belief that catastrophic global warming will occur without government action. ↩

Michael Lew says

The report of sixty heads from 100 trials is incomplete. To claim or imply that likelihood methods suffer from some special disadvantage in such a case seems to me to be bizarre. After all, there are _no_ statistical approaches that can protect from incomplete reports of experimental or observational data. (And why would you want any approach to even try?)

A “hybrid likelihoodist-Bayesian” approach is just a Bayesian approach. No? A report of a Bayesian analysis of data should include exactly the same likelihood function (not just a single ratio!) as the likelihoodist report. No?

Whatever data analysis is intended or performed, scientists should report all of the evidence and all of the experimental methods and protocols necessary to support an exact replication (in so far as such a thing is possible). Such a report would necessarily include the fact that the 60-from-100 observation was made as one of a series of experiments.

If scientists were to use likelihood more regularly in their analyses, they would be much more likely to report all of the relevant information, because they would necessarily be thinking about evidence and the features of their experiments that affect the interpretation of the evidence. They would be making the principled arguments that are needed to convert a report of evidence into a reasoned report of the state of the world. The advantages of likelihoodism over frequentism in this regard are that it deals explicitly and unambiguously with evidence and that it is easier to understand. Even if deliberate misreports and fraud are still possible, we could reasonably expect fewer reports with mistaken conclusions.

Greg Gandenberger says

Quite right. I don’t claim that likelihood methods suffer a disadvantage in such cases while other approaches do not. My immediate point is simply that plugging the likelihood function for a reported datum into Bayes’s theorem is not in general a correct updating procedure. And that’s true even if each scientist is honest and transparent, given the existing publication and reporting biases. Thus, the idea that scientists should in principle report likelihood functions so that everyone can simply update their degrees of belief by plugging those likelihood functions into Bayes’s theorem is wrong.

As I said, it does not of course follow that reporting likelihood functions is a bad practice. But the claim that it’s a good practice requires some other argument.

Yes, I would characterize it as a kind of Bayesian approach. But it’s a specific kind of Bayesian approach that emphasizes the reporting of information about the likelihood function in some form, at worst under the mistaken belief that the recipients of that information should update their beliefs by simply plugging that information into Bayes’s theorem in some way.

Agreed. But cherry-picking/censoring/data filtering at the publication and dissemination stages is equally problematic and is ubiquitous and hard to address.

“Necessarily” is surely too strong, but you might be on to something here.

What sorts of arguments are those?

I agree that likelihoodism deals explicitly and unambiguously with evidence and is easier to understand. The claim that this fact is an advantage is intuitively plausible but in need of an argument. One could also say that the advantages of frequentism over likelihoodism are that it deals explicitly and unambiguously with avoiding errors in our search for the truth and that this is what we really want from science.

In a way we would have no reports with mistaken conclusions, because the conclusions would only concern evidential relations between data and hypotheses. But claims that cannot be mistaken also cannot have much real content. We need more than likelihoodism by itself explicitly gives us.

Michael Lew says

I’ve just read the question under the image, so here is an answer. If the CERN scientists did report a P-value, then as long as the P-value comes from a significance test and was not ‘adjusted’ for multiple comparisons or repeated testing, there is a specific likelihood function that the P-value serves as a pointer or index to. (See my now three-times-rejected paper http://arxiv.org/abs/1311.0081 for details.) Thus one might argue that they _did_ report a likelihood function, only in a cryptic manner.
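One way to sketch this “pointer” idea, in a much simplified setting (a one-sided z-test with known unit variance; this is an illustration of the general thought, not the construction in the linked paper): the p-value determines the observed test statistic z, and z in turn determines a likelihood function over the standardized effect size mu.

```python
from statistics import NormalDist

std = NormalDist()  # standard normal

def likelihood_from_p(p_value: float):
    """Recover z from a one-sided p-value, then return the likelihood
    function over the standardized effect size mu (a sketch of the idea
    that an unadjusted p-value indexes a likelihood function)."""
    z = std.inv_cdf(1 - p_value)       # observed test statistic
    return lambda mu: std.pdf(z - mu)  # likelihood of mu given z

L = likelihood_from_p(0.05)
# For p = 0.05, z is about 1.645, and the likelihood function peaks at
# mu = z, so L(1.645) exceeds both L(0) and L(3).
```

On this picture the p-value is “cryptic” only in the sense that the reader must invert it to recover z before the likelihood function becomes visible.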

As far as I can gather, the CERN scientists actually reported that the relevant data exceeded the “five sigma” level of significance, with that very stringent level of ‘significance’ being an attempt to deal with what they call the “look elsewhere” effect, which I take to be akin to a combination of multiplicity of testing and testing a hypothetical parameter value against the same data that suggested it.