In addition to the problems discussed in previous posts, the claim that likelihoodist methods for characterizing data as evidence are valuable in themselves faces the problem that the evidential significance of the fact that one learned that Z=z can be very different from the evidential significance of Z=z itself. The Principle of Total Evidence requires taking into account facts about the process by which one learned that Z=z, but doing so is generally difficult at best and requires thinking about stopping rules, multiple testing, and other issues from which a likelihoodist approach is supposed to be gloriously free.
Consider the following example. Suppose we lived in an alternate universe in which psychologists reported likelihood ratios instead of p-values. The ESP researcher Daryl Bem in the actual universe has a counterpart Daryl Shmem in that universe. Shmem publishes a paper in his universe’s Journal of Personality reporting results from a study that essentially amounts to asking a subject to guess one hundred times whether a fair coin will land heads or tails when flipped, in order to investigate whether her probability of success p is greater than 0.5. Let X be a random variable whose value is the number of times the subject guesses correctly. Shmem reports that X=60 and that this result yields a likelihood ratio of roughly 8 for the hypothesis that p=0.6 against the hypothesis that p=0.5. By standard likelihoodist conventions, this result counts as somewhat strong evidence for p=0.6 against p=0.5.
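For readers who want to check the arithmetic, the likelihood ratio can be computed in a few lines of Python (a sketch of the calculation, not code from the study; the binomial coefficient is common to numerator and denominator, so it cancels in the ratio):

```python
from math import comb

def binom_pmf(k, n, p):
    """Binomial probability mass function: P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Likelihood ratio for p = 0.6 against p = 0.5 given X = 60 out of 100.
lr = binom_pmf(60, 100, 0.6) / binom_pmf(60, 100, 0.5)
print(lr)  # ≈ 7.5, i.e. roughly 8
```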
I am not concerned with the fact that likelihoodists classify this result as somewhat strong evidence despite the fact that most of us would not take it to make the hypothesis that the subject has ESP remotely belief-worthy. Of course, likelihoodists are not committed to the claim that one should believe that the subject has ESP in light of this result. They acknowledge that prior probabilities as well as likelihoods are relevant to belief updating. They claim only to characterize data as evidence.
I am concerned instead with the fact that the likelihood ratio of the data the study reports can be quite different from the likelihood ratio of the data the reader of the study receives. Shmem reports [X=60]. The reader receives [Shmem reports X=60]. The likelihood ratio of p=0.6 against p=0.5 for [X=60] may be quite different from the likelihood ratio of p=0.6 against p=0.5 for [Shmem reports X=60].
Suppose Shmem’s procedure was to run the experiment described ten times and report the largest number of successes recorded. If the reader knows this information, then for him or her learning [Shmem reports X=60] amounts to learning Y=60, where Y is a random variable whose value is the maximum number of times that the subject guesses correctly across the ten runs of the experiment. The likelihood ratio of p=0.6 against p=0.5 for [Y=60] is not 8 but roughly 1/50. By standard likelihoodist conventions, Y=60 is not somewhat strong evidence for p=0.6 against p=0.5; in fact, it favors p=0.5 over p=0.6, and does so strongly enough to meet the conventional threshold of 32 for “strong” evidence.
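Assuming the ten runs are independent, the maximum Y of ten Binomial(100, p) draws has cumulative distribution function F(y)^10, where F is the Binomial(100, p) CDF, so the likelihood of Y=60 under each hypothesis can be computed directly. A Python sketch:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

def max_pmf(y, n, p, runs=10):
    """P(Y = y) where Y is the maximum of `runs` independent Binomial(n, p) draws."""
    return binom_cdf(y, n, p)**runs - binom_cdf(y - 1, n, p)**runs

# Likelihood ratio of p = 0.6 against p = 0.5 given Y = 60.
lr_max = max_pmf(60, 100, 0.6) / max_pmf(60, 100, 0.5)
print(lr_max)  # well below 1: Y = 60 favors p = 0.5 over p = 0.6
```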
To make matters worse, suppose that Shmem simply reported X=60 and that one was ignorant of how many other experiments Shmem performed but did not report. Then while the Law of Likelihood gives one the degree to which [X=60] favors any value of p one likes against any other value, it does not tell one anything definite about the evidential significance of [Shmem reports X=60]. In fact, what one learns is actually something even farther removed from [X=60] than [Shmem reports X=60], namely [I learned that Shmem reports that X=60]. Suppose many other researchers performed experiments like Shmem’s but did not have their work published, while still others had their work published, but one did not hear about it because the mainstream media did not report on it. One needs to take those additional layers of “filtering” into account in order to assess the evidential significance of what one actually learned. Doing so properly requires a great deal of knowledge about the process by which one learned of Shmem’s result.
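The point can be made vivid by computing the likelihood ratio for a reported maximum of 60 under different hypothetical numbers of runs (a sketch assuming independent runs; the number of runs is exactly what one does not know in this scenario):

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

def lr_of_max(y, n, runs):
    """Likelihood ratio of p = 0.6 against p = 0.5 when y is the maximum of `runs` runs."""
    def pmf_max(p):
        return binom_cdf(y, n, p)**runs - binom_cdf(y - 1, n, p)**runs
    return pmf_max(0.6) / pmf_max(0.5)

# The same reported number carries very different evidential import
# depending on how many unreported runs lie behind it.
for runs in (1, 5, 10, 20):
    print(runs, lr_of_max(60, 100, runs))
```

With one run the ratio favors p=0.6; as the assumed number of runs grows, it swings ever more sharply toward p=0.5.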
I am not arguing that likelihoodism is incoherent or even false. The claim that X=60 favors p=0.6 against p=0.5 is compatible with the claim that Y=60 favors p=0.5 against p=0.6, and in fact both of those claims seem right to me. I am only arguing that reporting and interpreting likelihood ratios is a tricky business despite the apparent simplicity of the likelihoodist approach relative to the frequentist approach. A likelihoodist does not need to take multiple testing and the like into consideration in order to characterize a given datum as evidence, but does need to do so in order to characterize his or her act of learning that datum as evidence, on pain of violating the Principle of Total Evidence in a way that can have dramatic effects.
It is worth noting that the same points apply to a Bayesian approach, in which one conditions on what one learns. Conditioning on [X=60] may yield very different results from conditioning on [I learned that X=60], and the latter is typically much harder to do properly. This fact is not an objection to Bayesianism in principle, but it is a great difficulty for Bayesianism in practice.
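To illustrate, here is a sketch with a hypothetical two-point prior that puts probability 0.5 on each of p=0.5 and p=0.6 (both the prior and the ten-run-maximum scenario are assumptions for illustration, not part of any actual analysis):

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

def max_pmf(y, n, p, runs=10):
    """P(Y = y) where Y is the maximum of `runs` independent Binomial(n, p) draws."""
    return binom_cdf(y, n, p)**runs - binom_cdf(y - 1, n, p)**runs

def posterior_p06(lik_06, lik_05, prior_06=0.5):
    """Posterior probability of p = 0.6 under a two-point prior on {0.5, 0.6}."""
    num = prior_06 * lik_06
    return num / (num + (1 - prior_06) * lik_05)

# Conditioning on [X = 60] as if it were the raw datum:
post_naive = posterior_p06(binom_pmf(60, 100, 0.6), binom_pmf(60, 100, 0.5))

# Conditioning on what was actually learned, [Y = 60], the maximum of ten runs:
post_informed = posterior_p06(max_pmf(60, 100, 0.6), max_pmf(60, 100, 0.5))

print(post_naive, post_informed)  # high posterior on p = 0.6 vs. low posterior
```

The two conditionalizations pull in opposite directions, which is the practical difficulty at issue.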