In their “What is the Likelihood Function?” (1987), Bayarri et al. argue that there can be no clear-cut definition of the likelihood function for a given statistical problem and that the impossibility of such a definition “severely restricts the applicability of the likelihood principle” (13). I think there’s no real problem here, as I explain below.
Bayarri et al. are Bayesians. For them, one should in principle solve a statistical inference problem by specifying a joint prior distribution over all relevant quantities, conditioning on what is observed, and integrating out anything that is not of interest. The result is a posterior distribution for what is of interest given what has been observed.
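As a rough sketch of that recipe in symbols: write $x$ for what is observed, $\theta$ for the quantities of interest, and $\eta$ for everything else that is unobserved. The symbol $\eta$ and this three-way split are assumptions introduced here for illustration, not notation taken from Bayarri et al. Conditioning on $x$ and integrating out $\eta$ then gives

$$p(\theta \mid x) = \int p(\theta, \eta \mid x)\, d\eta = \frac{\int f(x \mid \theta, \eta)\, p(\theta, \eta)\, d\eta}{\iint f(x \mid \theta', \eta')\, p(\theta', \eta')\, d\eta'\, d\theta'}.$$

Everything on the right-hand side is fixed once the joint prior $p(\theta, \eta)$ and the sampling density $f(x \mid \theta, \eta)$ are specified, so for the Bayesian nothing turns on which unobserved quantities get called "parameters."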
Likelihoodists cannot use this approach because they reject assignments of prior probabilities to hypotheses that cannot be regarded as data-driven estimates of long-run frequencies. They instead look at likelihood functions of the form $f(x|\theta)$, where $x$ is a datum and $\theta$ represents one or more parameters. The problem Bayarri et al. point out is that it is sometimes unclear what should be regarded as $x$ and what should be regarded as $\theta$. There is no unique way to decide what quantities to include in the likelihood function at all or on which side of the bar in the expression $f(x|\theta)$ to put those that are included. Different decisions on these matters result in different likelihood functions and thus can lead to different results within a pure likelihoodist approach.
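To make the ambiguity concrete, here is a minimal sketch using a toy model of my own choosing, not one taken from Bayarri et al.: an observed $x$, an unobserved quantity $y$, and a parameter $\theta$, with $x \mid y \sim N(y, 1)$ and $y \mid \theta \sim N(\theta, 1)$. The function names are hypothetical. Three candidate "likelihood functions" come out of the same model depending on where $y$ is placed.

```python
import math


def normal_pdf(value, mean, variance):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((value - mean) ** 2) / (2 * variance)) / math.sqrt(
        2 * math.pi * variance
    )


# Assumed toy model (illustrative only): x | y ~ N(y, 1) and y | theta ~ N(theta, 1).

def lik_x_given_y_theta(x, y, theta):
    """f(x | y, theta): y sits to the right of the bar.

    theta is accepted but unused, because in this model x depends on
    theta only through y.
    """
    return normal_pdf(x, y, 1.0)


def lik_xy_given_theta(x, y, theta):
    """f(x, y | theta): y sits to the left of the bar."""
    return normal_pdf(x, y, 1.0) * normal_pdf(y, theta, 1.0)


def lik_x_given_theta(x, theta):
    """f(x | theta): y has been integrated out, so x | theta ~ N(theta, 2)."""
    return normal_pdf(x, theta, 2.0)


x_obs, y_value = 1.0, 0.5
for theta in (0.0, 1.0, 2.0):
    print(
        theta,
        lik_x_given_y_theta(x_obs, y_value, theta),
        lik_xy_given_theta(x_obs, y_value, theta),
        lik_x_given_theta(x_obs, theta),
    )
```

In this sketch the first function is flat in $\theta$ while the other two are not, so a method that looks only at "the" likelihood function as a function of $\theta$ would give different answers depending on which of the three it is handed.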
In their (1988) and (1992), Bayarri and DeGroot make a case for including only what is observed to the left of the bar and all unobserved quantities that cannot be ignored or integrated out to the right of the bar. They write that “the evidence in the data is conveyed most efficiently and effectively” by this likelihood function. After all, “the basic purpose of a likelihood function is to serve as a function that relates observed and unobserved quantities, and conveys all the relevant information provided by the observed data about the unobserved quantities” (1988, 160.3; 1992, 10).
This solution requires distinguishing between “the evidence provided by” the observation about the quantities of interest and “the information that is needed to make inferences” about those quantities. I make the same distinction for different reasons; see my “Why I Am Not a Likelihoodist.” I am not even committed to the claim that inferences should depend on the data only through the likelihood function. That claim follows from the Likelihood Principle, as I take it to have been convincingly proven, only on evidentialist assumptions that are plausible but not, to my mind, obviously compelling. Bayarri and DeGroot do not explain what information, beyond the likelihood function they specify, they think is needed for inference about the quantities of interest. The only example they give is that when one of the quantities of interest is a random variable whose value is to be predicted, inference requires the distribution of that variable as a function of the parameters of interest as well as the probability of the observation as a function of those parameters. This claim seems rather obvious and is in no way at odds with the Likelihood Principle as I understand it.
I’m having a hard time seeing how the Likelihood Principle requires that there be such a thing as the likelihood function for a set of quantities. A likelihood function is an encoding of the shifts in degrees of belief that Bayesian updating on some set of observations (to the left of the bar) requires with respect to some hypothesis space (the set of possible values of the quantities to the right of the bar). Different likelihood functions are relevant for different questions. Including different observed quantities to the left of the bar allows one to consider the significance of different bodies of data. Including different quantities to the right of the bar allows one to consider the significance of those bodies of data for different hypothesis spaces. There is no ambiguity in the claim that the evidential meaning of a datum with respect to a specified hypothesis space depends only on the probability of that datum as a function of the elements of that space. Why shouldn’t there be ambiguity in decisions about which question to ask and which data to bring to bear on that question?
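The sense in which a likelihood function encodes shifts in degrees of belief can be illustrated with a small sketch. The setup is an assumption chosen for illustration: a hypothesis space of three candidate coin biases, a datum of 7 heads in 10 tosses, and two different priors. The function names are hypothetical. Whatever the prior, the factor by which the odds between two hypotheses change on updating equals the ratio of their likelihoods.

```python
from math import comb


def binomial_likelihood(heads, tosses, theta):
    """Probability of `heads` successes in `tosses` trials with bias theta."""
    return comb(tosses, heads) * theta**heads * (1 - theta) ** (tosses - heads)


def posterior(prior, likelihood):
    """Bayes' rule over a finite hypothesis space (dicts keyed by hypothesis)."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: weight / total for h, weight in unnormalized.items()}


def odds_shift(prior, post, h1, h2):
    """Factor by which the odds of h1 against h2 change on updating."""
    return (post[h1] / post[h2]) / (prior[h1] / prior[h2])


# Illustrative hypothesis space and datum (assumptions, not from the sources cited).
hypotheses = (0.25, 0.5, 0.75)
likelihood = {theta: binomial_likelihood(7, 10, theta) for theta in hypotheses}

uniform_prior = {0.25: 1 / 3, 0.5: 1 / 3, 0.75: 1 / 3}
skewed_prior = {0.25: 0.6, 0.5: 0.3, 0.75: 0.1}

for prior in (uniform_prior, skewed_prior):
    post = posterior(prior, likelihood)
    # The two printed numbers agree up to floating-point error for either prior.
    print(odds_shift(prior, post, 0.75, 0.5), likelihood[0.75] / likelihood[0.5])
```

Changing the hypothesis space or the datum changes the likelihood function, and with it the shifts, but that reflects a different question being asked and not an ambiguity in the answer to a fixed one.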
Bayarri, M. J. and M. H. DeGroot. “Discussion: Auxiliary Parameters and Simple Likelihood Functions.” 160.3-160.7. In James Berger and Robert Wolpert, The Likelihood Principle. 2nd ed. IMS, 1988.
Bayarri, M. J., and M. H. DeGroot. “Difficulties and ambiguities in the definition of a likelihood function.” Journal of the Italian Statistical Society 1.1 (1992): 1-15.
Bayarri, M. J., M. DeGroot, and J. Kadane. “What is the likelihood function? (with discussion).” Statistical Decision Theory and Related Topics IV. (eds. SS Gupta and JO Berger) Springer, New York 1 (1987).
Berliner, Mark. “Discussion of ‘What is the likelihood function?’” Statistical decision theory and related topics. IV 1 (1987): 17-20.
Grossman, Jason. Statistical Inference: From Data to Simple Hypotheses (2011), unpublished manuscript.
Jantzen, Benjamin. “Piecewise Versus Total Support: How to Deal with Background Information in Likelihood Arguments.” Philosophy of Science Assoc. 23rd Biennial Mtg (2012, San Diego, CA).