I have found it useful at times to distinguish between “substantive hypotheses” and “statistical hypotheses.” For instance, in this post I consider a proposal to restrict the Law of Likelihood to statistical hypotheses. **But what is the distinction between substantive and statistical hypotheses?** Can it be made precise? Can it serve the purposes for which I would like to use it?^{1}

### Attempts to Formulate the Distinction

#### Formulation 1

One seemingly promising way to formulate the notion of a simple statistical hypothesis is as follows:

**Formulation 1.** Hypothesis $H$ is a *simple statistical hypothesis* with respect to the datum $E$ if and only if there is a constant $a$ such that $\Pr(E|H,K)=a$ for any body of background knowledge $K$ such that $H\cap K\neq\varnothing$.

Formulation 1 says roughly that a hypothesis is a simple statistical hypothesis with respect to $E$ if and only if it assigns a definite probability to $E$ that does not depend on background knowledge. Good (in Barnard et al. 1949, 141) calls $\Pr(E|H)$ a “tautological probability” (omitting reference to $K$) when this condition is met.

A *complex* (or *composite*) *statistical hypothesis* with respect to $E$ is simply a disjunction of simple statistical hypotheses with respect to $E$. A hypothesis is *statistical* if it is either simple statistical or complex statistical. Otherwise, it is *substantive*.

Formulation 1 seems adequate at first blush. Suppose that $E$ says that $X\in(-2,2)$ while $H$ says that $X$ has a standard normal distribution. Then it seems that $\Pr(E|H,K)$ has the same value (approximately 0.95) regardless of $K$, so $H$ qualifies as a simple statistical hypothesis with respect to $E$ on Formulation 1.

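But a little more reflection reveals that **Formulation 1 does not work**. Suppose one’s background knowledge says that $X\in(-1,1)$. Because $(-1,1)$ lies entirely inside $(-2,2)$, $\Pr(E|H,K)=1$ rather than approximately 0.95. The probability that $H$ assigns to $E$ thus varies with $K$, so $H$ does not in fact qualify as a simple statistical hypothesis with respect to $E$ on Formulation 1.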
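
The two calculations above can be checked numerically. The following is only a minimal sketch: it assumes that $E$ and $K$ can both be represented as open intervals for $X$, and the helper names (`std_normal_cdf`, `prob_interval`, `prob_E_given_K`) are my own illustrative choices rather than part of any formal apparatus.

```python
from math import erf, sqrt, inf

def std_normal_cdf(x: float) -> float:
    """CDF of the standard normal distribution, written with the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def prob_interval(lo: float, hi: float) -> float:
    """Pr(lo < X < hi) under H: X ~ N(0, 1); zero if the interval is empty."""
    if hi <= lo:
        return 0.0
    return std_normal_cdf(hi) - std_normal_cdf(lo)

def prob_E_given_K(e, k):
    """Pr(X in e | H, X in k), where e and k are open intervals (lo, hi).

    Assumes k has positive probability under H, i.e. H and K are compatible.
    """
    lo, hi = max(e[0], k[0]), min(e[1], k[1])
    return prob_interval(lo, hi) / prob_interval(*k)

E = (-2.0, 2.0)

# Illustrative bodies of background knowledge, each an interval constraint on X.
backgrounds = {
    "no constraint": (-inf, inf),
    "X in (-1, 1)": (-1.0, 1.0),
    "X > 1": (1.0, inf),
}

for label, K in backgrounds.items():
    print(f"{label}: Pr(E | H, K) = {prob_E_given_K(E, K):.4f}")
```

With no constraint the value is about 0.9545, but it jumps to 1 once $K$ says that $X\in(-1,1)$ and drops to roughly 0.86 when $K$ says that $X>1$. There is therefore no single constant $a$ of the kind Formulation 1 requires.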

#### Formulation 2

My first instinct when I saw this problem was to restrict the kind of information that can be included in $K$. But an easier and seemingly adequate response is to replace Formulation 1 with the following:

**Formulation 2.** Hypothesis $H$ is a *simple statistical hypothesis* with respect to the datum $E$ if and only if there is a constant $a$ such that $\Pr(E|H)=a$ in the *absence* of any (contingent) background knowledge.

Formulation 2 says roughly that a hypothesis is a simple statistical hypothesis with respect to $E$ if and only if it ascribes a definite probability to $E$ “all by itself.”

**Formulation 2 does seem to work.** It classifies the hypothesis that $X$ has a standard normal distribution as a simple statistical hypothesis with respect to the datum $X\in(-2,2)$. In my counterexample to the Law of Likelihood, it classifies the hypotheses $H_1$ and $H_2$ as substantive with respect to the datum $E$ because they give rise to probabilities for $E$ only in the presence of background knowledge about the unconditional distribution of the point $P$.

### Objections and Responses

#### Objection 1

One possible objection to Formulation 2 is that it classifies what are intuitively conjunctions of a statistical and a substantive hypothesis as statistical. For instance, it classifies the hypothesis $H$, “$X\sim \mathcal{N}(0,1)$ and the moon is made of green cheese,” as statistical with respect to $E:X\in(-1,1)$.

#### Response to Objection 1

It does not seem highly counterintuitive to me to say that $H$ is a statistical hypothesis with respect to $E$ when $H$ satisfies Formulation 2 with respect to $E$ and also provides additional “substantive” information. More importantly, classifying $H$ as statistical with respect to $E$ in such cases is appropriate for the purposes for which I want to distinguish between substantive and statistical hypotheses in the first place, namely to distinguish between hypotheses to which the Law of Likelihood applies without any headaches (simple statistical) and hypotheses that at a minimum require more careful handling (substantive).

#### Objection 2

In my counterexample to the Law of Likelihood, Formulation 2 counts $H_1$ and $H_2$ as simple statistical hypotheses with respect to datum $E$ *when they are conjoined with the information* $I$ that $P$ is uniformly distributed over the surface of the sphere. Thus, it seems that Formulation 2 cannot in fact do the work that I want it to do.

#### Response to Objection 2

$\Pr(E|H_1\mathbin{\&} I)$ and $\Pr(E|H_2\mathbin{\&} I)$ have definite values *only relative to a coordinate system*. **Thus, Formulation 2 classifies $H_1 \mathbin{\&} I$ and $H_2 \mathbin{\&} I$ as simple statistical hypotheses only relative to a coordinate system.** And that seems OK: they should count as simple statistical hypotheses relative to a coordinate system, but not absolutely.

**Question:** Are there any problems with Formulation 2 that I am missing?

### Update

There is in fact a problem with Formulation 2 that I was missing. $H_1$ and $H_2$ in my counterexample to the Law of Likelihood are each (uncountable) disjunctions of hypotheses according to which the randomly selected point $P$ lies at a particular point on the surface of the sphere. If that point is within 1 mile of the intersection of the equator and the prime meridional circle, then the corresponding hypothesis assigns probability 1 to the datum $E$; if not, then it assigns probability 0 to $E$. **Thus, Formulation 2 classifies $H_1$ and $H_2$ as composite statistical hypotheses rather than as substantive hypotheses.**

Thus, Formulation 2 does not allow the distinction between substantive and statistical hypotheses to do the work I want it to do. It might nevertheless be the right formulation. The claim that $H_1$ and $H_2$ are composite statistical hypotheses is intuitively reasonable and still identifies them as potentially problematic targets for the Law of Likelihood.

Thanks to Michael Lew for pressing me to clarify the distinction between substantive and statistical hypotheses.

The image used in this post is in the public domain.
