4. Inferring causal relationships using Hill’s Criteria

To diagnose the cause of a disease in honey bees, scientists typically compare the symptoms observed in colonies with a list of exposures that implicate a particular pathogen, toxin or other detrimental aspect of the environment. Confirming the cause of a particular instance of these symptoms is relatively straightforward: the scientist either tests for the presence of the diagnosed causal agent itself or removes it and checks for amelioration of the symptoms. These approaches are feasible when the symptoms occur at the level of the individual or colony, because effects on growth, short-term survival or reproduction are readily measured (see the BEEBOOK paper on measuring colony strength parameters (Delaplane et al., 2013)). In principle, it is then possible to estimate the impact of the disease on the population’s dynamics by using demographic models that quantify the effect on population growth (Varley et al., 1973).

Some cases, however, are problematic for two reasons. First, the symptom is itself a population-level attribute, for instance a general population decline. Second, the normal procedure is reversed because the causal agent is already identified, albeit as a hypothesis. An example is the supposed role of trace dietary pesticides in causing honey bee declines. In this case, scientists are asked whether dietary exposure to the pesticide is capable of causing the observed population decline. Studying impacts at the population level by experiments with replicated comparisons presents a severe logistical challenge because the required manipulations are at the landscape scale. Some alternative tools are available, such as the classic ‘life table’ method of insect population ecology (Varley et al., 1973), but these can be applied only if detailed census data are available that precisely identify causes of death over extended time periods. Where neither approach is feasible, scientists must use the available circumstantial evidence to reach an expert judgement. Hill’s criteria (Hill, 1965) provide a valuable framework that supports a repeatable and quantitative evaluation process.

Sir Austin Bradford Hill, a leading 20th-century epidemiologist, identified nine types of information that provide ‘viewpoints’ from which to judge a proposed cause-effect relationship (Hill, 1965). The nine criteria comprise experimental evidence and eight kinds of circumstantial evidence, the latter falling into two categories (Table 4).

For each criterion, scientists survey the available evidence and then formally describe the level of conviction with which they subsequently hold the proposed cause-effect hypothesis to be true: slight, reasonable, substantial, clear or certain (Weiss, 2006). The descriptors are then associated with numerical values to produce a quantitative score of certainty (Cresswell et al., 2012). Specifically, an eleven-point scale for each criterion returns a positive value (maximum five) if the evidence suggests that the agent certainly causes population decline, a negative value (minimum minus five) if the agent certainly does not, and zero if the evidence is equivocal or lacking. For example, if the evidence for a criterion gives a reasonable indication that an agent does not cause the symptom, the score for that criterion would be -2.
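The scoring rule described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `criterion_score` and its parameters are hypothetical, and the values assigned to 'slight', 'substantial' and 'clear' are inferred from the order of the descriptors and the eleven-point scale, since the text states only that 'reasonable' corresponds to 2 and 'certain' to 5.

```python
# Descriptor-to-magnitude mapping. Only "reasonable" (2) and "certain" (5)
# are stated in the text; the other values are assumed from the ordering.
DESCRIPTOR_MAGNITUDE = {
    "slight": 1,
    "reasonable": 2,
    "substantial": 3,
    "clear": 4,
    "certain": 5,
}


def criterion_score(descriptor=None, supports_causation=True):
    """Return a score on the eleven-point scale (-5 to +5) for one criterion.

    descriptor: level of conviction, or None if the evidence is equivocal
        or lacking (scored as zero).
    supports_causation: True if the evidence indicates that the agent causes
        the symptom, False if it indicates that it does not.
    """
    if descriptor is None:
        return 0
    magnitude = DESCRIPTOR_MAGNITUDE[descriptor]
    return magnitude if supports_causation else -magnitude


# The example from the text: a reasonable indication that the agent
# does NOT cause the symptom.
example = criterion_score("reasonable", supports_causation=False)
```

Passing an unrecognised descriptor raises a `KeyError`, which keeps the evaluator within the five agreed descriptors rather than silently accepting an arbitrary label.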

One major value of the criteria is that they disaggregate the different kinds of evidence, requiring the scientist to consider each kind carefully, separately and explicitly. Once the scores are given, there is no a priori reason either to give equal weight to the nine criteria or to calculate an average score. It is important, moreover, to consider whether any large scores have arisen principally on the theoretical criteria, because it is conventional in science to favour material evidence (i.e. associational criteria) over conjecture. For example, an evaluation by Hill’s criteria (Cresswell et al., 2012) revealed that the proposition that dietary pesticides cause honey bee declines was a substantially justified conjecture in the context of current knowledge (positive scores on the theoretical criteria), but was substantially contraindicated by a wide variety of circumstantial evidence (negative scores on the associational criteria). The disparity in the scores on the two categories of criteria explains in part the controversy over this question, because different constituencies rely to different degrees on the two kinds of evidence. Hill (1965) himself refused to weight the criteria because the evaluation of circumstantial evidence cannot be made algorithmic.
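As an illustration of keeping the categories apart rather than averaging, the following Python sketch groups criterion scores by category and reports the direction of evidence within each. It rests on several assumptions: the function names are hypothetical, the assignment of criteria to the 'theoretical' and 'associational' categories is an illustrative guess for which Table 4 remains the authoritative source, and the scores are invented placeholders, not the values reported by Cresswell et al. (2012).

```python
def direction(scores):
    """Summarise the sign of a set of scores without averaging them."""
    nonzero = [s for s in scores if s != 0]
    if not nonzero:
        return "equivocal"
    if all(s > 0 for s in nonzero):
        return "supports"
    if all(s < 0 for s in nonzero):
        return "contradicts"
    return "mixed"


def summarise_by_category(scores, category_of):
    """Group per-criterion scores (-5 to +5) by category of evidence."""
    grouped = {}
    for criterion, score in scores.items():
        if not -5 <= score <= 5:
            raise ValueError(f"score for {criterion!r} is outside -5..+5")
        grouped.setdefault(category_of[criterion], {})[criterion] = score
    return {
        category: {"scores": values, "direction": direction(values.values())}
        for category, values in grouped.items()
    }


# ASSUMED grouping of Hill's nine criteria; consult Table 4 for the real one.
category_of = {
    "experiment": "experimental",
    "coherence": "theoretical",
    "plausibility": "theoretical",
    "analogy": "theoretical",
    "strength": "associational",
    "consistency": "associational",
    "specificity": "associational",
    "temporality": "associational",
    "biological gradient": "associational",
}

# HYPOTHETICAL placeholder scores, chosen only to show a disparity
# between categories; they are not results from any study.
scores = {
    "experiment": 0,
    "coherence": 3, "plausibility": 4, "analogy": 2,
    "strength": -3, "consistency": -2, "specificity": -1,
    "temporality": -3, "biological gradient": 0,
}

summary = summarise_by_category(scores, category_of)
for category, result in summary.items():
    print(category, result["direction"], result["scores"])
```

The sketch deliberately stops at tabulation. It neither weights nor averages the criteria, so the final judgement remains with the evaluator, in keeping with Hill's refusal to make the evaluation algorithmic.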

The use of Hill’s criteria formalizes the evaluation of cause-consequence associations and applies a quantitative scoring method that makes the conclusions both transparent and repeatable. In the more than 40 years since their inception, and despite widespread use, no criterion has been abandoned and none added, which means that they provide a stable and well-established framework in which to process scientific evidence.

Table 4. The nine criteria established by Hill (1965), each with a brief description.
