2.2. Probability of pathogen detection in a honey bee colony

For diagnosis or surveys of pathogen prevalence, the more bees that are sampled, the higher the probability of detecting a pathogen, which is particularly important at low levels of infection. An insufficient sample size can lead to a false negative result (apparent absence of a pathogen when it is actually present but at low prevalence). Historically, 20-30 bees per colony have been suggested as an adequate sample size (Doull and Cellier, 1961) when the experimental unit is a colony. However, based on binomial probability theory, such small samples will detect a 5% true prevalence in an infected colony with a probability of only about 64% (20 bees) or 79% (30 bees). If only high infection prevalence is of interest, then small sample sizes may be acceptable, as long as other sampling issues (such as representativeness, see above) have been adequately handled.
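The binomial reasoning above can be checked with a short calculation. The sketch below assumes simple random sampling from a large colony and perfect detection of every infected bee. The function name `detection_probability` is introduced here for illustration only.

```python
def detection_probability(n_bees: int, prevalence: float) -> float:
    """Probability that a random sample of n_bees contains at least one
    infected bee, assuming independent draws and perfect detection."""
    if n_bees < 0:
        raise ValueError("n_bees must be non-negative")
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie between 0 and 1")
    # P(no infected bee in the sample) = (1 - prevalence) ** n_bees
    return 1.0 - (1.0 - prevalence) ** n_bees


# Historical sample sizes of 20 and 30 bees at a true prevalence of 5%
for n in (20, 30):
    print(n, round(detection_probability(n, 0.05), 3))
```

For 20 and 30 bees at 5% prevalence this gives roughly 0.64 and 0.79, in line with the rounded percentages quoted in the text.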

In general, sample size should be based on the objectives of the study and a specified level of precision (Fries et al., 1984; Table 2). If the objective is to detect a prevalence of 5% or more (5% of bees infected) with 95% probability, then a sample of 59 bees per colony is needed. If the objective is to detect prevalence as low as 1% with 99% probability, then 459 bees per colony are required. Table 2 lists the sample sizes (number of bees) needed to meet such requirements, provided that every infected bee is detected with 100% efficiency. If detection efficiency is less than 100%, this is equivalent (for sample size determination) to trying to detect a lower prevalence. For example, if the diagnostic test identifies only 80% of the bees actually carrying a pathogen as positive, then the parameter P below must be multiplied by the proportion of true positives that are detected (i.e. use 0.8*P instead of P). The sample size needed for various probability requirements and infection levels can be calculated from Equation I (equations adapted from Colton, 1974).

Equation I.: N = ln(1-D) / ln(1-P)

where

N = sample size (number of bees)

ln = the natural logarithm

D = the probability (power) of detection in the colony (i.e. 1-β, where β is the probability of a false negative result)

P = minimal proportion of infected bees (infection prevalence) that can be detected with the required power D by a random sample of N bees (e.g. detect an infection rate of 5% or more).
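As an illustration, Equation I, together with the adjustment for imperfect detection described earlier, can be sketched as follows. The function name `sample_size` and the `sensitivity` parameter are assumptions introduced for this example. Rounding up to the next whole bee is also an assumption: for P = 0.25, D = 0.99 and for P = 0.01, D = 0.95 the exact values are about 16.01 and 298.07, so rounding up yields 17 and 299 where Table 2 lists 16 and 298.

```python
import math


def sample_size(prevalence: float, power: float, sensitivity: float = 1.0) -> int:
    """Equation I: N = ln(1 - D) / ln(1 - P), rounded up to a whole bee.

    prevalence  -- P, minimal proportion of infected bees to be detected
    power       -- D, required probability of detection
    sensitivity -- assumed proportion of truly infected bees that the
                   diagnostic test flags as positive (1.0 = perfect test)
    """
    if not 0.0 < power < 1.0:
        raise ValueError("power must lie strictly between 0 and 1")
    if not 0.0 < sensitivity <= 1.0:
        raise ValueError("sensitivity must lie in (0, 1]")
    # Imperfect detection is treated as detecting a lower prevalence.
    effective_p = prevalence * sensitivity
    if not 0.0 < effective_p < 1.0:
        raise ValueError("effective prevalence must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - power) / math.log(1.0 - effective_p))


print(sample_size(0.05, 0.95))                   # 59
print(sample_size(0.01, 0.99))                   # 459
print(sample_size(0.05, 0.95, sensitivity=0.8))  # 74
```

With a test that flags only 80% of infected bees, detecting 5% prevalence with 95% probability requires 74 bees rather than 59 under these assumptions.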

Because the prevalence of many pathogens varies over space and time (Bailey et al., 1981; Bailey and Ball, 1991; Higes et al., 2008; Runckel et al., 2011), it is important, prior to sampling, to specify the minimum prevalence (P) that needs to be detected and the power (D). Colony-to-colony (and apiary-to-apiary) heterogeneity exists and needs to be taken into consideration in sampling designs. For example, a large French virus survey in 2002 (Tentcheva et al., 2004; Gauthier et al., 2007) showed that for nearly all virus infections there were considerable differences among colonies in an apiary. This suggests that pooling colonies is a poor strategy for understanding the distribution of disease in an apiary. It also suggests that sample size should be sufficient to detect low pathogen prevalence, because the probability of finding no infected bees in a small sample is high if the pathogen prevalence is low, as it may be in some colonies. For a colony with low pathogen prevalence, one might falsely conclude that the hive is pathogen-free because the power (D) to detect the pathogen was too low.

For Nosema spp. infection in adult bees, the infection intensity (spores per bee) as well as prevalence may change rapidly, particularly in the spring, when young bees rapidly replace older nest mates. To understand such temporal effects on infection intensity or prevalence, sample size must be adequate at each sampling period to detect the desired degree of change (i.e. larger samples are necessary to detect smaller changes). Note that sampling to detect a change in prevalence requires a different mathematical model than simple sampling for prevalence because of the uncertainty associated with each prevalence estimate at the different sampling periods. For a binomial distribution, the variance of a prevalence estimate is inversely proportional to the sample size (n1, n2, n3, ...), and the variance of the difference between two independent estimates is the sum of their variances, i.e. roughly twice the variance of each individual estimate when sample sizes and prevalences are similar. As a rule of thumb, doubling the sample size for each period’s sample should therefore roughly offset the increased uncertainty when taking the difference of two prevalence estimates. For determining prevalence, limitations due to laboratory capacity are obviously a concern if only low levels of false negative results can be accepted.
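The rule of thumb about doubling the sample size can be illustrated numerically. The sketch below assumes two independent simple random samples and uses the usual binomial variance p(1-p)/n for an estimated prevalence. The function names and the example values (10% prevalence, 60 bees) are hypothetical.

```python
import math


def se_prevalence(p: float, n: int) -> float:
    """Standard error of a prevalence estimate from n bees (binomial model)."""
    return math.sqrt(p * (1.0 - p) / n)


def se_difference(p1: float, n1: int, p2: float, n2: int) -> float:
    """Standard error of the difference between two independent prevalence
    estimates: the variances of the two estimates add."""
    return math.sqrt(p1 * (1.0 - p1) / n1 + p2 * (1.0 - p2) / n2)


p, n = 0.10, 60  # hypothetical prevalence and per-period sample size

single = se_prevalence(p, n)                    # one sampling period
diff_same_n = se_difference(p, n, p, n)         # difference, n bees per period
diff_double_n = se_difference(p, 2 * n, p, 2 * n)  # difference, 2n bees per period

print(f"single estimate, n={n}:     {single:.4f}")
print(f"difference, n={n} each:     {diff_same_n:.4f}")
print(f"difference, n={2 * n} each: {diff_double_n:.4f}")
```

Under these assumptions, the difference based on 2n bees per period has the same standard error as a single estimate based on n bees, which is the sense in which doubling the sample size offsets the added uncertainty.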

Equation I gives the sample size needed to find a pre-determined infection level (P) with a specified probability level (D) in a sample or, in the case of honey bees, in individual colonies. If we want to monitor a population of colonies and describe their health status, or the prevalence in this population, we first have to decide what precision we want to achieve for detection within colonies. For composite samples for Nosema spp. or virus detection, in which many bees from the same colony are pooled (one yes/no or value per colony), this is not a major concern because we can easily increase the power by adding more bees to the pool to be examined. For situations in which individual honey bees from a colony are examined to determine prevalence in that colony, we may not want to increase the power because of the labour involved. But if the objective is to describe prevalence in a population of honey bee colonies, not in the individual colony, the estimates can still have poor precision if we do not increase the number of colonies sampled. There can therefore be a trade-off between costs, in terms of labour and finance, and the precision of the prevalence estimate for each individual colony. However, a decrease in power at the individual colony level can be compensated for by an increase in the number of colonies sampled. The more expensive, or labour intensive, the method for diagnosis of the pathogen is, the more cost effective it becomes to lower the precision of the prevalence estimate for each individual colony and instead increase the number of colonies sampled.

Table 2. Example of sample sizes needed to detect different infection levels with different levels of probability (from Equation I.).

Proportion of infected bees, P    Required probability of detection, D    Sample size needed, N
0.25                              0.95                                    11
0.25                              0.99                                    16
0.10                              0.95                                    29
0.10                              0.99                                    44
0.05                              0.95                                    59
0.05                              0.99                                    90
0.01                              0.95                                    298
0.01                              0.99                                    459

2.2.3. Extrapolating from sample to colony