# 10.3.2. Loss calculations and confidence intervals

(1) For loss rates, the quantities of interest are not the raw numbers of colonies kept and colonies lost that are used in the calculation, but quantities derived from them. The overall loss rate is the total number of colonies lost in the sample of beekeepers divided by the total number of colonies at risk of loss in the sample. (VanEngelsdorp *et al.* (2013) refer to this as "total loss". As this suggests to us the total number of colonies lost rather than any kind of rate or proportion, we prefer the terms overall loss rate or overall proportion of colonies lost.) Adjustments can be made to this calculation to take account of colony management (VanEngelsdorp *et al.*, 2012). The overall loss rate is influenced disproportionately by the larger beekeepers, who are fewer in number. Because the overall loss rate is a proportion, confidence intervals for a proportion are appropriate, and there are several ways to calculate them.
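
As a minimal sketch of the calculation described above, the following Python code computes the overall loss rate and one possible confidence interval for a proportion. The data are hypothetical, the function names are illustrative, and the Wilson score interval is only one of the several available methods; like any interval based on the binomial distribution, it treats colonies as independent.

```python
import math
from statistics import NormalDist


def overall_loss_rate(lost, at_risk):
    """Total colonies lost divided by total colonies at risk in the sample."""
    return sum(lost) / sum(at_risk)


def wilson_interval(n_lost, n_at_risk, level=0.95):
    """Wilson score interval for a binomial proportion (one choice among several)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = n_lost / n_at_risk
    denom = 1 + z**2 / n_at_risk
    centre = (p + z**2 / (2 * n_at_risk)) / denom
    half = z * math.sqrt(p * (1 - p) / n_at_risk
                         + z**2 / (4 * n_at_risk**2)) / denom
    return centre - half, centre + half


# Hypothetical sample: one entry per beekeeper.
at_risk = [5, 8, 3, 12, 250, 40]   # colonies at risk of loss
lost = [0, 2, 3, 1, 30, 6]         # colonies lost

rate = overall_loss_rate(lost, at_risk)
low, high = wilson_interval(sum(lost), sum(at_risk))
print(f"overall loss rate = {rate:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

In this hypothetical sample the beekeeper with 250 colonies contributes most of the colonies at risk, so the overall loss rate lies close to that beekeeper's own loss rate.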

Alternatively, the average loss rate is the average of the individual loss rates (number of colonies lost divided by number of colonies at risk) experienced by the different beekeepers in the sample. Using this approach, confidence intervals should be those for an average, not a proportion. However, a difficulty with the average loss rate is that the loss rates experienced by beekeepers with different sizes of operation are not equally variable, yet they are weighted equally in the calculation of this average. While loss rates can only range between 0 and 1 (0 to 100%), larger-scale beekeepers have many more colonies which can be lost and can experience a much larger set of possible loss rates within this range; therefore, their loss rates are subject to greater variation. Also, there are many ties in the individual loss rates, for example due to the large number of beekeepers with no losses, and the median individual loss rate could well be zero. For this reason, the use of medians and Kruskal-Wallis tests to compare loss rates should be avoided. The average individual loss rate is often higher than the overall loss rate, owing to the larger number of small-scale beekeepers present in many populations of beekeepers, who can suffer extreme individual loss rates. Owing to these various difficulties, we recommend use of the overall loss rate.
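
The contrast between the two measures can be illustrated with a small sketch in Python. The data are hypothetical and chosen to mimic a sample with many small-scale beekeepers, several of whom report no losses. The interval for the average uses a normal approximation, which is an assumption made here for simplicity; with so few beekeepers a t quantile would be the more usual choice.

```python
import math
from statistics import NormalDist, mean, median, stdev

# Hypothetical sample: seven small-scale and two large-scale beekeepers.
at_risk = [2, 3, 4, 5, 6, 8, 10, 150, 300]   # colonies at risk of loss
lost = [2, 0, 0, 3, 0, 0, 0, 15, 24]         # colonies lost

# Individual loss rate for each beekeeper.
individual = [l / n for l, n in zip(lost, at_risk)]

overall = sum(lost) / sum(at_risk)   # weights each colony equally
average = mean(individual)           # weights each beekeeper equally
med = median(individual)             # zero when most beekeepers lose nothing

# Confidence interval for an average (not for a proportion).
# Note that this interval is not constrained to lie within [0, 1].
z = NormalDist().inv_cdf(0.975)
se = stdev(individual) / math.sqrt(len(individual))
ci_average = (average - z * se, average + z * se)

print(f"overall loss rate = {overall:.3f}")
print(f"average loss rate = {average:.3f}")
print(f"median individual loss rate = {med:.3f}")
```

In this hypothetical sample the two extreme loss rates among the small-scale beekeepers pull the average well above the overall loss rate, while the ties at zero make the median uninformative.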

(2) Another difficulty is that the usual procedure to calculate standard errors and confidence intervals for the overall loss rate (the proportion of colonies lost) is based on the binomial distribution, as the number of losses is limited by the number of colonies at risk. This assumes that each bee colony is lost or not independently of any other colony, and also that the probability of loss is the same for all colonies. Neither assumption is likely to hold. Within apiaries, whether or not a colony is lost is likely to depend on whether or not neighbouring colonies are lost. Furthermore, the probabilities of losing a colony are likely to differ between beekeepers. One way to account for this extra source of variation is to model the data using a generalisation of the binomial distribution. There are different ways to do this. One approach fits a generalised linear model with a quasi-binomial distribution and a logit link function, and derives a confidence interval for the overall loss rate from the standard error of the estimated intercept in an intercept-only model (see VanEngelsdorp *et al.* (2012) and below).
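
For an intercept-only model with a logit link, the fitted intercept is the logit of the overall loss rate, so the interval can be sketched in closed form without a model-fitting library. The Python code below is an illustration under stated assumptions rather than the exact procedure of the cited authors: the data and function name are hypothetical, the dispersion is estimated as the Pearson chi-square statistic divided by the residual degrees of freedom, and a normal quantile is used where a t quantile could be substituted.

```python
import math
from statistics import NormalDist


def quasibinomial_interval(lost, at_risk, level=0.95):
    """CI for the overall loss rate from an intercept-only quasi-binomial logit model."""
    k = len(at_risk)                  # number of beekeepers
    total = sum(at_risk)
    p = sum(lost) / total             # fitted probability = overall loss rate
    if not 0 < p < 1:
        raise ValueError("overall loss rate must lie strictly between 0 and 1")

    # Dispersion: Pearson chi-square / residual degrees of freedom (assumed estimator).
    pearson = sum((y - n * p) ** 2 / (n * p * (1 - p))
                  for y, n in zip(lost, at_risk))
    phi = pearson / (k - 1)

    # Intercept on the logit scale and its dispersion-scaled standard error.
    intercept = math.log(p / (1 - p))
    se = math.sqrt(phi / (total * p * (1 - p)))

    # Build the interval on the logit scale, then back-transform to a proportion.
    z = NormalDist().inv_cdf(0.5 + level / 2)

    def expit(x):
        return 1 / (1 + math.exp(-x))

    return p, phi, (expit(intercept - z * se), expit(intercept + z * se))


# Hypothetical sample: one entry per beekeeper.
at_risk = [5, 8, 3, 12, 250, 40]
lost = [0, 2, 3, 1, 30, 6]

p, phi, (low, high) = quasibinomial_interval(lost, at_risk)
print(f"overall loss rate = {p:.3f}, dispersion = {phi:.2f}, "
      f"95% CI = ({low:.3f}, {high:.3f})")
```

An estimated dispersion above 1 indicates more variation between beekeepers than the binomial distribution allows, and the standard error is inflated by the square root of the dispersion, which widens the interval accordingly.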

(3) When formulae based on parametric models are felt to be inappropriate, another way to calculate confidence intervals is the nonparametric bootstrap, which is based on resampling the data (Efron and Tibshirani, 1994). This avoids the need to specify any particular model for the data, and is easy to implement in a software package such as R.
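
A minimal sketch of a percentile bootstrap interval for the overall loss rate is given below in Python. The data and function name are hypothetical, and the sketch assumes that the beekeeper is the resampling unit, so that each beekeeper's colonies are kept together. Other bootstrap intervals could be used in place of the simple percentile interval.

```python
import random


def bootstrap_interval(lost, at_risk, n_boot=2000, level=0.95, seed=1):
    """Percentile bootstrap CI for the overall loss rate, resampling beekeepers."""
    rng = random.Random(seed)          # fixed seed so the result is reproducible
    pairs = list(zip(lost, at_risk))   # one (lost, at risk) pair per beekeeper
    k = len(pairs)

    rates = []
    for _ in range(n_boot):
        # Draw k beekeepers with replacement and recompute the overall loss rate.
        sample = [pairs[rng.randrange(k)] for _ in range(k)]
        total_lost = sum(l for l, n in sample)
        total_at_risk = sum(n for l, n in sample)
        rates.append(total_lost / total_at_risk)

    # Take the lower and upper percentiles of the bootstrap distribution.
    rates.sort()
    alpha = 1 - level
    lower = rates[int(n_boot * alpha / 2)]
    upper = rates[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


# Hypothetical sample: one entry per beekeeper.
at_risk = [5, 8, 3, 12, 250, 40]
lost = [0, 2, 3, 1, 30, 6]

rate = sum(lost) / sum(at_risk)
low, high = bootstrap_interval(lost, at_risk)
print(f"overall loss rate = {rate:.3f}, "
      f"95% bootstrap CI = ({low:.3f}, {high:.3f})")
```

With so few beekeepers the bootstrap distribution is coarse; a real survey sample would contain many more beekeepers, and more resamples could be drawn by increasing `n_boot`.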