10.4.5. Example of advanced analysis

The analysis below uses the Dutch data collected with the full 2011 COLOSS questionnaire as an example of how to estimate overall loss rates, calculate confidence intervals and fit GZLMs. It uses the quasi-binomial family of GZLMs to account for any extra-binomial variation in the data. It is a simple illustration of how models with factors and covariates can be fitted in R, not a procedure for determining a best-fitting model. Guidance on model building may be found, for example, in Dobson (2002) and Zuur et al. (2009).

The data were “cleaned” prior to use to remove some inconsistent values. The “glm” procedure in R is sensitive to invalid values in the data, and will generate error messages rather than omit the cases with invalid data values, so it is best to deal with these before attempting model fitting (or any other kind of analysis). The analysis below uses the variable ColOct10, the number of colonies kept at 1st October 2010, and the variable Loss1011, the stated number of colonies lost over winter 2010/2011, rather than the calculated population at risk or the calculated number of colonies lost. Even so, in one case Loss1011 was missing, and in six other cases Loss1011 was greater than ColOct10, giving negative values of a new calculated variable, NotLost, the number of colonies surviving. In some of these cases, though not all, this was due to winter management that increased or decreased the number of colonies. These few cases were also removed before carrying out the analysis shown below.
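The cleaning step described above can be sketched in R as follows. This is a minimal illustration only: the data frame name `bees` and its values are invented, and only the variable names ColOct10, Loss1011 and NotLost are taken from the text.

```r
# Toy data with invented values; only the column names follow the text.
bees <- data.frame(
  ColOct10 = c(10, 5, 8, 12, 3, 20),
  Loss1011 = c(2, NA, 9, 0, 1, 4)
)

# Colonies surviving winter; negative when stated losses exceed colonies kept.
bees$NotLost <- bees$ColOct10 - bees$Loss1011

# Flag cases with missing counts or with more colonies lost than kept.
invalid <- is.na(bees$ColOct10) | is.na(bees$Loss1011) | bees$NotLost < 0

# Keep only the consistent cases and record how many were dropped.
bees_clean <- bees[!invalid, ]
n_removed  <- sum(invalid)
```

In this toy example two cases are flagged, one with a missing Loss1011 and one where Loss1011 exceeds ColOct10. Inspecting the flagged cases before discarding them helps to distinguish reporting errors from genuine effects of winter management.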

The analysis does not show all the options available for the “glm” procedure; several diagnostic plots, for example, are also available.

a)  Calculation of overall loss rate and confidence interval from a null model (Boxes 12-14).

box12

box13

box14
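As a hedged illustration of the kind of calculation described in (a), the sketch below fits an intercept-only quasi-binomial model to invented data and back-transforms the intercept to give an overall loss rate with a 95% confidence interval. The data frame `bees`, its values and the use of a t quantile for the interval are assumptions made for this sketch, not necessarily the choices made in the boxes.

```r
# Toy data with invented values; ColOct10 and Loss1011 as named in the text.
bees <- data.frame(
  ColOct10 = c(10, 12, 8, 20, 6, 15, 9, 30),
  Loss1011 = c(2, 0, 5, 3, 1, 6, 0, 4)
)
bees$NotLost <- bees$ColOct10 - bees$Loss1011

# Intercept-only (null) quasi-binomial model; response is (lost, not lost).
null_fit <- glm(cbind(Loss1011, NotLost) ~ 1,
                family = quasibinomial(link = "logit"), data = bees)

# Estimated dispersion; values well above 1 suggest extra-binomial variation.
dispersion <- summary(null_fit)$dispersion

# Intercept and standard error on the logit scale
# (the standard error already incorporates the dispersion estimate).
est <- coef(summary(null_fit))["(Intercept)", "Estimate"]
se  <- coef(summary(null_fit))["(Intercept)", "Std. Error"]

# 95% interval on the logit scale (t quantile is an assumption), back-transformed.
tcrit     <- qt(0.975, df = df.residual(null_fit))
loss_rate <- plogis(est)
ci        <- plogis(est + c(-1, 1) * tcrit * se)

# Overall loss rate and interval limits as percentages.
round(100 * c(rate = loss_rate, lower = ci[1], upper = ci[2]), 1)
```

For a null model the back-transformed intercept equals the total number of colonies lost divided by the total number of colonies kept, so the model adds only the dispersion-adjusted interval. Calculating the interval on the logit scale keeps both limits between 0 and 1.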

b) Fitting a GZLM with an explanatory term.

The second step in model building is the use of explanatory variables. Explanation of the methods for evaluating model fit and determining optimal models is outside the scope of this document. For this example analysis, the variable Region is used. The region variable is one that is largely outside of the beekeeper’s control, rather like pesticide use by farmers, yet for various reasons may be associated with the loss rate. In some countries, region may be a substitute for meteorological variables. Boxes 15 to 18 and Fig. 3 show the analysis.

box15

box16

box17

box18
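A model with Region as a single explanatory factor, as described in (b), could be fitted along the following lines. This is a sketch under stated assumptions: the data values and region labels are invented, and the F test and t-based intervals are one reasonable choice for a quasi-binomial fit rather than a record of what the boxes contain.

```r
# Toy data with invented values; the Region labels are hypothetical.
bees <- data.frame(
  Region   = factor(rep(c("North", "South", "East"), each = 4)),
  ColOct10 = c(10, 12, 8, 20, 6, 15, 9, 30, 7, 11, 14, 5),
  Loss1011 = c(2, 0, 5, 3, 1, 6, 0, 4, 3, 2, 6, 1)
)
bees$NotLost <- bees$ColOct10 - bees$Loss1011

# Quasi-binomial GZLM with Region as the only explanatory term.
region_fit <- glm(cbind(Loss1011, NotLost) ~ Region,
                  family = quasibinomial(link = "logit"), data = bees)

# F test for the Region term, suitable when the dispersion is estimated.
region_test <- anova(region_fit, test = "F")

# Predicted logit and standard error for each region.
new_regions <- data.frame(
  Region = factor(levels(bees$Region), levels = levels(bees$Region))
)
pred <- predict(region_fit, newdata = new_regions,
                type = "link", se.fit = TRUE)

# Back-transform to probabilities with 95% limits (t quantile is an assumption).
tcrit <- qt(0.975, df = df.residual(region_fit))
region_est <- data.frame(
  Region = new_regions$Region,
  rate   = plogis(pred$fit),
  lower  = plogis(pred$fit - tcrit * pred$se.fit),
  upper  = plogis(pred$fit + tcrit * pred$se.fit)
)
region_est
```

The resulting table of estimated loss probabilities and interval limits per region is the kind of summary that can then be plotted, one point and interval per region.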

 
Fig. 3. Estimated probability of loss and 95% confidence interval per region.