5.2.1. General advice for using GLMMs

If the response variable to be measured (i.e. the phenotype of interest that may change with treatment) is a quantitative or a qualitative (e.g. diseased vs. not diseased) trait and the experiment is hierarchical (e.g. bees in cages, cages from colonies, colonies from locations), is repeated over years, or includes some other random effects, then a generalised linear mixed model (GLMM; as provided in statistical software such as R, Minitab, or SAS) can be used to analyse the results. The treatment (e.g. control, Nosema, black queen cell virus) is a ‘fixed effect’ parameter (Crawley, 2005; Bolker et al., 2009). Several fixed and random effect parameters can be estimated in the same statistical model. The distinction between a fixed and a random effect can be difficult to make because it can be highly context-dependent, but in most experiments it should be obvious. To help clarify the distinction, Crawley (2013) suggests that fixed effects influence the mean of the response variable, whereas random effects influence its variance or correlation structure, or represent a restriction on randomisation (e.g. a block effect). Fixed effects include treatment, caste, wet vs. dry, light vs. shade, high vs. low, etc.; i.e. treatments imposed by the researcher or inherent characteristics of the subjects (e.g. age). Random effects include cage, colony, apiary, region, genotype (if genotypes were sampled at random, but not if the design was to compare two or more specific genotypes), block within a field, plot, and subject measured repeatedly.
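As an illustration only, the hierarchical example above (bees in cages, cages from colonies) with a single treatment factor could be written as the following model. The symbols are assumptions introduced here, not notation from the cited sources: \mu_{ijk} is the expected response of bee i in cage j from colony k, conditional on the random effects; g is the link function; x_{jk} is an indicator for the treatment applied to cage j of colony k; \beta_0 and \beta_1 are fixed effects; u_k and v_{jk} are the random effects of colony and of cage within colony.

```latex
g(\mu_{ijk}) = \beta_0 + \beta_1 x_{jk} + u_k + v_{jk}, \qquad
u_k \sim \mathcal{N}(0, \sigma^2_{\mathrm{colony}}), \quad
v_{jk} \sim \mathcal{N}(0, \sigma^2_{\mathrm{cage}})
```

In this sketch the fixed effects shift the mean of the response, whereas the random effects enter only through their variances, which induce correlation among bees sharing a cage or a colony.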


The experimenter must consider the structure of the GLMM by addressing two questions, as follows:

  • Which underlying distribution?
    - Gaussian, useful for data where one expects residuals to follow a ‘normal’ distribution
    - Poisson, useful for count data (e.g. number of mites per bee)
    - Binomial, useful for data on proportions based on counts (y out of n) or binary data
    - Gamma, useful for data showing a constant coefficient of variation
  • What link function to use?
    The link function maps the expected values of the data, conditioned on the random effects, to the linear predictor. Again, this means that the linear predictor and the data reside on different scales. Canonical link functions are the most commonly used link functions associated with each ‘family’ of distributions (Table 4). The term “canonical” refers to the form taken by one of the parameters in the mathematical definition of each distribution.
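To make the idea of two scales concrete, the following minimal Python sketch pairs each of the four distributions listed above with its canonical link (identity, log, logit and inverse, respectively) and converts values between the response scale and the linear-predictor scale. The function and variable names are hypothetical and chosen for illustration; they are not taken from any statistical package.

```python
import math

# Canonical link g(mu) and its inverse for four common families.
# Each entry maps a family name to (link, inverse_link).
CANONICAL_LINKS = {
    "gaussian": (lambda mu: mu, lambda eta: eta),                     # identity
    "poisson": (lambda mu: math.log(mu), lambda eta: math.exp(eta)),  # log
    "binomial": (lambda mu: math.log(mu / (1.0 - mu)),
                 lambda eta: 1.0 / (1.0 + math.exp(-eta))),           # logit
    "gamma": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),            # inverse
}


def to_linear_predictor(family, mu):
    """Map an expected value on the response scale to the linear predictor."""
    link, _ = CANONICAL_LINKS[family]
    return link(mu)


def to_response_scale(family, eta):
    """Map a linear-predictor value back to the response scale."""
    _, inverse_link = CANONICAL_LINKS[family]
    return inverse_link(eta)


# Hypothetical example: an expected infection proportion of 0.25
# corresponds to log(0.25 / 0.75) on the logit scale.
eta = to_linear_predictor("binomial", 0.25)
p = to_response_scale("binomial", eta)  # back-transforms to 0.25
```

Model coefficients are estimated on the linear-predictor scale, so estimates and confidence limits generally need to be back-transformed with the inverse link before they are reported on the scale of the data.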

If two or more experimental cages used in the same treatment group are drawn from the same colony of honey bees (Table 5), then ‘source colony’ should also be included in the GLMM as a random effect parameter, as described above. This random effect accounts for the hierarchical experimental design whereby, for the same treatment level, the variation between two cages of honey bees drawn from the same colony may not be the same as the variation between two cages drawn from two separate colonies. This statistical approach can account for the problem of pseudo-replication in the experimental design.
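One consequence of this hierarchy can be sketched numerically. Assuming a simple two-level model on the linear-predictor scale (a colony effect plus a cage-within-colony effect, with balanced numbers of cages), the variance of a treatment-group mean, taken as an estimate of the mean across the population of colonies, is var_colony / k + var_cage / (k * m) for k colonies and m cages per colony. The variance components below are hypothetical values chosen only for illustration.

```python
def variance_of_group_mean(var_colony, var_cage, n_colonies, cages_per_colony):
    """Variance of a treatment-group mean of cage-level responses under a
    balanced two-level nested model (colony effect + cage-within-colony effect)."""
    if n_colonies < 1 or cages_per_colony < 1:
        raise ValueError("need at least one colony and one cage per colony")
    return var_colony / n_colonies + var_cage / (n_colonies * cages_per_colony)


# Hypothetical variance components (not estimates from real data).
VAR_COLONY = 1.0
VAR_CAGE = 1.0

# Nine cages in total under each design.
nine_colonies = variance_of_group_mean(VAR_COLONY, VAR_CAGE, 9, 1)  # 1/9 + 1/9
one_colony = variance_of_group_mean(VAR_COLONY, VAR_CAGE, 1, 9)     # 1 + 1/9
```

With the same total number of cages, the between-colony component is divided by the number of colonies only, so adding cages within a colony cannot reduce it.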

Finally, if the factors ‘cage’ and ‘source colony’ are not significant, the experimenter may be tempted to treat individual bees from the same cage as independent samples, i.e. to ignore ‘cage’. However, individual bees drawn from the same cage might not truly be independent samples, and treating individual bees as individual replicates would therefore inflate the degrees of freedom. Because there are currently no good tests to determine whether a random effect is ‘significant’, we suggest retaining any random effects that place restrictions on randomisation (cage and source colony are two such examples), even if their variance estimates are small. This point requires further attention by statisticians. The experimenter should consider using a nested experimental design in which ‘individual bee’ is nested within a random effect, ‘cage’, as presented above (see section 5.).
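The cost of ignoring ‘cage’ can be illustrated with the standard design effect for equal-sized clusters, 1 + (m - 1) * ICC, where m is the number of bees per cage and ICC is the intraclass correlation among bees within a cage. The sketch below is a rough illustration, not a substitute for fitting the GLMM: it considers only the cage level (clustering by colony is ignored), and the ICC value is a hypothetical assumption.

```python
def design_effect(bees_per_cage, icc):
    """Variance inflation from treating clustered bees as independent."""
    if bees_per_cage < 1:
        raise ValueError("bees_per_cage must be at least 1")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return 1.0 + (bees_per_cage - 1) * icc


def effective_sample_size(n_cages, bees_per_cage, icc):
    """Number of independent observations the clustered sample is worth."""
    return n_cages * bees_per_cage / design_effect(bees_per_cage, icc)


# 18 cages of 30 bees (540 bees) with a hypothetical ICC of 0.1.
n_eff = effective_sample_size(18, 30, 0.1)  # 540 / 3.9, roughly 138 bees
```

Even a modest within-cage correlation therefore means that the 540 bees carry far less information than 540 independent observations would; with an ICC of 1 the effective sample size falls to the number of cages.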

Table 5. Experimental design for studying the impact of Nosema ceranae and black queen cell virus (BQCV) on caged honey bees. Notation represents individual cages (Treatment, Colony 1, Cage 1 = T1_1; and Control, Colony 1, Cage 1 = C1_1), each containing an equal number of honey bees (e.g. 30) exposed to the same conditions (except for the experimental treatment differences). Two replicate cages within treatments drawn from the same colony are displayed (T1_1 and T1_2), and more could be used (T1_3, T1_4, etc.). Additional control cages would then also be required. ‘Colony’ should be used as a random effect in such cases. However, it is statistically more powerful to maximise inter- as opposed to intra-colony replication; that is, to favour the use of replicate cages between colonies rather than repeated sets of cages per treatment drawn from the same colony. Thus we recommend one set of treatment and control cages per colony of source honey bees rather than repeated sets of cages per treatment and control drawn from a single colony, i.e. T1_1, T2_1, T3_1 … T9_1 and C1_1, C2_1, C3_1 … C9_1 would be a far superior design compared to T1_1, T1_2, T1_3 … T1_9 and C1_1, C1_2, C1_3 … C1_9.













Treatment          Colony 1    Colony 2    Colony 3    Colony 4    Colony 5    Colony 6    Colony 7    Colony 8    Colony 9
N. ceranae & BQCV  T1_1, T1_2  T2_1, T2_2  T3_1, T3_2  T4_1, T4_2  T5_1, T5_2  T6_1, T6_2  T7_1, T7_2  T8_1, T8_2  T9_1, T9_2



C1_1, C1_2