5.2.1. General advice for using GLMMs
If the response variable to be measured (i.e. the phenotype of interest that may change with treatment) is a quantitative or a qualitative (i.e. diseased/not diseased) trait and the experiment is hierarchical (e.g. bees in cages, cages from colonies, colonies from locations), repeated over years, or has some other random effects, then a generalised linear mixed model (GLMM; as provided in the statistical software R, Minitab, or SAS) can be used to analyse the results. The treatment (e.g. control, Nosema, black queen cell virus) is a ‘fixed effect’ parameter (Crawley, 2005; Bolker et al., 2009). Several fixed and random effect parameters can be estimated in the same statistical model. The distinction between a fixed and a random effect can be difficult to make because it can be highly context-dependent, but in most experiments it should be obvious. To help clarify the distinction, Crawley (2013) suggests that fixed effects influence the mean of the response variable, whereas random effects influence the variance or correlation structure of the response variable, or are restrictions on randomisation (e.g. a block effect). A list of fixed effects would include: treatment, caste, wet vs. dry, light vs. shade, high vs. low, etc., i.e. treatments imposed by the researcher or inherent characteristics of the subjects (e.g. age). A list of random effects would include: cage, colony, apiary, region, genotype (if genotypes were sampled at random, not if the design was to compare two or more specific genotypes), block within a field, plot, and subject measured repeatedly.
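As a sketch of how the two kinds of effect enter a model, a GLMM is commonly written in the general form below. The symbols (y for the response, X and Z for the fixed- and random-effect design matrices, β and b for the corresponding effects, g for the link function, G for the random-effect covariance) are standard textbook notation introduced here for illustration; they are not notation used elsewhere in this document.

```latex
g\bigl(\mathrm{E}[\mathbf{y} \mid \mathbf{b}]\bigr)
  = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{b},
\qquad
\mathbf{b} \sim \mathcal{N}(\mathbf{0}, \mathbf{G})
```

Fixed effects such as treatment enter through β and shift the mean of the response. Random effects such as cage or colony enter through b, and the model estimates their variance (G) rather than a separate coefficient for each cage or colony.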
The experimenter must consider the structure of the GLMM by addressing two questions, as follows:
1. Which underlying distribution?
   - Gaussian, useful for data where one expects residuals to follow a ‘normal’ distribution
   - Poisson, useful for count data (e.g. number of mites per bee)
   - Binomial, useful for data on proportions based on counts (y out of n) or binary data
   - Gamma, useful for data showing a constant coefficient of variation
2. What link function to use?
The link function maps the expected values of the data, conditioned on the random effects, to the linear predictor. Again, this means that the linear predictor and the data reside on different scales. Canonical link functions are the most commonly used link functions associated with each ‘family’ of distributions (Table 4). The term “canonical” refers to the form taken by one of the parameters in the mathematical definition of each distribution.
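To make the idea of different scales concrete, the following minimal Python sketch pairs each of the four families listed above with its usual textbook canonical link and the inverse of that link. The pairing (identity, log, logit, inverse) is standard, but it is an assumption that it matches Table 4 exactly; the names `CANONICAL_LINKS`, `to_linear_predictor` and `to_mean` are hypothetical helpers, not part of any statistics package.

```python
import math

# Canonical link g(mu) and its inverse for each family.
# Standard textbook pairing; assumed (not verified) to agree with Table 4.
CANONICAL_LINKS = {
    "gaussian": (lambda mu: mu, lambda eta: eta),                # identity
    "poisson": (math.log, math.exp),                             # log
    "binomial": (lambda mu: math.log(mu / (1.0 - mu)),           # logit
                 lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    "gamma": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),       # inverse
}


def to_linear_predictor(family, mu):
    """Map an expected value (data scale) to the linear-predictor scale."""
    link, _ = CANONICAL_LINKS[family]
    return link(mu)


def to_mean(family, eta):
    """Map a linear-predictor value back to the data scale."""
    _, inverse_link = CANONICAL_LINKS[family]
    return inverse_link(eta)


# An expected proportion of 0.2 diseased bees, expressed on the logit scale.
eta = to_linear_predictor("binomial", 0.2)
# Back-transforming recovers the proportion on the data scale.
proportion = to_mean("binomial", eta)
```

Estimates reported by a GLMM are on the linear-predictor scale, so they generally need back-transforming with the inverse link before they can be read as proportions, counts or means.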
If two or more experimental cages used in the same treatment group are drawn from the same colony of honey bees (Table 5), then ‘source colony’ should also be included in the GLMM as a random effect parameter, as described above. This random effect accounts for the hierarchical experimental design whereby, for the same treatment level, the variation between two cages of honey bees drawn from the same colony may not be the same as the variation between two cages drawn from two separate colonies. This statistical approach can account for the problem of pseudoreplication in the experimental design.
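The difference between within-colony and between-colony variation can be illustrated with a small simulation. This is a sketch under stated assumptions, not an analysis method: colony and cage effects are assumed to be independent, additive and normally distributed, and the standard deviations (1.0 and 0.5) are hypothetical values chosen only for illustration.

```python
import random
import statistics


def simulate_cage_differences(sd_colony, sd_cage, n_sims=20000, seed=1):
    """Return the variance of the difference between two cages that share a
    colony, and between two cages drawn from two separate colonies."""
    rng = random.Random(seed)
    same, separate = [], []
    for _ in range(n_sims):
        colony_a = rng.gauss(0.0, sd_colony)
        colony_b = rng.gauss(0.0, sd_colony)
        cage_1 = colony_a + rng.gauss(0.0, sd_cage)
        cage_2 = colony_a + rng.gauss(0.0, sd_cage)  # same colony as cage_1
        cage_3 = colony_b + rng.gauss(0.0, sd_cage)  # a different colony
        same.append(cage_1 - cage_2)
        separate.append(cage_1 - cage_3)
    return statistics.variance(same), statistics.variance(separate)


# Hypothetical standard deviations for the colony and cage effects.
var_same, var_separate = simulate_cage_differences(sd_colony=1.0, sd_cage=0.5)
```

Under these assumptions the colony effect cancels when both cages share a colony, so `var_same` should be close to 2 × 0.5² = 0.5, whereas `var_separate` should be close to 2 × (0.5² + 1.0²) = 2.5. Cages from the same colony therefore resemble each other more than cages from different colonies, which is the dependence that the ‘source colony’ random effect is meant to capture.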
Finally, if the factors ‘cage’ and ‘source colony’ are not significant, the experimenter may be tempted to treat individual bees from the same cage as independent samples, i.e. to ignore ‘cage’. However, individual bees drawn from the same cage might not truly be independent samples, and treating individual bees as individual replicates would therefore inflate the degrees of freedom. Because there are currently no good tests to determine if a random effect is ‘significant’, we suggest retaining any random effects that place restrictions on randomisation (cage and source colony are two such examples), even if variance estimates are small. This point requires further attention by statisticians. The experimenter should consider using a nested experimental design in which ‘individual bee’ is nested within a random effect, ‘cage’, as presented above (see section 5.).
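One rough way to see how much the degrees of freedom are inflated is the Kish design effect, which approximates the number of effectively independent observations when individuals within a group are correlated. This approximation is not taken from this document; it is introduced here only as an illustration. The function name `effective_sample_size` and the intraclass correlation of 0.1 are hypothetical.

```python
def effective_sample_size(n_cages, bees_per_cage, icc):
    """Approximate number of independent observations when bees within a
    cage share an intraclass correlation `icc` (Kish design effect)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    design_effect = 1.0 + (bees_per_cage - 1) * icc
    return n_cages * bees_per_cage / design_effect


# 18 cages of 30 bees each; the ICC value is a hypothetical illustration.
naive_n = 18 * 30  # count obtained by treating every bee as independent
n_eff = effective_sample_size(n_cages=18, bees_per_cage=30, icc=0.1)
```

With an intraclass correlation of 0 the effective sample size equals the number of bees, and with a correlation of 1 it falls to the number of cages. Even a modest correlation between cage mates leaves far fewer independent observations than the naive bee count suggests, which is why ‘cage’ should stay in the model.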
Table 5. Experimental design for studying the impact of Nosema ceranae and black queen cell virus (BQCV) on caged honey bees. †Notation represents individual cages (Treatment, Colony 1, Cage 1 = T1_1; and Control, Colony 1, Cage 1 = C1_1), each containing an equal number of honey bees (e.g. 30) exposed to the same conditions (except experimental treatment differences). Two replicate cages within treatments drawn from the same colony are displayed (T1_1 and T1_2), and more could be used (T1_3, T1_4, etc.). Additional control colonies would then also be required. ‘Colony’ should be used as a random effect in such cases. However, it is statistically more powerful to maximise inter- as opposed to intra-colony replication; that is, favour the use of replicate cages between colonies, rather than repeated sets of cages per treatment drawn from the same colony. Thus we recommend one set of treatment and control cages per colony of source honey bees rather than repeated sets of cages per treatment and control drawn from a single colony, i.e. T1_1, T2_1, T3_1 … T9_1 and C1_1, C2_1, C3_1 … C9_1 would be a far superior design compared to T1_1, T1_2, T1_3 … T1_9 and C1_1, C1_2, C1_3 … C1_9.
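| Treatment | Colony 1 | Colony 2 | Colony 3 | Colony 4 | Colony 5 | Colony 6 | Colony 7 | Colony 8 | Colony 9 |
|---|---|---|---|---|---|---|---|---|---|
| N. ceranae & BQCV | T1_1†, T1_2 | T2_1, T2_2 | T3_1, T3_2 | T4_1, T4_2 | T5_1, T5_2 | T6_1, T6_2 | T7_1, T7_2 | T8_1, T8_2 | T9_1, T9_2 |
| Control | C1_1, C1_2 | C2_1, C2_2 | C3_1, C3_2 | C4_1, C4_2 | C5_1, C5_2 | C6_1, C6_2 | C7_1, C7_2 | C8_1, C8_2 | C9_1, C9_2 |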
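The cage notation in Table 5 can be generated programmatically, which may help when setting up a data sheet that records treatment, colony and cage for every observation. The sketch below is illustrative only; the function `cage_labels` and the dictionary keys are hypothetical names, not part of any existing tool.

```python
def cage_labels(n_colonies, cages_per_colony):
    """Build labels such as T1_1 (treatment, colony 1, cage 1) and C1_1
    (control, colony 1, cage 1) for every colony and replicate cage."""
    design = []
    for prefix, treatment in (("T", "N. ceranae & BQCV"), ("C", "control")):
        for colony in range(1, n_colonies + 1):
            for cage in range(1, cages_per_colony + 1):
                design.append({
                    "label": f"{prefix}{colony}_{cage}",
                    "treatment": treatment,
                    "colony": colony,
                    "cage": cage,
                })
    return design


# Layout shown in Table 5: two replicate cages per treatment per colony.
table_5 = cage_labels(n_colonies=9, cages_per_colony=2)
# Layout recommended in the caption: one treatment and one control cage per colony.
recommended = cage_labels(n_colonies=9, cages_per_colony=1)
```

Keeping `colony` and `cage` as explicit columns alongside `treatment` means both can later be supplied to the model as random effects.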