5.1.1 Tests for normality and homogeneity of variances
The flow diagram in Fig. 6 gives a simple decision tree for choosing the right test; for more examples, see Table 6. Starting at the top, one has to make a decision based on the kind of data one has. If two variables are categorical, then a chi-square test could be applicable. When investigating the relationship between two continuous variables, a correlation will be suitable. If one wants to compare two or more groups and test whether they are different, one follows the pathway “difference”. The next question to answer is how many variables one wants to compare. Is it one variable (for example, the effect of a new varroa treatment on brood development in a honey bee colony), or is it the effect of varroa treatment and supplementary feeding on brood development? For the latter, one could conduct a two-way ANOVA or an even more complex model, depending on the actual data set. For the former, the next question would be “how many treatments?”; sticking with the example, does the experiment consist of two groups (control and treatment) or more (control and different dosages of the treatment)?
In both cases, the next decision is based on whether the data sets are independent or dependent. Relating back to the example, one could design the experiment so that some of the colonies are in the treatment group and some in the control group, in which case the groups are independent. However, one could equally compare colonies before and after the application of the varroa agent, in which case all colonies would be in both the before (control) and after (treatment) groups. In this case it is easy to see that the before state might affect the after state, i.e. that the two groups are not independent. A classical example of dependent data is weight loss in humans before and after the start of a diet; clearly, weight loss depends on starting weight.
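The pathway described above can be written down as a small function. The following Python sketch is only an illustration: the function name suggest_test and its arguments are hypothetical, and the tests named at the ends of the “difference” branch (t test, paired t test, one-way and repeated-measures ANOVA) are common textbook choices assumed here, not read off Fig. 6.

```python
def suggest_test(data_kind, n_factors=1, n_groups=2, independent=True):
    """Return a candidate test for the decision path described in the text.

    data_kind   : "categorical", "continuous_relationship" or "difference"
    n_factors   : number of explanatory variables (e.g. 2 for varroa
                  treatment plus supplementary feeding)
    n_groups    : number of treatment groups for a single factor
    independent : False if the same colonies are measured repeatedly
                  (e.g. before and after treatment)
    """
    if data_kind == "categorical":
        return "chi-square test"
    if data_kind == "continuous_relationship":
        return "correlation"
    if data_kind != "difference":
        raise ValueError(f"unknown data_kind: {data_kind!r}")

    # "difference" pathway: first ask how many variables are compared
    if n_factors >= 2:
        return "two-way ANOVA (or a more complex model)"

    # one variable: ask how many treatments, then whether groups are independent
    if n_groups < 2:
        raise ValueError("at least two groups are needed to test a difference")
    if n_groups == 2:
        # assumed leaf tests (not taken from Fig. 6)
        return "independent-samples t test" if independent else "paired t test"
    return "one-way ANOVA" if independent else "repeated-measures ANOVA"


# Example: the same colonies measured before and after a varroa treatment
print(suggest_test("difference", n_factors=1, n_groups=2, independent=False))
```

Any suggestion obtained this way is only a starting point; the assumptions of the suggested test still have to be checked against the actual data.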
To arrive at an informed decision about the extent of non-normality or heterogeneity of variances in your data, a critical first step is to plot your data: i) for correlational analyses, as in regression, use a scatterplot; ii) for ‘groups’ (e.g. levels of a treatment factor), use a histogram or box plot. A plot provides an immediate indication of your data’s distribution, especially whether variances are homogeneous. The next step is to test objectively for departures from normality and homoscedasticity. The Shapiro-Wilk W test, particularly for sample sizes < 50, or the Lilliefors test can be used to test for normality, and the Anderson-Darling test is of similar if not better value (Stephens, 1974). Similarly, for groups of data, Levene’s test tests the null hypothesis that different groups have equal variances. If the tests are significant, the assumption that a distribution is normal or that its variances are equal must be rejected, and either the data have to be transformed or non-parametric tests have to be conducted.
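To make the logic of Levene’s test concrete, the following Python sketch computes its W statistic using only the standard library. The function name levene_statistic and the example values are hypothetical. By default the sketch centres each group on its median (the Brown-Forsythe variant, an assumption made here for robustness); passing center=mean gives Levene’s original form. Under the null hypothesis of equal variances, W is compared with an F distribution with k - 1 and N - k degrees of freedom (k groups, N observations in total); obtaining the p-value is left to a statistics package.

```python
from statistics import mean, median


def levene_statistic(groups, center=median):
    """Return the W statistic of Levene's test for equal variances.

    groups : list of lists of numbers, one list per group
    center : function giving each group's centre (median or mean)
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")

    # absolute deviation of every observation from its group centre
    z = []
    for g in groups:
        c = center(g)
        z.append([abs(x - c) for x in g])

    z_means = [mean(zi) for zi in z]
    z_grand = mean(v for zi in z for v in zi)

    # between-group and within-group sums of squares of the deviations
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    if within == 0:
        raise ValueError("no within-group variation in the deviations")

    return (n_total - k) / (k - 1) * between / within


# Made-up brood measurements for two groups of colonies (illustration only)
control = [1, 2, 3, 4, 5]
treated = [1, 3, 5, 7, 9]
print(round(levene_statistic([control, treated]), 3))
```

A large W relative to the F distribution indicates that the spread of the groups differs, in which case a transformation or a non-parametric test should be considered, as described above.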
Fig. 6. A basic decision tree for selecting the appropriate statistical test.
Table 6. Guideline to statistical analyses in honey bee research, including examples/suggestions for tests and graphical representation. Blank fields indicate that a wide variety of options are possible, all of which have pros and cons.
| Subject | Variable | Short description | Fields of research where it is used | Synthetic representation | Measure of dispersion | Statistical test | Graphical representation | Notes |
|---|---|---|---|---|---|---|---|---|
| Honey bee | Morphometric variables (e.g. forewing angles) | Measures related to body size. Other data can be included here such as, for example, cuticular hydrocarbons | Taxonomic studies | Average | Standard deviation | Parametric tests such as ANOVA. Multivariate analysis such as PCA and DA | Bar charts for single variables, scatterplots for PC, DA | Please note that some morphometric data are ratios; consider possible deviations from normality |
| | Physiological parameters (e.g. concentration of a certain compound in the haemolymph) | Measures related to the functioning of honey bee systems | | Average | Standard deviation | | Bar charts or lines | |
| | Survival | | | Median | Range | Kaplan-Meier, Cox hazard | Bar charts or lines, scatterplots | |
| Pathogens (e.g. DWV, Nosema) | Prevalence | Proportion of infected individuals | Epidemiological studies | Average | Standard deviation can be used, but transformation is necessary due to non-normal distribution | Fisher's exact test or chi-square test, according to sample size | Bar charts, pie charts | |
| | Infection level | Number of pathogens (e.g. viral particles) | Epidemiological studies, studies on bee-parasite interaction | Average | | Parametric tests (e.g. t test/ANOVA) can be used after log transformation; otherwise non-parametric tests can be used (e.g. Mann-Whitney/Kruskal-Wallis) | | |
| Parasites (e.g. Varroa destructor) | Fertility | Proportion of reproducing females | Factors of tolerance, biology of parasites | Average | Range | Fisher's exact test or chi-square test, according to sample size | | |
| | Fecundity | Number of offspring per female | Factors of tolerance, biology of parasites | Average | Standard deviation | | | |