5.1.1 Tests for normality and homogeneity of variances

The flow diagram in Fig. 6 gives a simple decision tree for choosing the right test; for more examples, see Table 6. Starting at the top, one first decides what kind of data one has. If both variables are categorical, a chi-square test could be applicable. When investigating the relationship between two continuous variables, a correlation is suitable. If one wants to compare two or more groups and test whether they differ, one follows the pathway “difference”. The next question is how many variables one wants to compare. Is it one variable (for example, the effect of a new varroa treatment on brood development in a honey bee colony), or two (the effect of varroa treatment and supplementary feeding on brood development)? For the latter, one could conduct a 2-way ANOVA or an even more complex model, depending on the actual data set. For the former, the next question is “how many treatments?”; sticking with the example, does the experiment consist of two groups (control and treatment) or more (control and different dosages of the treatment)?

In both cases, the next decision depends on whether the data sets are independent or dependent. Relating back to the example, one could design the experiment so that some colonies are in the treatment group and others in the control group, in which case the groups are independent. However, one could equally compare colonies before and after application of the varroa agent, in which case all colonies would be in both the before (control) and the after (treatment) group. Here it is easy to see that the before state might affect the after state, i.e. that the two groups are not independent. A classical example of dependent data is weight loss in humans before and after the start of a diet; clearly, weight loss depends on starting weight.
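The decision path described above can be written down as a small function, which may help when checking a planned design against it. This is only a sketch: the function name, its arguments and the goal labels are hypothetical, and the tests named at the leaves of the “difference” pathway (t-tests and ANOVA variants) are conventional textbook choices assumed here, not read from Fig. 6.

```python
def choose_analysis(goal, n_factors=1, n_groups=2, paired=False):
    """Suggest an analysis following the decision path in the text.

    goal: "association_categorical", "association_continuous" or "difference"
    (hypothetical labels introduced for this sketch).
    """
    if goal == "association_categorical":
        return "chi-square test"
    if goal == "association_continuous":
        return "correlation"
    if goal != "difference":
        raise ValueError(f"unknown goal: {goal!r}")
    if n_groups < 2:
        raise ValueError("a comparison needs at least two groups")

    # Two or more variables (factors), e.g. varroa treatment and feeding.
    if n_factors >= 2:
        return "2-way ANOVA or a more complex model"

    # One variable: split by number of groups, then by (in)dependence.
    # The leaf names below are assumptions (conventional choices).
    if n_groups == 2:
        return "paired t-test" if paired else "unpaired t-test"
    return "repeated-measures ANOVA" if paired else "one-way ANOVA"


# Control vs. treatment colonies, different colonies in each group.
print(choose_analysis("difference", n_groups=2, paired=False))
# The same colonies measured before and after treatment.
print(choose_analysis("difference", n_groups=2, paired=True))
```

Whether the parametric choice at a leaf is appropriate still depends on the checks for normality and homogeneity of variances.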

To arrive at an informed decision about the extent of non-normality or heterogeneity of variances in your data, a critical first step is to plot your data: i) for correlational analyses, as in regression, use a scatterplot; ii) for ‘groups’ (e.g. levels of a treatment factor), use a histogram or box plot. Such plots provide an immediate indication of your data’s distribution, especially whether variances are homogeneous. The next step is to test objectively for departures from normality and homoscedasticity. The Shapiro-Wilk W test (particularly for sample sizes < 50) or the Lilliefors test can be used to test for normality, and the Anderson-Darling test is of similar if not better value (Stephens, 1974). Similarly, for groups of data, Levene’s test tests the null hypothesis that different groups have equal variances. If these tests are significant, the assumption that a distribution is normal or that variances are equal must be rejected, and either the data have to be transformed or non-parametric tests have to be conducted.
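To make the logic of Levene’s test concrete, the following sketch computes the W statistic from the absolute deviations of each observation from its group mean. It uses only the Python standard library. Two points are assumptions of this sketch rather than part of the text above: the brood-area values are invented for illustration, and the p-value is obtained by permuting mean-centred residuals between groups instead of from the F distribution that the classical test uses. In practice one would normally rely on a statistics package.

```python
import random
import statistics


def levene_w(groups):
    """Levene's W statistic; `groups` is a list of lists of numbers."""
    # Absolute deviation of every observation from its own group mean.
    z = []
    for g in groups:
        centre = statistics.mean(g)
        z.append([abs(y - centre) for y in g])
    n_total = sum(len(g) for g in z)
    k = len(z)
    z_means = [statistics.mean(g) for g in z]
    z_grand = sum(sum(g) for g in z) / n_total
    # One-way ANOVA F ratio computed on the absolute deviations.
    between = sum(len(g) * (m - z_grand) ** 2 for g, m in zip(z, z_means))
    within = sum((v - m) ** 2 for g, m in zip(z, z_means) for v in g)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (n_total - k) / (k - 1) * between / within


def levene_permutation_p(groups, n_perm=2000, seed=1):
    """Return (W, p), with p from a permutation of centred residuals.

    Assumption of this sketch: a permutation p-value replaces the
    F-distribution p-value of the classical test.
    """
    rng = random.Random(seed)
    observed = levene_w(groups)
    # Remove group means so that only spread, not location, is shuffled.
    residuals = []
    for g in groups:
        centre = statistics.mean(g)
        residuals.extend(y - centre for y in g)
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(residuals)
        start, shuffled = 0, []
        for n in sizes:
            shuffled.append(residuals[start:start + n])
            start += n
        if levene_w(shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical brood areas (arbitrary units); not real measurements.
control = [50, 52, 49, 51, 50, 48, 53, 51]
treated = [40, 62, 35, 58, 45, 66, 38, 55]

w_obs, p_val = levene_permutation_p([control, treated])
print(f"Levene W = {w_obs:.2f}, permutation p = {p_val:.4f}")
```

A small p-value here indicates that the null hypothesis of equal variances should be rejected, so the data would need to be transformed or analysed with a non-parametric test.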

Fig. 6. A basic decision tree for selecting the appropriate statistical test.


Table 6. Guideline to statistical analyses in honey bee research, including examples/suggestions for tests and graphical representation. Blank fields indicate that a wide variety of options is possible, each with pros and cons.


| Subject | Variable | Short description | Fields of research where it is used | Synthetic representation | Measure of dispersion | Statistical test | Graphical representation | Notes |
|---|---|---|---|---|---|---|---|---|
| Honey bee | Morphometric variables (e.g. fore-wing angles) | Measures related to body size. Other data can be included here such as, for example, cuticular hydrocarbons | Taxonomic studies | Average | Standard deviation | Parametric tests such as ANOVA; multivariate analysis such as PCA and DA | Bar charts for single variables, scatterplots for PC, DA | Please note that some morphometric data are ratios; consider possible deviations from normality |
| Honey bee | Physiological parameters (e.g. concentration of a certain compound in the haemolymph) | Measures related to the functioning of honey bee systems | | Average | Standard deviation | | Bar charts or lines | |
| Honey bee | Survival | | | Median | Range | Kaplan-Meier; Cox hazard | Bar charts or lines, scatterplots | |
| Pathogens (e.g. DWV, Nosema) | Prevalence | Proportion of infected individuals | Epidemiological studies | Average | Standard deviation (can be used, but transformation is necessary due to non-normal distribution) | Fisher exact solution or chi-square according to sample size | Bar charts, pie charts | |
| Pathogens (e.g. DWV, Nosema) | Infection level | Number of pathogens (e.g. viral particles) | Epidemiological studies, studies on bee-parasite interaction | Average | | Parametric tests (e.g. t test/ANOVA) can be used after log transformation; otherwise non-parametric tests can be used (e.g. Mann-Whitney/Kruskal-Wallis) | | |
| Parasites (e.g. Varroa destructor) | Fertility | Proportion of reproducing females | Factors of tolerance, biology of parasites | Average | Range | Fisher exact solution or chi-square according to sample size | | |

Fecundity

Number of offspring per female

Factors of tolerance, biology of parasites

Average

Standard deviation