# 4.2.4. Data analysis

To analyse differences between cuticular compounds or profiles, the data can be arranged in a matrix with one row for each hydrocarbon studied and one column for each honey bee analysed. In many cases the percentage composition (i.e. the proportion of a compound relative to the whole set of hydrocarbons) is used for the analysis; in other cases the absolute amount of each hydrocarbon is used, sometimes expressed relative to the weight of the insect. If the percentage composition is used, the data are commonly transformed according to Reyment (1989) using the following formula:

$$Z_{i,j} = \log\left[\frac{X_{i,j}}{g(X_{j})}\right]$$

where $Z_{i,j}$ is the transformed area of peak $i$ for specimen $j$, $X_{i,j}$ is the area of peak $i$ for specimen $j$, and $g(X_{j})$ is the geometric mean of the areas of all peaks for specimen $j$.
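The transformation can be illustrated with a minimal Python sketch. It assumes the matrix layout described above (rows are peaks, columns are specimens) and natural logarithms, which the formula does not specify; the function name and the peak areas are hypothetical. Because the logarithm of zero is undefined, the sketch also assumes that every area is strictly positive.

```python
import math

def log_ratio_transform(areas):
    """Apply Z[i][j] = log(X[i][j] / g(X[j])) to a peaks-by-specimens matrix.

    areas: list of rows, one row per peak (hydrocarbon),
           one column per specimen (bee). All values must be > 0.
    """
    n_peaks = len(areas)
    n_specimens = len(areas[0])
    transformed = [[0.0] * n_specimens for _ in range(n_peaks)]
    for j in range(n_specimens):
        column = [areas[i][j] for i in range(n_peaks)]
        # log of the geometric mean = arithmetic mean of the logs
        log_g = sum(math.log(x) for x in column) / n_peaks
        for i in range(n_peaks):
            transformed[i][j] = math.log(column[i]) - log_g
    return transformed

# Hypothetical peak areas: 3 hydrocarbons (rows) x 2 bees (columns)
areas = [
    [10.0, 20.0],
    [30.0, 20.0],
    [60.0, 60.0],
]
z = log_ratio_transform(areas)
```

A useful check is that the transformed values of each specimen sum to zero, since each value is a log-ratio to that specimen's own geometric mean.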

Different methods of data analysis are applied according to the purpose of the study. If the distribution of the data allows it, differences between experimental groups can be tested using parametric methods such as ANOVA, if there are three or more experimental groups, or Student’s t test, if only two groups are considered. In this case, the number of tests to be carried out equals the number of CHC considered, which can be rather high and can lead to errors related to multiple comparisons. The probabilities from the tests should therefore be adjusted to allow for possible false positives using an appropriate correction (e.g. the Bonferroni correction, which is very common and conservative; see the *BEEBOOK* paper on statistics by Pirk *et al.* (2013)).
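As an illustration of the Bonferroni correction, the following Python sketch multiplies each raw probability by the number of tests (one per CHC) and caps the result at 1. The function name, the significance level of 0.05 and the raw p-values are hypothetical; in practice the p-values would come from the per-compound ANOVA or t tests.

```python
def bonferroni_adjust(p_values, alpha=0.05):
    """Return Bonferroni-adjusted p-values and a significance flag for each test."""
    m = len(p_values)  # number of tests, i.e. number of CHC compared
    adjusted = [min(1.0, p * m) for p in p_values]
    significant = [p_adj <= alpha for p_adj in adjusted]
    return adjusted, significant

# Hypothetical raw p-values from four per-compound tests
raw_p = [0.001, 0.012, 0.030, 0.200]
adjusted, significant = bonferroni_adjust(raw_p)
# adjusted is approximately [0.004, 0.048, 0.12, 0.8]
# significant is [True, True, False, False]
```

The third compound shows why the correction matters: its raw probability (0.030) is below 0.05, but it is no longer significant once four tests are taken into account.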

In many cases the whole set of CHC is considered using multivariate techniques such as principal component analysis or discriminant analysis. These can be carried out with most commercial statistical packages and allow the specimens to be plotted on the plane formed by the derived functions that account for most of the variability. Differences between groups appear as separate clouds of points clustered around their centroids; the significance of the distance between centroids can be tested with standard methods.
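A principal component analysis of this kind can be sketched in Python with NumPy, as one possible alternative to a commercial package. The sketch assumes the data have already been transformed and transposed so that rows are specimens and columns are compounds; the function name and the random example data are hypothetical.

```python
import numpy as np

def pca_scores(data, n_components=2):
    """Project specimens onto the first principal components.

    data: array with one row per specimen and one column per compound.
    Returns the specimen scores and the proportion of variance
    explained by each retained component.
    """
    centered = data - data.mean(axis=0)
    # Singular value decomposition of the centred data:
    # the rows of vt are the principal axes
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = centered @ vt[:n_components].T
    return scores, explained[:n_components]

# Hypothetical data set: 10 bees, 4 transformed compounds
rng = np.random.default_rng(0)
data = rng.normal(size=(10, 4))
scores, explained = pca_scores(data, n_components=2)
```

Plotting the two columns of `scores` against each other gives the plane described above, and `explained` indicates how much of the total variability that plane accounts for.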

Discriminant analysis is carried out when samples belong to predefined groups; in this case, to account for multicollinearity, a preliminary principal component analysis is carried out and the discriminant analysis is applied to the extracted factors.
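The two-step procedure can be sketched in Python with NumPy as follows. This is a simplified illustration, not a replacement for a statistical package: it assumes exactly two predefined groups, uses Fisher's linear discriminant with a pooled within-group covariance, and classifies at the midpoint between the group means. The function name, the number of retained components and the simulated data are all hypothetical.

```python
import numpy as np

def pca_then_lda(data, labels, n_components=2):
    """Two-group linear discriminant analysis applied to PCA scores.

    data:   array, one row per specimen, one column per compound.
    labels: array of group names, one per specimen (exactly two groups).
    """
    # Step 1: principal component analysis on the centred data
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T  # uncorrelated factors

    # Step 2: Fisher's linear discriminant on the extracted factors
    groups = np.unique(labels)
    a = scores[labels == groups[0]]
    b = scores[labels == groups[1]]
    dev_a = a - a.mean(axis=0)
    dev_b = b - b.mean(axis=0)
    pooled = (dev_a.T @ dev_a + dev_b.T @ dev_b) / (len(a) + len(b) - 2)
    weights = np.linalg.solve(pooled, b.mean(axis=0) - a.mean(axis=0))

    discriminant = scores @ weights
    # Midpoint between the two group means on the discriminant axis
    threshold = 0.5 * (a.mean(axis=0) + b.mean(axis=0)) @ weights
    predicted = np.where(discriminant > threshold, groups[1], groups[0])
    return discriminant, predicted

# Simulated data: 6 bees per group, 5 compounds, groups clearly separated
rng = np.random.default_rng(1)
group_a = rng.normal(loc=0.0, size=(6, 5))
group_b = rng.normal(loc=5.0, size=(6, 5))
data = np.vstack([group_a, group_b])
labels = np.array(["A"] * 6 + ["B"] * 6)
discriminant, predicted = pca_then_lda(data, labels, n_components=2)
```

Because the principal component scores are uncorrelated across the whole sample, the pooled covariance matrix stays well conditioned even when the original compounds are strongly correlated, which is the reason for the preliminary step.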