5.1. How to choose a simple statistical test

Before addressing how to choose a test, we describe the differences between parametric and non-parametric statistics. As stated in the introduction, one has to know what kind of data one has or will obtain. In the discussion below, we use a traditional definition of “parametric” versus “non-parametric” tests, although in all statistical tests parameters of one kind or another (means, medians, etc.) are estimated. The distinction has grown murkier over the years as more and more statistical distributions have become available for use in contexts where previously only the normal distribution was allowed (e.g. regression, ANOVA). “Parametric” tests assume either (1) models in which the residuals (the variation that is not explained by the explanatory variables one is testing, i.e. the inherent biological variation of the experimental units) are normally distributed after a linear predictor of some kind has been fitted, or (2) that the data follow a Poisson, multinomial, or hypergeometric distribution. This definition holds for simple models only. Parametric models are actually a large class of models in which all essential attributes of the data can be captured by a finite number of parameters (estimated from the data); they therefore include many distributions and both linear and non-linear models, but the distribution(s) must be specified when analysing the data. The complete definition is quite mathematical. A non-parametric test does not require that the data be samples from any particular distribution (i.e. it is distribution-free). This is the feature that makes such tests so popular.

For models based on the normal distribution, the normality assumption does not mean that the dependent variable itself is normally distributed; in fact one hopes it is multimodal, with a different mode for each treatment. However, if one subtracts (or conditions on) the linear predictor (e.g. subtracts each treatment mean from its group of observations), the residuals of each group (and of all groups combined) should follow the same normal distribution. Also, the discussion below pertains only to “simple” statistical tests in which observations are independent.
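The difference between the raw dependent variable and its residuals can be illustrated with a small simulation. The following is only a minimal sketch using the Python standard library; the treatment names, means, sample size, and standard deviation are hypothetical values chosen for illustration, not data from any experiment.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical treatment means; every group shares the same residual SD of 1.0.
treatment_means = {"control": 10.0, "low_dose": 20.0, "high_dose": 30.0}
groups = {
    name: [random.gauss(mu, 1.0) for _ in range(200)]
    for name, mu in treatment_means.items()
}

# Raw dependent variable: pooled over treatments, it has one mode per treatment.
raw = [y for values in groups.values() for y in values]

# Residuals: subtract each group's own mean (the fitted linear predictor).
residuals = []
for values in groups.values():
    group_mean = statistics.fmean(values)
    residuals.extend(y - group_mean for y in values)

print("SD of raw data: ", round(statistics.stdev(raw), 2))
print("SD of residuals:", round(statistics.stdev(residuals), 2))
```

In this sketch the pooled raw data are widely spread across three modes, whereas the residuals are centred on zero with a spread close to the common within-group standard deviation. It is the residuals, not the raw values, to which the normality assumption applies.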

Note that chi-square and related tests are often considered “non-parametric” tests. This is incorrect; they are very distribution dependent (data must be drawn from Poisson, multinomial, or hypergeometric distributions), and observations must be independent. While “non-parametric” tests may not require that one samples from a particular distribution, they do require that each set of samples comes from the same general distribution. That is, one sample cannot come from a right-skewed distribution and the other from a left-skewed distribution; both must have the same degree of skew in the same direction. Note that when one has dichotomous (Yes/No) or categorical data, non-parametric tests will be required if we stay in the realm of “simple” statistical tests (Fig. 4).

For parametric statistics based on the normal distribution, an important second assumption is that the variance among groups of residuals is similar (homogeneous variances, also called homoscedasticity; Fig. 5a) rather than heterogeneous (heteroscedasticity; Fig. 5b). If even one of these assumptions is violated, a parametric statistic is not applicable. The alternative in such a case is either to transform the data (see Table 4 and section 5.2.) so that the transformed data no longer violate the assumptions, or to use non-parametric statistics.

The advantage of non-parametric statistics is that they do not assume a specific distribution of the data; the disadvantage is that their power (1-β, see section 1.) is lower than that of their parametric counterparts (Wasserman, 2006), though the differences may not be great. Power itself is of less concern because, in a well-designed experiment, biologically relevant effects with a large enough effect size should still be detected. Table 3 provides a comparison between parametric and non-parametric statistics.
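How a transformation can remove heterogeneity of variances can be sketched with simulated data. The example below assumes right-skewed (log-normal) measurements whose spread grows with the mean, and uses a log transformation. Both the data and the choice of transformation are illustrative assumptions, and the helper `variance_ratio` is a hypothetical name, not a formal test of homogeneity.

```python
import math
import random
import statistics

random.seed(7)  # fixed seed so the sketch is reproducible

# Hypothetical right-skewed measurements for two groups whose spread grows
# with the mean (log-normal data; the parameters are illustrative only).
group_a = [random.lognormvariate(1.0, 0.5) for _ in range(200)]
group_b = [random.lognormvariate(3.0, 0.5) for _ in range(200)]


def variance_ratio(x, y):
    """Ratio of the larger to the smaller sample variance (1.0 = identical)."""
    var_x, var_y = statistics.variance(x), statistics.variance(y)
    return max(var_x, var_y) / min(var_x, var_y)


ratio_raw = variance_ratio(group_a, group_b)
ratio_log = variance_ratio(
    [math.log(v) for v in group_a],
    [math.log(v) for v in group_b],
)

print(f"variance ratio, raw data:        {ratio_raw:.1f}")
print(f"variance ratio, log-transformed: {ratio_log:.2f}")
```

On the raw scale the two variances differ greatly, while after the log transformation they are close to equal, so a test that assumes homogeneous variances becomes more defensible on the transformed scale. Whether a log (or any other) transformation is appropriate depends on the data at hand.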

Fig. 5. a. Two similar distributions with different means, where the variances of the two groups are homogeneous; b. three different distributions where the means are the same but the variances of the three groups are heterogeneous.


Table 3. Comparison between parametric and non-parametric statistics.

                                      Parametric                       Non-parametric
General data type                     Interval or ratio (continuous)   Interval, ratio, ordinal or nominal

Example tests
Independent data                      t-test for independent samples   Mann-Whitney U test
Independent data, more than 2 groups  One-way ANOVA                    Kruskal-Wallis ANOVA
Two repeated measures, 2 groups       Matched pair t-test              Wilcoxon paired test
Two repeated measures, >2 groups      Repeated measures ANOVA          Friedman ANOVA
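Why rank-based tests such as the Mann-Whitney U test are distribution-free can be sketched by computing the U statistic directly from its definition. The function `mann_whitney_u` and the measurements below are hypothetical and written for illustration only. The sketch computes the statistic, not a p-value.

```python
import math


def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a: count of (a, b) pairs with a > b; ties count 0.5."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical measurements for two independent groups (illustrative values only).
treated = [1.2, 3.4, 2.2, 5.1]
control = [0.8, 1.9, 2.0, 1.1]

u_raw = mann_whitney_u(treated, control)

# A monotone transformation (here the natural log) changes the shape of the
# distribution but not the ordering of the values, so U is unchanged.
u_log = mann_whitney_u(
    [math.log(v) for v in treated],
    [math.log(v) for v in control],
)

print(u_raw, u_log)  # both 14.0 for these values
```

Because U depends only on the ordering of the observations, it is the same on the raw and the log scale. A t-test on the same data would give different results on the two scales, since it depends on the actual values and their assumed distribution.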


Table 4. Links for GLMM models for analyses of data from cage experiments.

Canonical link

identity (no transformation)

5.1.1 Tests for normality and homogeneity of variances