Data analysis

An alternative to F-statistics and phylogenetic trees is assignment testing, which can be applied in several variations (Manel et al., 2005). Two main types of assignment test can be distinguished:

Deterministic assignment starts from groups that are defined in advance according to sampling location or another plausible category. For each sampled genotype, the analysis compares the probability of drawing that genotype at random from its own group with the probability of drawing it from one or more alternative groups, based on the allele frequencies of each group. The individual is assigned to the population of origin with the highest probability; however, it is also possible to reject the hypothesis that any of the reference populations is the source, based on the calculated probabilities. The software package GENECLASS is the most advanced tool for this task (Piry et al., 2004). For small samples of fewer than 30 individuals, it is best to keep each individual genotype in its population (“as is”); for larger samples, it is better to remove the genotype being tested from the reference groups (“leave-one-out” approach) to avoid self-assignment.
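The frequency-based comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the GENECLASS implementation: it assumes diploid genotypes, Hardy-Weinberg proportions within groups, and independent loci, and it replaces alleles absent from a group with an arbitrary frequency floor of 0.01. The function names, the `floor` parameter, and the toy genotypes are all hypothetical.

```python
import math
from collections import Counter

def allele_counts(genotypes):
    """Count alleles per locus in a list of diploid multilocus genotypes."""
    n_loci = len(genotypes[0])
    counts = [Counter() for _ in range(n_loci)]
    for genotype in genotypes:
        for locus, (a, b) in enumerate(genotype):
            counts[locus][a] += 1
            counts[locus][b] += 1
    return counts

def log_likelihood(genotype, counts, floor=0.01):
    """Log probability of drawing `genotype` from a group with these counts.

    Assumes Hardy-Weinberg proportions and independent loci; `floor` is an
    arbitrary stand-in frequency for alleles not observed in the group.
    """
    total = 0.0
    for (a, b), locus_counts in zip(genotype, counts):
        n = sum(locus_counts.values())
        pa = max(locus_counts[a] / n, floor) if n > 0 else floor
        pb = max(locus_counts[b] / n, floor) if n > 0 else floor
        total += math.log(pa * pa if a == b else 2 * pa * pb)
    return total

def assign(genotype, groups, own_group=None, floor=0.01):
    """Return the best-scoring group and the log-likelihood for every group.

    If `own_group` is given, the individual's own alleles are subtracted from
    that group's counts first (leave-one-out).
    """
    scores = {}
    for name, members in groups.items():
        counts = allele_counts(members)
        if name == own_group:
            for locus, (a, b) in enumerate(genotype):
                counts[locus][a] -= 1
                counts[locus][b] -= 1
        scores[name] = log_likelihood(genotype, counts, floor)
    best = max(scores, key=scores.get)
    return best, scores

# Invented toy data: two loci, alleles coded as integers.
groups = {
    "north": [((1, 1), (5, 5)), ((1, 1), (5, 6)), ((1, 2), (5, 5))],
    "south": [((3, 3), (7, 7)), ((3, 4), (7, 7)), ((3, 3), (7, 8))],
}
best, scores = assign(groups["north"][0], groups, own_group="north")
print(best, scores)
```

Omitting `own_group` gives the "as is" variant. The sketch only ranks the candidate groups; rejecting all reference populations would additionally require a significance test on the scores, which is not shown here.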

In contrast to the classical, deterministic assignment test, Bayesian assignment works without prior knowledge of the number of populations. Instead, it tries to determine the best assortment of the genotypes while varying the number of clusters into which the individuals are sorted. The microsatellite data are entered raw, without designation of population of origin, and the software varies the number of clusters in order to determine not only their number, but also the cluster from which each individual most likely originates. The program STRUCTURE (Pritchard et al., 2000) is the most commonly used, but there are also several other options. Ideally, the number of clusters found matches the number of populations expected by the investigator. However, more objective methods exist to determine the optimal number of clusters for a given data set, based on the posterior probability calculated (Evanno et al., 2005). The Bayesian method is sensitive and can resolve structure at various levels, from closely related subspecies to more distantly related branches. However, it is important to avoid genotyping related individuals, as the software readily picks up resemblance due to common ancestry and may report it as a separate cluster. An example of this method in honey bees is a study of various levels of introgression of A. m. ligustica into populations of A. m. mellifera (Jensen et al., 2005). Assignment tests have also been used to detect recent hybrids using the software NewHybrids, because individuals with intermediate assignment probabilities are likely to be of mixed origin (Soland-Reckeweg et al., 2009).
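As an illustration of how the number of clusters can be chosen from replicate runs, the sketch below computes the ΔK statistic of Evanno et al. (2005) in one commonly used form: the absolute second difference of the mean ln P(D|K) across replicate runs, divided by the standard deviation of ln P(D|K) at that K. The function name and the log-probability values are invented for illustration and do not come from any real data set; this is a sketch under those assumptions, not a substitute for dedicated software.

```python
from statistics import mean, stdev

def delta_k(lnp_by_k):
    """Compute delta K for every K that has both neighbours K-1 and K+1.

    lnp_by_k maps K to a list of ln P(D|K) values from replicate runs.
    K values whose replicates show zero variance are skipped.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # the second difference needs both neighbours
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # delta K is undefined without variance
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_diff / sd
    return result

# Invented ln P(D|K) values, three replicate runs per K.
example_runs = {
    1: [-5000, -5002, -4998],
    2: [-4500, -4501, -4499],
    3: [-4480, -4495, -4470],
    4: [-4475, -4500, -4460],
}
dk = delta_k(example_runs)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

By construction, ΔK cannot be evaluated for the smallest and largest K tested, so the range of K explored should extend beyond the number of clusters expected.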

Spatial methods have also been developed for use with DNA microsatellites. Studies are currently underway using the methods in GENELAND (Guillot, 2005) and TESS (Durand, 2009), two software packages based on Bayesian assignment, and in ADEGENET (Jombart, 2008), a PCA-based software package for the statistical language R. We recommend analysis with ADEGENET, which produces informative results rapidly, even without geographical data attached to the genotypes (Uzunov et al., 2013), and has fewer underlying assumptions than the other methods.
