3.3.3. Single nucleotide polymorphisms (SNPs)

Single nucleotide polymorphism (SNP) markers are the most recent addition to the molecular toolkit for honey bee genetic analysis (see also the section on SNPs in the BEEBOOK paper on molecular methods; Evans et al., 2013). A SNP is a variation at a single base position of a DNA sequence, usually involving only one alternative nucleotide. For example, on chromosome 5 of A. m. mellifera, a DNA sequence in the gene coding for subunit 1 of replication factor C occurs in two alternative forms, either …AACTTATCAAA… or …AACTTGTCAAA… (Pinto, unpublished data). In this case there are two alleles, A and G, created by a transition mutation. Although any of the four nucleotides could in principle occur at a given position, the mutation rate is low, about 10⁻⁸ to 10⁻⁹ changes per nucleotide per generation (Brumfield et al., 2003), so SNPs are usually bi-allelic.
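
The replication factor C example can be made concrete with a short sketch in Python that compares two aligned sequences, reports each differing position, and classifies the change as a transition (purine to purine, or pyrimidine to pyrimidine) or a transversion (between the two classes). It assumes the sequences are already aligned and free of insertions or deletions; the function name find_snps is hypothetical and not part of any published pipeline.

```python
# Purines and pyrimidines: a transition stays within a class,
# a transversion switches between classes.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def find_snps(seq1: str, seq2: str) -> list[tuple[int, str, str, str]]:
    """Return (0-based position, base in seq1, base in seq2, mutation type)
    for every mismatch between two aligned sequences of equal length."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned and of equal length")
    snps = []
    for pos, (b1, b2) in enumerate(zip(seq1.upper(), seq2.upper())):
        if b1 != b2:
            pair = {b1, b2}
            same_class = pair <= PURINES or pair <= PYRIMIDINES
            kind = "transition" if same_class else "transversion"
            snps.append((pos, b1, b2, kind))
    return snps


# The two alternative forms of the replication factor C fragment quoted above.
allele_1 = "AACTTATCAAA"
allele_2 = "AACTTGTCAAA"
print(find_snps(allele_1, allele_2))  # [(5, 'A', 'G', 'transition')]
```

The A/G difference at the sixth base is reported as a transition, matching the description in the text.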

Although SNPs have so far been used in honey bees only in an evolutionary study (Whitfield et al., 2006; Zayed and Whitfield, 2008) and a QTL study (Spötter et al., 2012), they have great potential for subspecies identification, for several reasons. At the analytical level, their genome-wide coverage (coding and non-coding regions), ubiquity, codominance, and conformity to the infinite sites model of evolution (Vignal et al., 2002) allow more powerful and robust approaches to be employed, potentially leading to more reliable identification and more accurate estimates of introgression levels. At the technical level, high-throughput genotyping technologies, high data quality, and easy calibration among laboratories facilitate the screening of large numbers of loci and individuals, data exchange among laboratories, and the development of public databases.
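
Codominance means that a heterozygous worker is scored as heterozygous, so allele frequencies can be obtained by directly counting allele copies rather than inferred from phenotype classes, as would be necessary with dominant markers. The Python sketch below assumes diploid workers genotyped at a single bi-allelic SNP, with each genotype written as a two-letter string; the function name and the sample genotypes are hypothetical.

```python
from collections import Counter


def allele_frequencies(genotypes: list[str]) -> dict[str, float]:
    """Allele frequencies at one bi-allelic SNP by direct allele counting.

    Each genotype is a two-character string for a diploid worker (e.g. "AG").
    Because SNPs are codominant, every allele copy is observed directly.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in sorted(counts.items())}


# Hypothetical genotypes of five workers, one per colony.
sample = ["AA", "AG", "GG", "AG", "AA"]
print(allele_frequencies(sample))  # {'A': 0.6, 'G': 0.4}
```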

Employing SNPs for subspecies identification with high-throughput technologies requires a SNP assay, which can be purchased if one is commercially available. Otherwise, it must be developed (as in Whitfield et al., 2006 and Spötter et al., 2012), an expensive and time-consuming endeavour requiring high-tech equipment and expertise that are often available only in a core laboratory facility (see development details in Spötter et al., 2012). Unlike for humans and other model organisms, there is only one commercial SNP assay for honey bees, available from AROS Applied Biotechnology AS (Denmark). This assay, which allows screening of 44,000 loci, was designed by Spötter et al. (2012) for detecting Varroa tolerance in A. m. carnica. Because it was developed for a different purpose and a single subspecies, it may not be appropriate for honey bee subspecies identification.

Genotyping is also costly, although it will likely become increasingly affordable. For example, AROS Applied Biotechnology AS charges €261 (2012 price) per individual honey bee for screening the 44,000 loci, with a minimum of 95 individuals per analysis; this is inexpensive when considered on a per-locus basis. Although contracting the services of a private company is expensive, purchasing the equipment and software for SNP genotyping remains beyond the means of an academic laboratory performing medium-scale studies, unlike the standard equipment needed for mtDNA and microsatellite analysis (Table 6).
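
The per-locus argument is simple arithmetic on the 2012 figures quoted above. The sketch below uses only those figures and ignores shipping, taxes, DNA extraction, and later price changes, so it illustrates the scale rather than giving a current quote.

```python
PRICE_PER_INDIVIDUAL_EUR = 261  # 2012 price quoted in the text
LOCI_PER_INDIVIDUAL = 44_000    # loci screened by the assay
MIN_INDIVIDUALS = 95            # minimum number of individuals per analysis

# Cost of a single genotype (one locus in one bee).
price_per_genotype = PRICE_PER_INDIVIDUAL_EUR / LOCI_PER_INDIVIDUAL
# Smallest possible order.
minimum_order = PRICE_PER_INDIVIDUAL_EUR * MIN_INDIVIDUALS

print(f"cost per genotype: EUR {price_per_genotype:.4f}")  # EUR 0.0059
print(f"minimum order:     EUR {minimum_order:,}")         # EUR 24,795
```

Each genotype therefore costs well under one euro cent, yet the smallest possible order approaches €25,000, which is why the per-locus price looks low while the overall outlay remains high.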

Other obstacles to working with thousands of SNP loci relate to the size of the datasets, which require more powerful computers, especially for computationally intensive analyses such as Bayesian Markov chain Monte Carlo methods (used, for example, by the popular software Structure). In addition, the software packages must be able to handle large input files. However, for many standard analyses this is no longer a problem, as packages have been modified to deal with large datasets.
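
To give a rough sense of scale, the sketch below estimates the memory needed just to hold a genotype matrix for 44,000 loci in two encodings: one byte per genotype, and two bits per genotype (four states: two homozygotes, the heterozygote, and missing data), the packing used by some binary genotype formats such as PLINK's .bed files. The sample size of 1,000 individuals is hypothetical, and real software adds its own overhead, so these figures are lower bounds rather than predictions of actual memory use.

```python
def genotype_matrix_bytes(n_individuals: int, n_loci: int, bits_per_genotype: int) -> int:
    """Raw bytes for a loci x individuals genotype matrix, with each
    locus padded up to a whole number of bytes."""
    bytes_per_locus = (n_individuals * bits_per_genotype + 7) // 8
    return n_loci * bytes_per_locus


N_INDIVIDUALS = 1_000  # hypothetical sample size
N_LOCI = 44_000        # loci screened by the commercial assay discussed above

one_byte = genotype_matrix_bytes(N_INDIVIDUALS, N_LOCI, 8)  # one byte per genotype
packed = genotype_matrix_bytes(N_INDIVIDUALS, N_LOCI, 2)    # four genotypes per byte

print(f"1 byte per genotype: {one_byte / 1e6:.1f} MB")  # 44.0 MB
print(f"2 bits per genotype: {packed / 1e6:.1f} MB")    # 11.0 MB
```

The raw matrix grows linearly with both loci and individuals; for MCMC methods the run time additionally scales with the number of iterations, each of which revisits the data.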

As for the other markers described in this chapter, the sampling scheme in SNP surveys (e.g. number of individuals per colony, and number of colonies per apiary and population) will depend on the research question; a population genetics question, for example, requires only one individual per colony. While the sample sizes used for the other markers could be adopted for SNPs, saturating the genome with thousands of SNP loci may violate the assumption of independent (unlinked) loci made by many analytical approaches. Therefore, either linked loci must be removed or other analytical methods (haplotype-based, for example) must be employed. Most software packages used for microsatellites, such as Structure (Pritchard et al., 2000), Arlequin (Excoffier et al., 2005), NewHybrids (Anderson and Thompson, 2002), GenAlEx (Peakall and Smouse, 2006), Genepop (Raymond and Rousset, 1995), GeneClass (Cornuet et al., 1999), and FSTAT (Goudet, 1995), can also be applied to SNPs. However, for clustering analysis the recently developed software Admixture (Alexander et al., 2009) is much faster than Structure (Pritchard et al., 2000) and therefore better suited to large datasets.
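
Removing linked loci is often done by pruning on pairwise linkage disequilibrium. The Python sketch below is one simplified way to do this, assuming diploid genotypes coded as 0/1/2 copies of one allele and loci listed in genomic order: it uses the squared correlation (r²) between genotype vectors as a proxy for linkage disequilibrium, and keeps a locus only if its r² with every locus already kept is below a threshold. The threshold of 0.5, the function names, and the genotype data are hypothetical. Dedicated tools (for example PLINK's `--indep-pairwise` option) instead work in sliding windows along each chromosome, which scales far better than this all-against-kept comparison.

```python
def r_squared(x: list[int], y: list[int]) -> float:
    """Squared Pearson correlation between two loci coded as 0/1/2 allele
    counts, a common proxy for linkage disequilibrium with unphased data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:  # monomorphic locus: treated as unlinked here
        return 0.0
    return sxy * sxy / (sxx * syy)


def prune_linked(loci: dict[str, list[int]], threshold: float = 0.5) -> list[str]:
    """Greedy pruning: walk loci in order and keep one only if its r^2
    with every locus already kept is below the threshold."""
    kept: list[str] = []
    for name, genotypes in loci.items():
        if all(r_squared(genotypes, loci[k]) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical genotypes of six workers (one per colony) at four loci.
loci = {
    "snp1": [0, 1, 2, 1, 0, 2],
    "snp2": [0, 1, 2, 1, 0, 2],  # identical to snp1 (r^2 = 1)
    "snp3": [2, 1, 0, 1, 2, 0],  # mirror image of snp1 (r^2 = 1)
    "snp4": [1, 0, 1, 2, 1, 1],  # uncorrelated with snp1 (r^2 = 0)
}
print(prune_linked(loci))  # ['snp1', 'snp4']
```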

In spite of the promising power of SNPs for subspecies identification, the costs of developing a SNP assay and of genotyping will probably preclude widespread adoption of this cutting-edge tool in the near future.