6.3.2. Single-nucleotide polymorphisms (SNPs)

In the honey bee and other species for which extensive data on genomic sequence variants have been gathered, SNPs can be used to reconstruct past migration events and to separate races and populations. A SNP is any validated single-nucleotide change between the genomes of two or more samples. SNPs occur both within the coding regions (exons) of genes and in the vast regions that separate genes or lie in non-coding parts of the genome. SNP analyses are standard in human, veterinary, and agricultural systems, and the approach will become an increasingly viable option for the study of honey bees. Unfortunately, high-throughput SNP genotyping remains an expensive endeavour that requires cutting-edge technologies and expertise often available only in a core laboratory facility or at larger institutions. In addition, before the honey bee samples of interest can be genotyped, a SNP assay must be developed (or purchased, if commercially available) from sequence data relevant to the study population. At present, only two SNP assays have been developed and published for honey bees. The first (Whitfield et al., 2006) consisted of 1,536 SNP loci selected mainly on spacing criteria and was developed for genotyping with the Illumina GoldenGate™ assay. It is not commercially available, although a honey bee SNP database (over 1 million SNPs) is available at NCBI (http://www.ncbi.nlm.nih.gov/snp/), and this resource could be exploited to establish a genotyping system. More recently, Spötter et al. (2011) published a 44,000 SNP assay designed for analysis of varroa-specific defence behaviour in honey bees. This assay uses Affymetrix™ technology and is now commercially available via AROS Applied Biotechnology AS.

As illustrated in Spötter et al. (2011), development of a SNP assay is a time- and resource-intensive undertaking, but the assay can be designed to address a specific objective (e.g., to investigate varroa-specific defence behaviour). Once the design stage is complete, the assay can be used to genotype honey bee samples at hundreds to thousands of loci via high-throughput technologies. Illumina® technologies, for example, offer several options for high-throughput genotyping depending on the number of SNPs to be interrogated. The GoldenGate assay, employed by Whitfield et al. (2006), interrogates either 96 or from 384 to 1,536 SNP loci simultaneously (plex levels of 384, 768, or 1,536). For genotyping more than 6,000 and up to 2,500,000 SNPs, the Infinium assay (also an Illumina product) is required.

Both the GoldenGate and Infinium assays take three days to complete and require genomic DNA of reasonable quality that has been accurately quantified. DNA concentrations should be 50 ng/µl, quantified by a fluorometric assay (e.g., PicoGreen) or by spectrophotometry (e.g., NanoDrop, section 3.2.1). DNA can be extracted from the thoraces of honey bees stored at -80 °C or in absolute EtOH. The GoldenGate assay involves several steps: DNA activation for binding to paramagnetic particles, hybridization of activated DNA with assay oligonucleotides, washing, extension, ligation, PCR, hybridization onto the BeadChip, and finally analysis of the fluorescence signal on the BeadChip by the iScan System.
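Bringing an extract to the 50 ng/µl working concentration is a simple C1·V1 = C2·V2 dilution. The following Python sketch illustrates the arithmetic only; the function name and the 20 µl final volume are assumptions made for the example and are not values prescribed by either assay.

```python
def dilution_volumes(stock_ng_per_ul, target_ng_per_ul=50.0, final_volume_ul=20.0):
    """Return (stock_ul, diluent_ul) needed to reach the target concentration.

    Uses C1 * V1 = C2 * V2. The 20 ul default final volume is an
    illustrative assumption, not a requirement of the genotyping assays.
    """
    if stock_ng_per_ul <= 0 or final_volume_ul <= 0:
        raise ValueError("concentration and volume must be positive")
    if stock_ng_per_ul < target_ng_per_ul:
        # Dilution cannot raise the concentration of a sample.
        raise ValueError("stock is more dilute than the target concentration")
    stock_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    diluent_ul = final_volume_ul - stock_ul
    return stock_ul, diluent_ul


# Example: an extract quantified at 200 ng/ul
stock_ul, diluent_ul = dilution_volumes(200.0)
print(f"{stock_ul:.1f} ul DNA + {diluent_ul:.1f} ul diluent")
```

For an extract measured at 200 ng/µl, this gives 5.0 µl of DNA plus 15.0 µl of diluent. Extracts below 50 ng/µl raise an error, because they would need to be concentrated or re-extracted instead.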

Unlike the GoldenGate assay, in which universal primers are used to amplify SNP-reactive DNA fragments, the Infinium assay has genomic targets hybridize directly to array-bound sequences. Following hybridization onto the BeadChip, samples are extended and fluorescently stained. As in the GoldenGate assay, the last step consists of analysis of the scanned BeadChips using the iScan System. Genotype data generated by both assays using the iScan System (and other systems) are then analysed using the GenomeStudio Genotyping (GT) Module. Genotype calls are automated but can be manually verified and edited if necessary (e.g., if there are signs of unequal proportions of an expected biallelic marker). Finally, summary statistics and results are exported for further analyses using standard population genetics software packages such as STRUCTURE (http://pritch.bsd.uchicago.edu/structure.html).
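Exported calls usually need recoding before they can be read by population genetics software. As a minimal sketch, the Python function below converts biallelic calls for one diploid sample into the two-rows-per-individual layout of integer alleles that STRUCTURE can read. The input representation (two-letter calls such as "AA" or "AB", with any other string treated as a no-call), the allele codes, and the function name are assumptions made for illustration; they do not describe the actual GenomeStudio export format, and the missing-data value must match the one set in the STRUCTURE parameter file.

```python
MISSING = -9                      # assumed missing-data code; must match the STRUCTURE settings
ALLELE_CODES = {"A": 1, "B": 2}   # arbitrary integer labels for the two alleles


def to_structure_rows(sample_id, calls):
    """Recode per-locus calls (e.g. "AB") into two rows of integer alleles.

    Any call that is not exactly two known allele letters (e.g. "--")
    is written as missing data on both rows.
    """
    first, second = [sample_id], [sample_id]
    for call in calls:
        if len(call) == 2 and all(allele in ALLELE_CODES for allele in call):
            first.append(ALLELE_CODES[call[0]])
            second.append(ALLELE_CODES[call[1]])
        else:
            first.append(MISSING)
            second.append(MISSING)
    return [" ".join(map(str, first)), " ".join(map(str, second))]


# Example: three loci for one worker bee, the last one a no-call
for row in to_structure_rows("bee1", ["AA", "AB", "--"]):
    print(row)
```

The example prints `bee1 1 1 -9` followed by `bee1 1 2 -9`, i.e. one row per allele copy with the third locus marked as missing.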

As next-generation technologies, which typically produce short reads (e.g., 100 bp or shorter), make sequencing increasingly affordable, it will become feasible to carry out population-genetic and strain-identifying projects via whole-genome sequencing. This technique involves a scan (usually at 3-fold sequencing depth or more, i.e., > 750 million sequenced bases for the honey bee) of a genome or population of interest, followed by alignment of the resulting short reads to a reference genome (for the honey bee, the genome assembly from HGSC, 2006). It is relatively straightforward, using free programs available for download (e.g., http://bioinformatics.igm.jhmi.edu/salzberg/Salzberg/Software.html), to identify and in some cases quantify SNPs that differ among samples. There are also public sites to which one can import data and benefit from a maintained supercomputer dedicated to such genomic analyses (e.g., https://main.g2.bx.psu.edu/). SNP analyses derived from sequencing data have yet to make an impact on honey bee science, but they are expected to do so in the next few years.
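The relationship between sequencing depth, genome size, and read number can be checked with a short calculation. The Python sketch below assumes a genome size of about 250 million bases, which is the value implied by equating 3-fold depth with 750 million sequenced bases; the function names are illustrative only.

```python
import math

HONEY_BEE_GENOME_BP = 250_000_000  # assumption: implied by 3-fold depth = 750 million bases


def reads_needed(genome_size_bp, depth, read_length_bp):
    """Smallest number of reads such that reads * read_length >= depth * genome_size."""
    if read_length_bp <= 0:
        raise ValueError("read length must be positive")
    return math.ceil(depth * genome_size_bp / read_length_bp)


def sequencing_depth(total_bases, genome_size_bp):
    """Average fold coverage obtained from a given number of sequenced bases."""
    return total_bases / genome_size_bp


# Example: 3-fold depth with 100 bp reads
print(reads_needed(HONEY_BEE_GENOME_BP, 3, 100))
print(sequencing_depth(750_000_000, HONEY_BEE_GENOME_BP))
```

Under these assumptions, 3-fold depth with 100 bp reads corresponds to 7,500,000 reads, and 750 million sequenced bases correspond to an average depth of 3.0. This is an average: individual sites will be covered by more or fewer reads, which is one reason deeper sequencing is often preferred for SNP discovery.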