11.5. Mapping and methylation assessment

  1. Trim the above sequence data using the Illumina Data Analysis Pipeline.
  2. Align bisulfite-converted sequencing reads to the honey bee genome using the BSSeeker software (http://pellegrini.mcdb.ucla.edu/BS_Seeker/BS_Seeker.html) as described in Foret et al. (2012); http://www.pnas.org/content/suppl/2012/03/12/1202392109.DCSupplemental/pnas.201202392SI.pdf#nameddest=STXT.
  3. Reads containing consecutive CHN nucleotides are the product of incomplete bisulfite conversion and must first be discarded.
  4. To increase the accuracy of methylation calls, only those cytosines fulfilling neighbourhood quality standards are counted (bases of quality 20 or more, flanked by at least three perfectly matching bases with a PHRAP quality score of 15 or more).
  5. The methylation status of each cytosine base can be modelled by a binomial distribution with the number of trial equal to the number of mapping reads and the probability equal to the conversion rate.
  6. A base is called methylated if the number of reads supporting a methylated status departed from this null model significantly at the 5% level after correcting for multiple testing.
  7. Differentially methylated genes are identified using generalized linear models of the binomial family; the response vector CpGmeth (number of methylated and non-methylated reads for each CpG in a gene) was modelled as a function of two discrete categorical variables, the caste and the CpG position: CpGmeth = caste * CpGi.
  8. P-values are corrected for multiple testing using the Benjamini and Hochberg method. These tests are carried out using the R statistical environment (http://www.r-project.org).
  9. Honey bee ESTs and predicted genes are loaded into a Mysql database and visualized with Gbrowse, where CpG methylation levels in queens and workers are added as separate tracks.