2.2.5. Exploratory techniques: beta diversity

The main goal of most bacterial community studies is to compare the composition of different communities (beta diversity).  The communities being compared differ in some trait or treatment, such as which section of the gut the samples are from.  There are numerous ways to visualize and analyse beta diversity, and a thorough review of multivariate techniques that are commonly used by microbial ecologists is presented by Ramette (2007). The beta diversity analyses that have been used in studies of bee-associated bacteria fall into two categories: exploratory techniques and tests of significance.  We recommend the following steps for ordination and hierarchical clustering (exploratory techniques):

  1. Determine the distance/dissimilarity matrix. The goal of ordination and clustering is to visually compare community composition.  Both approaches take a community distance matrix as input, and this matrix is commonly computed using one of two methods.
    a. Bray-Curtis dissimilarity (Bray and Curtis, 1957):
    $$ BC = 1 - \frac{2w}{a + b} $$
    where w is the sum of the lesser scores for only those species that are present in both communities, a is the sum of the measures of taxa in one community, and b is the sum of the measures of taxa in the other community.  When proportional abundances are used, a and b both equal 1 and the index collapses to 1 - w.
    b. UniFrac distances (Lozupone and Knight, 2005).  UniFrac distances are based on branches in a phylogenetic tree that are either shared or unique amongst samples.  UniFrac distance matrices therefore depend on the quality of the input tree, which can be problematic for the short reads produced by NGS (Ochman et al., 2010).  With that caveat in mind, UniFrac distances are commonly used, and can be calculated in QIIME given a phylogenetic tree and an OTU table that lists the abundance of each OTU in each sample.
  2. Evaluate ordination patterns. The Bray-Curtis dissimilarity matrix or UniFrac distance matrix is used as input for ordination and clustering analyses.  The two most common methods for ordination of NGS bacterial community data are principal coordinates analysis (PCoA) and nonmetric multidimensional scaling (NMDS). NMDS is recommended because it is non-parametric, makes few assumptions about the data, and can reduce the data into fewer axes than PCoA (Quinn and Keough, 2002; Ramette, 2007). The number of axes for the NMDS ordination is chosen beforehand, and will likely be a tradeoff between interpretability and goodness of fit (Quinn and Keough, 2002). When Kruskal’s stress formula is used, it is recommended to use as few dimensions as possible while achieving stress values below 0.20, and preferably below 0.10 (Quinn and Keough, 2002). Although not currently implemented in mothur or QIIME, analyses such as canonical correspondence analysis (CCA) relate environmental variables to ordination patterns (Ramette, 2007). CCA can also be used to determine which OTUs correspond with specific environmental variables.
  3. Hierarchical community clustering. To visualize community relatedness in the same format as a phylogenetic tree, we recommend the Unweighted Pair Group Method with Arithmetic mean (UPGMA; Sokal and Michener, 1958). Jackknife support for the branching patterns in the resulting dendrogram can be calculated in QIIME (Kuczynski et al., 2011), providing an estimate of confidence in the clustering patterns.
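
As an illustration of steps 1a and 3, the following is a minimal sketch in plain Python of how a Bray-Curtis dissimilarity matrix could be computed from an OTU table and then clustered by UPGMA. It is not the QIIME or mothur implementation. The sample names, OTU counts, and function names (`bray_curtis`, `distance_matrix`, `upgma`) are hypothetical and chosen only for this example, and the sketch assumes each sample has a non-zero total abundance. UniFrac distances, ordination, and jackknife support are not covered.

```python
from itertools import combinations


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity, 1 - 2w / (a + b), for two abundance vectors."""
    w = sum(min(xi, yi) for xi, yi in zip(x, y))  # summed lesser scores
    a, b = sum(x), sum(y)                          # per-community totals
    return 1.0 - 2.0 * w / (a + b)


def distance_matrix(samples):
    """Map each (name_i, name_j) pair of samples to its dissimilarity."""
    return {(i, j): bray_curtis(samples[i], samples[j])
            for i, j in combinations(samples, 2)}


def upgma(names, dist):
    """Average-linkage clustering; returns a nested (left, right, height) tuple,
    where height is the dissimilarity at which the two branches were merged."""
    clusters = {k: (name, 1) for k, name in enumerate(names)}  # id -> (tree, size)
    d = {(i, j): dist[(names[i], names[j])]
         for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        (i, j), height = min(d.items(), key=lambda kv: kv[1])  # closest pair
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        merged = {}
        for k in clusters:
            d_ik = d[(min(i, k), max(i, k))]
            d_jk = d[(min(j, k), max(j, k))]
            # Size-weighted mean equals the mean over all original sample pairs.
            merged[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        d.update(merged)
        clusters[next_id] = ((tree_i, tree_j, height), n_i + n_j)
        next_id += 1
    tree, _size = next(iter(clusters.values()))
    return tree


# Hypothetical OTU counts (four OTUs) for three gut sections.
samples = {
    "foregut": [10, 0, 5, 5],
    "midgut": [8, 2, 5, 5],
    "hindgut": [0, 15, 1, 4],
}
dist = distance_matrix(samples)
tree = upgma(list(samples), dist)
print(dist)
print(tree)
```

In this toy data set the foregut and midgut samples share most of their abundance, so they are joined first, and the hindgut sample attaches at the average of its dissimilarities to the other two. In practice, established packages would normally be preferred over hand-written code.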