7.1. Introduction

The goal of this protocol is to provide the reader with an easy-to-use, reliable, and technically appropriate method to choose, align, and analyse sequence data for phylogenetic analyses of taxa or genes of interest. Analyses of highly conserved loci (e.g. rRNA, cytochrome oxidase I) and population genetic studies within a single species require nucleotide-level data to achieve the necessary resolution in tree topology. Amino acid sequences are typically used when reconstructing phylogenies from an encoded protein across a large evolutionary distance, which can make alignment at the nucleotide level difficult.

Many approaches and programs are available for this process, and over time users develop their own preferences. While the following protocol reflects the preferences of the authors, it is appropriate for a wide variety of applications and user skill levels, and it relies on freely available programs with graphical user interface (GUI)-based options. Detailed information on use is available from each of the program sites, given below. As a disclaimer, concatenation of sequence data, while appropriate and employed for taxonomic classification, is a more specific approach that some users may wish to use, but it is not discussed here. Additionally, although PAUP is widely used in phylogenetic analyses, it requires a small fee and is therefore not covered, though labs with frequent phylogenetic needs may wish to purchase it. MEGA and other freely available software can perform many or all of the same phylogenetic analyses as PAUP.

The steps to perform a phylogenetic analysis are:

  1. Obtain sequences of interest.
  2. Format sequence data in FASTA format.
  3. Align sequence data.
  4. Trim aligned sequence data to equal length.
  5. Perform phylogenetic analyses.
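
For readers who prefer to see the data flow, the steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the protocol: the function names (read_fasta, trim_alignment, p_distance) and the toy sequences are hypothetical, the input is assumed to be already aligned (alignment itself is left to a dedicated program), and the pairwise p-distance stands in for the full phylogenetic analysis as the simplest quantity a distance-based method would start from.

```python
from io import StringIO


def read_fasta(handle):
    """Parse FASTA text into a dict of {name: sequence} (step 2)."""
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header line as the sequence name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def trim_alignment(aligned):
    """Trim an alignment to the columns covered by every sequence (step 4).

    Only leading and trailing gap runs are removed; internal gaps are kept.
    """
    lengths = {len(s) for s in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length first")
    start = max(len(s) - len(s.lstrip("-")) for s in aligned.values())
    end = min(len(s.rstrip("-")) for s in aligned.values())
    return {n: s[start:end] for n, s in aligned.items()}


def p_distance(a, b):
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


# Toy, already-aligned input: made-up sequences, for illustration only.
EXAMPLE = """>taxonA
--ATGCCGTA
>taxonB
ATATGCTGTA
>taxonC
ATATGCCG--
"""

aligned = read_fasta(StringIO(EXAMPLE))
trimmed = trim_alignment(aligned)

for name, seq in trimmed.items():
    print(name, seq)
print(p_distance(trimmed["taxonA"], trimmed["taxonB"]))
```

Here the ragged ends are removed so that all three sequences span the same six columns before any comparison is made, which is the purpose of step 4. The GUI-based programs used in the rest of the protocol handle these operations, and far more sophisticated analyses, without any scripting.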

Each step is described below in detail.