[EzEditor2] Phylogenetic analysis

This article explains how to carry out phylogenetic analysis using EzEditor2 and other programs. We assume that you have already aligned all sequences (either 16S or protein-coding genes). We will use the following example files, which can be obtained here.

  • Leuconostoc_16s.ezb (for 16S)
  • Leuconostoc_gyrB.ezb (for gyrase subunit B gene, a protein-coding gene)

Also, make sure that you have installed the MEGA program (see here for installation, if you haven’t done this). MEGA is user-friendly software that provides all major methods for building phylogenetic trees (neighbor-joining, maximum likelihood, maximum parsimony).

Excluding problematic positions

Even though we try to ensure that multiple alignments are correct, there are cases in which we cannot be confident of their accuracy.

  • Parts of some sequences are missing, which may affect the phylogenetic analysis if the corresponding positions are not removed.
  • Some regions cannot be aligned with confidence because they lack sufficient evolutionary signal. For example, regions with too many mutations cannot be aligned confidently.

To exclude specific regions of the multiple alignment, you need to create a special FILTER sequence entry in your EzEditor data file. Go to “Select Window” and click the right mouse button. Then (1) select “Add FILTER Entry” and (2) open “Align Window”.

Give this FILTER entry any name you wish (e.g. my filter).

In the Align Window, type ‘E’ at each position that you want to exclude from the “exporting” or “MEGA phylogeny” functions.

(1) Here, we excluded the first 35 positions by typing in ‘E’s.
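
The effect of a FILTER entry can be pictured as a column mask over the alignment. The following is a minimal Python sketch of that idea, not EzEditor2's actual implementation; the function name `apply_filter`, the dictionary representation of the alignment, and the toy sequences are all assumptions made for illustration.

```python
def apply_filter(alignment, filter_row, exclude_char="E"):
    """Return a copy of the alignment without the columns marked in filter_row.

    alignment  : dict mapping sequence label -> aligned sequence (equal lengths)
    filter_row : string as long as the alignment; exclude_char marks the
                 columns to drop (hypothetical stand-in for a FILTER entry)
    """
    # Indices of the columns that are NOT marked for exclusion.
    keep = [i for i, ch in enumerate(filter_row) if ch.upper() != exclude_char]
    filtered = {}
    for label, seq in alignment.items():
        if len(seq) != len(filter_row):
            raise ValueError(f"{label}: length differs from the FILTER entry")
        filtered[label] = "".join(seq[i] for i in keep)
    return filtered


# Toy data (hypothetical): exclude the first three columns.
toy = {"seq1": "ACGTACGT", "seq2": "AC-TACGA"}
mask = "EEE-----"
result = apply_filter(toy, mask)
```

Here every sequence loses the same three leading columns, so the remaining columns stay aligned with each other.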

Exporting multiple alignments for external programs

You can export the alignment with EzEditor2’s “Export” menu. At present, we provide the “FASTA” format, which can be used by most phylogenetic software tools, including MEGA, RAxML and FastTree. “Export” has multiple options to choose from.

(1) Sequence Label

  • Name: For example, Leuconostoc kimchii
  • Name + Strain: For example, Leuconostoc kimchii IMSNU 11154(T)
  • Name + Strain + Accession: For example, Leuconostoc kimchii IMSNU 11154(T)/CP001758
  • zZ + Accession + zZ: This is a special form of sequence label, developed for software tools that cannot accept the formats mentioned above. For example, some software cannot handle “Leuconostoc kimchii IMSNU 11154(T)/CP001758” as a sequence label. Once you have drawn a phylogenetic tree, save it in the Newick format. Newick is a text format, so each zZ-flanked label (e.g. zZCP001758zZ) can be replaced by EzEditor. To replace zZ-labels, go to the menu [Phylogeny -> Replace accessions in a newick file]. Remember that only zZ-labels can be replaced. This is handy because a single zZ-labeled Newick file is all you need in order to change the labels later. View here for more details about the zZ-format.
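
To illustrate why the zZ-flanked form is easy to post-process, here is a minimal Python sketch that swaps zZ-labels in a Newick string for full labels. It is not EzEditor's own replacement routine; the function name `replace_zz_labels`, the lookup table, and the second accession (AB000001) are assumptions for illustration only.

```python
import re


def replace_zz_labels(newick, labels):
    """Replace zZ<accession>zZ tokens in a Newick string with full labels.

    labels : dict mapping accession -> replacement label (hypothetical table)
    Accessions missing from the table are left unchanged.
    """
    def substitute(match):
        accession = match.group(1)
        if accession not in labels:
            return match.group(0)
        # Quote the label: spaces and parentheses are not allowed in
        # unquoted Newick labels. Single quotes inside are doubled.
        return "'" + labels[accession].replace("'", "''") + "'"

    return re.sub(r"zZ([A-Za-z0-9_.]+?)zZ", substitute, newick)


# Toy tree; AB000001 is a made-up accession with no entry in the table.
tree = "(zZCP001758zZ:0.01,zZAB000001zZ:0.02);"
table = {"CP001758": "Leuconostoc kimchii IMSNU 11154(T)/CP001758"}
relabeled = replace_zz_labels(tree, table)
```

Because the zZ markers are unlikely to occur anywhere else in the file, a simple pattern match is enough to find every label.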

(2) Output Sequence Type

  • Aligned sequences: Export sequences as aligned
  • Unaligned sequences: Export sequences without gaps. You will then need to perform the multiple alignment in the external software tool. Some programs have that function (e.g. MEGA).
  • Skip third base of codon: For protein-coding sequences, you may check this to exclude all third bases of codons. These positions are known to be under weaker selective constraint than the first two. Excluding them can improve the analysis of rapidly evolving genes.

(3) Column Selection

  • All columns: All alignment columns (positions) are included in the export.
  • All columns without gaps: Only columns where all sequences have bases/amino acids are selected.
  • Columns with at least x% bases: This is a filter you can set up. A popular choice is the 50% filter, which automatically selects a reasonable final dataset for phylogenetic analysis.
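
The column selection options above can be read as one occupancy threshold applied per column. The sketch below is an illustration under assumptions, not EzEditor2's code: the function name `select_columns` is hypothetical, gaps are assumed to be written as `-` or `.`, the alignment is assumed to be non-empty, and a column exactly at the threshold is assumed to be kept.

```python
def select_columns(alignment, min_fraction=0.5, gap_chars="-."):
    """Keep columns in which at least min_fraction of sequences have a residue.

    alignment : non-empty dict mapping label -> aligned sequence (equal lengths)
    """
    seqs = list(alignment.values())
    n_seqs = len(seqs)
    keep = []
    for i in range(len(seqs[0])):
        # Count sequences that have a base/amino acid (not a gap) in column i.
        filled = sum(1 for s in seqs if s[i] not in gap_chars)
        if filled / n_seqs >= min_fraction:
            keep.append(i)
    return {label: "".join(seq[i] for i in keep)
            for label, seq in alignment.items()}


# Toy alignment (hypothetical data).
toy = {"a": "AC-G", "b": "A--G", "c": "ACTG", "d": "A--G"}
half = select_columns(toy, 0.5)  # "columns with at least 50% bases"
full = select_columns(toy, 1.0)  # equivalent to "all columns without gaps"
```

With a threshold of 0.0 every column is kept, so the three options differ only in the value of `min_fraction`.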

For protein-coding sequences, the program will ask whether you want to export DNA or translated amino acid sequences. Choose the one that suits your needs. Note that the “Skip third base of codon” option applies only to DNA sequences.
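
As a concrete picture of what skipping the third base of each codon does to a DNA sequence, here is a minimal Python sketch. The function name `skip_third_codon_positions` is hypothetical, and the sketch assumes the sequence is in frame and starts at codon position 1; how EzEditor2 determines the reading frame is not covered here.

```python
def skip_third_codon_positions(seq):
    """Return the sequence with every third base removed (positions 3, 6, 9, ...).

    Assumes the sequence starts at codon position 1 and is in frame.
    """
    # Zero-based indices 2, 5, 8, ... are the third codon positions.
    return "".join(base for i, base in enumerate(seq) if i % 3 != 2)


# Codons ATG GCT AAA keep only their first two bases each.
reduced = skip_third_codon_positions("ATGGCTAAA")
```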

Save the exported FASTA file and use it for the subsequent phylogenetic analyses.
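
Before handing an aligned export to an external program, it can be worth checking that every sequence has the same length. The following Python sketch shows one way to do that; the helper names `read_fasta` and `is_aligned` and the toy records are assumptions for illustration, and the parser is deliberately simple (duplicate labels would overwrite each other).

```python
import io


def read_fasta(handle):
    """Parse FASTA text from an open text handle into a dict label -> sequence."""
    records = {}
    label = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            label = line[1:]
            records[label] = []
        elif label is not None:
            records[label].append(line)
    # Join the wrapped sequence lines of each record.
    return {name: "".join(parts) for name, parts in records.items()}


def is_aligned(records):
    """True if all sequences share one length, as expected for an aligned export."""
    return len({len(seq) for seq in records.values()}) <= 1


# Toy FASTA text (hypothetical data); a real file would be opened with open(path).
example = io.StringIO(">zZCP001758zZ\nACGT-\nACG\n>seq2\nACGTAACG\n")
records = read_fasta(example)
aligned = is_aligned(records)
```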

Phylogenetic analysis using MEGA program

MEGA is an all-in-one software tool with a tree viewer. EzEditor2 will launch MEGA with the exported data file automatically, giving you a seamless workflow from multiple alignment to phylogenetic analysis.
  1. Start the MEGA program by selecting the menu [Phylogeny -> Run MEGA program] or clicking the MEGA button.
  2. Choose the “Export” options as described above.
  3. Once MEGA has launched, please consult its manual for the remaining steps.

If the MEGA program does not start, check whether EzEditor knows where it is installed by going to the menu [File -> Option].

Phylogenetic analysis using RAxML program

RAxML is probably the most widely used software for inferring maximum likelihood trees. The detailed usage can be found here. Below are the standard commands used for 16S sequences.

(These commands use 20 cores, 20 independent ML searches, and 100 bootstrap replicates.)

raxmlHPC-PTHREADS -T 20 -m GTRCAT -s ../test.fasta -p 12345 -n T1 -# 20

raxmlHPC-PTHREADS -T 20 -m GTRCAT -s ../test.fasta -p 12345 -n T2 -b 12345 -# 100

raxmlHPC-PTHREADS -m GTRCAT -p 12345 -f b -t RAxML_bestTree.T1 -z RAxML_bootstrap.T2 -n T3

After running these commands, the notable resulting files are:
  • RAxML_bestTree.T1: this is the Newick file for the best ML tree.
  • RAxML_bipartitionsBranchLabels.T3: the same tree as above, with bootstrap support values as branch labels.
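
If you run these steps often, the three command lines can be generated from a small script. The Python sketch below only builds the argument lists, using the same flags as the commands above; the function name `raxml_commands` and its parameters are assumptions for illustration, and actually running the commands requires RAxML to be installed and on your PATH.

```python
def raxml_commands(alignment, threads=20, seed=12345, searches=20, bootstraps=100):
    """Build the three RAxML command lines from the text as argument lists."""
    exe = "raxmlHPC-PTHREADS"
    # 1. Search for the best ML tree (run name T1).
    best = [exe, "-T", str(threads), "-m", "GTRCAT", "-s", alignment,
            "-p", str(seed), "-n", "T1", "-#", str(searches)]
    # 2. Bootstrap replicates (run name T2).
    boot = [exe, "-T", str(threads), "-m", "GTRCAT", "-s", alignment,
            "-p", str(seed), "-n", "T2", "-b", str(seed), "-#", str(bootstraps)]
    # 3. Draw bootstrap supports on the best tree (run name T3).
    annotate = [exe, "-m", "GTRCAT", "-p", str(seed), "-f", "b",
                "-t", "RAxML_bestTree.T1", "-z", "RAxML_bootstrap.T2",
                "-n", "T3"]
    return [best, boot, annotate]


commands = raxml_commands("../test.fasta")

# To execute them in order (not run here):
# import subprocess
# for cmd in commands:
#     subprocess.run(cmd, check=True)
```

Passing each command as a list avoids shell quoting issues, for example with the `#` character in the `-#` flag.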

Last modified on January 12, 2017 (JC)