[Tutorial] Identifying a bacterial strain using a genome sequence


The purpose of this tutorial is to familiarize you with EzBioCloud’s functions and to apply them to a common scenario: you have obtained a genome sequence and need to identify the bacterial strain it came from. Suppose you isolated strain Ag2 from the midgut of a mosquito (Anopheles gambiae) and obtained its genome sequence. Download the genome sequence of Ag2 from here.
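Before uploading the file anywhere, it can help to confirm that the download is a well-formed FASTA file. The following is a minimal sketch, not part of EzBioCloud: the function name `summarize_fasta` and the tiny example sequences are made up for illustration.

```python
from io import StringIO


def summarize_fasta(handle):
    """Return (number of records, total bases) for a FASTA text stream."""
    n_records = 0
    total_bases = 0
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            n_records += 1  # each header line starts a new contig
        else:
            if n_records == 0:
                raise ValueError("sequence data found before the first '>' header")
            total_bases += len(line)
    return n_records, total_bases


# Made-up two-contig example standing in for the real Ag2 file.
example = StringIO(">contig_1\nACGTACGT\nACGT\n>contig_2\nGGCC\n")
print(summarize_fasta(example))
```

To run it on the real download, open the file instead of the example string, for instance `with open("Ag2.fasta") as fh: print(summarize_fasta(fh))`, where `Ag2.fasta` is an assumed file name.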

First, you need to extract the 16S rRNA gene sequence from the genome sequence. We will use the ContEst16S tool, which extracts the 16S gene(s) and checks whether your sequence is contaminated.

To do this,

  1. Go to the ContEst16S tool’s website here.
  2. Under “Check your genome for contamination”, click the [Upload FASTA] button and upload the Ag2 genome that you downloaded. Note that Ag2 is a bacterial strain.
  3. Once the file has finished uploading, click [Run ContEst16S].

After a minute or less, you should see the ContEst16S result, which consists of a verdict on whether the genome is contaminated and the 16S sequence extracted from the genome sequence that you uploaded.

Next, use EzBioCloud’s “Identify” function to search the 16S sequence against the type strain database of 16S sequences. Copy and paste the 16S sequence you obtained in the previous step, name it “Ag2”, and press [Submit] to run the tool. Then click the magnifying glass button; a page will pop up with details about the sequence and a list of hits from the EzBioCloud 16S database. You should expect two hits with over 98.7% sequence similarity (this may change as more species are added to the database). See here for more information on how sequence similarity is calculated.

We will only consider species showing 98.7% or higher similarity to the query sequence (Ag2). Anything below that will show 95% or less average nucleotide identity (ANI), and the threshold for a species is 95~96% ANI. The 16S search therefore serves to reduce the number of species whose genome sequences must be compared by ANI. Remember that even two strains with identical 16S sequences can show less than 95% ANI, meaning they belong to different species.
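The 98.7% screening rule can be sketched in a few lines of Python. This is only an illustration of the filtering logic, not EzBioCloud code; the function name, the species labels, and the similarity values are placeholders invented for the example, not results from the database.

```python
SIMILARITY_CUTOFF = 98.7  # percent 16S similarity, as stated in this tutorial


def candidate_species(hits, cutoff=SIMILARITY_CUTOFF):
    """Keep hits at or above the cutoff, highest similarity first."""
    kept = [(name, sim) for name, sim in hits if sim >= cutoff]
    return sorted(kept, key=lambda hit: hit[1], reverse=True)


# Placeholder hits for illustration only (not real EzBioCloud output).
hits = [
    ("species B", 98.9),
    ("species C", 97.5),
    ("species A", 99.4),
]

for name, sim in candidate_species(hits):
    print(f"{name}: {sim:.1f}% -> compare genomes by ANI")
```

Only the species that pass this filter need their type strain genomes downloaded for the ANI comparison.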

Now the isolate’s genome needs to be compared with the genomes of the two species that showed the highest 16S sequence similarity.

To compare them, you will need to calculate the OrthoANI (a kind of ANI) between Ag2 and the type strains of the two species (Acinetobacter bereziniae and Acinetobacter guillouiae). Only type strains carry significance in taxonomy, since each species is defined by its type strain.

To do this, you will need to download the genome sequences of the type strains.

  1. From the pop-up page with the list of hits, click on Acinetobacter bereziniae.
  2. On the resulting page, click [Genome] under “Data Type Count”.
  3. Then click [Browse] on the “type strain”. You should now be on a page describing the strain and its genome.
  4. Click on the [Download] tab on the far right.
  5. Finally, click [Download] under Contigs, which provides the contig sequences in FASTA format.

Repeat this process for Acinetobacter guillouiae as well.

Now that you have the two genomes, upload them to the ANI calculator on EzBioCloud.net (->TOOLS->ANI Calculator). Upload the isolate’s genome FASTA file as genome sequence A and the genome of the type strain of one of the species as genome sequence B. Press calculate and note down the resulting OrthoANI value. Repeat the process for the type strain of the other species. The OrthoANI cutoff for the species boundary is 95~96%, so these values allow a precise identification.

You should end up with an OrthoANI value of 98.32% between the isolate and Acinetobacter bereziniae and 82.53% between the isolate and Acinetobacter guillouiae. Note that the isolate showed very high and similar 16S similarities to both species, but very different OrthoANI values. The former reflects the evolution of a single gene, while the latter reflects that of the whole genome. Consider why they are so different.

So, Ag2 belongs to the species Acinetobacter bereziniae, with 98.32% OrthoANI to its type strain.
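The decision above can be written out as a small check against the 95~96% cutoff. This is a minimal sketch using the two OrthoANI values reported in this tutorial; the function name and the choice to report values between 95% and 96% as "borderline" are assumptions made for the example, not an EzBioCloud rule.

```python
def classify_ani(ani, lower=95.0, upper=96.0):
    """Classify an OrthoANI percentage against the 95~96% species cutoff."""
    if ani >= upper:
        return "same species"
    if ani < lower:
        return "different species"
    return "borderline"  # assumption: values inside the cutoff range need a closer look


# OrthoANI values between Ag2 and each type strain, taken from this tutorial.
orthoani = {
    "Acinetobacter bereziniae": 98.32,
    "Acinetobacter guillouiae": 82.53,
}

for species, ani in orthoani.items():
    print(f"{species}: {ani:.2f}% -> {classify_ani(ani)}")
```

With these values, the comparison to Acinetobacter bereziniae falls above the cutoff and the comparison to Acinetobacter guillouiae falls well below it.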

Now that you have finished this tutorial, you can use the ContEst16S tool to check a genome sequence for contamination and extract its 16S sequences, and use the ANI calculator to compare genomes. Both are quick and simple steps toward identifying a bacterial strain from a genome sequence. Above all, this genome-based approach is the most accurate method for identifying a bacterial strain.

This tutorial was prepared by Suyeon Hong (Yale Univ)/JC.

Last updated on Sept 16th, 2017.