TrueBac ID Demonstrations

TrueBac ID is designed to definitively identify bacteria from whole-genome sequence data. Here we ran TrueBac ID on publicly available datasets to demonstrate the accuracy of the system. Because bacterial taxonomy and the TrueBac Reference Database are continually updated, the results presented here reflect the state of the database at the time of analysis.

Case #1: NCBI bacterial genome database

The input dataset contains 99,078 bacterial genomes from pure cultures (metagenome-derived and single-cell assemblies were excluded). Contaminated genomes were also excluded using the ContEst16S tool. All TrueBac ID identification results are available at www.ezbiocloud.net. The identification was carried out on May 15, 2018.

Identification of NCBI genomes using TrueBac ID-Genome

Case #2: An unbiased collection of clinical isolates

A team at the University of Washington Medical Center published genome data for more than 1,200 bacterial strains isolated from an Intensive Care Unit (ICU) over one year (Roach et al., 2015; PLOS Genetics 11:e1005413). TrueBac ID was used to reanalyze the same dataset. The identification was carried out on May 15, 2018.

Identification of clinical isolates collected from an ICU over one year using TrueBac ID-Genome

Detailed identification results are available here.

Case #3: Accurate identification of a gut bacterium that MALDI-TOF and other conventional methods failed to identify

A team at Harvard University isolated a potentially therapeutic strain from the human gut. Because this strain could not be identified by MALDI-TOF or other conventional methods, it was tentatively proposed as a novel species, ‘Clostridium immunis’ (Nature 2017; 552(7684):244-247). TrueBac ID identified this strain as Clostridium symbiosum with 98.48% average nucleotide identity (ANI).

Screenshot of the TrueBac ID result. The genome sequence is available at NCBI (GCA_002814175.1).


Categories of TrueBac ID results relative to the original species/subspecies designations in the database or publications

Further identified to the species level:

The original name of the genome has the correct genus name but lacks a specific epithet. In this example, Sulfitobacter sp. NAS-14.1 (GCA_000152645.1) is identified as Sulfitobacter pontiacus.

Screenshot of TrueBac ID result for GCA_000152645.1

Further identified to the subspecies level:

The original name of the genome has the correct species name but lacks a subspecies name. In this example, Pasteurella multocida FDAARGOS_384 (GCF_002393385.1) is further identified as Pasteurella multocida subsp. septica.

Screenshot of TrueBac ID result for GCF_002393385.1

Identified as a genomospecies:

A genomospecies is a potentially novel species that has been tentatively named in the EzBioCloud/TrueBac databases. In this example, Actinomyces odontolyticus ATCC 17982 (GCF_000154225.1) is not a strain of Actinomyces odontolyticus but represents a novel species, which we have named genomospecies DS264586_s.

Screenshot of TrueBac ID result for GCF_000154225.1

Misidentified:

Bacillus cereus ATCC 10987 (GCA_000008005.1) is identified as Bacillus pacificus with 99.84% ANI. It is clearly not a strain of Bacillus cereus, because its ANI to the B. cereus type strain is only 91.96%.

Screenshot of TrueBac ID result for GCA_000008005.1
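The ANI reasoning in this example can be illustrated with a simple rule: take the reference type strain with the highest ANI to the query genome and accept its species only if that value clears a species boundary. This is a minimal sketch, not TrueBac ID's actual decision procedure. The 95% cutoff is an assumption based on the commonly cited 95 to 96% ANI species boundary, and `assign_species` and `SPECIES_ANI_CUTOFF` are hypothetical names.

```python
from typing import Dict, Optional

# ASSUMPTION: a 95% ANI species boundary; TrueBac ID's own threshold is not stated here.
SPECIES_ANI_CUTOFF = 95.0


def assign_species(ani_to_type_strains: Dict[str, float],
                   cutoff: float = SPECIES_ANI_CUTOFF) -> Optional[str]:
    """Return the best-matching species if its ANI (%) reaches the cutoff, else None.

    ani_to_type_strains maps a species name to the ANI between the query
    genome and that species' type-strain genome (hypothetical input format).
    """
    if not ani_to_type_strains:
        return None
    # Pick the type strain with the highest ANI to the query genome
    best_species, best_ani = max(ani_to_type_strains.items(), key=lambda kv: kv[1])
    # Below the cutoff, no known species is assigned with confidence
    return best_species if best_ani >= cutoff else None


# ANI values reported above for Bacillus cereus ATCC 10987
ani = {"Bacillus pacificus": 99.84, "Bacillus cereus": 91.96}
print(assign_species(ani))  # Bacillus pacificus
```

Under this rule, 99.84% to B. pacificus clears the cutoff while 91.96% to B. cereus does not; likewise, the 98.48% ANI to Clostridium symbiosum in Case #3 would be accepted.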

Not identified at the species level:

The genome cannot be confidently identified to a known species, either because it belongs to a novel species or because sufficient reference genome data are lacking. In this example, Haemophilus parainfluenzae strain 1209_HPAR (GCA_001053035.1) is identified as a novel species whose closest known species is Haemophilus parainfluenzae.

Screenshot of TrueBac ID result for GCA_001053035.1
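The categories above amount to comparing the original genome name with the TrueBac ID result at the genus, species, and subspecies levels. The sketch below is a hypothetical illustration of that comparison, not TrueBac ID's implementation. `Name` and `categorize` are invented names, and two rules are assumptions: an extra "Consistent with original name" outcome for results that simply confirm the original name, and treating a result in a different genus from a "Genus sp." name as misidentified.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Name:
    """A taxonomic name split into ranks (hypothetical helper type)."""
    genus: str
    species: Optional[str] = None     # specific epithet; None for "Genus sp."
    subspecies: Optional[str] = None  # subspecies epithet, if any


def categorize(original: Name, result: Optional[Name],
               is_genomospecies: bool = False) -> str:
    """Assign one of the result categories (illustrative rules only).

    `result` is None when no known species was assigned with confidence.
    """
    # Tentatively named, potentially novel species (e.g. DS264586_s)
    if is_genomospecies:
        return "Identified as a genomospecies"
    # No confident match to a known species
    if result is None:
        return "Not identified at the species level"
    # Original name carried only a genus ("Genus sp.")
    if original.species is None:
        if result.genus == original.genus:
            return "Further identified to the species level"
        return "Misidentified"  # ASSUMPTION: genus mismatch counts as misidentified
    # Species-level disagreement
    if (result.genus, result.species) != (original.genus, original.species):
        return "Misidentified"
    # Same species: compare subspecies
    if original.subspecies is None and result.subspecies is not None:
        return "Further identified to the subspecies level"
    if original.subspecies and result.subspecies and original.subspecies != result.subspecies:
        return "Misidentified"
    return "Consistent with original name"  # ASSUMPTION: extra category


print(categorize(Name("Sulfitobacter"), Name("Sulfitobacter", "pontiacus")))
# Further identified to the species level
print(categorize(Name("Bacillus", "cereus"), Name("Bacillus", "pacificus")))
# Misidentified
```

Each example in this section maps onto one branch: Pasteurella multocida FDAARGOS_384 reaches the subspecies branch, Actinomyces odontolyticus ATCC 17982 the genomospecies branch, and Haemophilus parainfluenzae 1209_HPAR the "not identified" branch.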


The BIOiPLUG team / Last edited on May 22, 2018