Browsing a single MTP
Here, we will go through how to browse a single Microbiome Taxonomic Profile (MTP) and get the information you need from it. We will use example data generated from a whole run of Illumina MiSeq 250 bp paired-end sequencing. The sample is from Dr. Jon Jongsik Chun, the founder and CEO of ChunLab, Inc. This data was analyzed using the BIOiPLUG Microbiome Taxonomic Profiling (MTP) cloud service and can be browsed freely, without logging in, at https://www.bioiplug.com/mtpsb/view_browseMtpInfo?id=G001S1&from=guest.
This web page consists of several tabs, each presenting a different aspect of a single MTP sample.
[About MTP/sample] – tab
- This is your sample/MTP name. It is entered when you upload the NGS raw data to the BIOiPLUG cloud service and can be edited later.
- Sample information is organized by metadata tags in BIOiPLUG. Apply tags according to your specific needs. Tags are a useful tool for grouping samples/MTPs into sets for subsequent comparisons and secondary analyses.
- This is a memo field where you can store comments.
- Target taxon can be one of [All], [Bacteria], and [Archaea]. It should be chosen when the data are uploaded to the cloud. In this example, we used the bacterial primer set but set the target taxon to [All] to check whether any organisms belonging to Archaea or Eukarya were present.
- The BIOiPLUG MTP pipeline can be run using different versions of the reference database. PKSSU3.0 stands for EzBioCloud’s prokaryotic small subunit (16S) rRNA database 3.0, which was released in February 2018.
- To edit an MTP name and/or the memo field, click the [Edit] button.
- To download the taxonomic profile of this MTP, click the [Profile] button. A profile file contains full read-count information for each taxon (from phylum to species). We also provide copy-number-corrected counts [Learn more].
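To illustrate what a copy-number correction does, here is a minimal sketch under stated assumptions, not BIOiPLUG's documented method: it divides each taxon's read count by a 16S rRNA gene copy number for that taxon, assumes one copy when the number is unknown, and then renormalizes to relative abundance. The function names, taxon names, and copy numbers are all hypothetical.

```python
# Assumption: corrected count = read count / 16S copy number of the taxon.
# All names and numbers here are hypothetical, not values from any database.

def copy_number_corrected(counts, copy_numbers):
    """Divide each taxon's read count by its 16S rRNA gene copy number."""
    corrected = {}
    for taxon, reads in counts.items():
        copies = copy_numbers.get(taxon, 1.0)  # assume one copy if unknown
        corrected[taxon] = reads / copies
    return corrected


def relative_abundance(counts):
    """Convert counts to fractions that sum to 1."""
    total = sum(counts.values())
    return {taxon: c / total for taxon, c in counts.items()}


raw = {"taxon_A": 600, "taxon_B": 400}  # hypothetical read counts
copies = {"taxon_A": 6, "taxon_B": 1}   # hypothetical copy numbers
print(relative_abundance(raw))          # {'taxon_A': 0.6, 'taxon_B': 0.4}
print(relative_abundance(copy_number_corrected(raw, copies)))
# {'taxon_A': 0.2, 'taxon_B': 0.8}
```

With these made-up numbers, a taxon carrying six gene copies drops from 60% to 20% after correction, which is the kind of shift copy-number correction is meant to reveal.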
About read counts
- This indicates the number of sequencing reads in the uploaded raw NGS data, minus the reads whose paired ends do not overlap.
- Several quality measures are applied to filter out low-quality [Learn more], non-target, and chimeric reads [Learn more]. Only the remaining sequencing reads, called “Valid reads”, are used for subsequent microbiome analyses.
- Removed sequences are further classified here. You may be surprised that >2.5M reads were detected as chimeric amplicons. This high level of chimera detection is likely because our non-chimeric reference database covers the organisms in this particular sample better: in the latest version, we added >2,000 full-length, high-quality 16S sequences derived from human and mouse gut microbiomes.
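The relationship between these read counts can be written as simple bookkeeping. The sketch below assumes that the three removal categories do not overlap, so valid reads equal the merged reads minus each removed class; the class name and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReadSummary:
    """Hypothetical per-sample read bookkeeping (categories assumed disjoint)."""
    merged: int        # raw reads minus those whose paired ends do not overlap
    low_quality: int   # removed by quality filtering
    non_target: int    # removed as non-target sequences
    chimeric: int      # removed as chimeric amplicons

    @property
    def removed(self) -> int:
        return self.low_quality + self.non_target + self.chimeric

    @property
    def valid(self) -> int:
        # Only these reads go on to taxonomic assignment and diversity analysis.
        return self.merged - self.removed

    def fraction_removed(self, category: str) -> float:
        """Share of merged reads removed in one category, e.g. "chimeric"."""
        return getattr(self, category) / self.merged


s = ReadSummary(merged=10_000, low_quality=300, non_target=200, chimeric=1_500)
print(s.valid)                         # 8000
print(s.fraction_removed("chimeric"))  # 0.15
```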
About read lengths
In this section, statistics (min, max, and average) about the lengths of valid reads are given.
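These statistics are easy to reproduce from any set of valid reads, for example sequences you have downloaded. A minimal sketch using only the standard library; the function name and input sequences are made up.

```python
from statistics import mean


def length_stats(sequences):
    """Return (min, max, average) length of a collection of read sequences."""
    lengths = [len(seq) for seq in sequences]
    if not lengths:
        raise ValueError("no sequences given")
    return min(lengths), max(lengths), mean(lengths)


reads = ["ACGTACGTAC", "ACGTAC", "ACGTACGTACGTAC"]  # hypothetical valid reads
print(length_stats(reads))
```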
About taxonomic assignment
- This is the percentage of quality-controlled sequencing reads that were assigned at the species level. In this case, >6.2 million reads (97.6%) were assigned at the species level. Using the EzBioCloud 16S database, the taxonomic coverage of human microbiome samples ranges from 95 to 98% [Learn more].
- This is the number of species actually detected in this MTP. Because this sample was sequenced very deeply, a relatively high number of species was found. If the same sample were sequenced to a depth of around 10,000 reads, fewer than 500 species would be detected.
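Both summary values can be computed from per-read assignments. This sketch assumes a simple, hypothetical representation in which each valid read carries either a species name or None when it could not be assigned at the species level; it is not BIOiPLUG's internal format.

```python
def species_level_summary(assignments):
    """Return (% of reads assigned at species level, number of distinct species)."""
    if not assignments:
        raise ValueError("no reads given")
    assigned = [sp for sp in assignments if sp is not None]
    percent = 100 * len(assigned) / len(assignments)
    return percent, len(set(assigned))


reads = ["Faecalibacterium prausnitzii group", "GL538271_s",
         "Faecalibacterium prausnitzii group", None]  # hypothetical reads
print(species_level_summary(reads))  # (75.0, 2)
```

The species count depends on how many reads are sampled: rarely observed species tend to drop out of a smaller subsample, which is why a shallow run detects fewer species.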
[Alpha-Diversity] – tab
- The OTU-picking method used. CL_OPEN_REF_UCLUST_MC2 is an open-reference method in which de novo clustering is carried out using the UCLUST program and single-membered de novo clusters are ignored [Learn more].
- The cutoff used for taxonomic assignment at the species level and for de novo clustering.
- The number of OTUs found, i.e., the number of species plus the number of OTUs from de novo clustering. Because 97.6% of reads were assigned to 857 species, the remaining 2.4% of reads form most of the >28,000 OTUs. Even though many single-membered OTUs were discarded, the number of OTUs still appears to be overestimated, probably due to sequencing errors. In data sequenced this deeply, two or more identical reads do not guarantee that the sequence is real.
- Extremely deep sequencing (about 6.4M valid reads) captured almost all of the species diversity in this fecal sample, resulting in a Good’s library coverage of 100%.
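Good's coverage is commonly defined as C = 1 - F1/N, where F1 is the number of OTUs observed exactly once (singletons) and N is the total number of reads. The sketch below implements that textbook formula; whether BIOiPLUG applies it to exactly the same OTU table is an assumption, and the counts are hypothetical.

```python
def goods_coverage(otu_counts):
    """Good's coverage: 1 - (number of singleton OTUs / total reads)."""
    n = sum(otu_counts)
    if n == 0:
        raise ValueError("empty OTU table")
    singletons = sum(1 for c in otu_counts if c == 1)
    return 1 - singletons / n


print(goods_coverage([10, 5, 3, 1, 1]))       # 2 singletons in 20 reads -> 0.9
print(goods_coverage([5_000_000, 1_000, 1]))  # close to 1 at high depth
```

With millions of reads and few singletons, F1/N becomes negligible, so the value approaches 100% as sequencing depth grows.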
Various alpha-diversity indices can be used to describe the biodiversity of an MTP:
- Species richness indices (ACE, Chao1, Jackknife) estimate the number of species/OTUs in an MTP.
- Diversity indices (Shannon, Simpson, NPShannon, Phylogenetic Diversity) are mathematical measures of species diversity or evenness in an MTP. LCI and HCI denote the lower and upper bounds of the confidence interval, respectively.
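To show what some of these indices compute, the sketch below implements common textbook forms of Chao1, Shannon, and Simpson from a list of OTU read counts. BIOiPLUG may use different variants (for example, a different log base for Shannon, or 1 - D for Simpson); ACE, Jackknife, NPShannon, Phylogenetic Diversity, and the confidence intervals are not shown. The OTU counts are hypothetical.

```python
import math


def chao1(counts):
    """Chao1 richness: S_obs + F1^2 / (2 * F2); bias-corrected form when F2 == 0."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2


def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i), using the natural log."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def simpson(counts):
    """Simpson's index D = sum(p_i^2); a lower D means a more diverse community."""
    n = sum(counts)
    return sum((c / n) ** 2 for c in counts)


otus = [4, 2, 1, 1]  # hypothetical OTU read counts
print(chao1(otus), shannon(otus), simpson(otus))
```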
[Taxonomic hierarchy] – tab
On this tab, you can do the following:
- Browse the taxonomic structure of an MTP in a hierarchical manner
- Download sequences (in FASTA format) or copy individual sequences to the clipboard
Let’s explore the data to see which species of the genus Faecalibacterium are present in this MTP. Faecalibacterium is one of the most important human gut taxa and is thought to be beneficial, as it produces short-chain fatty acids from dietary fiber.
Click here to view its taxonomic hierarchy.
In the [Taxonomic hierarchy] tab, select and expand Firmicutes → Clostridia → Clostridiales → Ruminococcaceae → Faecalibacterium.
- Click “Faecalibacterium prausnitzii group” to reveal the other species and phylotypes included in this taxonomic group. In the BIOiPLUG cloud service, species and phylotypes that are indistinguishable from each other are classified into taxonomic groups [Learn more]. The read count shown next to the taxon indicates the number of sequencing reads assigned to it.
- Species/phylotypes included in the “Faecalibacterium prausnitzii group” are listed here. Click the name to open the webpage with its taxonomic information. Note that FP929045_s, NMTZ_s, and GG697149_s are phylotypes that were supported by genomic evidence [Learn more].
- GL538271_s is the second most abundant member of the genus in this sample. It is a phylotype whose genome sequence is available [Learn more]. Its 16S rRNA gene is sufficiently different to be distinguished from the “Faecalibacterium prausnitzii group”.
- The third most abundant species, PAC001430_s, is represented by a full-length PacBio sequence.
- To download all sequences in the selected taxa, click here. This function is not supported in guest mode or in free trials.
- To expand all taxa at once, select the taxonomic rank and click [Expand].
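The counts shown at each level of the hierarchy can be understood as a roll-up of species-level counts: every read assigned to a species also counts toward its genus, family, order, class, and phylum. A minimal sketch of that aggregation, assuming reads are tallied per full lineage; the lineage follows the path expanded above, while the Roseburia entry and all read counts are hypothetical.

```python
from collections import defaultdict

FAECALI = ("Firmicutes", "Clostridia", "Clostridiales",
           "Ruminococcaceae", "Faecalibacterium")

# Hypothetical read counts per full lineage (phylum ... species).
assignments = {
    FAECALI + ("Faecalibacterium prausnitzii group",): 900,
    FAECALI + ("GL538271_s",): 300,
    FAECALI + ("PAC001430_s",): 100,
    ("Firmicutes", "Clostridia", "Clostridiales", "Lachnospiraceae",
     "Roseburia", "Roseburia inulinivorans"): 200,
}


def roll_up(lineage_counts):
    """Add each species-level count to every ancestor prefix of its lineage."""
    totals = defaultdict(int)
    for lineage, reads in lineage_counts.items():
        for depth in range(1, len(lineage) + 1):
            totals[lineage[:depth]] += reads
    return dict(totals)


totals = roll_up(assignments)
print(totals[("Firmicutes",)])  # 1500
print(totals[FAECALI])          # 1300
```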
Formally, the genus Faecalibacterium has only one species with a validly published name, Faecalibacterium prausnitzii. Here, the combination of a taxonomically validated EzBioCloud database and a sensitive taxonomic assignment algorithm allows detailed profiling at the species level. Because many of the identified phylotypes are represented by whole-genome sequences, further in-depth functional investigation is possible using comparative genomics.
[Taxonomic composition] – tab
On this tab, taxonomic compositions at various ranks (from phylum to species) are given as pie charts and tables. The charts and tables can be exported or downloaded for immediate use in reports and publications.
In the species composition table, use the <filter> to quickly search for the abundances of taxa you are interested in. For example, entering “Bacteroides” will show all species whose names include the term “Bacteroides” (see the screenshot below). Note that this will not reveal phylotypes in the genus Bacteroides, because phylotype names (such as PAC001430_s) do not contain the genus name.
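This behavior follows from matching the search term against taxon names rather than against lineage. The sketch below contrasts the two approaches; the table layout, the phylotype name XYZ000001_s, and all abundances are hypothetical, and case-insensitive matching is an assumption about the real filter.

```python
# Hypothetical species-composition rows: (name, genus, abundance in %).
rows = [
    ("Bacteroides vulgatus", "Bacteroides", 5.0),
    ("Bacteroides uniformis", "Bacteroides", 3.0),
    ("XYZ000001_s", "Bacteroides", 1.0),  # hypothetical phylotype
    ("Faecalibacterium prausnitzii group", "Faecalibacterium", 10.0),
]


def filter_by_name(rows, term):
    """Keep rows whose taxon name contains the term (case-insensitive)."""
    term = term.lower()
    return [r for r in rows if term in r[0].lower()]


def filter_by_genus(rows, genus):
    """Keep rows whose assigned genus matches, including phylotypes."""
    return [r for r in rows if r[1] == genus]


print([r[0] for r in filter_by_name(rows, "Bacteroides")])   # misses XYZ000001_s
print([r[0] for r in filter_by_genus(rows, "Bacteroides")])  # includes it
```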
[Selected taxa] – tab
In this tab, you can explore the abundance of any taxon. We also provide several predefined taxa (subject to change):
- Lactic acid bacteria (LAB): This is not a taxonomic term, as it refers to any bacteria capable of producing lactic acid. Traditionally, the term LAB has been used for probiotic strains that are now classified in the genera Lactobacillus, Leuconostoc, Lactococcus, Weissella, and Bifidobacterium.
- Firmicutes to Bacteroidetes ratio: Firmicutes and Bacteroidetes are the two major phyla of the human gut microbiota. The Firmicutes to Bacteroidetes ratio (F/B) has been used as a biomarker of a person’s health status, and many studies have shown F/B to be correlated with obesity.
- Human gut taxa: Several taxa at various ranks are predefined for human microbiome study.
- Select [Human gut taxa]
- [F] indicates that Ruminococcaceae is at the taxonomic rank of family. Similarly, [P] stands for phylum, [C] for class, [O] for order, [G] for genus, and [S] for species.
- The bar indicates the abundance (18.56%).
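The two predefined summaries described above (LAB and F/B) can be expressed as simple calculations on a composition table. In this sketch, the LAB total is the sum of the genera listed above and F/B is the ratio of the two phylum abundances; the function names and input abundances are hypothetical, and the genus list BIOiPLUG actually uses may differ.

```python
# Genera listed above as traditional LAB (the predefined set may change).
LAB_GENERA = {"Lactobacillus", "Leuconostoc", "Lactococcus",
              "Weissella", "Bifidobacterium"}


def lab_abundance(genus_abundance):
    """Total relative abundance (%) of the LAB genera."""
    return sum(ab for genus, ab in genus_abundance.items() if genus in LAB_GENERA)


def fb_ratio(phylum_abundance):
    """Firmicutes / Bacteroidetes ratio; undefined when Bacteroidetes is absent."""
    bacteroidetes = phylum_abundance.get("Bacteroidetes", 0.0)
    if bacteroidetes == 0:
        raise ValueError("Bacteroidetes abundance is zero; F/B is undefined")
    return phylum_abundance.get("Firmicutes", 0.0) / bacteroidetes


genera = {"Bifidobacterium": 1.5, "Lactobacillus": 0.5, "Bacteroides": 20.0}
phyla = {"Firmicutes": 60.0, "Bacteroidetes": 30.0, "Proteobacteria": 5.0}
print(lab_abundance(genera), fb_ratio(phyla))  # 2.0 2.0
```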
The abundance of any taxon that is not included in these predefined lists can be found by entering its name into [Search taxa]. For example, enter “Escherichia coli group” to get the abundance of this taxonomic group.
[Krona] – tab
Under the “Krona” tab, taxonomic composition data are loaded into the Krona tool, an open-source visualization project available here. Krona was developed by Ondov et al. (2011) and provides an interactive way to explore the data. Figures suitable for publications or presentations can be captured here.
[Word Cloud] – tab
The Word Cloud is a quick and easy way to visualize the major taxonomic groups at any rank (from phylum to species) for a single MTP. A word cloud is an image composed of the words used in a particular text or subject, in which the size of each word indicates its frequency or importance. The Word Cloud images below show the phyla and genera of a human gut microbiome. As expected, the genus Faecalibacterium belongs to the phylum Firmicutes.
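The size rule behind a word cloud can be sketched as a mapping from abundance to font size. The linear scaling, the point-size bounds, and the input abundances below are all assumptions for illustration; the Word Cloud tab may use a different scaling.

```python
def font_sizes(abundance, min_pt=10.0, max_pt=60.0):
    """Scale each taxon's font size linearly between min_pt and max_pt by abundance."""
    lo, hi = min(abundance.values()), max(abundance.values())
    span = hi - lo
    sizes = {}
    for taxon, ab in abundance.items():
        frac = (ab - lo) / span if span else 1.0  # all equal -> largest size
        sizes[taxon] = min_pt + frac * (max_pt - min_pt)
    return sizes


phyla = {"Firmicutes": 60.0, "Bacteroidetes": 30.0, "Proteobacteria": 0.0}
print(font_sizes(phyla))
# {'Firmicutes': 60.0, 'Bacteroidetes': 35.0, 'Proteobacteria': 10.0}
```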
The BIOiPLUG team / Last edited on May 22, 2018