A microbiota is the entire collection of microorganisms in a specific niche, such as the human gut or soil. The microbiome comprises all of the genetic material within a microbiota. The most important methodology for studying microbiomes is metagenomics, which involves large-scale DNA sequencing followed by bioinformatic analysis.
The goals of microbiome research are to understand (i) who the inhabitants are, (ii) what they do, and (iii) how they do it.
To achieve these goals, we need to obtain the taxonomic and functional profiles of microbiome samples, then group and compare them to understand the differences. For example, we could identify bacterial species associated with obesity by comparing taxonomic profiles between groups of healthy and obese subjects. High-throughput DNA sequencing provides an accurate and efficient way to obtain these profiles.
The above figure summarizes the major steps in microbiome studies. The bioinformatics workflow can be divided into two stages: primary and secondary analysis.
Primary analysis in microbiome bioinformatics
In this step, large volumes of NGS reads are turned into light-weight profiles. For example, if 100,000 16S NGS reads match the sequence of the Vibrio cholerae type strain, the final profile stores only the count for V. cholerae, i.e., 100,000, and not the raw sequence data. Similarly, NGS sequences matched to a certain functional ortholog group, e.g., K00076, which is involved in secondary bile acid biosynthesis, are stored in the functional profile as a count only. A series of software tools, called a pipeline, is used to process raw NGS data in order to generate taxonomic or functional profiles.
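The reduction from reads to counts described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline itself: it assumes each read has already been assigned a taxon label (or `None` when nothing matched), and the names `build_profile`, `relative_abundance`, and the sample data are hypothetical.

```python
from collections import Counter


def build_profile(read_assignments):
    """Collapse per-read taxon labels into a count profile.

    Reads with no assignment (None) are skipped. The sequences themselves
    are never stored, only one count per taxon.
    """
    return Counter(label for label in read_assignments if label is not None)


def relative_abundance(profile):
    """Convert raw counts into fractions that sum to 1 (empty profile -> {})."""
    total = sum(profile.values())
    if total == 0:
        return {}
    return {taxon: count / total for taxon, count in profile.items()}


# Hypothetical assignments for five reads; a real run would have many thousands.
assignments = ["Vibrio cholerae"] * 3 + ["Escherichia coli", None]
profile = build_profile(assignments)
fractions = relative_abundance(profile)
```

The same structure would serve for a functional profile if the labels were ortholog identifiers such as K00076 instead of taxon names.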
The most popular method of generating microbiome profiles is sequencing amplicons of phylogenetic markers. The 16S rRNA gene and the ITS region are the markers of choice for bacteria and fungi, respectively. Amplicon sequencing is both inexpensive and sufficient to capture the taxonomic structure of microbiome samples. The drawback is that only taxonomic profiles can be obtained. To obtain functional profiles, shotgun sequencing should be used. Functional profiles can also be predicted from taxonomic profiles (see Langille et al., 2013), but the accuracy cannot be guaranteed. The following table illustrates the pros and cons of amplicon and shotgun metagenomics.
| ||Amplicon sequencing||Shotgun sequencing|
|NGS platforms||Illumina, PacBio, Ion Torrent||Illumina, Ion Proton|
|Reference database||16S database, most known species have been sampled||Genome database, <50% of known species have been sampled|
|Resolution||Species or taxonomic group||Subspecies, if reference database is available|
|Output||Taxonomic profiles||Taxonomic/functional profiles|
|Limitation||Low taxonomic resolution. No functional interpretation.||Taxonomic profiling may be wrong if reference genome database has low taxonomic coverage (see Tessler et al., 2017)|
Secondary analysis in microbiome bioinformatics
Once taxonomic or functional profiles of microbiome samples have been generated using a pipeline and reference databases, they can be compared to find taxa or functional units that differ in abundance between groups. Functional units may be orthologous groups or pathways. Taxa or functional units that distinguish the groups are called biomarkers. We call this secondary analysis because the sets of profiles can easily be swapped out for a new analysis. Because profiles are light-weight, most secondary analyses run instantly or within a reasonably short time (e.g., under 20 seconds).
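A group comparison of this kind can be sketched as follows. This is an illustration under stated assumptions, not the method used by any particular platform: each profile is a plain dictionary of counts, both groups are non-empty, and the score is a simple log2 fold change of mean relative abundance with a small pseudocount. The function names, the pseudocount value, and the sample data are all hypothetical. A real study would add a statistical test and multiple-testing correction before calling anything a biomarker.

```python
import math
from statistics import mean


def to_relative(profile):
    """Scale one sample's counts to fractions so samples of different depth compare."""
    total = sum(profile.values())
    return {taxon: count / total for taxon, count in profile.items()}


def log2_fold_changes(group_a, group_b, pseudocount=1e-6):
    """Log2 ratio of mean relative abundance (group_b over group_a) per taxon.

    Taxa missing from a sample count as 0; the pseudocount avoids division
    by zero and log of zero.
    """
    rel_a = [to_relative(p) for p in group_a]
    rel_b = [to_relative(p) for p in group_b]
    taxa = set().union(*rel_a, *rel_b)
    changes = {}
    for taxon in taxa:
        mean_a = mean(r.get(taxon, 0.0) for r in rel_a)
        mean_b = mean(r.get(taxon, 0.0) for r in rel_b)
        changes[taxon] = math.log2((mean_b + pseudocount) / (mean_a + pseudocount))
    return changes


# Hypothetical count profiles for two samples per group.
healthy = [{"Taxon A": 50, "Taxon B": 50}, {"Taxon A": 60, "Taxon B": 40}]
obese = [{"Taxon A": 20, "Taxon B": 80}, {"Taxon A": 10, "Taxon B": 90}]

changes = log2_fold_changes(healthy, obese)
# Rank candidates by the size of the shift, largest first.
ranked = sorted(changes, key=lambda taxon: abs(changes[taxon]), reverse=True)
```

Because only small count tables are involved, swapping in a different pair of groups and rerunning the comparison is cheap, which is what makes interactive secondary analysis practical.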
A web-based secondary analysis platform is a powerful tool, as it enables an instant and interactive process for biomarker discovery. BIOiPLUG cloud is designed to provide an optimized means of secondary analysis with versatile visualizations and publication-ready reports.
The BIOiPLUG team / Last edited on Feb. 19, 2018