Content of the EzBioCloud 16S database
The EzBioCloud 16S database contains the following information:
- Standardized 16S rRNA gene sequences representing reference taxa
- All sequences are extracted between the two most widely used PCR primers (27F and 1492R), so that similarity calculations are carried out consistently over the same region.
- In principle, a single 16S sequence is assigned to each reference taxon.
- Reference taxa comprise:
  - Taxa with currently validly published names
  - Representatives of Candidatus taxa
  - Unnamed phylotypes that do not belong to either of the above. Their sequences come from both 16S amplicons and genome sequences.
- A complete taxonomic hierarchy (from species to phylum) is given for every 16S sequence. The hierarchy is based on the maximum likelihood phylogenetic tree of 16S sequences, with consideration of the currently accepted classification.
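The trimming to the 27F-1492R region mentioned above can be illustrated with a minimal sketch. It is not the EzBioCloud pipeline: the primer strings are the commonly cited 27F and 1492R sequences (an assumption, since this page does not list them), the function names are hypothetical, the primers themselves are assumed to be excluded from the result, and only exact matches (allowing IUPAC degenerate bases) are found.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

# Commonly cited primer sequences (assumption: not taken from this page).
PRIMER_27F = "AGAGTTTGATCMTGGCTCAG"
PRIMER_1492R = "TACGGYTACCTTGTTACGACTT"


def reverse_complement(seq):
    """Return the reverse complement, keeping IUPAC degenerate codes."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Turn a (possibly degenerate) primer into a regular expression."""
    return "".join(IUPAC[base] for base in primer)


def extract_between_primers(seq):
    """Return the region between 27F and 1492R, or None if either is missing.

    The reverse primer is searched for as its reverse complement, because
    that is how it appears on the forward strand.
    """
    seq = seq.upper().replace("U", "T")
    fwd = re.search(primer_regex(PRIMER_27F), seq)
    rev = re.search(primer_regex(reverse_complement(PRIMER_1492R)), seq)
    if fwd is None or rev is None or rev.start() < fwd.end():
        return None
    return seq[fwd.end():rev.start()]
```

Comparing sequences only after trimming them to the same region keeps flanking bases that are present in one sequence but absent from another from distorting a similarity value. Real data would also need tolerance for mismatches and for sequences that start or end inside a primer site.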
Source of 16S data
Because we have tried to secure the best-quality 16S sequence for each taxon, the source of a given 16S sequence varies and is one of the following:
- NCBI 16S amplicon sequences of validly published taxa: e.g., AY692362 for Adiaceo aphidicola
- NCBI 16S amplicon sequences of phylotypes: e.g., AJ290038 for AJ290038_s (a phylotype corresponding to a species)
- 16S sequence extracted from NCBI genome assembly: e.g., CP000238 for Baumannia cicadellinicola.
- 16S sequence extracted from JGI genome assembly (this genome data may not be available in NCBI): e.g., jgi.1096475 for phylotype jgi.1096475_s in the genus Geodermatophilus.
- 16S sequence compiled from Pacific Biosciences (PacBio) full-length sequencing of microbiome samples. These are high-quality 16S sequences generated with PacBio’s circular consensus sequencing (CCS) technology: e.g., PAC000364 for phylotype PAC000364_s.
- 16S sequence extracted from internally assembled genome data: e.g., CLG_48533 for Arthrobacter oryzae.
Consequently, not all of the data are available in the NCBI database. However, all data are freely accessible through www.ezbiocloud.net.
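As a rough illustration of how the accession styles listed above differ, the following sketch guesses the source of a sequence from the form of its accession. The patterns are inferred only from the examples on this page and are an assumption, not an official EzBioCloud naming specification; `guess_source` and `phylotype_accession` are hypothetical helper names.

```python
import re

# Patterns inferred from the example accessions on this page (assumption).
# Order matters: the more specific prefixes are checked before the NCBI style.
SOURCE_PATTERNS = [
    (re.compile(r"^jgi\.\d+$"), "JGI genome assembly"),
    (re.compile(r"^PAC\d+$"), "PacBio full-length sequencing"),
    (re.compile(r"^CLG_\d+$"), "internally assembled genome"),
    # One or two letters followed by digits, as in AY692362 or CP000238.
    (re.compile(r"^[A-Z]{1,2}\d{5,8}$"), "NCBI (amplicon or genome assembly)"),
]


def guess_source(accession):
    """Return a best-guess source label for an accession, or 'unknown'."""
    for pattern, label in SOURCE_PATTERNS:
        if pattern.match(accession):
            return label
    return "unknown"


def phylotype_accession(name):
    """Strip the '_s' suffix seen in species-level phylotype names.

    Assumes the naming seen in the examples, e.g. 'AJ290038_s' -> 'AJ290038'.
    """
    return name[:-2] if name.endswith("_s") else name
```

For example, `guess_source(phylotype_accession("jgi.1096475_s"))` returns the JGI label. The accession alone cannot distinguish an NCBI amplicon from an NCBI genome assembly, so both share one label here.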
Why 16S sequences from genome assemblies are used in EzBioCloud instead of PCR amplicons
- A genome assembly is usually of better quality than a PCR amplicon sequence. Typical NGS genome sequencing yields a depth of coverage of 50X or higher.
- When we add a genome-derived 16S sequence to the EzBioCloud database, we always check its quality by manual alignment using secondary structure information. In our experience, using genome sequences improves the quality of 16S databases for reference purposes.
Last edited by JC on Aug. 27, 2017