Genome annotation

Original/User’s Label of Genome
05/15/2017
Genome Map
05/15/2017

Genome annotation

The analysis of all bacterial genome starts with genome annotation.  This process can be divided into two steps: Gene-finding step and Functional Annotation step.

“Gene-finding” step uses genome sequences to find the various patterns of gene’s start and end location, and “Functional Annotation” step finds and annotates the function of each gene through sequence search. The analysis results obtained between researchers can vary slightly due to the different software, database, and parameters they used, but there is no big difference between our pipeline and pipelines used by other database since we use the most common pipeline in academia. In EzBioCloud, for all genomes, the following software and database are used to perform genome annotation and also comparative genomics. As of Sept 2017, more than 90,000 genomes were annotated using the following method and is provided through www.ezbiocloud.net.

 

 

More detailed information:

Pipeline Steps Run Description
Finding tRNA genes Program: tRNA-scan version 1.3.1

Run Parameter: tRNA-scan-SE –bact [Fasta File]

Finding rRNA genes Program: INFERNAL version 1.0.2 (cmsearch)

Database: rfam 12.0

Run Parameter: -E 1.0E-5 -Z 700 –noali  rfam12.0/rRNA_bact.cm [Fasta File]

Finding CRISPR Program: PilerCR version 1.06

Run Parameter: pilercr -in [Fasta File] -out [Output File]

Program: CRT version 1.2

Run Parameter: java -cp CRT1.2-CLI.jar crt [Input Fasta File]

Finding ncRNA Program: INFERNAL version 1.0.2 (cmsearch)

Database: Rfam 12.0

Run Parameter: cmsearch -E 1.0E-5 -Z 700 –noali  rfam12.0/RNase_bact.cm [Fasta File]

Run Parameter: -E 1.0E-5 -Z 700 –noali  rfam12.0/Gene_bact.cm [Fasta File]

Finding CDS Program: PRODIGAL version 2.6.2

Run Parameter: -i [Input Fasta File] -o [Output GFF File] -f gff -m -c -g 11 -a [Output Protein Fasta File]

Functional annotation Program: usearch 64bit version 8.0.1517

Database:

-KEGG version (Date: 2015.12.10)
-eggnog version 4.1
-swissprot (Date: 2015.12.10)
-SEED subsystems (Date: 2015.12.10)

Run Parameter: -ublast [Input Fasta File] -db [DB File] -maxaccepts 1 -evalue 1.0E-5 -accel 1.0 -ka_dbsize 700000000 -alnout [Output File]

Last updated on Sept 10, 2017 (JC)