Gene frequency plot in pan-genome


During pan-genome calculation, all potentially orthologous protein-coding genes (CDSs) are clustered into non-redundant gene sets called “Pan-genome Orthologous Groups (POGs)”.

A core part of the genome, containing essential or housekeeping genes, is found in all genomes, whereas less essential genes are found less frequently. Some genes are detected in only a single genome. A “gene frequency plot” gives a general overview of how often genes occur across the whole genome set. A typical plot is U-shaped: most genes are detected either in all genomes or in only a single genome.
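The counts behind such a plot can be sketched in a few lines of Python. This is only an illustration, not the tool's actual implementation: the function name `gene_frequency`, the POG-to-genomes dictionary layout, and the toy data are all assumptions made for the example.

```python
from collections import Counter

def gene_frequency(pog_to_genomes, n_genomes):
    """Return a list whose element k-1 is the number of POGs found in exactly k genomes."""
    # Count, for each POG, how many distinct genomes carry it.
    counts = Counter(len(genomes) for genomes in pog_to_genomes.values())
    return [counts.get(k, 0) for k in range(1, n_genomes + 1)]

# Hypothetical toy data: 4 genomes (g1..g4) and 6 POGs.
toy = {
    "POG1": {"g1", "g2", "g3", "g4"},  # core gene
    "POG2": {"g1", "g2", "g3", "g4"},  # core gene
    "POG3": {"g1", "g2", "g3", "g4"},  # core gene
    "POG4": {"g1", "g2"},              # shared by two genomes
    "POG5": {"g3"},                    # found in a single genome
    "POG6": {"g4"},                    # found in a single genome
}

freq = gene_frequency(toy, n_genomes=4)

# Crude text rendering of the plot: one row per frequency class.
for k, n in enumerate(freq, start=1):
    print(f"{k} genome(s): {'#' * n} ({n})")
```

Even in this tiny data set the two ends of the distribution (genes in one genome and genes in all genomes) are the largest classes, which is what produces the U shape.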

In the following example, 31 Vibrio vulnificus genomes were analyzed, and 13,220 non-redundant genes were found from a total of 144,931 genes. Please note that the figure below shows the number of genes on a log scale.

The figure below visualizes the same data as the one above, except that the genes are classified into 4 major functional categories and the number of genes is NOT shown on a log scale.

Here are more pan-genome examples of other species. Can you tell how and why the shapes of the charts are different?

  • Chlamydia psittaci (obligate parasitic bacterium with small genome)

  • Corynebacterium diphtheriae (pathogenic bacterium belonging to the phylum Actinobacteria)

This type of figure has been used in many publications, including:

  1. Lefebure, T. & Stanhope, M.J. Evolution of the core and pan-genome of Streptococcus: positive selection, recombination, and genome composition. Genome Biol 8, R71 (2007).
  2. Touchon, M. et al. Organised genome dynamics in the Escherichia coli species results in highly diverse adaptive paths. PLoS Genet 5, e1000344 (2009).

Updated on April 28th, 2016 (EK)