Abstract
Lifestyle adaptation of microbes due to changes in their ecological niches or acquisition of new environments is a major driving force for genetic changes in their respective genomes. Moving into more specialized niches often results in the acquisition of new gene sets via horizontal gene transfer to utilize previously unavailable metabolites, while genetic ballast is shed by gene loss and/or gene inactivation. In some cases, larger genome rearrangements can be observed, such as the incorporation of whole genetic islands, providing a range of new phenotypic capabilities. Until recently these changes could not be comprehensively followed and identified due to the lack of complete microbial genome sequences. The advent of high-throughput DNA sequencing has dramatically changed the scientific landscape and today microbial genomes have become increasingly abundant. Currently, more than 2,900 genomes are published and more than 11,000 genome projects are listed in the Genomes Online Database. Although this wealth of information provides many new opportunities to assess microbial functionality, it also creates a new array of challenges when a comparison between multiple microbial genomes is required. Here, functional genome distribution (FGD) is introduced, analyzing the diversity between microbes based on their predicted ORFeome. FGD is therefore a comparative genomics approach, emphasizing the assessments of gene complements. To further facilitate the comparison between two or more genomes, degrees of amino-acid similarities between ORFeomes can be visualized in the Artemis comparison tool, graphically depicting small and large scale genome rearrangements, insertion and deletion events, and levels of similarity between individual open reading frames. FGD provides a new tool for comparative microbial genomics and the interpretation of differences in the genetic makeup of bacteria.
Keywords: functional genomics; genome comparison; genome plasticity; horizontal gene transfer; lifestyle adaption
Year: 2012 PMID: 22363326 PMCID: PMC3282942 DOI: 10.3389/fmicb.2012.00048
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
Figure 1. Graphical workflow of the compACTor software. Schematic representation of the algorithm structure. Symbols used are consistent with standard flowchart icons. Abbreviations: DB, database; pMSPs, ORFeome based MSPcrunch comparison format (Sonnhammer and Durbin, 1994) data files. Oval symbol: data pools; diamond symbol: internal decision points; square boxes: internal processes and functions; hourglass symbol: central parsing algorithm; multidocument symbol: external flatfile databases created by compACTor.
e-Value range based trust levels.
| e-Value range | Trust value |
|---|---|
| e-value > 0.1 | 0 |
| 0.1 ≥ e-value > 1e−10 | 1 |
| 1e−10 ≥ e-value > 1e−40 | 10 |
| 1e−40 ≥ e-value > 1e−50 | 20 |
| 1e−50 ≥ e-value > 1e−60 | 30 |
| 1e−60 ≥ e-value > 1e−70 | 40 |
| 1e−70 ≥ e-value > 1e−80 | 50 |
| 1e−80 ≥ e-value > 1e−90 | 60 |
| 1e−90 ≥ e-value > 1e−100 | 70 |
| 1e−100 ≥ e-value > 1e−110 | 80 |
| 1e−110 ≥ e-value > 1e−120 | 85 |
| 1e−120 ≥ e-value > 1e−130 | 90 |
| 1e−130 ≥ e-value > 1e−160 | 95 |
| 1e−160 ≥ e-value ≥ 0 | 100 |
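The table can be read as a step function from e-value to trust value. The following Python sketch illustrates that reading; it is not taken from compACTor. The function name `trust_value` is hypothetical, and the sketch assumes that e-values above 0.1 receive a trust value of 0 and that each range includes its upper bound.

```python
# Hypothetical helper, not part of compACTor. Thresholds are copied from
# the trust-level table. Each pair is (upper bound of the e-value range,
# trust value); a range is assumed to include its upper bound.
TRUST_LEVELS = [
    (1e-160, 100),
    (1e-130, 95),
    (1e-120, 90),
    (1e-110, 85),
    (1e-100, 80),
    (1e-90, 70),
    (1e-80, 60),
    (1e-70, 50),
    (1e-60, 40),
    (1e-50, 30),
    (1e-40, 20),
    (1e-10, 10),
    (0.1, 1),
]


def trust_value(e_value: float) -> int:
    """Return the trust value for an e-value according to the table."""
    if e_value < 0:
        raise ValueError("e-values cannot be negative")
    # Bounds are sorted from most to least stringent, so the first
    # bound that admits the e-value identifies its range.
    for upper_bound, trust in TRUST_LEVELS:
        if e_value <= upper_bound:
            return trust
    return 0  # assumed: e-values above 0.1 carry no trust
```

For example, `trust_value(1e-45)` falls in the range between 1e−40 and 1e−50 and yields 20, while an exact match with an e-value of 0 yields 100.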
Figure 2. Functional genome distribution of 39 taxa. Publicly available complete genomes were downloaded in GenBank format from the NCBI genome database. Publicly available draft phase genomes were downloaded in FASTA format, concatenated using a universal spacer-stop-spacer sequence and automatically annotated using GAMOLA (Altermann and Klaenhammer, 2003). The in-house draft phase genome of Butyrivibrio proteoclasticus was assembled into an artificial genome and annotated using GAMOLA (publication in preparation). Predicted ORFeomes of all genomes were subjected to an FGD analysis and the resulting distance matrix was imported into MEGA4. The functional distribution was visualized using the UPGMA method (Sneath and Sokal, 1962). The optimal tree with the sum of branch length = 133.1 is shown. The tree is drawn to scale, with branch lengths in the same units as those of the functional distances used to infer the distribution tree.
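The clustering step named in the caption, UPGMA on a distance matrix, can be illustrated with a small Python sketch. This is a textbook average-linkage implementation, not the MEGA4 code used for the figure; the function name `upgma` and the toy distance matrix in the usage note are assumptions made for illustration only.

```python
from itertools import combinations


def upgma(labels, dist):
    """Cluster taxa by average linkage (UPGMA).

    labels: list of taxon names.
    dist:   symmetric distance matrix as a list of lists.
    Returns (tree, heights): a nested tuple tree and the height at which
    each merge occurred (half the distance between the merged clusters).
    """
    # Active clusters: id -> (subtree, number of leaves)
    clusters = {i: (label, 1) for i, label in enumerate(labels)}
    # Distances between active clusters, keyed by the pair of ids
    d = {frozenset((i, j)): dist[i][j]
         for i, j in combinations(range(len(labels)), 2)}
    next_id = len(labels)
    heights = []
    while len(clusters) > 1:
        pair = min(d, key=d.get)  # closest pair of clusters
        a, b = sorted(pair)
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)
        heights.append(d.pop(pair) / 2)
        # Distance from the merged cluster to every remaining cluster is
        # the leaf-count-weighted mean of the two old distances.
        for k in clusters:
            d_ak = d.pop(frozenset((a, k)))
            d_bk = d.pop(frozenset((b, k)))
            d[frozenset((next_id, k))] = (n_a * d_ak + n_b * d_bk) / (n_a + n_b)
        clusters[next_id] = ((tree_a, tree_b), n_a + n_b)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree, heights
```

With a toy matrix for four taxa, `upgma(["A", "B", "C", "D"], [[0, 2, 6, 10], [2, 0, 6, 10], [6, 6, 0, 10], [10, 10, 10, 0]])` first joins A and B, then adds C, then D. Because the method assumes a constant rate, every leaf ends at the same distance from the root, which is why the resulting tree can be drawn to scale in the units of the input distances.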
Genomes used for assessment of functional genome distribution.
| Designation | Domain/family | Genome size [bp] | ORFeome size | Accession number |
|---|---|---|---|---|
| Lactobacillus plantarum WCFS1 | Bacteria/Lactobacillaceae | 3308274 | 3051 | AL935263 |
| Lactobacillus brevis ATCC 367 | Bacteria/Lactobacillaceae | 2291220 | 2314 | CP000416 |
| Pediococcus pentosaceus ATCC 25745 | Bacteria/Lactobacillaceae | 1832387 | 1847 | NC_008525 |
| Lactobacillus sakei subsp. sakei 23K | Bacteria/Lactobacillaceae | 1884661 | 1886 | CR936503 |
| Lactobacillus casei ATCC 334 | Bacteria/Lactobacillaceae | 2895264 | 2909 | CP000423 |
| Lactobacillus salivarius UCC118 | Bacteria/Lactobacillaceae | 1827111 | 1738 | CP000233 |
| Lactobacillus reuteri F275 | Bacteria/Lactobacillaceae | 1999618 | 1944 | CP000705 |
| Lactobacillus johnsonii NCC 533 | Bacteria/Lactobacillaceae | 1992676 | 1857 | AE017198 |
| Lactobacillus gasseri ATCC 33323 | Bacteria/Lactobacillaceae | 1894360 | 1811 | CP000413 |
| Lactobacillus acidophilus NCFM | Bacteria/Lactobacillaceae | 1993561 | 1979 | CP000033 |
| Lactobacillus delbrueckii subsp. bulgaricus ATCC 11842 | Bacteria/Lactobacillaceae | 1864998 | 2218 | CR954253 |
| Lactobacillus delbrueckii subsp. bulgaricus ATCC BAA-365 | Bacteria/Lactobacillaceae | 1856951 | 2040 | CP000412 |
| Bacillus cereus ATCC 14579 | Bacteria/Bacillaceae | 5411809 | 5490 | AE016877.1 |
| Bacillus thuringiensis serovar konkukian str. 97-27 | Bacteria/Bacillaceae | 5237682 | 5168 | AE017355 |
| Bacillus pumilus SAFR-032 | Bacteria/Bacillaceae | 3704465 | 3737 | CP000813 |
| Bacillus licheniformis ATCC 14580 | Bacteria/Bacillaceae | 4222645 | 4379 | AE017333.1 |
| Bacillus subtilis subsp. subtilis 168 | Bacteria/Bacillaceae | 4214630 | 4106 | AL009126 |
| Streptococcus pneumoniae R6 | Bacteria/Streptococcaceae | 2038615 | 2046 | NC_003098 |
| Streptococcus pyogenes SSI-1 | Bacteria/Streptococcaceae | 1894275 | 1861 | BA000034 |
| Streptococcus thermophilus LMD-9 | Bacteria/Streptococcaceae | 1856368 | 2003 | CP000419 |
| Leuconostoc mesenteroides subsp. mesenteroides ATCC 8293 | Bacteria/Leuconostocaceae | 2038396 | 2073 | NC_008531 |
| Oenococcus oeni PSU-1 | Bacteria/Leuconostocaceae | 1780517 | 1864 | NC_008528 |
| Clostridium perfringens ATCC 13124 | Bacteria/Clostridiaceae | 3256683 | 2997 | CP000246 |
| Clostridium botulinum ATCC 3502 | Bacteria/Clostridiaceae | 3886916 | 3648 | AM412317 |
| Clostridium kluyveri DSM 555 | Bacteria/Clostridiaceae | 3964618 | 3926 | CP000673 |
| Clostridium difficile ATCC 9689 | Bacteria/Clostridiaceae | 4290252 | 3680 | AM180355 |
| Ruminococcus obeum ATCC 29174 | Bacteria/Lachnospiraceae | 3626304 | 4175 | AAVO00000000; draft |
| Anaerostipes caccae L1-92 | Bacteria/Lachnospiraceae | 1691947 | 1582 | ABAX00000000; draft |
| Ruminococcus gnavus ATCC 29149 | Bacteria/Lachnospiraceae | 3501953 | 3913 | AAYG00000000; draft |
| Butyrivibrio proteoclasticus B316 | Bacteria/Lachnospiraceae | 3936787 | 3477 | Unpublished draft |
| Pseudomonas aeruginosa PA7 | Bacteria/Pseudomonadaceae | 6588339 | 6371 | CP000744 |
| Escherichia coli O157:H7 | Bacteria/Enterobacteriaceae | 5528445 | 6006 | AE005174 |
| Escherichia coli K-12 | Bacteria/Enterobacteriaceae | 4639675 | 4403 | NC_000913 |
| Methanococcoides burtonii DSM 6242 | Archaea/Methanosarcinaceae | 2575032 | 2446 | NC_007955 |
| Methanosarcina mazei Goe1 | Archaea/Methanosarcinaceae | 4096345 | 3371 | NC_003901 |
| Methanocaldococcus jannaschii DSM 2661 | Archaea/Methanocaldococcaceae | 1664970 | 1682 | NC_000909 |
| Methanosphaera stadtmanae DSM 3091 | Archaea/Methanobacteriaceae | 1767403 | 1588 | NC_007681 |
| Methanobrevibacter smithii ATCC 35061 | Archaea/Methanobacteriaceae | 1853160 | 1795 | NC_009515 |
| Methanothermobacter thermautotrophicus DeltaH | Archaea/Methanobacteriaceae | 1751377 | 1918 | NC_000916 |
*Color coding used in the table corresponds to the color scheme shown in Figure .
Figure A1. Functional distribution tree of 17 closely related taxa within the class Bacilli. The tree represents a subset of the one shown in Figure 2. Predicted ORFeomes of all genomes were subjected to an FGD analysis and the resulting distance matrix was imported into MEGA4. The functional distribution was visualized using the UPGMA method (Sneath and Sokal, 1962). The tree is drawn to scale, with branch lengths in the same units as those of the functional distances used to infer the distribution tree.
Genomes used for assessment of functional genome distribution on strain level.
| Designation* | Serotype | Host | Genome status | Accession number |
|---|---|---|---|---|
| Chlamydia trachomatis A2497 | A | Human | Complete | 347974781 |
| Chlamydia trachomatis A HAR-13 | A | Human | Complete | 76788711 |
| Chlamydia trachomatis B TZ1A828 | B | Human | Complete | 231272648 |
| Chlamydia trachomatis B Jali20 OT | B | Human | Complete | 231273667 |
| Chlamydia trachomatis D UW3 CX | D | Human | Complete | 15604717 |
| Chlamydia trachomatis D-LC | D | Human | Complete | 297749010 |
| Chlamydia trachomatis D-EC | D | Human | Complete | 297748130 |
| Chlamydia trachomatis Ds2923 | D | Human | Complete | 222356764 |
| Chlamydia trachomatis E Sweden2 | E | Human | Complete | 289525045 |
| Chlamydia trachomatis E 150 | E | Human | Complete | 296434583 |
| Chlamydia trachomatis E 11023 | E | Human | Complete | 296438301 |
| Chlamydia trachomatis F 70 | F | Human | Complete | 222444350 |
| Chlamydia trachomatis F 70s | F | Human | Complete | 222444349 |
| Chlamydia trachomatis G 11074 | G | Human | Complete | 296437374 |
| Chlamydia trachomatis G 9301 | G | Human | Complete | 297139873 |
| Chlamydia trachomatis G 9768 | G | Human | Complete | 296435514 |
| Chlamydia trachomatis G 11222 | G | Human | Complete | 296436438 |
| Chlamydia trachomatis J 6276 | J | Human | Complete | 222444352 |
| Chlamydia trachomatis J 6276s | J | Human | Complete | 222444351 |
| Chlamydia trachomatis L2-434 Bu | L | Human | Complete | 166153973 |
| Chlamydia trachomatis L2b UCH1 proctitis | L | Human | Complete | 352951305 |
| Chlamydia trachomatis L2c | L | Human | Complete | 339625373 |
| Chlamydia trachomatis L2tet1 | L | Human | Complete | 301334996 |
| Chlamydia muridarum MopnTet14 draft | | Muridae | Complete | 311788820 |
| Chlamydia muridarum Nigg | | Muridae | Complete | 29337300 |
| Chlamydia muridarum Weiss.cb | | Muridae | Draft | NC_002620.2 |
| Chlamydia pneumoniae | | Varied | Complete | 340215159 |
Figure A2. A functional distribution tree comprising 23 Chlamydia trachomatis genomes. Entries in red depict Chlamydia trachomatis serotypes A–C (trachoma), entries in black represent serotypes D–K (sexually transmitted pathovars) and entries in green show serotype LGV (L1–L3; lymphogranuloma venereum). Chlamydia muridarum entries are shown in blue and Chlamydia pneumoniae is depicted in gray. Functional clusters and subclusters are indicated by square brackets.
Figure 3. ORFeome based comparative ACT visualization of 11 Lactobacillus genomes. Based on the distribution observed in Figure 2, 11 Lactobacillus genomes and their ORFeome similarities were visualized in ACT using pMSP data files. Respective genome designations are indicated on the left-hand side of each genome line. Genomes are shown in full and drawn to scale. Genomic nucleotide sequences are represented by gray lines indicating sense and anti-sense strands, with position markers shown in between. Predicted ORFs are shown on each strand in their respective orientation as arrowed boxes. Direct amino-acid similarities between individual ORFs of neighboring genomes are shown as red lines; inverted similarities are indicated by blue lines. Color shading indicates the level of similarity: the more saturated a similarity line, the more conserved the ORF pair. A trust level value of 40 was employed as display threshold to visualize similarity hits below an e-value of 1e−60.
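The display threshold mentioned in the caption amounts to discarding similarity hits whose e-value is too large for the chosen trust level. A minimal Python sketch of such a filter follows. The hit layout (query ORF, subject ORF, e-value) and the function name `displayed_hits` are hypothetical and do not reflect the actual pMSP file format; the cutoffs are taken from the trust-level table, assuming each level admits e-values up to and including the listed bound.

```python
# Largest e-value admitted at each trust level, as read from the
# trust-level table (assumption: the bound itself is included).
MAX_E_VALUE = {
    1: 0.1, 10: 1e-10, 20: 1e-40, 30: 1e-50, 40: 1e-60,
    50: 1e-70, 60: 1e-80, 70: 1e-90, 80: 1e-100, 85: 1e-110,
    90: 1e-120, 95: 1e-130, 100: 1e-160,
}


def displayed_hits(hits, threshold=40):
    """Keep hits that reach at least the given trust level.

    hits: iterable of (query_orf, subject_orf, e_value) tuples; this
    layout is illustrative only and is not the pMSP format.
    """
    cutoff = MAX_E_VALUE[threshold]
    return [hit for hit in hits if hit[2] <= cutoff]
```

With the default threshold of 40, a hit with an e-value of 1e−75 is kept and one with 1e−45 is dropped. Raising the threshold shows only the more strongly conserved ORF pairs.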