Kuan Yang, Lenwood S Heath, João C Setubal.
Abstract
Ancestral genome reconstruction can be understood as a phylogenetic study with more details than a traditional phylogenetic tree reconstruction. We present a new computational system called REGEN for ancestral bacterial genome reconstruction at both the gene and replicon levels. REGEN reconstructs gene content, contiguous gene runs, and replicon structure for each ancestral genome. Along each branch of the phylogenetic tree, REGEN infers evolutionary events, including gene creation and deletion and replicon fission and fusion. The reconstruction can be performed by either a maximum parsimony or a maximum likelihood method. Gene content reconstruction is based on the concept of neighboring gene pairs. REGEN was designed to be used with any set of genomes that are sufficiently related, which will usually be the case for bacteria within the same taxonomic order. We evaluated REGEN using simulated genomes and genomes in the Rhizobiales order.
Year: 2012 PMID: 24704978 PMCID: PMC3899994 DOI: 10.3390/genes3030423
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
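The abstract says that gene content reconstruction is based on neighboring gene pairs and can use maximum parsimony. As an illustration only, the sketch below treats each neighboring gene pair as a presence/absence character and applies the textbook Fitch parsimony pass to decide whether the pair is present at the root. This is not REGEN's implementation: the tree encoding, the function names, the one-replicon-per-genome simplification, and the toy genomes are all assumptions made for the example.

```python
from typing import Dict, FrozenSet, List, Set, Tuple, Union

# Hypothetical encoding: a tree is a leaf name or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]
Pair = FrozenSet[int]


def neighboring_pairs(replicon: List[int], circular: bool = True) -> Set[Pair]:
    """Unordered pairs of gene families that are adjacent on one replicon."""
    pairs = {frozenset((a, b)) for a, b in zip(replicon, replicon[1:])}
    if circular and len(replicon) > 2:
        # Close the circle: the last gene neighbors the first one.
        pairs.add(frozenset((replicon[-1], replicon[0])))
    return pairs


def fitch_up(tree: Tree, present: Dict[str, bool]) -> Tuple[Set[bool], int]:
    """Bottom-up Fitch pass for one binary character; returns (state set, cost)."""
    if isinstance(tree, str):
        return {present[tree]}, 0
    left_set, left_cost = fitch_up(tree[0], present)
    right_set, right_cost = fitch_up(tree[1], present)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # Disjoint child sets: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


def root_pairs(tree: Tree, genomes: Dict[str, List[int]]) -> Set[Pair]:
    """Pairs whose Fitch state set at the root is unambiguously 'present'."""
    pairs_by_genome = {name: neighboring_pairs(g) for name, g in genomes.items()}
    all_pairs: Set[Pair] = set().union(*pairs_by_genome.values())
    kept = set()
    for pair in all_pairs:
        present = {name: pair in pairs for name, pairs in pairs_by_genome.items()}
        states, _cost = fitch_up(tree, present)
        if states == {True}:
            kept.add(pair)
    return kept


# Toy input (made up): three genomes, each a single circular replicon.
toy_genomes = {"A": [1, 2, 3, 4], "B": [1, 2, 4, 3], "C": [1, 2, 3, 4]}
toy_tree = (("A", "B"), "C")
print(sorted(tuple(sorted(p)) for p in root_pairs(toy_tree, toy_genomes)))
# prints [(1, 2), (1, 4), (2, 3), (3, 4)]
```

In the toy data, genome B carries a rearrangement that breaks the pairs (2, 3) and (1, 4); parsimony still places both at the root because A and C agree.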
Figure 1. Overview of all major components in REGEN.
Figure 2. Number of operon gene pairs in extant and ancestral genomes. The error bars represent standard deviation.
Leave-one-out stability test result. The table shows the difference in the number of gene runs as well as percentage under different cutoffs.
| Number of fragments | 1 | 1 | 2 | 2 |
|---|---|---|---|---|
| Cutoff | 0.85 | 0.9 | 0.85 | 0.9 |
| | 77 | 103 | 107 | 118 |
| | 80 | 31 | 50 | 16 |
| | 157 | 134 | 157 | 134 |
| Percentage | 49.04% | 76.87% | 68.15% | 88.06% |
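The row labels of the table above did not survive extraction, but the numbers are internally consistent: in every column the third numeric row is the sum of the first two, and the last row is the first numeric row divided by the third. The short check below reproduces that arithmetic; the variable names are placeholders and do not claim to be the original row labels.

```python
# Values copied from the table above, one entry per column
# (1 fragment at cutoffs 0.85 and 0.9, then 2 fragments at 0.85 and 0.9).
first_row = [77, 103, 107, 118]
second_row = [80, 31, 50, 16]
third_row = [157, 134, 157, 134]
reported = ["49.04%", "76.87%", "68.15%", "88.06%"]

# The third row equals the sum of the first two in every column.
sums_match = [a + b == c for a, b, c in zip(first_row, second_row, third_row)]

# The percentage row equals first_row / third_row.
computed = [f"{100 * a / c:.2f}%" for a, c in zip(first_row, third_row)]

print(sums_match)            # prints [True, True, True, True]
print(computed == reported)  # prints True
```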
Figure 3. Rhizobiales phylogenomic species tree. The numbers shown at each branching are bootstrap scores computed by RAxML [16] based on 100 runs. All scores are 100, which indicates that this is a robust phylogenetic tree.
Contiguous gene run reconstruction overview of the Rhizobiales group. Length of the gene runs is measured in genes. The number-of-genes column shows the total number of genes on all gene runs and the percentage column shows the coverage of the gene runs.
| Ancestor | Number of gene runs | Longest gene run length | Number of genes | Percentage |
|---|---|---|---|---|
| 11_19_21_3_1_18_2_10_6_15_9_14_8_20_16_23_4_22_12_13_7_17 | 305 | 32 | 1321 | 79.87% |
| 11_19_21_3_1_18_2_10_6_15_9_14_8_20_16_23_4_22_12_13_7 | 409 | 33 | 1716 | 85.16% |
| 21_3_1_18_2_10_6_15_9_14_8_20_16_23_4_22_12_13_7 | 457 | 33 | 1894 | 85.43% |
| 14_8_20_16_23_4_22_12_13_7 | 461 | 33 | 1869 | 82.70% |
| 21_3_1_18_2_10_6_15_9 | 510 | 49 | 2394 | 84.95% |
| 14_8_20_16_23_4_22 | 497 | 35 | 2019 | 85.23% |
| 21_3_1_18_2 | 685 | 44 | 3688 | 88.31% |
| 3_1_18_2 | 681 | 47 | 3918 | 90.03% |
| 14_8_20_16 | 509 | 47 | 2775 | 91.28% |
| 10_6_15_9 | 352 | 33 | 1561 | 77.62% |
| 12_13_7 | 333 | 17 | 1024 | 56.83% |
| 23_4_22 | 556 | 35 | 2336 | 83.01% |
| 6_15_9 | 402 | 36 | 2153 | 95.94% |
| 14_8_20 | 525 | 31 | 2513 | 74.55% |
| 14_8 | 339 | 40 | 2638 | 91.95% |
| 3_1 | 483 | 75 | 3863 | 92.70% |
| 18_2 | 589 | 136 | 5280 | 94.57% |
| 13_7 | 384 | 28 | 1272 | 67.55% |
| 23_4 | 502 | 16 | 1933 | 76.28% |
| 15_9 | 353 | 121 | 3615 | 94.63% |
| 11_19 | 260 | 31 | 821 | 60.50% |
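Two derived quantities help when reading the table above: the mean gene run length (number of genes divided by number of gene runs) and a rough back-calculation of the ancestor's total gene count. The second rests on an assumption not stated explicitly in the caption, namely that the percentage column is the number of genes on gene runs divided by the total number of genes in the reconstructed ancestor. The sketch uses two rows copied from the table; the back-calculated totals are approximate because the percentages are rounded.

```python
# (ancestor label, number of gene runs, number of genes, coverage in percent)
rows = [
    ("11_19_21_3_1_18_2_10_6_15_9_14_8_20_16_23_4_22_12_13_7_17", 305, 1321, 79.87),
    ("18_2", 589, 5280, 94.57),
]


def mean_run_length(n_runs: int, n_genes: int) -> float:
    """Average number of genes per reconstructed gene run."""
    return n_genes / n_runs


def implied_total_genes(n_genes: int, coverage_pct: float) -> int:
    """Approximate total gene count, ASSUMING coverage = genes on runs / total genes."""
    return round(n_genes / (coverage_pct / 100))


for label, n_runs, n_genes, pct in rows:
    print(label[:12], round(mean_run_length(n_runs, n_genes), 2),
          implied_total_genes(n_genes, pct))
```

Under that assumption, the root ancestor averages about 4.3 genes per run, while the ancestor labeled 18_2 averages about 9.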
Functional annotation of a particular reconstructed contiguous gene run in the LCA of the Rhizobiales group. The Consensus column shows the number of genes that have been assigned the corresponding annotation, followed by the total number of genes in the family. Genes are listed in their order on the chromosome.
| Gene family ID | KEGG Entry | Function class | Definition | Consensus |
|---|---|---|---|---|
| 1719 | K02387 | Cellular Processes; Cell Motility; Bacterial motility proteins, [BR:ko02035], Cellular Processes; Cell Motility; Flagellar assembly [PATH:ko02040] | flagellar basal-body rod protein FlgB | 17/17 |
| 9901747 | K02388 | Cellular Processes; Cell Motility; Bacterial motility proteins, [BR:ko02035], Cellular Processes; Cell Motility; Flagellar assembly [PATH:ko02040] | flagellar basal-body rod protein FlgC | 17/17 |
| 9901380 | K02408 | Cellular Processes; Cell Motility; Bacterial motility proteins, [BR:ko02035], Cellular Processes; Cell Motility; Flagellar assembly [PATH:ko02040] | flagellar hook-basal body complex protein FliE | 17/17 |
| 9901964 | K02392 | Cellular Processes; Cell Motility; Bacterial motility proteins, [BR:ko02035], Cellular Processes; Cell Motility; Flagellar assembly [PATH:ko02040] | flagellar basal-body rod protein FlgG | 17/17 |
| 1718 | K02386 | Cellular Processes; Cell Motility; Bacterial motility proteins, [BR:ko02035], Cellular Processes; Cell Motility; Flagellar assembly [PATH:ko02040] | flagella basal body P-ring formation protein FlgA | 16/17 |
| 9903288 | K02394 | Cellular Processes; Cell Motility; Bacterial motility proteins, [BR:ko02035], Cellular Processes; Cell Motility; Flagellar assembly [PATH:ko02040] | flagellar P-ring protein precursor FlgI | 16/17 |
| 1717 | not annotated | N/A | N/A | N/A |
| 9904536 | K02393 | Cellular Processes; Cell Motility; Bacterial motility proteins, [BR:ko02035], Cellular Processes; Cell Motility; Flagellar assembly [PATH:ko02040] | flagellar L-ring protein precursor FlgH | 16/17 |
| 1828 | K02415 | Cellular Processes; Cell Motility; Bacterial motility proteins, [BR:ko02035] | flagellar FliL protein | 16/16 |
| 9904106 | K02419 | Environmental Information Processing; Membrane Transport; Secretion system, [BR:ko02044], Cellular Processes; Cell Motility; Bacterial motility proteins, [BR:ko02035], Cellular Processes; Cell Motility; Flagellar assembly [PATH:ko02040] | flagellar biosynthetic protein FliP | 17/17 |
Figure 4. A long gene run on the main chromosome split into two smaller fragments during the evolutionary path from the LCA of Agrobacterium vitis S4 and Agrobacterium tumefaciens C58 to Agrobacterium vitis S4. Each number represents a gene and the underscore represents adjacency. +/- symbols represent the gene orientation determined during the reconstruction. Some genes on both ends are omitted for simplicity.
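The notation in this caption (numbers for genes, underscores for adjacency, +/- for orientation) can be handled with a few lines of code. The sketch below assumes a concrete string form such as `+1_+2_-3`, which is a guess at the layout and not taken from the figure; the gene numbers are invented. It parses a run and reports which adjacencies of an ancestral run are missing from a set of descendant fragments.

```python
from typing import FrozenSet, List, Set, Tuple

Gene = Tuple[int, str]  # (gene number, orientation "+" or "-")


def parse_run(text: str) -> List[Gene]:
    """Parse a run written as sign+number tokens joined by underscores (assumed format)."""
    genes = []
    for token in text.split("_"):
        sign, number = token[:1], token[1:]
        if sign not in ("+", "-") or not number.isdigit():
            raise ValueError(f"unexpected token: {token!r}")
        genes.append((int(number), sign))
    return genes


def adjacencies(run: List[Gene]) -> Set[FrozenSet[int]]:
    """Unordered pairs of gene numbers that sit next to each other in the run."""
    return {frozenset((a[0], b[0])) for a, b in zip(run, run[1:])}


def broken_adjacencies(ancestor: List[Gene],
                       fragments: List[List[Gene]]) -> Set[FrozenSet[int]]:
    """Adjacencies of the ancestral run that no descendant fragment retains."""
    kept: Set[FrozenSet[int]] = set()
    for fragment in fragments:
        kept |= adjacencies(fragment)
    return adjacencies(ancestor) - kept


# Invented example: one ancestral run that splits into two fragments.
ancestor_run = parse_run("+1_+2_-3_+4_+5")
descendant_fragments = [parse_run("+1_+2"), parse_run("-3_+4_+5")]
print(broken_adjacencies(ancestor_run, descendant_fragments))
# prints {frozenset({2, 3})}
```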
Figure 5. Overview of the complete reconstructed evolutionary history of the Rhizobiales group. Rectangles with solid edges represent input genomes and rectangles with dotted edges represent ancestral genomes. Replicons are represented as circles, with circle size proportional to replicon size (in number of genes) in the case of chromosomes; all plasmids are represented with same-sized circles. Chromosomes are shown in light blue, plasmids in green. The reconstructed secondary chromosomes are shown in red. Edge width corresponds to the strength of the inheritance relationships between replicons, and color shows the increase or decrease of chromosome sizes. Edges connected with plasmids are all marked black. On the right we provide a zoomed-in detail. Legend for color code: L: decrease in number of genes; G: increase in number of genes.
Figure 6. Replicon architecture reconstruction example. Blue circles represent main chromosomes, green circles represent plasmids, and purple ovals represent gene groups. Red boxes mark the connected components identified in the group graph, and the green box marks the final replicon architecture reconstruction result.
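The caption refers to connected components identified in a group graph. How REGEN builds that graph is not described here, so the sketch below only shows the generic step: given gene groups as nodes and some set of links between them (both invented for the example), a breadth-first search collects the connected components.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple


def connected_components(nodes: Iterable[str],
                         edges: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group nodes of an undirected graph into connected components using BFS."""
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    seen: Set[str] = set()
    components: List[Set[str]] = []
    for start in nodes:
        if start in seen:
            continue
        component: Set[str] = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for other in neighbors[node]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        components.append(component)
    return components


# Hypothetical gene groups and links; the names carry no biological meaning.
groups = ["g1", "g2", "g3", "g4", "g5"]
links = [("g1", "g2"), ("g2", "g3"), ("g4", "g5")]
components = connected_components(groups, links)
print([sorted(c) for c in components])
# prints [['g1', 'g2', 'g3'], ['g4', 'g5']]
```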