Christopher H. House, Matteo Pellegrini, Sorel T. Fitz-Gibbon.
Abstract
Initially using 143 genomes, we developed a method for calculating the pairwise distance between prokaryotic genomes that uses a Monte Carlo method to estimate the conservation of gene order. The method was based on repeatedly selecting five or six non-adjacent random orthologs shared by two genomes and determining whether the chosen orthologs were in the same order in both. The raw distances were then corrected for gene order convergence using an adaptation of the Jukes-Cantor model, as well as the common distance correction D′ = −ln(1 − D). First, we compared the distances found from the order of six orthologs to distances based on ortholog gene content and on small subunit rRNA sequences. The Jukes-Cantor gene order distances are reasonably well correlated with rRNA divergence (R² = 0.24), especially at rRNA Jukes-Cantor distances of less than 0.2 (R² = 0.52). Gene content is only weakly correlated with rRNA divergence over all distances (R² = 0.04); however, it is especially strongly correlated at rRNA Jukes-Cantor distances of less than 0.1 (R² = 0.67). This initial work suggests that gene order may be useful, in conjunction with other methods, for understanding the relatedness of genomes. Using the gene order distances for the 143 genomes, we studied the relationships among prokaryotes using neighbor joining and agreement subtrees. We then repeated the study using gene order in 172 complete genomes that better represent the wider diversity of prokaryotes. Our trees consistently show the Actinobacteria as a sister group to the bulk of the Firmicutes. In fact, the robustness of gene order support was considerably greater for uniting these two phyla than for uniting any of the proteobacterial classes. The results support the idea that Actinobacteria and Firmicutes are closely related, which in turn implies a single origin for the gram-positive cell.
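The two distance corrections named in the abstract can be sketched as below. This is a minimal sketch that assumes the Jukes-Cantor adaptation takes the familiar form d = −b ln(1 − D/b), where b is the raw distance expected for randomly ordered genomes (taken here as 0.983 ≈ 1 − 1/60 for six orthologs); the study's exact equation is not reproduced, and the function names are hypothetical. Setting b = 1 recovers D′ = −ln(1 − D) exactly.

```python
import math

# Assumption: raw distance expected for randomly ordered circular genomes
# when six orthologs are compared (1 - 1/60, about 0.983).
RANDOM_SIX = 1 - 1 / 60


def log_corrected(raw: float) -> float:
    """Common correction D' = -ln(1 - D); infinite when D >= 1."""
    if raw >= 1:
        return math.inf
    return -math.log(1 - raw)


def jc_style_corrected(raw: float, saturation: float = RANDOM_SIX) -> float:
    """Hypothetical Jukes-Cantor-style correction d = -b ln(1 - D/b).

    Like the nucleotide Jukes-Cantor formula, it diverges as the raw
    distance approaches the value expected for random gene order (b).
    """
    if raw >= saturation:
        return math.inf  # the logarithm cannot be applied at or beyond saturation
    return -saturation * math.log(1 - raw / saturation)


for d in (0.1, 0.5, 0.9):
    print(f"D={d:.1f}  -ln(1-D)={log_corrected(d):.3f}  JC-style={jc_style_corrected(d):.3f}")
```

Because b sits just below 1, the two corrections agree closely at small D and separate only as D approaches saturation.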
Keywords: Actinobacteria; Archaea; Firmicutes; evolutionary distance; gene order; genomics; tree of life
Year: 2015 PMID: 25653643 PMCID: PMC4299520 DOI: 10.3389/fmicb.2014.00785
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
The 143 taxa, with their abbreviations.
| Taxon | Abbreviation |
| Aeropyrum pernix K1 | ap |
| Agrobacterium tumefaciens C58UW | at |
| Agrobacterium tumefaciens C58C | atc |
| Aquifex aeolicus VF5 | aa |
| Archaeoglobus fulgidus DSM4304 | af |
| Bacillus anthracis Ames | baa |
| Bacillus cereus ATCC 14579 | bc |
| Bacillus halodurans C-125 | bh |
| Bacillus subtilis 168 | bs |
| Bacteroides thetaiotaomicron | bt |
| Bifidobacterium longum NCC2705 | bl |
| Bordetella bronchiseptica | bbr |
| Bordetella parapertussis | bpp |
| Bordetella pertussis | bp |
| Borrelia burgdorferi B31 | bb |
| Bradyrhizobium japonicum USDA 110 | bj |
| Brucella melitensis | bm |
| Brucella suis | brs |
| Buchnera aphidicola Bp | ba |
| Buchnera aphidicola Sg | bas |
| | bu |
| Campylobacter jejuni NCTC 11168 | cj |
| Candidatus Blochmannia floridanus | cbf |
| Caulobacter crescentus CB15 | cc |
| Chlamydia trachomatis serovar D | ct |
| Chlamydia trachomatis MoPn/Nigg | cm |
| Chlamydophila caviae GPIC | cca |
| Chlamydophila pneumoniae AR39 | cpa |
| Chlamydophila pneumoniae J138 | cpj |
| Chlamydophila pneumoniae TW183 | cpt |
| Chlamydophila pneumoniae CWL029 | cp |
| Chlorobium tepidum TLS | cte |
| Chromobacterium violaceum | cv |
| Clostridium acetobutylicum ATCC 824 | ca |
| Clostridium perfringens | cpe |
| Clostridium tetani | clt |
| Corynebacterium diphtheriae | cd |
| Corynebacterium efficiens YS-314 | cef |
| Corynebacterium glutamicum | cg |
| Coxiella burnetii | cb |
| Deinococcus radiodurans R1 | dr |
| Enterococcus faecalis V583 | ef |
| Escherichia coli O157:H7 strain EDL933 | ece |
| Escherichia coli K-12 Strain MG1655 | ec |
| Escherichia coli CFT073 | ecc |
| Escherichia coli O157:H7 | ech |
| Fusobacterium nucleatum ATCC 25586 | fn |
| Gloeobacter violaceus | gv |
| Haemophilus ducreyi | hd |
| Haemophilus influenzae Rd KW20 | hi |
| | hsp |
| Helicobacter hepaticus ATCC 51449 | hh |
| Helicobacter pylori 26695 | hp |
| Helicobacter pylori J99 | hpj |
| Lactobacillus plantarum WCFS1 | lp |
| Lactococcus lactis IL1403 | ll |
| Leptospira interrogans s.l. 56601 | li |
| Listeria innocua clip11262 | lin |
| Listeria monocytogenes EGD-e | lm |
| Mesorhizobium loti MAFF303099 | ml |
| Methanobacterium thermoautotroph. | mt |
| Methanococcus jannaschii DSM 2661 | mj |
| Methanopyrus kandleri AV19 | mk |
| Methanosarcina acetivorans C2A | ma |
| Methanosarcina mazei Goe1 | mma |
| Mycobacterium bovis bovis | mb |
| Mycobacterium leprae | mle |
| Mycobacterium tuberculosis H37Rv | mtb |
| Mycobacterium tuberculosis cdc1551 | mtc |
| Mycoplasma gallisepticum | mga |
| Mycoplasma genitalium G-37 | mg |
| Mycoplasma penetrans | mpe |
| Mycoplasma pneumoniae M129 | mp |
| Mycoplasma pulmonis UAB CTIP | mpu |
| Nanoarchaeum equitans Kin4-M | neq |
| Neisseria meningitidis MC58 | nmm |
| Neisseria meningitidis A Z2491 | nmz |
| Nitrosomonas europaea | ne |
| | ns |
| Oceanobacillus iheyensis HTE831 | oi |
| Pasteurella multocida Pm70 | pm |
| Photorhabdus luminescens | pl |
| Pirellula sp. | pi |
| Porphyromonas gingivalis | pg |
| Prochlorococcus marinus CCMP1375 | pmc |
| Prochlorococcus marinus MED4 | pmm |
| Prochlorococcus marinus MIT9313 | pma |
| Pseudomonas aeruginosa PAO1 | psa |
| Pseudomonas putida KT2440 | psp |
| Pseudomonas syringae pv. tomato | pss |
| Pyrobaculum aerophilum IM2 | pa |
| Pyrococcus abyssi | pab |
| Pyrococcus furiosus DSM3638 | pf |
| Pyrococcus horikoshii OT3 | ph |
| Ralstonia solanacearum | rs |
| Rickettsia conorii Malish 7 | rc |
| Rickettsia prowazekii Madrid E | rp |
| Salmonella enterica Typhi | se |
| Salmonella enterica Typhi_Ty2 | set |
| Salmonella typhimurium LT2 | sty |
| Shewanella oneidensis | so |
| Shigella flexneri 2a | sf |
| Sinorhizobium meliloti 1021 | sm |
| Staphylococcus aureus N315 | san |
| Staphylococcus aureus MW2 | saw |
| Staphylococcus aureus Mu50 | sam |
| Staphylococcus epidermidis 12228 | sep |
| Streptococcus agalactiae 2603 | sa |
| Streptococcus agalactiae NEM316 | sag |
| Streptococcus mutans | smu |
| Streptococcus pneumoniae R6 | spn |
| Streptococcus pneumoniae TIGR4 | spt |
| Streptococcus pyogenes SSI-1 | mle |
| Streptococcus pyogenes MGAS8232 | spa |
| Streptococcus pyogenes MGAS315 | spg |
| Streptococcus pyogenes M1_GAS | spm |
| Streptomyces avermitilis MA-4680 | sav |
| Streptomyces coelicolor A3(2) | sco |
| Sulfolobus solfataricus P2 | ss |
| Sulfolobus tokodaii 7 | st |
| | syo |
| | sy |
| Thermoanaerobacter tengcongensis | tt |
| Thermoplasma acidophilum | ta |
| Thermoplasma volcanium GSS1 | tv |
| Thermosynechococcus elongatus BP-1 | te |
| Thermotoga maritima MSB8 | tm |
| Treponema pallidum Nichols | tp |
| Tropheryma whipplei Twist | tw |
| Tropheryma whipplei TW08_27 | twt |
| Ureaplasma urealyticum serovar 3 | uu |
| Vibrio cholerae serotype O1 (N16961) | vc |
| Vibrio parahaemolyticus RIMD 2210633 | vp |
| Vibrio vulnificus CMCP6 | vv |
| Vibrio vulnificus YJ016 | vvy |
| Wigglesworthia brevipalpis | wb |
| Wolinella succinogenes | ws |
| Xanthomonas axonopodis pv citri 306 | xa |
| Xanthomonas campestris ATCC 33913 | xc |
| Xylella fastidiosa 9a5c | xf |
| Xylella fastidiosa Temecula1 | xft |
| Yersinia pestis CO-92 Biovar Orientalis | yp |
| Yersinia pestis KIM | ypk |
Figure 1. Diagram demonstrating the method used to calculate the pairwise distributed gene order distance between genomes. Repeatedly, six ortholog pairs are chosen at random (requiring that every chosen gene be at least five genes away from each of the others along the genome). The six genes are then tested to see whether they are in the same order in both genomes (irrespective of the orientation of the genes). In the case shown, the test fails because orthologs C and E are switched. The distributed gene order distance is the fraction of times this test fails for a pair of genomes. The diagram also illustrates the distributed gene order distance using five genes (A–E) if gene F is ignored.
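The Monte Carlo test described in Figure 1 can be sketched as follows. This is a minimal sketch, not the authors' code, under stated assumptions: each genome is a list of gene IDs in chromosomal order in which one-to-one orthologs share the same ID (IDs assumed sortable, e.g. int or str); genomes are treated as circular, and the overall reading direction is ignored along with gene orientation; "at least five genes away" is read as a circular index gap of at least five; draws that violate the spacing rule are simply redrawn. All function names are hypothetical.

```python
import random
from typing import Optional, Sequence


def _circular_gap(p: int, q: int, n: int) -> int:
    """Number of gene positions separating p and q on a circular genome of n genes."""
    d = abs(p - q)
    return min(d, n - d)


def _well_spaced(positions: Sequence[int], n: int, min_gap: int) -> bool:
    """True if every pair of chosen genes is at least min_gap genes apart."""
    return all(_circular_gap(p, q, n) >= min_gap
               for i, p in enumerate(positions) for q in positions[i + 1:])


def _same_cyclic_order(order_a: list, order_b: list) -> bool:
    """Compare two circular orders, ignoring rotation and reading direction."""
    start = order_b.index(order_a[0])
    rotated = order_b[start:] + order_b[:start]
    if rotated == order_a:
        return True
    # Read genome B in the opposite direction from the same starting gene.
    return [rotated[0]] + rotated[:0:-1] == order_a


def gene_order_distance(genome_a: Sequence, genome_b: Sequence, k: int = 6,
                        iterations: int = 10_000, min_gap: int = 5,
                        seed: Optional[int] = None) -> float:
    """Fraction of random k-ortholog draws whose order differs between genomes."""
    rng = random.Random(seed)
    pos_a = {g: i for i, g in enumerate(genome_a)}
    pos_b = {g: i for i, g in enumerate(genome_b)}
    shared = sorted(set(pos_a) & set(pos_b))  # sorted so a seed is reproducible
    if len(shared) < k:
        raise ValueError("not enough shared orthologs")
    failures = trials = attempts = 0
    while trials < iterations:
        attempts += 1
        if attempts > 100 * iterations:
            raise RuntimeError("could not find enough well-spaced draws")
        picks = rng.sample(shared, k)
        if not (_well_spaced([pos_a[g] for g in picks], len(genome_a), min_gap)
                and _well_spaced([pos_b[g] for g in picks], len(genome_b), min_gap)):
            continue  # redraw: some chosen genes are too close together
        trials += 1
        order_a = sorted(picks, key=pos_a.__getitem__)
        order_b = sorted(picks, key=pos_b.__getitem__)
        if not _same_cyclic_order(order_a, order_b):
            failures += 1
    return failures / trials
```

With these assumptions, identical, rotated, or fully reversed genomes give a distance of 0, while a randomly shuffled genome gives a value near 0.983 for k = 6.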
Steps used in hierarchical tree building.
| Step | Description |
| 1 | Construct a ranked list of gene order distances starting with the shortest distances |
| 2 | Move down the ranked list, forming NJ trees with an increasing number of taxa, eventually reaching an NJ tree of all taxa |
| 3 | In turn, evaluate each tree formed starting with the smallest and moving to the largest |
| 4 | Retain each tree that is consistent with all previously retained trees, and reject any new tree that is incongruent with a previously retained tree |
| 5 | Starting with the taxa in the pairs with the smallest gene order distances, add single taxa to the largest retained tree if their addition does not disrupt the existing NJ topology (second round of taxon addition) |
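Steps 1 and 2 of the table can be sketched as below; steps 3 to 5 (checking congruence between trees) are not implemented. This is a simplified sketch under stated assumptions: distances are given as a symmetric matrix, ties are broken alphabetically, and a new NJ tree is assumed to be built each time the taxon set grows once it holds at least four taxa (the smallest unrooted tree with a resolved topology). The function names and the `min_taxa` threshold are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def ranked_pairs(labels: Sequence[str], dist: Sequence[Sequence[float]]
                 ) -> List[Tuple[float, str, str]]:
    """Step 1: every genome pair, shortest gene order distance first."""
    pairs = [(dist[i][j], labels[i], labels[j])
             for i, j in combinations(range(len(labels)), 2)]
    return sorted(pairs)


def growing_taxon_sets(labels: Sequence[str], dist: Sequence[Sequence[float]],
                       min_taxa: int = 4) -> List[Tuple[str, ...]]:
    """Step 2 (simplified): taxon sets for which successive NJ trees would be built."""
    included: List[str] = []
    snapshots: List[Tuple[str, ...]] = []
    for _, a, b in ranked_pairs(labels, dist):
        new = [t for t in (a, b) if t not in included]
        included.extend(new)
        if new and len(included) >= min_taxa:
            snapshots.append(tuple(included))  # an NJ tree would be formed here
    return snapshots


labels = ["a", "b", "c", "d", "e"]
dist = [[0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0]]
print(growing_taxon_sets(labels, dist))
```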
Figure 2. Histograms showing the frequency of gene order distances calculated for 143 prokaryotes. (A) Distribution of raw gene order distances. The predicted distance for randomly ordered genomes is 0.983; 82% of the genome pairs have raw distances less than 0.983. (B) Distribution of distances after a Jukes-Cantor type correction. The predicted “Jukes-Cantor” gene order distance for randomly ordered genomes is >16. Some highly distant genome pairs are not shown in (B) because this logarithmic correction cannot be applied to distances greater than those expected at random. (C) Distribution of Tajima-corrected gene order distances. Highly distant genome pairs are extreme outliers because of the large corrections applied. Without these genome pairs, the distribution is similar to that shown in (B).
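The random-order expectation of 0.983 quoted in Figure 2 can be checked by enumeration, assuming (as an interpretation, not a statement from the source) that genomes are circular and that two orders count as the same when they differ only by rotation or reading direction. Under that assumption there are (k − 1)!/2 distinguishable orders of k genes, so two random genomes agree with probability 2/(k − 1)!. The helper names are hypothetical.

```python
from itertools import permutations


def canonical_cycle(order):
    """Smallest representation of a circular order under rotation and reflection."""
    n = len(order)
    variants = []
    for seq in (list(order), list(reversed(order))):
        for s in range(n):
            variants.append(tuple(seq[s:] + seq[:s]))
    return min(variants)


def expected_random_distance(k: int) -> float:
    """Probability that two random circular orders of k genes disagree."""
    distinct = {canonical_cycle(p) for p in permutations(range(k))}
    return 1 - 1 / len(distinct)


print(round(expected_random_distance(6), 3))  # 0.983 (60 distinct orders)
print(round(expected_random_distance(5), 3))  # 0.917 (12 distinct orders)
```

The six-gene value matches the 0.983 in the caption, which supports the circular, direction-free reading of the test.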
Figure 3. Comparison of “Jukes-Cantor” distributed gene order distances with ortholog gene content and Jukes-Cantor rRNA distances. Selected genome pairs have been labeled. (A) Gene order distance plotted as a function of rRNA distance. The solid line is a linear regression of all data (R² = 0.24). The dashed line is a linear regression for genome pairs with rRNA distances <0.2 (R² = 0.52). (B) Gene content distance plotted as a function of rRNA distance. The solid line is a linear regression of all data (R² = 0.04). The dashed line is a linear regression for genome pairs with rRNA distances <0.1 (R² = 0.67). (C) Gene content distance plotted as a function of gene order distance. The solid line is a linear regression of all data (R² = 0.22).
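The R² values in Figure 3 come from linear regressions over all genome pairs and over pairs below an rRNA distance cutoff. The sketch below shows an ordinary least-squares fit and its coefficient of determination, with the cutoff applied to the x values; the data are hypothetical and the function names are illustrative only.

```python
from typing import Sequence, Tuple


def linear_fit_r2(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares line y = a + b*x; returns (a, b, R^2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, 1 - ss_res / syy


def r2_below(xs: Sequence[float], ys: Sequence[float], cutoff: float) -> float:
    """R^2 restricted to genome pairs whose x (e.g. rRNA distance) is below cutoff."""
    kept = [(x, y) for x, y in zip(xs, ys) if x < cutoff]
    sub_x, sub_y = zip(*kept)
    return linear_fit_r2(sub_x, sub_y)[2]


# Hypothetical rRNA distances (x) and gene order distances (y).
rrna = [0.0, 0.05, 0.10, 0.15, 0.30, 0.50]
order = [0.0, 0.5, 1.0, 1.5, 0.2, 3.9]
print(linear_fit_r2(rrna, order)[2], r2_below(rrna, order, 0.2))
```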
The 172 taxa, with their abbreviations.
| Taxon | Abbreviation |
| Acidaminococcus fermentans | ACIFV |
| Acidilobus saccharovorans | ACIS3 |
| Acidimicrobium ferrooxidans | ACIFD |
| Acidithiobacillus ferrooxidans | ACIF5 |
| Acinetobacter baumannii | ACIBS |
| | ACIAD |
| Aeromonas hydrophila hydrophila | AERHH |
| Aeromonas salmonicida | AERS4 |
| Alcanivorax borkumensis | ALCBS |
| Alicyclobacillus acidocaldarius | ALIAD |
| Alteromonas macleodii | ALTMD |
| Amycolatopsis mediterranei | AMYMU |
| Anabaena variabilis | ANAVT |
| Anoxybacillus flavithermus | ANOFW |
| Arcanobacterium haemolyticum | ARCHD |
| Archaeoglobus fulgidus | ARCFU |
| Archaeoglobus profundus | ARCPA |
| Archaeoglobus veneficus | ARCVS |
| | AZOSB |
| Azotobacter vinelandii | AZOVD |
| Bacillus amyloliquefaciens | BACA2 |
| Bacillus pumilus | BACP2 |
| Bacillus selenitireducens | BACIE |
| Beutenbergia cavernae | BEUC1 |
| Bifidobacterium adolescentis | BIFAA |
| Bifidobacterium animalis animalis | BIFAR |
| Bifidobacterium animalis lactis | BIFA0 |
| Burkholderia mallei | BURMA |
| Burkholderia thailandensis | BURTA |
| Campylobacter jejuni HS:41 | CAMJC |
| Campylobacter lari | CAMLR |
| Catenulispora acidiphila | CATAD |
| Caulobacter crescentus | CAUCR |
| Caulobacter segnis | CAUST |
| Cellvibrio gilvus | CELGA |
| Cellvibrio japonicus | CELJU |
| Cenarchaeum symbiosum | CENSY |
| Clostridium novyi | CLONN |
| Clostridium perfringens | CLOPS |
| Clostridium tetani | CLOTE |
| Coriobacterium glomerans | CORGP |
| Corynebacterium jeikeium | CORJK |
| Corynebacterium kroppenstedtii | CORK4 |
| Corynebacterium urealyticum | CORU7 |
| Dechloromonas aromatica | DECAR |
| Desulfovibrio vulgaris | DESVV |
| Desulfurococcus kamchatkensis | DESK1 |
| Desulfurococcus mucosus | DESM0 |
| Dichelobacter nodosus | DICNV |
| Enterobacter cloacae | ENTCS |
| | ENT38 |
| Enterococcus faecalis | ENTFA |
| Frankia alni | FRAAA |
| | FRASC |
| Gardnerella vaginalis | GARV4 |
| Geobacillus kaustophilus | GEOKA |
| | GEOSW |
| Geobacillus thermodenitrificans | GEOTN |
| Gloeobacter violaceus | GLOVI |
| Hahella chejuensis | HAHCH |
| Halobacterium salinarum | HALSA |
| Halothermothrix orenii | HALOH |
| Helicobacter mustelae | HELM1 |
| Helicobacter pylori | HELP5 |
| Hydrogenobacter thermophilus | HYDTT |
| Kineococcus radiotolerans | KINRD |
| Korarchaeum cryptofilum | KORCO |
| Lactobacillus fermentum | LACFC |
| Lactobacillus helveticus | LACH4 |
| Lactobacillus salivarius | LACSC |
| Lactococcus lactis cremoris | LACLS |
| | LACLA |
| Legionella pneumophila | LEGPL |
| Legionella pneumophila pneumophila | LEGPH |
| Leuconostoc citreum | LEUCK |
| Leuconostoc gasicomitatum | LEUGT |
| | LEUS2 |
| Listeria monocytogenes serotype 4b | LISMC |
| Listeria monocytogenes serovar 1/2a | LISMO |
| Listeria welshimeri serovar 6b | LISW6 |
| Lysinibacillus sphaericus | LYSSC |
| | MAGSM |
| | METSW |
| Methanocaldococcus fervens | METFA |
| Methanocaldococcus infernus | METIM |
| Methanocaldococcus vulcanius | METVM |
| Methanocella conradii | METCZ |
| Methanococcus aeolicus | META3 |
| Methanococcus vannielii | METVS |
| Methanococcus voltae | METV3 |
| Methanopyrus kandleri | METKA |
| Methanosaeta concilii | METCG |
| Methanosaeta harundinacea | METH6 |
| Methanosaeta thermophila | METTP |
| Methanosarcina acetivorans | METAC |
| Methanosarcina barkeri | METBF |
| Methanosarcina mazei | METMA |
| Methylobacillus flagellatus | METFK |
| Methylococcus capsulatus | METCA |
| Microcystis aeruginosa | MICAN |
| Micromonospora aurantiaca | MICAI |
| | MICSL |
| Moraxella catarrhalis | MORCR |
| Nanoarchaeum equitans | NANEQ |
| Natranaerobius thermophilus | NATTJ |
| Nautilia profundicola | NAUPA |
| Neisseria meningitidis | NEIML |
| Neisseria meningitidis serogroup B | NEIMG |
| Nitrosomonas europaea | NITEU |
| Nitrosomonas eutropha | NITEC |
| Nitrosopumilus maritimus | NITMS |
| Nitrososphaera gargensis | NITGG |
| Nocardia cyriacigeorgica | NOCCG |
| Nocardia farcinica | NOCFA |
| | NOCSJ |
| Nostoc azollae | NOSA0 |
| Nostoc punctiforme | NOSP7 |
| | NOSS1 |
| Oceanobacillus iheyensis | OCEIH |
| Parvularcula bermudensis | PARBH |
| Pasteurella multocida | PASMU |
| Prochlorococcus marinus | PROM4 |
| Prochlorococcus marinus pastoris | PROMP |
| Propionibacterium acnes | PROAC |
| Propionibacterium propionicum | PROPF |
| Pseudomonas fulva | PSEF1 |
| Pseudomonas stutzeri | PSEU5 |
| Psychrobacter arcticus | PSYA2 |
| | PSYWF |
| Rhizobium etli | RHIEC |
| Rhizobium meliloti | RHIME |
| Rhodobacter capsulatus | RHOCB |
| Rhodobacter sphaeroides | RHOS1 |
| Rhodospirillum centenum | RHOCS |
| Rhodospirillum rubrum | RHORT |
| Rickettsia prowazekii | RICPR |
| Rickettsia typhi | RICTY |
| Rubrobacter xylanophilus | RUBXD |
| Saccharomonospora viridis | SACVD |
| Saccharopolyspora erythraea | SACEN |
| Sphingomonas wittichii | SPHWW |
| Staphylococcus carnosus | STACT |
| Staphylococcus epidermidis | STAES |
| Staphylococcus lugdunensis | STALH |
| Streptococcus pyogenes M49 | STRPZ |
| Streptococcus pyogenes M5 | STRPG |
| Streptococcus thermophilus | STRTD |
| Streptomyces avermitilis | STRAW |
| Streptomyces coelicolor | STRCO |
| Streptomyces griseus | STRGG |
| Streptosporangium roseum | STRRD |
| Sulfolobus acidocaldarius | SULAC |
| Sulfolobus islandicus | SULIM |
| Sulfolobus solfataricus | SULS9 |
| Thermoanaerobacter italicus | THEIA |
| Thermoanaerobacter mathranii | THEM3 |
| Thermoanaerobacter pseudethanolicus | THEP3 |
| Thermobispora bispora | THEBD |
| Thermococcus onnurineus | THEON |
| Thermococcus sibiricus | THESM |
| | THES4 |
| Thermoplasma acidophilum | THEAC |
| Thermoplasma volcanium | THEVO |
| Thermoproteus neutrophilus | THENV |
| Thermoproteus tenax | THETK |
| Thermoproteus uzoniensis | THEU7 |
| Thiomicrospira crunogena | THICR |
| Veillonella parvula | VEIPT |
| Vibrio cholerae serotype O1 | VIBCM |
| Vibrio fischeri | VIBF1 |
| Xanthomonas campestris | XANCP |
| Xanthomonas oryzae pv. oryzae | XANOM |
Figure 4. NJ phylogram of all taxa built using Tajima-corrected gene order distances calculated from 10 million iterations of six predicted orthologs (unresolved single taxa are not shown for clarity). Major taxonomic groups are labeled. Actinobacteria are shown in green, and the two clusters of Firmicutes are shown in blue. The Actinobacteria group with the bulk of the Firmicutes.
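The trees in Figures 4 to 9 are neighbor-joining (NJ) trees built from gene order distance matrices. The sketch below is a textbook Saitou-Nei neighbor-joining implementation, not the software used in the study; it returns the sequence of joins with branch lengths rather than a drawn tree, and the node names (`node1`, `node2`, ...) are hypothetical. The example matrix is a small illustrative one, not data from this study.

```python
from typing import Dict, List, Sequence, Tuple

Join = Tuple[str, str, str, float, float]


def neighbor_joining(labels: Sequence[str], dist: Sequence[Sequence[float]]
                     ) -> Tuple[List[Join], Tuple[str, str, float]]:
    """Neighbor joining on a symmetric distance matrix (at least two taxa).

    Returns the joins (child_a, child_b, new_node, len_a, len_b) in order,
    plus the final edge connecting the last two remaining nodes.
    """
    D: Dict[Tuple[str, str], float] = {
        (a, b): float(dist[i][j])
        for i, a in enumerate(labels) for j, b in enumerate(labels)
    }
    nodes = list(labels)
    joins: List[Join] = []
    while len(nodes) > 2:
        n = len(nodes)
        row = {a: sum(D[a, b] for b in nodes) for a in nodes}
        # Choose the pair that minimises the Q criterion.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * D[a, b] - row[a] - row[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from each joined node to the new internal node.
        len_a = 0.5 * D[a, b] + (row[a] - row[b]) / (2 * (n - 2))
        len_b = D[a, b] - len_a
        u = f"node{len(joins) + 1}"
        for c in nodes:
            if c not in (a, b):
                D[u, c] = D[c, u] = 0.5 * (D[a, c] + D[b, c] - D[a, b])
        D[u, u] = 0.0
        nodes = [c for c in nodes if c not in (a, b)] + [u]
        joins.append((a, b, u, len_a, len_b))
    a, b = nodes
    return joins, (a, b, D[a, b])


labels = ["a", "b", "c", "d", "e"]
matrix = [[0, 5, 9, 9, 8],
          [5, 0, 10, 10, 9],
          [9, 10, 0, 8, 7],
          [9, 10, 8, 0, 3],
          [8, 9, 7, 3, 0]]
joins, final_edge = neighbor_joining(labels, matrix)
for join in joins:
    print(join)
print(final_edge)
```

Midpoint rooting, as used in Figures 6 and 9, would be a separate step applied to the resulting unrooted tree.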
Figure 5. “Bootstrap” NJ cladogram of the gene order distance tree shown in Figure . Each node shows the number of times that node appears in 100 replicate trees, each built from gene order distances based on 100,000 iterations. Selected taxonomic groups are labeled with the same color scheme used in later figures.
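The “bootstrap” values here count how often a node recurs among 100 replicate trees, each built from an independent Monte Carlo estimate of the distances, rather than from classic resampling of characters. A minimal sketch of that counting step follows, assuming each replicate tree has already been reduced to its set of unrooted splits (each split given as one side of a bipartition of the taxa); the function names and the split representation are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, List


def canonical_split(side: Iterable[str], taxa: FrozenSet[str]) -> FrozenSet[str]:
    """Represent an unrooted split by the side that excludes a fixed reference taxon."""
    side = frozenset(side)
    reference = min(taxa)
    return taxa - side if reference in side else side


def replicate_support(reference_splits: List[Iterable[str]],
                      replicates: List[List[Iterable[str]]],
                      taxa: Iterable[str]) -> Dict[FrozenSet[str], int]:
    """Count in how many replicate trees each split of the reference tree occurs."""
    all_taxa = frozenset(taxa)
    counts: Counter = Counter()
    for tree_splits in replicates:
        # A set, so each split is counted at most once per replicate tree.
        for split in {canonical_split(s, all_taxa) for s in tree_splits}:
            counts[split] += 1
    return {canonical_split(s, all_taxa): counts[canonical_split(s, all_taxa)]
            for s in reference_splits}
```

Canonicalising each split means that {a, b} and its complement {c, d, e} are recognised as the same node of an unrooted tree.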
Figure 6. Midpoint-rooted NJ phylogram based on our hierarchical tree building, starting with 143 genomes (see Materials and Methods), using the same gene order distances as the tree shown in Figure . Taxa in the largest tree formed during the initial round of clustering are shown with solid lines and bold font. Taxa connected with dashed lines are those found to be compatible during a second round of single-taxon addition. “Bootstrap values” shown are the number of times a node is found when NJ trees are formed using these taxa and the 100 replicate gene order distances. The values listed for individual taxa are the number of times that taxon is found in the largest tree formed by the initial round of clustering when the 100 replicate gene order distances are used. Taxa not shown that were found 60 or more times in the largest tree after the initial clustering were: vc (70), cbf (67), vv (65), sty (63), set (61), and vp (60).
Figure 7. NJ phylogram, starting with the 143 original taxa, limited to the 23 taxa found in the agreement subtrees for the 100 replicate trees formed using iterations of six predicted orthologs. Bold lines show the part of the tree found in all 18 agreement subtrees. “Bootstrap values” shown are the number of times a node is found when NJ trees are formed using these taxa and the 100 replicate gene order distances. Actinobacteria are shown in green, Firmicutes in blue, and γ-Proteobacteria in gray.
Figure 8. NJ phylogram, starting with 172 representative taxa, limited to the 23 taxa found in the agreement subtrees for the 100 replicate trees formed using iterations of six predicted orthologs. “Bootstrap values” shown are the number of times a node is found when NJ trees are formed using these taxa and the 100 replicate gene order distances.
Figure 9. Midpoint-rooted NJ phylogram, starting with 172 representative taxa, limited to the 56 taxa found in the agreement subtrees for the 100 replicate trees formed using iterations of five predicted orthologs with the same distance equation as before, which is functionally equivalent to using D′ = −ln(1 − D). “Bootstrap values” shown are the number of times a node is found when NJ trees are formed using these taxa and the 100 replicate gene order distances, with * representing a bootstrap value of 100.