Alex Mira, Ravindra Pushker, Boris A Legault, David Moreira, Francisco Rodríguez-Valera.
Abstract
BACKGROUND: The phylogenetic position and evolutionary relationships of Fusobacteria remain uncertain. Especially intriguing is that ribosomal molecular phylogenies relate them to low G+C Gram-positive bacteria (Firmicutes), yet they possess a typical Gram-negative outer membrane. Taking advantage of the recent completion of the Fusobacterium nucleatum genome sequence, we have examined the evolutionary relationships of Fusobacterium genes using phylogenetic analysis and comparative genomics tools.
Year: 2004 PMID: 15566569 PMCID: PMC535925 DOI: 10.1186/1471-2148-4-50
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
Figure 1. Phylogenetic tree (Bayesian method) using the combined sequence of the 16S-23S rRNA of representative bacterial species. The Fusobacteria are a coherent and taxonomically independent group that branches out at the base of the lineage leading to Firmicutes. Numbers represent bootstrap values. In the case of the branching of Fusobacteria/Firmicutes, the numbers represent the values obtained by four different methods: BA: Bayesian; NJ: Neighbor-joining; MP: Parsimony; ML: Maximum likelihood.
Figure 2. (G-C)/(G+C) values (GC-skew) plotted every 5000 bp for F. nucleatum. Red wide arrows represent the replication origin (bottom) and terminus (top). Orange thin arrows indicate the 36 "clusters" of four or more contiguous genes that are potentially transferred from species outside the Firmicutes. Note that some of the shifts in F. nucleatum GC skew coincide with putative HGT regions.
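The GC-skew statistic in the caption can be illustrated with a minimal sketch. It assumes non-overlapping windows and a plain string sequence; the function name `gc_skew` and the choice to report 0.0 for windows lacking both G and C are assumptions made for illustration, not details taken from the paper.

```python
def gc_skew(seq: str, window: int = 5000) -> list[float]:
    """Return (G-C)/(G+C) for consecutive, non-overlapping windows."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # Skew is undefined when a window has no G or C; 0.0 is an
        # arbitrary placeholder chosen for this sketch.
        skews.append((g - c) / (g + c) if (g + c) else 0.0)
    return skews


if __name__ == "__main__":
    # Toy sequence with a tiny window, only to show the sign change.
    print(gc_skew("GGGCCCCG", window=4))
```

A sign change in the resulting series is what typically marks the replication origin and terminus on a plot such as Figure 2.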
Table 1. General function of F. nucleatum genes, divided by group of best BLAST hit.¹
| Aa biosynthesis | 2 (6.5,3.9) | 22 (71,2.3) | 3 (9.7,2.0) | 2 (6.5,1.9) | 1 (3.2,1.4) | 1 (3.2,0.2) | |||
| Cofactors and carriers biosynth. | 1 (1.4,2.0) | 59 (83,6.2) | 4 (5.6,2.6) | 2 (2.8,1.9) | 2 (2.8,2.7) | 2 (2.8,2.2) | 1 (1.4,0.2) | ||
| Cell envelope | 1 (0.6,2.0) | 6 (3.6,11.3) | 45 (27.1,4.7) | 24 (14.5,16) | 12 (7.2,11.4) | 7 (4.2,9.5) | 12 (7.2,13.3) | 59 (35.5,10) | |
| Cellular processes | 2 (3.1,3.9) | 2 (3.1,3.8) | 35 (54.7,3.7) | 4 (6.2,2.6) | 2 (3.1,1.9) | 2 (3.1,2.2) | 17 (26.6,2.9) | ||
| Central intermed. metab. | 3 (7.0,5.9) | 3 (7.0,5.7) | 24 (55.8,2.5) | 6 (14.0,4.0) | 2 (4.7,1.9) | 3 (7.0,3.3) | 2 (4.7,0.3) | ||
| DNA metab. | 4 (5.5,7.8) | 45 (61.6,4.7) | 5 (6.8,3.3) | 3 (4.1,2.9) | 4 (5.5,5.4) | 1 (1.4,1.1) | 11 (15.1,1.9) | ||
| Energy metab. | 1 (0.9,2.0) | 9 (7.8,17.0) | 82 (70.7,8.6) | 6 (5.2,4.0) | 6 (5.2,5.7) | 4 (3.4,5.4) | 3 (2.6,3.3) | 5 (4.3,0.9) | |
| Lipid metab. | 21 (75.0,2.2) | 3 (10.7,2.0) | 3 (10.7,2.9) | 1 (3.6,0.2) | |||||
| Hypothetical prots. | 3 (2.1,5.7) | 43 (29.9,4.5) | 4 (2.8,2.6) | 2 (1.4,1.9) | 3 (2.1,4.1) | 9 (6.2,10.0) | 80 (55.6,14) | ||
| Other categories | 1 (5.3,1.9) | 15 (79,1.6) | 1 (5.3,0.7) | 1 (5.3,1.0) | 1 (5.3,0.2) | ||||
| Protein fate | 4 (7.0,7.5) | 33 (58,3.5) | 7 (12,4.6) | 4 (7.0,3.8) | 4 (7.0,5.4) | 1 (1.8,1.1) | 4 (7.0,0.7) | ||
| Protein synthesis | 1 (0.9,2.0) | 1 (0.9,1.9) | 88 (78,9.2) | 5 (4.4,3.3) | 7 (6.2,6.7) | 8 (7,10.8) | 3 (2.7,0.5) | ||
| Nucleotides metab. | 1 (2.9,2.0) | 1 (2.9,1.9) | 28 (80,2.9) | 2 (5.7,1.3) | 1 (2.9,1.0) | 2 (5.7,2.7) | |||
| Regulat. functions | 1 (2.0,2.0) | 2 (3.9,3.8) | 29 (57,3.0) | 3 (5.9,2.0) | 3 (5.9,2.9) | 2 (3.9,2.7) | 5 (9.8,5.6) | 6 (11.8,1) | |
| Signal transduction | 3 (60.0,0.3) | 2 (40,0.3) | |||||||
| Transcription | 1 (5.0,2.0) | 1 (5.0,1.9) | 15 (75,1.6) | 1 (5.0,1.4) | 1 (5.0,1.1) | 1 (5.0,0.2) | |||
| Transport/binding proteins | 11 (5.7,21.6) | 2 (1.0,3.8) | 91 (47.4,9.5) | 23 (12,15.2) | 15 (7.8,14.3) | 14 (7.3,18.9) | 18 (9.4,20.0) | 18 (9.4,3.1) | |
| Unclassified | 11 (7.0,21.6) | 3 (1.9,5.7) | 72 (45.6,7.5) | 16 (10,10.6) | 9 (5.7,8.6) | 1 (0.6,1.4) | 6 (3.8,6.7) | 40 (25.3,6.8) | |
| Unknown function | 3 (3.4,5.9) | 3 (3.4,5.7) | 50 (57.5,5.2) | 4 (4.6,2.6) | 10 (11.5,9.5) | 5 (5.7,6.8) | 3 (3.4,3.3) | 9 (10.3,1.5) | |
| Hypothetical function | 8 (1.3,15.7) | 12 (2.0,22.6) | 155 (26.1,16.2) | 31 (5.2,20.5) | 21 (3.5,20.0) | 16 (2.7,21.6) | 24 (4.0,26.7) | 327 (55,55.6) | |
1 The first number in brackets is the percentage of genes with the function in the row heading that have a best match in the taxon in the column heading. The second number in brackets is the percentage of genes with a best match in that taxon that have the function in the row heading.
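As a consistency check, the first bracketed value in each cell can be reproduced from the counts in the same row. The sketch below uses the four counts of the Lipid metab. row (21, 3, 3, 1); the helper name `row_percentages` is hypothetical.

```python
def row_percentages(counts: list[int]) -> list[float]:
    """Share of each count in the row total, as a percentage (1 decimal)."""
    total = sum(counts)
    return [round(100 * n / total, 1) for n in counts]


# Counts taken from the Lipid metab. row of the table above.
lipid_counts = [21, 3, 3, 1]
print(row_percentages(lipid_counts))  # [75.0, 10.7, 10.7, 3.6]
```

The second bracketed value would need the column totals (all genes with a best match in a given taxon), which the table does not list directly.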
Figure 3. Gene-position plot with a reconstruction of vertical descent and potentially-transferred genes across the F. nucleatum genome. Plus signs indicate genes whose phylogenetic affiliation to a certain group on the left is supported by BLAST analysis only. Crosses indicate genes whose phylogenetic origin is supported by BLAST and by one or two other methods (phylogenetic tree reconstruction and gene order conservation). Thirty-six clusters are indicated, each containing four or more consecutive genes that appear to have a xenologous origin (i.e. a phylogenetic affiliation outside the Firmicutes). Details of these clusters are given in Table 2. Arrowheads at the top indicate the position of transposases. Arrowheads at the bottom indicate the position of phage-related genes. Plus signs and crosses indicate potential transfers to/from Firmicutes (Firm), Spirochaetes (Sp), alpha-beta-gamma Proteobacteria (αβγ-Pr), delta-epsilon Proteobacteria (δε-Pr), Cytophaga-Flexibacter-Bacteroides group (CFB), other bacterial groups (Others), Archaea, or genes that are consistent with the phylogeny (CWP) shown in Figure 1 and additional file 1.
Figure 4. Gene order conservation in some representative cases of potentially-transferred clusters. Homologous genes are shown with the same colour in F. nucleatum and the species with which it is compared. Small black boxes represent short orphan genes. Non-contiguous genes are separated by an interrupted line. Genes are not drawn to scale.
Table 2. Clusters of 4 or more consecutive genes with a best match outside the Firmicutes.⁵
| # | Putative function | Observations¹ | Sequence similarity² | Genes | Synteny³ |
| 1 | Transposase + 4 hypothetical proteins of similar sequence | Flanked by 3 short orphans⁴. One of the proteins is a short ORF | 24–32 | FN1511 to FN1515 | |
| 2 | KDO (LPS core synthesis) + endonuclease and DNA pol III | Includes a short orphan | 31–58 | FN1561 to FN1576 | |
| 3 | Peptide ABC transporter | It includes two long (>1500 bp) hypothetical proteins | 30–56 | FN1650 to FN1656 | |
| 4 | Synthesis of LPS (O chain) + phosphatidylcholine synthesis | Split by a hypothetical protein and 3 short ORFs | 37–61 | FN1661 to FN1668 | |
| 5 | Carbohydrate transport-pot operon (periplasmic binding protein dependent transport) | Split by long spacer | 22–55 | FN1792 to FN1800 | |
| 6 | Periplasmic binding protein dependent cation (Mn2+, Zn2+) transport | Possibly Co2+. Flanked by transposase and archaeal best-match ORF | 24–56 | FN1807 to FN1814 | |
| 7 | DNA pol III gamma and tau subunits and TonB OM export system | Flanked by hypothetical orphans | 25–36 | FN1830 to FN1834 | |
| 8 | Periplasmic amylase and ribose ABC transporter | Short orphan in the middle | 23–32 | FN1893 to FN1897 | |
| 9 | LPS synthesis and/or decoration and outer membrane stabilization | Flanked by 3528 bp hypothet. protein with eukaryotic best-match followed by long spacer | 25–77 | FN1908 to FN1911 | |
| 10 | capsule biosynthesis | Includes 2 short ORFs (possible HIPA pseudogenes) | 23–46 | FN1997 to FN2003 | |
| 11 | Slow porin homologous to OmpA ( | Split by a long spacer with some homology to membrane proteins. Includes 2 short ORFs | 23–49 | FN2056 to FN2062 | |
| 12 | Hypothetical exported 24-amino acid repeat protein | Includes 4 short ORFs (one of them with homology to subunit δ of DNA Polym. III) | 34–45 | FN2110 to FN2122 | |
| 13 | 24 aa repeat protein like in cluster 23 | Protein match to | 31–53 | FN0023 to FN0028 | |
| 14 | Endonuclease + 3 genes implicated in porphyrinic siderophore synthesis | Flanked by short orphan | 24–65 | FN0185 to FN0188 | |
| 15 | DNA helicase + peptide transporters | High gene order conservation in an archaeal species | 28–42 | FN0191 to FN0197 | |
| 16 | Sugar ABC transporter | Short spacers/overlapping genes | 31–48 | FN0217 to FN0220 | |
| 17 | Large cluster of hemolysin/ hemagglutinin containing hemagglutinin | Largest bacterial protein. Some degraded hemolysin copies found throughout genome | 23–26 | FN0290 to FN0293 | |
| 18 | ABC iron/haemin transporter with periplasmic binding protein | Flanked by long spacer | 27–47 | FN0300 to FN0303 | |
| 19 | Periplasmic binding protein dependent iron transport system | Physically linked to other iron transport genes of Gram positive and Archaeal match | 34–49 | FN0309 to FN0312 | |
| 20 | Na+/H+ antiporter + 3 genes of unknown function | Split by a tRNA gene. Includes 2 short orphans | 33–53 | FN0350 to FN0354 | |
| 21 | Two clusters of genes implicated in drug efflux (detoxification) extrusion out of OM | Flanked by two orphans of 402 and 618 bp | 21–37 | FN0515 to FN0519 | |
| 22 | Mixed functions cluster | 30–44 | FN0524 to FN0527 | ||
| 23 | LPS synthesis and/or decoration and outer membrane stabilization | Includes recA and recX proteins with best match to | 29–100 | FN0538 to FN0548 | |
| 24 | Structural lipoprotein with release and murein anchoring components | Flanked by short ORF | 30–46 | FN0579 to FN0582 | |
| 25 | Membrane-related functions + Fe-S oxidoreductase | Includes a short hypothetical protein with biased codon use | 32–55 | FN0734 to FN0739 | |
| 26 | Haemin uptake with periplasmic binding protein iron acquisition | Haemin genes tightly-linked, probable operon | 24–59 | FN0766 to FN0771 | |
| 27 | Biotin biosynthesis | Most spacers are short, possible cotranscription | 31–55 | FN0846 to FN0852 | |
| 28 | Hydrolase + protease + aromatic compound synthesis | Mixed function cluster | 30–47 | FN0869 to FN0873 | |
| 29 | Iron ABC transporter | Flanked by a short orphan with biased codon usage | 45–71 | FN0879 to FN0882 | |
| 30 | Membrane proteins | 1st and 2nd genes probably permeases | 22–37 | FN1030 to FN1033 | |
| 31 | Lipase B component of type II secretion system + 24 aa repeat protein + bacterioferritin | All proteins of short length | 26–34 | FN1075 to FN1079 | |
| 32 | KDO (ketodeoxyoctulosonic acid biosynthetic operon) | KDO is a component of LPS core in Fusobacterium and many Gram negatives. | 31–100 | FN1221 to FN1224 | |
| 33 | Eps synthesis + EpsF (secretion of proteins/large biomolecules) | Possible tandem duplication | 30–47 | FN1242 to FN1245 | |
| 34 | LOS choline decoration + Ton B (biopolymer transport through Outer Membrane) | Includes a short ORF (a degraded copy of a biopolymer transporter) | 29–40 | FN1306 to FN1312 | |
| 35 | ABC transporter system | Flanked by short orphan followed by a transposase | 30–69 | FN1346 to FN1355 | |
| 36 | ABC amino acid transport system | 50–62 | FN1428 to FN1431 |
1 "Split" indicates a cluster separated by a long intergenic spacer, the two parts of the cluster generally coding for different functions.
2 Range of sequence similarity among the genes from each cluster compared to their BLAST top hits.
3 Representative species with similar gene order.
4 An "orphan" gene is defined as an ORF with an unknown function and no BLAST similarity in the current database. Short orphans (<500 bp) are likely to be pseudogene remnants or other non-functional regions (Mira et al. 2002, Davies et al. 2004).
5 Only protein coding genes are included in the analysis.
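The cluster criterion used for this table (four or more consecutive genes with a best match outside the Firmicutes) can be sketched as a simple run-length scan. This is an illustrative simplification, not the authors' pipeline: it assumes genes are already sorted by chromosomal position, that each gene carries a single best-hit taxon label, and that any label other than "Firmicutes" counts as outside. The `Gene` class, the function name and the locus names `g1`..`g9` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    locus: str     # hypothetical locus identifier
    best_hit: str  # taxon of the best BLAST hit (assumed label)


def xenologous_clusters(genes: list[Gene], min_len: int = 4,
                        native: str = "Firmicutes") -> list[list[str]]:
    """Return runs of >= min_len consecutive genes whose best hit is not `native`."""
    clusters: list[list[str]] = []
    run: list[str] = []
    for gene in genes:
        if gene.best_hit != native:
            run.append(gene.locus)
        else:
            # A native-like gene ends the current run.
            if len(run) >= min_len:
                clusters.append(run)
            run = []
    if len(run) >= min_len:  # run that reaches the end of the list
        clusters.append(run)
    return clusters


# Toy input: one run of four outside-Firmicutes genes and one run of two.
labels = ["Firmicutes", "Proteobacteria", "Proteobacteria", "Spirochaetes",
          "Archaea", "Firmicutes", "Archaea", "Archaea", "Firmicutes"]
toy_genes = [Gene(f"g{i + 1}", taxon) for i, taxon in enumerate(labels)]
print(xenologous_clusters(toy_genes))  # [['g2', 'g3', 'g4', 'g5']]
```

How orphan genes with no BLAST hit are treated inside or next to a run is a design choice this sketch does not settle; here any non-Firmicutes label simply extends the run.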
Table 3. Percentage of F. nucleatum ORFs classified by the taxa of potential origin.
| | Sequence similarity method (BLAST) | Phylogenetic trees method | Gene order conservation |
| Number of genes analyzed | 2067 | 1236 | 738 |
| Root of Firmicutes¹ | 33.28 % | 8.41 % | 35.1 % |
| Inside Firmicutes² | 12.92 % | 25.8 % | 15.45 % |
| CFB group | 2.56 % | 2.27 % | 4.06 % |
| α, β, γ Proteobacteria | 7.34 % | 10.3 % | 21.0 % |
| δ, ε Proteobacteria | 5.07 % | 3.4 % | 5.7 % |
| Spirochaetes | 4.35 % | 4.32 % | 6.37 % |
| Other eubacteria | 3.58 % | 4.53 % | 4.2 % |
| Archaea | 2.46 % | 1.13 % | 0.95 % |
| No hit, hit to eukaryotes, uncertain/unresolved | 28.4 % | 38.7 % | 7.45 % |
1 Genes consistent with the 16S-23S and ribosomal proteins phylogeny.
2 Indicates possible HGT to/from Firmicutes.
Figure 5. Chimeric operons (metabolic pathways of putatively mixed origin) in F. nucleatum. Arrowed boxes represent gene orientation, coloured by BLAST top hit. White boxes: top hit to Firmicutes; grey boxes: top hit to Archaeal species; black boxes: top hit in Gram negative species. Numbers below boxes indicate the percent of top ten hits that have matches in Firmicutes. Names above indicate gene names (I-BP: Iron binding protein; NIP: Nitrogenase iron protein; Oxdtase: Oxidoreductase; B, C, D, F: dipeptide permeases B, C, D, F; BP: dipeptide binding protein; Tr: ABC transporter; unk: unknown function gene; Rec: Hemin receptor). Best match taxa by the phylogenetic tree and gene order methods are also indicated (A: Archaea; Pr: Proteobacteria; Sp: Spirochaetes; CP: consistent with (ribosomal) phylogeny; O: other eubacteria; x: unresolved; --: not analysed). Plus signs indicate unusual DNA composition by the method of García-Vallvé et al. 2003.
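The number shown below each box (percent of top ten hits that match Firmicutes) could be computed along the following lines. This is a sketch under stated assumptions: hits are given as a list of taxon labels already ranked by BLAST score, and genes with fewer than ten hits are scored over the hits available. The function name `percent_firmicutes` and the example labels are hypothetical.

```python
def percent_firmicutes(hit_taxa: list[str], top_n: int = 10) -> float:
    """Percent of the first `top_n` ranked hits whose taxon is Firmicutes."""
    top = hit_taxa[:top_n]
    if not top:
        return 0.0  # no hits at all; placeholder value for this sketch
    return 100 * sum(taxon == "Firmicutes" for taxon in top) / len(top)


# Hypothetical ranked hit list for one gene (not data from the paper).
example_hits = ["Proteobacteria"] * 7 + ["Firmicutes"] * 3 + ["Firmicutes"] * 2
print(percent_firmicutes(example_hits))  # 30.0
```

Only the first ten entries contribute, so the two trailing Firmicutes hits in the example do not change the result.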
Figure 6. Length frequency distributions for F. nucleatum genes. Genes are divided by the group with the closest sequence similarity match. ORFs without sequence similarity in the non-redundant NCBI database (orphan genes) are significantly shorter than the rest.