Christine Dufraigne, Bernard Fertil, Sylvain Lespinats, Alain Giron, Patrick Deschavanne.
Abstract
Horizontal DNA transfer is an important factor in evolution and contributes to biological diversity. Unfortunately, the location and length of horizontal transfers (HTs) are known for very few species. The usage of short oligonucleotides in a sequence (the so-called genomic signature) has been shown to be species-specific even in DNA fragments as short as 1 kb. The genomic signature is therefore proposed as a tool to detect HTs. Since DNA transfers originate from species whose signature differs from that of the recipient species, analyzing local variations of the signature along the recipient genome may allow exogenous DNA to be detected. The strategy consists of (i) scanning the genome with a sliding window and calculating the corresponding local signature, (ii) evaluating its deviation from the signature of the whole genome and (iii) looking for similar signatures in a database of genomic signatures. A total of 22 prokaryote genomes are analyzed in this way. Atypical regions make up approximately 6% of each genome on average. Most of the claimed HTs, as well as new ones, are detected. The origin of putative DNA transfers is searched for among approximately 12 000 species. Donor species are proposed, and sometimes strongly suggested, on the basis of signature similarity. Among the species studied, Bacillus subtilis, Haemophilus influenzae and Escherichia coli have been investigated by many authors and give the opportunity to perform a thorough comparison of most of the bioinformatics methods used to detect HTs.
Year: 2005 PMID: 15653627 PMCID: PMC546175 DOI: 10.1093/nar/gni004
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Signatures (4-letter words and 5 kb windows) along the genome for Clostridium acetobutylicum, Deinococcus radiodurans and Mycobacterium tuberculosis. In this kind of display, rows represent the frequency of each word along the genome and columns represent the signatures of successive windows.
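The scanning strategy outlined in the abstract (local signature per window, then deviation from the whole-genome signature) can be illustrated with a minimal Python sketch. It uses the 4-letter words and 5 kb windows mentioned for Figure 1. The function names, the step size and the use of a Euclidean distance between word-frequency vectors are assumptions made for illustration; the source does not specify the distance or its arbitrary units.

```python
from collections import Counter
from itertools import product
import math

WORD = 4  # word length; Figure 1 uses 4-letter words


def signature(seq, k=WORD):
    """Relative frequencies of all 4**k words over ACGT (overlapping counts).

    Expects an upper-case sequence; words containing other letters are ignored.
    """
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[w] for w in words) or 1  # avoid division by zero
    return [counts[w] / total for w in words]


def distance(sig_a, sig_b):
    """Euclidean distance between two signatures (ASSUMPTION: metric not given in the source)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(sig_a, sig_b)))


def scan(genome, window=5000, step=2500, k=WORD):
    """Return (window start, distance to whole-genome signature) for each window.

    The step of 2500 bp is a hypothetical default, not a value from the source.
    """
    host = signature(genome, k)
    return [
        (start, distance(signature(genome[start:start + window], k), host))
        for start in range(0, len(genome) - window + 1, step)
    ]
```

Windows whose distance exceeds a genome-specific cut-off would then be flagged as atypical; how that cut-off is derived is not reproduced here.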
Main data for the 22 species
| Species | Genome size (Mb) | rRNA in genome (%) | Detected rRNA (%) | rRNA-free outliers (%) | Intrinsic host variation mean distance (AU) | Cut-off distance (AU) | Atypical regions (#) | Length of atypical regions (median) | Taxonomy of potential donors: most populous classes and percentage of total donors |
|---|---|---|---|---|---|---|---|---|---|
| A.pernix | 1.67 | 0.37 | 78.01 | 13.09 | 145 | 230 | 63 | 2000 | Eukaryota 61% (m 70%), Vertebrata 40% (m 85%), Archaea 24% |
| Aquifex aeolicus | 1.55 | 0.6 | 100 | 7.87 | 120 | 204 | 26 | 2500 | Bacteria 48% (p 31%), Firmicutes 26% (p 50%), Eukaryota 29% (m 67%) |
| Archaeoglobus fulgidus | 2.19 | 0.21 | 97.31 | 11.04 | 122 | 190 | 38 | 4000 | Eukaryota 43%, Embryophyta 20%, bacteria 30%, viruses 20% |
| B.subtilis | 4.21 | 1.09 | 100 | 12.97 | 126 | 204 | 51 | 4500 | Bacteria 69% (p 23%), Firmicutes 50% (p 21%), viruses 16% |
| Borrelia burgdorferi | 0.91 | 0.84 | 91.19 | 1.98 | 143 | 273 | 7 | 500 | Eukaryota 50%, bacteria 25%, viruses 25% |
| Campylobacter jejuni | 1.64 | 0.83 | 98.3 | 2.08 | 145 | 279 | 12 | 750 | Bacteria 50% (p 75%), Eukaryota 38% |
| Chlamydia pneumoniae | 1.23 | 0.37 | 70.62 | 2.18 | 132 | 196 | 13 | 500 | Bacteria 58%, Eukaryota (Viridiplantae…Asterids) 25% |
| Chlamydia trachomatis | 1.04 | 0.87 | 99.16 | 2.97 | 121 | 178 | 11 | 1250 | Bacteria 58% (p 43%), viruses 25% |
| C.acetobutylicum | 3.94 | 1.26 | 99.87 | 2.78 | 139 | 258 | 26 | 1500 | Bacteria 63% (p 15%), Firmicutes 40%, Eukaryota 21% |
| Deinococcus radiodurans | 3.26 | 0.26 | 82.48 | 5.46 | 132 | 242 | 35 | 3000 | Bacteria 81% (p 13%), Proteobacteria 60%, Pseudomonas 22% |
| E.coli | 4.64 | 0.69 | 72.01 | 10.33 | 130 | 216 | 84 | 3750 | Bacteria 87% (p 28%), Enterobacteriales 56%, viruses 10% |
| H.influenzae | 1.83 | 1.49 | 90.84 | 3.29 | 130 | 239 | 13 | 1500 | Bacteria 59% (p 20%), Eukaryota 35% |
| Helicobacter pylori | 1.67 | 0.56 | 50.92 | 4.6 | 130 | 237 | 18 | 2500 | Bacteria 83% (p 40%), Firmicutes 33% |
| M.thermoautotrophicum | 1.75 | 0.53 | 79.18 | 6.72 | 133 | 238 | 14 | 6750 | Viruses 42%, Eukaryota (Viridiplantae…Magnoliophyta) 35% (m 33%) |
| M.jannaschii | 1.66 | 0.54 | 98.69 | 1.93 | 142 | 289 | 7 | 1000 | Viruses 63%, Eukaryota (Viridiplantae…Magnoliophyta) 37% |
| M.tuberculosis | 4.41 | 0.11 | 96.77 | 6.28 | 131 | 259 | 43 | 4500 | Bacteria 95% (p 10%), Proteobacteria 50% |
| P.abyssi | 1.77 | 0.29 | 87.33 | 1.27 | 134 | 285 | 2 | 8750 | Eukaryota 66%, Tracheophyta 66%, Archaea 33% |
| Pyrococcus furiosus | 1.91 | 0.25 | 94.86 | 3.43 | 132 | 234 | 13 | 2500 | Archaea 36%, bacteria 36%, Firmicutes 32%, Eukaryota 28% |
| Pyrococcus horikoshii | 1.74 | 0.31 | 97.78 | 1.91 | 133 | 255 | 7 | 2500 | Bacteria 45% (p 20%), Firmicutes 45%, Eukaryota 27%, Archaea 27% |
| Rickettsia prowazekii | 1.11 | 0.39 | 65.64 | 2.75 | 129 | 229 | 10 | 1250 | Bacteria 59% (p 33%), Eukaryota 30% |
| | 3.57 | 0.25 | 91.24 | 9.03 | 124 | 217 | 65 | 3500 | Bacteria 55% (p 24%), Lactobacillales 22% (p 29%), Eukaryota 32% |
| Thermotoga maritima | 1.86 | 0.25 | 100 | 9.23 | 123 | 189 | 29 | 4000 | Bacteria 47% (p 27%), Eukaryota 41%, Ascomycota 19% |
| Mean for 22 genomes | 2.25 | 0.56 | 88.28 | 5.60 | 132 | 234 | 27 | 2864 | |
| Median for 22 genomes | 1.76 | 0.46 | 93.05 | 4.02 | 132 | 236 | 16 | 2500 | |
a Intrinsic host variation, expressed as the mean distance of window signatures to the host signature (AU).
b Threshold used for the selection of atypical regions (AU).
c Taxonomy of potential donors, with the percentage of donors per taxonomic branch (m, mitochondrial DNA; p, plasmid).
Figure 2. Atypical regions for the B.subtilis genome. (a) Upper panel: signatures along the genome (same as Figure 1). Lower panel: distances of local signatures to the host signature (one window out of 10 is shown). Distances are expressed in arbitrary units (AU). (b) Inset: close-up of the 1116–1141 kb region of a putative HT, with gene composition, using a 0.5 kb window and a 50 bp step. Gray diamonds, host; closed diamonds, original rRNA-free regions; and multiple symbols, rRNA-containing regions.
Figure 3. Atypical regions for the H.influenzae genome. Upper panel: signatures along the genome. Lower panel: distances of local signatures to the host signature (one window out of 10 is shown). Distances are expressed in AU. Gray diamonds, host; closed diamonds, original rRNA-free regions; and multiple symbols, rRNA-containing regions.
Agreement between methods for the analysis of B.subtilis genome in terms of Kappa
| | Garcia-Vallve | Nicolas | Nakamura | Moszer | This work |
|---|---|---|---|---|---|
| Atypical genes (#) | 557 | 529 | 457 | 537 | 599 |
| Atypical genes (%) | 14 | 13 | 11 | 13 | 15 |
| Single vote genes (#) | 116 | 47 | 111 | 61 | 83 |
| Genes in majority consensus (#) | 398 | 445 | 295 | 424 | 453 |
| Kappas | | | | | |
| Majority consensus | 0.74 | 0.88 | 0.59 | 0.82 | 0.83 |
| Garcia-Vallve | | 0.66 | 0.45 | 0.62 | 0.66 |
| Nicolas | | | 0.51 | 0.72 | 0.78 |
| Nakamura | | | | 0.57 | 0.48 |
| Moszer | | | | | 0.69 |
The majority consensus results from a voting scheme about the status of each gene (all methods are electors). Total number of detected genes = 1011; majority consensus (no. of genes) = 470. Gene scores (1 vote, 418; 2 votes, 123; 3 votes, 95; 4 votes, 145; and 5 votes, 230).
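As an illustration of how an agreement value of this kind can be computed, the sketch below applies Cohen's kappa to a 2x2 table of gene calls, using the Garcia-Vallve column versus the majority consensus. That the reported Kappa is Cohen's kappa is an assumption, and so is the total gene count: `TOTAL_GENES = 4100` is a hypothetical round figure for B.subtilis that does not appear in the source. The other three counts come from the table and the note above.

```python
def cohen_kappa(both, only_a, only_b, neither):
    """Cohen's kappa for two binary classifications summarized as a 2x2 table."""
    n = both + only_a + only_b + neither
    p_obs = (both + neither) / n          # observed agreement
    p_a = (both + only_a) / n             # fraction flagged by rater A
    p_b = (both + only_b) / n             # fraction flagged by rater B
    p_exp = p_a * p_b + (1 - p_a) * (1 - p_b)  # agreement expected by chance
    return (p_obs - p_exp) / (1 - p_exp)


TOTAL_GENES = 4100   # ASSUMPTION: approximate gene count, not given in the source
flagged = 557        # atypical genes, Garcia-Vallve column
consensus = 470      # genes in the majority consensus (table note)
shared = 398         # Garcia-Vallve genes that are in the majority consensus

kappa = cohen_kappa(
    both=shared,
    only_a=flagged - shared,
    only_b=consensus - shared,
    neither=TOTAL_GENES - flagged - consensus + shared,
)
print(round(kappa, 2))
```

Under these assumptions the result lands close to the 0.74 reported for Garcia-Vallve against the majority consensus, and it changes little if the assumed total is varied by a few hundred genes.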
Agreement between methods for the analysis of E.coli genome in terms of Kappa
| | Garcia-Vallve | Hayes | Lawrence | Nakamura | Medigue | This work |
|---|---|---|---|---|---|---|
| Atypical genes (#) | 359 | 653 | 1184 | 710 | 398 | 508 |
| Atypical genes (%) | 8 | 15 | 28 | 17 | 9 | 12 |
| Single vote genes (#) | 19 | 240 | 372 | 103 | 16 | 56 |
| Genes in majority consensus (#) | 243 | 186 | 335 | 314 | 278 | 261 |
| Kappas | | | | | | |
| Majority consensus | 0.74 | 0.34 | 0.43 | 0.66 | 0.81 | 0.67 |
| Garcia-Vallve | | 0.13 | 0.31 | 0.36 | 0.47 | 0.57 |
| Hayes | | | 0.22 | 0.26 | 0.22 | 0.18 |
| Lawrence | | | | 0.45 | 0.34 | 0.37 |
| Nakamura | | | | | 0.55 | 0.36 |
| Medigue | | | | | | 0.40 |
Total number of detected genes = 1732; majority consensus (no. of genes) = 342. Gene scores (1 vote, 806; 2 votes, 363; 3 votes, 221; 4 votes, 157; 5 votes, 121; and 6 votes, 64).
Similarities between methods for the detection of atypical genes in E.coli (correspondence analysis, a graphical technique used here to show which methods have similar patterns of gene selection).
Agreement between methods for the analysis of H.influenzae genome in terms of Kappa
| | Garcia-Vallve | Nakamura | This work |
|---|---|---|---|
| Atypical genes (#) | 86 | 184 | 71 |
| Atypical genes (%) | 5 | 11 | 4 |
| Single vote genes (#) | 33 | 158 | 25 |
| Genes in majority consensus (#) | 53 | 26 | 46 |
| Kappas | | | |
| Majority consensus | 0.73 | 0.17 | 0.71 |
| Garcia-Vallve | | 0.1 | 0.51 |
| Nakamura | | | 0.06 |
Total number of detected genes = 273; majority consensus (no. of genes) = 57. Gene scores (1 vote, 216; 2 votes, 46; and 3 votes, 11).
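The gene scores reported under the three agreement tables can be cross-checked with a short Python sketch. It assumes that a gene enters the majority consensus when it is selected by strictly more than half of the methods; this rule is inferred from the reported counts rather than stated in the source, and the function name is hypothetical.

```python
def majority_consensus(gene_scores):
    """Count genes selected by strictly more than half of the methods.

    gene_scores maps a number of votes to the number of genes with that score;
    the number of methods is taken to be the highest score present.
    ASSUMPTION: strict-majority rule, inferred from the reported totals.
    """
    n_methods = max(gene_scores)
    return sum(n for votes, n in gene_scores.items() if votes > n_methods / 2)


# Gene scores copied from the notes under the three agreement tables.
b_subtilis = {1: 418, 2: 123, 3: 95, 4: 145, 5: 230}
e_coli = {1: 806, 2: 363, 3: 221, 4: 157, 5: 121, 6: 64}
h_influenzae = {1: 216, 2: 46, 3: 11}

for name, scores in [("B.subtilis", b_subtilis),
                     ("E.coli", e_coli),
                     ("H.influenzae", h_influenzae)]:
    print(name, sum(scores.values()), majority_consensus(scores))
```

With this rule the sketch reproduces the totals and consensus sizes given in the notes: 1011 and 470 for B.subtilis, 1732 and 342 for E.coli, 273 and 57 for H.influenzae.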
Recent potential HTs (genes) in E.coli strains
| Strains/cluster | Begin | End | Genes | Absent in | Homologous gene in other species (FASTA) | Remarkable donor(s) (rank 1–10) |
|---|---|---|---|---|---|---|
| E.coli cft073 | 1365450 | 1367050 | c1466 | K12 | S.typhimurium | |
| E.coli cft073 | 3029950 | 3030550 | c3154 | K12 | S.typhimurium | |
| E.coli 0157-H7 | 1203950 | 1205550 | ECs1120 | K12 | S.typhimurium | |
| E.coli 0157-H7 | 1795450 | 1796550 | ECs1806 | K12 | S.typhimurium | |
| E.coli 0157-H7 | 2164450 | 2165550 | ECs2161 | K12 | S.typhimurium | |
| E.coli 0157-H7 | 2215450 | 2216550 | ECs2234–ECs2235 | K12 | S.typhimurium | |
| E.coli 0157-H7 | 2674450 | 2675550 | ECs2719 | K12 | S.typhimurium | |
| E.coli 0157-H7 | 2900950 | 2901550 | ECs2943 | K12 | S.typhimurium | |
| E.coli 0157-H7 | 919450 | 920050 | ECs0842 | K12, cft073 | S.typhimurium | |
| E.coli 0157-H7 | 1964950 | 1966050 | ECs1990 | K12, cft073 | S.typhimurium | |
| E.coli 0157-H7 | 923450 | 924050 | ECs0844 | K12, cft073, S flex | No homology, bacteriophage-like protein | No credible donor |
| E.coli 0157-H7 | 1207950 | 1209050 | ECs1123–ECs1124 | K12, cft073, S flex | No homology, bacteriophage-like protein | |
| E.coli 0157-H7 | 1285450 | 1286550 | ECs1228 | K12, cft073, S flex | No homology, bacteriophage-like protein | No credible donor |
| E.coli 0157-H7 | 1799450 | 1800050 | ECs1808 | K12, cft073, S flex | No homology, bacteriophage-like protein | No credible donor |
| E.coli 0157-H7 | 1968950 | 1969550 | ECs1992 | K12, cft073, S flex | No homology, bacteriophage-like protein | No credible donor |
| E.coli 0157-H7 | 2160950 | 2161550 | ECs2157–ECs2158 | K12, cft073, S flex | No homology, bacteriophage-like protein | |
| E.coli 0157-H7 | 2211950 | 2212550 | ECs2231 | K12, cft073, S flex | No homology, bacteriophage-like protein | No credible donor |
| E.coli 0157-H7 | 2670950 | 2671550 | ECs2717 | K12, cft073, S flex | No homology, bacteriophage-like protein | No credible donor |
| E.coli 0157-H7 | 2896950 | 2897550 | ECs2940–ECs2941 | K12, cft073, S flex | No homology, bacteriophage-like protein | No credible donor |
| E.coli 0157-H7 | 581950 | 601550 | ECs0451–452 | K12, cft073, S flex | ||
| S.flexneri 2a str. 3 | 1626950 | 1632050 | SF1599–SF1604 | K12, 0157, cft073 | S.enterica | Coliphages (1–3) |
| S.flexneri 2a str. 3 | 1633450 | 1634550 | SF1607–1608 | K12, 0157, cft073 | No homology | Phages (1,7) |
| S.flexneri 2a 2457T | 1911950 | 1914050 | S1981 | K12, 0157, cft073 | No homology | No credible donor |
| S.flexneri 2a str. 3 | 2950450 | 2953550 | SF2859–2862 | K12, 0157, cft073 | S.typhi | Coliphages (1–3) |
| S.flexneri 2a str. 3 | 2956450 | 2957550 | SF2866 | K12, 0157, cft073 | S.typhi | Coliphages (1–3) |
| E.coli cft073 | 3411450 | 3412050 | c3562–c3563 | K12 | ||
| E.coli 0157-H7 | 2207450 | 2208550 | ECs224 | K12 | ||
| E.coli 0157-H7 | 2736450 | 2737050 | ECs2791 | K12 | | |
The transfers are grouped according to the similarity of their signatures. Only the strain with the most distant DNA segments is mentioned; the genes are present in all strains except those listed in the "Absent in" column (see text). For each atypical region, the table gives its position and gene content, the group of strains in which these genes are absent, homologous genes in other species detected by a FASTA search and the remarkable donors proposed by our method.
a The seven strains/species can be grouped into four sets [K12 (2 strains), S flex (2 strains), 0157 (2 strains) and CFT073] regarding gene content, gene specificity and similarities of detected regions.