Aristotelis Tsirigos, Isidore Rigoutsos.
Abstract
In recent years, the increase in the amount of available genomic data has made it easier to appreciate the extent to which organisms increase their genetic diversity through horizontally transferred genetic material. Such transfers have the potential to give rise to extremely dynamic genomes in which a significant proportion of the coding DNA has been contributed by external sources. Because of the impact of these horizontal transfers on the ecological and pathogenic character of the recipient organisms, methods are continuously sought that can computationally determine which genes of a given genome are products of transfer events. In this paper, we introduce and discuss a novel computational method for identifying horizontal transfers that relies on a gene's nucleotide composition and obviates the need for knowledge of codon boundaries. In addition to being applicable to individual genes, the method can be easily extended to clusters of horizontally transferred genes. With the help of an extensive and carefully designed set of experiments on 123 archaeal and bacterial genomes, we demonstrate that the new method exhibits significant improvement in sensitivity when compared to previously published approaches. In fact, it achieves an average relative improvement across genomes of between 11 and 41% compared to the Codon Adaptation Index method in distinguishing native from foreign genes. Our method's horizontal gene transfer predictions for 123 microbial genomes are available online at http://cbcsrv.watson.ibm.com/HGT/.
Year: 2005 PMID: 15716310 PMCID: PMC549390 DOI: 10.1093/nar/gki187
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Example of a template.
Figure 2. Demonstrating the automatic method for selecting a score threshold, using the genome of A. pernix as a test case (see also text).
List of phages
| Phage | GenBank ID | Genes |
|---|---|---|
| Streptococcus thermophilus bacteriophage Sfi21 | NC_000872 | 50 |
| Coliphage alpha3 | NC_001330 | 10 |
| Mycobacterium phage L5 | NC_001335 | 85 |
| Haemophilus phage HP1 | NC_001697 | 42 |
| Methanobacterium phage psiM2 | NC_001902 | 32 |
| Mycoplasma arthritidis bacteriophage MAV1 | NC_001942 | 15 |
| Chlamydia phage 2 virion | NC_002194 | 8 |
| Methanothermobacter wolfeii prophage psiM100 | NC_002628 | 35 |
| Bacillus phage GA-1 virion | NC_002649 | 35 |
| Lactococcus lactis bacteriophage TP901-1 | NC_002747 | 56 |
| Streptococcus pneumoniae bacteriophage MM1 provirus | NC_003050 | 53 |
| Sulfolobus islandicus filamentous virus | NC_003214 | 72 |
| Bacteriophage PSA | NC_003291 | 59 |
| Halovirus HF2 | NC_003345 | 114 |
| Cyanophage P60 | NC_003390 | 80 |
| Lactobacillus casei bacteriophage A2 virion | NC_004112 | 61 |
| Vibrio cholerae O139 fs1 phage | NC_004306 | 15 |
| Salmonella typhimurium phage ST64B | NC_004313 | 56 |
| Pseudomonas aeruginosa phage PaP3 | NC_004466 | 71 |
| Streptococcus pyogenes phage 315.4 provirus | NC_004587 | 64 |
| Staphylococcus aureus phage phi 13 provirus | NC_004617 | 49 |
| Yersinia pestis phage phiA1122 | NC_004777 | 50 |
| Xanthomonas oryzae bacteriophage Xp10 | NC_004902 | 60 |
| Enterobacteria phage RB69 | NC_004928 | 179 |
| Burkholderia cepacia phage BcepNazgul | NC_005091 | 75 |
| Ralstonia phage p12J virion | NC_005131 | 10 |
| Bordetella phage BPP-1 | NC_005357 | 49 |
Gene scoring methods
| Name | Width | Step | Measure | Description |
|---|---|---|---|---|
| CG | 1 | 1 | χ² | G + C content |
| 3/4 | 2 | 3 | χ² | Dinucleotide composition of codon positions 3 and 1 |
| CODONS | 3 | 3 | χ² | Codon composition |
| CAI | 3 | 3 | N/A | Codon Adaptation Index |
| W8 | 8 | 1 | Covariance | 8-nucleotide composition (no gaps) |
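
The Width and Step columns can be read as sliding-window parameters: each method counts substrings of a given width, sampled every step bases. The following is a minimal sketch under that reading, not the authors' implementation. The names `window_counts`, `chi_square`, and the `offset` parameter are hypothetical, the CG row is approximated by single-nucleotide counts, and the covariance measure used by W8 is not reproduced because this record does not define it.

```python
from collections import Counter


def window_counts(seq, width, step, offset=0):
    """Count substrings of length `width`, sampled every `step` bases from `offset`."""
    seq = seq.upper()
    return Counter(
        seq[i:i + width]
        for i in range(offset, len(seq) - width + 1, step)
    )


def chi_square(gene_counts, genome_counts):
    """Pearson chi-square of a gene's counts against genome-wide frequencies.

    Assumption: every word seen in the gene also occurs in the genome
    (true when the gene is part of the genome); other words are ignored.
    """
    gene_total = sum(gene_counts.values())
    genome_total = sum(genome_counts.values())
    score = 0.0
    for word, genome_n in genome_counts.items():
        expected = gene_total * genome_n / genome_total
        observed = gene_counts.get(word, 0)
        score += (observed - expected) ** 2 / expected
    return score


gene = "ATGGCGTAA"  # toy in-frame sequence, for illustration only
codons = window_counts(gene, width=3, step=3)          # CODONS row
pos3_pos1 = window_counts(gene, width=2, step=3, offset=2)  # 3/4 row
octamers = window_counts(gene, width=8, step=1)        # W8 row (counts only)
```

With width 8 and step 1 the windows overlap and ignore the reading frame, which is consistent with the abstract's statement that the method does not need codon boundaries.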
Overall performance Perf for the methods under evaluation
| %HGT | CG (%) | 3/4 (%) | CODONS (%) | CAI (%) | W8 (%) |
|---|---|---|---|---|---|
| 1 | 38.81 | 36.80 | 27.68 | 43.83 | 51.28 |
| 2 | 44.41 | 43.08 | 34.41 | 49.58 | 56.26 |
| 4 | 50.33 | 49.34 | 41.59 | 55.30 | 61.21 |
| 8 | 56.41 | 56.24 | 49.79 | 61.11 | 65.88 |
Figure 3. Overall performance Perf of the five scoring methods, averaged over 123 genomes: (a) case of a phage donor gene pool and (b) case of a prokaryote donor gene pool.
Improvement of reported W8 method over previous methods
| %HGT | W8 vs CG (%) | W8 vs 3/4 (%) | W8 vs CODONS (%) | W8 vs CAI (%) |
|---|---|---|---|---|
| (A) % improvement in overall performance | | | | |
| 1 | 12.47 | 14.48 | 23.60 | 7.45 |
| 2 | 11.85 | 13.18 | 21.85 | 6.68 |
| 4 | 10.88 | 11.87 | 19.62 | 5.91 |
| 8 | 9.47 | 9.64 | 16.09 | 4.77 |
| (B) % average relative improvement | | | | |
| 1 | 146.57 | 93.01 | 232.79 | 41.61 |
| 2 | 70.57 | 59.82 | 129.98 | 27.87 |
| 4 | 32.90 | 37.24 | 78.96 | 18.18 |
| 8 | 19.88 | 22.04 | 45.05 | 11.64 |
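
The values in panel (A) match the differences between the W8 column and each other column of the overall performance table. The sketch below checks that arithmetic; the names `PERF`, `absolute_improvement`, and `panel_a` are hypothetical. Panel (B) cannot be recomputed this way, presumably because it averages relative improvements taken per genome, and per-genome values are not listed here.

```python
# Perf values (%) copied from the overall performance table, keyed by %HGT.
PERF = {
    1: {"CG": 38.81, "3/4": 36.80, "CODONS": 27.68, "CAI": 43.83, "W8": 51.28},
    2: {"CG": 44.41, "3/4": 43.08, "CODONS": 34.41, "CAI": 49.58, "W8": 56.26},
    4: {"CG": 50.33, "3/4": 49.34, "CODONS": 41.59, "CAI": 55.30, "W8": 61.21},
    8: {"CG": 56.41, "3/4": 56.24, "CODONS": 49.79, "CAI": 61.11, "W8": 65.88},
}


def absolute_improvement(perf_row, baseline, method="W8"):
    """Percentage-point gain of `method` over `baseline`, rounded to 2 decimals."""
    return round(perf_row[method] - perf_row[baseline], 2)


def panel_a(perf):
    """Rebuild panel (A): W8 minus each other method, for every %HGT level."""
    return {
        pct: {name: absolute_improvement(row, name) for name in row if name != "W8"}
        for pct, row in perf.items()
    }


for pct, gains in panel_a(PERF).items():
    print(pct, gains)
```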
Figure 4. Achieved relative improvement Rel of W8 versus CAI averaged over all experiments and all genomes (see also text).
Figure 5. Average relative improvement of W8 over CAI for each one of the 123 organisms. Each point is an average over 100 experiments with donor genes drawn from the phage gene pool (see also text).
Figure 6. Average relative improvement of W8 over CAI for each one of the 123 organisms. Each point is an average over 1000 experiments with donor genes drawn from the prokaryote gene pool (see also text).
Figure 7. Achieved overall performance Perf as a function of template size and for different percentages of artificially added genes: (a) case of phage gene donor pool and (b) case of prokaryotic gene donor pool.
Figure 8. Detecting the vancomycin-resistance cluster of horizontally transferred genes in E. faecalis. In an ideal setting, the genes of this cluster should be reported as a group (i.e. their ranks for a given scoring scheme should be as close to each other as possible) and uninterrupted by genes that do not belong to the cluster. Additionally, the ideal method should be able to report typicality scores for the group as a whole that are as low as possible or, equivalently, assign gene ranks to these genes that are as low as possible (see also text).