Aristotelis Tsirigos, Isidore Rigoutsos.
Abstract
In earlier work, we introduced and discussed a generalized computational framework for identifying horizontal transfers. This framework relied on a gene's nucleotide composition, obviated the need for knowledge of codon boundaries and database searches, and was shown to perform very well across a wide range of archaeal and bacterial genomes when compared with previously published approaches, such as Codon Adaptation Index and C + G content. Nonetheless, two considerations remained outstanding: we wanted to further increase the sensitivity of detecting horizontal transfers and also to be able to apply the method to increasingly smaller genomes. In the discussion that follows, we present such a method, Wn-SVM, and show that it exhibits a very significant improvement in sensitivity compared with earlier approaches. Wn-SVM uses a one-class support-vector machine and can learn from rather small training sets. This property makes Wn-SVM particularly suitable for studying small genomes, such as those of viruses, as well as the typically larger archaeal and bacterial genomes. We show experimentally that the new method delivers superior performance across a wide range of organisms and that it improves even upon our own earlier method by an average of 10% across all examined genomes. As a small-genome case study, we analyze the genome of the human cytomegalovirus and demonstrate that Wn-SVM correctly identifies regions that are known to be conserved and prototypical of all beta-herpesvirinae, regions that are known to have been acquired horizontally from the human host and, finally, regions that had not previously been suspected to be horizontally transferred. Atypical region predictions for many eukaryotic viruses, including the alpha-, beta- and gamma-herpesvirinae, and 123 archaeal and bacterial genomes, have been made available online at http://cbcsrv.watson.ibm.com/HGT_SVM/.
Year: 2005 PMID: 16006619 PMCID: PMC1174904 DOI: 10.1093/nar/gki660
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Gene scoring methods
| Name | Width | Step | Measure | Description |
|---|---|---|---|---|
| CAI | 3 | 3 | N/A | Codon Adaptation Index |
| W8 | 8 | 1 | covariance | 8 nt composition (no wildcards) |
| W8-SVM | 8 | 1 | SVM | 8 nt composition (no wildcards) |
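The Width and Step columns above can be read as parameters of a sliding window over a gene's nucleotide sequence. The sketch below is only an illustration of that reading, not the authors' implementation: the function names `window_counts` and `composition`, the toy sequence, and the plain frequency normalization are all assumptions introduced here, and the scoring step itself (covariance or one-class SVM) is not shown.

```python
from collections import Counter


def window_counts(seq, width, step):
    """Count every `width`-long substring of `seq`, advancing by `step`."""
    seq = seq.upper()
    return Counter(
        seq[i:i + width]
        for i in range(0, len(seq) - width + 1, step)
    )


def composition(seq, width, step):
    """Relative frequency of each window; values sum to 1 (assumed normalization)."""
    counts = window_counts(seq, width, step)
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


# Hypothetical toy sequence, not taken from any genome in the study.
gene = "ATGGCGTACGTTAGCATGGCGTACG"

# Width 3, step 3: non-overlapping codon-style windows (as in the CAI row).
# This assumes the reading frame starts at index 0.
codons = window_counts(gene, width=3, step=3)

# Width 8, step 1: overlapping 8-nt windows (as in the W8 and W8-SVM rows).
# No reading frame is needed because every offset is visited.
octamers = window_counts(gene, width=8, step=1)
```

With a step of 1 every offset contributes a window, which is consistent with the abstract's point that the composition-based approach does not require knowledge of codon boundaries.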
Overall performance Perfm of the methods under evaluation; higher values are better (see the text for the definition of Perfm).
| % HGT | CAI (%) | W8 (%) | W8-SVM (%) |
|---|---|---|---|
| 1 | 46.3 | 51.6 | 56.6 |
| 2 | 51.6 | 56.2 | 60.6 |
| 4 | 56.5 | 60.9 | 64.1 |
| 8 | 61.5 | 65.4 | 67.7 |
Improvement of the new W8-SVM method over CAI and over W8
| % HGT | W8-SVM versus CAI (%) | W8-SVM versus W8 (%) |
|---|---|---|
| % Improvement in overall performance | | |
| 1 | 10.3 | 5.0 |
| 2 | 9.0 | 4.4 |
| 4 | 7.6 | 3.2 |
| 8 | 6.2 | 2.3 |
| % Average relative improvement | | |
| 1 | 52.0 | 15.0 |
| 2 | 33.6 | 10.6 |
| 4 | 23.5 | 6.3 |
| 8 | 15.4 | 3.8 |
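The "% Improvement in overall performance" rows appear to be plain differences, in percentage points, between the W8-SVM column and the baseline columns of the overall-performance table. The following sketch checks that arithmetic; the dictionary layout and the helper name `absolute_gain` are assumptions made for illustration. The "% Average relative improvement" rows are not recomputed here, since they are reported as averages over experiments and genomes and cannot be derived from the averaged performance values alone.

```python
# Overall performance (%) per method, copied from the performance table,
# keyed by the percentage of horizontally transferred genes (% HGT).
performance = {
    1: {"CAI": 46.3, "W8": 51.6, "W8-SVM": 56.6},
    2: {"CAI": 51.6, "W8": 56.2, "W8-SVM": 60.6},
    4: {"CAI": 56.5, "W8": 60.9, "W8-SVM": 64.1},
    8: {"CAI": 61.5, "W8": 65.4, "W8-SVM": 67.7},
}


def absolute_gain(row, baseline):
    """Percentage-point gain of W8-SVM over `baseline`, rounded to 0.1."""
    return round(row["W8-SVM"] - row[baseline], 1)


# (gain over CAI, gain over W8) for each % HGT level.
gains = {
    pct: (absolute_gain(row, "CAI"), absolute_gain(row, "W8"))
    for pct, row in performance.items()
}

for pct, (vs_cai, vs_w8) in gains.items():
    print(f"{pct}% HGT: +{vs_cai} vs CAI, +{vs_w8} vs W8")
```

The computed differences match the first four data rows of the table, and both gains shrink as the fraction of transferred genes grows.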
Figure 1. Achieved relative improvement of W8-SVM versus CAI and of W8-SVM versus W8. The results represent an average over all experiments and all genomes (see also text).
Figure 2. Average relative improvement of W8-SVM over W8 for each of the 123 organisms. Each value is an average over 20 experiments with donor genes drawn from the archaeal and bacterial gene pool (see also text).
Figure 3. Average relative improvement of W8-SVM over CAI for each of the 123 organisms. Each value is an average over 20 experiments with donor genes drawn from the archaeal and bacterial gene pool (see also text).
Figure 4. Atypical regions (candidate horizontal transfers) in the HHV5 genome, strain AD169.