Catrióna R Johnston, Denis C Shields.
Abstract
Searching databases for distant homologues using alignments instead of individual sequences increases the power of detection. However, most methods assume that protein evolution proceeds in a regular fashion, with the inferred tree of sequences providing a good estimation of the evolutionary process. We investigated the combined HMMER search results from random alignment subsets (with three sequences each) drawn from the parent alignment (Rand-shuffle algorithm), using the SCOP structural classification to determine true similarities. At a false-positive rate of 5%, the Rand-shuffle algorithm improved on HMMER, with a 37.5% greater sensitivity than HMMER alone when easily identified similarities (identifiable by BLAST) were excluded from consideration. An extension of the Rand-shuffle algorithm (Ali-shuffle) is weighted towards more informative sequence subsets. This approach improved the performance over HMMER alone and PSI-BLAST, particularly at higher false-positive rates. The improvements in performance of these sequence sub-sampling methods may reflect lower sensitivity to alignment error and irregular evolutionary patterns. The Ali-shuffle and Rand-shuffle sequence homology search programs are available by request from the authors.
Year: 2005 PMID: 16006623 PMCID: PMC1174907 DOI: 10.1093/nar/gki687
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
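The sub-sampling idea described in the abstract (random subsets of three sequences drawn from the parent alignment, one search per subset, results combined) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `draw_subsets` and `combine_hits`, the number of subsets, and the rule of keeping the lowest E-value per target are all assumptions made for illustration, since the abstract does not state how results are combined. The HMMER model-building and search step is omitted, and its per-subset output is represented here as a plain dictionary.

```python
import random


def draw_subsets(seq_ids, n_subsets, subset_size=3, seed=None):
    """Draw random subsets of sequence identifiers from a parent alignment.

    subset_size=3 follows the abstract; n_subsets is an assumed parameter.
    """
    seq_ids = list(seq_ids)
    if len(seq_ids) < subset_size:
        raise ValueError("alignment has fewer sequences than subset_size")
    rng = random.Random(seed)
    # Each subset is sampled without replacement within itself.
    return [tuple(rng.sample(seq_ids, subset_size)) for _ in range(n_subsets)]


def combine_hits(per_subset_hits):
    """Merge per-subset search results into one ranked list.

    per_subset_hits: list of dicts mapping target id -> E-value.
    ASSUMPTION: keep the lowest E-value seen for each target; the real
    combination rule is not given in the abstract.
    """
    combined = {}
    for hits in per_subset_hits:
        for target, evalue in hits.items():
            if target not in combined or evalue < combined[target]:
                combined[target] = evalue
    # Best (lowest) E-value first.
    return sorted(combined.items(), key=lambda item: item[1])


if __name__ == "__main__":
    subsets = draw_subsets(["s1", "s2", "s3", "s4", "s5", "s6"], n_subsets=4, seed=0)
    print(subsets)
    # Hypothetical per-subset results standing in for real search output.
    print(combine_hits([{"a": 1e-3, "b": 0.5}, {"a": 1e-5, "c": 0.1}]))
```

In a real pipeline each subset would be turned into a profile and searched against the database before `combine_hits` is called.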
Figure 1. (a) Sensitivity (y-axis) against the false-positive rate (x-axis) for the search results of 688 protein families. Ali-shuffle (ali), HMMER (hmm), Rand-shuffle (ran) and PSI-BLAST (psi) are compared. The 95% confidence intervals are also included in the plot. (b) Detail.
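A curve of this kind can be read as "sensitivity at a given false-positive rate". The sketch below shows one common way to compute such a point from a ranked hit list, assuming each hit carries a score and a true/false label (here the label would come from SCOP). The definitions used, false-positive rate as accepted false hits over all false hits and sensitivity as accepted true hits over all true hits, are the standard ROC ones and are an assumption; the record does not give the paper's exact definitions. The function name `sensitivity_at_fpr` and the example scores are hypothetical.

```python
def sensitivity_at_fpr(scored_hits, max_fpr):
    """Return the best sensitivity reachable without exceeding max_fpr.

    scored_hits: list of (score, is_true) pairs, higher score = stronger hit.
    Ties in score are not treated specially in this sketch.
    """
    ranked = sorted(scored_hits, key=lambda hit: hit[0], reverse=True)
    total_pos = sum(1 for _, is_true in ranked if is_true)
    total_neg = len(ranked) - total_pos
    if total_pos == 0 or total_neg == 0:
        raise ValueError("need at least one true and one false hit")

    tp = fp = 0
    best = 0.0
    for _, is_true in ranked:
        if is_true:
            tp += 1
        else:
            fp += 1
        if fp / total_neg > max_fpr:
            break  # accepting this hit would exceed the allowed rate
        best = tp / total_pos
    return best


if __name__ == "__main__":
    # Hypothetical scores and labels, for illustration only.
    hits = [(9, True), (8, False), (7, True), (6, True),
            (5, False), (4, False), (3, False), (2, True)]
    print(sensitivity_at_fpr(hits, 0.25))
```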
Effect of excluding BLAST-detected hits on the relative percentage increase in sensitivity of Ali-shuffle and Rand-shuffle compared with HMMER alone
| % False positives | Ali-shuffle, including BLAST | Ali-shuffle, excluding BLASTᵃ | Rand-shuffle, including BLAST | Rand-shuffle, excluding BLASTᵃ |
|---|---|---|---|---|
| 5 | 7.79 | 62.50 | 5.19 | 37.50 |
| 10 | 7.41 | 50.00 | 2.47 | 33.33 |
| 15 | 8.43 | 57.89 | 6.02 | 42.11 |
| 20 | 8.14 | 60.00 | 5.23 | 50.00 |
| 25 | 10.34 | 63.64 | 5.75 | 45.45 |
| 30 | 8.89 | 83.33 | 4.44 | 50.00 |
| 35 | 9.89 | 64.29 | 5.49 | 35.71 |
| 40 | 10.87 | 51.61 | 6.52 | 25.81 |
| 45 | 12.77 | 57.58 | 6.38 | 21.21 |
| 50 | 13.54 | 61.11 | 5.21 | 16.67 |
ᵃ Excluding from consideration BLAST hits detectable with an E-value of 10⁻⁴.
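The table reports a relative percentage increase in sensitivity over HMMER alone. Assuming this means the usual relative change, 100 × (method − HMMER) / HMMER, the arithmetic can be sketched as below. The function name `relative_increase` and the counts of 8 and 11 true hits are hypothetical and chosen only to show how a value such as 37.5% can arise; they are not figures from the paper.

```python
def relative_increase(method_sensitivity, hmmer_sensitivity):
    """Relative percentage increase of a method's sensitivity over HMMER.

    ASSUMPTION: standard relative change, 100 * (method - baseline) / baseline.
    Inputs may be sensitivities or true-positive counts over the same set.
    """
    if hmmer_sensitivity <= 0:
        raise ValueError("baseline sensitivity must be positive")
    return 100.0 * (method_sensitivity - hmmer_sensitivity) / hmmer_sensitivity


if __name__ == "__main__":
    # Hypothetical counts: 8 true hits for the baseline, 11 for the method.
    print(relative_increase(11, 8))  # 37.5
```

Under this definition, a small baseline makes the relative increase large, which may partly explain why the values in the "excluding BLAST" columns are much higher than those in the "including BLAST" columns.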