Saikat Chakrabarti, Christopher J Lanczycki, Anna R Panchenko, Teresa M Przytycka, Paul A Thiessen, Stephen H Bryant.
Abstract
BACKGROUND: Accurate multiple sequence alignments of proteins are very important in computational biology today. Despite the numerous efforts made in this field, all alignment strategies have certain shortcomings, resulting in alignments that are not always correct. Refinement of an existing alignment can prove to be an intelligent choice, considering the increasing importance of high-quality alignments in large-scale high-throughput analysis.
Year: 2006 PMID: 17105653 PMCID: PMC1654193 DOI: 10.1186/1471-2105-7-499
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Improvement of objective scores after refinement (BAliBASE dataset). Histograms of the relative improvement after refinement for four objective scores on the BAliBASE 3.0 alignment dataset: a) alignment score, b) conservation score (SCORECONS score), c) norMD score and d) information content. The X-axis represents bins of relative improvement of the objective score, while the Y-axis shows the percentage of alignments. Relative improvement of an objective score is measured as the difference between the final score after application of the REFINER, RF or RASCAL method and the final score obtained from the default alignment program output, divided by the latter.
Figure 2. Improvement of objective scores after refinement (CDD dataset). Histograms of the relative improvement after refinement for four objective scores on the CDD alignment dataset: a) alignment score, b) conservation score (SCORECONS score), c) norMD score and d) information content.
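The quantity plotted in Figures 1 and 2 can be illustrated with a minimal sketch. It assumes one reading of the Figure 1 caption, namely (refined score minus default score) divided by the default score, followed by binning into percentages of alignments. The function names and the bin edges are hypothetical and are not taken from the paper.

```python
def relative_improvement(refined_score, default_score):
    """Return (refined - default) / default.

    Assumed reading of the caption. Note that a negative default score
    flips the sign of the result, so this is only meaningful for
    positive-valued objective scores.
    """
    if default_score == 0:
        raise ValueError("default score must be nonzero")
    return (refined_score - default_score) / default_score


def percent_per_bin(values, edges):
    """Percentage of values in each half-open bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break  # each value belongs to at most one bin
    total = len(values)
    if total == 0:
        return [0.0] * len(counts)
    return [100.0 * c / total for c in counts]


if __name__ == "__main__":
    # Hypothetical (refined, default) score pairs, for illustration only.
    pairs = [(105.0, 100.0), (115.0, 100.0), (116.0, 100.0), (125.0, 100.0)]
    improvements = [relative_improvement(r, d) for r, d in pairs]
    print(percent_per_bin(improvements, [0.0, 0.1, 0.2, 0.3]))
```

Values outside the outermost bin edges are simply not counted in this sketch; the paper's own binning scheme is not given in this record.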
Impact on alignment quality following refinement.
| Reference 1 | 0.65 | 0.63 | 0.66 | 0.66 | 0.62 | 0.65 | 0.67 | 0.62 | 0.70 | 0.69 | 0.69 | 0.71 |
| Reference 2 | 0.78 | 0.80 | 0.80 | 0.80 | 0.78 | 0.80 | 0.79 | 0.79 | 0.83 | 0.82 | 0.83 | 0.82 |
| Reference 3 | 0.66 | 0.69 | 0.67 | 0.68 | 0.65 | 0.64 | 0.65 | 0.66 | 0.76 | 0.73 | 0.75 | 0.77 |
| Reference 4 | 0.67 | 0.68 | 0.66 | 0.70 | 0.67 | 0.71 | 0.66 | 0.69 | 0.75 | 0.73 | 0.70 | 0.77 |
| Reference 5 | 0.65 | 0.67 | 0.66 | 0.68 | 0.67 | 0.65 | 0.64 | 0.67 | 0.76 | 0.73 | 0.72 | 0.76 |
| Average | 0.682 | 0.694 | 0.69 | 0.704 | 0.678 | 0.690 | 0.682 | 0.692 | 0.760 | 0.740 | 0.738 | 0.766 |
| Reference 1 | 0.66 | 0.66 | 0.67 | 0.67 | 0.72 | 0.72 | 0.70 | 0.73 | 0.68 | 0.68 | 0.69 | 0.68 |
| Reference 2 | 0.80 | 0.81 | 0.80 | 0.80 | 0.83 | 0.83 | 0.82 | 0.82 | 0.81 | 0.83 | 0.82 | 0.82 |
| Reference 3 | 0.71 | 0.71 | 0.71 | 0.73 | 0.76 | 0.73 | 0.75 | 0.77 | 0.63 | 0.62 | 0.62 | 0.64 |
| Reference 4 | 0.71 | 0.72 | 0.68 | 0.72 | 0.77 | 0.75 | 0.71 | 0.77 | 0.71 | 0.71 | 0.70 | 0.72 |
| Reference 5 | 0.70 | 0.71 | 0.67 | 0.71 | 0.76 | 0.74 | 0.71 | 0.75 | 0.73 | 0.73 | 0.69 | 0.74 |
| Average | 0.716 | 0.722 | 0.706 | 0.734 | 0.768 | 0.754 | 0.738 | 0.77 | 0.704 | 0.714 | 0.704 | 0.720 |
Correlation coefficients between the improvements of estimated and real alignment accuracy scores.
| 0.236 | 0.188 | 0.167 | |
| 0.228 | 0.164 | 0.162 | |
| 0.338 | 0.308 | 0.287 | |
| 0.18 | 0.15 | 0.15 | |
| 0.258 | 0.201 | 0.191 |
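As an illustration of how a correlation between the improvement of an estimated (objective) score and the improvement of the real alignment accuracy could be computed, here is a minimal sketch. It assumes the Pearson coefficient; this record does not state which coefficient the authors used, and the function name and sample values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical per-alignment improvements, for illustration only.
    objective_gain = [0.02, 0.10, 0.05, 0.00, 0.08]
    accuracy_gain = [0.01, 0.04, 0.06, -0.01, 0.02]
    print(round(pearson(objective_gain, accuracy_gain), 3))
```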
Figure 3. Relationship between improvement of alignment accuracy and benchmark difficulty. The relative improvement of the alignment accuracy (Y-axis), calculated as the improvement of the SP (Sum-of-Pairs) score, is plotted against the quality of the alignment input to REFINER, as measured by three objective scoring functions (X-axis): a) conservation (SCORECONS) score, b) norMD score, c) information content. The central line in each box shows the median value, the upper and lower boundaries of each box show the upper and lower quartiles, and the vertical lines extend to a value 1.5 times the interquartile range. Outlier values are shown outside the whiskers.
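The box-plot elements described in the Figure 3 caption can be sketched as follows. This assumes the common Tukey convention, in which each whisker ends at the most extreme data point lying within 1.5 times the interquartile range of the box, and the "inclusive" quartile method of Python's `statistics` module. The plotting software used for the figure may compute quartiles differently, and the function name is hypothetical.

```python
import statistics


def box_stats(values):
    """Median, quartiles, whisker ends and outliers for one box."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme points that are still inside the fences.
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```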
Sensitivity values estimated from the ROC curves at 1% and 5% error rates (fraction of false positives)
| 0.47 | 0.48 | 0.46 | 0.45 | ||
| 0.54 | 0.56 | 0.54 | 0.54 |
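Reading a sensitivity value off a ROC curve at a fixed error rate can be sketched as below. The sketch assumes that "error rate" means the fraction of negatives accepted as false positives and that higher scores indicate true positives. The function name and the sample scores are hypothetical and do not reproduce the paper's evaluation.

```python
def sensitivity_at_error_rate(pos_scores, neg_scores, error_rate):
    """Fraction of positives scoring above the lowest threshold that
    still admits at most `error_rate` of the negatives."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    neg_sorted = sorted(neg_scores, reverse=True)
    # Number of false positives tolerated at this error rate.
    # int() truncates, so products that fall just below a whole number
    # because of floating-point rounding are rounded down.
    allowed = int(error_rate * len(neg_sorted))
    if allowed >= len(neg_sorted):
        return 1.0  # every negative is tolerated, so every positive is accepted
    # The highest-scoring negative that must be rejected sets the threshold.
    threshold = neg_sorted[allowed]
    hits = sum(1 for s in pos_scores if s > threshold)
    return hits / len(pos_scores)


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    negatives = list(range(100))
    positives = [99.5, 96.5, 50, 10]
    print(sensitivity_at_error_rate(positives, negatives, 0.01))
    print(sensitivity_at_error_rate(positives, negatives, 0.05))
```

Using a strict comparison against the threshold keeps the number of accepted negatives at or below the tolerated count even when scores are tied.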
Comparison of average run time (in seconds) for the BAliBASE 3.0 benchmark dataset.
| Reference 1 | 10.58 | 10.34 | 51.06 | 1.93 | 22.21 | 10.00 | 38.46 | 1.65 | 26.03 | 10.33 | 53.55 | 1.67 |
| Reference 2 | 64.93 | 12.62 | 206.58 | 26.67 | 175.67 | 12.61 | 179.62 | 18.97 | 123.58 | 11.61 | 129.75 | 18.85 |
| Reference 3 | 98.81 | 25.84 | 495.25 | 35.98 | 304.23 | 24.84 | 397.11 | 35.60 | 241.69 | 21.69 | 237.96 | 32.35 |
| Reference 4 | 49.79 | 23.04 | 370.85 | 13.95 | 496.04 | 24.08 | 607.02 | 11.18 | 222.70 | 22.08 | 271.87 | 8.91 |
| Reference 5 | 35.66 | 13.23 | 327.00 | 15.19 | 201.33 | 12.00 | 300.00 | 12.93 | 103.00 | 10.00 | 209.00 | 13.45 |
| Reference 1 | 23.21 | 10.00 | 49.83 | 2.09 | 29.34 | 10.54 | 38.67 | 2.37 | 20.54 | 10.62 | 36.79 | 1.73 |
| Reference 2 | 116.66 | 12.20 | 124.50 | 16.10 | 167.12 | 12.25 | 170.50 | 13.47 | 160.74 | 21.23 | 166.29 | 18.34 |
| Reference 3 | 267.28 | 24.84 | 261.86 | 36.76 | 289.49 | 20.84 | 283.72 | 31.38 | 284.11 | 21.78 | 265.29 | 38.84 |
| Reference 4 | 267.70 | 24.58 | 272.55 | 11.00 | 362.29 | 22.08 | 443.12 | 11.26 | 403.40 | 12.08 | 415.00 | 7.57 |
| Reference 5 | 111.00 | 8.00 | 215.00 | 20.41 | 203.00 | 8.00 | 206.00 | 15.64 | 188.00 | 10.00 | 232.00 | 14.03 |