Abstract
Efficient algorithms exist for global and local pairwise alignment of protein sequences in a recently proposed contextual alignment model, and preliminary results obtained for biological data are very promising. Our main motivation was to adapt the idea of context dependency to the multiple-alignment setting. To this end, a relaxation of the model was developed (we call this new model averaged contextual alignment) and a new family of amino acid substitution matrices was constructed. In this paper we present a contextual multiple-alignment algorithm and report the outcomes of experiments performed on the BAliBASE test set. The contextual approach gave much better results for the set of sequences containing orphan genes.
Year: 2005 PMID: 16046817 PMCID: PMC1184041 DOI: 10.1155/JBB.2005.124
Source DB: PubMed Journal: J Biomed Biotechnol ISSN: 1110-7243
Characteristics of substitution scores.
| Clustering % | NONCTX Avg | NONCTX StdDev | NONCTX Entropy | CTX Avg | CTX StdDev | CTX Entropy |
|---|---|---|---|---|---|---|
| 100% | −0.5984 | 1.4561 | 1.0642 | −0.4715 | 1.5498 | 1.0558 |
| 90% | −0.3363 | 1.2165 | 0.6515 | −0.3124 | 1.2537 | 0.6620 |
| 80% | −0.2472 | 1.1086 | 0.5128 | −0.2310 | 1.1316 | 0.5248 |
| 70% | −0.1658 | 0.9878 | 0.3839 | −0.1590 | 0.9999 | 0.3970 |
| 60% | −0.0928 | 0.8449 | 0.2590 | −0.0931 | 0.8523 | 0.2716 |
| 50% | −0.0429 | 0.6858 | 0.1519 | −0.0500 | 0.7040 | 0.1622 |
| 40% | −0.0110 | 0.5607 | 0.0883 | −0.0278 | 0.6013 | 0.1002 |
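The source does not spell out how the Avg and Entropy columns are defined. The following is a minimal sketch, assuming the usual BLOSUM-style conventions: scores are log-odds in bits, Avg is the expected score under the background distribution, and Entropy is the relative entropy of the observed pair distribution. The function name `log_odds_stats` and the toy frequencies are hypothetical and not taken from the paper.

```python
import math

def log_odds_stats(pair_freq):
    """Return (expected_score, relative_entropy) in bits.

    pair_freq maps ordered residue pairs (a, b) to the observed joint
    frequency q_ab. It is assumed to be symmetric, strictly positive,
    and to sum to 1 (an assumption of this sketch).
    """
    # Background frequency of each residue: p_a = sum over b of q_ab.
    background = {}
    for (a, _b), q in pair_freq.items():
        background[a] = background.get(a, 0.0) + q

    expected_score = 0.0
    relative_entropy = 0.0
    for (a, b), q in pair_freq.items():
        chance = background[a] * background[b]  # pair frequency expected by chance
        score = math.log2(q / chance)           # log-odds substitution score
        expected_score += chance * score        # weighted by chance frequency
        relative_entropy += q * score           # weighted by observed frequency
    return expected_score, relative_entropy

# Toy two-letter alphabet in which identical pairs are over-represented.
toy = {("A", "A"): 0.4, ("B", "B"): 0.4, ("A", "B"): 0.1, ("B", "A"): 0.1}
expected, entropy = log_odds_stats(toy)
```

Under these definitions the expected score is never positive and the relative entropy is never negative, which matches the signs seen in the table. Both shrink toward zero as the pair distribution approaches independence, consistent with the trend at lower clustering percentages.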
The robustness of noncontextual tables. The range, median, and standard deviation of the number of examples on which each substitution score is based.
| Table | No. of pairs used | Min | Max | Med | StdDev |
|---|---|---|---|---|---|
| NONCTX100 | 910427386 | 201204 | 44246771 | 2089431 | 6447364 |
| NONCTX90 | 397939179 | 85159 | 17134863 | 1154422 | 2226931 |
| NONCTX80 | 228719630 | 52834 | 8703468 | 719781 | 1159503 |
| NONCTX70 | 125188080 | 32428 | 4022674 | 429833 | 563942 |
| NONCTX60 | 58669007 | 17718 | 1468427 | 218982 | 228104 |
| NONCTX50 | 21889157 | 8121 | 424847 | 94091 | 70724 |
| NONCTX40 | 7104342 | 3034 | 110252 | 32151 | 21160 |
The robustness of contextual tables. The range, median, and standard deviation of the number of examples on which each substitution score is based.
| Table | No. of pairs used | Min | Max | Med | StdDev |
|---|---|---|---|---|---|
| CTX100 | 910427386 | 80 | 3033544 | 72832 | 105276 |
| CTX90 | 397939179 | 52 | 1112640 | 38208 | 33440 |
| CTX80 | 228719630 | 40 | 504100 | 21952 | 17628 |
| CTX70 | 125188080 | 24 | 211316 | 14016 | 8620 |
| CTX60 | 58669007 | 8 | 98096 | 1644 | 3560 |
| CTX50 | 21889157 | 4 | 18736 | 700 | 1116 |
| CTX40 | 7104342 | 1 | 7016 | 228 | 352 |
Summary scores for the contextual versus the noncontextual model. The score is the frequency of correctly aligned pairs of residues.
| Reference | Protein families | Context | Noncontext | % of improvement |
|---|---|---|---|---|
| Ref 1 | Short (< 25%) | 0.5619 | 0.5260 | 6.83 |
| Ref 1 | Short (20%–40%) | 0.7323 | 0.7309 | 0.19 |
| Ref 1 | Short (> 35%) | 0.9004 | 0.8964 | 0.45 |
| Ref 1 | Medium (< 25%) | 0.4034 | 0.4091 | −1.39 |
| Ref 1 | Medium (20%–40%) | 0.7951 | 0.7879 | 0.90 |
| Ref 1 | Medium (> 35%) | 0.9202 | 0.9198 | 0.04 |
| Ref 1 | AVG | 0.7379 | 0.7318 | 0.83 |
| Ref 2 | Short | 0.6868 | 0.6633 | 3.52 |
| Ref 2 | Medium | 0.6580 | 0.6561 | 0.30 |
| Ref 2 | AVG | 0.6742 | 0.6602 | 2.12 |
| Ref 3 | Short | 0.4008 | 0.4263 | −5.49 |
| Ref 3 | Medium | 0.5880 | 0.5790 | 1.55 |
| Ref 3 | AVG | 0.4810 | 0.4917 | −2.17 |
| Ref 6 | AVG | 0.45 | 0.442 | 1.81 |
| AVG | — | 0.6674 | 0.6610 | 0.96 |
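The score reported above, the frequency of correctly aligned residue pairs, can be illustrated with a minimal sketch of a sum-of-pairs style measure: the fraction of residue pairs aligned in a reference alignment that are also aligned in the test alignment. The function names and the toy alignments are hypothetical, and the sketch scores every column, whereas benchmark scoring programs may restrict the comparison to selected regions.

```python
from itertools import combinations

def aligned_pairs(alignment, gap="-"):
    """Collect every pair of residues that share a column.

    alignment is a list of equal-length gapped strings. Each residue is
    identified by (sequence index, position in the ungapped sequence).
    """
    next_pos = [0] * len(alignment)
    pairs = set()
    for col in range(len(alignment[0])):
        residues = []
        for seq_idx, row in enumerate(alignment):
            if row[col] != gap:
                residues.append((seq_idx, next_pos[seq_idx]))
                next_pos[seq_idx] += 1
        # Residues are listed in sequence order, so pair keys are consistent.
        pairs.update(combinations(residues, 2))
    return pairs

def sum_of_pairs_score(test, reference):
    """Fraction of reference residue pairs reproduced by the test alignment."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)

reference = ["AC-GT", "ACAGT"]
test = ["ACG-T", "ACAGT"]  # the G of the first sequence is misplaced
score = sum_of_pairs_score(test, reference)
```

In this toy case three of the four reference pairs are recovered, so the score is 0.75. A test alignment identical to the reference scores 1.0.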
The influence of orphans on the quality of the alignment.
| Protein family | Context | Noncontext | % of improvement |
|---|---|---|---|
| Short | 0.8918 | 0.8461 | 5.4 |
| Medium | 0.8593 | 0.8807 | −2.4 |
| AVG | 0.8776 | 0.8613 | 1.89 |
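The "% of improvement" column appears to be the relative gain of the contextual score over the noncontextual one. This is inferred from the tabulated values and is not stated in the source, so the sketch below is an assumption, and the helper name `percent_improvement` is hypothetical.

```python
def percent_improvement(context, noncontext):
    """Relative change of the contextual score over the noncontextual one, in percent."""
    return 100.0 * (context - noncontext) / noncontext

# Rows of the orphan table above, as (context, noncontext) scores.
short = percent_improvement(0.8918, 0.8461)
medium = percent_improvement(0.8593, 0.8807)
average = percent_improvement(0.8776, 0.8613)
```

Rounded, these give about 5.4, −2.4, and 1.89, matching the three rows of the table.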
Figure 1. Comparison of contextual and noncontextual scores. (a) SPS score comparison, (b) SPS score distributions, and (c) improvement versus noncontextual score.
Families for which the contextual model gives much better alignments.
| Protein family | No. of sequences | Context | Noncontext | % of improvement |
|---|---|---|---|---|
| 1ycc: cytochrome c | 4 | 0.765 | 0.665 | 15.04 |
| 2trx: thioredoxin | 4 | 0.671 | 0.468 | 43.38 |
| 1aboA: sh3 | 15 | 0.683 | 0.580 | 17.76 |
| 1uky: uridyl kin | 24 | 0.541 | 0.464 | 20.91 |
| sh3-2-ref6: sh3 | 6 | 0.553 | 0.454 | 21.81 |
| sh3-3-ref6: sh3 | 5 | 0.430 | 0.214 | 100.93 |
| AVG | — | 0.606 | 0.474 | 29.11 |
Figure 2. The entropy for substitution tables with a standard context (full_D1), a wider but grouped context (6GR_W2), and a more distant context (full_D2).