Carsten Kemena, Jean-Francois Taly, Jens Kleinjung, Cedric Notredame.
Abstract
MOTIVATION: Evaluating alternative multiple protein sequence alignments is an important unsolved problem in biology. The most accurate way of doing this is to use structural information. Unfortunately, most methods require at least two structures to be embedded in the alignment, a condition rarely met when dealing with standard datasets.

RESULTS: We developed STRIKE, a method that determines the relative accuracy of two alternative alignments of the same sequences using a single structure. We validated our methodology on three commonly used reference datasets (BAliBASE, Homstrad and Prefab). Given two alignments, STRIKE identifies the more accurate one in 70% of the cases on average. This figure increases to 79% when considering very challenging datasets like the RV11 category of BAliBASE. This discrimination capacity is significantly higher than that reported for other metrics such as Contact Accepted mutatiOn (CAO) or Blosum. We show that this increased performance results both from a refined definition of the contacts and from the use of an improved contact substitution score.

CONTACT: cedric.notredame@crg.eu

AVAILABILITY: STRIKE is an open source freeware available from www.tcoffee.org

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Year: 2011 PMID: 22039207 PMCID: PMC3232373 DOI: 10.1093/bioinformatics/btr587
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. STRIKE contact matrix. Each entry corresponds to the score of a contact between two amino acids. The color code merely reflects the numeric value of the entry (blue: negative, red: positive). Amino acids in the vertical column correspond to the more N-terminal amino acid of the considered interaction, while those in the top row correspond to the more C-terminal amino acid.
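The row/column convention in the caption can be illustrated with a small lookup sketch. This is only an illustration under stated assumptions: the scores in `CONTACT_SCORES` are hypothetical placeholders (not values from the published matrix), and the function name `contact_score` is invented here. The only point taken from the caption is that the first key is the more N-terminal residue and the second the more C-terminal one, so the lookup depends on residue order.

```python
# Hypothetical placeholder scores, NOT the published STRIKE matrix values.
# Keys are (more N-terminal residue, more C-terminal residue).
CONTACT_SCORES = {
    ("A", "L"): 0.4,
    ("L", "A"): 0.1,
    ("K", "E"): 0.9,
    ("E", "K"): 0.7,
}


def contact_score(sequence, i, j, scores=CONTACT_SCORES, default=0.0):
    """Look up the score of a contact between positions i and j (0-based).

    The residue with the smaller index is treated as the more N-terminal
    partner (row), the other as the more C-terminal partner (column).
    Pairs missing from the table fall back to `default` (an assumption).
    """
    if i == j:
        raise ValueError("a contact needs two distinct positions")
    n_term, c_term = (i, j) if i < j else (j, i)
    return scores.get((sequence[n_term], sequence[c_term]), default)
```

Because positions are reordered before the lookup, `contact_score("AL", 0, 1)` and `contact_score("AL", 1, 0)` return the same entry, while the sequence `"LA"` reads the transposed entry.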
Fig. 2. Correlation on Homstrad between the normalized score and the BaliScore. For clarity of display, some outliers (38 points) were removed from the plot.
Fig. 3. Comparison of ΔBaliScore and ΔSTRIKE score on BAliBASE3 RV11 using alignments produced by T-Coffee, Mafft and ClustalW as well as the reference alignment. All points for which the two differences have the same algebraic sign are correctly classified.
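The sign criterion in the caption can be sketched as a short function. This is a minimal illustration, not the authors' evaluation code: the name `same_sign_fraction` is hypothetical, the input values in the usage note are made up, and skipping pairs where either difference is exactly zero is an assumption, since the caption does not say how ties are handled.

```python
def same_sign_fraction(delta_reference, delta_strike):
    """Fraction of alignment pairs whose two score differences agree in sign.

    delta_reference: differences in the reference-based score (e.g. BaliScore)
    delta_strike:    differences in the STRIKE score for the same pairs
    Pairs with a zero difference carry no sign and are skipped (assumption).
    """
    if len(delta_reference) != len(delta_strike):
        raise ValueError("both lists must describe the same alignment pairs")

    decided = 0
    correct = 0
    for d_ref, d_strike in zip(delta_reference, delta_strike):
        if d_ref == 0 or d_strike == 0:
            continue  # tie: neither alignment is ranked above the other
        decided += 1
        if (d_ref > 0) == (d_strike > 0):
            correct += 1

    if decided == 0:
        raise ValueError("no pair with a non-zero difference in both scores")
    return correct / decided
```

For example, with the made-up inputs `[0.2, -0.1, 0.3, -0.4]` and `[1.5, -0.2, -0.7, -0.1]`, three of the four pairs agree in sign, so the function returns 0.75.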
The sum-of-pairs score, CAO score and STRIKE applied to BAliBASE 3, Homstrad and Prefab
| Dataset | #comp. (PAM, Blosum) | PAM | Blosum | #comp. (CAO, STRIKE) | CAO | STRIKE |
|---|---|---|---|---|---|---|
| RV11 | 1036 | 56.3 | 55.8 | 7000 | 42.5 | 79.2 |
| RV12 | 1148 | 59.2 | 58.4 | 3556 | 50.9 | 70.4 |
| RV20 | 1148 | 56.8 | 56.3 | 5544 | 48.7 | 64.9 |
| RV30 | 840 | 57.4 | 57.5 | 4480 | 49.4 | 66.1 |
| RV40 | 1316 | 58.4 | 58.3 | 6328 | 51.6 | 66.8 |
| RV50 | 420 | 55.0 | 55.5 | 2520 | 55.2 | 66.8 |
| BAliBASE total | 5908 | 57.5 | 57.2 | 29 428 | 48.8 | 69.7 |
| Homstrad | 6496 | 54.5 | 52.7 | 46 200 | 43.7 | 67.0 |
| Prefab | 47 012 | 57.5 | 57.9 | 91 644 | 47.4 | 67.4 |
The number of comparisons (#comp.) is much higher for the structural measures because a score can be computed for each structure included.
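As a consistency check on the table, the "BAliBASE total" row appears to match a mean of the per-category values weighted by the number of comparisons. The sketch below recomputes it from the rows above; the weighting scheme is an inference from the numbers, not something the source states, and the names `BALIBASE_STRUCTURAL` and `weighted_mean` are introduced here for illustration.

```python
# (category, #comp., CAO, STRIKE) copied from the BAliBASE rows of the table.
BALIBASE_STRUCTURAL = [
    ("RV11", 7000, 42.5, 79.2),
    ("RV12", 3556, 50.9, 70.4),
    ("RV20", 5544, 48.7, 64.9),
    ("RV30", 4480, 49.4, 66.1),
    ("RV40", 6328, 51.6, 66.8),
    ("RV50", 2520, 55.2, 66.8),
]


def weighted_mean(values, weights):
    """Mean of `values` where each value counts `weights[i]` times."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


comparisons = [row[1] for row in BALIBASE_STRUCTURAL]
total_comparisons = sum(comparisons)
cao_total = weighted_mean([row[2] for row in BALIBASE_STRUCTURAL], comparisons)
strike_total = weighted_mean([row[3] for row in BALIBASE_STRUCTURAL], comparisons)

print(total_comparisons, round(cao_total, 1), round(strike_total, 1))
```

Under this assumption the recomputed values agree with the table: 29 428 comparisons, 48.8 for CAO and 69.7 for STRIKE.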
Performance of STRIKE on all three databases, broken down by SCOP classification
| Class | #chains | #comp. | STRIKE |
|---|---|---|---|
| All α | 312 | 15 568 | 64.3 |
| All β | 437 | 21 560 | 69.4 |
| α and β (α/β) | 724 | 41 300 | 67.7 |
| α and β (α+β) | 636 | 32 424 | 67.3 |
| Multidomain proteins (α and β) | 59 | 3080 | 69.6 |
| Small proteins | 103 | 3752 | 62.9 |
#chains is the number of distinct PDB chains found in each class.
Fig. 4. STRIKE classification values as a function of the differences in sequence identity and delta contact score. Numbers marked with a ‘*’ have a P>0.001. The numbers in the cells give the overall number of alignments. The color denotes the percentage of correctly classified alignment pairs.