Maciej Antczak, Marta Kasprzak, Piotr Lukasiak, Jacek Blazewicz.
Abstract
BACKGROUND: Structural alignment of proteins is one of the most challenging problems in molecular biology. The tertiary structure of a protein strictly correlates with its function, and computationally predicted structures are nowadays a main premise for understanding the latter. However, computationally derived 3D models often deviate from the native structure. One way to validate a model is to compare it with other structures. The structural alignment of a pair of proteins can be defined using the concept of protein descriptors. Protein descriptors are local substructures of protein molecules; they allow the original problem to be divided into a set of subproblems and, consequently, a more efficient algorithmic solution to be proposed. The literature contains many applications of the descriptor concept that demonstrate its usefulness for gaining insight into protein 3D structures, but the proposed approaches are presented from the biological perspective rather than from the computational or algorithmic point of view. Efficient algorithms for the identification and structural comparison of descriptors can become crucial components of methods for structural quality assessment as well as tertiary structure prediction.
Keywords: Combinatorial optimization; Protein structure; Structural comparison
Year: 2016 PMID: 27639380 PMCID: PMC5027075 DOI: 10.1186/s12859-016-1237-9
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Example of an alignment of amino acid sequences of protein descriptors from the same group (the group founder is descriptor d1p1da2_A_206_LEU, see also Fig. Sf2 in Additional file 1)
| Descriptor name | Segment 1 | Segment 2 | Segment 3 | Segment 4 | Segment 5 | Segment 6 |
|---|---|---|---|---|---|---|
| d1p1da2_A_206_LEU | FHVKLPK | LGITI | DPLVISD | SVAHRTGTLEL | DKLLAIDN | QILQQCEDLVKLKIRK |
| d1q3oa__A_679_VAL | KTVLLQK | FGFVL | ..QYLES | GVAWR.AGLRM | DFLIEVNG | NMIRQ..NTLMVKVVM |
| d1y7na1_A_84_MET | TTVLIRR | LGFSV | ..GIICS | GIAER.GGVRV | HRIIEING | HILSN..GEIHMKTMP |
| d1x6da1_A_98_ILE | HVTILHK | AGLGF | ..ITVHR | GLASQ.GTIQK | NEVLSING | RQARE..RQAVIVTRK |
| d1v62a__A_96_LEU | ..VEIVK | LGISL | ..ITIDR | SVVDR.GALHP | DHILSIDG | KLLASISEKVRLEILP |
| d1w9ea1_A_188_MET | REVILCK | LRLKS | ..IFVQL | SPASL.VGLRF | DQVLQING | KVLKQ..EKITMTIRD |
| d2cssa1_A_110_ILE | GRVILNK | LKVVG | ..AFITK | SLADVVGHLRA | DEVLEWNG | NIILE..PQVEIIVSR |
| d1uf1a__A_98_LEU | KKVNLVL | LTIRG | ..IYITG | SEAEG.SGLKV | DQILEVNG | RLLKS..RHLILTVKD |
Character “.” means that there is no structural mapping for a particular residue between the founder and the group member
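The "." convention above can be turned into a simple per-descriptor measure of how much of the founder is structurally mapped. The following is a minimal sketch, not part of the original work: the function name `mapped_fraction` and the choice of measure are assumptions made for illustration, and only three rows of the table are copied in.

```python
# Rows copied from the alignment table above: descriptor name -> six segments.
ALIGNMENT = {
    "d1p1da2_A_206_LEU": ["FHVKLPK", "LGITI", "DPLVISD", "SVAHRTGTLEL",
                          "DKLLAIDN", "QILQQCEDLVKLKIRK"],
    "d1q3oa__A_679_VAL": ["KTVLLQK", "FGFVL", "..QYLES", "GVAWR.AGLRM",
                          "DFLIEVNG", "NMIRQ..NTLMVKVVM"],
    "d1v62a__A_96_LEU": ["..VEIVK", "LGISL", "..ITIDR", "SVVDR.GALHP",
                         "DHILSIDG", "KLLASISEKVRLEILP"],
}


def mapped_fraction(segments):
    """Return the share of alignment columns that hold a residue (not '.').

    Hypothetical helper: a '.' marks a founder residue with no structural
    mapping in the group member, so the result is the mapped share.
    """
    columns = "".join(segments)
    mapped = sum(1 for ch in columns if ch != ".")
    return mapped / len(columns)


for name, segments in ALIGNMENT.items():
    print(f"{name}: {mapped_fraction(segments):.3f}")
```

The founder row contains no "." characters, so its fraction is 1 by construction; the other rows fall below 1 in proportion to the number of unmapped founder residues.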
Fig. 1. A cost matrix C_MA (on the left) and the corresponding matrix C_A (on the right). The solution of the assignment problem is denoted with the gray background
Fig. 2. A matrix C_MA = C_A for K = 4 (on the left) and another C_A for the same C_MA and K = 3 (on the right). The solutions of the assignment problem are denoted with the gray background
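Both figure captions refer to solving an assignment problem on a cost matrix: choose one column per row, using each column once, so that the summed cost is minimal. The sketch below illustrates only that generic problem by brute force. The matrix values are hypothetical and are not taken from the figures, the function name `solve_assignment` is an assumption, and how the paper builds its matrices or which solver it uses is not stated in this excerpt.

```python
from itertools import permutations


def solve_assignment(cost):
    """Brute-force minimum-cost assignment for a small square cost matrix.

    Returns (total_cost, assignment), where assignment[i] is the column
    chosen for row i. Runs in factorial time, so it is for illustration only.
    """
    n = len(cost)
    best_total, best_perm = None, None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if best_total is None or total < best_total:
            best_total, best_perm = total, perm
    return best_total, list(best_perm)


# Hypothetical 3x3 cost matrix (values are NOT taken from Fig. 1 or Fig. 2).
example_cost = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
total, assignment = solve_assignment(example_cost)
print(total, assignment)
```

For matrices of realistic size, a polynomial-time method such as the Hungarian algorithm would normally replace the enumeration; the brute-force version is kept here because it makes the definition of the problem explicit.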
The dataset used in the experiment of the structural comparison of descriptors
| Descriptor elements count | Considered descriptors count | All similar descriptor pairs count |
|---|---|---|
| 3 | 1657 | 340 |
| 4 | 1631 | 100 |
| 5 | 1590 | 238 |
| 6 | 1544 | 144 |
| 7 | 1494 | 109 |
| 8 | 1446 | 117 |
| 9 | 1400 | 203 |
| 10 | 1346 | 350 |
| 11 | 1301 | 421 |
Summary of processing time [ms]
| Descriptor elements count | Algorithm 1 avg. | Algorithm 1 std. dev. | Algorithm 2 avg. | Algorithm 2 std. dev. | Algorithm 3 avg. | Algorithm 3 std. dev. | Algorithm 4 avg. | Algorithm 4 std. dev. |
|---|---|---|---|---|---|---|---|---|
| 3 | 4.3 | 0.5 | 7.8 | 0.5 | 9.4 | 0.5 | 5.3 | 0.5 |
| 4 | 5.3 | 0.6 | 8.9 | 0.7 | 15.1 | 0.9 | 7.0 | 1.3 |
| 5 | 5.1 | 0.8 | 8.0 | 1.8 | 20.5 | 4.9 | 13.3 | 11.2 |
| 6 | 5.9 | 0.9 | 12.4 | 2.3 | 39.5 | 6.1 | 42.2 | 48.6 |
| 7 | 6.1 | 0.9 | 14.0 | 2.5 | 61.5 | 6.8 | 97.0 | 124.6 |
| 8 | 6.2 | 0.9 | 14.8 | 2.8 | 91.1 | 7.5 | 448.8 | 372.2 |
| 9 | 6.0 | 1.1 | 16.5 | 3.5 | 109.8 | 7.6 | 2743.7 | 3362.4 |
| 10 | 5.9 | 1.1 | 16.2 | 4.0 | 121.8 | 9.7 | 12267.0 | 21504.2 |
| 11 | 6.6 | 1.3 | 28.5 | 7.4 | 140.6 | 8.3 | 256785.6 | 424309.9 |
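To make the scaling behaviour in the table easier to read, the averages can be compared directly. The following is a small sketch that uses only the average times listed above; the helper names `growth` and `slowdown` are assumptions introduced for illustration.

```python
# Average processing times [ms] copied from the table above.
ELEMENTS = [3, 4, 5, 6, 7, 8, 9, 10, 11]
AVG_MS = {
    1: [4.3, 5.3, 5.1, 5.9, 6.1, 6.2, 6.0, 5.9, 6.6],
    2: [7.8, 8.9, 8.0, 12.4, 14.0, 14.8, 16.5, 16.2, 28.5],
    3: [9.4, 15.1, 20.5, 39.5, 61.5, 91.1, 109.8, 121.8, 140.6],
    4: [5.3, 7.0, 13.3, 42.2, 97.0, 448.8, 2743.7, 12267.0, 256785.6],
}


def growth(algorithm):
    """Ratio of the average time at 11 elements to the average at 3."""
    times = AVG_MS[algorithm]
    return times[-1] / times[0]


def slowdown(algorithm, baseline=1):
    """Per element count, average time relative to the baseline algorithm."""
    return {
        n: a / b
        for n, a, b in zip(ELEMENTS, AVG_MS[algorithm], AVG_MS[baseline])
    }


for alg in sorted(AVG_MS):
    print(f"Algorithm {alg}: x{growth(alg):.1f} from 3 to 11 elements")
```

On these figures, the average time of Algorithm 1 grows by a factor of about 1.5 between 3 and 11 elements, whereas that of Algorithm 4 grows by more than four orders of magnitude.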
Summary of solutions quality (for elements count in the range between 3 and 11)
| Algorithm (threshold) | Coverage of similar descriptor pairs [%] | Quality identity [%] | Higher global RMSD, equal residues ratio [%] | Lower residues ratio [%] | Global RMSD [Å] |
|---|---|---|---|---|---|
| 1 (1.75) | 70.60 | 96.43 | 1.42 | 2.15 | 2.01 |
| 1 (2.0) | 77.54 | 97.66 | 1.43 | 0.91 | 2.12 |
| 1 (2.33) | 81.90 | 98.44 | 1.32 | 0.23 | 2.18 |
| 2 (1.75) | 75.19 | 95.45 | 1.75 | 2.80 | 2.00 |
| 2 (2.0) | 85.59 | 96.02 | 1.88 | 2.10 | 2.12 |
| 2 (2.33) | 91.35 | 96.44 | 1.95 | 1.61 | 2.20 |
| 3 (1.75) | 80.41 | 92.10 | 3.31 | 4.59 | 2.05 |
| 3 (2.0) | 87.92 | 94.36 | 3.10 | 2.54 | 2.14 |
| 3 (2.33) | 93.07 | 95.00 | 3.06 | 1.94 | 2.21 |
| 4 | 100.00 | 100.00 | 0.00 | 0.00 | 2.28 |
Summary of solutions quality (for elements count in the range between 5 and 11)
| Algorithm (threshold) | Coverage of similar descriptor pairs [%] | Quality identity [%] | Higher global RMSD, equal residues ratio [%] | Lower residues ratio [%] | Global RMSD [Å] |
|---|---|---|---|---|---|
| 1 (1.75) | 78.13 | 95.40 | 1.83 | 2.77 | 2.12 |
| 1 (2.0) | 82.64 | 97.00 | 1.84 | 1.16 | 2.20 |
| 1 (2.33) | 82.03 | 98.00 | 1.70 | 0.30 | 2.21 |
| 2 (1.75) | 84.04 | 94.16 | 2.25 | 3.60 | 2.11 |
| 2 (2.0) | 92.99 | 94.88 | 2.42 | 2.70 | 2.21 |
| 2 (2.33) | 94.18 | 95.42 | 2.51 | 2.07 | 2.22 |
| 3 (1.75) | 90.75 | 89.85 | 4.26 | 5.90 | 2.17 |
| 3 (2.0) | 95.99 | 92.74 | 3.98 | 3.27 | 2.24 |
| 3 (2.33) | 96.40 | 93.57 | 3.94 | 2.49 | 2.25 |
| 4 | 100.00 | 100.00 | 0.00 | 0.00 | 2.27 |