Lei Deng1, Guolun Zhong1, Chenzhe Liu1, Judong Luo2, Hui Liu3.
Abstract
BACKGROUND: Protein comparative analysis and similarity searches play essential roles in structural bioinformatics. A couple of algorithms for protein structure alignments have been developed in recent years. However, facing the rapid growth of protein structure data, improving overall comparison performance and running efficiency with massive sequences is still challenging.
Keywords: Parallel programming; Protein structure alignment; Structural neighbor searching
Year: 2019 PMID: 31870277 PMCID: PMC6929402 DOI: 10.1186/s12859-019-3235-1
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Alignment performance comparison on the TM-align dataset
| Method | RMSD | TM-score | Running Time (s) |
|---|---|---|---|
| CE | 6.30 | 0.273 | 0.52 |
| SAL | 6.96 | 0.320 | 2.47 |
| TM-align | 4.99 | 0.348 | 0.13 |
| Fr-TM-align | 4.73 | 0.365 | 1.65 |
| MADOKA | 4.07 | 0.562 | 0.02 |
Fig. 1 Computing time and number of proteins at different protein sizes. a shows the average computing time of proteins in the TM-align dataset for structural similarity searching against the whole PDB database using MADOKA. b indicates the number of structures at different protein sizes in the TM-align dataset. c shows average running time curves for proteins randomly selected from the entire TM-align dataset (N = 20, 40 and 60 are the numbers of proteins selected each time). d shows the average running time for three different groups of proteins split by length
Performance of six pairwise structure alignment tools on benchmarks MALIDUP and MALISAM
| Benchmark | Method | Nali | RMSD | TM-score | Total Time (s) |
|---|---|---|---|---|---|
| MALIDUP | DeepAlign ∗ | 85.5 | 2.61 | 0.622 | 10.2 |
| MALIDUP | DALI ∗ | 83.5 | 2.65 | 0.600 | 115.3 |
| MALIDUP | MATT ∗ | 82.3 | 2.47 | 0.608 | 63.0 |
| MALIDUP | Formatt ∗ | 70.6 | 2.19 | 0.542 | 85.1 |
| MALIDUP | TM-align ∗ | 87.0 | 2.62 | 0.631 | 6.4 |
| MALIDUP | MADOKA | 91.7 | 3.43 | 0.631 | 1.2 |
| MALISAM | DeepAlign ∗ | 61.3 | 2.96 | 0.521 | 4.3 |
| MALISAM | DALI ∗ | 61.0 | 3.11 | 0.515 | 47.4 |
| MALISAM | MATT ∗ | 56.2 | 2.74 | 0.486 | 16.2 |
| MALISAM | Formatt ∗ | 44.9 | 2.42 | 0.411 | 33.1 |
| MALISAM | TM-align ∗ | 61.1 | 3.06 | 0.517 | 2.9 |
| MALISAM | MADOKA | 62.8 | 2.72 | 0.555 | 0.7 |

∗ These are detailed in [33]
Fig. 2 Two examples showing the structural alignments from TM-align and MADOKA. a shows the alignment between 1A1O_A (276 residues) and 4HKJ_A (277 residues). b shows the alignment between 2GZA_A (334 residues) and 1A1M_B (99 residues)
Fig. 3 Web page for 3D visualization of structure alignment and structure neighbors
Fig. 4 Schematic diagram of the MADOKA algorithm and the web interface. The algorithm involves two steps: 1) the longest common subsequence (LCS) of the secondary structure elements is computed for each pair of structures using dynamic programming, and structure pairs whose LCS length falls below the threshold are removed; 2) pairwise 3D rigid-body superposition of residues is performed and residue-level alignments are constructed, and for each pair of protein structures the alignment with the highest TM-score and its optimally aligned position is selected
Fig. 5 Scoring matrix for the LCS problem and dynamic programming backtrack
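
The LCS filtering step named in the captions of Fig. 4 and Fig. 5 can be illustrated with a minimal Python sketch of the textbook LCS dynamic program. This is not MADOKA's implementation. The encoding of secondary structure as strings of `H`/`E`/`C` letters, the function names, and the threshold rule (`min_fraction` of the shorter string) are all assumptions made for illustration, since the record above does not specify them.

```python
def lcs_table(a, b):
    """Fill the (len(a)+1) x (len(b)+1) LCS scoring matrix."""
    m, n = len(a), len(b)
    score = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                # Matching symbols extend the diagonal predecessor.
                score[i][j] = score[i - 1][j - 1] + 1
            else:
                # Otherwise carry over the better of "up" and "left".
                score[i][j] = max(score[i - 1][j], score[i][j - 1])
    return score


def lcs_backtrack(a, b, score):
    """Walk back from the bottom-right cell to recover one LCS."""
    i, j = len(a), len(b)
    out = []
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif score[i - 1][j] >= score[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


def passes_filter(a, b, min_fraction=0.5):
    """Hypothetical filter: keep a pair only if its LCS covers at least
    min_fraction of the shorter string (assumed rule, not from the source)."""
    lcs_len = lcs_table(a, b)[len(a)][len(b)]
    return lcs_len >= min_fraction * min(len(a), len(b))


if __name__ == "__main__":
    query, target = "HHEEC", "HEHEC"  # made-up secondary structure strings
    table = lcs_table(query, target)
    print(table[len(query)][len(target)], lcs_backtrack(query, target, table))
    print(passes_filter(query, target))
```

Filling the matrix costs time proportional to the product of the two string lengths, which is cheap compared with 3D superposition. That is presumably why a filter of this kind would be run before the second step.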