Óscar Álvarez-Machancoses¹, Juan Luis Fernández-Martínez¹, Andrzej Kloczkowski².
Abstract
We discuss the use of regularized linear discriminant analysis (LDA) as a model reduction technique combined with particle swarm optimization (PSO) for protein tertiary structure prediction, followed by structure refinement based on singular value decomposition (SVD) and PSO. The algorithm presented in this paper belongs to the category of template-based modeling. It first preselects protein templates and then constructs a lower-dimensional subspace via regularized LDA. The protein coordinates in the reduced space are sampled using a highly explorative optimization algorithm, regressive–regressive PSO (RR-PSO). The resulting structure is then projected onto a reduced space via SVD and further optimized with RR-PSO to refine it. The final structures are similar to those predicted by the best structure prediction tools, such as the Rosetta and Zhang servers. The main advantage of our methodology is that it alleviates the ill-posed character of protein structure prediction problems related to high-dimensional optimization. It is also capable of sampling a wide range of the conformational space because the regularized LDA allows us to expand the differences among templates over a reduced basis set.
Keywords: LDA classification; PSO; Protein Tertiary Structure; Uncertainty Analysis
Year: 2020 PMID: 32466409 PMCID: PMC7321371 DOI: 10.3390/molecules25112467
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
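The regularized LDA step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: each template is assumed to be a flattened coordinate vector, the L2 regularization is assumed to be a ridge term `reg` added to the within-class scatter matrix, and the names `regularized_lda_basis` and `reconstruct` are hypothetical. Note that standard LDA with C classes yields at most C − 1 informative directions (3 for the four classes used here); how the paper obtains the five reduced basis terms listed in the experiments table is not reproduced by this sketch.

```python
import numpy as np

def regularized_lda_basis(X, labels, n_terms, reg=1e-3):
    """Hypothetical sketch: reduced basis from L2-regularized LDA.

    X       : (n_templates, n_coords) array, one flattened template per row.
    labels  : (n_templates,) integer class label per template.
    n_terms : number of discriminant directions to keep.
    reg     : assumed ridge weight added to the within-class scatter.
    """
    mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Ridge term keeps Sw positive definite when n_coords >> n_templates
    Sw += reg * np.eye(d)
    # Solve Sb v = lambda Sw v by whitening with the Cholesky factor of Sw
    L = np.linalg.cholesky(Sw)
    Linv = np.linalg.inv(L)
    evals, W = np.linalg.eigh(Linv @ Sb @ Linv.T)
    order = np.argsort(evals)[::-1][:n_terms]  # largest discriminant ratios first
    V = Linv.T @ W[:, order]
    # Orthonormalize so that all reduced coefficients share a common scale
    basis, _ = np.linalg.qr(V)
    return mean, basis

def reconstruct(mean, basis, coeffs):
    """Map reduced coefficients back to full (flattened) coordinates."""
    return mean + basis @ coeffs
```

In this reading, an optimizer searches only over the low-dimensional `coeffs` vector, and every candidate is turned back into a full structure with `reconstruct`, which is what makes the high-dimensional problem tractable.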
Figure 1. Linear discriminant analysis (LDA)–singular value decomposition (SVD) algorithm flowchart.
Figure 2. Overview of the protein modelling via regressive–regressive particle swarm optimization (RR-PSO).
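The SVD refinement stage in the flowchart can be illustrated with a short sketch. This record does not say which matrix is decomposed, so the following is one plausible reading and an assumption throughout: the refinement basis is taken from the SVD of a centred matrix of flattened decoy structures, the predicted structure is projected onto the leading right singular vectors, and the resulting coefficients become the variables refined by PSO. The function names `svd_basis` and `project` are hypothetical.

```python
import numpy as np

def svd_basis(X, n_terms):
    """Assumed refinement basis: top right singular vectors of centred decoys.

    X : (n_decoys, n_coords) array of flattened decoy structures.
    """
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_terms].T  # (n_coords, n_terms), orthonormal columns

def project(x, mean, basis):
    """Reduced coefficients of structure x and its reconstruction in the basis."""
    coeffs = basis.T @ (x - mean)
    return coeffs, mean + basis @ coeffs
```

Because the basis columns are orthonormal, projection is a single matrix product, and a structure that already lies in the subspace is recovered exactly.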
Summary of the selected proteins, the number of templates available for each, and the number of classes into which the templates were divided.
| Protein (CASP Code) | Number of Residues | Number of Templates | Number of Classes |
|---|---|---|---|
| 2l3f (T0545) | 166 | 185 | 4 |
| 3obh (T0551) | 82 | 199 | 4 |
| 2l06 (T0555) | 155 | 182 | 4 |
| 2kyy (T0557) | 153 | 183 | 4 |
| 2xse (T0561) | 170 | 180 | 4 |
| 3nbm (T0580) | 108 | 195 | 4 |
| 3n1u (T0635) | 191 | 181 | 4 |
| 2x3o (T0637) | 240 | 194 | 4 |
| 3nym (T0639) | 128 | 206 | 4 |
| 3nzl (T0643) | 82 | 178 | 4 |
| 4pqx (T0760) | 217 | 94 | 4 |
| 4q69 (T0770) | 462 | 100 | 4 |
| 4qdy (T0780) | 227 | 103 | 4 |
| 4l4w (T0790) | 295 | 107 | 4 |
| 4qrk (T0800) | 220 | 277 | 4 |
| Q6MI90_BDEBA (T0810) | 383 | 164 | 4 |
| VCID6010 (T0820) | 140 | 333 | 4 |
| 5f15 (T0830) | 575 | 225 | 4 |
| 4gt8 (T0840) | 669 | 96 | 4 |
| U1 Protein (T0850) | 190 | 268 | 4 |
| 5d9g (T0864) | 246 | 264 | 4 |
| 5j5v (T0870) | 323 | 268 | 4 |
| 1ctf (T0880) | 787 | 321 | 4 |
| 5t87 (T0885) | 116 | 122 | 4 |
| 3k1e (T0890) | 125 | 321 | 4 |
| 5aot (T0900) | 106 | 255 | 4 |
| 6c0t (T0910) | 347 | 105 | 4 |
| 5ere (T0920) | 568 | 91 | 4 |
| 5sy1 (T0930) | 149 | 187 | 4 |
| 1o6d (T0940) | 163 | 259 | 4 |
Figure 3. Template energy and energy selection.
Figure 4. Template classification based on energy and structural considerations.
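Template preselection and classification, shown in the figures on template energy and classification, can be sketched as below. The exact rule is not given in this record, so everything here is an assumption: templates are kept if their energy is at or below a given percentile (30, matching the "Percentile of Decoys" column of the experiments table), and the survivors are split into four classes by energy quantile. The structural criteria mentioned in the Figure 4 caption are not modelled, and `preselect_and_classify` is a hypothetical name.

```python
import numpy as np

def preselect_and_classify(energies, percentile=30.0, n_classes=4):
    """Hypothetical energy-only preselection and class labelling.

    Keeps templates whose energy is at or below the given percentile, then
    labels the survivors 0..n_classes-1 by energy quantile (0 = lowest energy).
    """
    energies = np.asarray(energies, dtype=float)
    cutoff = np.percentile(energies, percentile)
    keep = np.flatnonzero(energies <= cutoff)  # indices of retained templates
    e = energies[keep]
    # Interior quantile edges split the retained energies into n_classes bins
    edges = np.quantile(e, np.linspace(0.0, 1.0, n_classes + 1)[1:-1])
    labels = np.searchsorted(edges, e, side="right")
    return keep, labels
```

The returned `labels` are the kind of class assignment a discriminant analysis needs as input; lower (more negative) energies land in class 0.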
Figure 5. Example of protein classification: class division of protein 3obh and intraclass structural similarity.
Figure 6. Protein search space constructed via L2-regularized LDA.
Details of the computational experiments performed with the LDA–SVD and PSO methodology presented in this paper.
| Protein (CASP Code) | Number of Residues | Number of Classes | Reduced Basis Terms | Percentile of Decoys | Number of Iterations | Swarm Size | Energy Obtained |
|---|---|---|---|---|---|---|---|
| 2l3f (T0545) | 166 | 4 | 5 | 30 | 50 | 40 | −343.86 |
| 3obh (T0551) | 82 | 4 | 5 | 30 | 50 | 40 | −163.42 |
| 2l06 (T0555) | 155 | 4 | 5 | 30 | 50 | 40 | −381.96 |
| 2kyy (T0557) | 153 | 4 | 5 | 30 | 50 | 40 | −152.77 |
| 2xse (T0561) | 170 | 4 | 5 | 30 | 50 | 40 | −449.50 |
| 3nbm (T0580) | 108 | 4 | 5 | 30 | 50 | 40 | −255.42 |
| 3n1u (T0635) | 191 | 4 | 5 | 30 | 50 | 40 | −369.47 |
| 2x3o (T0637) | 240 | 4 | 5 | 30 | 50 | 40 | −372.10 |
| 3nym (T0639) | 128 | 4 | 5 | 30 | 50 | 40 | −343.22 |
| 3nzl (T0643) | 82 | 4 | 5 | 30 | 50 | 40 | −210.34 |
| 4pqx (T0760) | 217 | 4 | 5 | 30 | 50 | 40 | −496.11 |
| 4q69 (T0770) | 462 | 4 | 5 | 30 | 50 | 40 | −992.46 |
| 4qdy (T0780) | 227 | 4 | 5 | 30 | 50 | 40 | −425.77 |
| 4l4w (T0790) | 295 | 4 | 5 | 30 | 50 | 40 | −598.56 |
| 4qrk (T0800) | 220 | 4 | 5 | 30 | 50 | 40 | −502.35 |
| Q6MI90_BDEBA (T0810) | 383 | 4 | 5 | 30 | 50 | 40 | −902.65 |
| VCID6010 (T0820) | 140 | 4 | 5 | 30 | 50 | 40 | −356.56 |
| 5f15 (T0830) | 575 | 4 | 5 | 30 | 50 | 40 | −1214.65 |
| 4gt8 (T0840) | 669 | 4 | 5 | 30 | 50 | 40 | −1115.98 |
| U1 Protein (T0850) | 190 | 4 | 5 | 30 | 50 | 40 | −448.13 |
| 5d9g (T0864) | 246 | 4 | 5 | 30 | 50 | 40 | −545.61 |
| 5j5v (T0870) | 323 | 4 | 5 | 30 | 50 | 40 | −408.32 |
| 1ctf (T0880) | 787 | 4 | 5 | 30 | 50 | 40 | −398.39 |
| 5t87 (T0885) | 116 | 4 | 5 | 30 | 50 | 40 | −298.43 |
| 3k1e (T0890) | 125 | 4 | 5 | 30 | 50 | 40 | −561.94 |
| 5aot (T0900) | 106 | 4 | 5 | 30 | 50 | 40 | −208.01 |
| 6c0t (T0910) | 347 | 4 | 5 | 30 | 50 | 40 | −838.89 |
| 5ere (T0920) | 568 | 4 | 5 | 30 | 50 | 40 | −1229.68 |
| 5sy1 (T0930) | 149 | 4 | 5 | 30 | 50 | 40 | −1812.17 |
| 1o6d (T0940) | 163 | 4 | 5 | 30 | 50 | 40 | −627.02 |
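The sampling step uses the settings in the table above (five reduced basis terms, a swarm of 40 particles, 50 iterations). The sketch below is a swapped-in technique: it is a standard global-best PSO, not the RR-PSO variant used in the paper, whose update rule differs. The inertia and acceleration constants `w`, `c1`, `c2`, the box bounds, and the name `pso_minimize` are assumptions. In the paper's setting the objective would be an energy evaluated on the reconstructed structure, for example a hypothetical `f(a) = energy(mean + basis @ a)`.

```python
import numpy as np

def pso_minimize(f, lower, upper, swarm_size=40, n_iter=50,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Standard global-best PSO over a box (a stand-in for RR-PSO)."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    n = lower.size
    x = rng.uniform(lower, upper, size=(swarm_size, n))  # particle positions
    v = np.zeros_like(x)                                 # particle velocities
    pbest = x.copy()                                     # personal best positions
    pbest_f = np.array([f(p) for p in x])
    g = pbest[np.argmin(pbest_f)].copy()                 # global best position
    for _ in range(n_iter):
        r1 = rng.random((swarm_size, n))
        r2 = rng.random((swarm_size, n))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lower, upper)  # keep particles inside the box
        fx = np.array([f(p) for p in x])
        improved = fx < pbest_f
        pbest[improved] = x[improved]
        pbest_f[improved] = fx[improved]
        g = pbest[np.argmin(pbest_f)].copy()
    return g, pbest_f.min()
```

Searching five coefficients instead of thousands of Cartesian coordinates is what keeps a swarm of only 40 particles and 50 iterations reasonable.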
RMSDs of the structures predicted via LDA–SVD and particle swarm optimization, compared with those of the Rosetta and Zhang servers.
| Protein (CASP Code) | RMSD LDA–SVD (Å) | RMSD Zhang Server (Å) | RMSD Rosetta Server (Å) |
|---|---|---|---|
| 2l3f (T0545) | 1.27 | 2.17 | 2.38 |
| 3obh (T0551) | 5.30 | 2.75 | 2.65 |
| 2l06 (T0555) | 6.16 | 2.99 | 3.20 |
| 2kyy (T0557) | 1.16 | 2.54 | 2.08 |
| 2xse (T0561) | 5.88 | 3.01 | 3.09 |
| 3nbm (T0580) | 1.16 | 1.80 | 1.37 |
| 3n1u (T0635) | 1.31 | 0.74 | 1.08 |
| 2x3o (T0637) | 5.18 | 2.44 | 2.61 |
| 3nym (T0639) | 6.70 | 2.74 | 2.11 |
| 3nzl (T0643) | 3.67 | 2.72 | 2.75 |
| 4pqx (T0760) | 2.73 | 2.93 | 3.21 |
| 4q69 (T0770) | 5.01 | 4.53 | 4.47 |
| 4qdy (T0780) | 3.12 | 2.97 | 2.93 |
| 4l4w (T0790) | 3.81 | 4.96 | 4.48 |
| 4qrk (T0800) | 12.25 | 7.25 | 9.80 |
| Q6MI90_BDEBA (T0810) | 13.85 | 8.30 | 14.76 |
| VCID6010 (T0820) | 12.97 | 9.32 | 14.75 |
| 5f15 (T0830) | 26.57 | 20.77 | 11.15 |
| 4gt8 (T0840) | 3.03 | 2.71 | 4.99 |
| U1 Protein (T0850) | 3.65 | 3.48 | 4.03 |
| 5d9g (T0864) | 2.81 | 2.58 | 2.10 |
| 5j5v (T0870) | 17.12 | 12.67 | 11.84 |
| 1ctf (T0880) | 6.66 | 8.69 | 7.59 |
| 5t87 (T0885) | 2.87 | 3.92 | 2.93 |
| 3k1e (T0890) | 8.93 | 3.99 | 8.02 |
| 5aot (T0900) | 3.32 | 7.63 | 4.67 |
| 6c0t (T0910) | 1.74 | 1.58 | 1.83 |
| 5ere (T0920) | 2.38 | 2.40 | 2.32 |
| 5sy1 (T0930) | 4.56 | 3.27 | 4.25 |
| 1o6d (T0940) | 4.13 | 3.71 | 3.07 |
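The RMSD values above compare a predicted structure with the native one after optimal superposition. This record does not state which atoms or alignment procedure were used, so the following is a generic sketch assuming equal-length (N, 3) coordinate arrays (for example Cα atoms) and the Kabsch algorithm; `kabsch_rmsd` is a hypothetical name.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition."""
    P = np.asarray(P, dtype=float) - np.mean(P, axis=0)  # remove translation
    Q = np.asarray(Q, dtype=float) - np.mean(Q, axis=0)
    H = P.T @ Q                                          # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q                                   # rotated P minus Q
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))
```

Because translation and rotation are removed first, a structure that differs from the reference only by a rigid-body motion has an RMSD of zero.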