Xin Chen1, Bing Yang1, Zijing Lin2.
Abstract
Computational determination of peptide conformations is challenging as it is a problem of finding minima in a high-dimensional space. The "divide and conquer" approach is promising for reliably reducing the search space size. A random forest learning model is proposed here to expand the scope of applicability of the "divide and conquer" approach. A random forest classification algorithm is used to characterize the distributions of the backbone φ-ψ units ("words"). A random forest supervised learning model is developed to analyze the combinations of the φ-ψ units ("grammar"). It is found that amino acid residues may be grouped as equivalent "words", while the φ-ψ combinations in low-energy peptide conformations follow a distinct "grammar". The finding of equivalent words empowers the "divide and conquer" method with the flexibility of fragment substitution. The learnt grammar is used to improve the efficiency of the "divide and conquer" method by removing unfavorable φ-ψ combinations without the need of dedicated human effort. The machine learning assisted search method is illustrated by efficiently searching the conformations of GGG/AAA/GGGG/AAAA/GGGGG through assembling the structures of GFG/GFGG. Moreover, the computational cost of the new method is shown to increase rather slowly with the peptide length.
Year: 2018 PMID: 29891960 PMCID: PMC5995823 DOI: 10.1038/s41598-018-27167-w
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. The φ-ψ units in a pentapeptide X1X2X3X4X5. Generally, the label x1x2…xn followed by an index i (for example, gvgg1 for i = 1) refers to the φ-ψ unit of the (i + 1)th AA residue in a peptide X1X2…Xn with n AA residues.
The matrix of “error rate” for different φ-ψ units.
| | gvgg1 | gvgg2 | gvgg3 | gtgg1 | gtgg2 | gtgg3 | gfgg1 | gfgg2 | gfgg3 | gtg1 | gtg2 | gvg1 | gvg2 | vgg1 | vgg2 | mgg1 | mgg2 | fgg1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| gvgg2 | 0.2 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
| gvgg3 | 0.2 | 0.2 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
| gtgg1 | 0.4 | 0.3 | 0.2 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
| gtgg2 | 0.3 | 0.6 | 0.2 | 0.3 | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
| gtgg3 | 0.2 | 0.2 | 0.6 | 0.2 | 0.3 | — | — | — | — | — | — | — | — | — | — | — | — | — |
| gfgg1 | 0.7 | 0.3 | 0.2 | 0.5 | 0.3 | 0.2 | — | — | — | — | — | — | — | — | — | — | — | — |
| gfgg2 | 0.3 | 0.7 | 0.2 | 0.3 | 0.7 | 0.2 | 0.3 | — | — | — | — | — | — | — | — | — | — | — |
| gfgg3 | 0.2 | 0.2 | 0.7 | 0.2 | 0.2 | 0.7 | 0.2 | 0.2 | — | — | — | — | — | — | — | — | — | — |
| gtg1 | 0.4 | 0.3 | 0.2 | 0.5 | 0.3 | 0.2 | 0.4 | 0.2 | 0.2 | — | — | — | — | — | — | — | — | — |
| gtg2 | 0.1 | 0.1 | 0.3 | 0.1 | 0.2 | 0.4 | 0.1 | 0.1 | 0.4 | 0.1 | — | — | — | — | — | — | — | — |
| gvg1 | 0.4 | 0.2 | 0.1 | 0.3 | 0.3 | 0.1 | 0.4 | 0.2 | 0.1 | 0.4 | 0.1 | — | — | — | — | — | — | — |
| gvg2 | 0.1 | 0.1 | 0.3 | 0.1 | 0.2 | 0.4 | 0.1 | 0.1 | 0.4 | 0.1 | 0.6 | 0.1 | — | — | — | — | — | — |
| vgg1 | 0.1 | 0.3 | 0.2 | 0.2 | 0.4 | 0.2 | 0.2 | 0.4 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | — | — | — | — | — |
| vgg2 | 0.1 | 0.1 | 0.3 | 0.1 | 0.2 | 0.3 | 0.1 | 0.1 | 0.4 | 0.1 | 0.5 | 0.1 | 0.6 | 0.2 | — | — | — | — |
| mgg1 | 0.1 | 0.2 | 0.0 | 0.1 | 0.2 | 0.1 | 0.1 | 0.3 | 0.1 | 0.1 | 0.0 | 0.1 | 0.1 | 0.3 | 0.1 | — | — | — |
| mgg2 | 0.1 | 0.1 | 0.3 | 0.1 | 0.1 | 0.3 | 0.1 | 0.1 | 0.3 | 0.1 | 0.4 | 0.1 | 0.5 | 0.1 | 0.3 | 0.0 | — | — |
| fgg1 | 0.1 | 0.3 | 0.1 | 0.2 | 0.3 | 0.1 | 0.2 | 0.4 | 0.1 | 0.2 | 0.2 | 0.2 | 0.2 | 0.6 | 0.1 | 0.4 | 0.2 | — |
| fgg2 | 0.1 | 0.1 | 0.3 | 0.1 | 0.2 | 0.4 | 0.1 | 0.1 | 0.4 | 0.1 | 0.5 | 0.1 | 0.6 | 0.2 | 0.6 | 0.1 | 0.4 | 0.2 |
See Fig. 1 for the notation of the φ-ψ units.
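As a rough illustration of how such a matrix might be read, the sketch below lists the pairs of φ-ψ units whose pairwise "error rate" reaches a cutoff. It assumes that a high error rate means the classifier has difficulty telling the two units apart, so that they are candidates for equivalent "words". That interpretation, the 0.5 cutoff, and the names `error_rate` and `similar_pairs` are assumptions made for this example. Only a handful of entries are copied from the matrix.

```python
# A few entries copied from the error-rate matrix (row label, column label).
error_rate = {
    ("gfgg1", "gvgg1"): 0.7,
    ("gfgg1", "gtgg1"): 0.5,
    ("gtgg1", "gvgg1"): 0.4,
    ("gfgg1", "gvgg3"): 0.2,
    ("gfgg2", "gvgg2"): 0.7,
    ("mgg1", "gvgg3"): 0.0,
}


def similar_pairs(rates, cutoff):
    """Return (unit_a, unit_b, rate) for pairs with rate >= cutoff.

    Assumption: a high pairwise error rate marks units that are hard to
    distinguish. The result is sorted by decreasing rate, then by name.
    """
    hits = [(a, b, r) for (a, b), r in rates.items() if r >= cutoff]
    return sorted(hits, key=lambda t: (-t[2], t[0], t[1]))


# Hypothetical cutoff of 0.5, chosen only for illustration.
for unit_a, unit_b, rate in similar_pairs(error_rate, 0.5):
    print(f"{unit_a} ~ {unit_b}: error rate {rate}")
```

With this cutoff, the units of the same position in gvgg, gtgg and gfgg pair up, while units of different positions (such as gfgg1 and gvgg3) do not.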
Figure 2. 2D MDS map of all φ-ψ units in the low-energy conformations of 8 peptides.
Figure 3. Results of the new search method (in red), the conventional systematic search method (in black) and the path matrix method (in green) on the obtained low-energy conformations of: (A) AAA, (B) GGGG, (C) AAAA, (D) GGGGG.
Total numbers of trial structures for a peptide backbone with n AA residues required by the systematic search method (Nsys), the path matrix method (NPM)3 and the random forest assisted “divide and conquer” method (NRF).
| n | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|
| Nsys | 3,456 | 41,472 | 4.98E+05 | 5.97E+06 | 7.17E+07 | 8.60E+08 | 1.03E+10 | 1.24E+11 |
| NPM | 240 | 1,130 | 5,310 | 2.49E+04 | 1.17E+05 | 5.50E+05 | 2.58E+06 | 1.21E+07 |
| NRF | 1,838 | 4,096 | 5,438 | 6,649 | 7,318 | 8,540 | 9,613 | 11,341 |
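The scaling behind these counts can be checked with a short calculation. The sketch below copies the tabulated values (in the order given in the caption: systematic search, path matrix, random forest assisted) and computes the growth factor for each added residue. The variable and function names are chosen for this example only, and the rounded table entries are used as given.

```python
# Trial-structure counts copied from the table, for n = 3 ... 10 residues.
n_values = [3, 4, 5, 6, 7, 8, 9, 10]
n_sys = [3456, 41472, 4.98e5, 5.97e6, 7.17e7, 8.60e8, 1.03e10, 1.24e11]
n_pm = [240, 1130, 5310, 2.49e4, 1.17e5, 5.50e5, 2.58e6, 1.21e7]
n_rf = [1838, 4096, 5438, 6649, 7318, 8540, 9613, 11341]


def growth_factors(counts):
    """Ratio of each count to the previous one (one extra residue per step)."""
    return [later / earlier for earlier, later in zip(counts, counts[1:])]


for name, counts in [("Nsys", n_sys), ("NPM", n_pm), ("NRF", n_rf)]:
    factors = ", ".join(f"{g:.2f}" for g in growth_factors(counts))
    print(f"{name}: {factors}")

# Ratio of systematic-search trials to random-forest-assisted trials at n = 10.
print(f"Nsys / NRF at n = 10: {n_sys[-1] / n_rf[-1]:.2e}")
```

On the tabulated values, Nsys grows by a factor of about 12 per residue and NPM by about 4.7, whereas NRF grows by a factor of at most about 2.2 per residue, which is consistent with the slow increase in cost stated in the abstract.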