Abstract
In protein tertiary structure prediction, model quality assessment programs (MQAPs) are often used to select the final structural models from a pool of candidate models generated by multiple templates and prediction methods. The 3-dimensional convolutional neural network (3DCNN) is an extension of the 2DCNN and has been applied in several fields, including object recognition. The 3DCNN has also been used for MQA tasks, but its performance has been low because of several technical limitations related to protein tertiary structures, such as orientation alignment. We proposed a novel single-model MQA method based on local structure quality evaluation using a deep neural network containing 3DCNN layers. The proposed method first assesses the quality of the local structure around each residue and then evaluates the quality of the whole structure by integrating the estimated local qualities. We analyzed the model using the CASP11, CASP12, and 3D-Robot datasets and compared its performance with that of the previous 3DCNN method based on whole protein structures. The proposed method showed a significant improvement over the previous 3DCNN method for multiple evaluation measures. We also compared the proposed method with other state-of-the-art methods. Our method performed better than the previous 3DCNN-based method and achieved accuracy comparable to that of the current best single-model methods; in particular, in CASP11 stage2, our method showed a Pearson coefficient of 0.486, which was higher than those of the best single-model methods (0.366–0.405). A standalone version of the proposed method and data files are available at https://github.com/ishidalab-titech/3DCNN_MQA.
Year: 2019 PMID: 31487288 PMCID: PMC6728020 DOI: 10.1371/journal.pone.0221347
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Workflow of the proposed method.
1. A local structure was extracted for each residue using a 3D grid bounding box. 2. The quality of each local structure was evaluated using a 3D convolutional neural network. 3. The residue-wise local scores were integrated into a score for the whole structure.
Fig 2. Featurization of local structure.
(a) A 3D grid bounding box was set at the C-alpha atom (CA) of each residue. Each side of the box was 28 Å long, and the box was divided into 1-Å voxels. (b) The orthonormal basis of the bounding box was calculated from the C-CA vector, the N-CA vector, and their cross product. (c) Atoms within each voxel were classified into the 14 categories shown in Table 1, and each category was assigned to an independent channel of the CNN. In the figure, voxels are colored by element (C, N, O, and S).
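The featurization in Fig 2 can be illustrated with a minimal sketch. It is not the authors' implementation: the Gram-Schmidt construction of the basis, the binary occupancy values, and the names `local_frame` and `voxelize` are assumptions made for illustration. Only the 28 Å box, the 1-Å voxels, and the 14 channels come from the caption.

```python
import numpy as np

BOX_SIZE = 28    # box edge in Å (from the Fig 2 caption)
VOXEL = 1.0      # voxel edge in Å (from the Fig 2 caption)
N_CHANNELS = 14  # one channel per atom category


def local_frame(n, ca, c):
    """Return a 3x3 matrix whose rows form an orthonormal basis at CA.

    Assumption: the first axis follows the C-CA vector, the third axis is
    the normalized cross product of C-CA and N-CA, and the second axis
    completes a right-handed frame.
    """
    u = c - ca
    v = n - ca
    e1 = u / np.linalg.norm(u)
    w = np.cross(u, v)
    e3 = w / np.linalg.norm(w)
    e2 = np.cross(e3, e1)
    return np.stack([e1, e2, e3])


def voxelize(coords, channels, n, ca, c):
    """Build a (14, 28, 28, 28) grid centred on CA in the residue frame.

    coords:   (M, 3) array of atom coordinates in Å
    channels: list of M lists with 0-based channel indices for each atom
    Assumption: a voxel is set to 1.0 if it contains an atom of that channel.
    """
    frame = local_frame(n, ca, c)
    local = (coords - ca) @ frame.T  # coordinates expressed in the residue frame
    idx = np.floor(local / VOXEL).astype(int) + BOX_SIZE // 2
    grid = np.zeros((N_CHANNELS, BOX_SIZE, BOX_SIZE, BOX_SIZE), dtype=np.float32)
    for (i, j, k), chans in zip(idx, channels):
        # Atoms outside the bounding box are ignored.
        if 0 <= i < BOX_SIZE and 0 <= j < BOX_SIZE and 0 <= k < BOX_SIZE:
            for ch in chans:
                grid[ch, i, j, k] = 1.0
    return grid
```

Because the grid is built in a frame attached to the backbone atoms of the residue, the same local environment yields the same grid regardless of how the whole model is oriented, which is one way to sidestep the orientation-alignment issue mentioned in the abstract.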
Table 1. The 14 atom feature categories.
| Type | Description | Atoms |
|---|---|---|
| 1 | Sulfur/selenium | CYS:SG, MET:SD, MSE:SE |
| 2 | Nitrogen (amide) | ASN:ND2, GLN:NE2, backbone N (including N-terminal) |
| 3 | Nitrogen (aromatic) | HIS:ND1/NE1, TRP:NE1 |
| 4 | Nitrogen (guanidinium) | ARG:NE/NH* |
| 5 | Nitrogen (ammonium) | LYS:NZ |
| 6 | Oxygen (carbonyl) | ASN:OD1, GLN:OE1, backbone O (except C-terminal) |
| 7 | Oxygen (hydroxyl) | SER:OG, THR:OG1, TYR:OH |
| 8 | Oxygen (carboxyl) | ASP:OD*, GLU:OE*, C-terminal O, C-terminal OXT |
| 9 | Carbon (sp2) | ARG:CZ, ASN:CG, ASP:CG, GLN:CD, GLU:CD, backbone C |
| 10 | Carbon (aromatic) | HIS:CG/CD2/CE1, PHE:CG/CD*/CE*/CZ, TRP:CG/CD*/CE*/CZ*/CH2, TYR:CG/CD*/CE*/CZ |
| 11 | Carbon (sp3) | ALA:CB, ARG:CB/CG/CD, ASN:CB, ASP:CB, CYS:CB, GLN:CB/CG, GLU:CB/CG, HIS:CB, ILE:CB/CG*/CD1, LEU:CB/CG/CD*, LYS:CB/CG/CD/CE, MET:CB/CG/CE, MSE:CB/CG/CE, PHE:CB, PRO:CB/CG/CD, SER:CB, THR:CB/CG2, TRP:CB, TYR:CB, VAL:CB/CG*, backbone CA |
| 12 | Occupancy | *:* |
| 13 | Backbone | *:N,*:CA,*:C |
| 14 | CA | *:CA |
Atom types 1–11 were taken from Derevyanko et al. [23]. We added three further classes (all atoms, backbone atoms, and CA atoms).
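As a sketch of how the table could be applied, the lookup below maps a residue name and atom name to the list of categories the atom belongs to. It transcribes only part of the table (the backbone rules and a few side-chain entries), and the names `SIDE_CHAIN_TYPES`, `BACKBONE_TYPES`, and `atom_types` are hypothetical. An atom can fall into several categories at once, for example a CA atom is sp3 carbon, occupancy, backbone, and CA.

```python
# Partial transcription of the table (1-based type numbers); not exhaustive.
SIDE_CHAIN_TYPES = {
    ("CYS", "SG"): 1, ("MET", "SD"): 1, ("MSE", "SE"): 1,
    ("ASN", "ND2"): 2, ("GLN", "NE2"): 2,
    ("TRP", "NE1"): 3,
    ("LYS", "NZ"): 5,
    ("ASN", "OD1"): 6, ("GLN", "OE1"): 6,
    ("SER", "OG"): 7, ("THR", "OG1"): 7, ("TYR", "OH"): 7,
    ("ALA", "CB"): 11, ("SER", "CB"): 11,
}
# Backbone N is amide nitrogen, backbone C is sp2 carbon, CA is sp3 carbon.
BACKBONE_TYPES = {"N": 2, "C": 9, "CA": 11}


def atom_types(res_name, atom_name, c_terminal=False):
    """Return the list of 1-based categories for one atom."""
    types = []
    if atom_name in ("O", "OXT"):
        # Backbone O is carbonyl (6) except at the C-terminus, where O and
        # OXT are carboxyl (8).
        types.append(8 if (c_terminal or atom_name == "OXT") else 6)
    elif atom_name in BACKBONE_TYPES:
        types.append(BACKBONE_TYPES[atom_name])
    elif (res_name, atom_name) in SIDE_CHAIN_TYPES:
        types.append(SIDE_CHAIN_TYPES[(res_name, atom_name)])
    types.append(12)  # occupancy: every atom
    if atom_name in ("N", "CA", "C"):
        types.append(13)  # backbone
    if atom_name == "CA":
        types.append(14)  # CA
    return types
```

Subtracting 1 from each returned type gives a 0-based channel index for a 14-channel input tensor.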
Table 2. Details of the decoy sets used for comparison with the previous 3DCNN method based on whole protein structures.
| Decoy set | Number of decoys per target | Number of targets |
|---|---|---|
| CASP11 stage1 | 20.0 | 81 |
| CASP11 stage2 | 148.2 | 80 |
| CASP12 | 172.1 | 40 |
| 3DRobot | 300.0 | 200 |
Fig 3. ROC curve of the best-epoch model.
ROC curve of the model at the epoch with the lowest validation loss.
Table 3. Comparison with the previous 3DCNN method.
| Dataset | Method | Pearson | Spearman | GDT_TS loss | Best model rank |
|---|---|---|---|---|---|
| CASP11 stage1 | Proposed | | | | |
| CASP11 stage1 | Derevyanko+2018 | 0.535 | 0.425 | 6.396 | 3.691 |
| CASP11 stage2 | Proposed | | | | |
| CASP11 stage2 | Derevyanko+2018 | 0.421 | 0.409 | 6.449 | 27.563 |
| CASP12 | Proposed | 44.200 | | | |
| CASP12 | Derevyanko+2018 | 0.607 (NA) | 0.521 (NA) | NA | |
| 3DRobot | Proposed | | | | |
| 3DRobot | Derevyanko+2018 | 0.856 | 0.839 | 9.627 | 18.610 |
The first and second columns give the dataset name and the method name. The third and fourth columns show the average Pearson's correlation coefficient (Pearson) and the average Spearman's correlation coefficient (Spearman), respectively, between the actual ranking and the predicted ranking. The fifth and sixth columns show the average GDT_TS loss and the average best model rank. Values in parentheses in columns 3 and 4 are the p-values (Wilcoxon signed-rank test) for the differences in the Pearson and Spearman results, respectively, between the proposed method and the previous method (Derevyanko+2018). A p-value <0.05 indicates that the difference was significant. Values with high accuracy and p-values <0.05 are shown in bold.
Table 4. Comparison with single-model methods in CASP11 stage2.
| Method | Pearson | Spearman | GDT_TS loss | Best model rank |
|---|---|---|---|---|
| Proposed | | | | |
| VoroMQA | 0.413 | 0.394 | 7.307 | 27.25 |
| MULTICOM-CLUSTER | 0.405 | 0.397 | 7.058 | 31.83 |
| MULTICOM-NOVEL | 0.390 | 0.389 | 6.888 | 32.375 |
| RFMQA | 0.369 | 0.351 | 7.021 | 31.621 |
| ProQ2 | 0.368 | 0.363 | 6.34 | 35.705 |
| ProQ2-refine | 0.366 | 0.373 | 6.754 | 34.67 |
| Ornate | 0.39 (NA) | 0.37 (NA) | 5.5 | NA |
The legend is the same as that for columns 2–6 in Table 3.
Table 5. Comparison with single-model methods in CASP12 stage2.
| Method | Pearson | Spearman | GDT_TS loss | Best model rank |
|---|---|---|---|---|
| Proposed | | | 6.159 | |
| ProQ3 | 0.639 | 0.590 (0.8081) | 5.633 | 20.343 |
| SVMQA | 0.631 (0.2575) | 0.587 (0.5408) | | 20.743 |
| VoroMQA | 0.593 | 0.544 | 7.789 | 19.914 |
| ProQ2 | 0.591 | 0.556 | 6.823 | 20.843 |
| MULTICOM-CLUSTER | 0.577 | 0.540 | 7.678 | 24.543 |
| Ornate | 0.49 (NA) | 0.46 (NA) | 7.200 | NA |
The legend is the same as that for columns 2–6 in Table 3.