Raj S Roy, Farhan Quadir, Elham Soltanikazemi, Jianlin Cheng.
Abstract
MOTIVATION: Deep learning has recently revolutionized protein tertiary structure prediction. Cutting-edge deep learning methods such as AlphaFold can predict high-accuracy tertiary structures for most individual protein chains. However, the accuracy of predicting quaternary structures of protein complexes consisting of multiple chains is still relatively low, owing to a lack of advanced deep learning methods in the field. Because interchain residue-residue contacts can be used as distance restraints to guide quaternary structure modeling, here we develop a deep dilated convolutional residual network method (DRCon) to predict interchain residue-residue contacts in homodimers. Its inputs are residue-residue co-evolutionary signals derived from multiple sequence alignments of monomers, intrachain residue-residue contacts of monomers (extracted from true or predicted tertiary structures, or predicted by deep learning), and other sequence and structural features.
Year: 2022 PMID: 35134816 PMCID: PMC8963319 DOI: 10.1093/bioinformatics/btac063
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
The statistics of the Homo_std test dataset, DeepHomo test dataset and CASP-CAPRI test dataset
| Name | Number of dimers | Range of length | Average length | Range of contact density | Average contact density |
|---|---|---|---|---|---|
| Homo_std test dataset | 1702 | 30 to 600 | 254.94 | 0.003 to 4.54 | 0.67 |
| DeepHomo | 218 | 48 to 498 | 235.9 | 0.210 to 4.5 | 1.06 |
| CASP-CAPRI | 40 | 73 to 480 | 248.97 | 0.346 to 4.96 | 2.04 |
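The page does not define the contact density statistic reported in the table above. A minimal sketch, assuming it is the number of interchain contacts divided by the monomer length L (an assumption, although it is consistent with the reported range of 0.003 to 4.96); the function name is hypothetical:

```python
def contact_density(n_interchain_contacts, length):
    """Interchain contacts per residue of the monomer.

    ASSUMPTION: density = (number of interchain contacts) / L.
    The definition is not stated on this page.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    return n_interchain_contacts / length


if __name__ == "__main__":
    # A 600-residue monomer with only 2 interchain contacts is very sparse.
    print(round(contact_density(2, 600), 3))
```

Under this assumption, a density above 1 means the dimer has more interchain contacts than residues in one chain.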
Fig. 1. The deep learning architecture of DRCon for interchain contact prediction in homodimers. For a homodimer whose monomer sequence has length L, the input is an L×L×592 tensor, i.e. 592 input features for each pair of residues. For convenience, L is set to a fixed value of 600, and zero padding is applied if L is less than 600. In the prediction phase, no zero padding is used in generating the input tensor if L is greater than 600. The input is transformed to a 600×600×48 tensor by a 2D convolutional layer with a kernel size of 1 and Exponential Linear Unit (ELU) activation. The output of this layer is passed through 36 residual blocks with a kernel size of 3×3. Each residual block consists of a 2D convolutional layer with a kernel size of 3, instance normalization and dropout (each neuron is ignored with 15% probability), followed by a dilated convolutional layer without dropout. The dilation rate of the dilated convolutional layers in these blocks cycles through 1, 2, 4, 8 and 16. The sigmoid activation function is applied to the output of the last residual block to calculate the contact probability of each interchain residue–residue pair. The probabilities for residue pair (i, j) and residue pair (j, i) are averaged to produce a symmetric final contact map
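Three bookkeeping steps from the Fig. 1 caption (zero padding, the periodic dilation rates and the final symmetrization) can be sketched in plain Python. This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, nested lists stand in for tensors, and it assumes the dilation rate advances once per residual block.

```python
from itertools import cycle, islice

MAX_LEN = 600               # fixed side length stated in the caption
N_BLOCKS = 36               # number of residual blocks stated in the caption
DILATIONS = (1, 2, 4, 8, 16)


def pad_pairwise(features, size=MAX_LEN):
    """Zero-pad an L x L x C nested list to size x size x C.

    Inputs with L >= size are returned unchanged (no padding).
    """
    length = len(features)
    if length >= size:
        return features
    channels = len(features[0][0])
    extra = size - length
    # Extend every existing row with zero feature vectors ...
    padded = [row + [[0.0] * channels for _ in range(extra)] for row in features]
    # ... then append all-zero rows until the map is size x size.
    padded += [[[0.0] * channels for _ in range(size)] for _ in range(extra)]
    return padded


def dilation_schedule(n_blocks=N_BLOCKS, rates=DILATIONS):
    """One dilation rate per block, cycling through `rates`.

    ASSUMPTION: the rate changes once per residual block.
    """
    return list(islice(cycle(rates), n_blocks))


def symmetrize(prob):
    """Average P(i, j) and P(j, i) so that the final map is symmetric."""
    n = len(prob)
    return [[(prob[i][j] + prob[j][i]) / 2.0 for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    print(dilation_schedule()[:7])
    print(symmetrize([[0.5, 0.25], [0.75, 1.0]]))
```

With 36 blocks and five rates the cycle does not divide evenly, so under this assumption the last block starts a new cycle with a rate of 1.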
The interchain contact prediction precision of DNCON2_Inter, DRCon with true intrachain contacts as input (DRCon_true), DRCon with intrachain contacts from AlphaFold2-predicted tertiary structures as input (DRCon_alpha) and DRCon with trRosetta-predicted intrachain contacts as input (DRCon_pre) on the Homo_std test dataset
| Predictor | Top 10 (%) | Top L/10 (%) | Top L/5 (%) | Top L (%) |
|---|---|---|---|---|
| DNCON2_Inter | 16.9 | 17.32 | 16.31 | 13.69 |
| DRCon_pre | 40.20 | 37.25 | 33.75 | 18.92 |
| DRCon_alpha | 49.71 | 46.21 | 42.12 | 24.04 |
| DRCon_true | 50.61 | 47.21 | 43.46 | 25.05 |
Note: The precision of DNCON2_Inter is reported with its best parameter setting (relax_removal = 2).
L, length of a monomer in a dimer.
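The precision values in these tables are reported at several cutoffs tied to the monomer length L. A minimal sketch of how top-k precision could be computed is given below. The function names and the dictionary input format are assumptions for illustration, the cutoff labels follow the usual top 10, L/10, L/5 and L convention, and each residue pair is assumed to appear only once in the score dictionary.

```python
def top_k_precision(scores, true_contacts, k):
    """Fraction of the k highest-scoring residue pairs that are true contacts.

    scores: dict mapping (i, j) -> predicted contact probability
    true_contacts: set of (i, j) pairs in contact in the native dimer
    If fewer than k pairs are scored, the denominator is the number scored.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for pair in ranked if pair in true_contacts)
    return hits / len(ranked)


def precision_report(scores, true_contacts, length):
    """Precision at top 10, L/10, L/5 and L for a monomer of the given length."""
    cutoffs = {
        "top10": 10,
        "L/10": max(1, length // 10),
        "L/5": max(1, length // 5),
        "L": length,
    }
    return {name: top_k_precision(scores, true_contacts, k)
            for name, k in cutoffs.items()}


if __name__ == "__main__":
    scores = {(0, 1): 0.9, (0, 2): 0.8, (0, 3): 0.7, (1, 2): 0.1}
    truth = {(0, 1), (0, 3)}
    print(top_k_precision(scores, truth, 2))   # one of the top two pairs is correct
    print(precision_report(scores, truth, 20))
```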
Target-level accuracy rate of DNCON2_Inter, DRCon_pre, DRCon_alpha and DRCon_true on the Homo_std test dataset
| Predictor | Top 10 (%) | Top L/10 (%) | Top L/5 (%) | Top L (%) |
|---|---|---|---|---|
| DNCON2_Inter | 23.52 | 30.19 | 37.50 | 43.36 |
| DRCon_pre | 58.28 | 63.40 | 67.74 | 73.85 |
| DRCon_alpha | 67.09 | 70.38 | 74.14 | 83.31 |
| DRCon_true | 67.39 | 70.86 | 75.56 | 80.38 |
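The page does not define the target-level accuracy rate. A minimal sketch, assuming it is the percentage of dimers with at least one correct contact among the top-k predictions (an assumption, although it is consistent with the rates rising as k grows); the function names and input layout are hypothetical:

```python
def has_hit(ranked_pairs, true_contacts, k):
    """True if any of the first k ranked pairs is a true contact."""
    return any(pair in true_contacts for pair in ranked_pairs[:k])


def accuracy_rate(targets, k_of):
    """Percentage of targets with at least one correct top-k prediction.

    targets: list of (ranked_pairs, true_contacts, length) tuples, where
             ranked_pairs is sorted by descending predicted probability
    k_of:    function mapping monomer length L to the cutoff k
    ASSUMPTION: a target counts as a success if one hit falls in the top k.
    """
    if not targets:
        return 0.0
    hits = sum(1 for ranked, truth, length in targets
               if has_hit(ranked, truth, k_of(length)))
    return 100.0 * hits / len(targets)


if __name__ == "__main__":
    targets = [
        ([(0, 1), (0, 2)], {(0, 2)}, 10),   # hit at rank 2
        ([(1, 1), (2, 3)], {(5, 5)}, 10),   # no hit
    ]
    print(accuracy_rate(targets, lambda length: max(1, length // 5)))
```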
The interchain contact prediction precision of DRCon, DNCON2_Inter and DeepHomo on the DeepHomo test dataset
| Predictor | Top 10 (%) | Top L/10 (%) | Top L/5 (%) | Top L (%) |
|---|---|---|---|---|
| DRCon | | | | |
| DNCON2_Inter | 7.43 | 7.59 | 7.95 | 7.67 |
| DeepHomo | 43.80 | 38.74 | 34.10 | 21.35 |
Bold font is used to denote the highest precision in terms of each metric.
The target-level accuracy rate of DRCon, DNCON2_Inter and DeepHomo on the DeepHomo test dataset
| Predictor | Top 10 (%) | Top L/10 (%) | Top L/5 (%) | Top L (%) |
|---|---|---|---|---|
| DRCon | | | | 86.69 |
| DNCON2_Inter | 22.01 | 28.44 | 34.40 | 64.22 |
| DeepHomo | 66.67 | 70.77 | 77.17 | |
Bold font is used to denote the highest accuracy rate in terms of each metric.
The precision of DRCon, DeepHomo, Glinter and DNCON2_Inter on the CASP-CAPRI test dataset
| Predictor | Top 10 (%) | Top L/10 (%) | Top L/5 (%) |
|---|---|---|---|
| DRCon_true | | | |
| DRCon_alpha | 40.25 | 36.88 | 31.67 |
| DeepHomo | 26.0 | 23.88 | 20.27 |
| Glinter | 37.5 | 35.16 | 30.61 |
| DNCON2_Inter | 2.36 | 2.33 | 2.39 |
Note: DRCon_true and DeepHomo use the true tertiary structures of monomers in the bound state to extract intrachain contacts as input. DRCon_alpha uses the tertiary structures predicted by AlphaFold2 in the unbound state to extract intrachain contacts as input. Bold font is used to denote the highest precision in terms of each metric.
Fig. 2. Illustrating the effect of contact density on interchain contact prediction precision on the Homo_std test dataset
Fig. 3. (A) The predicted and true contact maps of target 1DR0. The top L/5 predicted contacts (red dots) and true contacts (blue dots) are plotted. Most predicted contacts overlap with the true contacts, indicating a high contact prediction precision. (B) The superimposition of the true quaternary structure (chain A in red and chain B in green) and the predicted quaternary structure (chain A in blue and chain B in orange). The two quaternary structures are quite similar
Fig. 4. Impact of different groups of features on the average top L/5 precision on the Homo_std test dataset