Abstract
Recent spectacular advances by AI programs in predicting 3D structures from protein sequences have revolutionized the field in terms of accuracy and speed. The resulting "folding frenzy" has already produced predicted protein structure databases for the entire human proteome and the proteomes of other organisms. However, a predicted structure's reliability should also be rapidly checked against properties measured in solution. Shape-sensitive hydrodynamic parameters such as the diffusion and sedimentation coefficients ($D^{0}_{t(20,w)}$, $s^{0}_{(20,w)}$) and the intrinsic viscosity ([η]) can provide a rapid assessment of how plausible the overall structure is, and SAXS would yield the structure-related pair-wise distance distribution function p(r) vs. r. Using the extensively validated UltraScan SOlution MOdeler (US-SOMO) suite, a database was implemented that calculates, from AlphaFold structures, the corresponding $D^{0}_{t(20,w)}$, $s^{0}_{(20,w)}$, [η], p(r) vs. r, and other parameters. Circular dichroism spectra were computed using the SESCA program. Some of AlphaFold's drawbacks were mitigated, for example by generating a protein's mature form whenever possible. Others, such as AlphaFold's direct applicability to single-chain structures only, the absence of prosthetic groups, or flexibility issues, are discussed. Overall, this implementation of the US-SOMO-AF database should already aid in rapidly evaluating the consistency in solution of a relevant portion of AlphaFold-predicted protein structures.
Year: 2022 PMID: 35513443 PMCID: PMC9072687 DOI: 10.1038/s41598-022-10607-z
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
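The hydrodynamic parameters named in the abstract are linked by two textbook relations: the Stokes-Einstein equation (diffusion coefficient from the Stokes radius) and the Svedberg equation (sedimentation coefficient from molecular mass and diffusion coefficient). The following is a minimal sketch of those relations only, not the US-SOMO algorithm, which computes the parameters from the atomic structure. The function names, the water viscosity and density values at 20 °C, and the example protein (50 kDa, Stokes radius 3.0 nm, partial specific volume 0.73 cm³/g) are assumptions chosen for illustration.

```python
import math

K_B = 1.380649e-16    # Boltzmann constant, erg/K (CGS units)
N_A = 6.02214076e23   # Avogadro's number, 1/mol
T_20 = 293.15         # 20 degrees C in kelvin
ETA_W20 = 0.01002     # viscosity of water at 20 degrees C, poise (assumed value)
RHO_W20 = 0.99823     # density of water at 20 degrees C, g/cm^3 (assumed value)


def diffusion_from_stokes_radius(r_s_nm: float) -> float:
    """Stokes-Einstein: D = kT / (6 pi eta R_s), returned in cm^2/s."""
    r_s_cm = r_s_nm * 1e-7
    return K_B * T_20 / (6.0 * math.pi * ETA_W20 * r_s_cm)


def sedimentation_from_diffusion(mass_da: float, d_cm2_s: float, vbar: float) -> float:
    """Svedberg: s = M (1 - vbar * rho) D / (R T), returned in seconds."""
    gas_constant = K_B * N_A  # erg/(mol K)
    return mass_da * (1.0 - vbar * RHO_W20) * d_cm2_s / (gas_constant * T_20)


# Hypothetical example protein (not taken from the database).
d = diffusion_from_stokes_radius(3.0)
s = sedimentation_from_diffusion(50_000.0, d, 0.73)
print(f"D = {d:.3e} cm^2/s, s = {s / 1e-13:.2f} S")
```

Because both relations depend on the Stokes radius, a predicted structure that is too compact or too extended shifts the diffusion and sedimentation coefficients in a predictable direction, which is what makes them useful as a quick consistency check.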
Figure 1. Screenshots of the US-SOMO-AF webpage. Shown are the results for AF-P01029-F1, which include the removal of the signal sequence and two propeptides. (a) The upper part, containing text/data information. (b) The bottom part, showing the computed p(r) vs. r distribution and CD spectrum graphs, and the JSmol representation of the structure.
Figure 2. Plots of selected calculated parameters for 41,200 AF-v1 predicted structures with no corresponding entries in the PDB database of solved structures. (a) R vs. M, log–log scale. (b) [η] vs. M, log–log scale. (c) [η] vs. % decreasing mean confidence level, log-lin scale. (d) A 3D plot where M (log scale) is on the vertical Z-axis, and R and [η] are on the horizontal X- and Y-axes, respectively (both linear scales).
Table 1. Comparison between experimental and calculated diffusion ($D^{0}_{t(20,w)}$) and sedimentation ($s^{0}_{(20,w)}$) coefficients for three proteins having a crystallographic structure and a predicted AF-v1 structure.
| Parameter | Experimental | 1AVU.PDB (completed) | % diff. with expt | AF-P01070 (no propeptide) | % diff. with expt | % diff. PDB-AF |
|---|---|---|---|---|---|---|
| $D^{0}_{t(20,w)}$ | 9.47 ± 0.18 | 9.91 | +4.65 | 9.43 | −0.42 | −4.84 |
| $s^{0}_{(20,w)}$ | 2.29 ± n.a. | 2.18 | −4.80 | 2.08 | −9.17 | −4.59 |
The PDB entries had a few missing residues, which were previously added manually [21]; the experimental parameters for all proteins were taken from Ref. [24].
<sup>a</sup> The 2CAB.PDB entry and the AF-P00915 structure differ at one amino acid position, and two other residues are swapped in position; the reported MW is that of the PDB entry.
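The percentage differences in the table above can be reproduced with a signed relative difference. The sketch below assumes the convention (calculated − reference) / reference × 100, which is inferred from the tabulated numbers rather than stated in the text; the function name is hypothetical.

```python
def percent_diff(value: float, reference: float) -> float:
    """Signed percentage difference of `value` relative to `reference`."""
    return 100.0 * (value - reference) / reference


# First data row of the table: experimental 9.47, PDB-based 9.91, AF-based 9.43.
print(round(percent_diff(9.91, 9.47), 2))  # PDB vs. experiment -> 4.65
print(round(percent_diff(9.43, 9.47), 2))  # AF vs. experiment  -> -0.42
print(round(percent_diff(9.43, 9.91), 2))  # AF vs. PDB         -> -4.84
```

Under this convention the last column uses the PDB-based value as the reference, so a negative entry means the AF-based value is the smaller of the two.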
Table 2. Some calculated parameters for a selection of AF-v1 predicted structures with no RCSB PDB counterparts, ordered by decreasing molecular mass.
| UniProt accession | Organism | Mean AF % conf. | Signal peptide | Molecular mass [Da] | $R_g$ [nm] | $R_s$ [nm] | [η] [cm³/g] | Helix % | Sheet % |
|---|---|---|---|---|---|---|---|---|---|
| Q6PGP7<sup>a</sup> | | 86.48 | n/a | 175,523 | 6.98 | 6.30 | 10.4 | 74.5 | 0.5 |
| Q4DE01<sup>b</sup> | | 65.88 | n/a | 102,098 | 3.99 | 5.74 | 12.0 | 6.5 | 23.2 |
| Q9Y5H4<sup>c</sup> | | 75.64 | 1–28 | 98,141 | 8.42 | 6.56 | 23.3 | 9.2 | 25.5 |
| D3ZV97<sup>d</sup> | | 82.81 | 1–20 | 94,123 | 5.55 | 4.76 | 8.93 | 42.8 | 11.2 |
| O88338<sup>e</sup> | | 84.24 | 1–21 | 87,414 | 8.69 | 5.87 | 21.2 | 5.7 | 32.5 |
| Q9LMT9<sup>f</sup> | | 78.02 | 1–26 | 82,090 | 5.16 | 4.66 | 8.96 | 25.4 | 15.2 |
| I1LDW0<sup>g</sup> | | 75.28 | n/a | 73,181 | 2.86 | 4.16 | 6.33 | 32.6 | 8.9 |
| A4I8P1<sup>h</sup> | | 60.77 | n/a | 64,586 | 2.66 | 3.72 | 5.18 | 29.2 | 9.2 |
| Q6PFT0<sup>i</sup> | | 81.28 | n/a | 46,965 | 11.3 | 5.75 | 47.8 | 66.0 | 10.8 |
| Q9VG48<sup>j</sup> | | 88.50 | 1–18 | 44,673 | 2.04 | 2.89 | 3.44 | 38.0 | 9.0 |
| A0A060D4L2<sup>k</sup> | | 68.46 | n/a | 30,921 | 3.90 | 4.06 | 15.7 | 32.2 | 10.4 |
| Q8IJG3<sup>l</sup> | | 69.83 | n/a | 19,460 | 2.05 | 2.57 | 5.66 | 26.4 | 11.9 |
| P08372<sup>m</sup> | | 82.28 | n/a | 12,010 | 2.89 | 2.48 | 10.5 | 44.3 | 21.7 |
| O16446<sup>n</sup> | | 88.31 | 1–19 | 8,483 | 1.18 | 1.66 | 3.46 | 68.9 | 0.0 |
The corresponding structures and calculated p(r) vs. r distributions and CD spectra can be seen in Fig. 3.
<sup>a</sup> Tetratricopeptide repeat protein 37.
<sup>b</sup> Trans-sialidase, putative.
<sup>c</sup> Protocadherin gamma-a1.
<sup>d</sup> Vomeronasal 2 receptor, 50.
<sup>e</sup> Cadherin-16.
<sup>f</sup> Putative wall-associated receptor kinase-like 13.
<sup>g</sup> Aminotran_5 domain-containing protein.
<sup>h</sup> Adenosine deaminase-like protein.
<sup>i</sup> Flotillin.
<sup>j</sup> Lipase.
<sup>k</sup> BHLH transcription factor.
<sup>l</sup> RNA-binding protein, putative.
<sup>m</sup> Prepilin peptidase-dependent protein C.
<sup>n</sup> Uncharacterized protein.
Figure 3. JSmol snapshots of the structures for the entries reported in Table 2, together with the calculated p(r) vs. r and CD plots.
Figure 4. SAXS-derived p(r) vs. r curves compared with those calculated from AF and RCSB PDB structures. (a–f) Protein source and names, and the SASBDB, AF (UniProt) and RCSB PDB accession numbers for each entry, are indicated in the boxes within each panel. In all panels the experimentally derived and the AF-calculated p(r) vs. r are black and red lines, respectively. Additional SAXS-derived and AF-calculated p(r) vs. r present in (c,f) are blue and magenta lines, respectively. Additional PDB-calculated p(r) vs. r (green lines) are present in (c,d).
Figure 5. Calculated p(r) vs. r distributions for the 100 conformations generated in the DMD run on the AF-predicted O88338 structure.
Figure 6. Histograms of the calculated parameters for the MMC-generated conformations of three AF-predicted structures from Table 2. Shown are the distributions of R/R (a,c,e) and of [η] (b,d,f) calculated for AF-Q4DE01 (16,520 conformations, (a,b)), AF-A0A060D4L2 (16,666 conformations, (c,d)), and AF-Q8IJG3 (16,367 conformations, (e,f)). In each panel, the vertical green line marks the value for the starting structure, while the vertical solid and dashed red lines indicate the average ± SD over all conformations (the actual values are reported in the legend inside each panel).
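The p(r) vs. r distributions shown in several of the figures are, in essence, histograms of the distances between all pairs of scattering centers in a structure. The sketch below illustrates that idea with an unweighted histogram over a list of 3D coordinates. It is an assumption-laden simplification and not the US-SOMO procedure: a real SAXS p(r) weights each pair by scattering contrast, and the function name, bin width and toy coordinates here are hypothetical.

```python
import math
from itertools import combinations


def pair_distance_histogram(coords, bin_width):
    """Return (bin_centers, fractions) for all pairwise distances in `coords`.

    Every pair carries equal weight (an assumed simplification); the returned
    fractions sum to 1.
    """
    if len(coords) < 2:
        raise ValueError("need at least two points to form a pair")
    distances = [math.dist(a, b) for a, b in combinations(coords, 2)]
    n_bins = int(max(distances) // bin_width) + 1
    counts = [0] * n_bins
    for d in distances:
        counts[int(d // bin_width)] += 1
    centers = [(i + 0.5) * bin_width for i in range(n_bins)]
    return centers, [c / len(distances) for c in counts]


# Toy example: three collinear points, giving pair distances of 3, 7 and 10.
toy_coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
centers, fractions = pair_distance_histogram(toy_coords, bin_width=5.0)
for r, p in zip(centers, fractions):
    print(f"r = {r:4.1f}  p = {p:.3f}")
```

The largest populated bin approximates the maximum dimension of the structure, which is why elongated or flexible conformations show up as a longer tail in p(r).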