Literature DB >> 34462960

Modeling SARS-CoV-2 proteins in the CASP-commons experiment.

Andriy Kryshtafovych1, John Moult2, Wendy M Billings3, Dennis Della Corte3, Krzysztof Fidelis1, Sohee Kwon4, Kliment Olechnovič5, Chaok Seok4, Česlovas Venclovas5, Jonghun Won4.   

Abstract

Critical Assessment of Structure Prediction (CASP) is an organization aimed at advancing the state of the art in computing protein structure from sequence. In the spring of 2020, CASP launched a community project to compute the structures of the most structurally challenging proteins coded for in the SARS-CoV-2 genome. Forty-seven research groups submitted over 3000 three-dimensional models and 700 sets of accuracy estimates on 10 proteins. The resulting models were released to the public. CASP community members also worked together to provide estimates of local and global accuracy and identify structure-based domain boundaries for some proteins. Subsequently, two of these structures (ORF3a and ORF8) have been solved experimentally, allowing assessment of both model quality and the accuracy estimates. Models from the AlphaFold2 group were found to have good agreement with the experimental structures, with main chain GDT_TS accuracy scores ranging from 63 (a correct topology) to 87 (competitive with experiment).
© 2021 Wiley Periodicals LLC.

Entities:  

Keywords:  CASP; COVID; EMA; SARS-CoV-2; model accuracy; protein structure prediction

Mesh:

Substances:

Year:  2021        PMID: 34462960      PMCID: PMC8616790          DOI: 10.1002/prot.26231

Source DB:  PubMed          Journal:  Proteins        ISSN: 0887-3585


Abbreviations: CASP, Critical Assessment of Structure Prediction; CASP-COVID, CASP community-wide experiment on modeling SARS-CoV-2 proteins causing the coronavirus disease; EMA, estimates of model accuracy; FM, free modeling; GDT_TS, Global Distance Test-Total Score; LDDT, Local Distance Difference Test; LGA, Local-Global Alignment program; SARS-CoV-2, severe acute respiratory syndrome coronavirus 2; TBM, template-based modeling; ULR, Unreliable Local Regions.

INTRODUCTION

The advent of the COVID-19 crisis spurred major efforts to combat the disease from biologists all over the world. Key to understanding many aspects of the disease mechanism is knowledge of protein structure. Experimental research groups have devoted major effort to this task, but progress has been necessarily slow, and more than 2300 amino acids in the severe acute respiratory syndrome coronavirus 2 (SARS2) proteins still have no experimental structural coverage. Computed protein structure, while until recently not as accurate as experiment, can nevertheless provide models that may aid in the choice of drug targets, the development of vaccine strategies, and insights into viral mechanisms. Early in the pandemic, a number of leading structure modeling research groups, including SWISSMODEL (https://swissmodel.expasy.org/repository/species/2697049), AlphaFold (https://deepmind.com/research/open-source/computational-predictions-of-protein-structures-associated-with-COVID-19), Baker (https://www.ipd.uw.edu/2020/02/rosettas-role-in-fighting-coronavirus), Zhang (https://zhanglab.ccmb.med.umich.edu/COVID-19), Feig (https://github.com/feiglab/sars-cov-2-proteins), and the Xu group, produced sets of computed structures of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) proteins. Because of earlier experimental work on other viruses, particularly SARS, homologous structures are available for the majority of SARS-CoV-2 proteins, so that useful models can be produced with straightforward template-based methods. The Critical Assessment of Structure Prediction (CASP) initiative engaged the broader modeling community with the aim of producing the best possible structures for the more demanding cases, those without detectable homology to experimentally determined structures, where a community effort was likely to have the most impact.
The strategy for this CASP community-wide experiment on modeling SARS-CoV-2 proteins causing the coronavirus disease (CASP-COVID) was to collect models from as many modeling groups as possible and to also solicit community input on evaluating the accuracy of those models, so as to provide the scientific community with the most accurate structures currently possible. The strategy built on three things: the existence of a closely knit CASP modeling community, extensive previous CASP results on the reliability of modeling and accuracy estimation methods, and the CASP infrastructure. The CASP-COVID experiment was started on March 9, 2020. The experiment proceeded through six stages, followed by discussion of the results at the CASP14 conference in December 2020. The stages were as follows: (1) selection of targets and their analysis, (2) call for three-dimensional (3D) models, (3) call for accuracy estimates of the models, (4) community discussion of the initial results, (5) call for revised and refined models and accuracy estimates, and (6) re-release of some targets in CASP14, allowing thorough comparison of models with new experimental data. In addition, there was a post-CASP follow-up to further assess the effectiveness of estimates of model accuracy (EMA) methods. There was a strong community response to the call for CASP-COVID participation, with 47 research groups submitting models using a total of 53 3D modeling approaches and 30 accuracy estimation approaches. All groups who submitted at least five models to CASP-COVID and submitted an abstract to the CASP14 Abstract book (or had a documented history of participation in CASP) were invited to contribute their method description to this article.

RESULTS

Selection of targets and their analysis

The CASP organizers analyzed 29 proteins coded for by the SARS-CoV-2 genome and identified 10 for which part or all of the sequence did not have reliable homologs in the structural database. These were selected as CASP-COVID targets. Table S1 shows graphical representations of the HHsearch sequence searches against the structural database for the selected targets. The targets were analyzed to identify predicted secondary structure and domain composition, disordered regions, transmembrane regions, and signal peptides. The results of the analysis were posted on the CASP-Commons web site https://predictioncenter.org/caspcommons/target_analysis.cgi. Target sequence information was also posted at https://predictioncenter.org/caspcommons/targetlist.cgi. Participants were asked to return their models in 3 weeks.

3D Structures

Over 1500 3D models were submitted in the first CASP-COVID round. Those included models from the most capable research groups as previously assessed in CASP. Method descriptions provided by authors of this article are available in the Supporting Information ("TS methods" file). The full list of participants and associated statistics are at https://predictioncenter.org/caspcommons/groups_info.cgi. All collected models were posted at the Prediction Center Data Archive site https://predictioncenter.org/download_area/CASPCOMMONS/2020_COVID‐19/ immediately after closing the first round of submissions. The models were analyzed for structural consensus based on the average pairwise global and local LDDT and GDT_TS scores. The results of the analysis allowed identification of consensus regions of structure and of groups with structurally similar models. For example, for the SARS-CoV-2 M-protein (target C1906), high local consensus scores in region 1–105 (marked with the black box in Figure 1) suggested the protein has two domains, and that a split into two domain-level targets in Round 2 of the experiment might assist modeling.
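The per-model consensus score described above can be sketched as follows. This is an illustrative reading, not the exact CASP implementation: it assumes the all-against-all structural comparisons (LDDT or GDT_TS) have already been computed into a symmetric similarity matrix, and simply averages each model's similarity to all the others.

```python
import numpy as np

def consensus_scores(sim: np.ndarray) -> np.ndarray:
    """Average pairwise similarity of each model to all other models.

    sim[i, j] is a hypothetical pairwise structural similarity score
    (e.g., LDDT or GDT_TS on a 0-1 scale) between models i and j.
    Self-comparisons (the diagonal) are excluded from the average.
    """
    n = sim.shape[0]
    return (sim.sum(axis=1) - np.diag(sim)) / (n - 1)

# Toy 3-model similarity matrix; model 0 agrees most with the rest.
sim = np.array([[1.0, 0.8, 0.7],
                [0.8, 1.0, 0.6],
                [0.7, 0.6, 1.0]])
print(consensus_scores(sim))
```

High consensus over a contiguous sequence region (as in the 1–105 region of C1906) then suggests a domain that most methods model consistently.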
FIGURE 1

Screenshot of the model consensus table (https://predictioncenter.org/caspcommons/models_consensus2.cgi) for the SARS‐CoV‐2 M‐protein (target C1906) showing local structural agreement along the sequence of the selected model (second column) with the remaining models. The black box shows the region where many models agree, suggesting a relatively easy to model domain

Community‐wide discussion of the results and second round of modeling

Following the first round of modeling, the community discussed the results in two Zoom conferences and in a group chat on Microsoft Teams. Consensus analyses helped identify consistent domain boundaries within the targets, which were used in the second modeling round. Community members also discussed features of the models, such as membrane regions and signal peptides, that could help guide the next stage of modeling. The second round ran for 2 weeks in May 2020, immediately before the start of the regular CASP14 experiment. The round consisted of 15 domain-level targets derived from the Round 1 analysis and seven first-round targets re-released for prediction. Thirty-three groups submitted over 1500 3D models, which were again made public immediately after the deadline. Second-round models underwent the same evaluation procedure as those from Round 1.

Accuracy estimates

Each of the submitted models in both rounds of modeling was evaluated by accuracy estimation methods developed by the CASP community. Overall, 32 EMA methods were used. The list of participating methods and brief descriptions are provided in the "EMA methods" Supporting Information file. All submitted accuracy estimates are available at https://predictioncenter.org/caspcommons/models_QAresults.cgi. The overall goal of this step was to identify the best models for each target and to estimate their accuracy. This was the first time CASP had addressed this non-trivial task in a real-life situation. Previous regular CASP experiments have shown that EMA methods are overall effective at ranking models by accuracy, but even the best-performing methods cannot identify the most accurate models for all targets. The CASP-COVID results showed surprisingly high variation in model rankings: for no target was there unanimous agreement on the best 3D model. Rather, for most targets over 10 distinct models were selected as the best (Table STQA1), creating a problem in recommending which model should be used. To address this issue, the Venclovas group devised a new EMA-jury algorithm that identifies which models were most favored by the EMA methods. The algorithm is described in detail in the Supporting Information. Briefly, the method pools the top 1, top 2, …, top 10 models selected by each EMA ranking into 10 corresponding supersets. If a model is selected by more than one EMA method, it is included multiple times, thus receiving more weight. A consensus structural similarity score is calculated for every model in each superset as the average of CAD-scores from the model's pairwise comparisons with the other models in the superset (Figure SFQA1). The maximum of the superset-specific consensus scores for a model is recorded as its EMA-jury consensus score.
Note that the EMA‐jury consensus score quantifies how typical the structure of a model is among the top selections made by the EMA methods rather than the expected level of its structural similarity to the native structure (as individual EMAs do). The EMA‐jury scores together with two additional refinement criteria described in the Supporting Information are used for the final selection of models that are most strongly supported by the EMA methods (Table STQA2). Comparison of the EMA‐jury scores with the overall consensus scores computed on full sets of models for each CASP‐COVID target shows that the EMA‐jury method always selects a subset of models that are more structurally similar within the subset than overall (Figure 2). This indicates that individual EMA rankings are not random and often agree in favoring some structural features.
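The superset-pooling step described above can be sketched as follows. This is a simplified reading of the algorithm under stated assumptions (the authoritative version is in the Supporting Information): `rankings` holds each EMA method's ordered model picks, `sim` is a hypothetical precomputed pairwise CAD-score matrix, and a model is compared against every other member of the superset, duplicates included.

```python
import numpy as np

def ema_jury(rankings, sim, max_top=10):
    """EMA-jury sketch: for N = 1..max_top, pool the top-N picks of every
    EMA method into a superset (duplicates kept, so models chosen by
    several methods carry more weight), score each pooled model by its
    average similarity to the other superset members, and record the
    maximum over N as that model's EMA-jury consensus score."""
    jury = np.zeros(sim.shape[0])
    for top_n in range(1, max_top + 1):
        superset = [m for rank in rankings for m in rank[:top_n]]
        for m in set(superset):
            others = [x for x in superset if x != m]
            if others:
                score = np.mean([sim[m, x] for x in others])
                jury[m] = max(jury[m], score)
    return jury

# Toy example: two EMA methods ranking three models, with an invented
# pairwise similarity matrix standing in for all-against-all CAD-scores.
rankings = [[0, 1, 2], [0, 2, 1]]
sim = np.array([[1.0, 0.8, 0.7],
                [0.8, 1.0, 0.6],
                [0.7, 0.6, 1.0]])
print(ema_jury(rankings, sim, max_top=2))
```

Model 0, picked first by both methods, ends up with the highest jury score in this toy case, illustrating how agreement between EMA rankings is rewarded.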
FIGURE 2

Maximum consensus scores on CASP‐COVID targets (EMA‐jury—gray bars; overall consensus—black). Targets are ordered by increasing EMA‐jury values. The gray bars are always longer than black ones, indicating that the EMA‐jury method successfully selects subsets of models that are more structurally consistent. The vertical dashed line corresponds to the consensus level of 0.6, which represents the 100th percentile of overall consensus scores for all models (Figure SFQA4). CASP, Critical Assessment of Structure Prediction; CASP‐COVID, CASP community‐wide experiment on modeling SARS‐CoV‐2 proteins causing the coronavirus disease; EMA, estimates of model accuracy.

The EMA-jury algorithm was also run using the LDDT scoring function (instead of CAD-score). The results are presented in the Supporting Information (Figures SFQA2 and SFQA3, and Table STQA3). They are very similar to the CAD-score-based results, with 84% of the selected CASP-COVID models being the same and at least one model in common for every target. To assess the effectiveness of the EMA-jury method, we evaluated its ability to select the best available model from a set of models. Such an analysis requires knowing the actual accuracy of models with respect to the target structure. Since only two CASP-COVID targets have been solved so far, we tested the EMA-jury on the CASP13 set of server models (almost 11,000 models on 80 targets). Figure 3 shows that the EMA-jury very often picks the best or nearly the best model, and that the EMA-jury selection is better than simple consensus-based selection. The mean score of the EMA-jury-selected models (0.622) is just slightly behind the mean of the maximum CAD-scores of CASP13 models (0.640) and better than the mean score of models selected with simple consensus (0.574). The average Z-score (calculated from the distribution of individual EMA scores) of jury-selected models stands at 1.67, almost twice the value of the average simple-consensus Z-score (0.87).
Notably, the advantage of the EMA-jury over simple consensus becomes even more pronounced on harder modeling targets. For example, the average EMA-jury Z-score grows from 1.67 on all CASP13 targets to 2.02 on FM targets, while the corresponding numbers for simple consensus trend downward: 0.87→0.75. Similar tendencies are observed when analyzing the LDDT-based results (Figure SFQA5).
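The per-target Z-scores compared above can be computed along these lines; this is a generic sketch (the exact reference distribution used in the assessment may differ), where a selected model's score is standardized against the score distribution of all models for the same target.

```python
import statistics

def selection_z_score(selected_score: float, all_scores: list[float]) -> float:
    """Z-score of a selected model's accuracy score relative to the
    distribution of scores of all models for the same target.
    Uses the sample standard deviation; the official CASP analysis
    may standardize differently."""
    mu = statistics.mean(all_scores)
    sigma = statistics.stdev(all_scores)
    return (selected_score - mu) / sigma

# Hypothetical per-target scores: picking the best of five models.
print(selection_z_score(5.0, [1.0, 2.0, 3.0, 4.0, 5.0]))
```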
FIGURE 3

Selection of the top model by the estimates of model accuracy (EMA)‐jury (top panel) and simple structural consensus (bottom panel) on 80 CASP13 targets. Maximum per‐target CAD‐scores are shown as pointing up triangles; the CAD‐scores of models selected by the EMA‐jury approach (top) and simple structural consensus method (bottom) are shown as pointing down triangles. The hardest to predict targets (FM) are in red, others in green. Vertical lines between the corresponding triangles represent the error in the selection process. Comparison of the top and bottom panels demonstrates that the EMA‐jury method selects models closer to the best absolute value more often than the simple consensus

Evaluation of ORF3a and ORF8 models

Structures of two CASP-COVID proteins, ORF3a (target ID: C1905) and ORF8 (target ID: C1908), were experimentally solved by the start of the CASP14 conference, allowing full CASP evaluation of the accuracy of the corresponding models against experimental structures. Full-length sequences of both solved targets were released for modeling in both rounds of CASP-COVID, and ORF3a was additionally released in the second round as domain targets C1905-D1 and C1905-D2. Independently, ORF8 was also released in the CASP14 experiment as target T1064. The numbers of 3D models and EMA estimates collected in the CASP-COVID experiment are summarized in Table 1.
TABLE 1

The number of 3D models and accuracy estimates in the CASP‐COVID experiment for ORF3a and ORF8

                                        ORF3a                              ORF8
CASP-COVID target ID           C1905     C1905-D1   C1905-D2     C1908
No. 3D models (GDT_TS ≥ 40)    153 (6)   83 (38)    79 (0)       181 (0)
No. EMA submissions            30        19         19           29

Note: Numbers in parentheses show the number of high‐accuracy models. ORF3a was treated as one target in the first round of CASP‐COVID (C1905) and as two separate domains in Round 2 (C1905‐D1, C1905‐D2).

Since there was no significant accuracy improvement in models submitted on full-length targets in the second round, we report only the first-round results for those.

Post‐CASP EMA experiment

From the CASP-COVID and CASP14 evaluation of the ORF3a and ORF8 targets, it was immediately apparent that models from DeepMind's AlphaFold2 group were by far the most accurate, consistent with the broader CASP14 results. An interesting question was whether accuracy estimation methods could recognize the high accuracy of these models. However, it was impossible to answer this question with the available data alone: AlphaFold2 did not submit models in the second round of CASP-COVID (thus no domain-based models for ORF3a), nor did they submit ORF8 models to CASP-COVID (only to CASP14). To adjust for that, we added five AlphaFold models to each of the three CASP-COVID model sets. For ORF8, we added the AlphaFold2 (AF2) models submitted on the CASP14 T1064 target. For the ORF3a domains, we added the AlphaFold models submitted to CASP-COVID (AF-COV), split a posteriori into domains. Additional accuracy estimates on the added AlphaFold models were solicited from the authors of 10 EMA methods established in CASP. We discuss here the results for the four (out of these 10) that participated in both CASP-COVID and CASP14: ModFOLD8_rank, ProQ3D, VoroMQA-dark, and QMEANDisCo. The overall conclusions do not change when all 10 post-CASP EMA methods are included. This analysis, aimed at determining whether accuracy estimation methods were able to recognize the high accuracy of the AlphaFold models of the two CASP-COVID targets, is referred to here as the post-CASP EMA.

Results for ORF3a (C1905)

Round 1 results: models of the full structure

Among the first-round 3D models of the full structure, only six models have GDT_TS scores above 40 (green crosses in Figure 4A). Five of these models are from AlphaFold (with accuracy ranging from 45 to 59 GDT_TS), and the sixth is from FEIGLAB-R, who attempted to refine an AlphaFold model, resulting in a lower (worse) GDT_TS score of 42. The six top models are all monomeric, while the experimental ORF3a structure is dimeric. Overall, the best AlphaFold model (AF-COV_2, GDT_TS = 59) correctly reproduces ORF3a's fold (Figure SFQA6a), but the loops and the orientation of helices around the dimeric interface are less accurate: the average per-residue distance error (as calculated from the optimal LGA model-target superposition) is 3.9 Å for the whole structure and 4.6 Å for the interface region.
FIGURE 4

Round 1 three‐dimensional (3D) and accuracy estimation results for SARS2 ORF3a (C1905). (A) Each green cross represents a 3D model, black squares indicate models selected as high accuracy by accuracy estimation methods, and orange circles indicate models selected by the estimates of model accuracy (EMA)‐Jury method. 3D model accuracy is shown in terms of LDDT (y‐axis) and GDT_TS (x‐axis). Only one accuracy estimation method selected a higher accuracy model. (B) Locally inaccurate regions of the highest‐scoring model, AF‐COV_2, according to the ULR definition (left) and as predicted for the same model by the BAKER EMA method (right). The superpositions are identical; the crystal structure is in yellow, ULRs and predicted inaccurate regions are in red and the rest of the model in green

In terms of global EMA, BAKER was the only group that selected a reasonable model (GDT_TS > 40) as top 1. However, it was the sixth-ranked model with a GDT_TS of 42 rather than the most accurate model with a GDT_TS of 59. Other EMA methods selected a number of much less accurate models (black squares at low LDDT and GDT_TS), including the EMA-jury method (orange circles), which by its nature selects models preferred by the majority of individual EMAs. In the evaluation of local accuracy in the post-CASP EMA, the ProQ3D group had the best average results, with an ASE score of 85.4 (ASE, Assessment of S-function Errors; see the EMA assessment paper), an AUC of 0.86 (AUC, Area Under the ROC Curve of the prediction of accurate/inaccurate residues), and a ULR-F1 score of 0.4 (ULR-F1, the F1-score on Unreliable Local Regions) for the best submitted model AF_2 (C1905TS156_2). AlphaFold's self-estimate of per-residue distance errors was worse than the results of ProQ3D, scoring an ASE of 72.7, an AUC of 0.78, and a ULR-F1 of 0.0.
The BAKER local EMA method was able to identify part of the ULRs in the beta-sheet domain (actual ULRs: 163–198 and 219–235; predicted ULRs: 163–199 and 214–238), but the ULRs in the alpha-helical domain were identified less precisely (actual ULRs: 40–48, 51–55, and 102–104; predicted ULRs: 40–43, 62–68, and 99–101), as illustrated in Figure 4B. ULRs are defined as regions of three or more sequential model residues deviating by more than 3.8 Å from the corresponding target residues in the optimal superposition on the crystal structure.
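The ULR definition above maps directly onto a small scanning routine; this is an illustrative sketch of that definition (per-residue deviations after superposition are assumed as input), not the assessors' actual code.

```python
def find_ulrs(deviation, cutoff=3.8, min_len=3):
    """Identify Unreliable Local Regions: runs of at least `min_len`
    consecutive residues whose deviation from the experimental structure
    (after optimal superposition) exceeds `cutoff` Å.

    `deviation` is a per-residue distance list, residues numbered from 1.
    Returns inclusive (start, end) residue ranges.
    """
    ulrs = []
    start = None
    for i, d in enumerate(deviation, start=1):
        if d > cutoff:
            if start is None:
                start = i  # open a candidate region
        else:
            if start is not None and i - start >= min_len:
                ulrs.append((start, i - 1))  # close a long-enough run
            start = None
    # Handle a run that extends to the last residue.
    if start is not None and len(deviation) - start + 1 >= min_len:
        ulrs.append((start, len(deviation)))
    return ulrs

# Example: residues 3-6 deviate by more than 3.8 Å.
print(find_ulrs([1.0, 2.5, 4.1, 5.0, 6.2, 4.5, 2.0, 1.5]))
```

Runs shorter than three residues (e.g., two isolated bad residues) are discarded, matching the "three or more sequential residues" clause.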

Round 2 results: prediction of the domain structures

Figure 5 shows the accuracy distribution of the CASP-COVID second-round models for the two domains of ORF3a separately. The domain structures of the AF-COV models submitted in the first round are included (pink stars) and are substantially more accurate than those from other groups, especially for Domain 2.
FIGURE 5

Round 2 3D and accuracy estimation results for two domains of SARS‐CoV‐2 ORF3a protein (A) C1905‐D1 and (B) C1905‐D2. 3D model accuracy is shown in terms of LDDT (y‐axis) and GDT_TS (x‐axis) (green crosses). The panels show both models from CASP‐COVID and AF‐COV models added in the post‐CASP EMA experiment (pink stars). The models selected by EMA methods as top1 during CASP‐COVID are shown as black hollow squares; models selected in the post‐CASP experiment are in pink hollow squares. For Domain 1, three out of four EMA groups selected one of the higher accuracy AlphaFold models, with many low accuracy models also selected. There is a similar pattern for Domain 2, where two of four methods picked two different AlphaFold models

In the post-CASP experiment, three and two out of four EMA groups picked an AF-COV model as top 1 for Domains 1 and 2, respectively (pink squares in Figure 5A,B). Although some EMA groups could discriminate the AF-COV models from the others, no group was successful in predicting the correct ranking within the five AF-COV models, although these models are very close in accuracy.

Results for ORF8 (C1908)

For ORF8, no high-accuracy models were submitted during CASP-COVID (maximum GDT_TS = 26, with AlphaFold not participating; see green crosses in Figure 6). The protein was re-released in the regular CASP14 experiment as target T1064 (without the 15 N-terminal residues corresponding to a signal peptide, a feature that almost all CASP-COVID participants ignored and one cause of the poor models). The AlphaFold2 group submitted five high-accuracy predictions for this target. These models (ranging from 64 to 87 GDT_TS) were added to the pool of models for the post-CASP analysis (pink stars in Figure 6). The crystal structure of ORF8 was solved as a covalent dimer, while the AlphaFold models were monomeric. Despite this, the best monomeric model possesses some important structural features needed for forming the dimeric assembly. In particular, the model correctly reproduces the side-chain orientation of the cysteine involved in the covalent chain linkage (Figure SFQA6b). The average per-residue distance error is similar for the whole structure (1.25 Å) and for the interface region (1.46 Å).
FIGURE 6

Round 1 3D modeling and accuracy estimation (EMA) results for SARS‐CoV‐2 protein ORF8 (C1908). 3D model accuracy for submissions in terms of LDDT (y‐axis) and GDT_TS (x‐axis) (green crosses) and EMA selections (black squares for CASP‐COVID, pink squares for post‐CASP experiment, orange circles for EMA‐Jury). Five AF2 models added in the post‐CASP experiment are shown as pink stars. Two of the AF2 models are impressively accurate. Two post‐CASP EMA methods succeeded in selecting those models as best

In global accuracy estimation, only VoroMQA-dark could identify the AF2 models as superior to the others (pink squares in Figure 6). However, this method did not capture the large difference in absolute model quality (as quantified by GDT_TS). For example, VoroMQA-dark assigned the best AF2 model (AF2_1, GDT_TS = 87) a global EMA score of 67 (on the 0–100 scale), while some models by other groups with GDT_TS < 20 were assigned a relatively high EMA score of 50+ (all scores are for ORF8 without the signal peptide). It should be noted that VoroMQA-dark has a narrow range of values, so that a difference of 10+ may indicate a substantial difference in model accuracy. In the evaluation of local accuracy in the post-CASP EMA, the best results were again shown by ProQ3D, with an ASE of 88.5, an AUC of 0.89, and a perfect ULR-F1 score of 1.0 for the AlphaFold2 model AF2_1 (CASP14 id: T1064TS427_1). AlphaFold2's self-estimate of per-residue distance errors was comparable to or better than the results of the best EMA method, scoring an ASE of 92.7, an AUC of 0.96, and a ULR-F1 of 1.0. All AlphaFold2 models showed local structural differences from the experiment near residues 60–86, which are involved in a crystal contact (Figure SFQA6c), and residues 104–110, which have a high crystallographic B-factor of ~70 (Figure SFQA6d). ProQ3D could identify the structural deviations in these two loop regions of the AF2 models with high accuracy, scoring 0.78 on the ULR-F1 measure.
GraphQA also performed well, with an average ULR-F1 score of 0.68, while AlphaFold2's self-assessment scored 0.47. On the other hand, it is not clear that the models actually have errors in either of these regions; the deviations may instead reflect a crystal-contact artifact and a crystallographic error, respectively. It is possible that the EMA methods are predicting relatively flexible regions of the polypeptide rather than model errors.
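The ULR-F1 values quoted in this section can be understood with a residue-level sketch of the metric. This is one plausible reading under stated assumptions (true/predicted ULRs compared residue by residue; the authoritative definition is in the EMA assessment paper cited in the text), applied here to the ORF3a beta-sheet-domain ULRs quoted earlier.

```python
def ulr_f1(actual, predicted):
    """Residue-level F1 between actual and predicted ULRs.

    Both arguments are lists of inclusive (start, end) residue ranges;
    a residue is a true positive when it lies inside a range in both
    lists. Illustrative sketch, not the official CASP implementation.
    """
    def to_residues(ranges):
        return {r for s, e in ranges for r in range(s, e + 1)}

    a, p = to_residues(actual), to_residues(predicted)
    tp = len(a & p)
    if tp == 0:
        return 0.0
    precision = tp / len(p)
    recall = tp / len(a)
    return 2 * precision * recall / (precision + recall)

# Beta-sheet-domain ULRs from the text: actual vs BAKER's predictions.
print(round(ulr_f1([(163, 198), (219, 235)], [(163, 199), (214, 238)]), 2))
```

On this per-residue reading, BAKER's beta-sheet-domain predictions score very highly, consistent with the text's statement that this domain's ULRs were identified well while the alpha-helical domain's were not.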

DISCUSSION

The central goal of CASP is to make assessment of both 3D modeling methods and accuracy estimation methods as rigorous as possible, by using a blind prediction system and comparison with experiment. In doing so over 14 rounds, CASP has built a strong community. Further, recent advances in modeling methods show the field has advanced to the point where taking on the most challenging structures should yield useful results. In the past, CASP has also found that properly balanced consensus models can achieve higher accuracy than any of the contributing models. So, there was an obvious appeal to drawing on this community resource to address one aspect of the COVID-19 emergency. Indeed, there was a very enthusiastic response and participation from the CASP community. From a more pragmatic point of view, the CASP-COVID modeling initiative also provided a different, real-world application of the modeling methods. Although CASP strives to be as realistic as possible, assessment is done with knowledge of the experimental answers. What can be done when the goal is to generate useful information from models? Since we do not yet know most of the experimental structures of the target proteins, conventional CASP analysis is limited to just two targets. In both cases, correct folds were produced by just one group, AlphaFold2. Based on the most recent CASP14 results, we expect better performance overall, with at least the majority of the folds correctly predicted by multiple groups. We will have to wait for more experimental results to see if that is true. The most difficult task in generating recommended models turned out to be estimating the relative accuracy and, beyond that, the absolute accuracy of the submissions. CASP has nurtured the development of accuracy estimation methods for more than a decade, and assessment against experiment has shown impressive progress, with apparently very useful outcomes.
However, in the absence of experimental ground truth, the initial focus was on agreement between methods, and this was low. In turn, this prompted the development of a new method for obtaining consensus accuracy estimates. In spite of these limitations, overall we regard the experiment as a success, both in terms of bringing the community together to tackle an urgent problem and in producing a set of potentially useful models. As noted above, it was also valuable in drawing attention to issues in real-world use that were not apparent in the standard CASP environment. It also once again demonstrated the value of community science. The experiment was particularly impactful for undergraduate students just beginning in the field, as they were able to better understand the role of their research in a broader scientific context and its potential for benefiting society at large.

AUTHOR CONTRIBUTIONS

Major contributions to the main text of the article: Idea, organization, article concept, coordination, and editing—Andriy Kryshtafovych, John Moult, and Krzysztof Fidelis. Abstract, Section 1, Section 2, and Section 3—Andriy Kryshtafovych and John Moult. Section 2.4—Kliment Olechnovič and Česlovas Venclovas. Sections 2.7-2.8—Sohee Kwon, Jonghun Won, and Chaok Seok. Section 1 and Section 3—Wendy M. Billings and Dennis Della Corte. All other authors contributed to the article by providing descriptions of their methods for the Supporting Information.

PEER REVIEW

The peer review history for this article is available at https://publons.com/publon/10.1002/prot.26231.

SUPPORTING INFORMATION

Appendix S1. Supporting Information.

Appendix S2. Supporting Information.

Table S1. Results of the HHsearch runs versus structures in the PDB.

Figure SFQA1. "Top N" consensus CAD-score values calculated for different values of N when running the EMA-jury algorithm. Each line represents a model. Thick red lines indicate the models that were selected by the EMA-jury algorithm.

Figure SFQA2. "Top N" consensus LDDT values calculated for different values of N when running the EMA-jury algorithm. Each line represents a model. Thick red lines indicate the models that were selected by the EMA-jury algorithm.

Figure SFQA3. Maximum consensus scores (simple and selection-influenced) achieved for each target, using LDDT as the pairwise structural comparison method. Targets are ordered by the selection-influenced values.

Figure SFQA4. Histograms of simple global consensus scores. A simple global consensus score is the average similarity of a model when compared to all the other models of the same target.

Figure SFQA5. Selection of the top model by the EMA-jury (top panel) and simple structural consensus (bottom panel) on 80 CASP13 targets. Maximum per-target LDDT scores are shown as upward-pointing triangles; the LDDT scores of models selected by the EMA-jury approach (top) and the simple structural consensus method (bottom) are shown as downward-pointing triangles. The hardest-to-predict targets (free modeling) are in red, others in green. Vertical lines between the corresponding triangles represent the error of the selection process. Visual comparison of the top and bottom panels demonstrates that the EMA-jury method selects models closer to the best absolute value more often than the simple consensus.

Figure SFQA6. (A) Structure of the best AlphaFold CASP-COVID model aligned to the dimeric crystal structure of target C1905 (ORF3a, PDB ID 6xdc). Two copies of the monomeric model (pink and cyan) are independently aligned to different chains of the reference structure with UCSF Chimera. (B) Structure of the best AlphaFold2 model aligned to the dimeric crystal structure of target C1908/T1064 (ORF8, PDB ID 7jtl). The figure coloring is similar to panel A. The cysteine residues involved in covalent linkage are shown as balls and sticks. (C) Top: crystal structure of ORF8 (chains in black and gray) showing the crystal contact region. Bottom: five AlphaFold2 models (cyan) aligned to one of the chains (gray), with the crystal contact region (residues 60-86) highlighted in orange. (D) ORF8 crystal structure colored according to the B-factor coloring scale. The crystal contact region (residues 60-86) and the high B-factor region (residues 104-110) are encircled.

Table STQA1. Disagreements between all available EMA methods when selecting top models for every target.

Table STQA2. Models selected by the EMA-jury algorithm for each target using CAD-score as the pairwise structural comparison method (models with an EMA-jury score > 0.6 are colored green, red otherwise).

Table STQA3. Models selected by the EMA-jury algorithm for each target using LDDT as the pairwise structural comparison method (models with an EMA-jury score > 0.6 are colored green, red otherwise).
References: 53 in total

1.  LGA: A method for finding 3D similarities in protein structures.

Authors:  Adam Zemla
Journal:  Nucleic Acids Res       Date:  2003-07-01       Impact factor: 16.971

2.  Assessment of template based protein structure predictions in CASP9.

Authors:  Valerio Mariani; Florian Kiefer; Tobias Schmidt; Juergen Haas; Torsten Schwede
Journal:  Proteins       Date:  2011-10-15

3.  CAD-score: a new contact area difference-based function for evaluation of protein structural models.

Authors:  Kliment Olechnovič; Eleonora Kulberkytė; Ceslovas Venclovas
Journal:  Proteins       Date:  2012-09-29

4.  CASP6 data processing and automatic evaluation at the protein structure prediction center.

Authors:  Andriy Kryshtafovych; Maciej Milostan; Lukasz Szajkowski; Pawel Daniluk; Krzysztof Fidelis
Journal:  Proteins       Date:  2005

5.  Driven to near-experimental accuracy by refinement via molecular dynamics simulations.

Authors:  Lim Heo; Collin F Arbour; Michael Feig
Journal:  Proteins       Date:  2019-06-24

6.  Assessment of model accuracy estimations in CASP12.

Authors:  Andriy Kryshtafovych; Bohdan Monastyrskyy; Krzysztof Fidelis; Torsten Schwede; Anna Tramontano
Journal:  Proteins       Date:  2017-09-08

7.  Assessment of the assessment: evaluation of the model quality estimates in CASP10.

Authors:  Andriy Kryshtafovych; Alessandro Barbato; Krzysztof Fidelis; Bohdan Monastyrskyy; Torsten Schwede; Anna Tramontano
Journal:  Proteins       Date:  2013-08-31

8.  Assessment of the model refinement category in CASP12.

Authors:  Ladislav Hovan; Vladimiras Oleinikovas; Havva Yalinca; Andriy Kryshtafovych; Giorgio Saladino; Francesco Luigi Gervasio
Journal:  Proteins       Date:  2017-11-29

9.  The PSIPRED Protein Analysis Workbench: 20 years on.

Authors:  Daniel W A Buchan; David T Jones
Journal:  Nucleic Acids Res       Date:  2019-07-02       Impact factor: 16.971

10.  Pcons.net: protein structure prediction meta server.

Authors:  Björn Wallner; Per Larsson; Arne Elofsson
Journal:  Nucleic Acids Res       Date:  2007-06-21       Impact factor: 16.971

Related articles: 8 in total

1.  Critical assessment of methods of protein structure prediction (CASP)-Round XIV.

Authors:  Andriy Kryshtafovych; Torsten Schwede; Maya Topf; Krzysztof Fidelis; John Moult
Journal:  Proteins       Date:  2021-10-07

2.  Effect of an Amyloidogenic SARS-COV-2 Protein Fragment on α-Synuclein Monomers and Fibrils.

Authors:  Asis K Jana; Chance W Lander; Andrew D Chesney; Ulrich H E Hansmann
Journal:  J Phys Chem B       Date:  2022-05-17       Impact factor: 3.466

3.  [Review] Methodology-Centered Review of Molecular Modeling, Simulation, and Prediction of SARS-CoV-2.

Authors:  Kaifu Gao; Rui Wang; Jiahui Chen; Limei Cheng; Jaclyn Frishcosy; Yuta Huzumi; Yuchi Qiu; Tom Schluckbier; Xiaoqi Wei; Guo-Wei Wei
Journal:  Chem Rev       Date:  2022-05-20       Impact factor: 72.087

4.  Modeling SARS-CoV-2 proteins in the CASP-commons experiment.

Authors:  Andriy Kryshtafovych; John Moult; Wendy M Billings; Dennis Della Corte; Krzysztof Fidelis; Sohee Kwon; Kliment Olechnovič; Chaok Seok; Česlovas Venclovas; Jonghun Won
Journal:  Proteins       Date:  2021-10-05

5.  Applying and improving AlphaFold at CASP14.

Authors:  John Jumper; Richard Evans; Alexander Pritzel; Tim Green; Michael Figurnov; Olaf Ronneberger; Kathryn Tunyasuvunakool; Russ Bates; Augustin Žídek; Anna Potapenko; Alex Bridgland; Clemens Meyer; Simon A A Kohl; Andrew J Ballard; Andrew Cowie; Bernardino Romera-Paredes; Stanislav Nikolov; Rishub Jain; Jonas Adler; Trevor Back; Stig Petersen; David Reiman; Ellen Clancy; Michal Zielinski; Martin Steinegger; Michalina Pacholska; Tamas Berghammer; David Silver; Oriol Vinyals; Andrew W Senior; Koray Kavukcuoglu; Pushmeet Kohli; Demis Hassabis
Journal:  Proteins       Date:  2021-12

6.  Evaluation of Deep Neural Network ProSPr for Accurate Protein Distance Predictions on CASP14 Targets.

Authors:  Jacob Stern; Bryce Hedelius; Olivia Fisher; Wendy M Billings; Dennis Della Corte
Journal:  Int J Mol Sci       Date:  2021-11-27       Impact factor: 5.923

7.  Application of Homology Modeling by Enhanced Profile-Profile Alignment and Flexible-Fitting Simulation to Cryo-EM Based Structure Determination.

Authors:  Yu Yamamori; Kentaro Tomii
Journal:  Int J Mol Sci       Date:  2022-02-10       Impact factor: 5.923

8.  Training undergraduate research assistants with an outcome-oriented and skill-based mentoring strategy.

Authors:  Dennis Della Corte; Connor J Morris; Wendy M Billings; Jacob Stern; Austin J Jarrett; Bryce Hedelius; Adam Bennion
Journal:  Acta Crystallogr D Struct Biol       Date:  2022-07-14       Impact factor: 5.699

