
Evaluation of the stereochemical quality of predicted RNA 3D models in the RNA-Puzzles submissions.

Francisco Carrascoza, Maciej Antczak, Zhichao Miao, Eric Westhof, Marta Szachniuk.

Abstract

In silico prediction is a well-established approach to derive a general shape of an RNA molecule based on its sequence or secondary structure. This paper reports an analysis of the stereochemical quality of the RNA three-dimensional models predicted using dedicated computer programs. The stereochemistry of 1052 RNA 3D structures, including 1030 models predicted by fully automated and human-guided approaches within 22 RNA-Puzzles challenges and reference structures, is analyzed. The evaluation is based on standards of RNA stereochemistry that the Protein Data Bank requires from deposited experimental structures. Deviations from standard bond lengths and angles, planarity, or chirality are quantified. A reduction in the number of such deviations should help in the improvement of RNA 3D structure modeling approaches.
© 2022 Carrascoza et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

Keywords:  3D structure prediction; RNA structure; RNA-Puzzles; quality validation; stereochemistry

Year:  2021        PMID: 34819324      PMCID: PMC8906551          DOI: 10.1261/rna.078685.121

Source DB:  PubMed          Journal:  RNA        ISSN: 1355-8382            Impact factor:   4.942


INTRODUCTION

Knowledge of the RNA atomic structure is crucial to address biological problems, therefore computational tools for the prediction of RNA three-dimensional models from the sequence have been developed to help or bypass some hurdles of laboratory procedures (Lukasiak et al. 2015; Miao and Westhof 2017; Gumna et al. 2020; Li et al. 2020; Magnus et al. 2020). The first decade of the 21st century resulted in several computer programs and protocols, which paved the way for automated modeling of RNA 3D structures: S2S (Jossinet and Westhof 2005), FARFAR (Das and Baker 2007), iFoldRNA (Ding et al. 2008), MC-Fold/MC-Sym (Parisien and Major 2008), and NAST (Jonikas et al. 2009). Some of them developed into highly specialized programs, which are used for either fully automatic or human-guided prediction. In the following years, this collection grew to include other tools such as ModeRNA (Rother et al. 2011), RNAComposer (Popenda et al. 2012), 3dRNA (Zhao et al. 2012), Vfold (Xu et al. 2014), and SimRNA (Boniecki et al. 2016). To stimulate the improvement of quality in RNA prediction, RNA-Puzzles was organized 10 yr ago (Cruz et al. 2012). RNA-Puzzles is a community-wide assessment of RNA 3D structure prediction that aims to understand the bottlenecks in current RNA 3D structure prediction to promote the improvement of prediction methods. Before the publication of an experimentally determined RNA structure, the sequence is disseminated among the community and prediction results are submitted within 3–4 wk. Assessment against the experimental structure is performed after the release of the structure. There are two categories of challenges, depending on the protocols used to obtain the models: They can originate from fully automated web services or human experts running various prediction programs. The starting point for each challenge is a novel experimentally determined RNA 3D structure, the conformation of which is unknown to the predictors. 
The web servers have 48 h and human experts 3–4 wk for submitting their models. After the deadline, the predictions are evaluated and the results are published with the ranking of the submitted models. Presently, 28 crystallographic structures have been part of the contest. Eighteen of them have been the basis of four scientific papers published by the RNA-Puzzles community (Cruz et al. 2012; Miao et al. 2015, 2017, 2020). As of October 2020, 22 challenges have been concluded with assessment results available on the RNA-Puzzles website (http://www.rnapuzzles.org). The website provides accuracy assessments determined in comparison with the reference structure, based on several global similarity and distance measures (Magnus et al. 2020): root mean square deviation (RMSD) (Kabsch 1978); deformation index (DI) that normalizes RMSD with the sequence length (Parisien et al. 2009); interaction network fidelity (INF), including Watson–Crick, noncanonical, and stacking interactions (Parisien et al. 2009); and, more recently, mean of circular quantities operating in torsion angle space (Zok et al. 2014; Wiedemann et al. 2017). RMSD serves as the main criterion to rank the predicted models, although it only assesses the minimum average distance between two 3D structures represented as two sets of atomic coordinates. The remaining metrics allow a focus on base pairs and torsion angles. Additionally, RNA-Puzzles uses the Clashscore, as defined in the MolProbity software package (Williams et al. 2018), to assess accuracy in a noncomparative procedure: it finds overlapping or too-close atoms in the models and serves as an overall evaluation of the stereochemistry. Nevertheless, current biological problems are setting new thresholds of what acceptable geometry qualities should be. Catalytic features, for instance, highlight that not only the model's overall geometry but also its stereochemistry is important. 
One example is the torsion-angle-based dependence between the active and nonactive conformation of base pairs in some ribozyme active sites (White et al. 2018). Moreover, self-cleaving ribozymes can provide another example, in which the correct description of phosphate backbone stereochemistry is critical to correctly assess the reaction pathways of these mechanisms (Teplova et al. 2020). Yet another recent case is drug development that targets RNA (e.g., against viruses) (Aftab et al. 2020). Therefore, there is a clear need to advance technology to provide useful and trustable tools capable to address these challenges. Proper stereochemistry is at the core of biomolecular structure modeling. The geometries and stereochemistry of the nucleic acid building blocks are very well known and with high precision (Clowney et al. 1996; Gelbin et al. 1996; Schneider et al. 1996). Inaccuracies in molecular geometry can result from geometry optimizations that fall into local minima that may lead to a metastable conformer, different from the native one or another biologically irrelevant conformation. Inappropriate geometry may mask incorrect choice in torsion-angle space (for example, a base incorrectly in the syn conformer can lead to geometrical distortions in the sugar–phosphate backbone). Biomolecular structures are extremely well fine-tuned and the whole variety of physicochemical interactions is exploited in the folded native structure. Neglect of some type of interactions, or an inappropriate calibration, can lead to wrong conformations that can produce molecular distortions under insufficiently controlled structural refinement (Popenda et al. 2021). Here, we revisit the evaluation of the stereochemistry of predicted models beyond interatomic noncovalent distances. We follow a routine recommended to experimenters who deposit their structure data in the Protein Data Bank (Berman et al. 2000) and the Biological Magnetic Resonance Data Bank (Ulrich et al. 
2008), both contributing to the wwPDB partnership (Berman et al. 2003). wwPDB stresses the importance of careful examination of structures by providing tools that set the standards for 3D structure submission. In 2017, it introduced OneDep, a unified system applying the deposition, biocuration, and validation pipelines for structural data (Gore et al. 2017; Young et al. 2017). OneDep is an extensive suite of programs operating on different metrics to assess the accuracy of structures. It implements stereochemistry analysis through MAXIT (Feng et al. 1998; Berman et al. 2000). To evaluate the stereochemistry of RNA tertiary structure predictions, we analyzed the results of all RNA-Puzzles challenges with the standardized data available as of November 2020 (that is, puzzles 1–15, 17–21, and 24, with puzzle 14 in two versions, bound 14a and free 14b) and 22 corresponding reference structures. We downloaded 1030 predicted RNA models from the standardized data set belonging to RNA-Puzzles resources (located at https://github.com/RNA-Puzzles; Magnus et al. 2020). Among those, 797 models were in the human category and 233 models in the web server category. From these data, we created 23 clusters by participants, each containing models submitted by a single human group or a web server (Table 1). An additional 24th cluster included the reference structures (Table 2). We processed the structures in all of these subsets using MAXIT software (Feng et al. 1998; Berman et al. 2000) and compared results with the MolProbity software (Williams et al. 2018). Next, we used Barnaba (Bottaro et al. 2019) and X3DNA-DSSR (Lu and Olson 2003) to verify base-pairing geometries and handedness of helices, respectively. Finally, we conducted a simple statistical analysis by computing the average value, standard deviation, and median for every subset including more than one model (see Materials and Methods).
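The per-cluster statistics described above can be illustrated with a minimal sketch. The cluster labels and error counts below are made up for illustration, the function name `summarize_clusters` is hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was applied.

```python
from statistics import mean, median, stdev


def summarize_clusters(errors_by_cluster):
    """Return mean, standard deviation, and median of error counts per cluster.

    Clusters with a single model are skipped, mirroring the rule that
    statistics are computed only for subsets with more than one model.
    """
    summary = {}
    for cluster, counts in errors_by_cluster.items():
        if len(counts) < 2:
            continue  # a one-model subset has no meaningful spread
        summary[cluster] = {
            "mean": mean(counts),
            "sd": stdev(counts),  # sample standard deviation (assumption)
            "median": median(counts),
        }
    return summary


# Hypothetical error counts per model; not taken from the paper.
example = {"X1": [0, 2, 4], "X2": [7]}
stats = summarize_clusters(example)
print(stats)
```

Because the median is reported alongside the mean, a single highly defective model in a small cluster shows up as a gap between the two values.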
TABLE 1.

Clusters of RNA 3D models predicted within RNA-Puzzles organized by participants

TABLE 2.

Cluster with the reference structures and their prediction-related data


RESULTS

For each model, MAXIT returned a report of abnormal stereochemical parameters falling into six categories: close contacts, bond length deviations, bond angle deviations, deviations from planarity, chirality issues, and phosphate bond linkages (the Supplemental Material includes tables with the error numbers in every model). Using MAXIT, we examined them first for the subset of 22 reference structures (Fig. 1; Supplemental Table S1). Most of them contained some types of geometrical deviations from standard dictionaries. We found the highest incidence of errors in the bond angles (183 errors in 17 structures), followed by close contacts (54 errors in seven structures) and bond lengths (32 errors in five structures). Among the worst cases (PZ07, PZ01, and PZ21), two are structures at a resolution worse than 2.5 Å (cf. Supplemental Fig. S1 in the Supplemental Material). The software X3DNA-DSSR (Lu and Olson 2003) does not reveal any left-handed helix or dinucleotide step in either the RNA-Puzzles submissions or the experimentally determined RNA 3D structures (cf. Supplemental Tables S46–S69 in the Supplemental Material). In Figure 1, one can also observe that there are no chirality issues, while deviations from planarity occur in only two instances (nine errors in total). For polymer linkage (i.e., deviations in P–O bond lengths), we found seven structures with a total of nine reported inaccuracies, an average of less than one error per structure, the same as for errors in planarity.
FIGURE 1.

Stereochemical errors were reported by MAXIT for the reference structures.

We analyzed separately the clusters of models predicted by human experts and by web servers. Each of these 23 collections contains the predictions submitted by one participant within all the considered challenges available in the standardized data set of the RNA-Puzzles resource (cf. Table 1). Their cardinalities range from 1 to 188. Within each of these clusters, except for those including only one model (i.e., H10, H14, and H16), we determined the total number of errors of each type (Supplemental Tables S2–S24 in the Supplemental Material), the average number over all the errors, and the standard deviation (Fig. 2), and confirmed these results using MolProbity software, version 4.5.1 (Supplemental Figs. S2–S4; Supplemental Tables S71–S92 in the Supplemental Material; Williams et al. 2018). We did the same for each of the six types of stereochemical properties; we further computed the average value of each error and the standard deviation per cluster (Fig. 3). One can observe that some of the applied prediction methods have an advantage over others in terms of the total number of errors. However, most submissions have stereochemical issues to address. Interestingly, there is no visible difference between the qualities of human versus web server predictions as far as the average number of all the inaccuracies is concerned. In both categories, we can observe both good and bad scores. The average number of errors per model in the human category equals 106, while in the web server category it is 103.
FIGURE 2.

The number of stereochemical errors identified in all the considered models by participants. The white dot at the center of every violin plot represents a median. The black bar corresponds to the interquartile range. The first and the third quartile are represented as wicks up and down from the interquartile range. The violin shape shows error distribution.

FIGURE 3.

The number of identified stereochemical errors per error type and participant. The box plot of each participant shows the interquartile range. The black middle line in every box depicts a median. The first and third quartiles are represented as wicks up and down from the interquartile range. Separated dots outside boxes correspond to outliers.

Due to the significant difference between cluster cardinalities, there is no statistical consistency between them, but there is statistical consistency within each cluster, that is, within the results of a single participant. For instance, the H9 set, for which the total score is significant in Figure 2, has only 32 items; we should remember that in a small set, one highly defective object significantly affects the average value and standard deviation. The most numerous clusters (over 100 models) are H2, H3, H4, and H5. The sets labeled as H1, H6, W2, and W6 include 50–100 items. The remaining ones have fewer than 50 models each. By clustering and comparing the predictions submitted, one can observe that the H1 (average total number of errors, ATN = 0), H4 (ATN = 7.69), and H8 (ATN = 11.60) groups apply the best-performing methods in the category of human experts; W2 (ATN = 1.30), W5 (ATN = 7.61), and W4 (ATN = 31.20) are the most successful among the web servers. For all six, the average number of all errors per model is less than 50 (Fig. 2). For clusters H2, H3, H5, H6, H11, H12, H13, and W1, the average total number of inaccuracies is in the range of 50–200. 
However, the significant standard deviations indicate a large spread in stereochemical issues for the prediction methods used to obtain the models collected in these clusters. In the other clusters, the average total number of errors falls in the range of 200–350 (if we do not consider single-model clusters). A comparison between Figures 1 and 2 shows the gap between reference structures and predicted models. The most notable conformational errors in predicted models occur in bond lengths and angles. On average, MAXIT has identified over a hundred of such inaccuracies per the predicted RNA 3D model and less than 10 per the reference structure (on average). Chirality (or incorrect sugar substituent) is correct in the experimentally determined RNAs, while 28% of the predicted models have problems with it. Quite many deviations from the average planes of aromatic rings are observed. Polymer linkage assesses bond lengths between the adjacent nucleotides by measuring the P–O bond distances. This parameter has the lowest error rate in computationally generated structures—errors of this type occurred in 22% of all analyzed RNAs. Figure 3 presents MAXIT results separately for each error type and allows us to take a closer look into the weaknesses of the protocols embedded within various prediction programs. The plots reveal the highest number of inaccuracies especially in bond angles (70,184). Virtually every prediction method generates errors in covalent geometries, and the exceptional models with no such issue are not necessarily the most similar to the reference structure(s) in terms of overall RMSD. In the human category, models collected in H1, H4, and H8 clusters have little or no geometric issues (although, at the same time, four H4 models in puzzle 19 have the largest bond length error with O5′−C5′ length >100 Å), while predictions in H7, H9, and H15 are among those with the highest number of inaccuracies. 
In the web server category, W2, W4, and W5 perform the best as far as bond lengths and angles are concerned, while W3, W6, and W7 are at the end of the ranking. If we consider deviations from ring planarity (Fig. 3), of which the total number is 17,594, their average per model for every cluster is below 90 errors. Models in H6, H9, and H13 have a significant number of these issues. The largest identified deviation from planarity equals 0.791 Å and occurred in H15 model 1 predicted for puzzle 12. An example error of this type is depicted in Supplemental Figure S5. The average number of chirality errors is below 20 for all clusters. For some clusters, MAXIT reported zero or one issue of this type in total: H1, H8, and H13 within the human category, and W2, W3, and W5 within the web servers. Let us add that for H4—having the longest track of submissions—the number of errors in this category is also negligible. Some approaches (H2, H15, and W4) scored higher as far as the average number of chirality inaccuracies is concerned. In total, 2130 abnormalities classified by MAXIT as chirality errors occurred in 291 predicted models. The most common form of chiral error is the interchange of the hydroxyl group and hydrogen atom on the same carbon atom at the ribose moiety (Fig. 4). Such an interchange does not lead to a chiral error; it produces another sugar type (for example arabinose or xylose instead of ribose). Such improper sugar construction represents 94.9% of all chiral errors identified by MAXIT. The remaining 5.1% are planar inaccuracies in the sugar ring, and they occur when the improper torsion angle at a sp3 carbon atom is close to zero instead of being around −122 or +122 degrees. Such a situation occurs in the furanose ring with distorted or flat sugar rings.
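As a rough illustration of the planar-versus-chiral distinction described above, the following sketch classifies an improper torsion value at an sp3 carbon. The ideal magnitude of about 122 degrees comes from the text; the 30-degree tolerance, the function name `classify_improper`, and the idea of passing the expected sign for each chiral center are assumptions made for this example and do not reproduce MAXIT's actual criteria.

```python
def classify_improper(angle_deg, expected_sign, planar_tol=30.0):
    """Classify an improper torsion angle (degrees) at an sp3 carbon.

    expected_sign is +1 or -1: the sign the angle should have for the
    correct configuration (ideal value around +122 or -122 degrees).
    planar_tol is an illustrative threshold, not a MAXIT parameter.
    """
    if expected_sign not in (1, -1):
        raise ValueError("expected_sign must be +1 or -1")
    if abs(angle_deg) < planar_tol:
        return "planar"    # flattened center: improper torsion near zero
    if (angle_deg > 0) == (expected_sign > 0):
        return "ok"        # sign matches the expected configuration
    return "inverted"      # opposite sign: configuration is inverted


# Hypothetical angle values for a center whose ideal value is about -122 degrees.
for angle in (-121.0, 3.5, 118.0):
    print(angle, classify_improper(angle, expected_sign=-1))
```

Under these assumptions, the "inverted" outcome corresponds to the improper sugar constructions discussed in the text, and "planar" to the flattened sugar-ring cases.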
FIGURE 4.

Example chiral errors in H3 model from PZ07 (top) and H4 model from PZ06 (bottom). (Top left) C3′ atom in U82 with correct chiral center. (Top right) U15 with incorrect chiral inversion at carbon atom C3′, actually changing a ribose to a xylose moiety. (Bottom left) A2 with correct chirality at C4′. (Bottom right) G82 with incorrect chirality at C4′.

A distribution of chiral errors among nucleotides is shown in Table 3. We can see a high frequency in guanine (692 inaccuracies, which make up 32.5% of all chiral errors) and a lower one for uracil (410 inaccuracies, which make up 19.2% of all chiral errors). This relationship is visible for both sugar construction inversions and planar errors. However, the frequencies are affected by the nucleotide content in the analyzed RNA structures. Thus, in Table 3, we also present the total number of adenines, cytosines, guanines, and uracils in the analyzed data set, and the percentage of these nucleotides having erroneous chirality. Let us add that among all nucleotides with chiral errors, 91% are anti while 9% are syn nucleotides. A similar distribution is observed for each of the individual nucleotide types. Syn/anti conformation characterizes the relative orientation of base and sugar and is determined based on the χ torsion angle (defined by O4′-C1′-N1-C2 for pyrimidines and O4′-C1′-N9-C4 for purines). Usually, χ falls into the ranges [+90, +180] or [−180, −90], corresponding to the anti conformation. Occasionally, we observe its value in [−90, +90], which refers to the syn conformation. Some chiral errors (11%) appear when the conformation of a nucleotide in the predicted model differs from that in the reference structure. However, in most cases (89%), these errors cannot result from the conformation change (Table 4). Regarding the distribution of errors among chiral atomic centers, we can observe that 43% of inaccuracies occur at C4′ (cf. Fig. 4), ∼25% at C2′ and C3′ (cf. Fig. 4), and only 5.9% at the C1′ atom (Table 5).
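The syn/anti assignment from the χ torsion angle can be sketched as follows. This is an illustrative implementation, not the code used in the study: the function names are hypothetical, the coordinates are toy points rather than real atoms, and assigning the boundary values of exactly ±90 degrees to anti is an arbitrary choice, because the ranges quoted in the text overlap there.

```python
import math


def dihedral_deg(p1, p2, p3, p4):
    """Signed torsion angle in degrees defined by four 3D points."""
    b1 = [p2[i] - p1[i] for i in range(3)]
    b2 = [p3[i] - p2[i] for i in range(3)]
    b3 = [p4[i] - p3[i] for i in range(3)]

    def cross(u, v):
        return [u[1] * v[2] - u[2] * v[1],
                u[2] * v[0] - u[0] * v[2],
                u[0] * v[1] - u[1] * v[0]]

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    n1 = cross(b1, b2)  # normal of the plane through p1, p2, p3
    n2 = cross(b2, b3)  # normal of the plane through p2, p3, p4
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def classify_chi(chi_deg):
    """Return 'anti' for |chi| >= 90 degrees, otherwise 'syn'."""
    return "anti" if abs(chi_deg) >= 90.0 else "syn"


# Toy coordinates standing in for O4', C1', N9 (or N1), and C4 (or C2).
chi = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 1, 1))
print(round(chi, 1), classify_chi(chi))
```

For a real model, the four points would be the atom coordinates named in the χ definition, read from the structure file of each nucleotide.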
TABLE 3.

Chiral or planar sp3 atom errors by nucleotide type

TABLE 4.

Chiral or planar sp3 atom errors by nucleotide type and conformation (anti or syn) depending on whether it is different or the same as in the reference structure

TABLE 5.

Chiral or planar sp3 atom errors

Polymer linkage errors are rare for most predicted models (Supplemental Fig. S6). We have found that MAXIT may report false positives in this category: whenever the sequence in the model is truncated, MAXIT fails to recognize different chains properly. This kind of artifact is clear for the reference structures (e.g., PZ12 and PZ21). However, when it comes to RNA 3D models predicted by web servers or human experts, we have not found such false positives since different chains are labeled correctly by the prediction methods (∼10% of submissions contain double-chain models). Thus, polymer linkage inaccuracies depicted in Supplemental Figure S6 are true-positive errors, and they come from an incorrect linkage bond length between the oxygen and the phosphate group of two neighboring nucleotides in the polymer chain. Such errors were suspected to occur during the assembly building process of RNA fragments since generally nucleotides start at 5′-P and end at O3′; however, our analysis revealed that models predicted by assembly-based methods did not show errors of this type. The analysis of the RNA-Puzzles data set containing 1052 RNAs reveals 2431 polymer linkage errors, including 2422 errors in 230 predicted models and nine errors in seven reference structures. H2 and H7 clusters have the highest average number of these errors among human expert predictions. For the web servers, MAXIT has found the highest number of this type of inaccuracy in W6 and W7. The remaining prediction methods do not tend to generate errors in this category. 
By default, MAXIT reports such error whenever the distance between oxygen atom O3′ and phosphorus atom P of the next nucleotide in the polymer chain is longer than a typical length of a covalent bond between these atoms (Schneider et al. 1996). Some of these errors are small deviations, but major ones occur as well. For example, in H15 model three predicted within Puzzle 13, MAXIT identified a bond of length 82.52 Å between A70 and A71—it is the highest inaccuracy of this type identified within the data set. The distribution of errors by the nucleotide type and syn/anti conformation is presented in Table 6. One can observe that linkage errors involving adenine (19%) are least frequent, and those with guanine (32%) occur most often. However, as a function of the relative contents of the four nucleotides in the analyzed RNA molecules, cytosine has the most linkage errors and adenine the least.
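The difference between the share of errors attributed to a nucleotide type and the error rate relative to that nucleotide's abundance can be made concrete with a short sketch. All counts below are invented for illustration and are not the values of Table 6; the function name `error_rates` is hypothetical.

```python
def error_rates(error_counts, nucleotide_counts):
    """Return, per nucleotide type, its share of all errors and its rate
    of errors relative to how often that nucleotide occurs in the data."""
    total_errors = sum(error_counts.values())
    result = {}
    for nt, n_err in error_counts.items():
        result[nt] = {
            "share": n_err / total_errors,           # fraction of all errors
            "rate": n_err / nucleotide_counts[nt],   # errors per nucleotide
        }
    return result


# Hypothetical counts chosen so that the two rankings differ.
errors = {"A": 20, "C": 24, "G": 32, "U": 24}
totals = {"A": 400, "C": 200, "G": 400, "U": 300}
rates = error_rates(errors, totals)

most_by_share = max(rates, key=lambda nt: rates[nt]["share"])
most_by_rate = max(rates, key=lambda nt: rates[nt]["rate"])
print(most_by_share, most_by_rate)
```

With these made-up numbers, the nucleotide with the largest share of errors is not the one with the highest content-normalized rate, which is the kind of discrepancy the text describes for guanine and cytosine.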
TABLE 6.

Polymer linkage errors by the types and conformation (anti or syn) of nucleotides depending on whether they have the same or different conformation as in the reference structure

Then, we analyzed the data set with the Barnaba software (Bottaro et al. 2019), computing the backbone root mean square deviation (herein called BBRMSD) and the root mean square deviation of base-pairing interactions (eRMSD) for nucleobases. BBRMSD can be interpreted as the quality of the backbone structures expressed in RMSD values. eRMSD was taken to measure the quality of base pairs in the structures. Both values were computed using the reference structure given by the RNA-Puzzles organizers. eRMSD values <8 Å are considered as low. For values <5 Å, both the reference structure and target structure are very similar (Bottaro et al. 2019). Our results (see Supplemental Fig. S7A) show that across all the puzzles, there is no clear trend in the average values of either BBRMSD or eRMSD. In the average RMSD values across participants (see Supplemental Fig. S7B), groups H1 to H4 and H12 performed better than the automated protocols, with the exception of W7, which showed the lowest average in both backbone and base-pair quality. Some groups, from H6 to H9, have lower eRMSD than BBRMSD values, suggesting these groups focus their attention on the base pairs rather than the backbone structure. Other groups, like H11 and W5, perform better at deducing the volumetric backbone shape, but they have relatively worse base-pair performance (Supplemental Fig. S7B).

DISCUSSION

Stereochemical errors in the predicted RNA 3D models are primarily generated by computer programs used in both human-guided and web server prediction. In many cases, these may be the result of relatively small rounding errors appearing at one of the calculation steps and propagating in subsequent iterations. Knowing the general approach used by the prediction method, it is possible to indicate the most sensitive stages at which the errors arise. Computational complexity is the main problem in de novo simulation of RNA folding. Therefore, different techniques are used to reduce the time cost of de novo prediction methods. One of them is to evaluate the fold using simplified potentials, which do not take stereochemical parameters into account. The simulation (e.g., Monte Carlo) converges toward an optimum defined in terms of the overall 3D shape of the molecule and gives a fold that is stereochemically oversimplified and far from ideal. The large number of calculations that are performed during the random sampling of the solution space also affects the generation of errors, as the errors that occur in one step are propagated further and often deteriorate the final solution. Another related problem concerns coarse-grained simulations. The transformation of a coarse-grained model to a full-atom model is a highly erroneous procedure. In template-based approaches (homology modeling, and fragment assembly methods), a crucial moment is the choice of the right template/fragment. A model based on a stereochemically erroneous template or of low-resolution may incorporate the incorrect stereochemical parameters. Stereochemical errors may also arise during nucleobase exchange, structural blocks insertion, or their assembly into a larger whole. In the latter case, the choice of the structural blocks that are rotated and translated is critical, since maneuvering a larger element is more erroneous. 
Errors that arise during the structure modeling process can be avoided when applying a function that validates partial solutions based on their stereochemical parameters. However, it is very time-consuming and—in the case of de novo methods—completely unprofitable since it may cause the method not to return the solution in a reasonable time. Therefore, the best solution to the problem is to improve stereochemistry postfactum by minimizing the geometry or the energy after building the model, but bad local geometries are not easy to relieve when embedded in a large fold. At the same time, rerefining the templates used with standard dictionaries may alleviate the propagation of errors, leading to tight conformers with stereochemical errors difficult to energy minimize. In some tools, like FARFAR (Das and Baker 2007; Das et al. 2010) and RNAComposer (Popenda et al. 2012; Purzycka et al. 2014; Antczak et al. 2016), such a procedure has been implemented and successfully fulfills its role. Erroneous bond lengths, bond angles, and planarity deviations are the most frequent errors in RNA 3D structure prediction, while incorrect sugar constructions or chirality and polymer linkage errors occur less frequently (∼10 issues per structure on average). False-positive errors, which are caused by improper identification of structural chains in multichain RNA structures, are found in the polymer linkage category of the MAXIT results. Most errors can be compensated by running energy minimization protocols—for example, CYANA (Güntert and Buchner 2015), NAMD (Phillips et al. 2020), XPLOR-NIH (Schwieters et al. 2003)—for the preliminary models or ensuring a proper stereochemistry from the early stages of prediction. One can also process the predicted RNA structures using tools—for example, RNAfitme (Zok et al. 2015; Antczak et al. 2018) or QRNAS (Stasiewicz et al. 2019)—having the potential to refine the nucleic acid structure.

Conclusions

We found that most RNA 3D structure prediction methods evaluated within RNA-Puzzles, in either the human or the web server category, generate models with some incorrect stereochemical parameters. Even the best models, according to the RMSD-based rankings, are not free of such errors. One could argue that it is easy to generate a very precise model that is inaccurate and that precision in geometric and stereochemical parameters is of lesser importance. However, these geometric and stereochemical parameters are very well established and need to be implemented to be helpful in the future for modeling structures with catalytic or fine recognition properties. Thus, a similarity/distance measure assessing a model against a reference structure cannot be the only reliable indicator of model quality, and all predictors should ensure the stereochemical accuracy of their models before submission. We suggest that a detailed stereochemical analysis should enter regular evaluation processes for improving the accuracy of RNA-Puzzles submissions and promoting high-quality RNA 3D structure prediction.

MATERIALS AND METHODS

In this research, we used MAXIT version 10 downloaded from RCSB PDB (https://sw-tools.rcsb.org/apps/MAXIT), Barnaba 0.1.7 obtained from https://github.com/srnas/barnaba (Bottaro et al. 2019), MolProbity 4.5.1 taken from https://github.com/rlabduke/MolProbity (Williams et al. 2018), and X3DNA-DSSR, version 2.4 (Lu and Olson 2003). Structures were divided into 24 subsets: one subset with the reference structures and 23 subsets with predicted models (one for each participant), and the average values and standard deviations were computed for them. Three clusters, H10, H14, and H16, including predictions by human groups, were excluded from the statistical analysis since, in all the challenges, these groups submitted only one model each. However, their MAXIT reports are also available in the Supplemental Material. MAXIT reports the following stereochemical issues: (1) close contacts; (2) bond length deviations; (3) bond angle deviations; (4) deviations from planarity; (5) chirality errors; and (6) polymer linkage errors (the P–O bond lengths). For categories 1–3 and 6, the program flags an abnormality if the parameter deviates from the expected value by more than six times the standard deviation σ. The expected values and attached σ's are based on Clowney et al. (1996), Gelbin et al. (1996), Parkinson et al. (1996), and Schneider et al. (1996) and are in the Supplemental Material. The current source of these reference terms is the Cambridge Structural Database (Urzhumtseva et al. 2009; Tickle 2012; Bruno and Groom 2014). Atomic clashes are signaled whenever any intermolecular atom pair is closer than the sum of their respective van der Waals radii. In general, a clash is defined when the distance between two atoms is <2.2 Å (if no H atom is involved) or <1.6 Å (if one H atom is involved) (cf. Supplemental Table D1 in the Supplemental Material). Departures from the average best-fit plane center yield the RMS deviations for all atoms from the plane. 
It is reported when >6 × 0.02 Å or when at least one atom has a deviation >0.02 Å. MAXIT also determines improper torsion angles in the furanose ring and reports deviations from puckering in the ring (imposed by the sp3 carbon atoms). In the case of chirality assessment, MAXIT lists the residues that contain unexpected configuration of chiral centers (C1′, C2′, C3′, and C4′). Improper dihedrals (Gelbin et al. 1996) are a measure of the chirality/planarity of the structure at a specific atom. Polymer linkage between the adjacent nucleotides is measured based on the distances computed for O3′−P and O5′−P atom pairs. By default, the O3′−P distance is evaluated. However, if it exceeds 2.5 Å, MAXIT takes the minimum value out of these two for consideration. Figures 4–6 were prepared using Symmetry Tool Plug-in 1.3 implemented in VMD software, version 1.94 (Humphrey et al. 1996).
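The threshold rules described above can be illustrated with a minimal Python sketch. This is not MAXIT code and does not reproduce its interface: the function names are hypothetical, the bond-length numbers in the usage note are illustrative rather than the published reference values, and the clash check applies only the general distance cutoffs quoted in the text, not the van der Waals radii comparison.

```python
# Minimal sketch of the reporting thresholds described in the text.
# All names below are hypothetical; this is not the MAXIT implementation.

N_SIGMA = 6.0             # outlier multiplier for issues 1-3 and 6
CLASH_NO_H = 2.2          # angstroms, no H atom involved
CLASH_ONE_H = 1.6         # angstroms, one H atom involved
LINKAGE_FALLBACK = 2.5    # angstroms, O3'-P distance above which the fallback applies


def is_outlier(observed, expected, sigma, n_sigma=N_SIGMA):
    """Flag a parameter that deviates from its expected value by more than n_sigma * sigma."""
    return abs(observed - expected) > n_sigma * sigma


def is_clash(distance, involves_hydrogen=False):
    """Apply the general distance cutoff for a clash between two atoms."""
    cutoff = CLASH_ONE_H if involves_hydrogen else CLASH_NO_H
    return distance < cutoff


def linkage_distance(d_o3p, d_o5p):
    """Select the distance used to assess the polymer linkage.

    The O3'-P distance is used by default; if it exceeds 2.5 angstroms,
    the smaller of the O3'-P and O5'-P distances is used instead.
    """
    if d_o3p > LINKAGE_FALLBACK:
        return min(d_o3p, d_o5p)
    return d_o3p
```

For example, with an illustrative expected bond length of 1.60 Å and σ = 0.01 Å, `is_outlier(1.70, 1.60, 0.01)` flags the bond because the deviation of 0.10 Å exceeds 6σ = 0.06 Å, whereas `is_outlier(1.65, 1.60, 0.01)` does not.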

SUPPLEMENTAL MATERIAL

Supplemental material is available for this article.
REFERENCES

1.  Announcing the worldwide Protein Data Bank.

Authors:  Helen Berman; Kim Henrick; Haruki Nakamura
Journal:  Nat Struct Biol       Date:  2003-12

2.  3DNA: a software package for the analysis, rebuilding and visualization of three-dimensional nucleic acid structures.

Authors:  Xiang-Jun Lu; Wilma K Olson
Journal:  Nucleic Acids Res       Date:  2003-09-01       Impact factor: 16.971

3.  OneDep: Unified wwPDB System for Deposition, Biocuration, and Validation of Macromolecular Structures in the PDB Archive.

Authors:  Jasmine Y Young; John D Westbrook; Zukang Feng; Raul Sala; Ezra Peisach; Thomas J Oldfield; Sanchayita Sen; Aleksandras Gutmanas; David R Armstrong; John M Berrisford; Li Chen; Minyu Chen; Luigi Di Costanzo; Dimitris Dimitropoulos; Guanghua Gao; Sutapa Ghosh; Swanand Gore; Vladimir Guranovic; Pieter M S Hendrickx; Brian P Hudson; Reiko Igarashi; Yasuyo Ikegawa; Naohiro Kobayashi; Catherine L Lawson; Yuhe Liang; Steve Mading; Lora Mak; M Saqib Mir; Abhik Mukhopadhyay; Ardan Patwardhan; Irina Persikova; Luana Rinaldi; Eduardo Sanz-Garcia; Monica R Sekharan; Chenghua Shao; G Jawahar Swaminathan; Lihua Tan; Eldon L Ulrich; Glen van Ginkel; Reiko Yamashita; Huanwang Yang; Marina A Zhuravleva; Martha Quesada; Gerard J Kleywegt; Helen M Berman; John L Markley; Haruki Nakamura; Sameer Velankar; Stephen K Burley
Journal:  Structure       Date:  2017-02-09       Impact factor: 5.006

4.  New metrics for comparing and assessing discrepancies between RNA 3D structures and models.

Authors:  Marc Parisien; José Almeida Cruz; Eric Westhof; François Major
Journal:  RNA       Date:  2009-08-26       Impact factor: 4.942

5.  VMD: visual molecular dynamics.

Authors:  W Humphrey; A Dalke; K Schulten
Journal:  J Mol Graph       Date:  1996-02

6.  RNA Structure: Advances and Assessment of 3D Structure Prediction.

Authors:  Zhichao Miao; Eric Westhof
Journal:  Annu Rev Biophys       Date:  2017-03-30       Impact factor: 12.981

7.  Validation of Structures in the Protein Data Bank.

Authors:  Swanand Gore; Eduardo Sanz García; Pieter M S Hendrickx; Aleksandras Gutmanas; John D Westbrook; Huanwang Yang; Zukang Feng; Kumaran Baskaran; John M Berrisford; Brian P Hudson; Yasuyo Ikegawa; Naohiro Kobayashi; Catherine L Lawson; Steve Mading; Lora Mak; Abhik Mukhopadhyay; Thomas J Oldfield; Ardan Patwardhan; Ezra Peisach; Gaurav Sahni; Monica R Sekharan; Sanchayita Sen; Chenghua Shao; Oliver S Smart; Eldon L Ulrich; Reiko Yamashita; Martha Quesada; Jasmine Y Young; Haruki Nakamura; John L Markley; Helen M Berman; Stephen K Burley; Sameer Velankar; Gerard J Kleywegt
Journal:  Structure       Date:  2017-11-22       Impact factor: 5.006

8.  RNA-Puzzles Round III: 3D RNA structure prediction of five riboswitches and one ribozyme.

Authors:  Zhichao Miao; Ryszard W Adamiak; Maciej Antczak; Robert T Batey; Alexander J Becka; Marcin Biesiada; Michał J Boniecki; Janusz M Bujnicki; Shi-Jie Chen; Clarence Yu Cheng; Fang-Chieh Chou; Adrian R Ferré-D'Amaré; Rhiju Das; Wayne K Dawson; Feng Ding; Nikolay V Dokholyan; Stanisław Dunin-Horkawicz; Caleb Geniesse; Kalli Kappel; Wipapat Kladwang; Andrey Krokhotin; Grzegorz E Łach; François Major; Thomas H Mann; Marcin Magnus; Katarzyna Pachulska-Wieczorek; Dinshaw J Patel; Joseph A Piccirilli; Mariusz Popenda; Katarzyna J Purzycka; Aiming Ren; Greggory M Rice; John Santalucia; Joanna Sarzynska; Marta Szachniuk; Arpit Tandon; Jeremiah J Trausch; Siqi Tian; Jian Wang; Kevin M Weeks; Benfeard Williams; Yi Xiao; Xiaojun Xu; Dong Zhang; Tomasz Zok; Eric Westhof
Journal:  RNA       Date:  2017-01-30       Impact factor: 4.942

9.  Coupling between conformational dynamics and catalytic function at the active site of the lead-dependent ribozyme.

Authors:  Neil A White; Minako Sumita; Victor E Marquez; Charles G Hoogstraten
Journal:  RNA       Date:  2018-08-15       Impact factor: 4.942

10.  LCS-TA to identify similar fragments in RNA 3D structures.

Authors:  Jakub Wiedemann; Tomasz Zok; Maciej Milostan; Marta Szachniuk
Journal:  BMC Bioinformatics       Date:  2017-10-23       Impact factor: 3.169

