Fabrizio Pucci (1), Mehari B Zerihun (1, 2, 3), Emanuel K Peter (1), Alexander Schug (1).
Abstract
RNA molecules play many pivotal roles in the cell that are still not fully understood. A detailed understanding of RNA function requires knowledge of its three-dimensional structure, yet experimental RNA structure determination remains demanding. Recent advances in sequencing provide unprecedented amounts of sequence data that can be analyzed statistically by methods such as direct coupling analysis (DCA) to identify spatial proximity, or contacts, between specific nucleotide pairs; these predicted contacts improve the quality of structure prediction. To quantify this improvement, we present a carefully curated data set of about 70 high-resolution RNA structures and compare different nucleotide-nucleotide contact prediction methods available in the literature. We observe only minor differences in performance between the methods. Moreover, we discuss how robust these predictions are to different contact definitions and how strongly they depend on the procedures used to curate and align the families of homologous RNA sequences.
Keywords: RNA contact prediction; RNA structure prediction; direct coupling analysis; multiple sequence alignment
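The abstract names DCA without spelling out its mechanics. As an illustration only, the pure-Python sketch below implements one common variant, mean-field DCA: pseudocounted single-site and pair frequencies define a connected correlation matrix C, the couplings are approximated as e_ij(a, b) = -(C^-1)_{ia,jb}, and each column pair is scored by the Frobenius norm of its coupling block (after shifting to the zero-sum gauge) with an average product correction (APC). This is not the authors' code. The alphabet `ACGU-`, the pseudocount of 0.5, the uniform sequence weights (no reweighting), the 4-nt minimum separation and the function name `mean_field_dca` are assumptions; other implementations rank pairs by direct information instead.

```python
from itertools import combinations

ALPHABET = "ACGU-"  # assumed alphabet: four nucleotides plus the gap symbol
Q = len(ALPHABET)
K = Q - 1           # the last state (gap) serves as the reference state


def invert(m):
    """Gauss-Jordan inversion with partial pivoting (adequate for toy sizes only)."""
    n = len(m)
    a = [row[:] + [1.0 if r == c else 0.0 for c in range(n)] for r, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                if f != 0.0:
                    a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def mean_field_dca(msa, pseudocount=0.5, min_sep=4):
    """Score column pairs (i, j) with j - i >= min_sep by APC-corrected Frobenius norm."""
    seqs = [[ALPHABET.index(c) for c in s.upper()] for s in msa]  # ValueError on other symbols
    n, L = len(seqs), len(seqs[0])
    lam = pseudocount

    # Single-site frequencies (uniform sequence weights) mixed with a uniform pseudocount.
    fi = [[lam / Q] * Q for _ in range(L)]
    for s in seqs:
        for i, a in enumerate(s):
            fi[i][a] += (1 - lam) / n

    def fij(i, j, a, b):
        # Pair frequency of state a at column i and state b at column j (i != j).
        emp = sum(1 for s in seqs if s[i] == a and s[j] == b) / n
        return (1 - lam) * emp + lam / (Q * Q)

    # Connected correlation matrix over the first K states of every column.
    C = [[0.0] * (L * K) for _ in range(L * K)]
    for i in range(L):
        for j in range(L):
            for a in range(K):
                for b in range(K):
                    if i == j:
                        val = (fi[i][a] if a == b else 0.0) - fi[i][a] * fi[i][b]
                    else:
                        val = fij(i, j, a, b) - fi[i][a] * fi[j][b]
                    C[i * K + a][j * K + b] = val
    Cinv = invert(C)

    def frobenius(i, j):
        # Mean-field couplings e_ij(a, b) = -(C^-1)_{ia,jb}; the reference state stays at 0.
        e = [[0.0] * Q for _ in range(Q)]
        for a in range(K):
            for b in range(K):
                e[a][b] = -Cinv[i * K + a][j * K + b]
        # Shift to the zero-sum gauge before taking the norm.
        row = [sum(e[a]) / Q for a in range(Q)]
        col = [sum(e[a][b] for a in range(Q)) / Q for b in range(Q)]
        tot = sum(row) / Q
        return sum((e[a][b] - row[a] - col[b] + tot) ** 2
                   for a in range(Q) for b in range(Q)) ** 0.5

    fn = {}
    for i, j in combinations(range(L), 2):
        fn[(i, j)] = fn[(j, i)] = frobenius(i, j)
    # Average product correction using the mean score of each column and of all pairs.
    mean_i = [sum(fn[(i, j)] for j in range(L) if j != i) / (L - 1) for i in range(L)]
    mean_all = sum(mean_i) / L
    return {(i, j): fn[(i, j)] - (mean_i[i] * mean_i[j] / mean_all if mean_all else 0.0)
            for i, j in combinations(range(L), 2) if j - i >= min_sep}


# Hypothetical toy alignment: columns 0 and 5 were built as Watson-Crick partners.
msa = ["GACUAC", "CAGGUG", "AUCAGU", "UGAUCA", "GCUAGC", "CUCGAG"]
for pair, score in sorted(mean_field_dca(msa).items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

The dense inversion of an (L·4) x (L·4) matrix in pure Python is only practical for toy alignments; for real RNA families an optimized DCA package would be the sensible choice.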
Year: 2020 PMID: 32276988 PMCID: PMC7297115 DOI: 10.1261/rna.073809.119
Source DB: PubMed Journal: RNA ISSN: 1355-8382 Impact factor: 4.942
FIGURE 1. (A) Prediction performance of the methods analyzed in this paper, measured by the positive predictive value (PPV) as a function of the number of top-scoring contacts. Only contacts separated along the sequence by at least 4 nt are considered. (B) Average PPV of all prediction methods as a function of the effective number of sequences, Meff.
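The PPV at N is the fraction of the N top-ranked predicted pairs that are true contacts in the reference structure. A minimal sketch of the PPV-versus-N curve, assuming predictions arrive as a hypothetical dict mapping 0-based column pairs to scores and reading "at least 4 nt" as |i - j| >= 4 (the exact separation convention is an assumption). The names `ppv_top_n` and `ppv_curve` are illustrative.

```python
def ppv_top_n(scores, true_contacts, n, min_sep=4):
    """PPV of the n highest-scoring pairs whose sequence separation is at least min_sep."""
    candidates = [((min(i, j), max(i, j)), s) for (i, j), s in scores.items()
                  if abs(i - j) >= min_sep]
    candidates.sort(key=lambda item: item[1], reverse=True)
    top = [pair for pair, _ in candidates[:n]]
    if not top:
        return 0.0
    truth = {(min(i, j), max(i, j)) for i, j in true_contacts}
    # Divide by the number actually ranked in case fewer than n pairs pass the filter.
    return sum(pair in truth for pair in top) / len(top)


def ppv_curve(scores, true_contacts, max_n, min_sep=4):
    """PPV for N = 1..max_n, i.e. one line of a PPV-versus-N plot."""
    return [ppv_top_n(scores, true_contacts, n, min_sep) for n in range(1, max_n + 1)]


scores = {(0, 10): 0.9, (2, 8): 0.8, (1, 3): 0.95, (4, 20): 0.5}
true_contacts = {(0, 10), (4, 20), (1, 3)}
print(ppv_curve(scores, true_contacts, 3))  # [1.0, 0.5, 0.6666666666666666]
```

The pair (1, 3) has the highest score but is dropped by the separation filter, which is why it never counts as a hit.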
Performance of the DCA-based methods analyzed on the different data sets
FIGURE 2. Prediction performance of the methods on the DHigh and DLow data sets. Only contacts separated along the sequence by at least 4 nt are considered.
FIGURE 3. (A) Average PPV of all prediction methods as a function of the bit score of the selected Rfam family. (B) Comparison of PPV across different values of Meff and bit score.
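Meff, the effective number of sequences used in Figures 1B and 3B, is commonly obtained by down-weighting near-duplicate sequences: each sequence receives weight 1/m, where m counts the alignment members (itself included) whose identity with it is at or above a threshold, and Meff is the sum of the weights. The sketch assumes an 80% identity threshold and counts identity over all aligned positions, gaps included; both are common choices but not necessarily those used for these figures, and the name `effective_sequences` is hypothetical.

```python
def effective_sequences(msa, identity_threshold=0.8):
    """Meff: sum over sequences of 1 / (number of sequences at >= threshold identity)."""
    seqs = [s.upper() for s in msa]
    length = len(seqs[0])

    def identity(a, b):
        # Fraction of aligned positions with the same symbol (gaps count as symbols here).
        return sum(x == y for x, y in zip(a, b)) / length

    meff = 0.0
    for s in seqs:
        neighbours = sum(1 for t in seqs if identity(s, t) >= identity_threshold)  # includes s
        meff += 1.0 / neighbours
    return meff


# Three near-identical sequences count roughly as one, plus one distinct sequence: Meff = 2.
msa = ["ACGUACGUAC", "ACGUACGUAC", "ACGUACGUAA", "UUUUUUUUUU"]
print(effective_sequences(msa))
```

The same per-sequence weights are what DCA implementations typically plug into the frequency counts in place of uniform 1/n weights.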
Positive predictive values (PPVs) by type of contact
Accuracy of the different DCA-based methods in predicting long-range tertiary contacts
Run-time comparison of the different DCA-based methods
Accuracy of mean-field DCA for different contact definitions, classified by distance threshold and by the atoms used to compute the nucleotide-pair distance
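A contact definition of the kind compared here combines a distance threshold with an atom selection. A minimal sketch, assuming each nucleotide is given as a hypothetical dict from atom name to (x, y, z) coordinates in angstroms (parsing a PDB or mmCIF file is out of scope): two nucleotides are in contact if the minimum distance between their selected atoms is at most the threshold. The function names, atom names, coordinates and thresholds in the usage lines are illustrative only; `math.dist` requires Python 3.8 or later.

```python
import math
from itertools import combinations


def pair_distance(res_a, res_b, atoms=None):
    """Minimum distance between the selected atoms of two nucleotides (all atoms if None)."""
    sel_a = [xyz for name, xyz in res_a.items() if atoms is None or name in atoms]
    sel_b = [xyz for name, xyz in res_b.items() if atoms is None or name in atoms]
    if not sel_a or not sel_b:
        return math.inf  # a nucleotide lacking the selected atoms is never in contact
    return min(math.dist(p, q) for p in sel_a for q in sel_b)


def contact_set(residues, threshold, atoms=None, min_sep=4):
    """Pairs (i, j) with j - i >= min_sep whose distance is at most threshold."""
    return {(i, j) for i, j in combinations(range(len(residues)), 2)
            if j - i >= min_sep and pair_distance(residues[i], residues[j], atoms) <= threshold}


# Toy chain: five nucleotides spaced 5 A apart, plus a sixth folded back onto the first.
residues = [{"C1'": (5.0 * k, 0.0, 0.0), "P": (5.0 * k, 3.0, 0.0)} for k in range(5)]
residues.append({"C1'": (0.0, 6.0, 0.0), "P": (0.0, 3.5, 0.0)})
print(contact_set(residues, 4.0))                  # all atoms, 4 A
print(contact_set(residues, 8.0, atoms={"C1'"}))   # C1' only, 8 A
```

Changing either the threshold or the atom set changes which pairs count as true contacts, and hence the PPV of the same prediction.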
Impact of MSA construction, alignment, and trimming on the performance of mean-field DCA contact prediction
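As an illustration of what "trimming" can mean in practice, one widely used step before running DCA keeps only the alignment columns where a reference sequence (for example, the one matching the PDB structure) has a nucleotide, and then discards sequences that are mostly gaps in the kept columns. This is not necessarily one of the procedures compared in this table; the name `trim_msa`, the 50% gap cutoff and the `-` gap symbol are assumptions.

```python
def trim_msa(msa, ref_index=0, max_gap_fraction=0.5, gap="-"):
    """Keep columns where the reference has no gap, then drop gap-rich sequences."""
    ref = msa[ref_index]
    keep = [k for k, c in enumerate(ref) if c != gap]
    if not keep:
        raise ValueError("reference sequence contains only gaps")
    trimmed = ["".join(seq[k] for k in keep) for seq in msa]
    # Remove sequences whose gap fraction over the kept columns exceeds the cutoff.
    return [s for s in trimmed if s.count(gap) / len(keep) <= max_gap_fraction]


msa = ["AC-GU", "ACAGU", "--A--", "A-CG-"]
print(trim_msa(msa))  # ['ACGU', 'ACGU', 'A-G-']
```

Trimming to the reference keeps predicted column indices aligned with the nucleotide numbering of the structure used for evaluation.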
FIGURE 4. (A) Contact map of the adenine riboswitch from Vibrio vulnificus: correctly and wrongly predicted contacts among the top 35 pairs are shown in orange and red, respectively, and all other contacts from PDB structure 4TZX in blue. (B) Secondary structure of the riboswitch, with all correctly predicted Watson-Crick (WC) base pairs among the top 35 pairs shown in orange.
Positive predictive values (PPVs) for different numbers of contacts N and different contact-threshold definitions for the adenine riboswitch from Vibrio vulnificus