Radhakrishnan Sabarinathan, Hakim Tafer, Stefan E Seemann, Ivo L Hofacker, Peter F Stadler, Jan Gorodkin.
Abstract
Structural characteristics are essential for the functioning of many noncoding RNAs and cis-regulatory elements of mRNAs. SNPs may disrupt these structures, interfere with their molecular function, and hence cause a phenotypic effect. RNA folding algorithms can provide detailed insights into structural effects of SNPs. The global measures employed so far suffer from limited accuracy of folding programs on large RNAs and are computationally too demanding for genome-wide applications. Here, we present a strategy that focuses on the local regions of maximal structural change between mutant and wild-type. These local regions are approximated in a "screening mode" that is intended for genome-wide applications. Furthermore, localized regions are identified as those with maximal discrepancy. The mutation effects are quantified in terms of empirical P values. To this end, the RNAsnp software uses extensive precomputed tables of the distribution of SNP effects as a function of length and GC content. RNAsnp thus achieves both a noise reduction and speed-up of several orders of magnitude over shuffling-based approaches. On a data set comprising 501 SNPs associated with human-inherited diseases, we predict 54 to have significant local structural effect in the untranslated region of mRNAs.
Year: 2013 PMID: 23315997 PMCID: PMC3708107 DOI: 10.1002/humu.22273
Source DB: PubMed Journal: Hum Mutat ISSN: 1059-7794 Impact factor: 4.878
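Table 1. Summary of the Notation for (Dis)Similarity Measures Between Base Pair Probabilities of Wild-Type and Mutant
| Category | Symbol | Description |
|---|---|---|
| Probability distribution | P | Full base pairing probabilities |
| Probability distribution | π | Position-wise pairing probabilities |
| Probability distribution | | Position-wise, upstream |
| Probability distribution | | Position-wise, downstream |
| Probability distribution | ξ<> | Position-wise, distinguishing up- and downstream |
| Similarity measure | r | Pearson correlation |
| Similarity measure | d | Euclidean distance ( |
| Optimization | | Global |
| Optimization | max, min (as in dmax, rmin) | Best local interval |
| Optimization | # (as in d#) | Approximate optimization for scanning |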
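The idea of a local distance measure can be illustrated with a small sketch. It assumes that position-wise pairing probability vectors for wild-type and mutant are already available (for example, from a folding program) and scans fixed-width windows for the largest Euclidean distance. The function names, the fixed window width, and the toy input are assumptions made for illustration; this is not RNAsnp's own interval optimization.

```python
import math


def window_distance(wt, mut, start, width):
    """Euclidean distance between two probability vectors on [start, start + width)."""
    pairs = zip(wt[start:start + width], mut[start:start + width])
    return math.sqrt(sum((a - b) ** 2 for a, b in pairs))


def max_local_distance(wt, mut, width):
    """Return (start, distance) of the fixed-width window with the largest distance.

    wt, mut: position-wise pairing probabilities of wild-type and mutant
             (hypothetical inputs, equal length).
    width:   window length in nucleotides (an assumption of this sketch).
    Ties are resolved in favor of the leftmost window.
    """
    if len(wt) != len(mut):
        raise ValueError("wild-type and mutant vectors must have equal length")
    if not 0 < width <= len(wt):
        raise ValueError("width must be between 1 and the sequence length")
    best_start, best_d = 0, -1.0
    for start in range(len(wt) - width + 1):
        d = window_distance(wt, mut, start, width)
        if d > best_d:
            best_start, best_d = start, d
    return best_start, best_d


if __name__ == "__main__":
    wild_type = [0.1] * 10
    mutant = list(wild_type)
    mutant[4], mutant[5] = 0.9, 0.7  # toy structural change around the SNP
    print(max_local_distance(wild_type, mutant, width=3))
```

A fixed width is used here because the sum of squared differences can only grow as an interval is extended, so an unconstrained maximum over all intervals would trivially select the whole sequence.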
Figure 1. The scatter plot shows the rank correlation between various (dis)similarity measures, which were tested on 7,000 random sequences of length 400 nts with G+C contents between 20% and 80% and a SNP introduced at position 200. The local measures dmax and rmin based on the base pairing probability matrix P and its marginal distributions π and ξ<> (see Table 1) correlate well with each other. The distance δ of the Boltzmann distribution, on the other hand, behaves quite differently from the above P-derived measures.
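As a rough illustration of how two measures can be compared by rank, the sketch below computes a Spearman rank correlation in plain Python. The caption does not state which rank coefficient was used, so Spearman is an assumption here, and the helper names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y):
        raise ValueError("inputs must have equal length")
    return pearson(average_ranks(x), average_ranks(y))
```

Note that a distance such as dmax and a correlation such as rmin point in opposite directions (a large distance corresponds to a low correlation), so one of the two would need to be negated before plotting them on a common scale.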
Summary of Structure Distance Measures Implemented in RNAsnp
| Measure | Folding | Distr. | ρ |
|---|---|---|---|
| dmax | RNAfold | Gumbel | 0.9982 |
| dmax | RNAplfold | Gumbel | 0.9999 |
| d# | RNAplfold | Gumbel | 0.9999 |
| rmin | RNAfold | Beta | 0.9986 |
The first column gives the distance measure, and the second column lists the folding program used to compute the base pair probabilities. The remaining columns give the type of distribution used to fit the empirical P-value distribution and the correlation coefficient ρ between the fitted P values and the rank-based P values computed for a set of random sequences. The fitted P values are computed using the parameters of the fitted distribution.
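The relation between fitted and rank-based P values can be sketched as follows. The snippet fits a Gumbel distribution to a background sample by the method of moments and compares its upper-tail probability with a simple rank-based estimate. The moment-based fit, the pseudo-count in the rank-based estimate, and all function names are assumptions for illustration; the transformation applied to the scores before fitting and the actual fitting procedure used by RNAsnp are not reproduced here.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(values):
    """Estimate Gumbel (location, scale) from the sample mean and standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("cannot fit a Gumbel distribution to constant data")
    scale = sd * math.sqrt(6) / math.pi
    loc = mean - EULER_GAMMA * scale
    return loc, scale


def gumbel_p_value(x, loc, scale):
    """Upper-tail probability P(X >= x) under a Gumbel(loc, scale) distribution."""
    return 1.0 - math.exp(-math.exp(-(x - loc) / scale))


def rank_p_value(x, background):
    """Rank-based P value: share of background scores >= x, with a pseudo-count."""
    hits = sum(1 for b in background if b >= x)
    return (hits + 1) / (len(background) + 1)
```

Once the distribution parameters are tabulated, a fitted P value costs a single function evaluation per SNP, whereas a rank-based P value requires a background sample for every query.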
Figure 2. (A) Density plot showing the distribution of three distance values calculated using random sequences of length 400 nts, a G+C content between 50% and 60%, and the SNP at position 200. (B) Quantile–quantile plot of the transformed dmax computed from the base pair probabilities returned by RNAplfold against the fitted Gumbel distribution. (C) Same as (B) for d#. (D) Same as (B), but with dmax computed from the base pair probabilities returned by RNAfold. (E) Density plot showing the distribution of rmin computed from the base pair probabilities returned by RNAfold. (F) Quantile–quantile plot showing the transformed rmin against the beta distribution.
Figure 3. Flowchart of the RNAsnp program. The options “Mode 1” and “Mode 2” detect SNP-induced local structural changes in a given RNA sequence; the choice between them depends on the length of the input sequence. The option “Mode 3” screens for putative structure-disruptive SNPs in an RNA sequence by testing the effect of all three possible substitutions at each nucleotide position. This mode is suited to screening transcripts or genome sequences.
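The exhaustive substitution step described for Mode 3 can be sketched as a simple generator. The function name and the choice to normalize T to U are assumptions for illustration; the snippet only enumerates candidate mutants and does not perform any folding or scoring.

```python
NUCLEOTIDES = "ACGU"


def single_substitutions(seq):
    """Yield (position, ref, alt, mutant) for every single-nucleotide substitution.

    Positions are 1-based. DNA input is accepted and converted to RNA (T -> U).
    """
    seq = seq.upper().replace("T", "U")
    for i, ref in enumerate(seq):
        if ref not in NUCLEOTIDES:
            raise ValueError(f"unexpected character {ref!r} at position {i + 1}")
        for alt in NUCLEOTIDES:
            if alt != ref:
                yield i + 1, ref, alt, seq[:i] + alt + seq[i + 1:]


if __name__ == "__main__":
    for position, ref, alt, mutant in single_substitutions("GGAUCC"):
        print(f"{position}{ref}>{alt}\t{mutant}")
```

A sequence of length n yields 3n candidate mutants, which is why a fast approximate measure matters for screening whole transcripts.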
Figure 4. Significance of structural effects as predicted by the local (dis)similarity measures (dmax and rmin using different probability distributions: P, π, and ξ<>) and the distance (δ) between the distributions of structures for the 30 SNPs with reported secondary structure changes. See Table 1 for more details about the symbols. The P values are shown as bars and the dashed line represents the selected threshold value 0.1. The four experimentally validated examples are indicated in green/gray. The SNPs are described according to HGVS nomenclature.
The Results of RNAsnp and SNPfold for the Four Cases with Experimentally Verified Structures
| SNPfold | ||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| References | Gene | SNP | Local reg. | max | | Interval | | Interval | | Interval | | | | |
| 1 | p53 | c.[51A>G;54A>C;57T>C] | 257–262 | 241 | 0.009 | 241–285 | 0.004 | 241–290 | 0.015 | 231–280 | 0.142 | 0.197 | 0.245 | 0.664 |
| 2 | NS5B | n.[8953C>A;8955T>G] | 9,271–9,294 | 9,270 | 0.062 | 9,270–9,298 | 0.009 | 9,261–9,310 | 0.074 | 9,268–9,317 | 0.121 | 0.412 | 0.257 | 0.359 |
| 3 | AARS | c.903T>C | 980–1,032 | 994 | 0.217 | 998–1,052 | 0.123 | 975–1,025 | 0.072 | 998–1,052 | 0.093 | 0.042 | 0.065 | 0.077 |
| 4 | Nef | n.8546G>A | 8,512–8,579 | 8,517 | 0.387 | 8,517–8,555 | 0.285 | 8,518–8,576 | 0.094 | 8,518–8,567 | 0.083 | 0.116 | 0.148 | 0.293 |
The SNPs were described according to HGVS nomenclature.
References: (1) p53, tumor formation [Grover et al., 2011]; (2) NS5B, alteration of RNA replication in HCV [You et al., 2004]; (3) AARS, alteration of alanyl-tRNA synthetase expression in human [Shen et al., 1999]; and (4) Nef, HIV-1 resistance against RNAi [Westerhout et al., 2005].
Refseq/Accession ids: NS5B – AJ238799.1, p53 – NM_001126114.2, Nef – K02013.1, AARS – D32050.1.
Local reg.: position of the local regions reported in the literature. The start and end values are given with respect to the full-length mRNA sequence.
Figure 5. Significance of structural effects using RNAsnp (Modes 1 and 2) for the data set of 30 SNPs with reported secondary structure changes. The P values are shown as bars and the dashed line represents the selected threshold value 0.1. The four experimentally validated examples are indicated in green/gray. The SNPs are described according to HGVS nomenclature.
List of Disease-Associated SNPs that are Predicted to have Significant Local Structural Effect (P < 0.1) Based on RNAfold with the Scores dmax and rmin
| Disease/phenotype | Gene | HGMD Accession | GenBank Accession | NTs | SNP | P value (dmax) | P value (rmin) |
|---|---|---|---|---|---|---|---|
| Pseudohypoaldosteronism | NR3C2 | CR030126 | NM_000901.4 | 5,898 | c.-2C>G | 0.017 | 0.022 |
| Hypertension | EDN2 | CR994679 | NM_001956.3 | 1,243 | c.*390G>A | 0.036 | 0.021 |
| Obesity | CNR1 | CR073542 | NM_033181.3 | 5,373 | c.*2394A>G | 0.032 | 0.036 |
| Myocardial infarction | GP1BA | CR022116 | NM_000173.5 | 2,463 | c.-5T>C | 0.040 | 0.037 |
| Colorectal cancer | INSR | CR082021 | NM_001079817.1 | 9,023 | c.*104A>G | 0.042 | 0.030 |
| Graves’ disease | FCRL3 | CR067134 | NM_052939.3 | 3,019 | c.-11G>C | 0.011 | 0.042 |
| Increased triglyceride levels | ABCA1 | CR025352 | NM_005502.3 | 10,502 | c.-279C>G | 0.044 | 0.022 |
| Insulin resistance and hypertension | RETN | CR032443 | NM_020415.3 | 478 | c.*62G>A | 0.045 | 0.043 |
| Cartilage-hair hypoplasia | RMRP | CR063417 | NR_003051.3 | 268 | n.215A>G | 0.048 | 0.027 |
| Hypercholesterolemia | LDLR | CR971948 | NM_000527.4 | 5,283 | c.-14C>A | 0.025 | 0.048 |
| Glaucoma | CYP1B1 | CR032431 | NM_000104.3 | 5,153 | c.-286C>T | 0.063 | 0.036 |
| Reduced transcriptional activity | NR3C1 | CR016150 | NM_001024094.1 | 6,787 | c.-219C>A | 0.044 | 0.063 |
| HDL cholesterol levels | LIPG | CR032437 | NM_006033.2 | 4,141 | c.*482A>G | 0.051 | 0.065 |
| Factor VII deficiency | F7 | CR090334 | NM_019616.2 | 3,059 | c.-44T>C | 0.066 | 0.042 |
| Hemophilia A | F8 | CR070421 | NM_000132.3 | 9,035 | c.-112G>A | 0.074 | 0.010 |
| Cartilage-hair hypoplasia | RMRP | CR064472 | NR_003051.3 | 268 | n.10T>C | 0.076 | 0.024 |
| Von Hippel–Lindau syndrome | VHL | CR011856 | NM_000551.3 | 4,560 | c.*7C>G | 0.076 | 0.065 |
| Obesity | SLC6A14 | CR035766 | NM_007231.3 | 4,564 | c.*178C>G | 0.078 | 0.062 |
| Spastic paraplegia 31 | REEP1 | CR082030 | NM_022912.2 | 3,853 | c.*14C>T | 0.033 | 0.081 |
| Hyperferritinemia-cataract syndrome | FTL | CR061334 | NM_000146.3 | 871 | c.-178T>G | 0.052 | 0.097 |
The SNPs are described according to HGVS nomenclature. SNPs that have been predicted to have a significant global structural effect using SNPfold [Halvorsen et al., 2010] are highlighted with a gray background.