Firoz Ahmed, Rakesh Kaundal, Gajendra P S Raghava.
Abstract
BACKGROUND: Dicer, an RNase III enzyme, plays a vital role in processing pre-miRNAs to generate miRNAs. The structural and sequence features of pre-miRNAs that influence the position and efficiency of cleavage are not well known. Precise cleavage by Dicer is crucial because inaccurate processing can produce miRNAs with different seed regions, which can alter the repertoire of target genes.
Year: 2013 PMID: 24267009 PMCID: PMC3851333 DOI: 10.1186/1471-2105-14-S14-S9
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Schematic diagram of the pre-miRNA hsa-mir-200c, as predicted by quikfold software, and patterns of the Dicer cleavage site at the 5p arm. (A) miR* is derived from the 5p arm and miR from the 3p arm of the hairpin; their bases are shown in capital letters. CD-5p and CD-3p are the Dicer cleavage sites on the 5' and 3' arms, respectively. (B) Sequence of the CP-5p cleavage pattern: 14 nucleotides with the cleavage site CD-5p at the center. Following each cleavage pattern, the mononucleotide and binary features used as SVM input are given. (C) Structure of the CP-5p cleavage pattern: 14 nucleotides with the cleavage site CD-5p at the center, together with the partially complementary strand. Base pairs are indicated with arrows. Zero (0) indicates that no base pairing occurs between the complementary strands. The 14+14 pattern is used to generate the structure binary pattern. The mononucleotide feature is a 4-dimensional vector, the sequence binary pattern a 56-dimensional vector, and the structure binary pattern a 112-dimensional vector. +1 is the class label for a cleavage pattern. The binary pattern is shown only for the highlighted nucleotides.
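The vector sizes quoted in the caption (4, 56 and 112) follow from encoding each position with four bits. The sketch below is one way to reproduce those sizes, not the authors' code. It assumes a one-hot encoding over A, C, G, U, assumes that an unpaired position (`0`) maps to four zeros, and assumes composition is expressed as a fraction; the two 14-character strings are hypothetical and are not taken from hsa-mir-200c.

```python
NUCLEOTIDES = "ACGU"

def mononucleotide_composition(window):
    """Fraction of A, C, G and U in the window: a 4-dimensional vector."""
    window = window.upper()
    return [window.count(n) / len(window) for n in NUCLEOTIDES]

def binary_pattern(window):
    """One-hot encode each position (4 bits per position).

    Assumption: any symbol outside A/C/G/U, including the '0' used for
    an unpaired position, is encoded as four zeros.
    """
    vector = []
    for base in window.upper():
        vector.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return vector

def structure_binary_pattern(strand, partner):
    """Concatenate the encodings of the window and its partner strand."""
    return binary_pattern(strand) + binary_pattern(partner)

# Hypothetical 14-nt window and partner strand ('0' marks no base pair).
window = "UGGGAGUCUCUAAU"
partner = "ACCC0CAG0GAUUA"

print(len(mononucleotide_composition(window)))         # 4
print(len(binary_pattern(window)))                     # 56 = 14 x 4
print(len(structure_binary_pattern(window, partner)))  # 112 = (14 + 14) x 4
```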
Performance of 'RNAfold' derived sequence-based SVM models for Dicer cleavage sites at 5p and 3p arm.
| Features | Window size | CD-5p Sn | CD-5p Sp | CD-5p Ac | CD-5p Mc | CD-3p Sn | CD-3p Sp | CD-3p Ac | CD-3p Mc |
|---|---|---|---|---|---|---|---|---|---|
| Mono | 8 | 58.20 | 61.80 | 60.00 | 0.20 | 58.38 | 59.28 | 58.83 | 0.18 |
| Mono | 10 | 63.78 | 58.92 | 61.35 | 0.23 | 60.36 | 61.44 | 60.90 | 0.22 |
| Mono | 12 | 61.08 | 63.96 | 62.52 | 0.25 | 64.68 | 59.46 | 62.07 | 0.24 |
| Dinuc | 8 | 59.82 | 59.82 | 59.82 | 0.20 | 61.80 | 52.25 | 57.03 | 0.14 |
| Dinuc | 10 | 60.54 | 58.20 | 59.37 | 0.19 | 58.74 | 52.97 | 55.86 | 0.12 |
| Dinuc | 12 | 61.44 | 56.40 | 58.92 | 0.18 | 60.90 | 61.62 | 61.26 | 0.23 |
| Trinuc | 8 | 60.90 | 57.48 | 59.19 | 0.18 | 58.92 | 56.04 | 57.48 | 0.15 |
| Trinuc | 10 | 55.14 | 61.62 | 58.38 | 0.17 | 59.82 | 53.15 | 56.49 | 0.13 |
| Trinuc | 12 | 58.92 | 55.14 | 57.03 | 0.14 | 63.24 | 57.12 | 60.18 | 0.20 |
| Binary | 8 | 60.54 | 61.62 | 61.08 | 0.22 | 65.23 | 65.05 | 65.14 | 0.30 |
| Binary | 10 | 61.26 | 63.78 | 62.52 | 0.25 | 67.75 | 63.06 | 65.41 | 0.31 |
| 67.57 | 65.05 | 66.31 | 0.33 | ||||||
| 70.99 | 63.96 | 67.48 | 0.35 | ||||||
Models were developed using composition and binary pattern of different window sizes. Patterns were taken from sequence of Dicer cleavage site of 5p arm (sequence of CP-5p) and 3p arm (sequence of CP-3p) using miRNA.str data.
CD-5p: Dicer cleavage site at 5p arm, CD-3p: Dicer cleavage site at 3p arm, Mono: Mononucleotide, Dinuc: Dinucleotide, Trinuc: Trinucleotide, Binary: Binary pattern, Sn: sensitivity, Sp: specificity, Ac: accuracy, Mc: Matthews correlation coefficient.
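For reference, the four measures in the legend can be computed from a confusion matrix as sketched below, using the standard definitions. The counts in the example are an assumption: the source does not give confusion-matrix counts, and these were chosen (555 positive and 555 negative patterns) only because they reproduce the Mono, window size 8, CD-5p row.

```python
import math

def classification_metrics(tp, fn, tn, fp):
    """Return Sn, Sp, Ac (as percentages) and Mc from confusion-matrix counts."""
    sn = 100.0 * tp / (tp + fn)                    # sensitivity
    sp = 100.0 * tn / (tn + fp)                    # specificity
    ac = 100.0 * (tp + tn) / (tp + fn + tn + fp)   # accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews correlation
    return {"Sn": sn, "Sp": sp, "Ac": ac, "Mc": mc}

# Illustrative counts (assumed, not reported in the source).
m = classification_metrics(tp=323, fn=232, tn=343, fp=212)
print({name: round(value, 2) for name, value in m.items()})
# {'Sn': 58.2, 'Sp': 61.8, 'Ac': 60.0, 'Mc': 0.2}
```

With balanced classes, Ac is the mean of Sn and Sp, which is the pattern visible throughout the table.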
Performance of 'RNAfold' derived structure-based SVM models for Dicer cleavage sites at 5p and 3p arm.
| Features | Window size | CD-5p Sn | CD-5p Sp | CD-5p Ac | CD-5p Mc | CD-3p Sn | CD-3p Sp | CD-3p Ac | CD-3p Mc |
|---|---|---|---|---|---|---|---|---|---|
| Mono | 8 | 62.88 | 65.59 | 64.23 | 0.28 | 62.88 | 57.30 | 60.09 | 0.20 |
| Mono | 10 | 67.21 | 66.85 | 67.03 | 0.34 | 63.24 | 63.06 | 63.15 | 0.26 |
| Mono | 12 | 71.53 | 69.01 | 70.27 | 0.41 | 59.28 | 67.21 | 63.24 | 0.27 |
| Dinuc | 8 | 61.98 | 63.24 | 62.61 | 0.25 | 57.84 | 59.46 | 58.65 | 0.17 |
| Dinuc | 10 | 63.60 | 63.78 | 63.69 | 0.27 | 59.64 | 58.92 | 59.28 | 0.19 |
| Dinuc | 12 | 69.19 | 62.34 | 65.77 | 0.32 | 63.96 | 62.52 | 63.24 | 0.26 |
| Trinuc | 8 | 54.77 | 63.42 | 59.10 | 0.18 | 57.84 | 55.32 | 56.58 | 0.13 |
| Trinuc | 10 | 63.24 | 59.82 | 61.53 | 0.23 | 57.84 | 60.00 | 58.92 | 0.18 |
| Trinuc | 12 | 63.24 | 63.60 | 63.42 | 0.27 | 60.54 | 57.12 | 58.83 | 0.18 |
| Binary | 8 | 65.23 | 65.41 | 65.32 | 0.31 | 65.23 | 70.45 | 67.84 | 0.36 |
| Binary | 10 | 67.57 | 67.57 | 67.57 | 0.35 | 72.07 | 62.88 | 67.48 | 0.35 |
| Binary | 12 | 72.79 | 69.91 | 71.35 | 0.43 | 71.89 | 68.11 | 70.00 | 0.40 |
Models were developed using composition and binary pattern of different window sizes. Patterns were taken from structure of Dicer cleavage site of 5p arm (structure of CP-5p) and 3p arm (structure of CP-3p) using miRNA.str data.
Performance of 'quikfold' derived sequence- and structure-based SVM models for Dicer cleavage sites at 5p arm.
| Features | Window size | Sequence-based Sn | Sequence-based Sp | Sequence-based Ac | Sequence-based Mc | Structure-based Sn | Structure-based Sp | Structure-based Ac | Structure-based Mc |
|---|---|---|---|---|---|---|---|---|---|
| Mono | 8 | 56.76 | 56.94 | 56.85 | 0.14 | 69.37 | 70.45 | 69.91 | 0.40 |
| Mono | 10 | 59.10 | 56.40 | 57.75 | 0.16 | 75.68 | 77.48 | 76.58 | 0.53 |
| Mono | 12 | 54.05 | 57.84 | 55.95 | 0.12 | 77.84 | 83.06 | 80.45 | 0.61 |
| Mono | 14 | | | | | | | | |
| Dinuc | 8 | 58.92 | 59.46 | 59.19 | 0.18 | 65.95 | 67.03 | 66.49 | 0.33 |
| Dinuc | 10 | 55.50 | 61.08 | 58.29 | 0.17 | 67.39 | 67.57 | 67.48 | 0.35 |
| Dinuc | 12 | 59.82 | 56.76 | 58.29 | 0.17 | 69.73 | 74.41 | 72.07 | 0.44 |
| Dinuc | 14 | | | | | | | | |
| Trinuc | 8 | 61.98 | 52.43 | 57.21 | 0.14 | 60.36 | 59.10 | 59.73 | 0.19 |
| Trinuc | 10 | 55.32 | 50.45 | 52.88 | 0.06 | 66.31 | 62.34 | 64.32 | 0.29 |
| Trinuc | 12 | 57.48 | 56.58 | 57.03 | 0.14 | 65.77 | 65.41 | 65.59 | 0.31 |
| Trinuc | 14 | | | | | | | | |
| Binary | 8 | 62.70 | 58.92 | 60.81 | 0.22 | 69.55 | 78.02 | 73.78 | 0.48 |
| Binary | 10 | 61.26 | 59.46 | 60.36 | 0.21 | 76.40 | 80.36 | 78.38 | 0.57 |
| Binary | 12 | 65.41 | 63.42 | 64.41 | 0.29 | 82.16 | 81.26 | 81.71 | 0.63 |
| Binary | 14 | | | | | | | | |
Models were developed using composition and binary pattern of different window sizes. Patterns were taken from sequence of Dicer cleavage site of 5p arm (sequence of CP-5p) and structure of Dicer cleavage site of 5p arm (structure of CP-5p).
Figure 2. Performance of various SVM models for the Dicer cleavage site at the 5p arm (CD-5p), shown as ROC plots. Bin [str, miRNA.str]: binary feature used for the structure of CP-5p taken from miRNA.str. Mono [str, quikfold]: mononucleotide composition used for the structure of CP-5p taken from quikfold. Bin [seq, quikfold]: binary feature used for the sequence of CP-5p taken from quikfold. Bin [str, quikfold]: binary feature used for the structure of CP-5p taken from quikfold. The value indicates the AUC for the corresponding model.
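The AUC reported for each curve can be read as the probability that a randomly chosen cleavage pattern receives a higher score than a randomly chosen non-cleavage pattern. A minimal sketch of that calculation follows; the labels and scores are made up for illustration, and the source does not say how its AUC values were computed.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Labels use +1 for the cleavage
    class; anything else is treated as the negative class.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y != 1]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores for three positives and three negatives.
labels = [1, 1, 1, -1, -1, -1]
scores = [2.3, 1.1, 0.4, 0.9, -0.2, -1.5]
print(roc_auc(labels, scores))  # 8 of 9 pairs are ranked correctly
```

This pairwise form is quadratic in the number of examples, which is fine for a few hundred patterns; a rank-based formula gives the same value for larger sets.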
Performance of different WEKA methods for Dicer cleavage prediction and their comparison with SVM model.
| Methods | Sn | Sp | Ac | Mc | AUC |
|---|---|---|---|---|---|
| Random Forest | 80.54 | 90.81 | 85.68 | 0.717 | 0.921 |
| Naïve Bayes | 81.26 | 86.30 | 83.78 | 0.676 | 0.909 |
| Simple CART | 81.08 | 85.40 | 83.24 | 0.665 | 0.879 |
| REP Tree | 80.54 | 81.08 | 80.81 | 0.616 | 0.872 |
| Random Tree | 69.54 | 74.05 | 71.80 | 0.436 | 0.752 |
All models were used for the prediction of cleavage site at 5p arm by using 'extended binary pattern' feature of 14 nt window taken from 'quikfold' derived structure (structure of CP-5p). Models were developed on training dataset using non-redundant 5-fold cross validation techniques.
Sn: sensitivity, Sp: specificity, Ac: accuracy, Mc: Matthews correlation coefficient. AUC: area under curve.
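The 5-fold cross-validation mentioned in the table note can be sketched as follows. This is a generic splitter, not the authors' procedure: the function name, the seed and the sample count are illustrative, and the redundancy-removal step implied by "non-redundant" is not described in this section, so it is not implemented here.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Each pattern appears in exactly one test fold and in the other four training sets.
for train, test in k_fold_indices(23):
    print(len(train), len(test))
```

The reported Sn, Sp, Ac and Mc values would then be obtained by training on each `train` split, predicting on the matching `test` split, and pooling or averaging the results.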
Analysis of functional consequences of SNPs on Dicer processing sites using PHDcleav.
| miRNA (miRBase 19) | Reference cleavage | Reference PHDcleav score | Variation | dbSNP ID | SNP location on hairpin | Variant cleavage | Variant PHDcleav score | Effect on Dicer cleavage site |
|---|---|---|---|---|---|---|---|---|
| hsa-mir-196a-2 | 47 | 2.326 | g.78C>T | rs11614913 | Mature | 47 | 2.326 | remain same |
| | 48 | 1.614 | | | | 48 | 1.614 | |
| hsa-mir-335 | 39 | 2.855 | g.39T>C | MIR335_00001* | Mature | 37 | 1.653 | loss of site |
| 36 | 1.337 | |||||||
| | 37 | 1.338 | | | | 35 | 1.027 | |
| hsa-mir-570 | | | g.34T>C | rs9860655 | Stem | | | remain same |
| | 45 | 1.126 | | | | 45 | 1.126 | |
| | 39 | 0.908 | | | | 40 | 0.994 | |
| hsa-mir-650 | 35 | 1.366 | g.71C>G | rs59996397 | Stem | 35 | 1.366 | remain same |
| | 38 | 1.271 | | | | 38 | 1.271 | |
| hsa-mir-941-3 | 57 | 2.095 | g.69C>G | rs12625445 | Mature | 57 | 2.398 | altered |
| | 56 | 0.750 | | | | 55 | 1.023 | |
The top three PHDcleav scores and their corresponding cleavage sites are given in the table; a higher score indicates a more probable Dicer cleavage site. The miRBase-annotated canonical cleavage site is shown in bold, and the altered cleavage site in bold italics.
*miRvar DB-ID [43].
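The comparison in the table amounts to ranking candidate sites by score for the reference and the variant hairpin and then looking at which positions differ. The sketch below does this for the two hsa-mir-941-3 rows shown above. The helper name `top_sites` is hypothetical, and the code only lists positions that appear or disappear; it does not reproduce the rule the authors used to assign the "Effect" labels.

```python
def top_sites(scores, k=3):
    """Return up to k (position, score) pairs, highest score first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]

# Scores for hsa-mir-941-3 as listed in the table (position -> PHDcleav score).
reference = {57: 2.095, 56: 0.750}
variant = {57: 2.398, 55: 1.023}

ref_top = top_sites(reference)
var_top = top_sites(variant)

ref_positions = {pos for pos, _ in ref_top}
var_positions = {pos for pos, _ in var_top}

print(ref_top[0][0], var_top[0][0])          # best-scoring site in each case
print(sorted(ref_positions - var_positions)) # sites seen only in the reference
print(sorted(var_positions - ref_positions)) # sites seen only in the variant
```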