Xiangzheng Fu, Bo Liao, Wen Zhu, Lijun Cai.
Abstract
MicroRNAs (miRNAs) are a family of short non-coding RNAs that play significant roles as post-transcriptional regulators. Consequently, various methods have been proposed to identify precursor miRNAs (pre-miRNAs), among which comparative studies of miRNA structures are the most important. To measure and classify the structural similarity of miRNAs, we propose a new three-dimensional (3D) graphical representation of miRNA secondary structure, in which the secondary structure is first transformed into a characteristic sequence based on the physicochemical properties and frequencies of the bases. A numerical characterization of the 3D graph is then used to represent the miRNA secondary structure. Based on this representation, we use a novel Euclidean distance method to compute the distance between miRNA sequences for sequence similarity analysis. Finally, we apply this similarity analysis to identify plant pre-miRNAs in three commonly used datasets. The results show that the method is reasonable and effective.
This journal is © The Royal Society of Chemistry.
Year: 2018 PMID: 35548744 PMCID: PMC9085476 DOI: 10.1039/c8ra04138e
Source DB: PubMed Journal: RSC Adv ISSN: 2046-2069 Impact factor: 4.036
Fig. 1 Overall framework of the proposed method.
Fig. 2 The secondary structures of the RNA sequences of TSV-3 and AIMV-3.
Information about the secondary structure of RNA sequences of 9 viruses
| Species | RNA secondary structure | Length |
|---|---|---|
| AIMV-3 | AUGCucaugcaAAACugcaugaAUGCcccUAAgggAUGC | 39 |
| APMV-3 | AAUGCccacaacGUGAAguuguggAUGCcccGUUAgggAAGC | 42 |
| AVII | AUGCcuaaUacucucucuCAGggagagaguuuagAUGCcuccAAAggagAUGC | 53 |
| CILRV | AUGCcuauauuuucucUCCUgagaaaauauagAUGCcuccAAAggagAUGC | 51 |
| CVV-3 | AUGCccaAAcucucucuCAUggagagagAAuggAUGCcuccGAAggagAUGC | 52 |
| EMV-3 | CcuaauUcucucucuCACggagagagauuagAUGCcucCAAGgagAUGC | 49 |
| LRMV-3 | UUCcuauucucucucUCAGgagagGagaauagAUGCcuccAAAggagUCGC | 51 |
| PDV-3 | AUGCccucaccGUAAggugaggAUGCcccuUAAagggAUGC | 41 |
| TSV-3 | GUGCcaguaguauaUAAuauacuacugAUGCcuccuUUAUaggagAUGC | 49 |
The cumulative coordinates of the first 20 bases in the RNA secondary structures of TSV-3 and AIMV-3. X, Y, and Z denote each base's cumulative coordinates along the X, Y, and Z axes, respectively
| TSV-3 base | X | Y | Z | AIMV-3 base | X | Y | Z |
|---|---|---|---|---|---|---|---|
| G | 0.02 | 0.02 | 0.02 | A | −0.03 | −0.03 | 0.03 |
| U | 0 | 0 | 0.04 | U | −0.05 | −0.05 | 0.05 |
| G | 0.04 | 0.04 | 0.08 | G | −0.03 | −0.03 | 0.08 |
| C | 0.06 | 0.06 | 0.1 | C | 0 | 0 | 0.1 |
| c | 0.08 | 0.08 | 0.12 | u | 0.03 | −0.03 | 0.08 |
| a | 0.06 | 0.06 | 0.1 | c | 0.05 | 0 | 0.1 |
| g | 0.04 | 0.08 | 0.08 | a | 0.03 | −0.03 | 0.08 |
| u | 0.06 | 0.06 | 0.06 | u | 0.08 | −0.08 | 0.03 |
| a | 0.02 | 0.02 | 0.02 | g | 0.05 | −0.05 | 0 |
| g | −0.02 | 0.06 | −0.02 | c | 0.1 | 0 | 0.05 |
| u | 0.02 | 0.02 | −0.06 | a | 0.05 | −0.05 | 0 |
| a | −0.04 | −0.04 | −0.12 | A | 0 | −0.1 | 0.05 |
| u | 0.02 | −0.1 | −0.18 | A | −0.08 | −0.18 | 0.13 |
| a | −0.06 | −0.18 | −0.27 | A | −0.18 | −0.28 | 0.23 |
| U | −0.1 | −0.22 | −0.22 | C | −0.13 | −0.23 | 0.28 |
| A | −0.12 | −0.24 | −0.2 | u | −0.05 | −0.31 | 0.21 |
| A | −0.16 | −0.29 | −0.16 | g | −0.1 | −0.26 | 0.15 |
| u | −0.08 | −0.37 | −0.24 | c | −0.03 | −0.18 | 0.23 |
| a | −0.18 | −0.47 | −0.35 | a | −0.1 | −0.26 | 0.15 |
| u | −0.08 | −0.57 | −0.45 | u | 0 | −0.36 | 0.05 |
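
Because the table lists cumulative coordinates, the per-base step vectors can be recovered by differencing consecutive rows. The sketch below does this for the first four TSV-3 rows, assuming the columns are ordered X, Y, Z and that the curve starts at the origin; the function name `step_vectors` is hypothetical and this is not the authors' code.

```python
# First four cumulative (X, Y, Z) rows for TSV-3, copied from the table above.
tsv3_bases = ["G", "U", "G", "C"]
tsv3_cumulative = [
    (0.02, 0.02, 0.02),
    (0.00, 0.00, 0.04),
    (0.04, 0.04, 0.08),
    (0.06, 0.06, 0.10),
]


def step_vectors(cumulative, ndigits=2):
    """Return the per-base increments whose running sum gives `cumulative`."""
    steps = []
    previous = (0.0, 0.0, 0.0)  # assumption: the curve starts at the origin
    for point in cumulative:
        # Round to the table's precision to hide floating-point noise.
        steps.append(tuple(round(c - p, ndigits) for c, p in zip(point, previous)))
        previous = point
    return steps


tsv3_steps = step_vectors(tsv3_cumulative)
for base, step in zip(tsv3_bases, tsv3_steps):
    print(base, step)
```

In these rows the two G bases receive different step lengths (0.02 and 0.04 per axis), which suggests that the step depends on more than the base identity alone, as one would expect from a mapping that also uses base frequency.
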
Fig. 3 The 3D graphical representation of the RNA secondary structure of viruses TSV-3 and AIMV-3.
Fig. 4 Example of the base coordinate calculation.
Fig. 5 Illustration of the steps of our method for calculating the distance between sequences. (A) shows the calculation steps for Pattern 1; (B) shows the calculation steps for Pattern 2.
The distance matrix of the secondary structure of the 9 RNA virus sequences
| | APMV-3 | AVII | CILRV | CVV-3 | EMV-3 | LRMV-3 | PDV-3 | TSV-3 |
|---|---|---|---|---|---|---|---|---|
| AIMV-3 | 0.97 | 1.64 | 2.70 | 1.17 | 2.16 | 1.75 | 1.42 | 1.89 |
| APMV-3 | 0.00 | 2.14 | 3.33 | 1.09 | 2.58 | 2.18 | 0.76 | 2.69 |
| AVII | | 0.00 | 1.57 | 1.64 | 0.69 | 0.58 | 2.31 | 1.40 |
| CILRV | | | 0.00 | 2.92 | 1.47 | 1.89 | 3.62 | 1.14 |
| CVV-3 | | | | 0.00 | 2.01 | 1.53 | 1.21 | 2.45 |
| EMV-3 | | | | | 0.00 | 0.70 | 2.67 | 1.79 |
| LRMV-3 | | | | | | 0.00 | 2.32 | 1.49 |
| PDV-3 | | | | | | | 0.00 | 3.01 |
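
One way to read the distance matrix is to look up each virus's closest neighbour. The following sketch expands the upper triangle from the table above into a symmetric matrix and reports the nearest neighbour of every sequence. The helper names `full_matrix` and `nearest_neighbours` are hypothetical, and the values are transcribed from the table rather than recomputed.

```python
names = ["AIMV-3", "APMV-3", "AVII", "CILRV", "CVV-3",
         "EMV-3", "LRMV-3", "PDV-3", "TSV-3"]

# Strict upper triangle of the table: row i holds distances to names[i+1:].
upper = [
    [0.97, 1.64, 2.70, 1.17, 2.16, 1.75, 1.42, 1.89],
    [2.14, 3.33, 1.09, 2.58, 2.18, 0.76, 2.69],
    [1.57, 1.64, 0.69, 0.58, 2.31, 1.40],
    [2.92, 1.47, 1.89, 3.62, 1.14],
    [2.01, 1.53, 1.21, 2.45],
    [0.70, 2.67, 1.79],
    [2.32, 1.49],
    [3.01],
]


def full_matrix(names, upper):
    """Expand the upper triangle into a symmetric matrix with a zero diagonal."""
    n = len(names)
    dist = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(upper):
        for offset, value in enumerate(row, start=1):
            j = i + offset
            dist[i][j] = dist[j][i] = value
    return dist


def nearest_neighbours(names, dist):
    """Map each name to (closest other name, distance)."""
    result = {}
    for i, name in enumerate(names):
        j = min((k for k in range(len(names)) if k != i),
                key=lambda k: dist[i][k])
        result[name] = (names[j], dist[i][j])
    return result


dist = full_matrix(names, upper)
nearest = nearest_neighbours(names, dist)
for name, (other, d) in nearest.items():
    print(f"{name}: closest to {other} (distance {d:.2f})")
```

The smallest entry in the table is 0.58 (AVII and LRMV-3) and the largest is 3.62 (CILRV and PDV-3).
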
Comparison of the prediction performance of different methods on dataset 1 with a jackknife test
| Methods | ACC | SE | SP | MCC |
|---|---|---|---|---|
| iMcRNA | 85.88 | 87.83 | 83.31 | 71.86 |
| miPlantPre | 82.68 | | 75.18 | 68.48 |
| microPred | 73.96 | 74.92 | 73.51 | 47.93 |
| TripletSVM | 75.72 | 63.34 | 84.54 | 53.24 |
| Our method | | 86.3 | | |
The result based on the iMcRNA method.[24]
The result based on the miPlantPre method.[14]
The result based on the microPred method.[52]
The result based on the TripletSVM method.[53]
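
The column headers are not defined in this record. Assuming they carry their usual meanings (ACC for accuracy, SE for sensitivity, SP for specificity, MCC for the Matthews correlation coefficient), the sketch below shows how the four measures follow from a binary confusion matrix. The counts are invented for illustration only, and the values are scaled by 100 because the tables appear to report them on a 0 to 100 scale.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Standard binary classification measures, scaled to 0-100."""
    se = tp / (tp + fn)                    # sensitivity: true positive rate
    sp = tn / (tn + fp)                    # specificity: true negative rate
    acc = (tp + tn) / (tp + fn + tn + fp)  # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {name: 100 * value
            for name, value in (("ACC", acc), ("SE", se), ("SP", sp), ("MCC", mcc))}


# Hypothetical counts, not taken from the paper.
example = binary_metrics(tp=90, fn=10, tn=80, fp=20)
for name, value in example.items():
    print(f"{name}: {value:.2f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it remains informative when the positive and negative classes differ in size.
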
Comparison of the prediction performance of different methods on dataset 2 with a jackknife test
| Methods | Sensitivity | Specificity | MCC | ACC |
|---|---|---|---|---|
| miPlantPre | | | | |
| TripletSVM | 62.98 | 78.33 | 36.25 | 67.39 |
| Our method | 88.26 | 91.48 | 80.08 | 90.02 |
The result based on the miPlantPre method.[14]
The result based on the TripletSVM method.[53]
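
A jackknife test holds out each sample in turn and predicts its label from all the others. The sketch below illustrates the procedure on a precomputed distance matrix, using a nearest-neighbour rule as a stand-in classifier. That rule, the function name `jackknife_1nn`, and the toy data are all assumptions made for illustration; this record does not state which classifier the authors used.

```python
def jackknife_1nn(dist, labels):
    """Leave-one-out predictions: each sample takes its nearest other sample's label."""
    predictions = []
    for i in range(len(labels)):
        # Exclude the held-out sample itself when searching for the neighbour.
        j = min((k for k in range(len(labels)) if k != i),
                key=lambda k: dist[i][k])
        predictions.append(labels[j])
    return predictions


# Hypothetical toy data: two tight groups of two samples each.
toy_dist = [
    [0.0, 0.5, 2.0, 2.2],
    [0.5, 0.0, 1.9, 2.1],
    [2.0, 1.9, 0.0, 0.4],
    [2.2, 2.1, 0.4, 0.0],
]
toy_labels = [1, 1, 0, 0]  # 1 = pre-miRNA, 0 = non-pre-miRNA (illustrative)

toy_predictions = jackknife_1nn(toy_dist, toy_labels)
correct = sum(p == t for p, t in zip(toy_predictions, toy_labels))
print(f"jackknife accuracy: {100 * correct / len(toy_labels):.1f}%")
```
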
Comparison of the prediction performance of different methods on dataset 3 with a jackknife test
| Datasets | iMcRNA | microPred | miPlantPre | Our method |
|---|---|---|---|---|
| mtr_67 | 89.5 | 76.1 | 86.6 | |
| osa_256 | 86.1 | 73.8 | 83.4 | |
| ppt_184 | 76.9 | 68.8 | 84.5 | |
| ath_153 | 86.2 | 67.6 | 85 | |
| updated_aly_167 | 86.5 | 69.5 | 85 | |
| ptc_133 | 78.6 | 72.2 | 82.7 | |
| sbi_105 | 85.2 | 76.7 | 83.8 | |
| updated_gma_105 | 88.1 | 82.4 | 83.8 | |
| zma_74 | 85.8 | 74.3 | 85.1 | |
| gma_69 | 88.4 | 71 | 85.5 | |
The result based on the iMcRNA method.[24]
The result based on the microPred method.[52]
The result based on the miPlantPre method.[14]