Jiamin Xiao, Xiaojing Tang, Yizhou Li, Zheng Fang, Daichuan Ma, Yangzhige He, Menglong Li.
Abstract
BACKGROUND: MicroRNAs (miRNAs) play a key role in regulating various biological processes, for example by participating in the post-transcriptional pathway and affecting the stability and/or translation of mRNA. Current methods have extracted feature information at different levels, among which the characteristic stem-loop structure makes the greatest contribution to the prediction of putative miRNA precursors (pre-miRNAs). We find that none of these features alone is capable of identifying new pre-miRNAs accurately.
Year: 2011 PMID: 21575268 PMCID: PMC3118167 DOI: 10.1186/1471-2105-12-165
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Three representations of RNA secondary structure for human precursor miRNA hsa-mir-33a.
Figure 2. ROC curves of the random resampling models. The ROC curves are overlaid with the vertically averaged curve and with box plots showing the vertical spread around the average.
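The caption mentions a vertical average of several ROC curves without describing the procedure. A minimal sketch of one common way to do this is shown here: fix a grid of false-positive rates, interpolate each curve's true-positive rate at those rates, and average. The function names, the grid size, and the two toy curves are illustrative assumptions, not the authors' code or data.

```python
from bisect import bisect_right


def tpr_at(curve, fpr):
    """Linearly interpolate the TPR of one ROC curve at a given FPR.

    `curve` is a list of (fpr, tpr) points sorted by increasing fpr,
    starting at (0.0, 0.0) and ending at (1.0, 1.0).
    """
    xs = [point[0] for point in curve]
    i = bisect_right(xs, fpr)
    if i == len(curve):  # fpr sits at the right end of the curve
        return curve[-1][1]
    (x0, y0), (x1, y1) = curve[i - 1], curve[i]
    return y0 + (y1 - y0) * (fpr - x0) / (x1 - x0)


def vertical_average(curves, n_points=11):
    """Average the TPR of several ROC curves at evenly spaced FPR values."""
    grid = [k / (n_points - 1) for k in range(n_points)]
    return [(f, sum(tpr_at(c, f) for c in curves) / len(curves)) for f in grid]


# Two made-up ROC curves standing in for two resampling runs.
curve_a = [(0.0, 0.0), (0.1, 0.6), (0.5, 0.9), (1.0, 1.0)]
curve_b = [(0.0, 0.0), (0.2, 0.8), (1.0, 1.0)]
averaged = vertical_average([curve_a, curve_b])
```

The per-grid-point TPR values collected before averaging are also what a box plot of the vertical spread would be drawn from.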
Comparison with existing methods
| Method | Complete dataset (Pos) | Complete dataset (Neg) | Training dataset (Pos) | Training dataset (Neg) | Testing dataset (Pos) | Testing dataset (Neg) | Testing dataset (SE) | Testing dataset (SP) | Independent dataset, Plant (Acc) | Independent dataset, Virus (Acc) | Independent dataset, Total (Acc) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Triplet-SVM | 193 | 1168 | 163 | 168 | 30 | 1000 | 0.933 | 0.881 | 0.882 | 0.843 | 0.877 (a) |
| microPred | 691 | 9248 | SMOTE Outer-5-fold-CV | | | | 0.900 | 0.973 | 0.841 | 0.939 | 0.853 (b) |
| Our method | 3928 | 8897 | 3000 | 3000 | 928 | 5897 | 0.873 | 0.911 | 0.976 | 0.913 | 0.970 |
Triplet-SVM is an SVM-based method that uses triplet elements to represent information about the pre-miRNA stem-loop structure. There is an extension called MiPred.
microPred combined the new RNAfold-related, Mfold-related, and pair-related features with 29 'global and intrinsic' features introduced in the miPred approach.
(a) 178 virus and 1232 plant sequences were used, as samples with multiple loops were filtered out by Triplet-SVM.
(b) 196 virus and 1389 plant sequences (length less than 300) were submitted to the microPred web server.
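The Total (Acc) column for the two existing methods is consistent with pooling the plant and virus accuracies weighted by the sequence counts given in the footnotes. The short check below illustrates this. The pooling rule is an assumption (the source does not state how the totals were computed), `pooled_accuracy` is a hypothetical helper name, and the per-group accuracies are the rounded values from the table.

```python
def pooled_accuracy(groups):
    """Accuracy over several groups, weighted by group size.

    `groups` is a list of (n_samples, accuracy) pairs.
    """
    total = sum(n for n, _ in groups)
    return sum(n * acc for n, acc in groups) / total


# (count, accuracy) pairs taken from the table and its footnotes.
triplet_svm_total = pooled_accuracy([(1232, 0.882), (178, 0.843)])
micropred_total = pooled_accuracy([(1389, 0.841), (196, 0.939)])

print(round(triplet_svm_total, 3))  # 0.877
print(round(micropred_total, 3))    # 0.853
```

Both values agree with the table to three decimals, which suggests that the totals are sample-weighted rather than simple means of the plant and virus accuracies.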
Figure 3. Bar charts of individual parameter contributions. The contribution of each parameter is determined by calculating its importance score, with larger scores indicating more relevant properties. The two strategies are distinguished by different greyscales, the bar height is the score of the individual feature, and a confidence interval is calculated for each parameter. E: Edge; V: Vertex; N: Number; A: Average; Var: Variance; M: Mean.
Figure 4. Results of deleting features one by one. Models are constructed on the remaining variables after deleting the lowest-scoring feature each time. This process is repeated 23 times, until only one feature is left. Sensitivity and specificity are used to measure model performance.
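The deletion procedure in this caption can be sketched as a backward-elimination loop. This is only an outline under stated assumptions: `score_features` and `evaluate` are hypothetical callbacks standing in for the importance scoring and the sensitivity/specificity evaluation, and the toy importance values are made up so that the loop can run.

```python
def backward_elimination(features, score_features, evaluate):
    """Repeatedly drop the lowest-scoring feature until one remains.

    score_features(remaining) -> dict mapping feature name to importance
    evaluate(remaining)       -> any performance summary for that subset
    Returns a list of (feature_subset, performance) pairs, one per model.
    """
    remaining = list(features)
    history = []
    while True:
        history.append((tuple(remaining), evaluate(remaining)))
        if len(remaining) == 1:
            break
        scores = score_features(remaining)  # rescored on the current subset
        weakest = min(remaining, key=scores.__getitem__)
        remaining.remove(weakest)
    return history


# Toy stand-ins: 24 features whose importance equals their index.
toy_importance = {f"f{i}": i for i in range(24)}
history = backward_elimination(
    list(toy_importance),
    score_features=lambda remaining: {f: toy_importance[f] for f in remaining},
    evaluate=lambda remaining: len(remaining),  # placeholder for SE/SP
)
```

Starting from 24 features, the loop performs 23 deletions and records 24 models, which matches the count given in the caption.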
Definitions of network parameters
| Parameter | Description |
|---|---|
| Hub score | Kleinberg's hub score. |
| Path length | The length of a path. |
| Shortest path | The shortest path between two vertices. |
| Constraint | Calculates Burt's constraint for each vertex. |
| Degree | The number of edges connected to a vertex. |
| Girth | The length of the shortest cycle in the graph. |
| Modularity | Modularity of a community structure of a graph. |
| Graph motifs | The small subgraphs with a well-defined structure. |
| Articulation point | A vertex that, if removed, will disconnect the graph. |
| Node betweenness | The number of shortest paths that pass through a vertex. |
| Edge betweenness | The number of shortest paths that pass through an edge. |
| Diameter | The diameter of a graph is the length of the longest geodesic. |
| Cocitation coupling | Two vertices are cocited if there is another vertex citing both of them. |
| Transitivity | Measures the probability that the adjacent vertices of a vertex are connected. |
| Bibliographic coupling | The bibliographic coupling of two vertices is the number of other vertices they both cite. |
| Closeness centrality | Measures how many steps are required to access every other vertex from a given vertex. |
| Coreness | The coreness of a vertex is k if it belongs to the k-core (a maximal subgraph in which every vertex has at least k connections) but not to the (k+1)-core. |
| Graph density | The density of a graph is the ratio of the number of edges to the number of possible edges. |
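To make a few of these definitions concrete, the sketch below builds a small undirected graph from a dot-bracket secondary structure and computes degree, graph density, and diameter with the standard library only. The graph construction (one vertex per nucleotide, edges for backbone neighbours and base pairs) is an assumption for illustration and may differ from the authors' representation. The hairpin string is a toy example, not hsa-mir-33a.

```python
from collections import deque


def dot_bracket_to_graph(structure):
    """Adjacency sets for a dot-bracket string (assumed encoding:
    backbone edges between neighbours plus one edge per base pair)."""
    adj = {i: set() for i in range(len(structure))}
    for i in range(len(structure) - 1):  # backbone edges
        adj[i].add(i + 1)
        adj[i + 1].add(i)
    stack = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":  # close the most recent open bracket
            j = stack.pop()
            adj[i].add(j)
            adj[j].add(i)
    return adj


def density(adj):
    """Number of edges divided by the number of possible edges."""
    n = len(adj)
    m = sum(len(neighbours) for neighbours in adj.values()) // 2
    return m / (n * (n - 1) / 2)


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to reachable vertices."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter(adj):
    """Length of the longest geodesic in a connected graph."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)


graph = dot_bracket_to_graph("((...))")  # toy hairpin with two base pairs
degrees = {v: len(neighbours) for v, neighbours in graph.items()}
print(degrees[1], density(graph), diameter(graph))
```

For this toy hairpin the graph has 7 vertices and 8 edges, so the density is 8/21 and the diameter is 3. The paired vertices 1 and 5 have degree 3 because each carries two backbone edges and one base-pair edge.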