Xiuquan Du, Changlin Hu, Yu Yao, Shiwei Sun, Yanping Zhang.
Abstract
In bioinformatics, exon skipping (ES) event prediction is an essential part of alternative splicing (AS) event analysis. Although many methods have been developed to predict ES events, a fully satisfactory solution has yet to be found. In this study, given the limitations of machine learning algorithms that rely on RNA-Seq data or genome sequences, a new feature set, called RS (RNA-Seq and sequence) features, was constructed. These features include RNA-Seq features derived from RNA-Seq data and sequence features derived from genome sequences. We propose a novel Rotation Forest classifier to predict ES events with the RS features (RotaF-RSES). To validate the efficacy of RotaF-RSES, a dataset from two human tissues was used, on which RotaF-RSES achieved an accuracy of 98.4%, a specificity of 99.2%, a sensitivity of 94.1%, and an area under the curve (AUC) of 98.6%. Compared with the other available methods, the results indicate that RotaF-RSES is efficient and can predict ES events with RS features.
Keywords: RNA-Seq data; exon skipping event; sequence information
Year: 2017 PMID: 29231888 PMCID: PMC5751293 DOI: 10.3390/ijms18122691
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. The framework of the Rotation Forest classifier for predicting ES events with RS features (RotaF-RSES), showing both the training and testing stages. RotaF-RSES involves two steps. Step 1: Known exons and their upstream and downstream introns are obtained, and RNA-Seq features and sequence features are extracted from their RNA-Seq data and sequence information. These two feature types, together called RS features, are used to build a classification model based on the Rotation Forest algorithm (RotaF-RSES). Step 2: After the RS features of an exon of unknown type are obtained, the RotaF-RSES model is used to determine the type of the exon.
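Step 1 of the framework can be illustrated with a minimal Python sketch that assembles one RS feature vector for a single exon. It is a simplification under stated assumptions: it concatenates only the six basic read counts listed in this record for each of the two tissues (skeletal muscle and brain) with a list of sequence features. The names `ReadCounts` and `build_rs_features`, the example counts, and the two sequence feature values are all hypothetical and do not come from the paper, whose full feature set also includes normalized and equilibrium features.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReadCounts:
    """Six basic read-count features for one exon in one tissue."""
    exon: int               # reads on the exon
    upstream_intron: int    # reads on the upstream intron
    downstream_intron: int  # reads on the downstream intron
    inclusion: int          # reads supporting the inclusive exon
    exclusion: int          # reads supporting the exclusive exon
    gene: int               # reads on the corresponding gene

    def as_list(self) -> List[float]:
        """Return the six counts in a fixed order as floats."""
        return [float(v) for v in (self.exon, self.upstream_intron,
                                   self.downstream_intron, self.inclusion,
                                   self.exclusion, self.gene)]


def build_rs_features(muscle: ReadCounts, brain: ReadCounts,
                      sequence_features: List[float]) -> List[float]:
    """Concatenate RNA-Seq features from both tissues with sequence features."""
    return muscle.as_list() + brain.as_list() + list(sequence_features)


# Hypothetical counts and sequence features for a single exon (assumed values).
muscle = ReadCounts(120, 4, 6, 35, 2, 900)
brain = ReadCounts(40, 5, 3, 8, 21, 750)
rs_vector = build_rs_features(muscle, brain, [0.42, 0.58])
print(len(rs_vector))
```

A vector built this way would be the input row for whichever classifier is trained in Step 1 and applied in Step 2.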
Performance comparison of different features with Random Forest.
| Features | Accuracy | Specificity | Sensitivity | AUC |
|---|---|---|---|---|
| Initial | 96.4% | 98.0% | 88.5% | 99.1% |
| Equilibrium | 96.2% | 98.0% | 86.8% | 99.3% |
| RNA-Seq | 96.1% | 97.3% | 90.2% | 99.3% |
| Sequence | 82.7% | 95.7% | 17.6% | 62.8% |
| RS | 96.7% | 97.6% | 92.2% | 99.2% |
Figure 2. The receiver operating characteristic (ROC) curves of different features with Random Forest: initial features (area under the curve (AUC): 99.1%), equilibrium features (AUC: 99.3%), RNA-Seq features (AUC: 99.3%), sequence features (AUC: 62.8%), and RS features (AUC: 99.2%).
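For readers unfamiliar with the AUC values quoted in these captions, the following is a minimal sketch of one standard way to compute the area under an ROC curve: as the fraction of positive/negative pairs in which the positive example receives the higher score, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are assumptions for illustration. The source does not state how its AUC values were computed.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy data (assumed): 1 = skipped exon, 0 = constitutive exon.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(auc)
```

In this toy example, 8 of the 9 positive/negative pairs are ordered correctly, so the AUC is 8/9. A value near 0.5, like the one reported for sequence features alone, corresponds to a ranking close to chance.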
Performance comparison of different algorithms on RS features.
| Algorithm | Accuracy | Specificity | Sensitivity | AUC |
|---|---|---|---|---|
| Random Forest | 96.7% | 97.6% | 92.2% | 99.2% |
| Random Tree | 93.1% | 96.5% | 76.5% | 86.5% |
| Naïve Bayes | 51.9% | 44.7% | 88.2% | 85.7% |
| Bayes Net | 94.1% | 94.5% | 92.2% | 97.7% |
| Naïve Bayes Simple | 84.2% | 82.8% | 88.0% | 89.1% |
| Multilayer Perceptron | 93.1% | 97.7% | 70.6% | 96.0% |
| RBF network | 86.9% | 99.6% | 23.5% | 88.4% |
| J48 | 93.1% | 96.5% | 76.5% | 91.7% |
| SVM | 83.7% | 100% | 2% | 51.0% |
| Our Method | 98.4% | 99.2% | 94.1% | 98.6% |
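The three threshold-based columns in this table follow from a confusion matrix, and the sketch below shows the standard definitions. It also illustrates why a row such as SVM (100% specificity, 2% sensitivity) can still show a fairly high accuracy: accuracy is a prevalence-weighted mix of sensitivity and specificity, so it stays high when negatives outnumber positives. The function name `classification_metrics` and all counts are hypothetical. The source does not report its confusion matrices.

```python
from typing import Dict


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }


# Hypothetical imbalanced test set: 50 positives, 250 negatives.
balanced_model = classification_metrics(tp=45, fn=5, tn=240, fp=10)

# A degenerate model that labels every exon negative.
all_negative = classification_metrics(tp=0, fn=50, tn=250, fp=0)

print(balanced_model)
print(all_negative)
```

With these assumed counts, the all-negative model has zero sensitivity but still reaches about 83% accuracy, which is why sensitivity and AUC are worth reading alongside accuracy.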
Figure 3. The ROC curves of different algorithms on RS features.
Comparison results of different features with Rotation Forest.
| Features | Accuracy | Specificity | Sensitivity | AUC |
|---|---|---|---|---|
| Initial | 95.8% (96.4%) | 98.0% (98.0%) | 84.3% (88.5%) | 98.6% (99.1%) |
| Equilibrium | 96.7% (96.2%) | 98.0% (98.0%) | 90.2% (86.8%) | 98.6% (99.3%) |
| RNA-Seq | 97.4% (96.1%) | 98.0% (97.3%) | 92.1% (90.2%) | 98.3% (99.3%) |
| Sequence | 83.0% (82.7%) | 98.4% (95.7%) | 7.9% (17.6%) | 62.3% (62.8%) |
| RS | 98.4% (96.7%) | 99.2% (97.6%) | 94.1% (92.2%) | 98.6% (99.2%) |
Values in parentheses are the corresponding Random Forest (RF) results.
Figure 4. The ROC curves of different features with Rotation Forest.
Performance comparison between ESFinder and our method.
| Method | Accuracy | Specificity | Sensitivity | AUC |
|---|---|---|---|---|
| ESFinder | 96.2% | 98.0% | 86.8% | 99.3% |
| Our method | 98.4% | 99.2% | 94.1% | 98.6% |
Figure 5. The ROC curves of different methods.
Figure 6. The prediction results of the RotaF-RSES, MATS, MISO, and SI methods on test data.
The predictions of RotaF-RSES, ESFinder, MATS, MISO, and SI for independent test data.
| Different Methods | Our Method | ESFinder | MATS | MISO | SI |
|---|---|---|---|---|---|
| Correct Predictions | 1910 | 1977 | 91 | 140 | 179 |
The description of the six basic features.
| Feature | The Description of These Features |
|---|---|
| | Read counts on exons |
| | Read counts on the upstream intron |
| | Read counts on the downstream intron |
| | Read counts supporting the inclusive exon |
| | Read counts supporting the exclusive exon |
| | Read counts on the corresponding gene |
The description of all initial features.
| Skeletal Muscle (S) | Brain (B) |
|---|---|
The description of basic normalized features.
| Feature | The Definition of These Features |
|---|---|
The description of the normalized features.
| Skeletal Muscle (S) | Brain (B) |
|---|---|
The equilibrium features.
| Skeletal Muscle (S) | Brain (B) | Divergence |
|---|---|---|
| SP | BP | |