Wei Chen, Pengmian Feng, Hui Ding, Hao Lin.
Abstract
Alternative splicing (AS) not only ensures the diversity of gene expression products but is also closely correlated with genetic diseases. Knowledge of the regulatory mechanisms of AS will therefore provide useful clues for understanding its biological functions. In the current study, a random forest-based method was developed to classify included and excluded exons in exon skipping events. In this method, the samples in the dataset were encoded using histone modification features, which were optimized with the Maximum Relevance Maximum Distance (MRMD) feature selection technique. The proposed method obtained an accuracy of 72.91% in a 10-fold cross-validation test and outperformed existing methods. We also systematically analyzed the distribution of histone modifications between included and excluded exons and identified their preferences for each kind of exon, which might provide insights for research on the regulatory mechanisms of alternative splicing.
Keywords: alternative splicing; exon skipping; histone acetylation; histone methylation; random forest
Year: 2018 PMID: 30327665 PMCID: PMC6174203 DOI: 10.3389/fgene.2018.00433
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. The IFS curve for classifying “included” and “excluded” exons in the exon skipping event. An IFS peak of 79.79% was obtained when using the optimal 96 features to perform predictions.
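An IFS curve of this kind is typically built by ranking the features once, scoring the top-k subset for every k, and reading off the peak. The sketch below illustrates that loop only; it is not the authors' pipeline. The function name, the feature names `f1`...`n2`, and the `toy_accuracy` scorer are hypothetical stand-ins for a real feature ranking and a cross-validated classifier accuracy.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> Tuple[List[Tuple[int, float]], int]:
    """Score the top-k ranked features for every k; return the curve and the best k."""
    curve = []
    for k in range(1, len(ranked_features) + 1):
        subset = list(ranked_features[:k])
        curve.append((k, evaluate(subset)))
    # The peak is the subset size with the highest score (the smallest k on ties).
    best_k = max(curve, key=lambda point: point[1])[0]
    return curve, best_k


def toy_accuracy(subset: List[str]) -> float:
    """Hypothetical scorer: informative features help, noise features hurt slightly.

    In practice this would be replaced by cross-validated classifier accuracy.
    """
    informative = {"f1", "f2", "f3"}
    hits = sum(1 for name in subset if name in informative)
    noise = len(subset) - hits
    return 50.0 + 8.0 * hits - 1.0 * noise


# Hypothetical ranking, best feature first.
ranked = ["f1", "f2", "f3", "n1", "n2"]
curve, best_k = incremental_feature_selection(ranked, toy_accuracy)
print(curve, best_k)
```

With a real scorer, plotting `curve` gives the IFS curve and `best_k` marks the subset size at its peak.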
Performance metrics of different classifiers for classifying included and excluded exons.
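| Classifier | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC |
| --- | --- | --- | --- | --- |
| BayesNet | 66.84 | 55.02 | 61.33 | 0.22 |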
| Naïve Bayes | 68.00 | 53.58 | 61.25 | 0.22 |
| J48 Tree | 61.06 | 53.20 | 57.38 | 0.14 |
| SVM | 67.82 | 59.72 | 64.05 | 0.27 |
| Random Forest | 67.03 | 79.65 | 72.91 | 0.46 |
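The four numeric columns above are read here as sensitivity, specificity, accuracy and Matthews correlation coefficient (MCC); the third column matches the 72.91% accuracy quoted in the abstract, while the other labels are an assumption based on common practice. As a minimal sketch, these metrics can be computed from a binary confusion matrix as follows. The counts in the example are hypothetical and are not taken from the study.

```python
import math


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute sensitivity, specificity, accuracy and MCC from confusion counts."""
    sn = tp / (tp + fn)                      # fraction of positives recovered
    sp = tn / (tn + fp)                      # fraction of negatives recovered
    acc = (tp + tn) / (tp + fn + tn + fp)    # overall fraction correct
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal is zero; report 0.0 in that case.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sn": sn, "sp": sp, "acc": acc, "mcc": mcc}


# Hypothetical counts, for illustration only.
example = binary_metrics(tp=67, fn=33, tn=80, fp=20)
print({name: round(value, 4) for name, value in example.items()})
```

Accuracy is the class-size-weighted mean of sensitivity and specificity, which is why it sits between those two columns in every row.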
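A comparison of the current method with an existing method for classifying included and excluded exons.
| Method | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC |
| --- | --- | --- | --- | --- |
| Chen et al.'s method | 68.90 | 66.70 | 68.50 | – |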
| Current method | 67.03 | 79.65 | 72.91 | 0.46 |
(Chen et al., .
The 96 optimal features and their bias toward the exon inclusion or exclusion case.
| Feature | Bias | Feature | Bias | Feature | Bias |
| --- | --- | --- | --- | --- | --- |
| H3R2me1.succ | I | H3K36me1.succ | E | H4K5ac | E |
| H3R2me1.prec | I | H3K18ac.prec | I | H4K20me1.prec | – |
| H4K8ac.succ | I | H4K91ac.prec | I | H4K20me1.succ | E |
| H4K12ac.prec | E | H3K23ac.succ | I | H2AK5ac | E |
| H4K8ac.prec | E | H3K36me1.prec | E | H3K23ac | I |
| H4K12ac.succ | – | H3K23ac.prec | E | H3K79me1.succ | – |
| H3K36me3.succ | E | H4R3me2.succ | I | H3K36me1 | – |
| H3K9ac.succ | E | H2BK120ac.prec | I | H3K79me1.prec | E |
| H3K14ac.prec | E | H4R3me2.prec | I | H2BK20ac | E |
| H3K27me3.succ | E | H3K9me1.prec | E | H2BK12ac | E |
| H3K27me3.prec | I | H2BK120ac.succ | E | H4K16ac | – |
| H3K9ac.prec | – | H3K9me1.succ | I | H3K4ac | E |
| H3K14ac.succ | – | H3R2me2.prec | I | H2BK5me1 | E |
| H2AK5ac.prec | E | H2AK9ac.succ | I | H3K18ac | I |
| H2AK5ac.succ | E | H3R2me2.succ | E | H3K9me2 | I |
| H4K5ac.succ | E | H2AK9ac.prec | E | H4R3me2 | I |
| H4K5ac.prec | I | H3K27ac.prec | E | H3K4me1.prec | E |
| H2BK20ac.succ | – | H3K27ac.succ | E | H3K4me1.succ | E |
| H2BK20ac.prec | – | H3K36me3 | E | H2AK9ac | E |
| H4K16ac.prec | E | H3K9me2.succ | I | H3K4me2.prec | E |
| H4K16ac.succ | E | H3R2me1 | I | H3K4me2.succ | I |
| H3K36me3.prec | E | H4K8ac | I | H4K91ac | – |
| H3K4ac.succ | I | H2BK5ac.prec | I | H3K9me1 | – |
| H3K4ac.prec | E | H3K14ac | – | H3R2me2 | – |
| H2BK12ac.prec | E | H3K9me2.prec | – | H2BK120ac | E |
| H2BK12ac.succ | I | H4K12ac | I | H3K79me3.succ | E |
| H2BK5me1.succ | E | H2BK5ac.succ | E | H3K9me3.succ | E |
| H2BK5me1.prec | E | H3K27me3 | E | H3K9me3.prec | E |
| H3K27me2.succ | I | H3K27me1.succ | I | H3K79me3.prec | E |
| H3K27me2.prec | E | H3K9ac | E | H3K36ac.succ | I |
| H4K91ac.succ | E | H3K27me2 | E | H3K27me1 | E |
| H3K18ac.succ | E | H3K27me1.prec | E | H3K27ac | E |
The bias of the 96 optimal features toward the exon inclusion or exclusion case was analyzed using a hypothesis test of sample frequency. “I” indicates features significantly (p < 0.01) biased toward the exon inclusion case, while “E” indicates features significantly (p < 0.01) biased toward the exon exclusion case.
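The footnote does not say which frequency test was used. One common choice for comparing how often a mark occurs in two groups is a two-proportion z-test, sketched below under that assumption. The function names and the counts are hypothetical, and treating “–” as "no significant bias" is also an assumption, since the source does not define that symbol.

```python
import math

NO_BIAS = "–"  # assumed meaning of the dash in the table; not defined in the source


def two_proportion_z_test(hits_a: int, n_a: int, hits_b: int, n_b: int):
    """Two-sided z-test for a difference between two sample frequencies."""
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def bias_label(hits_inc: int, n_inc: int, hits_exc: int, n_exc: int,
               alpha: float = 0.01) -> str:
    """Label a feature 'I', 'E' or NO_BIAS from its frequency in each exon class."""
    z, p_value = two_proportion_z_test(hits_inc, n_inc, hits_exc, n_exc)
    if p_value >= alpha:
        return NO_BIAS
    return "I" if z > 0 else "E"


# Hypothetical counts: exons carrying the mark out of all exons in each class.
print(bias_label(300, 1000, 200, 1000))
print(bias_label(200, 1000, 300, 1000))
print(bias_label(250, 1000, 252, 1000))
```

With 96 features tested at p < 0.01, a multiple-testing correction might also be worth considering; the source does not say whether one was applied.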
Figure 2. Correlation matrix of histone modifications for the exon inclusion case of the exon skipping event.
Figure 3. Correlation matrix of histone modifications for the exon exclusion case of the exon skipping event.
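Correlation matrices like those in Figures 2 and 3 can be computed pairwise over the per-exon signal of each histone modification. The captions do not state which correlation coefficient was used; the sketch below assumes Pearson correlation, and the signal values are hypothetical.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant signal
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(signals: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Pairwise correlations between all named signals."""
    names = list(signals)
    return {a: {b: pearson(signals[a], signals[b]) for b in names} for a in names}


# Hypothetical per-exon signal values for three modifications.
signals = {
    "H3K36me3": [1.0, 2.0, 3.0, 4.0],
    "H3K4me1": [2.0, 4.0, 6.0, 8.0],
    "H3K9ac": [4.0, 3.0, 2.0, 1.0],
}
matrix = correlation_matrix(signals)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

Computing one matrix for included exons and another for excluded exons gives the pair of matrices that the two figures compare.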