Lei Chen, ZhanDong Li, ShiQi Zhang, Yu-Hang Zhang, Tao Huang, Yu-Dong Cai.
Abstract
Methylation is one of the most common and significant modifications in biological systems and is mediated by multiple enzymes. Recent studies have identified methylation widely across different RNA molecules. RNA methylation takes several forms, such as 5-methylcytosine (m5C). However, the functions of individual methylation sites remain to be elucidated. Testing all methylation sites relies heavily on high-throughput sequencing technology, which is expensive and labor-intensive. Computational prediction approaches could therefore serve as a substitute. In this study, multiple machine learning models were used to predict possible RNA m5C sites on the basis of mRNA sequences in human and mouse. Each site was represented by several features derived from the k-mers of an RNA subsequence centered on that site. The max-relevance and min-redundancy (mRMR) feature selection method was employed to analyse these features. The resulting feature list was fed into the incremental feature selection method, incorporating four classification algorithms, to build efficient models. Furthermore, the sites related to the features used in the models were also investigated.
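The mRMR ranking step mentioned in the abstract can be illustrated with a small sketch. It assumes the common "mutual information difference" form of mRMR on discrete feature values (relevance to the class label minus mean redundancy with already selected features); the abstract does not state which variant or implementation the authors used, and the function names and toy data below are hypothetical.

```python
from collections import Counter
from math import log2

def mutual_information(a, b):
    """Mutual information (in bits) between two equally long discrete sequences."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * log2(c * n / (count_a[x] * count_b[y]))
               for (x, y), c in count_ab.items())

def mrmr_rank(features, labels):
    """Greedy mRMR ranking (assumed difference criterion).

    features: list of columns, each a list of discrete values.
    Returns feature indices ordered from most to least preferred.
    """
    remaining = list(range(len(features)))
    ranked = []
    while remaining:
        def score(j):
            relevance = mutual_information(features[j], labels)
            if not ranked:
                return relevance
            redundancy = sum(mutual_information(features[j], features[s])
                             for s in ranked) / len(ranked)
            return relevance - redundancy
        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked

# Toy data (invented for illustration): feature 1 duplicates feature 0,
# feature 2 is weaker but largely independent of feature 0.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
f0 = [0, 0, 0, 0, 1, 1, 1, 0]
f1 = list(f0)
f2 = [0, 0, 1, 0, 1, 1, 0, 1]
ranking = mrmr_rank([f0, f1, f2], labels)
```

On this toy input the duplicate feature is ranked last, because its redundancy with the first selected feature cancels its relevance.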
Year: 2022 PMID: 35071593 PMCID: PMC8776474 DOI: 10.1155/2022/4035462
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1: Flow chart to construct models for the prediction of m5C sites. A subsequence with 41 bp is used to represent each m5C site. Features of k-mers obtained by RNA2Vec are adopted to constitute features of the subsequence. All features are analysed by the max-relevance and min-redundancy method. The resulting feature list is fed into incremental feature selection, incorporating four classification algorithms and 10-fold cross-validation, to construct optimum models.
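As a rough illustration of the first step in the flow chart, the sketch below cuts a 41-position window centered on a candidate site and lists its overlapping k-mers. The sequence, the choice of k = 3, and the function names are assumptions made for the example; the RNA2Vec embedding of each k-mer is not reproduced here.

```python
def centered_window(sequence, site_index, flank=20):
    """Return the (2 * flank + 1)-position window centered on site_index.

    Returns None when the site is too close to either end of the sequence.
    """
    start, end = site_index - flank, site_index + flank + 1
    if start < 0 or end > len(sequence):
        return None
    return sequence[start:end]

def kmers(window, k):
    """List all overlapping k-mers of the window, left to right."""
    return [window[i:i + k] for i in range(len(window) - k + 1)]

# Hypothetical mRNA fragment; index 24 is a cytosine in this toy sequence.
sequence = "AUGGC" * 10
window = centered_window(sequence, site_index=24)
three_mers = kmers(window, k=3)
```

A window of length 41 yields 41 - k + 1 overlapping k-mers, so 39 for k = 3. Each k-mer would then be mapped to its embedding vector to build the feature representation of the site.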
Figure 2: IFS curves with different classifiers on different numbers of sequence features on mouse m5C data.
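An IFS curve of this kind can be produced by evaluating the top-1, top-2, ... features of a ranked list under cross-validation and recording the score for each subset size. The sketch below is a minimal version under stated assumptions: a toy nearest-centroid classifier stands in for the classification algorithms in the study, accuracy stands in for whatever measure the authors plotted, and the interleaved fold assignment and all names are hypothetical.

```python
from statistics import mean

def nearest_centroid_predict(train_X, train_y, test_X):
    """Toy classifier: assign each test row to the class with the closest mean vector."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_X, train_y) if y == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [min(centroids, key=lambda c: sq_dist(x, centroids[c])) for x in test_X]

def cv_accuracy(X, y, n_folds):
    """Accuracy under a simple interleaved k-fold cross-validation."""
    correct = 0
    for fold in range(n_folds):
        test_idx = [i for i in range(len(X)) if i % n_folds == fold]
        train_idx = [i for i in range(len(X)) if i % n_folds != fold]
        preds = nearest_centroid_predict(
            [X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in test_idx])
        correct += sum(p == y[i] for p, i in zip(preds, test_idx))
    return correct / len(X)

def incremental_feature_selection(X, y, ranked_features, n_folds=10):
    """Score the top-n feature subsets for n = 1..len(ranked_features).

    Returns (best_n, curve), where curve[n - 1] is the CV accuracy with n features
    and best_n is the smallest n that reaches the maximum.
    """
    curve = []
    for n in range(1, len(ranked_features) + 1):
        cols = ranked_features[:n]
        X_sub = [[row[c] for c in cols] for row in X]
        curve.append(cv_accuracy(X_sub, y, n_folds))
    best_n = max(range(len(curve)), key=curve.__getitem__) + 1
    return best_n, curve

# Toy data: feature 0 separates the classes, features 1 and 2 are unrelated patterns.
y = [0, 1] * 5
X = [[float(label), (i * 7) % 5, (i * 3) % 4] for i, label in enumerate(y)]
best_n, curve = incremental_feature_selection(X, y, ranked_features=[0, 1, 2], n_folds=5)
```

Plotting `curve` against the number of features gives one IFS curve; repeating the loop with a different classifier gives the other curves in a figure like this one.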
Performance of models based on different classification algorithms for predicting mouse m5C sites.
| Classification algorithm | Number of features | SN | SP | ACC | MCC | Precision | F1-measure |
|---|---|---|---|---|---|---|---|
| Decision tree | 195 | 1.000 | 0.990 | 0.995 | 0.990 | 0.990 | 0.995 |
| | 3 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| Random forest | 10 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| Support vector machine | 3 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
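The column abbreviations are not spelled out in this record; assuming their standard meanings (SN: sensitivity, SP: specificity, ACC: accuracy, MCC: Matthews correlation coefficient), all of the reported measures follow from the four confusion-matrix counts. The sketch below uses invented counts purely for illustration; they are not taken from the study.

```python
from math import sqrt

def binary_metrics(tp, fn, tn, fp):
    """Standard binary classification measures from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "SN": tp / (tp + fn),                    # sensitivity (recall)
        "SP": tn / (tn + fp),                    # specificity
        "ACC": (tp + tn) / (tp + fn + tn + fp),  # accuracy
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
        "Precision": tp / (tp + fp),
        "F1": 2 * tp / (2 * tp + fp + fn),
    }

# Invented counts: 100 positive and 100 negative sites.
metrics = binary_metrics(tp=90, fn=10, tn=80, fp=20)
```

With equally sized positive and negative sets, ACC is simply the mean of SN and SP, which is a quick sanity check on any row of such a table.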
Figure 3: IFS curves with different classifiers on different numbers of sequence features on human m5C data.
Performance of models based on different classification algorithms for predicting human m5C sites.
| Classification algorithm | Number of features | SN | SP | ACC | MCC | Precision | F1-measure |
|---|---|---|---|---|---|---|---|
| Decision tree | 15 | 0.767 | 0.808 | 0.788 | 0.576 | 0.800 | 0.783 |
| | 84 | 0.683 | 0.925 | 0.804 | 0.627 | 0.901 | 0.777 |
| Random forest | 543 | 0.875 | 0.867 | 0.871 | 0.742 | 0.868 | 0.871 |
| Support vector machine | 114 | 0.825 | 0.958 | 0.892 | 0.790 | 0.952 | 0.884 |
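The final value in each row above is consistent with the F1-measure, the harmonic mean of precision and sensitivity. The short check below recomputes it from the rounded SN and Precision values in the table; because the inputs are already rounded to three decimals, agreement is only expected to within rounding. The row label "(unnamed row)" is a placeholder for the row whose algorithm name is missing in this record.

```python
def f1_from_precision_recall(precision, recall):
    """F1-measure as the harmonic mean of precision and recall (sensitivity)."""
    return 2 * precision * recall / (precision + recall)

# (SN, Precision, reported final column) copied from the human m5C table.
human_rows = {
    "Decision tree": (0.767, 0.800, 0.783),
    "(unnamed row)": (0.683, 0.901, 0.777),
    "Random forest": (0.875, 0.868, 0.871),
    "Support vector machine": (0.825, 0.952, 0.884),
}

differences = {
    name: abs(f1_from_precision_recall(precision, sn) - reported)
    for name, (sn, precision, reported) in human_rows.items()
}
```

Every entry of `differences` stays below 0.001, so the recomputed values match the reported column at the precision the table provides.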
Figure 4: Frequency visualization for sequence features related to mouse m5C.
Figure 5: Frequency visualization for sequence features related to human m5C.
Comparison with previous models on mouse m5C data.
| Classification algorithm | Model | SN | SP | ACC | MCC |
|---|---|---|---|---|---|
| Decision tree | Our model | 1.000 | 0.990 | 0.995 | 0.990 |
| | Model in [ | 1.000 | 0.835 | 0.918 | 0.847 |
| Random forest | Our model | 1.000 | 1.000 | 1.000 | 1.000 |
| | Model in [ | 1.000 | 1.000 | 1.000 | 1.000 |
| Support vector machine | Our model | 1.000 | 1.000 | 1.000 | 1.000 |
| | Model in [ | 1.000 | 1.000 | 1.000 | 1.000 |
Comparison with previous models on human m5C data.
| Classification algorithm | Model | SN | SP | ACC | MCC |
|---|---|---|---|---|---|
| Decision tree | Our model | 0.767 | 0.808 | 0.788 | 0.576 |
| | Model in [ | 0.783 | 0.783 | 0.783 | 0.567 |
| Random forest | Our model | 0.875 | 0.867 | 0.871 | 0.742 |
| | Model in [ | 0.900 | 0.917 | 0.908 | 0.817 |
| Support vector machine | Our model | 0.825 | 0.958 | 0.892 | 0.790 |
| | Model in [ | 0.842 | 0.967 | 0.904 | 0.815 |