Zhixun Zhao¹, Hui Peng¹, Chaowang Lan¹, Yi Zheng¹, Liang Fang², Jinyan Li³.
Abstract
BACKGROUND: N6-methyladenosine (m6A) is an important epigenetic modification that plays various roles in mRNA metabolism and embryogenesis and is directly related to human diseases. To identify m6A on a large scale, machine learning methods have been developed to predict m6A sites. However, these methods have two main drawbacks. First, their balanced learning approaches learn inadequately from imbalanced data, in which m6A samples are far fewer than non-m6A samples. Second, the features they use do not adequately represent the sequence characteristics of m6A sites.
Keywords: Imbalance Learning; N6-methyladenosine; Site Prediction
Year: 2018 PMID: 30068294 PMCID: PMC6090857 DOI: 10.1186/s12864-018-4928-y
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Feature Space Construction
Fig. 2 SNP Specificity Ranking. Black blocks show the Fisher's exact test rankings and green blocks show the MRMR rankings. The x-axis is the site position within the sequence window, from -25 to 25; the y-axis is the total ranking of each position. A lower ranking indicates higher SNP specificity at that position.
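The Fisher's exact test ranking in Fig. 2 can be illustrated with a minimal sketch. It assumes that each position is scored by building, for every nucleotide, a 2×2 table (m6A vs. non-m6A windows × has vs. lacks that nucleotide), and that the position keeps the smallest of the four p-values. Both the table layout and the minimum-p aggregation are assumptions; the record does not spell out the authors' exact procedure. `fisher_exact_two_sided` and `rank_positions` are hypothetical helpers that use only the standard library.

```python
from math import comb

NUCLEOTIDES = "ACGU"


def hypergeom_pmf(a: int, row1: int, col1: int, n: int) -> float:
    """Probability of a 2x2 table whose top-left cell is a, given fixed margins."""
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    p_obs = hypergeom_pmf(a, row1, col1, n)
    total = 0.0
    # Sum the probabilities of every table (same margins) no more likely than the observed one.
    for x in range(max(0, row1 + col1 - n), min(row1, col1) + 1):
        p = hypergeom_pmf(x, row1, col1, n)
        if p <= p_obs * (1 + 1e-7):  # small tolerance for floating-point ties
            total += p
    return min(1.0, total)


def rank_positions(pos_seqs, neg_seqs):
    """Rank window positions (centre = 0) by their smallest per-nucleotide p-value.

    Hypothetical scoring: rows are m6A / non-m6A windows, columns are
    "has this nucleotide" / "does not". Rank 1 is the most specific position.
    """
    length = len(pos_seqs[0])
    centre = length // 2
    best_p = {}
    for i in range(length):
        pvals = []
        for nt in NUCLEOTIDES:
            a = sum(s[i] == nt for s in pos_seqs)
            c = sum(s[i] == nt for s in neg_seqs)
            pvals.append(fisher_exact_two_sided(a, len(pos_seqs) - a, c, len(neg_seqs) - c))
        best_p[i - centre] = min(pvals)
    ordered = sorted(best_p, key=best_p.get)
    return {position: rank for rank, position in enumerate(ordered, start=1)}


# Toy 3-nt windows (positions -1, 0, +1); the real windows span -25..25.
positives = ["GAC", "GAC", "GAC", "GAC"]
negatives = ["UAC", "CAU", "UAA", "AAG"]
print(rank_positions(positives, negatives))  # {-1: 1, 1: 2, 0: 3}
```

Position 0 carries an A in every window, so it cannot discriminate and ranks last. Position -1 separates the two classes perfectly and ranks first.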
Ranking details of the top 12 specific SNP positions (FET: Fisher's exact test)
| No. | Position | FET ranking | MRMR ranking | Average ranking | Final ranking |
|---|---|---|---|---|---|
| 1 | -2 | 1 | 1 | 1 | 1 |
| 2 | -1 | 2 | 5 | 3.5 | 2 |
| 3 | -24 | 6 | 7 | 6.5 | 3 |
| 4 | -21 | 10 | 4 | 7 | 4 |
| 5 | -19 | 7 | 12 | 9.5 | 5 |
| 6 | 2 | 3 | 23 | 13 | 6 |
| 7 | -25 | 4 | 24 | 14 | 7 |
| 8 | -11 | 19 | 9 | 14 | 7 |
| 9 | -4 | 8 | 21 | 14.5 | 8 |
| 10 | -15 | 21 | 11 | 16 | 9 |
| 11 | -9 | 15 | 17 | 16 | 9 |
| 12 | -23 | 9 | 25 | 17 | 10 |
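In every row of the table above, the average column equals the mean of the FET and MRMR rankings. The last column (the final ranking) matches a dense rank of that average: tied averages share a rank and no rank number is skipped. The sketch below recomputes both columns from the table's own FET and MRMR values. `dense_rank` is a hypothetical helper name. The claim that dense ranking is the authors' rule rests only on the fact that it reproduces all 12 rows.

```python
def dense_rank(values):
    """Dense ranking: equal values share a rank and no rank numbers are skipped."""
    distinct = sorted(set(values))
    return [distinct.index(v) + 1 for v in values]


# FET and MRMR rankings of the 12 positions, in table order (No. 1-12).
fet = [1, 2, 6, 10, 7, 3, 4, 19, 8, 21, 15, 9]
mrmr = [1, 5, 7, 4, 12, 23, 24, 9, 21, 11, 17, 25]

average = [(f + m) / 2 for f, m in zip(fet, mrmr)]
final = dense_rank(average)
print(average)  # [1.0, 3.5, 6.5, 7.0, 9.5, 13.0, 14.0, 14.0, 14.5, 16.0, 16.0, 17.0]
print(final)    # [1, 2, 3, 4, 5, 6, 7, 7, 8, 9, 9, 10]
```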
Performance on the Independent Test Dataset (Methy: Methy-RNA; NPPS: RAM-NPPS)
| Methods | Precision | Recall | F1 | MCC |
|---|---|---|---|---|
| Methy | 0.065 | 0.5184 | 0.1163 | -0.1619 |
| NPPS | 0.1656 | | 0.2626 | 0.1833 |
| SRAMP | 0.2638 | 0.4812 | 0.3408 | 0.2653 |
| HMpre | | 0.5698 | | |

Values in boldface denote the largest value for each metric.
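The four metrics in these tables follow their standard confusion-matrix definitions. The sketch below computes them from hypothetical counts and also checks one reported row: F1 recomputed from the SRAMP precision and recall reproduces the tabulated 0.3408. MCC also uses the true negatives, so it remains informative when non-m6A sites vastly outnumber m6A sites. Under the same imbalance, accuracy alone would look high for a predictor that labels almost everything negative. The function names and the counts are illustrative assumptions.

```python
from math import sqrt


def classification_metrics(tp: int, fp: int, tn: int, fn: int):
    """Precision, recall, F1 and Matthews correlation coefficient (MCC)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention when undefined
    return precision, recall, f1, mcc


def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical imbalanced test set: 100 m6A sites, 1950 non-m6A sites.
p, r, f1, mcc = classification_metrics(tp=50, fp=150, tn=1800, fn=50)
print(p, r, round(f1, 4), round(mcc, 4))

# Consistency check against the SRAMP row of the independent test table.
print(round(f1_score(0.2638, 0.4812), 4))  # 0.3408
```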
Fig. 3 Performance on Datasets of Different Imbalance Levels. F1 and MCC values of the four predictors. The x-axis, k, is the ratio of negative to positive samples (the imbalance level) in a test dataset; the y-axis is the metric value.
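A test dataset at imbalance level k holds k negative samples for every positive one. One simple way to build such a dataset is sketched below, assuming that all positives are kept and that negatives are drawn without replacement under a fixed seed. This sampling protocol is an assumption; the paper's exact scheme may differ. `make_test_set` is a hypothetical helper.

```python
import random


def make_test_set(positives, negatives, k, seed=0):
    """Build a test set at imbalance level k (k negatives per positive).

    Hypothetical protocol: keep every positive and sample k * len(positives)
    negatives without replacement, using a fixed seed for reproducibility.
    """
    n_neg = k * len(positives)
    if n_neg > len(negatives):
        raise ValueError(f"need {n_neg} negatives, only {len(negatives)} available")
    rng = random.Random(seed)
    sampled = rng.sample(negatives, n_neg)
    data = [(seq, 1) for seq in positives] + [(seq, 0) for seq in sampled]
    rng.shuffle(data)
    return data


positives = [f"pos{i}" for i in range(10)]
negatives = [f"neg{i}" for i in range(200)]
for k in (1, 5, 10, 20):
    test_set = make_test_set(positives, negatives, k)
    n_pos = sum(label for _, label in test_set)
    print(k, n_pos, len(test_set) - n_pos)  # for k = 5 this prints: 5 10 50
```

Evaluating one trained predictor on each of these sets traces out a curve like those in Fig. 3. Precision and F1 typically fall as k grows, because the same false-positive rate produces more false positives.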
Performance on 1226 Individual Transcripts (Methy: Methy-RNA; NPPS: RAM-NPPS)
| Methods | Precision | Recall | F1 | MCC |
|---|---|---|---|---|
| Methy | 0.0723 | 0.5075 | 0.1174 | -0.1614 |
| NPPS | 0.1770 | | 0.2529 | 0.1907 |
| SRAMP | 0.2484 | 0.4759 | 0.2928 | 0.2387 |
| HMpre | | 0.6062 | | |

Values in boldface denote the largest value for each metric.
Performance of Different Feature Spaces in Cross-Validation (CPD: Chemical Property with Density; Joint: combination of the conventional features)
| Feature | Precision | Recall | F1 | MCC |
|---|---|---|---|---|
| K-mers | 0.1392 | 0.3426 | 0.2461 | 0.1572 |
| CPD | 0.2460 | 0.4816 | 0.3256 | 0.2532 |
| Binary | 0.25 | 0.4906 | 0.3312 | 0.2601 |
| Joint | 0.2519 | 0.5035 | 0.3358 | 0.2661 |
| Proposed | | | | |

Values in boldface denote the largest value for each metric.
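The conventional feature spaces compared above can be sketched as simple sequence encoders. This sketch makes several assumptions. Binary is taken to be one-hot encoding. K-mers is taken to be normalized k-mer frequencies. CPD is taken to be the widely used nucleotide chemical-property vectors (ring structure, functional group, hydrogen bonding) with a cumulative density term appended. Joint is taken to be the concatenation of the three. None of these definitions is confirmed by this record, and every function name below is hypothetical.

```python
from itertools import product

BASES = "ACGU"

# Commonly used nucleotide chemical-property vectors:
# (ring structure, functional group, hydrogen bonding). Assumed, not taken from the paper.
CHEMICAL_PROPERTY = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "U": (0, 0, 1)}


def binary_encoding(seq):
    """One-hot encoding: 4 bits per nucleotide, in A, C, G, U order."""
    return [int(base == b) for base in seq for b in BASES]


def kmer_frequencies(seq, k=2):
    """Relative frequencies of all 4**k k-mers, listed in lexicographic order."""
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = len(seq) - k + 1
    if windows <= 0:
        return [0.0] * len(kmers)
    for i in range(windows):
        sub = seq[i:i + k]
        if sub in counts:  # skip k-mers containing ambiguous bases such as N
            counts[sub] += 1
    return [counts[m] / windows for m in kmers]


def cpd_encoding(seq):
    """Chemical property plus density: 4 values per nucleotide.

    The density of the base at position i is its frequency in the prefix seq[0..i].
    """
    features = []
    for i, base in enumerate(seq):
        density = seq[: i + 1].count(base) / (i + 1)
        features.extend(CHEMICAL_PROPERTY[base] + (density,))
    return features


def joint_encoding(seq, k=2):
    """Concatenate the three conventional feature vectors (assumed definition of Joint)."""
    return binary_encoding(seq) + kmer_frequencies(seq, k) + cpd_encoding(seq)


window = "GGACU"
print(len(binary_encoding(window)), len(kmer_frequencies(window)), len(cpd_encoding(window)))
# prints: 20 16 20
```

Binary and CPD grow linearly with window length, while the k-mer vector has a fixed size of 4**k. This is one reason the encodings capture different aspects of a sequence.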
Fig. 4 Boxplot of Feature Importance Scores
Fig. 5 Predicted m6A sites in the case studies. The x-axis shows the potential m6A sites conforming to the sequence motif DRACH, and the y-axis shows the four predictors. Colored blocks are predicted m6A sites: red blocks are true positives and yellow blocks are false positives. (a) Predictions for the c-Jun case; (b) predictions for the HIV-1 case.
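The candidate sites in Fig. 5 are positions that match the DRACH consensus (IUPAC: D = A/G/U, R = A/G, H = A/C/U). A minimal scan is sketched below. It assumes that the candidate m6A site is the central A of each 5-mer and that DNA-style input (T) should be read as U. `drach_sites` is a hypothetical helper, not part of any of the compared predictors.

```python
import re

# DRACH in IUPAC codes: D = A/G/U, R = A/G, then A, C, H = A/C/U.
# The zero-width lookahead lets overlapping motifs all be reported.
DRACH = re.compile(r"(?=[AGU][AG]AC[ACU])")


def drach_sites(seq: str) -> list[int]:
    """Return 0-based indices of the central A of every DRACH motif."""
    rna = seq.upper().replace("T", "U")  # accept DNA-style input as well
    return [match.start() + 2 for match in DRACH.finditer(rna)]


print(drach_sites("AGACAGACU"))  # [2, 6]
```

The two motifs in the example share a base, so a consuming pattern without the lookahead would report only the first one.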
Results for the c-Jun and HIV-1 Case Studies (Methy: Methy-RNA; NPPS: RAM-NPPS)
| Case | Methods | Precision | Recall | F1 | MCC |
|---|---|---|---|---|---|
| c-JUN | Methy | 0.3428 | | 0.5052 | -0.0542 |
| | NPPS | 0.5384 | 0.56 | 0.549 | 0.3019 |
| | SRAMP | 0.75 | 0.48 | 0.5853 | 0.4522 |
| | HMpre | | 0.72 | | |
| HIV-1 | Methy | 0.1702 | | 0.2711 | -0.1045 |
| | NPPS | 0.1935 | 0.5 | 0.279 | 0 |
| | SRAMP | 0.6 | 0.25 | 0.3529 | 0.2727 |
| | HMpre | | 0.4166 | | |

Values in boldface denote the largest value for each metric within each case.