Hao-Tian Wang, Fu-Hui Xiao, Gong-Hua Li, Qing-Peng Kong.
Abstract
BACKGROUND: An increasing number of nucleic acid modifications have been profiled as sequencing technologies have developed. DNA N6-methyladenine (6mA), a prevalent epigenetic modification, plays important roles in a series of biological processes. So far, identification of DNA 6mA has relied primarily on time-consuming and expensive experimental approaches. In silico methods, however, can serve as a preliminary screen that saves experimental resources and time, especially given the rapid accumulation of sequencing data.
Keywords: DNA N6-methyladenine; Machine learning; XGBoost
Year: 2020 PMID: 32093759 PMCID: PMC7038560 DOI: 10.1186/s13072-020-00330-2
Source DB: PubMed Journal: Epigenetics Chromatin ISSN: 1756-8935 Impact factor: 4.954
Statistics of the benchmark dataset in this study
| Dataset | # Positive samples | # Negative samples | Reference genome |
|---|---|---|---|
| Oryza sativa | 880 | 880 | MH63 |
| Drosophila melanogaster | 728 | 728 | dm3 |
| Caenorhabditis elegans | 632 | 632 | ce10 |
| Homo sapiens | 800 | 800 | hg38 |
| Aggregated | 3040 | 3040 | – |
Fig. 1 Nucleotide composition of the benchmark dataset. a Two Sample Logos result for the benchmark dataset; the top panel shows nucleotide enrichment in 6mA-containing sequences and the bottom panel shows enrichment in non-6mA-containing sequences. b Entropy analysis of 6mA- and non-6mA-containing sequences. The red line denotes 6mA-containing sequences and the blue line denotes non-6mA-containing sequences
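The entropy analysis named in panel b is not defined in this record. One common reading is per-position Shannon entropy over aligned, equal-length sequences, and the sketch below assumes that reading. The function name `positional_entropy` and the toy sequences are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def positional_entropy(seqs):
    """Return the Shannon entropy (in bits) of the nucleotide
    distribution at each position of equal-length sequences.

    Assumption: this is the standard per-column Shannon entropy,
    not necessarily the exact measure used in the paper.
    """
    if not seqs:
        raise ValueError("need at least one sequence")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")

    entropies = []
    # zip(*seqs) yields one tuple of nucleotides per alignment column.
    for column in zip(*seqs):
        counts = Counter(column)
        total = len(column)
        h = 0.0 - sum((c / total) * math.log2(c / total)
                      for c in counts.values())
        entropies.append(h)
    return entropies


# Hypothetical toy input: four aligned 4-nt sequences.
toy = ["ACGA", "ACTA", "AGGA", "ATCA"]
profile = positional_entropy(toy)
```

A fully conserved column gives an entropy of 0 bits, and a column with all four nucleotides equally frequent gives the maximum of 2 bits. Computing one such profile for the positive samples and one for the negative samples would give two curves of the kind the caption describes.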
Fig. 2 Feature selection and parameter tuning. a IFS curve of feature selection. b Grid-search results of parameter tuning
Fig. 3 Comparison between p6mA and other existing predictors on the benchmark dataset. a Performance of the 4 predictors. b ROC curves of p6mA and MM-6mAPred
Fig. 4 Results of independent validation on the A. thaliana dataset. a Performance of the 4 predictors. b ROC curves of p6mA and MM-6mAPred