Qian Liu, Li Fang, Guoliang Yu, Depeng Wang, Chuan-Le Xiao, Kai Wang.
Abstract
DNA base modifications, such as C5-methylcytosine (5mC) and N6-methyldeoxyadenosine (6mA), are important types of epigenetic regulation. Short-read bisulfite sequencing and long-read PacBio sequencing have inherent limitations in detecting DNA modifications. Here, using raw electric signals from Oxford Nanopore long-read sequencing data, we design DeepMod, a bidirectional recurrent neural network (RNN) with long short-term memory (LSTM), to detect DNA modifications. We sequence a human genome (HX1) and a Chlamydomonas reinhardtii genome using Nanopore sequencing, and then evaluate DeepMod on three types of genomes (Escherichia coli, Chlamydomonas reinhardtii and human). For 5mC detection, DeepMod achieves average precision up to 0.99 for both synthetically introduced and naturally occurring modifications. For 6mA detection, DeepMod achieves ~0.9 average precision on Escherichia coli data and performs better than existing methods on Chlamydomonas reinhardtii data. In conclusion, DeepMod performs well for genome-scale detection of DNA modifications and will facilitate epigenetic analysis on diverse species.
Year: 2019 PMID: 31164644 PMCID: PMC6547721 DOI: 10.1038/s41467-019-10168-2
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 The flowchart of DeepMod. RNN: recurrent neural network; LSTM: long short-term memory. Several long reads (in yellow) are shown for demonstration purposes with their alignment and Nanopore signals (yellow lines). To illustrate the LSTM RNN used for modification prediction, nine LSTM cells adjacent to the center position (with arrows in black) are shown
Nanopore sequencing data sets used to evaluate DeepMod
| Genome | Data set name | Motif<sup>c</sup> | # reads | Coverage | Meth<sup>a</sup> | Reference |
|---|---|---|---|---|---|---|
| E. coli | UMR | NA<sup>d</sup> | 111,238 | 110X | Neg<sup>b</sup> | Simpson et al. |
| E. coli | CG_MsssI | <u>C</u>G | 69,899 | 67X | 5mC | |
| E. coli | CG_SssI | <u>C</u>G | 8679 | 19X | 5mC | |
| E. coli | CG_MpeI | <u>C</u>G | 23,593 | 39X | 5mC | |
| E. coli | GCGC_HhaI | G<u>C</u>GC | 18,180 | 50X | 5mC | |
| E. coli | gaAttc_EcoRI | GA<u>A</u>TTC | 16,661 | 27X | 6mA | Stoiber et al. |
| E. coli | gAtc_dam | G<u>A</u>TC | 17,557 | 33X | 6mA | |
| E. coli | tcgA_TaqI | TCG<u>A</u> | 16,249 | 22X | 6mA | |
| E. coli | Con1 | NA<sup>d</sup> | 23,762 | 34X | Neg<sup>b</sup> | |
| E. coli | Con2 | NA<sup>d</sup> | 34,170 | 40X | Neg<sup>b</sup> | |
| Human | NA12878 | | | 30X | 5mC | Jain et al. |
| Human | HX1 | | 4,827,155 | 30X | 5mC | Current study |
| C. reinhardtii | C. reinhardtii | NA<sup>d</sup> | 772,817 | 126X | 6mA | Current study |
a Methylation types
b Negative control without any modifications
c Underlined nucleotides in motifs are the potential modification targets
d No modifications or no motif information
The number of modified and un-modified bases of interest used for evaluation on E. coli when coverage ≥ 1
| Data set name | Base of interest | Modification | # modified base in motif | # non-modified base in motif | # base not in motif |
|---|---|---|---|---|---|
| CG_MpeI | cytosine (C) | 5mC | 693,518 | 693,427 | 3,326,971 |
| CG_SssI | cytosine (C) | 5mC | 682,526 | 693,427 | 3,297,528 |
| GCGC_HhaI | cytosine (C) | 5mC | 70,172 | 70,160 | 4,573,777 |
| gaAttc_EcoRI | adenine (A) | 6mA | 277 | 280 | 1,003,885 |
| gAtc_dam | adenine (A) | 6mA | 7816 | 7831 | 989,363 |
| tcgA_TaqI | adenine (A) | 6mA | 6351 | 6403 | 989,375 |
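The counts in this table determine how imbalanced each evaluation is, which matters when reading average precision (AP): for a scorer that ranks bases at random, AP is expected to be close to the fraction of positives. The following is a minimal sketch of that arithmetic, not part of DeepMod itself. The function name `positive_fraction` is hypothetical, and treating bases outside the motif as negatives in the "all bases" setting is an assumption made for illustration.

```python
# Counts copied from the E. coli evaluation table above:
# (modified in motif, non-modified in motif, bases not in motif).
COUNTS = {
    "CG_MpeI": (693_518, 693_427, 3_326_971),
    "CG_SssI": (682_526, 693_427, 3_297_528),
    "GCGC_HhaI": (70_172, 70_160, 4_573_777),
    "gaAttc_EcoRI": (277, 280, 1_003_885),
    "gAtc_dam": (7_816, 7_831, 989_363),
    "tcgA_TaqI": (6_351, 6_403, 989_375),
}


def positive_fraction(data_set, include_non_motif):
    """Fraction of evaluated bases that are modified.

    Assumption: when include_non_motif is True, bases outside the
    motif are counted as additional negatives.
    """
    modified, unmodified, outside = COUNTS[data_set]
    negatives = unmodified + (outside if include_non_motif else 0)
    return modified / (modified + negatives)


if __name__ == "__main__":
    for name in COUNTS:
        in_motif = positive_fraction(name, include_non_motif=False)
        all_bases = positive_fraction(name, include_non_motif=True)
        print(f"{name}: in-motif {in_motif:.4f}, all bases {all_bases:.2e}")
```

Within motifs the classes are nearly balanced, whereas under the stated assumption the all-bases setting is heavily skewed toward negatives, most strongly for gaAttc_EcoRI.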
Fig. 2 Evaluation of the performance of DeepMod on 5mC prediction on E. coli, NA12878, and HX1. a AP (the outer) and AUC (the inner) plots for 5mC within sequence motifs in E. coli for three synthetically introduced 5mC data sets by M.MpeI (CG_MpeI for CG motif), M.SssI (CG_SssI for CG motif), and M.HhaI (GCGC_HhaI for GCGC motif), respectively. b, c AUC and AP plots for 5mC prediction of all cytosines in E. coli. d, e AP and AUC of 5mC prediction by DeepMod on NA12878. f, g AP and AUC of 5mC prediction by DeepMod on HX1. Cov: coverage. # base: total number of bases used in the evaluation
Fig. 3 Evaluation of the performance of DeepMod on 6mA prediction on E. coli and C. reinhardtii. a AP (the outer) and AUC (the inner) plots on E. coli for three synthetically introduced 6mA data sets by EcoRI (gaAttc_EcoRI for GAATTC motif), TaqI (tcgA_TaqI for TCGA motif), and dam (gAtc_dam for GATC motif), respectively. b 6mA prediction by DeepMod on C. reinhardtii (the outer plot is for genomic DNA digested in 0.5 h, whereas the inner plot is for genomic DNA digested in 12 h [34]). Cov: coverage. # base: total number of bases used in the evaluation
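The figures above report AP (average precision) and AUC (area under the ROC curve). As a rough illustration of what these two metrics measure, the sketch below computes both from per-base labels (1 = modified, 0 = unmodified) and prediction scores in plain Python. It is not the evaluation code used in the study; the function names and the toy data are hypothetical.

```python
def average_precision(labels, scores):
    """Step-wise AP: mean of the precision at the rank of each positive.

    Tied scores are ordered by input position here, so results can
    depend on input order when scores tie.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half. Quadratic time; toy inputs only."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both positives and negatives")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: four bases with made-up scores.
    toy_labels = [1, 0, 1, 0]
    toy_scores = [0.9, 0.8, 0.7, 0.1]
    print("AP :", average_precision(toy_labels, toy_scores))
    print("AUC:", roc_auc(toy_labels, toy_scores))
```

AP depends on the proportion of positives while AUC does not, which is one reason both are useful when positives are rare.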