Sadaruddin Chachar, Jingrong Liu, Pingxian Zhang, Adeel Riaz, Changfei Guan, Shuyuan Liu.
Abstract
Epigenetic modifications alter gene activity and function by changing chromosomal architecture through DNA methylation/demethylation or histone modifications, without changing the DNA sequence. In plants, DNA cytosine methylation (5mC) is vital for various pathways, such as gene regulation, transposon suppression, DNA repair, replication, transcription, and recombination. Thanks to recent advances in high-throughput sequencing (HTS) technologies for generating epigenomic "Big Data", accumulating studies have revealed another DNA methylation mark, N6-methyladenine (6mA), which is enriched in gene bodies and mainly activates gene expression in model plants such as the eudicot Arabidopsis (Arabidopsis thaliana) and the monocot rice (Oryza sativa). In non-model crops, however, the occurrence and importance of 6mA remain largely unknown, with only limited reports in a few species, such as wild strawberry (Rosaceae) and soybean (Glycine max). Given these vital roles of 6mA in plants, here we summarize the latest advances in DNA 6mA modification and review the history and known functions of 6mA in plants. We also consider advanced artificial-intelligence approaches that improve the identification and prediction of 6mA sites. In this Review, we discuss the potential challenges that may hinder the exploitation of 6mA and outline future goals for 6mA research, from model plants to non-model crops.
Keywords: DNA methylation; N6-methyladenosine; artificial intelligence; deep learning; epigenetic modification; gene expression
Year: 2021 PMID: 33995495 PMCID: PMC8118384 DOI: 10.3389/fgene.2021.668317
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
FIGURE 1. The dynamic DNA 6mA modification in plants. (A) Methods for detecting 6mA. (B) The enzymatic systems of 6mA. (C) Potential associations with other epigenetic modifications and future perspectives of 6mA. In plants, 6mA can be detected by various methods, such as dot blot, HPLC–MS, 6mA-IP-seq, 6mA-CLIP-Exo-seq, 6mA-RE-seq, SMRT-seq, and Nanopore-seq. The first two measure only global 6mA levels, qualitatively or quantitatively, and cannot detect specific 6mA sites; 6mA-IP-seq, 6mA-CLIP-Exo-seq, and 6mA-RE-seq can map 6mA sites or peaks genome-wide. SMRT-seq and Nanopore-seq, in contrast, identify 6mA at single-base resolution, revealing sequence motifs and distribution patterns with greater accuracy and robustness. Currently, N6AMT1 is the only known 6mA writer (identified in mammals), and ALKBH1 acts as an eraser in both mammals and plants; the readers that recognize 6mA sites in the genome remain unknown.
FIGURE 2. The ALKBH1 orthologs in plants. (A) NJ tree and (B) sequence alignment of ALKBH1 orthologs in plants. (C) Protein structures of ALKBH1 orthologs in rice, Arabidopsis, soybean, and strawberry. Sequences of selected putative ALKBH1 orthologs were downloaded from the NCBI database and aligned. The phylogenetic tree was constructed in MEGA7 using the neighbor-joining (NJ) method with 1,000 bootstrap replicates. Structures of the ALKBH1 orthologs were predicted by homology modeling based on the rice ALKBH1 structure (5XEG; Zhou et al., 2018).
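Panel (A) was built by neighbor joining in MEGA7. As a rough illustration of what that method does (not MEGA7's implementation, and without the 1,000-replicate bootstrap), the Python sketch below repeatedly joins the pair of taxa that minimizes the NJ Q-criterion, computes the two new branch lengths, and returns an unrooted Newick string. The function names and the five-taxon distance matrix are hypothetical toy data, not ALKBH1 distances.

```python
def nj_step(names, D):
    """One neighbor-joining iteration: join the pair minimizing Q, return the reduced matrix."""
    n = len(names)
    totals = [sum(row) for row in D]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            # Q-criterion: (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k); ties keep the first pair
            q = (n - 2) * D[i][j] - totals[i] - totals[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    _, i, j = best
    # Branch lengths from taxa i and j to the new internal node u
    li = 0.5 * D[i][j] + (totals[i] - totals[j]) / (2 * (n - 2))
    lj = D[i][j] - li
    u = f"({names[i]}:{li:g},{names[j]}:{lj:g})"
    keep = [k for k in range(n) if k not in (i, j)]
    # Distance from u to every remaining taxon k
    du = [0.5 * (D[i][k] + D[j][k] - D[i][j]) for k in keep]
    new_names = [names[k] for k in keep] + [u]
    new_D = [[D[a][b] for b in keep] + [du[x]] for x, a in enumerate(keep)]
    new_D.append(du + [0.0])
    return new_names, new_D


def neighbor_joining(names, D):
    """Return an unrooted Newick tree for >= 3 taxa (no bootstrap support)."""
    names, D = list(names), [list(row) for row in D]
    while len(names) > 3:
        names, D = nj_step(names, D)
    # Join the final three nodes at a single internal node
    (a, b, c), d = names, D
    la = (d[0][1] + d[0][2] - d[1][2]) / 2
    lb = (d[0][1] + d[1][2] - d[0][2]) / 2
    lc = (d[0][2] + d[1][2] - d[0][1]) / 2
    return f"({a}:{la:g},{b}:{lb:g},{c}:{lc:g});"


if __name__ == "__main__":
    taxa = ["a", "b", "c", "d", "e"]  # hypothetical toy taxa
    dist = [
        [0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0],
    ]
    print(neighbor_joining(taxa, dist))  # prints (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

Bootstrap support, as reported in the figure, would come from resampling alignment columns, rebuilding the tree on each replicate, and counting how often each clade reappears; that step is omitted here.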
FIGURE 3. Distribution patterns (A) and representative motifs (B) of 6mA in rice, Arabidopsis, soybean, and strawberry. Gene bodies include exons, introns, and 5′ and 3′ UTRs. Data were reproduced from previous studies (Liang et al., 2018b; Zhang et al., 2018; Xie et al., 2020; Yuan et al., 2020).
Summary of recent machine-learning-based 6mA prediction approaches.
| Method name | Description | Species | References |
|---|---|---|---|
| i6mA-Pred | Uses a support vector machine (SVM) to identify 6mA sites in the rice genome with 83% accuracy; DNA sequences are encoded by nucleotide chemical properties and nucleotide frequency | Rice genome | |
| SNNRice6mA | A simple, lightweight deep learning model for identifying 6mA sites in the rice genome, evaluated with five metrics: sensitivity, specificity, accuracy, area under the curve (AUC), and Matthews correlation coefficient (MCC) | Rice genome | |
| i6mA-DCNP | A computational method to identify and predict 6mA sites in the rice genome by encoding genomic DNA samples with dinucleotide composition and optimized dinucleotide-based DNA properties | Rice genome | |
| Sequence-based DNA N6-methyladenine predictor (SDM6A) | A sequence-based two-layer method for predicting putative 6mA and non-6mA sites in the rice genome | Rice genome | |
| iDNA-MS | Uses a random forest to identify 6mA, 5hmC, and 4mC sites in multiple species | Multiple species | |
| Meta-i6mA | An interspecies tool that identifies DNA 6mA sites in plant genomes using informative features in an integrative machine learning framework | Rice genome | |
| csDMA | Identifies and predicts 6mA in various species following Chou’s 5-step rule, using three feature encodings and different algorithms to produce the feature matrix | Multiple species | |
| iDNA6mA-Rice | Uses a random forest on the encoded sample sequences to distinguish methylated from non-methylated adenine sites in the rice genome | Rice genome | |
| iDNA6mA-PseKNC | A sequence-based predictor, reported to achieve 100% accuracy and 96% precision, that identifies DNA 6mA sites without complicated mathematical formulas | Multiple species | |
| iDNA6mA | A deep learning method based on a convolutional neural network that identifies 6mA sites in the rice genome from the DNA sequence alone | Rice genome | |
| MM-6mA-Pred | Distinguishes 6mA from non-6mA sites using differences in the transition probabilities between adjacent nucleotides, modeled with a Markov model; reported to outperform i6mA-Pred | Rice genome | |
| DEEP6mA | Identifies 6mA sites in plants with an overall prediction precision of 94%, using a convolutional neural network (CNN) to extract high-level sequence features and a bidirectional long short-term memory network (BLSTM) to capture dependencies along the sequence | Multiple species | |
| FastFeatGen | Uses machine learning with motif features to predict 6mA sites in the genome; multithreading and shared memory make it fast | Multiple species | |
| eRice | Uses machine learning to predict 6mA sites in the rice genome | Rice genome | |
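The i6mA-Pred row mentions encoding sequences by chemical property and nucleotide frequency. A common form of this encoding in the 6mA-prediction literature combines nucleotide chemical property (NCP: ring structure, functional group, and hydrogen bonding as 3 bits per base) with accumulated nucleotide frequency (ANF). The sketch below assumes that scheme; the function name `encode_ncp_anf` is hypothetical, and the exact encoding used by i6mA-Pred may differ.

```python
# NCP bits per base: (purine ring?, amino group?, weak (2) hydrogen bonds?)
NCP = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "T": (0, 0, 1)}


def encode_ncp_anf(seq):
    """Encode a DNA sequence as 4 features per base: 3 NCP bits + ANF."""
    seq = seq.upper()
    counts = {base: 0 for base in NCP}
    features = []
    for i, base in enumerate(seq, start=1):
        if base not in NCP:
            raise ValueError(f"unsupported base {base!r} at position {i}")
        counts[base] += 1
        features.extend(NCP[base])
        # ANF: fraction of positions 1..i that carry this base
        features.append(counts[base] / i)
    return features


if __name__ == "__main__":
    print(encode_ncp_anf("ACA"))
    # prints [1, 1, 1, 1.0, 0, 1, 0, 0.5, 1, 1, 1, 0.6666666666666666]
```

Assuming a 41-bp window centered on the candidate adenine (an assumption here), each sample becomes a 164-dimensional vector that could then be passed to an SVM classifier such as scikit-learn's `sklearn.svm.SVC`.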
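MM-6mA-Pred is described as using differences in the transition probabilities between adjacent nucleotides. The sketch below illustrates that idea under a simplifying assumption: one position-independent, first-order Markov chain per class, with add-one pseudocounts. The published tool may differ, for example by using position-specific transitions. One chain is trained on 6mA sequences and one on non-6mA sequences, and a query is scored by its log-likelihood ratio. The function names and toy sequences are hypothetical.

```python
import math
from itertools import product

BASES = "ACGT"


def train_transitions(seqs, pseudocount=1.0):
    """Estimate first-order transition probabilities P(b | a) with pseudocounts."""
    counts = {(a, b): pseudocount for a, b in product(BASES, repeat=2)}
    for s in seqs:
        for a, b in zip(s, s[1:]):
            counts[(a, b)] += 1
    probs = {}
    for a in BASES:
        row_total = sum(counts[(a, b)] for b in BASES)
        for b in BASES:
            probs[(a, b)] = counts[(a, b)] / row_total
    return probs


def log_odds(seq, pos_model, neg_model):
    """Sum of log-ratio scores over adjacent base pairs; > 0 favors the 6mA class."""
    return sum(
        math.log(pos_model[(a, b)] / neg_model[(a, b)])
        for a, b in zip(seq, seq[1:])
    )


if __name__ == "__main__":
    pos = train_transitions(["GAGAGA", "GAGAGA"])  # hypothetical 6mA training set
    neg = train_transitions(["CTCTCT", "CTCTCT"])  # hypothetical non-6mA training set
    print(log_odds("GAGA", pos, neg) > 0, log_odds("CTCT", pos, neg) < 0)  # prints True True
```

Sequences must be uppercase A/C/G/T. A score threshold of 0 gives a simple classifier, and the threshold can be tuned on validation data to trade sensitivity against specificity.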