Literature DB >> 30601803

Global analysis of N6-methyladenosine functions and its disease association using deep learning and network-based methods.

Song-Yao Zhang1,2, Shao-Wu Zhang1, Xiao-Nan Fan1, Jia Meng3, Yidong Chen4, Shou-Jiang Gao5, Yufei Huang2,4.   

Abstract

N6-methyladenosine (m6A) is the most abundant methylation, existing in >25% of human mRNAs. Exciting recent discoveries indicate the close involvement of m6A in regulating many different aspects of mRNA metabolism and diseases like cancer. However, our current knowledge about how m6A levels are controlled and whether and how regulation of m6A levels of a specific gene can play a role in cancer and other diseases is mostly elusive. We propose in this paper a computational scheme for predicting m6A-regulated genes and m6A-associated disease, which includes Deep-m6A, the first model for detecting condition-specific m6A sites from MeRIP-Seq data with a single base resolution using deep learning and Hot-m6A, a new network-based pipeline that prioritizes functional significant m6A genes and its associated diseases using the Protein-Protein Interaction (PPI) and gene-disease heterogeneous networks. We applied Deep-m6A and this pipeline to 75 MeRIP-seq human samples, which produced a compact set of 709 functionally significant m6A-regulated genes and nine functionally enriched subnetworks. The functional enrichment analysis of these genes and networks reveal that m6A targets key genes of many critical biological processes including transcription, cell organization and transport, and cell proliferation and cancer-related pathways such as Wnt pathway. The m6A-associated disease analysis prioritized five significantly associated diseases including leukemia and renal cell carcinoma. These results demonstrate the power of our proposed computational scheme and provide new leads for understanding m6A regulatory functions and its roles in diseases.

Entities:  

Mesh:

Substances:

Year:  2019        PMID: 30601803      PMCID: PMC6331136          DOI: 10.1371/journal.pcbi.1006663

Source DB:  PubMed          Journal:  PLoS Comput Biol        ISSN: 1553-734X            Impact factor:   4.475


Introduction

N6-methyl-adenosine (m6A) methylation is a paradigm-shifting research filled with exciting discoveries. Recent research has shown that m6A exists in > 25% of mRNAs in mammalian cells [1, 2] and forms an important regulatory circuitry that controls many aspects of RNA metabolism [3-10]. Evidence of m6A’s involvement in cancer and other diseases [11-21] and its role in regulating viral life cycle [22-25] are also accumulating. However, our current knowledge about how m6A levels are regulated and whether and how regulation of m6A levels of a specific gene can play a role in cancer and other diseases is largely elusive. The purpose of this study is to conduct a comprehensive prediction of m6A mediated functions and associated diseases through global analysis of m6A regulated genes using 75 human methylated RNA immunoprecipitation sequencing (MeRIP-seq) [1, 2] samples curated by MeT-DB2 [26]. To this end, prediction of context-specific m6A sites is an essential first step. Several informatics tools have been developed to predict condition independent m6A sites from RNA sequences [27-35] or condition-specific m6A peaks in MeRIP-seq data [36, 37]. Chen et al. [35] proposed the first sequence-based model iRNA-Methyl to predict m6A site using features extracted from RNA sequences. Subsequently, Zhou et al. developed SRAMP [29] to improve the performance of predicting single-base m6A sites using three kinds of features extracted from pri- and mature RNA sequences. Since then, a line of sequence-based algorithms [38-40], all based on different handcrafted features extracted from RNA sequences have been developed. As an alternative to handcrafted RNA sequence features, Wei et al. [33] proposed DeepM6APred to extract features automatically using a deep belief network (DBN). However, because RNA sequences are independent of any study conditions, none of these sequence-based models can predict condition-specific m6A sites. Because m6A has been shown to play different regulatory roles in different cell conditions and disease types, its methylation status is highly dynamic in nature, being either methylated or demethylated depending on the biological contexts. Also, m6A status can be experimentally manipulated through silencing or overexpressing key m6A related proteins, a commonly used approach for studying m6A functions. Therefore, predicting condition-specific single-base m6A site is an essential task in m6A research. Currently, algorithms including exomePeak [41] and MeTPeak [42] are proposed to predict context-specific m6A peaks from MeRIP-seq data. However, MeRIP-Seq has a limited resolution of ~100bp and its large biological and technical variations often result in high false positive rates in the predicted peaks. No existing algorithm can predict condition-specific m6A site at a single-base resolution. In addition to a lack of single-base, context-specific site prediction algorithms, computational prediction of m6A functions has not been adequately addressed. In our previous work [43], we developed m6A-Driver, a network-based approach to identify m6A driven genes with significant functions under a specific context. However, m6A-Driver has several limitations. First, m6A-Driver to identify m6A driven genes in two different conditions, e.g., gene knock-down vs. normal. Therefore, m6A-Driver cannot be used for the intended global analysis that includes samples from multiple conditions. Second, because of the small sample size, m6A-Driver tends to identify a large number of significant genes, which makes it difficult to prioritize these genes. In this work, we propose to develop a new approach to address these limitations of the site or peak detection algorithms and m6A-Driver so that a global prediction of m6A mediated functions and associated diseases across samples from multiple conditions can be reliably performed. To improve the resolution and accuracy of condition-specific m6A site prediction from MeRIP-seq, we propose a novel Convolutional Neural Network (CNN) [44] based method named Deep-m6A to detect m6A sites from MeRIP-Seq peak regions with a single base resolution. This model integrates mRNA sequence information with MeRIP-seq data and trained on single-base m6A sites identified by the miCLIP [45], the state-of-the-art high throughput technology for single-base detection of m6A sites. Deep-m6A makes it possible to identify single-base m6A sites from the 75 human MeRIP-seq samples with higher accuracy and resolution, providing us a chance to investigate the relationship of m6A methylation with gene expression and diseases in a global manner. To this end, we propose Hot-m6A, a new network-based pipeline. We hypothesize that m6A regulates biological processes and pathways by regulating a set of functionally interacted genes through regulating their mRNA stability. Therefore, the expression of a gene regulated by an m6A site is likely to be correlated with the m6A level across multiple conditions and this regulation also influences its neighboring genes in the PPI network. The stronger an m6A regulates a gene, the more significant the m6A-expression correlation and the higher the degree of influence on its neighbors. Our goal is to identify m6A regulated genes, whose expressions are correlated with their m6A methylations across 75 samples and which are closely interacting with other m6A genes in a PPI network. To achieve this goal, we adopt the HotNet2 [46] algorithm. The basic idea of Hot-m6A is to diffuse “heat”, i.e., correlation of gene expression and m6A level in this study, of genes in a network and select the significant genes, where the “heats” they diffused to each other are high. Therefore, the m6A regulated genes identified by HotNet2 will in general have relatively high expression-methylation correlation and closely interacted with each other in the PPI network. However, HotNet2 still can also identify m6A regulated genes with lower “heat” but closely interacted with “hot” genes. After m6A regulated genes are identified, our pipeline prioritizes the m6A associated diseases by applying the random walk with restart on heterogeneous network (RWRH) method [47] on a gene-disease heterogeneous network. We applied Deep-m6A and Hot-m6A to the 75 MeRIP-seq samples, which produced a compact set of 709 functional significant m6A-regulated genes and nine functional enriched subnetworks. The functional enrichment analysis of these genes and networks reveal that m6A targets key genes of many critical biological processes including transcription, cell organization and transport, cell proliferation and cancer-related pathways such as the Wnt pathway. The m6A-associated disease analysis prioritized five significantly associated diseases including leukemia and renal cell carcinoma.

Result

Overview of the method

The flowchart of our pipeline is illustrated in Fig 1. We intend to perform a global analysis of existing human m6A site in different samples to uncover m6A regulated functions and associated disease. To achieve this, we first set out to determine the single-base m6A sites. To this end, we developed a novel CNN model, called Deep-m6A, which takes both sequence feature and MeRIP-Seq IP reads count as an input to predict context specific single-base m6A sites from MeRIP-Seq data. We then applied Deep-m6A to predict single-base m6A sites for all 75 human MeRIP-Seq samples extracted from MeT-DB V2.0 [26]. Then, we extracted the m6A sites appeared in at least 12 samples according to a test based on Fisher’s z transformation [48] and calculated the revised Fisher’s z-transformation [49] of the Pearson correlation of methylation degree and gene expression level across all their appeared samples (see Methods and material for details). Genes that contain at least one m6A site that occurred in more than 12 samples and whose methylation degree and gene expression level are significantly correlated were defined as candidate m6A regulated genes. We selected the largest absolute z-transformed correlation from all m6A sites in a gene to denote the methylation-expression correlation of the gene. To further study the m6A regulated functions and select functional m6A-regulated genes, we developed Hot-m6A and applied it to 4 different PPI networks, taking the absolute z-transformed correlations for all candidate m6A-regulated genes as the heat vector, to identify significant PPI subnetworks that are regulated by m6A more than expected by chance. The 4 PPI networks includes BioGRID [50], HINT+HI2012 [51, 52], MultiNet [53], and iRefIndex [54]. The last three networks are used in the HotNet2 paper. The edges that are identified by all the four networks were extracted to form significant consensus subnetworks, which were then extended to include edges identified in 3, 2, and 1network, respectively (See Methods for the detail). All the genes in the extended subnetworks are defined as significant m6A-regulated genes and genes reported in the significant consensus networks are defined as consensus m6A-regulated genes. Finally, we took all these significant m6A-regulated genes as gene seeds and their correlated diseases according the OMIM as disease seeds and applied RWRH on a heterogeneous gene-disease network. We also built four heterogeneous gene-disease networks corresponding to the four PPI networks. For each heterogeneous gene-disease network, the top 10 ranked diseases were selected as candidate m6A-associated disease. A random network test was applied to calculate an empirical p-value for each candidate m6A-associated disease. The candidate disease with a p-value < 0.05 was selected as significant candidate m6A-associated disease. The significant candidate m6A-associated disease that identified by all the 4 networks were finally identified as m6A-associated disease.
Fig 1

Flowchart of our proposed prediction pipeline.

Deep-m6A achieves higher precision and sensitivity in 10-fold cross validation

We first evaluated the performance of Deep-m6A using the training data from the HEK293 cell lines and compared its performance with Deep-m6A-S. As described in Methods, Deep-m6A-S and Deep-m6A use the same CNN structure but different input features. Deep-m6A-S takes 101-nt long sequences centered at a DRACH motif as input, whereas Deep-m6A takes both sequence and the corresponding IP reads count as input. As detailed in the “Dataset” section, this training data include 4,742 positive samples that are CITS miCLIP m6A sites in the MeRIP-seq peak regions and also centered at DRACH motif, and 33,718 negative samples that are also centered at a DRACH motif in the MeRIP-seq peak regions but are > 50-nt away from the positive samples and not CIMS miCLIP sites or single-base m6A sites reported in other experiments [55]. Before model comparison, we investigated the difference between MeRIP-seq IP reads coverage in the sequence regions of the positive and negative samples. As shown in Fig 2(A), the average reads count of positive samples are higher and more centered at the DRACH m6A motif than those of the negative samples.
Fig 2

Performance of Deep-m6A.

(A) shows the IP reads count coverage in the 101-nt positive and negative training samples. The x-axis denotes the relative location of the nucleotides in the sample sequences, where the 51st position is the center A of the DRACH motif and the 1st position is the 5’ end of the sequence. The IP expression level represents the IP reads input RC. (B) and (C) are the ROC and PR curves of Deep-m6A-S and Deep-m6A obtained from the 10-fold CV on the HEK293 training data. (D) and (E) are performances of Deep-m6A, Deep-m6A-S, SRAMP-Mature and SRAMP-Full model on the independent MOLM13 data. The number after the method names are corresponding area under the curve (AUC).

Performance of Deep-m6A.

(A) shows the IP reads count coverage in the 101-nt positive and negative training samples. The x-axis denotes the relative location of the nucleotides in the sample sequences, where the 51st position is the center A of the DRACH motif and the 1st position is the 5’ end of the sequence. The IP expression level represents the IP reads input RC. (B) and (C) are the ROC and PR curves of Deep-m6A-S and Deep-m6A obtained from the 10-fold CV on the HEK293 training data. (D) and (E) are performances of Deep-m6A, Deep-m6A-S, SRAMP-Mature and SRAMP-Full model on the independent MOLM13 data. The number after the method names are corresponding area under the curve (AUC). The CNN architecture of Deep-m6A and Deep-m6A-S includes one convolutional layer followed by a max-pooling layer, a fully connected layer and an output layer. We used 10-fold cross validation to evaluate the performance of Deep-m6A and Deep-m6A-S and took one of the 10 CVs to optimize the hyperparameters. To reduce the impact of significant imbalance between the positive and negative samples, we split the negative samples into 7 subsets, each of which has the equal size to positive samples and for each CV, we trained 7 models for each balanced pair of positive/negative samples and used the average predicted score of the 7 models as the final predicted probability of a test DRACH motif to be a single base m6A methylation site. The CNN architecture was determined by a grid search method [56], where the searched parameters are the kernel size: 4x3, 4x4 and 4x5; # of kernels: 16, 32 and 48; the max-pooling size: 1x3, 1x4 and 1x5; # of the nodes of the fully connected layer: 8, 12, 16 and 32; the fully connected layer dropout rate: 0.2, 0.25, 0.3, and 0.5; and the softmax output layer dropout rate: 0.2, 0.25, 0.3 and 0.5. Table 1 showed the optimized hyperparameters by grid search. The categorical cross entropy loss function and Adadelta [57] optimizer were adopted in the training.
Table 1

Optimized hyperparameters of Deep-m6A and Deep-m6A-S.

HyperparametersKernel size# FiltersMax-pooling sizeDropout rate1Node of denseDropout rate2
Value4x5321 x 40.25120.25

* “Kernel size” is size of CNN kernel, “# Filters” is number of CNN filters, “Dropout rate1” is the dropout rate after max-pooling, “Node of dense” is the number of nodes in the fully connected layer after convolution layers, and “Dropout rate2” is the dropout rate after the fully connected layer.

* “Kernel size” is size of CNN kernel, “# Filters” is number of CNN filters, “Dropout rate1” is the dropout rate after max-pooling, “Node of dense” is the number of nodes in the fully connected layer after convolution layers, and “Dropout rate2” is the dropout rate after the fully connected layer. Fig 2(B) and 2(C) shows the receiver operating characteristic (ROC) curves and the precision-recall (PR) curves of the CV test. We have trained seven CNN models to balance the scale of the positive and negative training samples and the predicted probability of a site is an average predicted probably of these seven models. We can see Deep-m6A achieved a higher area under the ROC curve (AUC = 0.898) and area under the PR curve (PRAUC = 0.554) than Deep-m6A-S model (AUC = 0.884, PRAUC = 0.489). The AUC and PRAUC of Deep-m6A are 1.4% and 5.6% higher than Deep-m6A-S, respectively. There is especially a higher improvement in precision. This suggests that including IP reads help lower the false positive rate, especially in the higher ranked predictions. Having a higher precision is particularly important for subsequent biological validation and functional study because in practice attention is most likely given to a limited top ranked predictions.

Deep-m6A outperforms sequence-based predictions on an independent dataset

To further validate the performance of Deep-m6A, we applied it along with Deep-m6A-S on the independent MOLM13 dataset and compared their performance with SRAMP [29]. As described in the “Dataset” section, this dataset includes 726 positive and 6,577 negative samples. Deep-m6A and Deep-m6A-S were trained on HEK293 dataset as described in the previous section. For SRAMP, we downloaded the SRAMP tool from their webserver (http://www.cuilab.cn/sramp/) and applied it to the 101-nt RNA (without intron) and DNA (with intron) sequences of each sample. The RNA sequences are used by the SRAMP mature RNA model and the DNA sequences are used by SRAMP full DNA model. When feeding these sequences into the SRAMP tools, they output probabilities of all DRACH motifs contained in the sequence. We only picked the results for the centered motif of each sample as the prediction probabilities for SRAMP. Fig 2(D) and 2(E) shows the ROC curve and PR curve results of these models. First, among the three sequence-based models, Deep-m6A-S reports about 1% improvement in AUC and 2% in PRAUC over the two SRAMP models. This suggests that the CNN model can capture additional discriminate sequence features than SRAMP. Second, Deep-m6A outperforms all three sequence-based prediction models in both AUC and PRAUC. The improvement in precision is even more pronounced (~6% over Deep-m6A-S). This once again speaks the benefit of including IP reads in the prediction.

Identification of candidate m6A-regulated genes from all human MeRIP-Seq data

The relatively large number of human MeRIP-Seq data of different cells and tissues under different conditions curated by MeT-DB2 gives us a chance to investigate the relationship of m6A methylation with gene expression and diseases in a global manner. As the first step of this global investigation, we applied Deep-m6A to the 75 human MeRIP-seq samples to identify candidate m6A-regulated genes. A positive site was predicted when the prediction probability calculated by Deep-m6A is greater than 0.907; this threshold is chosen because at this threshold the 10-fold CV test in training can achieve a precision of 0.7. This resulted in 23,456 single-base m6A sites in all 75 samples. Fig 3(A) shows the frequency of a site being predicted in the 75 samples. As shown, ~23% of these sites are sample specific, i.e. they only appear in one sample. In contrast, ~30% sites appear in more than 12 data samples.
Fig 3

Analysis of predicted m6A sites in 75 human samples.

(A) Distribution of the number of occurrence of a predicted single-base m6A site in the 75 samples. (B) Distributions of all predicted m6A sites and sites appeared in more than 12 samples in a meta-mRNA. (C) Distributions of the absolute revised Fisher’s z transformed Pearson correlations of the 49 consensus m6A-regulated genes, 709 m6A-regulated genes identified by HotNet2, and other remaining candidate genes. (D) GO BP and KEGG pathway enrichment result for all 709 m6A-regulated genes. Gene count means the number of genes involved in the corresponding terms and P is the adjusted FDR of the enrichment p-value.

Analysis of predicted m6A sites in 75 human samples.

(A) Distribution of the number of occurrence of a predicted single-base m6A site in the 75 samples. (B) Distributions of all predicted m6A sites and sites appeared in more than 12 samples in a meta-mRNA. (C) Distributions of the absolute revised Fisher’s z transformed Pearson correlations of the 49 consensus m6A-regulated genes, 709 m6A-regulated genes identified by HotNet2, and other remaining candidate genes. (D) GO BP and KEGG pathway enrichment result for all 709 m6A-regulated genes. Gene count means the number of genes involved in the corresponding terms and P is the adjusted FDR of the enrichment p-value. To conduct a global analysis across all the 75 samples, we extracted sites that appear in more than 12 samples. The distributions of these sites on mRNA as well as all predicted m6A sites were tally using Guitar R package [58] and shown in Fig 3(B). As illustrated, for all the predicted sites, the distribution tends to be enriched in 3’ UTR and 5’ UTR, whereas the sites appeared in more than 12 samples are more enriched around stop codon and in 3’ UTR. Because 3’UTR contain binding sites of miRNA and RNA binding proteins such as HuR that are known to post-transcriptionally regulate gene expression, this distribution may indicate that the sites that appeared in > = 12 samples may potentially be involved in regulating gene expression. Next, we selected the genes which harbor m6A sites that appear in > = 12 samples as candidate m6A-regulated genes. In total, 3,670 m6A-regulated genes that contain totally 7,090 m6A sites were extracted. To study the relationship of their expression and m6A methylation, we calculated the Pearson’s correlation of methylation degree and gene expression level for all the candidate m6A-regulated genes and then converted it to a revised Fisher’s z score as described in “Methods and material” section.

Functional significant m6A-regulated genes identified by Hot-m6A are involved in important functional pathways

We then set out to identify functional significant m6A-regulated genes, which are defined as candidate genes whose expression levels are influenced by the m6A methylation more than expected by chance and which are closely interacting with each other in PPI networks. We applied Hot-m6A (See Methods and material for detailed algorithms) to the candidate m6A-regulated genes, taking their absolute revised Fisher’s z score as the heat vector and performing heat diffusion in the PPI network to identify functional interacted genes with relative high correlation. 49 consensus m6A-regulated genes were identified across all four PPI networks using Hot-m6A and 709 m6A-regulated genes were identified in the extended network (S1 Fig; see Methods and material for detail). Most of the 709 m6A-regulated genes (681 or ~96%) are in the biggest subnetwork. We then compared the distributions of the absolute revised correlation z scores of the 49 consensus m6A-regulated genes, 709 m6A-regulated genes, and remaining non-significant candidate genes (Fig 3(C)). We can see that most of the m6A-regulated genes have larger correlation than other nonsignificant candidate genes and the consensus m6A-regulated genes tend to the largest correlations among the three groups. This result speaks for the ability of our pipeline to identify likely m6A regulated genes. Notice there are some m6A-regulated genes with low correlations; they are identified by Hot-m6A because they interacted tightly with “hot” genes with high correlation in the PPI network. To further detect the function module from these 709 m6A-regulated genes, we removed edges that are identified only in one PPI network, and obtained 22 subnetworks including 113 genes in total. We further isolated the 9 largest functional significant subnetworks that have at least 3 genes (S2 Fig). We further examined the functional enrichment for all the 709 significant genes using DAVID [59] (Fig 3(D)). Among the twelve enriched GO BPs (P < 0.05), seven are directly related to transcription including transcription from RNA polymerase III promoter (10 genes, P = 4.78×10−3), positive regulation of DNA-templated transcription (44 genes, P = 5.79×10−3), transcription, DNA-templated (116 genes, P = 1.67×10−2), tRNA transcription from RNA polymerase III promoter (5 genes, P = 1.62×10−2), 5S class rRNA transcription from RNA polymerase III type 1 promoter (5 genes, P = 1.62×10−2), negative regulation of transcription from RNA polymerase II promoter (52 genes, P = 2.92×10−2) and negative regulation of transcription, DNA-templated (39 genes, P = 4.82×10−2). Indeed, the m6A is shown to involve in every stage of RNA metabolism including transcription. It is shown to regulate mRNA stability and speculated to also regulate mRNA splicing [60]. Also, we also see that m6A genes could regulate cell organization and transport as they are enriched in vesicle-mediated transport (25 genes, P = 3.83×10−5), cell−cell adhesion (32 genes, P = 3.61×10−4) and extracellular matrix organization (22 genes, P = 2.04×10−2) are also enriched. We also noticed that that the Wnt signaling pathway is enriched (10 genes, P = 1.26×10−2). Wnt is an important pathway widely involved in cancer and cell development [61-63]. The potential involvement of m6A in cancer is reinforced by the enriched proteoglycans in cancer KEEG pathway (22 genes, P = 2.43×10−2). This is not surprising as there is increasing evidence demonstrating its regulatory roles in different cancer [20, 64–70]. Another enriched KEGG pathways are Protein processing in the endoplasmic reticulum (20 genes, P = 2.37×10−2) and Lysosome pathway (18 genes, P = 6.31×10−3), both of which are protein processing pathways. These predictions are corroborated by our knowledge of the m6A’s role in regulating translational efficiency [71, 72]. We next performed GO and KEGG pathway enrichment analysis to the nine largest significant subnetworks (S3 and S4 Figs). These enriched BP and pathways present reasonably clear and consistent interpretations of these subnetworks. Subnetwork A is mostly involved in protein processing (6 genes enriched in Protein processing in endoplasmic reticulum, 6 genes enriched in ER-associated ubiquitin-dependent protein catabolic process, 3 genes enriched in retrograde protein transport, ER to cytosol and 4 gene enriched in protein stabilization) and potentially regulate mRNA stability via SMG9, UPF1, SMG8 and SMG1 genes enriched in nuclear-transcribed mRNA catabolic process, nonsense-mediated decay BP term. Subnetwork B is predicted to mostly involve in Notch signaling pathway via NOTCH2, MAML1 and MAML3 genes. Subnetwork C is closely related to cell motility, proliferation and survival through enrichment of m6A-regulated genes including VEGFB, PDGFB, VEGFA, COL5A1, NRP1, SLIT2 and SPARC in pathways like Focal adhesion (P = 1.75×10−2), positive regulation of endothelial cell proliferation (P = 1.56×10−3), cell migration involved in sprouting angiogenesis (P = 1.26×10−3) and positive regulation of endothelial cell migration (P = 1.26×10−3). It is also involved in neuronal development related pathways including Axon guidance (NRP1, PLXNA1, SEMA3F and SLIT2 genes with P = 1.30×10−3), branchiomotor neuron axon guidance (NRP1, PLXNA1 and SEMA3F genes with P = 8.54×10−3), semaphorin-plexin signaling pathway involved in axon guidance (NRP1, PLXNA1 and SEMA3F genes with P = 1.19×10−3) and axon extension involved in axon guidance (NRP1, SEMA3F and SLIT2 genes with P = 1.19×10−3). The function of subnetwork D is mostly related to transcription including regulation of transcription from RNA polymerase II promoter (MED26, MED18, MED9, MED11 and MED21 genes with P = 4.16×10−4), transcription, DNA-templated (MED29, MED26, POLR2I, MED9, MED11 and MED21 genes with P = 4.92×10−3) and transcription initiation from RNA polymerase II promoter (MED26, POLR2I and MED13 genes with P = 1.48×10−2). Subnetwork E’s function is defined through mostly the enrichment Wnt signaling pathway (5 genes enriched including FZD8, DKK1, LRP6, FZD5, LRP5 with P = 6.75×10−6). For subnetwork F, 3 m6A-regulated genes are STK11, WDR6 and STRADA and they are involved in cell cycle and cell proliferation related pathways including cell cycle arrest, mTOR signaling pathway and AMPK signaling pathway with P < 0.05. Subnetwork G serves a role in antigen processing (3 genes, TAP2, HLA-E, TAPBP enriched in Antigen processing and presentation, antigen processing and presentation of peptide antigen via MHC class I and antigen processing and presentation of endogenous peptide antigen via MHC class I with P < 0.05). Subnetwork H includes ITGAV, ITGB5 and CYR61 as m6A regulated genes, which are involved in cell adhesion and subnetwork I are involved in MAPK signaling pathway via gene MAP3K4, GADD45B and MAP2K7. Taken together, these results suggest that m6A-regulated genes are enriched in significant biological process and pathways that can influence RNA transcription, cell motility, cell proliferation, cell survival and cell death, and therefore are likely involved in caner and cancer related pathways. Moreover, m6A regulated genes tend to be in the upstream of these enriched pathways (S5 Fig).

Prioritized m6A-associated diseases

We next investigated the association of m6A with diseases. We first searched the OMIM database for diseases related to m6A-regulated genes, where we found 308 phenotypes associated with 177 m6A-regulated genes according to OMIM phenotype-disease relationship record (S1 File). Most of the selected phenotypes are associated with only one m6A-regulated gene. To prioritize these diseases to identify significant m6A-associated diseases, we mapped the 709 m6A-regulated genes to each of the four PPI networks and the 308 phenotypes to the disease network and constructed four gene-disease networks. We then applied RWRH to each of the four networks (See “Methods and material” for the detailed algorithm), taking the regulated genes and their correlated diseases as seed nodes. Then, we selected the top 10 ranked diseases as candidate m6A-associated diseases and calculated an empirical p-value to assess if a disease association is selected by chance. Five significant m6A-associated diseases with p-value < 0.05 and reported by all the four networks were finally identified as the m6A-associated diseases (Table 2).
Table 2

Predicted m6A-associated diseases.

DiseaseOMIM annotated m6A genesExp-m6A corr (# samples)
acute myeloid leukemiaNSD1, PICALM, ABL20.76 (52), 0.77 (28), 0.89 (27)
type 2 diabetes mellitusHNF1B, HNF1A, WFS1, IRS20.73 (14), 0.89 (13), 0.51 (14), 0.60 (23)
microphthalmiaBCOR, SOX20.67 (13), 0.34 (26)
MHC class I deficiencyTAPBP, TAP20.81 (24), 0.46 (36)
renal cell carcinomaHNF1A, OGG1, VHL, HNF1B0.89 (13), 0.41 (14), 0.83 (16), 0.73 (14)
As shown in Table 2, acute myeloid leukemia is prioritized as the most significant m6A associated disease, where m6A-regulated genes NSD1, PICALM, and ABL2 are OMIM-annotated disease genes, suggesting that they may be potential m6A-associated biomarkers in leukemia. Several lines of evidence have shown the direct involvement of m6A in regulating leukemia [68-70]. The second significant disease is type 2 diabetes mellitus. Yang et al. has established this association and reported that glucose is involved in the dynamic regulation of m6A in patients with type 2 diabetes [73]. Our prediction identified the m6A genes HNF1B, HNF1A, WFS1, and IRS2 as the potential disease genes, which could provide a new clue to study the role of m6A in type 2 diabetes. Also, m6A is predicted to have a role in renal cell carcinoma. This is also corroborated by Xiao et al. [74], which reported that the m6A methyltransferase METTL3 acted as a tumor suppressor in renal cell carcinoma and publications in [75, 76], which identified YTHDF2, an m6A binding protein, to be involved in renal cell carcinoma. Taken together, we found existing evidence to support 3 out of 5 predicted association diseases. These results demonstrate the power of the proposed network-based analysis and the RWRH algorithm in identifying m6A associated diseases.

Discussion

The accumulation of a large number of MeRIP-Seq samples from different cells and tissues under different conditions gives us a chance to analyze m6A-regulated genes and m6A-associated functions in a global manner. However, existing informatics tools for predicting m6A sites from MeRIP-seq are hampered by high false positive rates and low resolutions. To address this issue, we developed Deep-m6A, the first CNN model that predicts single-base m6A sites in MeRIP-Seq peak regions by integrating mRNA sequence features with MeRIP-Seq IP reads. Test results from 10-fold CV on training HEK293 data and an independent OMLM13 dataset showed that Deep-m6A outperformed sequence-based algorithms including Deep-m6A-S and SRAMP in both precision and sensitivity. Although the miCLIP technology has been proposed to profile transcriptome-wide m6A at a single-base resolution, its adoption is still very limited because of its more complex protocol. Therefore, MeRIP-seq will continue to serve as the go-to high throughput technology for global m6A profiling in the near future. Give the ability of Deep-m6A to provide MeRIP-seq a single-base detection resolution, we expect Deep-m6A to be an important tool in m6A research. Currently, Deep-m6A is trained to detect sites that reside on DRACH motifs. Even though this requirement of containing motifs helps reduce the false positive predictions, it also sacrifices the prediction sensitivity and will inevitably miss the positive sites that do not have any motifs. Therefore, further improvement of Deep-m6A in the future to be able to detect sites without motifs will provide additional value for Deep-m6A. In our scheme, to further reduce the false positive rate and prioritize functional significant m6A-regulated genes, we examined the correlation between expression and m6A methylation of m6A-regulated genes and assessed their function significance using Hot-m6A implemented on the PPI network. We applied Deep-m6A and Hot-m6A to 75 human MeRIP-seq data, which resulted in a compact collection of 709 m6A-regualted genes and several interacting subnetworks of m6A-regualted genes. Functional analysis revealed that these genes are mainly involved in transcription and Wnt pathway (Fig 2(D)). Wnt pathway is one of the key cascades regulating cell migration and cell development, and is also tightly associated with cancer such as glioblastoma (GBM). Even though direct involvement of m6A in Wnt pathway has not been reported, YTHDF2, an m6A reader, has been shown to suppress cancer cell migration by inhibiting EMT in an m6A dependent manner [77]. Also, ALKBH5, an m6A demethylase, is reported to promote GBM tumorigenesis by stabilizing nascent FOXM1 transcripts through mediating its m6A levels [78] and FOXM1 has also been show to control Wnt target gene expression in GBM. It is highly likely that there are alternative pathways such as Wnt, by which m6A regulates cell migration and tumorigenesis. Functional enrichment of subnetworks also presents a highly consistent interpretations of their functions (S3 and S4 Figs). This suggests that the m6A-regulated genes in the subnetworks are likely to involve in the same processes and pathways and thus share similar functions. In terms of their enriched processes and functions, we observed again important pathways such as focal adhesion, mTOR signaling pathway and AMPK signaling pathway, which are also known to regulate cell cycle, and cell migration, and are critical in cancer. Finally, the network-based disease association analysis on the m6A-regulated genes reported 5 significant associated diseases, where 3 of them have been corroborated by the existing publications. Our predictions could provide clues for the mechanisms, by which m6A regulating these diseases. While there is no published evidence to support the other two associated diseases, our prediction points to potentially new disease associations of m6A. Last but not the least, our proposed methods have several issues that need to be further addressed in the future. First, because of the lack of miCLIP data, the model has only been trained on data from HEK293 cell line; this may not be enough to capture the features of all different kinds of true positive m6A sites. Second, the scale of phenotype-gene relationships in the OMIM database are relative small, which might not be able to capture all potential significant disease-gene correlations. A potential solution is to integrate other disease-gene annotation information like DisGeNET [79] to make the network more complete. Finally, we only analyzed the common m6A sites across many samples and their influence on gene expression. However, some of these context specific m6A sites appear only in a unique sample and they could be important for understanding m6A functions under this specific condition. Developing algorithms to detect these unique genes would be another future work.

Methods and materials

Datasets

The positive single-base m6A sites for our training data for Deep-m6A and Deep-m6A-S were obtained from [45], which developed m6A individual-nucleotide-resolution cross-linking and immunoprecipitation (miCLIP) technology. This paper includes two alternative technologies including cross-linking–induced mutation sites (CIMSs) and cross-linking–induced truncation sites (CITSs) for human embryonic kidney (HEK293) cells using total cellular RNA. The CIMS miCLIP uses Abcam as antibody and C→T transitions as feature, whereas the CITS miCLIP uses SySy as antibody and truncations as feature. We chose CITS miCLIP generated single base m6A sites as true positive m6A sites because the corresponding MeRIP-Seq data of the HEK293 cell line that we used for training were also generated with SySy as antibody [77]. Another reason is that 74% (4847/6543) of CITS sites can be mapped to peaks generated by MeRIP-Seq data, whereas this ratio is only 55% (5202/9536) for CIMS sites. The MeRIP-Seq peaks were detected using the exomePeak R package [41] from 2 replicates of MeRIP-Seq samples [77]. A consistent peak that appears in both replicates and also contains at least one CITS miCLIP site was determined as a single-base m6A-containing MeRIP-Seq peak. The mRNA sequences of these peaks (without introns) were subsequently extracted and any sites in the sequences that contain DRACH motifs were defined as candidate m6A sites. A candidate m6A site was then extended to 101 nt centered at the “A” of the DRACH motif to capture the sequence and reads count features around the motif. The candidate sites that also are CITS miCLIP m6A sites were determined as the positive samples (4,742 in total) and other candidate sites that are at least 50-nt away from a positive “A” and are not any CIMS miCLIP sites or single base m6A sites reported in other experiments [55] were defined as the negative samples (33,718 in total). The independent miCLIP test data were obtained from [69], which has 3 miCLIP replicates for the leukemia cell line MOLM13, each of which contains 8,113, 2,886 and 2,050 miCLIP sites, respectively. There were in total 11,746 miCLIP m6A sites after combining all the three replicates, whereas only 136 sites were common across all the three replicates. Here, we considered miCLIP m6A sites that appeared in at least two replicates as true positive m6A sites (1,147 in total). The corresponding MeRIP-Seq test data were obtained from [80], which included 2 MeRIP-Seq replicates for the MOLM13 cell line. One of the 2 replicates has very low sequencing depth, so we took the combined peaks of these two replicates reported by exomePeak as MeRIP-seq peaks. Similar as the training data, the MeRIP-Seq peaks that contain at least one miCLIP m6A sites were determined as single-base m6A-containing MeRIP-Seq peaks and the sites with DRACH motifs in these regions were defined as candidate single-base m6A sites. The candidate m6A sites that are also miCLIP m6A sites were determined as positive samples (726 in total) and other candidate sites that are >50-nt away from any positive sites and are not miCLIP sites from any of the three miCLIP replicates were determined as negative samples (6,577 in total). The human MeRIP-Seq data were downloaded from MeT-DB V2.0 [26], which include 75 human samples from different human cell lines and tissues under different conditions, including 9 cell lines (A549, Dendritic cells, embryonic stem cell, Endoderm, HEK293T, HeLa, HepG2, neural progenitor cells, OKMS inducible fibroblasts and U2OS) and brain tissue samples under different conditions. The reference PPI networks were built based on BioGRID (release 3.4.128) [50], HINT+HI2012 [51, 52], MultiNet [53], and iRefIndex [54]. After removing the isolated proteins and self-interaction proteins, we established a PPI network with a total of 16,062 proteins and 152,676 interactions. The last three PPI networks were downloaded from http://compbio.cs.brown.edu/pancancer/hotnet2/. The HINT+HI2012 network contains 9,858 genes and 40,704 edges; the iRefIndex network contains 12,128 genes and 91,808 edges; and the Multinet network contains 14,398 genes and 109,569 edges. Diseases ontology terms were collected from the Disease ontology [81]. The “doSim” function of R package “DOSE” [82] was used to calculate the semantic similarity between two DO terms, which are used as the edge weight of the disease network. The disease gene relationship information was extracted from the OMIM database [83]. The OMIM ID was mapped to DO ID so that we can use OMIM gene–phenotype relationship to connect the PPI network and DO disease network to construct a gene-disease network. Four gene-disease heterogeneous networks were built for each of the PPI networks.

Deep-m6A and Deep-m6A-S for single-base m6A prediction

We developed 2 CNN models for single-base m6A prediction, one called Deep-m6A and the other was Deep-m6A-S. These 2 models use the same CNN structure and hyperparameters but with different inputs. For Deep-m6A-S, the input is the OneHot encoded 101nt RNA sequences centered at the “A” of a DRACH motif. OneHot encoding translates the A, U, C, G characters into a binary vector of (1,0,0,0), (0,1,0,0), (0,0,1,0) and (0,0,0,1), respectively. Therefore, the input of Deep-m6A-S becomes a 4 x 101 matrix, M. On the contrary, Deep-m6A takes both RNA sequences and the features of MeRIP-Seq IP reads count at each nucleotide of the RNA sequence. However, the input for Deep-m6A is similar to OneHot encoded M for Deep-m6A-S but with 1s replaced by the IP reads count feature at that nucleotide. The IP reads count features were calculated by the same approach as exomePeak, i.e., where RC is a 101-dimension vector of normalized reads count, RC is another 101-dimension vector of raw reads count, and RC is the total number of reads for the MeRIP-Seq IP sample. Because there are two replicates in each of the HEK293 and MOLM13 datasets, we calculated the average RC of the two replicates as input IP reads count. The Deep-m6A input is then calculated as where is a 101x101 diagonal matrix with diagonal entries being those of RC. Deep-m6A and Deep-m6A-S use CNN [44] to capture the non-linear features of input sequences and IP reads count. The adopted CNN architecture consists of a convolutional, a max-pooling and two fully connected layers (Fig 4). The convolutional layer outputs the pointwise product between the input matrix (M for Deep-m6A and M for Deep-m6A-S) and filters, which is followed by a rectified linear (ReLU) activation. Then, a max-pooling layer, which selects the maximum value over a window, is applied to reduce the dimensionality, which is followed by a dropout operation to reduce the complexity of the model. Finally, a fully connected dense layer is added followed by another ReLU and dropout to combine all the features learned by each filter. The output of the dense layer is passed on to the softmax function to generate the probability of the input sample to be an m6A site.
Fig 4

The architecture of the proposed CNN model for Deep-m6A.

The input matrix is M for Deep-m6A and M for Deep-m6A-S.

The architecture of the proposed CNN model for Deep-m6A.

The input matrix is M for Deep-m6A and M for Deep-m6A-S. Because the positive and negative sample sizes are imbalanced, we split the negative samples into seven subsets of equal size and trained 7 CNN models, each on a set that paired positive samples with a subset of negative samples. Seven models are trained as a result. For any prediction, the averaged prediction probability of the seven models is taken as the final predicted probability for an input sample. We implemented the CNN models using the keras R package. We refer Deep-m6A and Deep-m6A-S as the combination of the seven models. Input data files and Deep-m6A source code and the models are made publicly available on GitHub (https://github.com/NWPU-903PR/Deepm6A.git). The inputs of the Deep-m6A function are a MeRIP-Seq IP sample bam file and a bed file that annotates the peaks identified by exomePeak R package from MeRIP-Seq data. The output is an excel file, where each row contains information about a predicted single-base m6A site extracted from an exomePeak peak region and each column denotes the chromatin, the chromatin start, the chromatin end, Entrez gene ID, the predicted probability, the strand and the motif at this location, respectively.

Prediction of single base m6A peak for all human MeRIP-Seq data

We applied Deep-m6A to predict single-base m6A sites in 75 human MeRIP-Seq samples from MeT-DB2. First, we applied exomePeak to predict m6A peaks in each MeRIP-Seq sample and then searched for DRACH motifs in the peak regions. “A”s in these motifs were treated as candidate single-base m6A sites and the 101nt RNA sequences centered at these “A”s and the corresponding IP reads counts were extracted to construct the input matrix M. For each candidate site, we separately applied Deep-m6A trained on the training data to calculate the probability of it to be a m6A site. We predict a positive site when the output probability is greater than 0.907; this threshold is chosen because at this threshold the 10-fold CV test in training can achieve a precision of 0.7. In this way, we detected in total 23,456 single-base m6A sites.

Hot-m6A for identification of m6A-regulated genes

We define m6A-regulated genes as genes whose expression level is influenced by its m6A methylation level more than expected by chance. We propose a new network based algorithm, Hot-m6A, to identify m6A-regulated genes. Hot-m6A first determines that genes whose expression levels are significantly influenced by m6A by assessing the correlation between the methylation level and the corresponding expression. If the expression levels of a gene change together with its m6A levels across more samples, then there will be a higher chance that the gene expression is influenced by m6A. To obtain a statistically meaningful correlation, we need to have the same m6A sites appearing in many samples. In this study, we estimated the sample size needed to detect a correlation of 0.8 with a significance level of α = 0.05. Using a two-sided test based on the Fisher’s z transformation, at α = 0.05 with power 90%, the required sample size is approximately 12 (i.e., n = 12) [48]. Thus, we select single-base m6A sites that appear in at least 12 samples and calculated the Pearson’s correlations of the methylation degree and gene expression level across all occurred samples. The methylation degree was calculated as: where RC is defined in (1), G is the FPKM of gene harbored the corresponding m6A site, calculated from the MeRIP-seq input sample by Cufflinks [84]. The expression level is denoted by the ln scale gene FPKM. The correlation coefficient is then transformed into a z score using a revised Fisher’s z-transformation (which is known as Hotelling's (1953) second-order transformations) [49] that considers the influence of sample size n, which is expressed as where r is the correlation, n is the sample size, and z is Fisher’s transformation. We used z* of each gene as its correlation for the subsequent analysis in the pipeline. Genes containing this kind of m6A sites are treated as candidate m6A-regulated genes. For a candidate gene that has multiple m6A sites, the z* score with the largest absolute value among all its harbored m6A sites is selected as the correlation degree for that gene and the absolute z* score are used to denote the degree of regulation. To further select m6A regulated genes more than expected by chance, Hot-m6A adopts the HotNet2 algorithm [85], which was originally developed to identify significant mutation genes for cancer. HotNet2 takes the gene mutation frequency or mutation score as input heat vector and applies a heat diffusion method to identify subnetworks of a genome-scale interaction network that is mutated more than expected by chance. The advantage of HotNet2 is that it not only detects significant mutated genes with high mutation frequency but also can identify significant mutation genes with relatively low mutation frequency but interact closely with other significant genes. In our case, we also want to identify significant m6A-regulated genes that not only have high m6A-expression correlations but also cooperate with each other functionally in a functional network. To this end, Hot-m6A takes the absolute m6A-expression correlation coefficients z* scores of candidate m6A regulated genes as the input heat vector and uses the four PPI networks as the reference functional network, which includes BioGRID, which has been shown to contain the significant interactions between m6A methylated genes [43], and the other three PPI networks that were used in the HotNet2 paper. Hot-m6A has two parameters β and δ; β is the fraction of the heat that a node in the network retains for itself at each time step and δ is the threshold, which determines whether there is an edge between 2 nodes in the final subnetwork. β can influence the amount of heat that a gene shares with its neighbors and is determined by the topology of the PPI network. For the three networks used in the HotNet2 paper, we selected β as 0.4 for HINT+HI2012, 0.45 for iRefIndex, and 0.5 for Multinet, which are reported values in [46]. For the BioGRID network, we performed the same analysis as the HotNet2 paper did to determine the value of β. For each different β from {0.05, 0.1, 0.15, …, 0.95}, we analyzed the inflection point at which the heat kept in the direct interacting neighbors of a gene drops and selected β as the one with the largest inflection point. As shown in S6 Fig, β is selected as 0.5 for the BioGRID PPI network. On the other hand, δ influences the scale of the subnetwork that we generated. To automatically determine it, for each PPI network, we firstly generated 100 random PPI networks [86]. We used the same “heat” vector at each run and select δ as the minimum one where all strongly connected subnetwork components identified by Hot-m6A have a size less than and equal to a threshold L. For each L = 5, 10, 15, 20, we reported the median of the 100 δs for the 100 random networks. For each run, we selected the smallest δ with the most significant (P < 0.05) subnetwork sizes k as described in HotNet2 [46]. The P-value, which denotes the significance of subnetwork size k is computed for the statistic X, the number of subnetworks of size ≥ k reported by Hot-m6A. To compute an empirical distribution of X for computing the P-value, we permuted the heat scores, i.e., the z transformation of correlation, among the genes in the original PPI network for 1000 times and then applied HotNet2 to the network using the permutated heat scores. In the end, δ was selected as 0.00769 for BioGRID, 0.0148 for HINT+HI2012, 0.00998 for iRefIndex, and 0.00777 for Multinet. We applied Hot-m6A to each of the four PPI networks and pooled all the genes and edges reported in at least one network to form a candidate m6A regulated gene network, G. We then assigned the number of the PPI networks in which an edge exists as the weight of this edge. Next, we initialized the consensus subnetworks C as the connected components of G with edges of the weight = 4, i.e., the edges that are reported in all 4 PPI networks. Then we extended each consensus subnetwork S ∈ C by adding edges of weight = 3, and afterward further extended them by adding edges of weight = 2 and then 1. Finally, we defined the genes in the final extended networks as the m6A regulated genes and the genes in the initial consensus subnetworks as the consensus m6A regulated genes. To further detect the functional modules of m6A regulated genes, we removed edges with weight = 1 in the m6A regulated subnetworks to generate functional significant subnetworks.

Prioritize m6A-associated diseases

To further reveal the association between m6A and disease, we adopted the RWRH (random walk with restart on heterogeneous network) [47] algorithm to prioritize the candidate m6A regulated diseases. RWRH is a well-known heterogeneous network-based algorithm to infer the gene-phenotype relationship. The heterogeneous network contains three parts: gene-gene interaction network, phenotype (disease) network, and gene-phenotype relationship. Similarly, we also built four heterogeneous networks based on each of the four PPI networks. The disease network was generated from the semantic similarities between two DO terms of Disease Ontology database and gene-phenotype relationships were extracted from OMIM. Specifically, the OMIM phenotype ID was mapped to DO disease ID to match the disease network with the gene-phenotype relationship. RWRH performs a random walk with restart from certain gene and disease seed nodes in the heterogeneous network. After several steps of random walk, the probability of each node, that the seed genes and seed diseases will walk to, become steady and is returned as vectors and . denotes the probability that the seed genes and seed diseases will walk to gene i and is the probability that the seed genes and seed diseases will walk to disease j. In another world, RWRH can prioritize how closely a gene or a disease is correlated with the seed genes and seed diseases. To study the correlation between m6A and diseases, we first mapped all the 709 significant m6A-regulated genes obtained from HotNet2 to OMIM gene-phenotype relationship, which resulted in 308 phenotypes regulated by 177 m6A genes. Most of these phenotypes contain only one disease gene (The details of the m6A gene-disease relationships are included in S1 File). Next, to obtain candidate m6A-associated diseases, we mapped these 308 m6A gene-correlated OMIM phenotypes to DO ID, which resulted in 90 m6A gene-associated DO diseases. For each of the four heterogeneous networks, these 90 diseases were taken as disease seeds for RWRH, whereas all the 709 significant m6A-regulated genes were mapped to the PPI network to serve as gene seeds for RWRH. Finally, we applied RWRH using these genes and disease as seed nodes to prioritize all the DO diseases in the heterogeneous networks. Fig 5 illustrates the heterogeneous network in this study. As shown, there is a chance that d3 may be ranked as significant m6A-associated diseases even though there is no m6A-regulated gene directly connected with it. The reason is that the genes (i.e., g3, g7) that closely interacted with m6A regulated genes (i.e., g5) are connected to d3 and m6A regulated genes (i.e., g5) connected with disease (i.e., d2, d4) also closely interacted with d3. RWRH can efficiently capture this significant network topology and prioritized d3 as significant, which showed the power of this study to prioritize the potential m6A-associated disease with no prior knowledge that correlated with m6A-regulated genes.
Fig 5

Gene-disease heterogeneous network.

The top network is gene-gene interaction network, the bottom network is disease-disease similarity network and they are connected by gene-disease relationship (dashed grey lines). The orange gene nodes denoted the m6A regulated genes and the green disease nodes are m6A regulated genes correlated diseases.

Gene-disease heterogeneous network.

The top network is gene-gene interaction network, the bottom network is disease-disease similarity network and they are connected by gene-disease relationship (dashed grey lines). The orange gene nodes denoted the m6A regulated genes and the green disease nodes are m6A regulated genes correlated diseases. The top 10 ranked diseases based on the RWRH output probability were selected as candidate m6A-associated diseases. To ensure these top diseases were really influenced by m6A-regulated genes rather than by chance, we implemented a random test to calculate an empirical p-value to assess if the disease is randomly selected. For this test, 100 random PPI networks that only keep the degree distribution of original network were generated using the method in [86]. Then, for each random PPI network, we connected it with the disease network using the disease-gene relationship and deleted the relationship between m6A-regulated genes and their corresponding diseases to generate a corresponding random heterogeneous network whose gene interaction relationship is random and contains no prior knowledge of the relationship between m6A regulated genes and any disease. After that, we applied RWRH in each of the 100 random heterogeneous networks. Finally, for each candidate m6A-associated disease d (j = 1, 2, ⋯, 10), the empirical p-value was calculated as: , where π(d) is a random heterogeneous network in which d is found as the top 10 candidate disease after RWRH. The empirical p-value is calculated as the probability that a candidate m6A-associated disease is selected by random. The candidate m6A-associated diseases that with p < 0.05 were selected as significant candidate m6A-associated diseases. The significant candidate m6A-associated diseases that reported by all the four heterogeneous networks are determined as significant m6A-associated diseases.

Consensus m6A-regulated subnetworks and m6A-regulated subnetworks.

The color of the node denotes the heat i.e. expression-methylation correlation degree it has, red denotes higher and blue denotes lower. The solid orange edge is edge that identified in all 4 PPI networks, the dashed edge is edge that identified in at least 3 networks, the dash dot edge is in 2 networks and the dotted is 1. (TIF) Click here for additional data file.

Functional significant m6A-regulated subnetworks.

(A)-(I) denotes the 9 functional significant subnetworks of m6A-regulated genes. The color of the node denotes the heat i.e. expression-methylation correlation degree it has, red denotes higher and blue denotes lower. The solid edge is edge that identified in all 4 PPI networks, the dashed edge is edge that identified in at least 3 networks and the dotted edge is in 2 networks. (TIF) Click here for additional data file.

Significantly enriched GO BP terms for each of the 9 largest significant subnetworks.

There is no significant BP for subnetwork (B). Gene count means number of genes involved in the corresponding BP terms and P is the adjusted FDR of enriched p-value. All the involved BP terms have a P < 0.05 and we only list the top 5 if the enriched terms are more than 5. (TIF) Click here for additional data file.

Significantly enriched KEGG pathways for the 9 largest significant subnetworks.

There is no significant KEGG pathways for network (D). Gene count means number of genes involved in the corresponding pathways and P is the adjusted FDR of enriched p-value. All the involved pathways have a P < 0.05. (TIF) Click here for additional data file.

Parts of the enriched KEGG pathways of the 9 significant subnetworks.

(A) is WNT signaling pathway. (B) is mTOR signaling pathway. (C) is Focal adhesion. (D) is Notch signaling pathway. The genes marked with red star are m6A regulated genes in the subnetworks. As is shown, m6A regulated genes tend to be in the upstream of these enriched pathways. (TIF) Click here for additional data file.

Selection of β of the BioGRID network.

Figures (A)-(G) represent the distributions of number of nodes with influence larger than a cutoff for β from 0.6 to 0.3 for gene TP53, which has a high betweenness centrality. The x-axis of each distribution represents θ, a cutoff of influence. The y-axis represents the number of nodes in the interaction network with influence larger than θ. Red dotted circles denote the number of all nodes with an influence larger than different θs, green dotted circles denote that of the level-one nodes, and blue dotted circles denote that of the level-two nodes. The red vertical lines in all the distributions represent the location of the inflection point in level one for the β we chose for different interaction networks, β = 0.5 for BioGRID. (H) depicts the inflection point θ as a function of β ranging from 0.05 to 0.95. (TIF) Click here for additional data file.

OMIM diseases related to m6A-regulated genes.

(XLSX) Click here for additional data file.
  72 in total

1.  pRNAm-PC: Predicting N(6)-methyladenosine sites in RNA sequences via physical-chemical properties.

Authors:  Zi Liu; Xuan Xiao; Dong-Jun Yu; Jianhua Jia; Wang-Ren Qiu; Kuo-Chen Chou
Journal:  Anal Biochem       Date:  2015-12-31       Impact factor: 3.365

2.  N(6)-methyladenosine Modulates Messenger RNA Translation Efficiency.

Authors:  Xiao Wang; Boxuan Simen Zhao; Ian A Roundtree; Zhike Lu; Dali Han; Honghui Ma; Xiaocheng Weng; Kai Chen; Hailing Shi; Chuan He
Journal:  Cell       Date:  2015-06-04       Impact factor: 41.582

Review 3.  RNA epigenetics.

Authors:  Nian Liu; Tao Pan
Journal:  Transl Res       Date:  2014-04-08       Impact factor: 7.012

4.  FTO Plays an Oncogenic Role in Acute Myeloid Leukemia as a N6-Methyladenosine RNA Demethylase.

Authors:  Zejuan Li; Hengyou Weng; Rui Su; Xiaocheng Weng; Zhixiang Zuo; Chenying Li; Huilin Huang; Sigrid Nachtergaele; Lei Dong; Chao Hu; Xi Qin; Lichun Tang; Yungui Wang; Gia-Ming Hong; Hao Huang; Xiao Wang; Ping Chen; Sandeep Gurbuxani; Stephen Arnovitz; Yuanyuan Li; Shenglai Li; Jennifer Strong; Mary Beth Neilly; Richard A Larson; Xi Jiang; Pumin Zhang; Jie Jin; Chuan He; Jianjun Chen
Journal:  Cancer Cell       Date:  2016-12-22       Impact factor: 31.743

5.  Predicting RNA-protein binding sites and motifs through combining local and global deep convolutional neural networks.

Authors:  Xiaoyong Pan; Hong-Bin Shen
Journal:  Bioinformatics       Date:  2018-10-15       Impact factor: 6.937

6.  Comprehensive analysis of mRNA methylation reveals enrichment in 3' UTRs and near stop codons.

Authors:  Kate D Meyer; Yogesh Saletore; Paul Zumbo; Olivier Elemento; Christopher E Mason; Samie R Jaffrey
Journal:  Cell       Date:  2012-05-17       Impact factor: 41.582

7.  Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes.

Authors:  Mark D M Leiserson; Fabio Vandin; Hsin-Ta Wu; Jason R Dobson; Jonathan V Eldridge; Jacob L Thomas; Alexandra Papoutsaki; Younhun Kim; Beifang Niu; Michael McLellan; Michael S Lawrence; Abel Gonzalez-Perez; David Tamborero; Yuwei Cheng; Gregory A Ryslik; Nuria Lopez-Bigas; Gad Getz; Li Ding; Benjamin J Raphael
Journal:  Nat Genet       Date:  2014-12-15       Impact factor: 38.330

8.  Single-nucleotide-resolution mapping of m6A and m6Am throughout the transcriptome.

Authors:  Bastian Linder; Anya V Grozhik; Anthony O Olarerin-George; Cem Meydan; Christopher E Mason; Samie R Jaffrey
Journal:  Nat Methods       Date:  2015-06-29       Impact factor: 28.547

9.  RNAMethPre: A Web Server for the Prediction and Query of mRNA m6A Sites.

Authors:  Shunian Xiang; Ke Liu; Zhangming Yan; Yaou Zhang; Zhirong Sun
Journal:  PLoS One       Date:  2016-10-10       Impact factor: 3.240

Review 10.  Hypoxia-inducible factors, stem cells, and cancer.

Authors:  Brian Keith; M Celeste Simon
Journal:  Cell       Date:  2007-05-04       Impact factor: 41.582

View more
  9 in total

1.  m6A-express: uncovering complex and condition-specific m6A regulation of gene expression.

Authors:  Teng Zhang; Shao-Wu Zhang; Song-Yao Zhang; Shou-Jiang Gao; Yidong Chen; Yufei Huang
Journal:  Nucleic Acids Res       Date:  2021-11-18       Impact factor: 16.971

2.  WITMSG: Large-scale Prediction of Human Intronic m6A RNA Methylation Sites from Sequence and Genomic Features.

Authors:  Lian Liu; Xiujuan Lei; Jia Meng; Zhen Wei
Journal:  Curr Genomics       Date:  2020-01       Impact factor: 2.236

3.  HLMethy: a machine learning-based model to identify the hidden labels of m6A candidates.

Authors:  Ze Liu; Wei Dong; WenJie Luo; Wei Jiang; QuanWu Li; ZiLi He
Journal:  Plant Mol Biol       Date:  2019-11-13       Impact factor: 4.076

4.  FunDMDeep-m6A: identification and prioritization of functional differential m6A methylation genes.

Authors:  Song-Yao Zhang; Shao-Wu Zhang; Xiao-Nan Fan; Teng Zhang; Jia Meng; Yufei Huang
Journal:  Bioinformatics       Date:  2019-07-15       Impact factor: 6.937

5.  The m6A-Related mRNA Signature Predicts the Prognosis of Pancreatic Cancer Patients.

Authors:  Zibo Meng; Qingchen Yuan; Jingyuan Zhao; Bo Wang; Shoukang Li; Rienk Offringa; Xin Jin; Heshui Wu
Journal:  Mol Ther Oncolytics       Date:  2020-04-29       Impact factor: 7.200

6.  Screening and Identifying m6A Regulators as an Independent Prognostic Biomarker in Pancreatic Cancer Based on The Cancer Genome Atlas Database.

Authors:  Bi Lin; Yangyang Pan; Dinglai Yu; Shengjie Dai; Hongwei Sun; Shengchuan Chen; Jie Zhang; Yukai Xiang; Chaohao Huang
Journal:  Biomed Res Int       Date:  2021-05-15       Impact factor: 3.411

7.  Transcriptome Profiling of m6A mRNA Modification in Bovine Mammary Epithelial Cells Treated with Escherichia coli.

Authors:  Ting Li; Changjie Lin; Yifan Zhu; Haojun Xu; Yiya Yin; Chaohao Wang; Xin Tang; Tongxing Song; Aizhen Guo; Yingyu Chen; Changmin Hu
Journal:  Int J Mol Sci       Date:  2021-06-10       Impact factor: 5.923

8.  Prediction of RNA Methylation Status From Gene Expression Data Using Classification and Regression Methods.

Authors:  Hao Xue; Zhen Wei; Kunqi Chen; Yujiao Tang; Xiangyu Wu; Jionglong Su; Jia Meng
Journal:  Evol Bioinform Online       Date:  2020-07-20       Impact factor: 1.625

9.  m6A demethylase ALKBH5 inhibits pancreatic cancer tumorigenesis by decreasing WIF-1 RNA methylation and mediating Wnt signaling.

Authors:  Bo Tang; Yihua Yang; Min Kang; Yunshan Wang; Yan Wang; Yin Bi; Songqing He; Fumio Shimamoto
Journal:  Mol Cancer       Date:  2020-01-06       Impact factor: 27.401

  9 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.