CVm6A: A Visualization and Exploration Database for m⁶As in Cell Lines.

Yujing Han, Jing Feng, Linjian Xia, Xin Dong, Xinyang Zhang, Shihan Zhang, Yuqi Miao, Qidi Xu, Shan Xiao, Zhixiang Zuo, Laixin Xia, Chunjiang He.

Abstract

N6-methyladenosine (m⁶A) has been identified in various biological processes and plays important regulatory roles in diverse cells. However, there is still no visualization database for exploring global m⁶A patterns across cell lines. Here we collected all available MeRIP-Seq and m⁶A-CLIP-Seq datasets from public databases and identified 340,950 and 179,201 m⁶A peaks in 23 human and eight mouse cell lines, respectively. These m⁶A peaks were further classified into mRNA and lncRNA groups. To better understand the potential function of m⁶A, we then mapped m⁶A peaks to different subcellular components and gene regions. Among the human m⁶A modifications, 190,050 and 150,900 peaks were identified in cancer and non-cancer cells, respectively. Finally, all results were integrated and imported into a visualized cell-dependent m⁶A database, CVm6A. We believe the specificity of CVm6A could contribute significantly to research on the function and regulation of cell-dependent m⁶A modification in disease and development.

Keywords:  N6-methyladenosine; cell line; m6A; visualization

Year:  2019        PMID: 30781586      PMCID: PMC6406471          DOI: 10.3390/cells8020168

Source DB:  PubMed          Journal:  Cells        ISSN: 2073-4409            Impact factor:   6.600


1. Introduction

As an important post-transcriptional modification, N6-methyladenosine (m6A) has been mapped extensively by high-throughput sequencing in recent years [1,2,3]. m6A sites are surrounded by the consensus sequence RRACH (R = G or A; H = A, C or U), and the modification is conserved in human, mouse, chimpanzee and even plants [1,4,5]. m6A has also been found in bacterial and archaeal species [6]. The abundance of m6A is reported to correlate with evolutionarily conserved regions of the genome [2]. m6A is a reversible modification that is installed by the methyltransferase complex METTL3/METTL14/WTAP [7], removed by the demethylases FTO and ALKBH5 [2,8], and recognized by m6A-binding proteins of the YTH (YT521-B homology) domain family and HNRNPA2B1 [9,10]; these are called writers, erasers and readers, respectively. m6A can regulate multiple biological functions in a spatially and temporally specific manner [11]. The m6A methyltransferase complex controls neuronal functions and fine-tunes sex determination in Drosophila [12]. m6A also acts as a regulator at molecular switches in murine naive pluripotency for proper lineage priming and differentiation [13]. m6A in the lncRNA XIST mediates gene silencing on the X chromosome, and knockdown of the m6A methyltransferase METTL3 can impair XIST-mediated gene silencing [14]. m6A RNA can recruit DNA polymerase κ (Pol κ) to facilitate repair of ultraviolet-induced DNA damage [15]. Furthermore, m6A can alter RNA structure to affect RNA-protein interactions in cells [16]. An m6A-driven gene network has been constructed, and the dynamic interactions between m6A-related methyltransferases and demethylases have been established [17]. Deficiency of m6A modification can lead to various diseases, such as obesity, cancer, type 2 diabetes mellitus, infertility and developmental arrest [18].
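The RRACH consensus described above can be written as a simple pattern. The following is a minimal illustrative sketch, not part of the CVm6A pipeline; the function name `find_rrach_sites` is hypothetical, and the choice to report the 0-based position of the central adenosine is an assumption made for the example.

```python
import re

# RRACH: R = G or A, H = A, C or U; the candidate methylated A is the third base.
# A lookahead is used so that overlapping occurrences of the motif are all reported.
RRACH = re.compile(r"(?=([GA][GA]AC[ACU]))")


def find_rrach_sites(rna: str) -> list[int]:
    """Return 0-based positions of the central A in every RRACH match."""
    seq = rna.upper().replace("T", "U")  # accept DNA-style input as well
    return [m.start() + 2 for m in RRACH.finditer(seq)]
```

A motif match only marks a candidate position; whether a given site is actually methylated has to come from experimental data such as the peaks collected in CVm6A.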
In previous studies, m6A was found mainly near stop codons, in large internal exons and in the 3’UTR (3’-untranslated region), as well as in the CDS (coding sequence), at transcription start sites and in introns [1,2,19]. Dynamic m6A modification can affect the translation status and lifetime of mRNA in HeLa cells [20]. Several lncRNAs also carry m6A modification [1,2], and long intergenic noncoding RNAs (lincRNAs) showed significantly higher m6A levels than mRNAs in the B-cell lymphoblastoid cell line GM12878 [21]. In CD4 T cells, m6A modification of HIV-1 RNA can regulate viral infection [22]. Although m6A patterns have been analyzed in different cells independently, the global patterns across those cells have not been well summarized. Several databases, such as RMBase [23] and MeT-DB [24], collect and detect m6A from public datasets. However, RMBase and MeT-DB do not focus on cell-dependent m6A. For example, MeT-DB includes m6A datasets from only a portion of wild-type cell lines, and RMBase includes m6A sites from various samples without indicating the cell sources. To better understand the function of m6A in cellular biological processes, a more specific database is needed for exploring and comparing the distribution and patterns of m6A in different cell lines. Here, using the latest public datasets, we collected MeRIP-Seq and m6A-CLIP-Seq datasets from 23 human cell lines and eight mouse cell lines, and inspected the global patterns of m6A across those cell lines, including the distribution and abundance of m6A modification in lncRNA and mRNA, in different subcellular locations and in different gene regions. The m6A patterns were also classified by cancer or non-cancer cell line. Moreover, validated m6A sites from previous experiments were collected and summarized.
All results were imported into a cell-dependent m6A database, CVm6A (http://gb.whu.edu.cn:8080/CVm6A), which provides a visualization interface for searching and comparing m6A patterns in different cell lines and could contribute to research on the function and regulation of m6A in disease and development.

2. Data Collection and Database Content

2.1. Cell Line Samples in CVm6A

Previous studies showed that MeRIP-Seq (Methylated RNA Immunoprecipitation sequencing) [20], miCLIP-Seq (m6A individual-nucleotide-resolution cross-linking and immunoprecipitation sequencing) [25] and PA-m6A-Seq (Photo-crosslinking-assisted m6A-seq) [26] can be used to detect m6A modification at the transcriptome level. Therefore, we collected all available MeRIP-Seq, miCLIP-Seq and PA-m6A-Seq datasets with total RNA or polyA-enriched library construction from the NCBI GEO database (http://www.ncbi.nlm.nih.gov/GEO). In total, 47 samples from 23 human cell lines and 22 samples from eight mouse cell lines were collected (Table S1).

2.2. Identification of Cell m6A Peaks

For MeRIP-Seq datasets, reads from both IP (immunoprecipitation) and Input samples were mapped to the human (hg38) and mouse (mm10) genomes separately with Hisat2 [27]. Mapped reads with MAPQ < 30 were filtered out with samtools [28], and PCR duplicates were removed using Picard (http://broadinstitute.github.io/picard, v2.16.0). m6A peaks were then called, and the enrichment score of each peak was calculated, with MeTPeak [29]. m6A sites from miCLIP-Seq and PA-m6A-Seq were collected from previous works [19,25,26]. GENCODE gene annotation (GRCh38 release 28 and GRCm38 release M20), comprising 35,048 human genes and 31,237 mouse genes, was used to annotate m6A sites and peaks. The detailed pipeline is described in the Supplemental Method. Across all cell lines, a total of 340,950 m6A peaks from 16,950 human genes and 179,201 m6A peaks from 14,360 mouse genes were identified. In human cell lines, we retrieved 6345 (H1299) ~ 23,052 (A549) m6A peaks and 2562 (H1299) ~ 6838 (GSC-11) genes with m6A modification (Figure 1A). In mouse cell lines, 6833 (3T3-L1) ~ 20,892 (iPSC) m6A peaks and 2882 (SC) ~ 7125 (NSC) genes with m6A modification were identified (Figure 1B).
Figure 1

Statistics of m6A patterns in CVm6A. (A) Number of m6A peaks and genes in human cell lines. (B) Number of m6A peaks and genes in mouse cell lines. (C) Number of m6A peaks distributed in lncRNA or mRNA in human cell lines. (D) Number of m6A peaks distributed in lncRNA or mRNA in mouse cell lines. (E) Number of m6A peaks distributed in 12 subcellular components in human cell lines. (F) Number of m6A peaks distributed in 12 subcellular components in mouse cell lines. (G) Number of m6A peaks distributed in 6 gene regions: TSS (Transcription start site), 5’UTR, CDS, Stop codon, 3’UTR and intron in human cell lines. (H) Number of m6A peaks distributed in 6 gene regions: TSS, 5’UTR, CDS, Stop codon, 3’UTR and intron in mouse cell lines.
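The mapping-quality filter in Section 2.2 can be illustrated on plain SAM text. The sketch below is not the authors' command (they used samtools; the exact invocation is in their Supplemental Method, which is not reproduced here). It only shows the rule itself: header lines are kept, and an alignment is kept when its MAPQ (the fifth column of a SAM record) is at least 30, which is what a minimum-quality threshold such as `samtools view -q 30` enforces. The function name `filter_by_mapq` is hypothetical.

```python
def filter_by_mapq(sam_lines, min_mapq=30):
    """Yield SAM header lines and alignments whose MAPQ is at least min_mapq."""
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines carry no alignment and pass through unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            # A SAM alignment has 11 mandatory columns; skip malformed records.
            continue
        if int(fields[4]) >= min_mapq:  # column 5 of a SAM record is MAPQ
            yield line
```

Duplicate removal and peak calling are separate downstream steps and are not modeled here.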

2.3. Prediction of m6A lncRNA and mRNA

All m6A peaks were mapped to mRNA and lncRNA using GENCODE gene annotation (GRCh38 release 28 and GRCm38 release M20) [30] via Bedtools [31]. To view the similarities and differences of m6A modification between lncRNA and mRNA, all m6A genes were separated into lncRNA and mRNA groups. In human cell lines, there were 225 (HEK293) ~ 2627 (LCL) peaks from lncRNA, while 6044 (H1299) ~ 22,630 (A549) peaks were from mRNA (Figure 1C). We also checked the enrichment scores of the m6A peaks from lncRNA and mRNA. The average enrichment scores for lncRNA ranged from 2.37 (U2OS) to 9.55 (ESC), while for mRNA they ranged from 2.36 (U2OS) to 11.15 (iPSC) (Figure S1A). In mouse cell lines, there were 101 (3T3-L1) ~ 1243 (iPSC) peaks from lncRNA, while 6732 (3T3-L1) ~ 19,649 (iPSC) peaks were from mRNA (Figure 1D). The average enrichment scores of lncRNA and mRNA in mouse cell lines were also calculated (Figure S1B).
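Assigning peaks to mRNA or lncRNA genes is an interval-overlap problem. The sketch below shows the idea in plain Python rather than with Bedtools, which the authors used. The names `Interval`, `Gene`, `overlaps` and `classify_peaks` are hypothetical; the half-open BED-style coordinates and the requirement that peak and gene lie on the same strand are assumptions made for this example, not details confirmed by the source.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive (BED convention, assumed here)
    end: int     # 0-based, exclusive
    strand: str


class Gene(NamedTuple):
    region: Interval
    symbol: str
    gene_type: str  # e.g. "mRNA" or "lncRNA"


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two intervals share at least one base on the same strand."""
    return (a.chrom == b.chrom and a.strand == b.strand
            and a.start < b.end and b.start < a.end)


def classify_peaks(peaks, genes):
    """Map each peak to the (symbol, gene_type) pairs of the genes it overlaps."""
    return {
        peak: [(g.symbol, g.gene_type) for g in genes if overlaps(peak, g.region)]
        for peak in peaks
    }
```

This pairwise scan is quadratic and is only meant for reading; for genome-scale annotation a sorted sweep or a dedicated tool is the practical choice.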

2.4. Prediction of Subcellular Location

To view the subcellular location of m6A, we acquired subcellular location information for genes from the public databases Hum-mPLoc 3.0 [32] and Euk-mPLoc [33] and classified m6A peaks into subcellular components according to their annotated genes. In total, 309,137 peaks were assigned to subcellular components across all cell lines. Across human cell lines, 2439 (HEK293) ~ 8866 (A549) peaks were located in Cytoplasm, 2772 (H1299) ~ 9195 (A549) peaks in Nucleus and 447 (H1299) ~ 2878 (A549) peaks in Plasma membrane. The remaining peaks were distributed among the other subcellular locations: Centrosome, Cytoskeleton, Endoplasmic reticulum, Endosome, Golgi apparatus, Lysosome, Mitochondrion, Peroxisome and Extracellular (Figure 1E). The average enrichment scores of peaks from different components were also calculated (Figure S1C). Similar distributions were observed in mouse cell lines (Figure 1F and Figure S1D).
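Because peaks inherit their subcellular component from the annotated gene, the per-component counts reduce to a lookup and a tally. The following is a minimal sketch under stated assumptions: `count_peaks_by_component` is a hypothetical name, the gene identifiers are placeholders, and counting a peak once for every component its gene is annotated with is an assumption about how multi-location genes are handled.

```python
from collections import Counter


def count_peaks_by_component(peak_genes, gene_locations):
    """Tally peaks per subcellular component via each peak's annotated gene.

    peak_genes: iterable with one gene symbol per peak.
    gene_locations: dict mapping gene symbol -> list of component names.
    Peaks whose gene has no location annotation are left out of the tally.
    """
    counts = Counter()
    for gene in peak_genes:
        for component in gene_locations.get(gene, ()):
            counts[component] += 1
    return counts
```

With this rule the component totals can exceed the number of peaks whenever genes carry more than one location label.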

2.5. Prediction of Gene Regions

Previous works revealed that m6A modification is not uniformly distributed across gene regions [1,2,20]. In CVm6A, we divided all annotated genes into six regions (TSS, 5′UTR, CDS, Stop codon, 3′UTR and Intron) and located m6A peaks in these regions. The midpoint of each MeRIP-Seq peak or the precise m6A site from m6A-CLIP-Seq was used for localization. For TSS and stop codon, a 200 bp window (±100 bp around the coordinate) was used to locate the m6A site, following previous work [1,2]. Across human cell lines, 2668 (H1299) ~ 11,209 (A549) m6A peaks were distributed in CDS, while 1517 (H1299) ~ 10,706 (A549) were located in 3′UTR and 710 (CD8T) ~ 4608 (A549) in Stop codon (Figure 1G). We also checked the average enrichment scores in those regions. Across all cell lines, the average scores ranged from 2.34 (U2OS) to 12.5 (iPSC) in stop codon, from 2.41 (U2OS) to 11.7 (iPSC) in CDS and from 2.32 (U2OS) to 11.6 (iPSC) in 3′UTR (Figure S1E). The m6A distribution across gene regions in mouse cell lines was also calculated (Figure 1H and Figure S1F).
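The localization rule in Section 2.5 (peak midpoint, with a ±100 bp window around the TSS and the stop codon) can be sketched as follows. This is an illustration, not the authors' code: the function names are hypothetical, coordinates are assumed to run along the plus strand, and checking the TSS and stop-codon windows before the interval-based regions is an assumed priority order that the source does not specify.

```python
def peak_midpoint(start, end):
    """Midpoint of a half-open peak interval [start, end)."""
    return (start + end) // 2


def assign_region(pos, tss, stop_codon, intervals, window=100):
    """Assign a genomic position to one gene region.

    intervals: list of (label, start, end) half-open spans, e.g. 5'UTR,
    CDS, 3'UTR and Intron. Returns None when no region contains pos.
    """
    # Point-like features get a symmetric window (default: +/-100 bp).
    if abs(pos - tss) <= window:
        return "TSS"
    if abs(pos - stop_codon) <= window:
        return "Stop codon"
    # Remaining regions are matched by simple containment.
    for label, start, end in intervals:
        if start <= pos < end:
            return label
    return None
```

For a MeRIP-Seq peak, `assign_region(peak_midpoint(start, end), ...)` would be used; for a single-nucleotide site, the site coordinate is passed directly.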

2.6. Classification of Cancer and Non-Cancer m6A

To view the association of m6A and diseases, all human m6A peaks were classified into cancer and non-cancer groups. Overall, 190,050 m6A peaks from 14,628 genes were identified in 12 cancer cell lines and 150,900 peaks from 14,346 genes were identified in 11 non-cancer cell lines (Figure S1G).

2.7. Validated m6A Sites

Previous research has validated several m6A modifications in cell lines. To enhance the usability of CVm6A, we collected m6A sites validated by m6A-RIP or m6A-CLIP experiments from the literature. In total, CVm6A contains validated m6A modifications in 96 genes, which were identified in 25 cell lines from human, mouse, zebrafish and Drosophila (Table S2).

3. Database Organization and Web Interface

All analyzed results, including the peak region, gene type, gene region, subcellular location, condition and library type associated with each m6A peak, were integrated into a set of interactive MySQL tables. Laravel, an open-source PHP web framework (https://laravel.com), and JavaScript libraries were used to construct the CVm6A database. The web interface of CVm6A is summarized in Figure 2.
Figure 2

Overview of CVm6A. (A) Browse page. All m6A peaks on this page can be filtered by gene symbol, gene type, gene region, subcellular location, cell line, condition and library type. The peak region and gene symbol are linked to the visualization page. (B) Visualization page. All m6A peaks in a selected gene are displayed in dark yellow. Peaks in selected cell lines are displayed in brown. Annotated gene and transcript structures are displayed as blue boxes (exons) and gray lines (introns). Coordinates of annotated transcripts and a table with detailed information on these peaks are displayed below. (C) Search page. Users can search all m6A peaks in a specific gene or cell line. The search by genomic coordinates supports fuzzy search of all peaks surrounding the input genomic region. Batch search by gene symbol is also provided.

3.1. Browse Page

On this page, users can browse all m6A peaks from 23 human and eight mouse cell lines. All information for each peak, including the peak region, strand, enrichment score, gene symbol, gene type (mRNA/lncRNA), gene region (CDS, 3’UTR, etc.), subcellular location (plasma_membrane, nucleus, etc.), cell line, condition (cancer/non-cancer) and library type, is displayed in the table (Figure 2A). The peak region and gene symbol are linked to the visualization page for the m6A peaks located in that gene (Figure 2B).

3.2. Visualization Page

On this page, users can view all m6A peaks distributed in a selected annotated gene. All peaks in the current gene are displayed in dark yellow. In the top left corner, users can also select a specific cell line in the search box. All peaks in the selected cell line are displayed in brown in both the figure and the table. The summit of each peak is determined according to the enrichment score. The gene structure, with exons (blue boxes) and introns (gray lines), is displayed below the peak figures. If the selected gene has more than one transcript, all transcripts are displayed. The relative locations of each peak and the annotated gene are placed according to genomic coordinates. Hovering over or clicking a peak displays its peak region, enrichment score, gene region, subcellular location, cell line, condition and library type. The table below the figures includes the coordinates of annotated transcripts and detailed information on all peaks in the selected gene (Figure 2B).

3.3. Search Page

On this page, users can search m6A peaks by gene symbol, cell line or genomic coordinates. When users select a gene symbol or cell line, all peaks in that gene or cell line are displayed in the tables below and can be exported to files. Batch search by gene symbol is also provided. The search by genomic coordinates supports fuzzy search: all peaks surrounding the input coordinates are displayed (Figure 2C).
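The source does not say how the fuzzy coordinate search is implemented. One plausible reading, sketched below purely as an assumption, is that the query region is padded on both sides and every overlapping peak is returned. The function name `search_by_region`, the tuple layout `(chrom, start, end)` and the default `flank` of 1000 bp are all hypothetical.

```python
def search_by_region(peaks, chrom, start, end, flank=1000):
    """Return peaks on chrom that overlap the query region padded by flank.

    peaks: iterable of (chrom, start, end) tuples with half-open coordinates.
    """
    lo = max(0, start - flank)  # do not extend past the chromosome start
    hi = end + flank
    return [p for p in peaks if p[0] == chrom and p[1] < hi and lo < p[2]]
```

Setting `flank=0` turns this into an exact overlap query, which makes the effect of the padding easy to compare.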

4. Summary and Future Directions

CVm6A collects available MeRIP-Seq and m6A-CLIP-Seq datasets from human and mouse cell lines and provides a visualized m6A database to benefit functional studies of m6A in cell lines. The samples include the cell lines most frequently used in previous research. For example, users working on stem cells can explore m6A modification in ESC, MSC, NPC and iPSC and compare the distribution of m6A peaks in the nucleus and other subcellular components, as well as in the 3′UTR and other gene regions. Users working on immune cell lines can inspect the distribution in CD4T, CD8T and LCL. Moreover, more than ten cancer cell lines are included in CVm6A, which allows researchers to study the potential function of m6A in cancers. CVm6A also reports the enrichment score of each peak, which allows users to check the abundance of m6A. Because m6A-Seq datasets for total RNA are limited, only two total RNA datasets are included in the current version, which is not enough to thoroughly establish the distribution of m6A on lncRNA and other non-polyA RNAs. We will update CVm6A when more sequencing data from other library types become available.
  33 in total

1.  Cell-PLoc: a package of Web servers for predicting subcellular localization of proteins in various organisms.

Authors:  Kuo-Chen Chou; Hong-Bin Shen
Journal:  Nat Protoc       Date:  2008       Impact factor: 13.491

2.  Cytoplasmic m6A reader YTHDF3 promotes mRNA translation.

Authors:  Ang Li; Yu-Sheng Chen; Xiao-Li Ping; Xin Yang; Wen Xiao; Ying Yang; Hui-Ying Sun; Qin Zhu; Poonam Baidya; Xing Wang; Devi Prasad Bhattarai; Yong-Liang Zhao; Bao-Fa Sun; Yun-Gui Yang
Journal:  Cell Res       Date:  2017-01-20       Impact factor: 25.617

3.  HNRNPA2B1 Is a Mediator of m(6)A-Dependent Nuclear RNA Processing Events.

Authors:  Claudio R Alarcón; Hani Goodarzi; Hyeseung Lee; Xuhang Liu; Saeed Tavazoie; Sohail F Tavazoie
Journal:  Cell       Date:  2015-08-27       Impact factor: 41.582

4.  m(6)A-LAIC-seq reveals the census and complexity of the m(6)A epitranscriptome.

Authors:  Benoit Molinie; Jinkai Wang; Kok Seong Lim; Roman Hillebrand; Zhi-Xiang Lu; Nicholas Van Wittenberghe; Benjamin D Howard; Kaveh Daneshvar; Alan C Mullen; Peter Dedon; Yi Xing; Cosmas C Giallourakis
Journal:  Nat Methods       Date:  2016-07-04       Impact factor: 28.547

5.  Single-nucleotide-resolution mapping of m6A and m6Am throughout the transcriptome.

Authors:  Bastian Linder; Anya V Grozhik; Anthony O Olarerin-George; Cem Meydan; Christopher E Mason; Samie R Jaffrey
Journal:  Nat Methods       Date:  2015-06-29       Impact factor: 28.547

6.  m6A-Driver: Identifying Context-Specific mRNA m6A Methylation-Driven Gene Interaction Networks.

Authors:  Song-Yao Zhang; Shao-Wu Zhang; Lian Liu; Jia Meng; Yufei Huang
Journal:  PLoS Comput Biol       Date:  2016-12-27       Impact factor: 4.475

7.  RNA m6A methylation regulates the ultraviolet-induced DNA damage response.

Authors:  Yang Xiang; Benoit Laurent; Chih-Hung Hsu; Sigrid Nachtergaele; Zhike Lu; Wanqiang Sheng; Chuanyun Xu; Hao Chen; Jian Ouyang; Siqing Wang; Dominic Ling; Pang-Hung Hsu; Lee Zou; Ashwini Jambhekar; Chuan He; Yang Shi
Journal:  Nature       Date:  2017-03-15       Impact factor: 49.962

8.  N6-methyladenosine alters RNA structure to regulate binding of a low-complexity protein.

Authors:  Nian Liu; Katherine I Zhou; Marc Parisien; Qing Dai; Luda Diatchenko; Tao Pan
Journal:  Nucleic Acids Res       Date:  2017-06-02       Impact factor: 16.971

9.  Evolution of transcript modification by N6-methyladenosine in primates.

Authors:  Lijia Ma; Boxuan Zhao; Kai Chen; Amber Thomas; Jigyasa H Tuteja; Xin He; Chuan He; Kevin P White
Journal:  Genome Res       Date:  2017-01-04       Impact factor: 9.043

10.  N(6)-methyladenosine of HIV-1 RNA regulates viral infection and HIV-1 Gag protein expression.

Authors:  Nagaraja Tirumuru; Boxuan Simen Zhao; Wuxun Lu; Zhike Lu; Chuan He; Li Wu
Journal:  Elife       Date:  2016-07-02       Impact factor: 8.140

