
iMETHYL: an integrative database of human DNA methylation, gene expression, and genomic variation.

Shohei Komaki1, Yuh Shiwa1,2,3, Ryohei Furukawa1, Tsuyoshi Hachiya1, Hideki Ohmomo1,2, Ryo Otomo1,2, Mamoru Satoh1,2, Jiro Hitomi4,5, Kenji Sobue6, Makoto Sasaki5,7, Atsushi Shimizu1.   

Abstract

We launched an integrative multi-omics database, iMETHYL (http://imethyl.iwate-megabank.org). iMETHYL provides whole-DNA methylation (~24 million autosomal CpG sites), whole-genome (~9 million single-nucleotide variants), and whole-transcriptome (>14,000 genes) data for CD4+ T-lymphocytes, monocytes, and neutrophils collected from approximately 100 subjects. These data were obtained from whole-genome bisulfite sequencing, whole-genome sequencing, and whole-transcriptome sequencing, making iMETHYL a comprehensive database.


Year:  2018        PMID: 29619235      PMCID: PMC5874393          DOI: 10.1038/hgv.2018.8

Source DB:  PubMed          Journal:  Hum Genome Var        ISSN: 2054-345X


DNA methylation (DNAm) has a critical role in regulating gene expression. Recent epigenome-wide association studies in humans have revealed that locus-specific DNAm signatures are associated with susceptibility to different environmental exposures, intermediate phenotypes, and diseases.[1,2] Hence, locus-specific DNAm signatures are potential biomarkers in the era of precision medicine.[3] We recently found that CpG sites with large interindividual DNAm variation are more likely to be potential biomarkers,[4] suggesting that a database of interindividual DNAm variation would be useful to determine target regions for future epigenome-wide association studies. Several studies have surveyed interindividual DNAm variation[5] using peripheral blood, which contains many different cell types, but they did not investigate cell-type-specific signatures.[6] Only a few studies have reported interindividual DNAm variation using purified cells, such as neutrophils[7] and monocytes.[8,9] Because differences in DNAm profiles among cell types are greater than those among individuals,[4] profiling of DNAm variation using purified cells is essential to revealing interindividual DNAm variation within a cell type. In addition, the DNAm profiling methods frequently used in previous studies (e.g., array-based and targeted bisulfite sequencing) cover a limited number of human autosomal CpG sites (2–13%).[4] Accordingly, whole-genome bisulfite sequencing, which provides the highest coverage (~90%) of human CpG sites among currently available methods, is desirable for compiling an interindividual DNAm variation database.[4] Here we report the development and release of “iMETHYL” (http://imethyl.iwate-megabank.org), an integrative database (methylome, transcriptome, and genome) featuring interindividual DNAm variation. 
iMETHYL provides summarized open data calculated in our previous study, which used whole-genome bisulfite sequencing to characterize interindividual DNAm variation in two principal blood cell types, CD4+ T-lymphocytes (CD4T) and monocytes, collected from a cohort of healthy subjects (102 CD4T subjects and 102 monocyte subjects; Table 1).[4] In addition to DNAm analysis, we performed whole-genome sequencing and whole-transcriptome sequencing to comprehensively profile genomic variation and gene expression, respectively. Briefly, sequence reads were aligned to the human reference genome GRCh37/hg19 using BWA-MEM (ver. 0.7.5a-r405), and single-nucleotide variant (SNV) calling was conducted using the Genome Analysis Toolkit (GATK version 2.5-2). Gene annotation was performed using GENCODE release 19.[10] Details regarding the methods of quality-control filtering, DNAm profiling, gene expression profiling, and variant calling were described by Hachiya et al.[4] In addition to CD4T and monocytes, we isolated neutrophils from 94 subjects and performed whole-genome bisulfite sequencing, whole-genome sequencing, and whole-transcriptome sequencing (Table 1). All subjects were recruited as part of the Tohoku Medical Megabank Project, and they provided written informed consent to participate in our study. All subjects belonged to a single large cluster on a PCA plot that consisted of Japanese subjects of the 1000 Genomes Project and the Tohoku Medical Megabank Project (Supplementary Figure 1). The study was approved by the Ethics Committee of Iwate Medical University (HG H5-558 19). iMETHYL was implemented on a UNIX server with CentOS, Apache HTTP Server, and JBrowse 1.12.1.[11]
Table 1

Demographic and profile statistics of iMETHYL

| | Monocytes | CD4+ T cells | Neutrophils |
|---|---|---|---|
| Demographic characteristics of subjects | | | |
| N | 102 (a) | 102 (a) | 94 |
| Males, N (%) | 48 (47.1) | 49 (48.0) | 48 (51.1) |
| Median age (range), years | 62.5 (35–75) | 62.0 (35–75) | 58.0 (24–81) |
| DNAm profiles | | | |
| Sequencing depth (b) | 31.1±1.8 | 31.0±1.6 | 54.7±1.6 |
| No. of autosomal CpGs (c) | 23,941,821 | 24,037,518 | 25,483,031 |
| Gene expression profiles | | | |
| No. of sequencing reads (b) | 33,917,157±3,153,528 | 35,175,996±1,275,575 | 47,040,140±6,289,540 |
| No. of genes (d) | 16,282 | 18,299 | 14,534 |
| SNV profiles | | | |
| Sequencing depth (b) | 27.2±1.0 | 27.2±1.0 | 53.3±13.2 |
| No. of SNVs (e) | 8,945,669 | 8,951,822 | 8,792,880 |

Abbreviations: DNAm, DNA methylation; SNV, single-nucleotide variant.

(a) Both cell types were obtained from the same 95 individuals out of a cohort of 102.

(b) Average±standard deviation.

(c) CpGs that were retained in ≥50% of subjects for each cell type.

(d) Genes that were expressed with a fragments per kilobase of exon per million mapped fragments ≥0.1 in ≥50% of subjects for each cell type.

(e) SNVs with a minor allele count >1.
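The retention criteria in the Table 1 footnotes for genes and SNVs can be illustrated with a minimal Python sketch. The function names (`retained_in_half`, `minor_allele_count`), the encoding of genotypes as per-subject alternate-allele counts (0, 1, or 2), and the toy values are assumptions made for illustration; they are not taken from the iMETHYL pipeline.

```python
def retained_in_half(values, threshold, min_fraction=0.5):
    """Return True if at least `min_fraction` of subjects have a value >= threshold.

    Mirrors the footnote criterion "FPKM >= 0.1 in >= 50% of subjects".
    """
    if not values:
        raise ValueError("values must be non-empty")
    passing = sum(1 for v in values if v >= threshold)
    return passing / len(values) >= min_fraction


def minor_allele_count(alt_allele_counts):
    """Minor allele count at a diploid autosomal site.

    `alt_allele_counts` holds one entry per subject: the number of
    alternate alleles carried (0, 1, or 2). This encoding is an assumption.
    """
    total_alleles = 2 * len(alt_allele_counts)
    alt = sum(alt_allele_counts)
    return min(alt, total_alleles - alt)


# Toy data (invented): FPKM of one gene in four subjects.
gene_fpkm = [0.0, 0.2, 0.5, 0.05]
keep_gene = retained_in_half(gene_fpkm, threshold=0.1)

# Toy data (invented): alternate-allele counts of one SNV in four subjects.
snv_genotypes = [1, 1, 0, 0]
keep_snv = minor_allele_count(snv_genotypes) > 1
```

With these toy inputs, the gene passes because two of four subjects reach the threshold, and the SNV passes because its minor allele count is 2.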

Based on the DNAm profiles, we estimated the average DNAm levels and variation for ~24 million autosomal CpG sites. iMETHYL provides information on interindividual DNAm variation that was calculated by two methods, i.e., standard deviation (SD) and reference interval (RI), which is defined as the difference between the 95th and 5th percentiles of the DNAm level among individuals.[4] In addition, iMETHYL includes the average and SD of gene expression levels for >14,000 genes and allele frequencies for ~9 million autosomal SNVs (Table 1). Statistics regarding age, sex, and database profiles used in iMETHYL are presented in Table 1. Furthermore, genomic annotation tracks, such as gene models, repetitive elements, CpG islands, and microarray probes, are available in the iMETHYL browser (Table 2).
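The two variation measures defined above can be sketched for a single CpG site as follows. This is an illustration under stated assumptions, not the authors' implementation: the percentile uses linear interpolation between order statistics, the SD is the sample SD, and the names `percentile` and `dnam_variation` as well as the toy input are hypothetical.

```python
import statistics


def percentile(values, q):
    """Percentile (q in [0, 100]) by linear interpolation between order statistics.

    The interpolation rule is an assumption; the source does not state one.
    """
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must be non-empty")
    rank = (len(ordered) - 1) * q / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    frac = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def dnam_variation(levels):
    """Summarize DNAm levels (in %) of one CpG site across individuals."""
    return {
        "avg": statistics.mean(levels),
        # Sample SD is assumed here; a population SD would use statistics.pstdev.
        "sd": statistics.stdev(levels),
        # RI: difference between the 95th and 5th percentiles.
        "ri": percentile(levels, 95) - percentile(levels, 5),
    }


# Toy data (invented): DNAm levels 0, 1, ..., 100 (%) across 101 individuals.
summary = dnam_variation(list(range(101)))
```

Unlike the SD, the RI depends only on the 5th and 95th percentiles, so values in the extreme tails of the distribution do not affect it.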
Table 2

List of available tracks in iMETHYL

| Track name | Description | Source |
|---|---|---|
| IMM_CpG_CD4T | Information for each CpG site of CD4T | Ref. 4 |
| IMM_CpG_CD4T_avg | Average DNAm level of each CpG site of CD4T | Ref. 4 |
| IMM_CpG_CD4T_sd | DNAm variations of each CpG site of CD4T measured by SD | Ref. 4 |
| IMM_CpG_CD4T_RI | DNAm variations of each CpG site of CD4T measured by RI | Ref. 4 |
| IMM_CpG_Mono | Information for each CpG site of monocytes | Ref. 4 |
| IMM_CpG_Mono_avg | Average DNAm level of each CpG site of monocytes | Ref. 4 |
| IMM_CpG_Mono_sd | DNAm variations of each CpG site of monocytes measured by SD | Ref. 4 |
| IMM_CpG_Mono_RI | DNAm variations of each CpG site of monocytes measured by RI | Ref. 4 |
| IMM_CpG_Neu | Information for each CpG site of neutrophils | This study |
| IMM_CpG_Neu_avg | Average DNAm level of each CpG site of neutrophils | This study |
| IMM_CpG_Neu_sd | DNAm variations of each CpG site of neutrophils measured by SD | This study |
| IMM_CpG_Neu_RI | DNAm variations of each CpG site of neutrophils measured by RI | This study |
| IMM_FPKM_CD4T | FPKM values of each transcript of CD4T | Ref. 4 |
| IMM_FPKM_Mono | FPKM values of each transcript of monocytes | Ref. 4 |
| IMM_FPKM_Neu | FPKM values of each transcript of neutrophils | This study |
| IMM_SNV_CD4T | Information for each SNV of CD4T | Ref. 4 |
| IMM_SNV_Mono | Information for each SNV of monocytes | Ref. 4 |
| IMM_SNV_Neu | Information for each SNV of neutrophils | This study |
| Reference sequence | Human genome hg19/GRCh37 sequence | UCSC genome browser |
| RepeatMasker | Repetitive elements | UCSC genome browser |
| CpGIslandsExt | CpG island locations | UCSC genome browser |
| HM450 | Probe information for Illumina Infinium HumanMethylation450 | UCSC genome browser |
| gencode_v19 | Information of genes obtained from GENCODE version 19 | GENCODE |
| gencode_v19_trs | Information of transcripts obtained from GENCODE version 19 | GENCODE |

Abbreviations: CD4T, CD4+ T-lymphocyte; DNAm, DNA methylation; FPKM, fragments per kilobase of exon per million fragments mapped; RI, reference interval; SD, standard deviation; SNV, single-nucleotide variant.

iMETHYL was developed to provide an informative, easy-to-use resource that enables investigators to explore DNAm levels and the variability of potential biomarkers identified by epigenome-wide association studies or candidate gene approach studies. From the iMETHYL browser, regions of interest can be specified using gene symbols (GENCODE release 19), dbSNP ID, DNA methylation array probe ID, and genomic positions. The genome browser provides graphical views of genomic annotations and the average methylation level and variability (SD and RI) of each CpG site in each of the three human cell types (Figure 1a). In addition, tracks for the average expression level and SD of each gene for each cell type and allele frequencies of each SNV within 102 (CD4T), 102 (monocytes), and 94 (neutrophils) subjects are provided.
Figure 1

Graphical view of iMETHYL. (a) Three-layer omics data are provided as browser tracks. The browser displays several tracks, which are shown for the region surrounding the DNAm biomarker for tobacco smoking, cg05575921. Users can select tracks that provide information from external sources on gene structure, expression, and SNVs and cell-type-specific original tracks (e.g., CD4T, monocytes, and neutrophils) that show average DNAm levels and different measures of variation (SD and RI). (b–d) Detailed information on CpG tracks for CD4T, monocytes, and neutrophils. The frequencies of the three DNAm categories among individuals are shown as Mlf_high (≥67%), Mlf_mid (34–66%), and Mlf_low (≤33%). CD4T, CD4+ T-lymphocytes; DNAm, DNA methylation; Mlf_high, frequency of hypermethylated DNA; Mlf_low, frequency of hypomethylated DNA; Mlf_mid, frequency of intermediate methylation DNA; RI, reference interval; SD, standard deviation; SNV, single-nucleotide variation.
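The three category frequencies named in the caption can be illustrated with a short Python sketch. It assumes that per-individual DNAm levels are given in percent and that any value strictly between 33 and 67 counts as intermediate; the function name `methylation_level_frequencies` and the toy values are hypothetical.

```python
def methylation_level_frequencies(levels_percent):
    """Fraction of individuals in each DNAm category at one CpG site.

    Categories follow the caption: high (>= 67%), low (<= 33%), and
    mid (everything in between, which is an assumption for non-integer levels).
    """
    n = len(levels_percent)
    if n == 0:
        raise ValueError("levels_percent must be non-empty")
    high = sum(1 for x in levels_percent if x >= 67)
    low = sum(1 for x in levels_percent if x <= 33)
    mid = n - high - low
    return {"Mlf_high": high / n, "Mlf_mid": mid / n, "Mlf_low": low / n}


# Toy data (invented): DNAm levels (%) of four individuals at one CpG site.
freqs = methylation_level_frequencies([90, 80, 50, 10])
```

Here two of the four individuals fall in the high category and one each in the mid and low categories, so the three frequencies sum to 1.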

In the example shown in Figure 1a, the iMETHYL genome browser shows different tracks in the region flanking cg05575921, which is a DNAm biomarker for tobacco smoking[12,13] located in the aryl-hydrocarbon receptor repressor (AHRR) gene. This DNAm biomarker is markedly demethylated in current smokers.[12,13] Using iMETHYL, the average methylation level and variability of each CpG site in the three cell types (CD4T, monocytes, and neutrophils) are shown, and by selecting the bar in the CpG tracks, histograms of DNAm levels at this CpG site for each cell type appear in pop-up windows (Figure 1b–d). iMETHYL is also useful for investigating cell-type-specific DNAm variability. The CpG site shown in Figure 1 was hypermethylated in CD4T with a narrow distribution of DNAm levels (Figure 1b), whereas broader distributions of DNAm levels were found in monocytes and neutrophils (Figure 1c and d). Furthermore, investigators can use the browser to explore variability in gene expression and SNVs. For example, upon selecting the bar shown in the fragments per kilobase of exon per million mapped fragments (FPKM) tracks, a histogram of gene expression levels appears in the pop-up window. In addition, the average expression level and SD for each gene are shown. This information provides important clues to the functional relevance of known or putative DNAm biomarkers. Data on the mean and variation of the DNAm level of each CpG site for each of the three cell types can be downloaded from the iMETHYL website so that users can find CpG sites of their own interest based on the DNAm level and variation or differences between cell types. In summary, we constructed a public database, iMETHYL, that provides a reference for human DNAm variation. iMETHYL is the first database featuring interindividual DNAm variation based on high-coverage whole-genome bisulfite sequencing using purified CD4T, monocytes, and neutrophils.
Because the data were obtained from apparently healthy subjects, the multi-omics genomic data provided by iMETHYL can be used as a reference control. Investigators can examine DNAm variation, gene expression, and SNVs at any specific region of the human genome, which can enable the identification of variable regions in the population to design assay probes for microarrays or targeted sequencing. iMETHYL provides multi-omics data for three different cell types to the scientific community. The iMETHYL browser will be a useful resource not only for researchers specializing in epigenomics but also for those interested in the interactive analysis of DNA methylation, gene expression, and genomic variation.
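The use case described above, selecting CpG sites by variation or by differences between cell types, might look like the following sketch once per-site summaries have been loaded. The format of the downloadable files is not described here, so the `CpGSummary` record, its field names, the thresholds, and all values below are hypothetical placeholders rather than real iMETHYL data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpGSummary:
    """Per-site summary for one cell type (hypothetical record layout)."""
    chrom: str
    pos: int
    avg: float  # average DNAm level, in %
    sd: float   # standard deviation among individuals
    ri: float   # reference interval (95th minus 5th percentile)


def variable_sites(summaries, min_ri):
    """Keep sites whose interindividual variation (RI) is at least `min_ri`."""
    return [s for s in summaries if s.ri >= min_ri]


def cell_type_differences(first, second, min_diff):
    """Sites present in both cell types whose average DNAm differs by >= min_diff.

    Returns (chrom, pos, avg_first - avg_second) tuples.
    """
    by_site = {(s.chrom, s.pos): s for s in second}
    result = []
    for s in first:
        other = by_site.get((s.chrom, s.pos))
        if other is not None and abs(s.avg - other.avg) >= min_diff:
            result.append((s.chrom, s.pos, s.avg - other.avg))
    return result


# Toy data (invented coordinates and values).
cd4t = [CpGSummary("chr1", 100, 90.0, 2.0, 6.0),
        CpGSummary("chr1", 200, 50.0, 12.0, 40.0)]
mono = [CpGSummary("chr1", 100, 60.0, 10.0, 35.0),
        CpGSummary("chr1", 200, 52.0, 11.0, 38.0)]

highly_variable_in_cd4t = variable_sites(cd4t, min_ri=30.0)
differing = cell_type_differences(cd4t, mono, min_diff=20.0)
```

Keying sites by chromosome and position keeps the comparison independent of record order, and sites missing from either cell type are skipped.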

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References (13 in total)

Review 1.  Genetic sources of population epigenomic variation.

Authors:  Aaron Taudt; Maria Colomé-Tatché; Frank Johannes
Journal:  Nat Rev Genet       Date:  2016-05-09       Impact factor: 53.242

Review 2.  Epigenome-wide association studies for common human diseases.

Authors:  Vardhman K Rakyan; Thomas A Down; David J Balding; Stephan Beck
Journal:  Nat Rev Genet       Date:  2011-07-12       Impact factor: 53.242

3.  Characterization of the DNA methylome and its interindividual variation in human peripheral blood monocytes.

Authors:  Hui Shen; Chuan Qiu; Jian Li; Qing Tian; Hong-Wen Deng
Journal:  Epigenomics       Date:  2013-06       Impact factor: 4.778

4.  GENCODE: the reference human genome annotation for The ENCODE Project.

Authors:  Jennifer Harrow; Adam Frankish; Jose M Gonzalez; Electra Tapanari; Mark Diekhans; Felix Kokocinski; Bronwen L Aken; Daniel Barrell; Amonida Zadissa; Stephen Searle; If Barnes; Alexandra Bignell; Veronika Boychenko; Toby Hunt; Mike Kay; Gaurab Mukherjee; Jeena Rajan; Gloria Despacio-Reyes; Gary Saunders; Charles Steward; Rachel Harte; Michael Lin; Cédric Howald; Andrea Tanzer; Thomas Derrien; Jacqueline Chrast; Nathalie Walters; Suganthi Balasubramanian; Baikang Pei; Michael Tress; Jose Manuel Rodriguez; Iakes Ezkurdia; Jeltje van Baren; Michael Brent; David Haussler; Manolis Kellis; Alfonso Valencia; Alexandre Reymond; Mark Gerstein; Roderic Guigó; Tim J Hubbard
Journal:  Genome Res       Date:  2012-09       Impact factor: 9.043

Review 5.  Current and Future Prospects for Epigenetic Biomarkers of Substance Use Disorders.

Authors:  Allan M Andersen; Meeshanthini V Dogan; Steven R H Beach; Robert A Philibert
Journal:  Genes (Basel)       Date:  2015-10-14       Impact factor: 4.096

6.  Cigarette smoking reduces DNA methylation levels at multiple genomic loci but the effect is partially reversible upon cessation.

Authors:  Loukia G Tsaprouni; Tsun-Po Yang; Jordana Bell; Katherine J Dick; Stavroula Kanoni; James Nisbet; Ana Viñuela; Elin Grundberg; Christopher P Nelson; Eshwar Meduri; Alfonso Buil; Francois Cambien; Christian Hengstenberg; Jeanette Erdmann; Heribert Schunkert; Alison H Goodall; Willem H Ouwehand; Emmanouil Dermitzakis; Tim D Spector; Nilesh J Samani; Panos Deloukas
Journal:  Epigenetics       Date:  2014-10       Impact factor: 4.528

7.  Tobacco smoking leads to extensive genome-wide changes in DNA methylation.

Authors:  Sonja Zeilinger; Brigitte Kühnel; Norman Klopp; Hansjörg Baurecht; Anja Kleinschmidt; Christian Gieger; Stephan Weidinger; Eva Lattka; Jerzy Adamski; Annette Peters; Konstantin Strauch; Melanie Waldenberger; Thomas Illig
Journal:  PLoS One       Date:  2013-05-17       Impact factor: 3.240

8.  Intraindividual dynamics of transcriptome and genome-wide stability of DNA methylation.

Authors:  Ryohei Furukawa; Tsuyoshi Hachiya; Hideki Ohmomo; Yuh Shiwa; Kanako Ono; Sadafumi Suzuki; Mamoru Satoh; Jiro Hitomi; Kenji Sobue; Atsushi Shimizu
Journal:  Sci Rep       Date:  2016-05-19       Impact factor: 4.379

9.  JBrowse: a dynamic web platform for genome visualization and analysis.

Authors:  Robert Buels; Eric Yao; Colin M Diesh; Richard D Hayes; Monica Munoz-Torres; Gregg Helt; David M Goodstein; Christine G Elsik; Suzanna E Lewis; Lincoln Stein; Ian H Holmes
Journal:  Genome Biol       Date:  2016-04-12       Impact factor: 13.583

10.  Genome-wide identification of inter-individually variable DNA methylation sites improves the efficacy of epigenetic association studies.

Authors:  Tsuyoshi Hachiya; Ryohei Furukawa; Yuh Shiwa; Hideki Ohmomo; Kanako Ono; Fumiki Katsuoka; Masao Nagasaki; Jun Yasuda; Nobuo Fuse; Kengo Kinoshita; Masayuki Yamamoto; Kozo Tanno; Mamoru Satoh; Ryujin Endo; Makoto Sasaki; Kiyomi Sakata; Seiichiro Kobayashi; Kuniaki Ogasawara; Jiro Hitomi; Kenji Sobue; Atsushi Shimizu
Journal:  NPJ Genom Med       Date:  2017-04-13       Impact factor: 8.617
