MLML: consistent simultaneous estimates of DNA methylation and hydroxymethylation.

Jianghan Qu, Meng Zhou, Qiang Song, Elizabeth E Hong, Andrew D Smith.

Abstract

MOTIVATION: The two major epigenetic modifications of cytosines, 5-methylcytosine (5-mC) and 5-hydroxymethylcytosine (5-hmC), coexist with each other in a range of mammalian cell populations. Increasing evidence points to important roles of 5-hmC in demethylation of 5-mC and epigenomic regulation in development. Recently developed experimental methods allow direct single-base profiling of either 5-hmC or 5-mC. Meaningful analyses seem to require combining these experiments with bisulfite sequencing, but doing so naively produces inconsistent estimates of 5-mC or 5-hmC levels.
RESULTS: We present a method to jointly model read counts from bisulfite sequencing, oxidative bisulfite sequencing and Tet-Assisted Bisulfite sequencing, providing simultaneous estimates of 5-hmC and 5-mC levels that are consistent across experiment types.
AVAILABILITY: http://smithlab.usc.edu/software/mlml

Year:  2013        PMID: 23969133      PMCID: PMC3789553          DOI: 10.1093/bioinformatics/btt459

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

DNA methylation is an important epigenetic mark in mammals. In addition to the extensively studied 5-methylcytosine (5-mC) modification, its oxidation product, 5-hydroxymethylcytosine (5-hmC), has been observed at substantial levels in both somatic and embryonic stem cells (Kriaucionis and Heintz, 2009; Tahiliani et al.). Recent studies of 5-hmC in mouse TET knock-out models (Ito et al.), mouse zygotic development (Iqbal et al.) and multiple cell types (Globisch et al.; Ito et al.; Kinney et al.; Sun et al.) suggest that 5-hmC is involved in epigenetic regulation.

The most comprehensive and accurate current method for profiling cytosine methylation is bisulfite sequencing (BS-seq). Treatment with sodium bisulfite converts unmethylated cytosines to uracils, but does not distinguish between 5-mC and 5-hmC (Huang et al.), and consequently the methylation level measured by BS-seq is the sum of the 5-mC and 5-hmC levels. Two recently developed techniques, oxidative bisulfite sequencing (oxBS-seq) (Booth et al.) and Tet-Assisted Bisulfite sequencing (TAB-seq) (Yu et al.), provide high-throughput single-base resolution measurements of 5-mC and 5-hmC, respectively. Any two of BS-seq, TAB-seq or oxBS-seq can be combined to profile both the 5-mC and 5-hmC methylomes of a cell population, and especially when studying 5-hmC, proper interpretation of results depends on having some estimate of the 5-mC level.

However, naive manipulation of read count frequencies from independent sequencing experiments often produces two kinds of ‘overshoot’ problems in estimating 5-mC and 5-hmC levels. When combining BS-seq with TAB-seq, the 5-mC level at a given CpG site can be estimated by subtracting the 5-hmC level (TAB-seq) from the combined 5-mC + 5-hmC level (BS-seq). The result can be negative, because of random sampling (or systematic error) in each experiment. Similarly, combining TAB-seq and oxBS-seq could lead to estimates of 5-mC and 5-hmC levels that sum to more than 100%. Overshoot sites can make up a substantial proportion of the data: in one dataset based on oxBS-seq technology, 17% of CpG sites captured by reduced representation bisulfite sequencing (RRBS) and oxRRBS experiments exhibited overshoot (Booth et al.).

Fully leveraging the information in these data requires a method for making consistent estimates of 5-mC and 5-hmC levels. We present maximum likelihood methylation levels (MLML) for simultaneous estimation of 5-mC and 5-hmC, combining data from any two of BS-seq, TAB-seq or oxBS-seq, or all three when available. Our estimates are consistent in that 5-mC and 5-hmC levels are non-negative and never sum to more than 1. In an important subset of cases, our estimates are not only consistent but also show significantly greater accuracy at sites with lower coverage.
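
The two kinds of overshoot described above can be illustrated with a small sketch. The function name `naive_levels` and the read counts are hypothetical and chosen only for illustration; each experiment is represented as a `(C_reads, T_reads)` pair at a single CpG site, and the C-read fraction is taken as the 5-mC + 5-hmC level for BS-seq, the 5-hmC level for TAB-seq and the 5-mC level for oxBS-seq.

```python
from fractions import Fraction


def naive_levels(bs=None, tab=None, oxbs=None):
    """Naive frequency estimates from exactly two experiments.

    Each argument is a hypothetical (C_reads, T_reads) pair for one CpG site.
    Returns (5-mC level, 5-hmC level, overshoot flag).
    """
    def c_fraction(counts):
        c_reads, t_reads = counts
        return Fraction(c_reads, c_reads + t_reads)

    given = [x is not None for x in (bs, tab, oxbs)]
    if sum(given) != 2:
        raise ValueError("provide exactly two of bs, tab, oxbs")

    if oxbs is None:      # BS-seq + TAB-seq: 5-mC obtained by subtraction
        p_h = c_fraction(tab)
        p_m = c_fraction(bs) - p_h
    elif tab is None:     # BS-seq + oxBS-seq: 5-hmC obtained by subtraction
        p_m = c_fraction(oxbs)
        p_h = c_fraction(bs) - p_m
    else:                 # TAB-seq + oxBS-seq: both levels read out directly
        p_h = c_fraction(tab)
        p_m = c_fraction(oxbs)

    # Overshoot: a negative level, or levels summing to more than 1.
    overshoot = p_m < 0 or p_h < 0 or p_m + p_h > 1
    return p_m, p_h, overshoot


print(naive_levels(bs=(2, 8), tab=(5, 5)))    # 5-mC estimate is negative
print(naive_levels(tab=(7, 3), oxbs=(6, 4)))  # levels sum to more than 1
```

Exact fractions are used here so that the overshoot test is not affected by floating-point rounding at the boundary.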

2 METHODS

Each of BS-seq, TAB-seq and oxBS-seq provides some amount of information about both the 5-mC and 5-hmC levels. Our approach is to combine information from any pair or all three of these experiments, and arrive at maximum likelihood estimates (MLEs) for the 5-mC and 5-hmC levels. A similar method has been developed in the context of haplotype frequency estimation from pooled sequencing (Kessner et al.). To explain our method, we assume the data are from TAB-seq and BS-seq experiments for the same biological sample. The more general formulation is provided in Supplementary Information.

Focusing on an individual CpG site, let $p_m$ denote the methylation level (a probability), $p_h$ the hydroxymethylation level and $p_u = 1 - p_m - p_h$ the level of unmethylated C. In the TAB-seq experiment, let $h$ denote the number of C reads mapping over the CpG site, and let $g$ denote the number of T reads mapping over the same CpG. The total number of reads covering the CpG site in the TAB-seq experiment is then $h + g$. Similarly, let $t$ denote the number of C reads mapping over the site in the BS-seq experiment and $u$ the number of T reads, so that the total number of reads covering the CpG in the BS-seq experiment is $t + u$. If values for $p_m$ and $p_h$ are known, $h$ and $u$ are binomial random variables, i.e. $h \sim \mathrm{Binom}(h + g, p_h)$ and $u \sim \mathrm{Binom}(t + u, p_u)$:

$$L(p_m, p_h \mid t, u, h, g) \propto p_h^{h} (1 - p_h)^{g} \, p_u^{u} (1 - p_u)^{t}.$$

Given observations of $(t, u, h, g)$, when no overshoot would result, we use the frequencies to estimate $(p_m, p_h)$. In this case, the frequencies directly give MLEs. At overshoot sites, we introduce latent variables and use expectation maximization to approximate the MLE for $(p_m, p_h)$. Let $k_1$ ($k_2$) be the number of C (T) reads in BS-seq (TAB-seq) that correspond to 5-mCs. Then $t - k_1$ ($g - k_2$) is the number of C (T) reads corresponding to 5-hmC (unmethylated C). The complete data likelihood is then

$$L(p_m, p_h \mid t, u, h, g, k_1, k_2) \propto f(k_1, t - k_1, u;\ p_m, p_h, p_u) \, f(k_2, h, g - k_2;\ p_m, p_h, p_u),$$

where $f$ is a multinomial p.m.f. Estimates for $p_m$ and $p_h$ are then computed by the expectation maximization algorithm to account for the latent $k_1$ and $k_2$ (Supplementary Information).

The MLEs can be compared with binomial confidence intervals around the corresponding frequency estimates if direct readouts (e.g. for 5-hmC in the case of TAB-seq) are available. When estimates fall outside the specified confidence interval, sites are flagged as being ‘strongly’ inconsistent. An overabundance of such sites might suggest systematic error.
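
The estimation procedure for the BS-seq + TAB-seq case can be sketched as follows. This is a minimal illustration under stated assumptions, not the MLML implementation: the function name `mlml_bs_tab`, the starting point, the tolerance and the iteration cap are all hypothetical, and the latent counts are written `k1` and `k2` (5-mC reads among the BS-seq C reads and among the TAB-seq T reads, respectively). The E-step splits the ambiguous read counts in proportion to the current levels, and the M-step takes multinomial proportions over the completed counts.

```python
def mlml_bs_tab(t, u, h, g, tol=1e-10, max_iter=10_000):
    """Estimate (p_m, p_h) at one CpG site from BS-seq and TAB-seq counts.

    t, u: C and T reads in BS-seq; h, g: C and T reads in TAB-seq.
    """
    if t + u == 0 or h + g == 0:
        raise ValueError("both experiments need coverage at the site")

    # Frequency estimates; these are the MLEs when they are consistent.
    p_h = h / (h + g)
    p_m = t / (t + u) - p_h
    if p_m >= 0:
        return p_m, p_h

    # Overshoot: run EM over the latent counts k1 and k2.
    n = t + u + h + g
    p_m, p_h = 1 / 3, 1 / 3  # assumed interior starting point
    for _ in range(max_iter):
        p_u = 1 - p_m - p_h
        # E-step: expected latent counts given the current levels.
        k1 = t * p_m / (p_m + p_h)  # BS-seq C reads are 5-mC or 5-hmC
        k2 = g * p_m / (p_m + p_u)  # TAB-seq T reads are 5-mC or unmethylated
        # M-step: multinomial proportions from the completed counts.
        new_m = (k1 + k2) / n
        new_h = (t - k1 + h) / n
        converged = abs(new_m - p_m) + abs(new_h - p_h) < tol
        p_m, p_h = new_m, new_h
        if converged:
            break
    return p_m, p_h


print(mlml_bs_tab(8, 2, 3, 7))  # consistent site: frequencies returned as-is
print(mlml_bs_tab(2, 8, 5, 5))  # overshoot site: estimates stay within [0, 1]
```

In this sketch the estimates at an overshoot site remain non-negative and sum to at most 1 by construction, since every update is a proportion of non-negative expected counts.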

3 RESULTS

To understand the properties of our estimators and the frequency method, we used simulations with fixed coverage and precisely set levels for 5-mC and 5-hmC, assuming the experiments were BS-seq and TAB-seq. The case of BS-seq and oxBS-seq is symmetric with the estimates for $p_m$ and $p_h$ exchanged. For each valid combination of 5-mC and 5-hmC levels from […], we simulated read counts from binomial distributions for both BS-seq and TAB-seq. Estimates for $p_m$ and $p_h$ were made using the maximum likelihood method and the frequency method, which estimates $p_m$ using $t/(t+u) - h/(h+g)$ and $p_h$ using $h/(h+g)$. The relative error ($|\hat{p} - p| / p$) for both estimation methods was computed and then averaged over 100 000 simulations for each parameter combination. The average estimation errors are presented in Supplementary Table S1.

Estimates of $p_h$ are more accurate using MLML, especially at lower values of $p_m$ and low coverage. For example, when the true values are […], MLML reduces the average relative error by >23% at overshoot sites compared with frequency estimates when the coverage is […], and this reduction in error increases to 57% for such sites covered only […]. The trend for errors of $p_h$ estimates is shown in Figure 1a, indicating the accuracy advantage for MLML as a function of coverage. The simulation also revealed a substantial proportion of overshoot sites under different 5-mC and 5-hmC level combinations (Fig. 1b, Supplementary Tables).
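
A simulation of this kind can be sketched as below. The function name `simulate_overshoot`, the parameter values and the number of simulations are hypothetical and do not reproduce the settings or numbers reported here. The sketch assumes equal coverage in both experiments and, instead of iterating EM, uses the closed form that follows from the binomial likelihood at overshoot sites for this pair of experiments: with the constraint $p_m \ge 0$ active, $p_m$ is set to 0 and both experiments then sample $p_h$, so their C reads are pooled.

```python
import random


def simulate_overshoot(p_m, p_h, coverage, n_sims=20_000, seed=0):
    """Simulate BS-seq + TAB-seq at one site with equal coverage in both.

    Returns (overshoot fraction, mean relative error of the frequency
    estimate of p_h at overshoot sites, mean relative error of the
    constrained MLE of p_h at the same sites).
    """
    rng = random.Random(seed)

    def binomial(n, p):
        # Sum of n Bernoulli(p) draws.
        return sum(rng.random() < p for _ in range(n))

    n_over = 0
    freq_err = 0.0
    mle_err = 0.0
    for _ in range(n_sims):
        t = binomial(coverage, p_m + p_h)  # C reads in BS-seq
        h = binomial(coverage, p_h)        # C reads in TAB-seq
        if h > t:  # frequency estimate of p_m would be negative
            n_over += 1
            freq_err += abs(h / coverage - p_h) / p_h
            # Constrained MLE: p_m = 0, p_h pooled over both experiments.
            mle_err += abs((h + t) / (2 * coverage) - p_h) / p_h
    if n_over == 0:
        return 0.0, float("nan"), float("nan")
    return n_over / n_sims, freq_err / n_over, mle_err / n_over


fraction, freq_error, mle_error = simulate_overshoot(0.1, 0.5, coverage=10)
print(fraction, freq_error, mle_error)
```

Restricting the comparison to overshoot sites matters: conditioning on the TAB-seq C count exceeding the BS-seq C count selects sites where the TAB-seq frequency is biased upward, which is where pooling helps most.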
Fig. 1.

Accuracy is improved at lower coverage using MLML (BS-seq + TAB-seq). (a) Average absolute errors of 5-hmC level estimates at overshoot sites. (b) Proportion of overshoot sites in simulated data

REFERENCES (12 in total; first 10 listed)

1.  Base-resolution analysis of 5-hydroxymethylcytosine in the mammalian genome.

Authors:  Miao Yu; Gary C Hon; Keith E Szulwach; Chun-Xiao Song; Liang Zhang; Audrey Kim; Xuekun Li; Qing Dai; Yin Shen; Beomseok Park; Jung-Hyun Min; Peng Jin; Bing Ren; Chuan He
Journal:  Cell       Date:  2012-05-17       Impact factor: 41.582

2.  Reprogramming of the paternal genome upon fertilization involves genome-wide oxidation of 5-methylcytosine.

Authors:  Khursheed Iqbal; Seung-Gi Jin; Gerd P Pfeifer; Piroska E Szabó
Journal:  Proc Natl Acad Sci U S A       Date:  2011-02-14       Impact factor: 11.205

3.  The behaviour of 5-hydroxymethylcytosine in bisulfite sequencing.

Authors:  Yun Huang; William A Pastor; Yinghua Shen; Mamta Tahiliani; David R Liu; Anjana Rao
Journal:  PLoS One       Date:  2010-01-26       Impact factor: 3.240

4.  High-resolution enzymatic mapping of genomic 5-hydroxymethylcytosine in mouse embryonic stem cells.

Authors:  Zhiyi Sun; Jolyon Terragni; Janine G Borgaro; Yiwei Liu; Ling Yu; Shengxi Guan; Hua Wang; Dapeng Sun; Xiaodong Cheng; Zhenyu Zhu; Sriharsa Pradhan; Yu Zheng
Journal:  Cell Rep       Date:  2013-01-24       Impact factor: 9.423

5.  Conversion of 5-methylcytosine to 5-hydroxymethylcytosine in mammalian DNA by MLL partner TET1.

Authors:  Mamta Tahiliani; Kian Peng Koh; Yinghua Shen; William A Pastor; Hozefa Bandukwala; Yevgeny Brudno; Suneet Agarwal; Lakshminarayan M Iyer; David R Liu; L Aravind; Anjana Rao
Journal:  Science       Date:  2009-04-16       Impact factor: 47.728

6.  Quantitative sequencing of 5-methylcytosine and 5-hydroxymethylcytosine at single-base resolution.

Authors:  Michael J Booth; Miguel R Branco; Gabriella Ficz; David Oxley; Felix Krueger; Wolf Reik; Shankar Balasubramanian
Journal:  Science       Date:  2012-04-26       Impact factor: 47.728

7.  Tissue-specific distribution and dynamic changes of 5-hydroxymethylcytosine in mammalian genomes.

Authors:  Shannon Morey Kinney; Hang Gyeong Chin; Romualdas Vaisvila; Jurate Bitinaite; Yu Zheng; Pierre-Olivier Estève; Suhua Feng; Hume Stroud; Steven E Jacobsen; Sriharsa Pradhan
Journal:  J Biol Chem       Date:  2011-05-24       Impact factor: 5.157

8.  Tet proteins can convert 5-methylcytosine to 5-formylcytosine and 5-carboxylcytosine.

Authors:  Shinsuke Ito; Li Shen; Qing Dai; Susan C Wu; Leonard B Collins; James A Swenberg; Chuan He; Yi Zhang
Journal:  Science       Date:  2011-07-21       Impact factor: 47.728

9.  The nuclear DNA base 5-hydroxymethylcytosine is present in Purkinje neurons and the brain.

Authors:  Skirmantas Kriaucionis; Nathaniel Heintz
Journal:  Science       Date:  2009-04-16       Impact factor: 47.728

10.  Role of Tet proteins in 5mC to 5hmC conversion, ES-cell self-renewal and inner cell mass specification.

Authors:  Shinsuke Ito; Ana C D'Alessio; Olena V Taranova; Kwonho Hong; Lawrence C Sowers; Yi Zhang
Journal:  Nature       Date:  2010-08-26       Impact factor: 49.962
