
Exemplary multiplex bisulfite amplicon data used to demonstrate the utility of Methpat.

Nicholas C Wong, Bernard J Pope, Ida Candiloro, Darren Korbie, Matt Trau, Stephen Q Wong, Thomas Mikeska, Bryce J W van Denderen, Erik W Thompson, Stefanie Eggers, Stephen R Doyle, Alexander Dobrovic.

Abstract

BACKGROUND: DNA methylation is a complex epigenetic marker that can be analyzed using a wide variety of methods. Interpretation and visualization of DNA methylation data can mask complexity in terms of methylation status at each CpG site, cellular heterogeneity of samples and allelic DNA methylation patterns within a given DNA strand. Bisulfite sequencing is considered the gold standard, but visualization of massively parallel sequencing results remains a significant challenge.
FINDINGS: We created a program called Methpat that facilitates visualization and interpretation of bisulfite sequencing data generated by massively parallel sequencing. To demonstrate its utility, we performed multiplex PCR targeting 48 regions of interest across 86 human samples. The regions selected included promoters of genes known to be associated with cancer, repetitive elements, known imprinted regions and mitochondrial genomic sequences. We interrogated a range of samples, including human cell lines, primary tumours and primary tissue samples. Methpat generates two forms of output: a tab-delimited text file for each sample that summarizes DNA methylation patterns and their read counts for each amplicon, and an HTML file that summarizes these data visually. Methpat can also be used with publicly available whole genome bisulfite sequencing and reduced representation bisulfite sequencing datasets of sufficient read depth.
CONCLUSIONS: Using Methpat, complex DNA methylation data derived from massively parallel sequencing can be summarized and visualized for biological interpretation. By accounting for allelic DNA methylation states and their abundance in a sample, Methpat can unmask the complexity of DNA methylation and yield further biological insight in existing datasets.


Keywords:  Bisulfite sequencing; Cancer; DNA methylation; Epialleles; Epigenetics; PCR; Visualization


Year:  2015        PMID: 26613017      PMCID: PMC4660811          DOI: 10.1186/s13742-015-0098-x

Source DB:  PubMed          Journal:  Gigascience        ISSN: 2047-217X            Impact factor:   6.524


Data description

DNA methylation can be analyzed using a wide range of methods [1], with bisulfite sequencing considered the current gold standard. Technologies such as whole genome bisulfite sequencing (WGBS) and reduced representation bisulfite sequencing (RRBS) provide unprecedented detail of methylation patterns throughout the genome, but the complexity of these patterns is masked when simple summary metrics are used. For example, most studies of DNA methylation reduce methylation levels to a percentage value, which typically obscures allelic patterns when the data are interpreted. We have developed Methpat, a tool that summarizes and visualizes complex DNA methylation data generated by massively parallel sequencing of bisulfite-converted DNA [2]. Using this tool, the DNA methylation state of individual CpG sites and the abundance of allelic patterns can be visualized [3]. Furthermore, by measuring the abundance of allelic DNA methylation patterns, cellular heterogeneity in methylation can now be explored [4]. We demonstrated the utility of Methpat by measuring DNA methylation in 86 samples (Table 1) across 48 regions of interest (Table 2). This was achieved using multiplex PCR on bisulfite-converted DNA, followed by massively parallel sequencing on an Illumina MiSeq platform with v3 chemistry. Each sample was indexed, and all samples were pooled at equimolar concentrations into a single library for sequencing. Data have been deposited in GEO under accession numbers GSE67856 [5] and GSE71804 [6]. A panel of breast cancer cell lines treated with epidermal growth factor and transforming growth factor beta was also analyzed in parallel [7].
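The masking effect of a single percentage value can be illustrated with a minimal Python 3 sketch. It assumes, for illustration only, that each read is summarized as a pattern string with one character per CpG site ('1' methylated, '0' unmethylated), and it uses hypothetical read counts; neither the encoding nor the numbers come from Methpat or from this dataset.

```python
from collections import Counter


def mean_methylation(pattern_counts):
    """Return the percentage of methylated CpG calls across all reads.

    pattern_counts maps a pattern string such as "1100" (one character per
    CpG site, '1' = methylated, '0' = unmethylated) to a read count.
    """
    methylated = 0
    total = 0
    for pattern, count in pattern_counts.items():
        methylated += pattern.count("1") * count
        total += len(pattern) * count
    return 100.0 * methylated / total


# Hypothetical sample A: fully methylated and fully unmethylated epialleles
# in equal numbers (an allele-specific, imprinting-like situation).
sample_a = Counter({"1111": 50, "0000": 50})
# Hypothetical sample B: every read is partially methylated.
sample_b = Counter({"1010": 50, "0101": 50})

print(mean_methylation(sample_a))  # 50.0
print(mean_methylation(sample_b))  # 50.0
```

Both hypothetical samples report 50% methylation, yet one consists of fully methylated and fully unmethylated reads while the other contains only mixed patterns. The pattern counts keep that distinction; the percentage discards it.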
Table 1

Human Samples used in this study

Sample name | Description | GEO accession
293 | HEK-293 embryonic kidney cell line. ATCC CRL1573 | GSE67856
40424 | Normal fibroblast cell line | GSE67856
910046 | Normal fibroblast cell line | GSE67856
12A-CD19 | Normal Fluorescent Activated Cell Sorted (FACS) CD19 positive bone marrow cells from individual 12A | GSE67856
12A-CD33 | Normal Fluorescent Activated Cell Sorted (FACS) CD33 positive bone marrow cells from individual 12A | GSE67856
12A-CD34 | Normal Fluorescent Activated Cell Sorted (FACS) CD34 positive bone marrow cells from individual 12A | GSE67856
12A-CD45 | Normal Fluorescent Activated Cell Sorted (FACS) CD45 positive bone marrow cells from individual 12A | GSE67856
6-MDA453 | MDA-MB-453 metastatic breast cancer cell line. ATCC HTB-131 | GSE67856
6C-CD19 | Normal Fluorescent Activated Cell Sorted (FACS) CD19 positive bone marrow cells from individual 6C | GSE67856
6C-CD33 | Normal Fluorescent Activated Cell Sorted (FACS) CD33 positive bone marrow cells from individual 6C | GSE67856
6C-CD34 | Normal Fluorescent Activated Cell Sorted (FACS) CD34 positive bone marrow cells from individual 6C | GSE67856
6C-CD45 | Normal Fluorescent Activated Cell Sorted (FACS) CD45 positive bone marrow cells from individual 6C | GSE67856
9A-CD19 | Normal Fluorescent Activated Cell Sorted (FACS) CD19 positive bone marrow cells from individual 9A | GSE67856
9A-CD33 | Normal Fluorescent Activated Cell Sorted (FACS) CD33 positive bone marrow cells from individual 9A | GSE67856
9A-CD34 | Normal Fluorescent Activated Cell Sorted (FACS) CD34 positive bone marrow cells from individual 9A | GSE67856
9A-CD45 | Normal Fluorescent Activated Cell Sorted (FACS) CD45 positive bone marrow cells from individual 9A | GSE67856
9A-Whole-Blood | Whole blood sample from individual 9A | GSE67856
BRL | Normal lymphoblast cell line | GSE67856
CaCo | Caco2 Colon cancer cell line. ATCC HTB37 | GSE67856
DG75 | Lymphoblast cancer cell line. ATCC CRL-2625 | GSE67856
EKVX | Cancer Cell Line | GSE67856
HELA | Cancer cell line. ATCC CCL-2 | GSE67856
HEPG2 | Liver cancer cell line. ATCC HB-8065 | GSE67856
HT1080 | Cancer cell line. ATCC CCL121 | GSE67856
HTB22-Col | MCF7 breast cancer cell line. ATCC HTB22 | GSE67856
JWL | Normal lymphoblast cell line | GSE67856
K562 | CML cancer cell line. ATCC CCL-243 | GSE67856
Sample29 | Cell Line | GSE71804
MB231BAG | Breast cancer cell line. ATCC HTB-26 | GSE67856
MCF7 | Breast cancer cell line. ATCC HTB22 | GSE67856
NALM6 | Leukaemia cell line. ACC 128 | GSE67856
NCCIT | Embryonic carcinoma cell line. ATCC CRL-2073 | GSE67856
OVCAR8 | Cancer cell line | GSE67856
SKNAS | Neuroblastoma cancer cell line. ATCC CRL2137 | GSE67856
U231 | Cancer cell line | GSE67856
Sample1 | Human normal colon tissue | GSE71804
Sample2 | Human colon tumor | GSE71804
Sample3 | Human normal colon tissue | GSE71804
Sample4 | Human colon tumor | GSE71804
Sample5 | Human normal colon tissue | GSE71804
Sample6 | Human colon tumor | GSE71804
Sample7 | Human normal colon tissue | GSE71804
Sample8 | Human colon tumor | GSE71804
Sample9 | Human normal colon tissue | GSE71804
Sample10 | Human colon tumor | GSE71804
Sample11 | Human normal colon tissue | GSE71804
Sample12 | Human colon tumor | GSE71804
Sample13 | Pooled human cancer and blood cell DNA | GSE71804
Sample14 | Pooled human cancer and blood cell DNA | GSE71804
Sample15 | Pooled human cancer and blood cell DNA | GSE71804
Sample16 | Pooled human cancer and blood cell DNA | GSE71804
Sample17 | Pooled human cancer and blood cell DNA | GSE71804
Sample18 | Pooled human cancer and blood cell DNA | GSE71804
Sample19 | Artificially methylated human DNA | GSE71804
Sample20 | Artificially methylated human DNA | GSE71804
Sample21 | Artificially methylated human DNA | GSE71804
Sample22 | Artificially methylated human DNA | GSE71804
Sample23 | Artificially methylated human DNA | GSE71804
Sample24 | Artificially methylated human DNA | GSE71804
Sample25 | Human leukemia cell line | GSE71804
Sample26 | Human leukemia cell line | GSE71804
Sample27 | Human leukemia cell line | GSE71804
Sample28 | Human leukemia cell line | GSE71804
468-C1-3-9_S40 | MDA-468 cell line, control 1 | GSE71804
468-C2-3-9_S48 | MDA-468 cell line, control 2 | GSE71804
468-S1-3-9_S56 | MDA-468 cell line + EGF 1 | GSE71804
468-S2-3-9_S64 | MDA-468 cell line + EGF 2 | GSE71804
ET-C1-3-9_S71 | PMC42-ET cell line, control 1 | GSE71804
ET-C2-3-9_S79 | PMC42-ET cell line, control 2 | GSE71804
ET-S1-3-9_S87 | PMC42-ET cell line, +EGF 1 | GSE71804
ET-S2-3-9_S95 | PMC42-ET cell line, +EGF 2 | GSE71804
LA-C1-3-9_S8 | PMC42-LA cell line, control 1 | GSE71804
LA-C3-3-9_S16 | PMC42-LA cell line, control 2 | GSE71804
LA-S1-3-9_S24 | PMC42-LA cell line, +EGF 1 | GSE71804
LA-S2-3-9_S32 | PMC42-LA cell line, +EGF 2 | GSE71804
PMC42ET-72-C_S31 | PMC42-ET cell line, control 72 h | GSE71804
PMC42ET-72 h-EGF_S39 | PMC42-ET cell line, +EGF 72 h | GSE71804
PMC42ET-9d-C_S47 | PMC42-ET cell line, control 9 days | GSE71804
PMC42ET-9d-EGF_S55 | PMC42-ET cell line, +EGF 9 days | GSE71804
PMC42ET-9d-TGFb_S63 | PMC42-ET cell line, +TGFb 9 days | GSE71804
PMC42LA-72 h-C_S86 | PMC42-LA cell line, control 72 h | GSE71804
PMC42LA-72 h-EGF_S94 | PMC42-LA cell line, +EGF 72 h | GSE71804
PMC42LA-9d-C_S7 | PMC42-LA cell line, control 9 days | GSE71804
PMC42LA-9d-EGF_S15 | PMC42-LA cell line, +EGF 9 days | GSE71804
PMC42LA-9d-TGFb_S23 | PMC42-LA cell line, +TGFb 9 days | GSE71804
Table 2

Bisulfite PCR primers used in this study

Primer name | Primer sequence (5′–3′) | Primer Tm (°C) | Genomic location (hg38)
mandatory01_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGAGAAGTTTGGTYGTTGYGTTTTTAT | 60.1–62.9 |
mandatory01_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGRAAACCRCTCRCRAAATACCCTA | 57.6–64.6 | chr4:154710460-154710544
mandatory02_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTAGYGGAGTTTAAGGGTTAGTGT | 59.2–60.9 |
mandatory02_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAACRAAACRCACRTACRTATATTTATA | 56.3–62.1 | chr1:110052409-110052486
mandatory03_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTGTTTGTTAGTTAGTTTTAGGTTTTTTAAT | 59.8 |
mandatory03_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCCTACCAAATTTCTATTACAAACCAAA | 60.8 | chr4:7526639-7526703
mandatory04_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGATTTGGTTTYGAGAGTTTGGATTTT | 60.1–61.7 |
mandatory04_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAAAAACCRCACACCTAAACACTTAAA | 60.1–61.7 | chr2:164593225-164593299
mandatory05_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGGGAATTTTGAGATTTTTAAAAGTTTTTTT | 59.8 |
mandatory05_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGATAAAAACAACAAATACCACTTCCTAAA | 59.9 | chr2:9518296-9518358
mandatory06_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTTGYGTYGATTTTGGTTTTGGTTAT | 57.6–60.9 |
mandatory06_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCRACCCCTCCCAAATCCTAAAA | 60.1–62.1 | chr17:80709100-80709203
mandatory07_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGGTTAGAGGAGAYGTTTTAGTTTTT | 59.2–60.9 |
mandatory07_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCAATTCCAAAAAACRTCAATCACAATAA | 59.9–61.5 | chr3:142837969-142838050
mandatory08_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGGTTAAGAGGAGTTTGTTTTGTTTTAT | 60.8 |
mandatory08_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTTTCACTAAAAAACCTCACTCCCTA | 60.9 | chr7:140218100-140218192
mandatory09_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTTTTAGAGTGTTTTTGGTTTTATTATTTTT | 60.2 |
mandatory09_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTATTTACCCCTAAAAATACCCTTTATA | 59.2 | chr7:26206542-26206614
mandatory10_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGGAAGTTGAAGTGAGAATGTGATT | 60.3 |
mandatory10_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAATACCCATACAAACTATCTACACAA | 60.1 | chr7:3025554-3025664
mandatory11_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTATATAAAAATTATTAAGAATTTTATTGTTTTGT | 58.5 |
mandatory11_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAATATAACCAAAATCCAAATAACACTAA | 58.2 | chr7:138229946-138230021
mandatory12_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGYGGYGTTTGATGGATTTGGTTT | 59.2–62.9 |
mandatory12_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCTTAATATAACCTAAACCCATATACTA | 59.2 | chr2:42275714-42275789
mandatory13_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTAGATTATGTTAAGGATTTTGGAAAT | 59.2 |
mandatory13_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTCTATACTATCAACACCCATTACTTAA | 60.8 | chr15:100249155-100249220
mandatory14_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTAAATTAGATGAGGTATAGTAGATTATAT | 59.2 |
mandatory14_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCAACTCTATCTCAAACTTCAAAAAATA | 59.2 | chr4:147557821-147557938
mandatory15_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTTGGGGGATAGTTTTGGGTAT | 60.1 |
mandatory15_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTACAACCTCCTACAAAAAAACCCTA | 60.9 | chr17:75369174-75369252
mandatory16_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGATATTTTTAATTTAATTTGAAGGTTTATTGT | 57.8 |
mandatory16_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCCCAAACTTTCTCCTATAATCCAA | 60.3 | chr7:93520244-93520332
h19_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTTTGTATTATTTTTTTTTTTGAGAGTTTATTT | 60.2 |
h19_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGATACRAAAAAAACCCACAATAAACTTAATA | 59.8–61 | chr11:2017873-2018050
mest_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGGTTTTGTTTTTTTAATTGTGTTTATTGTTT | 60.2 |
mest_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTAACCACTATAACCAAAATTACACAAAA | 59.9 | chr7:130131098-130131299
xist_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTAGTAATTTAGTATTGTTTATTTTATTTTTTT | 59 |
xist_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGATAACRAACCTCTTTATCTTTACTATATA | 59.2–60.5 | chrX:73070975-73071183
runx3_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTTTAGAYGTTYGGAGTTTTAGGGT | 58.3–62 |
runx3_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCRACAACCCCAACTTCCTCTA | 59.5–61.2 | chr1:25256022-25256153
rarb_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAATTTTTTTATGYGAGTTGTTTGAGGAT | 59.9–61.5 |
rarb_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCTCCTTCCAAATAAATACTTACAAAAAA | 59.9 | chr3:25469822-25469959
mlh1_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGYGGGAGGTTATAAGAGTAGGGTT | 60.9–62.9 |
mlh1_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGATACRAAATATCCAACCAATAAAAACAAAA | 59.8–61 | chr3:37034573-37034734
rassf1a_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTTTTYGTAGTTTAATGAGTTTAGGTTTT | 60.5–62.1 |
rassf1a_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAATCCCTACACCCAAATTTCCATTA | 60.9 | chr3:50378200-50378398
apc_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGAGAGAGAAGTAGTTGTGTAAT | 60.3 |
apc_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCATTCTATCTCCAATAACACCCTAA | 60.9 | chr5:112073447-112073596
cdkn2a_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGATTTTGTTTTTTAAATTTTTTGGAGGGAT | 59.2 |
cdkn2a_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCCCAACCTAAAACRACTTCAAAAATA | 60.1–61.7 | chr9:21974960-21975097
dapk1_p1_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTTTYGGAGTGTGAGGAGGATAGT | 60.9–62.9 |
dapk1_p1_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGRACRACRAAAACACAACTAAAAAATAAATA | 58.5–62.6 | chr9:90112783-90112938
dapk1_p2_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGYGGAGGGATYGGGGAGTTTTT | 62.1–65.5 |
dapk1_p2_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCCRCCTTAACCTTCCCAATTA | 63.6–65.2 | chr9:90112991-90113144
dapk1_i1_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGGAGGYGGGGAGGTTAGTTAT | 61.2–63.2 |
dapk1_i1_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAAATAAAAAAAAACACCCTTTATTAAAACTAA | 59.8 | chr9:90113588-90113759
gstp1_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTTTGGGAAAGAGGGAAAGGTTTTT | 60.3 |
gstp1_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGRCRACCTCCRAACCTTATAAAAATAA | 58.4–62.9 | chr11:67351064-67351273
cdh1_snp_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGATTTTAGTAATTTTAGGTTAGAGGGTT | 59.2 |
cdh1_snp_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAAAAATAAATACRTAACTACAACCAAATAAA | 59–60.2 | chr16:68771006-68771197
cdh1_3ê_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTYGGAATTGTAAAGTATTTGTGAGT | 60.1–61.7 |
cdh1_3ê_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGATCAAAAAATCCRAAATACCTACAACAA | 59.5–61.5 | chr16:68771201-68771385
brca1_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTTTAGTTATTTGAGAAATTTTATAGTTTGTT | 59 |
brca1_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAATTTCRTATTCTAAAAAACTACTACTTAA | 58.5–59.8 | chr17:41277330-41277493
AluSx_1_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGAGATTAGTTTGGTTAATATGGTGAAATT | 59.9 |
AluSx_1_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCTCTATCRCCCAAACTAAAATACAATA | 60.8–62.1 |
AluSx_2_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTTTGTAATTTTAGTATTTTGGGAGGT | 60.8 |
AluSx_2_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAACCTCCCRAATAACTAAAACTACAA | 60.1–61.7 |
L1ME_ORF2_1_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGATGATAAAAGGGTTAATTTATTAGAAAGAT | 59.8 |
L1ME_ORF2_1_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCTATCTAATTATTCTRTCAATTACTAAAAA | 58.5–59.8 |
L1ME_ORF2_2_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGATTGATAAAGAAGAAAATAGATAAGATAT | 59.8 |
L1ME_ORF2_2_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCTATTCAAATTTTCTATTTCTTTTTAAATCAA | 59.8 |
foxe3_2_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTTTTGGGGAGGTTTATTTGAGGT | 59.2 |
foxe3_2_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAACRCAAAATATACTCCAAACCAAAATA | 59.9–61.5 | chr1
foxp3_1_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTTTGGGTTTAGGGTTTTATTTGTAGT | 59.2 |
foxp3_1_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGACCCAAAACCTCAAACCTACTAAA | 60.3 | chrX
foxp3_2_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTTTTTGGGGATGGGTTAAGGGTT | 60.9 |
foxp3_2_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCAACCAATACCTACTTTAACCAAAAA | 60.1 | chrX
tlx3_1_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTTYGGTTTAAGAAAGATGATATAGAGTT | 59.9–61.5 |
tlx3_1_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTCCATCCTAAACRAACRAAAAAACTAA | 59.2–62.1 | chr5
tlx3_2_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGGYGTTAGTTATTTGGGAGGGTTT | 59.2–60.9 |
tlx3_2_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAACRCTAAACTCAAATTCACACTATAAA | 59.5–61.5 | chr5
uniq_noCG_1_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGAGTTATGTAGTTTTAGTTAGAAGTTT | 59.2 |
uniq_noCG_1_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAAATCTAAATTTTAACACCTAAAACTATTTTAA | 59.8 | chr5
uniq_noCG_2_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGATATGAAAGGTTGGTTTTATTGTTGAAT | 59.9 |
uniq_noCG_2_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAAAATAAACTTAATAACTCTACTCTTATATA | 59 | chr5
mgmt_1_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTTGAGTTAGGTTTTGGTAGTGTT | 60.3 |
mgmt_1_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCTAATACCRCTCCCCTAATCAAAA | 60.3–62 | chr10
mgmt_2_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTGGTAGTTTYGAGTGGTTTTGT | 59.2–60.9 |
mgmt_2_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAACTAAACAACACCTAAAAAACACTTAA | 59.9 | chr10
mito_1_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTATTTATTTTTAATAGTATATAGTATATAAAGTT | 58.5 |
mito_1_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGACTTTAACTACCCCCAAATATTATAA | 58.4 | chrM
mito_2_plus_F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGATGATTTTTAATAGGGGTTTTTTTAGTTT | 59.2 |
mito_2_plus_R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGCRTATCRAAAACCTTTTTAAACAAATAATA | 58.5–61 | chrM
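Every forward primer in Table 2 begins with the same 5′ sequence (TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG), every reverse primer with another (GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG), and the remaining template-specific portion may contain the degenerate IUPAC codes Y (C or T) and R (A or G). The Python 3 sketch below strips the shared prefix and enumerates the concrete variants of a degenerate primer. Reading the prefixes as Illumina overhang adapters, and the Y/R positions as CpG sites that may be methylated or unmethylated, are our assumptions; the function names are hypothetical.

```python
from itertools import product

# Assumption: these shared 5' prefixes are Illumina overhang adapter
# sequences rather than template sequence.
FORWARD_PREFIX = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"
REVERSE_PREFIX = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG"

# IUPAC codes occurring in Table 2: Y = C or T, R = A or G.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "CT", "R": "AG"}


def template_specific(primer):
    """Return the primer with its shared 5' prefix removed."""
    for prefix in (FORWARD_PREFIX, REVERSE_PREFIX):
        if primer.startswith(prefix):
            return primer[len(prefix):]
    raise ValueError("primer does not start with a known prefix")


def expand_degenerate(seq):
    """List every concrete sequence encoded by a degenerate sequence."""
    options = [IUPAC[base] for base in seq]
    return ["".join(bases) for bases in product(*options)]


# mandatory01_plus_F from Table 2
primer = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGAGAAGTTTGGTYGTTGYGTTTTTAT"
core = template_specific(primer)
variants = expand_degenerate(core)
print(core)           # GAGAAGTTTGGTYGTTGYGTTTTTAT
print(len(variants))  # 4: two Y positions, two options each
```

Because each Y or R doubles the number of concrete sequences, degenerate primers are presumably why several entries in Table 2 report a Tm range rather than a single value.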
Our initial QC assessment indicated high bisulfite conversion efficiency, with very few unconverted cytosines at non-CpG positions in the reads. An additional amplicon corresponding to a sequence containing no CpG sites was also included as a control; all cytosines in this amplicon were observed to have been converted to thymine [1]. The data included here are the Sequence Read Archive files generated from our experiment. These were aligned to the hg38 reference genome using Bismark v0.9.0, producing a BAM file for each sample. The Bismark_methylation_extractor command then writes the methylation status of each cytosine in each read to a tab-delimited file. Methpat operates on this output file to generate both a summary tab-delimited file of read pattern counts and an HTML file for visualization. We have included the BAM files, Bismark_methylation_extractor output files and Methpat output files as supporting data. To extract and summarize DNA methylation pattern counts, Methpat requires a Browser Extensible Data (BED)-like file that contains the coordinates of each amplicon of interest, its size and its primer lengths. The flow of data is summarized in Fig. 1.
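A conversion check of this kind can be approximated from bismark_methylation_extractor output. In its per-context files, each line gives a read ID, a methylation state (+ or -), the chromosome, the position and a call letter; lowercase x and h mark unmethylated (converted) cytosines in CHG and CHH context, and uppercase X and H mark unconverted ones. The Python 3 sketch below is a minimal illustration under that layout, not the QC procedure used in this study; the function name and example lines are hypothetical.

```python
def conversion_efficiency(lines):
    """Fraction of non-CpG cytosine calls that were converted by bisulfite.

    Assumes bismark_methylation_extractor context-file lines:
    read ID, state (+/-), chromosome, position, call letter.
    x/h = unmethylated (converted) CHG/CHH; X/H = unconverted CHG/CHH.
    """
    converted = unconverted = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # e.g. the "Bismark methylation extractor version ..." header
        call = fields[4]
        if call in ("x", "h"):
            converted += 1
        elif call in ("X", "H"):
            unconverted += 1
    total = converted + unconverted
    return converted / total if total else float("nan")


# Hypothetical extractor output for two reads
example = [
    "Bismark methylation extractor version v0.9.0\n",
    "read1\t-\tchr4\t154710470\tx\n",
    "read1\t-\tchr4\t154710475\th\n",
    "read2\t-\tchr4\t154710470\tx\n",
    "read2\t+\tchr4\t154710475\tH\n",
]
print(conversion_efficiency(example))  # 0.75
```

In a real run the same function could be fed the lines of the CHG and CHH context files; a value close to 1 corresponds to the "very few unconverted cytosines" reported above.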
Fig. 1

Flow of data towards visualization via Methpat. Raw FASTQ files are aligned to the hg38 reference genome in bisulfite space. (a) The hg38 reference is prepared for Bismark using Bismark_genome_preparation with default parameters. (b) Bismark aligns raw reads from the FASTQ files to generate BAM alignment files. (c) Bismark_methylation_extractor then extracts the methylation status of every cytosine in each aligned read and outputs a tab-delimited file on which Methpat operates. Methpat requires this file along with a BED-formatted file describing each amplicon of interest, including its start and end coordinates and its primer lengths. Methpat outputs a summary tab-delimited file containing read counts of the DNA methylation patterns of the amplicons of interest, and an HTML file for visualization and publication-quality figures.
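The central summarization step in this flow, turning per-cytosine calls into per-read methylation patterns for one amplicon, can be sketched in Python 3. This is not Methpat's implementation: the Amplicon record, its 1-based inclusive coordinates and its fields are hypothetical stand-ins for Methpat's BED-like file, and only CpG calls (Z methylated, z unmethylated) from the bismark_methylation_extractor output are used. Calls inside the primer-binding regions are skipped, on the assumption that those bases derive from the primers rather than from the template.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Amplicon(NamedTuple):
    """Hypothetical amplicon record standing in for Methpat's BED-like file."""
    chrom: str
    start: int           # first amplicon base, 1-based inclusive (assumed)
    end: int             # last amplicon base, 1-based inclusive (assumed)
    fwd_primer_len: int
    rev_primer_len: int

    def in_interior(self, pos):
        """True if pos lies between the two primer-binding regions."""
        return self.start + self.fwd_primer_len <= pos <= self.end - self.rev_primer_len


def count_patterns(lines, amplicon):
    """Count per-read CpG methylation patterns inside one amplicon.

    lines follow the bismark_methylation_extractor CpG context layout
    (read ID, state, chromosome, position, call; Z = methylated,
    z = unmethylated). Returns the sorted CpG positions and a Counter of
    pattern strings: '1' methylated, '0' unmethylated, '-' not covered.
    """
    calls_by_read = defaultdict(dict)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5 or fields[4] not in ("Z", "z"):
            continue  # header line or non-CpG call
        read_id, _state, chrom, pos, call = fields[:5]
        pos = int(pos)
        if chrom == amplicon.chrom and amplicon.in_interior(pos):
            calls_by_read[read_id][pos] = "1" if call == "Z" else "0"
    sites = sorted({pos for calls in calls_by_read.values() for pos in calls})
    patterns = Counter(
        "".join(calls.get(pos, "-") for pos in sites)
        for calls in calls_by_read.values()
    )
    return sites, patterns


# mandatory01 coordinates from Table 2; primer lengths (26, 23) are the
# template-specific parts of its primers, used here for illustration only.
amp = Amplicon("chr4", 154710460, 154710544, 26, 23)
example = [
    "Bismark methylation extractor version v0.9.0\n",
    "r1\t+\tchr4\t154710490\tZ\n",
    "r1\t+\tchr4\t154710500\tZ\n",
    "r2\t-\tchr4\t154710490\tz\n",
    "r2\t-\tchr4\t154710500\tz\n",
    "r3\t+\tchr4\t154710490\tZ\n",
    "r3\t+\tchr4\t154710500\tZ\n",
    "r4\t+\tchr4\t154710470\tZ\n",  # inside the forward primer: ignored
    "r5\t+\tchr4\t154710490\tZ\n",  # covers only the first CpG site
]
sites, patterns = count_patterns(example, amp)
print(sites)           # [154710490, 154710500]
print(dict(patterns))  # {'11': 2, '00': 1, '1-': 1}
```

Whether the Table 2 coordinates include the primer-binding bases is not stated here, so the example numbers demonstrate the logic only.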

Given the unprecedented depth of coverage of the amplicons investigated, even in a single MiSeq run, our data could be used to investigate co-methylation [8]. We interrogated a variety of genomic regions, including repetitive elements and the mitochondrial genome, which remain a challenge for most short-read aligners. DNA methylation at repetitive elements has always been difficult to interpret, and these elements are generally assumed to be methylated [9]. However, the dynamics of repetitive element DNA methylation in cancer [10] and development [11] remain areas of interest that can now be properly interpreted with massively parallel sequencing and visualization tools such as Methpat.
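Because each read records the joint methylation state of several neighboring CpG sites, read-pattern counts allow co-methylation to be quantified directly. One common measure, shown here only as an illustration and not necessarily the approach taken in [8], is the phi (Pearson) correlation between the binary states of two sites across the reads that cover both. The Python 3 sketch below assumes patterns encoded as strings of '1' (methylated), '0' (unmethylated) and '-' (not covered) with hypothetical read counts; the function name is our own.

```python
import math


def comethylation_r(patterns, i, j):
    """Phi (Pearson) correlation between methylation at CpG sites i and j.

    patterns maps pattern strings ('1' methylated, '0' unmethylated,
    '-' not covered) to read counts; only reads covering both sites count.
    Returns None when either site is invariant across those reads.
    """
    n11 = n10 = n01 = n00 = 0
    for pattern, count in patterns.items():
        a, b = pattern[i], pattern[j]
        if a == "-" or b == "-":
            continue  # read does not cover both sites
        if a == "1" and b == "1":
            n11 += count
        elif a == "1":
            n10 += count
        elif b == "1":
            n01 += count
        else:
            n00 += count
    # Marginal totals of the 2x2 table
    row1, row0 = n11 + n10, n01 + n00
    col1, col0 = n11 + n01, n10 + n00
    denom = math.sqrt(row1 * row0 * col1 * col0)
    if denom == 0:
        return None
    return (n11 * n00 - n10 * n01) / denom


# Hypothetical counts: sites 0 and 1 are mostly methylated together.
print(comethylation_r({"11": 40, "00": 40, "10": 10, "01": 10}, 0, 1))  # 0.6
# Perfect co-methylation; the '1-' read is excluded because it misses site 1.
print(comethylation_r({"11": 10, "00": 10, "1-": 5}, 0, 1))  # 1.0
```

A percentage-based summary cannot support this calculation, since it averages away which sites were methylated on the same molecule.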

Availability of software and requirements

Project name: Methpat
Project home page: http://bjpop.github.io/methpat/
Operating system(s): any POSIX-like operating system (e.g., Linux, OS X)
Programming language: Python 2.7, HTML and JavaScript
Other requirements: A web browser to view the visualization output (HTML file); suggested browsers include Firefox, Chrome or Safari. Methpat requires output files generated by Bismark (http://www.bioinformatics.babraham.ac.uk/projects/bismark/) and its Bismark_methylation_extractor command. Methpat can be accessed directly from http://bjpop.github.io/methpat/, where further instructions are also available.
License: 3-clause BSD License
Any restrictions to use by non-academics: None
A flow diagram of analytical requirements and files can be found in Fig. 1.

Availability of supporting data and materials

Sequence files associated with the main research publication are deposited in GEO under GSE67856 [5]. The remaining files are deposited in GEO under GSE71804 [6]. BAM files, bismark_methylation_extractor output files and Methpat output files for each sample analyzed in this paper are available in the GigaScience GigaDB repository [12].
References

Mario F Fraga, Manel Esteller. DNA methylation: a profile of methods and applications. Biotechniques. 2002-09.

Jianzhong Su, Xiujuan Shao, Hongbo Liu, Shengqiang Liu, Qiong Wu, Yan Zhang. Genome-wide dynamic changes of DNA methylation of repetitive elements in human embryonic stem cells and fetal fibroblasts. Genomics. 2011-10-25.

Ruslan Akulenko, Volkhard Helms. DNA co-methylation analysis suggests novel functional associations between gene pairs in breast cancer samples. Hum Mol Genet. 2013-04-09.

Honor J Hugo, Maria I Kokkinos, Tony Blick, M Leigh Ackland, Erik W Thompson, Donald F Newgreen. Defining the E-cadherin repressor interactome in epithelial-mesenchymal transition: the PMC42 model as a case study. Cells Tissues Organs. 2010-11-02.

Ann S Wilson, Barbara E Power, Peter L Molloy. DNA hypomethylation and human diseases. Biochim Biophys Acta. 2006-09-01.

Thomas Mikeska, Ida L M Candiloro, Alexander Dobrovic. The implications of heterogeneous DNA methylation for the accurate quantification of methylation. Epigenomics. 2010-08.

J A Yoder, C P Walsh, T H Bestor. Cytosine methylation and the ecology of intragenomic parasites. Trends Genet. 1997-08.

Nicholas C Wong, Bernard J Pope, Ida L Candiloro, Darren Korbie, Matt Trau, Stephen Q Wong, Thomas Mikeska, Xinmin Zhang, Mark Pitman, Stefanie Eggers, Stephen R Doyle, Alexander Dobrovic. MethPat: a tool for the analysis and visualisation of complex methylation patterns obtained by massively parallel sequencing. BMC Bioinformatics. 2016-02-24.

Eugene Andres Houseman, William P Accomando, Devin C Koestler, Brock C Christensen, Carmen J Marsit, Heather H Nelson, John K Wiencke, Karl T Kelsey. DNA methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinformatics. 2012-05-08.

Nicholas C Wong, Bernard J Pope, Ida Candiloro, Darren Korbie, Matt Trau, Stephen Q Wong, Thomas Mikeska, Bryce J W van Denderen, Erik W Thompson, Stefanie Eggers, Stephen R Doyle, Alexander Dobrovic. Exemplary multiplex bisulfite amplicon data used to demonstrate the utility of Methpat. Gigascience. 2015-11-26.

