Nan Lin1, Jinpeng Liu1, James Castle1, Jun Wan2, Aditi Shendre3, Yunlong Liu2, Chi Wang1, Chunyan He1,4.
Abstract
A newly developed platform, the Illumina TruSeq Methyl Capture EPIC library prep (TruSeq EPIC), builds on the content of the Infinium MethylationEPIC Beadchip Microarray (EPIC-array) and leverages the power of next-generation sequencing for targeted bisulphite sequencing. We empirically examined the performance of TruSeq EPIC and EPIC-array in assessing genome-wide DNA methylation in breast tissue samples. TruSeq EPIC provided data with a much higher density in the regions than EPIC-array (~2.74 million CpGs with at least 10X coverage vs ~752 K CpGs, respectively). Approximately 398 K CpGs were common to the two platforms and measured in every sample. Overall, there was high concordance in methylation levels between the two platforms (Pearson correlation r = 0.98, P < 0.0001). However, we observed that TruSeq EPIC measurements provided a wider dynamic range and likely a higher quantitative sensitivity for CpGs that were either hypo- or hyper-methylated (β close to 0 or 1, respectively). In addition, when comparing different breast tissue types, TruSeq EPIC identified more differentially methylated CpGs than EPIC-array, not only among the additional sites interrogated by TruSeq EPIC alone, but also among the common sites interrogated by both platforms. Our results suggest that both platforms show high reproducibility and reliability in genome-wide DNA methylation profiling, while TruSeq EPIC offers a significant improvement over EPIC-array in genomic resolution and coverage. The wider dynamic range and likely higher precision of the TruSeq EPIC estimates may lead to the identification of novel differentially methylated markers that are associated with disease risk.
Keywords: DNA methylation; differentially methylated sites; genome coverage; genomic resolution; Infinium MethylationEPIC Beadchip; methyl-capture sequencing; methylome; microarray; next-generation sequencing; quantitative sensitivity
Year: 2020 PMID: 33048617 PMCID: PMC8216193 DOI: 10.1080/15592294.2020.1827703
Source DB: PubMed Journal: Epigenetics ISSN: 1559-2294 Impact factor: 4.528
Sample information for the 11 breast tissue samples in this study
| Sample ID | Tissue Type | Age | Race | Subtype info |
|---|---|---|---|---|
| K1 | Normal | 70 | White | |
| K2 | Normal | 56 | White | |
| K3 | Normal | 63 | White | |
| AN1 | Adjacent Normal | 71 | White | |
| AN2 | Adjacent Normal | 56 | White | |
| AN3 | Adjacent Normal | 63 | White | |
| AN4 | Adjacent Normal | 43 | White | |
| T1 | Tumour | 71 | White | ER+/PR+/HER2- |
| T2 | Tumour | 56 | White | ER+/PR-/HER2- |
| T3 | Tumour | 63 | White | ER+/PR+/HER2- |
| T4 | Tumour | 43 | White | ER-/PR-/HER2+ |
Note: T and AN samples were paired samples from the same breast cancer patients; K samples from healthy women were matched with T and AN samples on age (within one year) and race.
Summary of sequencing alignment and duplication rates for the 11 breast tissue samples in the study for TruSeq EPIC
| Sample ID | Raw Paired Reads | Paired Reads Analyzed | Unique Aligned | Mapping Efficiency (%) | Duplication Rate (%) | Usable Aligned | Reads in Target Region (%) |
|---|---|---|---|---|---|---|---|
| K1 | 48,362,869 | 45,836,705 | 38,908,769 | 84.9 | 53.1 | 18,247,939 | 95.73 |
| K2 | 57,267,567 | 54,819,987 | 46,870,975 | 85.5 | 29.6 | 32,989,161 | 94.78 |
| K3 | 43,425,858 | 41,897,565 | 35,948,675 | 85.8 | 33.5 | 23,903,913 | 96.48 |
| AN1 | 54,299,349 | 51,273,091 | 43,424,920 | 84.7 | 41.2 | 25,535,830 | 94.43 |
| AN2 | 59,511,340 | 56,987,839 | 48,549,950 | 85.2 | 32.1 | 32,978,423 | 95.76 |
| AN3 | 52,000,198 | 49,803,822 | 42,539,248 | 85.4 | 26.5 | 31,254,585 | 95.78 |
| AN4 | 62,153,856 | 59,671,655 | 51,034,628 | 85.5 | 28.8 | 36,331,972 | 96.01 |
| T1 | 84,397,277 | 80,140,512 | 68,142,382 | 85.1 | 43.5 | 38,532,574 | 94.96 |
| T2 | 57,345,325 | 54,985,466 | 46,779,470 | 85.1 | 32.5 | 31,583,044 | 95.88 |
| T3 | 67,981,266 | 65,416,988 | 55,886,719 | 85.5 | 27.5 | 40,510,431 | 96.23 |
| T4 | 61,435,469 | 59,012,172 | 50,443,336 | 85.5 | 29.7 | 35,447,364 | 96.06 |
| Average |
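
The columns of this table appear to be linked by simple ratios. The sketch below checks that reading for sample K1. It assumes, which the source does not state, that mapping efficiency is unique aligned reads divided by paired reads analyzed, and that usable aligned reads are the unique aligned reads left after removing duplicates.

```python
# Values for sample K1, copied from the alignment summary table.
analyzed = 45_836_705
unique_aligned = 38_908_769
duplication_rate = 53.1  # percent
usable_reported = 18_247_939

# Assumption: mapping efficiency = unique aligned / paired reads analyzed.
mapping_efficiency = 100 * unique_aligned / analyzed

# Assumption: usable reads = unique aligned reads minus duplicated reads.
usable_estimate = unique_aligned * (1 - duplication_rate / 100)

print(f"mapping efficiency: {mapping_efficiency:.1f}%")
print(f"usable estimate: {usable_estimate:,.0f} (reported {usable_reported:,})")
```

Because the duplication rate is reported to one decimal place, the estimate is expected to match the reported usable count only approximately.
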
Figure 1. Genomic coverage of the TruSeq EPIC at different sequencing depths for the 11 breast tissue samples. T, breast tumour tissue; AN, adjacent normal breast tissue; K, normal breast tissue
The number of CpGs detected by TruSeq EPIC at different sequencing depths
| Sample ID | ≥1X | ≥10X | ≥20X | ≥30X | ≥40X | ≥50X |
|---|---|---|---|---|---|---|
| K1 | 3,326,519 | 2,906,961 | 1,935,191 | 1,147,526 | 649,988 | 345,863 |
| K2 | 3,324,716 | 2,831,944 | 1,791,353 | 997,678 | 521,914 | 249,524 |
| K3 | 3,325,728 | 2,826,205 | 1,763,117 | 981,020 | 520,099 | 255,283 |
| AN1 | 3,324,094 | 2,862,913 | 1,873,173 | 1,087,939 | 602,832 | 313,637 |
| AN2 | 3,323,823 | 2,810,763 | 1,763,491 | 990,840 | 529,811 | 262,480 |
| AN3 | 3,318,153 | 2,564,387 | 1,404,857 | 695,249 | 318,028 | 129,452 |
| AN4 | 3,316,749 | 2,544,485 | 1,355,798 | 635,128 | 265,809 | 96,014 |
| T1 | 3,322,051 | 2,725,132 | 1,642,768 | 902,016 | 475,439 | 238,078 |
| T2 | 3,322,979 | 2,815,293 | 1,805,303 | 1,042,472 | 575,265 | 300,211 |
| T3 | 3,319,143 | 2,592,923 | 1,468,107 | 771,884 | 392,301 | 193,353 |
| T4 | 3,320,280 | 2,702,866 | 1,664,471 | 940,301 | 509,364 | 263,823 |
| Average |
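
The depth thresholds matter because a sequencing-based β value is a ratio of read counts, so its precision depends on coverage. As a minimal sketch, assuming the usual bisulphite-sequencing definition β = methylated reads / (methylated + unmethylated reads), which the source does not spell out, the ≥10X filter could be applied as follows. The function name and the read counts are hypothetical.

```python
from typing import Optional

MIN_DEPTH = 10  # the >=10X threshold used in the platform comparison


def beta_value(methylated: int, unmethylated: int,
               min_depth: int = MIN_DEPTH) -> Optional[float]:
    """Return methylated / total reads, or None if coverage is below min_depth."""
    depth = methylated + unmethylated
    if depth < min_depth:
        return None  # site is excluded from the analysis
    return methylated / depth


# Hypothetical read counts for three CpG sites (not data from the study).
sites = {"cpg_a": (18, 2), "cpg_b": (3, 4), "cpg_c": (0, 25)}
for name, (meth, unmeth) in sites.items():
    print(name, beta_value(meth, unmeth))
```
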
Figure 2. The distribution of CpGs by different genomic annotations from TruSeq EPIC (≥10X) and EPIC-array platforms. (a) CpG-island context; (b) genomic function context; (c) regulatory region context; (d) chromosome
Figure 3. Distribution of methylation β values from the two platforms for our 11 breast tissue samples. (a) the common CpGs across the two platforms in TruSeq EPIC (≥10X); (b) the common CpGs in EPIC-array; (c) all CpGs detected in TruSeq EPIC (≥10X); (d) all CpGs detected in EPIC-array
Figure 4. Pearson correlation between methylation β-values of the common CpGs across TruSeq EPIC (≥10X) and EPIC-array platforms by different sequencing depth for our 11 breast tissue samples
Figure 5. Scatterplots and Pearson correlations of the mean methylation β values for the common CpGs from TruSeq EPIC (≥10X) and EPIC-array data. Red dotted lines denote Y = X. (a) T samples; (b) AN samples; (c) K samples; (d) all samples combined
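
The Pearson correlations in Figures 4 and 5 compare paired β values for the same CpGs on the two platforms. A minimal sketch of that calculation is given here in plain Python. The function name and the β values are hypothetical, and the authors' actual software is not named in this record.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical mean beta values for five common CpGs (not study data).
truseq_beta = [0.02, 0.15, 0.48, 0.83, 0.97]
array_beta = [0.06, 0.20, 0.50, 0.78, 0.91]
print(f"r = {pearson_r(truseq_beta, array_beta):.3f}")
```

A high r indicates a strong linear relationship but does not by itself show that the two platforms agree on absolute β values, which is why the scatterplots also draw the Y = X line.
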
Concordance of the mean methylation β values of the common CpGs from TruSeq EPIC (≥10X) and EPIC-array platforms
| EPIC-array (rows) / TruSeq EPIC (columns) | Hypo (β < 0.3) | Hemi (0.3 ≤ β ≤ 0.7) | Hyper (β > 0.7) |
|---|---|---|---|
| Hypo (β < 0.3) | 116,474 | 1,465 | 10 |
| Hemi (0.3 ≤ β ≤ 0.7) | 20,982 | 77,966 | 25,337 |
| Hyper (β > 0.7) | 19 | 8,632 | 147,693 |
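
The concordance table can be summarised with a few sums. The sketch below assumes that rows are EPIC-array categories and columns are TruSeq EPIC categories, both in the order hypo, hemi, hyper. That orientation is inferred from the table headers rather than stated explicitly. The `categorize` helper is a hypothetical name for the β binning given in the column headers.

```python
# Counts copied from the concordance table.
# Assumption: rows = EPIC-array, columns = TruSeq EPIC, order hypo/hemi/hyper.
counts = [
    [116_474, 1_465, 10],
    [20_982, 77_966, 25_337],
    [19, 8_632, 147_693],
]


def categorize(beta: float) -> str:
    """Bin a beta value using the thresholds from the table header."""
    if beta < 0.3:
        return "hypo"
    if beta <= 0.7:
        return "hemi"
    return "hyper"


total = sum(sum(row) for row in counts)
agree = sum(counts[i][i] for i in range(3))

# Hemi on the array but hypo or hyper on TruSeq EPIC, and the reverse.
array_hemi_only = counts[1][0] + counts[1][2]
truseq_hemi_only = counts[0][1] + counts[2][1]

print(f"same category on both platforms: {agree / total:.1%}")
print(f"array hemi, TruSeq hypo/hyper: {array_hemi_only:,}")
print(f"TruSeq hemi, array hypo/hyper: {truseq_hemi_only:,}")
```

Under that reading, most common CpGs fall in the same category on both platforms, and many more sites are hemi on the array but hypo or hyper on TruSeq EPIC than the reverse, which matches the wider dynamic range described in the abstract.
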
Figure 6. Scatterplots and Pearson correlations of the mean differences of methylation β values (∆β) for the common CpGs between two tissue types from TruSeq EPIC (≥10X) and EPIC-array data. Red dotted lines denote Y = X. (a) T vs. K; (b) T vs. AN; and (c) AN vs. K
Concordance of the mean differences of methylation β values (∆β) between two tissue types from TruSeq EPIC (≥10X) and EPIC-array data
| T vs. K | TruSeq EPIC | | |
|---|---|---|---|
| | Δβ | Δ | Δ |
| Δ | 16,425 | 1,319 | 0 |
| | 19,263 | 333,883 | 15,628 |
| Δ | 0 | 2,679 | 9,382 |
| T vs. AN | Δ | Δ | Δ |
| Δ | 20,458 | 1,524 | 0 |
| | 22,002 | 335,620 | 11,297 |
| Δ | 0 | 2,026 | 5,652 |
| AN vs. K | Δ | Δ | Δ |
| Δ | 28 | 22 | 0 |
| | 2,539 | 392,731 | 3,242 |
| Δ | 0 | 8 | 9 |
Figure 7. The number of differentially methylated positions (DMPs) between tissue types identified by TruSeq EPIC (≥10X) and EPIC-array platforms. DMPs were defined by FDR < 0.05 and |∆β| ≥ 0.1. (a) the common CpGs across the two platforms; (b) all CpGs detected by each platform
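
The DMP definition in the Figure 7 caption combines a significance filter with an effect-size filter. The sketch below illustrates it under one assumption: FDR is controlled with the Benjamini-Hochberg procedure, which this record does not confirm. The function names, p-values, and ∆β values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def call_dmps(delta_betas, pvalues, fdr=0.05, min_delta=0.1):
    """Indices of CpGs with adjusted p < fdr and |delta beta| >= min_delta."""
    qvalues = benjamini_hochberg(pvalues)
    return [
        i for i, (delta, q) in enumerate(zip(delta_betas, qvalues))
        if q < fdr and abs(delta) >= min_delta
    ]


# Hypothetical per-CpG results for one tissue comparison (not study data).
delta_betas = [0.25, 0.05, -0.15, 0.12]
pvalues = [0.01, 0.04, 0.03, 0.005]
print(call_dmps(delta_betas, pvalues))
```

In this toy input every site passes the FDR filter, so the second site is excluded only because its |∆β| is below 0.1.
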
Summary of the key features of TruSeq EPIC and the EPIC-array platforms
| Platform | Technology | Method | Resolution | DNA Amount | #CpGs | Analytic Pipeline* | Cost** |
|---|---|---|---|---|---|---|---|
| TruSeq EPIC | NGS-based | Methylation Sequencing | Single base | 500 ng | 3.3 M | +++ | ++ |
| EPIC-array | Microarray-based | Methylation Array | Single base | 250 ng | 850 K | + | + |
*, TruSeq EPIC requires an analytic pipeline for the sequencing data, including alignment, base calling, and QC criteria on sequencing depth; EPIC-array requires QC criteria that remove CpGs that could be affected by poor hybridization, such as CpGs close to known SNPs.
**, The cost changes over time and also depends on the service provider and the number of samples being processed. As of early 2020, the cost of TruSeq EPIC at a depth of 50 M reads generally ranges between $550 and $650 per sample, and the cost of EPIC-array generally ranges between $350 and $450 per sample.