Shalima S Nair, Phuc-Loi Luu, Wenjia Qu, Madhavi Maddugoda, Lily Huschtscha, Roger Reddel, Georgia Chenevix-Trench, Martina Toso, James G Kench, Lisa G Horvath, Vanessa M Hayes, Phillip D Stricker, Timothy P Hughes, Deborah L White, John E J Rasko, Justin J-L Wong, Susan J Clark.
Abstract
BACKGROUND: Comprehensive genome-wide DNA methylation profiling is critical to gain insights into epigenetic reprogramming during development and disease processes. Among the different genome-wide DNA methylation technologies, whole genome bisulphite sequencing (WGBS) is considered the gold standard for assaying genome-wide DNA methylation at single-base resolution. However, the high sequencing cost of achieving the optimal depth of coverage limits its application in both basic and clinical research. Achieving 15× coverage of the human methylome using WGBS requires approximately three lanes of 100-bp paired-end Illumina HiSeq 2500 sequencing. It is therefore important that advances in sequencing technologies are developed to enable cost-effective high-coverage sequencing.
Keywords: DNA methylation; Epigenetics; HiSeq X Ten; HiSeq 2500; SNP; Whole genome bisulphite sequencing
Year: 2018 PMID: 29807544 PMCID: PMC5971424 DOI: 10.1186/s13072-018-0194-0
Source DB: PubMed Journal: Epigenetics Chromatin ISSN: 1756-8935 Impact factor: 4.954
Comparison of different library preparation kits on intact genomic DNA using HiSeq 2500 Platform
| Library preparation method | Library preparation kit | Input amount (ng) | Size selection (bp) | Duplicate read (%) | Fragment size (stdev) | Coverage<sup>b</sup> | Ratio of bias in CpG island/shores |
|---|---|---|---|---|---|---|---|
| Pre-BS | KAPA LTP (Kapa Biosystems) | 1000 | < 300<sup>a</sup> | 1.00 | 175 bp (42 bp) | 8.8× | 0.7:0.9 |
| Pre-BS | KAPA LTP (Kapa Biosystems) | 1000 | > 300 | 3.50 | 226 bp (69 bp) | 12.4× | 0.3:0.7 |
| Pre-BS | KAPA LTP (Kapa Biosystems) | 1000 | < 400 | 0.72 | 229 bp (62 bp) | 13.5× | 0.3:0.6 |
| Pre-BS | KAPA LTP (Kapa Biosystems) | 1000 | > 400 | 1.90 | 237 bp (82 bp) | 17.2× | 0.8:0.9 |
| Pre-BS | KAPA Hyperprep (Kapa Biosystems) | 100 | < 300<sup>a</sup> | 5.50 | 164 bp (72 bp) | 6.3× | 0.3:0.7 |
| Pre-BS | KAPA Hyperprep (Kapa Biosystems) | 100 | > 300 | 5.50 | 189 bp (88 bp) | 7.0× | 0.3:0.7 |
| Post-BS | TruSeq DNA methylation (Illumina) | 50 | < 200<sup>a</sup> | 10.00 | 156 bp (77 bp) | 4.0× | 2.1:1.5 |
| Post-BS | TruMethyl WG v1.9 (Cambridge Epigenetix) | 200 | > 300<sup>a</sup> | 0.95 | 227 bp (98 bp) | 12.9× | 0.8:1.1 |
| Post-BS | Accel-NGS Methyl-Seq (Swift Biosciences) | 100 | < 300<sup>a</sup> | 1.40 | 179 bp (61 bp) | 10.7× | 0.7:0.9 |
| Post-BS | Accel-NGS Methyl-Seq (Swift Biosciences) | 100 | > 300 | 1.10 | 202 bp (70 bp) | 12.6× | 0.7:0.9 |

<sup>a</sup> Manufacturer’s recommended method
<sup>b</sup> Two lanes (HiSeq 2500 rapid run)
Fig. 1 Comparing different library preparation methods using genomic DNA on the HiSeq 2500. a Cluster plot of sequencing output metrics obtained from different library preparation methods from intact genomic DNA. b Bar graph showing fragment size distribution for different bead size selections for each library preparation method. c Plot with the x-axis indicating coverage and the y-axis indicating fragment size, illustrating how an increase in fragment size leads to improved coverage
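The relationship described in panel c can be checked against the tabulated values. The following is a minimal sketch, not the authors' analysis: it computes a Pearson correlation between mean fragment size and coverage for the ten intact-DNA libraries in the table above. The helper name `pearson_r` is illustrative, and pooling all kits into one correlation is an assumption made here for simplicity.

```python
from math import sqrt

# (mean fragment size in bp, coverage from two HiSeq 2500 rapid-run lanes),
# copied row by row from the intact genomic DNA table above.
rows = [
    (175, 8.8), (226, 12.4), (229, 13.5), (237, 17.2),  # KAPA LTP
    (164, 6.3), (189, 7.0),                              # KAPA Hyperprep
    (156, 4.0),                                          # TruSeq DNA methylation
    (227, 12.9),                                         # TruMethyl WG v1.9
    (179, 10.7), (202, 12.6),                            # Accel-NGS Methyl-Seq
]


def pearson_r(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)


r = pearson_r(rows)
print(f"Pearson r (fragment size vs coverage) = {r:.2f}")
```

With only ten libraries and several kits mixed together, the coefficient should be read as a rough indication of the trend rather than a formal test.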
Comparison of coverage obtained with different bisulphite library and PhiX loading concentrations
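| Library preparation method | Library preparation kit name | Input amount (ng) | DNA samples | Sequencing platform | Bisulphite library loading concentration | PhiX loading concentration | % of spike-in | Coverage/lane |
|---|---|---|---|---|---|---|---|---|
| Post-BS | TruMethyl WG v1.9 | 200 | Cell line (LNCaP) | HiSeq X Ten | 300 pM | 250 pM | 25% | 8.70× |
| Post-BS | TruMethyl WG v1.9 | 200 | Cell line (LNCaP) | HiSeq X Ten | 250 pM | 250 pM | 25% | 14.50× |
| Post-BS | TruMethyl WG v1.9 | 200 | Cell line (LNCaP) | HiSeq X Ten | 250 pM | 300 pM | 25% | 15.19× |
| Post-BS | TruMethyl WG v1.9 | 200 | 1. Blood sample (2) | HiSeq X Ten | 250 pM | 250 pM | 25% | 15.46× |
| Post-BS | TruMethyl WG v1.9 | 200 | 2. Blood sample (3) | HiSeq X Ten | 250 pM | 250 pM | 25% | 13.87× |
| Post-BS | TruMethyl WG v1.9 | 200 | 3. Blood sample (4) | HiSeq X Ten | 250 pM | 250 pM | 25% | 12.15× |
| Post-BS | TruMethyl WG v1.9 | 200 | 4. Blood sample (6) | HiSeq X Ten | 250 pM | 300 pM | 25% | 20.24× |
| Post-BS | TruMethyl WG v1.9 | 200 | 5. Blood sample (23) | HiSeq X Ten | 250 pM | 300 pM | 25% | 16.59× |
| Post-BS | TruMethyl WG v1.9 | 200 | 6. Blood sample (24) | HiSeq X Ten | 250 pM | 300 pM | 25% | 19.67× |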
Fig. 2 Optimisation of spike-in library and its loading concentration. a Bar plot showing the difference in coverage obtained from sequencing two different loading concentrations of the PhiX spike-in library for six WGBS libraries obtained from blood DNA samples. b Bar plot showing the coverage obtained from both the bisulphite library and genomic library when 25% of genomic library is spiked in instead of PhiX, for five different bisulphite libraries prepared by the TruMethyl WG method. c Box plot showing the coverage distribution of the same set of bisulphite libraries when spiked with either 25% of 250 pM genomic library or 25% of 300 pM PhiX library
Comparison of coverage from bisulphite and genomic library when 25% of genomic library is spiked
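| Library preparation method | Library preparation kit name | Input amount (ng) | WGBS library samples | Sequencing platform | WGBS library loading concentration | WGS library sample<sup>a</sup> | % of spike-in library | WGBS coverage/lane | WGS coverage/lane |
|---|---|---|---|---|---|---|---|---|---|
| Post-BS | TruMethyl WG v1.9 | 200 | 1. Cell line (LNCaP) | HiSeq X Ten | 250 pM | Cell line (LNCaP) | 25% | 15.57× | 7.40× |
| Post-BS | TruMethyl WG v1.9 | 200 | 2. Cell line (LNCaP) | HiSeq X Ten | 250 pM | Cell line (LNCaP) | 25% | 14.10× | 7.02× |
| Post-BS | TruMethyl WG v1.9 | 200 | 3. Cell line (B80-T17-P12) | HiSeq X Ten | 250 pM | Cell line (LNCaP) | 25% | 16.50× | 9.50× |
| Post-BS | TruMethyl WG v1.9 | 200 | 4. Blood sample 3 | HiSeq X Ten | 250 pM | Cell line (LNCaP) | 25% | 14.80× | 5.47× |
| Post-BS | TruMethyl WG v1.9 | 200 | 5. Blood sample 4 | HiSeq X Ten | 250 pM | Cell line (LNCaP) | 25% | 14.40× | 12.47× |

<sup>a</sup> Genomic library used as spike-in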
Comparison of coverage and duplicate reads for bisulphite libraries when spiked with genomic library or PhiX
| Library preparation method | Library preparation kit | Input amount (ng) | Sequencing platform | DNA samples | Coverage with genomic spike-in<sup>a</sup> | Coverage with PhiX spike-in<sup>b</sup> | Duplicate read % with genomic spike-in | Duplicate read % with PhiX spike-in |
|---|---|---|---|---|---|---|---|---|
| Post-BS | TruMethyl WG v1.9 | 200 | HiSeq X Ten | 1. Blood sample (23) | 14.02× | 16.6× | 32 | 39 |
| Post-BS | TruMethyl WG v1.9 | 200 | HiSeq X Ten | 2. Blood sample (24) | 13.66× | 19.67× | 21 | 30 |
| Post-BS | TruMethyl WG v1.9 | 200 | HiSeq X Ten | 3. Blood sample (25) | 12.43× | 18.37× | 22 | 33 |
| Post-BS | TruMethyl WG v1.9 | 200 | HiSeq X Ten | 4. Blood sample (26) | 12.6× | 17.34× | 28 | 36 |
| Post-BS | TruMethyl WG v1.9 | 200 | HiSeq X Ten | 5. Cell line (27) | 12.36× | 16.29× | 33 | 44 |
| Post-BS | TruMethyl WG v1.9 | 200 | HiSeq X Ten | 6. Cell line (28) | 13.37× | 18.33× | 25 | 36 |
| Post-BS | TruMethyl WG v1.9 | 200 | HiSeq X Ten | 7. Blood sample (5) | 11.8× | 15.19× | 36 | 48 |
| Post-BS | TruMethyl WG v1.9 | 200 | HiSeq X Ten | 8. Blood sample (6) | 15.14× | 20.24× | 21 | 28 |
| Post-BS | TruMethyl WG v1.9 | 200 | HiSeq X Ten | 9. Blood sample (13) | 10.86× | 17.41× | 18 | 30 |
| Post-BS | TruMethyl WG v1.9 | 200 | HiSeq X Ten | 10. Blood sample (14) | 11.79× | 15.55× | 31 | 43 |

<sup>a</sup> 25% of 250 pM genomic library spiked in
<sup>b</sup> 25% of 300 pM PhiX library spiked in
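The trade-off in this table (lower WGBS coverage but fewer duplicate reads with a genomic spike-in) can be summarised per library. The sketch below is an illustration only, using the ten paired values from the table; the variable names are hypothetical and no significance test is attempted.

```python
from statistics import mean

# (sample, coverage with genomic spike-in, coverage with PhiX spike-in,
#  duplicate % with genomic spike-in, duplicate % with PhiX spike-in),
# copied from the table above.
rows = [
    ("Blood sample (23)", 14.02, 16.60, 32, 39),
    ("Blood sample (24)", 13.66, 19.67, 21, 30),
    ("Blood sample (25)", 12.43, 18.37, 22, 33),
    ("Blood sample (26)", 12.60, 17.34, 28, 36),
    ("Cell line (27)", 12.36, 16.29, 33, 44),
    ("Cell line (28)", 13.37, 18.33, 25, 36),
    ("Blood sample (5)", 11.80, 15.19, 36, 48),
    ("Blood sample (6)", 15.14, 20.24, 21, 28),
    ("Blood sample (13)", 10.86, 17.41, 18, 30),
    ("Blood sample (14)", 11.79, 15.55, 31, 43),
]

mean_cov_genomic = mean(r[1] for r in rows)
mean_cov_phix = mean(r[2] for r in rows)
mean_dup_genomic = mean(r[3] for r in rows)
mean_dup_phix = mean(r[4] for r in rows)

# Paired differences: PhiX value minus genomic value for the same library.
cov_diff = [r[2] - r[1] for r in rows]
dup_diff = [r[4] - r[3] for r in rows]

print(f"Mean coverage: genomic {mean_cov_genomic:.2f}x, PhiX {mean_cov_phix:.2f}x")
print(f"Mean duplicates: genomic {mean_dup_genomic:.1f}%, PhiX {mean_dup_phix:.1f}%")
print(f"Every library has more duplicates with PhiX: {all(d > 0 for d in dup_diff)}")
```

Keeping the differences paired by library, rather than comparing the two column means alone, avoids mixing sample-to-sample variation into the spike-in effect.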
Comparison of coverage from bisulphite and genomic library when sequenced in a 50:50 ratio on a single lane of HiSeq X Ten
| Library preparation method | Library preparation kit name | Input amount (ng) | WGBS library sample | WGBS library loading concentration | WGS library sample<sup>a</sup> | % of spike-in library | WGBS coverage/lane | WGS coverage/lane |
|---|---|---|---|---|---|---|---|---|
| Post-BS | TruMethyl WG v1.9 | 200 | 1a. Prostate cancer DNA (5287) | 250 pM | 1a. Prostate cancer DNA (5287) | 50% | 13.48× | 13.48× |
| Post-BS | TruMethyl WG v1.9 | 200 | 1b. Prostate cancer DNA (5287) | 250 pM | 1b. Prostate cancer DNA (5287) | 50% | 13.16× | 13.97× |
| Post-BS | TruMethyl WG v1.9 | 200 | 2a. Prostate cancer DNA (5060) | 250 pM | 2a. Prostate cancer DNA (5060) | 50% | 13.20× | 15.38× |
| Post-BS | TruMethyl WG v1.9 | 200 | 2b. Prostate cancer DNA (5060) | 250 pM | 2b. Prostate cancer DNA (5060) | 50% | 12.70× | 15.70× |
| Post-BS | TruMethyl WG v1.9 | 200 | 3a. Prostate cancer DNA (13179) | 250 pM | 3a. Prostate cancer DNA (13179) | 50% | 10.70× | 16.12× |
| Post-BS | TruMethyl WG v1.9 | 200 | 3b. Prostate cancer DNA (13179) | 250 pM | 3b. Prostate cancer DNA (13179) | 50% | 9.80× | 16.83× |
| Post-BS | TruMethyl WG v1.9 | 200 | 4a. Prostate cancer DNA (10738) | 250 pM | 4a. Prostate cancer DNA (10738) | 50% | 11.80× | 16.50× |
| Post-BS | TruMethyl WG v1.9 | 200 | 4b. Prostate cancer DNA (10738) | 250 pM | 4b. Prostate cancer DNA (10738) | 50% | 11.20× | 16.45× |

<sup>a</sup> 50% of corresponding genomic library used as spike-in
Fig. 3 Integrating whole genome and whole genome bisulphite sequencing. a Bar plot depicting the coverage obtained when a genomic library and its corresponding bisulphite library are sequenced on the same lane of the HiSeq X Ten, for four prostate cancer samples sequenced in duplicate (a, b). b A representative IGV plot showing a C to T SNP identified in both the WGS and WGBS data at approximately 13× coverage. c Bar plot indicating the percentage of SNPs from WGBS concordant in spike-in WGS at 13× and 26× coverage. d A representative Venn diagram for one prostate cancer sample (2a) showing the number of SNPs concordant and discordant at 13× and 26× coverage for both WGBS and spike-in WGS
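The concordance measure in panels c and d can be thought of as a set comparison between the two call sets. The sketch below assumes each SNP is keyed by chromosome, position, reference allele and alternate allele, which is a choice made here for illustration; the source does not state how calls were matched. The toy calls are hypothetical and are not data from the study.

```python
def concordance(wgbs_snps, wgs_snps):
    """Percentage of WGBS SNP calls that are also present in the WGS calls."""
    if not wgbs_snps:
        return 0.0
    return 100.0 * len(wgbs_snps & wgs_snps) / len(wgbs_snps)


# Hypothetical toy calls as (chromosome, position, ref allele, alt allele).
wgbs = {
    ("chr1", 1000, "C", "T"),
    ("chr1", 2500, "A", "G"),
    ("chr2", 300, "G", "A"),
    ("chr3", 42, "T", "C"),
}
wgs = {
    ("chr1", 1000, "C", "T"),
    ("chr1", 2500, "A", "G"),
    ("chr2", 300, "G", "C"),  # same site, different alternate allele
    ("chr5", 77, "A", "T"),
}

shared = wgbs & wgs      # the overlap region of the Venn diagram
wgbs_only = wgbs - wgs   # discordant calls seen only in WGBS
wgs_only = wgs - wgbs    # discordant calls seen only in WGS
pct = concordance(wgbs, wgs)

print(f"{len(shared)} shared, {len(wgbs_only)} WGBS-only, {len(wgs_only)} WGS-only")
print(f"WGBS calls concordant with WGS: {pct:.1f}%")
```

Matching on the full tuple means a site called with different alternate alleles counts as discordant in both sets; matching on position alone would be a looser definition.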
Comparison of duplicate reads obtained for the same libraries sequenced on both HiSeq 2500 and HiSeq X Ten
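| Library preparation method | Library preparation kit | Input amount (ng) | Sequencing platform | DNA samples | Raw reads | Duplicate reads (%) | Coverage | Ratio of CpG islands/shores |
|---|---|---|---|---|---|---|---|---|
| Post-BS | TruMethyl WG v1.9 | 200 | HiSeq 2500 rapid run<sup>a</sup> | 1. Cell line (B80-T17-p12) | 303,298,084 | 2.0 | 9.12× | 1.0:1.1 |
| Post-BS | TruMethyl WG v1.9 | 200 | HiSeq 2500 rapid run<sup>a</sup> | 2. Cell line (B80-T17-p95) | 412,412,960 | 2.7 | 12.40× | 0.9:1.1 |
| Post-BS | TruMethyl WG v1.9 | 200 | HiSeq 2500 rapid run<sup>a</sup> | 3. Cell line (B80-T8-p8) | 309,366,352 | 1.9 | 9.48× | 1.0:1.1 |
| Post-BS | TruMethyl WG v1.9 | 200 | HiSeq 2500 rapid run<sup>a</sup> | 4. Cell line (B80-T8-p46) | 337,623,632 | 2.2 | 10.20× | 1.0:1.1 |
| Post-BS | TruMethyl WG v1.9 | 200 | HiSeq 2500 rapid run<sup>a</sup> | 5. Cell line (MCF7) | 232,184,682 | 1.6 | 7.24× | 1.0:1.1 |
| Post-BS | TruMethyl WG v1.9 | 200 | HiSeq 2500 rapid run<sup>a</sup> | 6. Cell line (TAMR) | 28,866,392 | 1.2 | 6.66× | 1.0:1.1 |
| Post-BS | TruMethyl WG v1.9 | 200 | HiSeq X Ten | 1. Cell line (B80-T17-p12) | 549,330,057 | 18 | 16.41× | 1.0:1.1 |
| Post-BS | TruMethyl WG v1.9 | 200 | HiSeq X Ten | 2. Cell line (B80-T17-p95) | 623,032,192 | 18 | 18.83× | 0.8:1.0 |
| Post-BS | TruMethyl WG v1.9 | 200 | HiSeq X Ten | 3. Cell line (B80-T8-p8) | 534,563,634 | 15 | 16.84× | 1.0:1.1 |
| Post-BS | TruMethyl WG v1.9 | 200 | HiSeq X Ten | 4. Cell line (B80-T8-p46) | 551,950,690 | 16 | 17.24× | 1.0:1.1 |
| Post-BS | TruMethyl WG v1.9 | 200 | HiSeq X Ten | 5. Cell line (MCF7) | 634,201,220 | 19 | 23.43× | 0.9:1.0 |
| Post-BS | TruMethyl WG v1.9 | 200 | HiSeq X Ten | 6. Cell line (TAMR) | 571,714,843 | 15 | 20.48× | 0.9:1.0 |

<sup>a</sup> One rapid run is two lanes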
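Because the same six libraries were run on both platforms, the table supports a simple per-sample comparison. The following sketch, with illustrative variable names, computes the fold change in coverage and the increase in duplicate percentage from the tabulated values. It deliberately does not normalise by lane count or read number, so it describes the reported figures only.

```python
# (coverage on HiSeq 2500 rapid run, coverage on HiSeq X Ten,
#  duplicate % on HiSeq 2500, duplicate % on HiSeq X Ten), copied from the table.
samples = {
    "B80-T17-p12": (9.12, 16.41, 2.0, 18),
    "B80-T17-p95": (12.40, 18.83, 2.7, 18),
    "B80-T8-p8": (9.48, 16.84, 1.9, 15),
    "B80-T8-p46": (10.20, 17.24, 2.2, 16),
    "MCF7": (7.24, 23.43, 1.6, 19),
    "TAMR": (6.66, 20.48, 1.2, 15),
}

# Fold change in coverage: HiSeq X Ten relative to HiSeq 2500.
fold = {name: x_cov / h_cov for name, (h_cov, x_cov, _, _) in samples.items()}

# Increase in duplicate reads, in percentage points.
dup_increase = {name: x_dup - h_dup for name, (_, _, h_dup, x_dup) in samples.items()}

for name in samples:
    print(f"{name}: {fold[name]:.2f}-fold coverage, "
          f"+{dup_increase[name]:.1f} duplicate percentage points")
```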
Fig. 4 Correlation between duplicate reads, spike-ins and the sequencing platforms. a, b Plots showing the frequency distribution of duplicate reads for two cell line DNA samples during down-sampling of the raw reads from the HiSeq X Ten to the number of raw reads obtained from the HiSeq 2500. c Box plot showing the difference in duplicate percentage when the same set of ten bisulphite libraries was spiked with 25% of 250 pM genomic library and 25% of 300 pM PhiX library
Fig. 5 Coverage comparison between HiSeq 2500 and HiSeq X Ten. a Plot showing the fraction of the genome covered at different depths for four samples sequenced together on one lane of the HiSeq 2500 versus when each of the samples is sequenced on a single lane of the HiSeq X Ten. The coverage plot for the HiSeq 2500 HO mode is the merged coverage obtained from multiplexing the four samples. b Plot showing the fraction of CpG sites covered at different depths when four clinical samples are sequenced together on one lane of the HiSeq 2500 versus when each of the samples is sequenced on a single lane of the HiSeq X Ten. c, d Box plots showing the coverage distribution across exons, intergenic regions, introns, promoter regions and repeat regions of the genome for a sample sequenced on one lane of HiSeq X Ten (c) and HiSeq 2500 (d)
Comparison of different library preparation kits on FFPET using HiSeq 2500 Platform
| Library preparation method | Library preparation kit | Input amount (ng) | Duplicate read (%) | Fragment size (stdev) | Coverage<sup>a</sup> | Ratio of bias in CpG island/shores |
|---|---|---|---|---|---|---|
| Pre-BS | KAPA LTP (Kapa Biosystems) | 1000 | 4.2 | 138 bp (41 bp) | 5.4× | 0.5:0.8 |
| Pre-BS | KAPA Hyperprep (Kapa Biosystems) | 100 | 5.6 | 105 bp (49 bp) | 3.9× | 0.2:0.6 |
| Post-BS | TruSeq DNA methylation (Illumina) | 50 | 9.6 | 73 bp (37 bp) | 4.2× | 3.3:2.0 |
| Post-BS | TruMethyl WG v1.9 (Cambridge Epigenetix) | 200 | 1.8 | 120 bp (49 bp) | 5.1× | 1.1:1.0 |
| Post-BS | Accel-NGS Methyl-Seq (Swift Biosciences) | 100 | 1.4 | 132 bp (50 bp) | 6.6× | 0.6:0.9 |

<sup>a</sup> Two lanes (HiSeq 2500 rapid run)
Fig. 6 Comparing different library preparation methods using FFPET DNA. Cluster plot of sequencing output metrics obtained from different library preparation methods from FFPET DNA
Comparison of coverage output of FFPET on the HiSeq X Ten using two library preparation kits
| Library preparation method | Library preparation kit name | Input amount (ng) | DNA samples FFPET | Sequencing platform | Fragment size (stdev) | Duplicate read (%) | Coverage/lane | CpG island/shore<sup>a</sup> |
|---|---|---|---|---|---|---|---|---|
| Post-BS | Accel-NGS Methyl-Seq | 100 | 1. Prostate normal (1601) | HiSeq X Ten | 158 bp (51 bp) | 13 | 13.05× | 0.6:1.0 |
| Post-BS | Accel-NGS Methyl-Seq | 100 | 2. Prostate cancer (1601) | HiSeq X Ten | 177 bp (53 bp) | 18 | 13.97× | 0.4:0.9 |
| Post-BS | TruMethyl WG v1.9 | 200 | 1. Prostate normal (1601) | HiSeq X Ten | 161 bp (81 bp) | 23 | 11.16× | 1.3:1.3 |
| Post-BS | TruMethyl WG v1.9 | 200 | 2. Prostate cancer (1601) | HiSeq X Ten | 174 bp (91 bp) | 27 | 10.84× | 0.8:1.1 |

<sup>a</sup> Ratio of coverage represented in the CpG islands and shores of the genome
Fig. 7 Difference in HiSeq X Ten coverage distribution for FFPET bisulphite libraries prepared by two methods. a, b Box plots showing the difference in coverage across CpG islands, CpG shores and other regions of the genome for the TruMethyl WG (a) and Accel-NGS Methyl-Seq (b) methods, when sequenced on the HiSeq X Ten. c IGV plot showing the difference in distribution of reads across a CpG island for an FFPET library obtained from the TruMethyl WG method and the Accel-NGS Methyl-Seq method. d, e Box plots showing the coverage distribution across exons, intergenic regions, introns, promoter regions and repeat regions of the genome for an FFPET library prepared by the TruMethyl WG (d) and Accel-NGS Methyl-Seq (e) methods and sequenced on one lane of HiSeq X Ten
Fig. 8 Comparison of methylation correlation between HiSeq 2500 and HiSeq X Ten. a Correlation plots of methylation levels obtained from a cell line, a clinical sample and an FFPET sample sequenced on the HiSeq 2500 versus HiSeq X Ten (Pearson r > 0.94). b Correlation of methylation values obtained from HiSeq 2500 and HiSeq X Ten for a cell line, a clinical sample and an FFPET sample after grouping them into four bins of methylation percentages. c Average kappa values for six sample pairs, including two cell lines, two clinical samples and two FFPET samples, compared between the HiSeq 2500 and HiSeq X Ten platforms. d Bar plot showing the distribution of the percentage of discordant sites across the genome for a cell line, a clinical sample and an FFPET sample
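The binned agreement in panels b and c can be illustrated with Cohen's kappa, which corrects the observed agreement between two platforms for the agreement expected by chance. This is a minimal sketch under stated assumptions: the source does not give the bin boundaries or the exact kappa variant, so equal-width bins of 25 percentage points and unweighted Cohen's kappa are assumed here. The function names and the methylation values are hypothetical.

```python
from collections import Counter


def methylation_bin(percent):
    """Assign a methylation percentage to one of four bins.

    Assumed bin edges: [0, 25), [25, 50), [50, 75), [75, 100].
    """
    return min(int(percent // 25), 3)


def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two equal-length label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    # Chance agreement: product of the marginal frequencies, summed over bins.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1.0 - expected)


# Hypothetical methylation percentages at the same ten CpG sites.
hiseq_2500 = [5, 12, 30, 48, 55, 70, 80, 95, 100, 0]
hiseq_x_ten = [8, 20, 28, 55, 60, 72, 85, 90, 98, 3]

bins_2500 = [methylation_bin(p) for p in hiseq_2500]
bins_x_ten = [methylation_bin(p) for p in hiseq_x_ten]
kappa = cohen_kappa(bins_2500, bins_x_ten)

print(f"Cohen's kappa across four methylation bins: {kappa:.3f}")
```

A site whose methylation sits near a bin edge (48% versus 55% in the toy data) counts as a disagreement even though the underlying values are close, which is one reason a binned kappa is usually reported alongside a continuous correlation.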