Aniruddha Chatterjee, Euan J Rodger, Peter A Stockwell, Robert J Weeks, Ian M Morison.
Abstract
Reduced representation bisulfite sequencing (RRBS), which couples bisulfite conversion and next-generation sequencing, is an innovative method that specifically enriches genomic regions with a high density of potential methylation sites and enables investigation of DNA methylation at single-nucleotide resolution. Recent advances in the Illumina DNA sample preparation protocol and sequencing technology have vastly improved sequencing throughput capacity. Although the new Illumina technology is now widely used, the unique challenges associated with multiplexed RRBS libraries on this platform have not been previously described. We have made modifications to the RRBS library preparation protocol to sequence multiplexed libraries on a single flow cell lane of the Illumina HiSeq 2000. Furthermore, our analysis incorporates a bioinformatics pipeline specifically designed to process bisulfite-converted sequencing reads and evaluate the output and quality of the sequencing data generated from the multiplexed libraries. We obtained an average of 42 million paired-end reads per sample for each flow-cell lane, with a high unique mapping efficiency to the reference human genome. Here we provide a roadmap of modifications, strategies, and troubleshooting approaches we implemented to optimize sequencing of multiplexed libraries on an RRBS background.
Year: 2012 PMID: 23193365 PMCID: PMC3495292 DOI: 10.1155/2012/741542
Source DB: PubMed Journal: J Biomed Biotechnol ISSN: 1110-7243
Figure 1. Representative analytical PCR of size-selected RRBS libraries. The 160–340 bp size-selected RRBS libraries (represented by a, b, and c) were amplified with either 15 or 20 cycles of PCR to determine the optimal number of cycles for large-scale amplification. PCR products were visualized on a 4–20% Criterion gradient polyacrylamide TBE gel stained with SYBR green nucleic acid gel stain alongside a 25 bp DNA ladder. For these libraries, 13 (a) and 14 (b and c) cycles were chosen for large-scale amplification. The distinct band at 125 bp in all libraries was possibly due to adaptor-adaptor dimerization.
Figure 2. Comparison of base-calling between regular genomic libraries and RRBS libraries. Relative intensities for each base channel are shown across the 200 cycles of a 100 bp paired-end HiSeq 2000 sequencing run. A. Regular multiplexed human genomic libraries, lane 4. B. Multiplexed RRBS libraries, lane 8.
Comparison between base calling performed by RTA (real-time analyzer) and OLB (off-line basecaller) of multiplexed samples.
| Sample ID | Number of bases after RTA | Number of bases after OLB | Change in bases, RTA versus OLB (%) | Number of reads after RTA | Number of reads after OLB | Change in reads, RTA versus OLB (%) |
|---|---|---|---|---|---|---|
| Read 1 | | | | | | |
| 1 | 1520286198 | 1548285294 | 1.8 | 16377421 | 16503971 | 0.8 |
| 2 | 2282620124 | 2350814836 | 3.0 | 24702835 | 25201462 | 2.0 |
| 3 | 2753391480 | 2846632920 | 3.4 | 30043201 | 30847092 | 2.7 |
| 4 | 1280388372 | 1325584268 | 3.5 | 13754015 | 14152462 | 2.9 |
| 5 | 1837282806 | 1881566740 | 2.4 | 20849236 | 21178885 | 1.6 |
| Read 2 | | | | | | |
| 1 | 1512562937 | 1503732114 | −0.7 | 16377421 | 16503971 | 0.8 |
| 2 | 2214536843 | 2212529249 | −0.1 | 24702835 | 25200712 | 2.0 |
| 3 | 2621636705 | 2631071280 | 0.4 | 30043201 | 30847092 | 2.7 |
| 4 | 1265479416 | 1276840465 | 0.9 | 13754015 | 14152462 | 2.9 |
| 5 | 1516530411 | 1492797800 | −1.6 | 20849236 | 21178885 | 1.6 |
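The percentage columns in the table above can be reproduced from the raw counts. The following is a minimal sketch, assuming the change is defined as (OLB − RTA) / RTA × 100; the function and variable names are illustrative and do not come from the original pipeline.

```python
def percent_change(rta: int, olb: int) -> float:
    """Relative change from the RTA value to the OLB value, in percent."""
    return (olb - rta) / rta * 100.0


# Read 1 base counts per sample, copied from the table: (after RTA, after OLB).
READ1_BASES = {
    1: (1520286198, 1548285294),
    2: (2282620124, 2350814836),
    3: (2753391480, 2846632920),
    4: (1280388372, 1325584268),
    5: (1837282806, 1881566740),
}


def read1_base_changes() -> dict:
    """Percentage change in Read 1 bases for each sample, to one decimal place."""
    return {
        sample: round(percent_change(rta, olb), 1)
        for sample, (rta, olb) in READ1_BASES.items()
    }


if __name__ == "__main__":
    for sample, change in read1_base_changes().items():
        print(f"sample {sample}: {change:+.1f}%")
```

Under this definition, a positive value means OLB recovered more bases than RTA for that sample.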
Figure 3. Per base sequence quality of sample 2 as generated by FastQC for the dataset obtained from RTA base calling (a) and OLB base calling (b). The yellow box plots (red bar: median, box: interquartile range, 25–75%, and whiskers: 10th–90th percentiles) show the base-calling quality scores across all sequencing reads of sample 2. The blue line indicates the mean quality score. The other samples had similar per base sequence quality.
Details of data generated for multiplexed RRBS libraries.
| Sample ID | Adaptor | Raw data including cif files (Gb)¹ | After RTA base calling (Gb)¹ | Sequence volume (Gb)² | Uncompressed, 2 reads (Gb) | Paired-end reads (×10⁶) |
|---|---|---|---|---|---|---|
| 1 | 1 | | | 3.30 | 8.6 | 32.8 |
| 2 | 3 | | | 4.99 | 12.9 | 49.4 |
| 3 | 8 | 320 | 55.3 | 6.06 | 15.7 | 60.0 |
| 4 | 9 | | | 2.78 | 7.2 | 27.5 |
| 5 | 10 | | | 4.21 | 10.9 | 41.7 |
¹ RTA uses the .cif files to perform the base calling and produce .bcl files; the samples are not demultiplexed at this stage.
² CASAVA performs the demultiplexing and uses the .bcl files to generate FASTQ files for each of the samples.
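The per-sample read counts in this table can be summarized to check the lane-level average quoted in the abstract and to see how evenly the lane was shared between the multiplexed libraries. This is a minimal sketch; the names are illustrative, and the values are the paired-end read counts (in millions) copied from the table.

```python
import math

# Paired-end reads per sample, in millions, copied from the table.
PAIRED_END_READS_MILLIONS = {1: 32.8, 2: 49.4, 3: 60.0, 4: 27.5, 5: 41.7}


def lane_summary(reads: dict) -> tuple:
    """Return (lane total, mean per sample, percentage share per sample)."""
    total = math.fsum(reads.values())
    mean = total / len(reads)
    shares = {sample: 100.0 * n / total for sample, n in reads.items()}
    return total, mean, shares


if __name__ == "__main__":
    total, mean, shares = lane_summary(PAIRED_END_READS_MILLIONS)
    print(f"lane total: {total:.1f} million paired-end reads")
    print(f"mean per sample: {mean:.1f} million")
    for sample, share in shares.items():
        print(f"sample {sample}: {share:.1f}% of the lane")
```

The mean comes out at roughly 42 million reads per sample, in line with the abstract, while the individual shares show that the five libraries were not equally represented in the lane.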
Comparison of mapping performance between RTA and OLB datasets.
| Sample ID | RTA: unique mapping (%) | RTA: uniquely aligned reads | OLB: unique mapping (%) | OLB: uniquely aligned reads |
|---|---|---|---|---|
| 1 | 71.2 | 11513092 | 71.9 | 11681235 |
| 2 | 59.4 | 13998639 | 58.3 | 13945432 |
| 3 | 66.5 | 18785673 | 55.3 | 15940503 |
| 4 | 71.4 | 9603345 | 71.9 | 9907812 |
| 5 | 63.4 | 11296038 | 63.8 | 11508956 |
¹ The mapping runs were performed on a Mac Pro with 64-bit dual quad-core Intel Xeon processors and 22 Gb RAM, running Mac OS X 10.6. The samples were mapped using Bismark v0.6.4 against the GRCh37.65 build of the human genome.
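To read the mapping table as a per-sample comparison, the difference between the OLB and RTA datasets can be tabulated directly. The sketch below is illustrative only: the names are hypothetical, and the numbers are the unique mapping percentages and uniquely aligned read counts copied from the table.

```python
# Per sample: ((RTA unique mapping %, RTA uniquely aligned reads),
#              (OLB unique mapping %, OLB uniquely aligned reads)).
MAPPING = {
    1: ((71.2, 11513092), (71.9, 11681235)),
    2: ((59.4, 13998639), (58.3, 13945432)),
    3: ((66.5, 18785673), (55.3, 15940503)),
    4: ((71.4, 9603345), (71.9, 9907812)),
    5: ((63.4, 11296038), (63.8, 11508956)),
}


def mapping_delta(sample: int) -> tuple:
    """Return (change in percentage points, change in aligned reads), OLB minus RTA."""
    (rta_pct, rta_reads), (olb_pct, olb_reads) = MAPPING[sample]
    return round(olb_pct - rta_pct, 1), olb_reads - rta_reads


if __name__ == "__main__":
    for sample in MAPPING:
        delta_pct, delta_reads = mapping_delta(sample)
        print(f"sample {sample}: {delta_pct:+.1f} points, {delta_reads:+d} reads")
```

A positive pair means the OLB dataset yielded more uniquely aligned reads than the RTA dataset for that sample; a negative pair means fewer.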