Saurabh Baheti, Rahul Kanwar, Meike Goelzenleuchter, Jean-Pierre A Kocher, Andreas S Beutler, Zhifu Sun.
Abstract
BACKGROUND: DNA methylation is an important epigenetic modification involved in many biological processes. Reduced representation bisulfite sequencing (RRBS) is a cost-effective method for studying DNA methylation at single-base resolution. Although several tools are available for RRBS data processing and analysis, it is not clear which strategy performs best, and little attention has been paid to contamination from artificial cytosines incorporated during the end repair step of library preparation. To address these issues, we describe a new method, Targeted Alignment and Artificial Cytosine Elimination for RRBS (TRACE-RRBS), which aligns bisulfite sequence reads to an MSP1 digitally digested reference and specifically removes the end repair cytosines. We compared this approach on a simulated and a real dataset with 7 other RRBS analysis tools and the Illumina 450K microarray platform.
Year: 2016 PMID: 26922377 PMCID: PMC4769831 DOI: 10.1186/s12864-016-2494-8
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 RRBS concept and workflow. Genomic DNA is first digested by MSP1, which cuts at CCGG sites. The fragments are end-repaired, A-tailed, and adapter-ligated, and then bisulfite treated. Unmethylated C is converted to U (read as T), while methylated C is unchanged. The blue bases come from the media and are not real biological signal, so they need to be removed
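The conversion rule described in Fig. 1 can be illustrated with a minimal Python sketch. The function names `bisulfite_convert` and `methylation_ratio` are hypothetical helpers introduced here for illustration, not part of any tool compared in this paper. The sketch handles only the forward strand and assumes error-free reads and complete conversion.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite treatment of a forward-strand sequence.

    Unmethylated C is read as T after conversion; methylated C
    (0-based positions listed in `methylated_positions`) stays C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq.upper())
    )


def methylation_ratio(read_bases):
    """Methylation ratio at one reference cytosine: C / (C + T).

    `read_bases` holds the bases observed in aligned reads at that
    position. Returns None when no informative base (C or T) is seen.
    """
    c = read_bases.count("C")
    t = read_bases.count("T")
    return c / (c + t) if (c + t) else None


# Toy example: only the C at position 1 is methylated.
converted = bisulfite_convert("ACGTCC", methylated_positions={1})
# Five reads cover one cytosine: three report C, two report T.
ratio = methylation_ratio("CCTCT")
```

Under these assumptions, an artificial cytosine added during end repair would enter `read_bases` like any other base, which is why such positions have to be excluded before the ratio is computed.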
Fig. 2 TRACE-RRBS flowchart. The reference genome in FASTA format is first digitally digested into fragments, mimicking a real RRBS experiment. These fragments are size selected, end repaired, A-tailed, and adapter annealed, and then indexed using Bowtie 2. Both the sequence reads and the reference fragments are fully converted, Cs to Ts for the forward strand and Gs to As for the reverse strand, before alignment by Bowtie 2. After alignment, TRACE-RRBS parses the BAM file, removes the adapter and incorporated Cs, and compares the reads with the unconverted sequence for methylation calculation
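The first steps of the flowchart (digital digestion, size selection, and full base conversion) can be sketched in Python as follows. This is an illustration under stated assumptions, not the TRACE-RRBS implementation: the function names are hypothetical, the cut position inside CCGG (after the first C) and the size window are assumptions chosen for the toy example, and end repair, A-tailing, adapter handling, and Bowtie 2 indexing are not modelled.

```python
import re


def digest(seq, site="CCGG", cut_offset=1):
    """Cut `seq` at every `site`, `cut_offset` bases into the site.

    The default (cut after the first C of CCGG) is an assumption
    made for this sketch.
    """
    seq = seq.upper()
    cuts = [m.start() + cut_offset for m in re.finditer(re.escape(site), seq)]
    bounds = [0] + cuts + [len(seq)]
    # Pair consecutive boundaries and drop empty fragments.
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def size_select(fragments, min_len, max_len):
    """Keep fragments whose length lies in [min_len, max_len]."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


def convert_forward(seq):
    """Full C-to-T conversion used for the forward strand."""
    return seq.upper().replace("C", "T")


def convert_reverse(seq):
    """Full G-to-A conversion used for the reverse strand."""
    return seq.upper().replace("G", "A")


# Toy reference with two CCGG sites; the size window is hypothetical.
fragments = digest("AACCGGTTTTCCGGAA")
selected = size_select(fragments, min_len=5, max_len=10)
forward_reference = [convert_forward(f) for f in selected]
reverse_reference = [convert_reverse(f) for f in selected]
```

Converting both the reads and the reference fragments in the same way means that methylation status does not cause mismatches during alignment; the unconverted sequences are kept aside for the later methylation call.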
Alignment performance comparison of simulated RRBS reads (10 million)
| Tool | Alignment time (hours) (a) | Memory usage (GB) | % Unique reads | % Correctly mapped |
|---|---|---|---|---|
| BISMARK | 0.53 | 8.4 | 81.03 % | 80.79 % |
| BRAT-BW | 2.89 | 12.69 | 80.46 % | 74.11 % |
| BSMAP | 0.34 | 2.78 | 87.20 % | 85.92 % |
| BS-SEEKER2 | 0.66 | 5.3 | 73.28 % | 71.85 % |
| LAST | 0.71 | 16.1 | 78.11 % | 77.96 % |
| TRACE-RRBS | 0.26 | 8.76 | 88.59 % | 87.45 % |
| METHYLCODER | 2.5 | 25.83 | 62.38 % | 44.82 % |
| NOVOALIGN | 1.19 | 16.1 | 86.59 % | 86.59 % |
(a) Alignment time does not include genome preparation time
Methylation extraction and calculation of simulated RRBS reads
| Tool | Time (hours) | Memory usage (GB) | % Recall (a) | R² with truth from simulator |
|---|---|---|---|---|
| BISMARK | 1.08 | 5.78 | 81.03 % | 0.844 |
| BRAT-BW | 2.89 | 2.5 | 80.46 % | NA |
| BSMAP | 0.4 (b) | 3.14 (b) | 87.20 % | 0.864 |
| BS-SEEKER2 | 0.54 | 6.1 | 73.28 % | 0.869 |
| LAST | -- | -- | 78.11 % | 0.757 |
| TRACE-RRBS | 0.17 | 7.68 | 88.59 % | 0.988 |
| METHYLCODER | -- | -- | 62.38 % | 0.645 |
| NOVOALIGN | 1.19 | 6.2 | 86.59 % | 0.878 |
(a) Number of CpGs with at least 1X coverage divided by the total number of expected CpGs
(b) Time and memory are for chromosome 1 only, as the BSMAP script requires a very large amount of memory for the whole genome
NA: no script was available to extract methylation data for the comparison; --: the tool has its own methylation extraction script, but it could not be used in this case and an in-house script was used instead
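The two accuracy columns of the table above can be made concrete with a short Python sketch. Recall follows the footnote definition directly. For R², the sketch assumes the value is the squared Pearson correlation between called and simulated methylation ratios, which the source does not state explicitly. The function names and the toy data are hypothetical.

```python
def recall(coverage, expected_cpgs, min_cov=1):
    """Fraction of expected CpGs covered by at least `min_cov` reads.

    `coverage` maps a CpG site (any hashable key) to its read depth;
    sites missing from the mapping count as uncovered.
    """
    covered = sum(1 for site in expected_cpgs if coverage.get(site, 0) >= min_cov)
    return covered / len(expected_cpgs)


def r_squared(called, truth):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(called)
    mean_c = sum(called) / n
    mean_t = sum(truth) / n
    sxy = sum((c - mean_c) * (t - mean_t) for c, t in zip(called, truth))
    sxx = sum((c - mean_c) ** 2 for c in called)
    syy = sum((t - mean_t) ** 2 for t in truth)
    return (sxy * sxy) / (sxx * syy)


# Toy data: four expected CpGs, two of them covered at >= 1X.
expected = [("chr1", 10), ("chr1", 20), ("chr1", 30), ("chr1", 40)]
depth = {("chr1", 10): 5, ("chr1", 20): 0, ("chr1", 30): 1}
site_recall = recall(depth, expected)

# Called ratios that differ from the truth by a constant offset.
fit = r_squared([0.1, 0.6, 1.1], [0.0, 0.5, 1.0])
```

Note that `r_squared` is undefined when either input has zero variance; a production script would need to guard against that case.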
Performance comparison of MCF7 RRBS reads
| Metric | TRACE-RRBS | BISMARK | BRAT-BW | BS-SEEKER2 | BSMAP | METHYLCODER | NOVOALIGN | LAST |
|---|---|---|---|---|---|---|---|---|
| Run time (hours) (a) | 3.2 | 10.89 | 9.1 | 3.6 | 3.19 | 14.4 | 24.5 | 11.2 |
| Memory Usage (GB) | 8.4 | 12.69 | 2.78 | 5.3 | 6.1 | 18.76 | 13 | 16.1 |
| %Unique Reads | 49.4 | 48.1 | 40.7 | 48.5 | 49.6 | 41.2 | 49.5 | 45.5 |
| # Cs at 10X coverage (million) | 2.38 | 1.87 | NA | 1.98 | 2.11 | 2.72 | 2.14 | 2.41 |
| Correlation with 450K chip (R²) | 0.95 | 0.91 | NA | 0.95 | 0.91 | 0.64 | 0.91 | 0.84 |
(a) Alignment time only
NA: no associated script to extract methylation data
Fig. 3 Methylation ratio auto-correlation of CpGs at fixed distances before and after end repair C removal. The auto-correlation plot shows how the correlation between neighbouring CpG sites (y-axis) depends on the distance between the two sites compared (x-axis, in bp). The red dots represent the case where the artificially added CpGs are included in the cytosine methylation calculation. The blue dots represent the case where they are excluded. Significantly improved auto-correlation is observed after the artificial Cs are removed
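The quantity plotted in Fig. 3 can be sketched in Python as follows. This is one plausible way to compute it, assuming the auto-correlation at distance d is the Pearson correlation over all CpG pairs located exactly d bp apart on the same chromosome; the paper's exact procedure is not given here, and the function names and toy data are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns None if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def autocorrelation_by_distance(ratios, distances, min_pairs=3):
    """Correlate methylation ratios of CpG pairs exactly d bp apart.

    `ratios` maps CpG position -> methylation ratio for one chromosome.
    Distances with fewer than `min_pairs` pairs yield None.
    """
    result = {}
    for d in distances:
        pairs = [(r, ratios[pos + d]) for pos, r in ratios.items() if pos + d in ratios]
        if len(pairs) < min_pairs:
            result[d] = None
            continue
        xs, ys = zip(*pairs)
        result[d] = pearson(xs, ys)
    return result


# Toy chromosome with CpGs every 2 bp and smoothly rising methylation.
toy_ratios = {0: 0.1, 2: 0.2, 4: 0.3, 6: 0.4, 8: 0.5}
auto = autocorrelation_by_distance(toy_ratios, distances=[2, 3])
```

Running the same function on ratio tables computed with and without the end repair cytosines would give the two series compared in the figure.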