Aniruddha Chatterjee, Peter A Stockwell, Euan J Rodger, Ian M Morison.
Abstract
Recent advances in next generation sequencing (NGS) technology now provide the opportunity to rapidly interrogate the methylation status of the genome. However, handling and interpreting methylation sequence data is challenging because of its large volume and the consequences of bisulphite modification. We sequenced reduced representation human genomes on the Illumina platform and efficiently mapped and visualized the data with different pipelines and software packages. We examined three pipelines for aligning bisulphite-converted sequencing reads and compared their performance. We also comment on pre-processing and quality control of Illumina data. This comparison highlights differences in methods for NGS data processing and provides guidance to advance sequence-based methylation data analysis for molecular biologists.
Year: 2012 PMID: 22344695 PMCID: PMC3378906 DOI: 10.1093/nar/gks150
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
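Bisulphite treatment turns unmethylated cytosines into thymines, so a converted read no longer matches the reference exactly. One widely used strategy, the "three-letter" approach, converts every C to T in both the read and the reference before matching, so that bisulphite-induced C-to-T changes stop counting as mismatches. The sketch below illustrates only that idea under simplifying assumptions: top strand only, exact brute-force matching instead of an indexed aligner such as Bowtie, and hypothetical function names (`c_to_t`, `map_read`, `classify`) that are not taken from Bismark, BSMAP or RMAPBS. A read with exactly one hit is labelled unique and one with several hits multiple, mirroring the categories reported in the tables below.

```python
def c_to_t(seq: str) -> str:
    """In silico bisulphite conversion of the top strand: every C becomes T."""
    return seq.upper().replace("C", "T")


def map_read(read: str, reference: str) -> list[int]:
    """Return every 0-based start where the converted read matches the
    converted reference exactly (brute-force scan, top strand only)."""
    conv_read = c_to_t(read)
    conv_ref = c_to_t(reference)
    k = len(conv_read)
    return [i for i in range(len(conv_ref) - k + 1)
            if conv_ref[i:i + k] == conv_read]


def classify(hits: list[int]) -> str:
    """Label a read by its number of hits, as mapping summaries do."""
    if not hits:
        return "unmapped"
    return "unique" if len(hits) == 1 else "multiple"


if __name__ == "__main__":
    reference = "TTACGCGATCCGGAAT"  # made-up toy reference
    # Toy bisulphite read: one CpG kept as C (methylated), every other C read as T.
    read = "ACGTGATT"
    hits = map_read(read, reference)
    print(hits, classify(hits))
```

Because conversion discards the C/T distinction, real aligners must also handle the reverse strand (G-to-A conversion) and tolerate mismatches; both are omitted here for brevity.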
Comparison of mapping performance of the different packagesᵃ
| Programme | Aligner | Number of readsᵇ | Uniquely mapped readsᶜ (%) | Multiple mapping (%) | Cores used | CPU time taken | Reads/sec |
|---|---|---|---|---|---|---|---|
| Bismark | Perl application using Bowtie | 18 490 898 | 42.2 | 7.7 | 4 | 3 h 7 min | 1642 |
| BSMAP v1.02 | Modified SOAP | 18 471 799 | 58.9 | 14.0 | 6 | 38 days | 5.6 |
| BSMAP v1.2 | Modified SOAP | 18 490 898 | 55.5 | 10.9 | 6 | 22.2 h | 231.2 |
| RMAPBS | Modified RMAP | 18 458 028 | 65.1 | 16.8 | 1 | 42 h 48 min | 119.6 |

ᵃThe reads were mapped against the complete human genome GRCh37.
ᵇBSMAP v1.02 and RMAPBS rejected a proportion of lower-quality reads.
ᶜThe reads were hard-trimmed to 75 bp for better alignment.
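The Reads/sec column appears to be the number of reads divided by the reported CPU time in seconds: recomputing it from the table reproduces every published value to within about 0.5%, with the small differences presumably due to rounding of the reported times. The sketch below makes that arithmetic explicit. The parser `to_seconds` is a hypothetical helper that understands only the time formats appearing in the table ("3 h 7 min", "22.2 h", "38 days").

```python
import re

# (number of reads, CPU time taken, published reads/sec), copied from the table above.
RUNS = {
    "Bismark": (18_490_898, "3 h 7 min", 1642),
    "BSMAP v1.02": (18_471_799, "38 days", 5.6),
    "BSMAP v1.2": (18_490_898, "22.2 h", 231.2),
    "RMAPBS": (18_458_028, "42 h 48 min", 119.6),
}

UNIT_SECONDS = {"days": 86_400, "h": 3_600, "min": 60}


def to_seconds(text: str) -> float:
    """Convert strings such as '3 h 7 min', '22.2 h' or '38 days' to seconds."""
    total = 0.0
    for value, unit in re.findall(r"([\d.]+)\s*(days|h|min)", text):
        total += float(value) * UNIT_SECONDS[unit]
    return total


def reads_per_second(n_reads: int, cpu_time: str) -> float:
    """Throughput as reads divided by the reported CPU time."""
    return n_reads / to_seconds(cpu_time)


if __name__ == "__main__":
    for name, (n_reads, cpu_time, published) in RUNS.items():
        rate = reads_per_second(n_reads, cpu_time)
        print(f"{name:12s} {rate:8.1f} reads/s (published {published})")
```

The Cores used column does not enter this calculation, so the rates are not normalised per core.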
Comparison of mapping against the reduced representation (RR) genome and the full-length genome at different read lengths
| Programme | Read length (bp) | % Uniquely mapped sequence against RRᵃ | CPU time (h) | % Uniquely mapped sequence against full genomeᵇ | No. of CpG sites in size-selected region | % of CpG sites in size-selected region |
|---|---|---|---|---|---|---|
| Bismark | 75 | 42.0 | 1.30 | 42.2 | 23 415 803 | 82.9 |
| | 60 | 53.2 | 0.85 | 54.3 | 27 795 960 | 84.5 |
| RMAPBS | 75 | 59.1 | 8.65 | 65.1 | 44 104 796 | 91.4 |
| | 60 | 64.0 | 4.37 | 65.2 | 36 700 533 | 82.8 |
| BSMAP v1.02 | 75 | 49.3 | 19.37 | 58.9 | 29 829 964 | 86.7 |
| | 60 | 58.8 | 24.03ᶜ | 64.0 | 25 976 326 | 91.8 |
| BSMAP v1.2 | 75 | 49.3 | 1.52 | 55.5 | 21 453 738 | 81.7 |
| | 60 | 58.7 | 0.58 | 65.0 | 35 642 607 | 83.9 |

ᵃRR genome (40–220 bp).
ᵇComplete human genome GRCh37.
ᶜThe longer time taken for the 60 bp reads must reflect more time spent resolving potential mismatches than was necessary for the 75 bp reads.
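The RR genome used here is a reduced reference that keeps only restriction fragments within a chosen size window (40–220 bp). The sketch below shows one way such a reference could be built in silico and how CpG sites inside it could be counted. It assumes MspI digestion (recognition site C^CGG, cut between the first C and CGG), the enzyme conventionally used for reduced representation bisulphite sequencing; the enzyme choice, the decision to keep terminal fragments, and the function names are assumptions for illustration, not details given in this record.

```python
import re


def mspi_fragments(chrom: str) -> list[tuple[int, int]]:
    """(start, end) 0-based, half-open fragments produced by cutting every
    CCGG site between the first C and CGG (top-strand coordinates)."""
    cuts = [m.start() + 1 for m in re.finditer("CCGG", chrom.upper())]
    bounds = [0] + cuts + [len(chrom)]
    return list(zip(bounds[:-1], bounds[1:]))


def reduced_representation(chrom: str, min_len: int = 40,
                           max_len: int = 220) -> list[tuple[int, int]]:
    """Keep fragments whose length lies in [min_len, max_len]. Terminal
    fragments (not flanked by a site on both sides) are kept too, a simplification."""
    return [(s, e) for s, e in mspi_fragments(chrom)
            if min_len <= e - s <= max_len]


def cpgs_in_fragments(chrom: str, fragments: list[tuple[int, int]]) -> int:
    """Count CpG dinucleotides inside the given fragments. A cut at C^CGG
    never splits a CpG, so counting per fragment loses nothing."""
    seq = chrom.upper()
    return sum(seq[s:e].count("CG") for s, e in fragments)


if __name__ == "__main__":
    # Made-up toy "chromosome" with two MspI sites.
    chrom = "A" * 10 + "CCGG" + "A" * 50 + "CCGG" + "A" * 10
    rr = reduced_representation(chrom)
    print(mspi_fragments(chrom), rr, cpgs_in_fragments(chrom, rr))
```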
Effect of sequence trimming on alignment efficiency and methylation percentageᵃ
| Programme | Total number of reads | Uniquely mapped reads (%) | Total methylation percentage against RR | Methylated CpG sites against RR |
|---|---|---|---|---|
| 50 bp data set | | | | |
| Bismark | 18 490 898 | 52.5 | 36.6 | 11 238 582 |
| RMAPBS | 18 458 028 | 62.5 | 34.7 | 12 813 410 |
| BSMAP v1.02 | 18 471 799 | 58.2 | 35.3 | 13 074 829 |
| BSMAP v1.2 | 18 490 898 | 59.5 | 35.4 | 13 044 161 |
| 40 bp data set | | | | |
| Bismark | 18 490 898 | 53.2 | 35.4 | 9 635 443 |
| RMAPBS | 18 458 028 | 62.9 | 33.4 | 10 651 296 |
| BSMAP v1.02 | 18 471 799 | 59.1 | 34.7 | 10 933 808 |
| BSMAP v1.2 | 18 490 898 | 58.0 | 34.8 | 10 805 047 |
| 36 bp data set | | | | |
| Bismark | 18 490 898 | 52.7 | 34.9 | 8 824 724 |
| RMAPBS | 18 458 028 | 62.6 | 32.9 | 9 700 349 |
| BSMAP v1.02 | 18 471 799 | 54.0 | 34.1 | 9 960 586 |
| BSMAP v1.2 | 18 490 898 | 57.1 | 34.2 | 9 836 729 |

ᵃThe runs were performed against our RR genome (40–220 bp).
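The trimmed data sets compared here (like the 75 bp reads used elsewhere) come from hard trimming, that is, truncating every read to a fixed length regardless of base quality. A minimal sketch over FASTQ records follows. It assumes uncompressed four-line FASTQ and that trimming keeps the 5′ end (bases are dropped from the 3′ end); the function names are hypothetical, and passing shorter reads through unchanged is a design choice of the sketch, not something stated in the source.

```python
import io
from typing import Iterable, Iterator, TextIO

Record = tuple[str, str, str]  # (header, sequence, quality)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from an uncompressed four-line FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def hard_trim(records: Iterable[Record], length: int) -> Iterator[Record]:
    """Keep the first `length` bases (and qualities) of every read;
    shorter reads pass through unchanged."""
    for header, seq, qual in records:
        yield header, seq[:length], qual[:length]


def write_fastq(records: Iterable[Record], handle: TextIO) -> None:
    """Write records back out in four-line FASTQ format."""
    for header, seq, qual in records:
        handle.write(f"{header}\n{seq}\n+\n{qual}\n")


if __name__ == "__main__":
    raw = "@read1\nACGTACGTAC\n+\nIIIIIIIIII\n"
    out = io.StringIO()
    write_fastq(hard_trim(read_fastq(io.StringIO(raw)), 6), out)
    print(out.getvalue(), end="")
```

Producing the 50, 40 and 36 bp data sets would then amount to running `hard_trim` with `length` set to each value in turn.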
Comparison of methylation mapping between different aligners
| Aligner | Total methylation percentage against complete genome | Methylated CpG sites against complete genome | Total methylation percentage against RRᵃ | Methylated CpG sites against RRᵃ |
|---|---|---|---|---|
| 75 bp data set | | | | |
| Bismark | 44.8 | 12 646 435 | 43.2 | 13 184 924 |
| RMAPBS | 36.9 | 8 557 383 | 38.6 | 17 997 752 |
| BSMAP v1.02 | 18.6ᵇ | 6 395 684ᵇ | 42.1 | 16 196 332 |
| BSMAP v1.2 | 42.9 | 11 251 307 | 42.1 | 16 212 461 |
| 60 bp data set | | | | |
| Bismark | 40.2 | 13 227 156 | 39.4 | 13 579 291 |
| RMAPBS | 36.3 | 16 115 692 | 36.1 | 15 634 090 |
| BSMAP v1.02 | 12.3ᵇ | 3 484 013ᵇ | 38.0 | 15 374 907 |
| BSMAP v1.2 | 38.7 | 16 439 913 | 38.0 | 15 369 712 |

ᵃRR genome constructed in the size range of 40–220 bp.
ᵇSee text.
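The total methylation percentages in these tables summarise per-CpG calls made from aligned bisulphite reads. Assuming the usual convention for top-strand reads (a C in the read at a reference CpG is called methylated, a T unmethylated) and assuming that the total percentage is methylated calls divided by all CpG calls, the calculation can be sketched as below. The function names and toy data are hypothetical; reverse-strand reads, quality filtering and bases other than C/T at CpG positions are ignored.

```python
def cpg_positions(reference: str) -> list[int]:
    """0-based positions of the C in every CpG of the reference (top strand)."""
    ref = reference.upper()
    return [i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"]


def call_cpgs(read: str, start: int, reference: str) -> dict[int, bool]:
    """For each reference CpG covered by a read aligned at `start`, record
    True (read shows C, methylated) or False (read shows T, unmethylated)."""
    calls = {}
    for pos in cpg_positions(reference):
        offset = pos - start
        if 0 <= offset < len(read):
            base = read[offset].upper()
            if base == "C":
                calls[pos] = True
            elif base == "T":
                calls[pos] = False
    return calls


def total_methylation_percentage(all_calls: list[dict[int, bool]]) -> float:
    """Methylated calls as a percentage of all CpG calls across reads."""
    methylated = total = 0
    for calls in all_calls:
        methylated += sum(calls.values())
        total += len(calls)
    return 100.0 * methylated / total if total else 0.0


if __name__ == "__main__":
    reference = "TTACGCGATCCGGAAT"  # made-up toy reference
    aligned = [("ACGTGATT", 2), ("ACGCGATT", 2)]  # (read, 0-based start)
    calls = [call_cpgs(r, s, reference) for r, s in aligned]
    print(calls, total_methylation_percentage(calls))
```

Because the denominator counts calls rather than distinct sites, deeply covered CpGs carry more weight in this total than sparsely covered ones.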
Figure 1. SeqMonk display of differential methylation from different aligners. Read data trimmed to 75 bp (about 18 × 10⁶ reads) were aligned against the human genome build GRCh37 by Bismark v0.23, RMAPBS v2.05 and BSMAP v1.2; the methylation from each aligner is displayed, in that order from top to bottom, below the gene, mRNA and CDS panes. Methylated CpG positions are shown in the red panes for each aligner, and unmethylated CpGs in the blue panes. The display shows a randomly selected 2.55 Mbp region of chromosome 1. The black boxes indicate some regions of significant difference in methylation.