Joaquín Tárraga, Mariano Pérez, Juan M Orduña, José Duato, Ignacio Medina, Joaquín Dopazo.
Abstract
MOTIVATION: DNA methylation analysis suffers from very long processing times, because the advent of next-generation sequencers has shifted the bottleneck of genomic studies from the sequencers that obtain the DNA samples to the software that analyzes these samples. Existing software for methylation analysis does not seem to scale efficiently with either the size of the dataset or the length of the reads to be analyzed. As sequencers are expected to provide longer and longer reads in the near future, efficient and scalable methylation software should be developed.
Year: 2015 PMID: 26069264 PMCID: PMC4679392 DOI: 10.1093/bioinformatics/btv357
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. DNA sequence and bisulphite-treated reads produced by NGS
Fig. 2. Pipeline scheme of the Methylation mapper
Relative weight of matches and mismatches used in the SWA
| Read (rows) / Genome (columns) | A | C | G | T | N |
|---|---|---|---|---|---|
| A | 5 | −4 | −4 | −4 | −4 |
| C | −400 | 500 | −400 | −400 | −4 |
| G | −4 | −4 | 5 | −4 | −4 |
| T | −2 | 2.5 | −2 | 2.5 | −4 |
| N | −4 | −4 | −4 | −4 | −4 |
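The weights above are asymmetric: a T in the read scores positively against both C and T in the genome, which is consistent with bisulphite treatment converting unmethylated C to T. The following is a minimal sketch of how such a matrix could be used as a lookup, with the values copied from the table as printed. It is not the HPG-Methyl implementation: `SCORES` and `ungapped_score` are hypothetical names, and the function only sums per-position weights for two equal-length strings, whereas a full Smith-Waterman alignment would also handle gaps and local start/end positions.

```python
# Substitution weights copied from the table above, indexed as
# SCORES[read_base][genome_base]. Names here are illustrative assumptions.
BASES = "ACGTN"
SCORES = {
    "A": dict(zip(BASES, (5, -4, -4, -4, -4))),
    "C": dict(zip(BASES, (-400, 500, -400, -400, -4))),
    "G": dict(zip(BASES, (-4, -4, 5, -4, -4))),
    "T": dict(zip(BASES, (-2, 2.5, -2, 2.5, -4))),
    "N": dict(zip(BASES, (-4, -4, -4, -4, -4))),
}


def ungapped_score(read: str, genome: str) -> float:
    """Sum the per-position weights of a read against an equal-length genome window.

    This ignores gaps entirely, so it only illustrates the lookup step.
    """
    if len(read) != len(genome):
        raise ValueError("read and genome window must have the same length")
    return sum(SCORES[r][g] for r, g in zip(read.upper(), genome.upper()))


if __name__ == "__main__":
    # A read T aligned to a genome C is rewarded like a match (2.5).
    print(ungapped_score("ATTG", "ACTG"))
```

In this sketch a read T over a genome C contributes the same weight as a read T over a genome T, so a converted cytosine is not penalized as a mismatch.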
Fig. 3. Histogram for Nc
Fig. 4. Histogram for Ng
Fig. 5. Software sensitivity for a synthetic dataset with 0.1% of mutations
Fig. 6. Execution times for a synthetic dataset with 0.1% of mutations
Fig. 7. Software sensitivity for a synthetic dataset with 1% of mutations
Fig. 8. Execution times for a synthetic dataset with 1% of mutations
Comparative study of sensitivity on a synthetic dataset with a mutation rate of 0.1%
| Length (nt) | HPG-Methyl R | HPG-Methyl W | Bismark R | Bismark W | BSMAP R | BSMAP W | BS-Seeker R | BS-Seeker W |
|---|---|---|---|---|---|---|---|---|
| 75 | 96.31 | 0.06 | 89.57 | 0.01 | 93.9 | 5.88 | 92.07 | 0.11 |
| 150 | 99.20 | 0.03 | 95.15 | 0.00 | 96.92 | 2.39 | 96.40 | 0.03 |
| 400 | 99.90 | 0.02 | 97.66 | 0.00 | 48.49 | 50.84 | 98.07 | 0.01 |
| 800 | 99.96 | 0.01 | 98.51 | 0.00 | 48.78 | 51.21 | n/a | n/a |
Comparative study of execution times (min) required for processing a synthetic dataset with a mutation rate of 0.1%
| Length (nt) | HPG-Methyl | Bismark | BSMAP | BS-Seeker |
|---|---|---|---|---|
| 75 | 1.283 | 63.436 | 3.464 | 118.229 |
| 150 | 1.600 | 106.338 | 3.230 | 139.766 |
| 400 | 2.916 | 244.733 | 3.464 | 315.765 |
| 800 | 9.266 | 1220.34 | 3.530 | — |
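To read the execution-time table as relative speedups, each tool's time can be divided by the HPG-Methyl time at the same read length. The sketch below does only that arithmetic on the values copied from the table; `TIMES_MIN` and `speedup_over` are hypothetical names introduced for illustration, and the missing BS-Seeker entry at 800 nt is kept as `None` and skipped rather than estimated.

```python
# Execution times in minutes, copied from the table above (mutation rate 0.1%).
# None marks the value the table does not report.
TIMES_MIN = {
    75: {"HPG-Methyl": 1.283, "Bismark": 63.436, "BSMAP": 3.464, "BS-Seeker": 118.229},
    150: {"HPG-Methyl": 1.600, "Bismark": 106.338, "BSMAP": 3.230, "BS-Seeker": 139.766},
    400: {"HPG-Methyl": 2.916, "Bismark": 244.733, "BSMAP": 3.464, "BS-Seeker": 315.765},
    800: {"HPG-Methyl": 9.266, "Bismark": 1220.34, "BSMAP": 3.530, "BS-Seeker": None},
}


def speedup_over(tool: str, reference: str = "HPG-Methyl") -> dict:
    """Return {read_length: tool_time / reference_time}, skipping missing values.

    A ratio above 1 means the reference was faster at that read length.
    """
    ratios = {}
    for length, row in TIMES_MIN.items():
        tool_time, ref_time = row[tool], row[reference]
        if tool_time is None or ref_time is None:
            continue  # do not guess unreported measurements
        ratios[length] = tool_time / ref_time
    return ratios


if __name__ == "__main__":
    for name in ("Bismark", "BSMAP", "BS-Seeker"):
        formatted = {n: round(r, 2) for n, r in speedup_over(name).items()}
        print(name, formatted)
```

A ratio below 1 (as for BSMAP at 800 nt, where 3.530 min is less than 9.266 min) means the compared tool was faster than the reference at that read length, so the ratios should be read together with the sensitivity figures rather than on their own.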
Comparative study of sensitivity on a synthetic dataset with a mutation rate of 1%
| Length (nt) | HPG-Methyl R | HPG-Methyl W | Bismark R | Bismark W | BSMAP R | BSMAP W | BS-Seeker R | BS-Seeker W |
|---|---|---|---|---|---|---|---|---|
| 75 | 93.37 | 0.62 | 88.30 | 0.1 | 91.36 | 6.5 | 89.11 | 1.71 |
| 150 | 96.87 | 0.80 | 94.59 | 0.08 | 90.68 | 2.67 | 92.68 | 0.37 |
| 400 | 97.55 | 0.48 | 97.55 | 0.1 | 45.29 | 48.05 | 76.22 | 0.08 |
| 800 | 97.58 | 0.43 | 98.45 | 0.08 | 48.87 | 51.13 | n/a | n/a |
Comparative study of execution times (min) for a synthetic dataset with a mutation rate of 1%
| Length (nt) | HPG-Methyl | Bismark | BSMAP | BS-Seeker |
|---|---|---|---|---|
| 75 | 1.366 | 62.579 | 4.601 | 116.044 |
| 150 | 1.950 | 106.173 | 6.220 | 141.442 |
| 400 | 10.850 | 248.207 | 4.601 | 324.616 |
| 800 | 50.600 | 1246.89 | 4.159 | — |
Comparative study of percentage of reads mapped on real datasets
| Dataset | HPG-Methyl | Bismark | BSMAP | BS-Seeker |
|---|---|---|---|---|
| SRR309230_1 | 87.71 | 71.81 | 89.21 | 84.18 |
| SRR837425_1 | 82.75 | 68.42 | 82.84 | 77.35 |
Comparative study of execution times (min) for real datasets
| Dataset | HPG-Methyl | Bismark | BSMAP | BS-Seeker |
|---|---|---|---|---|
| SRR309230_1 | 12.053 | 82.120 | 9.975 | 250.686 |
| SRR837425_1 | 19.047 | 95.194 | 15.286 | 271.048 |