Daniel R Zerbino, Nathan Johnson, Thomas Juettemann, Steven P Wilder, Paul Flicek.
Abstract
MOTIVATION: Using high-throughput sequencing, researchers are now generating hundreds of whole-genome assays to measure various features such as transcription factor binding, histone marks, DNA methylation or RNA transcription. Displaying so much data generally leads to a confusing accumulation of plots. We describe here a multithreaded library that computes statistics on large numbers of datasets (Wiggle, BigWig, Bed, BigBed and BAM), generating statistical summaries within minutes with limited memory requirements, whether on the whole genome or on selected regions.
Year: 2013 PMID: 24363377 PMCID: PMC3967112 DOI: 10.1093/bioinformatics/btt737
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Benchmarking CPU and memory requirements to compute the sum of 126 BigWig files (121 GB of data in total)
| Pipeline | Stage | CPUs | Time/CPU (s) | RAM/CPU (GB) |
|---|---|---|---|---|
| 1 | wiggletools | 116 | 351 mean, 739 maximum | 0.22 mean, 0.32 maximum |
| 1 | bigWigCat | 1 | 378 | 5.23 |
| 1 | Overall | 116 | 1090 | 5.23 |
| 2 | wiggletools | 116 | 351 mean, 739 maximum | 0.22 mean, 0.32 maximum |
| 2 | bigWigMerge | 1 | 3441 | 6.93 |
| 2 | wigToBigWig | 1 | 8887 | 68.85 |
| 2 | Overall | 116 | 13 067 | 68.85 |
| 3 | bigWigMerge | 1 | 11 036 | 43.73 |
| 3 | wigToBigWig | 1 | 9423 | 75.12 |
| 3 | Overall | 1 | 20 459 | 75.12 |
Note: Several pipelines are compared; hence some components appear multiple times.
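The arithmetic behind the "Overall" rows can be checked with a short Python sketch. It assumes, as an inference from the numbers rather than something the source states, that overall time is the slowest wiggletools job plus each single-CPU stage run in sequence, and that overall RAM is the peak across stages. The names `PIPELINES`, `summarize` and `speedup` are hypothetical, introduced only for this illustration.

```python
# Per-stage figures copied from the benchmark table.
# For wiggletools, the maximum (not mean) time and RAM per CPU are used,
# on the assumption that the slowest parallel job bounds the stage.
PIPELINES = {
    1: {
        "stages": [("wiggletools", 739, 0.32), ("bigWigCat", 378, 5.23)],
        "reported": (1090, 5.23),
    },
    2: {
        "stages": [
            ("wiggletools", 739, 0.32),
            ("bigWigMerge", 3441, 6.93),
            ("wigToBigWig", 8887, 68.85),
        ],
        "reported": (13067, 68.85),
    },
    3: {
        "stages": [("bigWigMerge", 11036, 43.73), ("wigToBigWig", 9423, 75.12)],
        "reported": (20459, 75.12),
    },
}


def summarize(stages):
    """Return (summed stage time in s, peak RAM in GB) for one pipeline."""
    total_time = sum(time for _, time, _ in stages)
    peak_ram = max(ram for _, _, ram in stages)
    return total_time, peak_ram


def speedup(slow, fast):
    """Ratio of reported overall times between two pipelines."""
    return PIPELINES[slow]["reported"][0] / PIPELINES[fast]["reported"][0]


if __name__ == "__main__":
    for pid, pipeline in PIPELINES.items():
        print(pid, "computed:", summarize(pipeline["stages"]),
              "reported:", pipeline["reported"])
    print("pipeline 3 vs 1 speedup:", round(speedup(3, 1), 1))
```

Under this assumption the computed values match the reported "Overall" rows for pipelines 2 and 3. Pipeline 1 sums to 1117 s against a reported 1090 s, so the reported figure was presumably derived slightly differently. Using the reported times, pipeline 1 is roughly 18.8 times faster than pipeline 3.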