Abstract
MOTIVATION: The reference CRAM file format implementation is in Java. We present 'Scramble': a new C implementation of SAM, BAM and CRAM file I/O.
Year: 2014 PMID: 24930138 PMCID: PMC4173023 DOI: 10.1093/bioinformatics/btu390
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
CRAM breakdown by file percentage
| Data type | File percentage (40 quality bins) | File percentage (8 quality bins) |
|---|---|---|
| Quality values | 80.9 | 68.6 |
| Sequence identifiers | 8.3 | 13.7 |
| Auxiliary tags | 3.9 | 6.4 |
| Flags | 1.5 | 2.5 |
| Alignment position | 1.4 | 2.4 |
| CIGAR string | 1.4 | 2.3 |
| Sequence bases | 1.3 | 2.1 |
| Template position/size | 0.6 | 1.0 |
| Mapping quality | 0.2 | 0.4 |
| Other/overhead | 0.5 | 0.8 |
Note: Total file sizes for ERR317482: 3.46 Gb for 40 bins, 2.11 Gb for 8 bins.
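The percentages above are relative to two different totals, so it can help to convert them into absolute sizes. The following is a minimal sketch that multiplies each percentage by the total file size given in the note; the names `BREAKDOWN`, `TOTAL_GB` and `absolute_sizes` are illustrative and not part of Scramble or any other tool.

```python
# Percentages copied from the table above: (40 quality bins, 8 quality bins).
BREAKDOWN = {
    "Quality values": (80.9, 68.6),
    "Sequence identifiers": (8.3, 13.7),
    "Auxiliary tags": (3.9, 6.4),
    "Flags": (1.5, 2.5),
    "Alignment position": (1.4, 2.4),
    "CIGAR string": (1.4, 2.3),
    "Sequence bases": (1.3, 2.1),
    "Template position/size": (0.6, 1.0),
    "Mapping quality": (0.2, 0.4),
    "Other/overhead": (0.5, 0.8),
}

# Total file sizes in Gb from the table note: (40 bins, 8 bins).
TOTAL_GB = (3.46, 2.11)


def absolute_sizes(breakdown=BREAKDOWN, totals=TOTAL_GB):
    """Return {data type: (Gb at 40 bins, Gb at 8 bins)}."""
    return {
        name: tuple(pct / 100.0 * total for pct, total in zip(pcts, totals))
        for name, pcts in breakdown.items()
    }


if __name__ == "__main__":
    for name, (gb40, gb8) in absolute_sizes().items():
        print(f"{name:<24} {gb40:6.3f} {gb8:6.3f}")
```

Read this way, quality values shrink from about 2.80 Gb to about 1.45 Gb when moving from 40 to 8 bins, while sequence identifiers stay at roughly 0.29 Gb in both files; their percentage rises only because the total falls.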
9827_2#49.bam (ERR317482)
| Tool | Format | Read (s), 40 bins | Write (s), 40 bins | Flagstat (s), 40 bins | Index (s), 40 bins | Size (Gb), 40 bins | Read (s), 8 bins | Write (s), 8 bins | Flagstat (s), 8 bins | Index (s), 8 bins | Size (Gb), 8 bins |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Scramble | BAM | | 773.6 | 76.9 | – | 6.50 | | 1063.6 | 63.3 | – | 4.80 |
| Scramble | CRAM | 117.1 | | | | | 111.1 | | | | |
| Cramtools | CRAM | 223.1 | 1333.2 | – | 48.4 | 3.78 | 209.0 | 1217.1 | – | 63.8 | 2.33 |
| Samtools | BAM | 89.1 | 759.0 | 89.1 | 81.1 | 6.50 | 69.6 | 1053.8 | 69.6 | 64.7 | 4.80 |
| Picard | BAM | 120.8 | 518.4 | – | 124.8 | 6.52 | 111.9 | 460.6 | – | 113.1 | 4.90 |
Note: User + System CPU times in seconds for encoding and decoding, along with the produced file size. The timings correspond to a single core (of 16) of a 2.2 GHz Intel Xeon E5-2660. The data were in the file system cache, so these tasks are CPU-bound. Not all tools provide index and flagstat equivalents for all file formats, so timings are omitted in these cases. Bold values represent the fastest or smallest figure in each column.
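As a rough reading of the size columns, the CRAM-to-BAM size ratio can be computed from the Cramtools (CRAM) and Samtools (BAM) rows above. This is only an illustrative sketch; `SIZES_GB` and `cram_to_bam_ratio` are hypothetical names, and the figures are copied from those two rows.

```python
# Sizes in Gb copied from the Cramtools (CRAM) and Samtools (BAM) rows.
SIZES_GB = {
    "40 quality bins": {"cram": 3.78, "bam": 6.50},
    "8 quality bins": {"cram": 2.33, "bam": 4.80},
}


def cram_to_bam_ratio(sizes=SIZES_GB):
    """Return the CRAM size as a fraction of the BAM size for each binning."""
    return {label: s["cram"] / s["bam"] for label, s in sizes.items()}


if __name__ == "__main__":
    for label, ratio in cram_to_bam_ratio().items():
        print(f"{label}: CRAM is {ratio:.1%} of the BAM size")
```

On these figures the Cramtools CRAM output is roughly 58% of the Samtools BAM size at 40 quality bins and roughly 49% at 8 quality bins.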
Fig. 1. Real time taken to convert from 230 Gb BAM to BAM (Scramble, Samtools) and BAM to CRAM (Scramble) formats. The system was a 16 core 2.2 GHz Intel Xeon E5-2660 with a local RAID XFS file system. Tests on slower disks and with smaller locally cached data files are in the Supplementary Material, including benchmarks of Sambamba (https://github.com/lomereiter/sambamba) and Biobambam (Tischler and Leonard, 2013).