Kylie A Bemis, Olga Vitek.
Abstract
Summary: We introduce matter, an R package for direct interactions with larger-than-memory datasets stored in an arbitrary number of files of any size. matter is primarily designed for datasets in new and rapidly evolving file formats, which may lack extensive software support. matter enables a wide variety of data exploration and manipulation steps, and is extensible to many bioinformatics applications. It supports reproducible research by minimizing the need to convert and store data in multiple formats. We illustrate the performance of matter in conjunction with the Bioconductor package Cardinal for analysis of high-resolution, high-throughput mass spectrometry imaging experiments.
Availability: The package, vignettes, and examples of applications in several areas of bioinformatics are available open-source at www.bioconductor.org under the Artistic-2.0 license.
Contact: o.vitek@neu.edu.
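The idea of working with a dataset that is spread over several files and never fully loaded into memory can be sketched as follows. This is a minimal illustration in Python using only the standard library, not matter's API or implementation; the raw row-major float64 layout and the names `write_matrix_part` and `column_means` are assumptions made for the example.

```python
import os
import tempfile
from array import array


def write_matrix_part(path, rows):
    """Write rows of doubles to a raw binary file (row-major, native byte order)."""
    with open(path, "wb") as f:
        for row in rows:
            array("d", row).tofile(f)


def column_means(paths, ncol, rows_per_chunk=2):
    """Compute column means over a matrix split across several files.

    Only `rows_per_chunk` rows are held in memory at any time.
    """
    sums = [0.0] * ncol
    nrow = 0
    row_bytes = ncol * 8  # assumes 8-byte doubles
    for path in paths:
        with open(path, "rb") as f:
            while True:
                buf = f.read(row_bytes * rows_per_chunk)
                if not buf:
                    break  # end of this file, move on to the next one
                chunk = array("d")
                chunk.frombytes(buf)
                # Walk the chunk one row at a time and accumulate column sums.
                for start in range(0, len(chunk), ncol):
                    for j in range(ncol):
                        sums[j] += chunk[start + j]
                    nrow += 1
    return [s / nrow for s in sums]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        part1 = os.path.join(tmp, "part1.bin")
        part2 = os.path.join(tmp, "part2.bin")
        write_matrix_part(part1, [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
        write_matrix_part(part2, [[7.0, 8.0, 9.0]])
        print(column_means([part1, part2], ncol=3))
```

Peak memory in this sketch depends on the chunk size rather than on the total size of the files, which is the property the benchmarks in this record measure.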
Year: 2017 PMID: 28633357 PMCID: PMC5870624 DOI: 10.1093/bioinformatics/btx392
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Performance of matter compared with bigmemory and ff for linear regression and calculation of the first two principal components on simulated datasets of 1.2 GB
| Linear regression | | | | Principal component analysis | | | |
|---|---|---|---|---|---|---|---|
| Method | Mem. used | Mem. overhead | Time (s) | Method | Mem. used | Mem. overhead | Time (s) |
| R matrices + lm | 7 GB | 1.4 GB | 33 | R matrices + svd | 3.9 GB | 2.4 GB | 66 |
| bigmemory + biglm | 4.4 GB | 3.9 GB | 21 | bigmemory + irlba | 3.1 GB | 2.7 GB | 15 |
| ff + biglm | 1.9 GB | 1.6 GB | 57 | ff + irlba | 1.8 GB | 1.4 GB | 130 |
| matter + biglm | 1 GB | 660 MB | 47 | matter + irlba | 890 MB | 490 MB | 110 |
Memory overhead is the maximum memory used during the execution minus the memory used after completion.
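The definition of memory overhead above (peak memory during execution minus memory held after completion) can be illustrated with a small Python sketch. It uses the standard-library `tracemalloc` module, which tracks only Python-level allocations; it is an assumption for illustration and not the measurement procedure behind the table. The helper `memory_overhead` and the workload `sum_of_squares` are hypothetical names.

```python
import tracemalloc


def memory_overhead(func, *args):
    """Run func(*args) and return (result, peak, final, overhead) in bytes.

    overhead = peak memory during the call - memory still held afterwards.
    """
    tracemalloc.start()
    try:
        result = func(*args)
        # get_traced_memory() returns (current, peak) since tracing started.
        final, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak, final, peak - final


def sum_of_squares(n):
    """Toy workload: builds a temporary list that is freed on return."""
    squares = [float(i) ** 2 for i in range(n)]
    return sum(squares)


if __name__ == "__main__":
    result, peak, final, overhead = memory_overhead(sum_of_squares, 10_000)
    print(f"peak={peak} B, final={final} B, overhead={overhead} B")
```

Because the temporary list is released before the function returns, the peak exceeds the final figure and the difference is reported as overhead, mirroring the "Mem. overhead" column.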
Calculation of the first two principal components on all the MSI datasets in ‘continuous’ imzML format in Oetjen
| Dataset | Size | Pixels | Features | Mem. used | Mem. overhead | Time |
|---|---|---|---|---|---|---|
| 3D microbial time course | 2.9 GB | 17 672 | 40 299 | 228 MB | 50 MB | 13 min, 6 s |
| 3D oral squamous cell carcinoma | 25.4 GB | 828 558 | 7680 | 977 MB | 668 MB | 2 h, 7 min, 9 s |
| 3D mouse pancreas | 26.4 GB | 497 225 | 13 312 | 628 MB | 370 MB | 2 h, 12 min, 46 s |
| 3D mouse kidney | 41.8 GB | 1 362 830 | 7680 | 1.5 GB | 1.1 GB | 3 h, 22 min, 24 s |
Comparisons with bigmemory and ff on these datasets are available in the package vignettes.