Literature DB >> 33715007

ExperimentSubset: An R package to manage subsets of Bioconductor Experiment objects.

Irzam Sarfraz1, Muhammad Asif1, Joshua D Campbell2.   

Abstract

MOTIVATION: R Experiment objects such as the SummarizedExperiment or SingleCellExperiment are data containers for storing one or more matrix-like assays along with associated row and column data. These objects have been used to facilitate the storage and analysis of high-throughput genomic data generated from technologies such as single-cell RNA sequencing. One common computational task in many genomics analysis workflows is to perform subsetting of the data matrix before applying down-stream analytical methods. For example, one may need to subset the columns of the assay matrix to exclude poor quality samples or subset the rows of the matrix to select the most variable features. Traditionally, a second object is created that contains the desired subset of assay from the original object. However, this approach is inefficient as it requires the creation of an additional object containing a copy of the original assay and leads to challenges with data provenance.
RESULTS: To overcome these challenges, we developed an R package called ExperimentSubset, which is a data container that implements classes for efficient storage and streamlined retrieval of assays that have been subsetted by rows and/or columns. These classes are able to inherently provide data provenance by maintaining the relationship between the subsetted and parent assays. We demonstrate the utility of this package on a single-cell RNA-seq dataset by storing and retrieving subsets at different stages of the analysis while maintaining a lower memory footprint. Overall, the ExperimentSubset is a flexible container for the efficient management of subsets. AVAILABILITY: ExperimentSubset package is available at Bioconductor: https://bioconductor.org/packages/ExperimentSubset/ and Github: https://github.com/campbio/ExperimentSubset. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) (2021). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

Year:  2021        PMID: 33715007     DOI: 10.1093/bioinformatics/btab179

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  1 in total

1.  TidyMass an object-oriented reproducible analysis framework for LC-MS data.

Authors:  Xiaotao Shen; Hong Yan; Chuchu Wang; Peng Gao; Caroline H Johnson; Michael P Snyder
Journal:  Nat Commun       Date:  2022-07-28       Impact factor: 17.694

  1 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.