Alexey Stupnikov (1), Shailesh Tripathi (2), Ricardo de Matos Simoes (1), Darragh McArt (3), Manuel Salto-Tellez (3), Galina Glazko (4), Matthias Dehmer (5), Frank Emmert-Streib (6).
1. Computational Biology and Machine Learning Laboratory, Center for Cancer Research and Cell Biology, School of Medicine, Dentistry and Biomedical Sciences, Faculty of Medicine Health and Life Sciences, Queen's University Belfast, BT9 7AE Belfast, UK.
2. Computational Biology and Machine Learning Laboratory, Center for Cancer Research and Cell Biology, School of Medicine, Dentistry and Biomedical Sciences, Faculty of Medicine Health and Life Sciences, Queen's University Belfast, BT9 7AE Belfast, UK; School of Mathematics and Physics, Queen's University Belfast, BT7 1NN Belfast, UK.
3. Northern Ireland Molecular Pathology Laboratory, Centre for Cancer Research and Cell Biology, Queen's University Belfast, BT9 7AE Belfast, UK.
4. Division of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA.
5. Department of Biomedical Computer Science and Mechatronics, UMIT, Hall in Tirol, Austria; College of Computer and Control Engineering, Nankai University, Tianjin, P.R. China.
6. Computational Biology and Machine Learning Laboratory, Center for Cancer Research and Cell Biology, School of Medicine, Dentistry and Biomedical Sciences, Faculty of Medicine Health and Life Sciences, Queen's University Belfast, BT9 7AE Belfast, UK; Predictive Medicine and Analytics Lab, Department of Signal Processing, Tampere University of Technology, 33720 Tampere, Finland; Institute of Biosciences and Medical Technology, 33720 Tampere, Finland.
Abstract
MOTIVATION: Data from RNA-seq experiments provide many new possibilities for gaining insights into the biological and disease mechanisms of cellular functioning. However, the reproducibility and robustness of RNA-seq data analysis results are often unclear. This is attributed in part to two counteracting goals, (i) a cost-efficient and (ii) an optimal experimental design, which lead to a compromise, e.g. in the sequencing depth of experiments.
RESULTS: We introduce an R package called samExploreR that allows the subsampling (m out of n bootstrapping) of short reads based on SAM files, facilitating the investigation of sequencing-depth-related questions for the experimental design. Overall, this provides a systematic way of exploring the reproducibility and robustness of general RNA-seq studies. We exemplify the usage of samExploreR by studying the influence of the sequencing depth and the annotation on the identification of differentially expressed genes.
AVAILABILITY AND IMPLEMENTATION: samExploreR is available as an R package from Bioconductor.
CONTACT: v@bio-complexity.com
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
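The subsampling idea in the abstract can be illustrated with a minimal Python sketch. This is not samExploreR's implementation or API: the function name `subsample_sam`, the fraction parameter `f`, the `replace` switch, and the toy SAM records are all assumptions made for illustration. The sketch keeps the SAM header lines (those starting with `@`) and draws m = round(f · n) of the n alignment records at random, which mimics sequencing the same library at a lower depth.

```python
import random


def subsample_sam(lines, f, seed=None, replace=False):
    """Return SAM header lines plus m = round(f * n) randomly drawn alignment lines.

    Hypothetical helper for illustration only; not part of samExploreR.
    lines   : list of SAM text lines (headers start with '@')
    f       : fraction of reads to keep, 0 < f <= 1
    replace : draw with replacement (True) or without (False, the default)
    """
    if not 0.0 < f <= 1.0:
        raise ValueError("f must be in (0, 1]")
    rng = random.Random(seed)
    header = [ln for ln in lines if ln.startswith("@")]
    reads = [ln for ln in lines if ln and not ln.startswith("@")]
    n = len(reads)
    m = round(f * n)
    if replace:
        idx = sorted(rng.randrange(n) for _ in range(m))
    else:
        idx = sorted(rng.sample(range(n), m))
    # Sorting the indices keeps the original record order of the file.
    return header + [reads[i] for i in idx]


# Toy input: two header lines and ten made-up alignment records.
toy_sam = ["@HD\tVN:1.6\tSO:coordinate", "@SQ\tSN:chr1\tLN:1000"] + [
    f"read{i}\t0\tchr1\t{10 * i + 1}\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII"
    for i in range(10)
]

half_depth = subsample_sam(toy_sam, f=0.5, seed=1)
print(len(half_depth) - 2, "of 10 reads kept")
```

Repeating the draw with different seeds at each value of `f`, and rerunning the downstream analysis (for example, differential expression) on every subsample, gives a rough picture of how stable the results are as sequencing depth decreases.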