Weiguang Mao [1,2], Maziyar Baran Pouyan [3], Dennis Kostka [1,2,3,4], Maria Chikina [1,2].
1. Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15260, USA.
2. Joint Carnegie Mellon-University of Pittsburgh Ph.D. Program in Computational Biology, Pittsburgh, PA 15260, USA.
3. Department of Developmental Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15260, USA.
4. Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15260, USA.
Abstract
MOTIVATION: Single-cell RNA-seq analysis has emerged as a powerful tool for understanding inter-cellular heterogeneity. Due to the inherent noise of the data, computational techniques often rely on dimensionality reduction (DR) as both a pre-processing step and an analysis tool. Ideally, DR should preserve the biological information while discarding the noise. However, if the DR is to be used directly to gain biological insight, it must also be interpretable; that is, the individual dimensions of the reduction should correspond to specific biological variables such as cell-type identity or pathway activity. Maximizing biological interpretability requires making assumptions about the data structure, so the choice of model is critical.
RESULTS: We present a new probabilistic single-cell factor analysis model, Non-negative Independent Factor Analysis (NIFA), that incorporates different interpretability-inducing assumptions into a single modeling framework. The key advantage of our NIFA model is that it simultaneously models uni- and multi-modal latent factors, and thus isolates discrete cell-type identity and continuous pathway activity into separate components. We apply our approach to a range of datasets where cell-type identity is known, and we show that NIFA-derived factors outperform results from ICA, PCA, NMF and scCoGAPS (an NMF method designed for single-cell data) in terms of disentangling biological sources of variation. Studying an immunotherapy dataset in detail, we show that NIFA is able to reproduce and refine previous findings in a single analysis framework and enables the discovery of new clinically relevant cell states.
AVAILABILITY AND IMPLEMENTATION: NIFA is an R package freely available on GitHub (https://github.com/wgmao/NIFA). The test dataset is archived at https://zenodo.org/record/6286646.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.