| Literature DB >> 31028141 |
Yingxin Lin, Shila Ghazanfar, Kevin Y X Wang, Johann A Gagnon-Bartsch, Kitty K Lo, Xianbin Su, Ze-Guang Han, John T Ormerod, Terence P Speed, Pengyi Yang, Jean Yee Hwa Yang.
Abstract
Concerted examination of multiple collections of single-cell RNA sequencing (RNA-seq) data promises biological insights that cannot be uncovered with individual datasets. Here we present scMerge, an algorithm that integrates multiple single-cell RNA-seq datasets using factor analysis of stably expressed genes and pseudoreplicates across datasets. Using a large collection of public datasets, we benchmark scMerge against published methods and demonstrate that it consistently improves cell type separation by removing unwanted factors. scMerge can also enhance biological discovery through robust data integration, which we show by inferring the developmental trajectory in a liver dataset collection.
Keywords: data integration; factor analysis; normalization; pseudoreplications; single-cell RNA-seq data
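The abstract describes removing unwanted factors by factor analysis anchored on stably expressed genes and pseudoreplicates. A minimal NumPy sketch of that idea follows. It assumes an RUV-III-style formulation (removal of unwanted variation using negative-control genes and a replicate matrix). Stably expressed genes act as the negative controls, and the pseudoreplicate groups are taken as given. How scMerge actually selects stably expressed genes and constructs pseudoreplicates is not reproduced. The function name `ruv3` is hypothetical and is not the scMerge package API.

```python
import numpy as np


def ruv3(Y: np.ndarray, M: np.ndarray, ctl: np.ndarray, k: int) -> np.ndarray:
    """Simplified RUV-III-style removal of k unwanted factors (illustrative sketch).

    Y   : cells x genes log-expression matrix (cells pooled across datasets)
    M   : cells x groups 0/1 matrix; M[i, g] = 1 if cell i is in pseudoreplicate group g
    ctl : column indices of negative-control genes (e.g. stably expressed genes)
    k   : number of unwanted factors to remove
    """
    m = Y.shape[0]
    if not 1 <= k <= m - M.shape[1]:
        raise ValueError("k must be between 1 and n_cells - n_groups")
    # Cells in the same pseudoreplicate group are assumed to share biology, so the
    # variation left after subtracting group means is treated as unwanted.
    Y0 = Y - M @ np.linalg.lstsq(M, Y, rcond=None)[0]
    # Leading left singular vectors of Y0 Y0^T span the unwanted-variation directions.
    U, _, _ = np.linalg.svd(Y0 @ Y0.T)
    alpha = U[:, :k].T @ Y           # k x genes: estimated factor loadings
    ac = alpha[:, ctl]               # loadings restricted to the control genes
    # Estimate per-cell unwanted factors W by regressing control genes on ac:
    # W = Y[:, ctl] ac^T (ac ac^T)^{-1}
    W = np.linalg.solve(ac @ ac.T, ac @ Y[:, ctl].T).T   # cells x k
    # Subtract the estimated unwanted component from every gene.
    return Y - W @ alpha
```

In a simulation where control genes carry no group-level signal, the adjustment shrinks within-group spread while preserving differences between groups. Using the control genes to estimate `W` is what keeps biological signal, which they are assumed not to carry, from being regressed out.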
Year: 2019 PMID: 31028141 PMCID: PMC6525515 DOI: 10.1073/pnas.1820006116
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 11.205