Yue Cao (1,2), Yingxin Lin (1,2), Ellis Patrick (1,2,3), Pengyi Yang (1,2,3), Jean Yee Hwa Yang (1,2,4)
1. Charles Perkins Centre, The University of Sydney, Sydney, NSW 2006, Australia.
2. School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia.
3. Computational Systems Biology Group, Children's Medical Research Institute, Westmead, NSW 2145, Australia.
4. Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China.
Abstract
MOTIVATION: With the recent surge of large-cohort-scale single-cell research, it is critical that analytical methods fully utilize the comprehensive characterization of cellular systems that single-cell technologies produce, so as to provide insights into samples from individuals. Currently, there is little consensus on how best to compress the complex data structures these technologies generate into summary statistics that represent each sample (e.g. each individual).
RESULTS: Here, we present scFeatures, an approach that creates interpretable cellular and molecular representations of single-cell and spatial data at the sample level. We demonstrate that summarizing a broad collection of features at the sample level is important both for understanding underlying disease mechanisms in different experimental studies and for accurately classifying the disease status of individuals.
AVAILABILITY AND IMPLEMENTATION: scFeatures is publicly available as an R package at https://github.com/SydneyBioX/scFeatures. All data used in this study are publicly available, with accession IDs reported in Section 2.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.