Yingxin Lin (1), Shila Ghazanfar (1,2), Dario Strbenac (1), Andy Wang (1,3), Ellis Patrick (1,4), David M Lin (5), Terence Speed (6,7), Jean Y H Yang (1), Pengyi Yang (1,8).
1. School of Mathematics and Statistics, University of Sydney, Sydney, NSW 2006, Australia.
2. Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK.
3. Sydney Medical School, University of Sydney, Sydney, NSW 2006, Australia.
4. Westmead Institute for Medical Research, University of Sydney, Westmead, NSW 2145, Australia.
5. Department of Biomedical Sciences, Cornell University, Ithaca, NY 14853, USA.
6. Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, VIC 3052, Australia.
7. Department of Mathematics and Statistics, University of Melbourne, Melbourne, VIC 3010, Australia.
8. Computational Systems Biology Group, Children's Medical Research Institute, University of Sydney, Westmead, NSW 2145, Australia.
Abstract
BACKGROUND: Single-cell RNA-seq (scRNA-seq) profiling has revealed remarkable variation in transcription, suggesting that expression of many genes at the single-cell level is intrinsically stochastic and noisy. Yet, at the cell population level, a subset of genes traditionally referred to as housekeeping genes (HKGs) is found to be stably expressed across different cell and tissue types. It is therefore critical to ask whether stably expressed genes (SEGs) can be identified at the single-cell level and, if so, how their expression stability can be assessed. We have previously proposed a computational framework for ranking the expression stability of genes in single cells for scRNA-seq data normalization and integration. In this study, we perform a detailed evaluation and characterization of SEGs derived from this framework.

RESULTS: Here, we show that gene expression stability indices derived from the early human and mouse development scRNA-seq datasets and the "Mouse Atlas" dataset are reproducible and conserved across species. We demonstrate that SEGs identified from single cells based on their stability indices are considerably more stable than HKGs defined previously from cell populations across diverse biological systems. Our analyses indicate that SEGs are inherently more stable at the single-cell level and that their characteristics are reminiscent of HKGs, suggesting a potential role in sustaining essential functions in individual cells.

CONCLUSIONS: SEGs identified in this study have immediate utility both for understanding variation and stability of single-cell transcriptomes and for practical applications such as scRNA-seq data normalization. Our framework for calculating the gene stability index, "scSEGIndex," is incorporated into the scMerge Bioconductor R package (https://sydneybiox.github.io/scMerge/reference/scSEGIndex.html) and can be used for identifying genes with stable expression in scRNA-seq datasets.