Alex M Ascensión (1,2), Olga Ibáñez-Solé (1,2), Iñaki Inza (3), Ander Izeta (2), Marcos J Araúzo-Bravo (1,4,5,6). 1. Biodonostia Health Research Institute, Computational Biology and Systems Biomedicine Group, Paseo Dr. Begiristain, s/n, Donostia-San Sebastian, 20014, Spain. 2. Biodonostia Health Research Institute, Tissue Engineering Group, Paseo Dr. Begiristain, s/n, Donostia-San Sebastian, 20014, Spain. 3. Intelligent Systems Group, Computer Science Faculty, University of the Basque Country, Donostia-San Sebastian, 20018, Spain. 4. Max Planck Institute for Molecular Biomedicine, Roentgenstr. 20, 48149 Muenster, Germany. 5. IKERBASQUE, Basque Foundation for Science, Euskadi plaza 5, Bilbao, 48009, Spain. 6. Department of Cell Biology and Histology, Faculty of Medicine and Nursing, University of Basque Country (UPV/EHU), 48940 Leioa, Spain.
Abstract
BACKGROUND: Feature selection is a relevant step in the analysis of single-cell RNA sequencing datasets. Most current feature selection methods are based on general univariate descriptors of the data, such as the dispersion or the percentage of zeros. Despite the use of correction methods, the generality of these descriptors biases the selection towards highly expressed genes rather than towards the genes that define the cell populations of the dataset. RESULTS: Triku is a feature selection method that favors genes defining the main cell populations. It does so by selecting genes expressed by groups of cells that are close in the k-nearest neighbor graph, that is, genes whose expression among the k neighboring cells is higher than would be expected if the k cells were chosen at random. Triku efficiently recovers cell populations present in artificial and biological benchmarking datasets, based on adjusted Rand index, normalized mutual information, supervised classification, and silhouette coefficient measurements. Additionally, gene sets selected by triku are more likely to be related to relevant Gene Ontology terms and contain fewer ribosomal and mitochondrial genes. CONCLUSION: Triku is developed in Python 3 and is available at https://github.com/alexmascension/triku.
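The neighborhood idea described in the RESULTS can be illustrated with a toy sketch. This is not triku's implementation or API: the function names (`knn_indices`, `locality_score`), the 2D coordinates standing in for a low-dimensional embedding of the cells, the synthetic expression vectors, and the use of a 1D Wasserstein distance between the observed and random-neighbor distributions are all assumptions made for illustration. The sketch sums each gene's expression over the k nearest neighbors of every cell and compares the result with the sums obtained when the k cells are drawn at random.

```python
import random


def knn_indices(points, k):
    """For each point, return the indices of its k nearest neighbors (Euclidean)."""
    neighbors = []
    for i, p in enumerate(points):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points)
            if j != i
        )
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def locality_score(expression, neighbors, rng):
    """Distance between kNN-summed expression and a random-neighbor null (hypothetical score)."""
    n, k = len(expression), len(neighbors[0])
    # Expression summed over each cell's true k nearest neighbors.
    observed = sorted(sum(expression[j] for j in nbrs) for nbrs in neighbors)
    # Same sum, but over k cells drawn at random for each cell.
    null = sorted(
        sum(expression[j] for j in rng.sample(range(n), k)) for _ in range(n)
    )
    # For two equal-size samples, the 1D Wasserstein-1 distance is the
    # mean absolute difference between the sorted values.
    return sum(abs(o - r) for o, r in zip(observed, null)) / n


rng = random.Random(0)

# Toy embedding: two well-separated groups of 30 cells each (assumed data).
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(30)]
points += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(30)]
neighbors = knn_indices(points, k=5)

# One gene expressed only in the first group, and one gene with the same
# counts scattered across cells independently of their position.
localized = [5] * 30 + [0] * 30
scattered = localized[:]
rng.shuffle(scattered)

score_localized = locality_score(localized, neighbors, rng)
score_scattered = locality_score(scattered, neighbors, rng)
print(score_localized, score_scattered)
```

Both toy genes have identical total expression, so a univariate descriptor such as the mean or the percentage of zeros cannot tell them apart. Under the assumptions above, the gene confined to one group of neighboring cells receives the larger score, which is the kind of gene the abstract says triku favors.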