Xiao-Fei Zhang 1, Le Ou-Yang 2, Shuo Yang 3, Xing-Ming Zhao 4, Xiaohua Hu 5, Hong Yan 6
1. Department of Statistics, School of Mathematics and Statistics, Central China Normal University, Wuhan 430079, China.
2. Guangdong Key Laboratory of Intelligent Information Processing and Shenzhen Key Laboratory of Media Security, Shenzhen University, Shenzhen 518060, China.
3. Department of Respiratory Medicine, Wuhan Number 1 Hospital, Wuhan 430022, China.
4. Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China.
5. Department of Computer Science, College of Computing and Informatics, Drexel University, Philadelphia, PA 19104, USA.
6. Department of Electrical Engineering, City University of Hong Kong, Hong Kong, China.
Abstract
SUMMARY: Imputing dropout events, which may mislead downstream analyses, is a key step in analyzing single-cell RNA-sequencing (scRNA-seq) data. We develop EnImpute, an R package that introduces an ensemble learning method for imputing dropout events in scRNA-seq data. EnImpute combines the results obtained from multiple imputation methods to generate a more accurate result. A Shiny application is provided for easier use and visualization. Experimental results show that EnImpute outperforms the individual state-of-the-art methods in almost all situations. EnImpute is useful for correcting noisy scRNA-seq data before downstream analysis.
AVAILABILITY AND IMPLEMENTATION: The R package and Shiny application are available on GitHub at https://github.com/Zhangxf-ccnu/EnImpute.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
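The idea of combining several imputation results can be illustrated with a minimal sketch. It assumes that each method returns a genes-by-cells matrix of the same shape and that the matrices are merged entry by entry with a trimmed mean. The abstract does not state which combination rule EnImpute uses, so the trimmed mean, the function names `trimmed_mean` and `ensemble_impute`, the `trim` parameter, and the toy matrices are all illustrative assumptions rather than the package's API.

```python
from math import floor


def trimmed_mean(values, trim=0.25):
    """Mean of the values after dropping a fraction `trim` from each tail."""
    ordered = sorted(values)
    k = floor(trim * len(ordered))  # number of values dropped per tail
    if 2 * k >= len(ordered):
        raise ValueError("trim is too large for the number of values")
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def ensemble_impute(imputed, trim=0.25):
    """Combine imputed matrices (lists of rows) entry by entry.

    `imputed` holds one genes-by-cells matrix per imputation method.
    This is a hypothetical combination rule, not EnImpute's documented one.
    """
    n_genes = len(imputed[0])
    n_cells = len(imputed[0][0])
    return [
        [trimmed_mean([m[g][c] for m in imputed], trim) for c in range(n_cells)]
        for g in range(n_genes)
    ]


# Toy outputs of four hypothetical imputation methods (2 genes x 2 cells).
method_a = [[0.0, 2.0], [1.0, 5.0]]
method_b = [[1.0, 2.0], [1.0, 6.0]]
method_c = [[2.0, 3.0], [3.0, 7.0]]
method_d = [[9.0, 3.0], [3.0, 40.0]]

consensus = ensemble_impute([method_a, method_b, method_c, method_d])
print(consensus)
```

With four methods and `trim=0.25`, the smallest and largest value at each entry are discarded, so a single method that produces an extreme estimate (such as the 40.0 above) does not dominate the consensus.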