A Systematic Evaluation of Supervised Machine Learning Algorithms for Cell Phenotype Classification Using Single-Cell RNA Sequencing Data.

Xiaowen Cao, Li Xing, Elham Majd, Hua He, Junhua Gu, Xuekui Zhang.

Abstract

Single-cell RNA sequencing (scRNA-seq) is a new technology that can yield valuable insights into gene expression and provide critical information about the cellular composition of complex tissues. In recent years, vast numbers of scRNA-seq datasets have been generated and made publicly available, enabling researchers to train supervised machine learning models that predict or classify various cell-level phenotypes. This has led to the development of many new methods for analyzing scRNA-seq data. Despite the popularity of such applications, there has as yet been no systematic investigation of how these supervised algorithms perform with predictors from scRNA-seq datasets of various sizes. In this study, 13 popular supervised machine learning algorithms for cell phenotype classification were evaluated using published real and simulated datasets with diverse numbers of cells. The benchmark comprises two parts. In the first, real datasets were used to assess the computing speed and cell phenotype classification performance of the algorithms; classification performance was evaluated using the area under the receiver operating characteristic curve (AUC), F1-score, precision, recall, and false-positive rate. In the second part, we evaluated gene-selection performance using published simulated datasets with a known list of true genes. The results showed that ElasticNet with interactions performed best for small and medium-sized datasets. The NaiveBayes classifier was found to be another appropriate method for medium-sized datasets. For large datasets, the XGBoost algorithm performed excellently. Ensemble algorithms were not found to be significantly superior to individual machine learning methods. Including interactions in the ElasticNet algorithm significantly improved performance on small datasets.
The linear discriminant analysis (LDA) algorithm was found to be the best choice when speed is critical: it is the fastest method, it scales to large sample sizes, and its performance is only slightly worse than that of the top performers.
Copyright © 2022 Cao, Xing, Majd, He, Gu and Zhang.
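The abstract names its evaluation metrics (AUC, F1-score, precision, recall, false-positive rate) without defining them. The sketch below shows one way to compute them in standard-library Python; it is not the authors' evaluation code. It assumes binary labels with 1 marking the phenotype of interest and hard predictions obtained by thresholding scores at 0.5. The helper names (`confusion_counts`, `classification_metrics`, `roc_auc`, `gene_selection_metrics`) are hypothetical. The gene-selection helper also assumes that selection quality is scored as precision and recall against the known list of true genes, which is one plausible reading of the second benchmark part.

```python
"""Hypothetical helpers for the evaluation metrics named in the abstract.

Assumptions (not from the paper): binary labels with 1 = phenotype of interest,
hard predictions obtained by thresholding scores, and gene selection scored as
precision/recall against the known true-gene list.
"""
from collections.abc import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for binary labels where 1 marks the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Precision, recall, F1-score and false-positive rate from hard predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # also called sensitivity or TPR
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "fpr": fpr}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via its rank interpretation:
    P(score of a random positive > score of a random negative), ties counted as 1/2.
    O(n_pos * n_neg) pairwise loop; fine for illustration, slow for many cells."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative cell")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gene_selection_metrics(selected: set[str], true_genes: set[str]) -> dict[str, float]:
    """Precision and recall of a selected gene set against the known true genes."""
    hits = len(selected & true_genes)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(true_genes) if true_genes else 0.0
    return {"precision": precision, "recall": recall}


if __name__ == "__main__":
    # Toy data: six cells, three of which carry the phenotype (label 1).
    y_true = [1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.4, 0.35, 0.1, 0.8, 0.6]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    print(classification_metrics(y_true, y_pred))
    print(round(roc_auc(y_true, scores), 3))  # 0.889: 8 of 9 pos/neg pairs ranked correctly
    print(gene_selection_metrics({"G1", "G2", "G5"}, {"G1", "G2", "G3", "G4"}))
```

For multiclass phenotypes these binary metrics would typically be computed one-vs-rest for each class and then averaged. For datasets with many cells, a sort-based rank computation would replace the pairwise AUC loop.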

Keywords:  classification; ensemble algorithms; gene selection; machine learning; single-cell RNA sequencing; supervised algorithms

Year:  2022        PMID: 35281805      PMCID: PMC8905542          DOI: 10.3389/fgene.2022.836798

Source DB:  PubMed          Journal:  Front Genet        ISSN: 1664-8021            Impact factor:   4.599

