Zhengyang Dong 1, Gil Alterovitz 2,3. 1. Department of Computer Science, Stanford University, Stanford, CA 94305. 2. Department of Medicine, Brigham and Women's Hospital/Harvard Medical School, Boston, MA 02115. 3. National Artificial Intelligence Institute, U.S. Department of Veterans Affairs, Washington, DC 20571.
Abstract
MOTIVATION: Single-cell RNA sequencing allows us to study cell heterogeneity at an unprecedented cell-level resolution and to identify known and new cell populations. The current cell labeling pipeline uses unsupervised clustering and assigns labels to clusters by manual inspection. However, this pipeline does not use the available gold-standard labels, because there are usually too few of them to be useful to most computational methods. This article aims to facilitate cell labeling with a semi-supervised method in an alternative pipeline, in which a few gold-standard labels are first identified and then extended to the rest of the cells computationally. RESULTS: We built a semi-supervised dimensionality reduction method, a network-enhanced autoencoder (netAE). Tested on three public datasets, netAE outperforms various dimensionality reduction baselines and achieves satisfactory classification accuracy even when the labeled set is very small, without disrupting the similarity structure of the original space. AVAILABILITY AND IMPLEMENTATION: The code of netAE is available on GitHub: https://github.com/LeoZDong/netAE. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.