Tianyu Wang, Jun Bai, Sheida Nabavi.
Abstract
BACKGROUND: Analyzing single-cell RNA sequencing (scRNAseq) data plays an important role in understanding intrinsic and extrinsic cellular processes in biological and biomedical research. One significant effort in this area is the identification of cell types. With huge amounts of single-cell sequencing data now available and more and more cell types being discovered, classifying cells into known cell types has become a priority. Several methods have been introduced to classify cells using gene expression data. However, incorporating biological gene interaction networks has proven valuable in cell classification procedures.
Keywords: Cell classification; Convolutional neural network; Deep learning; Graph convolutional neural network; Single cell RNA sequencing
Year: 2021 PMID: 34238220 PMCID: PMC8268184 DOI: 10.1186/s12859-021-04278-2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Structure of the proposed deep learning network (sigGCN) for single-cell classification
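To illustrate how a gene interaction network can enter a classifier, here is a minimal forward-pass sketch of a generic graph convolution (the first-order propagation rule of Kipf and Welling) in which every cell is treated as a signal on one shared gene-gene graph. This is an assumption-laden illustration, not sigGCN's architecture: the layer type, the helper names (`normalize_adjacency`, `gcn_layer`, `dense_softmax`), the toy 4-gene network, and the random untrained weights are all hypothetical.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 of a gene-gene adjacency matrix."""
    a_hat = adj + np.eye(adj.shape[0])              # add a self-loop to every gene
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))   # degrees are >= 1 thanks to the self-loops
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_norm, x, weight):
    """One graph-convolution layer applied to every cell: ReLU(A_norm @ X_cell @ W).

    x has shape (n_cells, n_genes, n_in): each cell is a signal on the shared gene network.
    """
    return np.maximum(np.einsum("ij,cjf,fk->cik", a_norm, x, weight), 0.0)

def dense_softmax(h, w_out):
    """Flatten the per-gene features of each cell and map them to class probabilities."""
    logits = h.reshape(h.shape[0], -1) @ w_out
    logits -= logits.max(axis=1, keepdims=True)     # subtract row max for numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

# Toy example with hypothetical sizes: 4 genes on a chain network, 3 cells, 2 cell types.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
expr = rng.random((3, 4))                           # cells x genes expression matrix
a_norm = normalize_adjacency(adj)
h = gcn_layer(a_norm, expr[:, :, None], rng.standard_normal((1, 8)))   # shape (3, 4, 8)
probs = dense_softmax(h, rng.standard_normal((4 * 8, 2)))              # shape (3, 2)
```

Each gene's new features mix its own expression with that of its network neighbours, which is how the interaction network shapes the representation before classification; in practice the weights would be learned by gradient descent rather than drawn at random.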
Fig. 2 Density plot of the distances between the centroids of cell populations, showing the complexity of the datasets
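The quantity behind Fig. 2 can be sketched as the pairwise distances between cell-type centroids: the closer the centroids, the harder the populations are to separate. This sketch assumes Euclidean distance in whatever feature space is passed in; the paper's exact feature space and distance are not stated here, and `centroid_distances` is a hypothetical helper. A kernel density estimate over the returned values would give a curve like the one in the figure.

```python
from itertools import combinations

import numpy as np

def centroid_distances(x, labels):
    """Euclidean distances between the centroids of every pair of cell populations.

    x: cells x features matrix; labels: one cell-type label per row of x.
    """
    labels = np.asarray(labels)
    classes = np.unique(labels)
    centroids = np.stack([x[labels == c].mean(axis=0) for c in classes])
    return np.array([np.linalg.norm(centroids[i] - centroids[j])
                     for i, j in combinations(range(len(classes)), 2)])

# Toy data (hypothetical): populations "a" and "c" share a centroid, "b" lies far away.
x = np.array([[0.0, 0.0], [0.0, 2.0], [3.0, 4.0], [3.0, 4.0], [0.0, 1.0]])
labels = ["a", "a", "b", "b", "c"]
d = centroid_distances(x, labels)   # pairs (a, b), (a, c), (b, c)
```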
Fig. 3 Bar plots of the four metrics showing the performance of the scRNAseq classifier tools and conventional classifiers on the Zhengsorted dataset
Accuracy of the eight scRNAseq data classifier tools and the four conventional classifiers on the seven datasets (N = 1000)
| Methods | Zhengsorted | Zheng68K | BaronHuman | Muraro | Segerstolpe | BaronMouse | Xin |
|---|---|---|---|---|---|---|---|
| sigGCN | 0.974 | 0.993 | |||||
| FC | 0.893 | 0.668 | 0.967 | 0.986 | 0.967 | 0.958 | 0.993 |
| scID | 0.721 | 0.484 | 0.46 | 0.577 | 0.285 | 0.286 | 0.986 |
| scPred | 0.515 | 0.140 | 0.86 | 0.915 | 0.827 | 0.862 | 0.91 |
| CasTLe | 0.836 | 0.736 | 0.971 | 0.972 | 0.953 | 0.91 | 0.993 |
| SingleR | 0.723 | 0.388 | 0.951 | 0.977 | 0.953 | 0.868 | |
| scmapcluster | 0.395 | 0.409 | 0.946 | 0.962 | 0.949 | 0.905 | 0.931 |
| scmapcell | 0.727 | 0.246 | 0.895 | 0.972 | 0.949 | 0.778 | 0.952 |
| ACTINN | 0.845 | 0.737 | 0.977 | 0.991 | 0.958 | ||
| RF | 0.835 | 0.69 | 0.968 | 0.981 | 0.967 | 0.968 | 0.993 |
| SVM-linear | 0.859 | 0.652 | 0.824 | 0.981 | 0.383 | 0.704 | |
| SVM-rbf | 0.884 | 0.677 | 0.93 | 0.981 | 0.841 | 0.794 | |
| KNN | 0.824 | 0.594 | 0.953 | 0.991 | 0.935 | 0.926 |
Median F1 of the eight scRNAseq data classifier tools and the four conventional classifiers on the seven datasets (N = 1000)
| Methods | Zhengsorted | Zheng68K | BaronHuman | Muraro | Segerstolpe | BaronMouse | Xin |
|---|---|---|---|---|---|---|---|
| sigGCN | 0.977 | 0.969 | 0.995 | ||||
| FC | 0.952 | 0.681 | 0.938 | 0.968 | 0.902 | 0.997 | |
| scID | 0.66 | 0.535 | 0.22 | 0.578 | 0 | 0 | |
| scPred | 0.568 | 0.105 | 0.833 | 0.932 | 0.8 | 0.97 | 0.784 |
| CasTLe | 0.834 | 0.667 | 0.956 | 0.967 | 0.965 | 0.848 | 0.997 |
| SingleR | 0.678 | 0.335 | 0.946 | 0.984 | 0.898 | ||
| scmapcluster | 0.729 | 0.357 | 0.9 | 0.997 | 0.965 | 0.88 | 0.991 |
| scmapcell | 0.305 | 0.198 | 0.95 | 0.993 | 0.977 | 0.942 | 0.708 |
| ACTINN | 0.892 | 0.753 | 0.97 | 0.997 | |||
| RF | 0.853 | 0.646 | 0.956 | 0.987 | 0.993 | 0.984 | 0.997 |
| SVM-linear | 0.868 | 0.663 | 0.362 | 0.059 | 0.238 | ||
| SVM-rbf | 0.9 | 0.671 | 0.906 | 0.772 | 0.695 | ||
| KNN | 0.795 | 0.613 | 0.928 | 0.961 | 0.889 |
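The two tables above report accuracy and median F1. A minimal sketch of both metrics follows, assuming accuracy is the fraction of correctly labelled cells and median F1 is the median of the per-cell-type F1 scores computed over the true cell types (a common definition in scRNAseq classifier benchmarks; the paper's exact convention is not restated here). The function names are hypothetical.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of cells whose predicted label equals the true label."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

def median_f1(y_true, y_pred):
    """Median over the true cell types of the one-vs-rest F1 score."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    f1s = []
    for c in np.unique(y_true):
        tp = np.sum((y_pred == c) & (y_true == c))   # cells of type c labelled c
        fp = np.sum((y_pred == c) & (y_true != c))   # other cells labelled c
        fn = np.sum((y_pred != c) & (y_true == c))   # cells of type c labelled otherwise
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return float(np.median(f1s))

y_true = ["a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "b", "b", "b", "a"]
acc = accuracy(y_true, y_pred)    # 4 of 6 cells correct
mf1 = median_f1(y_true, y_pred)   # per-type F1: a = 2/3, b = 0.8, c = 0
```

Because every cell type counts equally in the median, a method that misses most populations can score a median F1 of 0 while its accuracy stays above 0.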
Fig. 4 Confusion matrix of sigGCN's class predictions on the Zhengsorted dataset
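A confusion matrix like the one in Fig. 4 counts, for each true cell type (row), how many cells were assigned to each predicted type (column). The sketch below also row-normalizes the counts so that each row shows the fraction of a cell type sent to each label; whether the figure uses counts or fractions is not stated here, and the helper names are hypothetical.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, classes):
    """Rows = true cell type, columns = predicted cell type, entries = counts."""
    index = {c: i for i, c in enumerate(classes)}
    cm = np.zeros((len(classes), len(classes)), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[index[t], index[p]] += 1
    return cm

def row_normalize(cm):
    """Fraction of each true cell type assigned to each predicted type; empty rows stay zero."""
    totals = cm.sum(axis=1, keepdims=True)
    return np.divide(cm, totals, out=np.zeros(cm.shape, dtype=float), where=totals > 0)

y_true = ["a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "b", "b", "b", "a"]
cm = confusion_matrix(y_true, y_pred, ["a", "b", "c"])
frac = row_normalize(cm)
```

The diagonal holds the correctly classified cells, so the trace divided by the total count equals the accuracy.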
Fig. 5 (a) Average AUC and ROC curves of sigGCN for the ten Zhengsorted cell types. (b) ROC curves for class 4 (CD4+/CD25 T Reg), with p-values from McNeil and Hanley's test showing the significance of the difference between the area under the ROC curve of sigGCN and that of each other method
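The quantities in Fig. 5 can be sketched as follows: a one-vs-rest AUC for a single cell type (the probability that a random cell of that type scores higher than a random cell of another type), the Hanley and McNeil (1982) standard error of an AUC, and a two-sided z-test comparing two AUCs. The correlation `r` between the two AUC estimates is left as a parameter; Hanley and McNeil's 1983 method for paired data derives it from a lookup table that is not reproduced here, so the default `r = 0` is a simplification that gives a conservative (larger) p-value when the estimates are positively correlated. All function names are hypothetical.

```python
import math

import numpy as np

def auc_mann_whitney(scores_pos, scores_neg):
    """AUC = P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = np.asarray(scores_pos, dtype=float)[:, None]
    neg = np.asarray(scores_neg, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))

def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of an AUC estimated from n_pos positive and n_neg negative cells."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc) + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(var)

def compare_aucs(auc1, auc2, se1, se2, r=0.0):
    """Two-sided z-test for AUC1 vs AUC2; r is the correlation between the two estimates."""
    z = (auc1 - auc2) / math.sqrt(se1 ** 2 + se2 ** 2 - 2 * r * se1 * se2)
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided p-value under the standard normal
    return z, p

# One-vs-rest example (hypothetical scores): predicted probability of the target cell type.
auc = auc_mann_whitney([0.9, 0.8, 0.4], [0.3, 0.5])   # 5 of 6 pairs ranked correctly
se_null = hanley_mcneil_se(0.5, 1, 1)
z, p = compare_aucs(0.9, 0.8, 0.05, 0.05)
```

Averaging the one-vs-rest AUC over all cell types gives a single summary number such as the average AUC in panel (a).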
Accuracy of the eight scRNAseq data classifier tools and the four conventional classifiers on the four cross-dataset experiments (N = 1000)
| Methods (training datasets → testing dataset) | BaronHuman + Muraro + Segerstolpe → Xin | Xin + Muraro + Segerstolpe → BaronHuman | Xin + BaronHuman + Segerstolpe → Muraro | Xin + BaronHuman + Muraro → Segerstolpe |
|---|---|---|---|---|
| sigGCN | 0.974 | 0.993 | ||
| FC | 0.992 | 0.977 | 0.968 | 0.993 |
| scID | 0.989 | 0.747 | 0.97 | 0.979 |
| scPred | 0.945 | 0.467 | 0.92 | 0.814 |
| CasTLe | 0.99 | 0.944 | 0.992 | |
| SingleR | 0.995 | 0.984 | ||
| scmapcluster | 0.196 | 0.003 | 0.051 | 0.568 |
| scmapcell | 0.756 | 0.421 | 0.64 | 0.367 |
| ACTINN | 0.993 | 0.984 | 0.974 | 0.992 |
| RF | 0.982 | 0.941 | 0.947 | 0.938 |
| SVM-linear | 0.994 | 0.979 | 0.972 | 0.992 |
| SVM-rbf | 0.986 | 0.983 | 0.973 | 0.97 |
| KNN | 0.934 | 0.864 | 0.788 | 0.817 |
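The four experiments above follow a leave-one-dataset-out scheme over the four human pancreas datasets: each dataset is held out once for testing while the other three are combined for training. A minimal sketch of the split generation follows; it also includes a hypothetical `common_genes` helper, since combining datasets typically requires restricting them to the genes measured in all of them. That alignment step is an assumption, not a description of the paper's preprocessing.

```python
def leave_one_dataset_out(names):
    """Yield (training datasets, held-out testing dataset) pairs."""
    for test in names:
        yield [n for n in names if n != test], test

def common_genes(gene_lists):
    """Genes measured in every dataset, kept in the order of the first list."""
    shared = set(gene_lists[0]).intersection(*map(set, gene_lists[1:]))
    return [g for g in gene_lists[0] if g in shared]

pancreas = ["Xin", "BaronHuman", "Muraro", "Segerstolpe"]
splits = list(leave_one_dataset_out(pancreas))
for train, test in splits:
    print(" + ".join(train), "->", test)
```

The printed splits follow the same order as the table columns, starting with BaronHuman + Muraro + Segerstolpe -> Xin.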
Median F1 of the eight scRNAseq data classifier tools and the four conventional classifiers on the four cross-dataset experiments (N = 1000)
| Methods (training datasets → testing dataset) | BaronHuman + Muraro + Segerstolpe → Xin | Xin + Muraro + Segerstolpe → BaronHuman | Xin + BaronHuman + Segerstolpe → Muraro | Xin + BaronHuman + Muraro → Segerstolpe |
|---|---|---|---|---|
| sigGCN | 0.976 | 0.957 | 0.989 | |
| FC | 0.976 | 0.965 | 0.945 | 0.990 |
| scID | 0.991 | |||
| scPred | 0.923 | 0.231 | 0.912 | 0.725 |
| CasTLe | 0.974 | 0.916 | 0.966 | 0.988 |
| SingleR | 0.988 | 0.974 | 0.969 | 0 |
| scmapcluster | 0.133 | 0.001 | 0.001 | 0 |
| scmapcell | 0.415 | 0.232 | 0.353 | 0.241 |
| ACTINN | 0.982 | 0.977 | 0.957 | 0.989 |
| RF | 0.946 | 0.914 | 0.957 | 0.903 |
| SVM-linear | 0.986 | 0.972 | 0.952 | 0.986 |
| SVM-rbf | 0.963 | 0.97 | 0.958 | 0.952 |
| KNN | 0.766 | 0.74 | 0.504 | 0.602 |
Fig. 6 Average runtimes of the methods on the Zhengsorted dataset
Fig. 7 Results of all the scRNAseq classifiers on the Zhengsorted dataset using different numbers of genes as features
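Varying the number of genes used as features, as in Fig. 7, requires ranking genes and keeping the top N. How the genes were ranked is not stated in this section; the sketch below assumes the common choice of selecting the most variable genes across cells, and `top_variable_genes` is a hypothetical helper.

```python
import numpy as np

def top_variable_genes(x, n_genes):
    """Column indices of the n_genes genes with the highest variance across cells.

    x: cells x genes expression matrix (often log-transformed beforehand).
    """
    var = x.var(axis=0)
    order = np.argsort(var)[::-1]        # genes sorted by descending variance
    return np.sort(order[:n_genes])      # return in the original column order

# Toy matrix (hypothetical): gene 0 is constant, gene 1 varies most, gene 2 varies a little.
x = np.array([[0.0, 5.0, 1.0],
              [0.0, 0.0, 1.0],
              [0.0, 10.0, 2.0]])
for n in (1, 2):
    print(n, top_variable_genes(x, n))
```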
Fig. 8 Results of all the scRNAseq classifiers using different numbers of cells from the Zhengsorted dataset
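Evaluating the classifiers on different numbers of cells, as in Fig. 8, requires drawing subsets of the dataset. The sampling scheme is not described in this section; the sketch below assumes a stratified draw that keeps each cell type's share of the data and guarantees at least one cell per type. `stratified_subsample` and the seed are hypothetical, and because per-type counts are rounded, the subset size is only approximately `n_cells`.

```python
import numpy as np

def stratified_subsample(labels, n_cells, seed=0):
    """Draw about n_cells cell indices while preserving each cell type's proportion."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    picked = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        k = max(1, round(n_cells * len(idx) / len(labels)))   # at least one cell per type
        picked.append(rng.choice(idx, size=min(k, len(idx)), replace=False))
    return np.sort(np.concatenate(picked))

# Toy labels (hypothetical): 60% type a, 30% type b, 10% type c.
labels = np.array(["a"] * 60 + ["b"] * 30 + ["c"] * 10)
sel = stratified_subsample(labels, 10)
```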