Brian D Aevermann1, Mark Novotny1, Trygve Bakken2, Jeremy A Miller2, Alexander D Diehl3, David Osumi-Sutherland4, Roger S Lasken1, Ed S Lein2, Richard H Scheuermann1,5.
Abstract
Cells are fundamental function units of multicellular organisms, with different cell types playing distinct physiological roles in the body. The recent advent of single-cell transcriptional profiling using RNA sequencing is producing 'big data', enabling the identification of novel human cell types at an unprecedented rate. In this review, we summarize recent work characterizing cell types in the human central nervous and immune systems using single-cell and single-nuclei RNA sequencing, and discuss the implications that these discoveries are having on the representation of cell types in the reference Cell Ontology (CL). We propose a method, based on random forest machine learning, for identifying sets of necessary and sufficient marker genes, which can be used to assemble consistent and reproducible cell type definitions for incorporation into the CL. The representation of defined cell type classes and their relationships in the CL using this strategy will make the cell type classes being identified by high-throughput/high-content technologies findable, accessible, interoperable and reusable (FAIR), allowing the CL to serve as a reference knowledgebase of information about the role that distinct cellular phenotypes play in human health and disease.
Year: 2018 PMID: 29590361 PMCID: PMC5946857 DOI: 10.1093/hmg/ddy100
Source DB: PubMed Journal: Hum Mol Genet ISSN: 0964-6906 Impact factor: 6.150
Figure 3. Formal rosehip neuron definition using logical axioms. A set of logical axioms about the anatomical location of the cell body (soma), the functional capacity and the necessary and sufficient marker gene expressions are combined to construct an equivalent class cell type definition for the rosehip neuron interneuron cluster, i5 (see 14 for more information about how this cell type was characterized).
Model tissues investigated by single-cell/single-nuclei RNA sequencing
| Tissue | Number of cell types | Method | Reference |
|---|---|---|---|
| Brain | 6 cell categories | Single-cell RNAseq | ( |
| Brain | 7 neuron subtypes | Single-cell RNAseq | ( |
| Brain | 16 neuron subtypes | Single-nuclei RNAseq | ( |
| Brain | 11 inhibitory neuron subtypes | Single-nuclei RNAseq | ( |
| Immune system | 5 CD127+ subtypes | Single-cell RNAseq | ( |
| Immune system | 6 dendritic cell subtypes | Single-cell RNAseq | ( |
| Immune system | 6 dendritic cell and 4 monocyte subtypes | Single-cell RNAseq | ( |
| Tumor microenvironment | 6 infiltrating immune subsets | Single-cell RNAseq | ( |
| Tumor microenvironment | Regulatory T cells and exhausted CD8 T cells | Single-cell RNAseq | ( |
| Kidney | 6 distinct epithelial subtypes | Single-nuclei RNAseq | ( |
| Lung | 4 cell types (C1–C4): AT2, indeterminate, basal and club/goblet cells | Single-cell RNAseq | ( |
| Pancreas | 6 cell types (alpha, beta, delta, PP, acinar or ductal) | Single-cell RNAseq | ( |
| Pancreas | 6 cell types (alpha, beta, delta, PP, acinar or ductal) | Single-cell RNAseq | ( |
| Pancreas | 14 cell types including known exocrine and endocrine types | Single-cell RNAseq | ( |
| Pancreas | 9 cell types including known exocrine and endocrine types | Single-cell RNAseq | ( |
Additional tools for determination of cell type-specific differentially expressed genes
| Software | Methodology | Reference |
|---|---|---|
| Seurat | Seurat implements numerous methodologies for clustering, visualization and marker determination using differential expression analysis between cluster pairs | ( |
| SC3 | SC3 provides an integrated suite that performs ensemble clustering followed by marker determination using a Wilcoxon signed-rank test combined with an AUROC analysis | ( |
| SAKE | SAKE performs non-negative matrix factorization (NMF), in which the importance of each cell and gene is estimated during the clustering procedure; the important genes are then considered markers | ( |
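As a rough illustration of the AUROC-style marker scoring mentioned for SC3 above, the following minimal sketch scores one gene by the probability that a randomly chosen cell from the target cluster has higher expression than a randomly chosen cell from the other clusters. The function name `auroc` and the toy values are hypothetical; this is not the SC3 implementation, only the standard pairwise definition of the area under the ROC curve.

```python
def auroc(target_values, other_values):
    """Pairwise AUROC for one gene.

    Returns the fraction of (target cell, other cell) pairs in which the
    target cell has the higher expression value; ties count as half a win.
    A value near 1.0 suggests the gene separates the target cluster well.
    """
    wins = 0.0
    for t in target_values:
        for o in other_values:
            if t > o:
                wins += 1.0
            elif t == o:
                wins += 0.5
    return wins / (len(target_values) * len(other_values))


# Toy expression values (hypothetical): a well-separated gene and an uninformative one.
good_marker = auroc([5.0, 6.0, 7.0], [1.0, 2.0, 3.0])
poor_marker = auroc([1.0, 2.0], [1.0, 2.0])
```

The quadratic loop is kept for readability; for large clusters a rank-based computation would be preferable.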
Figure 1. Identification of necessary and sufficient marker genes using NSforest. (A) A typical single-cell/single-nuclei RNA sequencing workflow in which a tissue specimen is obtained, single cells/nuclei are isolated by fluorescence-activated cell sorting, amplified cDNA is processed by sequencing and cell types are identified by clustering the resultant transcriptional profiles. (B) The NSforest approach takes a data matrix of expression values (e.g. transcripts per million reads) of genes (rows) in single cell/nuclei samples (columns) grouped by cell type cluster membership. In the first step, the expression levels of genes are used as features in the random forest machine learning procedure to train classification models comparing single cell/nuclei expression data in one cell type cluster against single cell/nuclei expression data in all other clusters, for every cell type cluster separately, using a random forest learner such as the Random Forest Learner in KNIME v3.1.2. Each cell type cluster classification model is constructed from a collection of trees (e.g. 1000 trees) using information gain ratio as the splitting criterion, where each decision tree is generated using specific bagging parameters (e.g. the square root of the number of features and a bootstrap of samples equal to the training set size). For each cell type cluster classification model, the method outputs usage statistics, including how often each gene is used as a branching criterion and the number of times it was a candidate across all random decision trees. By summing each gene's frequency of use, when available as a candidate feature, along the first three branching levels, the list of genes can be ranked by their usefulness in distinguishing one cell type cluster from the other clusters. In the second step, single decision trees are constructed using the first gene from the ranked list, the first two genes, the first three genes, etc.
Each individual tree is then assessed for classification accuracy and tree topology using the training data. Given the objective of determining the necessary and sufficient marker genes, we apply additional criteria in scoring the trees: we restrict each gene to being used in only one branch per tree, and find the optimal classification for the target cluster only, rather than the overall classification score. The addition of genes from the ranked list is stopped when an optimal classification or stable tree topology is achieved. The minimum number of genes used to produce this optimal result corresponds to the set of necessary and sufficient marker genes required to define the cell type cluster.
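The two-step procedure described in this caption can be sketched in Python as follows. This is a minimal illustration under several assumptions, not the NSforest implementation: it uses scikit-learn rather than KNIME, ranks genes by scikit-learn's impurity-based importance (entropy, i.e. information gain rather than gain ratio) instead of usage counts over the first three branching levels, expects cells as rows and genes as columns (the transpose of the matrix described above), scores each tree by the F1 score of the target cluster, and does not enforce the one-branch-per-gene restriction. The function names `rank_genes` and `minimal_marker_set` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.tree import DecisionTreeClassifier


def rank_genes(X, is_target, n_trees=200, seed=0):
    """Step 1: one-vs-rest random forest; return gene indices, best first.

    X is a (cells x genes) array and is_target is a 0/1 vector marking the
    cells of the target cluster. Importance is an assumed stand-in for the
    usage statistics described in the caption.
    """
    forest = RandomForestClassifier(
        n_estimators=n_trees,
        criterion="entropy",
        max_features="sqrt",  # square root of the number of features per split
        bootstrap=True,       # bootstrap sample equal to the training set size
        random_state=seed,
    )
    forest.fit(X, is_target)
    return np.argsort(forest.feature_importances_)[::-1]


def minimal_marker_set(X, is_target, ranked, max_genes=6, seed=0):
    """Step 2: fit single trees on the top 1, 2, 3, ... ranked genes.

    Stops when the target-cluster F1 score (on the training data) is perfect
    or no longer improves, and returns the smallest gene set reaching it.
    """
    best_score, best_genes = -1.0, []
    for k in range(1, max_genes + 1):
        genes = ranked[:k]
        tree = DecisionTreeClassifier(max_depth=k, random_state=seed)
        tree.fit(X[:, genes], is_target)
        score = f1_score(is_target, tree.predict(X[:, genes]))
        if score <= best_score:
            break  # adding another gene did not help
        best_score, best_genes = score, [int(g) for g in genes]
        if score == 1.0:
            break  # optimal classification of the target cluster
    return best_genes, best_score
```

Calling `rank_genes` and then `minimal_marker_set` once per cluster, with `is_target` set to `(clusters == c).astype(int)`, would yield one candidate marker set per cell type cluster.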
Figure 2. Marker gene expression patterns in single nuclei grouped by cluster. A heatmap of expression levels for the necessary and sufficient marker genes identified for all 16 clusters across all single nuclei grouped by cell type cluster is shown, including 1 excitatory (e1), 11 inhibitory (i1–i11) and 4 glial (g1–g4) cell type clusters. In total, 49 marker genes were selected as being necessary and sufficient to distinguish these 16 different cell type clusters from cortical layer 1/2 of the human brain MTG region.
Cell types identified in cortical layer 1/2 of the human MTG
| Cluster ID | Cell type name | Cell type definition |
|---|---|---|
| | TESPA1-expressing MTG cortical layer 2 excitatory neuron, human | A human MTG cortical layer 2 excitatory neuron that selectively expresses TESPA1, LINC00507 and SLC17A7 mRNAs, and lacks expression of KCNIP1 mRNA |
| | COL5A2-expressing MTG cortical layer 1 interneuron, human | A human MTG cortical layer 1 GABAergic interneuron that selectively expresses COL5A2, NDNF and FAT1 mRNAs |
| | LHX6-expressing MTG cortical layer 2 interneuron, human | A human MTG cortical layer 2 GABAergic interneuron that selectively expresses LHX6, GRIK3 and FLT3 mRNAs, while lacking expression of COBL and CALB2 mRNAs |
| | BAGE2-expressing MTG cortical layer 1 interneuron, human | A human MTG cortical layer 1 GABAergic interneuron that selectively expresses BAGE2, SEMA3C, SYT10, CALB2 and COL21A1 mRNAs |
| | ARHGAP36-expressing MTG cortical layer 1 interneuron, human | A human MTG cortical layer 1 GABAergic interneuron that selectively expresses ARHGAP36, ADAM33, LINC01435 and MC4R mRNAs |
| | KIT-expressing MTG cortical layer 1 interneuron, human | A human MTG cortical layer 1 GABAergic interneuron that selectively expresses KIT, NTNG1 and POU6F2 mRNAs |
| | GPR149-expressing MTG cortical layer 1 interneuron, human | A human MTG cortical layer 1 GABAergic interneuron that selectively expresses GPR149, VIP and PLCE1 mRNAs |
| | TGFBR2-expressing MTG cortical layer 1 interneuron, human | A human MTG cortical layer 1 GABAergic interneuron that selectively expresses TGFBR2, HCRTR2 and PAX6 mRNAs |
| | SNCG-expressing MTG cortical layer 1 interneuron, human | A human MTG cortical layer 1 GABAergic interneuron that selectively expresses SNCG, EDNRA, KCNK2 and ARHGAP18 mRNAs |
| | VIP-expressing MTG cortical layer 1 interneuron, human | A human MTG cortical layer 1 GABAergic interneuron that selectively expresses VIP, IQGAP2 and TAC3 mRNAs |
| | TSPAN12-expressing MTG cortical layer 1 interneuron, human | A human MTG cortical layer 1 GABAergic interneuron that selectively expresses TSPAN12, CHRNB3, FAM46A and DCN mRNAs |
| | EGF-expressing MTG cortical layer 1 interneuron, human | A human MTG cortical layer 1 GABAergic interneuron that selectively expresses EGF and NRG1-IT1 mRNAs |
| | Linc00499-expressing MTG cortical layer 1 glial cell, human | A human MTG cortical layer 1 glial cell that selectively expresses Linc00499 and ATP1A2 mRNAs |
| | APBB1IP-expressing MTG cortical layer 1 glial cell, human | A human MTG cortical layer 1 glial cell that selectively expresses APBB1IP mRNA |
| | PTPRZ1-expressing MTG cortical layer 1 glial cell, human | A human MTG cortical layer 1 glial cell that selectively expresses PTPRZ1 and XYLT1 mRNAs |
| | ST18-expressing MTG cortical layer 1 glial cell, human | A human MTG cortical layer 1 glial cell that selectively expresses ST18 mRNA |
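
The names and definitions in this table follow a regular template built from the cortical layer, the cell class and the marker gene set. As a minimal sketch of how such entries could be assembled consistently from a marker gene list, the helper below reproduces the first row of the table. The function names and parameters are hypothetical and are not part of the Cell Ontology or of NSforest; the first expressed gene is assumed to be the one used in the cell type name.

```python
def join_genes(genes):
    """Format gene symbols as 'A', 'A and B' or 'A, B and C'."""
    genes = list(genes)
    if len(genes) <= 1:
        return "".join(genes)
    return ", ".join(genes[:-1]) + " and " + genes[-1]


def mrna_phrase(genes):
    """Append 'mRNA' or 'mRNAs' depending on the number of genes."""
    return join_genes(genes) + (" mRNA" if len(genes) == 1 else " mRNAs")


def cell_type_entry(layer, noun, expressed, lacking=(), definition_noun=None):
    """Build (name, definition) for a human MTG cortical cell type.

    noun is used in the name (e.g. 'interneuron'); definition_noun, if given,
    is used in the definition instead (e.g. 'GABAergic interneuron').
    """
    context = f"MTG cortical layer {layer}"
    name = f"{expressed[0]}-expressing {context} {noun}, human"
    definition = (
        f"A human {context} {definition_noun or noun} "
        f"that selectively expresses {mrna_phrase(expressed)}"
    )
    if lacking:
        definition += f", and lacks expression of {mrna_phrase(lacking)}"
    return name, definition


name, definition = cell_type_entry(
    2, "excitatory neuron", ["TESPA1", "LINC00507", "SLC17A7"], lacking=["KCNIP1"]
)
```

Generating the text from structured fields in this way keeps the name and the definition in agreement whenever a marker set is revised.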