Bradlee D Nelms, Levi Waldron, Luis A Barrera, Andrew W Weflen, Jeremy A Goettel, Guoji Guo, Robert K Montgomery, Marian R Neutra, David T Breault, Scott B Snapper, Stuart H Orkin, Martha L Bulyk, Curtis Huttenhower, Wayne I Lencer.
Abstract
We present a sensitive approach to predict genes expressed selectively in specific cell types, by searching publicly available expression data for genes with a similar expression profile to known cell-specific markers. Our method, CellMapper, strongly outperforms previous computational algorithms to predict cell type-specific expression, especially for rare and difficult-to-isolate cell types. Furthermore, CellMapper makes accurate predictions for human brain cell types that have never been isolated, and can be rapidly applied to diverse cell types from many tissues. We demonstrate a clinically relevant application to prioritize candidate genes in disease susceptibility loci identified by GWAS.
Keywords: Cell type; Expression; Genome-wide association study; Inflammatory bowel disease; Microarray
Year: 2016 PMID: 27687735 PMCID: PMC5043525 DOI: 10.1186/s13059-016-1062-5
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 17.906
Fig. 1 Overview and validation of CellMapper. a Schematic of the approach. CellMapper takes as input a cell type-specific query gene (green) and a set of gene expression data and finds genes with a similar expression profile to the query gene (e.g. “Gene C” above, yellow profile). b Performance comparison between CellMapper and the machine learning algorithm, in silico nano-dissection [17]. CellMapper and in silico nano-dissection were each applied to identify podocyte genes and evaluated based on the recovery of an experimentally defined set of podocyte genes [23]. In silico nano-dissection was applied using the training set selected by Ju et al. [17] for their analysis (46 query genes and 97 negative control genes) or a smaller training set of ten query genes and ten negative control genes (the smallest training set permitted by the algorithm, see “Methods”). CellMapper identified the experimentally defined podocyte genes with similar precision to in silico nano-dissection at all levels of recall, despite using only one query gene.
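The idea in panel a of Fig. 1, ranking genes by how closely their expression profile follows a query gene, can be illustrated with a minimal sketch. This is not the CellMapper algorithm itself, whose details are not given in this record; it assumes plain Pearson correlation as the similarity measure, and the gene names, expression values, and function names are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no similarity information
    return cov / sqrt(var_x * var_y)

def rank_by_similarity(expression, query_gene):
    """Return (gene, score) pairs sorted from most to least similar to the query."""
    query_profile = expression[query_gene]
    scores = {
        gene: pearson(query_profile, profile)
        for gene, profile in expression.items()
        if gene != query_gene
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical toy data: one row per gene, one column per sample.
expression = {
    "QUERY": [1.0, 5.0, 1.2, 4.8, 0.9],
    "GeneA": [3.0, 3.1, 2.9, 3.0, 3.1],  # roughly constant across samples
    "GeneB": [5.0, 1.0, 4.9, 1.1, 5.2],  # opposite pattern to the query
    "GeneC": [0.8, 4.6, 1.0, 4.9, 1.1],  # tracks the query closely
}

ranking = rank_by_similarity(expression, "QUERY")
for gene, score in ranking:
    print(f"{gene}\t{score:.3f}")
```

In this toy example the gene that tracks the query ranks first and the anti-correlated gene ranks last, which mirrors the role of “Gene C” in the schematic.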
Fig. 2 Application of CellMapper to brain cell types that are difficult to address by other methods. a CellMapper was applied to the Allen Brain Atlas dataset using the indicated query genes for four brain cell types. Dot charts display the rank of literature-defined cell-specific markers (positive controls) within CellMapper’s predictions for each cell type. Dots are colored based on their known primary cell type of expression. Dark gray shading covers the area (rank list) required to identify all positive control genes for each cell type. A similar analysis using query genes other than GAD1, SLC6A2, SLC6A4, and PDGFRA for the four cell types is provided in Additional file 16. b–e Performance evaluation of CellMapper and other computational methods to recover genes expressed in the four brain cell types. Each method was evaluated based on the recovery of an experimentally defined [3–6] set of cell type-enriched genes in mouse, as quantified by the area under the precision-recall curve (AUPR). WGCNA returns several modules of gene co-expression; the best-performing WGCNA module is plotted for each cell type.
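The AUPR metric named in Fig. 2 summarizes how early a ranked prediction list recovers a reference gene set. The sketch below uses average precision, one common estimator of the area under the precision-recall curve; the record does not say which estimator the authors used, so treat that choice as an assumption. Gene names and function names are hypothetical.

```python
def precision_recall_points(ranked_genes, positives):
    """Return (recall, precision) at each rank where a positive gene is found."""
    points = []
    hits = 0
    for rank, gene in enumerate(ranked_genes, start=1):
        if gene in positives:
            hits += 1
            points.append((hits / len(positives), hits / rank))
    return points

def average_precision(ranked_genes, positives):
    """Average precision: mean of the precision values at each recovered positive.

    Positives that never appear in the ranking contribute zero.
    """
    points = precision_recall_points(ranked_genes, positives)
    return sum(precision for _, precision in points) / len(positives)

# Hypothetical ranked predictions and reference set.
ranked = ["g1", "g2", "g3", "g4", "g5"]
reference = {"g1", "g3"}

ap = average_precision(ranked, reference)
print(f"average precision = {ap:.4f}")
```

Here the two reference genes are found at ranks 1 and 3, giving precisions of 1/1 and 2/3 and an average of 5/6.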
Fig. 3 CellMapper is accurate across diverse cell types. CellMapper was applied using query genes for 30 cell types (Additional file 12); Tukey boxplots display the rank of 4–10 literature-curated markers (positive controls; Additional file 13) and ≥48 negative control genes (Additional file 13 and housekeeping genes from [30]) for each cell type, demonstrating that CellMapper sensitively identified established cell type markers in every case. Filled circles represent the rank of all positive control genes; open gray circles represent negative control genes that fall outside 1.5 times the interquartile range of the other negative control genes (“outliers”). In only eight instances (0.5%) was a negative control gene identified within the top 100 predictions for a cell type. EECs enteroendocrine cells.
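The outlier rule used for the open circles in Fig. 3 is the usual Tukey boxplot convention: a value is flagged when it lies more than 1.5 times the interquartile range beyond the first or third quartile. A minimal sketch follows; the quartile interpolation method (`inclusive`) is an assumption, since plotting packages differ on this detail, and the rank values are made up for illustration.

```python
from statistics import quantiles

def tukey_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

# Hypothetical ranks of negative control genes for one cell type.
ranks = [1, 2, 3, 4, 5, 6, 7, 8, 100]
outliers = tukey_outliers(ranks)
print(outliers)
```

With these values the quartiles are 3 and 7, so the fences sit at -3 and 13 and only the value 100 is flagged.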
Fig. 4 Using CellMapper to prioritize GWAS disease genes. a The genetic locus surrounding sentinel SNP rs381144, associated with erythrocyte (Ery) and platelet (MkP) cell number. Other relevant SNPs in the region are shown. All genes predicted for expression in erythrocytes and platelets are displayed in red. b TRIM58 expression in primary mouse hematopoietic cells by qRT-PCR. MPP multi-potent progenitor, PreMegE pre-megakaryocyte-erythrocyte, Ery erythrocyte, MkP megakaryocyte/platelet, GMP granulocyte-monocyte progenitor, Neu neutrophil, MΦ macrophage, cDC conventional dendritic cell, B B cell, T T cell, NK natural killer cell. c The genetic locus surrounding sentinel SNP rs7554522, associated with inflammatory bowel disease (IBD). Genes colored in purple are predicted for simple epithelial cells; genes colored in green are predicted for T and NK cells. d C1orf106 and KIF21B expression in human primary cells and cell lines. Mono monocyte, HMEC1 endothelial cell line, Caco2 colon epithelial cell line, Organoid primary epithelial organoid from small intestine biopsy. All bars are mean ± SD (n = 3–7 independent biological replicates) and letters indicate statistically significant differences between groups (p ≤ 0.05, Tukey’s honest significant difference test).