Camden Jansen, Ricardo N Ramirez, Nicole C El-Ali, David Gomez-Cabrero, Jesper Tegner, Matthias Merkenschlager, Ana Conesa, Ali Mortazavi.
Abstract
Rapid advances in single-cell assays have outpaced methods for analyzing the resulting data types. Different single-cell assays show extensive variation in sensitivity and signal-to-noise levels. In particular, scATAC-seq generates extremely sparse and noisy datasets. Existing methods developed to analyze these data require either cells amenable to pseudo-time analysis or datasets with drastically different cell types. We describe a novel approach that uses self-organizing maps (SOMs) to link scATAC-seq regions with scRNA-seq genes; it overcomes these challenges and can generate draft regulatory networks. Our SOMatic package generates chromatin and gene expression SOMs separately and combines them using a linking function. We applied SOMatic to a mouse pre-B cell differentiation time course using controlled Ikaros over-expression to recover gene ontology enrichments, identify motifs in genomic regions showing similar single-cell profiles, and generate a gene regulatory network that both recovers known interactions and predicts new Ikaros targets during the differentiation process. The ability of linked SOMs to detect emergent properties from multiple types of high-dimensional genomic data with very different signal properties opens new avenues for integrative analysis of heterogeneous data.
Year: 2019 PMID: 31682608 PMCID: PMC6855564 DOI: 10.1371/journal.pcbi.1006555
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Fig 1. Single-cell multi-data integration using SOMs.
(A) An inducible Ikaros mouse pre-B cell line was used to track changes in gene expression and chromatin accessibility during differentiation (0 and 24 hours) in single cells. (B) Single-cell RNA-seq and ATAC-seq data from an inducible mouse pre-B cell line were independently trained using SOMatic to generate single-cell SOMs and metaclustered using AIC scoring. These clusters were convolved with the new SOM linking algorithm to generate pair-wise metaclusters of chromatin regions with similar profiles across the single-cell dataset that regulate genes that also share similar profiles. These pair-wise clusters were mined for regulatory connections through motif enrichment analysis.
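The linking step in panel B can be illustrated with a minimal sketch. This is not SOMatic's actual linking function; it only shows the general idea of pairing chromatin metaclusters with gene metaclusters through region-to-gene associations. The function name `link_metaclusters`, the region coordinates, the gene names, and the region-to-gene mapping are all hypothetical toy values.

```python
from collections import defaultdict


def link_metaclusters(region_cluster, gene_cluster, region_to_genes):
    """Group (region, gene) associations by (ATAC metacluster, RNA metacluster).

    region_cluster:  region -> chromatin metacluster id
    gene_cluster:    gene -> expression metacluster id
    region_to_genes: region -> genes the region is associated with
                     (assumed to come from an external association step)
    """
    pairs = defaultdict(list)
    for region, genes in region_to_genes.items():
        if region not in region_cluster:
            continue  # region was not placed in any chromatin metacluster
        for gene in genes:
            if gene in gene_cluster:
                key = (region_cluster[region], gene_cluster[gene])
                pairs[key].append((region, gene))
    return dict(pairs)


# Toy input; every identifier below is made up for illustration.
region_cluster = {"chr1:100-600": 62, "chr2:50-550": 70, "chr3:10-510": 62}
gene_cluster = {"GeneA": 38, "GeneB": 6}
region_to_genes = {
    "chr1:100-600": ["GeneA"],
    "chr2:50-550": ["GeneA", "GeneB"],
    "chr3:10-510": ["GeneC"],  # GeneC has no expression metacluster
}

linked = link_metaclusters(region_cluster, gene_cluster, region_to_genes)
```

Each key of `linked` is one pair-wise metacluster; in the workflow described above, the regions in each such group would then be passed to motif enrichment analysis.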
Fig 2. Single-cell gene expression patterns during cellular differentiation are profiled using SOMatic.
(A) A SOM was generated for the single-cell RNA-seq dataset (0-hour 62 cells, 24-hour 66 cells). Maps for 3 cells from each time point were arbitrarily selected for display. (B) 39 metaclusters were identified using AIC scoring. Metacluster number and color were arbitrarily assigned for visualization purposes. (C) SOM difference map comparing 0-hour and 24-hour time points. Maps for cells from the 0- and 24-hour time points were averaged to generate a single map for each and then subtracted to create a map that represented gene expression fold change during pre-B cell development. Overlaid metacluster divisions generally follow contours of the map. (D) Trait enrichment analysis deployed on gene metaclusters revealed which are enriched in each time point. Metaclusters of interest are highlighted in panel B. (E–F) Summary showing the representative expression profile for metaclusters 38 and 6. Columns are individual cells color-coded for 0- and 24-hour time points, ordered by hierarchical clustering on every metacluster representative gene expression profile. Cell subpopulations are represented by a 40% cut on that clustering. (G–H) Top gene ontology terms for the 162 genes in metacluster 38 and the 151 genes in metacluster 16.
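The difference map in panel C (average the per-cell maps within each time point, then subtract) can be sketched as follows. This is an illustrative sketch, not SOMatic code. It assumes each cell's map is a flat list of SOM unit values on a log scale, so that a difference corresponds to a log fold change, and it assumes the direction 24-hour minus 0-hour, which the legend does not state. The function names and toy values are hypothetical.

```python
from statistics import fmean


def mean_map(cell_maps):
    """Average per-cell SOM maps unit by unit (each map is a flat list)."""
    return [fmean(unit_values) for unit_values in zip(*cell_maps)]


def difference_map(maps_0h, maps_24h):
    """Return the averaged 24-hour map minus the averaged 0-hour map."""
    avg_0h = mean_map(maps_0h)
    avg_24h = mean_map(maps_24h)
    return [late - early for early, late in zip(avg_0h, avg_24h)]


# Toy data: two cells per time point, three SOM units per map.
maps_0h = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]
maps_24h = [[4.0, 2.0, 1.0], [4.0, 2.0, 3.0]]

diff = difference_map(maps_0h, maps_24h)
```

Under these assumptions, positive entries of `diff` mark SOM units whose average signal is higher at 24 hours and negative entries mark units that are higher at 0 hours.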
Fig 3. SOMatic reveals the dynamic chromatin landscape in single cells.
(A) A chromatin SOM was generated for the single-cell ATAC-seq dataset (0-hour 94 cells, 24-hour 133 cells). Maps for 3 cells from each time point were arbitrarily selected for display. (B) 107 metaclusters were identified using AIC scoring. Metacluster number and color were arbitrarily assigned for visualization purposes. (C) SOM difference map comparing 0-hour and 24-hour time points. Maps for cells from the 0- and 24-hour time points were averaged to generate a single map for each and then subtracted to create a map that represented chromatin accessibility fold change during pre-B cell development. Overlaid metacluster divisions generally follow contours of the map. (D) Trait enrichment analysis deployed on gene metaclusters revealed which are enriched in each time point. Metaclusters of interest are highlighted in panel B. (E–F) Summary showing the representative accessibility profile for SOM metaclusters 62 and 70. Columns are individual cells color-coded for 0- and 24-hour time points, ordered by hierarchical clustering on every metacluster representative gene expression profile. Cell subpopulations are represented by a 40% cut on that clustering. (G–H) Top gene ontology terms for genes associated with chromatin elements from SOM metaclusters 62 and 70. Association was determined using the GREAT algorithm (see Methods).
Fig 4. Transcriptional regulation by Ikzf1 recovered using linked SOMs.
(A) Size of pair-wise metaclusters that contain both differentially expressed genes and differentially accessible chromatin sites. Metaclusters of genes and regions with a higher enrichment at 24 hours are colored blue and are ordered by enrichment in the two time points. (B) Number of statistically significant motifs found in each pair-wise metacluster from (A). Presence and enrichment of the Ikzf1 motif in the pair-wise metacluster is noted. (C) Heatmap of expression fold change for genes predicted to be regulated by Ikzf1. Genes with the largest change between time points are noted. (D) Predicted downstream targets of Ikzf1 with significant change over the time course. Each gene is labeled with the fold change between time points, on the same scale as in (C). (E) Predicted gene regulatory network downstream of Ikzf1. Genes are ordered left to right by their fold change over the time course. Connections are dashed if their signal is significantly lower at the 24-hour time point. Connections at each gene are labeled by level of evidence found in existing literature. Teal triangles indicate experimental evidence and green triangles indicate previous computational prediction.