Minzhe Guo, Hui Wang, S Steven Potter, Jeffrey A Whitsett, Yan Xu.
Abstract
A major challenge in developmental biology is to understand the genetic and cellular processes and programs that drive organ formation and the differentiation of the diverse cell types that comprise the embryo. Recent single-cell transcriptome studies illustrate the power to measure and understand cellular heterogeneity in complex biological systems, but processing large amounts of RNA-seq data from heterogeneous cell populations creates a need for readily accessible tools for analyzing single-cell RNA-seq (scRNA-seq) profiles. This study presents a generally applicable analytic pipeline, SINCERA (a computational pipeline for SINgle CEll RNA-seq profiling Analysis), for processing scRNA-seq data from a whole organ or sorted cells. The pipeline supports: 1) the distinction and identification of major cell types; 2) the identification of cell type specific gene signatures; and 3) the determination of driving forces of given cell types. We applied the pipeline to RNA-seq data from single cells isolated from embryonic mouse lung at E16.5. The analysis distinguished the major cell types of fetal mouse lung, including epithelial, endothelial, smooth muscle, pericyte, and fibroblast-like cell types, and identified cell type specific gene signatures, bioprocesses, and key regulators. SINCERA is implemented in R, licensed under the GNU General Public License v3, and freely available from the CCHMC PBGE website, https://research.cchmc.org/pbge/sincera.html.
Year: 2015 PMID: 26600239 PMCID: PMC4658017 DOI: 10.1371/journal.pcbi.1004575
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Top 20 Predicted Key Transcription Factors for Lung Epithelial Cells at E16.5.
| TF | Name | DFC | DCC | DDC | DC | CC | BC | Average Rank |
|---|---|---|---|---|---|---|---|---|
| | HOP homeobox | 2 | 6 | 1 | 1 | 2 | 1 | 1 |
| | D4, zinc and double PHD fingers family 2 | 4 | 10 | 6 | 2 | 3 | 5 | 2 |
| | enolase 1, alpha non-neuron | 1 | 4 | 2 | 5 | 16 | 8 | 3 |
| | recombination signal binding protein for immunoglobulin kappa J region-like | 7 | 14 | 7 | 2 | 7 | 3 | 4 |
| | ets variant 5 | 7 | 14 | 10 | 5 | 4 | 10 | 5 |
| | cyclin-dependent kinase 7 | 4 | 2 | 3 | 11 | 20 | 17 | 6 |
| | trans-acting transcription factor 1 | 2 | 6 | 4 | 2 | 39 | 9 | 7 |
| | NK2 homeobox 1 | 15 | 24 | 8 | 8 | 5 | 4 | 8 |
| | Kruppel-like factor 5 | 7 | 14 | 13 | 5 | 29 | 12 | 9 |
| | pterin 4 alpha carbinolamine dehydratase/dimerization cofactor of hepatocyte nuclear factor 1 alpha (TCF1) 1 | 15 | 24 | 14 | 8 | 6 | 14 | 10 |
| | mortality factor 4 like 2 | 15 | 24 | 19 | 11 | 25 | 19 | 11 |
| | high mobility group AT-hook 1 | 15 | 9 | 12 | 21 | 34 | 22 | 11 |
| | BCL2-associated transcription factor 1 | 15 | 24 | 22 | 16 | 17 | 23 | 12 |
| | MDS1 and EVI1 complex locus | 15 | 24 | 23 | 16 | 27 | 13 | 13 |
| | TAF9 RNA polymerase II, TATA box binding protein (TBP)-associated factor | 15 | 24 | 28 | 37 | 8 | 15 | 14 |
| | mediator complex subunit 13-like | 7 | 5 | 11 | 21 | 60 | 28 | 15 |
| | hairy and enhancer of split 6 | 15 | 1 | 5 | 37 | 64 | 11 | 16 |
| | E74-like factor 3 | 7 | 12 | 17 | 21 | 56 | 20 | 16 |
| | inhibitor of DNA binding 2 | 15 | 24 | 30 | 21 | 11 | 36 | 17 |
| | hepatoma-derived growth factor | 7 | 12 | 15 | 16 | 69 | 21 | 18 |
All ranks are in decreasing order of the TF importance metric values. TFs in bold font are associated with lung-related mouse phenotypes (http://www.informatics.jax.org/mp/annotations/MP:0005388).
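The "Average Rank" column in the table above is consistent with a dense ranking of the mean of the six per-metric ranks (DFC, DCC, DDC, DC, CC, BC), where tied means share a rank. The following is a minimal Python sketch of that reading, restricted to the 20 rows shown. It is an observation about the tabulated values, not the SINCERA implementation (which is in R); the function name `dense_rank_by_mean` is hypothetical.

```python
from fractions import Fraction

# Per-metric ranks (DFC, DCC, DDC, DC, CC, BC) copied from the table, in row order.
METRIC_RANKS = [
    ("HOP homeobox", [2, 6, 1, 1, 2, 1]),
    ("D4, zinc and double PHD fingers family 2", [4, 10, 6, 2, 3, 5]),
    ("enolase 1, alpha non-neuron", [1, 4, 2, 5, 16, 8]),
    ("recombination signal binding protein (kappa J region-like)", [7, 14, 7, 2, 7, 3]),
    ("ets variant 5", [7, 14, 10, 5, 4, 10]),
    ("cyclin-dependent kinase 7", [4, 2, 3, 11, 20, 17]),
    ("trans-acting transcription factor 1", [2, 6, 4, 2, 39, 9]),
    ("NK2 homeobox 1", [15, 24, 8, 8, 5, 4]),
    ("Kruppel-like factor 5", [7, 14, 13, 5, 29, 12]),
    ("pterin 4 alpha carbinolamine dehydratase (TCF1 cofactor) 1", [15, 24, 14, 8, 6, 14]),
    ("mortality factor 4 like 2", [15, 24, 19, 11, 25, 19]),
    ("high mobility group AT-hook 1", [15, 9, 12, 21, 34, 22]),
    ("BCL2-associated transcription factor 1", [15, 24, 22, 16, 17, 23]),
    ("MDS1 and EVI1 complex locus", [15, 24, 23, 16, 27, 13]),
    ("TAF9 RNA polymerase II, TBP-associated factor", [15, 24, 28, 37, 8, 15]),
    ("mediator complex subunit 13-like", [7, 5, 11, 21, 60, 28]),
    ("hairy and enhancer of split 6", [15, 1, 5, 37, 64, 11]),
    ("E74-like factor 3", [7, 12, 17, 21, 56, 20]),
    ("inhibitor of DNA binding 2", [15, 24, 30, 21, 11, 36]),
    ("hepatoma-derived growth factor", [7, 12, 15, 16, 69, 21]),
]


def dense_rank_by_mean(rows):
    """Return the dense rank of each row's mean rank (assumed aggregation rule).

    Exact fractions are used so that equal means compare equal and tie.
    """
    means = [Fraction(sum(ranks), len(ranks)) for _, ranks in rows]
    # Smallest mean gets rank 1; equal means share one rank, with no gaps after ties.
    position = {m: i + 1 for i, m in enumerate(sorted(set(means)))}
    return [position[m] for m in means]


average_rank = dense_rank_by_mean(METRIC_RANKS)
for (name, _), rank in zip(METRIC_RANKS, average_rank):
    print(f"{rank:>2}  {name}")
```

Under this reading, the two tied pairs in the table (ranks 11 and 16) arise because each pair has the same mean over the six metrics.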
Top 20 Predicted Regulatory Targets of Nkx2-1 Identified from a Consensus among Expression based Prediction, ChIP-seq, and Literature Evidence.
| Target | Name | Expression based Prediction (EP) | ChIP-seq | Literature Evidence 1 | Literature Evidence 2 | Consensus Maximized Score (CM) | Rank by EP | Rank by CM |
|---|---|---|---|---|---|---|---|---|
| | ets variant 5 | 1.53E-01 | 1 | 1 | 1 | 7.08E-01 | 22 | 1 |
| | claudin 18 | 1.34E-02 | 0 | 1 | 1 | 7.05E-01 | 6 | 2 |
| | surfactant associated protein B | 4.74E-01 | 1 | 1 | 1 | 7.05E-01 | 55 | 3 |
| | sonic hedgehog | 5.95E-01 | 1 | 1 | 1 | 7.04E-01 | 74 | 4 |
| | surfactant associated protein C | 5.11E-01 | 0 | 1 | 1 | 7.01E-01 | 62 | 5 |
| | forkhead box A1 | 7.87E-01 | 0 | 1 | 1 | 6.99E-01 | 112 | 6 |
| | GATA binding protein 6 | 9.43E-01 | 0 | 1 | 1 | 6.97E-01 | 175 | 7 |
| | podoplanin | 9.98E-01 | 0 | 1 | 1 | 6.97E-01 | 296 | 8 |
| | advanced glycosylation end product-specific receptor | 9.99E-01 | 0 | 1 | 1 | 6.97E-01 | 321 | 9 |
| | ATP-binding cassette, sub-family A (ABC1), member 3 | 5.06E-03 | 1 | 0 | 1 | 6.76E-01 | 4 | 10 |
| | v-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog | 2.20E-01 | 1 | 0 | 1 | 6.74E-01 | 30 | 11 |
| | melanoma inhibitory activity | 9.94E-01 | 1 | 1 | 0 | 6.73E-01 | 261 | 12 |
| | solute carrier family 6 (neurotransmitter transporter), member 14 | 9.98E-01 | 1 | 1 | 0 | 6.73E-01 | 292 | 13 |
| | serine (or cysteine) peptidase inhibitor, clade B, member 6b | 9.21E-01 | 0 | 1 | 0 | 6.70E-01 | 163 | 14 |
| | HOP homeobox | 9.89E-01 | 1 | 0 | 1 | 6.68E-01 | 237 | 15 |
| | high mobility group AT-hook 2 | 9.89E-01 | 1 | 0 | 1 | 6.68E-01 | 238 | 16 |
| | forkhead box P2 | 9.96E-01 | 1 | 0 | 1 | 6.68E-01 | 276 | 17 |
| | grainyhead-like 2 (Drosophila) | 9.99E-01 | 1 | 0 | 1 | 6.68E-01 | 308 | 18 |
| | mucin 1, transmembrane | 9.97E-01 | 0 | 0 | 1 | 6.64E-01 | 283 | 19 |
| | growth arrest and DNA-damage-inducible 45 gamma | 8.90E-07 | 1 | 0 | 0 | 6.48E-01 | 1 | 20 |
Regulatory targets are listed in increasing order of “Rank by CM”. The full set of candidate targets for consensus maximization consisted of genes that are differentially expressed in epithelial cells (p-value<0.01). Targets in bold font are known Nkx2-1 targets in lung epithelial cells. “Expression based Prediction (EP)” is based on the first-order conditional dependence inference described in the Methods. “ChIP-seq” is based on the results of a previous Nkx2-1 ChIP-seq experiment: 1 means the target has at least one predicted peak region; 0 means no predicted peak. “Literature Evidence 1” and “Literature Evidence 2” encode literature support from Ingenuity IPA (http://www.ingenuity.com/products/ipa) and Genomatix (https://www.genomatix.de), respectively. “Consensus Maximized Score (CM)” is the output of the consensus maximization. “Rank by EP” ranks targets in increasing order of the values in “Expression based Prediction (EP)”. “Rank by CM” ranks targets in decreasing order of the values in “Consensus Maximized Score (CM)”.
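The two rank columns use opposite sort directions: smaller EP values rank first, while larger CM scores rank first. The short Python sketch below illustrates this on five rows copied from the table. Because the published ranks were computed over the full candidate set, only the relative order of these five rows is checked, not the absolute rank numbers; the variable names are illustrative.

```python
# (name, EP value, CM score, "Rank by EP", "Rank by CM") copied from five table rows.
rows = [
    ("ets variant 5", 1.53e-01, 7.08e-01, 22, 1),
    ("claudin 18", 1.34e-02, 7.05e-01, 6, 2),
    ("ATP-binding cassette, sub-family A (ABC1), member 3", 5.06e-03, 6.76e-01, 4, 10),
    ("v-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog", 2.20e-01, 6.74e-01, 30, 11),
    ("growth arrest and DNA-damage-inducible 45 gamma", 8.90e-07, 6.48e-01, 1, 20),
]

# "Rank by EP": increasing EP value, so the smallest EP comes first.
by_ep = sorted(rows, key=lambda r: r[1])

# "Rank by CM": decreasing CM score, so the largest CM comes first.
by_cm = sorted(rows, key=lambda r: r[2], reverse=True)

print("Order by EP:", [r[3] for r in by_ep])  # published "Rank by EP" values, in sorted order
print("Order by CM:", [r[4] for r in by_cm])  # published "Rank by CM" values, in sorted order
```

Note how the target with the best EP rank in this subset (growth arrest and DNA-damage-inducible 45 gamma) falls to the bottom of the CM ordering, since it has no literature support in either source.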