Xuran Wang, Jihwan Park, Katalin Susztak, Nancy R Zhang, Mingyao Li.
Abstract
Knowledge of cell type composition in disease-relevant tissues is an important step towards the identification of cellular targets of disease. We present MuSiC, a method that utilizes cell-type-specific gene expression from single-cell RNA sequencing (scRNA-seq) data to characterize cell type compositions from bulk RNA-seq data in complex tissues. By appropriately weighting genes that show cross-subject and cross-cell consistency, MuSiC enables the transfer of cell-type-specific gene expression information from one dataset to another. When applied to pancreatic islet and whole kidney expression data in human, mouse, and rat, MuSiC outperformed existing methods, especially for tissues with closely related cell types. MuSiC enables the characterization of cellular heterogeneity of complex tissues for understanding of disease mechanisms. As bulk tissue data are more easily accessible than scRNA-seq data, MuSiC allows the vast amounts of disease-relevant bulk tissue RNA-seq data to be used for elucidating cell type contributions in disease.
Year: 2019 PMID: 30670690 PMCID: PMC6342984 DOI: 10.1038/s41467-018-08023-x
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 Overview of the MuSiC framework. MuSiC starts from scRNA-seq data from multiple subjects, classified into cell types (shown in different colors), and constructs a hierarchical clustering tree reflecting the similarity between cell types. Based on this tree, the user can determine the stages of recursive estimation and which cell types to group together at each stage. MuSiC then determines the group-consistent genes and calculates the cross-subject mean (red to blue) and cross-subject variance (black to white) for these genes in each cell type. MuSiC up-weights genes with low cross-subject variance and down-weights genes with high cross-subject variance. In the example shown, deconvolution is performed in two stages. Only cluster proportions are estimated in the first stage. Constrained by these cluster proportions, the second stage estimates cell type proportions, illustrated by the lengths of the bars in different colors. The deconvolved cell type proportions can then be compared across disease cohorts.
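The weighting idea in this caption can be illustrated with a minimal sketch. It is not the MuSiC implementation: it assumes only two cell types, made-up expression values, and a simple inverse-variance weight, and the function names (`cross_subject_stats`, `inverse_variance_weights`, `two_type_proportion`) are hypothetical. With two cell types and proportions that sum to one, weighted least squares has a closed form, so no solver is needed.

```python
from statistics import mean, pvariance


def cross_subject_stats(expr_by_subject):
    """Per-gene mean and variance across subjects for one cell type.

    expr_by_subject: one list of per-gene expression values per subject.
    """
    per_gene = list(zip(*expr_by_subject))
    return [mean(g) for g in per_gene], [pvariance(g) for g in per_gene]


def inverse_variance_weights(var_a, var_b, eps=1e-6):
    """Assumed weighting rule: 1 / (summed cross-subject variance + eps)."""
    return [1.0 / (va + vb + eps) for va, vb in zip(var_a, var_b)]


def two_type_proportion(bulk, mean_a, mean_b, weights):
    """Weighted least squares for p in bulk ~ p*mean_a + (1-p)*mean_b.

    The estimate is clipped to [0, 1] so both proportions stay non-negative.
    """
    num = sum(w * (y - b) * (a - b)
              for w, y, a, b in zip(weights, bulk, mean_a, mean_b))
    den = sum(w * (a - b) ** 2 for w, a, b in zip(weights, mean_a, mean_b))
    if den == 0:
        raise ValueError("cell types are indistinguishable on these genes")
    return min(1.0, max(0.0, num / den))


# Toy reference: 3 subjects x 3 genes per cell type (made-up numbers).
type_a = [[10, 1, 2], [10, 1, 6], [10, 1, 10]]  # gene 3 varies across subjects
type_b = [[2, 8, 4], [2, 8, 4], [2, 8, 4]]
mean_a, var_a = cross_subject_stats(type_a)
mean_b, var_b = cross_subject_stats(type_b)

# Bulk built with a true proportion of 0.3 for type A, then gene 3 perturbed.
bulk = [4.4, 5.9, 6.0]

weights = inverse_variance_weights(var_a, var_b)
p_weighted = two_type_proportion(bulk, mean_a, mean_b, weights)
p_unweighted = two_type_proportion(bulk, mean_a, mean_b, [1.0, 1.0, 1.0])
print(round(p_weighted, 3), round(p_unweighted, 3))
```

In this toy case the perturbed gene is also the one with high cross-subject variance, so down-weighting it pulls the estimate back towards the true proportion. A two-stage version would apply the same estimate first to cluster-level means and then within each cluster.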
Fig. 2 Pancreatic islet cell type composition in healthy and T2D human samples. a, b Benchmarking of deconvolution accuracy on bulk data constructed by combining scRNA-seq samples. a The bulk data are constructed for 10 subjects from Segerstolpe et al., and the single-cell reference is taken from the same dataset. The cell type proportions of healthy subjects are estimated with a leave-one-out single-cell reference. The subject names are relabeled; the table shows the average root-mean-square deviation (RMSD), mean absolute deviation (mAD), and Pearson correlation (R) across all samples and cell types. b The bulk data are constructed for 18 subjects from Xin et al., and the single-cell reference consists of six healthy subjects from Segerstolpe et al. c Jitter plots of estimated cell type proportions for Fadista et al. subjects, color-coded by deconvolution method. Of the 89 subjects from Fadista et al., only the 77 with a recorded HbA1c level are plotted; T2D subjects are denoted by triangles and non-diabetic subjects by dots. d HbA1c vs. beta cell proportions estimated by each of four methods. The reported p-values are from the single-variable regression β cell proportion ~ HbA1c. Multivariable regression results are reported in Supplementary Table 1. Supplementary Figure 7 shows the deconvolution results for Fadista et al. with the inDrop data from Baron et al. as the single-cell reference. The corresponding multivariable regression results are shown in Supplementary Table 2. Source data are provided as a Source Data file.
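The benchmarking setup in panels a and b can be sketched as follows: sum single-cell counts into an artificial bulk sample whose true proportions are known from the cell labels, then score estimated proportions with RMSD, mAD, and Pearson R. This is a sketch on made-up numbers, not the authors' code; the function names are hypothetical, and the "true" proportion is assumed to be the fraction of cells of each type.

```python
from math import sqrt


def pseudo_bulk(cells):
    """Sum per-cell counts into one bulk profile.

    cells: list of (cell_type, per-gene counts).
    Returns (bulk counts, true proportion of each cell type by cell number).
    """
    n_genes = len(cells[0][1])
    bulk = [sum(counts[g] for _, counts in cells) for g in range(n_genes)]
    types = sorted({t for t, _ in cells})
    props = {t: sum(1 for ct, _ in cells if ct == t) / len(cells) for t in types}
    return bulk, props


def rmsd(true, est):
    """Root-mean-square deviation between true and estimated proportions."""
    return sqrt(sum((t - e) ** 2 for t, e in zip(true, est)) / len(true))


def mad(true, est):
    """Mean absolute deviation between true and estimated proportions."""
    return sum(abs(t - e) for t, e in zip(true, est)) / len(true)


def pearson_r(true, est):
    """Pearson correlation between true and estimated proportions."""
    mt, me = sum(true) / len(true), sum(est) / len(est)
    cov = sum((t - mt) * (e - me) for t, e in zip(true, est))
    var_t = sum((t - mt) ** 2 for t in true)
    var_e = sum((e - me) ** 2 for e in est)
    return cov / sqrt(var_t * var_e)


# Toy single-cell data: three alpha cells and one beta cell, two genes.
cells = [("alpha", [3, 0]), ("alpha", [5, 1]), ("alpha", [4, 0]), ("beta", [0, 4])]
bulk, true_props = pseudo_bulk(cells)

# Made-up true and estimated proportions for three cell types in one sample.
true_p = [0.5, 0.3, 0.2]
est_p = [0.4, 0.4, 0.2]
print(bulk, true_props)
print(rmsd(true_p, est_p), mad(true_p, est_p), pearson_r(true_p, est_p))
```

In a full benchmark these metrics would be computed over all samples and cell types together, as described for the table in the caption.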
Pancreatic islet datasets
| Name | Journal | Year | Accession # | Tissue type | Data type | Protocol | # Samples | # Cells | # Genes | # Cell types |
|---|---|---|---|---|---|---|---|---|---|---|
| Segerstolpe et al. | Cell Metabolism | 2016 | | Pancreatic islet | Single-cell RNA-seq | Smart-seq2 | 10 (6 H + 4 T2D) | 2209 | 25,453 | 14 + 1 NA |
| Segerstolpe et al. | Cell Metabolism | 2016 | | Pancreatic islet | Bulk RNA-seq | Smart-seq2 | 7 (3 H + 4 T2D) | NA | 25,453 | NA |
| Xin et al. | Cell Metabolism | 2016 | | Islet: endocrine | Single-cell RNA-seq | Illumina HiSeq 2500 | 18 (12 H + 6 T2D) | 1492 | 39,849 | 4 |
| Baron et al. | Cell Systems | 2016 | | Pancreatic islet | Single-cell RNA-seq | Illumina HiSeq 2500 (inDrop) | 3 healthy | 7729 | 17,434 | 14 + 1 NA |
| Fadista et al. | PNAS | 2014 | | Pancreatic islet | Bulk RNA-seq | Illumina HiSeq 2000 | 89 | NA | 56,638 | NA |
Mouse/rat kidney datasets
| Name | Journal | Year | Accession # | Tissue type | Data type | Protocol | # Samples | # Cells | # Genes | # Cell types |
|---|---|---|---|---|---|---|---|---|---|---|
| Park et al. | Science | 2018 | | Kidney | Single-cell RNA-seq | 10x | 7 healthy, male | 43,745 | 16,273 | 14 + 2 novel |
| Beckerman et al. | Nature Medicine | 2017 | | Kidney | Bulk RNA-seq | Illumina HiSeq 2500 | 10 (6 control + 4 APOL1) | NA | 19,033 | NA |
| Lee et al. | JASN | 2015 | | Kidney tubule | Bulk RNA-seq | Illumina HiSeq 2000 | 118 replicates (14 segments) | NA | 10,903 | NA |
| Craciun et al. | JASN | 2015 | | Kidney | Bulk RNA-seq | Illumina HiSeq 2000 | 18 replicates (6 time points) | NA | 25,219 | NA |
| Arvaniti et al. | Scientific Reports | 2016 | | Kidney | Bulk RNA-seq | Illumina HiSeq 2000 | 10 replicates (Sham + 2 time points) | NA | 38,683 | NA |
Fig. 3 Cell type composition in kidney of mouse CKD models and rat. a Cluster dendrogram showing similarity between 13 cell types that were confidently characterized in Park et al. Abbreviations: Neutro: neutrophils, Podo: podocytes, Endo: endothelial cells, LOH: loop of Henle, DCT: distal convoluted tubule, PT: proximal tubule, CD-PC: collecting duct principal cell, CD-IC: collecting duct intercalated cell, Macro: macrophages, Fib: fibroblasts, NK: natural killer cells. b–d Average estimated proportions for 6 cell types in bulk RNA-seq samples taken from three different studies, each based on a different mouse model of chronic kidney disease. Results from three different deconvolution methods (MuSiC, BSEQ-sc and CIBERSORT) are shown in different colors. Supplementary Figure 5a–c show complete estimation results for all 13 cell types. b Bulk samples are from Beckerman et al., who sequenced 6 control and 4 APOL1 mice. c Bulk data are from Craciun et al.[9], where samples were taken before (C) and at 1, 2, 3, 7, and 14 days after administering folic acid. The line plot shows cell type proportion changes over time (days), averaged over 3 replicates at each time point. d Bulk data are from Arvaniti et al.[10], where samples were taken from mice after Sham operation (C), 2 days after UUO operation (D2), and 8 days after UUO operation (D8). The average proportions at each time point are plotted. e MuSiC-estimated cell type proportions of rat renal tubule segments. The estimated cell type proportions (left) and the correlations of proportions between samples (right) are shown as heatmaps. Segment names are color-coded and aligned according to their physical positions along the renal tubule. Supplementary Figure 6a–c show NNLS, BSEQ-sc and CIBERSORT results.
Segment name abbreviations: S1: S1 proximal tubule, S2: S2 proximal tubule, S3: S3 proximal tubule, SDL: short descending limb, LDLOM: long descending limb, outer medulla, LDLIM: long descending limb, inner medulla, tAL: thin ascending limb, mTAL: medullary thick ascending limb, cTAL: cortical thick ascending limb, DCT: distal convoluted tubule, CNT: connecting tubule, CCD: cortical collecting duct, OMCD: outer medullary collecting duct, IMCD: inner medullary collecting duct. Source data are provided as a Source Data file.