Bin Duan1,2, Chi Zhou1, Chengyu Zhu1, Yifei Yu1, Gaoyang Li3,4, Shihua Zhang5, Chao Zhang1, Xiangyun Ye6, Hanhui Ma7, Shen Qu1, Zhiyuan Zhang8, Ping Wang9,10, Shuyang Sun11, Qi Liu12,13.
Abstract
The recently developed single-cell CRISPR screening techniques, independently termed Perturb-Seq, CRISP-seq, or CROP-seq, combine pooled CRISPR screening with single-cell RNA-seq to investigate functional CRISPR screening at single-cell granularity. Here, we present MUSIC, an integrated pipeline for model-based understanding of single-cell CRISPR screening data. Comprehensive tests on all publicly available data showed that MUSIC accurately quantifies and prioritizes individual gene perturbation effects on cell phenotypes while tolerating the substantial noise present in such data. MUSIC facilitates single-cell CRISPR screening analysis from three perspectives: prioritizing gene perturbation effects as an overall perturbation effect, prioritizing them in a functional topic-specific way, and quantifying the relationships between different perturbations. In summary, MUSIC provides an effective and applicable solution for elucidating perturbation function and biological circuits through model-based quantitative analysis of single-cell CRISPR screening data.
Year: 2019 PMID: 31110232 PMCID: PMC6527552 DOI: 10.1038/s41467-019-10216-x
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 General workflow of MUSIC. MUSIC comprises three steps for single-cell CRISPR screening data analysis: data preprocessing, model building, and perturbation effect prioritization. In the first step, in addition to the conventional assessment of cell quality, several factors specific to single-cell CRISPR screening are considered: the ratio of nonzero perturbed expression values across all cells, sgRNA efficiency, and the minimal number of perturbed cells per perturbation. In the second step, MUSIC applies a topic model-based computational framework to derive the functional topics of each cell (including controls) with a specific perturbation (PE, perturbation). In the third step, MUSIC quantitatively estimates and prioritizes individual gene perturbation effects on cell phenotypes from three perspectives: as an overall perturbation effect, in a functional topic-specific way, and by quantifying the relationships between different perturbations.
Fig. 2 Comparison between traditional clustering-based analysis and topic model-based analysis for single-cell CRISPR screening data. a Difference between the two analyses when a perturbation has a significant phenotypic effect on the cells. Both analyses can detect such a phenotype change (see the cell sample marked with the red dotted line). b Difference between the two analyses when a perturbation has a subtle phenotypic effect on the cells. Topic modeling calculates a topic probability profile for each sample, whereas traditional clustering makes a hard assignment of each sample to a single cluster. Topic model-based analysis can therefore detect a subtle phenotype change from the shift in the topic probability profile with and without perturbation, while traditional clustering-based analysis fails to detect it (see the cell sample marked with the red dotted line).
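The contrast drawn in Fig. 2b can be illustrated with a minimal Python sketch. The topic probabilities below are made-up numbers, and the total variation distance is used only as a convenient stand-in for "change of topic probability profile"; it is an assumption for illustration, not the score MUSIC computes.

```python
import numpy as np

# Hypothetical topic probability profiles for one control cell and one
# perturbed cell (three topics; values invented for illustration).
control = np.array([0.60, 0.30, 0.10])
perturbed = np.array([0.45, 0.40, 0.15])


def hard_assignment(profile):
    """Clustering-style view: keep only the index of the dominant topic."""
    return int(np.argmax(profile))


def profile_shift(p, q):
    """Total variation distance between two probability profiles.

    Assumed here as a simple measure of profile change; MUSIC's own
    scoring is not reproduced in this sketch.
    """
    return 0.5 * float(np.abs(p - q).sum())


# Both cells fall into the same hard cluster, so a hard assignment
# reports no change between control and perturbed cells.
same_cluster = hard_assignment(control) == hard_assignment(perturbed)

# The soft profiles still differ, which is the subtle signal that a
# topic probability profile can retain.
shift = profile_shift(control, perturbed)
```

In this toy case the dominant topic is unchanged while the profile itself has shifted, which is the situation panel b describes.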
Fig. 3 An illustrative result of MUSIC for single-cell CRISPR screening data analysis. The dataset of MCF10A cells treated with doxorubicin (GSM2911346), generated with the updated version of CROP-seq[8], is used as an example in (a, b). The overall perturbation effect ranking lists identified by MUSIC are also compared between cells under different treatments in (c). a Functional annotations of each topic derived from topic modeling for dataset GSM2911346. b The overall perturbation effect ranking list and the topic-specific perturbation effect ranking list for dataset GSM2911346. c Differences in perturbation impact between experimental conditions, shown for Perturb-Seq[5] and CROP-seq[7,8] data.
Comparison of detailed analysis results between MUSIC and MIMOSCA
| Datasets | Technology | Demonstrated perturbation | Output | MIMOSCA | MUSIC |
|---|---|---|---|---|---|
| Mouse BMDC (3 h post-LPS, GSM2396856) | Perturb-Seq[ | | Overall perturbation effect | — | Rank 2nd |
| | | | Topic-specific functional perturbation effect | Immune cells activation | • Immune cells activation[ • Cell migration[ |
| | | | Perturbations relationship | • | cor( cor( cor( cor( |
| | | | | • | cor( |
| Human K562 (7 days post transduction, GSM2396858) | Perturb-Seq[ | | Overall perturbation effect | — | Rank 2nd |
| | | | Topic-specific functional perturbation effect | Mitochondrial function | • Heme metabolic process • Neutrophil activation[ |
| | | | Perturbation relationship | — | cor( |
| Human K562 (cell cycle regulators, GSM2396861) | Perturb-Seq[ | | Overall perturbation effect | — | Rank 1st |
| | | | Topic-specific functional perturbation effect | Proliferation | Proliferation |
| | | | Perturbation relationship | cor( cor( cor( | |
cor(a,b) represents the Pearson correlation coefficient of topic distribution profile between perturbation a and perturbation b
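As a rough illustration of the cor(a,b) quantity defined in the footnote, the sketch below computes a Pearson correlation between the topic distribution profiles of two perturbations. The per-cell topic probabilities are invented. Summarizing each perturbation by the mean profile of its cells is an assumption made for this example; the text above does not state how MUSIC aggregates cells.

```python
import numpy as np


def mean_topic_profile(cell_topic_probs):
    """Average per-cell topic probabilities into one profile per perturbation.

    Averaging is an assumption of this sketch, not a documented MUSIC step.
    """
    return np.asarray(cell_topic_probs, dtype=float).mean(axis=0)


def cor(profile_a, profile_b):
    """Pearson correlation coefficient between two topic profiles."""
    return float(np.corrcoef(profile_a, profile_b)[0, 1])


# Hypothetical cells (rows) by topics (columns) for three perturbations.
cells_a = np.array([[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]])
cells_b = np.array([[0.5, 0.4, 0.1], [0.7, 0.2, 0.1]])
cells_c = np.array([[0.1, 0.2, 0.7], [0.2, 0.3, 0.5]])

profile_a = mean_topic_profile(cells_a)
profile_b = mean_topic_profile(cells_b)
profile_c = mean_topic_profile(cells_c)

cor_ab = cor(profile_a, profile_b)  # similar profiles: strongly positive
cor_ac = cor(profile_a, profile_c)  # opposing profiles: negative
```

Perturbations whose cells load on the same topics give a correlation close to 1, while perturbations that shift cells toward different topics give a low or negative value.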
Comparison of detailed analysis results between MUSIC and LRICA
| Datasets | Technology | Demonstrated perturbation | Output | LRICA | MUSIC |
|---|---|---|---|---|---|
| Human K562 (3 UPR related genes, GSM2406677) | Perturb-seq[ | ATF6, PERK, IRE1α | Overall perturbation effect | — | The three perturbations’ overall perturbation effect ranks 1st |
| | | | Topic-specific functional perturbation effect | UPR | • UPR • Apoptosis[ |
| | | | Perturbation relationship | The perturbation of PERK has a greater impact than those of ATF6 and IRE1α. | TPDS(PERK) = 94.0 TPDS(IRE1α) = 23.2 TPDS(ATF6) = 11.0 |
| Human K562 (83 UPR related genes, GSM2406681) | Perturb-seq[ | | Overall perturbation effect | — | Rank 1st |
| | | | Topic-specific functional perturbation effect | UPR | UPR |
| | | | Perturbation relationship | — | cor( |
cor(a,b) represents the Pearson correlation coefficient of topic distribution between perturbation a and perturbation b
TPDS(a) represents the impact score to evaluate the overall perturbation effect of perturbation a
Other representative analysis results of MUSIC
| Datasets | Technology | Demonstrated perturbation | Output | Original study | MUSIC |
|---|---|---|---|---|---|
| Mouse myeloid cell (GSE90486) | CRISP-seq[ | | Overall perturbation effect | — | Rank 1st |
| | | | Topic-specific functional perturbation effect | Immune cell differentiation | • Immune cell differentiation[ • Cell migration[ |
| | | | Perturbation relationship | — | cor( |
| Human MCF10A cell (treated with doxorubicin, GSM2911346) | Updated version of CROP-seq[ | | Overall perturbation effect | — | Rank 1st |
| | | | Topic-specific functional perturbation effect | DNA replication | DNA replication[ |
| | | | Perturbation relationship | — | cor( |
| Human Jurkat cell (stimulated by anti-CD3/CD28, GSM2439086~GSM2439090) | CROP-seq[ | | Overall perturbation effect | — | Rank 6th |
| | | | Topic-specific functional perturbation effect | TCR signature | leukocyte differentiation |
| | | | Perturbations Relationship | cor( cor( cor( | |
cor(a,b) represents the Pearson correlation coefficient of topic distribution between perturbation a and perturbation b
Fig. 4 Evaluating the impact of the data preprocessing strategies adopted in MUSIC. a The proportion of cells filtered out by quality control for all datasets. The red dashed line represents the mean of the data. b The proportion of cells filtered out by removing low-efficiency sgRNAs for all datasets. The red dashed line represents the mean of the data. c zero_rate plot of all knockouts/knockdowns in all datasets. The red dashed line represents the mean of the zero_rates of all knockouts/knockdowns. d Comparison of overall perturbation effect rankings with and without imputation/filtering for all available datasets.
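The zero_rate shown in panel c is not defined in this excerpt. The following Python sketch assumes it is the fraction of cells carrying a given knockout/knockdown in which the targeted gene has zero measured expression. The gene names and counts are hypothetical, and the function is an illustration, not MUSIC's implementation.

```python
import numpy as np


def zero_rate(target_expression):
    """Fraction of cells with zero expression of the targeted gene.

    Assumed definition for this sketch; the caption itself does not
    spell out how zero_rate is computed.
    """
    expr = np.asarray(target_expression)
    if expr.size == 0:
        raise ValueError("need at least one cell to compute a zero rate")
    return float(np.count_nonzero(expr == 0) / expr.size)


# Hypothetical expression counts of each targeted gene in the cells that
# carry the corresponding perturbation.
counts_by_perturbation = {
    "geneA": [0, 0, 3, 0, 1],
    "geneB": [2, 0, 5, 4, 1],
}

rates = {gene: zero_rate(c) for gene, c in counts_by_perturbation.items()}

# Analogue of the red dashed line in panel c: the mean over perturbations.
mean_rate = float(np.mean(list(rates.values())))
```

Under this assumed definition, a high zero_rate could reflect either an effective knockdown or dropout noise in the single-cell data, which is one reason the caption pairs this plot with a comparison of rankings with and without imputation/filtering.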