Shaoheng Liang, Jason Willis, Jinzhuang Dou, Vakul Mohanty, Yuefan Huang, Eduardo Vilar, Ken Chen.
Abstract
Cellular heterogeneity underlies cancer evolution and metastasis. Advances in single-cell technologies such as single-cell RNA sequencing and mass cytometry have enabled interrogation of cell type-specific expression profiles and abundance across heterogeneous cancer samples obtained from clinical trials and preclinical studies. However, challenges remain in determining the sample sizes needed to ascertain changes in cell type abundances in a controlled study. To address this statistical challenge, we have developed a new approach, named Sensei, to determine the number of samples and the number of cells required to ascertain such changes between two groups of samples in single-cell studies. Sensei extends the t-test and models cell abundances using a beta-binomial distribution. We evaluate the mathematical accuracy of Sensei and provide practical guidelines for over 20 cell types in over 30 cancer types, based on knowledge acquired from The Cancer Genome Atlas (TCGA) and prior single-cell studies. We provide a web application for user-friendly study design at https://kchen-lab.github.io/sensei/table_beta.html.
Keywords: Cell type abundance; Clinical trial; Sample size estimation; Single-cell profiling; Tissue heterogeneity
Year: 2022 PMID: 34983369 PMCID: PMC8728970 DOI: 10.1186/s12859-021-04526-5
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
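The model summarized in the abstract can be checked by brute force: draw each sample's true cell-type proportion from a beta distribution, count a finite number of cells binomially, run a one-sided Welch's t-test, and record how often a real difference is missed. The sketch below does this with NumPy and SciPy. It is a simulation of the kind used as ground truth in Fig. 2, not Sensei's analytic calculation; the function names, the method-of-moments beta parameterization, and the defaults (2,000 replicates, alpha = 0.05) are assumptions.

```python
import numpy as np
from scipy import stats


def beta_params(mean, sd):
    """Method-of-moments Beta(a, b) parameters matching a mean and standard deviation."""
    scale = mean * (1 - mean) / sd**2 - 1  # equals a + b; must be positive
    if scale <= 0:
        raise ValueError("need sd**2 < mean*(1-mean) for a valid beta distribution")
    return mean * scale, (1 - mean) * scale


def simulated_fnr(n_ctrl, n_case, n_cells, mean_ctrl, sd_ctrl, mean_case, sd_case,
                  alpha=0.05, reps=2000, seed=0):
    """Monte Carlo false negative rate of a one-sided Welch's t-test (case > control)."""
    rng = np.random.default_rng(seed)
    a0, b0 = beta_params(mean_ctrl, sd_ctrl)
    a1, b1 = beta_params(mean_case, sd_case)
    misses = 0
    for _ in range(reps):
        # Biological variation: each participant's true cell-type proportion
        p_ctrl = rng.beta(a0, b0, size=n_ctrl)
        p_case = rng.beta(a1, b1, size=n_case)
        # Technical variation: only n_cells cells are profiled per sample
        obs_ctrl = rng.binomial(n_cells, p_ctrl) / n_cells
        obs_case = rng.binomial(n_cells, p_case) / n_cells
        result = stats.ttest_ind(obs_case, obs_ctrl, equal_var=False,
                                 alternative="greater")
        # A NaN p-value (e.g. zero variance) is counted as a failure to reject
        if not result.pvalue < alpha:
            misses += 1
    return misses / reps
```

For instance, `simulated_fnr(6, 6, 384, 0.10, 0.05, 0.20, 0.08)` estimates the false negative rate for six samples per group with 384 cells each, using hypothetical proportions. The `alternative` argument of `ttest_ind` requires SciPy 1.6 or later.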
Fig. 1 Framework of Sensei. a–f show side by side how Sensei (right) models a controlled clinical study (left). a A controlled study involves a control group and a case group for ascertaining the difference in the proportions of T cells between the two groups. b Sensei models the true biological between-group difference and within-group variance using beta distributions. Correlation is also modeled for the matched-pairs study design. c A biopsy is taken from each participant and assayed by a single-cell technology. Cell types are identified in silico. d Sensei models the technical variation introduced by the limited number of cells using a binomial distribution (other technical variations are already accounted for in b). e The t-test is performed to identify statistically significant differences. f Sensei infers the distribution of the t-statistic and calculates the false negative (type II error) rate. g A sample input for Sensei. Required are the sample sizes, cell numbers, estimated proportions of the cell type, and the false positive (type I error) rate for the t-test. h A sample output of Sensei, corresponding to (g). Tabulated are the false negative rates for each feasible sample size.
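Panels b and d combine into a simple expression for the variance of an observed proportion. The short derivation below uses assumed notation, not necessarily the paper's: μ and σ are the mean and standard deviation of a cell type's true proportion p_i, m is the number of cells per sample, and the beta parameters a, b follow from the method of moments.

```latex
\begin{aligned}
p_i &\sim \mathrm{Beta}(a, b), \qquad \mathbb{E}[p_i] = \mu, \qquad \operatorname{Var}(p_i) = \sigma^2, \\
a &= \mu\left(\frac{\mu(1-\mu)}{\sigma^2} - 1\right), \qquad
b = (1-\mu)\left(\frac{\mu(1-\mu)}{\sigma^2} - 1\right), \qquad \sigma^2 < \mu(1-\mu), \\
X_i \mid p_i &\sim \mathrm{Binomial}(m, p_i), \qquad \hat{p}_i = X_i / m, \\
\operatorname{Var}(\hat{p}_i) &= \mathbb{E}\!\left[\frac{p_i(1-p_i)}{m}\right] + \operatorname{Var}(p_i)
  = \frac{\mu(1-\mu) - \sigma^2}{m} + \sigma^2
  = \sigma^2\left(1 - \frac{1}{m}\right) + \frac{\mu(1-\mu)}{m}.
\end{aligned}
```

As m grows without bound the variance reduces to σ², which corresponds to the legacy approach of ignoring cell counts; any finite m adds (μ(1 − μ) − σ²)/m ≥ 0, which is why fewer cells per sample call for more samples.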
Fig. 2 Results of simulation studies. a Comparison of false negative rates (y-axis) known from simulation against those estimated by Sensei and by the legacy approach, using datasets sampled from a beta-binomial distribution. The number of samples in the case group is indicated on the x-axis, and the number in the control group by color. Markers correspond to results from different approaches. The average error is the mean absolute relative difference between the estimation and the simulation. b Comparison of false negative rates calculated by Sensei and by the legacy approach with those generated by simulation, using the proportions of T cells in tumor and juxtatumoral samples from a breast cancer study.
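The "average error" in panel a reduces to a one-line computation. A minimal sketch, assuming the simulated false negative rate is the reference value in the denominator (the caption does not say which value is used); the function name is hypothetical.

```python
def mean_abs_relative_difference(estimated, simulated):
    """Mean of |estimate - simulated| / simulated over paired settings."""
    if len(estimated) != len(simulated) or not simulated:
        raise ValueError("need two non-empty sequences of equal length")
    if any(s == 0 for s in simulated):
        raise ValueError("simulated rates must be non-zero")
    total = sum(abs(e - s) / s for e, s in zip(estimated, simulated))
    return total / len(simulated)
```

For example, estimates of 0.20 and 0.10 against simulated rates of 0.25 and 0.10 give relative differences of 0.2 and 0, so the average error is 0.1.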
Fig. 3 Sample sizes estimated by Sensei. a Estimated sample size for detecting a statistically significant difference between normal tissue and primary tumor using a one-sided Welch's t-test at a significance level of 0.05 with 80% power (the same settings are used in b–d). Estimates for the unpaired and paired tests are shown in blue and yellow, respectively. Estimates are given for infinitely many cells (the legacy approach, left end of a whisker), 1,000 cells (left bar), 384 cells (right bar, which may overlap with the left one), and 100 cells (right end of a whisker) per sample. Fewer cells per sample require more samples to ascertain an effect. The estimated sample size is for each of the two groups in a controlled study, not for both groups jointly; for a matched-pairs study, it equals the number of participants. Sample sizes larger than 200 are omitted. The direction of change in cell type abundance is shown by an arrow: an up arrow indicates a higher abundance in primary tumor than in normal tissue, and a down arrow indicates the reverse. b Estimated sample size for detecting a statistically significant difference between primary and recurrent tumors in patients with low-grade glioma (LGG) and glioblastoma multiforme (GBM). An up arrow indicates a higher abundance in recurrent tumor than in primary tumor, and a down arrow the reverse. c Estimated sample size for detecting a statistically significant difference in each immune cell type between microsatellite instability-high (MSI-H) and microsatellite stable (MSS) tumor samples in uterine corpus endometrial carcinoma (UCEC), colon adenocarcinoma (COAD), and stomach adenocarcinoma (STAD). An up arrow indicates a higher abundance in MSI-H tumor than in MSS tumor, and a down arrow the reverse. d Estimated sample size for detecting a statistically significant difference between pre- and post-treatment samples from patients with metastatic melanoma. An up arrow indicates a higher abundance in post-treatment tumor than in pre-treatment tumor, and a down arrow the reverse.
Source data for generating this figure are included in Additional file 2.
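The trend in Fig. 3a, where fewer cells per sample push the required sample size up, can be reproduced with the textbook normal-approximation formula n = (z_{1-alpha} + z_{1-beta})^2 (v_ctrl + v_case) / Delta^2 per group, where each group's variance v is the beta-binomial variance of an observed proportion, sd^2 (1 - 1/m) + mean (1 - mean)/m for m cells. This is a rough sketch, not Sensei's method: it swaps Welch's t distribution for a normal one (so it tends to underestimate sample sizes for small groups) and ignores pairing. Function names and example numbers are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def observed_variance(mean, sd, n_cells=None):
    """Variance of a sample's observed proportion under the beta-binomial model.

    n_cells=None stands for infinitely many cells (the beta-only, legacy variance).
    """
    if sd**2 >= mean * (1 - mean):
        raise ValueError("need sd**2 < mean*(1-mean) for a valid beta distribution")
    if n_cells is None:
        return sd**2
    return sd**2 * (1 - 1 / n_cells) + mean * (1 - mean) / n_cells


def samples_per_group(mean_ctrl, sd_ctrl, mean_case, sd_case, n_cells=None,
                      alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a one-sided two-sample test."""
    delta = mean_case - mean_ctrl
    if delta == 0:
        raise ValueError("means must differ")
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)
    total_var = (observed_variance(mean_ctrl, sd_ctrl, n_cells)
                 + observed_variance(mean_case, sd_case, n_cells))
    # Squared difference: the same size is needed whichever direction is tested
    return ceil((z_alpha + z_beta) ** 2 * total_var / delta**2)
```

With hypothetical proportions of 0.10 (control) and 0.15 (case), both with standard deviation 0.05, `samples_per_group(0.10, 0.05, 0.15, 0.05)` returns 13 samples per group for infinitely many cells, while `samples_per_group(0.10, 0.05, 0.15, 0.05, n_cells=100)` returns 18.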
Estimated power and actual (reported) p-values of the comparisons
| Cell type | Estimated power (%), before vs. healthy | Estimated power (%), after vs. healthy | Estimated power (%), after vs. before | Reported p-value, before vs. healthy | Reported p-value, after vs. healthy | Reported p-value, after vs. before |
|---|---|---|---|---|---|---|
| B | 92.6 | 15.2 | 95.5 | 5E−6 | 0.20 | 4E−5 |
| CD4 T | 18.9 | 88.4 | 92.5 | 0.13 | 0.16 | 3E−4 |
| CD8 T | 99.9 | 89.7 | 18.6 | 0.003 | 0.21 | 4E−5 |
| NK | 95.7 | 52.3 | 9.7 | 0.049 | 0.13 | 0.74 |
False negative rate for estimating T-cell abundance changes in colorectal mucosa (rows: number of control samples; columns: number of experimental samples)
| Control / Experimental | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|
| 4 | 0.214 | 0.166 | 0.138 | 0.121 | 0.110 | 0.101 | 0.095 |
| 5 | 0.167 | 0.116 | 0.088 | 0.071 | 0.060 | 0.052 | 0.046 |
| 6 | 0.141 | 0.089 | 0.062 | 0.047 | 0.037 | 0.030 | 0.026 |
| 7 | 0.124 | 0.072 | 0.047 | 0.033 | 0.025 | 0.019 | 0.015 |
| 8 | 0.113 | 0.061 | 0.038 | 0.025 | 0.018 | 0.013 | 0.010 |
| 9 | 0.105 | 0.054 | 0.031 | 0.020 | 0.013 | 0.009 | 0.007 |
| 10 | 0.099 | 0.048 | 0.026 | 0.016 | 0.010 | 0.007 | 0.005 |
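The colorectal T-cell table can be used as a lookup for the cheapest design that meets a target false negative rate (80% power corresponds to a false negative rate of at most 0.2). A minimal sketch that copies the tabulated values; the helper name and the tie-breaking rule (fewest total samples, then the most balanced design, then the smaller control group) are our own choices.

```python
# False negative rates copied from the table.
# Keys are control-group sizes; list positions follow experimental sizes 4..10.
EXPERIMENTAL_SIZES = list(range(4, 11))
FNR = {
    4: [0.214, 0.166, 0.138, 0.121, 0.110, 0.101, 0.095],
    5: [0.167, 0.116, 0.088, 0.071, 0.060, 0.052, 0.046],
    6: [0.141, 0.089, 0.062, 0.047, 0.037, 0.030, 0.026],
    7: [0.124, 0.072, 0.047, 0.033, 0.025, 0.019, 0.015],
    8: [0.113, 0.061, 0.038, 0.025, 0.018, 0.013, 0.010],
    9: [0.105, 0.054, 0.031, 0.020, 0.013, 0.009, 0.007],
    10: [0.099, 0.048, 0.026, 0.016, 0.010, 0.007, 0.005],
}


def smallest_design(max_fnr):
    """Cheapest (control, experimental) design whose tabulated FNR is <= max_fnr.

    Ranks designs by total samples, then by imbalance, then by control size.
    Returns None if no tabulated design qualifies.
    """
    best_key, best_design = None, None
    for n_ctrl, rates in FNR.items():
        for n_exp, rate in zip(EXPERIMENTAL_SIZES, rates):
            if rate <= max_fnr:
                key = (n_ctrl + n_exp, abs(n_ctrl - n_exp), n_ctrl)
                if best_key is None or key < best_key:
                    best_key, best_design = key, (n_ctrl, n_exp)
    return best_design
```

With these values, a false negative rate of at most 0.2 is first reached with 4 control and 5 experimental samples, and a rate of at most 0.05 with 6 control and 7 experimental samples.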
Required parameters
| Parameter | Notation |
|---|---|
| Number of samples under each condition | |
| Number of cells in each sample | |
| Mean and standard deviation of proportions for each beta distribution | |
| False positive (type I error) rate | |
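The required parameters map naturally onto a small validated input record, mirroring the sample input in Fig. 1g. A sketch; the class and field names are hypothetical, and the check that sd^2 < mean (1 - mean) reflects the condition for a beta distribution with that mean and standard deviation to exist.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StudyDesign:
    """Hypothetical container for the required parameters of a two-group design."""
    n_control: int        # number of samples under the control condition
    n_case: int           # number of samples under the case condition
    n_cells: int          # number of cells profiled in each sample
    mean_control: float   # mean cell-type proportion, control group
    sd_control: float     # standard deviation of that proportion, control group
    mean_case: float      # mean cell-type proportion, case group
    sd_case: float        # standard deviation of that proportion, case group
    alpha: float = 0.05   # false positive (type I error) rate of the t-test

    def __post_init__(self):
        if min(self.n_control, self.n_case) < 2:
            raise ValueError("each group needs at least 2 samples for a t-test")
        if self.n_cells < 1:
            raise ValueError("n_cells must be positive")
        if not 0 < self.alpha < 1:
            raise ValueError("alpha must lie in (0, 1)")
        for mean, sd in ((self.mean_control, self.sd_control),
                         (self.mean_case, self.sd_case)):
            if not 0 < mean < 1:
                raise ValueError("mean proportions must lie in (0, 1)")
            if not 0 < sd**2 < mean * (1 - mean):
                raise ValueError("sd must satisfy 0 < sd**2 < mean*(1-mean)")
```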