Alexander Davis, Ruli Gao, Nicholas E Navin.
Abstract
BACKGROUND: In single-cell DNA and RNA sequencing experiments, the number of cells to sequence must be decided before the experiment is run, and afterwards it is necessary to decide whether enough cells were sampled. Both questions can be addressed by calculating the probability of sampling at least a defined number of cells from each subpopulation (cell type or cancer clone).
Keywords: Multinomial distributions; Sample size; Single cell sequencing
Year: 2019 PMID: 31718533 PMCID: PMC6852764 DOI: 10.1186/s12859-019-3167-9
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Package functions for pmultinom. This table lists the functions in the R package “pmultinom” for calculating multinomial probabilities.
| Function | Arguments | Description |
|---|---|---|
| pmultinom | lower, upper, size, probs, method | Probability that a multinomial random vector is elementwise greater than “lower” and elementwise less than or equal to “upper”. “size” and “probs” specify the parameters of the multinomial distribution. Either “lower” or “upper” may be left unspecified. |
| invert.pmultinom | lower, upper, probs, target.prob, method | Returns the “size” parameter required for pmultinom to reach the target probability “target.prob”. |
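The two calculations in the table can be illustrated with a minimal Python sketch. This is not the pmultinom implementation: the function names `prob_all_at_least` and `cells_required`, the `min_cells` and `max_size` parameters, and the sequential-binomial method are all assumptions made for illustration, and only lower bounds are handled. Note that, per the table, pmultinom's “lower” is a strict bound, so "at least k cells" would correspond to `lower = k - 1` there, whereas `min_cells` below is inclusive.

```python
from collections import defaultdict
from math import comb


def prob_all_at_least(size, probs, min_cells):
    """P(X_i >= min_cells[i] for every i), X ~ Multinomial(size, probs).

    If probs sums to less than 1, the leftover mass is treated as an
    unconstrained "other" category. Intended for small sizes (about 1000
    cells or fewer) so that the binomial coefficients fit in a float.
    """
    # state[r] = probability that r cells are still unassigned and every
    # subpopulation processed so far has met its minimum.
    state = {size: 1.0}
    mass_left = 1.0
    for p, k in zip(probs, min_cells):
        # Given the earlier counts, this count is Binomial(r, p / mass_left).
        q = p / mass_left
        if q > 1.0 - 1e-9:  # guard against rounding in the last category
            q = 1.0
        nxt = defaultdict(float)
        for r, weight in state.items():
            for c in range(k, r + 1):
                nxt[r - c] += weight * comb(r, c) * q**c * (1.0 - q) ** (r - c)
        state = nxt
        mass_left -= p
    return sum(state.values())


def cells_required(probs, min_cells, target_prob=0.95, max_size=1000):
    """Smallest size whose probability reaches target_prob (linear search)."""
    # With lower bounds only, the probability never decreases as size grows,
    # so the first size that reaches the target is the minimum.
    for size in range(sum(min_cells), max_size + 1):
        if prob_all_at_least(size, probs, min_cells) >= target_prob:
            return size
    raise ValueError("target probability not reached within max_size")
```

For example, `cells_required([0.5, 0.5], [1, 1])` asks how many cells are needed to see at least one cell from each of two equally frequent subpopulations with 95% probability.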
Fig. 1. SCOPIT interface. a. Interface for prospective calculations. Orange lines identify the number of cells required and the target probability of detecting a specified number of each subpopulation. b. Interface for retrospective calculations. The number of cells which were sequenced is entered, and is marked on the plot with a dotted green line. In this example, the orange line is far to the left of the dotted green line, suggesting that more cells were sequenced than required to detect these three subpopulations. To quantify confidence in the results, a dotted black line is plotted that shows the lower end of a 95% credible interval for the probability. The plot title states the upper end of a 95% credible interval for the number of cells required.
Comparison of Independent Approximation and Exact Calculations.
| Subpopulation frequency | # of subpopulations | Cells required (exact) | Cells required (approx.) |
|---|---|---|---|
| 0.1 | 6 | 186 | 186 |
| 0.2 | 3 | 85 | 85 |
| 0.3 | 2 | 53 | 53 |
| 0.1 | 8 | 191 | 191 |
| 0.2 | 4 | 87 | 87 |
| 0.4 | 2 | 39 | 39 |
| 0.1 | 9 | 193 | 193 |
| 0.3 | 3 | 55 | 55 |
| 0.1 | 10 | 195 | 194 |
| 0.2 | 5 | 89 | 89 |
| 0.5 | 2 | 30 | 30 |
The number of cells required to achieve 95% certainty of sampling sufficiently many cells from each subpopulation. The number of cells was calculated in two ways: by an exact calculation, and by an approximate calculation in which the counts of different subpopulations were assumed to be independent.
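The independence approximation described in the note can be sketched in Python as follows. Under that assumption, each subpopulation's count is treated as its own binomial variable and the per-subpopulation tail probabilities are multiplied. The function names and the `min_cells` and `max_size` parameters are hypothetical, and the threshold for "sufficiently many cells" is not stated in this record, so it is left as an input rather than fixed to the value used for the table.

```python
from math import comb


def binom_tail(size, p, k):
    """P(X >= k) for X ~ Binomial(size, p)."""
    return sum(
        comb(size, c) * p**c * (1.0 - p) ** (size - c)
        for c in range(k, size + 1)
    )


def prob_independent(size, probs, min_cells):
    """Approximate P(every count >= its minimum), treating counts as independent."""
    prob = 1.0
    for p, k in zip(probs, min_cells):
        prob *= binom_tail(size, p, k)
    return prob


def cells_required_approx(probs, min_cells, target_prob=0.95, max_size=1000):
    """Smallest size at which the approximate probability reaches target_prob."""
    # Each binomial tail grows with size, so the first hit is the minimum.
    for size in range(max(min_cells), max_size + 1):
        if prob_independent(size, probs, min_cells) >= target_prob:
            return size
    raise ValueError("target probability not reached within max_size")
```

The approximation ignores the negative correlation between multinomial counts, which is consistent with the table showing it agreeing with the exact result in all but one row.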