| Literature DB >> 30902100 |
Aaron T L Lun1, Samantha Riesenfeld2, Tallulah Andrews3, The Phuong Dao4, Tomas Gomes3, John C Marioni5,6,7.
Abstract
Droplet-based single-cell RNA sequencing protocols have dramatically increased the throughput of single-cell transcriptomics studies. A key computational challenge when processing these data is to distinguish libraries for real cells from empty droplets. Here, we describe a new statistical method for calling cells from droplet-based data, based on detecting significant deviations from the expression profile of the ambient solution. Using simulations, we demonstrate that EmptyDrops has greater power than existing approaches while controlling the false discovery rate among detected cells. Our method also retains distinct cell types that would have been discarded by existing methods in several real data sets.Entities:
Keywords: Cell detection; Droplet-based protocols; Empty droplets; Single-cell transcriptomics
Mesh:
Substances:
Year: 2019 PMID: 30902100 PMCID: PMC6431044 DOI: 10.1186/s13059-019-1662-y
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1Cell-calling results from different algorithms in simulations based on the PBMC dataset. Simulation scenarios are labelled as G1/G2 where G1 and G2 are the number of barcodes in the group of large and small cells, respectively. a The recall for each method, defined as the proportion of detected cells from each group. EmptyDrops was used with an FDR threshold of 0.1%. b The observed FDR in the set of libraries detected by EmptyDrops at a range of nominal FDR thresholds (dotted lines), defined as the proportion of detected droplets that are empty. In both plots, each point represents the result of one simulation iteration, the bar represents the mean value across 10 iterations, and the error bars represent the standard error of the mean
Fig. 2Application of EmptyDrops and other cell detection methods to one sample of the placenta dataset. a A barcode rank plot showing the fitted spline used for knee point detection in EmptyDrops. The detected knee and inflection points are also shown. b The negative log-likelihood for each barcode in the multinomial model of EmptyDrops, plotted against the total count. Barcodes detected as putative cell-containing droplets at an FDR of 0.1% are labelled in red. Only barcodes with t>T are shown. c An UpSet plot [20] of the barcodes detected by each combination of methods (vertical bars). Horizontal bars represent the number of barcodes detected by each method. d Histogram outlines of the log-total count for barcodes detected by each method
Fig. 3t-SNE plots for the placenta dataset (a, b) or the 900 neuron dataset (c, d), constructed from barcodes that were detected with EmptyDrops and/or CellRanger. Each point represents a barcode and is colored based on a, c whether it was detected as a cell with each method; b the expression of monocyte marker genes KCNA5, CFP, STX11, and S100A12; or d the expression of interneuron marker genes Gad1, Gad2, and Sla6c1. Expression of the relevant marker set in each barcode was quantified as the sum of the normalized log-expression values across all marker genes. Arrows mark the putative monocyte and interneuron populations in each dataset