Stefan Niebler, André Müller, Thomas Hankeln, Bertil Schmidt.
Abstract
BACKGROUND: Obtaining data from single-cell transcriptomic sequencing allows for the investigation of cell-specific gene expression patterns, which could not be addressed a few years ago. With the advancement of droplet-based protocols, the number of studied cells continues to increase rapidly. This establishes the need for software tools that efficiently process the resulting large-scale datasets. We address this need by presenting RainDrop for fast gene-cell count matrix computation from single-cell RNA-seq data produced by 10x Genomics Chromium technology.
Keywords: Big data; Locality sensitive hashing; RNA; Single-cell sequencing
Year: 2020 PMID: 32611394 PMCID: PMC7329424 DOI: 10.1186/s12859-020-03593-4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 General workflow of RainDrop
Fig. 2 Pre-processing stage of RainDrop
Fig. 3 Mapping stage of RainDrop
Runtime for different datasets and 16 threads. Datasets are ordered by genome and size
| Dataset | neurons_900 | neuron_9k | t_4k | pbmc8k |
|---|---|---|---|---|
| # Reads | 52.8M | 383.4M | 335.2M | 784.1M |
| Filesize | 18.3GB | 145GB | 117GB | 273GB |
| RainDrop | 1m59s | 19m24s | 14m50s | 36m10s |
| Alevin | 6m55s | 32m15s | 33m31s | 84m36s |
| Cell Ranger | 60m23s | 350m26s | 291m21s | 804m17s |
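The runtimes above are easier to compare as ratios. The following is a minimal sketch, not part of the original record, that parses the `XmYs` durations and expresses each tool's runtime as a multiple of RainDrop's for the pbmc8k column. The helper names `to_seconds` and `relative_runtime` are hypothetical.

```python
import re


def to_seconds(text):
    """Convert a duration such as '19m24s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)s", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = match.groups()
    return 60 * int(minutes) + int(seconds)


def relative_runtime(runtimes, baseline="RainDrop"):
    """Return each tool's runtime divided by the baseline tool's runtime."""
    base = to_seconds(runtimes[baseline])
    return {tool: to_seconds(t) / base for tool, t in runtimes.items()}


# Runtimes for the pbmc8k dataset (16 threads), copied from the table above.
runtimes = {"RainDrop": "36m10s", "Alevin": "84m36s", "Cell Ranger": "804m17s"}

ratios = relative_runtime(runtimes)
for tool, ratio in ratios.items():
    print(f"{tool}: {ratio:.1f}x the RainDrop runtime")
```

The same function can be applied to the other columns by swapping in their runtimes.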
Runtimes (speedup) of RainDrop for different numbers of CPU threads on the neurons_900 and neuron_9k datasets
| Method | RainDrop | RainDrop |
|---|---|---|
| Dataset | neurons_900 | neuron_9k |
| 1 Thread | 18m42s (1) | 189m15s (1) |
| 2 Threads | 10m58s (1.67) | 128m12s (1.48) |
| 4 Threads | 5m42s (3.21) | 59m34s (3.18) |
| 8 Threads | 3m16s (5.62) | 31m55s (5.94) |
| 16 Threads | 1m59s (9.25) | 19m24s (9.76) |
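One way to read the scaling table is through parallel efficiency (speedup divided by thread count) and the Karp-Flatt metric, which estimates the serial fraction of a program from a measured speedup. The sketch below is an illustration added here, not an analysis from the original record; it uses the speedups reported for the 9k-cell neuron dataset, and the function name `karp_flatt` is hypothetical.

```python
def karp_flatt(speedup, threads):
    """Experimentally determined serial fraction (Karp-Flatt metric)."""
    if threads < 2:
        raise ValueError("the metric is only defined for two or more threads")
    return (1.0 / speedup - 1.0 / threads) / (1.0 - 1.0 / threads)


# (threads, reported speedup) for the 9k-cell neuron dataset, from the table above.
reported = [(2, 1.48), (4, 3.18), (8, 5.94), (16, 9.76)]

results = []
for threads, speedup in reported:
    efficiency = speedup / threads          # 1.0 would be perfect scaling
    serial = karp_flatt(speedup, threads)   # estimated non-parallel fraction
    results.append((threads, efficiency, serial))
    print(f"{threads:>2} threads: efficiency {efficiency:.2f}, "
          f"serial fraction {serial:.3f}")
```

A serial fraction that stays roughly constant as threads increase would suggest the limit is inherent sequential work; one that grows would point to parallel overhead instead.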
Runtime and memory consumption for database/index creation
| Method | RainDrop | RainDrop | Alevin | Alevin |
|---|---|---|---|---|
| Dataset | mm10-3.0.0 | GRCh38-3.0.0 | mm10-3.0.0 | GRCh38-3.0.0 |
| Time | 0m54s | 1m03s | 2m55s | 3m35s |
| Size | 1.3GB | 1.5GB | 2.6GB | 3.2GB |
Cell-agreement matrix (C) of each method compared to RainDrop for different datasets. Datasets are ordered by genome and size. Columns of the matrices show the average value of genes present (left) or absent (right) in cells calculated by RainDrop. Rows show the average gene count present (top) or absent (bottom) in cells calculated by CellRanger (Alevin)
| Method | neurons_900 | neuron_9k | t_4k | pbmc8k |
|---|---|---|---|---|
| Alevin | | | | |
| CellRanger | | | | |
| Reference used | mm10-3.0.0 | mm10-3.0.0 | GRCh38-3.0.0 | GRCh38-3.0.0 |
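The caption describes a 2x2 matrix per comparison: columns split genes into present or absent according to RainDrop, rows according to the other method, averaged over cells. The exact definition of C is not given in this record, so the following is a minimal sketch under the assumption that a gene counts as present in a cell when its count is greater than zero, and that each cell contributes the fraction of genes falling into each of the four categories. The function name `cell_agreement` and the toy counts are hypothetical.

```python
def cell_agreement(raindrop, other):
    """Average 2x2 presence/absence matrix over cells.

    raindrop, other: per-cell lists of gene counts with identical shape.
    Result layout follows the caption: rows = other method (present, absent),
    columns = RainDrop (present, absent).
    """
    n_cells = len(raindrop)
    total = [[0.0, 0.0], [0.0, 0.0]]
    for rd_cell, ot_cell in zip(raindrop, other):
        n_genes = len(rd_cell)
        for rd_count, ot_count in zip(rd_cell, ot_cell):
            row = 0 if ot_count > 0 else 1
            col = 0 if rd_count > 0 else 1
            total[row][col] += 1.0 / n_genes
    return [[value / n_cells for value in row] for row in total]


# Toy data (made up for illustration): 2 cells x 4 genes.
raindrop_counts = [[3, 0, 1, 0], [0, 2, 0, 0]]
other_counts = [[2, 0, 0, 0], [0, 1, 1, 0]]

matrix = cell_agreement(raindrop_counts, other_counts)
# Assumed summary: the diagonal holds the cases where both methods agree.
agreement = matrix[0][0] + matrix[1][1]
print(matrix, agreement)
```

Swapping the roles of cells and genes (iterating over genes and averaging over cells within each gene) gives the analogous gene-wise matrix.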
Gene-agreement matrix (G) of each method compared to RainDrop for different datasets. Datasets are ordered by genome and size. Columns of the matrices show the average value of cells containing (left) or not containing (right) a gene calculated by RainDrop. Rows show the average count of cells containing (top) or not containing (bottom) a gene calculated by CellRanger (Alevin)
| Method | neurons_900 | neuron_9k | t_4k | pbmc8k |
|---|---|---|---|---|
| Alevin | | | | |
| CellRanger | | | | |
| Reference used | mm10-3.0.0 | mm10-3.0.0 | GRCh38-3.0.0 | GRCh38-3.0.0 |
Mean ± standard deviation of Cell-agreement (rows 1 and 2) and Gene-agreement (rows 3 and 4) of each method compared to RainDrop for different datasets. Datasets are ordered by genome and size
| Method | neurons_900 | neuron_9k | t_4k | pbmc8k |
|---|---|---|---|---|
| Cell-agreement, Alevin | 0.97±0.010 | 0.97±0.010 | 0.96±0.013 | 0.95±0.012 |
| Cell-agreement, CellRanger | 0.98±0.006 | 0.98±0.005 | 0.99±0.005 | 0.99±0.003 |
| Gene-agreement, Alevin | 0.95±0.19 | 0.95±0.19 | 0.93±0.22 | 0.92±0.23 |
| Gene-agreement, CellRanger | 0.96±0.14 | 0.95±0.17 | 0.95±0.18 | 0.94±0.19 |
| Reference used | mm10-3.0.0 | mm10-3.0.0 | GRCh38-3.0.0 | GRCh38-3.0.0 |
Spearman correlation against bulk datasets
| Dataset | # Reads | RainDrop | Alevin | Cell Ranger |
|---|---|---|---|---|
| neurons_900 | 52.8M | 0.793 | 0.790 | 0.790 |
| neuron_9k | 383.4M | 0.834 | 0.831 | 0.820 |
| t_4k | 335.2M | 0.741 | 0.722 | 0.740 |
| pbmc8k | 784.1M | 0.815 | 0.799 | 0.811 |
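A Spearman correlation against bulk data compares gene rankings rather than raw expression values. The sketch below illustrates the calculation under the assumption that single-cell counts are first summed over cells into a pseudo-bulk profile; this record does not state how the comparison was actually prepared. All numbers are made-up toy values, and the helper names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy gene-cell counts (3 cells x 4 genes) and a toy bulk expression profile.
cell_counts = [[5, 0, 2, 1], [3, 1, 0, 1], [4, 0, 1, 0]]
pseudo_bulk = [sum(column) for column in zip(*cell_counts)]  # sum over cells
bulk = [900.0, 40.0, 35.0, 120.0]

rho = spearman(pseudo_bulk, bulk)
print(f"Spearman correlation: {rho:.3f}")
```

Because only ranks enter the calculation, the result is unaffected by differences in scale between single-cell counts and bulk expression units.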
Fig. 4 Mean Cell-Agreement for different thresholds and window sizes
Fig. 5 Spearman Correlation of RainDrop (dataset: neurons_900) against bulk-dataset (SRR3532922) for different thresholds and window sizes