Literature DB >> 32513247

CB2 improves power of cell detection in droplet-based single-cell RNA sequencing data.

Zijian Ni¹, Shuyang Chen¹, Jared Brown¹, Christina Kendziorski².

Abstract

An important challenge in pre-processing data from droplet-based single-cell RNA sequencing protocols is distinguishing barcodes associated with real cells from those binding background reads. Existing methods test barcodes individually and consequently do not leverage the strong cell-to-cell correlation present in most datasets. To improve cell detection, we introduce CB2, a cluster-based approach for distinguishing real cells from background barcodes. As demonstrated in simulated and case study datasets, CB2 has increased power for identifying real cells which allows for the identification of novel subpopulations and improves the precision of downstream analyses.

Entities: CellLine Chemical Disease Gene Species

Keywords: Cell detection; Droplet-based protocols; Single-cell RNA-seq

Mesh：

Year: 2020 PMID： 32513247 PMCID： PMC7278076 DOI： 10.1186/s13059-020-02054-8

Source DB: PubMed Journal: Genome Biol ISSN： 1474-7596 Impact factor: 13.583

Background

Droplet-based single-cell RNA sequencing (scRNA-seq) [1] is a powerful and widely used approach for profiling genome-wide gene expression in individual cells. Current commercial droplet-based technologies utilize gel beads [2], each containing oligonucleotide indexes made up of bead-specific barcodes combined with unique molecular identifiers (UMIs) [3] and oligo-dT tags to prime polyadenylated RNA. Single cells of interest are combined with reagents in one channel of a microfluidic chip, and gel beads in another, to form gel beads in emulsion, or GEMs. Oligonucleotide indexes bind polyadenylated RNA within each GEM reaction vesicle before gel beads are dissolved releasing the bound oligos into solution for reverse transcription. By design, each resulting cDNA molecule contains a UMI and a GEM-specific barcode. Indexed cDNA is pooled for PCR amplification and sequencing resulting in a data matrix of UMI counts for each barcode (Additional file 1: Figure S1). Ideally, each barcode will tag mRNA from an individual cell, but this is often not the case in practice. In most datasets, more than 90% of GEMs do not contain viable cells, but rather contain ambient RNA excreted by cells in solution or as a product of cell lysis [2]. As a result, an important challenge in pre-processing droplet-based scRNA-seq data is distinguishing those barcodes corresponding to real cells from those binding ambient, or background, RNA. Early methods to address this challenge defined real cells as those barcodes with total read counts exceeding some threshold [1, 2]. Such methods are suboptimal as they discard small cells as well as those expressing relatively few genes. To address this, Lun et al. [4] developed EmptyDrops (ED), an approach to identify individual barcodes with distributions varying from a background distribution. Similar to previous approaches [1, 2], ED identifies an upper threshold and defines real cells as those barcodes with counts above the threshold. As a second step, ED uses all barcodes with counts below a lower threshold to estimate a background distribution of ambient RNA against which remaining barcodes are tested. Those having expression profiles significantly different from the background distribution are deemed real cells. The ED approach is current state-of-the-art in the field. However, given that ED performs tests for each barcode individually, it does not leverage the strong correlation observed between cells and, consequently, compromises power for identifying cells in many datasets. To increase the power for identifying real cells, we propose CB2, a cluster-based approach for distinguishing real cells from background barcodes in droplet-based scRNA-seq experiments. CB2 extends the ED framework by introducing a clustering step that groups similar barcodes, then conducts a statistical test to identify groups with expression distributions that vary from the background (Fig. 1, Additional file 1: Figure S2). CB2 is implemented in the R package scCB2.

Fig. 1

Overview of CB2. a Projection of a hypothetical cell population containing three subpopulations (red, green, and blue where intensity corresponds to read depth). CB2 takes as input a gene by barcode matrix of UMI counts and returns a gene by cell matrix. b High-count barcodes with counts above a pre-specified upper threshold are considered real cells; barcodes with counts below a lower threshold are used to estimate a background distribution (Additional file 1: Figure S2). The remaining barcodes are clustered, and tight clusters are tested as a group against the estimated background distribution; barcodes not in tight clusters are tested individually (not shown). High-count barcodes and those identified by CB2 are retained for downstream analysis

Results

CB2 was evaluated and compared with ED on simulated and case study data. In SIM IA, counts are generated as in Lun et al. [4]. Briefly, given an input dataset, an inflection point dividing low from high-count barcodes is determined. Low count barcodes are pooled to estimate the background distribution. Background barcodes are sampled from this distribution to match the total number and size of barcodes below the inflection point in the input dataset. Six thousand real cells are then generated as follows. First, 2000 barcodes are randomly sampled from the high-count barcodes (referred to as G1 cells [4]); a second set of 2000 high-count barcodes is sampled and then downsampled by 90% to give G2 cells; the third set (G1.5) is obtained by sampling 2000 barcodes from the high-count range and downsampling by 50%. We note that in Lun et al. [4], only G1 and G2 cells were considered. Here, G1.5 cells were added to better reflect real data. Additional file 1: Figure S3 shows increased power of CB2 with well controlled false discovery rate (FDR) for the 6 datasets considered in Lun et al. [4] as well as 4 additional datasets. SIM IB, also considered by Lun et al. [4], is similar to SIM IA, but in SIM IB 10% of the genes in the real cells are shuffled making the real cells more different from the background and therefore easier to identify (Additional file 1: Figure S4). Additional file 1: Figure S5 shows the increased power of CB2 is maintained. To further evaluate CB2, we applied CB2 and ED to the ten case study datasets used to generate the simulated data as well as one additional dataset considered in the ED case study and compared the number of cells identified in common as well as those uniquely identified by each approach. Additional file 2: Table S1 shows that CB2 finds 24% more cells on average (range 4–81%). Of the extra cells identified, 88% on average (range 44–100%) add to existing subpopulations. The remaining 12% (range 0–56%) make up novel subpopulations. As an example, Fig. 2 and Additional file 1: Figure S6 show results from the Alzheimer data [5] where CB2 identifies 18% more cells. A detailed look at the unique CB2 identifications suggests that the extra cells identified are not false positives, but rather they add to existing excitatory neuron and inhibitory neuron subpopulations, and also reveal a novel subpopulation consisting of 209 cells. Specifically, Fig. 2 b and c show distribution plots and an expression heatmap of the 100 genes having the highest average expression in Subpop1 (the largest subpopulation) for cells identified by both CB2 and ED as well as those identified uniquely by CB2. As shown, cells uniquely identified by CB2 have a distribution similar to other cells, and they differ from the background. Using the marker genes from Mathys et al. [5], Fig. 2d and Additional file 1: Figure S6(b) suggest that cells identified uniquely by CB2 in Subpops 1–4 are neurons, as they show relatively high expression of neuron marker genes SYT1, SNAP25, and GRIN1. More specifically, the CB2 cells in Subpops 1–2 exhibit high expression of excitatory neuronal markers whereas the cells in Subpops 3–4 appear to be inhibitory neurons (Additional file 1: Figure S6(c) and (d)). The novel subpopulation (Subpop5) uniquely shows high expression of both oligodendrocyte and astrocyte marker genes, suggesting that this group may be mixed phenotype glial cells [6] (Additional file 1: Figure S6(e) and (f)).

Fig. 2

Results from the Alzheimer dataset. a t-SNE plot of cells identified by CB2 and ED. High-count barcodes exceeding an upper threshold are identified as real cells by both methods without a statistical test (dark pink); barcodes identified as cells by both methods following statistical test are shown in pink. Cells identified uniquely by CB2 (yellow) and ED (black) are also shown. CB2 identifies an increased number of cells in existing subpopulations (Subpop1–Subpop4) and also identifies a novel subpopulation (Subpop5). b Distribution plots of the 100 genes having highest average expression in Subpop1 are shown for cells identified by both CB2 and ED (upper) and identified uniquely by CB2 (middle). The estimated background distribution is also shown (lower). Cells uniquely identified by CB2 in Subpop1 have a distribution similar to other Subpop1 cells and differ from the background. c Heatmap of log transformed raw UMI counts for the same 100 genes for barcodes identified by CB2 and ED (left) and barcodes uniquely identified by CB2 (right). d t-SNE plots of cells colored by neuron marker genes SYT1, SNAP25, and GRIN1 in all cells (upper) and those identified uniquely by CB2 (lower) By increasing the number of real cells identified, CB2 also improves the power to differentiate Alzheimer’s patients from controls. Specifically, Mathys et al. [5] profiled expression from the prefrontal cortex of 24 AD-pathology patients as well as 24 age-matched controls, and they validated differentially expressed genes in different cell types, including 9 genes in excitatory neurons and 9 in inhibitory neurons. Additional file 1: Figure S7 shows that by identifying additional cells, CB2 improves downstream differential expression analysis by resulting in more significant p values and stronger fold changes. In a second case study (PBMC8K), CB2 increases the number of cells identified across six subpopulations by over 80% (Additional file 2: Table S1). Results are shown in Fig. 3 and Additional file 1: Figure S8. Similar to the Alzheimer’s data analysis, Additional file 1: Figure S8(b) and (c) show that cells identified uniquely by CB2 in Subpop1 have an expression profile that is similar to other cells and differs from the background. Figure 3 provides a detailed look at marker gene expression for the well-characterized PBMC8K cells using markers considered in Zheng et al. [2]. As shown in Fig. 3b, CB2 identifies additional CD14+ monocytes, T cells, B cells, and megakaryocytes. Results from two additional datasets are shown in Additional file 1: Figure S9–S10.

Fig. 3

Results from the PBMC8K dataset. a t-SNE plot of cells identified by CB2 and ED. High-count barcodes exceeding an upper threshold are identified as real cells by both methods without a statistical test (dark pink); barcodes identified as cells by both methods following statistical test are shown in pink. Cells identified uniquely by CB2 (yellow) and ED (black) are also shown. CB2 increases the number of cells identified across the six subpopulations by over 80% (Additional file 2: Table S1). b Subpopulations 1–5 ordered by median normalized UMI count along with marker gene expression for each subpopulation. Marker gene expression in cells uniquely identified by CB2 is similar to that in other groups, and differs from the background. Subpopulation 5 contained no high-count common cells; subpopulation 6 contained no unique CB2 identifications and is therefore not shown in panel b

Discussion

Taken together, the results presented here demonstrate that CB2 provides a powerful approach for distinguishing real cells from background barcodes which will increase the number of cells identified in existing cell subpopulations in most datasets and may facilitate the identification of novel subpopulations. While advantages are expected in many settings, users will benefit from the following considerations. CB2 does not test for doublets or multiplets, and consequently, some of the high-count identifications may consist of two or more cells. Methods for identifying multiplets such as Scrublet [7], DoubletDecon [8], or DoubletFinder [9] may prove useful after applying CB2. A second important post-processing step is filtering based on mitochondrial expression. As noted in Lun et al. [4], any method for distinguishing cells from background barcodes is technically correct in identifying low-quality cells given that damaged cells exhibit expression profiles that differ from the background. Specifically, mitochondrial gene expression is often high in damaged cells; an example is shown in Subpopulation 5 of the PBMC8K data (Fig. 3b). Such cells are typically not of interest in downstream analysis and should therefore be removed. The GetCellMat function in R/scCB2 may be used toward this end.

Conclusions

Droplet-based scRNA-seq technologies provide unprecedented opportunity to address biological questions, but efficient pre-processing is required to maximize the information obtained in an experiment. CB2 allows investigators to maximize the number of cells retained and consequently to increase the power and precision of downstream analysis.

Methods

Versions

For cell identification with R/scCB2 0.99.12 and R/DropletUtils 1.5.4 [4, 10], the latest version of R [11] was used: 3.7-devel (2019-07-17 r76847). Other packages are not yet compatible or not stable with the R developers version, and so for scran 1.12.1 [12], Seurat 3.1.0 [13, 14], and ggplot2 3.2.1 [15], R 3.6.0 (2019-04-24 r76423) was used.

CB2

As CB2 relies on ED, we briefly review the ED approach before detailing the clustering test introduced in CB2. ED expects as input a G × B feature-by-barcode matrix with G features (for simplicity, we refer to features as genes) and B barcodes. Barcodes having zero counts for all genes are filtered out, and the remaining barcodes are divided into three groups based on the sum of gene expression (UMI) counts within a barcode. The background group, B0, contains all barcodes with counts less than or equal to a pre-defined lower threshold (defaults to 100); the high-count barcodes, B2, contain barcodes with counts exceeding an upper threshold (defaults to knee point); the remaining barcodes (B1) are tested (Additional file 1: Figure S2). ED assumes that counts from a background barcode are distributed as Dirichlet-Multinomial with probability vector estimated by averaging the counts in B0 and applying the Good-Turing algorithm [16] to ensure that all probabilities are non-zero, denoted as . For a barcode b ∈ B1, ED tests against the alternative using the log-likelihood under as the test statistic. A Monte-Carlo p value is calculated via simulating Dirichlet-Multinomial barcodes of size ∣b∣ under and calculating the proportion of simulated barcodes having a test statistic more extreme than (or equal to) b’s. The false discovery rate is controlled using the Benjamini-Hochberg procedure [17]. CB2 follows ED by filtering out genes with zero counts and dividing the remaining barcodes into three groups. However, instead of testing all barcodes from B1 individually, CB2 first clusters barcodes and then tests tight clusters to identify those that differ from the background. As in methods for genome-wide association studies (Mieth et al. 2016 [18]), gene co-expression network analysis (Botía et al., 2017 [19]), and de novo transcriptome analysis (Malik et al., 2018 [20]), clustering prior to testing increases power by reducing the total number of tests and increasing the signal to noise ratio. CB2 proceeds as follows: Barcodes grouped by size. CB2 orders barcodes in B1 by total counts where X denotes the count vector of barcode b, ∣X∣ denotes the total UMI count of barcode b, and |B1| denotes the number of barcodes in B1. Groups of size S (defaults to 1000 in R/scCB2) are constructed consisting of barcodes ranging in size from smallest to largest: where is rounded up if not an integer. If , barcodes in B1 are merged with those in B1(. Sorting barcodes by size reduces bias in the clustering and testing steps that follow. Barcodes clustered within group: Barcodes within each group B1 are clustered using hierarchical clustering with pairwise Pearson correlation as the similarity metric. A cluster is considered tight if the average within-cluster pairwise Pearson correlation exceeds a data-driven threshold. Tight clusters are retained for further analysis as described in step 3, below. To determine thresholds, ten tight clusters of varying size are simulated by generating 100 samples from a multinomial distribution with parameters (N, p) where N ranges from 100 to 1000 in increments of size 100. This range is chosen as we found little variation in thresholds for barcode sizes exceeding 1000; p is set to either or , whichever has larger Shannon entropy [21] as the distribution with larger entropy is less affected by outliers. For each simulated cluster C, the threshold κ is defined by its average pairwise Pearson correlation. A cluster is considered tight if the average within-cluster pairwise Pearson correlation exceeds κ for the simulated cluster of closest size. Tight clusters tested: For each tight cluster C, we conduct a Monte-Carlo test to assess dissimilarity from the background. Pairwise Pearson correlations are calculated between every barcode in C and ; the test statistic for cluster C, T, is defined to be the median of these correlations. Similar to ED, to simulate background barcodes, we sample barcodes from a multinomial (N; ) where N is the size of the barcode giving T. The Monte-Carlo p value is: where is the Pearson correlation between and (M defaults to 1000 in R/scCB2). Monte-Carlo p values are calculated for each cluster followed by Benjamini-Hochberg [17] to control the FDR. All barcodes within a significant cluster are identified as real cells. Individual barcodes tested: Barcodes that were not included in a tight cluster in Step 2 as well as those in a tight cluster that were not found to be significant in Step 3 are tested individually using ED. It is important to note that some of the barcodes identified in this step do not overlap with identifications made when ED is applied to the full set of barcodes given differences in the rates of real cells to background barcodes and differences in error rate control.

Simulations

Counts are generated as in Lun et al. [4]. As detailed there, each simulation requires an input dataset. We constructed simulations from 10 datasets: Alzheimer [5], PBMC8K, PBMC33K, mbrain1K, mbrain9K, PanT4K, MALT, PBMC4K, jurkat, and T293 (Additional file 2: Table S2). For each input dataset, the inflection point of the UMI count by sorted barcode plot is used to divide lower count from higher count barcodes. The barcodes in the lower count range are considered background. In SIM IA, two sets of 2000 barcodes randomly sampled from the higher count range are considered real cells. The first set of 2000 is referred to as large ( G1) cells; the second set is downsampled by 90% to give small (G2) cells. We added a third set of medium (G1.5) cells by sampling 2000 cells from the higher count range and downsampling by 50%. The process for simulating data in SIM IB is identical to SIM IA except that in SIM IB, 10% of the genes in each simulated real cell are shuffled making the real cells more different from the background barcodes and, consequently, making real cells easier to identify. SIM IA is a more realistic simulation (Additional file 1: Figure S4).

Case studies

We evaluated the 10 datasets used in the simulation and also the placenta data evaluated in Lun et al. [4]. These datasets vary in sequencing depth as well as in the extent of differences between the real cell and background distributions (Additional file 1: Figure S4). CB2 and ED were applied to each dataset using default settings. For plots that compare identifications between CB2 and ED, cells identified by either approach (or both) were combined and UMI counts were normalized via scran. The Seurat pipeline was used to generate t-SNE plots from the top 4000 most highly variable genes and top 50 principal components. Expression heatmaps show log transformed raw UMI counts. For heatmaps and distribution plots, mitochondrial and ribosomal genes were removed.

Differential expression analysis in Alzheimer data

Cells identified by CB2, ED, or both were combined into a single matrix and filtered similar to Mathys et al. [5]. Specifically, cells with mitochondrial gene expression making up 40% or more of the total UMI counts were removed; genes detected in fewer than two cells were also excluded giving a matrix of 28,208 genes and 74,579 barcodes. Normalization was performed using scran. Cell types were annotated using marker genes as in Mathys et al. [5] Differential expression (DE) tests between cells from Alzheimer’s cases and controls were conducted using Wilcoxon rank-sum tests as in Mathys et al. [5]. Results were compared for known DE genes extracted from Mathys et al. [5].

Implementation of CB2 and ED

For all simulation and case study analyses, CB2 and ED were implemented using default parameters. A target FDR was set at 1%.

Existing subpopulations vs. novel subpopulations

The FindNeighbors and FindClusters functions in Seurat were used with default settings to assign each cell to a cluster, referred to here as a subpopulation. For each subpopulation, we calculated the percentage of cells identified by both CB2 and ED as well as those identified uniquely by CB2. Subpopulations for which over 80% of the cells are uniquely identified by CB2 are referred to as novel subpopulations (Additional file 2: Table S3 shows the number of novel subpopulations identified using 70%, 80%, or 90% as thresholds). Additional file 1: Figure S1. Overview of droplet-based single-cell RNA-seq protocol. Figure S2. Visualization of barcode groups in the UMI counts v.s. barcodes plot. Figure S3. Simulation results under setting SIM IA. Figure S4. Comparison between two simulation settings. Figure S5. Simulation results under setting SIM IB. Figure S6. Additional analysis of Alzheimer dataset. Figure S7. Differential Expression analysis between Alzheimer’s disease (AD) cases and controls. Figure S8. Analysis of PBMC8K dataset. Figure S9. Analysis of mbrain1K dataset. Figure S10. Analysis of placenta dataset. Additional file 2: Table S1. The number of cells identified by CB2, ED, or both in 11 case study datasets. Table S2. Links to all datasets used in this study. Table S3. Number of novel subpopulations identified by CB2 in each dataset. Additional file 3. Review history.

18 in total

1. Comprehensive Integration of Single-Cell Data.

Authors: Tim Stuart; Andrew Butler; Paul Hoffman; Christoph Hafemeister; Efthymia Papalexi; William M Mauck; Yuhan Hao; Marlon Stoeckius; Peter Smibert; Rahul Satija
Journal: Cell Date: 2019-06-06 Impact factor: 41.582

Review 2. Orchestrating high-throughput genomic analysis with Bioconductor.

Authors: Wolfgang Huber; Vincent J Carey; Robert Gentleman; Simon Anders; Marc Carlson; Benilton S Carvalho; Hector Corrada Bravo; Sean Davis; Laurent Gatto; Thomas Girke; Raphael Gottardo; Florian Hahne; Kasper D Hansen; Rafael A Irizarry; Michael Lawrence; Michael I Love; James MacDonald; Valerie Obenchain; Andrzej K Oleś; Hervé Pagès; Alejandro Reyes; Paul Shannon; Gordon K Smyth; Dan Tenenbaum; Levi Waldron; Martin Morgan
Journal: Nat Methods Date: 2015-02 Impact factor: 28.547

3. GFAP-positive and myelin marker-positive glia in normal and pathologic environments.

Authors: C A Dyer; A Kendler; D Jean-Guillaume; R Awatramani; A Lee; L M Mason; J Kamholz
Journal: J Neurosci Res Date: 2000-05-01 Impact factor: 4.164

4. A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor.

Authors: Aaron T L Lun; Davis J McCarthy; John C Marioni
Journal: F1000Res Date: 2016-08-31

5. Scrublet: Computational Identification of Cell Doublets in Single-Cell Transcriptomic Data.

Authors: Samuel L Wolock; Romain Lopez; Allon M Klein
Journal: Cell Syst Date: 2019-04-03 Impact factor: 10.304

6. DoubletFinder: Doublet Detection in Single-Cell RNA Sequencing Data Using Artificial Nearest Neighbors.

Authors: Christopher S McGinnis; Lyndsay M Murrow; Zev J Gartner
Journal: Cell Syst Date: 2019-04-03 Impact factor: 10.304

7. Integrating single-cell transcriptomic data across different conditions, technologies, and species.

Authors: Andrew Butler; Paul Hoffman; Peter Smibert; Efthymia Papalexi; Rahul Satija
Journal: Nat Biotechnol Date: 2018-04-02 Impact factor: 54.908

8. Single-cell transcriptomic analysis of Alzheimer's disease.

Authors: Hansruedi Mathys; Jose Davila-Velderrain; Zhuyu Peng; Fan Gao; Shahin Mohammadi; Jennie Z Young; Madhvi Menon; Liang He; Fatema Abdurrob; Xueqiao Jiang; Anthony J Martorell; Richard M Ransohoff; Brian P Hafler; David A Bennett; Manolis Kellis; Li-Huei Tsai
Journal: Nature Date: 2019-05-01 Impact factor: 49.962

9. EmptyDrops: distinguishing cells from empty droplets in droplet-based single-cell RNA sequencing data.

Authors: Aaron T L Lun; Samantha Riesenfeld; Tallulah Andrews; The Phuong Dao; Tomas Gomes; John C Marioni
Journal: Genome Biol Date: 2019-03-22 Impact factor: 13.583

10. Detection and removal of barcode swapping in single-cell RNA-seq data.

Authors: Jonathan A Griffiths; Arianne C Richard; Karsten Bach; Aaron T L Lun; John C Marioni
Journal: Nat Commun Date: 2018-07-10 Impact factor: 14.919

4 in total

1. Signatures of plasticity, metastasis, and immunosuppression in an atlas of human small cell lung cancer.

Authors: Joseph M Chan; Álvaro Quintanal-Villalonga; Vianne Ran Gao; Yubin Xie; Viola Allaj; Ojasvi Chaudhary; Ignas Masilionis; Jacklynn Egger; Andrew Chow; Thomas Walle; Marissa Mattar; Dig V K Yarlagadda; James L Wang; Fathema Uddin; Michael Offin; Metamia Ciampricotti; Besnik Qeriqi; Amber Bahr; Elisa de Stanchina; Umesh K Bhanot; W Victoria Lai; Matthew J Bott; David R Jones; Arvin Ruiz; Marina K Baine; Yanyun Li; Natasha Rekhtman; John T Poirier; Tal Nawy; Triparna Sen; Linas Mazutis; Travis J Hollmann; Dana Pe'er; Charles M Rudin
Journal: Cancer Cell Date: 2021-10-14 Impact factor: 31.743

2. Single-cell transcriptomics sheds light on the identity and metabolism of developing leaf cells.

Authors: Rubén Tenorio Berrío; Kevin Verstaen; Niels Vandamme; Julie Pevernagie; Ignacio Achon; Julie Van Duyse; Gert Van Isterdael; Yvan Saeys; Lieven De Veylder; Dirk Inzé; Marieke Dubois
Journal: Plant Physiol Date: 2022-02-04 Impact factor: 8.005

3. Comprehensive generation, visualization, and reporting of quality control metrics for single-cell RNA sequencing data.

Authors: Rui Hong; Yusuke Koga; Shruthi Bandyadka; Anastasia Leshchyk; Yichen Wang; Vidya Akavoor; Xinyun Cao; Irzam Sarfraz; Zhe Wang; Salam Alabdullatif; Frederick Jansen; Masanao Yajima; W Evan Johnson; Joshua D Campbell
Journal: Nat Commun Date: 2022-03-30 Impact factor: 14.919

4. Anti-bias training for (sc)RNA-seq: experimental and computational approaches to improve precision.

Authors: Philip Davies; Matt Jones; Juntai Liu; Daniel Hebenstreit
Journal: Brief Bioinform Date: 2021-11-05 Impact factor: 11.622

4 in total