Ramon Massoni-Badosa, Giovanni Iacono, Catia Moutinho, Marta Kulis, Núria Palau, Domenica Marchese, Javier Rodríguez-Ubreva, Esteban Ballestar, Gustavo Rodriguez-Esteban, Sara Marsal, Marta Aymerich, Dolors Colomer, Elias Campo, Antonio Julià, José Ignacio Martín-Subero, Holger Heyn.
Abstract
Robust protocols and automation now enable large-scale single-cell RNA and ATAC sequencing experiments and their application to biobank and clinical cohorts. However, technical biases introduced during sample acquisition can hinder solid, reproducible results, and systematic benchmarking is required before entering large-scale data production. Here, we report the existence and extent of gene expression and chromatin accessibility artifacts introduced during sampling and identify experimental and computational solutions for their prevention.
Keywords: Benchmarking; Biobank; CLL; Chronic lymphocytic leukemia; Cryopreservation; PBMC; Peripheral blood mononuclear cells; RNA sequencing; Sampling; Single-cell
Year: 2020 PMID: 32393363 PMCID: PMC7212672 DOI: 10.1186/s13059-020-02032-0
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1 The impact of sampling time on single-cell transcriptional and open chromatin profiles. a, b scRNA-seq-based tSNE or UMAP embeddings of 7378 PBMC (a, male donor) and 22,443 CLL cells (b, 3 donors) color-coded by sampling time. c Distribution of the first principal component (PC1) across processing times computed for each PBMC subtype independently. d scATAC-seq-based UMAP embedding color-coded by sampling time and highlighting major PBMC cell types. Unlabeled cluster corresponds to cells of unknown type. e Violin plot showing changes in RNA expression for the 50 genes associated with the top 50 distal (enhancer) peaks changing in accessibility (down: closing sites; up: opening sites); p value in Z score scale, Wilcoxon test *p < 0.05, **p < 0.01, ***p < 0.001. f Dot plot representing the time-dependent expression changes of the top up- and downregulated genes with a minimum log (expression) of 0.5, a minimum absolute log fold-change of 0.2 and an adjusted p value < 0.001. The arrows highlight the cold-inducible response binding protein (CIRBP) and the RNA Binding Motif Protein 3 (RBM3) genes. g M (log ratio)-A (mean average) plot showing the log2 fold-change between biased (> 2 h) and unbiased (≤ 2 h) PBMC as a function of the log average expression (Scran normalized expression values). Significant genes are colored in green (adjusted p value < 0.001), and a locally estimated scatterplot smoothing (LOESS) line is drawn in blue. h Motif enrichment analysis performed over the DNA sequences of the top 50 distal peaks with a change in accessibility (same peaks as e). i Time score distribution across processing times (female donor) calculated with the sampling time signature defined in the male PBMC donor. j Receiver operating characteristic (ROC) curve displaying the performance of a logistic regression model in classifying “biased” and “unbiased” PBMC.
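The two axes of the M-A plot in panel g can be illustrated with a minimal sketch. It is not the authors' code: the function name `ma_values`, the pseudocount of 1, and the choice of log base 2 for the average are assumptions made for illustration; only the 2 h cutoff between "unbiased" and "biased" cells comes from the caption. The input is assumed to be an already normalized cells-by-genes matrix.

```python
import numpy as np

def ma_values(expr, hours, cutoff=2.0, pseudocount=1.0):
    """Return per-gene (M, A) values for an M-A plot.

    expr        : cells x genes matrix of normalized expression (assumed input)
    hours       : processing time of each cell, in hours
    cutoff      : cells with hours > cutoff are treated as "biased"
    pseudocount : hypothetical offset that avoids log(0)
    """
    expr = np.asarray(expr, dtype=float)
    hours = np.asarray(hours, dtype=float)
    biased = hours > cutoff

    # Mean expression of every gene within each group of cells.
    mean_biased = expr[biased].mean(axis=0)
    mean_unbiased = expr[~biased].mean(axis=0)

    # M: log2 fold-change, biased over unbiased.
    m = np.log2((mean_biased + pseudocount) / (mean_unbiased + pseudocount))
    # A: log of the average expression across all cells.
    a = np.log2(expr.mean(axis=0) + pseudocount)
    return m, a
```

Plotting `m` against `a` gives one point per gene; a gene unaffected by processing time sits near M = 0 regardless of its expression level.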
Fig. 2 Solutions to correct or prevent sampling time-induced artifacts. a tSNEs displaying the effect of varying processing times on the transcriptome profiles of 7378 PBMC before (left) and after (right) regressing out the time score for every highly variable gene. b kBET acceptance score distribution across sampling times with or without the computational correction. c tSNE showing the effect of PBMC culturing and activation with anti-CD3 Dynabeads over 2 days. d kBET acceptance score distribution across cell types with or without cell culture/activation. e tSNE highlighting the sampling effect between cells cryopreserved immediately (fresh, 0 h) or after 24 h and 48 h stored cold (4 °C) or at RT (21 °C). f kBET acceptance score distribution across storage temperatures.
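The idea of regressing out a per-cell time score (panel a) can be sketched as an ordinary least-squares fit per gene. This is an illustrative assumption rather than the authors' implementation: the function name `regress_out_score` is hypothetical, the model is a plain linear fit with intercept, and the input is assumed to be a normalized cells-by-genes matrix restricted to the highly variable genes.

```python
import numpy as np

def regress_out_score(expr, score):
    """Remove the linear effect of a per-cell score from every gene.

    expr  : cells x genes matrix of normalized expression (assumed input)
    score : one value per cell, e.g. a sampling-time score
    Returns a matrix of the same shape in which each gene keeps its mean
    but no longer varies linearly with the score.
    """
    expr = np.asarray(expr, dtype=float)
    score = np.asarray(score, dtype=float)

    # Design matrix with an intercept column and the score column.
    design = np.column_stack([np.ones_like(score), score])

    # Fit all genes at once; coef has shape (2, n_genes).
    coef, *_ = np.linalg.lstsq(design, expr, rcond=None)
    slopes = coef[1]

    # Subtract only the score-dependent part, centred so gene means are kept.
    return expr - np.outer(score - score.mean(), slopes)
```

After this step, each corrected gene is uncorrelated with the score, which is the property an embedding computed on the corrected matrix relies on.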