Logan A Walker, Michael G Sovic, Chi-Ling Chiang, Eileen Hu, Jiyeon K Denninger, Xi Chen, Elizabeth D Kirby, John C Byrd, Natarajan Muthusamy, Ralf Bundschuh, Pearlly Yan.
Abstract
BACKGROUND: Direct cDNA preamplification protocols developed for single-cell RNA-seq have enabled transcriptome profiling of precious clinical samples and rare cell populations without the need for sample pooling or RNA extraction. We term the use of single-cell chemistries for sequencing low numbers of cells limiting-cell RNA-seq (lcRNA-seq). Currently, there is no customized algorithm to select robust/low-noise transcripts from lcRNA-seq data for between-group comparisons.
Keywords: Differential gene expression analysis; Pre-filtering; RNA-seq; Rare cells; Ultralow input
Year: 2020 PMID: 32039730 PMCID: PMC7008572 DOI: 10.1186/s12967-020-02247-6
Source DB: PubMed Journal: J Transl Med ISSN: 1479-5876 Impact factor: 5.531
Fig. 2 CLEAR workflow: bin-based coverage analysis by transcript expression. a Data analysis workflow using CLEAR to preprocess lcRNA-seq data. Step 1: Trimmed lcRNA-seq reads are aligned to the reference genome; Step 2: μi, the mean of the positional distribution of aligned reads along each individual transcript, is determined; Step 3: Transcript positional means, μi (y-axis), are ranked and then binned by the transcript read coverage (x-axis). When μi of a bin is ≈ 0, the read distribution is symmetrical along the length of the transcript. When μi within a bin develops a bimodal distribution with modes toward +1 (TTS) and −1 (TSS), its values will deviate from 0; Step 4: All available transcripts, binned into groups of 250, are fitted to a bimodal distribution model. The emergence of a bimodal distribution identifies when aggregate μi starts to deviate from a unimodal distribution around the center of the transcripts, indicated by a change in the fitting parameters a and b; Step 5: When either of the model parameters exceeds a value of 2 (indicated by a gray line), transcripts beyond that point are excluded by CLEAR from differential gene expression and other downstream analyses; Step 6: CLEAR transcripts are used in downstream between-group analyses such as hierarchical clustering; b Example lcRNA-seq read coverage plots. The read coverage plot for GAPDH depicts a transcript with μi ~ 0, RPS7 depicts a transcript close to the CLEAR cutoff, while DDAH2 depicts a transcript deemed too noisy by CLEAR; c CLEAR profiles for 10-, 100- and 1000-pg input mass lcRNA-seq data. The value of μi is plotted for the 7000 highest expressed primary transcripts for three representative samples. The red line depicts the CLEAR filtering threshold; d Violin plots of the same data as shown in c. The end marks indicate the window extrema and the middle bar indicates the mean
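Steps 2, 3 and 5 of this workflow can be illustrated with a minimal Python sketch. It is not the CLEAR implementation: the function names are hypothetical, read positions are assumed to be offsets from the TSS along the transcript, the linear rescaling to the interval from −1 (TSS) to +1 (TTS) is an assumption, and the bimodal model fit of Step 4 is left out because the caption does not specify it. The sketch only assumes that the fitted parameters a and b are available per bin.

```python
import numpy as np

def positional_mean(read_positions, transcript_length):
    """Step 2 (sketch): mean read position, rescaled so that TSS = -1 and TTS = +1.

    Assumes read_positions are offsets from the TSS along the transcript.
    """
    pos = np.asarray(read_positions, dtype=float)
    scaled = 2.0 * pos / transcript_length - 1.0
    return float(scaled.mean())

def bin_by_coverage(mu, coverage, bin_size=250):
    """Step 3 (sketch): rank transcripts by coverage (highest first) and
    split their mu values into consecutive bins of bin_size transcripts."""
    mu = np.asarray(mu, dtype=float)
    order = np.argsort(-np.asarray(coverage, dtype=float), kind="stable")
    ranked = mu[order]
    return [ranked[i:i + bin_size] for i in range(0, len(ranked), bin_size)]

def first_excluded_bin(a_params, b_params, threshold=2.0):
    """Step 5 (sketch): index of the first bin whose fitted parameter a or b
    exceeds the threshold; returns None if no bin does."""
    for i, (a, b) in enumerate(zip(a_params, b_params)):
        if a > threshold or b > threshold:
            return i
    return None
```

Under these assumptions, a transcript with reads spread evenly along its length gives a positional mean near 0, while one whose reads pile up at the 3′ end gives a value near +1.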
Fig. 1 lcRNA-seq analyses between sample groups without application of CLEAR. a Workflow for total RNA extraction and QC analysis from FACS-derived CD5+ and CD5− CLL cells as input for lcRNA-seq library generation for the development of CLEAR; b DEG counts as determined by DESeq2. Contrary to expectation, the 10-pg input has more DEGs than the 100-pg input replicates; c Shared DEGs between the three input groups, showing more DEGs that are unique than shared; d Unsupervised analysis of CD5+ and CD5− samples by total RNA input amount. PCA reveals sample replicates separated by biological groupings at the 1000- and the 100-pg input level but not at the 10-pg level. CLL chronic lymphocytic leukemia, PBMC peripheral blood mononuclear cells, FACS fluorescence activated cell sorting, BioA Agilent Bioanalyzer RNA assay, DEGs differentially expressed genes, PC principal component
Fig. 3 lcRNA-seq analyses between sample groups after application of CLEAR. a CLEAR transcripts shared between samples of each input mass group. The red bars depict the number of CLEAR transcripts found in all 6 samples (replicates in both CD5+ and CD5− groups); b Left: CLEAR transcripts overlap between 10-pg and 100-pg input mass samples; Right: CLEAR transcripts overlap between 100-pg and 1000-pg input mass samples; c DEG counts between CD5+ vs. CD5− cell types using only the shared CLEAR transcripts. The inset shows the data from Fig. 1b without the application of CLEAR; d Bottom: Overlap of DEGs from the 100-pg and 1000-pg inputs (top repeats data from Fig. 1c without CLEAR); e PCA plots separating CD5+ and CD5− groups for all input masses using only CLEAR transcripts (inset repeats data from Fig. 1d without CLEAR). PC principal component
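The "shared CLEAR transcripts" used in panels a and c are those that pass CLEAR in every sample of a group, which amounts to a set intersection. The sketch below illustrates that idea only; the function name, sample labels and transcript identifiers are hypothetical and do not come from the study.

```python
def shared_clear_transcripts(passing_by_sample):
    """Return the transcripts that pass CLEAR in every sample.

    passing_by_sample maps a sample label to an iterable of transcript IDs
    that passed CLEAR in that sample (hypothetical input layout).
    """
    sets = [set(ids) for ids in passing_by_sample.values()]
    return set.intersection(*sets) if sets else set()

# Toy example with made-up sample labels and transcript IDs.
example = {
    "CD5pos_rep1": ["tx1", "tx2", "tx3"],
    "CD5pos_rep2": ["tx1", "tx2"],
    "CD5neg_rep1": ["tx2", "tx1", "tx4"],
}
shared = shared_clear_transcripts(example)
```

Only the transcripts in `shared` would then be passed on to the between-group comparison.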
Fig. 4 Application of CLEAR to a mouse neural lcRNA-seq experiment. a Schematic of cell isolation and preparation for sequencing. The dentate gyri of Nestin-GFP mice were microdissected and dissociated into a single-cell suspension. Cells were labeled with fluorescently conjugated antibodies against markers for specific populations of cells present in the hippocampus. GFP+ GLAST+ stem cells, GFP+ GLAST− progenitor cells and GFP− GLAST+ astrocytes were isolated from live cells that were negative for microglial, oligodendroglial and endothelial markers. SMART-seq libraries were generated from these sorted cells; b The means and ranges of CLEAR transcripts from each cell type (4 biological replicates per group). All groups are significantly different when compared using a t-test (p < 0.01); c PCA analyses by murine neuronal cell types. Top panels: PCA plots using all available transcripts; Bottom panels: PCA plots using only CLEAR transcripts; d Normalized DESeq2 transcript counts for 4 genes that pass CLEAR and are known to be differentially expressed in murine neural stem cells, progenitors and astrocytes are used to confirm the identity of the cell populations derived from the staining and FACS strategies used to enrich these three cell populations. Boxplots: orange line, mean CLEAR transcripts for four biological replicates per neural cell type; whiskers: extend 1.5× the inter-quartile range (IQR) beyond the first and the third quartiles; circles: outliers. FACS fluorescence activated cell sorting, PC principal component
Application of CLEAR and DESeq2 to murine DG cell type comparisons
| Gene | Astrocyte vs. progenitor | Progenitor vs. stem cell | Astrocyte vs. stem cell |
|---|---|---|---|
| Gfap | N.S. | ** | N.S. |
| Hopx | N.S. | **** | ** |
| Fabp7 | ** | N.S. | * |
| Grin2c | **** | N.S. | *** |
| Sox9 | F.C. | **** | F.C. |
| Neurod1 | F.C. | F.C. | F.C. |
| Dcx | F.C. | F.C. | F.C. |
| Id4 | F.C. | *** | F.C. |
| Pcna | F.C. | N.S. | F.C. |
| Mcm2 | F.C. | F.C. | F.C. |
| Ascl1 | F.C. | F.C. | F.C. |
| Eomes | F.C. | F.C. | F.C. |
| Nes | F.C. | F.C. | F.C. |
| Neurog2 | F.C. | F.C. | F.C. |
Genes known to be differentially expressed in murine astrocytes, stem cells and progenitors. lcRNA-seq data were preprocessed using CLEAR prior to between-group comparisons using DESeq2. Differential gene expression analysis evaluates the effectiveness of staining and FACS strategies in enriching these three cell types. The bisecting line indicates the border between transcripts which pass CLEAR in all samples and those that do not pass CLEAR in all samples. Each comparison was processed using DESeq2 and the significance of the comparison is given. q: FDR-corrected p-values; N.S. (Not Significant), F.C. (Failed CLEAR), * (q ≤ 0.05), ** (q ≤ 0.01), *** (q ≤ 0.001), **** (q ≤ 0.0001)
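The legend's mapping from FDR-corrected q-values to table symbols can be written as a small lookup. This is a sketch of the stated thresholds only; the function name and the `passed_clear` flag are hypothetical.

```python
def significance_label(q, passed_clear=True):
    """Map an FDR-corrected p-value (q) to the symbols used in the table.

    passed_clear is a hypothetical flag: False means the transcript did not
    pass CLEAR in all samples of the comparison, so no test result is shown.
    """
    if not passed_clear:
        return "F.C."
    if q <= 0.0001:
        return "****"
    if q <= 0.001:
        return "***"
    if q <= 0.01:
        return "**"
    if q <= 0.05:
        return "*"
    return "N.S."
```

The thresholds are checked from the strictest downward so that each q-value receives the highest number of stars it qualifies for.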