Dario Romagnoli¹, Giulia Boccalini², Martina Bonechi², Chiara Biagioni²,³, Paola Fassan⁴, Roberto Bertorelli⁴, Veronica De Sanctis⁴, Angelo Di Leo³, Ilenia Migliaccio², Luca Malorni²,³, Matteo Benelli⁵
Abstract
BACKGROUND: New single-cell isolation technologies are facilitating studies of the transcriptomics of individual cells. Bio-Rad ddSEQ is a droplet-based microfluidic system that, when coupled with downstream Illumina library preparation and sequencing, enables the monitoring of thousands of genes per cell. The sequenced reads have unique features that prevent the use of freely available tools for single-cell demultiplexing.
Keywords: Bioinformatics; Single-cell transcriptomics; scRNA-seq
Year: 2018 PMID: 30583719 PMCID: PMC6304778 DOI: 10.1186/s12864-018-5249-x
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Schematic of the Bio-Rad ddSEQ/Illumina read structure. (Top) In Bio-Rad ddSEQ, barcoded beads capture mRNA molecules through hybridization with the mRNA poly-A tails. Each DNA strand has the following structure: a phase block (PB), three barcode blocks (BC1, BC2, BC3) joined by two different linkers (L1 and L2), and one UMI flanked by two trinucleotides (ACG and GAC). (Middle) Read 1 (R1) contains the molecular tags, while Read 2 (R2) contains the mRNA sequence (R1 and R2 are not drawn to scale). (Bottom) The cDNA can be separated from the beads at different nucleotides within the PB, so the positions of the two linkers vary from read to read.
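Because the phase block has a variable length, the tags in R1 cannot be cut at fixed offsets; they have to be located relative to the linkers. The Python sketch below illustrates that idea under stated assumptions: the linker sequences `L1_SEQ` and `L2_SEQ`, the 6-nt barcode blocks, the 8-nt UMI, the maximum phase-block length `max_pb`, and the helper `parse_read1` are all hypothetical placeholders, not the real ddSEQ values or ddSeeker's own code. It also assumes the read order PB, BC1, L1, BC2, L2, BC3, ACG, UMI, GAC and accepts exact matches only.

```python
from typing import NamedTuple, Optional

# HYPOTHETICAL placeholder values for illustration only; replace them with the
# real ddSEQ linker sequences and block lengths before using this on data.
L1_SEQ = "AAAAACCCCCGGGGG"  # hypothetical linker L1
L2_SEQ = "TTTTTGGGGGCCCCC"  # hypothetical linker L2
BC_LEN = 6                  # assumed length of each barcode block
UMI_LEN = 8                 # assumed UMI length


class Tags(NamedTuple):
    barcode: str  # BC1 + BC2 + BC3 concatenated into one cell barcode
    umi: str


def parse_read1(seq: str, max_pb: int = 4) -> Optional[Tags]:
    """Extract the cell barcode and UMI from an R1 sequence (exact matches only)."""
    # The phase block (PB) length varies, so locate L1 instead of using a fixed offset.
    l1 = seq.find(L1_SEQ)
    if l1 < BC_LEN or l1 > BC_LEN + max_pb:
        return None  # L1 missing, or preceded by an implausibly long PB
    bc1 = seq[l1 - BC_LEN:l1]
    pos = l1 + len(L1_SEQ)
    bc2 = seq[pos:pos + BC_LEN]
    pos += BC_LEN
    if seq[pos:pos + len(L2_SEQ)] != L2_SEQ:
        return None  # L2 must directly follow BC2
    pos += len(L2_SEQ)
    bc3 = seq[pos:pos + BC_LEN]
    pos += BC_LEN
    # The UMI sits between the ACG and GAC trinucleotides.
    if seq[pos:pos + 3] != "ACG":
        return None
    umi_start = pos + 3
    umi_end = umi_start + UMI_LEN
    if seq[umi_end:umi_end + 3] != "GAC":
        return None
    return Tags(bc1 + bc2 + bc3, seq[umi_start:umi_end])
```

Searching for the linker rather than slicing at fixed positions is what makes the same function work whether the PB contributes zero or several nucleotides; real data would also need some tolerance for sequencing errors, which this sketch deliberately omits.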
Fig. 2 Application of ddSeeker to the Illumina test dataset. a Dot plot reporting the number of reads for which ddSeeker and BaseSpace identify the same barcode and UMI (Matched BC and UMI), for which neither ddSeeker nor BaseSpace identifies a valid barcode and/or UMI (unretrieved), for which only ddSeeker or only BaseSpace identifies the barcode and UMI (ddSeeker-only and BaseSpace-only), and for which the two tools identify a different barcode and/or UMI (Mismatched BC and UMI). b Cumulative fraction of reads per cell in the 5000 most-read barcodes for matched-BC reads (solid black line) and ddSeeker-only reads (dashed grey line). The vertical line marks the number of valid cells from the knee analysis in the Illumina BaseSpace report (n = 355). c Scatter plot of the number of matched-BC reads versus the number of ddSeeker-only reads for the 355 valid cells; R is Pearson's correlation coefficient. d Scatter plot of the average normalized expression of the 200 most expressed human genes across the 100 most-read cells, obtained with the ddSeeker (y-axis) and BaseSpace (x-axis) pipelines.
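The five categories in panel a amount to a per-read comparison of the two tools' outputs. A minimal Python sketch, assuming each tool's result has already been reduced to a hypothetical mapping from read ID to a `(barcode, UMI)` pair, or `None` when no valid tags were found; `classify` and `compare` are illustrative names, not part of ddSeeker or BaseSpace.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

# (cell barcode, UMI) for one read, or None if the tool found no valid tags.
Tag = Optional[Tuple[str, str]]


def classify(dd: Tag, bs: Tag) -> str:
    """Assign one read to a Fig. 2a category; dd = ddSeeker, bs = BaseSpace."""
    if dd is None and bs is None:
        return "unretrieved"
    if bs is None:
        return "ddSeeker-only"
    if dd is None:
        return "BaseSpace-only"
    return "matched" if dd == bs else "mismatched"


def compare(dd_tags: Dict[str, Tag], bs_tags: Dict[str, Tag]) -> Counter:
    """Count categories over every read ID reported by either tool."""
    read_ids = dd_tags.keys() | bs_tags.keys()
    return Counter(classify(dd_tags.get(r), bs_tags.get(r)) for r in read_ids)
```

A read missing from one tool's mapping is treated the same as a read for which that tool reported no valid tags.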
Sequencing run summary of the in-house dataset
| | Run 1 | Run 2 |
|---|---|---|
| Sequencing quality | Low | High |
| # Libraries | 6 | 6 |
| Clusters PF (%) | 33.05 | 93.9 |
| Q30 (%) | 70.9 | 82.6 |
| Total reads | 112,993,193 | 197,600,946 |
| Valid reads | 70,826,303 (63%) | 180,526,539 (91%) |
| # Cells (expected) | 1800 | 1800 |
| # Cells (ddSeeker) | ≈3000 | ≈3000 |
Valid reads are reads with a valid barcode and UMI. The expected number of cells is based on the cell capture efficiency declared by Bio-Rad.
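As a quick arithmetic check, each valid-read percentage in the table is the number of valid reads divided by the total number of reads for that run. A short Python sketch using the counts from the table; `valid_pct` is an illustrative helper name:

```python
# Read counts (total, valid) taken from the sequencing run summary table.
RUNS = {
    "Run 1": (112_993_193, 70_826_303),
    "Run 2": (197_600_946, 180_526_539),
}


def valid_pct(total: int, valid: int) -> float:
    """Percentage of reads that carry a valid barcode and UMI."""
    return 100 * valid / total


for run, (total, valid) in RUNS.items():
    print(f"{run}: {valid_pct(total, valid):.1f}% valid reads")
# Run 1: 62.7% valid reads
# Run 2: 91.4% valid reads
```

Rounded to whole percentages, these give the 63% and 91% reported in the table.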
Fig. 3 Application of ddSeeker to the in-house dataset. a Dot plot of the percentage of valid reads (PASS) and of reads with errors in the barcode and/or UMI in the low-quality (Run 1) and high-quality (Run 2) read sets. Details of the error classification are reported in Additional file 1. b Number of reads per cell barcode across the 5000 most-read barcodes in the two read sets. The vertical dashed line indicates the expected number of cells (1800) in our libraries based on the ddSEQ specifications. c Cumulative fraction of reads per cell in the 5000 most-read barcodes in the two read sets. All these plots can be generated with make_graphs.R, a dedicated R script included in the ddSeeker package.
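The cumulative-fraction curves in Fig. 2b and Fig. 3c rank barcodes by read count and accumulate their share of the reads. The Python sketch below (not the package's make_graphs.R) computes that curve and estimates a cell count with a generic knee heuristic: the rank at which the curve lies furthest above the straight line joining its first and last points. That heuristic is an assumption for illustration only; it is not necessarily the knee method used in the BaseSpace report or by ddSeeker, and `cumulative_fraction` and `knee_cells` are hypothetical helper names.

```python
from typing import List, Sequence


def cumulative_fraction(counts: Sequence[int], top_n: int = 5000) -> List[float]:
    """Cumulative fraction of reads across the top_n most-read barcodes."""
    ranked = sorted(counts, reverse=True)[:top_n]
    total = sum(ranked)
    if total == 0:
        return [0.0] * len(ranked)
    fractions, running = [], 0
    for c in ranked:
        running += c
        fractions.append(running / total)
    return fractions


def knee_cells(counts: Sequence[int], top_n: int = 5000) -> int:
    """Estimate the number of cells as the knee of the cumulative curve.

    Generic heuristic (an assumption, not the BaseSpace method): pick the rank
    whose point lies furthest above the chord joining the curve's endpoints.
    """
    y = cumulative_fraction(counts, top_n)
    n = len(y)
    if n < 3:
        return n
    x = [(i + 1) / n for i in range(n)]  # rank rescaled to (0, 1]
    slope = (y[-1] - y[0]) / (x[-1] - x[0])
    # Vertical gap between the curve and the chord; for a fixed chord this is
    # proportional to the perpendicular distance, so the argmax is the same.
    gaps = [y[i] - (y[0] + slope * (x[i] - x[0])) for i in range(n)]
    return max(range(n), key=gaps.__getitem__) + 1
```

For instance, 10 barcodes with 1000 reads each followed by 100 barcodes with 10 reads each give a knee at 10 cells: the curve rises steeply through the well-covered barcodes and flattens once the low-count barcodes begin.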