Qi Wang, Sven Boenigk, Volker Boehm, Niels H Gehring, Janine Altmueller, Christoph Dieterich.
Abstract
The current ecosystem of single cell RNA-seq platforms is rapidly expanding, but robust solutions for single cell and single molecule full-length RNA sequencing are virtually absent. A high-throughput solution that covers all aspects is necessary to study the complex life of mRNA on the single cell level. The Nanopore platform offers long read sequencing and can be integrated with the popular single cell sequencing method on the 10x Chromium platform. However, the high error rate of Nanopore reads poses a challenge in downstream processing (e.g. for cell barcode assignment). We propose a solution to this particular problem by using a hybrid sequencing approach on Nanopore and Illumina platforms. Our software ScNapBar enables cell barcode assignment with high accuracy, especially if sequencing saturation is low. ScNapBar uses unique molecular identifier (UMI) or Naïve Bayes probabilistic approaches in the barcode assignment, depending on the available Illumina sequencing depth. We have benchmarked the two approaches on simulated and real Nanopore datasets. We further applied ScNapBar to pools of cells with an active or a silenced nonsense-mediated RNA decay pathway. Our Nanopore read assignment distinguishes the respective cell populations and reveals characteristic nonsense-mediated mRNA decay events depending on cell status.
Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Keywords: 10X Genomics; Bayesian approach; Cell barcode assignment; Nonsense-mediated mRNA decay; mRNA
Year: 2021 PMID: 33906975 PMCID: PMC8208055 DOI: 10.1261/rna.078154.120
Source DB: PubMed Journal: RNA ISSN: 1355-8382 Impact factor: 4.942
FIGURE 1. Combined single-cell Illumina and Nanopore sequencing strategy. GFP+/− cells are pooled and sequenced on the Illumina and Nanopore platforms. The Nanopore platform generates long cDNA sequencing reads that are used in barcode calling and estimating read error parameters. The Illumina data are used to estimate the total number of cells in sequencing and the represented cell barcodes. The simulated data are then used to parameterize a Bayesian model of barcode alignment features to discriminate correct versus false barcode assignments. This model is then used on the real data to assign cell barcodes to Nanopore reads. The GFP label and known NMD transcripts can be used to validate this assignment.
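The Bayesian step described in Figure 1 can be illustrated with a minimal sketch. This is not ScNapBar's implementation: it is a generic categorical Naïve Bayes classifier with Laplace smoothing, trained on labeled (simulated) assignments and then queried for the posterior probability that a candidate assignment is correct. The two features (barcode edit distance and adapter edit distance), the class name `CategoricalNaiveBayes`, and the toy training rows are all hypothetical stand-ins for whatever alignment features the real model uses.

```python
import math
from collections import Counter


class CategoricalNaiveBayes:
    """Generic categorical Naive Bayes with Laplace smoothing (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing pseudo-count

    def fit(self, rows, labels):
        n_features = len(rows[0])
        self.class_counts = Counter(labels)
        # counts[label][feature_index][value] = number of training rows
        self.counts = {c: [Counter() for _ in range(n_features)]
                       for c in self.class_counts}
        self.values = [set() for _ in range(n_features)]
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.counts[label][i][value] += 1
                self.values[i].add(value)
        return self

    def posterior(self, row):
        """Return {label: P(label | row)} under the feature-independence assumption."""
        total = sum(self.class_counts.values())
        log_scores = {}
        for c, n_c in self.class_counts.items():
            log_p = math.log(n_c / total)  # class prior
            for i, value in enumerate(row):
                k = len(self.values[i])
                # Values unseen in training fall back to the pseudo-count alone.
                log_p += math.log((self.counts[c][i][value] + self.alpha)
                                  / (n_c + self.alpha * k))
            log_scores[c] = log_p
        # Normalize in log space for numerical stability.
        top = max(log_scores.values())
        weights = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(weights.values())
        return {c: w / z for c, w in weights.items()}


# Hypothetical simulated training data:
# (barcode_edit_distance, adapter_edit_distance) -> assignment correct?
train_rows = [(0, 0), (0, 1), (1, 0), (0, 0), (2, 1), (2, 2), (1, 2), (2, 1)]
train_labels = [True, True, True, True, False, False, False, False]

model = CategoricalNaiveBayes().fit(train_rows, train_labels)
print(model.posterior((0, 0))[True])  # posterior that a clean match is correct
```

In this toy setting a candidate with no mismatches receives a high posterior and one with many mismatches a low posterior; a cutoff on that posterior then decides which reads are assigned.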
FIGURE 2. Sensitivity and specificity of ScNapBar and Sicelore on 100 Illumina libraries with different levels of saturation. (A) Barcode assignment with UMI matches. (B) Barcode assignment without UMI matches (ScNapBar score >50). (C) Benchmark of the specificity and sensitivity on the Illumina library with 100% saturation. We compared barcode assignments at ScNapBar score cutoffs from >1 to >99; the Sicelore assignments with UMI support are roughly equivalent to those with a ScNapBar score >90.
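As a rough illustration of how a benchmark such as the one in Figure 2 could be tabulated, the sketch below computes sensitivity and specificity at a given score cutoff. It uses the textbook definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)), which may differ from the exact definitions used in the paper. The function name `sensitivity_specificity` and the example scores are hypothetical; a read counts as "assigned" when its score exceeds the cutoff, and the truth label says whether the candidate barcode was in fact correct (known only for simulated reads).

```python
def sensitivity_specificity(scores, is_correct, threshold):
    """Textbook sensitivity/specificity for assignments with score > threshold."""
    tp = fp = tn = fn = 0
    for score, correct in zip(scores, is_correct):
        assigned = score > threshold
        if assigned and correct:
            tp += 1      # assigned, and the barcode was right
        elif assigned:
            fp += 1      # assigned, but the barcode was wrong
        elif correct:
            fn += 1      # rejected although the barcode was right
        else:
            tn += 1      # rejected, and the barcode was wrong
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical scores on a 0-100 scale with simulated ground truth.
scores = [95, 80, 60, 40, 20, 99]
truth = [True, True, False, True, False, False]

for cutoff in (50, 90):
    sens, spec = sensitivity_specificity(scores, truth, cutoff)
    print(f"score > {cutoff}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Raising the cutoff trades sensitivity for specificity, which is the kind of trade-off a sweep over score cutoffs makes visible.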
FIGURE 3. Sicelore and ScNapBar CPU time comparison. (A) ScNapBar CPU time depends on the number of whitelist barcodes (allowing an edit distance of >2 and an offset of up to 4 bp between adapter and barcode). Gray area represents the standard deviation for 10 runs. (B) Comparison of ScNapBar and Sicelore CPU times. The benchmark was measured using one million barcode sequences and 2052 barcodes in the whitelist.
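To make the search space behind the timings in Figure 3 concrete, here is a minimal sketch of whitelist matching with an edit-distance cap and a positional offset after the adapter. It is a naive brute-force scan, not the algorithm used by ScNapBar or Sicelore. The function names, the default cap of 2 edits, and the fixed-length window are assumptions made for illustration; only the offset of up to 4 bp is taken directly from the caption.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def match_barcode(read_after_adapter, whitelist, max_edits=2, max_offset=4):
    """Return (barcode, offset, distance) for the best whitelist hit, or None.

    Every whitelist barcode is compared against a same-length window of the
    read at each offset 0..max_offset; cost grows with the whitelist size.
    """
    best = None
    for barcode in whitelist:
        k = len(barcode)
        for offset in range(max_offset + 1):
            window = read_after_adapter[offset:offset + k]
            if len(window) < k:
                break  # read too short for any later offset
            d = edit_distance(window, barcode)
            if d <= max_edits and (best is None or d < best[2]):
                best = (barcode, offset, d)
    return best


# Hypothetical 16-bp barcodes and a read with a 2-bp gap and one substitution.
whitelist = ["ACGTACGTACGTACGT", "TTTTCCCCGGGGAAAA"]
read = "GG" + "ACGTACGTACGAACGT" + "TTTT"
print(match_barcode(read, whitelist))
```

Because each read is compared against every whitelist barcode at every offset, the running time of this sketch scales linearly with the whitelist size.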
FIGURE 4. Number of Nanopore reads identified by ScNapBar and Sicelore at each processing step. We inspected each processing step on real data (low Illumina saturation of 11.3%). The first two steps are identical for both workflows. Total reads: Number of input reads. Aligned to genome: Number of reads aligned to genome. The next three steps are workflow-specific. Aligned to adapter: Number of reads with identified adapter sequence. Aligned to barcode: Number of reads with aligned barcode sequence. Assigned to barcode: Number of predictions by each workflow. The last step is a validation of the previous assignment step using UMI matches after additional Illumina sequencing, which increases the Illumina saturation to 52% (see main text).
FIGURE 5. The t-SNE plots of gene-cell count matrices. (A) Illumina. (B) Nanopore.