Saber Hafezqorani, Chen Yang, Theodora Lo, Ka Ming Nip, René L Warren, Inanc Birol.
Abstract
BACKGROUND: Compared with second-generation sequencing technologies, third-generation single-molecule RNA sequencing has unprecedented advantages; the long reads it generates facilitate isoform-level transcript characterization. In particular, the Oxford Nanopore Technology sequencing platforms have become more popular in recent years owing to their relatively high affordability and portability compared with other third-generation sequencing technologies. To aid the development of analytical tools that leverage the power of this technology, simulated data provide a cost-effective solution with ground truth. However, a nanopore sequence simulator targeting transcriptomic data is not available yet.
Keywords: RNA-seq; nanopore sequencing; sequence simulation; transcriptome
Year: 2020 PMID: 32520350 PMCID: PMC7285873 DOI: 10.1093/gigascience/giaa061
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1: Benchmarking Trans-NanoSim and DeepSimulator on the human direct RNA dataset. A. Comparison of length distributions of experimental reads and simulated reads generated by Trans-NanoSim and DeepSimulator. B. The length of consecutive match/error bases of empirical and simulated reads, as indicated. C. Transcript expression levels measured from simulated reads versus the same measured from experimental reads.
Figure 2: Homopolymer simulation performance on the human direct RNA dataset. The x-axis shows the reference homopolymer length (nt) and the y-axis shows the mean homopolymer length (nt) on the corresponding reads. The distributions for A and T homopolymers are trimmed at 40 nt.
Figure 3: Schematic overview of the Trans-NanoSim pipeline. The first stage (Characterization) of the pipeline aligns input ONT transcriptome reads against the reference transcriptome and genome to statistically model the read length distribution and error modes. It also optionally detects intron retention events and quantifies transcript expression. These profiles alongside the homopolymer model are then used in the second stage (Simulation) to generate simulated reads, also reporting their associated error profiles.
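The two-stage idea in the Figure 3 caption (learn a profile from aligned reads, then sample simulated reads from it) can be illustrated with a toy Python sketch. This is not Trans-NanoSim's code or its statistical model: the function names `characterize` and `simulate_read`, the input tuple layout, and the independent per-base error model are all assumptions made for illustration. Read lengths are simply resampled from the observed values, and intron retention, expression profiles, and the homopolymer model are left out.

```python
import random


def characterize(aligned_reads):
    """Toy 'Characterization' stage (hypothetical, not Trans-NanoSim's model).

    aligned_reads: list of tuples
        (read_length, n_mismatch, n_insertion, n_deletion, aligned_bases)
    Returns the observed read lengths and pooled per-base error rates.
    """
    total = sum(r[4] for r in aligned_reads)
    return {
        "lengths": [r[0] for r in aligned_reads],
        "rates": {
            "mismatch": sum(r[1] for r in aligned_reads) / total,
            "insertion": sum(r[2] for r in aligned_reads) / total,
            "deletion": sum(r[3] for r in aligned_reads) / total,
        },
    }


def simulate_read(transcript, profile, rng):
    """Toy 'Simulation' stage: sample a length, then add per-base errors.

    Assumes errors are independent per base, which is a simplification.
    """
    rates = profile["rates"]
    # Resample a length from the empirical lengths, capped at the transcript.
    length = min(rng.choice(profile["lengths"]), len(transcript))
    start = rng.randint(0, len(transcript) - length)
    out = []
    for base in transcript[start:start + length]:
        u = rng.random()
        if u < rates["deletion"]:
            continue  # base is dropped from the read
        if u < rates["deletion"] + rates["mismatch"]:
            # substitute with a different base
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        elif u < rates["deletion"] + rates["mismatch"] + rates["insertion"]:
            out.append(base)
            out.append(rng.choice("ACGT"))  # extra inserted base
        else:
            out.append(base)  # correct base call
    return "".join(out)
```

A profile built once with `characterize` can be reused to draw any number of reads by calling `simulate_read` with a seeded `random.Random` instance, which keeps the toy simulation reproducible.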