Yoshinori Fukasawa, Luca Ermini, Hai Wang, Karen Carty, Min-Sin Cheung.
Abstract
We propose LongQC as an easy and automated quality control tool for genomic datasets generated by third-generation sequencing (TGS) technologies such as Oxford Nanopore Technologies (ONT) and SMRT sequencing from Pacific Biosciences (PacBio). Key statistics were optimized for long read data, and LongQC covers all major TGS platforms. LongQC processes and visualizes those statistics automatically and quickly.
Keywords: Long read; Oxford Nanopore; PacBio; Quality control; third generation sequencers
Year: 2020 PMID: 32041730 PMCID: PMC7144081 DOI: 10.1534/g3.119.400864
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. Schematic diagram of non-sense reads and example plots for the E. coli genome. (A) Blue rectangles represent normal reads derived from large molecules such as genomic DNA, and the orange rectangle shows a non-sense read. Non-sense reads have no coverage due to randomness or an even higher error rate. (B) Whisker plots of standardized per-read coverage in two challenging and two normal datasets. Standardized per-read coverage is the per-read coverage centered by its mean and divided by its standard deviation. Blue lines represent 3 standard deviations. (C) Read length histograms for the same datasets.
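The standardization described for panel (B) can be illustrated with a minimal Python sketch. This is not LongQC's implementation: the function names `standardize_coverage` and `flag_outliers` are hypothetical, the use of the population standard deviation is an assumption (the caption does not say which estimator is used), and treating the 3-standard-deviation lines as an outlier cutoff is one possible reading of the plot.

```python
from statistics import mean, pstdev


def standardize_coverage(per_read_coverage):
    """Center per-read coverage by its mean and divide by its standard deviation.

    Assumption: population standard deviation (pstdev); the source does not
    specify the estimator.
    """
    mu = mean(per_read_coverage)
    sigma = pstdev(per_read_coverage)
    if sigma == 0:
        # All reads have identical coverage; nothing deviates from the mean.
        return [0.0 for _ in per_read_coverage]
    return [(c - mu) / sigma for c in per_read_coverage]


def flag_outliers(z_scores, threshold=3.0):
    """Return indices of reads whose standardized coverage exceeds the threshold.

    The default of 3.0 mirrors the 3-standard-deviation lines in the figure.
    """
    return [i for i, z in enumerate(z_scores) if abs(z) > threshold]


# Toy data: 19 reads with coverage 10 and one read with no coverage.
coverage = [10] * 19 + [0]
z = standardize_coverage(coverage)
low_coverage_reads = flag_outliers(z)
```

In this toy input, the single zero-coverage read sits far below the mean and is the only index returned, which matches the intuition that non-sense reads stand out as low-coverage outliers.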
Figure 2. Effects of the E. coli filter on an ONT A. thaliana dataset. Top panels were generated from the original dataset, and bottom panels show plots after E. coli read removal. (A, D) Distribution of per-read coverage. (B, E) GC content distributions. (C, F) Length distributions. Yellow boxes highlight the spikes that disappeared after E. coli read removal.