Abstract
BACKGROUND: Quality control checks are the first step in RNA-Sequencing analysis and enable the identification of common issues in the sequenced reads. Checks for sequence quality, contamination, and complexity are commonplace, and allow users to implement downstream steps that account for these issues. Strand-specificity of reads is frequently overlooked and is often unavailable even in published data, yet when it is unknown or incorrectly specified it can have detrimental effects on the reproducibility and accuracy of downstream analyses.
Keywords: Bioinformatics; Quality control; RNA-Sequencing
Year: 2022 PMID: 35065593 PMCID: PMC8783475 DOI: 10.1186/s12859-022-04572-7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Strandedness proportions in simulated RNA-Seq data. Four biological samples for each species were used to generate three simulated replicates each using polyester [17] at varying read numbers with either strand-specific or non-specific reads. All samples show the correct strandedness, with the strandedness proportion either below 0.6 (unstranded) or above 0.9 (stranded; dashed lines). how_are_we_stranded_here was run using the full Ensembl cDNA annotation for each species.
Fig. 2 Strandedness proportions in RNA-Seq data. Strandedness proportions were evaluated for 20 studies for each of H. sapiens, S. cerevisiae, and A. thaliana using how_are_we_stranded_here while varying the number of input reads sampled. Results are not included where zero reads were pseudoaligned, and triangles denote results where the proportion of reads pseudoaligned is less than 0.1. Studies for which the strandedness proportion was between 0.6 and 0.9 (dashed lines), and those which do not match the reported strandedness, are highlighted. how_are_we_stranded_here was run using the full Ensembl cDNA annotation for each species.
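The 0.6 and 0.9 thresholds marked by the dashed lines in both figures can be read as a simple classification rule. The following is a minimal Python sketch of that rule, not the how_are_we_stranded_here implementation. The function name `classify_strandedness`, the "ambiguous" label for the middle band, and the treatment of values exactly equal to 0.6 or 0.9 as ambiguous are all assumptions made for illustration.

```python
# Illustrative sketch only: the names and the boundary handling are assumptions,
# not taken from the how_are_we_stranded_here source code.

UNSTRANDED_MAX = 0.6  # proportions below this are treated as unstranded
STRANDED_MIN = 0.9    # proportions above this are treated as stranded


def classify_strandedness(proportion: float) -> str:
    """Map a strandedness proportion in [0, 1] to a coarse label."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError(f"proportion must be in [0, 1], got {proportion}")
    if proportion < UNSTRANDED_MAX:
        return "unstranded"
    if proportion > STRANDED_MIN:
        return "stranded"
    # Between the dashed lines (inclusive here, by assumption): flag for review.
    return "ambiguous"


if __name__ == "__main__":
    for value in (0.51, 0.75, 0.98):
        print(value, classify_strandedness(value))
```

A study labelled "ambiguous" by this sketch corresponds to the highlighted studies in Fig. 2 whose proportion falls between the dashed lines.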