Nicholas F Lahens, Thomas G Brooks, Dimitra Sarantopoulou, Soumyashant Nayak, Cris Lawrence, Antonijo Mrčela, Anand Srinivasan, Jonathan Schug, John B Hogenesch, Yoseph Barash, Gregory R Grant.
Abstract
BACKGROUND: The accurate interpretation of RNA-Seq data presents a moving target as scientists continue to introduce new experimental techniques and analysis algorithms. Simulated datasets are an invaluable tool to accurately assess the performance of RNA-Seq analysis methods. However, existing RNA-Seq simulators focus on modeling the technical biases and artifacts of sequencing, rather than on simulating the original RNA samples. A first step in simulating RNA-Seq is to simulate RNA.
Keywords: Benchmarking; RNA-Seq; Simulation
Year: 2021; PMID: 34563123; PMCID: PMC8467241; DOI: 10.1186/s12864-021-07934-2
Source DB: PubMed; Journal: BMC Genomics; ISSN: 1471-2164; Impact factor: 3.969
Fig. 1 Flow chart of the CAMPAREE pipeline. Diagram of flow through the CAMPAREE pipeline from FASTQ file input (top) to molecule file or FASTA output (bottom). Names for each step are listed on the left side. File types for the intermediates between steps are listed on the right side.
Fig. 2 Idealized coverage plots from CAMPAREE output. Representative coverage plots of real, STAR-aligned input data (pink, top) and CAMPAREE (idealized) output primed from the input data (black, bottom). Transcript models for the gene Derl2 are displayed below the coverage plots in dark blue. Image captured from the UCSC genome browser. Input and CAMPAREE data were generated from a mouse liver sample (9576; GSM2599715).
Fig. 3 CAMPAREE adds allele-specific expression support to BEERS, Polyester, and RSEM. Representative coverage plots of simulated RNA-Seq data created by Polyester (red), BEERS (blue), and RSEM (orange) from CAMPAREE output (black). Separate coverage plots for signal from each parental allele are displayed on the left and right. Transcript models for the gene Polr2j are displayed below the coverage plots in dark blue. Note that the Polyester, BEERS, and RSEM depth of coverage appears lower than that of CAMPAREE because these tools display coverage from short reads rather than full-length transcripts. Image captured from the UCSC genome browser.
Fig. 4 Variants introduced by CAMPAREE are maintained in BEERS and Polyester output. Coverage plots and alignments for reads simulated by BEERS and Polyester from the two terminal exons of Polr2j. Black rectangles highlight variants specific to each parental allele. Red lines on the left indicate a ‘T’ substitution present in all alignments from the parent 2 allele. Orange lines on the right indicate a ‘G’ substitution present in all alignments from the parent 2 allele. Similar results for RSEM are displayed in Additional file 7.