Michael T Wolfinger, Jörg Fallmann, Florian Eggenhofer, Fabian Amman.
Abstract
Recent achievements in next-generation sequencing (NGS) technologies have led to a high demand for reusable software components to easily compile customized analysis workflows for big genomics data. We present ViennaNGS, an integrated collection of Perl modules focused on building efficient pipelines for NGS data processing. It comes with functionality for extracting and converting features from common NGS file formats, computing and evaluating read mapping statistics, and normalizing RNA abundance. Moreover, ViennaNGS provides software components for identification and characterization of splice junctions from RNA-seq data, parsing and condensing sequence motif data, automated construction of Assembly and Track Hubs for the UCSC genome browser, as well as wrapper routines for a set of commonly used NGS command line tools.
Keywords: Perl; RNA-seq; next generation sequencing; pipelines; read mapping
Year: 2015 PMID: 26236465 PMCID: PMC4513691 DOI: 10.12688/f1000research.6157.2
Source DB: PubMed Journal: F1000Res ISSN: 2046-1402
Time and memory requirements of exemplary implementations of the ViennaNGS core modules, as implemented in the ViennaNGS tutorials.
Data were collected on a single core of a desktop workstation (Intel® Core™ i7-4771 CPU @ 3.50 GHz; 32 GB RAM).
| Script | Input | Run time | RAM |
|---|---|---|---|
| Tutorial #0 | 4 GB BAM file | 50m 30s | 5.1 GB |
| Tutorial #1 | 28 GB Fasta, 16 KB BED, 292 KB XML | 0m 38s | 219 MB |
| Tutorial #2 | 4 GB BAM, 28 GB Fasta, 16 KB BED | 7m 49s | 663 MB |
| Tutorial #3 | 5 MB BigBed, 4 MB BigWig, 4 MB BigBed, 3 MB BigWig | 0m 1s | 213 MB |
Overview of the complementary utilities shipped with ViennaNGS.
While some of these scripts are re-implementations of functionality available elsewhere, they have been developed primarily as reference implementations of the library routines to help prospective ViennaNGS users get started quickly with the development of custom pipelines.
| Utility | Description |
|---|---|
| | Construct Assembly Hubs for UCSC genome browser visualization |
| | Compute mapping/quality statistics along with publication-ready figures |
| | Split BAM files by strand |
| | Produce BigWig coverage profiles from BAM files for visualization |
| | Filter uniquely and multi mapped reads from BAM files |
| | Convert BED to (strand specific) BedGraph format |
| | Extend genomic intervals in BED format at the 5′, 3′, or both ends |
| | Convert bacterial RefSeq GFF3 annotation to BED12 format |
| | Count k-mers of predefined length in FastQ and Fasta files |
| | Compute basic statistics from MEME XML output |
| | Create a new genome database in a local UCSC genome browser |
| | Compute normalized expression data in RPKM and TPM from read |
| | Convert splice junctions in segemehl BED6 splice junction format to |
| | Identify and characterize splice junctions from RNA-seq data |
| | Construct Track Hubs for UCSC genome browser visualization |
| | Trim sequence and quality fields in FastQ format |
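The utilities listed above include normalization of expression data in RPKM and TPM. As an illustration only, the following Python sketch applies the standard textbook definitions of both measures to a list of per-feature read counts and feature lengths. It is not the ViennaNGS implementation (which is written in Perl); the function names `rpkm` and `tpm` and their parameters are hypothetical and chosen for this example.

```python
from typing import List, Optional, Sequence


def rpkm(counts: Sequence[float], lengths: Sequence[int],
         total_reads: Optional[float] = None) -> List[float]:
    """Reads per kilobase of feature per million mapped reads.

    counts      -- reads assigned to each feature
    lengths     -- feature lengths in base pairs
    total_reads -- library size; defaults to the sum of counts
                   (an assumption of this sketch)
    """
    if total_reads is None:
        total_reads = sum(counts)
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    # count / (length / 1e3) / (total_reads / 1e6)
    return [c * 1e9 / (l * total_reads) for c, l in zip(counts, lengths)]


def tpm(counts: Sequence[float], lengths: Sequence[int]) -> List[float]:
    """Transcripts per million: length-normalized rates scaled to sum to 1e6."""
    rates = [c / l for c, l in zip(counts, lengths)]
    total_rate = sum(rates)
    if total_rate <= 0:
        raise ValueError("at least one feature needs a positive count")
    return [r / total_rate * 1e6 for r in rates]


if __name__ == "__main__":
    example_counts = [10, 20, 30]          # hypothetical read counts
    example_lengths = [1000, 2000, 3000]   # hypothetical lengths in bp
    print(rpkm(example_counts, example_lengths))
    print(tpm(example_counts, example_lengths))
```

Unlike RPKM, TPM values always sum to one million within a sample, which makes relative abundances easier to compare across samples.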
Figure 1. Schematic overview of ViennaNGS components.
Core modules can be combined within a data analysis script in a flexible manner to meet individual analysis objectives and experimental setup.
Figure 2. Class diagram illustrating the relations among generic Moose classes which are used as abstract representations of genomic intervals (only attributes are shown).
Figure 3. Schematic representation of genomic interval classes in terms of their corresponding feature annotation.
Simple intervals (“features”) are characterized by ViennaNGS::Feature objects (bottom box). At the next level, ViennaNGS::FeatureChain bundles these, thereby maintaining individual annotation chains for, e.g., UTRs, exons, introns, and splice junctions (middle box). The topmost level is given by ViennaNGS::FeatureLine objects, representing individual transcripts.
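The three-level hierarchy described in this caption can be sketched in Python dataclasses. This is a conceptual analogue only: the actual ViennaNGS classes are Moose classes in Perl, and the attribute names used here (`kind`, `features`, `transcript_id`, `chains`), the `gaps` helper, and the 0-based half-open coordinate convention are assumptions made for this example, not the library's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Feature:
    """A simple genomic interval (0-based, half-open; assumed for this sketch)."""
    chromosome: str
    start: int
    end: int
    name: str = "."
    strand: str = "+"


@dataclass
class FeatureChain:
    """A bundle of features of one annotation type, e.g. exons or introns."""
    kind: str
    features: List[Feature] = field(default_factory=list)

    def gaps(self, kind: str) -> "FeatureChain":
        """Return the intervals between consecutive features as a new chain."""
        ordered = sorted(self.features, key=lambda f: f.start)
        between = [
            Feature(a.chromosome, a.end, b.start, f"{kind}_{i + 1}", a.strand)
            for i, (a, b) in enumerate(zip(ordered, ordered[1:]))
            if b.start > a.end  # skip touching or overlapping neighbours
        ]
        return FeatureChain(kind, between)


@dataclass
class FeatureLine:
    """One transcript, holding one annotation chain per annotation type."""
    transcript_id: str
    chains: Dict[str, FeatureChain] = field(default_factory=dict)

    def add_chain(self, chain: FeatureChain) -> None:
        self.chains[chain.kind] = chain


def build_example() -> FeatureLine:
    """Assemble a hypothetical three-exon transcript and derive its introns."""
    exons = FeatureChain("exon", [
        Feature("chr1", 100, 200, "exon_1"),
        Feature("chr1", 300, 400, "exon_2"),
        Feature("chr1", 500, 600, "exon_3"),
    ])
    transcript = FeatureLine("tx_demo")
    transcript.add_chain(exons)
    transcript.add_chain(exons.gaps("intron"))
    return transcript
```

Keeping each annotation type in its own chain means derived annotation, such as the intron chain here, can be attached to a transcript without modifying the exon chain it was computed from.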