Franco Milicchio, Rebecca Rose, Jiang Bian, Jae Min, Mattia Prosperi.
Abstract
BACKGROUND: High-throughput or next-generation sequencing (NGS) technologies have become an established and affordable experimental framework in biological and medical sciences for all basic and translational research. Processing and analyzing NGS data is challenging. NGS data are big, heterogeneous, sparse, and error prone. Although a plethora of tools for NGS data analysis has emerged in the past decade, (i) software development is still lagging behind data generation capabilities, and (ii) there is a 'cultural' gap between the end user and the developer.

TEXT: Generic software template libraries specifically developed for NGS can help in dealing with the former problem, whilst coupling template libraries with visual programming may help with the latter. Here we scrutinize the state-of-the-art low-level software libraries implemented specifically for NGS and graphical tools for NGS analytics. An ideal development environment for NGS should be modular (with a native library interface), scalable in computational methods (i.e. serial, multithreaded, distributed), transparent (platform-independent), interoperable (with external software interface), and usable (via an intuitive graphical user interface). These characteristics should facilitate both the running of standardized NGS pipelines and the development of new workflows based on technological advancements or users' needs. We discuss in detail the potential of a computational framework blending generic template programming and visual programming that addresses all of the current limitations.
Keywords: Big data; Generic programming; Graphical user interface; High-throughput sequencing; Next-generation sequencing; Software suite; Template library; Visual programming
Year: 2016 PMID: 27127540 PMCID: PMC4848821 DOI: 10.1186/s13040-016-0095-3
Source DB: PubMed Journal: BioData Min ISSN: 1756-0381 Impact factor: 2.522
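The abstract asks for an environment that is modular and scalable across serial, multithreaded, and distributed execution. One way to picture that requirement is an analysis step written once and run through interchangeable execution backends. The following is a minimal sketch in Python, not code from the reviewed libraries; the names `serial_backend`, `threaded_backend`, `run_step`, and `gc_content` are hypothetical and chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def serial_backend(func, items):
    """Apply func to each item one after another."""
    return [func(item) for item in items]


def threaded_backend(func, items, workers=4):
    """Apply func to each item with a thread pool; input order is preserved."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, items))


def gc_content(read):
    """Fraction of G/C bases in a read (a hypothetical analysis step)."""
    if not read:
        return 0.0
    return sum(base in "GCgc" for base in read) / len(read)


def run_step(func, reads, backend=serial_backend):
    """Run one analysis step; the step itself does not know how it is executed."""
    return backend(func, reads)


reads = ["ACGT", "GGCC", "ATAT"]
serial_result = run_step(gc_content, reads)
threaded_result = run_step(gc_content, reads, backend=threaded_backend)
```

Because the step and the backend are separate, a distributed backend could in principle be added with the same two-argument signature without touching `gc_content`; that extension is an assumption of this sketch, not something the source specifies.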
Summary of programming libraries/toolkits for analysis of (next-generation) sequencing data
| Library Name | Release | Programming Language | License | Website | Features |
|---|---|---|---|---|---|
| EMBOSS | 2000 | C | GNU GPL | | Sequence alignment; rapid database search; protein motif identification; nucleotide sequence pattern analysis; codon usage analysis for small genomes; rapid identification of sequence patterns in large scale sequence sets; presentation tools for publication. |
| BTL | 2001 | C++ | GNU GPL | | Data structures (e.g. graphs); nucleotide string methods (e.g. Fourier transform, Needleman-Wunsch alignment). |
| Bioperl | 2002 | Perl | Artistic License | | Access sequence data from local/remote data bases; manage data base formats; data base search; manipulating sequences/sequence alignments; gene annotations. |
| Bioconductor | 2003 | R | Artistic | | Repository of multiple libraries for analysis and comprehension of genomic and -omics data, including NGS. |
| BioPHP | 2003 | PHP | GNU GPL | | DNA and protein sequence analysis, sequence alignment. |
| GenomeTools | 2003 | C | Open BSD | | Parsing, compression, k-mer, suffix trees, annotation, error correction and other sequence analytics (FASTA, FASTQ) |
| Pizza&Chili | 2005 | C/C++ | GNU Lesser GPL | | Compressed indices, text collections |
| Bio++ | 2006 | C++ | CeCILL GPL | | Sequence analysis, phylogenetics, molecular evolution; population genetics. |
| Biojava | 2008 | Java | GNU Lesser GPL | | Manipulate biological sequences; file parse; DAS client/server support; access to BioSQL/Ensembl data bases; tools for making sequence analysis GUIs; statistical routines; dynamic programming toolkit. |
| SeqAn | 2008 | C++ | BSD 3-clause | | Extensive set of algorithms and data structures for the analysis of nucleotide sequences, with emphasis on NGS data; includes index, compression, data base search, support for NGS-specific file formats (fastq, SAM/BAM, VCF, BED). |
| Biopython | 2009 | Python, C | Biopython | | Sequence input/output; alignment input/output; population genetics; structural bioinformatics; SQL interface. |
| htslib | 2009 | C | MIT Expat | | Read, write, edit, index, view SAM/BAM/CRAM formats; read, write BCF2/VCF/gVCF files; call, filter, summarize SNP/short indels. |
| BioRuby | 2010 | Ruby | GNU GPL | | DNA and protein sequence analysis, sequence alignment, biological database parsing, ontology, structural biology. |
| BAMTools | 2011 | C++ | MIT | | Read, write, manipulate BAM formats |
| libStatGen | 2011 | C++ | GNU GPL | | Handle SAM/BAM, fastq, GLF, VCF, ASP. |
| NGS++ | 2013 | C++ | GNU Lesser GPL | | Read, write, manipulate multiple genomic file formats and data associated with BED type files (epigenomics). |
| Bioclojure | 2014 | Clojure | GNU Lesser GPL | | Parse of Genbank, Uniprot XML, fasta, fastq formats; wrappers for BLAST, signalP, TMHMM; index files for random access, lazy processing of sequences from very large files. |
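
Several of the libraries listed above advertise FASTQ parsing and k-mer operations as core features. To make those two terms concrete, here is a small standard-library Python sketch of a lazy four-line FASTQ reader and a k-mer counter. It does not reproduce the API of any listed library; the function names `read_fastq` and `count_kmers` and the two example reads are invented for illustration, and the reader assumes the common unwrapped layout of exactly four lines per record.

```python
import io
from collections import Counter


def read_fastq(handle):
    """Lazily yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], sequence, quality


def count_kmers(sequences, k):
    """Count every overlapping substring of length k across the sequences."""
    counts = Counter()
    for seq in sequences:
        for start in range(len(seq) - k + 1):
            counts[seq[start:start + k]] += 1
    return counts


# Invented example data: two six-base reads.
example = io.StringIO("@read1\nACGTAC\n+\nIIIIII\n@read2\nGTACGT\n+\nIIIIII\n")
records = list(read_fastq(example))
kmer_counts = count_kmers((seq for _, seq, _ in records), k=3)
```

The generator reads one record at a time, which is the same lazy-processing idea the table attributes to tools that handle very large files. Production libraries additionally handle compression, indexing, and format variants that this sketch ignores.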
Summary of all-purpose software suites for analysis of next-generation sequencing data offered with a graphical user interface option
| Software Name | License | Free | Platform | Installation | Workflow Builder | Website |
|---|---|---|---|---|---|---|
| BaseSpace | Proprietary | No | Web-browser | Cloud | No | |
| CLCBio | Proprietary | Trial | Web-browser | Server | Yes | |
| DNASTAR | Proprietary | Trial | MS Windows | Localhost | No | |
| Galaxy | GNU GPL | Yes | Web-browser | Localhost | Yes | |
| Geneious | Proprietary | Trial | MS Windows | Localhost | No | |
| Globus Genomics | Apache | Yes/No | Web-browser | Cloud | Yes | |
| Golden Helix | Proprietary | Trial | MS Windows | Localhost | No | |
| Partek | Proprietary | Trial | MS Windows | Localhost | No | |
| PATRIC | GNU GPL | Yes | Web-browser | Cloud | No | |
| Sequencher | Proprietary | Trial | MS Windows | Localhost | No | |
| SevenBridges | GNU GPL (Rabix) | Trial | Web-browser | Cloud | Yes | |
| SoftGenetics | Proprietary | Trial | MS Windows | Localhost | No | |
| UGENE | GNU GPL | Yes | MS Windows | Localhost | Yes | |
| Vector NTI | Proprietary | Trial | MS Windows | Localhost | No | |
Fig. 1. Example of a pipeline for single nucleotide variant calling from fastq files, using Galaxy’s (top) and UGENE’s (bottom) workflow builders
Fig. 2. Physiognomy of visual programming for development of tools for next-generation sequencing data analytics
Fig. 3. The conceptual visual programming (VP) framework for developing next-generation sequencing data analytics tools
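
The workflow builders and visual programming framework named in the figure captions can be thought of as dataflow graphs: each box is a processing step and each connection passes one step's output to the next. The sketch below shows one possible way to represent and run such a graph in Python, using the standard-library `graphlib` module (Python 3.9 or later) to order the steps. It is an assumption-laden toy, not how Galaxy, UGENE, or the authors' framework work internally; the step functions, the reference string, and the reads are all invented, and the "variant calling" is a plain position-by-position comparison rather than a real caller.

```python
from graphlib import TopologicalSorter

REFERENCE = "ACGTACGT"  # invented toy reference sequence


def load_reads():
    """Stand-in for reading a fastq file (hypothetical step)."""
    return ["ACGTACGT", "ACGAACGT", "ACGAACGT", "ACG"]


def filter_reads(reads):
    """Keep only reads as long as the reference (hypothetical quality filter)."""
    return [read for read in reads if len(read) == len(REFERENCE)]


def call_variants(reads):
    """Report (position, ref_base, alt_base) wherever a read differs from the reference."""
    variants = set()
    for read in reads:
        for pos, (ref, alt) in enumerate(zip(REFERENCE, read)):
            if ref != alt:
                variants.add((pos, ref, alt))
    return sorted(variants)


# Each node maps to (function, names of the nodes whose outputs it consumes).
WORKFLOW = {
    "load": (load_reads, []),
    "filter": (filter_reads, ["load"]),
    "call": (call_variants, ["filter"]),
}


def run_workflow(workflow):
    """Execute every node after all of its dependencies have produced a result."""
    graph = {name: set(deps) for name, (_, deps) in workflow.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        func, deps = workflow[name]
        results[name] = func(*(results[dep] for dep in deps))
    return results


results = run_workflow(WORKFLOW)
```

In this representation a graphical editor would only need to edit the `WORKFLOW` dictionary, while the step functions stay ordinary library code. That separation is the design point the sketch is meant to illustrate.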