Jost Waldmann, Jan Gerken, Wolfgang Hankeln, Timmy Schweer, Frank Oliver Glöckner.
Abstract
BACKGROUND: Advances in sequencing technologies challenge the efficient importing and validation of FASTA-formatted sequence data, which is still a prerequisite for most bioinformatic tools and pipelines. Comparative analysis of commonly used Bio*-frameworks (BioPerl, BioJava and Biopython) shows that their scalability and accuracy are hampered.
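To illustrate what validating FASTA-formatted data can involve, the following is a minimal sketch of a streaming, line-by-line check. It is not the FastaValidator API or the method of any of the frameworks named above; the function name `validate_fasta`, the error messages, and the accepted alphabet (IUPAC nucleotide codes plus `-` and `.` as gap characters) are assumptions made for this example.

```python
import io
import re

# Assumed alphabet: IUPAC nucleotide codes plus '-' and '.' as gap characters.
NUCLEOTIDE = re.compile(r"[ACGTURYSWKMBDHVNacgturyswkmbdhvn.\-]+")


def validate_fasta(handle, allowed=NUCLEOTIDE):
    """Stream over a FASTA text handle and return (record_count, errors).

    Each error is a (line_number, message) tuple. Only one line is held
    in memory at a time, so memory use does not grow with file size.
    """
    records = 0
    errors = []
    have_header = False
    seq_len = 0   # residues seen for the current record
    lineno = 0
    for lineno, raw in enumerate(handle, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            if have_header and seq_len == 0:
                errors.append((lineno, "previous record has no sequence"))
            if len(line) == 1:
                errors.append((lineno, "empty header"))
            have_header = True
            seq_len = 0
            records += 1
        elif not line.strip():
            continue  # blank lines are tolerated in this sketch
        elif not have_header:
            errors.append((lineno, "sequence data before first header"))
        else:
            if not allowed.fullmatch(line):
                errors.append((lineno, "invalid sequence character"))
            seq_len += len(line)
    if have_header and seq_len == 0:
        errors.append((lineno, "last record has no sequence"))
    return records, errors


if __name__ == "__main__":
    sample = io.StringIO(">seq1\nACGT\nNNAC\n>seq2\nacgu-\n")
    print(validate_fasta(sample))
```

Because the check reads one line at a time, it can be applied to an open file handle of any size. A protein alphabet or stricter header rules could be passed in or added in the same way.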
Year: 2014 PMID: 24929426 PMCID: PMC4094456 DOI: 10.1186/1756-0500-7-365
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Figure 1. Comparison of validation performance of three bioinformatic frameworks and FastaValidator. Validation performance of three bioinformatic frameworks in comparison to FastaValidator with six different data sets: (A) all protein sequences of Escherichia coli K-12, (B) the complete genome of Escherichia coli K-12, (C) the protein sequences of the SWISSPROT database, (D) the metagenomic sequence set from the sampling site 1103283000001 of the Global Ocean Sampling Expedition, (E) the unaligned complete rRNA genes of the SILVA database and (F) the aligned sequences of the SILVA SSU reference database. Missing bars indicate that the corresponding test failed.