Tyler Weirick, Raphael Müller, Shizuka Uchida.
Abstract
Despite advances in bioinformatics, custom scripts remain a source of difficulty, slowing workflow development and hampering reproducibility. Here, we introduce Vectools, a command-line tool suite that reduces reliance on custom scripts and improves reproducibility by offering a wide range of common, easy-to-use functions for table and vector manipulation. Vectools also offers a number of vector-related functions to speed up workflow development, such as simple machine learning and common statistics functions.
Keywords: bioinformatics; matrix; reproducibility; spreadsheet; vector; workflow
Year: 2018 PMID: 30631441 PMCID: PMC6281013 DOI: 10.12688/f1000research.16301.2
Source DB: PubMed Journal: F1000Res ISSN: 2046-1402
Figure 1. Comparison of Vectools and Coreutils.
(A) Joining more than two files requires a single command using Vectools. The same operation using Coreutils requires a custom script. The information regarding file sizes is omitted as whole files are shown. (B) Aggregating Gene Ontology terms by gene accession numbers with Vectools can be done with a simple command. The same operation using Coreutils requires a complex regular expression. Further, the regular expression does not work properly on macOS. The information regarding file sizes is omitted as whole files are shown. (C) Vectools also includes many operations unavailable in Coreutils, such as machine learning. Here, in five commands, we use supervised learning for homology-independent prediction of enzyme function. Using Vectools, we generated a support-vector machine model capable of predicting carbonic anhydrases with an estimated 99% accuracy and predicted 15,018 of 1,223,287 uncharacterized proteins as potential carbonic anhydrases. The size and dimensions of files used in the machine learning examples are shown in the image as comments. Additionally, methods, input, and output data can be found in the archived data and analysis pipelines [7].
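For readers without access to the figure, the two table operations in panels A and B can be illustrated with a minimal Python sketch. This is not Vectools code and does not reproduce its command-line interface; the function names (`join_two`, `join_many`, `aggregate`), the choice of an inner join on the first column, and the `;` separator are assumptions made purely for illustration.

```python
from collections import defaultdict
from functools import reduce


def join_two(left, right):
    """Inner-join two tables (lists of rows) on their first column."""
    index = defaultdict(list)
    for row in right:
        # Store the non-key columns of each right-hand row under its key.
        index[row[0]].append(row[1:])
    return [l + r for l in left for r in index.get(l[0], [])]


def join_many(tables):
    """Inner-join any number of tables on the first column, left to right."""
    return reduce(join_two, tables)


def aggregate(rows, sep=";"):
    """Collapse (key, value) rows that share a key into one row per key."""
    grouped = defaultdict(list)
    for key, value in rows:
        grouped[key].append(value)
    # Keys appear in the order in which they were first seen.
    return [[key, sep.join(values)] for key, values in grouped.items()]
```

With hypothetical toy tables, `join_many([a, b, c])` keeps only the keys present in all three inputs, and `aggregate` turns repeated accession rows into a single row listing every associated term. Both operations take a few lines here, but each project that rewrites them introduces a custom script of the kind the abstract argues against.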