Yözen Hernández1,2, Rocky Bernstein1, Pedro Pagan1, Levy Vargas1, William McCaig1, Girish Ramrattan1, Saymon Akther3, Amanda Larracuente1, Lia Di1, Filipe G Vieira4, Wei-Gang Qiu5,6,7.
Abstract
BACKGROUND: Automated bioinformatics workflows are more robust, easier to maintain, and yield more reproducible results when built with command-line utilities than with custom-coded scripts. Command-line utilities also relieve bioinformatics developers of the need to learn, or to interact directly with, biological software libraries. There is, however, a lack of command-line utilities that leverage popular open-source biological software toolkits such as BioPerl (http://bioperl.org) to make many of the well-designed, robust, and routinely used biological classes available to a wider base of end users.
Keywords: BioPerl; FASTA sequences; NEWICK tree; Sequence alignments; UNIX utilities
Year: 2018 PMID: 29499649 PMCID: PMC5833151 DOI: 10.1186/s12859-018-2074-9
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Four command-line utilities in BpWrapper. (a) bioseq reads sequences (in FASTA format by default) as input, renders them into Bio::Seq objects in BioPerl (blue), and generates sequence statistics (green) or a modified FASTA file (purple). (b) bioaln reads a sequence alignment (in CLUSTALW format by default) as input, renders it into a Bio::SimpleAlign object in BioPerl, and generates alignment statistics or a modified alignment. (c) biopop reads allelic sequences (in FASTA format by default) as input, renders them into Bio::PopGen objects, and generates SNP (single-nucleotide polymorphism) statistics. (d) biotree reads a phylogenetic tree (in NEWICK format by default) as input, renders it into a Bio::Tree::Tree object, and generates tree statistics or a modified tree. Note that since the Bio::PopGen class in BioPerl inherits from the Bio::SimpleAlign class, which in turn inherits from the Bio::Seq class, options in bioseq are applicable to alignments as well as to allelic sequences, and options in bioaln are applicable to allelic sequence alignments. Documentation of these utilities is self-contained through the Perl POD mechanism and viewable on the command line through the “perldoc” command or the “--help”, “-h”, or “--man” options. A reference card of all options and their usage is provided in Additional file 1.
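The read, render, and report flow that the caption describes for bioseq can be sketched in a few lines. The Python below is only an illustration of the idea, not BpWrapper's Perl implementation: `parse_fasta`, `revcom`, and the sample records are hypothetical names and data introduced here, and a plain (identifier, sequence) tuple stands in for a Bio::Seq object.

```python
from io import StringIO

# Complement table for unambiguous DNA bases; other characters pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def parse_fasta(handle):
    """Yield (identifier, sequence) pairs from a FASTA-formatted text handle."""
    seq_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # The identifier is the first whitespace-delimited token after '>'.
            header = line[1:].split()
            seq_id, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def revcom(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Hypothetical two-record input, made up for this sketch.
SAMPLE = ">B31 hypothetical record\nATGCCG\nTTA\n>N40\nGGGAAAC\n"
records = list(parse_fasta(StringIO(SAMPLE)))

# "Sequence statistics" branch: one identifier and length per record.
for seq_id, seq in records:
    print(seq_id, len(seq))

# "Modified FASTA" branch: the same records, reverse-complemented.
for seq_id, seq in records:
    print(f">{seq_id}\n{revcom(seq)}")
```

The two loops mirror the two output branches in panel a: one reports a statistic per sequence, the other writes the sequences back out in FASTA format after a transformation.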
A selection of options and their usage
| Utility | Option | Usage | Example |
|---|---|---|---|
| bioseq | --length, -l | Print lengths of sequences | bioseq -l foo.fasta |
| | --num-seq, -n | Print number of sequences | bioseq -n foo.fasta |
| | --composition, -c | Print base/residue composition | bioseq -c foo.fasta |
| | --revcom, -r | Reverse-complement sequences | bioseq -r foo.fasta |
| | --pick, -p | Pick sequences by identifiers | bioseq -p 'id:B31,N40' foo.fasta |
| | | Pick sequences by order | bioseq -p 'order:1-3' foo.fasta |
| | | Pick sequences by pattern | bioseq -p 're:B31' foo.fasta |
| | --delete, -d | Delete sequences by identifiers | bioseq -d 'id:B31,N40' foo.fasta |
| | | Delete sequences by order | bioseq -d 'order:1-3' foo.fasta |
| | | Delete sequences by pattern | bioseq -d 're:B31' foo.fasta |
| | --subseq, -s | Get a sub-sequence | bioseq -s '10,20' foo.fasta |
| | --translate, -t | Translate in the 1st reading frame | bioseq -t1 foo.fasta |
| | | Translate in three reading frames | bioseq -t3 foo.fasta |
| | | Translate in all six reading frames | bioseq -t6 foo.fasta |
| | --input, -i | Read a GenBank file | bioseq -i 'genbank' foo.gb |
| | --restrict | Print fragments from a restriction digest | bioseq --restrict 'EcoRI' foo.fasta |
| bioaln | --length, -l | Print alignment length | bioaln -l foo.aln |
| | --num-seq, -n | Print number of sequences | bioaln -n foo.aln |
| | --avg-pid, -a | Print average percent identity | bioaln -a foo.aln |
| | --pick, -p | Pick sequences by identifiers | bioaln -p 'id1, id2' foo.aln |
| | --delete, -d | Delete sequences by identifiers | bioaln -d 'id1, id2' foo.aln |
| | --slice, -s | Slice an alignment | bioaln -s '10,20' foo.aln |
| | | Slice to the end | bioaln -s '20,-' foo.aln |
| | | Slice from the start | bioaln -s '-,20' foo.aln |
| | --input, -i | Read a FASTA alignment | bioaln -i 'fasta' foo.fasta |
| | --output, -o | Write a PHYLIP alignment | bioaln -o 'phylip' foo.aln |
| | --concat, -A | Concatenate alignments | bioaln -A *.aln > concat.aln |
| | --pep2dna, -P | Generate a codon-based alignment | bioaln -P 'cds.fas' pep.aln > codon.aln |
| biopop | --segsites, -s | Print number of segregating sites | biopop -s pop.fasta |
| | --pi, -p | Print average nucleotide differences | biopop -p pop.fasta |
| | --mis-match, -m | Obtain pair-wise mismatch distribution | biopop -m pop.fasta |
| | --snp-coding, -c | Print coding SNP statistics | biopop -c pop.fasta |
| | --stats, -t | Print population statistics | biopop -t 'pi,theta' pop.fasta |
| biotree | --length, -l | Print total tree length | biotree -l foo.newick |
| | --mid-point, -m | Re-root at mid-point | biotree -m foo.newick |
| | --del-otus, -d | Delete OTUs by identifiers | biotree -d 'id1,id2' foo.newick |
| | --subset, -s | Obtain a sub-tree of specified OTUs | biotree -s 'id1,id2,id3,id4' foo.newick |
| | | Obtain a sub-tree from an internal node | biotree -s 'node1' foo.newick |
| | --reroot, -r | Re-root with an outgroup | biotree -r 'otu1' foo.newick |
| | --del-low-boot, -D | Delete low-support branches | biotree -D '75' foo.newick |
| | --dist-all | Print pair-wise OTU distances | biotree --dist-all foo.newick |
| | --as-text, -t | Preview tree in ASCII | biotree -t foo.newick |
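
For readers unfamiliar with the population statistics named in the biopop rows, the sketch below spells out the textbook definitions of segregating sites, average pairwise nucleotide differences, and the pairwise mismatch distribution. It is an assumption-laden illustration rather than a description of how biopop computes or scales its output: the function names and the four aligned alleles are hypothetical, gaps and ambiguity codes are treated as ordinary characters, and the average is reported per pair rather than per site.

```python
from collections import Counter
from itertools import combinations


def pairwise_differences(alignment):
    """Return the number of mismatched columns for every pair of sequences."""
    return [
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(alignment, 2)
    ]


def segregating_sites(alignment):
    """Count alignment columns in which more than one character is observed."""
    return sum(1 for column in zip(*alignment) if len(set(column)) > 1)


def mean_pairwise_differences(alignment):
    """Average number of differences over all sequence pairs (0.0 if no pairs)."""
    diffs = pairwise_differences(alignment)
    return sum(diffs) / len(diffs) if diffs else 0.0


def mismatch_distribution(alignment):
    """Map each observed pairwise difference count to the number of pairs showing it."""
    return dict(sorted(Counter(pairwise_differences(alignment)).items()))


# Hypothetical aligned alleles of equal length, made up for this sketch.
ALLELES = ["ATGCA", "ATGTA", "ACGTA", "ATGCA"]

print("segregating sites:", segregating_sites(ALLELES))
print("mean pairwise differences:", round(mean_pairwise_differences(ALLELES), 3))
print("mismatch distribution:", mismatch_distribution(ALLELES))
```

In this toy alignment only the second and fourth columns vary, so two sites segregate, and the six sequence pairs differ at 0, 1, or 2 positions.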