Abstract
The amount of data used in phylogenetics has grown explosively in recent years, and many phylogenies are now inferred from hundreds or even thousands of loci and many taxa. These modern phylogenomic studies often entail separate analyses of each locus in addition to multiple analyses of subsets of genes or concatenated sequences. Computationally efficient tools for handling and computing properties of thousands of single-locus or large concatenated alignments are needed. Here I present AMAS (Alignment Manipulation And Summary), a tool that can be used either as a stand-alone command-line utility or as a Python package. AMAS works on amino acid and nucleotide alignments and combines sequence manipulation capabilities with a function that calculates basic statistics. The manipulation functions include conversions among popular formats, concatenation, extracting sites and splitting according to a pre-defined partitioning scheme, creation of replicate data sets, and removal of taxa. The statistics calculated include the number of taxa, alignment length, total count of matrix cells, overall number of undetermined characters, percent of missing data, AT and GC contents (for DNA alignments), count and proportion of variable sites, count and proportion of parsimony informative sites, and counts of all characters relevant for a nucleotide or amino acid alphabet. AMAS is particularly suitable for very large alignments with hundreds of taxa and thousands of loci. It is computationally efficient, utilizes parallel processing, and performs better at concatenation than other popular tools. AMAS is a Python 3 program that relies solely on Python's core modules and needs no additional dependencies. AMAS source code and manual can be downloaded from http://github.com/marekborowiec/AMAS/ under the GNU General Public License.
Keywords: Alignment properties; Bioinformatics; Concatenation; Phylogenetics; Phylogenomics
Year: 2016 PMID: 26835189 PMCID: PMC4734057 DOI: 10.7717/peerj.1660
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
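The summary statistics listed in the abstract can be illustrated with a minimal Python sketch. This is not AMAS's own code: the function name `summarize`, the dictionary-based alignment representation, the choice to treat `?` and `-` as undetermined characters, and the definition of AT content as (A+T)/(A+C+G+T) are all assumptions made for illustration. A site is counted as variable when it shows at least two distinct determined states, and as parsimony informative when at least two states each occur in at least two taxa.

```python
from collections import Counter

# Assumption: '?' and '-' are the undetermined characters.
MISSING = frozenset("?-")


def summarize(alignment):
    """Return basic statistics for a DNA alignment.

    `alignment` is a hypothetical representation: a dict mapping
    taxon name -> aligned sequence, all of equal length.
    """
    seqs = [seq.upper() for seq in alignment.values()]
    if not seqs:
        raise ValueError("alignment is empty")
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("sequences must have equal length")

    cells = len(seqs) * length
    counts = Counter("".join(seqs))
    missing = sum(counts[char] for char in MISSING)

    variable = 0
    informative = 0
    for column in zip(*seqs):
        # Count only determined states at this site.
        states = Counter(char for char in column if char not in MISSING)
        if len(states) > 1:
            variable += 1
        # Informative: at least two states, each seen in >= 2 taxa.
        if sum(1 for n in states.values() if n >= 2) >= 2:
            informative += 1

    at = counts["A"] + counts["T"]
    gc = counts["G"] + counts["C"]
    acgt = at + gc
    return {
        "taxa": len(seqs),
        "length": length,
        "cells": cells,
        "missing": missing,
        "missing_percent": 100 * missing / cells if cells else 0.0,
        "at_content": at / acgt if acgt else 0.0,
        "gc_content": gc / acgt if acgt else 0.0,
        "variable": variable,
        "prop_variable": variable / length if length else 0.0,
        "informative": informative,
        "prop_informative": informative / length if length else 0.0,
        "char_counts": dict(counts),
    }


example = {
    "t1": "ACGTA-",
    "t2": "ACGAA?",
    "t3": "ATGAAC",
    "t4": "ATGTGC",
}
stats = summarize(example)
print(stats["variable"], stats["informative"], stats["missing"])
```

In this toy alignment the fifth site is variable but not parsimony informative, because the state `G` occurs in only one taxon.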
Overview of functions available in AMAS, FASconCAT-G, and Phyutility.
| Function | AMAS | FASconCAT-G | Phyutility |
|---|---|---|---|
| Input formats | fasta phylip nexus | clustal fasta phylip nexus | fasta nexus |
| Concatenation | yes | yes | yes |
| Splitting or site extraction | yes | yes | yes (gaps only) |
| Summary statistics | yes | yes | no |
| Replicate alignments | yes | no | no |
| Taxon removal | yes | no | no |
| Translation | no | yes | no |
| RY coding | no | yes | no |
| Consensus sequences | no | yes | no |
| NCBI interactions | no | no | yes |
| Tree manipulations | no | no | yes |
Figure 1. AMAS functionality.
(A) Concatenation of two FASTA files. (B) File format conversion from FASTA to PHYLIP. (C) Splitting of a concatenated alignment according to pre-defined partitions. (D) Removal of sequences by name. (E) Creation of randomized replicates from input alignments. All input files are preserved, here indicated in blue. Orange files represent the output. Command line examples are given along with each action.
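Panels A and C describe concatenation and its inverse, splitting by partitions. The Python sketch below shows one way these two operations could work on in-memory alignments. It is an illustration only and does not reproduce AMAS's implementation or interface: the function names `concatenate` and `split`, the dict-based alignment representation, the 1-based inclusive partition coordinates, and the use of `?` to pad taxa absent from a locus are assumptions.

```python
def concatenate(alignments, missing="?"):
    """Join per-locus alignments into one supermatrix.

    `alignments` is a list of dicts (taxon name -> aligned sequence).
    Taxa absent from a locus are padded with `missing` (an assumption).
    Returns the supermatrix and a list of (start, end) partitions,
    1-based and inclusive.
    """
    taxa = sorted({name for aln in alignments for name in aln})
    pieces = {name: [] for name in taxa}
    partitions = []
    start = 1
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("each locus must be a non-empty aligned block")
        length = lengths.pop()
        for name in taxa:
            pieces[name].append(aln.get(name, missing * length))
        partitions.append((start, start + length - 1))
        start += length
    matrix = {name: "".join(parts) for name, parts in pieces.items()}
    return matrix, partitions


def split(matrix, partitions):
    """Cut a supermatrix back into per-partition alignments."""
    return [
        {name: seq[start - 1:end] for name, seq in matrix.items()}
        for start, end in partitions
    ]


loci = [
    {"t1": "ACG", "t2": "ACT"},
    {"t2": "GG", "t3": "GA"},
]
matrix, partitions = concatenate(loci)
print(matrix)
print(partitions)
print(split(matrix, partitions))
```

In this sketch, splitting keeps taxa that consist only of padding in a given partition; a real tool may choose to drop such all-missing sequences instead.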
Select statistics of benchmark alignments.
| Alignment name | Data type | Length | Total cells | Missing percent | Prop. variable | Prop. parsimony inf. |
|---|---|---|---|---|---|---|
| | Amino acid | 3,001,657 | 57,031,483 | 46.42 | 0.53 | 0.28 |
| | Amino acid | 1,313,129 | 189,090,576 | 64.43 | 0.76 | 0.59 |
| | Nucleotide | 9,251,694 | 453,333,006 | 19.46 | 0.64 | 0.44 |
| | Nucleotide | 13,557,123 | 704,970,396 | 16.63 | 0.51 | 0.35 |
Figure 2. Performance.
(A) Computing times for concatenation of the Johnson et al. (2013) data set composed of 5,214 separate alignments. FASconCAT-G was run in two modes: with and without simultaneous computation of alignment summaries (FASconCAT-G v1.02.pl -s and FASconCAT-G v1.02.pl -s -i, respectively). Phyutility was run with java -jar phyutility.jar -concat -in *fas -out phyut j2013-test. (B) Computing times for AMAS writing alignment summaries on the four benchmark data sets (AMAS.py summary command).