Abstract
Since its first release in 2001 as mainly a software package for phylogenetic analysis, data analysis for molecular biology and evolution (DAMBE) has gained many new functions that may be classified into six categories: 1) sequence retrieval, editing, manipulation, and conversion among more than 20 standard sequence formats including MEGA, NEXUS, PHYLIP, GenBank, and the new NeXML format for interoperability, 2) motif characterization and discovery functions such as position weight matrix and Gibbs sampler, 3) descriptive genomic analysis tools with improved versions of codon adaptation index, effective number of codons, protein isoelectric point profiling, RNA and protein secondary structure prediction and calculation of minimum folding energy, and genomic skew plots with optimized window size, 4) molecular phylogenetics including sequence alignment, testing substitution saturation, distance-based, maximum parsimony, and maximum-likelihood methods for tree reconstructions, testing the molecular clock hypothesis with either a phylogeny or with relative-rate tests, dating gene duplication and speciation events, choosing the best-fit substitution models, and estimating rate heterogeneity over sites, 5) phylogeny-based comparative methods for continuous and discrete variables, and 6) graphic functions including secondary structure display, optimized skew plot, hydrophobicity plot, and many other plots of amino acid properties along a protein sequence, tree display and drawing by dragging nodes to each other, and visual searching of the maximum parsimony tree. DAMBE features a graphic, user-friendly, and intuitive interface and is freely available from http://dambe.bio.uottawa.ca (last accessed April 16, 2013).
Keywords: Gibbs sampler; bioinformatics; codon usage; dating; genomic analysis; hidden Markov model; motif discovery; phylogenetics; secondary structure
Year: 2013 PMID: 23564938 PMCID: PMC3684854 DOI: 10.1093/molbev/mst064
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Categorization of DAMBE Functions.
| Sequence retrieval and format conversion | Read/write sequence file in standard formats (PHYLIP, NEXUS, PAML, FASTA, GenBank, Clustal, GCG, NeXML, Trace, etc.) |
| Sequence manipulation | Parsing GenBank files into CDS, exons, introns, exon/intron junctions, rRNA, tRNA, upstream, and downstream of sequence features; concatenate sequences from different files; editing sequences; extract first, second, or third codon sites |
| Motif characterization and discovery | Position weight matrix for characterizing and predicting sequence motifs; perceptron for two-group classification of sequence motifs; Gibbs sampler for characterizing and predicting novel/hidden sequence motifs, etc. |
| Secondary structure | RNA secondary structure prediction and computation of minimum folding energy (MFE) based on the Vienna RNA library; hidden Markov model for predicting protein secondary structure based on training sequences |
| Sequence feature characterization | Codon adaptation index; effective number of codons; RSCU; protein isoelectric point; energetic cost of proteins; skew plots with optimized window size; Z curve |
| Sequence alignment | Global alignment of nucleotide and AA sequences by ClustalW, mapping codon sequences to aligned AA sequences for all known genetic codes; local alignment by FASTA algorithms |
| Distances | Robust simultaneous distance estimation by likelihood and LS methods for F84, TN93, and GTR models; codon-based distances for all known genetic codes; patristic distance from input trees; R-F partition distance |
| Distance-based phylogenetic methods | NJ and FastME; statistical test of alternative topologies; use one matrix to evaluate multiple alternative trees, and multiple sets of aligned sequences or distance matrices to compute a consensus tree |
| Parsimony method | Searching for MP tree and test for alternative topologies; dragging nodes to each other to find the MP tree |
| Likelihood method | fastDNAML but with estimated s/v ratio; ML analysis based on PAML codes; statistical tests of alternative topologies |
| Substitution models | Finding the best-fitting substitution model by using the likelihood ratio test and information theoretic indices (AIC, BIC); estimate rate heterogeneity over sites and proportion of invariant sites |
| Testing molecular clock | Likelihood-based relative rate test for nucleotide and codon sequences (for all known genetic codes); phylogeny-based test using ML and LS methods |
| Dating speciation or gene duplication events | Regular dating with internal node calibration, with single or multiple soft or hard calibration points, and tip dating frequently used for viruses sampled at different years. |
| Substitution saturation | Test for the presence of phylogenetic information, graphic substitution saturation plot |
| Comparative methods | Independent contrasts for continuous variables and character association for discrete/binary characters |
| Detecting recombination | Simplot; Boot-Scan; compatibility method |
| Other phylogenetic functions | A versatile tree-displaying panel for exporting high-quality trees for publication; tree-drawing by dragging nodes to each other; mapping nucleotide, AA and codon substitutions to tree branches |
| Other graphic functions | In silico 2D gel; plotting AA properties along sequences (e.g., hydrophobicity plot) |
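The RSCU (relative synonymous codon usage) entry in the table can be illustrated with a short calculation. The sketch below is not DAMBE's code; it assumes the standard genetic code (NCBI translation table 1), an in-frame coding sequence, and the usual definition in which a codon's count is divided by the mean count of all synonymous codons for the same amino acid. The function name `rscu` is a hypothetical name chosen for this example.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered with T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for every sense codon whose amino acid occurs in cds.

    Illustrative helper (not part of DAMBE). Assumes cds is in frame.
    """
    cds = cds.upper().replace("U", "T")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group codons into synonymous families keyed by amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    result = {}
    for aa, family in families.items():
        if aa == "*":
            continue  # stop codons are excluded
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from the sequence
        for c in family:
            # observed count divided by the mean count within the family
            result[c] = counts[c] * len(family) / total
    return result


if __name__ == "__main__":
    for codon, value in sorted(rscu("GCTGCTGCCAAA").items()):
        print(codon, round(value, 3))
```

An RSCU of 1 means a codon is used as often as expected under equal usage of its synonyms; values above 1 indicate preferred codons.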
Gibbs sampler in action. The yeast (Saccharomyces cerevisiae) intron sequences in the top panel represent the input to the Gibbs sampler. The bottom panel represents part of the output showing the identified motif (i.e., TAATAAC, bolded) shared among the sequences. Output from DAMBE5 also includes the PWM, the significance tests associated with the PWM, and the PWM scores for individual motifs as a measure of motif strength, which is correlated with splicing efficiency. The input intron sequence file (YeastAllIntron.fas) is in the DAMBE installation directory in FASTA format. Modified from Xia (2012b).
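The position weight matrix (PWM) and PWM scores mentioned in this caption can be sketched as follows. This is a minimal illustration, not DAMBE's implementation: it assumes the motif sites are already aligned (for example, as reported by a Gibbs sampler), a uniform background of 0.25 per nucleotide, and a pseudocount of 0.5. The function names and the toy sites are hypothetical.

```python
import math

ALPHABET = "ACGT"


def build_pwm(sites, background=None, pseudocount=0.5):
    """Build a log2-odds PWM from equal-length aligned sites.

    The uniform background and the pseudocount are assumptions of this sketch.
    """
    if background is None:
        background = {b: 0.25 for b in ALPHABET}
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")

    pwm = []
    for j in range(width):
        column = [s[j] for s in sites]
        row = {}
        for b in ALPHABET:
            # pseudocount-smoothed site frequency of base b at position j
            freq = (column.count(b) + pseudocount) / (
                len(sites) + pseudocount * len(ALPHABET)
            )
            row[b] = math.log2(freq / background[b])
        pwm.append(row)
    return pwm


def pwm_score(pwm, word):
    """Sum of position-specific log-odds for a word as long as the PWM."""
    return sum(row[b] for row, b in zip(pwm, word))


def best_hit(pwm, seq):
    """Return (score, start) of the highest-scoring window in seq."""
    w = len(pwm)
    return max((pwm_score(pwm, seq[i:i + w]), i) for i in range(len(seq) - w + 1))


if __name__ == "__main__":
    toy_sites = ["TAATAAC", "TAATAAC", "TAATAAT", "GAATAAC"]  # made-up data
    pwm = build_pwm(toy_sites)
    print(best_hit(pwm, "GGGGTAATAACGGG"))
```

A higher score means a closer match to the motif relative to the background, which is the sense in which a PWM score measures motif strength.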
Skew plots of the Bacillus subtilis genome at three different window sizes, with the skew curve colored in red having the optimal window size. The horizontal line is the global GC skew computed from the entire genome.
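For readers unfamiliar with skew plots, the quantity plotted can be sketched as below, assuming the common definition GC skew = (G - C) / (G + C) computed in windows along the sequence. The window size is a free parameter here; the procedure DAMBE uses to choose the optimal window size is not reproduced. Function names are hypothetical.

```python
def gc_skew(seq):
    """GC skew (G - C) / (G + C) of a sequence; 0.0 if it has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def windowed_gc_skew(seq, window, step):
    """Return [(window midpoint, GC skew)] for windows taken every `step` bases."""
    return [
        (start + window // 2, gc_skew(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    toy = "GGGGCCCC"  # made-up sequence
    print(windowed_gc_skew(toy, window=4, step=4))
    print(gc_skew(toy))  # global skew, analogous to the horizontal line
```

Calling `gc_skew` on the whole sequence gives the global value that the horizontal line in the figure represents.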
Hydrophobicity plot for human (NP_000530.1) and avian (Emberiza bruniceps: AFK10338) rhodopsin with seven transmembrane domains (peaks). The weak 7th peak is due to a relatively short α-helix. Output from DAMBE. A sliding window of 12 AAs is used.
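A sliding-window hydrophobicity profile of the kind shown in this figure can be sketched as follows. The window of 12 AAs comes from the caption; the Kyte-Doolittle hydropathy scale is an assumption of this example, since the source does not state which scale DAMBE uses. The function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale for this sketch).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_profile(protein, window=12):
    """Mean hydropathy of each window of `window` residues along the protein.

    Raises KeyError for residues outside the 20 standard amino acids.
    """
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    toy = "I" * 12 + "D" * 12  # made-up sequence: hydrophobic then hydrophilic
    print(hydrophobicity_profile(toy))
```

Plotting the returned values against window position gives a curve in which stretches of high mean hydropathy appear as peaks, as the transmembrane helices do in the figure.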
Regular dating with internal node calibration (top panel) and tip dating with sampling times (specified in OTU name following “@”) for calibration, based on the LS method in DAMBE.