DAMBE7: New and Improved Tools for Data Analysis in Molecular Biology and Evolution.

Xuhua Xia

Abstract

DAMBE is a comprehensive software package for genomic and phylogenetic data analysis on Windows, Linux, and Macintosh computers. New functions include imputing missing distances and phylogeny simultaneously (paving the way to building large phage and transposon trees), new bootstrapping/jackknifing methods for PhyPA (phylogenetics from pairwise alignments), and an improved function for fast and accurate estimation of the shape parameter of the gamma distribution for fitting rate heterogeneity over sites. The previous method corrects for multiple hits at each site independently, whereas DAMBE's new method uses all sites simultaneously for the correction. DAMBE, featuring a user-friendly graphic interface, is freely available from http://dambe.bio.uottawa.ca (last accessed April 17, 2018).

Year:  2018        PMID: 29669107      PMCID: PMC5967572          DOI: 10.1093/molbev/msy073

Source DB:  PubMed          Journal:  Mol Biol Evol        ISSN: 0737-4038            Impact factor:   16.240


DAMBE is a software package for descriptive and comparative sequence analysis (Xia 2013, 2017). It features a graphic, user-friendly, and intuitive interface, and is available free for Windows, Linux, and Macintosh computers at dambe.bio.uottawa.ca. DAMBE7 represents a major upgrade with many new functions, including new sets of significance tests for position weight matrices and the Gibbs sampler for de novo characterization of sequence motifs. Here I outline the three functions most relevant to molecular evolution and phylogenetics. A supplemental file (Using_New_Functions.docx) is included in Supplementary Material online.

Imputing Missing Distance and Phylogeny Simultaneously

This function is implemented for building large trees of phages, which often 1) are too diverged to build a multiple sequence alignment (MSA), and 2) do not share homologous genes/sites (e.g., S3 and S4 in fig. 1). The same is true for many transposons, from which one cannot get a meaningful MSA, so that researchers are limited to aligning the sequences against the consensus (Gallus et al. 2015). One can do pairwise alignment among most of the sequences and compute their distances, but some sequence pairs do not share homologous sites and need to have their distances imputed from the computable distances. This allows one to build trees and is likely to revolutionize phage taxonomy, which is not currently based on phylogeny.
Fig. 1. Illustration of distance imputation and estimation of the shape parameter in gamma distribution. (a) A sequence data set with S3 and S4 sharing no homologous sites to estimate distance. (b) Distance matrix with two shaded distances missing. (c) Tree reconstructed from the distance matrix in (b). (d) A case with a nonunique solution for a missing distance between bonobo and chimpanzee. (e) Tree reconstructed from a multiple alignment with one site mapped to the leaves, together with one of several possible reconstructions of internal nodes. (f) Counting changes between neighboring nodes and correction for multiple hits. (g) Transitions and transversions at three sites illustrating independently estimated distance (DIE) and simultaneously estimated distance (DSE).

This distance-imputation function is currently missing from commonly used software. MEGA (Kumar et al. 2016) does not impute missing distances, and neither does PHYLIP's DNADIST (Felsenstein 2014). The Fitch and Kitsch programs can estimate missing distances, but only if a user tree is provided. For a distance matrix with N missing distances (parameters), DAMBE searches the tree space and parameter space to find the tree and the N parameter values that minimize

$$RSS = \sum_{i<j} \frac{\left(D_{ij} - E(D_{ij})\right)^2}{D_{ij}^{m}} \qquad (1)$$

where D_ij and E(D_ij) are the observed and patristic distances between taxa i and j, and m is typically 0, 1, or 2. Figure 1c is the phylogenetic tree reconstructed from the distance matrix in figure 1b with two shaded distances missing. For sequences such as those in figure 1a, DAMBE will compute all computable distances and impute the missing distances. When bootstrapping/jackknifing is used, distance imputation and phylogeny inference are done for each resampled data set. One may also have unaligned sequence data and use PhyPA (Xia 2016) to build phylogenetic trees and obtain bootstrap/jackknife support.

There are cases where a unique solution cannot be obtained. For example, when a missing distance is between two sister taxa (e.g., bonobo and chimpanzee in fig. 1d), we can find the minimum RSS, but the solution for the missing Dij is not unique: different values of the missing Dij result in the same minimum RSS. The patristic distances Dp.bonobo.i and Dp.chimpanzee.i, where i stands for any other species, do not change when x1 changes to x2 (fig. 1d), and therefore neither does RSS in equation (1). DAMBE uses the midpoint distance in such cases.
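The nonuniqueness for sister taxa can be checked numerically. The following is a minimal sketch, not DAMBE's implementation. It assumes that equation (1) is the usual weighted least-squares criterion, RSS = sum of (D - E(D))^2 / D^m over the observed pairs, and that moving from x1 to x2 corresponds to sliding the node that joins the two sister taxa along the internal branch. The four-taxon tree, branch lengths, and distances are made-up values for illustration only.

```python
# Hypothetical illustration: all taxa, branch lengths, and distances are made up.

def patristic(b, c, e, h, g):
    """Path lengths on the unrooted tree ((bonobo,chimp),(human,gorilla)).

    b, c, h, g are terminal branch lengths; e is the internal branch.
    """
    return {
        ("bonobo", "chimp"): b + c,
        ("bonobo", "human"): b + e + h,
        ("bonobo", "gorilla"): b + e + g,
        ("chimp", "human"): c + e + h,
        ("chimp", "gorilla"): c + e + g,
        ("human", "gorilla"): h + g,
    }

def rss(observed, expected, m=2):
    """Weighted residual sum of squares over the observed (non-missing) pairs."""
    total = 0.0
    for pair, d in observed.items():
        if d is None:  # missing distance: nothing to compare against
            continue
        total += (d - expected[pair]) ** 2 / d ** m
    return total

# Observed distances; the bonobo-chimp distance is missing.
observed = {
    ("bonobo", "chimp"): None,
    ("bonobo", "human"): 0.012,
    ("bonobo", "gorilla"): 0.015,
    ("chimp", "human"): 0.013,
    ("chimp", "gorilla"): 0.016,
    ("human", "gorilla"): 0.014,
}

# Two trees that differ only in where the bonobo/chimp node sits:
# sliding it by t lengthens both terminal branches and shortens e.
t = 0.001
tree_x1 = patristic(b=0.002, c=0.003, e=0.004, h=0.005, g=0.008)
tree_x2 = patristic(b=0.002 + t, c=0.003 + t, e=0.004 - t, h=0.005, g=0.008)

rss_x1 = rss(observed, tree_x1)
rss_x2 = rss(observed, tree_x2)
imputed_x1 = tree_x1[("bonobo", "chimp")]
imputed_x2 = tree_x2[("bonobo", "chimp")]
print(rss_x1, rss_x2, imputed_x1, imputed_x2)
```

Both trees give the same RSS up to rounding, yet they imply different bonobo-chimpanzee distances, which is why an extra rule (the midpoint, in DAMBE's case) is needed to pick a single value.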

Bootstrap/Jackknife Support for PhyPA

For each pair of sequences, we can obtain a vector S of 10 N_ij values, where N_ij is the number of aligned site pairs with nucleotide i in one sequence and j in the other. With 10 sequences and 45 pairwise comparisons, there are 45 S vectors from which we can compute the 45 pairwise distances. For bootstrapping/jackknifing, we simply resample the aligned sites of each pair to generate a new S vector, and use the 45 resampled S vectors to produce a new set of 45 pairwise distances from which a tree can be reconstructed. This function complements the function of phylogenetics with imputed missing distances.
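The resampling step can be sketched as follows. This is an illustrative sketch under stated assumptions, not PhyPA's code: the function names, the toy alignments, and the use of the JC69 correction are my own choices (the text does not say which distance is computed from S), and tree building from the resampled distances is left out.

```python
import math
import random
from itertools import combinations_with_replacement

# The 10 unordered nucleotide pairs: AA, AC, AG, AT, CC, CG, CT, GG, GT, TT.
PAIRS = list(combinations_with_replacement("ACGT", 2))

def s_vector(columns):
    """Count each unordered nucleotide pair over aligned columns (gaps skipped)."""
    counts = dict.fromkeys(PAIRS, 0)
    for x, y in columns:
        key = (x, y) if x <= y else (y, x)
        if key in counts:  # ignores columns containing gaps or ambiguity codes
            counts[key] += 1
    return counts

def jc69(counts):
    """JC69 distance from an S vector; None if it cannot be computed."""
    n = sum(counts.values())
    if n == 0:
        return None
    p = sum(v for (x, y), v in counts.items() if x != y) / n
    if p >= 0.75:
        return None
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def bootstrap_distances(pairwise_alignments, rng):
    """One bootstrap replicate: resample columns within each pairwise alignment."""
    replicate = {}
    for pair, (a, b) in pairwise_alignments.items():
        columns = list(zip(a, b))
        resampled = [rng.choice(columns) for _ in columns]
        replicate[pair] = jc69(s_vector(resampled))
    return replicate

# Hypothetical pairwise alignments; each pair is aligned on its own,
# so the alignment lengths need not agree between pairs.
alignments = {
    ("s1", "s2"): ("ACGTACGTAC", "ACGTACGAAC"),
    ("s1", "s3"): ("ACGTAC-TACGG", "ACTTACGTACGA"),
    ("s2", "s3"): ("ACGTACGAAC", "ACTTACGTAC"),
}

rng = random.Random(2018)
replicates = [bootstrap_distances(alignments, rng) for _ in range(100)]
```

Each entry of `replicates` is a full set of pairwise distances, so a distance-based tree can be built per replicate and the support values tallied across replicates.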

An Improved Method for Estimating the Shape Parameter of Gamma Distribution

Substitution rate varies over sites, and the variation is particularly pronounced in protein-coding genes (Xia 1998). The method by Gu and Zhang (1997) uses the following probability density function (Johnson and Kotz 1969) to estimate α:

$$f(k) = \frac{\Gamma(\alpha + k)}{\Gamma(k+1)\,\Gamma(\alpha)} \left(\frac{\bar{k}}{\bar{k} + \alpha}\right)^{k} \left(\frac{\alpha}{\bar{k} + \alpha}\right)^{\alpha} \qquad (2)$$

where k, instead of being an integer, is replaced by the estimated number of substitutions per site, and $\bar{k}$ is the mean of k. The method's accuracy depends on the accuracy of the estimated k, which is obtained from a multiple alignment in two steps (fig. 1e and f): 1) construct a phylogenetic tree from the aligned sequences and reconstruct ancestral sequences at internal nodes (fig. 1e, showing one of several possible reconstructions for one site with nucleotides mapped to the leaves), and 2) perform pairwise comparisons between the two nodes on each side of a branch to obtain the observed number of substitutions per site, and apply a correction for multiple hits to get k (fig. 1f).

DAMBE improves this estimation in two ways. First, it uses simultaneous estimation (SE). Take the K80 model for example. At each site,

$$E(P) = \frac{1}{4} + \frac{1}{4} e^{-4D/(\kappa+2)} - \frac{1}{2} e^{-2D(\kappa+1)/(\kappa+2)} \qquad (3)$$

$$E(Q) = \frac{1}{2} - \frac{1}{2} e^{-4D/(\kappa+2)} \qquad (4)$$

where E(P) and E(Q) are the expected proportions of transitional and transversional differences, D is the K80 distance, and κ is the transition/transversion ratio, not to be confused with k in equation (2), which is the estimated number of substitutions for a site. Applying equations (3) and (4) to data from the three sites (fig. 1g) independently will generate one inapplicable case for site 2 (under DIE in fig. 1g, with IE for independent estimation). We can instead estimate all D and κ simultaneously by maximizing the following log-likelihood:

$$\ln L = \sum_{i=1}^{N} \left[ N_{s,i} \ln E(P_i) + N_{v,i} \ln E(Q_i) + N_{0,i} \ln\left(1 - E(P_i) - E(Q_i)\right) \right] \qquad (5)$$

where N is the number of sites, and N_s,i, N_v,i, and N_0,i are the recorded numbers of transitional differences, transversional differences, and no differences from pairwise comparisons along the tree between the nodes on each side of each branch at site i.

SE generates no inapplicable cases (DSE in fig. 1g) and leads to the second improvement: the use of more realistic models such as F84 or TN93 instead of the K80 correction in GZ-gamma (Gu and Zhang 1997). SE distances are used in MEGA (Tamura et al. 2004) and DAMBE (Xia 2009), which includes MLCompositeF84 and MLCompositeTN93 for the F84 and TN93 models, respectively, but SE has never been used in estimating the shape parameter.
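The contrast between independent and simultaneous estimation can be illustrated with a small sketch. It is not DAMBE's implementation and rests on several assumptions: the expected transition and transversion proportions take the standard K80 form with κ defined as the ratio of transitional to transversional rates, κ is shared across sites while each site has its own D, and a plain grid search stands in for whatever optimizer DAMBE uses. The per-site counts are made up and do not reproduce figure 1.

```python
import math

def k80_expected(d, kappa):
    """Expected transition (P) and transversion (Q) proportions under K80."""
    a = math.exp(-4.0 * d / (kappa + 2.0))
    b = math.exp(-2.0 * d * (kappa + 1.0) / (kappa + 2.0))
    return 0.25 + 0.25 * a - 0.5 * b, 0.5 - 0.5 * a

def k80_independent(ns, nv, n0):
    """Closed-form K80 distance for one site; None when it is inapplicable."""
    n = ns + nv + n0
    p, q = ns / n, nv / n
    x, y = 1.0 - 2.0 * p - q, 1.0 - 2.0 * q
    if x <= 0.0 or y <= 0.0:  # logarithm undefined: inapplicable case
        return None
    return -0.5 * math.log(x) - 0.25 * math.log(y)

def site_lnl(d, kappa, ns, nv, n0):
    """Multinomial log-likelihood of one site's counts given D and kappa."""
    p, q = k80_expected(d, kappa)
    return ns * math.log(p) + nv * math.log(q) + n0 * math.log(1.0 - p - q)

def k80_simultaneous(sites, d_grid, kappa_grid):
    """Grid search for one shared kappa and a per-site D maximizing total lnL."""
    best = None
    for kappa in kappa_grid:
        total, ds = 0.0, []
        for counts in sites:
            lnl, d = max((site_lnl(d, kappa, *counts), d) for d in d_grid)
            total += lnl
            ds.append(d)
        if best is None or total > best[0]:
            best = (total, kappa, ds)
    return best

# Hypothetical (transitions, transversions, no difference) counts at three sites.
sites = [(2, 1, 7), (5, 0, 5), (1, 1, 8)]
d_grid = [i / 100 for i in range(1, 501)]       # D from 0.01 to 5.00
kappa_grid = [i / 2 for i in range(1, 41)]      # kappa from 0.5 to 20.0

d_ie = [k80_independent(*counts) for counts in sites]
lnl_se, kappa_se, d_se = k80_simultaneous(sites, d_grid, kappa_grid)
print(d_ie)
print(kappa_se, d_se)
```

With these counts the second site has no independent estimate because the argument of the logarithm is zero, whereas the simultaneous fit returns a finite distance for every site by borrowing information about κ from the other sites.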

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online.
References

1.  Prospects for inferring very large phylogenies by using the neighbor-joining method.

Authors:  Koichiro Tamura; Masatoshi Nei; Sudhir Kumar
Journal:  Proc Natl Acad Sci U S A       Date:  2004-07-16       Impact factor: 11.205

2.  Information-theoretic indices and an approximate significance test for testing the molecular clock hypothesis with genetic distances.

Authors:  Xuhua Xia
Journal:  Mol Phylogenet Evol       Date:  2009-05-03       Impact factor: 4.286

3.  MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger Datasets.

Authors:  Sudhir Kumar; Glen Stecher; Koichiro Tamura
Journal:  Mol Biol Evol       Date:  2016-03-22       Impact factor: 16.240

4.  PhyPA: Phylogenetic method with pairwise sequence alignment outperforms likelihood methods in phylogenetics involving highly diverged sequences.

Authors:  Xuhua Xia
Journal:  Mol Phylogenet Evol       Date:  2016-07-01       Impact factor: 4.286

5.  A simple method for estimating the parameter of substitution rate variation among sites.

Authors:  X Gu; J Zhang
Journal:  Mol Biol Evol       Date:  1997-11       Impact factor: 16.240

6.  The rate heterogeneity of nonsynonymous substitutions in mammalian mitochondrial genes.

Authors:  X Xia
Journal:  Mol Biol Evol       Date:  1998-03       Impact factor: 16.240

7.  Evolutionary histories of transposable elements in the genome of the largest living marsupial carnivore, the Tasmanian devil.

Authors:  Susanne Gallus; Björn M Hallström; Vikas Kumar; William G Dodt; Axel Janke; Gerald G Schumann; Maria A Nilsson
Journal:  Mol Biol Evol       Date:  2015-01-28       Impact factor: 16.240

8.  DAMBE6: New Tools for Microbial Genomics, Phylogenetics, and Molecular Evolution.

Authors:  Xuhua Xia
Journal:  J Hered       Date:  2017-06-01       Impact factor: 2.645

9.  DAMBE5: a comprehensive software package for data analysis in molecular biology and evolution.

Authors:  Xuhua Xia
Journal:  Mol Biol Evol       Date:  2013-04-05       Impact factor: 16.240
