Haim Ashkenazy, Eli Levy Karin, Zach Mertens, Reed A Cartwright, Tal Pupko.
Abstract
Many analyses aimed at detecting biological phenomena take a multiple sequence alignment as input. The results of such analyses are often examined further with parametric bootstrap procedures that rely on sequence simulators. One problem with conducting such simulation studies is that users currently have no means of choosing insertion and deletion (indel) parameters such that the simulated sequences mimic biological data. Here, we present SpartaABC, a web server that aims to solve this issue. SpartaABC implements an approximate Bayesian computation (ABC) rejection algorithm to infer indel parameters from sequence data. It first extracts summary statistics from the input and then performs numerous sequence simulations under randomly sampled indel parameters. By computing the distance between the summary statistics of the input and those of each simulation, SpartaABC retains only the parameters whose simulations are close to the real data. As output, SpartaABC provides point estimates and approximate posterior distributions of the indel parameters. In addition, SpartaABC allows users to simulate sequences with the inferred indel parameters. To this end, the sequence simulators Dawg 2.0 and INDELible were integrated into SpartaABC. Using SpartaABC, we demonstrate differences in indel dynamics among three protein-coding genes across mammalian orthologs. SpartaABC is freely available for use at http://spartaabc.tau.ac.il/webserver.
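The rejection scheme described in the abstract can be sketched in Python. This is a toy illustration under stated assumptions, not SpartaABC's implementation: `simulate_leaf_lengths` is a hypothetical stand-in for Dawg 2.0 or INDELible that tracks only sequence lengths, its `indel_rate` is a per-position indel probability rather than the indel-to-substitution ratio, the two summary statistics (mean and variance of leaf lengths) are assumed, the uniform priors are arbitrary, and the distance is a Euclidean distance on statistics scaled by their spread across simulations (a common ABC choice). Keeping a fixed number of the closest simulations is one way to set the rejection tolerance.

```python
import math
import random
import statistics


def simulate_leaf_lengths(root_length, indel_rate, n_leaves, rng):
    """Toy stand-in for a sequence simulator (hypothetical; not Dawg 2.0 or INDELible).

    Each leaf starts at the root length. At every root position, a single-residue
    insertion occurs with probability indel_rate / 2 and a single-residue deletion
    with probability indel_rate / 2. Only the resulting lengths are returned.
    """
    lengths = []
    for _ in range(n_leaves):
        length = root_length
        for _ in range(root_length):
            u = rng.random()
            if u < indel_rate / 2:
                length += 1  # insertion
            elif u < indel_rate:
                length -= 1  # deletion
        lengths.append(length)
    return lengths


def summarize(lengths):
    """Assumed summary statistics: mean and population variance of leaf lengths."""
    return (statistics.fmean(lengths), statistics.pvariance(lengths))


def abc_rejection(observed_stats, n_sims, keep, n_leaves, rng):
    """Rejection ABC: draw from the prior, simulate, keep the `keep` closest draws."""
    draws = []
    for _ in range(n_sims):
        rl = rng.randint(50, 150)   # assumed uniform prior on root length
        ir = rng.uniform(0.0, 0.2)  # assumed uniform prior on per-position indel rate
        sim_stats = summarize(simulate_leaf_lengths(rl, ir, n_leaves, rng))
        draws.append(((rl, ir), sim_stats))

    # Scale each statistic by its spread across simulations so neither dominates.
    scales = []
    for i in range(len(observed_stats)):
        spread = statistics.pstdev([s[i] for _, s in draws])
        scales.append(spread if spread > 0 else 1.0)

    def distance(sim_stats):
        return math.sqrt(sum(((s - o) / sc) ** 2
                             for s, o, sc in zip(sim_stats, observed_stats, scales)))

    # Sort by distance to the observed statistics and retain the closest draws.
    draws.sort(key=lambda draw: distance(draw[1]))
    return [params for params, _ in draws[:keep]]


# Usage: infer parameters for "observed" data generated with known values (100, 0.1).
rng = random.Random(1)
observed = summarize(simulate_leaf_lengths(100, 0.1, 20, rng))
accepted = abc_rejection(observed, n_sims=2000, keep=100, n_leaves=20, rng=rng)
rl_hat = statistics.median([rl for rl, _ in accepted])
ir_hat = statistics.median([ir for _, ir in accepted])
```

The accepted parameter pairs form an approximate posterior sample; here their medians serve as point estimates, though means or modes would be equally reasonable summaries.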
Year: 2017 PMID: 28460062 PMCID: PMC5570005 DOI: 10.1093/nar/gkx322
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. An illustration of the computational stages performed by the SpartaABC web server.
Figure 2. SpartaABC analyses of three genes involved in human diseases across mammalian orthologs. The point estimates of each of the indel parameters are presented above the approximated posterior distribution plots. IR: indel-to-substitution rate ratio; A: the shape parameter controlling the power-law distribution describing indel lengths; RL: root length.
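The role of the shape parameter A can be illustrated with a short Python sketch. It assumes a Zipf-like form in which the probability of an indel of length k is proportional to k^(-A), truncated at a hypothetical maximum indel length of 50; the exact parameterization and truncation used by SpartaABC are not stated in this record, and the helper names are illustrative.

```python
import random


def indel_length_pmf(a, max_len=50):
    """Truncated power-law pmf: P(k) is proportional to k**(-a) for k = 1..max_len.

    The cap max_len = 50 is an assumption of this sketch.
    """
    weights = [k ** (-a) for k in range(1, max_len + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def mean_indel_length(a, max_len=50):
    """Expected indel length under the truncated power law."""
    pmf = indel_length_pmf(a, max_len)
    return sum(k * p for k, p in enumerate(pmf, start=1))


def sample_indel_lengths(a, n, rng, max_len=50):
    """Draw n indel lengths from the truncated power law."""
    return rng.choices(range(1, max_len + 1), weights=indel_length_pmf(a, max_len), k=n)


rng = random.Random(0)
lengths = sample_indel_lengths(1.5, 10, rng)
```

Larger values of A concentrate probability on short indels, so the expected indel length drops as A grows.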