Daniel A Dalquen, Maria Anisimova, Gaston H Gonnet, Christophe Dessimoz.
Abstract
In computational evolutionary biology, verification and benchmarking are challenging because the evolutionary history of the biological entities under study is usually not known. Computer programs that simulate sequence evolution in silico have been shown to be viable test beds for verifying newly developed methods and for comparing different algorithms. However, current simulation packages tend to focus either on gene-level aspects of genome evolution, such as character substitutions and insertions and deletions (indels), or on genome-level aspects, such as genome rearrangement and speciation events. Here, we introduce Artificial Life Framework (ALF), which aims at simulating the entire range of evolutionary forces that act on genomes: nucleotide, codon, or amino acid substitution (under simple or mixture models), indels, GC-content amelioration, gene duplication, gene loss, gene fusion, gene fission, genome rearrangement, lateral gene transfer (LGT), and speciation. The other distinctive feature of ALF is its user-friendly yet powerful web interface. We illustrate the utility of ALF with two possible applications: 1) we reanalyze data from a study of selection after globin gene duplication and test the statistical significance of the original conclusions, and 2) we demonstrate that LGT can dramatically decrease the accuracy of two well-established orthology inference methods. ALF is available as a stand-alone application or via a web interface at http://www.cbrg.ethz.ch/alf.
Year: 2011 PMID: 22160766 PMCID: PMC3341827 DOI: 10.1093/molbev/msr268
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Figure: Overview of the ALF simulation process. A root genome is evolved along a species tree. Events at the site, sequence, and genome levels are simulated iteratively.
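To make the idea of evolving a root sequence along a species tree concrete, here is a deliberately minimal toy sketch. It is not ALF's implementation: it assumes a single nucleotide sequence, the Jukes-Cantor (JC69) substitution model, and no indels or genome-level events. The tree encoding and the names `evolve_jc69` and `simulate` are hypothetical, chosen only for illustration.

```python
import math
import random

def evolve_jc69(seq, t, rng):
    """Evolve a nucleotide sequence along one branch under JC69.

    t is the branch length in expected substitutions per site. Under JC69,
    a site ends up in a different state with probability
    3/4 * (1 - exp(-4t/3)), and the three other bases are equally likely.
    """
    p_change = 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))
    out = []
    for base in seq:
        if rng.random() < p_change:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)

def simulate(tree, seq, rng, leaves=None):
    """Recursively evolve seq down a tree given as (name, branch_length, children)."""
    name, t, children = tree
    if leaves is None:
        leaves = {}
    seq = evolve_jc69(seq, t, rng)
    if not children:
        leaves[name] = seq  # a leaf corresponds to an extant species
    for child in children:
        simulate(child, seq, rng, leaves)
    return leaves

# Hypothetical three-species tree; branch lengths are arbitrary.
tree = ("root", 0.0, [
    ("A", 0.10, []),
    ("anc", 0.05, [("B", 0.05, []), ("C", 0.05, [])]),
])
rng = random.Random(42)
root_seq = "".join(rng.choice("ACGT") for _ in range(60))
leaves = simulate(tree, root_seq, rng)
for species, sequence in sorted(leaves.items()):
    print(species, sequence)
```

Because the true tree and the ancestral sequence are known here, the leaf sequences can serve as a test set with a known answer, which is the role simulated data plays in benchmarking.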
Table 1. ML Estimates of Model Parameters for the Globin Data Set and the “Globin-Like” Simulated Data
| Data Set | Model | Log Likelihood | Parameter Estimates |
| Globin data set (real data) | M0 | -2477.82 | |
| | M3 | -2442.4 | |
| | MD | -2435.65 | |
| ALF simulation (100 replicates) | M0 | | |
| | M3 | | |
| | MD | | |
Note.—ω: selective pressure (dN/dS ratio), p: proportions of ω classes. For real data, log likelihoods are shown. MD is the preferred model according to the likelihood ratio test, Akaike information criterion (AIC) and Bayesian information criterion (BIC).
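As an illustration of how such a model comparison works, the sketch below computes a likelihood ratio test statistic, AIC, and BIC from the M0 and M3 log likelihoods reported for the real data. Several inputs are assumptions rather than values from the source: the 4 degrees of freedom (what M3 with three ω classes would add over M0), the parameter counts `k_m0` and `k_m3` (counting only ω-related parameters, since shared parameters cancel in differences), and the use of 144 codons as the sample size for BIC. The chi-square tail probability uses the closed form that holds for even degrees of freedom, so no external library is needed.

```python
import math

def lrt_statistic(lnl_null, lnl_alt):
    """Likelihood ratio test statistic: 2 * (lnL_alt - lnL_null)."""
    return 2.0 * (lnl_alt - lnl_null)

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable, even df only.

    For df = 2k: P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def aic(lnl, k):
    """Akaike information criterion; lower is better."""
    return 2.0 * k - 2.0 * lnl

def bic(lnl, k, n):
    """Bayesian information criterion; lower is better."""
    return k * math.log(n) - 2.0 * lnl

# Log likelihoods from the table (real globin data).
lnl_m0 = -2477.82
lnl_m3 = -2442.4

# ASSUMPTIONS, not stated in the source: df, parameter counts, sample size.
df = 4
k_m0, k_m3 = 1, 5
n_sites = 144

stat = lrt_statistic(lnl_m0, lnl_m3)
p_value = chi2_sf_even_df(stat, df)
print("LRT statistic:", stat, "p-value:", p_value)
print("AIC M0, M3:", aic(lnl_m0, k_m0), aic(lnl_m3, k_m3))
print("BIC M0, M3:", bic(lnl_m0, k_m0, n_sites), bic(lnl_m3, k_m3, n_sites))
```

The same functions would apply to the M3 versus MD comparison once the difference in parameter counts between those models is known.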
Figure: The distribution of ML estimates for ω2 from simulation with ALF (a) for one run with sequence length matching the real data (144 codons; other data shown in supplementary fig. 1, Supplementary Material online), and (b) for sequences of 10,000 codons. Data were simulated under MD with ω2 = 1; all other parameters are as in table 1.
Figure: Precision/recall of orthology predictions with different proportions of genes with a history of duplications and/or LGT. Each data point corresponds to the mean of five independent runs using the same parameters (with 95% confidence interval in both dimensions).
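The precision and recall plotted in such a figure can be computed by comparing predicted ortholog pairs with the true pairs known from the simulation. The sketch below is one plausible way to do this, not the authors' evaluation code: the gene names are made up, pairs are treated as unordered, and the confidence interval assumes a Student t interval with 4 degrees of freedom (critical value 2.776 for five runs), which the source does not specify.

```python
import math
import statistics

def precision_recall(predicted, truth):
    """Precision and recall of predicted ortholog pairs against the true pairs.

    Pairs are treated as unordered, so ("a", "b") matches ("b", "a").
    """
    predicted = {frozenset(p) for p in predicted}
    truth = {frozenset(p) for p in truth}
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall

def mean_ci95_five_runs(values):
    """Mean and 95% CI half-width for exactly five runs.

    ASSUMPTION: a t interval with 4 degrees of freedom (t = 2.776).
    """
    if len(values) != 5:
        raise ValueError("the hard-coded t value is only valid for five runs")
    mean = statistics.mean(values)
    half_width = 2.776 * statistics.stdev(values) / math.sqrt(len(values))
    return mean, half_width

# Hypothetical gene identifiers, for illustration only.
truth = [("a1", "b1"), ("a2", "b2"), ("a3", "b3")]
predicted = [("b1", "a1"), ("a2", "b2"), ("a2", "b3")]
precision, recall = precision_recall(predicted, truth)
print("precision:", precision, "recall:", recall)

# Hypothetical per-run precision values from five replicate simulations.
runs = [0.91, 0.89, 0.93, 0.90, 0.92]
mean, half_width = mean_ci95_five_runs(runs)
print("mean precision:", mean, "+/-", half_width)
```

In this toy example two of the three predicted pairs are true orthologs and two of the three true pairs are recovered, so precision and recall are both two thirds.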