Linda Dib, Xavier Meyer, Panu Artimo, Vassilios Ioannidis, Heinz Stockinger, Nicolas Salamin.
Abstract
BACKGROUND: Available methods to simulate nucleotide or amino acid data typically use Markov models to simulate each position independently. These approaches are not appropriate to assess the performance of combinatorial and probabilistic methods that look for coevolving positions in nucleotide or amino acid sequences.
Year: 2015 PMID: 26597459 PMCID: PMC4657261 DOI: 10.1186/s12859-015-0785-8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Simulation interface of the Coev-web platform. Part a shows the job submission page, while part b gives an example of the results obtained for 15 pairs
Fig. 2 Correlation between the number of lineages with double substitutions and the likelihood difference between the Coev model and the independent model for amino acid and nucleotide sequences. The X axis shows the number of lineages with double substitutions. Both plots use the same tree, which has 100 leaves. The likelihood difference, expressed as ΔAIC, increases with X and shows that the Coev model is preferred to the independent model for both amino acid and nucleotide sequences, especially when X is large. (1.) The combinations used for the nucleotide experiment are Adenine-Adenine (AA) and Thymine-Thymine (TT). (2.) The combinations used for the amino acid experiment are Alanine-Alanine (AA) and Threonine-Threonine (TT)
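As a rough illustration of the ΔAIC comparison mentioned in this caption, the sketch below uses the standard definition AIC = 2k − 2 ln L. The sign convention (independent minus Coev, so that positive values favour Coev), the function names, and the example numbers are assumptions made for illustration; they are not taken from the paper.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def delta_aic(lnl_independent, k_independent, lnl_coev, k_coev):
    """AIC(independent) - AIC(Coev).

    Assumed sign convention: a positive value means the Coev model
    is preferred over the independent model.
    """
    return aic(lnl_independent, k_independent) - aic(lnl_coev, k_coev)


if __name__ == "__main__":
    # Hypothetical log-likelihoods and parameter counts, for illustration only.
    print(delta_aic(lnl_independent=-120.0, k_independent=2,
                    lnl_coev=-100.0, k_coev=4))
```

Under this convention, a larger ΔAIC indicates stronger support for the Coev model relative to the independent model.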
Fig. 3 Simulation. We present the simulation steps for two nucleotide pairs along a phylogenetic tree of 4 leaves. Nucleotide changes are highlighted in red. (1.) We randomly pick a state at the root. (2.) We assign internal node states using the transition probability matrix P(t) = e^{Qt}, where Q is the Coev instantaneous rate matrix and t is the branch length [7]. (3.) The simulated pairs are the pairs assigned to the leaves of the phylogenetic tree
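The three steps in this caption can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the Coev-web implementation: the rate matrix built by `coev_like_q` follows one plausible reading of the Coev parameters (rate d into the profile, s out of it, r1 and r2 for changes at position 1 or 2 between non-profile pairs, no simultaneous double changes), the root state is drawn uniformly, the matrix exponential is a simple scaled Taylor series, and the 4-leaf tree, its branch lengths, and all function names are hypothetical.

```python
import itertools
import random

NUCLEOTIDES = "ACGT"
STATES = ["".join(p) for p in itertools.product(NUCLEOTIDES, repeat=2)]  # 16 pairs


def coev_like_q(profile, s, d, r1, r2):
    """Build a 16x16 rate matrix over nucleotide pairs (assumed parameterization)."""
    n = len(STATES)
    q = [[0.0] * n for _ in range(n)]
    for i, a in enumerate(STATES):
        for j, b in enumerate(STATES):
            changed = [k for k in range(2) if a[k] != b[k]]
            if len(changed) != 1:
                continue  # only single-position changes get a non-zero rate
            if b in profile:
                q[i][j] = d          # entering the coevolving profile
            elif a in profile:
                q[i][j] = s          # leaving the profile
            else:
                q[i][j] = r1 if changed[0] == 0 else r2
        q[i][i] = -sum(q[i])         # rows of a rate matrix sum to zero
    return q


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_matrix(q, t, terms=12):
    """Approximate P(t) = e^{Qt} by scaling, a Taylor series, and repeated squaring."""
    n = len(q)
    norm = max(sum(abs(x) for x in row) for row in q) * t
    squarings = 0
    while norm > 0.5:
        norm /= 2.0
        squarings += 1
    scale = t / (2 ** squarings)
    m = [[x * scale for x in row] for row in q]
    identity = [[float(i == j) for j in range(n)] for i in range(n)]
    result = [row[:] for row in identity]
    term = identity
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def simulate_pairs(tree, q, rng):
    """Return {leaf name: pair}. A node is a leaf name or a list of (child, branch length)."""
    leaves = {}

    def descend(node, state):
        if isinstance(node, str):
            leaves[node] = STATES[state]      # step 3: record the pair at a leaf
            return
        for child, t in node:
            row = transition_matrix(q, t)[state]
            weights = [max(p, 0.0) for p in row]   # guard against rounding noise
            nxt = rng.choices(range(len(STATES)), weights=weights)[0]  # step 2
            descend(child, nxt)

    descend(tree, rng.randrange(len(STATES)))  # step 1: uniform root state (assumption)
    return leaves


# Hypothetical 4-leaf tree with arbitrary branch lengths.
EXAMPLE_TREE = [([("L1", 0.1), ("L2", 0.1)], 0.2), ([("L3", 0.1), ("L4", 0.1)], 0.2)]

if __name__ == "__main__":
    q = coev_like_q({"AA", "CC"}, s=1.0, d=10.0, r1=5.0, r2=5.0)
    print(simulate_pairs(EXAMPLE_TREE, q, random.Random(1)))
```

Recomputing the transition matrix for every branch keeps the sketch short; for large trees, caching it per distinct branch length, or using a library routine for the matrix exponential, would be the more practical choice.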
Fig. 4 Simulation plots. We plot the proportion of simulated combinations that belong to the profile {AA, CC} against the d/s ratio for different branch lengths (a for 0.1; b for 0.5; c for 1; d for 5). Each box plot is obtained by varying the rates r1 and r2 within the range [1,100] and by randomly picking an ancestral state from the frequency vector derived from the matrix Q