Andy Pang, Andrew D Smith, Paulo A S Nuin, Elisabeth R M Tillier.
Abstract
BACKGROUND: General protein evolution models help determine the baseline expectations for the evolution of sequences, and they have proven extensively useful in sequence analysis and in the computer simulation of artificial sequence data sets.
Year: 2005 PMID: 16188037 PMCID: PMC1261159 DOI: 10.1186/1471-2105-6-236
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. The GQG distribution of indel length is determined by the evolutionary distance for a given evolutionary scale factor c. The expected frequencies of indels of each length are plotted. In (a), the distribution is shown for different evolutionary distances (labelled next to the corresponding lines). In (b), the evolutionary distance is fixed and the GQG length distribution is plotted for different values of the evolutionary scale factor (labelled next to the corresponding lines).
Figure 2. Comparison of the GQG distribution with the data obtained from the study [7] for protein sequences with less than 100 PAM of sequence divergence. The parameters of the GQG distribution are set to the defaults c = 3 and t = PAM 50. These values were chosen simply because they seemed reasonable, not to maximize the fit of the curve to the data. The close fit indicates that our scaling of the QG distribution is appropriate for modelling indels at lower levels of sequence divergence.
Figure 3. The indel probability of the GQG distribution is determined by the indel rate z (x-axis) and the evolutionary scale factor c (labelled next to the corresponding line). The user can set this probability to influence the number of indels present in the final alignment.
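Figure 3 notes that the user-set indel probability controls how many indels appear in the final alignment. The sketch below illustrates that relationship under stated assumptions only: it treats the indel probability as a hypothetical per-site chance that an indel starts at that position, and draws indel lengths from a placeholder weight table. The function name `sample_indels` and `PLACEHOLDER_WEIGHTS` are illustrative; they are not the GQG formula (which is not given here) and not the simulator's actual code.

```python
import random
from typing import Dict, List, Tuple


def sample_indels(seq_len: int, indel_prob: float,
                  length_weights: Dict[int, float],
                  rng: random.Random) -> List[Tuple[int, int]]:
    """Hypothetical sketch: decide independently at each site whether an
    indel starts there, then draw its length from a weight table.

    Returns a list of (position, length) pairs.
    """
    if not 0.0 <= indel_prob <= 1.0:
        raise ValueError("indel_prob must lie in [0, 1]")
    lengths = list(length_weights)
    weights = [length_weights[n] for n in lengths]
    events: List[Tuple[int, int]] = []
    for pos in range(seq_len):
        # rng.random() returns a value in [0, 1), so indel_prob = 1.0 always fires
        if rng.random() < indel_prob:
            length = rng.choices(lengths, weights=weights, k=1)[0]
            events.append((pos, length))
    return events


# Placeholder length weights (NOT the GQG distribution): shorter indels
# are simply made more common, roughly halving per extra residue.
PLACEHOLDER_WEIGHTS = {1: 0.5, 2: 0.25, 3: 0.125, 4: 0.0625, 5: 0.0625}

if __name__ == "__main__":
    rng = random.Random(42)
    events = sample_indels(1000, 0.01, PLACEHOLDER_WEIGHTS, rng)
    # Under this per-site assumption the expected count is seq_len * indel_prob.
    print(len(events), "indel events; expected about", 1000 * 0.01)
```

In this sketch the expected number of indel events grows linearly with the indel probability (seq_len × indel_prob). In the model described by Figure 3, that probability would itself be determined by z and c rather than set directly as a constant.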