Andrew Butterfield, Vivek Vedagiri, Edward Lang, Cath Lawrence, Matthew J Wakefield, Alexander Isaev, Gavin A Huttley.
Abstract
BACKGROUND: Examining the distribution of variation has proven an extremely profitable technique in the effort to identify sequences of biological significance. Most approaches in the field, however, evaluate only the conserved portions of sequences, ignoring the biological significance of sequence differences. A suite of sophisticated likelihood-based statistical models from the field of molecular evolution provides the basis for extracting information from the full distribution of sequence variation. Phylogeny-based maximum likelihood calculations can be applied to an extensive range of problems. Available software packages that can perform likelihood calculations, however, either lack flexibility and scalability or employ error-prone approaches to model parameterisation.
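To make "phylogeny-based maximum likelihood calculations" concrete, here is a minimal sketch of how the log-likelihood of an alignment on a fixed tree can be computed with Felsenstein's pruning algorithm under the simple Jukes-Cantor (JC69) model. This is an illustration only, not PyEvolve's implementation or API: the nested-list tree encoding and all function names are hypothetical, and real packages use richer substitution models and optimise branch lengths and model parameters.

```python
import math
from typing import Dict, List, Tuple, Union

BASES = "ACGT"
# Hypothetical tree encoding: a leaf name (str), or a list of (subtree, branch_length) pairs.
Tree = Union[str, List[Tuple["Tree", float]]]


def jc69_prob(i: str, j: str, t: float) -> float:
    """JC69 probability that base i becomes base j along a branch of length t
    (expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def partial_likelihoods(node: Tree, column: Dict[str, str]) -> Dict[str, float]:
    """Pruning step: P(observed leaf bases below node | node has base b), for each b."""
    if isinstance(node, str):  # leaf: probability 1 for the observed base, 0 otherwise
        return {b: 1.0 if b == column[node] else 0.0 for b in BASES}
    result = {b: 1.0 for b in BASES}
    for child, t in node:
        child_l = partial_likelihoods(child, column)
        for parent_base in BASES:
            # Sum over the unobserved base at the child end of the branch.
            result[parent_base] *= sum(
                jc69_prob(parent_base, cb, t) * child_l[cb] for cb in BASES
            )
    return result


def site_log_likelihood(tree: Tree, column: Dict[str, str]) -> float:
    """Log-likelihood of one alignment column, with equal (1/4) root base frequencies."""
    root = partial_likelihoods(tree, column)
    return math.log(sum(0.25 * root[b] for b in BASES))


def alignment_log_likelihood(tree: Tree, columns: List[Dict[str, str]]) -> float:
    """Sites are assumed independent, so per-site log-likelihoods add."""
    return sum(site_log_likelihood(tree, c) for c in columns)


if __name__ == "__main__":
    tree = [([("human", 0.1), ("chimp", 0.1)], 0.05), ("mouse", 0.5)]
    columns = [
        {"human": "A", "chimp": "A", "mouse": "A"},
        {"human": "A", "chimp": "A", "mouse": "G"},
    ]
    print(alignment_log_likelihood(tree, columns))
```

Comparing such log-likelihoods between a null and an alternative parameterisation of the substitution model is the basis of the likelihood ratio tests that tools in this area support.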
Year: 2004 PMID: 14706121 PMCID: PMC317364 DOI: 10.1186/1471-2105-5-1
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Relationship among the PyEvolve components. The boxes represent the objects (classes) that make up PyEvolve. Filled diamonds, as defined in the Unified Modeling Language (UML) specification, indicate composition: the object at the diamond end has the connected objects as essential components.
Figure 2. The three-level parallelisation stack for a model. Model: the null or alternative hypothesis parameterisations; SA: the simulated annealing parallelisation level; LF: the likelihood function parallelisation level; CPU: the actual hardware CPUs; level ID: an identifier relative to the virtual CPUs at the specified level; subgroup ID: an identifier relative to the processors within a specified virtual processor at the specified level; actual ID: the standard MPI identifier (rank) for each real CPU.
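As a hedged illustration of how the identifiers in Figure 2 could relate, the sketch below maps an actual ID (MPI rank) to an SA level ID and a subgroup ID. The contiguous block layout and the function names are assumptions made for this example, not PyEvolve's documented scheme. In an MPI program, ranks sharing an SA level ID would typically be gathered into a sub-communicator, for example with mpi4py's `Comm.Split(color, key)` using the SA level ID as the color.

```python
from typing import Dict, List, Tuple


def rank_to_ids(rank: int, sa_degree: int, lf_degree: int) -> Tuple[int, int]:
    """Map an actual ID (MPI rank) to (SA level ID, subgroup ID).

    Hypothetical contiguous layout: SA virtual CPU k owns ranks
    k*lf_degree .. k*lf_degree + lf_degree - 1, and the subgroup ID is the
    rank's position inside that block (i.e. its LF-level virtual CPU).
    """
    total = sa_degree * lf_degree
    if not 0 <= rank < total:
        raise ValueError(f"rank {rank} is outside 0..{total - 1}")
    return divmod(rank, lf_degree)


def sa_groups(sa_degree: int, lf_degree: int) -> Dict[int, List[int]]:
    """List the actual IDs (ranks) that make up each SA-level virtual CPU."""
    groups: Dict[int, List[int]] = {}
    for rank in range(sa_degree * lf_degree):
        sa_id, _subgroup_id = rank_to_ids(rank, sa_degree, lf_degree)
        groups.setdefault(sa_id, []).append(rank)
    return groups


if __name__ == "__main__":
    # 16 CPUs as SA degree 4 x LF degree 4, one of the benchmarked configurations.
    print(rank_to_ids(9, 4, 4))  # (2, 1)
    print(sa_groups(2, 2))       # {0: [0, 1], 1: [2, 3]}
```

The total CPU count is the product of the two parallel degrees, which matches how the "Total cpus" column of the benchmarking table relates to its SA and LF columns.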
Figure 3. Python code required to define a novel model of substitution. (a) A function specifying a new substitution model rule; this function identifies dinucleotides that differ from each other by mutation of a methylated C. (b) The step that defines a dinucleotide substitution model with terms for both transition and methylation-induced changes.
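The code in Figure 3 is not reproduced in this record. The following standalone sketch shows the kind of rule the caption describes, as plain Python predicates over pairs of dinucleotides; it is not PyEvolve's API, and the function names are hypothetical. Treating "mutation of a methylated C" as CpG deamination (C to T on the same strand, seen as G to A in CpG on the opposite strand) is an assumption of this example.

```python
from itertools import product
from typing import Optional

NUCLEOTIDES = "ACGT"
PURINES = set("AG")
# All 16 dinucleotides: AA, AC, ..., TT
DINUCLEOTIDES = ["".join(p) for p in product(NUCLEOTIDES, repeat=2)]


def single_difference(x: str, y: str) -> Optional[int]:
    """Index of the single position where x and y differ, or None otherwise."""
    diffs = [i for i, (a, b) in enumerate(zip(x, y)) if a != b]
    return diffs[0] if len(diffs) == 1 else None


def is_transition(x: str, y: str) -> bool:
    """One-position change within purines (A/G) or within pyrimidines (C/T)."""
    i = single_difference(x, y)
    return i is not None and (x[i] in PURINES) == (y[i] in PURINES)


def is_methylation_change(x: str, y: str) -> bool:
    """Hypothetical rule: one dinucleotide is CpG and the other arises from
    deamination of its methylated C: C->T on this strand (CG<->TG), or the
    same event seen on the opposite strand as G->A (CG<->CA)."""
    return {x, y} in ({"CG", "TG"}, {"CG", "CA"})


# Unordered pairs each rule picks out, e.g. to attach a separate rate term.
transition_pairs = [
    (x, y) for x in DINUCLEOTIDES for y in DINUCLEOTIDES
    if x < y and is_transition(x, y)
]
methylation_pairs = [
    (x, y) for x in DINUCLEOTIDES for y in DINUCLEOTIDES
    if x < y and is_methylation_change(x, y)
]

if __name__ == "__main__":
    print(len(transition_pairs))  # 16
    print(methylation_pairs)      # [('CA', 'CG'), ('CG', 'TG')]
```

Both methylation pairs are also transitions; how the two terms combine for such pairs is a modelling decision that this sketch does not address.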
PyEvolve benchmarking. Time taken was estimated as the time for optimisation. The number of runs per condition ranged from 1 to 5. ¹Model: see text for details of the codon and dinuc substitution models. ²Levels: indicates whether the Simulated Annealing (SA), Likelihood Function (LF) or BOTH parallelisation levels were used. ³Parallel degree: the number of virtual CPUs at the ⁴SA or LF level (for the LF level, this is defined per SA virtual CPU). ⁵lfe: the number of likelihood function evaluations made during the optimisation for ⁶10 or 20 sequences, expressed in thousands. See text for details of the data and hardware used.
| Model¹ | Levels² | Total CPUs | Parallel degree³: SA⁴ | Parallel degree³: LF⁴ | lfe (1000s)⁵: 10 seqs⁶ | lfe (1000s)⁵: 20 seqs⁶ | Total time (min): 10 seqs | Total time (min): 20 seqs | Time (s) per 1000 lfe: 10 seqs | Time (s) per 1000 lfe: 20 seqs |
|---|---|---|---|---|---|---|---|---|---|---|
| codon | Serial | 1 | 1 | 1 | 56 | 121 | 124 | 269 | 133.06 | 133.65 |
| codon | LF | 2 | 1 | 2 | 56 | 121 | 81 | 182 | 86.49 | 90.16 |
| codon | LF | 4 | 1 | 4 | 56 | 122 | 55 | 130 | 58.69 | 63.78 |
| codon | LF | 8 | 1 | 8 | 57 | 122 | 41 | 100 | 43.99 | 49.02 |
| codon | LF | 16 | 1 | 16 | 57 | 120 | 35 | 82 | 36.52 | 41.02 |
| codon | SA | 2 | 2 | 1 | 57 | 122 | 85 | 178 | 89.40 | 87.22 |
| codon | SA | 4 | 4 | 1 | 57 | 121 | 58 | 121 | 60.19 | 60.15 |
| codon | SA | 8 | 8 | 1 | 57 | 122 | 44 | 99 | 46.10 | 48.83 |
| codon | SA | 16 | 16 | 1 | 58 | 122 | 38 | 88 | 39.48 | 43.16 |
| codon | BOTH | 4 | 2 | 2 | 57 | 125 | 56 | 125 | 59.39 | 60.03 |
| codon | BOTH | 8 | 2 | 4 | 58 | 122 | 40 | 89 | 41.57 | 43.74 |
| codon | BOTH | 8 | 4 | 2 | 57 | 121 | 39 | 85 | 40.92 | 42.37 |
| codon | BOTH | 16 | 2 | 8 | 56 | 121 | 30 | 69 | 32.28 | 34.47 |
| codon | BOTH | 16 | 4 | 4 | 58 | 121 | 28 | 63 | 29.31 | 31.27 |
| codon | BOTH | 16 | 8 | 2 | 57 | 121 | 31 | 71 | 32.70 | 35.30 |
| dinuc | Serial | 1 | 1 | 1 | 54 | 119 | 17 | 37 | 19.22 | 18.47 |
| dinuc | LF | 2 | 1 | 2 | 54 | 119 | 11 | 24 | 12.59 | 12.29 |
| dinuc | LF | 4 | 1 | 4 | 54 | 119 | 7 | 16 | 7.80 | 7.82 |
| dinuc | LF | 8 | 1 | 8 | 54 | 117 | 5 | 11 | 5.30 | 5.55 |
| dinuc | LF | 16 | 1 | 16 | 55 | 119 | 4 | 9 | 4.04 | 4.41 |
| dinuc | SA | 2 | 2 | 1 | 53 | 118 | 11 | 23 | 12.19 | 11.51 |
| dinuc | SA | 4 | 4 | 1 | 54 | 118 | 7 | 15 | 8.32 | 7.77 |
| dinuc | SA | 8 | 8 | 1 | 54 | 118 | 5 | 12 | 5.91 | 5.89 |
| dinuc | SA | 16 | 16 | 1 | 53 | 118 | 4 | 10 | 4.73 | 4.86 |
| dinuc | BOTH | 4 | 2 | 2 | 54 | 118 | 7 | 15 | 8.07 | 7.69 |
| dinuc | BOTH | 8 | 2 | 4 | 54 | 118 | 5 | 10 | 5.14 | 5.04 |
| dinuc | BOTH | 8 | 4 | 2 | 54 | 117 | 5 | 10 | 5.61 | 5.28 |
| dinuc | BOTH | 16 | 2 | 8 | 54 | 118 | 3 | 7 | 3.76 | 3.74 |
| dinuc | BOTH | 16 | 4 | 4 | 54 | 119 | 3 | 7 | 3.76 | 3.57 |
| dinuc | BOTH | 16 | 8 | 2 | 54 | 119 | 4 | 8 | 4.13 | 4.10 |
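Because the lfe counts are nearly constant across conditions, the "Time (s) per 1000 lfe" column can be compared directly. The short sketch below computes speedup (serial time divided by parallel time) and parallel efficiency (speedup divided by the number of CPUs) from a few rows of the table. The selection of rows, the dictionary layout and the function names are choices made for this illustration; the numbers are copied from the table.

```python
# Time (s) per 1000 likelihood function evaluations, copied from the table.
# Key: (model, levels, total CPUs, SA degree, LF degree); value: (10 seqs, 20 seqs).
TIMES = {
    ("codon", "Serial", 1, 1, 1): (133.06, 133.65),
    ("codon", "LF", 16, 1, 16): (36.52, 41.02),
    ("codon", "SA", 16, 16, 1): (39.48, 43.16),
    ("codon", "BOTH", 16, 4, 4): (29.31, 31.27),
    ("dinuc", "Serial", 1, 1, 1): (19.22, 18.47),
    ("dinuc", "BOTH", 16, 4, 4): (3.76, 3.57),
}


def speedup(serial_time: float, parallel_time: float) -> float:
    """Serial time divided by parallel time for the same work (1000 lfe)."""
    return serial_time / parallel_time


def efficiency(serial_time: float, parallel_time: float, cpus: int) -> float:
    """Speedup per CPU; 1.0 would be perfect linear scaling."""
    return speedup(serial_time, parallel_time) / cpus


if __name__ == "__main__":
    for (model, levels, cpus, _sa, _lf), times in TIMES.items():
        serial = TIMES[(model, "Serial", 1, 1, 1)]
        for n_seqs, t_serial, t_par in zip((10, 20), serial, times):
            print(f"{model} {levels} {cpus} CPUs, {n_seqs} seqs: "
                  f"speedup {speedup(t_serial, t_par):.2f}, "
                  f"efficiency {efficiency(t_serial, t_par, cpus):.2f}")
```

For example, the codon model with 10 sequences on 16 CPUs split 4 × 4 across both levels gives a speedup of about 4.54 (efficiency about 0.28), compared with about 3.64 using LF parallelism alone and about 3.37 using SA parallelism alone.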