Abstract
MOTIVATION: A backtrace through a dynamic programming algorithm's intermediate results in search of an optimal path, or to sample paths according to an implied probability distribution, or as the second stage of a forward-backward algorithm, is a task of fundamental importance in computational biology. When there is insufficient space to store all intermediate results in high-speed memory (e.g. cache), existing approaches store selected stages of the computation and recompute missing values from these checkpoints on an as-needed basis.
Year: 2008 PMID: 18558620 PMCID: PMC2668612 DOI: 10.1093/bioinformatics/btn308
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
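The number of stages and run time for both the algorithm of Wheeler and Hughey (2000) and the optimal checkpointing algorithm. Each entry is T/N: the number of stage computations over the number of stages. Rows are indexed by the number of memory locations M, columns by the level L.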
| M | WH, L=1 | WH, L=2 | WH, L=3 | WH, L=4 | WH, L=5 | WH, L=6 | WH, L=7 | Optimal, L=1 | Optimal, L=2 | Optimal, L=3 | Optimal, L=4 | Optimal, L=5 | Optimal, L=6 | Optimal, L=7 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1/1 | 1/1 | 1/1 | 1/1 | 1/1 | 1/1 | 1/1 | 1/1 | 1/1 | 1/1 | 1/1 | 1/1 | 1/1 | 1/1 |
| 2 | 2/2 | 4/3 | 7/4 | 11/5 | 16/6 | 22/7 | 29/8 | 3/2 | 6/4 | 12/6 | 20/8 | 30/10 | 42/12 | 56/14 |
| 3 | 3/3 | 9/6 | 21/10 | 41/15 | 71/21 | 113/28 | 169/36 | 3/3 | 13/8 | 34/15 | 70/24 | 125/35 | 203/48 | 308/63 |
| 4 | 4/4 | 16/10 | 46/20 | 106/35 | 211/56 | 379/84 | 631/120 | 4/4 | 22/13 | 70/29 | 170/54 | 350/90 | 644/139 | 1092/203 |
| 5 | 5/5 | 25/15 | 85/35 | 225/70 | 505/126 | 1009/210 | 1849/330 | 5/5 | 33/19 | 123/49 | 343/104 | 798/195 | 1638/335 | 3066/539 |
| 6 | 6/6 | 36/21 | 141/56 | 421/126 | 1051/252 | 2311/462 | 4621/792 | 6/6 | 46/26 | 196/76 | 616/181 | 1596/377 | 3612/713 | 7392/1253 |
| 7 | 7/7 | 49/28 | 217/84 | 721/210 | 1981/462 | 4753/924 | 10 297/1716 | 7/7 | 61/34 | 292/111 | 1020/293 | 2910/671 | 7194/1385 | 15 972/2639 |
For the L-level backtracking algorithm of Wheeler and Hughey (2000) with memory sufficient to store M stages, the left side of this table shows both $N_{\mathrm{WH}}(M,L)$, the number of stages that can be produced in reverse order, and $T_{\mathrm{WH}}(M,L)$, the number of stage computations required for that backtrace [see Equations (1) and (5)]. For instance, a backtrace on N=36 stages with M=3 memory locations requires the (L=7)-level algorithm and T=169 stage computations. It is not straightforward to predict the number of stage computations for values of N that are not of the form $N_{\mathrm{WH}}(M,L)$ (Fig. 1).
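The left-hand entries of the table follow a regular binomial pattern. The Python sketch below uses closed forms that were fitted to the tabulated values for this illustration; they are an assumption of the sketch, not a transcription of Equations (1) and (5), which are not reproduced in this excerpt. The function names `n_wh` and `t_wh` are hypothetical. The sketch checks the fitted forms against the M=3 and M=7 rows.

```python
from math import comb


def n_wh(m, l):
    """Stages reachable by the L-level algorithm (pattern fitted to the table)."""
    return comb(m + l - 1, l)


def t_wh(m, l):
    """Stage computations for that backtrace (pattern fitted to the table)."""
    return 1 + sum(k * comb(m + k - 2, k) for k in range(1, l + 1))


# Left-hand (Wheeler and Hughey) entries copied from the table as (T, N), L = 1..7.
TABLE_WH = {
    3: [(3, 3), (9, 6), (21, 10), (41, 15), (71, 21), (113, 28), (169, 36)],
    7: [(7, 7), (49, 28), (217, 84), (721, 210), (1981, 462),
        (4753, 924), (10297, 1716)],
}

for m, row in TABLE_WH.items():
    for level, (t, n) in enumerate(row, start=1):
        # Fail loudly if the fitted pattern disagrees with a tabulated entry.
        assert (t_wh(m, level), n_wh(m, level)) == (t, n), (m, level)
```

With these forms, `n_wh(3, 7)` and `t_wh(3, 7)` give the N=36 and T=169 of the worked example in the footnote.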
For the optimal checkpointing algorithm presented here, the right side of this table shows both $N_{\mathrm{opt}}(M,L)$, a number of stages that can be produced in reverse order, and $T(M,N_{\mathrm{opt}}(M,L))$, the number of stage computations required for that backtrace [see Equations (7) and (11)]. When the number of stages N lies between $N_{\mathrm{opt}}(M,L)$ and $N_{\mathrm{opt}}(M,L+1)$, the optimal number of stage computations $T_{\mathrm{opt}}(M,N)$ is computed via linear interpolation. For instance, for a backtrace on N=36 stages with M=3 memory locations, N falls between $N_{\mathrm{opt}}(M=3,L=5)=35$ and $N_{\mathrm{opt}}(M=3,L=6)=48$, and the interpolation slope is (203−125)/(48−35)=6. The algorithm therefore requires T(M=3,N=36)=T(M=3,N=35)+6(36−35)=125+6=131 stage computations. In this case, the L-level algorithm of Wheeler and Hughey (2000), at 169 stage computations, requires 29% more.
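The interpolation in the worked example can be reproduced mechanically from the table. The Python sketch below hard-codes only the M=3 entries of the right-hand side; the function name `t_opt_m3` is hypothetical, and the sketch covers only the tabulated range of N.

```python
from bisect import bisect_right
from fractions import Fraction

# Right-hand (optimal) entries for M=3, L = 1..7, copied from the table.
N_OPT_M3 = [3, 8, 15, 24, 35, 48, 63]
T_OPT_M3 = [3, 13, 34, 70, 125, 203, 308]


def t_opt_m3(n):
    """Linearly interpolate the stage-computation count for M=3 and N=n."""
    if not N_OPT_M3[0] <= n <= N_OPT_M3[-1]:
        raise ValueError("N is outside the tabulated range for M=3")
    # Index of the largest tabulated N not exceeding n, clamped so that
    # a right-hand neighbour always exists.
    i = min(bisect_right(N_OPT_M3, n) - 1, len(N_OPT_M3) - 2)
    n_lo, n_hi = N_OPT_M3[i], N_OPT_M3[i + 1]
    t_lo, t_hi = T_OPT_M3[i], T_OPT_M3[i + 1]
    slope = Fraction(t_hi - t_lo, n_hi - n_lo)  # exact rational slope
    return t_lo + slope * (n - n_lo)


t_opt = t_opt_m3(36)            # 125 + 6 * (36 - 35)
t_wh_value = 169                # Wheeler and Hughey entry for M=3, L=7 (N=36)
excess_percent = round((t_wh_value / t_opt - 1) * 100)
```

Using `Fraction` keeps the slope exact; for the M=3 row every segment slope happens to be an integer, so the result for N=36 is exactly 131 and `excess_percent` is 29.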
Fig. 1. Low memory comparison of algorithms. This figure shows the effect on the run time at low memory levels. Clockwise from the top, the curves come in 12 pairs, one each for M=2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15 and 20 memory locations. Within each pair, the curve for the L-level algorithm of Wheeler and Hughey (2000) is first; as M increases these curves become increasingly ‘fractal’, with jumps in the run time at several scales. The curve for the optimal checkpointing algorithm is second in each pair; these curves are piecewise linear.
Fig. 2. The optimal checkpointing algorithm in pseudo-C++, for a backtrace through N stages using memory sufficient for M stages. Using Equation (7), find the level $L=\max\{L : N_{\mathrm{opt}}(M,L)\le N\}$. For the convention that the memory locations are labeled 0, …, M−1 and the stages are labeled 0, …, N−1, invoke backtrace(−1, M, −1, N, L, $N_{\mathrm{opt}}(M,L)$, advance, available, p), where advance is a pointer to a callback function that computes stage $N_{\mathrm{to}}$, to be stored in memory location $M_{\mathrm{to}}$, from the immediately preceding stage, which is stored in memory location $M_{\mathrm{from}}$ unless $N_{\mathrm{to}}$ is the first stage; available is a pointer to a callback function invoked during the backtrace so that the user can make use of stage N, stored in memory location M; and p is a user-supplied pointer to applicable stage-independent information. BIGINT should be an integer type able to handle integers a little larger than $N M^2$. Note that, although the backtrace routine directs the callback routines on the use of the memory locations, the actual allocation and access of the memory is not handled by the backtrace routine. Further, if the generality is not required, the pointer parameters advance, available and p can be eliminated, and their use in the body of the function can be replaced by ‘hard-wired’ calls to appropriate functions. See the Supplementary Materials for C++ source code.
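The division of labour described in this caption, in which the backtrace routine chooses memory locations while the callbacks own the memory, can be illustrated with a deliberately naive routine. The Python sketch below is not the optimal checkpointing algorithm of Fig. 2: it uses only two memory locations and recomputes from the first stage for every stage it emits, costing N(N+1)/2 stage computations. The name `naive_backtrace`, the callback argument order, and the toy running-sum "stage" are all assumptions made for illustration.

```python
def naive_backtrace(n_stages, advance, available, p):
    """Emit stages n_stages-1, ..., 0 using memory locations 0 and 1 only.

    Returns the number of stage computations performed.
    """
    computations = 0
    for target in range(n_stages - 1, -1, -1):
        m_from = None                 # the first stage has no predecessor
        for n_to in range(target + 1):
            m_to = n_to % 2           # alternate between the two locations
            advance(n_to, m_to, m_from, p)
            computations += 1
            m_from = m_to
        available(target, m_from, p)  # stage `target` now sits in m_from
    return computations


# Toy problem: stage n is the running sum x[0] + ... + x[n].
slots = [None, None]                  # memory is owned by the caller
seen = []


def advance(n_to, m_to, m_from, p):
    previous = 0 if m_from is None else slots[m_from]
    slots[m_to] = previous + p["x"][n_to]


def available(n, m, p):
    seen.append((n, slots[m]))


p = {"x": [5, 1, 4, 2]}               # stage-independent information
count = naive_backtrace(len(p["x"]), advance, available, p)
```

Here `naive_backtrace` never touches `slots` itself; it only tells the callbacks which locations to read and write, which mirrors the separation the caption describes.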