Abstract
BACKGROUND: Maximum Likelihood (ML)-based phylogenetic inference using Felsenstein's pruning algorithm is a standard method for estimating the evolutionary relationships among a set of species from DNA sequence data, and is used in popular applications such as RAxML, PHYLIP, GARLI, BEAST, and MrBayes. The Phylogenetic Likelihood Function (PLF) and its associated scaling and normalization steps form the computational kernel of these tools. These computations are data-intensive but contain fine-grained parallelism that can be exploited by coprocessor architectures such as FPGAs and GPUs. A general-purpose API called BEAGLE has recently been developed that includes optimized implementations of Felsenstein's pruning algorithm for various data-parallel architectures. In this paper, we extend the BEAGLE API to the Convey HC-1, a platform based on multiple Field Programmable Gate Arrays (FPGAs).
Year: 2013 PMID: 23331707 PMCID: PMC3599256 DOI: 10.1186/1471-2105-14-25
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. The HC-1 coprocessor board. Four application engines connect to eight memory controllers through a full crossbar.
Figure 2. An arithmetic operation as a data flow graph (DFG).
Figure 3. Pipeline circuit generated from the DFG in Figure 2.
Figure 4. Full data flow graph of the PLF and tree likelihood calculation.
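Latency and slice register usage of Xilinx IEEE-754 single-precision floating-point operators
| Operator | Latency (cycles) | Slice registers | Latency (cycles) | Slice registers |
| --- | --- | --- | --- | --- |
| fadd | 3 | 139 | 12 | 547 |
| fmul | 3 | 87 | 8 | 361 |
| fdiv | 11 | 499 | 28 | 1377 |
| fcomp | 1 | 2 | 2 | 8 |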
Figure 5. Hardware architecture of the PLF accelerator.
Figure 6. Memory access FSM.
Figure 7. Mapping between the BEAGLE API and coprocessor communication.
Descriptions of BEAGLE API implementation
| Description |
| --- |
| 4x4 transition probability matrices for each node (initializes arrays tipL and tipR in Pseudocode 1) |
| Copy an array of partials into an instance buffer (initializes arrays clL and clR in Pseudocode 1) |
| Copy a state frequency array into an instance buffer (initializes array bs in Pseudocode 1) |
| Set the vector of pattern weights for an instance (initializes array numSites in Pseudocode 1) |
| Reset a cumulative scale buffer |
| Calculate partials for all internal nodes (computes array clP in Pseudocode 1) |
| Calculate the log-likelihood of the root node (computes arrays lnScaler and scP and calculates lnL in Pseudocode 1) |
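The steps listed in this table can be illustrated with a minimal Python sketch of Felsenstein's pruning for a single internal node and a single site pattern. It is not the paper's Pseudocode 1, which is not reproduced here. It only borrows the array names from the table, and the reading of those names is an assumption: `tipL`/`tipR` as 4x4 transition matrices, `clL`/`clR`/`clP` as partial likelihoods, `bs` as state frequencies, `numSites` as pattern weights, `scP` as the per-pattern scaling factor, and `lnScaler` as its logarithm. The function names and the toy matrix values are hypothetical.

```python
import math

N_STATES = 4  # A, C, G, T


def update_partials(tipL, tipR, clL, clR):
    """Compute parent partials clP for one site pattern from two children."""
    clP = []
    for i in range(N_STATES):
        # Probability of the left and right subtrees given parent state i.
        left = sum(tipL[i][j] * clL[j] for j in range(N_STATES))
        right = sum(tipR[i][j] * clR[j] for j in range(N_STATES))
        clP.append(left * right)
    return clP


def rescale(clP):
    """Divide partials by their maximum (assumed scP) and return its log."""
    scP = max(clP)
    if scP <= 0.0:
        raise ValueError("all partials are zero; likelihood is undefined")
    return [x / scP for x in clP], math.log(scP)


def root_log_likelihood(root_partials, lnScaler, bs, numSites):
    """Sum the weighted per-pattern log-likelihoods at the root."""
    lnL = 0.0
    for clP, ln_scale, weight in zip(root_partials, lnScaler, numSites):
        site_likelihood = sum(bs[i] * clP[i] for i in range(N_STATES))
        # Add back the log of the factor removed by rescaling.
        lnL += weight * (math.log(site_likelihood) + ln_scale)
    return lnL


if __name__ == "__main__":
    # Toy transition matrix (illustrative values): 0.7 on the diagonal, 0.1 elsewhere.
    P = [[0.7 if i == j else 0.1 for j in range(N_STATES)] for i in range(N_STATES)]
    clP = update_partials(P, P, [1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0])
    scaled, ln_scale = rescale(clP)
    print(root_log_likelihood([scaled], [ln_scale], [0.25] * 4, [3]))
```

Rescaling by the largest partial keeps values away from floating-point underflow on deep trees, and adding the logged factor back at the root leaves the final log-likelihood unchanged.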
Figure 8Memory allocation.
Performance results of our design
| 128 | 27 | 93 | 96 | 2 | 0.28 | 0.97 |
| 256 | 41 | 93 | 101 | 4 | 0.41 | 0.92 |
| 512 | 69 | 94 | 103 | 7 | 0.67 | 0.91 |
| 1024 | 133 | 99 | 106 | 13 | 1.25 | 0.93 |
| 2048 | 225 | 107 | 107 | 20 | 2.10 | 1.00 |
| 4096 | 462 | 130 | 115 | 30 | 4.02 | 1.13 |
| 8192 | 944 | 167 | 125 | 28 | 7.55 | 1.34 |
| 16384 | 1894 | 240 | 148 | 28 | 12.80 | 1.62 |
| 32768 | 3873 | 385 | 207 | 28 | 18.71 | 1.86 |
| 65536 | 7922 | 672 | 304 | 29 | 26.06 | 2.21 |
| 131072 | 15898 | 1247 | 415 | 38 | 38.31 | 3.00 |
| 262144 | 31774 | n/a | 764 | 40 | 41.59 | n/a |
| 524288 | 63696 | n/a | 1240 | 44 | 51.37 | n/a |
| 1048576 | 127957 | n/a | 2280 | 46 | 56.12 | n/a |
| 8192000 | 1028649 | n/a | 15750 | 49 | 65.31 | n/a |
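The column labels of this table are not given here, but the numbers are consistent with the sixth column being the second column divided by the fourth, and the seventh being the third divided by the fourth. The short Python check below recomputes that ratio for three rows. Treating columns 2 to 4 as execution times, and the ratios as speedups of the design over two reference implementations, is an assumption; the names `ROWS`, `ratio`, and `check_rows` are hypothetical.

```python
# (problem size, column 2, column 4, reported column 6), copied from three rows above.
ROWS = [
    (128, 27, 96, 0.28),
    (8192, 944, 125, 7.55),
    (8192000, 1028649, 15750, 65.31),
]


def ratio(reference, design):
    """Return reference / design, rounded to two decimals as in the table."""
    return round(reference / design, 2)


def check_rows(rows):
    """Return True when every recomputed ratio matches the reported one."""
    return all(
        abs(ratio(reference, design) - reported) < 1e-9
        for _, reference, design, reported in rows
    )


if __name__ == "__main__":
    print(check_rows(ROWS))
```

Under that reading, the ratio in the sixth column grows from below 1 for the smallest problem sizes to about 65 for the largest one.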
Area results of our design
| Resource | Used | Available | Utilization |
| --- | --- | --- | --- |
| LUT | 193,518 | 207,360 | 93% |
| FF | 200,833 | 207,360 | 97% |
| Slice | 51,629 | 51,840 | 99% |
| BRAM | 235 | 288 | 82% |
| DSP48E | 152 | 192 | 79% |