Tony Robinson1, Jim Harkin1, Priyank Shukla2.
Abstract
The significant decline in the cost of genome sequencing has dramatically changed the typical bioinformatics pipeline for analysing sequencing data. The computational challenge that traditionally lay in sequencing itself is now secondary to that of genomic data analysis. Short read alignment (SRA) is a ubiquitous process within every modern bioinformatics pipeline in the field of genomics and is often regarded as the principal computational bottleneck. Many hardware and software approaches have been proposed to accelerate it. However, previous attempts to increase throughput using many-core processing strategies have enjoyed limited success, mainly due to a dependence on global memory for each computational block. The limited scalability and high energy costs of many-core SRA implementations pose a significant constraint on sustaining acceleration. The Networks-on-Chip (NoC) hardware interconnect mechanism has advanced the scalability of many-core computing systems and, more recently, has demonstrated potential in SRA implementations by efficiently integrating multiple computational blocks, such as pre-alignment filtering and sequence alignment, while minimising memory latency and global memory access. This paper provides a state-of-the-art review of current hardware acceleration strategies for genomic data analysis, and it establishes the challenges and opportunities of utilising NoCs as a critical building block in next-generation sequencing (NGS) technologies for advancing the speed of analysis.
Year: 2021 PMID: 34037688 PMCID: PMC8317111 DOI: 10.1093/bioinformatics/btab017
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. The middle panel presents a typical variant calling bioinformatics pipeline, composed of the steps that follow NGS sequencing and lead to the visualization of data. The variant calling pipeline is contained within the data pre-processing and data analysis stages of a much larger bioinformatics-based research study, as illustrated in the upper panel. Data file formats at each step are presented in the lower panel (Al Kawam ; Lightbody ). *Platform-specific raw sequence output is either .BAM, .FASTQ or .HDF5 (NCBI, 2019).
Fig. 2. (A) Global and (B) local alignment of two sequences and , with scoring schema . Optimal alignment scores are highlighted in bold font and paths for tracing back the optimal alignments are highlighted in bold arrows. Note the gaps in (A) appearing outside the matched regions, leading to global alignments. No gaps appear in (B) outside the matched region, leading to a local alignment.
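The distinction drawn in Fig. 2 can be sketched with a small dynamic programming routine. This is a minimal illustration, not the figure's own example: the sequences and the scoring values (`match=1`, `mismatch=-1`, `gap=-2`) are assumptions chosen for the sketch, and a linear gap penalty is assumed. Global alignment is computed in the Needleman-Wunsch style (gaps at the ends are penalised), local alignment in the Smith-Waterman style (scores are clamped at zero and the best cell anywhere in the matrix is taken).

```python
def align_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Return the optimal alignment score of sequences a and b.

    local=False: global alignment (Needleman-Wunsch style).
    local=True:  local alignment (Smith-Waterman style).
    Scoring values are illustrative assumptions, not taken from the figure.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]

    if not local:
        # Global alignment charges for gaps at the sequence ends.
        for i in range(1, rows):
            H[i][0] = i * gap
        for j in range(1, cols):
            H[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in b
            left = H[i][j - 1] + gap    # gap in a
            score = max(diag, up, left)
            if local:
                score = max(score, 0)   # a local alignment may restart anywhere
                best = max(best, score)
            H[i][j] = score

    # Global score sits in the bottom-right cell; local score is the matrix maximum.
    return best if local else H[-1][-1]


if __name__ == "__main__":
    reference, read = "TTACGTT", "ACG"   # hypothetical sequences
    print("global:", align_score(reference, read))
    print("local: ", align_score(reference, read, local=True))
```

With these assumed inputs the global score is dragged down by the four end gaps, whereas the local score reflects only the matched region.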
CPU based alignment algorithms and their critical performance metrics.
| Algorithm | Performance features | | | Basic features | | | References |
|---|---|---|---|---|---|---|---|
| | Speed (reads/sec) | Reads aligned (%) | Memory footprint (GB) | Min read length (bp) | Max read length (bp) | Compression method | |
| | 185 | 95.0 | 3.8 | 11 | 5000000 | – | |
| | 5556 | 79.9 | 5.0 | 4 | 1024 | FM-index | |
| | 2083 | 99.2 | 5.1 | 4 | 5000000 | FM-index | |
| | 1282 | 92.8 | 7.6 | 4 | 200 | BWT | |
| | 51 | 97.4 | 1.0 | 28 | 63 | Hash table | |
| | 37000 | 94.0 | 1.2 | – | – | Hash table | |
| | 4167 | 79.9 | 5.3 | 27 | 1000 | BWT | |
| | 2083 | 94.0 | 2.3 | – | >1000 | Suffix arrays | |
Note: Unless stated otherwise, performance features are based upon the alignment of 1 million, 100 bp, single-end reads to the human genome (Homo sapiens, assembly GRCh37) on a single-core CPU with 32 GB of RAM. Speed (reads/sec) is the number of reads aligned to the reference genome per second. Reads aligned (%) is the percentage of reads aligned to the reference genome. Memory footprint is the quoted operational peak memory usage (GB) per processing core. Min and Max read length (bp) are the reported read lengths that can be aligned. The compression method is the algorithm used by the aligner for reference genome compression. Information that is not obtainable is denoted by (–). Please refer to the respective article(s) mentioned in the table for further details.
*MAQ performance features are based upon mapping of 100 million, 35 bp, paired-end reads. Computing hardware specifications are unavailable.
†SNAP performance features are based upon mapping of 100 million, 125 bp, single-end reads. SNAP benchmarking, as reported by Zaharia is based on a 256 GB RAM computing system.
‡STAR performance features are based upon mapping 10 million, 76 bp, paired-end reads. STAR benchmarking, as reported by Dobin is based on a 148 GB RAM computing system.
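To put the speed column in perspective, a throughput figure can be converted into wall-clock time for the note's reference workload of 1 million reads. The sketch below is a back-of-envelope calculation only: it assumes throughput stays constant over the whole run and ignores index loading and I/O. The function name is hypothetical, and the three speeds are simply sample values taken from the table.

```python
def alignment_time_seconds(num_reads, reads_per_sec):
    """Estimated wall-clock time, assuming constant throughput (an idealisation)."""
    if reads_per_sec <= 0:
        raise ValueError("reads_per_sec must be positive")
    return num_reads / reads_per_sec


if __name__ == "__main__":
    workload = 1_000_000  # reads, as in the table note
    for speed in (185, 5556, 37000):  # sample speeds (reads/sec) from the table
        t = alignment_time_seconds(workload, speed)
        print(f"{speed:>6} reads/sec -> {t:8.1f} s ({t / 60:.1f} min)")
```

Note that some rows were benchmarked under different conditions (see the footnotes), so such estimates are not directly comparable across all rows.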
Fig. 3. NoC mesh and ring topologies adapted from Das and Ghosal (2018). PE, processing element; NI, network interface; R, switching router.
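One way to compare the two topologies in Fig. 3 is by the number of router-to-router hops a packet must travel. The sketch below is illustrative and rests on assumptions not stated in the caption: routers are numbered row by row in the mesh, mesh packets follow minimal dimension-ordered (XY) routing, and the ring is bidirectional. All function names are hypothetical.

```python
def mesh_hops(src, dst, width):
    """Hop count between two routers in a 2D mesh (minimal XY routing assumed)."""
    sx, sy = src % width, src // width
    dx, dy = dst % width, dst // width
    return abs(sx - dx) + abs(sy - dy)  # Manhattan distance


def ring_hops(src, dst, n):
    """Hop count between two routers in a bidirectional ring of n routers."""
    d = abs(src - dst) % n
    return min(d, n - d)  # take the shorter direction around the ring


def average_hops(hop_fn, n):
    """Mean hop count over all ordered pairs of distinct routers."""
    total = sum(hop_fn(s, d) for s in range(n) for d in range(n) if s != d)
    return total / (n * (n - 1))


if __name__ == "__main__":
    n, width = 16, 4  # 16 routers: a 4x4 mesh versus a 16-node ring
    print("mesh:", average_hops(lambda s, d: mesh_hops(s, d, width), n))
    print("ring:", average_hops(lambda s, d: ring_hops(s, d, n), n))
```

Under these assumptions the mesh yields a lower average hop count than the ring at the same router count, at the cost of more links per router.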
Comparison summary of four different hardware accelerators for sequence alignment.
| Features | | | | |
|---|---|---|---|---|
| Speed (reads/sec) | 483k | – | 23k | ∼10k |
| Max read length (bp) | 1024 | – | 10k | 128 |
| Data structure | FM-index | – | – | – |
| Hardware accelerator processor | ReRAM (specialist) | Xilinx Virtex-7 XC7VX485T FPGA | Xilinx Kintex-7 FPGA | Xilinx Virtex-7 XC7VX690T FPGA |
| Operating frequency (MHz) | 100 | 200 | 250 | 250 |
| Processing elements (PE) per array | – | 512 | 64 | 256 |
| GCUPS | – | 105.9 | – | 609.6 |
| Data bus | – | – | NoC interconnect | Crossbar |
| External memory (DRAM) | No external memory dependence | 3 x 8GB DDR3-1600 | 4 x 32GB LPDDR4 | – |
| Host CPU | – | Intel i5 | Intel Xeon E5-26200 | IBM POWER8 |
| Host memory (GB) (DDR3 RAM) | – | 8 | 64 | – |
| Host interface | – | SFP+ Optical interface | ×16 PCIe 2.0 | CAPI interface |
| Search space reduction | – | – | D-SOFT | – |
| Edit distance function | Hamming | Levenshtein | – | Levenshtein |
| Gap penalty model | – | Affine | Affine | Constant |
| Edit distance implementation | Process-In-Memory (PIM) | Sequential logic | Sequential logic | Sequential logic |
| Power consumption (W) | 1.9 | 44 | 15 | 6.9 |
Note: Speed is quoted in reads per second for simulated reads. Max read length (bp) is the reported maximum read length that can be aligned. Data structure corresponds to the compression method utilized. Hardware accelerator processor is the main accelerator device used. Operating frequency (MHz) is the clock frequency of the accelerator hardware. Processing elements (PE) per array is the number of computational cells per dynamic programming (DP) matrix/array. GCUPS (Giga Cell Updates Per Second) is a performance measure of the number of processing element cell updates per second for a single array cell. Data bus is the interconnection strategy used. External memory (DRAM) corresponds to the external DRAM required to support accelerator operation. Host CPU is the CPU of the host computer that interfaces with the accelerator. Host memory (GB) is the memory capacity of the host computer which the accelerator can draw upon. Host interface is the communication interconnect between host and accelerator. Search space reduction corresponds to the search space reduction strategy used in the pre-alignment filtering stage. Edit distance function corresponds to the specific edit distance calculation method used. Gap penalty model corresponds to the specific gap (insertion or deletion) penalty method used for each implementation. Edit distance implementation is the mode in which each accelerator computes the edit distance function to determine the optimum alignment. Power consumption (W) is the power consumed by the accelerator during alignment. Information that is not obtainable is denoted by (–). Please refer to the respective article(s) mentioned in the table for further details.
* AligneR computing speed is based upon 10 million, 100 bp simulated short reads from human genome reference hg19.
† Darwin computing speed is based upon 3 million, 1000 bp simulated short reads from human genome reference GRCh38.
‡Details on the actual device used in the case of Darwin are unavailable other than the Kintex-7 series by Xilinx.
§ ASAP computing speed is based upon 100 million, 128 bp simulated short reads from human genome reference hg38.
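The GCUPS metric defined in the note can be related to the PE count and operating frequency with a simple model. The sketch below is an assumption-laden estimate, not the method used by any of the accelerators: it assumes every PE updates exactly one DP cell per clock cycle, and the `arrays` and `cells_per_pe_per_cycle` parameters are hypothetical knobs added for illustration.

```python
def peak_gcups(pes_per_array, freq_mhz, arrays=1, cells_per_pe_per_cycle=1.0):
    """Idealised peak throughput in giga cell updates per second.

    Assumes each PE updates `cells_per_pe_per_cycle` DP cells every clock
    cycle with no stalls; real designs will fall short of or restructure this.
    """
    updates_per_sec = pes_per_array * arrays * cells_per_pe_per_cycle * freq_mhz * 1e6
    return updates_per_sec / 1e9


def measured_gcups(query_len, ref_len, seconds):
    """GCUPS achieved when a full query_len x ref_len DP matrix is filled in `seconds`."""
    return query_len * ref_len / seconds / 1e9


if __name__ == "__main__":
    # PE counts and frequencies taken from the table; one array assumed.
    print(peak_gcups(512, 200))  # 512 PEs at 200 MHz
    print(peak_gcups(256, 250))  # 256 PEs at 250 MHz
```

Under this one-array model the estimates are 102.4 and 64.0 GCUPS, which do not match the reported 105.9 and 609.6 GCUPS. The gap suggests design details that the model does not capture, for example several arrays operating in parallel, so the function is best read as a sanity check rather than a reproduction of the published figures.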