Huimin Kang, Chao Ning, Lei Zhou, Shengli Zhang, Ning Yang, Jian-Feng Liu.
Abstract
Today, the rapid increase in phenotypic and genotypic information is leading to larger mixed model equations (MMEs) and rendering genetic evaluation more time-consuming. It has been demonstrated that the preconditioned conjugate gradient (PCG) algorithm with an iteration-on-data (IOD) technique is the most efficient method of solving the MME at a low computing cost. Commonly used software implementing PCG by IOD merely employs functions from the Intel Math Kernel Library (MKL) to accelerate numerical computations and does not take full advantage of the multiple cores or processors of modern computer systems to reduce execution time. To make the most of multicore/multiprocessor systems, we propose PIBLUP, a parallel, shared-memory implementation of PCG by IOD that minimizes the execution time of genetic evaluation. In addition to functions from MKL, PIBLUP uses Message Passing Interface (MPI) shared-memory programming to parallelize code throughout the workflow wherever possible. Results from the analysis of two datasets show that the execution time was reduced by more than 80% when solving the MME with PIBLUP using 16 processes in parallel, compared with a serial program using a single process. PIBLUP is a high-performance tool for users to efficiently perform genetic evaluation. PIBLUP and its user manual are available at https://github.com/huiminkang/PIBLUP.
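The abstract does not show the MPI calls involved; as a rough illustration of the MPI-3 shared-memory programming model it refers to, the sketch below splits the processes on one node into a shared-memory communicator and maps a common buffer that every process can read and write directly. The buffer name, vector length, and overall structure are illustrative assumptions, not PIBLUP's actual code.

```c
/* Minimal sketch of MPI-3 shared-memory allocation (illustrative only;
 * names and sizes are hypothetical, not taken from PIBLUP). */
#include <mpi.h>
#include <stddef.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    /* Group the processes that share physical memory on one node. */
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);
    int rank;
    MPI_Comm_rank(node_comm, &rank);

    /* Rank 0 allocates the shared buffer; the others attach with size 0. */
    const MPI_Aint n = 1000;                 /* hypothetical vector length */
    MPI_Win win;
    double *buf;
    MPI_Aint local = (rank == 0) ? n * (MPI_Aint)sizeof(double) : 0;
    MPI_Win_allocate_shared(local, sizeof(double), MPI_INFO_NULL,
                            node_comm, &buf, &win);

    /* Every process obtains a pointer to rank 0's segment. */
    MPI_Aint size;
    int disp_unit;
    MPI_Win_shared_query(win, 0, &size, &disp_unit, &buf);

    /* ... each process now works on its own slice of buf directly,
     * with no explicit message passing between processes on the node ... */

    MPI_Win_free(&win);
    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}
```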
Keywords: Intel Math Kernel Library; Message Passing Interface; genetic evaluation; iteration on data; large-scale; multicore/multiprocessor; preconditioned conjugate gradient
Year: 2018 PMID: 30154821 PMCID: PMC6102405 DOI: 10.3389/fgene.2018.00226
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. Workflow diagram for PIBLUP. The mixed model equation to be solved is Cx = b. The preconditioner M is a block-diagonal matrix formed from the coefficient matrix C. Steps marked with "*" employ math functions from the Intel Math Kernel Library. Code in parts of the M⁻¹ and b constructions and in PCG is parallelized using Message Passing Interface shared-memory programming. Steps shown in cyan are executed by all processes to reduce runtime; those in white are executed by the master process only. All processes execute the two conditional statements shown in diamonds.
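Figure 1 names the PCG iteration but does not reproduce it. Below is a minimal sketch of a preconditioned conjugate gradient loop for Cx = b, using a simple diagonal (Jacobi) preconditioner as a stand-in for the block-diagonal M described in the caption. The dense matrix-vector product and all function names are illustrative assumptions; PIBLUP forms the product of C with a vector implicitly by iteration on data rather than from an explicitly stored C.

```c
/* Sketch of PCG for C x = b with a diagonal preconditioner M = diag(C).
 * Dense storage and names are hypothetical, for illustration only. */
#include <math.h>
#include <stdlib.h>

static double dot(const double *a, const double *b, int n) {
    double s = 0.0;
    for (int i = 0; i < n; ++i) s += a[i] * b[i];
    return s;
}

/* y = C * p for a dense n x n matrix stored row-major (illustrative). */
static void matvec(const double *C, const double *p, double *y, int n) {
    for (int i = 0; i < n; ++i) {
        double s = 0.0;
        for (int j = 0; j < n; ++j) s += C[i * n + j] * p[j];
        y[i] = s;
    }
}

/* Solve C x = b; x should be initialised (e.g. to zero) on entry. */
void pcg(const double *C, const double *b, double *x, int n,
         int max_iter, double tol) {
    double *r = malloc(n * sizeof *r);
    double *z = malloc(n * sizeof *z);
    double *p = malloc(n * sizeof *p);
    double *q = malloc(n * sizeof *q);

    matvec(C, x, q, n);                                      /* q = C x0      */
    for (int i = 0; i < n; ++i) r[i] = b[i] - q[i];          /* r0 = b - C x0 */
    for (int i = 0; i < n; ++i) z[i] = r[i] / C[i * n + i];  /* z0 = M^-1 r0  */
    for (int i = 0; i < n; ++i) p[i] = z[i];
    double rz = dot(r, z, n);
    double bnorm = sqrt(dot(b, b, n));

    for (int k = 0; k < max_iter; ++k) {
        matvec(C, p, q, n);                                  /* q = C p       */
        double alpha = rz / dot(p, q, n);
        for (int i = 0; i < n; ++i) x[i] += alpha * p[i];
        for (int i = 0; i < n; ++i) r[i] -= alpha * q[i];
        if (sqrt(dot(r, r, n)) < tol * bnorm) break;         /* converged     */
        for (int i = 0; i < n; ++i) z[i] = r[i] / C[i * n + i];
        double rz_new = dot(r, z, n);
        double beta = rz_new / rz;
        for (int i = 0; i < n; ++i) p[i] = z[i] + beta * p[i];
        rz = rz_new;
    }
    free(r); free(z); free(p); free(q);
}
```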
Runtime (min) of the different parts (model construction, M⁻¹ and b constructions, and PCG) of PIBLUP in the analyses of datasets 1 and 2.

| Version | Dataset 1: Model | Dataset 1: M⁻¹ and b | Dataset 1: PCG | Dataset 2: Model | Dataset 2: M⁻¹ and b | Dataset 2: PCG |
| --- | --- | --- | --- | --- | --- | --- |
| Serial | 21.97 | 0.09 | 0.61 | 36.56 | 14.45 | 87.21 |
| Parallel (1) | 21.97 | 1.56 | 0.62 | 36.56 | 17.36 | 98.76 |
| Parallel (4) | 8.63 | 0.41 | 0.19 | 22.57 | 5.47 | 42.61 |
The serial version (Serial) and the parallel version of PIBLUP run with a single process (Parallel (1)) and with four processes (Parallel (4)) were tested.
There were 22,671 equations in the mixed model equations for the analysis of dataset 1.
There were 185,048,637 equations in the mixed model equations for the analysis of dataset 2.
PCG was run for 50 iterations to make the comparison straightforward.
Runtime (min) of PIBLUP, BLUPF90, and DMU in the analysis of dataset 1.

| Processes/threads | PIBLUP | BLUPF90 | DMU |
| --- | --- | --- | --- |
| 1 | 51.49 | 84.97 | 58.18 |
| 4 | 17.61 | 49.87 | 38.79 |
The parallel version of PIBLUP was used.
The blupf90 program from the BLUPF90 package was employed.
The DMU4 program from the DMU package was employed.
Figure 2. Relative runtime of each part (model construction, M⁻¹ and b constructions, and PCG) of the parallel version of PIBLUP, relative to the runtime of its serial counterpart, by number of processes in the analysis of dataset 1. For the M⁻¹ and b constructions, relative runtime was calculated against the parallel version run with a single process. Model, model construction; M⁻¹ and b, M⁻¹ and b constructions.
Figure 3. Relative runtime of each part (model construction, M⁻¹ and b constructions, and PCG) of the parallel version of PIBLUP, relative to the runtime of its serial counterpart, by number of processes in the analysis of dataset 2. Model, model construction; M⁻¹ and b, M⁻¹ and b constructions.
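As a reading aid for Figures 2 and 3, the plotted quantity corresponds to the ratio of parallel to serial runtime for each part (with the single-process parallel run as the baseline for the M⁻¹ and b constructions in dataset 1), and the corresponding speedup is its reciprocal:

```latex
\text{relative runtime}(p) \;=\; \frac{T_{\text{parallel}}(p)}{T_{\text{serial}}},
\qquad
\text{speedup}(p) \;=\; \frac{1}{\text{relative runtime}(p)}
```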