Jérémie Vandenplas, Herwin Eding, Mario P L Calus, Cornelis Vuik.
Abstract
BACKGROUND: The single-step single nucleotide polymorphism best linear unbiased prediction (ssSNPBLUP) method, like single-step genomic BLUP (ssGBLUP), simultaneously analyses phenotypic, pedigree, and genomic information of genotyped and non-genotyped animals. In contrast to ssGBLUP, ssSNPBLUP fits SNP effects explicitly as random effects. Similarly, principal components associated with the genomic information can be fitted explicitly as random effects in a single-step principal component BLUP (ssPCBLUP) model to remove noise in the genomic information. Single-step genomic BLUP is solved efficiently with the preconditioned conjugate gradient (PCG) method. Unfortunately, convergence issues have been reported when solving ssSNPBLUP with PCG. Poor convergence may be linked to large spectral condition numbers of the preconditioned coefficient matrices of ssSNPBLUP. These condition numbers, and thus convergence, could be improved through the deflated PCG (DPCG) method, a two-level PCG method for ill-conditioned linear systems. Therefore, the first aim of this study was to compare the properties of the preconditioned coefficient matrices of ssGBLUP and ssSNPBLUP, and to document the convergence patterns obtained with the PCG method. The second aim was to implement a DPCG method for solving ssSNPBLUP and ssPCBLUP and to test its efficiency.
Year: 2018 PMID: 30390656 PMCID: PMC6215606 DOI: 10.1186/s12711-018-0429-3
Source DB: PubMed Journal: Genet Sel Evol ISSN: 0999-193X Impact factor: 4.297
Algorithm for preconditioned conjugate gradient (PCG) and deflated PCG (DPCG) methods for solving the linear system using a preconditioner M
| Line | Step |
|---|---|
| 1 | Select an initial guess |
| 2 | for |
| 3 | |
| 4 | |
| 5 | |
| 6 | |
| 7 | |
| 8 | |
| 9 | |
| 10 | |
| 11 | |
| 12 | end |
| 13 | |
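For PCG: $\mathbf{P} = \mathbf{I}$ and $\mathbf{Q} = \mathbf{0}$; for DPCG: $\mathbf{P} = \mathbf{I} - \mathbf{C}\mathbf{Q}$, $\mathbf{Q} = \mathbf{Z}\mathbf{E}^{-1}\mathbf{Z}^{\top}$ with the Galerkin matrix $\mathbf{E} = \mathbf{Z}^{\top}\mathbf{C}\mathbf{Z}$, and $\mathbf{Z}$ is a deflation-subspace matrix ($\mathbf{C}$ denotes the coefficient matrix)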
For the PCG implementation, the equation in line 11 was replaced every 50 iterations by an explicit computation of the residual from the coefficient matrix [16]
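The PCG and DPCG iterations can be sketched in Python with NumPy. This is a minimal dense illustration of the standard deflation approach (solve the deflated system PCx̂ = Pb with PCG, then recover x = Qb + Pᵀx̂), not the authors' implementation. It assumes a symmetric positive definite coefficient matrix `C`, a diagonal (Jacobi) preconditioner passed as the vector `m_inv` of inverse diagonal entries, and the deflation operators P = I − CQ, Q = ZE⁻¹Zᵀ and E = ZᵀCZ. The function name `dpcg`, its parameters and the relative-residual stopping rule are hypothetical. Following the footnote, the recursive residual is replaced by an explicitly computed one every 50 iterations.

```python
import numpy as np


def dpcg(C, b, m_inv, Z=None, tol=1e-10, max_iter=10_000, recompute_every=50):
    """Solve C x = b with PCG (Z is None) or deflated PCG (Z given).

    Hypothetical sketch: C is a dense SPD matrix, m_inv holds the inverse
    diagonal of a Jacobi preconditioner M, and Z is a deflation-subspace matrix.
    Returns the solution and the number of iterations performed.
    """
    n = b.shape[0]
    if Z is None:
        Q = np.zeros((n, n))               # PCG: Q = 0, hence P = I
    else:
        E = Z.T @ C @ Z                    # Galerkin matrix
        Q = Z @ np.linalg.solve(E, Z.T)    # Q = Z E^-1 Z^T

    def apply_P(v):
        return v - C @ (Q @ v)             # P v = (I - C Q) v

    def apply_PT(v):
        return v - Q @ (C @ v)             # P^T v = (I - Q C) v

    x = np.zeros(n)                        # initial guess
    r = apply_P(b - C @ x)                 # (deflated) residual
    y = m_inv * r                          # preconditioned residual
    p = y.copy()
    rho = r @ y
    b_norm = np.linalg.norm(b)
    iterations = 0
    for j in range(1, max_iter + 1):
        iterations = j
        w = apply_P(C @ p)
        alpha = rho / (p @ w)
        x = x + alpha * p
        if j % recompute_every == 0:
            r = apply_P(b - C @ x)         # explicit residual limits drift
        else:
            r = r - alpha * w              # recursive residual update
        if np.linalg.norm(r) / b_norm < tol:
            break
        y = m_inv * r
        rho_new = r @ y
        p = y + (rho_new / rho) * p
        rho = rho_new
    # Map the deflated iterate back to the solution of C x = b
    return Q @ b + apply_PT(x), iterations
```

With `Z=None` the same loop reduces to plain PCG, which makes the two methods easy to compare on a small test system.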
Comparison of estimates obtained with different models against estimates obtained with ssGBLUP using the PCG method
| Dataset | Modelᵃ | Pearson correlation | Interceptᵇ | Regression coefficientᵇ |
|---|---|---|---|---|
| Reduced dataset | ssSNPBLUP | > 0.999 | 0.045 | 0.998 |
| Field dataset | ssSNPBLUP | 0.999 | −0.001 | 0.997 |
| | ssPCBLUP | 0.999 | −0.001 | 0.998 |
ᵃ ssSNPBLUP was solved with the DPCG method with five SNP effects per subdomain; ssPCBLUP was solved with one PC effect per subdomain
ᵇ Results from the regression of estimates of ssGBLUP on estimates of ssSNPBLUP or of ssPCBLUP
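The three statistics in this comparison can be computed as sketched below with NumPy, assuming the ssGBLUP estimates and the estimates of the alternative model are aligned arrays for the same animals. As in the regression footnote, the ssGBLUP estimates are the dependent variable. The function name `compare_estimates` is hypothetical.

```python
import numpy as np


def compare_estimates(reference, alternative):
    """Pearson correlation, intercept and slope of the regression of
    reference estimates (e.g. ssGBLUP) on alternative estimates
    (e.g. ssSNPBLUP or ssPCBLUP)."""
    reference = np.asarray(reference, dtype=float)
    alternative = np.asarray(alternative, dtype=float)
    r = np.corrcoef(reference, alternative)[0, 1]
    # np.polyfit returns coefficients from the highest degree: [slope, intercept]
    slope, intercept = np.polyfit(alternative, reference, deg=1)
    return r, intercept, slope
```

An intercept close to 0 and a regression coefficient close to 1, as in the table, indicate that the alternative model reproduces both the level and the scale of the ssGBLUP estimates.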
Fig. 1 Eigenvalues of different preconditioned (deflated) coefficient matrices for the reduced dataset. Eigenvalues of the preconditioned coefficient matrices of ssGBLUP and ssSNPBLUP, and of the preconditioned deflated coefficient matrix of ssSNPBLUP with one SNP effect per subdomain, are depicted on a logarithmic scale. All eigenvalues smaller than 10⁻¹¹ were set to 10⁻¹¹. Eigenvalues are sorted in ascending order
Characteristics of preconditioned (deflated) coefficient matrices, and of PCG and DPCG methods for the reduced dataset
| Model | Methodᵃ | Smallest eigenvalue | Largest eigenvalue | Effective condition number | Number of iterations | Total timeᵇ | Time/iterationᶜ |
|---|---|---|---|---|---|---|---|
| ssGBLUP | PCG | 1.1 × 10⁻⁴ | 11.9 | 1.1 × 10⁵ | 270 | 11.3 | 0.05 |
| ssSNPBLUP | PCG | 1.1 × 10⁻⁴ | 181.0 | 1.7 × 10⁶ | 1475 | 688.2 | 0.46 |
| | DPCG (200) | 1.1 × 10⁻⁴ | 99.4 | 9.3 × 10⁵ | 1221 | 570.5 | 0.47 |
| | DPCG (50) | 1.1 × 10⁻⁴ | 40.5 | 3.8 × 10⁵ | 890 | 437.7 | 0.49 |
| | DPCG (5) | 1.1 × 10⁻⁴ | 6.4 | 6.0 × 10⁴ | 331 | 170.1 | 0.49 |
| | DPCG (1) | 1.1 × 10⁻⁴ | 6.0 | 5.9 × 10⁴ | 270 | 189.6 | 0.66 |
ᵃ Number of SNP effects per subdomain is within brackets
ᵇ Wall clock time (s) for the iterative process
ᶜ Average wall clock time (s) per iteration. Iterations computing the residual from the coefficient matrix for the PCG method were removed before averaging
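The effective condition numbers in these tables match the ratio of the largest to the smallest eigenvalue listed. Deflation also introduces zero eigenvalues (compare Fig. 1, where eigenvalues below 10⁻¹¹ are floored), which such a ratio must exclude. The sketch below computes this ratio for small dense matrices. It assumes a Jacobi preconditioner with diagonal `M_diag`, deflation with P = I − CQ, Q = ZE⁻¹Zᵀ and E = ZᵀCZ, and a hypothetical threshold `zero_tol` below which eigenvalues are treated as zero; the function name is also hypothetical. It uses the symmetric matrix M^(-1/2) PC M^(-1/2), which has the same eigenvalues as M⁻¹PC.

```python
import numpy as np


def effective_condition_number(C, M_diag, Z=None, zero_tol=1e-11):
    """Largest / smallest non-zero eigenvalue of M^-1 C (or M^-1 P C).

    Hypothetical sketch for dense matrices; returns the ratio and all
    eigenvalues in ascending order.
    """
    if Z is None:
        A = C
    else:
        E = Z.T @ C @ Z
        Q = Z @ np.linalg.solve(E, Z.T)
        A = C - C @ Q @ C                  # P C with P = I - C Q (symmetric)
    d = 1.0 / np.sqrt(M_diag)
    S = d[:, None] * A * d[None, :]        # similar to M^-1 A
    S = (S + S.T) / 2.0                    # remove round-off asymmetry
    eig = np.linalg.eigvalsh(S)
    nonzero = eig[eig > zero_tol]          # drop deflated (zero) eigenvalues
    return nonzero.max() / nonzero.min(), eig
```

Deflating the subspace spanned by the eigenvectors of the few smallest eigenvalues removes them from the ratio, which is the mechanism behind the drop in effective condition number from PCG to DPCG in the tables.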
Fig. 2 Termination criteria for the reduced dataset for ssGBLUP and ssSNPBLUP using the PCG method and for ssSNPBLUP using the DPCG method. Number of SNP effects per subdomain is within brackets
Characteristics of preconditioned (deflated) coefficient matrices, and of PCG and DPCG methods for the field dataset
| Model | Methodᵃ | Smallest eigenvalue | Largest eigenvalue | Effective condition number | Number of iterationsᵇ | Total timeᶜ | Time/iterationᵈ |
|---|---|---|---|---|---|---|---|
| ssGBLUP | PCG | 2.3 × 10⁻⁵ | 5.1 | 2.2 × 10⁵ | 729 | 3993 | 5.3 (0.4) |
| ssSNPBLUP | PCG | 3.7 × 10⁻⁵ | 1751.9 | 4.7 × 10⁷ | 10,000 | 52,683 | 4.4 (0.4) |
| | DPCG (200) | 1.2 × 10⁻⁵ | 193.1 | 1.6 × 10⁷ | 10,000 | 92,171 | 9.2 (1.4) |
| | DPCG (50) | 8.7 × 10⁻⁶ | 29.9 | 3.4 × 10⁶ | 6074 | 52,503 | 8.6 (2.4) |
| | DPCG (5) | 2.9 × 10⁻⁵ | 4.8 | 1.7 × 10⁵ | 748 | 7735 | 8.7 (0.3) |
| ssPCBLUPᵉ | PCG | 1.2 × 10⁻⁵ | 220.0 | 1.8 × 10⁷ | 10,000 | 30,198 | 2.9 (0.2) |
| | DPCG (200) | 8.3 × 10⁻⁶ | 113.3 | 1.4 × 10⁷ | 10,000 | 58,280 | 5.8 (0.7) |
| | DPCG (50) | 7.7 × 10⁻⁶ | 46.0 | 6.0 × 10⁶ | 8541 | 55,388 | 6.5 (0.5) |
| | DPCG (5) | 8.0 × 10⁻⁶ | 5.1 | 6.4 × 10⁵ | 2686 | 15,063 | 5.6 (0.2) |
| DPCG (1) | 9.6 × 10−4 | 4.8 | 4.9 × 104 | 375 | 2402 | 6.3 (0.2) |
ᵃ Number of SNP (PC) effects per subdomain is within brackets
ᵇ A number of iterations equal to 10,000 means that the method failed to converge within 10,000 iterations
ᶜ Wall clock time (s) for the iterative process
ᵈ Average wall clock time (s) per iteration, with the SD within brackets. Iterations computing the residual from the coefficient matrix for the PCG method were removed before averaging
ᵉ The number of principal components retained was 13,803
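The bracketed numbers give how many SNP (or PC) effects are grouped into one subdomain of the deflation subspace. One plausible construction, assumed here rather than taken from the source, is a deflation-subspace matrix Z with one indicator column per subdomain of consecutive SNP equations and zero rows for all other equations. The function `subdomain_deflation_matrix` and its parameters are hypothetical, and a dense array is used only for readability; at the scale of the field dataset Z would need a sparse representation.

```python
import numpy as np


def subdomain_deflation_matrix(n_equations, snp_start, n_snp, per_subdomain):
    """Hypothetical subdomain deflation-subspace matrix Z.

    Equations snp_start .. snp_start + n_snp - 1 are assumed to hold SNP (or PC)
    effects; each column of Z flags one block of `per_subdomain` consecutive
    effects (the last block may be smaller). All other rows are zero.
    """
    n_sub = -(-n_snp // per_subdomain)     # ceiling division
    Z = np.zeros((n_equations, n_sub))
    for s in range(n_sub):
        first = snp_start + s * per_subdomain
        last = min(first + per_subdomain, snp_start + n_snp)
        Z[first:last, s] = 1.0
    return Z
```

Fewer effects per subdomain give Z more columns, and therefore (with the assumed E = ZᵀCZ) a larger Galerkin matrix, which is the trade-off between DPCG (200) and DPCG (1) seen in the iteration counts and costs.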
Fig. 3 Termination criteria for the field dataset for ssGBLUP and ssSNPBLUP using the PCG method and for ssSNPBLUP using the DPCG method. Number of SNP effects per subdomain is within brackets
Fig. 4 Termination criteria for the field dataset for ssGBLUP and ssPCBLUP using the PCG method and for ssPCBLUP using the DPCG method. The number of principal components retained was 13,803. Number of PC effects per subdomain is within brackets
Computational costs for different matrices and for the software used for the field dataset
| Model | Methodᵃ | Galerkin matrix (E⁻¹): sizeᵇ | Galerkin matrix (E⁻¹): GB | Galerkin matrix (E⁻¹): timeᶜ (s) | Dense matrixᵈ: GB | GB | Software peak memoryᵉ: GB |
|---|---|---|---|---|---|---|---|
| ssGBLUP | PCG | – | – | – | – | 63.1 | 70.2 |
| ssSNPBLUP | PCG | – | – | – | 26.4 | – | 34.0 |
| | DPCG (200) | 764 | 0.004 | 2199 | 26.4 | – | 43.8 |
| | DPCG (50) | 3044 | 0.071 | 2959 | 26.4 | – | 43.9 |
| | DPCG (5) | 30,400 | 7.1 | 9131 | 26.4 | – | 51.0 |
| ssPCBLUP | PCG | – | – | – | 9.6 | – | 16.6 |
| | DPCG (200) | 284 | < 0.001 | 430 | 9.6 | – | 16.8 |
| | DPCG (50) | 1112 | 0.009 | 663 | 9.6 | – | 16.8 |
| | DPCG (5) | 11,048 | 0.9 | 1965 | 9.6 | – | 17.7 |
| | DPCG (1) | 55,216 | 23.3 | 9630 | 9.6 | – | 40.6 |
ᵃ Number of SNP (PC) effects per subdomain is within brackets
ᵇ The size of the Galerkin matrix is equal to the rank of the deflation-subspace matrix
ᶜ Wall clock time required for computing the Galerkin matrix with a naive implementation and for computing its inverse
ᵈ The dense matrix is the centered genotype matrix for ssSNPBLUP and the matrix of principal components for ssPCBLUP
ᵉ The software peak memory is defined as the peak resident set size (VmHWM) obtained from the Linux /proc virtual file system
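Because the inverse Galerkin matrix is dense, its memory grows with the square of its size, that is, with the square of the rank of the deflation-subspace matrix. The back-of-the-envelope estimate below assumes full (not packed) storage of 8-byte double-precision values and decimal gigabytes; the helper name `dense_matrix_gb` is hypothetical.

```python
def dense_matrix_gb(n_rows, n_cols=None, bytes_per_value=8):
    """Memory (decimal GB) of a dense matrix of 8-byte floating-point values."""
    n_cols = n_rows if n_cols is None else n_cols
    return n_rows * n_cols * bytes_per_value / 1e9


# Estimates for the ssPCBLUP Galerkin-matrix sizes listed in the table
for size in (284, 1112, 11_048, 55_216):
    print(f"{size:>6}: {dense_matrix_gb(size):8.3f} GB")
```

For the largest size, 55,216, this gives about 24.4 GB, close to the 23.3 GB reported; the remaining difference may come from the storage layout or unit convention, which the source does not specify. The quadratic growth explains why DPCG (1) dominates peak memory for ssPCBLUP.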