Fabien Laporte, Alain Charcosset, Tristan Mary-Huard.
Abstract
Since their introduction in the 1950s, variance component mixed models have been widely used in many application fields. In this context, ReML estimation is by far the most popular procedure for inferring the variance components of the model. Although many implementations of the ReML procedure are readily available, computational improvements are still needed because of the ever-increasing size of the datasets to be handled and the complexity of the models to be fitted. In this paper, we present a Min-Max (MM) algorithm for ReML inference and combine it with several speed-up procedures. The ReML MM algorithm is compared with 5 state-of-the-art, publicly available algorithms used in statistical genetics. The computational performance of the different algorithms is evaluated on several datasets representing different plant breeding experimental designs. The MM algorithm ranks among the top 2 methods in almost all settings and is more versatile than many of its competitors. It is a promising alternative to the classical AI-ReML algorithm in the context of variance component mixed models, and it is available in the MM4LMM R package.
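The abstract does not spell out the MM update itself. As an illustration only, here is a minimal sketch of the multiplicative (square-root) MM update commonly used for ReML in a model y ~ N(Xβ, Σ_i σ_i² V_i). It assumes the update σ_i² ← σ_i² · sqrt(yᵀ P V_i P y / tr(P V_i)), with Ω = Σ_i σ_i² V_i and the ReML projection P = Ω⁻¹ − Ω⁻¹X(XᵀΩ⁻¹X)⁻¹XᵀΩ⁻¹. This is not the MM4LMM code: the function names (`reml_terms`, `reml_loglik`, `mm_reml`), the toy relationship matrix, and the simulated variances (2 genetic, 1 residual) are hypothetical, and none of the paper's speed-up procedures are included.

```python
import numpy as np


def reml_terms(X, Vs, sigma2):
    """Hypothetical helper: Omega, X' Omega^-1 X and the ReML projection P."""
    omega = sum(s * V for s, V in zip(sigma2, Vs))      # Omega = sum_i sigma_i^2 V_i
    omega_inv = np.linalg.inv(omega)
    oi_x = omega_inv @ X                                 # Omega^-1 X
    xt_oi_x = X.T @ oi_x                                 # X' Omega^-1 X
    P = omega_inv - oi_x @ np.linalg.solve(xt_oi_x, oi_x.T)
    return omega, xt_oi_x, P


def reml_loglik(y, X, Vs, sigma2):
    """Restricted log-likelihood, up to an additive constant."""
    omega, xt_oi_x, P = reml_terms(X, Vs, sigma2)
    _, logdet_omega = np.linalg.slogdet(omega)
    _, logdet_xox = np.linalg.slogdet(xt_oi_x)
    return -0.5 * (logdet_omega + logdet_xox + y @ P @ y)


def mm_reml(y, X, Vs, sigma2_init, max_iter=500, tol=1e-10):
    """Hypothetical MM loop: multiplicative square-root update of each sigma_i^2."""
    sigma2 = np.asarray(sigma2_init, dtype=float)
    history = [reml_loglik(y, X, Vs, sigma2)]
    for _ in range(max_iter):
        _, _, P = reml_terms(X, Vs, sigma2)
        Py = P @ y
        # Every component is rescaled using the same current P (simultaneous update).
        sigma2 = np.array([
            s * np.sqrt((Py @ V @ Py) / np.trace(P @ V))
            for s, V in zip(sigma2, Vs)
        ])
        history.append(reml_loglik(y, X, Vs, sigma2))
        if abs(history[-1] - history[-2]) < tol:
            break
    return sigma2, history


# Toy example (simulated data, not from the paper): genetic + residual components.
rng = np.random.default_rng(0)
n, m = 150, 300
Z = rng.standard_normal((n, m))                      # fake standardized markers
K = Z @ Z.T / m                                      # toy relationship matrix
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
g = Z @ rng.standard_normal(m) * np.sqrt(2.0 / m)    # genetic effect, covariance 2 * K
y = X @ np.array([1.0, 0.5]) + g + rng.standard_normal(n)
sigma2_hat, history = mm_reml(y, X, [K, np.eye(n)], sigma2_init=[1.0, 1.0])
print(sigma2_hat, len(history) - 1)
```

Because each step only rescales the current variances, estimates started positive stay positive, and the restricted log-likelihood should not decrease from one iteration to the next. The explicit n×n inverse is used here for readability; code aimed at large datasets would typically avoid it, for instance by diagonalizing the relationship matrix once.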
Year: 2022 PMID: 35073307 PMCID: PMC8824334 DOI: 10.1371/journal.pcbi.1009659
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Computational time (in sec.) associated with the different algorithms for the complete GWAS analysis of the Flint dataset, trait by trait.
| Trait | gaston | MM4LMM | FaST-LMM | GEMMA | BOLT-LMM | GridLMM | lme4 |
|---|---|---|---|---|---|---|---|
| DM_Y | 3 | 6 | 28 | 15 | 12 | 9 | 12886 |
| Tass | 5 | 17 | 28 | 15 | 325 | 5 | 34852 |
−log10(p-value) of markers detected by all exact algorithms for DMY_Flo.
| Marker | −log10(p-value) |
|---|---|
| SYN10537 | 5.61 |
| SYN10528 | 5.61 |
| PZE-101030022 | 5.07 |
| PZE-101123079 | 4.87 |
| SYN13856 | 5.19 |
List of significant markers at a nominal level of 5% (Gao correction for multiple testing).
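The note above only names the Gao correction. As a hedged sketch, assuming it refers to the simpleM approach of Gao et al., the effective number of independent tests M_eff is the number of leading eigenvalues of the marker correlation matrix needed to explain 99.5% of the total variance, and each marker is tested at α / M_eff (here α = 5%). The 99.5% cutoff, the use of plain Pearson correlations of genotype codes, and the helper names `gao_meff` and `gao_threshold` are assumptions for illustration; the paper's exact settings are not given here. The genotypes below are simulated, not the Flint data.

```python
import numpy as np


def gao_meff(genotypes, cutoff=0.995):
    """Hypothetical helper: simpleM-style effective number of independent tests.

    genotypes: (n_individuals, n_markers) array of genotype codes;
    assumes no monomorphic marker (zero variance would give NaN correlations).
    """
    corr = np.corrcoef(genotypes, rowvar=False)                # marker-by-marker correlation
    eig = np.clip(np.linalg.eigvalsh(corr), 0.0, None)[::-1]   # descending, drop round-off negatives
    explained = np.cumsum(eig) / eig.sum()                     # cumulative share of variance
    return int(np.searchsorted(explained, cutoff) + 1)         # first k reaching the cutoff


def gao_threshold(genotypes, alpha=0.05, cutoff=0.995):
    """Bonferroni-type threshold alpha / M_eff, also on the -log10 scale."""
    p_thr = alpha / gao_meff(genotypes, cutoff)
    return p_thr, -np.log10(p_thr)


rng = np.random.default_rng(1)
base = rng.integers(0, 3, size=(200, 1)).astype(float)
dup = np.repeat(base, 5, axis=1)                          # 5 identical markers: 1 independent test
indep = rng.integers(0, 3, size=(200, 4)).astype(float)   # 4 unlinked markers
print(gao_meff(dup), gao_meff(indep))
print(gao_threshold(indep))
```

Strongly linked markers collapse onto a few eigenvalues, so M_eff falls below the raw marker count and the threshold is less stringent than a plain Bonferroni correction; the −log10 form can be compared directly with −log10(p-value) columns such as the one above.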
Fig 1. Concordance of log-transformed p-values between gaston and GridLMM, and between gaston and MM4LMM.
Fig 2. Computational time for variance component analysis with simulated data.
Computational time of the algorithms with respect to the number of observations (left) and the number of trials (right).
Computational time (in sec.) associated with the analysis of different subsamples of trials of the NAM dataset, using Model (2).
Bold numbers correspond to the best performance.
| Number of trials | Avg. number of observations | gaston mean | gaston sd | MM4LMM mean | MM4LMM sd | GEMMA mean | GEMMA sd | GridLMM mean | GridLMM sd | Ratio |
|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 1,931.25 | | 4.43 | 42.15 | 7.05 | 58.35 | 16.48 | 23.02 | 4.45 | 3.6 |
| 4 | 3,862.50 | 170.44 | 24.15 | 288.08 | 44.62 | 386.03 | 30.02 | | 14.28 | 3.6 |
| 6 | 5,793.75 | 659.35 | 72.58 | | 25.87 | 1244.19 | 152.15 | 326.58 | 37.15 | 4.0 |
| 8 | 7,725 | 1786.87 | | | | 3100.94 | | 792.59 | | 9.3 |
Computational time (in sec.) associated with the analysis of the NAM dataset using Model (3).
| gaston | MM4LMM | GEMMA |
|---|---|---|
| 5207 | 15739 | >30000 |