Yang-Jun Wen1, Hanwen Zhang2, Yuan-Li Ni1, Bo Huang1, Jin Zhang1, Jian-Ying Feng1, Shi-Bo Wang3, Jim M Dunwell4, Yuan-Ming Zhang1,3, Rongling Wu5,6.
Abstract
The mixed linear model has been widely used in genome-wide association studies (GWAS), but its application to multi-locus GWAS analysis has not been explored and assessed. Here, we implemented a fast multi-locus random-SNP-effect EMMA (FASTmrEMMA) model for GWAS. The model is built on random single nucleotide polymorphism (SNP) effects and a new algorithm. This algorithm whitens the covariance matrix of the polygenic matrix K and environmental noise, and specifies the number of nonzero eigenvalues as one. The model first chooses all putative quantitative trait nucleotides (QTNs) with P-values ≤ 0.005 and then includes them in a multi-locus model for true QTN detection. Owing to the multi-locus feature, the Bonferroni correction is replaced by a less stringent selection criterion. Results from analyses of both simulated and real data showed that FASTmrEMMA is more powerful in QTN detection and model fit, has less bias in QTN effect estimation and requires less running time than existing single- and multi-locus methods, such as empirical Bayes, settlement of mixed linear model under progressively exclusive relationship (SUPER), efficient mixed model association (EMMA), compressed MLM (CMLM) and enriched CMLM (ECMLM). FASTmrEMMA provides an alternative for multi-locus GWAS.
Year: 2018 PMID: 28158525 PMCID: PMC6054291 DOI: 10.1093/bib/bbw145
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622
Comparison of six methods and their software for GWAS
| Case | FASTmrEMMA | E-BAYES | EMMA | CMLM | ECMLM | SUPER |
|---|---|---|---|---|---|---|
| Model | Multi-locus model | Multi-locus model | Single-locus model | Single-locus model | Single-locus model | Single-locus model |
| QTN effect | Random | Random | Fixed | Fixed | Fixed | Fixed |
| Polygenic background control | Yes | No | Yes | Yes | Yes | Yes |
| Population structure control | Yes | No | Yes | Yes | Yes | Yes |
| Number of variance components | Three | No. of effects | Two | Two | Two | Two |
| Polygenic-to-residual variance ratio | Fixed | NA | NA | Fixed | Fixed | NA |
| Significant critical value | LOD (logarithm of odds)=3 | |||||
| Transformation matrix and performances | | Shrinkage is selective: large effects undergo virtually no shrinkage, while small effects are shrunk to zero. | | Kinship among individuals is replaced by kinship among groups. Fits the groups as the random effect, estimates population parameters only once, and then fixes them to test genetic markers. | Kinship among individuals is replaced by kinship among groups. Chooses the best combination of kinship algorithms and grouping algorithms. | Dramatically reduces the number of markers used to define individual relationships, and uses them in FaST-LMM. |
| Running time | Fast | Depends on the number of effects | Slow | Fast | Fast | Moderate |
| Software Web site |
Figure 1. Comparison of the QTN-variance estimates between fast multi-locus random-SNP-effect EMMA (FASTmrEMMA) and one exact algorithm implemented by PROC MIXED in SAS. LD: days to flowering under long days; SDV: days to flowering under short days with vernalization; 8W GH LN: leaf number at flowering with 8 weeks vernalization, greenhouse; and 8W GH FT: days to flowering, 8 weeks vernalization, greenhouse.
Figure 2. Comparison of FASTmrEMMA with single- and multi-locus approaches under various genetic backgrounds. The single-locus approaches are SUPER, EMMA, ECMLM and CMLM; the multi-locus approach is E-BAYES. Powers are presented in A–C, MSEs are shown in D–F and MADs are listed in G–I. Six QTNs (A, D and G), six QTNs plus polygenes (B, E and H) and six QTNs plus three epistatic effects (C, F and I) were simulated in the first, second and third simulation experiments, respectively.
Figure 3. Statistical powers for six simulated QTNs in the first simulation experiment plotted against type I error (on a log10 scale) for the six GWAS methods (FASTmrEMMA, E-BAYES, SUPER, EMMA, ECMLM and CMLM).
Bayesian information criterion values for four flowering time traits in Arabidopsis using six genome-wide association study approaches
| Trait | FASTmrEMMA | E-BAYES | SUPER | EMMA | ECMLM | CMLM |
|---|---|---|---|---|---|---|
| LD | 39.54 | 287.00 | 396.65 | 299.97 | 382.07 | 382.07 |
| SDV | −88.09 | 43.20 | 179.54 | 100.69 | 169.87 | 169.87 |
| 8W GH LN | −103.47 | 77.76 | 117.50 | 117.50 | 117.50 | 117.50 |
| 8W GH FT | −321.72 | −155.55 | −82.41 | −101.83 | −82.41 | −82.41 |
LD: days to flowering under long days; SDV: days to flowering under short days with vernalization; 8W GH LN: leaf number at flowering with 8 weeks vernalization, greenhouse; 8W GH FT: days to flowering, 8 weeks vernalization, greenhouse.
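As a reading aid for the table above, the sketch below recalls the textbook definition of the Bayesian information criterion, BIC = k ln(n) − 2 ln(L), under which lower values indicate a better trade-off between fit and model size. It then picks the lowest-BIC method for each trait from the tabulated values. This record does not state how the likelihoods and parameter counts were obtained, so the `bic` and `best_method` helpers and their argument names are illustrative assumptions; only the numbers in `BIC_TABLE` come from the table.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Textbook BIC: k * ln(n) - 2 * ln(L). Lower values are preferred."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

# Column order follows the table header.
METHODS = ["FASTmrEMMA", "E-BAYES", "SUPER", "EMMA", "ECMLM", "CMLM"]

# BIC values copied from the table, one row per trait.
BIC_TABLE = {
    "LD":       [39.54, 287.00, 396.65, 299.97, 382.07, 382.07],
    "SDV":      [-88.09, 43.20, 179.54, 100.69, 169.87, 169.87],
    "8W GH LN": [-103.47, 77.76, 117.50, 117.50, 117.50, 117.50],
    "8W GH FT": [-321.72, -155.55, -82.41, -101.83, -82.41, -82.41],
}

def best_method(values):
    """Return the name of the method with the smallest BIC in one row."""
    idx = min(range(len(values)), key=values.__getitem__)
    return METHODS[idx]

best = {trait: best_method(vals) for trait, vals in BIC_TABLE.items()}
for trait, method in best.items():
    print(f"{trait}: lowest BIC from {method}")
```

With the tabulated values, FASTmrEMMA has the lowest BIC for all four traits.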
GWAS for four flowering time traits in Arabidopsis using six GWAS methods
| Trait | Gene | Chr | SNP (bp) | FASTmrEMMA | E-BAYES | SUPER | EMMA | References | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LOD | Effect | MAF | r2 (%) | LOD | Effect | MAF | r2 (%) | Effect | MAF | r2 (%) | Effect | MAF | r2 (%) | |||||||
| LD | 1 | 8045438 | 4.872 | −0.112 | 0.395 | 0.549 | [ | |||||||||||||
| 1 | 8128350 | 9.006 | −0.197 | 0.461 | 1.767 | [ | ||||||||||||||
| 2 | 9588685 | 10.338 | −0.330 | 0.281 | 4.034 | 10.753 | −0.611 | 0.281 | 13.817 | 2.78E-09 | −0.815 | 0.281 | 24.607 | [ | ||||||
| 2 | 9588685 | 10.338 | −0.330 | 0.281 | 4.034 | 10.753 | −0.611 | 0.281 | 13.817 | 2.78E-09 | −0.815 | 0.281 | 24.607 | [ | ||||||
| 3 | 22949227 | 5.919 | 0.149 | 0.413 | 0.986 | [ | ||||||||||||||
| 5 | 3188328 | 12.759 | −0.272 | 0.263 | 2.630 | [ | ||||||||||||||
| 4 | 153459 | 8.39E-08 | −0.363 | 0.168 | 3.374 | [ | ||||||||||||||
| 4 | 167142 | 6.75E-08 | −0.538 | 0.138 | 6.307 | [ | ||||||||||||||
| 4 | 196614 | 2.88E-08 | −0.227 | 0.389 | 2.243 | [ | ||||||||||||||
| 4 | 516758 | 8.15E-08 | −0.504 | 0.108 | 4.483 | [ | ||||||||||||||
| SDV | 1 | 1595585 | 4.298 | 0.117 | 0.214 | 1.346 | [ | |||||||||||||
| 1 | 1595585 | 4.298 | 0.117 | 0.214 | 1.346 | [ | ||||||||||||||
| 1 | 28965510 | 10.817 | −0.177 | 0.484 | 4.576 | 4.020 | −0.170 | 0.484 | 4.221 | [ | ||||||||||
| 2 | 17488070 | 4.339 | 0.099 | 0.302 | 1.208 | [ | ||||||||||||||
| 3 | 7084425 | 3.309 | 0.068 | 0.302 | 0.570 | [ | ||||||||||||||
| 3 | 18385143 | 4.529 | 0.118 | 0.321 | 1.774 | [ | ||||||||||||||
| 4 | 2748735 | 4.286 | −0.091 | 0.459 | 1.203 | [ | ||||||||||||||
| 5 | 1164843 | 4.479 | −0.137 | 0.220 | 1.884 | [ | ||||||||||||||
| 5 | 3055565 | 4.763 | −0.105 | 0.233 | 1.151 | [ | ||||||||||||||
| 5 | 23249199 | 5.419 | −0.141 | 0.321 | 2.533 | [ | ||||||||||||||
| 5 | 23249199 | 5.419 | −0.141 | 0.321 | 2.533 | [ | ||||||||||||||
| 5 | 19044037 | 3.55E-08 | 0.408 | 0.107 | 9.296 | [ | ||||||||||||||
| 5 | 26794176 | 1.79E-07 | −0.292 | 0.321 | 10.864 | [ | ||||||||||||||
| 5 | 26794176 | 1.79E-07 | −0.292 | 0.321 | 10.864 | [ | ||||||||||||||
| 8W GH LN | 1 | 28965510 | 3.857 | −0.109 | 0.497 | 2.610 | [ | |||||||||||||
| 2 | 11703876 | 9.631 | −0.153 | 0.325 | 4.514 | [ | ||||||||||||||
| 4 | 15918498 | 4.651 | −0.147 | 0.147 | 2.384 | [ | ||||||||||||||
| 5 | 5196549 | 5.923 | −0.106 | 0.319 | 2.145 | [ | ||||||||||||||
| 5 | 18600041 | 4.608 | −0.107 | 0.423 | 2.456 | [ | ||||||||||||||
| 8W GH FT | 1 | 863771 | 5.055 | 0.040 | 0.460 | 1.199 | [ | |||||||||||||
| 2 | 11703876 | 4.744 | −0.043 | 0.323 | 1.122 | [ | ||||||||||||||
| 2 | 19396129 | 4.208 | −0.038 | 0.298 | 0.911 | [ | ||||||||||||||
| 3 | 21079518 | 3.081 | −0.032 | 0.311 | 0.661 | [ | ||||||||||||||
| 3 | 21079518 | 3.081 | −0.032 | 0.311 | 0.661 | [ | ||||||||||||||
| 5 | 2002341 | 3.169 | −0.070 | 0.186 | 2.241 | [ | ||||||||||||||
| 5 | 2002341 | 3.169 | −0.070 | 0.186 | 2.241 | [ | ||||||||||||||
| 5 | 26781546 | 4.302 | 0.076 | 0.317 | 3.772 | [ | ||||||||||||||
MAF: minor allele frequency. The individuals with missing phenotypes and the SNPs with MAF were excluded. The critical value for significance was for FASTmrEMMA and E-BAYES, and approximately 2.8E-07 P-value for SUPER, EMMA, CMLM and ECMLM. The results from CMLM and ECMLM were not listed in this table because no genes were detected. The data set was derived from Atwell et al. (2010).
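To put the LOD = 3 criterion from the method-comparison table and the P-value threshold quoted in the note above on a common scale, the sketch below converts between a LOD score and a P-value. It assumes the usual likelihood-ratio relation LR = 2 ln(10) × LOD and a chi-square reference distribution with one degree of freedom. That reference distribution is an assumption made for illustration, since this record does not say which one was used, and the function names are hypothetical.

```python
import math

def lod_to_pvalue(lod):
    """P-value for a LOD score, assuming LR = 2*ln(10)*LOD ~ chi-square(1 df).

    For one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    """
    lr = 2.0 * math.log(10.0) * lod
    return math.erfc(math.sqrt(lr / 2.0))

def pvalue_to_lod(p, lo=0.0, hi=50.0, iters=200):
    """Invert lod_to_pvalue by bisection (the P-value decreases as LOD grows)."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if lod_to_pvalue(mid) > p:
            lo = mid   # P-value still too large: need a larger LOD
        else:
            hi = mid
    return (lo + hi) / 2.0

p_at_lod3 = lod_to_pvalue(3.0)         # criterion listed as LOD = 3
lod_at_bonf = pvalue_to_lod(2.8e-7)    # threshold quoted for single-locus methods

print(f"LOD 3 corresponds to P = {p_at_lod3:.2e}")
print(f"P = 2.8e-07 corresponds to LOD = {lod_at_bonf:.2f}")
```

Under these assumptions, LOD = 3 corresponds to a P-value of roughly 2 × 10⁻⁴, which is considerably less stringent than the approximately 2.8E-07 threshold applied to the single-locus methods.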