Hongping Guo, Zuguo Yu, Jiyuan An, Guosheng Han, Yuanlin Ma, Runbin Tang.
Abstract
Genome-wide association studies (GWAS) have become an essential tool for exploring the genetic mechanisms of complex traits. To reduce computational complexity, it is common practice to remove unrelated single nucleotide polymorphisms (SNPs) before GWAS, e.g., with the iterative sure independence screening expectation-maximization Bayesian Lasso (ISIS EM-BLASSO) method. In this work, a modified version of ISIS EM-BLASSO is proposed. It first reduces the number of SNPs with a screening procedure based on Pearson correlation and mutual information, then estimates the effects via EM-Bayesian Lasso (EM-BLASSO), and finally detects the true quantitative trait nucleotides (QTNs) through a likelihood ratio test. We call our method a two-stage mutual information based Bayesian Lasso (MBLASSO). Under three simulation scenarios, MBLASSO improves statistical power and retains higher effect estimation accuracy compared with three other algorithms. Moreover, on the Arabidopsis thaliana datasets, MBLASSO gives the best model fit and the highest accuracy of detected associations, and 21 genes are detected only by MBLASSO.
Keywords: Bayesian Lasso; GWAS; Pearson correlation; feature screening; mutual information
Year: 2020 PMID: 33286103 PMCID: PMC7516787 DOI: 10.3390/e22030329
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
Figure 1. A flow chart of the MBLASSO method.
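The screening stage named in the abstract can be illustrated with a minimal sketch. It assumes the two screens are applied one after the other (Pearson correlation first, then mutual information), that the phenotype is discretized into equal-frequency bins before mutual information is estimated, and that fixed numbers of SNPs are kept at each step. None of these details are given in this section, and the function names (`pearson`, `quantile_bins`, `mutual_information`, `screen`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def quantile_bins(y, n_bins=3):
    """Assign each value to one of n_bins roughly equal-frequency bins (an assumed discretization)."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    bins = [0] * len(y)
    for rank, i in enumerate(order):
        bins[i] = min(rank * n_bins // len(y), n_bins - 1)
    return bins


def mutual_information(a, b):
    """Plug-in mutual information (in nats) between two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log(c * n / (pa[u] * pb[v])) for (u, v), c in pab.items())


def screen(genotypes, phenotype, keep_pearson, keep_mi):
    """Keep the top SNPs by |Pearson r|, then the top of those by mutual information."""
    by_r = sorted(genotypes, key=lambda s: abs(pearson(genotypes[s], phenotype)), reverse=True)
    stage1 = by_r[:keep_pearson]
    y_bins = quantile_bins(phenotype)
    by_mi = sorted(stage1, key=lambda s: mutual_information(genotypes[s], y_bins), reverse=True)
    return by_mi[:keep_mi]


# Toy data (hypothetical): three SNPs coded 0/1/2 for six individuals.
phenotype = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
genotypes = {
    "snp_a": [0, 0, 1, 1, 2, 2],
    "snp_b": [0, 2, 0, 2, 0, 2],
    "snp_c": [1, 1, 1, 1, 1, 1],
}
selected = screen(genotypes, phenotype, keep_pearson=2, keep_mi=1)
print(selected)  # ['snp_a']
```

In this toy case the constant `snp_c` is dropped by the correlation screen, and `snp_b` is dropped by the mutual information screen because its genotype carries no information about the binned phenotype. The surviving SNPs would then be passed to effect estimation and testing, which this sketch does not cover.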
Screening results based on Pearson correlation and mutual information in MBLASSO under three simulation scenarios (each cell gives the overlap ratio, with the average number of SNPs remaining after screening in parentheses).
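| Simulation | Pearson correlation screening: Type I | Pearson correlation screening: Type II | Pearson correlation screening: Total | Mutual information screening: Type I | Mutual information screening: Type II | Mutual information screening: Total |
|---|---|---|---|---|---|---|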
| 1 | 0.470 (15.8) | 0.086 (50.4) | 0.184 (66.2) | 0.417 (18.2) | 0.298 (15.5) | 0.356 (33.7) |
| 2 | 0.452 (16.6) | 0.091 (50.3) | 0.181 (66.9) | 0.398 (19.0) | 0.285 (17.5) | 0.334 (36.5) |
| 3 | 0.457 (14.6) | 0.090 (50.8) | 0.173 (65.4) | 0.383 (18.4) | 0.278 (17.4) | 0.323 (35.8) |
Figure 2. Statistical powers for the six simulated QTNs in three simulation scenarios. (a) Additive effects of the six QTNs only; (b) additive effects of the six QTNs plus a polygenic background effect; and (c) additive effects of the six QTNs plus epistatic effects of three other pairs of QTNs.
Figure 3. Average mean squared errors (MSEs) for the six simulated QTNs in three simulation scenarios. Panels (a–c) are as described in Figure 2.
Figure 4. Type 1 error ratios in three simulation scenarios. Simulations 1–3 correspond to panels (a–c) in Figure 2.
Degree of model fitting (AIC, BIC) for SNPs identified in four flowering-time related traits for Arabidopsis thaliana.
| Trait | MBLASSO AIC | MBLASSO BIC | ISIS EM-BLASSO AIC | ISIS EM-BLASSO BIC | GEMMA AIC | GEMMA BIC | EM-BLASSO AIC | EM-BLASSO BIC |
|---|---|---|---|---|---|---|---|---|
| LDV | −360.543 | −307.436 | −318.966 | −275.230 | 1312.693 | 1322.065 | −113.638 | −104.266 |
| SDV | −169.269 | −114.028 | −140.485 | −85.245 | 1356.907 | 1372.251 | 149.095 | 149.095 |
| 2W | −103.363 | −51.957 | −65.172 | −7.718 | 584.000 | 587.024 | 148.247 | 160.342 |
| 4W | −124.109 | −74.084 | −98.993 | −54.527 | 1253.281 | 1258.839 | 22.893 | 39.568 |
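For readers unfamiliar with the two criteria, the sketch below uses the standard definitions, AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of fitted parameters, n the sample size, and ln L the maximized log-likelihood. Lower values indicate a better trade-off between fit and model size. How the authors computed the log-likelihood is not stated in this section; the Gaussian linear-model form below is an assumption, and the function names are hypothetical.

```python
import math


def gaussian_log_likelihood(rss, n):
    """Maximized log-likelihood of a Gaussian linear model with residual sum of squares rss.

    Assumes the error variance is estimated by its maximum-likelihood value rss / n.
    """
    return -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)


def aic(log_lik, k):
    """Akaike information criterion for a model with k fitted parameters."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """Bayesian information criterion for k fitted parameters and n observations."""
    return k * math.log(n) - 2 * log_lik


# Illustrative numbers only (not taken from the paper).
log_lik = gaussian_log_likelihood(rss=25.0, n=100)
print(aic(log_lik, k=5), bic(log_lik, k=5, n=100))
```

Because both criteria share the term −2 ln L, their difference for a given model is k(ln n − 2), so BIC penalizes additional SNPs more heavily than AIC whenever n exceeds about 7.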
The accuracy of detected associations in four flowering-time related traits for Arabidopsis thaliana (in each cell, the number before the slash is the count of known genes in the GO annotation, and the number after the slash is the count of detected SNPs).
| Traits | MBLASSO | ISIS EM-BLASSO | GEMMA | EM-BLASSO |
|---|---|---|---|---|
| LDV | 5/17 | 3/14 | 0/3 | 0/3 |
| SDV | 4/18 | 2/18 | 1/5 | 0/0 |
| 2W | 2/17 | 1/19 | 0/1 | 0/4 |
| 4W | 3/18 | 2/16 | 1/2 | 0/6 |
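One way to summarize the table is to pool the counts over the four traits and compute, for each method, the share of detected SNPs that map to known genes. This section does not define how accuracy is aggregated, so the pooled ratio below is an assumed reading. The counts are copied from the table, and the variable names are hypothetical.

```python
# (known genes, detected SNPs) per trait, copied from the table above.
counts = {
    "MBLASSO":        {"LDV": (5, 17), "SDV": (4, 18), "2W": (2, 17), "4W": (3, 18)},
    "ISIS EM-BLASSO": {"LDV": (3, 14), "SDV": (2, 18), "2W": (1, 19), "4W": (2, 16)},
    "GEMMA":          {"LDV": (0, 3),  "SDV": (1, 5),  "2W": (0, 1),  "4W": (1, 2)},
    "EM-BLASSO":      {"LDV": (0, 3),  "SDV": (0, 0),  "2W": (0, 4),  "4W": (0, 6)},
}

# Pool known-gene and detected-SNP counts across traits for each method.
pooled = {
    method: (
        sum(known for known, _ in traits.values()),
        sum(detected for _, detected in traits.values()),
    )
    for method, traits in counts.items()
}

# Pooled ratio; a method with no detections at all would get 0.0 rather than a division error.
ratio = {
    method: (known / detected if detected else 0.0)
    for method, (known, detected) in pooled.items()
}

for method, (known, detected) in pooled.items():
    print(f"{method}: {known}/{detected} = {ratio[method]:.3f}")
```

Under this pooled reading MBLASSO has the highest ratio (14 known genes among 70 detected SNPs). Per-trait ratios for methods that detect only a handful of SNPs rest on very small denominators and should be read with caution.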