Haohan Wang, Michael M Vanyukov, Eric P Xing, Wei Wu.
Abstract
BACKGROUND: The current understanding of the genetic basis of complex human diseases is that they are caused and affected by many common and rare genetic variants. A considerable number of disease-associated variants have been identified by genome-wide association studies (GWAS); however, they explain only a small proportion of heritability. One possible reason for the missing heritability is that many undiscovered disease-causing variants are only weakly associated with the disease. This poses a serious challenge to many statistical methods, which seem capable of identifying only disease-associated variants with relatively strong coefficients.
Keywords: GWAS; Linear mixed model; Weak association
Year: 2020 PMID: 32093702 PMCID: PMC7038505 DOI: 10.1186/s12920-020-0667-4
Source DB: PubMed Journal: BMC Med Genomics ISSN: 1755-8794 Impact factor: 3.063
Fig. 1 An illustration of the generation process of SNP array data. The figure shows data generated with three populations as an example.
Fig. 2 Simulation results of CS-LMM compared with other models in terms of precision-recall curves. The x-axis is recall and the y-axis is precision. The figure is split into four panels by heritability: a, heritability 0.1; b, heritability 0.3; c, heritability 0.5; d, heritability 0.7.
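For readers unfamiliar with the evaluation, the following is a minimal sketch of how a precision-recall curve can be traced from a model's estimated coefficients in a simulation where the truly associated SNPs are known. It is not the authors' evaluation code: the function name, the toy scores, and the choice to rank by absolute coefficient are assumptions made for illustration.

```python
def precision_recall_curve(scores, is_causal):
    """Return (recall, precision) pairs for every cutoff along the ranking.

    scores    -- estimated coefficient per SNP (hypothetical toy values below)
    is_causal -- ground-truth flags, known only in a simulation
    """
    total_causal = sum(1 for flag in is_causal if flag)
    if total_causal == 0:
        raise ValueError("at least one causal SNP is required")

    # Rank SNP indices by decreasing absolute coefficient (an assumption).
    order = sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)

    curve = []
    true_positives = 0
    for k, idx in enumerate(order, start=1):
        if is_causal[idx]:
            true_positives += 1
        recall = true_positives / total_causal   # share of causal SNPs recovered
        precision = true_positives / k           # share of selected SNPs that are causal
        curve.append((recall, precision))
    return curve


# Toy data: five SNPs, two of which are truly associated.
scores = [0.9, -0.05, 0.4, 0.0, -0.7]
is_causal = [True, False, False, False, True]
curve = precision_recall_curve(scores, is_causal)
```

Plotting the first element of each pair on the x-axis and the second on the y-axis gives a curve of the kind described in the caption; a method that ranks the weakly associated SNPs higher keeps precision high as recall grows.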
The top SNPs that CS-LMM identifies in an alcoholism study with four known associations
| Rank | SNP | Chr | Chr Position | Est. Coe. | MAF | Gene | Disease [Literature] |
|---|---|---|---|---|---|---|---|
| 1 | rs1789891 | 4 | 99329262 | 4.2E3 | 0.15 | | ALC [ |
| 2 | rs7590720 | 2 | 216033935 | 1.7E3 | 0.29 | | ALC [ |
| 3 | rs2835872 | 21 | 37654970 | 1.5E3 | 0.25 | | ALC [ |
| 4 | rs4478858 | 1 | 31411078 | 1.4E3 | 0.44 | | ALC [ |
| 5 | rs1789924 | 4 | 99353129 | -2.2E-4 | 0.33 | | ALC [ |
| 6 | rs698 | 4 | 99339632 | -2.2E-4 | 0.33 | | ALC [ |
| 7 | rs2851300 | 4 | 99358667 | -2.2E-4 | 0.33 | | |
| 8 | rs10483038 | 21 | 37652469 | -1.6E-4 | 0.25 | | ALC [ |
| 9 | rs1344694 | 2 | 216028914 | -1.3E-4 | 0.32 | | ALC [ |
| 10 | rs4147536 | 4 | 99317955 | -7.6E-5 | 0.30 | | ALC [ |
| 11 | rs12482570 | 21 | 37705475 | -5.9E-5 | 0.28 | | ALC [ |
| 12 | rs857975 | 21 | 37629311 | -5.8E-5 | 0.28 | | ALC [ |
| 13 | rs4147544 | 4 | 99213357 | -5.7E-5 | 0.45 | | ALC [ |
| 14 | rs702860 | 21 | 37636327 | -5.6E-5 | 0.26 | | ALC [ |
| 15 | rs2835853 | 21 | 37642590 | -5.6E-5 | 0.26 | | ALC [ |
| 16 | rs717859 | 21 | 37640500 | -5.6E-5 | 0.26 | | ALC [ |
| 17 | rs11499823 | 4 | 99353592 | -5.6E-5 | 0.12 | | ALC [ |
| 18 | rs2835910 | 21 | 37713604 | -5.5E-5 | 0.30 | | ALC [ |
| 19 | rs4355398 | 4 | 99237168 | -3.8E-5 | 0.25 | | |
| 20 | rs2187483 | 4 | 99212946 | -9.7E-7 | 0.38 | | ALC [ |
| 21 | rs2835831 | 21 | 37614931 | -6.9E-7 | 0.30 | | |
The SNPs are ranked by the absolute values of their estimated coefficients. The first four SNPs, which have the largest coefficients, are known SNPs that our model CS-LMM takes as prior knowledge. The remaining SNPs are those predicted by the model. The MAFs reported in the table are calculated from the case-control alcoholism GWAS data. Whether a SNP lies within a gene region is taken from the Database for Single Nucleotide Polymorphisms (dbSNP) [35] and listed in the 'Gene' column. Abbreviations: ALC, Alcoholism; AD, Alzheimer’s Disease; DS, Down Syndrome; Est. Coe., Estimated Coefficients. Note that the literature support may refer to how the genes in which the corresponding SNPs reside are related to the phenotype, rather than to the SNPs themselves. See the discussion in Section Alcoholism Study for details.
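The two quantities described in this note, the minor allele frequency and the ranking by absolute coefficient, can be sketched as follows. This is an illustrative sketch only: the 0/1/2 allele-count coding, the function names, and the toy SNP identifiers are assumptions, not details taken from the study.

```python
def minor_allele_frequency(genotypes):
    """MAF from per-individual allele counts coded 0, 1, or 2 (assumed coding).

    Missing calls are passed as None and skipped.
    """
    observed = [g for g in genotypes if g is not None]
    if not observed:
        raise ValueError("no observed genotypes")
    # Each individual carries two alleles, hence the factor of 2.
    allele_freq = sum(observed) / (2 * len(observed))
    # The minor allele is whichever of the two alleles is rarer.
    return min(allele_freq, 1 - allele_freq)


def rank_by_abs_coefficient(coefficients):
    """Return (rank, snp, coefficient) tuples, largest absolute value first."""
    ordered = sorted(coefficients.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return [(rank, snp, coef) for rank, (snp, coef) in enumerate(ordered, start=1)]


# Hypothetical toy inputs, not values from the study.
maf_a = minor_allele_frequency([0, 1, 2, 1, 0, None])
maf_b = minor_allele_frequency([2, 2, 1, 2])
ranking = rank_by_abs_coefficient({"snp_a": -2.0e-4, "snp_b": 1.5e-3, "snp_c": -5.0e-5})
```

Ranking on the absolute value means a negative coefficient of large magnitude outranks a small positive one, which is why rows with negative estimates can appear above rows with positive estimates in tables of this kind.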
The top SNPs that CS-LMM identifies in an AD study with two known associations
| Rank | SNP | Chr | Chr Position | Est. Coe. | MAF | Gene | Disease [Literature] |
|---|---|---|---|---|---|---|---|
| 1 | rs2075650 | 19 | 44892362 | 0.21 | 0.18 | | AD [ |
| 2 | rs157580 | 19 | 44892009 | 0.02 | 0.27 | | AD [ |
| 3 | rs10027926 | 4 | 3412927 | -8.3E-11 | 0.14 | | SCZ [ |
| 4 | rs12641989 | 4 | 3418113 | -7.8E-11 | 0.14 | | SCZ [ |
| 5 | rs3088231 | 4 | 3420484 | -7.5E-11 | 0.13 | | SCZ [ |
| 6 | rs10512523 | 17 | 69044919 | 5.2E-11 | 0.28 | | AD [ |
| 7 | rs4076949 | 1 | 234066399 | 4.2E-11 | 0.18 | | |
| 8 | rs874418 | 4 | 3440342 | -3.9E-11 | 0.19 | | |
| 9 | rs6842419 | 4 | 3475572 | -3.2E-11 | 0.16 | | |
| 10 | rs16844383 | 4 | 3445516 | -2.9E-11 | 0.21 | | |
| 11 | rs12131508 | 1 | 234017193 | 1.7E-11 | 0.17 | | |
| 12 | rs12506821 | 4 | 3282833 | -1.6E-11 | 0.16 | | |
| 13 | rs11485175 | 1 | 222437868 | 1.4E-11 | 0.23 | | |
| 14 | rs584507 | 10 | 6489788 | 1.2E-11 | 0.24 | | |
| 15 | rs12563692 | 1 | 216818264 | -1.2E-11 | 0.30 | | ALC [ |
| 16 | rs6446731 | 4 | 3283024 | -1.1E-11 | 0.26 | | |
| 17 | rs7984051 | 13 | 70233817 | -8.1E-12 | 0.25 | | |
| 18 | rs2327771 | 20 | 13295734 | 3.0E-12 | 0.29 | | |
| 19 | rs7548651 | 1 | 234012812 | 2.4E-12 | 0.20 | | |
| 20 | rs4330674 | 8 | 133209259 | -1.2E-12 | 0.24 | | |
| 21 | rs16885750 | 5 | 56578982 | -8.1E-13 | 0.12 | | |
| 22 | rs938412 | 3 | 188571269 | 3.6E-13 | 0.31 | | |
The SNPs are ranked by the absolute values of their estimated coefficients. The first two SNPs, which have the largest coefficients, are known SNPs that the model takes as prior knowledge. The remaining SNPs are those predicted by the model. The MAFs reported in the table are calculated from the AD GWAS data. Whether a SNP lies within a gene region is taken from dbSNP. Abbreviations: ALC, Alcoholism; AD, Alzheimer’s Disease; SCZ, Schizophrenia; Est. Coe., Estimated Coefficients. Note that the literature support may refer to how the genes in which the corresponding SNPs reside are related to the phenotype, rather than to the SNPs themselves. See the discussion in Section Alzheimer’s Disease Study for details.