Minjun Huang, Britney E Graham, Ge Zhang, Reed Harder, Nuri Kodaman, Jason H Moore, Louis Muglia, Scott M Williams.
Abstract
Genetic studies of human diseases have identified many variants associated with pathogenesis and severity. However, most studies have used only statistical association to assess putative relationships to disease and have ignored other factors that could aid evaluation. For example, evolution has shaped disease risk, changing allele frequencies as human populations migrated into and inhabited new environments. Since many common variants differ in frequency among populations, as does disease prevalence, we hypothesized that patterns of disease and population structure, taken together, will inform association studies. Thus, the population distributions of allelic risk variants should reflect the distributions of their associated diseases. Evolutionary Triangulation (ET) exploits this evolutionary differentiation by comparing population structure among three populations with variable patterns of disease prevalence. We performed a proof-of-principle analysis of this method by selecting populations such that two have similar rates of disease that differ substantially from the rate in a third. We examined three disease phenotypes: lactase persistence, melanoma, and Type 2 diabetes mellitus. We show that for lactase persistence, a phenotype with a simple genetic architecture, ET identifies the key gene, lactase. For melanoma, ET identifies several genes associated with this disease and/or phenotypes related to it, such as skin color genes. ET was less obviously successful for Type 2 diabetes mellitus, perhaps because of the small effect sizes of known risk loci and recent environmental changes that have altered disease risk. Alternatively, ET may have revealed new genes involved in conferring disease risk for diabetes that did not meet nominal GWAS significance thresholds.
We also compared ET to another method used to filter for phenotype-associated genes, the population branch statistic (PBS), and show that ET performs better in identifying genes known to associate with diseases appropriately distributed among populations. Our results indicate that ET can filter association results to improve our ability to discover disease loci.
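For reference, the population branch statistic is commonly defined from the three pairwise F_ST values among populations A, B and C, as sketched below. The abstract does not state the exact formulation used in the comparison, so the labels A, B, C and the notation here are placeholders rather than the authors' own.

$$
T_{XY} = -\log\left(1 - F_{ST}^{XY}\right), \qquad
\mathrm{PBS}_A = \frac{T_{AB} + T_{AC} - T_{BC}}{2}
$$

Under this definition, a large value of PBS for population A indicates that A is strongly differentiated from both B and C relative to the differentiation between B and C. PBS folds the three comparisons into a single branch length, whereas ET intersects separate percentile thresholds on each comparison.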
Keywords: Genetic association; Health disparity; Population differentiation; Selection
Year: 2016 PMID: 27042214 PMCID: PMC4818851 DOI: 10.1186/s13040-016-0091-7
Source DB: PubMed Journal: BioData Min ISSN: 1756-0381 Impact factor: 2.522
Fig. 1 Limiting the number of SNPs by switching from pairwise to three-way comparison. The numbers of SNPs identified in pairwise versus three-way population comparisons are illustrated by showing each pairwise analysis and the three-way overlap. The number in each circle indicates the number of SNPs identified under a particular F_ST threshold for a pairwise comparison; the numbers are shown in the lower left legend. For example, panel a indicates that 65,119 SNPs are identified under the 95th percentile F_ST threshold in the comparison between CEU and YRI. Panel b shows a substantial decrease to 22 SNPs (center triangle) when a three-way comparison is used with thresholds of the same stringency
Genetic associations of the 85th/15th threshold ET genes with index diseases/traits (CEU-GIH-YRI)
| Index disease | ET genes | 5th (a) | 10th (a) | Odds ratio (b) |
|---|---|---|---|---|
| Lactase Persistence | | Y | Y | Mendelian |
| Melanoma/Skin Neoplasms | | | | 3.16 |
| | | Y | Y | 2.78 |
| | | | | 1.15 |
| | | | | 0.60 |
| Diabetes Mellitus, Type 2/Glucose Intolerance/Insulin Resistance (c) | | | | 1.12 |
| | | | | 1.04 |
| | | | | 1.17 |
| | | | | 1.28 |
| | | | Y | 1.22 |
| | | | Y | 1.29 |
| | | | Y | 1.16 |
| | | | Y | 1.17 |
(a) 5th/10th: Y indicates ET genes identified under the 95th/5th or 90th/10th threshold, respectively
(b) Largest reported odds ratio
(c) Case diagnosis is as defined in each reference
Fig. 2 F_ST distributions among ET comparisons of CEU, GIH and YRI populations. The panels show the F_ST distributions of CEU-GIH, CEU-YRI and GIH-YRI. In all three distributions, most F_ST values are less than 0.1, which means that most SNPs are not greatly differentiated among populations. The red dotted lines indicate the percentile thresholds used to generate ET SNPs. For example, for CEU-GIH and CEU-YRI we took SNPs with an F_ST greater than or equal to the 95th percentile, which are those to the right of the red dotted lines. For GIH-YRI, we took SNPs with an F_ST less than or equal to the 5th percentile, which are those to the left of the red dotted line. Overlapping these three sets of SNPs yields the ET SNPs
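The selection rule described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `et_snp_indices`, its parameter names, and the assumption that the three F_ST arrays are aligned SNP by SNP are all hypothetical, and percentiles are computed with NumPy's default linear interpolation, which may differ from the original analysis.

```python
import numpy as np

def et_snp_indices(fst_out1, fst_out2, fst_similar,
                   upper_pct=95.0, lower_pct=5.0):
    """Return indices of SNPs passing a three-way (ET-style) filter.

    fst_out1, fst_out2: per-SNP F_ST for the two comparisons that involve
        the population with the differing disease rate (e.g. CEU-GIH, CEU-YRI).
    fst_similar: per-SNP F_ST for the two populations with similar
        disease rates (e.g. GIH-YRI).
    All three arrays are assumed to list the same SNPs in the same order.
    """
    fst_out1 = np.asarray(fst_out1, dtype=float)
    fst_out2 = np.asarray(fst_out2, dtype=float)
    fst_similar = np.asarray(fst_similar, dtype=float)

    # Highly differentiated in both comparisons with the outlier population.
    high1 = fst_out1 >= np.percentile(fst_out1, upper_pct)
    high2 = fst_out2 >= np.percentile(fst_out2, upper_pct)
    # Weakly differentiated between the two similar populations.
    low = fst_similar <= np.percentile(fst_similar, lower_pct)

    # ET SNPs are the overlap of the three sets.
    return np.flatnonzero(high1 & high2 & low)
```

Relaxing `upper_pct` and `lower_pct` to 90/10 or 85/15 corresponds to the less stringent thresholds reported in the tables.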
Fig. 3 ET SNPs in the vicinity of the LCT gene. Seven of the eight ET SNPs generated under the 95th/5th percentile threshold are within 100 Kb of the LCT locus. The eighth is to the right of the DARS gene, upstream of the LCT-MCM6 locus. (Coordinates show relative distance only, not exact build 37 coordinates)
Fig. 4 Percentage of ET SNPs within ± 100 Kb of the LCT-MCM6 genic region. In all two-way comparisons, only a small proportion (<0.2 %) of the SNPs are within ± 100 Kb of the LCT-MCM6 region. For example, the 95th percentile F_ST threshold in the two-way comparison between CEU and TSI (red line) generates 68,106 SNPs, only 91 (0.133 %) of which are located within ± 100 Kb of the LCT-MCM6 region. In contrast, in the three-way comparison of F_ST (ET), more than 60 % of the ET SNPs are within the same region (blue line). The black line represents the pairwise comparison between CEU and CHB, and the green line represents the comparison between CHB and TSI
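The proportion plotted in this figure is a simple window count, which might be computed as in the sketch below. The function name, its parameters, and the assumption that all positions lie on the same chromosome as the region are illustrative; no real genomic coordinates are used.

```python
def fraction_in_window(positions, region_start, region_end, flank=100_000):
    """Fraction of SNP positions within +/- flank bp of a genic region.

    positions: base-pair positions of the selected SNPs, all assumed to be
        on the same chromosome as the region.
    region_start, region_end: hypothetical bounds of the genic region.
    """
    positions = list(positions)
    if not positions:
        return 0.0
    lower = region_start - flank
    upper = region_end + flank
    inside = sum(1 for p in positions if lower <= p <= upper)
    return inside / len(positions)
```

In the caption's pairwise example, the equivalent ratio is 91 SNPs inside the window out of 68,106 selected SNPs.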
Significant Association of ET Genes (CEU, GIH, YRI) with diseases/traits
| Percentile threshold | Number of ET SNPs | Number of ET genes | Number of diseases appropriately distributed among ET populations | Number of ET genes associated with appropriately distributed diseases | Permutation p-value |
|---|---|---|---|---|---|
| 95th/5th | 22 | 33 | 6 | 4 | 0.0237 |
| 90th/10th | 168 | 230 | 27 | 15 | 0.0111 |
| 85th/15th | 733 | 971 | 42 | 49 | 0.0023 |
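The record does not describe how the permutation values in the last column were obtained. As one plausible reading, the sketch below estimates an empirical p-value by repeatedly drawing random gene sets of the same size as the ET gene set and counting how often they contain at least as many disease-associated genes as observed. The function name, the gene-level resampling scheme, and every parameter are assumptions made for illustration, not the authors' procedure.

```python
import random

def permutation_p_value(background_genes, associated_genes, n_et_genes,
                        observed_hits, n_perm=10_000, seed=0):
    """Empirical p-value for observing >= observed_hits associated genes.

    background_genes: all genes that could have been selected (assumed).
    associated_genes: genes associated with appropriately distributed diseases.
    n_et_genes: size of the ET gene set being tested.
    observed_hits: number of ET genes found among associated_genes.
    """
    rng = random.Random(seed)
    background = list(background_genes)
    associated = set(associated_genes)

    at_least_as_extreme = 0
    for _ in range(n_perm):
        # Draw a random gene set of the same size, without replacement.
        sample = rng.sample(background, n_et_genes)
        hits = sum(1 for gene in sample if gene in associated)
        if hits >= observed_hits:
            at_least_as_extreme += 1

    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_perm + 1)
```

For the 95th/5th row, for instance, `n_et_genes` would be 33 and `observed_hits` would be 4, with the background and associated gene lists supplied from the underlying annotation data.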