Robert C Williams1, Robert C Elston2, Pankaj Kumar3, William C Knowler3, Hanna E Abboud4, Sharon Adler5, Donald W Bowden6, Jasmin Divers6, Barry I Freedman6, Robert P Igo2, Eli Ipp5, Sudha K Iyengar2, Paul L Kimmel7, Michael J Klag8, Orly Kohn9, Carl D Langefeld6, David J Leehey10, Robert G Nelson3, Susanne B Nicholas11, Madeleine V Pahl12, Rulan S Parekh13, Jerome I Rotter14, Jeffrey R Schelling15, John R Sedor15, Vallabh O Shah16, Michael W Smith17, Kent D Taylor14, Farook Thameem4,18, Denyse Thornley-Brown19, Cheryl A Winkler20, Xiuqing Guo14, Phillip Zager16, Robert L Hanson3.
Abstract
BACKGROUND: The presence of population structure in a sample may confound the search for important genetic loci associated with disease. Our four samples in the Family Investigation of Nephropathy and Diabetes (FIND), comprising European Americans, Mexican Americans, African Americans, and American Indians, are part of a genome-wide association study in which population structure might be particularly important. We therefore decided to study in detail one component of this, individual genetic ancestry (IGA). From SNPs present on the Affymetrix 6.0 Human SNP array, we identified 3 sets of ancestry informative markers (AIMs), each maximized for the information in one of the three contrasts among ancestral populations: Europeans (HAPMAP, CEU), Africans (HAPMAP, YRI and LWK), and Native Americans (full heritage Pima Indians). We estimate IGA and present an algorithm for their standard errors, compare IGA to principal components, emphasize the importance of balancing information in the AIMs, and test the association of IGA with diabetic nephropathy in the combined sample.
Keywords: Diabetic nephropathy; Individual genetic ancestry; Population structure; SNP
Year: 2016 PMID: 27142425 PMCID: PMC4855449 DOI: 10.1186/s12864-016-2654-x
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Information contrasts. For a 3-ancestral-population model there are three information contrasts, each represented by the absolute value of the difference between two populations in the frequency of allele 1 of the SNP: |P1-P2|, |P1-P3|, and |P2-P3|, a value that is usually given the symbol δ. The variable In is the information-for-assignment statistic. Accurate individual ancestry estimates depend upon balancing the information among these 3 contrasts
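The two quantities named in this caption can be illustrated with a short sketch. The record does not give the formula for In, so the code below assumes the commonly used informativeness-for-assignment definition (Rosenberg and colleagues) with natural logarithms; the function names and the example allele frequencies are hypothetical and are not taken from the study.

```python
import math

def delta(p_a, p_b):
    """Absolute difference in allele-1 frequency between two populations."""
    return abs(p_a - p_b)

def _xlogx(x):
    # Convention: 0 * ln(0) is treated as 0.
    return x * math.log(x) if x > 0 else 0.0

def informativeness(freqs):
    """Assumed I_n for one biallelic SNP.

    freqs: allele-1 frequency in each of K populations.
    For each allele: -pbar*ln(pbar) + (1/K) * sum_i p_i*ln(p_i),
    where pbar is the mean frequency across the K populations.
    """
    k = len(freqs)
    total = 0.0
    for allele_freqs in (list(freqs), [1.0 - p for p in freqs]):
        pbar = sum(allele_freqs) / k
        total += -_xlogx(pbar) + sum(_xlogx(p) for p in allele_freqs) / k
    return total

# Hypothetical allele-1 frequencies for a single SNP (illustration only).
p_eu, p_ai, p_af = 0.85, 0.20, 0.30
contrasts = {
    "|PEU-PAI|": (p_eu, p_ai),
    "|PEU-PAF|": (p_eu, p_af),
    "|PAI-PAF|": (p_ai, p_af),
}
for name, pair in contrasts.items():
    print(name, "delta =", round(delta(*pair), 3),
          "I_n =", round(informativeness(pair), 3))
```

Under this definition a SNP with identical frequencies in two populations carries no information, and a fixed difference reaches the two-population maximum of ln 2. A SNP like the hypothetical one above is informative for two contrasts but weak for the third, which is why the marker panel has to be balanced across all three.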
Descriptive statistics for 1300 ancestry informative SNP Loci
| Chromosome | #SNPs | Mean distance (bp) | Maximized contrast (δ ≥ 0.5): \|PEU-PAI\| | Maximized contrast (δ ≥ 0.5): \|PEU-PAF\| | Maximized contrast (δ ≥ 0.5): \|PAI-PAF\| |
|---|---|---|---|---|---|
| 1 | 113 | 4,747,709 | 44 | 34 | 35 |
| 2 | 101 | 4,296,466 | 35 | 36 | 30 |
| 3 | 89 | 4,427,268 | 29 | 30 | 30 |
| 4 | 89 | 4,480,508 | 26 | 32 | 31 |
| 5 | 87 | 4,286,121 | 29 | 33 | 25 |
| 6 | 76 | 4,328,735 | 33 | 27 | 16 |
| 7 | 84 | 3,812,870 | 25 | 34 | 25 |
| 8 | 85 | 3,480,996 | 27 | 33 | 25 |
| 9 | 66 | 4,917,071 | 23 | 25 | 18 |
| 10 | 59 | 4,358,358 | 17 | 24 | 18 |
| 11 | 66 | 4,356,288 | 32 | 14 | 20 |
| 12 | 51 | 4,784,074 | 17 | 20 | 14 |
| 13 | 54 | 3,707,021 | 15 | 19 | 20 |
| 14 | 36 | 5,120,754 | 11 | 12 | 13 |
| 15 | 50 | 3,516,390 | 14 | 19 | 17 |
| 16 | 49 | 3,846,889 | 22 | 12 | 15 |
| 17 | 29 | 4,218,684 | 8 | 11 | 10 |
| 18 | 34 | 3,410,019 | 13 | 11 | 10 |
| 19 | 19 | 6,022,778 | 6 | 8 | 5 |
| 20 | 31 | 4,143,198 | 10 | 8 | 13 |
| 21 | 15 | 3,683,800 | 6 | 5 | 4 |
| 22 | 17 | 4,038,881 | 8 | 3 | 6 |
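A quick consistency check on the table above: in every row the three contrast columns add up to the #SNPs column, which suggests (as an interpretation, not a statement from the source) that each SNP is counted under exactly one maximized contrast. The sketch below transcribes the counts and totals them; the variable names are illustrative.

```python
# (chromosome, #SNPs, |PEU-PAI|, |PEU-PAF|, |PAI-PAF|) transcribed from the table
rows = [
    (1, 113, 44, 34, 35), (2, 101, 35, 36, 30), (3, 89, 29, 30, 30),
    (4, 89, 26, 32, 31), (5, 87, 29, 33, 25), (6, 76, 33, 27, 16),
    (7, 84, 25, 34, 25), (8, 85, 27, 33, 25), (9, 66, 23, 25, 18),
    (10, 59, 17, 24, 18), (11, 66, 32, 14, 20), (12, 51, 17, 20, 14),
    (13, 54, 15, 19, 20), (14, 36, 11, 12, 13), (15, 50, 14, 19, 17),
    (16, 49, 22, 12, 15), (17, 29, 8, 11, 10), (18, 34, 13, 11, 10),
    (19, 19, 6, 8, 5), (20, 31, 10, 8, 13), (21, 15, 6, 5, 4),
    (22, 17, 8, 3, 6),
]

# Each row: the per-contrast counts should sum to the chromosome's SNP count.
mismatches = [chrom for chrom, n, a, b, c in rows if n != a + b + c]

total_snps = sum(n for _, n, _, _, _ in rows)
per_contrast = tuple(sum(r[i] for r in rows) for i in (2, 3, 4))

print("rows with mismatched sums:", mismatches)   # []
print("total SNPs:", total_snps)                  # 1300
print("SNPs per contrast:", per_contrast)         # (450, 450, 400)
```

The total of 1300 agrees with the table title, and the per-contrast totals show that the panel is close to balanced across the three contrasts.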
Measures for balancing information (standard deviation) in the three information contrasts
| Information | \|PEU-PAI\| | \|PEU-PAF\| | \|PAI-PAF\| |
|---|---|---|---|
| Number of SNPs | | | |
| Information-for-Assignment | 37.3 | 37.3 | 36.7 |
| Mean δ | 0.529 (0.022) | 0.528 (0.022) | 0.542 (0.027) |
| All SNPs | | | |
| Information-for-Assignment | 56.5 | 56.9 | 58.3 |
| Mean δ | 0.351 (0.132) | 0.364 (0.121) | 0.350 (0.130) |
Mean F (standard deviation) in individual and combined contrasts
| | \|PEU-PAI\| | \|PEU-PAF\| | \|PAI-PAF\| |
|---|---|---|---|
| | 0.516 (0.023) | 0.381 (0.015) | 0.437 (0.035) |
| | 0.502 (0.036) | 0.367 (0.018) | 0.421 (0.028) |
Mean (standard deviation) of source samples for AIMs typed with the 3 sets of informative markers
| SNPs in estimates | HapMap CEU: EU | HapMap CEU: AI | HapMap CEU: AF | HapMap LWK: EU | HapMap LWK: AI | HapMap LWK: AF | HapMap YRI: EU | HapMap YRI: AI | HapMap YRI: AF | Pima: EU | Pima: AI | Pima: AF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| \|PEU-PAI\| | .988 (.022) | .004 (.010) | .008 (.021) | .163 (.182) | .144 (.180) | .693 (.359) | .133 (.175) | .152 (.178) | .715 (.349) | .004 (.029) | .985 (.065) | .011 (.053) |
| \|PEU-PAF\| | .986 (.023) | .010 (.022) | .004 (.010) | .010 (.015) | .017 (.030) | .972 (.028) | .001 (.003) | .004 (.012) | .995 (.012) | .143 (.169) | .715 (.325) | .142 (.164) |
| \|PAI-PAF\| | .656 (.403) | .185 (.221) | .159 (.186) | .012 (.022) | .008 (.015) | .980 (.023) | .004 (.016) | .002 (.006) | .994 (.017) | .009 (.049) | .988 (.055) | .003 (.017) |
| All SNPs | .990 (.014) | .005 (.010) | .005 (.010) | .015 (.015) | .007 (.011) | .978 (.016) | .001 (.004) | .002 (.007) | .996 (.008) | .007 (.039) | .989 (.048) | .003 (.021) |
Each set is maximized for information in one contrast, and with all combined SNPs
EU European ancestry, AI American Indian ancestry, AF African ancestry
Fig. 2 Mean ancestry when estimated with three sets of SNPs, each set maximized for information in one contrast. Each of the ancestral populations was modeled by samples from HapMap or from the Pima Indian GWAS. Three sets of SNPs were each maximized for information in one of the three contrasts and then used to estimate the respective mean ancestry (CEU, European (EU); LWK and YRI, African (AF); Pima, American Indian (AI)) in each sample, with the expectation of a mean of 1.0. When the ancestry of the sample was not represented in the maximized contrast set, the estimates of individual ancestry became unstable, with large error
Mean (standard deviation) and range for individual heritage and standard error estimates for FIND populations
| FIND population | N | EU heritage, mean | EU heritage, range | EU standard error, mean | EU standard error, range | AI heritage, mean | AI heritage, range | AI standard error, mean | AI standard error, range | AF heritage, mean | AF heritage, range | AF standard error, mean | AF standard error, range |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| European American | 826 | .961 (.090) | .059–1.0 | .008 (.002) | .002–.014 | 0.014 (.023) | 0–.237 | .011 (.002) | .003–.015 | 0.025 (.084) | 0–.923 | .008 (.002) | .001–.014 |
| American Indian | 869 | .045 (.090) | 0–.712 | .009 (.003) | .002–.017 | 0.945 (.111) | 0–1.0 | .007 (.003) | .001–.016 | 0.010 (.049) | 0–.866 | .007 (.002) | .001–.015 |
| Mexican American | 1451 | .476 (.134) | 0–.974 | .012 (.001) | .006–.015 | 0.447 (.140) | 0–1.0 | .011 (.001) | .004–.015 | 0.077 (.053) | 0–.845 | .011 (.001) | .005–.014 |
| African American | 1385 | .149 (.104) | 0–.638 | .009 (.002) | .002–.016 | 0.021 (.030) | 0–.539 | .010 (.002) | .002–.016 | 0.830 (.111) | .058–1.0 | .008 (.002) | .001–.014 |
IGA estimates were computed with 1300 ancestry informative markers and a model with 3 ancestral components
EU European heritage, AI American Indian heritage, AF African heritage
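This record does not describe the estimator itself, so the following is only a minimal sketch of how individual ancestry proportions can be obtained by maximum likelihood under a 3-component model. It assumes known ancestral allele frequencies strictly between 0 and 1, independent loci, and Hardy-Weinberg proportions, and it uses a standard EM iteration rather than the authors' algorithm. The function name, the toy frequencies, and the toy genotypes are hypothetical, and standard errors are not computed.

```python
def estimate_iga(genotypes, freqs, n_iter=500, tol=1e-10):
    """EM estimate of ancestry proportions q for one individual.

    genotypes: copies of allele 1 (0, 1, or 2) at each SNP.
    freqs: per-SNP sequence of allele-1 frequencies in the K ancestral
           populations (assumed known and strictly between 0 and 1).
    """
    k = len(freqs[0])
    q = [1.0 / k] * k                 # start from equal proportions
    n_copies = 2 * len(genotypes)     # two allele copies per SNP
    for _ in range(n_iter):
        expected = [0.0] * k
        for g, p in zip(genotypes, freqs):
            # Treat allele-1 copies and allele-2 copies separately.
            for copies, probs in ((g, list(p)), (2 - g, [1.0 - x for x in p])):
                if copies == 0:
                    continue
                # E-step: posterior probability that a copy came from population j.
                weights = [q[j] * probs[j] for j in range(k)]
                total = sum(weights)
                for j in range(k):
                    expected[j] += copies * weights[j] / total
        # M-step: proportions are the average posterior membership.
        new_q = [e / n_copies for e in expected]
        converged = max(abs(a - b) for a, b in zip(q, new_q)) < tol
        q = new_q
        if converged:
            break
    return q

# Toy panel: 20 SNPs informative for each population (hypothetical values).
toy_freqs = ([(0.9, 0.1, 0.1)] * 20 + [(0.1, 0.9, 0.1)] * 20
             + [(0.1, 0.1, 0.9)] * 20)
# Toy individual carrying the allele pattern typical of the first population.
toy_genotypes = [2] * 20 + [0] * 40

q_hat = estimate_iga(toy_genotypes, toy_freqs)
print([round(x, 4) for x in q_hat])
```

For the toy individual the estimate concentrates on the first component, and the three proportions always sum to 1, which is the constraint that lets any one component serve as a reference in downstream models.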
Fig. 3 Mean heritage for persons who self-identify in the FIND study. Mean estimates are presented for the three components of individual ancestry in the FIND samples. For European Americans, American Indians, and African Americans the expected largest component is >0.8, while for Mexican Americans the European and American Indian components are similar. EU: European Ancestry; AI: American Indian Ancestry; AF: African Ancestry
Fig. 4 Mean standard error of individual heritage estimates in four FIND samples by number of SNP loci. The mean standard error of the individual ancestry estimates was calculated across the 4 FIND samples at 1300 points, adding each successive SNP to the calculation in chromosome and position order (EU, dotted line; AI, dashed line; AF, solid line). After the addition of about 200 informative SNPs, the standard error falls below 0.02 and decreases further at a slower rate with each additional locus. Approximately 700 SNPs are needed in the estimates to reach a mean standard error <0.01
Fig. 5 Estimates of individual heritage for the FIND Mexican American sample with and without the Pima genotypes. Panel a shows the estimates from STRUCTURE when using the 1300 genotypes from the Pima, CEU, LWK, and YRI samples. These are very similar to the estimates obtained from the maximum likelihood method, presented in Panel c. When the Pima genotypes were removed from the STRUCTURE analysis, the amount of American Indian ancestry was overestimated in the Mexican American sample (Panel b). In the latter situation, the maximum likelihood method is recommended because it returns the better estimates of individual heritage
Tests for the association of heritage with diabetic nephropathy in the combined FIND populations, N = 4126
| Model | EU heritage | p | AI heritage | p | AF heritage | p |
|---|---|---|---|---|---|---|
| Model 1 | 0.311 (.232, .418) | <.0001 | 1.031 (.547, 1.944) | 0.924 | Reference | |
| Model 2 | 0.269 (.143, .507) | <.0001 | Reference | | 0.748 (.381, 1.468) | 0.398 |
| Model 3 | Reference | | 3.762 (1.958, 7.228) | <.0001 | 2.956 (2.212, 3.947) | <.0001 |
Each logistic model includes two heritage variables, with the third heritage component as the reference, in addition to the explanatory variables Enrolled-Age, Sex, and Enrolment Center. Results are presented as odds ratios (95 % C.I.). (For covariate results see Additional file 4: Tables S3 and S4.)
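The reported odds ratios and intervals can be related back to the underlying logistic coefficients with a little arithmetic. The sketch below assumes the intervals are Wald intervals (symmetric on the log-odds scale), which the record does not state; the function name is hypothetical. It recovers the coefficient, its standard error, and an approximate two-sided p-value from an odds ratio and its 95 % C.I.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95 % interval

def wald_summary(odds_ratio, lower, upper):
    """Recover beta, SE, z, and a two-sided p-value from an OR and 95 % C.I.

    Assumes a Wald interval: exp(beta +/- Z_95 * SE).
    """
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return beta, se, z, p

# Model 1 values from the table: EU heritage and AI heritage, AF as reference.
for label, or_, lo, hi in [("EU heritage", 0.311, 0.232, 0.418),
                           ("AI heritage", 1.031, 0.547, 1.944)]:
    beta, se, z, p = wald_summary(or_, lo, hi)
    print(f"{label}: beta={beta:.3f} SE={se:.3f} z={z:.2f} p={p:.3g}")
```

Under the Wald assumption the odds ratio should equal the geometric mean of its interval bounds, and the recovered p-values for Model 1 come out close to the tabulated ones (below .0001 for EU heritage and about 0.92 for AI heritage), so the table is internally consistent with that reading.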