Jinyoung Byun1, Younghun Han1, Ivan P Gorlov1, Jonathan A Busam1, Michael F Seldin2, Christopher I Amos3.
Abstract
BACKGROUND: Accurate inference of genetic ancestry is of fundamental interest to many biomedical, forensic, and anthropological research areas. Genetic ancestry memberships may relate to genetic disease risks. In a genome association study, failing to account for differences in genetic ancestry between cases and controls may also lead to false-positive results. Although a number of strategies for inferring and taking into account the confounding effects of genetic ancestry are available, applying them to large studies (tens of thousands of samples) is challenging. The goal of this study is to develop an approach for inferring the genetic ancestry of samples with unknown ancestry among closely related populations and to provide accurate estimates of ancestry for application to large-scale studies.
Keywords: Ancestry inference; Inverse distance weighted interpolation; Principal component analysis; Spatial analysis
Year: 2017 PMID: 29037167 PMCID: PMC5644186 DOI: 10.1186/s12864-017-4166-8
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Selection of admixtures. a In a model with 3 admixtures, L2 is the shortest distance between sample A and the centroid of a known population (Pop2). The two next-closest populations, Pop1 and Pop3, are then evaluated using S1 and S2, the distances from the closest centroid (Pop2) to the centroids of Pop1 and Pop3. If S1 and S2 are longer than L1 and L3 (the distances from sample A to Pop1 and Pop3), respectively, Pop1 and Pop3 are kept in the 3-admixture model. Pop4 is farther from the sample than the other three populations and is therefore not included. b After selecting the population closest to sample B (Pop1), the two next-closest populations (Pop2 and Pop3) are compared in the same way. In this case S1 and S2 are shorter than L2 and L3, so Pop2 and Pop3 are not included in the 3-admixture model
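The selection rule in Fig. 1, followed by inverse distance weighting (IDW), can be sketched as below. This is a minimal illustration under stated assumptions, not the AIPS implementation: the function names `select_populations` and `idw_proportions`, the Euclidean distance, and the default `power=1.0` are all hypothetical choices, and the caption does not say whether candidates are tested independently (as done here) or jointly.

```python
import math

def select_populations(sample, centroids, k=3):
    """Pick up to k reference populations for one sample (hypothetical helper)."""
    # L: distance from the sample to every population centroid.
    dist = {name: math.dist(sample, c) for name, c in centroids.items()}
    ranked = sorted(dist, key=dist.get)
    nearest = ranked[0]
    kept = [nearest]
    for name in ranked[1:k]:
        # S: distance between the nearest centroid and the candidate centroid.
        s = math.dist(centroids[nearest], centroids[name])
        # Keep the candidate only if S is longer than the sample's own distance L.
        if s > dist[name]:
            kept.append(name)
    return kept, dist

def idw_proportions(kept, dist, power=1.0):
    """Inverse-distance weights over the kept populations, normalized to sum to 1."""
    for name in kept:
        if dist[name] == 0.0:
            # The sample sits exactly on a centroid: assign it fully to that population.
            return {n: float(n == name) for n in kept}
    raw = {name: dist[name] ** -power for name in kept}
    total = sum(raw.values())
    return {name: w / total for name, w in raw.items()}

# Toy centroids in a 2-D score space (made-up coordinates, not study data).
centroids = {"Pop1": (0.0, 0.0), "Pop2": (4.0, 0.0), "Pop3": (0.0, 4.0), "Pop4": (10.0, 10.0)}
kept, dist = select_populations((1.0, 1.0), centroids, k=3)
proportions = idw_proportions(kept, dist)
```

A sample lying between the centroids keeps all three candidates, while a sample on the far side of its nearest centroid keeps only that one, mirroring panels a and b.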
Fig. 2 a Population structure within Europe using 22 diverse sets of European descendants. The scores were calculated by AIPS. Grey points indicate all 4376 Europeans, and pink points indicate the 3424 individuals with unknown subpopulation membership. The 952 individuals of known ancestry in 22 subpopulations are overplotted on all 4376 Europeans. b European substructure analysis using scores from principal component analysis. Among the 952 individuals of known ancestry, 7 subgroups within Europe were defined: Northern European, Southern European, Great Britain, Russian, Basque, Ashkenazi Jewish American, and Arab. The Northern European group comprised Dutch American, Eastern European American, German American, Hungarian American, Scandinavian American, and Swedish. The Southern European group consisted of Adygei, Greek American, Italian American, Sardinian, Spanish, and Tuscan. The Great Britain group comprised CEPH Euro American, Irish, Orcadian, and United Kingdom American. Bedouin, Druze, and Palestinian were defined as the Arab group
Comparison among 7 subpopulations within Europe using Hotelling’s T² test
| Population 1 | Population 2 | Statistic | P-value | P-value* |
|---|---|---|---|---|
| N. European | S. European | 334.97 | < 1 × 10^−16 | < 1 × 10^−4 |
| N. European | Great Britain | 331.63 | < 1 × 10^−16 | < 1 × 10^−4 |
| N. European | Russian | 148.56 | < 1 × 10^−16 | < 1 × 10^−4 |
| N. European | Arab | 81.87 | 1.12 × 10^−14 | < 1 × 10^−4 |
| N. European | Basque | 181.06 | < 1 × 10^−16 | < 1 × 10^−4 |
| N. European | Jews | 362.28 | < 1 × 10^−16 | < 1 × 10^−4 |
| S. European | Great Britain | 680.60 | < 1 × 10^−16 | < 1 × 10^−4 |
| S. European | Russian | 713.40 | < 1 × 10^−16 | < 1 × 10^−4 |
| S. European | Arab | 334.90 | < 1 × 10^−16 | < 1 × 10^−4 |
| S. European | Basque | 710.36 | < 1 × 10^−16 | < 1 × 10^−4 |
| S. European | Jews | 1108.18 | < 1 × 10^−16 | < 1 × 10^−4 |
| Great Britain | Russian | 865.25 | < 1 × 10^−16 | < 1 × 10^−4 |
| Great Britain | Arab | 646.45 | < 1 × 10^−16 | < 1 × 10^−4 |
| Great Britain | Basque | 1165.79 | < 1 × 10^−16 | < 1 × 10^−4 |
| Great Britain | Jews | 73.14 | 7.77 × 10^−15 | < 1 × 10^−4 |
| Russian | Arab | 17.64 | 1.04 × 10^−8 | 1 × 10^−4 |
| Russian | Basque | 4.96 | 0.0014 | 0.0014 |
| Russian | Jews | 1436.50 | < 1 × 10^−16 | < 1 × 10^−4 |
| Arab | Basque | 16.82 | 2.34 × 10^−8 | < 1 × 10^−4 |
| Arab | Jews | 1038.41 | < 1 × 10^−16 | < 1 × 10^−4 |
| Basque | Jews | 1366.32 | < 1 × 10^−16 | < 1 × 10^−4 |
P-value* is computed using a permutation test, which gives a non-parametric P-value for Hotelling’s T² test
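A two-sample Hotelling's T² statistic with a permutation P-value can be sketched as follows. This is an illustrative sketch rather than the authors' code: the function names, the pooled-covariance form of T², the default of 10,000 permutations, and the (hits + 1) / (n_perm + 1) convention are assumptions, and the source does not state whether the tabulated statistic is T² itself or its F-scaled form.

```python
import random

def mean_vec(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def scatter(rows, mu):
    """Sum of outer products of the centered rows (p x p matrix)."""
    p = len(mu)
    s = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [x - m for x, m in zip(r, mu)]
        for i in range(p):
            for j in range(p):
                s[i][j] += d[i] * d[j]
    return s

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def hotelling_t2(g1, g2):
    """Two-sample T² with a pooled covariance estimate (assumed form)."""
    n1, n2 = len(g1), len(g2)
    m1, m2 = mean_vec(g1), mean_vec(g2)
    s1, s2 = scatter(g1, m1), scatter(g2, m2)
    p = len(m1)
    pooled = [[(s1[i][j] + s2[i][j]) / (n1 + n2 - 2) for j in range(p)] for i in range(p)]
    d = [a - b for a, b in zip(m1, m2)]
    return n1 * n2 / (n1 + n2) * sum(di * xi for di, xi in zip(d, solve(pooled, d)))

def permutation_pvalue(g1, g2, n_perm=10000, seed=0):
    """Shuffle group labels and count how often T² reaches the observed value."""
    rng = random.Random(seed)
    observed = hotelling_t2(g1, g2)
    pooled = list(g1) + list(g2)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if hotelling_t2(pooled[:len(g1)], pooled[len(g1):]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

With a finite number of permutations the smallest reportable P-value is bounded below by roughly 1 / n_perm, which is why permutation P-values are often reported as an upper bound rather than an exact value.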
Fig. 3 a AIPS assuming 3 admixtures using IDW; b AIPS assuming 3 admixtures using IDW with eigenvalue weight; c AIPS assuming 4 admixtures using IDW; d AIPS assuming 4 admixtures using IDW with eigenvalue weight; e STRUCTURE without POPID; f STRUCTURE with POPID; g fastSTRUCTURE using option “simple”; h fastSTRUCTURE using option “logistic prior”; i ADMIXTURE without reference population information
Average percentage of correctly inferred proportions from AIPS, STRUCTURE, and ADMIXTURE
| Given Pop | Inferred Clusters | | | | | | | Number of Individuals (np) |
|---|---|---|---|---|---|---|---|---|
| AIPS[3] | NE^a | SE^b | GB^c | Russia^d | Arab^e | Basque^f | Jew^g | np |
| NE | | 0.00 | 0.11 | 0.11 | 0.00 | 0.00 | 0.00 | 601 |
| SE | 0.00 | | 0.04 | 0.00 | 0.08 | 0.12 | 0.08 | 100 |
| GB | 0.11 | 0.00 | | 0.05 | 0.00 | 0.07 | 0.00 | 124 |
| Russia | 0.05 | 0.00 | 0.08 | | 0.00 | 0.00 | 0.00 | 13 |
| Arab | 0.00 | 0.08 | 0.00 | 0.00 | | 0.00 | 0.09 | 62 |
| Basque | 0.00 | 0.08 | 0.05 | 0.00 | 0.00 | | 0.00 | 12 |
| Jew | 0.00 | 0.06 | 0.00 | 0.00 | 0.04 | 0.01 | | 40 |
| AIPS[4] | NE^a | SE^b | GB^c | Russia^d | Arab^e | Basque^f | Jew^g | np |
| NE | | 0.00 | 0.10 | 0.11 | 0.00 | 0.05 | 0.00 | 601 |
| SE | 0.01 | | 0.06 | 0.00 | 0.09 | 0.11 | 0.07 | 100 |
| GB | 0.11 | 0.00 | | 0.05 | 0.00 | 0.09 | 0.00 | 124 |
| Russia | 0.05 | 0.00 | 0.08 | | 0.00 | 0.04 | 0.00 | 13 |
| Arab | 0.00 | 0.07 | 0.00 | 0.00 | | 0.05 | 0.08 | 62 |
| Basque | 0.04 | 0.08 | 0.05 | 0.00 | 0.00 | | 0.00 | 12 |
| Jew | 0.00 | 0.05 | 0.00 | 0.00 | 0.04 | 0.04 | | 40 |
| STRUCTURE^1 | POP1 | POP2 | POP3 | POP4 | POP5 | POP6 | POP7 | np |
| NE | 0.07 | 0.13 | | 0.10 | 0.14 | 0.09 | 0.26 | 601 |
| SE | 0.21 | 0.09 | 0.05 | 0.10 | 0.14 | 0.33 | 0.07 | 100 |
| GB | 0.07 | | 0.11 | 0.09 | 0.13 | 0.11 | 0.22 | 124 |
| Russia | 0.10 | 0.04 | 0.06 | 0.10 | | 0.04 | | 13 |
| Arab | | 0.04 | 0.03 | 0.09 | 0.07 | 0.11 | 0.01 | 62 |
| Basque | 0.08 | 0.22 | 0.03 | 0.10 | 0.05 | | 0.16 | 12 |
| Jew | 0.25 | 0.04 | 0.03 | | 0.05 | 0.07 | 0.02 | 40 |
| STRUCTURE^2 | POP1 | POP2 | POP3 | POP4 | POP5 | POP6 | POP7 | np |
| NE | | 0.03 | 0.03 | 0.00 | 0.00 | 0.00 | 0.00 | 601 |
| SE | 0.17 | 0.28 | 0.06 | 0.00 | | 0.00 | 0.03 | 100 |
| GB | | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 124 |
| Russia | | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 13 |
| Arab | 0.00 | | 0.00 | 0.00 | 0.11 | 0.00 | 0.00 | 62 |
| Basque | | 0.27 | 0.09 | 0.00 | 0.00 | 0.00 | 0.00 | 12 |
| Jew | 0.00 | 0.00 | 0.00 | 0.00 | | 0.00 | 0.00 | 40 |
| ADMIXTURE^1 | POP1 | POP2 | POP3 | POP4 | POP5 | POP6 | POP7 | np |
| NE | 0.07 | 0.18 | | 0.06 | 0.11 | 0.11 | 0.05 | 601 |
| SE | 0.11 | 0.07 | 0.05 | | 0.12 | 0.12 | 0.15 | 100 |
| GB | 0.06 | 0.07 | 0.22 | 0.06 | 0.15 | | 0.04 | 124 |
| Russia | 0.04 | | 0.05 | 0.03 | 0.10 | 0.12 | 0.03 | 13 |
| Arab | 0.14 | 0.03 | 0.04 | 0.12 | 0.06 | 0.05 | | 62 |
| Basque | 0.00 | 0.02 | 0.01 | 0.03 | | 0.01 | 0.00 | 12 |
| Jew | | 0.03 | 0.03 | 0.04 | 0.09 | 0.04 | 0.04 | 40 |
| AIPS[3] | NE^a | SE^b | GB^c | Russia^d | Arab^e | Basque^f | Jew^g | np |
| NE | | 0.00 | 0.09 | 0.08 | 0.00 | 0.01 | 0.00 | 601 |
| SE | 0.00 | | 0.04 | 0.00 | 0.08 | 0.13 | 0.07 | 100 |
| GB | 0.12 | 0.00 | | 0.02 | 0.00 | 0.07 | 0.00 | 124 |
| Russia | 0.07 | 0.00 | 0.05 | | 0.00 | 0.00 | 0.00 | 13 |
| Arab | 0.00 | 0.07 | 0.00 | 0.00 | | 0.00 | 0.07 | 62 |
| Basque | 0.02 | 0.05 | 0.06 | 0.00 | 0.00 | | 0.00 | 12 |
| Jew | 0.00 | 0.05 | 0.00 | 0.00 | 0.04 | 0.00 | | 40 |
| Unknown | | | | | | | | 3424 |
| ADMIXTURE^1 | POP1 | POP2 | POP3 | POP4 | POP5 | POP6 | POP7 | np |
| NE | 0.05 | 0.09 | | 0.08 | 0.06 | 0.14 | 0.16 | 601 |
| SE | 0.05 | 0.08 | 0.06 | | 0.21 | 0.11 | 0.08 | 100 |
| GB | 0.06 | 0.09 | 0.16 | 0.07 | 0.07 | | 0.08 | 124 |
| Russia | 0.05 | 0.07 | 0.16 | 0.06 | 0.08 | 0.04 | | 13 |
| Arab | 0.05 | 0.38 | 0.03 | 0.10 | | 0.02 | 0.02 | 62 |
| Basque | 0.05 | 0.05 | 0.07 | | 0.03 | 0.28 | 0.09 | 12 |
| Jew | | 0.06 | 0.07 | 0.05 | 0.10 | 0.05 | 0.06 | 40 |
| Unknown | | | | | | | | 3424 |
| ADMIXTURE^2 | POP1 | POP2 | POP3 | POP4 | POP5 | POP6 | POP7 | np |
| NE | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | | 0.00 | 601 |
| SE | | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 100 |
| GB | 0.00 | 0.00 | 0.00 | 0.00 | | 0.00 | 0.00 | 124 |
| Russia | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | | 13 |
| Arab | 0.00 | 0.00 | 0.00 | | 0.00 | 0.00 | 0.00 | 62 |
| Basque | 0.00 | 0.00 | | 0.00 | 0.00 | 0.00 | 0.00 | 12 |
| Jew | 0.00 | | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 40 |
| Unknown | | | | | | | | 3424 |
Note: superscripts a-g indicate the proportions inferred from each population centroid. Superscripts 1 and 2 denote results computed without and with population identities, respectively. The number in brackets is the number of admixtures assumed in AIPS. Italicized numbers are the highest correct classification rates for each population. *The ancestry inference marked with an asterisk was obtained in the supervised learning mode of ADMIXTURE, which assigns 100% ancestry membership without further computation
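One plausible way to build a table of this shape is to average each individual's inferred cluster proportions within the given population labels. The sketch below assumes that reading. The function name `mean_proportions` and the input layout (one label and one row of proportions per individual) are hypothetical, not taken from the source.

```python
def mean_proportions(labels, q_rows):
    """Average inferred-cluster proportions within each given population.

    labels: given population label per individual.
    q_rows: per-individual proportions over the inferred clusters.
    Returns {label: (mean proportions, number of individuals)}.
    """
    sums, counts = {}, {}
    for label, row in zip(labels, q_rows):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: ([s / counts[lab] for s in acc], counts[lab]) for lab, acc in sums.items()}

# Toy input (made-up values, not study data).
table = mean_proportions(["NE", "NE", "SE"], [[0.8, 0.2], [0.6, 0.4], [0.1, 0.9]])
```

Each returned entry corresponds to one table row: the mean proportion per inferred cluster, followed by the group size np.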