Laura M Huckins, Vesna Boraska, Christopher S Franklin, James A B Floyd, Lorraine Southam, Patrick F Sullivan, Cynthia M Bulik, David A Collier, Chris Tyler-Smith, Eleftheria Zeggini, Ioanna Tachmazidou.
Abstract
The Wellcome Trust Case Control Consortium 3 anorexia nervosa genome-wide association scan includes 2907 cases from 15 different populations of European origin genotyped on the Illumina 670K chip. We compared methods for identifying population stratification and suggest a list of markers that may help to counter this problem. Population structure in such studies is usually identified using only common variants with minor allele frequency (MAF) >5%; we find that this may result in highly informative SNPs being discarded, and suggest that all SNPs with MAF >1% may be used instead. We established informative axes of variation via principal component analysis and highlight important features of the genetic structure of diverse European-descent populations, some studied for the first time at this scale. Finally, we investigated the substructure within each of these 15 populations and identified SNPs that help capture hidden stratification. This work can inform the design of association studies and the interpretation of their results in international consortia.
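The effect of moving the MAF cut-off from 5% to 1% can be illustrated with a minimal sketch. It assumes genotypes coded as 0/1/2 copies of one allele, with `None` for a missing call; the function names and the toy SNPs are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Optional

Genotypes = List[Optional[int]]  # 0/1/2 allele copies per sample, None = missing


def minor_allele_frequency(genotypes: Genotypes) -> float:
    """Return the minor allele frequency among called genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))  # two allele copies per diploid sample
    return min(freq, 1.0 - freq)


def filter_by_maf(snps: Dict[str, Genotypes], threshold: float) -> List[str]:
    """Keep the names of SNPs whose MAF is strictly above the threshold."""
    return [name for name, g in snps.items()
            if minor_allele_frequency(g) > threshold]


# Hypothetical toy data: 50 samples per SNP.
toy_snps: Dict[str, Genotypes] = {
    "snp_common": [1] * 25 + [0] * 25,  # MAF 0.25
    "snp_low": [1] * 3 + [0] * 47,      # MAF 0.03, lost at the 5% cut-off
    "snp_rare": [1] * 1 + [0] * 49,     # MAF 0.01, excluded by both cut-offs
}

kept_at_5_percent = filter_by_maf(toy_snps, 0.05)
kept_at_1_percent = filter_by_maf(toy_snps, 0.01)
```

In this toy set, `snp_low` is the kind of marker that survives only under the relaxed 1% threshold.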
Year: 2014 PMID: 24549058 PMCID: PMC4169539 DOI: 10.1038/ejhg.2014.1
Source DB: PubMed Journal: Eur J Hum Genet ISSN: 1018-4813 Impact factor: 4.246
Figure 1. Geographical distribution of samples across Europe.
Sample sizes per population
| Population | Code | Sample size |
|---|---|---|
| Canada | CA | 54 |
| Czech Republic | CZ | 72 |
| Finland | FI | 131 |
| France | FR | 293 |
| Germany | DE | 475 |
| Greece | GR | 70 |
| North Italy | NIT | 203 |
| Netherlands | NL | 348 |
| Norway | NO | 82 |
| Poland | PL | 175 |
| South Italy | SIT | 75 |
| Spain | ES | 186 |
| Sweden | SE | 39 |
| UK | UK | 213 |
| USA | USA | 491 |
Figure 2. Fine structure between the 15 European populations studied. (a) Fine structure across all populations: PC1 versus PC2. (b) The distribution of samples is shown for each population. Outlying samples (deviating in location by more than 3 SDs from the mean) were excluded. A three-point moving average filter was used to smooth outlines. (c) Fine structure across all populations: PC2 versus PC3. (d) The distribution of samples is shown for each population, calculated as in (b). CA, Canada; CZ, Czech Republic; DE, Germany; ES, Spain; FI, Finland; FR, France; GR, Greece; NIT, North Italy; NL, Netherlands; NO, Norway; PL, Poland; SE, Sweden; SIT, South Italy; UK, United Kingdom; USA, United States of America.
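The two processing steps named in the Figure 2 caption (3 SD outlier exclusion and three-point moving average smoothing) might look roughly like the sketch below. Two assumptions are made here that the caption does not confirm: the 3 SD rule is applied to each PC axis separately, and the outline is treated as a closed curve so the filter wraps around at the ends. All function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Tuple

Point = Tuple[float, float]  # (PC score on x axis, PC score on y axis)


def exclude_outliers(points: List[Point], n_sd: float = 3.0) -> List[Point]:
    """Drop samples more than n_sd SDs from the population mean on either axis.

    Assumption: the rule is applied per axis; the caption does not say how
    'deviating in location' was measured.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    return [(x, y) for x, y in points
            if abs(x - mx) <= n_sd * sx and abs(y - my) <= n_sd * sy]


def smooth_closed_outline(values: List[float]) -> List[float]:
    """Three-point moving average with wrap-around (assumes a closed outline)."""
    n = len(values)
    return [(values[i - 1] + values[i] + values[(i + 1) % n]) / 3
            for i in range(n)]


# Hypothetical toy data: 19 tightly clustered samples and one distant sample.
toy_points: List[Point] = [(0.0, 0.0)] * 19 + [(100.0, 0.0)]
kept_points = exclude_outliers(toy_points)
smoothed = smooth_closed_outline([0.0, 3.0, 0.0, 3.0])
```

The smoothing would be applied separately to the x and y coordinates of an outline's vertices.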
Significance of principal components
| PC | Variance explained (%) | Tracy–Widom statistic | P-value |
|---|---|---|---|
| 1 | 0.14 | 1333.1 | <1E−300 |
| 2 | 0.09 | 603.3 | <1E−300 |
| 3 | 0.07 | 294.9 | <1E−300 |
| 4 | 0.06 | 121.2 | <1E−300 |
| 5 | 0.05 | 100.9 | 1.40E−295 |
| 6 | 0.05 | 43.7 | 9.79E−86 |
| 7 | 0.05 | 10.9 | 3.20E−12 |
| 8 | 0.05 | 10.0 | 5.08E−11 |
| 9 | 0.05 | 10.2 | 3.24E−11 |
| 10 | 0.05 | 6.9 | 5.30E−07 |
The Tracy–Widom statistic is calculated using the smartpca software package.[15]
Proportion of variance explained by the top 10 principal components.
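Pair-wise Fst calculated between all populations
| | CZ | DE | ES | FI | FR | GR | NIT | NL | NO | PL | SIT | UK |
|---|---|---|---|---|---|---|---|---|---|---|---|---|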
| CZ | ||||||||||||
| DE | 0 | |||||||||||
| ES | 0.003 | 0.002 | ||||||||||
| FI | 0.006 | 0.007 | 0.011 | |||||||||
| FR | 0.001 | 0.001 | 0.001 | 0.008 | ||||||||
| GR | 0.004 | 0.004 | 0.003 | 0.013 | 0.003 | |||||||
| NIT | 0.005 | 0.004 | 0.002 | 0.014 | 0.003 | 0.001 | ||||||
| NL | 0.001 | 0.001 | 0.003 | 0.007 | 0.001 | 0.005 | 0.006 | |||||
| NO | 0.002 | 0.001 | 0.004 | 0.006 | 0.002 | 0.007 | 0.007 | 0.001 | ||||
| PL | 0 | 0.001 | 0.005 | 0.006 | 0.003 | 0.006 | 0.007 | 0.002 | 0.003 | |||
| SIT | 0.003 | 0.002 | 0.001 | 0.011 | 0.001 | 0.001 | 0.001 | 0.003 | 0.004 | 0.004 | ||
| UK | 0.001 | 0 | 0.002 | 0.007 | 0 | 0.005 | 0.005 | 0 | 0.001 | 0.002 | 0.002 | |
| USA | 0.001 | 0 | 0.002 | 0.007 | 0 | 0.004 | 0.004 | 0 | 0.001 | 0.002 | 0.002 | 0 |
Swedish and Canadian samples are not included here owing to small sample sizes. Population pairs falling below the Fst=0.001 threshold are in pink.
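This excerpt does not state which Fst estimator was used. As an illustration only, the sketch below uses Hudson's estimator in its ratio-of-averages form, a common choice for pair-wise comparisons from SNP allele frequencies. The function name is hypothetical, and `n1` and `n2` are taken here to be the number of sampled allele copies (two per diploid individual) in each population.

```python
from typing import Sequence


def hudson_fst(p1: Sequence[float], p2: Sequence[float],
               n1: int, n2: int) -> float:
    """Hudson's Fst between two populations, as a ratio of averages over SNPs.

    p1, p2: per-SNP sample allele frequencies in each population.
    n1, n2: number of sampled allele copies in each population (assumed
            constant across SNPs in this simplified sketch).
    """
    if len(p1) != len(p2):
        raise ValueError("frequency vectors must cover the same SNPs")
    numerator = 0.0
    denominator = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference, corrected for sampling noise.
        numerator += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        # Probability that one allele drawn from each population differs.
        denominator += a * (1 - b) + b * (1 - a)
    if denominator == 0:
        raise ValueError("no SNP is polymorphic across the two populations")
    return numerator / denominator


# Hypothetical toy frequencies.
fst_fixed_difference = hudson_fst([1.0], [0.0], 100, 100)
fst_identical = hudson_fst([0.5, 0.2], [0.5, 0.2], 200, 200)
```

With identical frequencies the sampling correction makes the estimate slightly negative; such values are commonly interpreted as no detectable differentiation.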
Figure 3. Genetic distance correlates with geographical distance. We computed pair-wise Fst between all populations, and compared this to the geographic distance in kilometres between the midpoints of each population. R² = 0.465.
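A comparison of this kind can be sketched as follows, assuming great-circle (haversine) distance between population midpoints and R² taken as the squared Pearson correlation. The excerpt does not give the midpoint coordinates or the distance formula used, so the function names, coordinates, and Fst values below are hypothetical placeholders.

```python
from math import asin, cos, radians, sin, sqrt
from typing import Sequence

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical midpoints (latitude, longitude) and made-up Fst values.
midpoints = {"A": (50.0, 10.0), "B": (40.0, -4.0), "C": (64.0, 26.0)}
pairs = [("A", "B"), ("A", "C"), ("B", "C")]
toy_fst = [0.002, 0.007, 0.011]
distances = [haversine_km(*midpoints[p], *midpoints[q]) for p, q in pairs]
toy_r2 = r_squared(distances, toy_fst)
```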
Correlation between AIMs when calculated using all 13 populations, and when leaving one population out
| Population left out | Spearman's ρ |
|---|---|
| CZ | 0.9773 |
| DE | 0.9810 |
| ES | 0.9635 |
| FI | 0.9070 |
| FR | 0.9795 |
| GR | 0.9605 |
| NIT | 0.9564 |
| NL | 0.9760 |
| NO | 0.9713 |
| PL | 0.9682 |
| SIT | 0.9745 |
| UK | 0.9774 |
| USA | 0.9771 |
We calculated AIMs for 13 sets of 12 populations, and computed the Spearman's rank correlation coefficient (ρ) in each instance.
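The leave-one-out comparison can be sketched as below. The excerpt does not define how AIMs were scored, so `toy_informativeness` is a hypothetical stand-in (variance of allele frequency across populations) used only to make the example runnable. Spearman's ρ is computed as the Pearson correlation of average ranks.

```python
from statistics import mean, pvariance
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


def toy_informativeness(freqs: Dict[str, List[float]]) -> List[float]:
    """Hypothetical per-SNP score: variance of allele frequency across populations."""
    n_snps = len(next(iter(freqs.values())))
    return [pvariance([f[s] for f in freqs.values()]) for s in range(n_snps)]


def leave_one_out_rho(freqs: Dict[str, List[float]]) -> Dict[str, float]:
    """Correlate full-set SNP scores with scores computed without each population."""
    full = toy_informativeness(freqs)
    result = {}
    for left_out in freqs:
        reduced = {p: f for p, f in freqs.items() if p != left_out}
        result[left_out] = spearman_rho(full, toy_informativeness(reduced))
    return result


# Hypothetical allele frequencies for four populations and four SNPs.
toy_freqs = {
    "A": [0.10, 0.50, 0.30, 0.80],
    "B": [0.12, 0.20, 0.31, 0.40],
    "C": [0.11, 0.70, 0.29, 0.60],
    "D": [0.13, 0.40, 0.30, 0.10],
}
rho_by_population = leave_one_out_rho(toy_freqs)
```

A value of ρ close to 1 for a given population indicates that removing it barely changes the ranking of markers.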
Figure 4. AIMs and PCAIMs are able to predict sample ancestry with high accuracy for most populations, even at small numbers of markers. (a) Percent of samples correctly assigned using 25 markers, across all populations. AIMs are shown in green, PCAIMs in blue. (b) Assignment of Finnish samples, for varying numbers of markers. AIMs are shown as a solid line and PCAIMs as a dashed line. (c) Assignment of German samples, with increasing numbers of markers. (d) Assignment of Swedish samples, using 25 markers; AIMs are shown in green and PCAIMs in blue.