Mari Nelis, Tõnu Esko, Reedik Mägi, Fritz Zimprich, Alexander Zimprich, Draga Toncheva, Sena Karachanak, Tereza Piskácková, Ivan Balascák, Leena Peltonen, Eveliina Jakkula, Karola Rehnström, Mark Lathrop, Simon Heath, Pilar Galan, Stefan Schreiber, Thomas Meitinger, Arne Pfeufer, H-Erich Wichmann, Béla Melegh, Noémi Polgár, Daniela Toniolo, Paolo Gasparini, Pio D'Adamo, Janis Klovins, Liene Nikitina-Zake, Vaidutis Kucinskas, Jūrate Kasnauskiene, Jan Lubinski, Tadeusz Debniak, Svetlana Limborska, Andrey Khrunin, Xavier Estivill, Raquel Rabionet, Sara Marsal, Antonio Julià, Stylianos E Antonarakis, Samuel Deutsch, Christelle Borel, Homa Attar, Maryline Gagnebin, Milan Macek, Michael Krawczak, Maido Remm, Andres Metspalu.
Abstract
Using principal component (PC) analysis, we studied the genetic constitution of 3,112 individuals from Europe as portrayed by more than 270,000 single nucleotide polymorphisms (SNPs) genotyped with the Illumina Infinium platform. In cohorts with more than 100 individuals, 100 randomly chosen samples were used for analysis to minimize the effect of sample size, resulting in a total of 1,564 samples. This analysis revealed that the genetic structure of the European population correlates closely with geography. The first two PCs highlight the genetic diversity corresponding to the northwest-to-southeast gradient and position the populations according to their approximate geographic origin. The resulting genetic map forms a triangular structure with a) Finland, b) the Baltic region, Poland and Western Russia, and c) Italy as its vertices, and with d) Central and Western Europe at its centre. Inter- and intra-population genetic differences were quantified by the genomic-control inflation factor λ (ranging from 1.00 to 4.21), the fixation index F_ST (ranging from 0.000 to 0.023), and the number of markers exhibiting significant allele frequency differences in pairwise population comparisons. The estimated λ was used to assess how strongly association statistics are actually diminished when two distinct populations are merged directly in an analysis. When the PC analysis was confined to the 1,019 Estonian individuals (0.1% of the Estonian population), a fine structure emerged that correlated with the geography of individual counties. Because at least two cohorts were available from several countries, genetic substructure was investigated in the Czech, Finnish, German, Estonian and Italian populations. Together with previously published data, our results allow the creation of a comprehensive European genetic map that will greatly facilitate inter-population genetic studies, including genome-wide association studies (GWAS).
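The PC analysis summarized above can be illustrated with a minimal sketch. It assumes a complete genotype matrix coded as 0/1/2 allele counts with no missing calls, and uses EIGENSTRAT-style per-SNP standardization followed by a singular value decomposition; the software and normalization the authors actually used are not specified in this abstract, and the function name `genotype_pcs` is hypothetical.

```python
import numpy as np

def genotype_pcs(genotypes: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Return the top principal-component scores for each individual.

    genotypes: (n_individuals, n_snps) array of 0/1/2 allele counts.
    Assumption: no missing genotypes (real data would need imputation or masking).
    """
    g = genotypes.astype(float)
    freq = g.mean(axis=0) / 2.0               # per-SNP allele frequency
    keep = (freq > 0) & (freq < 1)            # drop monomorphic SNPs
    g, freq = g[:, keep], freq[keep]
    # Centre each SNP and scale by its binomial standard deviation
    z = (g - 2 * freq) / np.sqrt(2 * freq * (1 - freq))
    # Left singular vectors scaled by singular values are the PC scores
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]
```

On synthetic data from two populations with very different allele frequencies, PC1 alone separates the groups; the European cohorts differ far more subtly, which is why many SNPs are needed to resolve their geographic gradient.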
Year: 2009 PMID: 19424496 PMCID: PMC2675054 DOI: 10.1371/journal.pone.0005472
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Studied samples.
| Country | Code | # of individuals | # of individuals after QC | # of individuals analysed (≤100 randomly selected) | Illumina genotyping assay |
| Austria (Vienna) | AT | 88 | 87 | 87 | CNV370 |
| Bulgaria | BG | 48 | 47 | 47 | CNV370 |
| Czech Republic (Prague and Moravia) | CZ | 94 | 89 | 89 | CNV370 |
| Estonia | EE | 1090 | 966 | 100 | CNV370 |
| Finland (Helsinki) | FI (HEL) | 100 | 100 | 100 | CNV370 |
| Finland (Kuusamo) | FI (KUU) | 84 | 79 | 79 | CNV370 |
| France (Paris) | FR | 100 | 100 | 100 | HumHap300 |
| Northern Germany (Schleswig-Holstein) | DE (N) | 210 | 206 | 100 | HumHap300 |
| Southern Germany (Augsburg region) | DE (S) | 473 | 468 | 100 | CNV370 |
| Hungary | HU | 50 | 49 | 49 | CNV370 |
| Northern Italy (Borbera Valley) | IT (N) | 96 | 53 | 53 | CNV370 |
| Southern Italy (Region of Apulia) | IT (S) | 95 | 57 | 57 | CNV370 |
| Latvia (Riga) | LV | 95 | 87 | 87 | CNV370 |
| Lithuania | LT | 95 | 90 | 90 | CNV370 |
| Poland (West-Pomerania) | PL | 48 | 45 | 45 | CNV370 |
| Russia (Andeapol district of Tver region) | RU | 96 | 94 | 94 | CNV370 |
| Spain | ES | 200 | 194 | 100 | HumHap300 |
| Sweden (Stockholm) | SE | 100 | 87 | 87 | HumHap300 |
| Switzerland (Geneva) | CH | 216 | 214 | 100 | HumHap550 |
| Total | | 3378 | 3112 | 1564 | |
Raw data provided.
Genotyped at Estonian Biocentre.
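The per-cohort cap in the table (cohorts larger than 100 after QC were reduced to 100 randomly chosen individuals) can be sketched as below. The placeholder sample IDs, the fixed seed and the name `cap_cohorts` are illustrative assumptions; the paper does not state how the random draw was made.

```python
import random

def cap_cohorts(cohorts: dict[str, list[str]], cap: int = 100,
                seed: int = 1) -> dict[str, list[str]]:
    """Keep every cohort with <= cap individuals; randomly subsample larger ones."""
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    selected = {}
    for name, ids in cohorts.items():
        if len(ids) > cap:
            selected[name] = rng.sample(ids, cap)  # sampling without replacement
        else:
            selected[name] = list(ids)
    return selected

# Post-QC cohort sizes from the table above; the IDs are hypothetical placeholders.
AFTER_QC = {"AT": 87, "BG": 47, "CZ": 89, "EE": 966, "FI (HEL)": 100,
            "FI (KUU)": 79, "FR": 100, "DE (N)": 206, "DE (S)": 468,
            "HU": 49, "IT (N)": 53, "IT (S)": 57, "LV": 87, "LT": 90,
            "PL": 45, "RU": 94, "ES": 194, "SE": 87, "CH": 214}
cohorts = {c: [f"{c}_{i}" for i in range(n)] for c, n in AFTER_QC.items()}
selected = cap_cohorts(cohorts)
print(sum(len(ids) for ids in selected.values()))  # 1564, matching the Total row
```

Only four cohorts (EE, DE (N), DE (S), ES, CH, i.e. those above 100) are actually subsampled; the others pass through unchanged, which is how the 3,112 post-QC samples reduce to 1,564.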
Figure 1. Genome-wide linkage disequilibrium (LD) pattern (based on 273,464 SNPs), measured by average r², at inter-marker distances from 5 kb to 100 kb.
Averages were computed within 5-kb distance categories, i.e. 0–5 kb, 5–10 kb, etc.
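The binned LD curve of Figure 1 can be outlined as follows. This is a sketch under stated assumptions: r² is taken as the squared Pearson correlation of unphased 0/1/2 genotype dosages (a common stand-in when haplotypes are not phased; the paper's exact r² estimator is not given here), SNPs lie on one chromosome with positions sorted in base pairs, and the function name is hypothetical.

```python
import math
import numpy as np

def mean_r2_by_distance(genotypes, positions, bin_kb=5, max_kb=100):
    """Average genotype r^2 over SNP pairs in distance bins [0, 5), [5, 10), ... kb.

    genotypes: (n_individuals, n_snps) array of 0/1/2 dosages.
    positions: base-pair positions of the SNPs, sorted ascending, one chromosome.
    Returns one mean per bin (NaN where a bin has no pairs).
    """
    g = np.asarray(genotypes, dtype=float)
    pos = np.asarray(positions)
    polymorphic = g.std(axis=0) > 0           # r is undefined for constant SNPs
    g, pos = g[:, polymorphic], pos[polymorphic]
    n_bins = math.ceil(max_kb / bin_kb)
    sums = np.zeros(n_bins)
    counts = np.zeros(n_bins)
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            d_kb = (pos[j] - pos[i]) / 1000
            if d_kb >= max_kb:
                break                         # sorted positions: later SNPs are farther
            r = np.corrcoef(g[:, i], g[:, j])[0, 1]
            b = int(d_kb // bin_kb)
            sums[b] += r * r
            counts[b] += 1
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts                  # 0/0 gives NaN for empty bins
```

The quadratic pair loop is fine for illustration; at genome scale one would restrict the inner loop to a sliding window, as the early `break` already does for sorted positions.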
Figure 2. The European genetic structure (based on 273,464 SNPs).
Three levels of structure revealed by PC analysis are shown: A) inter-continental; B) intra-continental; and C) within a single country (Estonia), where median values of PC1 and PC2 are shown. D) European map illustrating the origin and size of the samples. CEU - Utah residents with ancestry from Northern and Western Europe; CHB - Han Chinese from Beijing; JPT - Japanese from Tokyo; YRI - Yoruba from Ibadan, Nigeria.
Pairwise population comparisons (based on 273,464 markers; ≤100 samples from every population). Upper triangle: number of SNPs with a significant allele frequency difference (Bonferroni-corrected trend test, p<0.05). Lower triangle: inflation factor λ from genomic control.
| # of significant SNPs/Inflation factor λ | Austria | Bulgaria | Czech Republic | Estonia | Finland (Helsinki) | Finland (Kuusamo) | France | Northern Germany | Southern Germany | Hungary | Northern Italy | Southern Italy | Latvia | Lithuania | Poland | Russia | Spain | Sweden | Switzerland | CEU |
| Austria | - | 0 | 0 | 2 | 67 | 468 | 0 | 0 | 1 | 0 | 2 | 25 | 8 | 8 | 0 | 2 | 0 | 1 | 0 | 1 |
| Bulgaria | 1.14 | - | 0 | 9 | 68 | 293 | 0 | 8 | 0 | 0 | 0 | 0 | 11 | 13 | 0 | 6 | 2 | 24 | 0 | 14 |
| Czech Republic | 1.08 | 1.21 | - | 1 | 47 | 498 | 0 | 0 | 0 | 0 | 2 | 32 | 2 | 2 | 0 | 0 | 3 | 4 | 0 | 1 |
| Estonia | 1.58 | 1.70 | 1.42 | - | 8 | 229 | 30 | 4 | 3 | 1 | 84 | 288 | 0 | 0 | 0 | 1 | 155 | 6 | 45 | 20 |
| Finland (Helsinki) | 2.24 | 2.19 | 2.20 | 1.71 | - | 6 | 190 | 48 | 73 | 20 | 253 | 630 | 85 | 114 | 4 | 41 | 515 | 10 | 230 | 21 |
| Finland (Kuusamo) | 3.30 | 2.91 | 3.26 | 2.80 | 1.86 | - | 978 | 492 | 593 | 170 | 758 | 1470 | 598 | 567 | 109 | 410 | 1620 | 252 | 988 | 215 |
| France | 1.16 | 1.22 | 1.35 | 2.08 | 2.69 | 3.72 | - | 1 | 0 | 0 | 2 | 23 | 85 | 37 | 3 | 16 | 0 | 1 | 0 | 0 |
| Northern Germany | 1.10 | 1.32 | 1.15 | 1.53 | 2.17 | 3.27 | 1.25 | - | 0 | 0 | 20 | 79 | 12 | 5 | 0 | 12 | 12 | 0 | 4 | 0 |
| Southern Germany | 1.04 | 1.19 | 1.16 | 1.70 | 2.35 | 3.46 | 1.12 | 1.08 | - | 0 | 3 | 34 | 27 | 17 | 4 | 4 | 2 | 2 | 0 | 0 |
| Hungary | 1.04 | 1.10 | 1.06 | 1.41 | 1.87 | 2.68 | 1.16 | 1.11 | 1.08 | - | 0 | 4 | 7 | 4 | 0 | 1 | 2 | 18 | 0 | 9 |
| Northern Italy | 1.49 | 1.32 | 1.69 | 2.42 | 2.82 | 3.64 | 1.38 | 1.72 | 1.53 | 1.42 | - | 0 | 118 | 93 | 1 | 42 | 2 | 33 | 0 | 25 |
| Southern Italy | 1.79 | 1.38 | 2.04 | 2.93 | 3.37 | 4.18 | 1.68 | 2.14 | 1.85 | 1.63 | 1.54 | - | 337 | 277 | 22 | 133 | 3 | 117 | 3 | 51 |
| Latvia | 1.85 | 1.86 | 1.62 | 1.24 | 2.31 | 3.33 | 2.40 | 1.84 | 1.20 | 1.58 | 2.64 | 3.14 | - | 0 | 0 | 0 | 247 | 33 | 122 | 22 |
| Lithuania | 1.70 | 1.73 | 1.48 | 1.28 | 2.33 | 3.37 | 2.20 | 1.66 | 1.84 | 1.46 | 2.48 | 2.96 | 1.20 | - | 0 | 0 | 198 | 28 | 67 | 15 |
| Poland | 1.19 | 1.29 | 1.09 | 1.17 | 1.75 | 2.49 | 1.44 | 1.18 | 1.23 | 1.14 | 1.75 | 1.99 | 1.26 | 1.20 | - | 0 | 6 | 5 | 1 | 3 |
| Russia | 1.47 | 1.53 | 1.27 | 1.21 | 2.10 | 3.16 | 1.94 | 1.49 | 1.58 | 1.28 | 2.24 | 2.68 | 1.32 | 1.26 | 1.18 | - | 79 | 27 | 24 | 23 |
| Spain | 1.41 | 1.30 | 1.63 | 2.54 | 3.14 | 4.21 | 1.13 | 1.62 | 1.40 | 1.32 | 1.42 | 1.67 | 2.82 | 2.62 | 1.66 | 2.32 | - | 38 | 0 | 16 |
| Sweden | 1.21 | 1.47 | 1.26 | 1.49 | 1.89 | 2.87 | 1.38 | 1.12 | 1.21 | 1.22 | 1.86 | 2.28 | 1.89 | 1.74 | 1.30 | 1.59 | 1.73 | - | 23 | 0 |
| Switzerland | 1.19 | 1.13 | 1.37 | 2.16 | 2.77 | 3.83 | 1.10 | 1.36 | 1.17 | 1.16 | 1.36 | 1.54 | 2.52 | 2.29 | 1.46 | 1.20 | 1.16 | 1.50 | - | 14 |
| CEU | 1.12 | 1.29 | 1.21 | 1.59 | 1.99 | 2.89 | 1.13 | 1.06 | 1.07 | 1.13 | 1.56 | 1.84 | 1.87 | 1.74 | 1.28 | 1.56 | 1.34 | 1.09 | 1.21 | - |
CEU - Utah residents with ancestry from Northern and Western Europe.
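Both quantities in this table can be sketched for a single pair of populations, treating population membership as the "case/control" label. Assumptions: the Cochran–Armitage trend statistic is computed through its equivalence to N·r² (r being the correlation between genotype dosage and population label); λ is the median trend χ² divided by the median of a 1-df χ² distribution (about 0.455), the standard genomic-control definition; and significance uses a Bonferroni threshold of 0.05 divided by the number of SNPs. Function names are hypothetical, and the authors' software may differ in details such as handling of missing genotypes.

```python
import math
import numpy as np

CHI2_1DF_MEDIAN = 0.4549364  # median of the chi-square distribution with 1 df

def trend_chi2(g_a: np.ndarray, g_b: np.ndarray) -> np.ndarray:
    """Per-SNP Armitage trend chi-square (population A vs B), computed as N * r^2."""
    g = np.vstack([g_a, g_b]).astype(float)
    y = np.r_[np.ones(len(g_a)), np.zeros(len(g_b))]   # population label
    gc = g - g.mean(axis=0)
    yc = y - y.mean()
    num = yc @ gc
    den = np.sqrt((yc @ yc) * (gc * gc).sum(axis=0))
    with np.errstate(invalid="ignore", divide="ignore"):
        r = num / den
    return len(y) * np.nan_to_num(r) ** 2               # monomorphic SNPs -> 0

def compare_populations(g_a, g_b, alpha=0.05):
    """Return (genomic-control lambda, number of Bonferroni-significant SNPs)."""
    chi2 = trend_chi2(g_a, g_b)
    lam = float(np.median(chi2) / CHI2_1DF_MEDIAN)
    # 1-df chi-square tail: P(X > x) = erfc(sqrt(x / 2))
    pvals = np.array([math.erfc(math.sqrt(x / 2)) for x in chi2])
    n_sig = int((pvals < alpha / len(chi2)).sum())
    return lam, n_sig
```

Two samples drawn from the same allele frequencies give λ close to 1 and essentially no significant SNPs, while a systematic frequency shift inflates both, matching the pattern of the table, where genetically distant pairs such as Finland (Kuusamo) and Spain show the largest values.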
Figure 3. Impact of inflation factor λ upon the required significance of disease-gene association.
The graph shows the highest uncorrected p-value that would still fall below 0.05 after correction with a given λ in the Genomic Control approach, for two scenarios: 1) the decrease of the chi-square statistic in a test with 1 degree of freedom (e.g. allelic, additive, dominant, recessive), and 2) in a test with 2 degrees of freedom (e.g. genotypic).
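The curve in Figure 3 has a closed form, sketched below under the genomic-control assumption that correction divides the observed χ² statistic by λ. A raw statistic stays significant at α = 0.05 only if χ²/λ exceeds the α critical value, so the largest admissible raw p-value is the χ² tail probability at λ times that critical value. For 2 degrees of freedom the tail is exp(−x/2), so the bound simplifies to α^λ. Function names are hypothetical, and applying the same λ to the 2-df test is an assumption of this sketch.

```python
import math
from statistics import NormalDist

def max_raw_p_1df(lam: float, alpha: float = 0.05) -> float:
    """Largest raw p-value whose 1-df chi-square, divided by lam, is still significant."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # chi-square(1) critical value equals z**2
    crit = z * z
    # P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(lam * crit / 2))

def max_raw_p_2df(lam: float, alpha: float = 0.05) -> float:
    """Same bound for a 2-df test, where P(chi2_2 > x) = exp(-x / 2)."""
    crit = -2 * math.log(alpha)               # chi-square(2) critical value
    return math.exp(-lam * crit / 2)          # simplifies to alpha ** lam

for lam in (1.0, 1.5, 2.0, 4.21):
    print(f"lambda={lam}: 1 df p < {max_raw_p_1df(lam):.2e}, "
          f"2 df p < {max_raw_p_2df(lam):.2e}")
```

At λ = 1 both bounds equal 0.05 (no correction), and they shrink quickly as λ grows toward 4.21, the largest value in the table above, showing why merging distant populations without correction is costly.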