Keith Humphreys, Alexander Grankvist, Monica Leu, Per Hall, Jianjun Liu, Samuli Ripatti, Karola Rehnström, Leif Groop, Lars Klareskog, Bo Ding, Henrik Grönberg, Jianfeng Xu, Nancy L Pedersen, Paul Lichtenstein, Morten Mattingsdal, Ole A Andreassen, Colm O'Dushlaine, Shaun M Purcell, Pamela Sklar, Patrick F Sullivan, Christina M Hultman, Juni Palmgren, Patrik K E Magnusson.
Abstract
Patterns of genetic diversity have previously been shown to mirror geography on a global scale, within continents and within individual countries. Using genome-wide SNP data on 5174 Swedes with extensive geographical coverage, we analyzed the genetic structure of the Swedish population. We observed strong differences between the far northern counties and the remaining counties. The population of Dalarna county, in north middle Sweden, which borders southern Norway, also appears to differ markedly from other counties, possibly because this county has more individuals with remote Finnish or Norwegian ancestry than other counties. An analysis of genetic differentiation (based on pairwise Fst) indicated that the populations of Sweden's southernmost counties are genetically closer to the HapMap CEU samples of Northern European ancestry than to the populations of Sweden's northernmost counties. In a comparison of extended homozygous segments, we detected a clear divide between southern and northern Sweden, with small differences among the southern counties and considerably more segments in northern Sweden. Both the increased degree of homozygosity in the north and the large genetic differences between the south and the north may have arisen from the small population in the north and the vast geographical distances between its towns and villages, in contrast to the more densely settled southern parts of Sweden. Our findings have implications for future genome-wide association studies (GWAS) with respect to the matching of cases and controls and the need for within-county matching. We have shown that genetic differences within a single country may be substantial, even when viewed on a European scale. Thus, population stratification needs to be accounted for even within a country like Sweden, which is often perceived to be relatively homogeneous and a favourable resource for genetic mapping; otherwise, inferences based on genetic data may lead to false conclusions.
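The abstract compares counts of extended homozygous segments between counties but does not say how the segments were called. A minimal sketch, assuming genotypes coded as 0/1/2 allele dosages (1 = heterozygous, None = missing) ordered along one chromosome and a hypothetical minimum run length counted in SNPs: it returns each run of consecutive homozygous calls, so the number of runs per individual is the kind of count modelled per county in Figure 5. The function name `homozygous_runs` is hypothetical; dedicated tools such as PLINK's `--homozyg` also impose a minimum physical length and tolerate a few heterozygous or missing calls, which this sketch does not.

```python
def homozygous_runs(genotypes, min_snps):
    """Return (start, end) index pairs, end exclusive, for every run of at least
    `min_snps` consecutive homozygous calls (dosage 0 or 2).

    A heterozygous call (1) or a missing call (None) ends the current run.
    """
    runs = []
    start = None  # index where the current homozygous run began
    for i, g in enumerate(genotypes):
        if g in (0, 2):
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_snps:
                runs.append((start, i))
            start = None
    # a run may extend to the last SNP on the chromosome
    if start is not None and len(genotypes) - start >= min_snps:
        runs.append((start, len(genotypes)))
    return runs


# Hypothetical example: one individual, eleven SNPs, minimum run of three SNPs
calls = [0, 2, 2, 1, 0, 0, 0, 0, None, 2, 2]
print(homozygous_runs(calls, min_snps=3))  # → [(0, 3), (4, 8)]
```

The trailing run (SNPs 9 and 10) is dropped because it is shorter than the minimum.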
Year: 2011 PMID: 21829632 PMCID: PMC3150368 DOI: 10.1371/journal.pone.0022547
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Description of included control samples.
| Study | N* | Primary disease for which cases were selected | Genotyping platform |
| CAHRES | 732 | Breast cancer | Illumina HumanHap550 |
| CAPS | 850 | Prostate cancer | Affymetrix 550K and 5.0 |
| DGI | 398 | Type 2 diabetes | Affymetrix 550K |
| SCZ-SW | 2290 | Schizophrenia | Affymetrix 5.0 and 6.0 |
| TWINGENE-SW | 290 | Twin cohort study | Illumina HumanHap300 |
| EIRA | 614 | Rheumatoid arthritis | Illumina HumanHap300 |
| Total | 5174 | | |
*Numbers refer to the post-QC number of samples from each study.
Summary of SNP QC.
| QC threshold | Study | Removed SNPs | SNPs removed solely for that reason* |
| Hardy-Weinberg equilibrium, p < 10⁻⁶ | | 1,061 | 45 |
| Genotyping rate < 0.95 | CAHRES | 196,644 | 668 |
| Genotyping rate < 0.95 | CAPS | 439,509 | 25,839 |
| Genotyping rate < 0.95 | DGI | 401,945 | 6,014 |
| Genotyping rate < 0.95 | SCZ-SW | 396,702 | 9,941 |
| Genotyping rate < 0.95 | TWINGENE-SW | 273,685 | 1,975 |
| Genotyping rate < 0.95 | EIRA | 349,059 | 27,371 |
| Minor allele frequency < 0.01 | | 8,311 | 70 |
| Between-study comparison (1-vs-rest comparison), any p < 10⁻⁶ | | 142,899 | 785 |
*SNPs that were not removed due to any other criterion than the one listed.
**The number of SNPs in the largest study (SCZ-SW). All other numbers refer to this number and not the number of SNPs available in each study.
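The first three per-SNP filters in the QC table can be expressed as one short function. A minimal sketch, assuming 0/1/2 dosage coding with None for missing calls: the thresholds are the table's, but the Hardy-Weinberg test here is a 1-df chi-square approximation, because the source does not say which HWE test was used (an exact test is also common). The between-study 1-vs-rest filter is not covered, and the call-rate check would be run within each study's samples, matching the per-study rows. The name `snp_qc` is hypothetical.

```python
import math


def snp_qc(calls, min_call_rate=0.95, min_maf=0.01, hwe_p=1e-6):
    """Return the list of QC filters one SNP fails (an empty list means keep it).

    calls: genotype dosages 0/1/2 for each sample, None where the call is missing.
    """
    called = [g for g in calls if g is not None]
    failed = []
    if not calls or len(called) / len(calls) < min_call_rate:
        failed.append("call_rate")
    if not called:
        return failed  # no genotypes left to test
    n = len(called)
    n_aa, n_ab, n_bb = called.count(0), called.count(1), called.count(2)
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of the allele coded by dosage 0
    if min(p, 1 - p) < min_maf:
        failed.append("maf")
    # chi-square goodness of fit to Hardy-Weinberg proportions (1 df)
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * (1 - p), n * (1 - p) ** 2)
    if min(expected) > 0:
        chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
        p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
        if p_value < hwe_p:
            failed.append("hwe")
    return failed
```

For a monomorphic SNP the expected counts include zero, so the HWE test is skipped; such a SNP already fails the MAF filter.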
Figure 1. Plot of principal component 1 vs. principal component 2, with corresponding histograms.
The histograms show the high density of samples in the main cluster of the plot. The principal components are based on the analysis after removing 374 samples due to suspected Finnish ancestry and 76 samples due to other criteria (see methods and results).
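Figures 1 and 2 plot the first two principal components of the genotype data, and the caption defers the details to the methods, which are not part of this record. One common formulation, sketched here under that assumption and assuming no missing calls: center each SNP by its mean dosage 2p, scale it by sqrt(2p(1 − p)), and take the leading left singular vectors, multiplied by their singular values, as sample scores. It uses NumPy; `genotype_pcs` is a hypothetical name.

```python
import numpy as np


def genotype_pcs(genotypes, n_components=2):
    """Top principal components of a samples x SNPs matrix of 0/1/2 dosages.

    Each SNP is centred by its mean dosage 2p and scaled by sqrt(2p(1 - p));
    sample scores are the leading left singular vectors times singular values.
    """
    G = np.asarray(genotypes, dtype=float)
    p = G.mean(axis=0) / 2.0                  # allele frequency of each SNP
    keep = (p > 0) & (p < 1)                  # monomorphic SNPs cannot be scaled
    G, p = G[:, keep], p[keep]
    X = (G - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_components] * S[:n_components]
```

Sample exclusions, such as the 374 samples with suspected Finnish ancestry, would be applied before this step. The sign of each component is arbitrary, so plots from different runs may be mirrored.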
Figure 2. Map plot of principal components 1 and 2.
The smallest (most negative) value is shown in dark brown and the largest positive value in yellow. Each sample's location was set to the best available estimate, with birthplace as the first choice (see methods). A small random component was added to each sample's location.
Figure 3. Counties and national areas of Sweden.
A national area comprises one or more counties. County names are displayed directly on the map, and national areas are distinguished by color.
Pairwise Fst values between national areas, and the λs expected for a fully stratified study of 500 cases and 500 controls (all cases drawn from one area and all controls from the other).
| | HapMap CEU | Southern Sweden | Småland with the islands | Western Sweden | Stockholm | East Middle Sweden | North Middle Sweden | Middle Norrland | Upper Norrland | Finns |
| HapMap CEU | – | 0.000545 | 0.000673 | 0.000627 | 0.000585 | 0.000672 | | | | |
| Southern Sweden | 1.55 | – | 0.000158 | 0.000225 | 0.000180 | 0.000237 | 0.000574 | | | |
| Småland with the islands | 1.67 | 1.16 | – | 0.000179 | 0.000112 | 0.000164 | 0.000494 | | | |
| Western Sweden | 1.63 | 1.23 | 1.18 | – | 0.000160 | 0.000215 | 0.000500 | | | |
| Stockholm | 1.59 | 1.18 | 1.11 | 1.16 | – | 0.000022 | 0.000227 | 0.000496 | | |
| East Middle Sweden | 1.67 | 1.24 | 1.16 | 1.22 | 1.02 | – | 0.000195 | 0.000512 | | |
| North Middle Sweden | | 1.57 | 1.49 | 1.50 | 1.23 | 1.20 | – | 0.000448 | | |
| Middle Norrland | | | | | 1.50 | 1.51 | 1.45 | – | | |
| Upper Norrland | | | | | | | | | – | |
| Finns | | | | | | | | | | – |
National areas are sorted from south to north, with the southernmost at the top of the table. Fst values are shown above the diagonal and λs below it. Comparisons with Fst > 0.0008 or E(λ) > 1.8 have been made bold.
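The caption does not give the formula linking the two halves of the table. Every λ that survives below the diagonal agrees, to within rounding, with λ ≈ 1 + N·Fst for N = 1000 (500 cases + 500 controls); for example, Southern Sweden vs. HapMap CEU gives 1 + 1000 × 0.000545 ≈ 1.55. This matches a standard argument: with n cases from one area and n controls from the other, an allelic χ² test at a null SNP has E[χ²] ≈ 1 + 2n·Fst, because E[(p_A − p_B)²] ≈ 2·Fst·p(1 − p) under Hudson's definition of Fst. The sketch below pairs that approximation with Hudson's ratio-of-averages Fst estimator. Both are assumptions on our part; the source does not say which Fst estimator or λ formula the authors used, and the function names are hypothetical.

```python
def hudson_fst(p1, p2, n1, n2):
    """Hudson's Fst between two populations, combined over SNPs as a ratio of
    averages (sum of per-SNP numerators over sum of per-SNP denominators).

    p1, p2: sample allele frequencies per SNP in each population.
    n1, n2: number of sampled chromosomes (2 x individuals) in each population.
    """
    num = den = 0.0
    for a, b in zip(p1, p2):
        # squared frequency difference, minus its expected sampling noise
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        den += a * (1 - b) + b * (1 - a)
    return num / den


def expected_lambda(fst, n_total):
    """Approximate inflation factor for a fully stratified case-control study:
    n_total / 2 cases from one population, n_total / 2 controls from the other."""
    return 1.0 + n_total * fst


# Check against the table: 500 cases + 500 controls
for pair, fst, reported in [
    ("HapMap CEU vs Southern Sweden", 0.000545, 1.55),
    ("Stockholm vs East Middle Sweden", 0.000022, 1.02),
    ("North Middle Sweden vs Middle Norrland", 0.000448, 1.45),
]:
    print(f"{pair}: {expected_lambda(fst, 1000):.3f} (table: {reported})")
```

Because λ grows linearly in N, the same Fst implies roughly ten times the excess inflation in a study ten times larger, which is why small within-country Fst values still matter for GWAS matching.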
Figure 4. Heat map with hierarchical clustering of counties based on Fst.
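The caption does not name the linkage method behind the clustering. As a sketch, assuming average linkage (UPGMA) with Fst used directly as the distance, the code below clusters the six national areas whose pairwise Fst values are all recoverable from the national-area table; Figure 4 itself works at the county level, so this only illustrates the technique. The names `average_linkage`, `AREAS` and `FST` are hypothetical.

```python
from itertools import combinations


def average_linkage(names, dist):
    """Agglomerative clustering with average linkage (UPGMA).

    names: leaf labels; dist: {frozenset({a, b}): distance} for every pair.
    Returns the merges, in order, as (cluster_a, cluster_b, height).
    """
    clusters = [(name,) for name in names]

    def linkage(a, b):
        # mean of all leaf-to-leaf distances between the two clusters
        total = sum(dist[frozenset((x, y))] for x in a for y in b)
        return total / (len(a) * len(b))

    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((a, b, linkage(a, b)))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return merges


AREAS = ["Southern Sweden", "Småland with the islands", "Western Sweden",
         "Stockholm", "East Middle Sweden", "North Middle Sweden"]
S, SM, W, ST, EM, NM = AREAS
# Pairwise Fst values transcribed from the national-area table
FST = {frozenset((x, y)): value for x, y, value in [
    (S, SM, 0.000158), (S, W, 0.000225), (S, ST, 0.000180), (S, EM, 0.000237),
    (S, NM, 0.000574), (SM, W, 0.000179), (SM, ST, 0.000112), (SM, EM, 0.000164),
    (SM, NM, 0.000494), (W, ST, 0.000160), (W, EM, 0.000215), (W, NM, 0.000500),
    (ST, EM, 0.000022), (ST, NM, 0.000227), (EM, NM, 0.000195),
]}

merges = average_linkage(AREAS, FST)
for a, b, height in merges:
    print(f"{height:.6f}  {' + '.join(a)}  |  {' + '.join(b)}")
```

With these six areas, Stockholm and East Middle Sweden merge first (Fst 0.000022) and North Middle Sweden joins last, at an average Fst of about 0.000398. Recomputing the average from leaf distances at every step is quadratic per merge but keeps the logic transparent for a handful of areas.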
Figure 5. Poisson model of the mean number of homozygous segments in each county.
Bar chart of the estimated regression coefficients, with standard-error bars. The model uses Stockholm county as the baseline and adjusts for source study. Dark gray bars: p < 10⁻⁵; light gray bars: p < 0.05; white bars: p ≥ 0.05 (p values adjusted for multiple testing across 20 counties). Counties are sorted south to north.
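Figure 5 describes a Poisson regression of per-individual segment counts on county, with Stockholm as the baseline and an adjustment for source study. The fitting software is not stated, so this is a minimal NumPy sketch using iteratively reweighted least squares, assuming one count per individual and dummy-coded covariates; the function names and the reference-level choice for study are hypothetical, and the multiple-testing adjustment is omitted because the source does not name the method. Each county coefficient is a log rate ratio relative to Stockholm.

```python
import numpy as np


def poisson_irls(X, y, n_iter=100, tol=1e-10):
    """Poisson regression (log link) fitted by iteratively reweighted least squares.

    X: n x k design matrix whose first column is the intercept; y: counts.
    Returns (coefficients, model-based standard errors).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())          # start from the overall mean count
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu         # working response
        XtW = X.T * mu                  # X^T W with Poisson weights W = diag(mu)
        new_beta = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    mu = np.exp(X @ beta)
    se = np.sqrt(np.diag(np.linalg.inv((X.T * mu) @ X)))
    return beta, se


def design_matrix(counties, studies, baseline="Stockholm"):
    """Intercept, one dummy per non-baseline county, one dummy per study
    except the alphabetically first (used as the study reference level)."""
    county_levels = sorted(set(counties) - {baseline})
    study_levels = sorted(set(studies))[1:]
    cols = [np.ones(len(counties))]
    cols += [np.array([c == lvl for c in counties], dtype=float) for lvl in county_levels]
    cols += [np.array([s == lvl for s in studies], dtype=float) for lvl in study_levels]
    return np.column_stack(cols), county_levels + study_levels
```

In practice a library routine such as statsmodels' `sm.GLM(y, X, family=sm.families.Poisson())` fits the same model; the explicit loop just shows what the fit computes.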