Sini Kerminen, Aki S Havulinna, Garrett Hellenthal, Alicia R Martin, Antti-Pekka Sarin, Markus Perola, Aarno Palotie, Veikko Salomaa, Mark J Daly, Samuli Ripatti, Matti Pirinen.
Abstract
Coupling dense genotype data with new computational methods offers unprecedented opportunities for individual-level ancestry estimation once geographically precisely defined reference data sets become available. We study such a reference data set for Finland containing 2376 individuals from the FINRISK Study survey of 1997 whose parents were both born close to each other. This sampling strategy focuses on the population structure present in Finland before the 1950s. Using the recent haplotype-based methods ChromoPainter (CP) and FineSTRUCTURE (FS), we reveal a highly geographically clustered genetic structure in Finland and report its connections to the settlement history as well as to the current dialectal regions of the Finnish language. The main genetic division within Finland shows striking concordance with the 1323 borderline of the treaty of Nöteborg. In general, we detect genetic substructure throughout the country, which reflects stronger regional genetic differences in Finland compared to, for example, the UK, which in a similar analysis was dominated by a single unstructured population. We expect that similar population genetic reference data sets will become available for many more populations in the near future, with important applications in, for example, forensic genetics and genetic association studies. With this in mind, we report those extensions of the CP + FS approach that we found most useful in our analyses of the Finnish data.
Keywords: haplotype sharing; population genetics; population structure
Year: 2017 PMID: 28983069 PMCID: PMC5633394 DOI: 10.1534/g3.117.300217
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. Locations of 1042 samples and the 12 Finnish provinces (1996 definition). Each sample is placed at the mean of its parents’ coordinates. LAP: Lapland, NOS: Northern Ostrobothnia, OST: Ostrobothnia, CNF: Central Finland, NSA: Northern Savonia, SSA: Southern Savonia, NKA: Northern Karelia, SKA: Southern Karelia, TAV: Tavastia, SWF: Southwestern Finland, SOF: Southern Finland. Kainuu is a subregion of NOS. The dashed line divides Finland into an early-settlement area (south and west of the line) and a late-settlement area (north and east of the line) (Jutikkala 1933). The cities of Helsinki, Turku, and Oulu are marked with black diamonds.
Sample sizes
| Province | Full Data Set | Main Data Set |
|---|---|---|
| Lapland (LAP) | 38 | 38 |
| Northern Ostrobothnia (NOS) | 522 | 263 |
| Kainuu | 140 | 57 |
| Northern Savonia (NSA) | 592 | 139 |
| Northern Karelia (NKA) | 587 | 139 |
| Central Finland (CNF) | 45 | 45 |
| Southern Savonia (SSA) | 90 | 69 |
| Southern Karelia (SKA) | 49 | 47 |
| Ostrobothnia (OST) | 85 | 84 |
| Tavastia (TAV) | 75 | 71 |
| Southwestern Finland (SWF) | 226 | 109 |
| Southern Finland (SOF) | 67 | 38 |
| Åland (ÅLA) | 0 | 0 |
| Total | 2376 | 1042 |
Kainuu samples are included in NOS samples.
Includes samples outside the southeastern border.
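The totals in the table can be checked with a short sketch. It assumes, following the table note, that the Kainuu row is a subset of NOS and therefore must not be added again. The dictionary and function names are illustrative only.

```python
# Per-province counts copied from the table: (full data set, main data set).
# Kainuu is left out because its samples are already counted under NOS.
SAMPLE_SIZES = {
    "LAP": (38, 38),
    "NOS": (522, 263),
    "NSA": (592, 139),
    "NKA": (587, 139),
    "CNF": (45, 45),
    "SSA": (90, 69),
    "SKA": (49, 47),
    "OST": (85, 84),
    "TAV": (75, 71),
    "SWF": (226, 109),
    "SOF": (67, 38),
    "ÅLA": (0, 0),
}


def column_totals(sizes):
    """Return (full total, main total) summed over all provinces."""
    full = sum(full_count for full_count, _ in sizes.values())
    main = sum(main_count for _, main_count in sizes.values())
    return full, main


full_total, main_total = column_totals(SAMPLE_SIZES)
print(full_total, main_total)  # 2376 1042
```

Adding the Kainuu row on top of these values would overshoot both totals, which is consistent with the note that Kainuu is counted within NOS.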
Figure 3. (A) FineSTRUCTURE results with two populations that we labeled west (W) and east (E). (B) Results from A refined by marking with yellow circles the individuals whose assignment is uncertain (<80% assignment probability to both populations). Also shown are the approximate 1323 borderline of the treaty of Nöteborg, the early vs. late-settlement border from Figure 1, and the regions of E and W dialects of the Finnish language, including partly Swedish-speaking coastal regions.
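The uncertainty rule in panel B can be written as a small sketch. The 80% threshold comes from the caption. The function name and the example probabilities are hypothetical and are not FineSTRUCTURE output.

```python
UNCERTAINTY_THRESHOLD = 0.80  # from the caption: <80% to both populations is uncertain


def classify(p_west, p_east, threshold=UNCERTAINTY_THRESHOLD):
    """Label an individual W, E, or uncertain from two assignment probabilities."""
    if p_west >= threshold:
        return "W"
    if p_east >= threshold:
        return "E"
    # Neither population reaches the threshold, so the assignment is uncertain.
    return "uncertain"


# Hypothetical assignment probabilities (west, east) for three individuals.
examples = [(0.95, 0.05), (0.60, 0.40), (0.10, 0.90)]
for p_w, p_e in examples:
    print(p_w, p_e, classify(p_w, p_e))
```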
Figure 2. (A and B) The first and second principal components of genetic structure given by ChromoPainter (A) and SmartPCA (B) with individuals colored according to provinces of Figure 1. (C) For each province, the violin plots show how dispersed, as measured by the sample variance, the individuals from that province are in A and B compared to a random set of similar size (Materials and Methods).
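One way to read the comparison in panel C is sketched below. The exact statistic is defined in the paper's Materials and Methods, which is not reproduced here, so this sketch makes its own assumptions: dispersion is taken as the sum of the sample variances of the two principal components, and it is divided by the same quantity for random subsets of equal size. All names and the toy coordinates are hypothetical.

```python
import random
import statistics


def dispersion(points):
    """Sum of the sample variances of PC1 and PC2 over (pc1, pc2) points."""
    pc1 = [p[0] for p in points]
    pc2 = [p[1] for p in points]
    return statistics.variance(pc1) + statistics.variance(pc2)


def relative_dispersion(points, labels, province, n_random=200, seed=0):
    """Province dispersion divided by that of random subsets of the same size."""
    rng = random.Random(seed)
    members = [p for p, lab in zip(points, labels) if lab == province]
    observed = dispersion(members)
    ratios = []
    for _ in range(n_random):
        subset = rng.sample(points, len(members))
        ratios.append(observed / dispersion(subset))
    return ratios


# Toy data: two tight, well-separated clusters standing in for two provinces.
cluster = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
points = cluster + [(x + 10.0, y + 10.0) for x, y in cluster]
labels = ["A"] * 5 + ["B"] * 5

ratios = relative_dispersion(points, labels, "A")
print(statistics.median(ratios))  # well below 1 for a tightly clustered province
```

Ratios well below 1 indicate that individuals from a province sit closer together in the PC plane than a random set of the same size, which is the pattern the violin plots are meant to expose.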
Figure 4. (A) Fine-scale population structure with 17 populations, (B) their relationships according to TVD-tree, and (C) their overlap with the seven main dialectal regions of the Finnish language with eight Savonian subdialects marked with different shades of blue. Numbers in parentheses in B show into how many subpopulations these 17 populations split in the complete tree of 52 populations (Figure S3 in File S1).
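The caption does not expand TVD. Assuming it stands for total variation distance between the average copying profiles of two populations, the distance is half the sum of absolute differences between two probability vectors. The sketch below uses hypothetical profiles, not values from the study, and does not reproduce how the tree itself was built.

```python
def total_variation_distance(p, q):
    """Half the L1 distance between two discrete probability vectors."""
    if len(p) != len(q):
        raise ValueError("profiles must have the same length")
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical copying profiles: the share of the genome each population
# copies from three donor groups. Each profile sums to 1.
profile_a = [0.5, 0.3, 0.2]
profile_b = [0.2, 0.3, 0.5]

print(total_variation_distance(profile_a, profile_b))  # approximately 0.3
```

The distance is 0 for identical profiles and 1 for profiles with no shared donor, so smaller values between two populations would place them closer together in a tree built from these distances.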
Figure 5. FS results with varying sample size and sample density. (A) Data set of 580 individuals at FS-tree level 9, (B) data set of 1042 individuals at FS-tree level 9, and (C) data set of 2376 individuals at FS-tree level 15.