Luca Pagani1,2, Andres Metspalu3, Mait Metspalu1, Vasili Pankratov4, Francesco Montinaro1, Alena Kushniarevich1, Georgi Hudjashov1,5, Flora Jay6, Lauri Saag1, Rodrigo Flores1, Davide Marnetto1, Marten Seppel7, Mart Kals3, Urmo Võsa3, Cristian Taccioli2, Märt Möls8, Lili Milani3, Anto Aasa9, Daniel John Lawson10, Tõnu Esko3, Reedik Mägi3.
Abstract
Several recent studies detected fine-scale genetic structure in human populations. Hence, groups conventionally treated as single populations harbour significant variation in terms of allele frequencies and patterns of haplotype sharing. It has been shown that these findings should be considered when performing studies of genetic associations and natural selection, especially when dealing with polygenic phenotypes. However, there is little understanding of the practical effects of such genetic structure on demography reconstructions and selection scans when focusing on recent population history. Here we tested the impact of population structure on such inferences using high-coverage (~30×) genome sequences of 2305 Estonians. We show that different regions of Estonia differ in both effective population size dynamics and signatures of natural selection. By analyzing identity-by-descent segments we also reveal that some Estonian regions exhibit evidence of a bottleneck 10-15 generations ago reflecting sequential episodes of wars, plague and famine, although this signal is virtually undetected when treating Estonia as a single population. Besides that, we provide a framework for relating effective population size estimated from genetic data to actual census size and validate it on the Estonian population. This approach may be widely used both to cross-check estimates based on historical sources as well as to get insight into times and/or regions with no other information available. Our results suggest that the history of human populations within the last few millennia can be highly region specific and cannot be properly studied without taking local genetic structure into account.
Year: 2020 PMID: 32712624 PMCID: PMC7575549 DOI: 10.1038/s41431-020-0699-4
Source DB: PubMed Journal: Eur J Hum Genet ISSN: 1018-4813 Impact factor: 4.246
Fig. 1 Principal components analysis of 2305 Estonian samples.
a Principal component analysis of the Estonian dataset. The first two PCs are shown. Individual dots are coloured according to the donor’s place of birth. Estonian counties were divided into four groups (SE South-East; SW South-West; NW North-West; NE North-East) as shown in the map. This map was created in R (https://www.R-project.org/) [16] using an shp object of the administrative and settlement units provided by the Estonian Land Board, 2018.11.01 (https://geoportaal.maaamet.ee/eng/Spatial-Data/Administrative-and-Settlement-Division-p312.html). See “Methods” for more details. The individuals with no information available regarding their place of birth are shown in grey. b Projecting Estonian samples onto PC space defined by European samples (“Methods”, Supplementary text section 1). Red crosses correspond to medians of European populations while empty circles represent individual samples. Populations are labelled as follows: Ita Italians; Spa Spaniards; Fre French; Ger Germans; Hun Hungarians; Eng British; Swe Swedes; Ukr Ukrainians; Bel Belarusians; RuCS Russians from Central and Southern Russia; Pol Poles; Lit Lithuanians; Lat Latvians; Mor Mordvins; RuN Russians from Northern Russia. Estonian samples are shown in colour reflecting their position along PC1 in (a). In both panels percentages in the axis labels show the proportion of the total variance explained by the corresponding PC.
Fig. 2 Genetic clustering of R50+ samples based on pairwise sharing of IBD segments.
a Hierarchical relationships (tree) and the average total length of IBD segments shared between cluster members (heatmap) as inferred by fineSTRUCTURE. The length of the tree branches does not reflect any relationship between the clusters. Clusters are named to reflect their geographic distribution (E East; NW North-West; NE North-East; SW South-West; SE South-East). Numbers in grey next to cluster names refer to the sample size of each cluster. b Geographic distribution of inferred genetic clusters. Each symbol on the Estonian map corresponds to one individual from the R50+ subset. See Section 2.3 of the Supplementary text for details. This map was created in R (https://www.R-project.org/) [38] using an shp object of the administrative and settlement units provided by the Estonian Land Board, 2018.11.01 (https://geoportaal.maaamet.ee/eng/Spatial-Data/Administrative-and-Settlement-Division-p312.html). See “Methods” for more details.
Fig. 3 Relative proportions of “Baltic”, “Slavic”, Finnish and Swedish ancestry in the R50+ subset.
Modelled relative ancestral proportions of «Balts» (Latvians and Lithuanians), «Slavs» (Belarusians, Poles, Russians, Ukrainians), Finns, and Swedes attributed by applying non-negative least-squares approach (NNLS) to CHROMOPAINTER/fineSTRUCTURE (CP/FS) results are shown. See Supplementary text section 3.1 for details. The colour of each parish reflects mean values of samples coming from this parish. Parishes with no samples in the R50+ dataset are filled with grey. Names in rectangles show directions to neighbouring countries. These maps were created in R (https://www.R-project.org/) [38] using an shp object of the administrative and settlement units provided by the Estonian Land Board, 2018.11.01 (https://geoportaal.maaamet.ee/eng/Spatial-Data/Administrative-and-Settlement-Division-p312.html). See “Methods” for more details.
Fig. 4 Genetic clusters of the entire Estonian dataset (2305 samples) and their recent Ne dynamics.
a Clustering of the entire dataset obtained the same way as in Fig. 2. The heatmap shows the average total length of IBD segments shared between clusters. The length of the tree branches does not reflect any relationship between the clusters. Numbers in grey next to cluster names show the number of samples in each cluster. b Geography of inferred clusters. Each dot within the contour of Estonia corresponds to 1 individual, while waffle plots show samples for 15 major Estonian towns with each dot corresponding to 5 individuals. This map was created in R (https://www.R-project.org/) [38] using an shp object of the administrative and settlement units provided by the Estonian Land Board, 2018.11.01 (https://geoportaal.maaamet.ee/eng/Spatial-Data/Administrative-and-Settlement-Division-p312.html). See “Methods” for more details. c Effective population size estimates obtained by applying IBDNe [15] to the entire dataset and to four clusters from (a): eNW_1, eNE, eSW_2 and eSE_5. d Comparison of historical and genetic estimates of Estonian population size. Historical estimates combine census data and reconstructions based on written or archaeological sources (Fig. S4.6). Genetic estimates are derived from IBDNe results, for which the Est1527 subset was used (Fig. S4.9), and refer to the broader population that contributed over time to the genomes of contemporary Estonians. When converting time points of the IBDNe curve into actual years we used the same logic as in the original publication [15] and set generation 0 to correspond to the year when individuals in our sample had a mean age of 25 (1988). A generation time of 29 years was assumed. For year 1200 the minimum and maximum estimates are provided. In (c) shaded areas show 95% confidence intervals. In (d) the shaded area corresponds to the range between the minimum and maximum genetic estimates of Nc (Methods), while the light blue line shows the geometric mean between the two. In both panels on the y axis, “k” stands for “thousands” and “M” for “millions”.
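The generation-to-year conversion described in the Fig. 4 caption can be illustrated with a minimal sketch. It uses only the two values stated in the caption (generation 0 set to 1988, generation time of 29 years); the function name and the linear form of the conversion are assumptions made for illustration, not the authors' code.

```python
# Values taken from the Fig. 4 caption.
GENERATION_ZERO_YEAR = 1988   # year when sampled individuals had a mean age of 25
GENERATION_TIME_YEARS = 29    # assumed generation time

def generation_to_year(generations_ago):
    """Convert 'generations before generation 0' to a calendar year.

    Hypothetical helper: assumes a simple linear conversion with a
    constant generation time.
    """
    return GENERATION_ZERO_YEAR - GENERATION_TIME_YEARS * generations_ago

# The bottleneck reported in the abstract, 10-15 generations ago.
bottleneck_years = [generation_to_year(g) for g in (10, 15)]
print(bottleneck_years)  # [1698, 1553]
```

Under these assumptions, the bottleneck 10-15 generations ago mentioned in the abstract falls roughly between the mid-16th and the late 17th century.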
Fig. 5 Singleton density score selection scan results.
Genome-wide plots of p values corresponding to standardized SDS scores for the entire dataset (a) as well as SE (b) and nonSE (c) subsets. Conditional suggestive (blue) and genome-wide (red) significance lines are drawn. Gene names are highlighted for intragenic variants with –log10 (p) > 5. Datasets are described in the text and Supplementary information SI1:5.1.
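As a rough illustration of how a standardized score maps to the –log10 (p) scale used in Fig. 5, the sketch below converts a z-like score into a two-sided p value under a standard normal null. The caption does not state how the p values were computed, so the two-sided normal test and both function names are assumptions; only the labelling threshold of 5 comes from the caption.

```python
import math

def sds_p_value(z):
    """Two-sided p value for a standardized score.

    Assumption: the score follows a standard normal distribution under
    the null; the caption does not specify the test actually used.
    """
    return math.erfc(abs(z) / math.sqrt(2))

def exceeds_label_threshold(z, threshold=5.0):
    """True if -log10(p) is above the threshold (5 in the Fig. 5 caption)."""
    p = sds_p_value(z)
    if p == 0.0:
        # p underflowed to zero for an extreme score; treat as above threshold.
        return True
    return -math.log10(p) > threshold

# Hypothetical scores, for illustration only.
for z in (2.0, 4.0, 5.0):
    print(z, sds_p_value(z), exceeds_label_threshold(z))
```

With this assumed test, a standardized score needs an absolute value of roughly 4.4 or more before its variant would pass the labelling threshold.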