Literature DB >> 23249956

Genetic characterization of northeastern Italian population isolates in the context of broader European genetic diversity.

Tõnu Esko¹, Massimo Mezzavilla, Mari Nelis, Christelle Borel, Tadeusz Debniak, Eveliina Jakkula, Antonio Julia, Sena Karachanak, Andrey Khrunin, Peter Kisfali, Veronika Krulisova, Zita Aušrelé Kučinskiené, Karola Rehnström, Michela Traglia, Liene Nikitina-Zake, Fritz Zimprich, Stylianos E Antonarakis, Xavier Estivill, Damjan Glavač, Ivo Gut, Janis Klovins, Michael Krawczak, Vaidutis Kučinskas, Mark Lathrop, Milan Macek, Sara Marsal, Thomas Meitinger, Béla Melegh, Svetlana Limborska, Jan Lubinski, Aarno Paolotie, Stefan Schreiber, Draga Toncheva, Daniela Toniolo, H-Erich Wichmann, Alexander Zimprich, Mait Metspalu, Paolo Gasparini, Andres Metspalu, Pio D'Adamo.

Abstract

Population genetic studies on European populations have highlighted Italy as one of genetically most diverse regions. This is possibly due to the country's complex demographic history and large variability in terrain throughout the territory. This is the reason why Italy is enriched for population isolates, Sardinia being the best-known example. As the population isolates have a great potential in disease-causing genetic variants identification, we aimed to genetically characterize a region from northeastern Italy, which is known for isolated communities. Total of 1310 samples, collected from six geographically isolated villages, were genotyped at >145000 single-nucleotide polymorphism positions. Newly genotyped data were analyzed jointly with the available genome-wide data sets of individuals of European descent, including several population isolates. Despite the linguistic differences and geographical isolation the village populations still show the greatest genetic similarity to other Italian samples. The genetic isolation and small effective population size of the village populations is manifested by higher levels of genomic homozygosity and elevated linkage disequilibrium. These estimates become even more striking when the detected substructure is taken into account. The observed level of genetic isolation in Friuli-Venezia Giulia region is more extreme according to several measures of isolation compared with Sardinians, French Basques and northern Finns, thus proving the status of an isolate.

Entities: Chemical Disease Gene Mutation Species

Mesh：

Year: 2012 PMID： 23249956 PMCID： PMC3658181 DOI： 10.1038/ejhg.2012.229

Source DB: PubMed Journal: Eur J Hum Genet ISSN： 1018-4813 Impact factor: 4.246

Introduction

Human complex traits arise from the new mutations as well as from the interplay between existing genetic variants and exposure to environmental conditions. This means that it is desirable to study populations with decreased genetic variability, such as isolated populations[1, 2] and large homogeneous populations[2, 3] as more power is gained for genetic association mapping studies.[4] Population isolates are by definition characterized by small effective population size (Ne), which results in stronger effects of random genetic drift leading to decreased genetic variability.[5] These processes can be temporally continuous and/or woven into a chain of discrete bottleneck events in the demographic history of an isolate. Several population isolates, like Sardinians,[6] northern Finns,[4] Amish[7] and Icelanders[8] have successfully been used in both linkage and genome-wide association studies for pinpointing DNA sequence variants for disease predispositions (OMIM database: www.omim.org and NHGRI GWAS Catalog[9]). Another favorable aspect of population isolates is the largely shared environment. The isolation usually arises from geographical barriers and therefore everyone is exposed to the same factors, which enables to effectively design powerful gene–environment interaction studies.[5] In recent years, several studies have started to shed light on the structure of the genetic variation on global,[10, 11] continental,[12, 13] regional,[3, 14] ethnic group (such as Jews,[15] Indians,[16] Brazilia[17]) and country[2, 3, 18, 19] level, but very rarely are the studied sample representative for a whole country.[2, 3] The analyses on the sub-population level[20, 21] focus mostly on population isolates and demonstrate a decreased genetic variability within, but also elevated diversity between neighboring regions and source populations.[2, 15, 21, 22] It is important to characterize any putative population isolate for events of recent admixture and presence of sub-structure as these could disrupt the genetic homogeneity and lead to possible spurious associations in gene mapping studies.[23] Therefore, a correct sampling strategy is fundamental. Italians, on the example of Sardinians, are one of the most studied European populations next to Saami people and Basques where the population structure has been analyzed in depth by using both the haploid loci (mtDNA and Y chromosome)[24, 25] and the autosomal markers.[3, 20] It is hypothesized that Italy may be enriched for population isolates because of complex demographic history and topographic variability (Italian Network of Genetic Isolates: http://www.netgene.it/ita/ingi.asp). This study aims to test this hypothesis by genetically characterizing a hilly part of Friuli-Venezia Giulia (FVG) county located in northeastern Italy. The region is particularly interesting, – while covering a total area of only 7858 km2 several distinct dialects are spoken and several villages sport traditions and/or surnames linking them to ethnic groups further away rather than in FVG or in Italy for that matter. All in all, it seems the area is characterized by complex demographic history. For example, people in the village of Resia speak an archaic proto-Slavic language, known as Resian, but their surnames are Italian or Italianized. In the village of Illegio, not far from Resia, people speak another local language – called friuliano – of the Rhaeto-Romance language sub-family, which, during the Middle Ages, was widespread – from modern Switzerland to Slovenia. Illego is further characterized by a limited number of surnames, what could be interpreted as evidence for marginal immigration. Another layer of specific cultural heritage is added by characteristic local symbols found engraved on local houses. The inhabitants of village Sauris speak an archaic dialect of German origin and according to legends the locals have their ancestral roots near Tyrol. Until 50 years ago, before a flood devastated the valley of origin, the main spoken language in the village of Erto was a Latin dialect called ertano, while current population is an admixture of the former inhabitants of Erto and migrants from the nearby regions. The village of San Martino del Carso (SMC) is the only Italian-speaking village in the Slovenian-speaking Carso region. Finally, the village of Clauzetto is located in a remote valley where people speak friulano. The rich history and linguistic diversity of these six populations allows to predict elevated levels of intragroup genetic homogeneity, higher intergroup differentiation between the villages, and suggests at least some degree of ancestry with the neighboring regions of Slovenia, Germany and Italy. In this study, we analyze the genetic variation within the FVG region in order to understand (a) the relationship between the villages inhabitants and the other Italians, and Europeans as a whole, which may offer unique insight into the complex demographic history of the region, and (b) to evaluate if any of the village populations represent a genuine population isolate, as these would have a great potential in genetic epidemiological studies. In order to answer these questions, high-density genotype data of >1400 newly analyzed samples from FVG region and Slovenia were combined with several publicly available data collections.[3, 11, 15, 16, 26] This enables us to directly contrast the hypothetical cultural origins of FVG region villages against the data-driven genetic similarity patterns. Furthermore, we compare the genetic diversity and genomic homogeneity found in the six FVG villages with other well-known geographical and cultural population isolates, such as Sardinians, French Basques and northern Finns.

Materials and methods

Samples

We genotyped 1310 samples from six geographically isolated villages in the Italian FVG region. We combined this data with the 96 newly genotyped Slovenian samples and with five published population-based collections[3, 11, 15, 16, 26] (see Supplementary Table 1), with an emphasis on populations with European ancestry. After data quality control and exclusion of close relatives, 3091 samples from 72 populations were used for the analyses (Supplementary Table 1). This set included 733 samples of Italian ancestry from 11 populations: (1) Borbera Valley (North-West Italy) and Apulia region (South-East Italy; Carlantino)[3] and (2) Sardinians, Tuscans and northern Italians from Human Genetic Diversity Panel[11] and (3) six villages from the FVG region. Figures 1a and b shows the valleys and isolated villages from where the Italian samples were collected. Hereafter, the nation name is used for general populations and a village or region name for more isolated populations.

Figure 1

Map showing approximate location of analyzed Italian samples and revealed population structure in analyzed populations. (a) A geographical map of Italy and shown are the approximate sampling regions. (b) A detailed geographical map of FVG and shown are the approximate location of the six villages. (c) Ancestry proportions of the studied 1008 individuals from 39 European and Near-Eastern populations (including the six FVG village populations) as revealed by the ADMIXTURE program[29] with K=2 to K=10. A stacked column of the K proportions represents each individual, with fractions indicated on the y axis. From all non-FVG populations a subset of 24 randomly drawn individuals (if applicable) was used.

A written informed consent for participation was obtained from all newly genotyped subjects.

Single-nucleotide polymorphism (SNP) genotypes and quality control

The six Italian cohorts were genotyped using the Human370CNV and the Slovenian sample with the HumanOmniExpress beadchips according to the manufacturer's protocol (Illumina Inc., San Diego, CA, USA). The published data sets used in the analysis had been genotyped with different versions of Illumina beadchips. The genotype data from different platforms were first merged and then filtered according to the standard genotype quality control metrics using PLINK v1.07 software.[27] Only the SNPs of the 22 autosomal chromosomes with minor allele frequency of >1%, Hardy–Weinberg equilibrium P-value >10-6, and genotyping success rate >95% were included, leaving approximately 145 000 SNP markers. Only the individuals with genotyping success rate >95% were used. Cryptic relatedness was tested with the same software and from the detected relative pairs (up to the second cousins) only one was randomly chosen for the subsequent analyses. Depending on the analyses (eg, computational optimization and bias in sample sizes) very distant ethnic groups were excluded and the population sample sizes were trimmed to 50 or 24 samples. Background linkage disequilibrium (LD) can induce bias in the principal component[28] and structure-like analyses; therefore the set of 145 000 SNPs was thinned by excluding markers in strong LD (pairwise genotype correlation r2>0.4) in a window of 200 SNPs (sliding window overlap 25 SNPs at a time) leaving approximately 101 000 SNPs for the subsequent analyses.

Statistical analyses

An unsupervised, maximum likelihood-based clustering algorithm assembled in ADMIXTURE[29] software was applied to the European and Near-Eastern ancestry (Jews, Palestinians, Adyghes and Druzes) population samples (n=1975) to identify the putative ancestral clusters within the samples as well as to assess the extent of admixture. Clustering was performed 100 times at K=2 to K=15 and the best-fitting K was selected according to the lowest cross-validation (CV) index (Supplementary Figure 1A). In addition, we opted to use a threshold level of variation in log likelihood scores (LL<1) within a fraction (10%) of runs with the highest LLs[15] (Supplementary Figure 1B) as a pointer to assume that the global likelihood maximum was reached, thus rendering the given K model a useful representation of the genetic structure of the sample. The lowest CV indexes were observed at K=9 while K=10 showed only marginally worse values. It is likely that global likelihood maximum was indeed reached at K=2 to K=15. For several analyses, the FVG populations were split into sub-populations according to ancestry estimations at K=10 (if applicable) as follows: (1) general set (GS), when village-specific ancestry loading was smaller than 30% and (2) more isolated set (IS), when loading exceeded 30%. We choose K=10 to discern between the sets because beyond this K (and up to K=15, see above) no additional village-specific components arose, thus making this K the most appropriate one to choose for this particular task. This choice was supported by observing that for K=10, (1) the global likelihood maximum was indeed probably reached and (2) the CV index values were close to the lowest ones at K=9. Subsequently, a sub-set of 24 individuals was chosen at random from each European and Near-Eastern ancestry population for the following analyses in order to minimize sample size effects.[30] In all instances, the sampling was repeated five times and the obtained pairwise Fst distances were close to unity, indicating that the random sets were representative of the entire sample. Principal component analysis (PCA) using 101 000 SNPs was performed using EIGENSOFT[31] package. The same software was used to calculate the pairwise Fst values that were further analyzed with hierarchical agglomerative clustering. The analyzes were performed on two sets of samples: (1) all available samples (n=3091), and (2) only the trimmed European and Near-Eastern ancestry populations (n=1008). Spatial ancestry analysis (SPA) was applied on the latter set, to explicitly model the spatial distribution of each marker in order to describe the degree of population stratification in each of the putative population isolates into the European genetic background.[32] Principal component ancestry informative markers (PCAIMs)[33] panel was constructed of 250 SNPs (out of 101 000 SNPs) retained from the top five PCs (50 SNPs from each)[17] ranked by the absolute loading scores. PCAIMs panel was the input to a non-model-based multivariate approach, a discriminant analysis of principal components (DAPCs), which is implemented into the R package adegenet ver1.3-0.[36] All principal components were included to the k-means clustering algorithm from KDAPC=1 to KDAPC=14, and the best-fitting KDAPC was selected using the Bayesian information criterion. DAPC was used to assign individuals into the predicted clusters and real populations were used as identifiers. Over fitting was avoided by estimating the difference between the proportion of successful reassignments and values obtained using random grouping. Pairwise LD extent was calculated as the genotype correlation (r2) between marker pairs <100 kb apart using the PLINK v1.07[27] software. A custom Perl script was applied to categorize the r2 values according to inter-marker distances (0−5 kb, 5−10 kb, and so on) and mean r2 was calculated for each category. Calculations were restricted only to the established population isolates, the FVG region samples, and a set of geographical reference populations (Estonians, Slovenians and Swiss). The genomic runs of homozygosity (gROH) and the inbreeding coefficient (Fin) were estimated using PLINK v1.07 with established parameters.[34] ROH was defined as a sequence of at least 25 consecutive homozygous SNPs spanning at least 1500 kb, with a maximum gap of 100 kb between the adjacent SNPs and a density of SNPs within the run of no >20 kb/SNP[34] (for each individual the gROH was defined as the sum of above defined genomic regions). The Fin was estimated for each sample based on the ratio between observed and expected number of the homozygous genotypes.

Results

FVG region in the context of broader European genetic diversity

As the first step in understanding the place of the FVG region populations on the genetic canvas of Europe, we applied a model-based structure-like algorithm assembled in ADMIXTURE[29] to compute quantitative estimates for individual ancestry in K hypothetical ancestral populations. The best predictive accuracy was observed for a model with K=9 (Supplementary Figure 1). In general, the ancestry proportions distributions among FVG village samples were very similar to other populations in the geographical region, although on higher K values almost all of the FVG populations became dominated by a single component largely specific to any one particular village. The results revealed a substantial level of intra-population structure in most of the FVG region populations manifested in elevated variability in membership to the village-specific ancestry component (Figure 1c). It is likely that this structure is representing the intra-population differences in level of inbreeding rather than specific ancestry. This interpretation gains further support in subsequent analyses where the presence of the village-specific ancestry component was found to be highly correlated with both extended LD range and elevated levels of gROH (Supplementary Figure 7; see below). For further analyses, the village populations were split into sub-populations (referred as GS and IS) based on the prevalence of the village-specific ancestry component (described in Materials and Methods section) and considered as independent populations to more accurately represent the genetic diversity within FVG villages. Next, we conducted a PCA on the European and Near-Eastern ancestry populations (on the global PCA plot[11] all FVG samples clustered onto Europeans; Supplementary Figure 2A). The two first principal components accounted for 1.5% and 0.7% of the genetic variation, whereas clearly separating the population isolates, like Sardinians,[22] French Basques[22] and northern Finns,[2] but also the more homogenous FVG sub-populations (Figure 1; Supplementary Figure 2B). A more detailed picture emerged from the SPA[32] analyzes as follows: (a) SMC has the widest spread, clustering onto the German-speaking populations and partially overlapping with the Slovenians, (b) the GS sub-populations of Clauzetto, Sauris and Illegio cluster between SMC and northern Italians, from which Borbera and Tuscans positioned farther ‘south' and (c) GS sub-populations of Erto and Resia are shifted toward population isolates but cluster less distantly (Figure 2). The clustering of the FVG region samples in both PCA and SPA projections were roughly representative to their geographical location.

Figure 2

Model-based mapping convergence with SPA. Label position indicates the (a) specific PC1 and PC2 coordinate values for each individual and (b) the mean PC1 and PC2 coordinate values for each population. For (a, b), the colors have a following meaning: (1) dark blue color: a homogeneous fraction of the FVG population; a blue color: more general fraction of the FVG population; a red color: other Italian samples; a violet color: Basques; an orange color: Slovenians; and green color: all other populations. For (a, b), the following population abbreviation labels are used: AT, Austrians; BA, French Basques; BG, Bulgarians; BO, Borbera; CA, Carlantino; CL, Clauzetto; CH, Swiss; CZ, Czechs; GR, Germans; ER, Erto; ES, Spaniards; FR, French; HU, Hungarians; IL, Illegio; IT, Italians; JW_A, Ashkenazy Jews; JW_S, Sephardic Jews; OR, Orcadians; RE, Resia; RO, Romanians; SA, Sardinians; SA_, Sauris; SMC, San Martino del Carso; SI, Slovenians; TU, Tuscans. The extra abbreviations: N, northern; S, southern; I, a more homogeneous sub-population; G, a more general sub-population.

To further minimize the bias from within- and maximize the between-group variance, we applied the DAPCs[35] to more precisely identify the genetically closest populations to the FVG groups. For the combined European and Near-Eastern populations, the SNP PCAIM panel had the best fit for KDAPC=7 (Supplementary Figure 3) and the DAPC clearly out-clusters the IS sub-populations from Illegio (C1), Sauris (C3) and Resia (C6), including some of the respective GS sub-population samples as well (Figure 3). The rest of the FVG populations clustered mostly with C2 (dominated by central Europeans) and C5 (dominated manly by Italians), although a minor fraction was also assigned to C4 (dominated by northern Europeans), like Sauris and SMC, and C7 (dominated by Near-Eastern ancestry populations), like Clauzetto, Illegio and Sauris but also other southern Europeans (Figure 3).

Figure 3

Predicted cluster membership for each individual from DAPCs. Model optimum was found on K=7 and individuals are assigned to best-fitting cluster. The size of the rectangle scales with the number of assigned samples. In population names: N, northern; S, southern; I, a more homogeneous sub-population; G, a more general sub-population.

Hierarchical agglomerative clustering analysis based on the between populations pair-wise Fst distances revealed that the IS sub-populations were very distant from all other populations (Supplementary Figure 4). The GS sub-populations showed very small pair-wise Fst distances from the geographically close populations (Supplementary Figure 5). Two main observations were made: (1) the SMC was equally distant from Sauris and Clauzetto as well as from its geographical neighbors, such as Slovenians, and (2) Sauris resembled Clauzetto the most but also the Swiss. All the other FVG sets resembled a larger list of populations (Supplementary Figure 5).

FVG village populations as genetic isolates

Some of the FVG villages reside in very remote valleys and this can lead to genetic isolation, so we next investigated measures of genetic diversity that might reflect a history of relative isolation, elevated levels of inbreeding and/or recent historical bottlenecks. To compare the depth of the isolation in the FVG village sub-populations, we compared the observed values with the established population isolates (like French Basques, Sardinians and northern Finns) but also with the other reference populations. A high level of inner-structure in the FVG populations was revealed by ADMIXTURE (Figure 1c), which was further confirmed by the PCA, SPA and DAPC (Supplementary Figure 2B; Figures 2 and 3). For example, in both the PCA and SPA analyses, the IS FVG sub-populations (except Clauzetto) positioned very distantly from their geographical neighbors (Supplementary Figure 2B). Genetic isolation of FVG populations was further highlighted by DAPC, which assigned three of the IS sub-populations (Illegio, Sauris and Resia) exclusively to respective single village-specific clusters, whereas all other FVG populations were distributed between several clusters (Figure 3). As may be expected from the observed genetic structure, the between populations pair-wise Fst values of the IS sub-populations were extremely high compared with all the other populations (overall mean Fst: Resia 0.023 and Sauris 0.021 vs SMC 0.006) and elevated compared with the previously characterized population isolates (overall mean Fst: Sardinia 0.014; French Basques 0.013 and northern Finns 0.18) (Supplementary Figure 6). The results from the LD analysis revealed a striking difference between the sub-populations drawn from a single village compared with the established population isolates, whereas even the (GS) sub-population of Resia and Illegio showed a strong deviation in LD from the expected southern decline[3] (Figure 4). The LD block length was the shortest for the GS sub-populations of Clauzetto and Sauris, where the extent was comparable with values observed for Slovenians and Swiss.

Figure 4

Genome-wide LD length based on 145 000 SNPs. Each line represents the LD decay averaged across populations and sub-populations and LD (r2) between SNPs shown in 5 kb bins. Sub-populations drawn from same population have the same color coding. A dashed line represents the more general sub-populations and the solid line with rectangles represents the more homogeneous sub-populations respectively. A solid thin line represents the European reference populations (Estonians, Slovenians and Swiss). The LD extent in reference population isolates (Sardinians, French Basques and northern Finns) is also shown with a solid line with rectangles. In population names: I, a more homogeneous sub-population; G, a more general sub-population.

Finally, similar large deviation from the expected values was observed for population mean gROH in IS sub-populations (Figure 5). For example, the mean gROH for the latter set was 47 Mb, compared with 11 Mb in the population isolates cluster, 5.5 Mb in the GS sub-populations and roughly 2 Mb in the European reference populations (Figure 5). The latter is similar to previously published estimates.[36, 37] The same tendency was observed from clumped inbreeding coefficient estimates respectively in above defined clusters: 0.04, 0.028, 0.009 and 0.011 (Supplementary Table 2). These two estimates were highly correlated with the village-specific ancestral components (Supplementary Figure 7) and thus it is probably the decreased diversity because of small Ne in the subsets of villages that is behind the striking structure.

Figure 5

The genomic runs of homozygosity (gROH) based on 145 000 SNPs. (a) Distribution of population mean homozygous segments and mean count of gROH>1.5 Mb per sample. (b) Zoom in view for the region indicated with a dashed ellipse on (a). FVG sub-populations are indicated in bold letters. In population names: I, a more homogeneous sub-population; G, a more general sub-population.

Discussion

In this study, we were interested to see how the FVG village populations fit into the general paradigm of spatial structure of the genetic variation of European populations using 145 000 autosomal SNPs. To this end, we compared the genetic variation of the FVG village populations against a reference panel composed of other Italian, a wider range of European, and a set of Near-Eastern populations. We observed a striking level of genetic structure among the FVG populations (Figure 1c), which likely arises from increased genetic similarity within specific subsets of samples from respective villages and not difference in genetic origin. The FVG-specific components were present in the background profile of all Europeans, which is consistent with the FVG-specific components representing a fraction of the overall genetic variability in Europe. Based on the composition of ancestral components the FVG populations were split into two sets for the subsequent analyses: (1) more general sub-population (GS), which resembled other Italians and populations in close proximity, and (2) more diverse and very homogeneous set (IS). The GS sets were used to further understand the genetic position of the FVG populations on the European genetic canvas and to look for the genetic roots of cultural differences. The two first components in both the PCA and the SPA analyses positioned the FVG samples close within the variation of Italian populations as reported by Li et al.[11] In addition, FVG region showed the smallest Fst distances with the populations in close geographical proximity. For example the SMC, while being an only Italian-speaking village in a Slovenian-speaking region, had equally distant Fst values from Slovenians and other FVG populations, but at the same time had a very broad clustering in the SPA and was consistent with varied level of admixture. This suggests that the original Italian founders of the village have genetically mixed with the neighboring Slovenians and thus on PC plot position in between Italians and Slovenians but have maintained their Italian cultural heritage. From the discriminant analyses of principal components, which effectively minimized the large within-group variability,[33] we found some hints for the origin of Sauris, the historical founders of which are believed to be German speakers from the region adjacent to Tyrol (Supplementary Note). In the DAPC, the Sauris samples positioned to several clusters, mainly Italian-specific C5 and southern European-specific C2 but also in the northern European cluster C4 (also slightly present only for SMC), together with the Germans and Swiss. The smallest Fst distance with the Swiss next to Clauzetto further strengthened the Sauris link with the German-speaking populations. In addition to being intriguing subjects in studies on population history, linguistics and ethnogenesis, population isolates are useful tools for genetic epidemiology studies in quest for finding disease susceptibility alleles. Using isolates is advantageous because of very similar exposure to environmental factors and effects of small Ne – enrichment of some harmful sequence variants because of more pronounced random genetic drift, a higher level of consanguinity and overall genetic homogeneity.[5, 6] In this study, we detected a strong signal for genetic isolation for subsets of the village populations that could only be discovered by analyzing a representative fraction of the village populations. The more homogeneous FVG sub-populations showed more extreme values for all considered measures of isolation compared with Sardinia,[11, 36] French Basques[11, 36] and northern Finns.[2] Most of the extensive genetic homogeneity is explained by the very small Ne within the FVG villages (eg, 105 inhabitants in the village of Erto), which has led to high but varying levels of consanguinity suggested also by the trice as large inbreeding coefficients in GS compared with IS sub-population clusters. Not surprisingly also the more general sub-populations of the FVG, except Clauzetto, demonstrated values in the direction that is indicative of isolation relative to the European reference set for Fst distances,[3] in the extent of LD[3] and levels of genomic background homozygosity.[34, 36, 37] This study has yielded two main results. We have deeply characterized and confirmed the status of genetic isolate for four further populations, which provide a useful tool for gene mapping studies. Second, the genetic structure analyses clearly highlights the need to analyze a large and representative sample to precisely estimate the intragroup variability in a population. Our results show the extent of the genetic diversity and variation within and between populations sampled from northeastern part of Italy while highlighting an extreme level of isolation compared with other genetic isolates.

37 in total

1. Population structure and genome-wide patterns of variation in Ireland and Britain.

Authors: Colm T O'Dushlaine; Derek Morris; Valentina Moskvina; George Kirov; Michael Gill; Aiden Corvin; James F Wilson; Gianpiero L Cavalleri
Journal: Eur J Hum Genet Date: 2010-06-23 Impact factor: 4.246

2. A genome-wide analysis of population structure in the Finnish Saami with implications for genetic association studies.

Authors: Jeroen R Huyghe; Erik Fransen; Samuli Hannula; Lut Van Laer; Els Van Eyken; Elina Mäki-Torkko; Pekka Aikio; Martti Sorri; Matthew J Huentelman; Guy Van Camp
Journal: Eur J Hum Genet Date: 2010-12-08 Impact factor: 4.246

3. Fast model-based estimation of ancestry in unrelated individuals.

Authors: David H Alexander; John Novembre; Kenneth Lange
Journal: Genome Res Date: 2009-07-31 Impact factor: 9.043

4. A rare variant in MYH6 is associated with high risk of sick sinus syndrome.

Authors: Hilma Holm; Daniel F Gudbjartsson; Patrick Sulem; Gisli Masson; Hafdis Th Helgadottir; Carlo Zanon; Olafur Th Magnusson; Agnar Helgason; Jona Saemundsdottir; Arnaldur Gylfason; Hrafnhildur Stefansdottir; Solveig Gretarsdottir; Stefan E Matthiasson; Gu Mundur Thorgeirsson; Aslaug Jonasdottir; Asgeir Sigurdsson; Hreinn Stefansson; Thomas Werge; Thorunn Rafnar; Lambertus A Kiemeney; Babar Parvez; Raafia Muhammad; Dan M Roden; Dawood Darbar; Gudmar Thorleifsson; G Bragi Walters; Augustine Kong; Unnur Thorsteinsdottir; David O Arnar; Kari Stefansson
Journal: Nat Genet Date: 2011-03-06 Impact factor: 38.330

5. Genomic and geographic distribution of SNP-defined runs of homozygosity in Europeans.

Authors: Michael Nothnagel; Timothy Tehua Lu; Manfred Kayser; Michael Krawczak
Journal: Hum Mol Genet Date: 2010-05-12 Impact factor: 6.150

6. Inferring geographic coordinates of origin for Europeans using small panels of ancestry informative markers.

Authors: Petros Drineas; Jamey Lewis; Peristera Paschou
Journal: PLoS One Date: 2010-08-18 Impact factor: 3.240

7. Discriminant analysis of principal components: a new method for the analysis of genetically structured populations.

Authors: Thibaut Jombart; Sébastien Devillard; François Balloux
Journal: BMC Genet Date: 2010-10-15 Impact factor: 2.797

8. Genomic runs of homozygosity record population history and consanguinity.

Authors: Mirna Kirin; Ruth McQuillan; Christopher S Franklin; Harry Campbell; Paul M McKeigue; James F Wilson
Journal: PLoS One Date: 2010-11-15 Impact factor: 3.240

9. The genetic structure of the Swedish population.

Authors: Keith Humphreys; Alexander Grankvist; Monica Leu; Per Hall; Jianjun Liu; Samuli Ripatti; Karola Rehnström; Leif Groop; Lars Klareskog; Bo Ding; Henrik Grönberg; Jianfeng Xu; Nancy L Pedersen; Paul Lichtenstein; Morten Mattingsdal; Ole A Andreassen; Colm O'Dushlaine; Shaun M Purcell; Pamela Sklar; Patrick F Sullivan; Christina M Hultman; Juni Palmgren; Patrik K E Magnusson
Journal: PLoS One Date: 2011-08-04 Impact factor: 3.240

10. A genealogical interpretation of principal components analysis.

Authors: Gil McVean
Journal: PLoS Genet Date: 2009-10-16 Impact factor: 5.917

31 in total

1. Extensive genome-wide autozygosity in the population isolates of Daghestan.

Authors: Tatiana M Karafet; Kazima B Bulayeva; Oleg A Bulayev; Farida Gurgenova; Jamilia Omarova; Levon Yepiskoposyan; Olga V Savina; Krishna R Veeramah; Michael F Hammer
Journal: Eur J Hum Genet Date: 2015-01-21 Impact factor: 4.246

2. The type 2 diabetes associated rs7903146 T allele within TCF7L2 is significantly under-represented in Hereditary Multiple Exostoses: insights into pathogenesis.

Authors: Federica Sgariglia; Elena Pedrini; Jonathan P Bradfield; Tricia R Bhatti; Pio D'Adamo; John P Dormans; Aruni T Gunawardena; Hakon Hakonarson; Jacqueline T Hecht; Luca Sangiorgi; Maurizio Pacifici; Motomi Enomoto-Iwamoto; Struan F A Grant
Journal: Bone Date: 2014-12-09 Impact factor: 4.398

3. Imputation-based assessment of next generation rare exome variant arrays.

Authors: Alicia R Martin; Gerard Tse; Carlos D Bustamante; Eimear E Kenny
Journal: Pac Symp Biocomput Date: 2014

4. Polymorphisms in sweet taste genes (TAS1R2 and GLUT2), sweet liking, and dental caries prevalence in an adult Italian population.

Authors: Antonietta Robino; Lorenzo Bevilacqua; Nicola Pirastu; Roberta Situlin; Roberto Di Lenarda; Paolo Gasparini; Chiara Ottavia Navarra
Journal: Genes Nutr Date: 2015-08-13 Impact factor: 5.523

Review 5. Runs of homozygosity: windows into population history and trait architecture.

Authors: Francisco C Ceballos; Peter K Joshi; David W Clark; Michèle Ramsay; James F Wilson
Journal: Nat Rev Genet Date: 2018-01-15 Impact factor: 53.242

6. Fine-scale human genetic structure in Western France.

Authors: Matilde Karakachoff; Nicolas Duforet-Frebourg; Floriane Simonet; Solena Le Scouarnec; Nadine Pellen; Simon Lecointe; Eric Charpentier; Françoise Gros; Stéphane Cauchi; Philippe Froguel; Nane Copin; Thierry Le Tourneau; Vincent Probst; Hervé Le Marec; Sabrina Molinaro; Beverley Balkau; Richard Redon; Jean-Jacques Schott; Michael Gb Blum; Christian Dina
Journal: Eur J Hum Genet Date: 2014-09-03 Impact factor: 4.246

7. Unravelling the genetic history of Negritos and indigenous populations of Southeast Asia.

Authors: Farhang Aghakhanian; Yushima Yunus; Rakesh Naidu; Timothy Jinam; Andrea Manica; Boon Peng Hoh; Maude E Phipps
Journal: Genome Biol Evol Date: 2015-04-14 Impact factor: 3.416

8. Population genetic differentiation of height and body mass index across Europe.

Authors: Matthew R Robinson; Gibran Hemani; Carolina Medina-Gomez; Massimo Mezzavilla; Tonu Esko; Konstantin Shakhbazov; Joseph E Powell; Anna Vinkhuyzen; Sonja I Berndt; Stefan Gustafsson; Anne E Justice; Bratati Kahali; Adam E Locke; Tune H Pers; Sailaja Vedantam; Andrew R Wood; Wouter van Rheenen; Ole A Andreassen; Paolo Gasparini; Andres Metspalu; Leonard H van den Berg; Jan H Veldink; Fernando Rivadeneira; Thomas M Werge; Goncalo R Abecasis; Dorret I Boomsma; Daniel I Chasman; Eco J C de Geus; Timothy M Frayling; Joel N Hirschhorn; Jouke Jan Hottenga; Erik Ingelsson; Ruth J F Loos; Patrik K E Magnusson; Nicholas G Martin; Grant W Montgomery; Kari E North; Nancy L Pedersen; Timothy D Spector; Elizabeth K Speliotes; Michael E Goddard; Jian Yang; Peter M Visscher
Journal: Nat Genet Date: 2015-09-14 Impact factor: 38.330

9. Multicohort analysis of the maternal age effect on recombination.

Authors: Hilary C Martin; Ryan Christ; Julie G Hussin; Jared O'Connell; Scott Gordon; Hamdi Mbarek; Jouke-Jan Hottenga; Kerrie McAloney; Gonnecke Willemsen; Paolo Gasparini; Nicola Pirastu; Grant W Montgomery; Pau Navarro; Nicole Soranzo; Daniela Toniolo; Veronique Vitart; James F Wilson; Jonathan Marchini; Dorret I Boomsma; Nicholas G Martin; Peter Donnelly
Journal: Nat Commun Date: 2015-08-05 Impact factor: 14.919

10. Xenobiotic sensor- and metabolism-related gene variants in environmental sensitivity-related illnesses: a survey on the Italian population.

Authors: Daniela Caccamo; Eleonora Cesareo; Serena Mariani; Desanka Raskovic; Riccardo Ientile; Monica Currò; Liudmila Korkina; Chiara De Luca
Journal: Oxid Med Cell Longev Date: 2013-07-07 Impact factor: 6.543