Guan K Tay, Andreas Henschel, Gihan Daw Elbait, Habiba S Al Safar.
Abstract
With high consanguinity rates on the Arabian Peninsula, it would not have been unexpected if the population of the United Arab Emirates (UAE) had proved relatively homogeneous. However, this study of 1000 UAE nationals provided a contrasting perspective, one of a relatively heterogeneous population. Because the UAE is located at the apex of Europe, Asia, and Africa, the observed diversity could be explained by a plethora of migration patterns since the first Out-of-Africa movement. A strategy to explore the extent of genetic variation in the population of the UAE is presented. The first step involved a comprehensive population stratification study that was instructive for subsequent whole genome sequencing (WGS) of suitable representatives (which is described elsewhere). When these UAE data were compared to previous smaller studies from the region, the findings were consistent with a diverse and admixed population. However, rather than sharp and distinctive clusters, cluster analysis revealed low levels of stratification throughout the population. The UAE emirates exhibit high ratios of within-Emirate distance to among-Emirate distance. Supervised admixture analysis showed a continuous gradient of ancestral populations, suggesting that admixture on the southeastern tip of the Arabian Peninsula occurred gradually. When visualized using a unique technique that combined admixture ratios and principal component analysis (PCA), previously unappreciated diversity was revealed while mitigating the projection bias of conventional PCA. We observe low population stratification in the UAE in terms of homozygosity versus separation cluster coefficients. This holds for the UAE in a global context as well as for isolated cluster analysis of the Emirati birthplaces. However, the subtle clustering observed in the Emirates reflects geographic proximity and historic migration events.
The analytical strategy used here highlights the complementary nature of data from genotype array and WGS for anthropological studies. Specifically, genotype array data were instructive to select representative subjects for WGS. Furthermore, from the 2.3 million allele frequencies obtained from genotype arrays, we identified 46,481 loci with allele frequencies that were significantly different with respect to other world populations. This comparison of allele frequencies facilitates variant prioritization in common diseases. In addition, these loci bear great potential as biomarkers in anthropological and forensic studies.
Keywords: genetic anthropology; next generation sequencing; population admixture; population genetic variation; population-specific allele frequencies
Year: 2020 PMID: 32595703 PMCID: PMC7304494 DOI: 10.3389/fgene.2020.00608
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
FIGURE 1. Admixture-informed principal component analysis (PCA) plot. Each outlined sample, denoting an individual studied here, is represented as a pie chart of ancestral composition as determined with the Admixture software tool in supervised mode. In addition, HGDP samples from model populations are shown as circles in their respective colors (no outline).
FIGURE 2. Locations of the HGDP model populations and the accumulated ancestral population proportions for the UAE population (shown as a pie chart). The basic (grey) world map was created with mapchart.net.
FIGURE 3. Component histograms for African, Middle Eastern, and Central/South Asian contributions in 1000 Emiratis. All levels of African and Middle Eastern contributions are observed.
FIGURE 4. Dendrogram-ordered admixture barplots. Ancestries and population structuring of UAE individuals revealed by a supervised admixture analysis against eight ancestral world populations.
FIGURE 5. Phylogenetic tree based on neighbor joining and identity-by-state distances, contextualizing the local samples of this study against world populations.
Global AMOVA results as a weighted average over loci.
| Populations | Source of variation | d.f.* | Sum of squares | Variance components | Percentage variation | Fixation index (Fst) |
| UAE’s subpopulations | Among populations | 6 | 324675.507 | 86.21332 | 0.244 | 0.00244 |
| | Among individuals within populations | 763 | 28700974.7 | 980.57196 | 2.77519 | |
| | Within individuals | 800 | 27387119 | 34266.7071 | 96.98081 | |
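The percentage-variation and Fst columns of the AMOVA table can be reproduced from the variance components alone. The sketch below assumes the usual convention that each percentage is a component's share of the total variance and that Fst is the among-population share; the variable names are illustrative and not taken from the study's own pipeline.

```python
# Variance components copied from the AMOVA table above.
variance_components = {
    "among_populations": 86.21332,
    "among_individuals_within_populations": 980.57196,
    "within_individuals": 34266.7071,
}

# Total variance is the sum of the three hierarchical components.
total_variance = sum(variance_components.values())

# Each component's share of the total, expressed as a percentage.
percentages = {
    source: 100.0 * value / total_variance
    for source, value in variance_components.items()
}

# Assumed definition: Fst = among-population variance / total variance.
fst = variance_components["among_populations"] / total_variance

for source, pct in percentages.items():
    print(f"{source}: {pct:.5f}%")
print(f"Fst = {fst:.5f}")
```

Under these assumptions the computed values agree with the table to the reported precision, with well under 1% of the variance lying among the UAE subpopulations.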
Cluster characteristics of the UAE in comparison to (down-sampled) HGDP populations.
| Population | Compactness mean | Compactness std | Heterogeneity mean | Separation mean | t-statistic | p-value |
| UAE | 1.063692236 | 0.00339496 | 0.2459575 | 0.261619731 | 0 | 1 |
| Central South Asia | 1.073312712 | 0.00168414 | 0.24111312 | 0.258789099 | −25.25837531 | 7.64E-64 |
| North Africa | 1.083155129 | 0.0003187 | 0.24629228 | 0.266772749 | −56.79171903 | 1.68E-124 |
| Europe | 1.11681804 | 0.00101625 | 0.23375831 | 0.261065394 | −149.1605526 | 2.29E-205 |
| Sub-Saharan Africa | 1.19385538 | 0.0027968 | 0.25134665 | 0.300070385 | −294.4347462 | 1.46E-263 |
| East Asia | 1.208236732 | 0.0025765 | 0.21478897 | 0.259514893 | −337.4518228 | 2.90E-275 |
| America | 1.384473213 | 0.00336748 | 0.19251678 | 0.266533055 | −667.4734845 | 0 |
| Oceania | 1.390620317 | 0.00047624 | 0.19300375 | 0.268394939 | −948.8630385 | 0 |
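The last two columns of the table above are numerically consistent with a two-sample t-statistic (and its p-value) comparing each population's compactness against the UAE. The sketch below is an inference from the reported numbers, not a method stated in this record: it assumes 100 down-sampling replicates per population, standard deviations reported as population (not sample) values, and an equal-variance pooled test. The function name `pooled_t` and the constant `N_REPLICATES` are hypothetical.

```python
import math

N_REPLICATES = 100  # ASSUMPTION: number of down-sampling replicates per group


def pooled_t(mean_a, sd_a, mean_b, sd_b, n=N_REPLICATES):
    """Equal-variance two-sample t-statistic from summary statistics.

    sd_a and sd_b are treated as population standard deviations (ddof=0),
    so they are rescaled to unbiased variances before pooling.
    """
    var_a = sd_a ** 2 * n / (n - 1)
    var_b = sd_b ** 2 * n / (n - 1)
    pooled_var = (var_a + var_b) / 2.0          # equal group sizes
    standard_error = math.sqrt(pooled_var * 2.0 / n)
    return (mean_a - mean_b) / standard_error


# Compactness mean and std copied from the table above.
uae = (1.063692236, 0.00339496)
central_south_asia = (1.073312712, 0.00168414)
north_africa = (1.083155129, 0.0003187)

t_csa = pooled_t(*uae, *central_south_asia)
t_na = pooled_t(*uae, *north_africa)
print(f"UAE vs Central South Asia: t = {t_csa:.2f}")
print(f"UAE vs North Africa:       t = {t_na:.2f}")
```

Under these assumptions the statistics come out close to the tabulated values for the two populations checked; the negative sign reflects the UAE having the lowest compactness mean in the table.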
FIGURE 6. Color-coded distance matrix, arranged by the clustering dendrogram, between the UAE population and eight ancestral world populations.
FIGURE 7. Color-coded mean Weir-Cockerham FST values. UAE FST values are closest to Central/South Asia, Middle East, Europe, and North Africa. All Emirates exhibit very low FST values amongst each other in comparison to global FST values.
Significantly different allele frequencies and breakdown by SnpEff effect category.
| Z-score | Total | Non-synonymous coding | Start gained | Stop gained | Exon | Intron | Splice-site region | Others |
| Z < -4 | 1261 | 6 | 0 | 0 | 6 | 523 | 0 | 726 |
| Z > 4 | 45,220 | 874 | 6 | 5 | 305 | 18,649 | 15 | 25,366 |
| \|Z\| > 4 | 46,481 | 880 | 6 | 5 | 311 | 19,172 | 15 | 26,092 |
FIGURE 8. Chromosomal locations of the 46,481 variants with significantly different UAE-specific allele frequencies (|Z| > 4, p-value = 6.3 × 10⁻⁵) relative to frequencies in the gnomAD populations.
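The quoted p-value matches the two-sided tail probability of a standard normal distribution at |Z| = 4. A minimal sketch of that conversion follows, assuming the Z-scores are treated as standard normal; the helper name `two_sided_p` is illustrative. It also checks that the two Z-score rows of the table sum to the stated total.

```python
import math


def two_sided_p(z):
    """Two-sided tail probability of a standard normal at |z|."""
    return math.erfc(abs(z) / math.sqrt(2.0))


p_threshold = two_sided_p(4.0)
print(f"p(|Z| > 4) = {p_threshold:.1e}")  # 6.3e-05

# Counts copied from the Z-score table: Z < -4 and Z > 4.
n_lower, n_higher = 1261, 45220
n_total = n_lower + n_higher
print(f"loci with |Z| > 4: {n_total}")  # 46481
```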