BACKGROUND: The main goal of this study was to conduct a comparative population genetic study of Turkish-speaking Iranian Azeris, as the largest ethno-linguistic community, based on polymorphic markers on the Y chromosome. METHODS: One hundred Turkish-speaking Azeri males from north-west Iran (Tabriz, 2008-2009) were selected on the basis that their paternal line had lived in the same region for three generations and that they were unrelated to each other. Samples were collected by mouth swabs, DNA was extracted and multiplex PCR performed, then 12 Single Nucleotide Polymorphisms (SNPs) and 6 Microsatellites (MS) were sequenced. The data obtained were statistically analyzed with Arlequin software. RESULTS: SNP and microsatellite typing results were compared with neighboring Turkish-speaking populations (from Turkey and Azerbaijan) and with Turkmens, representing a possible source group who imposed the Turkish language during the 11th-15th centuries AD. Azeris demonstrated a high level of gene diversity, compatible with patterns registered in the neighboring Turkish-speaking populations, whereas the Turkmens displayed a significantly lower level of genetic variation. The degree of genetic affinity depends primarily on geographic proximity. CONCLUSION: The imposition of the Turkish language on this region was realized predominantly by the process of elite dominance, i.e. by a limited number of invaders who left only a weak patrilineal genetic trace in modern populations of the region.
Keywords:
Iran; Iranian Azeris; Microsatellites; SNPs; Y chromosome diversity
Due to its geo-strategic location in the Middle East, the Iranian plateau has served as a key crossroads for human dispersal and played a critical role in the migratory waves between the populations of the Middle East and beyond (1–5). The most important long-term factor in this process was human adaptation to the Iranian plateau and its geographical, topographical, and climatic conditions, with the subsequent development of agriculture, pastoralism, and pastoral nomadism. The spread of these technological innovations, along with a series of major demographic and historical events, has resulted in a large diversity and dispersal of ethnic groups and languages (1, 6, 7).
Between the third and second millennia BC, the Iranian plateau became exposed to incursions of pastoral nomads from the Central Asian steppes (1), which were a difficult environment for agriculture but ideally suited to animal husbandry and pastoral nomadism. Presumably via an elite-dominance process, the existing Dravidian languages across the region were replaced by Indo-Iranian languages, a branch of the Indo-European family (8–10). The genetic impact of these nomads was as significant as the imposition of their language, as is clearly observed in Iran (11), Pakistan (12) and northern India (13).
In the period from the seventh to the thirteenth centuries AD, the Arab-Muslim, Seljuk and subsequent Turkic-Mongol invasions signaled the arrival of new peoples with their flocks and culture. Specifically, in a series of rapid Arab-Muslim conquests in the seventh century, the Arab armies swept through most of the Middle East, completely engulfing the Persian lands (7).
The dominance of the Arabs came to a sudden end in the mid-eleventh century with the arrival of the Seljuk Turks, a clan of the Oghuz Turks (8). The expanding waves of these Altaic-speaking nomads from Central Asia reached regions farther to the west, such as Iran, Iraq, Anatolia, and the Caucasus, where they imposed Altaic (Turkish) languages. In these western regions, however, their genetic contribution is low or undetectable (14), even though the power of these invaders was sometimes strong enough to impose a language replacement, as in Turkey and Azerbaijan (1).
Later, the Mongol armies also moved westward and, by the early thirteenth century, established their rule over a vast region, including Iran and advancing as far west as the Caucasus and Turkey (1, 7). These waves of invasions and subsequent migrations resulted in major demographic expansions in the region, which added new languages and cultures to the mix of peoples that had already existed in Iran.
In general, considerable genetic diversity is observed in Iranian populations, which resembles the pattern of the Middle East as a whole and strengthens the idea of Persia as the main crossroads for human dispersal (3, 11, 15–18). This area is remarkable for its high level of ethnic and linguistic diversity, comprising the major language families (Indo-European, Altaic, and Afro-Asiatic) currently spoken by more than seventy ethnically different populations (http://www.ethnologue.org/). This demonstrates the role that Iran played in population dispersal across the latitudinal belt spanning from western Anatolia to the Indus Valley. However, there have been gaps in high-resolution genetic analyses for this region that would uncover population history at a fine scale, for example, for particular ethnic and linguistic groups.
In this project, we intended to obtain relatively comprehensive information about Y chromosome diversity in Azeris living in Iran. The principal aim of this paper was to identify the place of Azeris among the Turkic-speaking populations of the Middle East and to test the extent of gene flow from Central Asia. We used both SNP and STR genetic markers on the non-recombining portion of the Y chromosome, which provide a high level of genetic resolution and are consistent with other sets of markers applied in studies of the patrilineal genetic history of various populations (19, 20).
Material and Methods
Buccal swab specimens were collected from 100 ethnic Azeri men living in Tabriz, Iran, in 2008. One sample was later discarded, as the Y-chromosome typing was unsuccessful.
Donors were selected only if their paternal grandfathers were from the same region and they were unrelated to other donors at the grandfather level. The samples were stored for transport in a DNA preservative solution consisting of 0.5% sodium dodecyl sulphate and 0.05 M ethylenediaminetetraacetic acid. Samples were collected anonymously, and informed consent was obtained from all individuals before samples were taken.
The comparative data sets were taken from previously published papers (19, 21, 22). The results for the Turkmen population are available only for microsatellite loci; therefore the comparison with this data set was conducted only for STR markers.
Standard phenol-chloroform DNA extractions were performed. The strategy adopted for typing samples was designed to ensure informative comparison with existing published data. The non-recombining portion of the Y chromosome (NRY) was characterized by 12 binary Y chromosome polymorphisms: 92R7, M9, M13, M20, sY81, SRY+465, SRY4064, SRY10831, Tat, M17, Alu insert-YAP, and p12f2 (19, 23), and screened for six microsatellite (MS) markers: DYS19, DYS388, DYS390, DYS391, DYS392, and DYS393, as described by Thomas et al. (24). Haplogroups (hg) were defined by single nucleotide polymorphism (SNP) markers according to the Y Chromosome Consortium nomenclature (25). Microsatellite repeat numbers were assigned according to Kayser et al. (26).
The microsatellite PCR products and UEP (unique event polymorphism) digestion products were run either on an ABI-310 capillary-based genetic analyzer or on a gel-based system such as an ABI-377 automated sequencer. For the ABI-310 genetic analyzer, 1.2 µl aliquots of the microsatellite PCR products or UEP digestion products were mixed with 0.5 µl of size standard labeled with the fluorescent dye TAMRA (PE-Applied Biosystems) and 12 µl of deionized formamide. Samples were denatured at 96 °C for 3 min and chilled on ice for 5 min before being run, using POP-4 polymer and a 36 cm POP-4 capillary. For the ABI-377 automated sequencer, 1.0 µl aliquots of the microsatellite PCR products or UEP digestion products were mixed with 2.0 µl of a loading buffer (formamide : dextran blue : TAMRA-labeled size standard, in the ratio 23:4:2).
The unbiased genetic diversity index, h, and its standard error were calculated using the formulae of Nei (27). Nei’s Genetic Identity, I, was calculated in accordance with Nei (27). Pairwise genetic distances (FST) were estimated from analysis of molecular variance (AMOVA) ΦST values with the aid of Arlequin software (28). Tests for significant population differentiation were carried out using the exact test for population differentiation (27, 29). Differences in h between populations were tested by a bootstrapping method (30).
Principal Coordinates Analysis was conducted on similarity matrices calculated as one minus genetic distance (FST, RST) or based on Nei’s Genetic Identity values. Values along the main diagonal, representing the similarity of each population sample to itself, were calculated from the estimated genetic distance between two copies of the same sample.
Signature haplotype analysis (high-frequency modal haplotypes and modal clusters) (19, 31–33) was performed by hand.
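The unbiased gene diversity index mentioned above can be illustrated with a minimal Python sketch. It assumes the commonly used form of Nei's estimator, h = n/(n-1) * (1 - sum of squared haplotype frequencies); the function name and the toy sample are hypothetical and are not data from this study, and the standard error is not computed here.

```python
from collections import Counter


def nei_gene_diversity(haplotypes):
    """Unbiased gene diversity: h = n / (n - 1) * (1 - sum(p_i ** 2)).

    `haplotypes` is a list with one haplotype label per sampled chromosome.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    counts = Counter(haplotypes)
    # Sum of squared relative frequencies of each distinct haplotype.
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical toy sample of six chromosomes (illustration only).
toy_sample = (
    ["14-15-23-10-11-12"] * 3
    + ["13-12-24-10-11-13"] * 2
    + ["15-12-24-10-11-13"]
)
h = nei_gene_diversity(toy_sample)  # 11/15, about 0.733
print(round(h, 4))
```

With this estimator a sample in which every chromosome carries a different haplotype gives h = 1, and a sample with a single haplotype gives h = 0.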
Results
The number of haplogroups detected is highest in Azeris (nine), compared with seven in Eastern Turks and only five in Azerbaijanis. The latter might be explained by the limited number of specimens in this data set. Nevertheless, we tried to make some inferences about the genetic structure of these groups based on haplogroup frequencies.
The most common haplogroup in all data sets is haplogroup J, which is present at almost equal frequency in the three groups: 39.39% in Azeris, 40.00% in Azerbaijanis and 39.02% in Eastern Turks. The next most frequent haplogroup is hg BR*(xDE,JR), which is registered at 23.23%, 40.00%, and 9.76%, respectively. The haplogroups P*(xR1a), E*(xE3a) and R1a1 are also rather frequent in Eastern Turks (19.51%, 14.63% and 10.98%, respectively). It should be added that haplogroup N3, defined by the Tat mutation, which presumably originated in Central Asia (34), is detected only in Azeris and Azerbaijanis.
The overall comparison of population structures between the groups shows that Azerbaijanis differ significantly from Eastern Turks according to the exact test of population differentiation (P < 0.0001); at the same time, there is no significant difference between the Iranian Azeris and the two other comparative data sets, Azerbaijanis and Eastern Turks (P > 0.05). This pattern of genetic relatedness is also supported by FST values, which indicate closer genetic affinity between Azeris and Eastern Turks, as well as between Azeris and Azerbaijanis; the largest genetic distance was detected between Azerbaijanis and Eastern Turks.
At the STR level, Turkmens could be added to the comparative data sets. Microsatellite markers identified 137 haplotypes in total across the four groups. The most diverse pattern of haplotypes is observed in Azerbaijanis and Azeris (32 haplotypes in 40 samples, and 76 in 99, respectively). In contrast, Eastern Turks and Turkmens display a much lower level of variability (49 in 82 and 18 in 51, respectively). It is worth mentioning that Azeris bear 52 unique haplotypes, i.e. haplotypes not encountered in the other three data sets; only two haplotypes were shared by all groups considered.
The above-mentioned pattern of genetic variability is reflected in the actual values of gene diversity, h (Fig. 1). The Azeris show the highest level of genetic diversity (h = 0.9934, bootstrapped value 0.9834). A dramatically lower value of this parameter was registered for Turkmens (actual value 0.8267, bootstrapped value 0.8068). Azerbaijanis and Eastern Turks also have rather high levels of gene diversity, although still lower than in Azeris. When comparing h values with the bootstrap method, we found two non-significant differences: between Azerbaijanis and Azeris, and between Azerbaijanis and Eastern Turks. With the Bayesian approach, all possible comparisons show significant differences (P < 0.0001, Table 1).
Fig. 1:
Genetic diversity, h, with bootstrap 95% confidence intervals
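How a bootstrap confidence interval for h can be obtained is sketched below in Python. This is only an illustration under stated assumptions: a simple percentile bootstrap over sampled chromosomes, with hypothetical function names, a hypothetical toy sample, and an arbitrary number of replicates; it is not necessarily the procedure of reference (30).

```python
import random
from collections import Counter


def gene_diversity(sample):
    """Unbiased gene diversity, h = n / (n - 1) * (1 - sum(p_i ** 2))."""
    n = len(sample)
    counts = Counter(sample)
    return n / (n - 1) * (1.0 - sum((c / n) ** 2 for c in counts.values()))


def bootstrap_h(sample, n_boot=2000, seed=0):
    """Percentile bootstrap: resample chromosomes with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    values = sorted(
        gene_diversity([rng.choice(sample) for _ in range(n)])
        for _ in range(n_boot)
    )
    lower = values[int(0.025 * n_boot)]
    upper = values[int(0.975 * n_boot) - 1]
    mean = sum(values) / n_boot
    return mean, lower, upper


# Hypothetical toy sample: 40 chromosomes, one haplotype seen 11 times.
toy = [f"ht{i}" for i in range(30)] + ["ht0"] * 10
observed = gene_diversity(toy)
boot_mean, ci_low, ci_high = bootstrap_h(toy)
print(round(observed, 4), round(boot_mean, 4), round(ci_low, 4), round(ci_high, 4))
```

Because resampling with replacement duplicates some chromosomes, the bootstrap mean of h tends to fall slightly below the observed value, which is the direction of the gap between actual and bootstrapped values reported in the text.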
Table 1:
Pairwise differences in h values based on SNP+MS haplotypes (lower left triangle, based on the bootstrap approach; upper right triangle, based on the Bayesian method)
| | Azeri (n=99) | Azerb (n=40) | ET (n=82) | Turkmen (n=51) | h |
|---|---|---|---|---|---|
| Azeri | - | 0.0000 | 0.0000 | 0.0000 | 0.9934 |
| Azerb | 0.5721 | - | 0.0000 | 0.0000 | 0.9885 |
| ET | 0.0122 | 0.9004 | - | 0.0000 | 0.9744 |
| Turkmen | 0.0000 | 0.0001 | 0.0003 | - | 0.8267 |
Significant values are underlined
The AMOVA analysis revealed that the bulk of the observed genetic diversity is explained by within-population variation (94.16%), and only about 5.84% reflects inter-population variability. Once again, this result supports the general rule that within-group variability is the main source of human genetic diversity.
The Azeri modal haplotype, ht 14-15-23-10-11-12, is detected at 5.05% and is also modal in the Azerbaijani data set (7.50%). A comparable frequency of this haplotype is found in Eastern Turks (6.10%), while in Turkmens it is at 1.96%. The modal haplotypes in the Eastern Turks data set, ht 13-12-24-10-11-13 and ht 14-12-24-11-13-12, each at 8.54%, are also detected in Azerbaijanis (2.50% each) and in Azeris (3.03% and 2.02%, respectively), and are absent in Turkmens. The two modal haplotypes of Turkmens, ht 15-12-24-10-11-13 (35.29%) and ht 16-12-24-10-11-13 (21.57%), occur outside Turkmens only in Azeris, and at very low frequency (2.02% and 1.01%, respectively).
These two haplotypes are one-step neighbors and therefore might be considered as one modal cluster. The modal cluster in Turkmens, the most pronounced one, accounts for 60.78%; it was found at low frequency only in Azeris (3.03%) and is totally absent in Azerbaijanis and Eastern Turks (Table 2). The modal cluster of Azeris and Azerbaijanis, being the same, accounts for 14.14% and 20.00%, respectively; it is found at a comparable frequency in Eastern Turks (9.76%) and at a much lower level in Turkmens (1.96%). Eastern Turks’ modal clusters, accounting for 10.98% and 9.76%, are present only in neighboring populations (Azerbaijanis and Azeris) and are absent in Turkmens.
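The notion of one-step neighbors and modal clusters used above can be sketched in Python. The sketch assumes that two microsatellite haplotypes are one-step neighbors when they differ by a single repeat unit at exactly one locus, and that a modal cluster is the modal haplotype plus all of its one-step neighbors; the function names are hypothetical, and only the frequencies quoted in the text are used.

```python
def parse_haplotype(text):
    """Turn '15-12-24-10-11-13' into a tuple of repeat counts."""
    return tuple(int(x) for x in text.split("-"))


def is_one_step_neighbor(a, b):
    """True if a and b differ by one repeat unit at exactly one locus."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    return len(a) == len(b) and sum(diffs) == 1


def cluster_frequency(modal, frequencies):
    """Summed frequency of the modal haplotype and its one-step neighbors."""
    return sum(
        freq
        for ht, freq in frequencies.items()
        if ht == modal or is_one_step_neighbor(ht, modal)
    )


# Turkmen frequencies quoted in the text (only three haplotypes listed).
turkmen = {
    parse_haplotype("15-12-24-10-11-13"): 0.3529,
    parse_haplotype("16-12-24-10-11-13"): 0.2157,
    parse_haplotype("14-15-23-10-11-12"): 0.0196,
}
modal = parse_haplotype("15-12-24-10-11-13")
print(round(cluster_frequency(modal, turkmen), 4))  # 0.5686
```

The two haplotypes listed in the text sum to 56.86%; the reported cluster frequency of 60.78% presumably includes further one-step neighbors that are not listed individually.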
Table 2:
Frequently Encountered Clusters
| MS | Azerb (n=40) | Azeri (n=99) | ET (n=82) | Turkmen (n=51) |
|---|---|---|---|---|
| 14 15 23 10 11 12 | 0.2000 | 0.1414 | 0.0976 | 0.0196 |
| 13 12 24 10 11 13 | 0.0250 | 0.0606 | 0.1098 | 0.0000 |
| 14 12 24 11 13 12 | 0.1000 | 0.0303 | 0.0976 | 0.0000 |
| 15 12 24 10 11 13 | 0.0000 | 0.0303 | 0.0000 | 0.6078 |
Significant values are underlined
The principal coordinates plot based on RST values (Fig. 2) displays some important patterns of relationships between the groups. First, it is rather evident that Turkmens are almost equally distant from the rest of the Turkic-speaking populations studied. At the same time, Azeris, Azerbaijanis, and Eastern Turks form a dense cluster, possibly reflecting closer genetic contacts among these groups than with Central Asian Turkic-speaking peoples. When PCO plots are constructed from SNP+MS haplotypes, the relationship between the Middle Eastern populations becomes more refined and is in full accordance with the results of the genetic distance comparison.
Once again, the plot indicates that Azeris occupy an intermediate position between their close neighbors, which may reflect to some extent a common origin and/or intense genetic contacts since ancient times. These relationships between the populations are fully supported by the exact test of population differentiation (Table 3, based on MS data only).
Table 3:
P values for the exact test of population differentiation based on MS data
| | Azerb (n=40) | Azeri (n=99) | ET (n=82) | Turkmen (n=51) |
|---|---|---|---|---|
| Azerb | - | | | |
| Azeri | 0.363 | - | | |
| ET | 0.009 | 0.060 | - | |
| Turkmen | 0.000 | 0.000 | 0.000 | - |
Genetic distance data (Rst based on MS only) were used to visualize the spatial relationships of the groups (Fig. 2).
Fig. 2:
Principal coordinates plot based on RST values
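A principal coordinates plot of this kind can be produced from a matrix of pairwise distances. The Python sketch below uses the textbook (Gower) form of the method applied directly to distances, which is an assumption and differs in presentation from the similarity matrices (one minus genetic distance) described in the Methods; the function name, the use of NumPy, and the toy distance matrix are all hypothetical and do not reproduce the values behind Fig. 2.

```python
import numpy as np


def pcoa(distance, n_axes=2):
    """Classical principal coordinates analysis of a distance matrix."""
    d = np.asarray(distance, dtype=float)
    n = d.shape[0]
    # Double-center the matrix of squared distances.
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(b)
    # Keep the axes with the largest eigenvalues.
    order = np.argsort(eigvals)[::-1][:n_axes]
    # Genetic distances need not be Euclidean, so negative eigenvalues
    # can occur; they are clipped to zero in this sketch.
    vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(vals)


# Hypothetical toy example: four "populations" placed on a plane, so the
# first two principal coordinates should reproduce their distances.
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]])
toy_dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
coords = pcoa(toy_dist)
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
print(np.allclose(recovered, toy_dist))
```

The orientation and sign of the axes are arbitrary; only the relative positions of the populations are meaningful.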
Discussion
The results obtained show that the degree of genetic relatedness between the populations considered depends primarily on spatial proximity rather than on membership of the same linguistic group. These outcomes were to be expected given the actual geographic location of the three populations. Azeris, being situated between Azerbaijanis and Eastern Turks, had more opportunities for genetic contact with both groups as their closest neighbors, while gene flow between Azerbaijan and Eastern Turkey could have been rather limited.
In general, we can make rather strong inferences about the genetic relatedness between the populations under consideration. The principal one is that Iranian Azeris have much weaker genetic affinity with Turkmens than with their immediate neighbors. The same holds for Azerbaijanis and Eastern Turks. It seems that Turkmens made no marked contribution to the gene pool of Azeris, Azerbaijanis, and Eastern Turks, despite the very close linguistic affinity of these groups, all of which are Turkic-speaking. We have good grounds to suggest that language replacement took place through elite dominance rather than demic diffusion (35).
The weak genetic affinity between Middle Eastern Turkic-speaking populations and Turkmens is possibly explained by the absence of any substantial gene flow from Central Asian populations into the ancestors of the Turkic-speaking peoples of the South Caucasus and Asia Minor, which is also supported by the results of Cinnioğlu et al. (21). Therefore, the imposition of the Turkic language on this region was realized predominantly by a limited number of invaders who left only a weak genetic signal in modern populations of the region. The same pattern of geographic vs. genetic relatedness was revealed when comparing Indo-European-speaking Bakhtiari and Semitic-speaking Arabs (36). Both mtDNA and the Y chromosome showed a close relationship of these groups with each other and with neighboring geographic groups, irrespective of the language spoken. Moreover, Semitic-speaking North African groups are genetically more distant from Semitic-speaking groups from the Near East and Iran. Similar results were recently obtained in north-west Iran: the Uromian people (an Iranian Muslim group) display a particularly close genetic relationship to the Armenians living in the same area (4). Thus, geographical proximity explains genetic relatedness between populations better than linguistic relatedness in this part of the world.
As shown in our recently published results (37), based on multivariate classification, Iranian Azeris and their close neighbors, Persians and Armenians, form a rather distinct cluster of Middle Eastern origin. This pattern was obtained with both Principal Coordinate Analysis and the Neighbor-Joining method for phylogenetic inference.
As a whole, the results obtained indicate that the set of genetic markers used is an appropriate tool for population genetic studies of such an ethnically and linguistically complex area as the Middle East. The methods applied made it possible to distinguish fine specific features of each population and to make inferences about their origin and possible ancient genetic contacts.
We also realize that the Y chromosome represents only one locus in the human genome and describes only one, patrilineal, facet of the genetic history of human populations. More comprehensive results on the origin, ancient relationships, and migrations of the populations of the Middle East can be achieved by using other complementary genetic systems, i.e. mtDNA and autosomal markers. Nevertheless, the Y chromosome markers provide rather strong information on the genetic history of the populations studied in this project. In addition, the main outcome of the project is that all three populations could be considered indigenous representatives of the areas they inhabit, with very limited genetic influence from the East.
Ethical Considerations
Ethical issues including plagiarism, informed consent, misconduct, data fabrication and/or falsification, double publication and/or submission, redundancy, etc. have been completely observed by the authors.
Authors: M G Thomas; T Parfitt; D A Weiss; K Skorecki; J F Wilson; M le Roux; N Bradman; D B Goldstein Journal: Am J Hum Genet Date: 2000-02 Impact factor: 11.025
Authors: Cengiz Cinnioğlu; Roy King; Toomas Kivisild; Ersi Kalfoğlu; Sevil Atasoy; Gianpiero L Cavalleri; Anita S Lillie; Charles C Roseman; Alice A Lin; Kristina Prince; Peter J Oefner; Peidong Shen; Ornella Semino; L Luca Cavalli-Sforza; Peter A Underhill Journal: Hum Genet Date: 2003-10-29 Impact factor: 4.132
Authors: M Kayser; A Caglià; D Corach; N Fretwell; C Gehrig; G Graziosi; F Heidorn; S Herrmann; B Herzog; M Hidding; K Honda; M Jobling; M Krawczak; K Leim; S Meuser; E Meyer; W Oesterreich; A Pandya; W Parson; G Penacino; A Perez-Lezaun; A Piccinini; M Prinz; C Schmitt; L Roewer Journal: Int J Legal Med Date: 1997 Impact factor: 2.686
Authors: T Zerjal; B Dashnyam; A Pandya; M Kayser; L Roewer; F R Santos; W Schiefenhövel; N Fretwell; M A Jobling; S Harihara; K Shimizu; D Semjidmaa; A Sajantila; P Salo; M H Crawford; E K Ginter; O V Evgrafov; C Tyler-Smith Journal: Am J Hum Genet Date: 1997-05 Impact factor: 11.025
Authors: J F Wilson; D A Weiss; M Richards; M G Thomas; N Bradman; D B Goldstein Journal: Proc Natl Acad Sci U S A Date: 2001-04-03 Impact factor: 11.205
Authors: J R Luis; D J Rowold; M Regueiro; B Caeiro; C Cinnioğlu; C Roseman; P A Underhill; L L Cavalli-Sforza; R J Herrera Journal: Am J Hum Genet Date: 2004-02-17 Impact factor: 11.025