Wibhu Kutanan, Jatupol Kampuansai, Metawee Srikummool, Daoroong Kangwanpong, Silvia Ghirotto, Andrea Brunelli, Mark Stoneking.
Abstract
The Tai-Kadai (TK) language family is thought to have originated in southern China and spread to Thailand and Laos, but it is not clear if TK languages spread by demic diffusion (i.e., a migration of people from southern China) or by cultural diffusion, with native Austroasiatic (AA) speakers switching to TK languages. To address this and other questions, we obtained 1234 complete mtDNA genome sequences from 51 TK and AA groups from Thailand and Laos. We find high genetic heterogeneity across the region, with 212 different haplogroups, and significant genetic differentiation among different samples from the same ethnolinguistic group. TK groups are more genetically homogeneous than AA groups, with the latter exhibiting more ancient/basal mtDNA lineages, and showing more drift effects. Modeling of demic diffusion, cultural diffusion, and admixture scenarios consistently supports the spread of TK languages by demic diffusion.
Year: 2016 PMID: 27837350 PMCID: PMC5214972 DOI: 10.1007/s00439-016-1742-y
Source DB: PubMed Journal: Hum Genet ISSN: 0340-6717 Impact factor: 4.132
Fig. 1 Map showing the geographic locations of the studied populations and their language family affiliation. Bar plots illustrate the relative frequency of major haplogroups by population. Dark and white shades show haplogroups B, F and M7, which are specific to Southeast Asian populations, whereas the remaining haplogroups (D, M12, M20, M24, M74, R9, R22 and other haplogroups) are represented by various colors
Fig. 2 The MDS plot of dimension 1 vs. dimension 2 (a, c) and dimension 1 vs. dimension 3 (b, d) based on the Φst genetic distance matrix among the entire set of 51 populations (a, b) and after removal of three outliers, namely TN1, TN2 and SK (c, d). Population abbreviations are provided in Fig. 1. Triangles and circles represent TK- and AA-speaking populations, respectively. Black, red, dark blue and pink indicate the Northern, Northeastern, Central and Western geographic regions of Thailand, respectively; green indicates the two Lao populations
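As an illustration of how a pairwise distance matrix such as Φst can be turned into plot coordinates, the following is a minimal sketch of classical (Torgerson) MDS. It is an assumption that this variant is comparable to the MDS used for the figure; the source does not state which MDS algorithm was applied. The function names and the toy distance matrix are hypothetical and are not values from the study.

```python
import math

def double_center(dist):
    """Return B = -0.5 * J * D^2 * J for a symmetric distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]

def mat_vec(mat, vec):
    """Multiply a square matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in mat]

def top_eigenpair(mat, iters=500):
    """Power iteration; adequate for a small positive semi-definite toy matrix."""
    vec = [1.0 + 0.1 * i for i in range(len(mat))]  # arbitrary non-constant start
    for _ in range(iters):
        nxt = mat_vec(mat, vec)
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    value = sum(v * w for v, w in zip(vec, mat_vec(mat, vec)))
    return value, vec

def classical_mds(dist, dims=2):
    """Embed points so that Euclidean distances approximate `dist`."""
    n = len(dist)
    b = double_center(dist)
    coords = [[] for _ in range(n)]
    for _ in range(dims):
        value, vec = top_eigenpair(b)
        scale = math.sqrt(max(value, 0.0))
        for i in range(n):
            coords[i].append(vec[i] * scale)
        # Deflate so the next pass finds the next-largest eigenpair.
        b = [[b[i][j] - value * vec[i] * vec[j] for j in range(n)]
             for i in range(n)]
    return coords

# Hypothetical toy distances (corners of a 3-by-4 rectangle), NOT study data.
toy = [
    [0.0, 3.0, 4.0, 5.0],
    [3.0, 0.0, 5.0, 4.0],
    [4.0, 5.0, 0.0, 3.0],
    [5.0, 4.0, 3.0, 0.0],
]
coords = classical_mds(toy, dims=2)
```

Because the toy distances are exactly Euclidean, two dimensions recover them; real Φst matrices are generally not Euclidean, so a low-dimensional plot is only an approximation and outlying populations can dominate the leading dimensions.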
Analysis of molecular variance (AMOVA) results
| Grouping | Number of groups | Percent variation: among groups | Percent variation: among populations (within group) | Percent variation: within populations |
|---|---|---|---|---|
| Geography | ||||
| Geography 1a | 5 | 0.07 | 7.63** | 92.3** |
| Geography 2b | 4 | 0.36 | 7.77** | 91.86** |
| Northern Thailand | 1 | – | 7.76** | 92.24 |
| Northeastern Thailand | 1 | – | 8.69** | 91.31 |
| Central Thailand | 1 | – | 6.83** | 93.17 |
| Western Thailand | 1 | – | −0.43 | 100.43 |
| Laos | 1 | – | 0.66** | 99.34 |
| Language | ||||
| Language 1c | 2 | 0.49* | 7.42** | 92.1** |
| Language 2d | 6 | 2.56** | 6.01** | 91.43** |
| Language 3e | 10 | 2.42** | 5.68** | 91.9** |
| Austroasiatic | 1 | – | 11.44** | 88.56 |
| Tai–Kadai | 1 | – | 4.74** | 95.26 |
| Ethnicity | ||||
| Mon | 1 | – | 7.1** | 92.9 |
| H’tin | 1 | – | 25.71** | 74.29 |
| Lawa | 1 | – | 7.78** | 92.22 |
| Khmer | 1 | – | 11.10** | 88.90 |
| Khon Mueang | 1 | – | 3.43** | 96.57 |
| Lao Isan | 1 | – | 2.31** | 97.69 |
| Phuan | 1 | – | 5.29** | 94.71 |
* Significant at 0.05 level; ** significant at 0.01 level
a Geography 1: (Northern Thailand, Northeastern Thailand, Central Thailand, Western Thailand, Laos)
b Geography 2: (Northern Thailand, Northeastern Thailand, Central Thailand, Western Thailand)
c Language 1: (Austroasiatic, Tai–Kadai)
d Language 2: (Northern Tai, Southwestern Tai, Monic, Southern Monic, Eastern Mon–Khmer, Northern Mon–Khmer)
e Language 3: (Northern Tai, Chiang Saen, Lao–Phutai, Northwestern Tai, Monic, Southern Monic, Palaungic, Khmuic, Khmer, Katuic)
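The percentages in each AMOVA row partition the total variance, so they should sum to roughly 100%. The sketch below checks this for a few rows copied from the table; the dictionary layout and function name are illustrative assumptions. It also shows why the Western Thailand row can report a within-population value above 100%: the among-population component is slightly negative, which can happen in AMOVA when the true component is close to zero.

```python
# Rows copied from the AMOVA table: (among groups, among populations, within populations).
# None marks the "–" entries for single-group analyses.
amova_rows = {
    "Geography 1": (0.07, 7.63, 92.3),
    "Geography 2": (0.36, 7.77, 91.86),
    "Language 1": (0.49, 7.42, 92.1),
    "Language 2": (2.56, 6.01, 91.43),
    "Language 3": (2.42, 5.68, 91.9),
    "Western Thailand": (None, -0.43, 100.43),
    "H'tin": (None, 25.71, 74.29),
}

def total_percent(components):
    """Sum the reported variance percentages, skipping missing entries."""
    return sum(c for c in components if c is not None)

totals = {name: round(total_percent(c), 2) for name, c in amova_rows.items()}

for name, total in totals.items():
    print(f"{name}: {total:.2f}%")
```

Small departures from 100 (for example 99.99 or 100.01) are consistent with rounding of the published values.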
Fig. 3 The MDS plot of dimension 1 vs. dimension 2 based on the Φst genetic distance matrix from mtDNA genomes among the presently studied populations and other populations from the literature. Population abbreviations are provided in Fig. 1 and Table S2
The Bayesian estimates (BE) of coalescence times with 95% credible intervals (CI) for each haplogroup
| Haplogroup | Sample size | BE | CI |
|---|---|---|---|
| A | 17 | 24,401 | 16,499–33,138 |
| A14 | 14 | 18,176 | 11,437–25,939 |
| A17 | 10 | 14,071 | 7976–20,878 |
| B4 | 74 | 34,814 | 30,445–46,173 |
| B4a1c4 | 14 | 10,240 | 6182–14,487 |
| B4b1a2 | 19 | 13,455 | 8215–19,131 |
| B4b1a2a | 17 | 9067 | 4283–11,449 |
| B4c2 | 11 | 10,623 | 7107–17,631 |
| B4g | 10 | 19,684 | 12,839–26,407 |
| B4e | 7 | 15,661 | 11,310–24,892 |
| B5 | 162 | 36,397 | 24,836–46,990 |
| B5a | 160 | 20,252 | 13,196–27,886 |
| B5a1 | 158 | 16,857 | 11,693–22,532 |
| B5a1a | 65 | 9465 | 7267–11,972 |
| B5a1b1 | 26 | 8507 | 6438–10,686 |
| B5a1d | 52 | 8705 | 6641–11,077 |
| B6a | 26 | 34,428 | 24,086–47,839 |
| CZ | 38 | 37,711 | 26,934–48,685 |
| C7 | 32 | 18,599 | 12,417–26,106 |
| D | 58 | 34,847 | 26,392–44,310 |
| D4 | 52 | 25,375 | 20,235–31,447 |
| D5 | 6 | 23,206 | 16,365–30,866 |
| F1a | 184 | 17,825 | 12,565–23,276 |
| F1a1a | 134 | 10,075 | 7755–11,701 |
| F1a1a1 | 69 | 8817 | 7092–10,643 |
| F1a1d | 17 | 6676 | 3163–9231 |
| F1a3 | 15 | 7305 | 3495–10,057 |
| F1f | 65 | 12,517 | 7000–15,389 |
| F3a1 | 15 | 21,808 | 13,903–31,295 |
| G | 6 | 28,215 | 18,885–39,320 |
| H14 | 4 | 1685 | 162–4576 |
| M4 | 2 | 752 | 0–3414 |
| M5 | 8 | 36,248 | 26,787–46,432 |
| M7 | 134 | 50,282 | 39,494–62,123 |
| M7b | 106 | 38,342 | 27,442–51,252 |
| M7b1a1 | 104 | 16,723 | 12,570–21,211 |
| M7b1a1a3 | 27 | 12,659 | 8873–18,282 |
| M7b1a1b | 17 | 12,098 | 5973–19,336 |
| M7b1a1 (16192T) | 15 | 11,180 | 6323–17,000 |
| M7b1a1e1 | 13 | 5936 | 2224–11,313 |
| M7c | 28 | 30,547 | 21,905–41,116 |
| M7c1 | 21 | 21,657 | 14,519–29,420 |
| M7c1a | 12 | 3656 | 997–7882 |
| M7c2 | 7 | 8092 | 4066–14,357 |
| M8a2a1 | 5 | 12,325 | 5976–19,514 |
| M9 | 11 | 26,510 | 18,450–35,947 |
| M10a1b | 3 | 1478 | 48–4574 |
| M12-G | 35 | 53,006 | 42,129–65,779 |
| M12 | 29 | 37,225 | 29,530–46,002 |
| M12a1 | 20 | 31,096 | 24,221–38,387 |
| M12a1a | 15 | 23,184 | 16,770–30,030 |
| M12a1b | 5 | 24,369 | 17,342–31,650 |
| M12b | 14 | 27,475 | 19,665–35,577 |
| M17 | 7 | 40,440 | 29,244–52,628 |
| M20 | 29 | 12,229 | 7521–18,355 |
| M21b | 8 | 29,030 | 20,712–38,392 |
| M24 | 21 | 19,305 | 12,300–28,703 |
| M24a | 12 | 7550 | 2961–14,017 |
| M24b | 9 | 10,000 | 5175–15,821 |
| M45 | 3 | 21,338 | 11,949–32,348 |
| M49 | 4 | 23,544 | 14,606–33,592 |
| M51 | 11 | 30,097 | 21,140–40,588 |
| M57a | 2 | 764 | 0–3524 |
| M59 | 3 | 13,391 | 6372–22,559 |
| M61 | 8 | 2987 | 595–6794 |
| M68a | 2 | 16,056 | 8227–25,864 |
| M71 | 17 | 28,170 | 21,736–36,130 |
| M71 (151T) | 12 | 27,643 | 19,633–35,905 |
| M72a | 9 | 9073 | 4409–15,129 |
| M73 | 5 | 3143 | 630–6295 |
| M74 | 32 | 34,866 | 26,622–44,683 |
| M76 | 7 | 33,689 | 22,405–47,078 |
| M79 | 2 | 804 | 0–3499 |
| M91 | 5 | 34,931 | 23,358–48,322 |
| M* | 8 | 49,923 | 38,466–63,413 |
| N8 | 4 | 3116 | 683–7162 |
| N9a | 31 | 25,754 | 18,075–33,982 |
| N9a6 | 7 | 12,056 | 6415–18,767 |
| N9a10 | 16 | 17,059 | 11,630–22,635 |
| N9a10 (16311C) | 14 | 13,741 | 8569–19,217 |
| N10 | 8 | 52,013 | 37,525–68,350 |
| N10a | 7 | 11,312 | 6144–17,061 |
| N21 | 11 | 10,248 | 5291–16,123 |
| R5a1a | 3 | 1568 | 59–4465 |
| R6a2 | 3 | 12,622 | 5938–20,550 |
| R9b | 35 | 38,677 | 29,454–48,807 |
| R9b1a3 | 15 | 9849 | 5758–14,818 |
| R9b2 | 13 | 11,822 | 6899–18,096 |
| R22 | 23 | 39,214 | 29,555–50,055 |
| U2 | 3 | 43,295 | 30,742–55,978 |
| W3a1b | 7 | 13,418 | 6809–22,357 |
| Z | 6 | 21,428 | 14,175–29,084 |
Fig. 4 Schematic Bayesian MCMC tree of the major haplogroups found in this study. Bayesian maximum clade credibility trees were constructed for each haplogroup with parameters as described in the “Methods” and then manually combined (dashed lines) based on PhyloTree mtDNA tree Build 17. The full Bayesian maximum clade credibility tree for each haplogroup is shown in Fig. S3
Fig. 5 Four trends in the fluctuation of maternal effective population size (y-axis) through time, in years before the present (x-axis), observed in the individual Bayesian skyline plots for the 51 populations (Fig. S6). The median estimate and the 95% highest posterior density limits are indicated by thick and thin lines, respectively. The plots were generated with 10,000,000 chains with the first 1,000,000 generations discarded as burn-in. Most populations (KM1–KM4, KM6–KM10, YU1–YU2, SH, IS3, PT, NY, KL, SK, BT1–BT2, PU1–PU5, MO1–MO5, KH2, BU, SO, SU, LW1, PL, BL1–BL2) show the trend in a; KM5, IS1–IS2 and LA2 show the trend in b; IS4 and LA1 show the trend in c; and KH1, BO, TN1–TN3, KA and LW2–LW3 show the trend in d
Fig. 6 Proposed demographic models for three independent ABC tests concerning northern Thais, northeastern Thais combined with Laotians, and northeastern Thais. Each test consists of three scenarios corresponding to three hypotheses: demic diffusion, admixture and cultural diffusion. The tables under each model give posterior probabilities computed by the acceptance–rejection (AR) procedure and by the weighted multinomial logistic regression (LR) approach
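To make the acceptance–rejection (AR) idea concrete, here is a minimal sketch of ABC model choice: simulate from each candidate model, keep the simulations whose summary statistic lies closest to the observed one, and read each model's posterior probability off its share of the accepted set. The one-number simulator, its parameters and the function names are hypothetical stand-ins, not the demographic simulations or summary statistics of the study, and the LR approach is not shown.

```python
import random

MODELS = ["demic", "admixture", "cultural"]

def simulate(model, rng):
    """Hypothetical simulator returning one summary statistic per draw."""
    centers = {"demic": 0.0, "admixture": 1.0, "cultural": 2.0}  # assumed values
    return rng.gauss(centers[model], 0.5)

def abc_rejection(observed, n_sims=30000, keep=300, seed=1):
    """Posterior model probabilities from the `keep` closest simulations."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_sims):
        model = rng.choice(MODELS)  # uniform prior over the three models
        stat = simulate(model, rng)
        draws.append((abs(stat - observed), model))
    draws.sort(key=lambda pair: pair[0])  # smallest distance first
    accepted = [model for _, model in draws[:keep]]
    return {m: accepted.count(m) / keep for m in MODELS}

# Toy "observed" statistic chosen to sit at the demic model's center.
posterior = abc_rejection(observed=0.0)
best = max(posterior, key=posterior.get)
```

In this toy setup `best` is the model whose simulations most often land near the observed value. In practice many summary statistics are used and distances are computed after standardizing them.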