S Pischedda1,2,3,4, R Barral-Arca1,2,3,4, A Gómez-Carballa1,2,3,4, J Pardo-Seco1,2,3,4, M L Catelli5, V Álvarez-Iglesias1,2, J M Cárdenas1,2,6, N D Nguyen7, H H Ha7, A T Le7, F Martinón-Torres3,4, C Vullo5, A Salas8,9.
Abstract
The territory of present-day Vietnam was the cradle of one of the world's earliest civilizations, and one of the first world regions to develop agriculture. We analyzed the mitochondrial DNA (mtDNA) complete control region of six ethnic groups and the mitogenomes from Vietnamese in The 1000 Genomes Project (1000G). Genome-wide data from 1000G (~55k SNPs) were also investigated to explore different demographic scenarios. All Vietnamese carry South East Asian (SEA) haplotypes, which show a moderate geographic and ethnic stratification, with the Mong constituting the most distinctive group. Two new mtDNA clades (M7b1a1f1 and F1f1) point to historical gene flow between the Vietnamese and other neighboring countries. Bayesian-based inferences indicate a time-deep and continuous population growth of Vietnamese, although with some exceptions. The dramatic population decrease experienced by the Cham 700 years ago (ya) fits well with the Nam tiến ("southern expansion") southwards from their original heartland in the Red River Delta. Autosomal SNPs consistently point to important historical gene flow within mainland SEA, and add support to a main admixture event occurring between Chinese and a southern Asian ancestral composite (mainly represented by the Malay). This admixture event occurred ~800 ya, again coinciding with the Nam tiến.
Year: 2017 PMID: 28974757 PMCID: PMC5626762 DOI: 10.1038/s41598-017-12813-6
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Summary statistics of control region (CR) sequences from six Vietnamese sampling locations. All computations were carried out on the common sequence segment spanning positions 16024 to 574. Main regions comprise the following populations: North = Lao Cai + Cao Bang + Ha Noi + Hai Pong, Center = Da Nang, and South = Ho Chi Minh.
| Population | n | k | k/n | S | nmut | H ± SE | π ± SE | M |
|---|---|---|---|---|---|---|---|---|
| Provinces | ||||||||
| Cao Bang | 113 | 108 | 0.96 | 147 | 151 | 0.999 ± 0.001 | 0.0103 ± 0.0004 | 11.5 |
| Hai Pong | 133 | 119 | 0.89 | 150 | 156 | 0.997 ± 0.001 | 0.0105 ± 0.0003 | 11.7 |
| Ha Noi City | 38 | 38 | 1.00 | 89 | 91 | 1.000 ± 0.006 | 0.0110 ± 0.0005 | 12.3 |
| Lao Cai | 115 | 65 | 0.57 | 88 | 91 | 0.980 ± 0.005 | 0.0089 ± 0.0003 | 9.9 |
| Da Nang | 135 | 123 | 0.91 | 148 | 153 | 0.998 ± 0.001 | 0.0102 ± 0.0002 | 11.4 |
| Ho Chi Minh | 88 | 83 | 0.94 | 133 | 137 | 0.998 ± 0.002 | 0.0109 ± 0.0004 | 12.1 |
| Main regions | ||||||||
| North | 399 | 309 | 0.77 | 216 | 230 | 0.997 ± 0.001 | 0.0104 ± 0.0002 | 11.6 |
| Center | 135 | 123 | 0.91 | 148 | 153 | 0.998 ± 0.001 | 0.0102 ± 0.0002 | 11.4 |
| South | 88 | 83 | 0.94 | 133 | 137 | 0.998 ± 0.002 | 0.0109 ± 0.0004 | 12.1 |
| Ethnic group | ||||||||
| Hoa | 23 | 23 | 1 | 70 | 71 | 1.000 ± 0.013 | 0.0107 ± 0.0007 | 12.0 |
| Kho Me | 1 | — | — | — | — | — | — | — |
| Kinh | 399 | 334 | 0.84 | 218 | 230 | 0.998 ± 0.000 | 0.0105 ± 0.0002 | 11.7 |
| Mong | 115 | 65 | 0.57 | 88 | 91 | 0.980 ± 0.005 | 0.0089 ± 0.0003 | 9.9 |
| Nung | 21 | 21 | 1 | 54 | 54 | 1.000 ± 0.015 | 0.0094 ± 0.0006 | 10.5 |
| Tay | 62 | 54 | 0.87 | 114 | 118 | 0.998 ± 0.003 | 0.0110 ± 0.0005 | 12.2 |
| Thai | 1 | — | — | — | — | — | — | — |
| Total | 622 | 478 | 0.77 | 252 | 271 | 0.998 ± 0.000 | 0.0105 ± 0.0001 | 11.7 |
n = sample size; k = number of different haplotypes; S = number of polymorphic (segregating) sites; nmut = total number of mutations; H = haplotype diversity and standard error; π = nucleotide diversity and standard error; M = average number of nucleotide differences.
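The statistics defined in the footnote can be illustrated with a minimal sketch. It assumes the standard estimators (Nei's haplotype diversity, mean pairwise differences for M, and M divided by the alignment length for π); the source does not state which software or exact estimators were used, and standard errors are not computed here. The function name `summary_stats` and the toy sequences are hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def summary_stats(seqs):
    """Diversity statistics for equal-length aligned sequences (toy sketch)."""
    n = len(seqs)
    length = len(seqs[0])
    counts = Counter(seqs)          # haplotype -> number of carriers
    k = len(counts)                 # number of different haplotypes

    # Haplotype diversity (assumed estimator): n/(n-1) * (1 - sum of p_i^2)
    h = n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))

    # Alleles observed at each alignment column
    columns = [{s[i] for s in seqs} for i in range(length)]
    s_sites = sum(1 for col in columns if len(col) > 1)   # segregating sites
    n_mut = sum(len(col) - 1 for col in columns)          # total mutations

    # M: average number of nucleotide differences over all sequence pairs
    pairs = list(combinations(seqs, 2))
    m = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    return {
        "n": n, "k": k, "k/n": k / n, "S": s_sites, "nmut": n_mut,
        "H": h, "pi": m / length, "M": m,
    }


# Hypothetical toy alignment, not data from the study
toy = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "ACGTACGT"]
stats = summary_stats(toy)
print(stats)
```

Under these assumptions M is roughly π multiplied by the number of aligned positions. The segment 16024 to 574 covers 1,120 positions of the 16,569 bp rCRS, and the tabulated values are approximately consistent with that relation (for example, 0.0105 × 1120 ≈ 11.8 against a reported M of 11.7).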
Figure 1. (A) Frequencies of main haplogroups and sub-haplogroups by ethnic group. (B) Map showing the location of the main Vietnamese regions analyzed in the present study. The pie charts display the frequency values for the main haplogroup categories. Maps were generated using R Project for Statistical Computing v. 3.3.1 (https://www.r-project.org/) and the package autoMap v. 1.0–14 (https://cran.r-project.org/web/packages/automap/index.html). Packages sp v. 1.2–5, rgdal v. 1.2–8, gstat v. 1.1–5, raster v. 2.5–8 and latticeExtra v. 0.6–28 were also used to improve the visual appearance of the maps.
Figure 2. (A) Interpolated geographic maps of haplotype diversity (crosses indicate sample points) and nucleotide diversity values. (B) and (C) Interpolated maps of main haplogroup frequencies across the territory of Vietnam. Maps were created as in Fig. 1.
Haplogroup frequencies in Vietnam by sample location and ethnic groups.
| Sample origin | n | M | C | D | M7 | M(×D,C) | N | A | B | R9’F | N(×A, B, R9’F) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Sample locations | |||||||||||
| Cao Bang | 113 | 0.46 | 0.04 | 0.05 | 0.29 | 0.37 | 0.54 | 0.02 | 0.14 | 0.28 | 0.10 |
| Da Nang | 135 | 0.44 | 0.00 | 0.06 | 0.25 | 0.38 | 0.56 | 0.01 | 0.32 | 0.21 | 0.02 |
| Ha Noi | 38 | 0.42 | 0.03 | 0.03 | 0.26 | 0.36 | 0.58 | 0.05 | 0.12 | 0.36 | 0.05 |
| Hai Pong | 133 | 0.43 | 0.04 | 0.03 | 0.23 | 0.36 | 0.57 | 0.01 | 0.14 | 0.32 | 0.10 |
| Ho Chi Minh | 88 | 0.33 | 0.02 | 0.09 | 0.12 | 0.22 | 0.67 | 0.01 | 0.27 | 0.29 | 0.10 |
| Lao Cai | 115 | 0.29 | 0.13 | 0.07 | 0.04 | 0.09 | 0.71 | 0.05 | 0.41 | 0.16 | 0.09 |
| Total | 622 | 0.39 | 0.05 | 0.05 | 0.20 | 0.29 | 0.61 | 0.02 | 0.25 | 0.27 | 0.07 |
| Ethnic group | |||||||||||
| Hoa | 23 | 0.39 | 0 | 0.17 | 0.13 | 0.22 | 0.61 | 0.00 | 0.13 | 0.39 | 0.09 |
| Kho Me | 1 | — | — | — | — | — | — | — | — | — | — |
| Kinh | 399 | 0.41 | 0.02 | 0.04 | 0.23 | 0.35 | 0.59 | 0.01 | 0.23 | 0.27 | 0.08 |
| Mong | 115 | 0.29 | 0.13 | 0.07 | 0.04 | 0.09 | 0.71 | 0.05 | 0.41 | 0.16 | 0.09 |
| Nung | 21 | 0.56 | 0.05 | 0.14 | 0.3 | 0.37 | 0.44 | 0.05 | 0.10 | 0.24 | 0.05 |
| Tay | 62 | 0.45 | 0.07 | 0.05 | 0.3 | 0.33 | 0.55 | 0.02 | 0.16 | 0.30 | 0.07 |
| Thai | 1 | — | — | — | — | — | — | — | — | — | — |
Figure 3. Maximum parsimony trees based on mitogenomes representing haplogroup M7b1a1f (A) and F1f (B). The revised Cambridge reference sequence (rCRS) is shown as reference for nomenclature[52]. Genetic variants are indicated along the branches of the phylogeny as follows: all of them are transitions unless a suffix A, C, G, or T indicates a transversion, and a prefix ‘@’ indicates a back mutation. As per common practice, the trees do not consider hotspot mutations at positions 16182, 16183, and 16519, nor variation around position 310 and length or point heteroplasmies. The ID numbers at the tips of the phylogeny identify mitogenomes as indicated in Table S2; this table also shows details of the geographic or ethnic origin of all the samples. (C) EBSPs of haplogroup F1f and M7b1a1f obtained from complete mitogenomes. EBSPs with 95% HPD (highest posterior density) intervals are provided in Figure S3.
Analysis of molecular variance (AMOVA) accounting for main geographic regions (MGR), ethnic group (EG), and sampling location (SL) (P-value < 0.0000).
| Source of variation | Present study: MGR (%) | Present study: EG (%) | Present study: SL (%) | Meta-analysis: MGR (%) | Meta-analysis: EG (%) | Meta-analysis: SL (%) |
|---|---|---|---|---|---|---|
| Among groups | 0 | 0.57 | 1.61 | 0 | 1.28 | 1.31 |
| Among populations within groups | 0.31 | 0.03 | 0.31 | 0.41 | 0.2 | 0.31 |
| Within populations | 99.69 | 99.41 | 98.08 | 99.59 | 98.52 | 98.37 |
Figure 4. PCA of Vietnamese populations analyzed in the present article versus other Asian populations (A) and versus SEA/Southern China populations (B). Haplogroup frequencies from the reference populations were taken from Zhang et al.[16]. Note that there are two Vietnam_Kinh samples in the plot: one represents our Kinh sample and the other was taken from the literature.
Maternal gene flow between the main linguistic families within Vietnam, and between Vietnam and neighboring countries.
| Model | lmL | LBF | Probability |
|---|---|---|---|
| Linguistic groups | |||
| Full | −4003.198 | −168.8531 | 0.000 |
| Panmictic | −4087.624 | 0.000000 | 1.000 |
| Vietnam vs Laos | |||
| Full | −2725.178 | −16.93701 | 0.000 |
| Panmictic | −2716.710 | 0.00000 | 1.000 |
| Laos to Vietnam | −2726.531 | −19.64204 | 0.000 |
| Vietnam to Laos | −2763.987 | −94.55403 | 0.000 |
| Vietnam vs Cambodia | |||
| Full | −2332.827 | −5.512706 | 0.004 |
| Panmictic | −2358.694 | −57.246388 | 0.000 |
| Cambodia to Vietnam | −2333.580 | −7.018030 | 0.001 |
| Vietnam to Cambodia | −2330.071 | 0.000000 | 0.995 |
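The relation between the columns of this table can be checked with a short sketch. It assumes that LBF is twice the difference between a model's log marginal likelihood (lmL) and that of the best model, and that the probabilities are obtained by normalizing exp(LBF). Both assumptions are inferred from the tabulated Vietnam vs Cambodia and Vietnam vs Laos values, not stated in the source, and the function names are hypothetical.

```python
import math


def log_bayes_factors(lml):
    """LBF of each model relative to the best one: 2 * (lmL_i - max lmL)."""
    best = max(lml.values())
    return {model: 2.0 * (value - best) for model, value in lml.items()}


def model_probabilities(lbf):
    """Normalize exp(LBF) across models (assumed convention, see lead-in)."""
    weights = {model: math.exp(value) for model, value in lbf.items()}
    total = sum(weights.values())
    return {model: w / total for model, w in weights.items()}


# lmL values copied from the "Vietnam vs Cambodia" rows of the table
lml_cambodia = {
    "Full": -2332.827,
    "Panmictic": -2358.694,
    "Cambodia to Vietnam": -2333.580,
    "Vietnam to Cambodia": -2330.071,
}

lbf = log_bayes_factors(lml_cambodia)
probs = model_probabilities(lbf)
for model in lml_cambodia:
    print(f"{model:22s} LBF={lbf[model]:9.3f}  P={probs[model]:.3f}")
```

With these inputs the computed LBF and probability values agree with the table to the reported precision, with "Vietnam to Cambodia" as the preferred model. The lmL values listed for the linguistic groups do not follow the same relation as printed, so they are left out of the sketch.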
Figure 5. Analysis carried out on autosomal SNPs. (A) MDS of population samples from the Indochinese Peninsula and neighboring samples. Both plots were built using the same sample sets, but the one on the right highlights the center of each population's sample points to ease interpretation. (B) Admixture analysis including reference samples from Europe (CEU) and Africa (YRI). (C) Analysis of f3-statistics of Vietnamese (KHV) versus different neighboring population samples. (D) D-statistics of Vietnamese built as follows: D(CHS, KHV; Y, OUTGROUP) and D(Y, KHV; CHS, OUTGROUP). (E) Estimates of admixture between Chinese and Malay, using the samples CHS and Malay as surrogates of those that contributed to the genomic architecture of present-day Vietnamese. Estimates were statistically significant according to the ad hoc z test from ALDER.