Wenhui Li1,2, Xianwen Wang3, Xiehong Wang3, Fenfen Wang4, Zhengming Du4, Fangshu Fu5, Wenlong Wu4, Shuya Wang6, Ziqing Mu7, Chunwei Chen8, Xiaomin Hu9, Jiuyang Ding10, Yunle Meng11, Pingming Qiu11,12, Haoliang Fan1,2,11,12. 1. School of Basic Medicine and Life Science, Hainan Medical University, Haikou, China. 2. Forensic Science Center of Hainan Medical University, Hainan Medical University, Haikou, China. 3. Criminal Technical Detachment, Haikou City Public Security Bureau, Haikou, China. 4. First Clinical Medical College, Hainan Medical University, Haikou, China. 5. School of Biomedical Information and Engineering, Hainan Medical University, Haikou, China. 6. School of Public Health, Hainan Medical University, Haikou, China. 7. School of Management, Hainan Medical University, Haikou, China. 8. Public Security and Judicial Appraisal Center of Sanya City, Sanya, China. 9. Hainan Zhujian Center for Molecular Cytogenetic Clinical Testing, Haikou, China. 10. School of Forensic Medicine, Guizhou Medical University, Guiyang, China. 11. School of Forensic Medicine, Southern Medical University, Guangzhou, China. 12. Multi-Omics Innovative Research Center of Forensic Identification, Department of Forensic Genetics, School of Forensic Medicine, Southern Medical University, Guangzhou, China.
Abstract
BACKGROUND: Hainan Island, located in the South China Sea and separated from the Leizhou Peninsula by the Qiongzhou Strait, is the second largest island in China after Taiwan. With the expansion of Han Chinese and the gradual formation of the "South Hlai and North Han" pattern, the Hainan Hlai are now the second largest population on Hainan Island after Han Chinese. The Ha Hlai, distributed in southwestern and southern Hainan Island, are the dominant branch of the Hlai and speak the Ha dialect. METHODS: We used the Huaxia™ Platinum PCR Amplification System (23 autosomal STRs and 2 sex-linked markers) to obtain the first batch of STR profiles from 657 Ha Hlai individuals (497 males and 160 females). To explore the genetic relationships between the Ha Hlai and reference populations from different language families, population genetic analyses, including PCA, MDS, STRUCTURE, and phylogenetic analysis, were conducted on the raw genotype data and allelic frequencies of the polymorphic autosomal STR markers. RESULTS: In total, 271 distinct alleles were observed at the 23 STR loci. The number of alleles per locus ranged from 7 at TPOX to 23 at FGA, and allelic frequencies varied from 0.0008 to 0.5533. The CPE and CPD were 1 − 7.39 × 10^−10 and 1 − 3.13 × 10^−28, respectively. The phylogenetic analyses indicated that the Ha Hlai are a Tai-Kadai language-speaking, relatively isolated population with a close genetic and geographical relationship to the Hainan Hlai, and M95 is the dominant haplogroup in the Ha Hlai (56.18%). CONCLUSION: The 23 autosomal STR markers were highly polymorphic and potentially useful for forensic applications in the Hainan Ha Hlai population. The phylogenetic analyses demonstrated that gene flow on a small geographic scale cannot be ignored, and that the unique gene pool of each population was shaped by the combined effects of geographic, linguistic, and cultural isolation.
Hainan Province, which means “shore of pearls”, is the smallest province of China by land area (35.4 thousand square kilometers in total), comprising Hainan Island and a handful of offshore islands (the Xisha, Zhongsha, and Nansha archipelagoes), yet the largest by maritime territory (more than 2 million square kilometers). It lies in the South China Sea and is separated from the Leizhou Peninsula of southern Guangdong Province to the north by the shallow and narrow Qiongzhou Strait, while the west coast of Hainan Island is about 320 kilometers east of northern Vietnam, across the Beibu Gulf (Liang, 2013; Zhao, Wang, & Yuan, 2007). In Chinese history, the mass landings of Han Chinese and Lingao people date from 214 BC in the Qin Dynasty and about 500 BC during the Spring and Autumn period (Yan, 2013; Zhang, 1974). With the establishment of the Zhuya and Dan'er prefectures, Hainan Island was formally incorporated into the Chinese empire in the Han Dynasty (110 BC) (Weimin Zhou, 2018; Zhong, 2010). Owing to constant rebellions staged by the indigenous people, the island remained firmly in the hands of the indigenous Li ancestors, and effective government was not reintroduced until the Tang Dynasty (618–907 CE), even though the island remained nominally under Chinese sovereignty during that period (Du & Yip, 1993; Weimin Zhou, 2018; Zhong, 2010). 
It was not until the 12th to 17th centuries that large numbers of Han Chinese immigrants from Fujian and Guangdong provinces began settling in the northern uplands and plains, displacing the indigenous Li and pushing them farther into the central and southern highlands of Hainan Island, which gradually produced the present pattern of “South Hlai and North Han” (Weimin Zhou, 2018; Xing, 2004; Yan, 2013; Zhong, 2010).
The Li ethnic minority, the aborigines of Hainan Island, has five dominant subgroups, Ha, Gei, Zwn, Moifau, and Jiamao, whose distinct cultures and languages are mutually unintelligible, although they all belong to the most primordial branches of the Tai‐Kadai (also called Daic) linguistic phylum (Burusphat, 2007; Norquest, 2015; Thurgood, 1994; Wang XP & Xing, 2004; Weera, 2017). Nowadays, the Li people (Hlai) are the second largest population on Hainan Island after Han Chinese, with a population of 1,514,780 according to the Population of Ethnicity by Region (2018) of the Hainan Statistical Year Book 2019. 
The old Hlai had no writing system of their own, and the Li language was passed down orally until 1957, when linguists, ethnologists, and anthropologists designed a writing system at the Scientific Conference on the Language and Writing System of Hainan Li Minority held in Tongshi city (now Wuzhishan city, in the center of Hainan Island). The writing system is made up of 33 initial consonants, 99 simple or compound vowels, 6 tones, and 5 tone symbols; the first tone (Shu tone, tone value 53) carries no tone symbol, and three tones are marked with overlapping letter symbols. The conference reached a consensus that the writing system would take the dialect of the Ha branch of Hlai (Ha Hlai) as its fundamental dialect and the pronunciation of the Ha Hlai living in Baoding village, Baoyou Town, Ledong Li Autonomous County as the standard pronunciation of Hainan Li (Wang XP & Xing, 2004).
Ha Hlai, also called Xiao Li or Xiao Hlai, is the most populous and most widely distributed branch of Hainan Li. It is mainly found in southwestern and southern Hainan Island, including most of Ledong Li Autonomous County, Changjiang Li Autonomous County, Baisha Li Autonomous County, and Lingshui Li Autonomous County and parts of Dongfang and Sanya, with a total population of about seven hundred thousand (Du & Yip, 1993; Li, Li, et al., 2008; Sun, Yang, Ou, Chen, et al., 2007; Sun, Yang, Ou, Zhou, et al., 2007; Xing, 2004; Zhong, 2010). Ha Hlai speak the Ha dialect (Xiao dialect), which belongs to the Tai‐Kadai languages, a crucial branch of the Sino‐Tibetan language family. The Hlai, as the aborigines living at the southern entrance to East Asia (Hainan Island), which might be the border between China and the countries of the Indo‐China Peninsula, have been separated for about 20 thousand years, after the two dominant haplogroups, O1 and O2, entered East Asia (31‐36 thousand years ago). 
However, the origin of the Hlai on Hainan Island remains unresolved. One hypothesis is an origin from the Luoyue tribes in southeast China (now the Guangxi Zhuang Autonomous Region of mainland China), because gene flow from mainland China played a non‐negligible role in shaping the Hlai maternal gene pool (Song et al., 2019); another is immigration from the indigenous peoples of Southeast Asia (Bao Maohong & Wenze, 2014; Jin & Su, 2000; Mou, 2007; Osborne, 2012; Yingming, 2010). Although genome‐wide ancient DNA results from eighteen Southeast Asian individuals spanning the Neolithic period through the Iron Age (4100‐1700 years ago) supported the view that ancient populations in Southeast Asia were likely descended from southern Chinese immigrants, the genetic evidence remains insufficient. Our previous studies reported the allele frequencies of 19 autosomal short tandem repeats (STRs) and 27 Y‐STRs in the Hainan Hlai as a whole, mainly exploring the forensic parameters of these markers for potential forensic applications (Fan, Wang, Chen, Zhang, et al., 2018; Fan, Wang, Ren, et al., 2019). Those studies gave a tentative, general picture of the Hainan Li based on forensic genetic markers, but the structure of the individual subgroups is still ambiguous because the sample sizes for each branch were limited. Hence, because the Ha Hlai are an important branch of the Hlai inhabiting a significant “transfer” location (the southern entrance to East Asia), we collected the first batch of autosomal STR data from this branch and conducted a series of population genetic analyses with populations from diverse language families to depict the STR landscape of the Ha Hlai and clarify their population genetic structure.
MATERIALS AND METHODS
Ethics standards and sample preparation
This study and its procedures were approved by the Institutional Review Boards of Hainan Medical University and the Medical Ethics Committee of Hainan Medical University (No. HYLL‐2020‐012). The humane and ethical research principles recommended by Hainan Medical University were followed in this study.
Blood samples were collected on FTA cards, with written informed consent, from 657 unrelated healthy Ha Hlai individuals (497 males and 160 females) from the Ha dialect districts. All self‐declared indigenous Ha Hlai subjects were required to have no intermarriage with other populations or other Hlai subgroups, to speak the Ha dialect as their mother tongue, and to have resided in Ha dialect regions for at least three generations.
DNA extractions and quantification
Genomic DNA was extracted using the QIAamp DNA Blood Mini Kit (Qiagen, Germany) according to the manufacturer's protocol. The quantity of the DNA template was determined using Qubit 3.0 Fluorometer (Invitrogen™, Life Technologies™, USA) according to the manufacturer's instructions. Based on quantitative results, samples were normalized to 1.0 ng/μl and stored at −20°C until amplification.
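The normalization step can be illustrated with the usual dilution relation C1·V1 = C2·V2. The sketch below is only an illustration: the function name, the 12.5 ng/μl reading, and the 50 μl final volume are hypothetical and do not come from the study.

```python
def dilution_volumes(stock_ng_per_ul, final_volume_ul, target_ng_per_ul=1.0):
    """Return (extract volume, diluent volume) in microlitres from C1*V1 = C2*V2."""
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("extract is more dilute than the target concentration")
    stock_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_ul, final_volume_ul - stock_ul


# Hypothetical sample: fluorometer reading of 12.5 ng/ul, 50 ul of working stock wanted.
stock_ul, diluent_ul = dilution_volumes(stock_ng_per_ul=12.5, final_volume_ul=50.0)
print(f"{stock_ul:.1f} ul extract + {diluent_ul:.1f} ul diluent")
```

Extracts that read below the target concentration cannot be normalized by dilution, which is why the helper raises an error in that case.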
Multiplex amplification and STR genotyping
The Huaxia™ Platinum PCR Amplification System (Applied Biosystems®, Life Technologies™, USA), which comprises 23 autosomal STRs and two sex‐linked markers (the Y‐InDel rs2032678 and Amelogenin), was used for STR genotyping. All markers were co‐amplified in one multiplex PCR on the Veriti® 96 Well Thermal Cycler System (Applied Biosystems®, Life Technologies™, USA) in accordance with the manufacturer's instructions.
The amplified products were separated by capillary electrophoresis on an Applied Biosystems 3500 Genetic Analyzer (Applied Biosystems®, Life Technologies™, USA). Alleles were designated with GeneMapper® ID‐X software (Applied Biosystems®, Life Technologies™, USA) by comparing fragment sizes with the allelic ladder provided with the Huaxia™ Platinum PCR Amplification System.
Quality control
We strictly followed the recommendations of the Chinese National Standards and the Scientific Working Group on DNA Analysis Methods (SWGDAM) (“Scientific Working Group on DNA Analysis (SWGDAM),” 2010), and used control DNA 9947A as the positive control and ddH2O as the negative control in each batch of PCR amplification and electrophoresis. In addition, the laboratory is accredited in accordance with ISO/IEC 17025:2005 and by the China National Accreditation Service for Conformity Assessment (CNAS) (Registration No. CNAS L10088).
Statistical and phylogenetic analysis
Allelic frequencies and forensic parameters of the 23 autosomal STR loci were calculated with SAS® 9.4 software and the online forensic statistics analysis toolbox (FORSTAT) (Ristow & D'Amato, 2017). The forensic genetic parameters comprised genotype count (N), heterozygotes (He), random match probability (PM), polymorphism information content (PIC), power of discrimination (PD), power of exclusion (PE), genetic diversity (GD), and typical paternity index (TPI). Subsequently, the probability values for Hardy–Weinberg equilibrium (HWE) were assessed using Arlequin v3.5 (Excoffier & Lischer, 2010).
Principal component analysis (PCA) and multidimensional scaling (MDS) were each performed with SPSS 22.0 (Hansen, 2000) on the allelic frequencies of the autosomal STR loci to explore the genetic relationships among populations. To further investigate population genetic structure, a Bayesian model‐based clustering approach was adopted with STRUCTURE 2.3.4 (Pritchard, Stephens, & Donnelly, 2000). Population structure was inferred by setting the number of clusters (K) from 2 to 8, with 10 runs and 100,000 iterations. STRUCTURE HARVESTER (Earl & vonHoldt, 2012) and DISTRUCT (Rosenberg, 2003) were used to identify the optimum K value and to display the population stratification results, respectively. The pairwise fixation index (Fst) (Nei, 1977) and corresponding P values between the studied population and the other 72 reference populations were estimated using Arlequin v3.5 (Excoffier & Lischer, 2010). 
Additionally, phylogenetic relationships among populations were depicted in Molecular Evolutionary Genetics Analysis 7.0 (MEGA 7.0) (Kumar, Stecher, & Tamura, 2016) as a neighbor‐joining (N‐J) tree (Saitou & Nei, 1987) built from the Fst genetic distance matrix, and the tree was visualized with the Interactive Tree of Life v4 (iTOL) (Letunic & Bork, 2019). Except for the STRUCTURE results, all figures in this study were drawn using the R programming language.
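To make the pairwise distance underlying the tree concrete, the following is a minimal sketch of a heterozygosity-based Fst in the sense of Nei (1977), computed from allele frequencies for two populations. It is an assumption-laden illustration: Arlequin's own estimator and its permutation-based P values are not reproduced here, the function name is hypothetical, and the frequencies are made up.

```python
def nei_fst(pop_a, pop_b):
    """Multi-locus Fst = (Ht - Hs) / Ht for two equally weighted populations.

    pop_a, pop_b: lists with one dict per locus mapping allele -> frequency.
    """
    hs_total = 0.0  # mean within-population expected heterozygosity, summed over loci
    ht_total = 0.0  # expected heterozygosity of the pooled frequencies, summed over loci
    for freq_a, freq_b in zip(pop_a, pop_b):
        alleles = set(freq_a) | set(freq_b)
        het_a = 1.0 - sum(p * p for p in freq_a.values())
        het_b = 1.0 - sum(p * p for p in freq_b.values())
        pooled = [(freq_a.get(x, 0.0) + freq_b.get(x, 0.0)) / 2.0 for x in alleles]
        hs_total += (het_a + het_b) / 2.0
        ht_total += 1.0 - sum(p * p for p in pooled)
    return (ht_total - hs_total) / ht_total if ht_total > 0 else 0.0


# Made-up single-locus frequencies for two hypothetical populations.
pop_1 = [{"8": 0.5, "9": 0.5}]
pop_2 = [{"8": 0.9, "9": 0.1}]
fst_example = nei_fst(pop_1, pop_2)
```

Filling a symmetric matrix with such pairwise values gives the kind of distance matrix that a neighbor-joining routine takes as input.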
RESULTS AND DISCUSSION
In this study, STR profiles of 657 Ha Hlai individuals (497 males and 160 females) were genotyped using the Huaxia™ Platinum PCR Amplification System; complete genotypes were obtained for all samples, with no triallelic patterns or null alleles. We evaluated the forensic characteristics and efficiency of the genetic markers included in the system for forensic applications in the Ha Hlai population. To characterize the genetic differences between the Ha Hlai and populations speaking diverse languages, bioinformatic analyses were conducted to clarify the genetic structure at different scales.
Forensic characteristics of the 23 autosomal STRs
This study reports the first batch of STR profiles for the Ha Hlai: a total of 657 Ha Hlai (497 men and 160 women) were successfully genotyped for 23 autosomal STRs and two sex‐linked markers (rs2032678 and Amelogenin). As shown in Tables S1 and S2, 271 distinct alleles in total were observed at the 23 STR loci. The number of alleles per locus ranged from 7 at TPOX to 23 at FGA, and the allelic frequencies varied from 0.0008 to 0.5533. The forensic parameters of the 23 autosomal STRs for Ha Hlai are presented in Table S2: He, PM, PIC, PD, PE, GD, and TPI ranged from 0.1218 (Penta E) to 0.4170 (TPOX), 0.0200 (Penta E) to 0.2098 (TPOX), 0.5406 (TPOX) to 0.8882 (Penta E), 0.7902 (TPOX) to 0.9800 (Penta E), 0.2709 (TPOX) to 0.7512 (Penta E), 0.6005 (TPOX) to 0.8970 (Penta E), and 1.1989 (TPOX) to 4.1063 (Penta E), respectively. In addition, the CPE and CPD were 1 − 7.39 × 10^−10 and 1 − 3.13 × 10^−28, respectively.
Furthermore, to compare the performance of the forensic autosomal STRs in Ha Hlai and Hainan Hlai, we compared the two populations, as presented in Table S3. Excluding D1S1656, D2S441, D10S1248, and D22S1045, the remaining 19 autosomal STRs, which were also genotyped in our previous study of Hainan Hlai (n = 653) (Fan, Wang, Ren, et al., 2019), had 227 distinct alleles in Ha Hlai, whereas the same loci had 270 distinct alleles in Hainan Hlai. At D18S51 and FGA, Ha Hlai had 17 and 23 alleles, more than the 16 and 21 in Hainan Hlai, respectively, while at the other loci the number of alleles in Ha Hlai was less than or equal to that in Hainan Hlai. In particular, Hainan Hlai had twice as many alleles at CSF1PO (18 vs. 9 in Ha Hlai) and markedly more alleles at D2S1338 and Penta E (21 and 32) than Ha Hlai (13 and 22, respectively). The comparisons of forensic statistical parameters between Ha Hlai and Hainan Hlai are visualized in Figure 1. Although Hainan Hlai had more alleles at most loci, there was no apparent difference in the forensic parameters or overall system effectiveness compared with Ha Hlai.
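The per-locus parameters and combined powers reported in this subsection can be sketched with the textbook formulas commonly used in forensic statistics. This is a hedged illustration, not the SAS or FORSTAT implementation: the function names and toy genotypes are hypothetical, and the exact estimators used by those tools may differ in detail. As a plausibility check, treating the value 0.4170 reported for TPOX as the homozygote proportion h appears to reproduce the reported PE (0.2709) and TPI (1.1989) to within rounding.

```python
from collections import Counter
from math import prod


def power_of_exclusion(h):
    """PE = H^2 * (1 - 2*H*h^2), with h the homozygote proportion and H = 1 - h."""
    H = 1.0 - h
    return H * H * (1.0 - 2.0 * H * h * h)


def typical_paternity_index(h):
    """TPI = 1 / (2h); unbounded when no homozygotes were observed."""
    return float("inf") if h == 0 else 1.0 / (2.0 * h)


def locus_parameters(genotypes):
    """Per-locus parameters from a list of (allele, allele) genotypes."""
    n = len(genotypes)
    allele_counts = Counter(a for g in genotypes for a in g)
    freqs = [c / (2 * n) for c in allele_counts.values()]
    genotype_counts = Counter(tuple(sorted(g)) for g in genotypes)
    pm = sum((c / n) ** 2 for c in genotype_counts.values())
    h = sum(1 for a, b in genotypes if a == b) / n
    sum_p2 = sum(p ** 2 for p in freqs)
    sum_p4 = sum(p ** 4 for p in freqs)
    return {
        "PM": pm,
        "PD": 1.0 - pm,
        # PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2
        "PIC": 1.0 - sum_p2 - (sum_p2 ** 2 - sum_p4),
        # Gene diversity with the small-sample correction over 2n chromosomes.
        "GD": 2 * n / (2 * n - 1) * (1.0 - sum_p2),
        "PE": power_of_exclusion(h),
        "TPI": typical_paternity_index(h),
    }


def combined_complement(per_locus_values):
    """prod(1 - x_i); CPD or CPE is 1 minus this value for independent loci."""
    return prod(1.0 - x for x in per_locus_values)


toy = locus_parameters([(8, 8), (8, 9), (8, 9), (9, 9)])  # made-up genotypes
tpox_pe = power_of_exclusion(0.4170)
tpox_tpi = typical_paternity_index(0.4170)
```

The helper returns the complement rather than the combined power itself because a value such as 1 − 3.13 × 10^−28 is indistinguishable from 1.0 in double precision, whereas the complement is representable and can be reported directly.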
Figure 1
Forensic statistical parameters and performances of 19 autosomal STRs between Ha Hlai and Hainan Hlai
Taken together, the allelic frequencies and forensic statistical parameters demonstrated that the 23 autosomal STR markers included in the Huaxia™ Platinum PCR Amplification System (Applied Biosystems®, Life Technologies™, USA) are highly polymorphic and potentially useful for forensic applications in the Hainan Ha Hlai population.
PCA
Ha Hlai, a branch of Hainan Hlai, speak the Ha Hlai dialect, a branch of the Tai‐Kadai languages, which are generally assigned to the Sino‐Tibetan language family. To illustrate the genetic differences among Sino‐Tibetan language‐speaking populations, especially for the Ha Hlai, we established a database of 49 Sino‐Tibetan language‐speaking populations (30 Chinese/Sinitic, 8 Tai‐Kadai, 6 Tibeto‐Burman, and 5 Hmong‐Mien language‐speaking populations; 60,347 individuals in total) and performed PCA based on the allelic frequencies of the same 19 autosomal STRs (Bai et al., 2017; Chan, Chiu, Tsui, Wong, & Fung, 2005; Chen Chunbao, Tian Xin, & Hanhua, 2017; Chen et al., 2017; Chen, Zhu, Shiming, You, & Ma, 2010; Deng et al., 2007; Fan, Wang, Chen, et al., 2019; Fan, Wang, Ren, et al., 2019; Ferdous et al., 2010; Guo, 2017; Guo, Li, Wei, Ye, & Chen, 2017; He et al., 2017; Hongdan et al., 2017; Huihui Lian, Lin, & Li, 2015; Jingzhou Wang, Zhai, Wang, Zhang, & Wang, 2015; Kraaijenbrink, van Driem, Tshering of Gaselô, & de Knijff, 2007; Le Wang, Zhang, Bai, & Ye, 2013; Li, Zheng, & Jun, 2017; Li et al., 2015, 2018; Liao et al., 2019; Lili Zhang et al., 2017; Liu, Liu, & Wang, 2006a, 2006b; Liu, Chen, Huang, et al., 2017; Liu, Chen, Mei, et al., 2017; Meng Pan et al., 2012; Qiuling Liu, Huiling, & Chen, 2003; Shen, Kang, Dong, Guo, & Wang, 2015; Sun et al., 2017; Sun, Zhang, Wu, Shen, & Wu, 2015; Wenming Han, 2016; Xiao, Zhang, Wei, Pan, & Huang, 2016; Xie et al., 2014; Xiuzi & Changchun, 2017; Xu, Feng, & Yao, 2017; Xu, Xu, Wang, & Yao, 2017; Yang et al., 2017, 2018; Yao et al., 2016; Zhang, 2015; Zhang, Zhao, Guo, Liu, & Wang, 2015; Zhang et al., 2013, 2018; Zhang, Du, et al., 2017; Zhang, Hu, Du, Nie, et al., 2017; Zhang, Hu, Du, Zheng, et al., 2017; Zou et al., 2017). 
PCA, an effective dimensionality reduction method, can visualize the essential patterns of genetic relationships and depict the dominant patterns within a multivariate dataset; in forensic genetics it often shows that populations that are geographically closer are also genetically closer. As shown in Figure 2, the first, second, and third components (PC1, PC2, and PC3) accounted for 10.33%, 7.26%, and 6.85% of the total variance observed within these Sino‐Tibetan language‐speaking populations, a cumulative contribution of 24.44%. Ha Hlai clustered with Hainan Hlai in the bottom left corner of Figure 2a and in the top left corner of Figure 2b. In the PCA plots, the Tai‐Kadai and Hmong‐Mien (Hmong) language‐speaking populations clustered together at a distance from the other populations, and the Tibeto‐Burman language‐speaking populations clustered relatively closely with the Chinese/Sinitic language‐speaking populations; however, the Chinese/Sinitic language‐speaking populations were relatively dispersed. Even though the Hlai language is a branch of Tai‐Kadai from a linguistic perspective (Ostapirat, 2005; Thurgood, 1994; Wang XP & Xing, 2004), the cluster of Ha Hlai and Hainan Hlai did not group with the Tai‐Kadai‐Hmong cluster, which indicates that Hainan Hlai is an isolated population because of geographic and cultural isolation.
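How a population-by-allele frequency table yields component scores and a "proportion of variance" can be sketched as follows. This is a minimal standard-library illustration that extracts only PC1 by power iteration; it is not the SPSS procedure used in the study, the function name is hypothetical, and the four frequency vectors are made up.

```python
import math
import random


def first_principal_component(rows, iterations=500, seed=0):
    """Return (PC1 scores, share of total variance carried by PC1).

    rows: one list of allele frequencies per population, same column order.
    Assumes the populations are not all identical (non-zero variance).
    """
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Sample covariance matrix of the allele-frequency columns.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
           for i in range(m)]
    total_variance = sum(cov[j][j] for j in range(m))

    # Power iteration: repeated multiplication by cov converges to the
    # eigenvector with the largest eigenvalue, i.e. the PC1 loading vector.
    rng = random.Random(seed)
    vec = [rng.gauss(0.0, 1.0) for _ in range(m)]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    eigenvalue = sum(vec[i] * sum(cov[i][j] * vec[j] for j in range(m))
                     for i in range(m))
    scores = [sum(c[j] * vec[j] for j in range(m)) for c in centered]
    return scores, eigenvalue / total_variance


# Made-up frequencies of three alleles at one locus in four hypothetical populations.
frequencies = [
    [0.50, 0.30, 0.20],  # population A
    [0.52, 0.28, 0.20],  # population B
    [0.20, 0.30, 0.50],  # population C
    [0.22, 0.30, 0.48],  # population D
]
scores, explained = first_principal_component(frequencies)
```

In this toy table A and B fall on one side of PC1 and C and D on the other; the sign of the axis itself is arbitrary, so only relative positions are meaningful.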
Figure 2
PCA based on allelic frequencies of 49 Sino‐Tibetan language‐speaking populations. (a. PC1 and PC2; b. PC2 and PC3)
MDS
To further confirm the relationships among the Sino‐Tibetan language‐speaking populations, MDS, which reduces complicated multidimensional data to a few dimensions, was performed on the basis of Euclidean and Manhattan distances, respectively, to depict the forensic genetic landscape of these populations. In both the Euclidean MDS (Figure 3a) and the Manhattan MDS (Figure 3b), the genetic relationships between Ha Hlai and the other Sino‐Tibetan populations agreed with the PCA results: Ha Hlai clustered with Hainan Hlai in a corner of the MDS diagram, relatively separated from the other populations.
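The two dissimilarity measures named above differ only in how per-allele frequency differences are aggregated. The sketch below builds both distance matrices from made-up frequency vectors; the embedding step itself is left to the MDS routine of whichever statistics package is used, and the function names are hypothetical.

```python
import math


def euclidean(u, v):
    """Square root of the summed squared frequency differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):
    """Sum of the absolute frequency differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def distance_matrix(profiles, metric):
    """Symmetric matrix of pairwise distances between frequency vectors."""
    return [[metric(u, v) for v in profiles] for u in profiles]


# Made-up allele-frequency vectors for three hypothetical populations.
profiles = [[0.50, 0.30, 0.20], [0.52, 0.28, 0.20], [0.20, 0.30, 0.50]]
d_euclid = distance_matrix(profiles, euclidean)
d_manhattan = distance_matrix(profiles, manhattan)
```

Manhattan distance is never smaller than Euclidean distance for the same pair, so the two MDS plots can differ in scale while preserving broadly the same neighborhood structure.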
Figure 3
MDS based on allelic frequencies of 49 Sino‐Tibetan language‐speaking populations. (a. MDS based on Euclidean distance; b. MDS based on Manhattan distance)
As a whole, the Sino‐Tibetan language‐speaking populations were relatively close to one another: the Chinese/Sinitic and Tibeto‐Burman language‐speaking populations gathered together, while the Tai‐Kadai and the Hmong language‐speaking populations each clustered. In China, the Tai‐Kadai and Hmong language‐speaking populations have lived in southern China alongside the Chinese/Sinitic and Tibeto‐Burman language‐speaking populations for at least three thousand years. The genetic makeup of the natives of southern China, including speakers of Tai‐Kadai, Hmong, and Austroasiatic languages, was influenced by three massive southward waves of Han Chinese (Western Jin Dynasty, AD 265‐316; Tang Dynasty, AD 618‐907; Southern Song Dynasty, AD 1127‐1279), and these expansions contributed to gene flow with the southern natives (Fei, 1999; Ge, Wu, & Chao, 1997; Wen et al., 2004). Therefore, the Tai‐Kadai‐Hmong populations clustered together and had relatively close genetic distances to Han Chinese (Chinese/Sinitic language‐speaking populations). However, owing to the geographical isolation imposed by the Qiongzhou Strait, limited gene flow with other Sino‐Tibetan populations has shaped the characteristic genetic landscape of Hainan Hlai, and especially of Ha Hlai, which is relatively distant both genetically and geographically.
STRUCTURE
STRUCTURE analysis is widely used to infer population structure and assign individuals to populations from multi‐locus genotype data (Pritchard et al., 2000). STRUCTURE clustering analysis was performed to reflect the biogeographic ancestry components of Ha Hlai and other Sino‐Tibetan (Tai‐Kadai, Tibeto‐Burman, and Chinese), Indo‐European, Altaic, and Semito‐Hamitic language‐speaking populations (Fan, Wang, Chen, et al., 2019; Fan, Wang, Ren, et al., 2019; He, Wang, Liu, Hou, & Wang, 2018; Martinez‐Cortes et al., 2019; Munoz et al., 2012; Tillmar, Backstrom, & Montelius, 2009; Wang et al., 2016, 2018), with the number of hypothetical populations (K) set from 2 to 8. In Figure 4, population names and the corresponding language families are marked at the bottom and the top of the graph, and the width of each bar represents the sample size of each population. With STRUCTURE HARVESTER, we observed a plateau of the estimated posterior probability at K = 4. Thus, K = 4 was taken as the most appropriate value for the STRUCTURE analysis, following the guidelines in the STRUCTURE manual.
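Choosing K from replicate runs can be illustrated with the Evanno ΔK statistic, which STRUCTURE HARVESTER reports alongside the mean log probability of the data. The sketch below is an assumption-labelled illustration: the study describes selecting K from the plateau of the posterior probability, the function name is hypothetical, and the log-probability values are invented solely so the arithmetic can be followed.

```python
from statistics import mean, stdev


def evanno_delta_k(ln_prob_by_k):
    """Delta K = |L(K+1) - 2*L(K) + L(K-1)| / sd(L(K)) for interior K values.

    ln_prob_by_k: dict mapping K -> list of Ln P(D) values from replicate runs.
    """
    ks = sorted(ln_prob_by_k)
    avg = {k: mean(ln_prob_by_k[k]) for k in ks}
    delta = {}
    for k in ks[1:-1]:
        if k - 1 in avg and k + 1 in avg:
            curvature = abs(avg[k + 1] - 2 * avg[k] + avg[k - 1])
            delta[k] = curvature / stdev(ln_prob_by_k[k])
    return delta


# Invented Ln P(D) values for three replicate runs per K (not from the study).
ln_prob = {
    2: [-5200, -5205, -5195],
    3: [-5000, -5004, -4996],
    4: [-4850, -4852, -4848],
    5: [-4840, -4846, -4834],
    6: [-4835, -4845, -4825],
}
delta_k = evanno_delta_k(ln_prob)
best_k = max(delta_k, key=delta_k.get)
```

In these invented numbers the mean log probability rises steeply up to K = 4 and then flattens, so the plateau criterion and the ΔK peak point to the same value; with real runs the two can disagree and both are usually inspected.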
Figure 4
STRUCTURE Clustering analysis conducted at population level based upon genotyping data of the 19 autosomal STRs for 16 populations with four different language families
At K = 4, the nine Sino‐Tibetan language‐speaking populations, the three Indo‐European populations (two Mexican populations and one Argentinian population), the two Altaic language‐speaking Uyghur populations (Xinjiang and Xinjiang Kumul), and the Semito‐Hamitic language‐speaking Somalia population could be distinguished. However, the Chinese Indo‐European language‐speaking Hui and the Altaic language‐speaking Uyghurs shared the same components as the other Sino‐Tibetan language‐speaking populations. Within the Sino‐Tibetan language‐speaking populations, Tai‐Kadai, Tibeto‐Burman, and Chinese could be distinguished from each other by their different proportions of the red and green components. In addition, Ha Hlai had the same composition as Hainan Li and Lingao, all of which belong to the Tai‐Kadai language family (Yl, 2015; Min, 1997; Wang XP & Xing, 2004; Weera, 1998). In conclusion, gene flow on a relatively small geographic scale, especially in multi‐ethnic mixed areas, cannot be ignored, and the unique gene pool of each population has been shaped by the combined effects of geographic, linguistic, and cultural isolation, with geographic isolation appearing to have a somewhat stronger effect than linguistic and cultural isolation.
Fst and phylogenetic analysis
In order to estimate the genetic affiliations, we constructed a database including 73 populations (71,388 individuals in total) (Ali et al., 2012; Alves et al., 2014; Andreassen, Pereira, Dupuy, & Mevaag, 2010; Bernal et al., 2006; Bindu, Trivedi, & Kashyap, 2007; Calzada et al., 2005; Cardoso, Sevillano, Illescas, & de Pancorbo, 2016; Coudray, Calderon, et al., 2007; Coudray, Guitard, el‐Chennawi, Larrouy, & Dugoujon, 2007; Coudray, Guitard, Keyser‐Tracqui, et al., 2007; Decorte et al., 2004; Gehrig et al., 2014; Gurkan, Demirdov, Yamaci, & Sevay, 2015; Havas, Jeran, Efremovska, Dordevic, & Rudan, 2007; Hedjazi, Nikbakht, Hosseini, Hoseinzadeh, & Hosseini, 2013; Jemeljanova et al., 2015; Jin et al., 2017; Jingzhou Wang et al., 2015; Khodjet‐el‐Khil et al., 2012; Kraaijenbrink, van Driem, Opgenort, Tuladhar, & de Knijff, 2007; Kraaijenbrink, van Driem, et al., 2007; Le Wang et al., 2013; Li et al., 2015, 2017, 2018, 2020; Liao et al., 2019; Lili Zhang et al., 2017; Liu, Liu, & Wang, 2006a, 2006b; Liu, Chen, Huang, et al., 2017; Liu, Chen, Mei, et al., 2017; Lopes et al., 2009; Maruyama, Minaguchi, Takezaki, & Nambiar, 2008; Melo et al., 2010; Meng Pan et al., 2012; Munoz, Pinto de Erazo, Baeza, Arroyo‐Pardo, & Lopez‐Parra, 2015; Muro et al., 2008; Okolie et al., 2018; Petric, Draskovic, Zgonjanin‐Bosic, Budakov, & Veselinovic, 2012; Qiuling Liu et al., 2003; Rodriguez, Salvador, Calacal, Laude, & De Ungria, 2015; Sadam et al., 2015; Serga et al., 2017; Shan et al., 2016; Shen et al., 2015; Shotivaranon, Chirachariyavej, Leetrakool, & Rerkamnuaychoke, 2009; Soltyszewski et al., 2014; Steinlechner, Schmidt, Kraft, Utermann, & Parson, 2002; Sun et al., 2015, 2017; Tillmar et al., 2009; Tokdemir, Tuncez, & Vicdanli, 2016; Toscanini et al., 2015; Tran et al., 2019; Ventura Spagnolo et al., 2017; Wenming Han, 2016; Xiao et al., 2016; Xie et al., 2014; Xiuzi & Changchun, 2017; Xu, Feng, et al., 2017; Xu, Xu, et al., 2017; Yang et al., 2017, 2018; Yao et al., 2016; Yoo et 
al., 2011; Zhang, 2015; Zhang et al., 2013, 2018; Zhang, Du, et al., 2017; Zhang, Hu, Du, Zheng, et al., 2017; Zhang, Liao, et al., 2015; Zhu et al., 2007; Zou et al., 2017; Zuniga et al., 2006) with nine language families consisting of Sino‐Tibetan (38), Indo‐European (19), Niger‐Congo (5), Semito‐Hamitic (3), Altaic (3), Uralic (2), Austronesian (3), Dravidian (1), and Native American (1). The detailed pairwise Fst and corresponding P values between Ha Hlai and the other 72 populations are presented in Table S4. As shown in Figure 5, two main branches could be clearly identified in the phylogenetic tree. The upper branch was composed of three clusters: most Indo‐European language‐speaking populations, except for the Nepalese, clustered in the upper clade with three Altaic, two Uralic, two Semito‐Hamitic, and one Native American language‐speaking populations, while most Sino‐Tibetan (except for Yunnan Yi) and Niger‐Congo language‐speaking populations grouped in the bottom clade with three Austronesian and one Dravidian populations, and the Semito‐Hamitic language‐speaking Moroccan population clustered with the Niger‐Congo cluster. In addition, within the Sino‐Tibetan cluster, Ha Hlai clustered with Hainan Hlai (Fst = 0.0011) and grouped with the other Sino‐Tibetan language‐speaking populations, which indicates that Ha Hlai is a Sino‐Tibetan population with a relatively close relationship to Hainan Hlai. The clustering of the Hainan Hlai and Ha Hlai populations in the dendrogram was roughly congruent with the PCA and MDS results.
Figure 5
A phylogenetic N‐J tree constructed from the pairwise Fst values between Ha Hlai and the other 72 populations, which cover nine language families
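To make the distance measure behind the tree more concrete, the following is a minimal sketch of a pairwise Fst computed from allele frequencies. It assumes that the F values are pairwise Fst and uses a simplified Nei-style estimator, (Ht - Hs) / Ht, with equal sample sizes; this is not necessarily the estimator or software used in the study. The function names, the locus label, and all frequencies are hypothetical.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity 1 - sum(p^2) at one locus in one population."""
    return 1.0 - sum(p * p for p in freqs.values())


def pairwise_fst(pop_a, pop_b):
    """Nei-style Fst = (Ht - Hs) / Ht, summed over loci (ratio of sums).

    pop_a and pop_b map locus name -> {allele: frequency}.
    Equal sample sizes are assumed, so pooled frequencies are plain means.
    """
    hs_sum = 0.0
    ht_sum = 0.0
    for locus in pop_a:
        fa, fb = pop_a[locus], pop_b[locus]
        # Hs: mean within-population heterozygosity.
        hs_sum += (expected_heterozygosity(fa) + expected_heterozygosity(fb)) / 2.0
        # Ht: heterozygosity of the pooled population.
        alleles = set(fa) | set(fb)
        pooled = {a: (fa.get(a, 0.0) + fb.get(a, 0.0)) / 2.0 for a in alleles}
        ht_sum += expected_heterozygosity(pooled)
    if ht_sum == 0.0:
        return 0.0  # both populations fixed for the same alleles
    return (ht_sum - hs_sum) / ht_sum


# Hypothetical allele frequencies at a single locus (illustrative only).
pop_x = {"locus_1": {"8": 0.55, "9": 0.15, "11": 0.30}}
pop_y = {"locus_1": {"8": 0.50, "9": 0.20, "11": 0.30}}
print(round(pairwise_fst(pop_x, pop_y), 4))
```

A matrix of such pairwise values over all populations is the kind of input a neighbor-joining algorithm takes; small values indicate populations that are hard to tell apart by allele frequencies alone.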
Predictions of Y haplogroups
No existing algorithm can reliably distinguish identical or very similar haplotypes and assign them to different haplogroups, and the convergence of Y chromosome STR haplotypes among different haplogroups has compromised the accuracy of haplogroup prediction. Hence, to make more precise predictions, we applied our in‐house database, which included 37,754 pieces of Y SNP/STR data and 109,142 Y‐STR entries in total, mainly from East and Southeast Asia (Wang et al., 2015).
In the present study, the profiles of 27 Y‐STRs used for the Y‐haplogroup estimation were collected from 185 Ha Hlai males, a subset of the 422 Hainan Hlai males (185/422) included in our previous study (Fan, Wang, Chen, Zhang, et al., 2018). As a result, a Y‐haplogroup was assigned to 178 of the 185 genotyped Y‐STR profiles (96.22%), and four haplogroups, all belonging to the major clades O1 and O2, were observed in our Ha Hlai samples. In descending order of frequency, the haplogroups were O1b1a1a‐M95 (56.18%), O1a‐M119 (29.21%), O1b1a1a1a1a1‐P203 (10.67%), and O2‐M122 (3.93%), named according to ISOGG 2019.
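As a quick consistency check on the figures above, the short sketch below recomputes the percentages. It shows that the four haplogroup frequencies are expressed relative to the 178 assigned profiles rather than all 185 genotyped males. The per-haplogroup counts are back-calculated from the reported percentages and are an inference, not values reported in the text.

```python
# Figures quoted in the text.
genotyped = 185
assigned = 178
reported_pct = {
    "O1b1a1a-M95": 56.18,
    "O1a-M119": 29.21,
    "O1b1a1a1a1a1-P203": 10.67,
    "O2-M122": 3.93,
}

# Back-calculated counts (an inference, not reported in the text).
counts = {hg: round(pct * assigned / 100) for hg, pct in reported_pct.items()}

# Recompute the percentages from the inferred counts.
assignment_rate = round(100 * assigned / genotyped, 2)
recomputed_pct = {hg: round(100 * n / assigned, 2) for hg, n in counts.items()}

print(counts)
print(assignment_rate, recomputed_pct)
```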
Our results were in accordance with previous studies (Li, Li, et al., 2008; Sun, Yang, Ou, Chen, et al., 2007): O1b1a1a‐M95 was the predominant haplogroup in the Ha Hlai population, consistent with reported M95 proportions ranging from 46% to 91.94% in different branches of Hainan Hlai.
To discern the detailed relationship between the Ha Hlai and other populations from different language families, we performed a PCA which included 193 populations (Cadenas, Zhivotovsky, Cavalli‐Sforza, Underhill, & Herrera, 2008; Cai et al., 2011; Chennakrishnaiah et al., 2013; Deng et al., 2013; Fan et al., 2020; Fan, Wang, Chen, Long, et al., 2018; Fan, Zhang, et al., 2018; Gan et al., 2008; Gayden et al., 2007; Khurana et al., 2014; Kwon, Lee, Lee, Yang, & Shin, 2015; Li, Wen, et al., 2008; Regueiro, Cadenas, Gayden, Underhill, & Herrera, 2006; Rowold et al., 2016, 2019; Wells et al., 2001; Wen et al., 2004) from all over the world with eight language families, 11,197 individuals in total (dataset 1). PC1, PC2, and PC3 accounted for 13.97%, 10.83%, and 8.91% of the total variance, respectively. In the plots of Figure 6a,b, populations from different language families were relatively well separated, and Hainan Hlai and Ha Hlai clustered together and had close relationships with the Sino‐Tibetan and Austroasiatic clusters. In addition, we collected all Sino‐Tibetan and Austroasiatic populations (dataset 2) to examine the genetic differences further; as shown in Figure 6c,d, Ha Hlai clustered with Hainan Hlai and was located in the Tai‐Kadai‐Austroasiatic cluster. The M95 haplogroup was shown to be prevalent in Austroasiatic language‐speaking populations in Southeast Asia (74%‐87%) (Arunkumar et al., 2015; Majumder, 2010; Tattersall, 2009; Zhang, Liao, et al., 2015).
All Austroasiatic populations in our datasets belonged to the Mon‐Khmer group, which is spread over Southeast Asia (Vietnam and Cambodia) and the Yunnan and Guangxi provinces (Sciences, 2012; Yingming, 2010). However, Hlai localism is classified linguistically as a Tai‐Kadai language, and the Ha Hlai inhabiting Hainan Island are geographically close to our in‐house Mon‐Khmer populations. Therefore, Ha Hlai, which has a close genetic and geographical relationship with Hainan Hlai, is a Tai‐Kadai language‐speaking population, and M95, which is ubiquitous in Mon‐Khmer populations, is the dominant haplogroup in Ha Hlai (56.18%).
Figure 6
Principal component analyses based on the frequencies of Y haplogroups in dataset 1 (193 populations) and dataset 2 (118 populations). (Dataset 1: a. PC1 and PC2, b. PC2 and PC3; Dataset 2: c. PC1 and PC2, d. PC2 and PC3)
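The idea of a PCA on haplogroup frequencies can be sketched as follows: each population is a row of haplogroup frequencies, the columns are mean-centered, and the leading eigenvector of the covariance matrix gives PC1 together with its share of the total variance. This is a minimal, dependency-free sketch that extracts only the first component by power iteration; it is not the software used in the study, and the population names and frequencies are hypothetical.

```python
import math


def center_columns(matrix):
    """Subtract each column mean so every haplogroup has mean frequency 0."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[x - m for x, m in zip(row, means)] for row in matrix]


def covariance(centered):
    """Sample covariance matrix (haplogroups x haplogroups)."""
    n, k = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(k)]
        for i in range(k)
    ]


def leading_component(cov, iterations=1000):
    """Power iteration for the largest eigenvalue and its unit eigenvector."""
    k = len(cov)
    # Non-constant start vector: when every row of frequencies sums to 1,
    # a constant vector lies in the null space of the covariance matrix.
    v = [float(i + 1) for i in range(k)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("no variance along the start vector")
        v = [x / norm for x in w]
    cov_v = [sum(cov[i][j] * v[j] for j in range(k)) for i in range(k)]
    eigenvalue = sum(a * b for a, b in zip(v, cov_v))
    return eigenvalue, v


# Hypothetical haplogroup frequencies (rows: populations, columns: haplogroups).
names = ["pop_A", "pop_B", "pop_C", "pop_D"]
freqs = [
    [0.60, 0.25, 0.15],
    [0.50, 0.30, 0.20],
    [0.10, 0.20, 0.70],
    [0.15, 0.25, 0.60],
]

centered = center_columns(freqs)
cov = covariance(centered)
eigenvalue, pc1 = leading_component(cov)
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
scores = {
    name: sum(x * w for x, w in zip(row, pc1))
    for name, row in zip(names, centered)
}

print(f"PC1 explains {explained:.1%} of the total variance")
print(scores)
```

Populations with similar haplogroup compositions receive similar PC1 scores, which is what places them close together in a PCA plot. A full analysis would extract further components (PC2, PC3) in the same way, typically with a linear algebra library.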
CONCLUSIONS
In the present study, the first batch of 23 autosomal STR profiles of Ha Hlai, one important branch of Hainan Li (Hlai), was obtained and reported by genotyping 657 Ha Hlai individuals (497 males and 160 females) with the Huaxia™ Platinum PCR Amplification System. In total, 271 distinct alleles were observed at the 23 STR loci, with allelic frequencies ranging from 0.0008 to 0.5533. Hainan Hlai showed a higher number of alleles at most loci; however, there was no apparent difference in forensic‐associated parameters or overall system effectiveness compared with Ha Hlai. The CPE and CPD were 1 − 7.39 × 10^−10 and 1 − 3.13 × 10^−28, respectively. The analyses and comparisons of these allelic frequencies and forensic statistical parameters demonstrated that the 23 autosomal STR genetic markers included in the Huaxia™ Platinum PCR Amplification System were highly polymorphic and potentially useful for forensic applications in the Hainan Ha Hlai population. In addition, we applied the polymorphic genetic markers to explore the genetic relationships between the studied Ha Hlai and other reference populations with different language families. The phylogenetic analyses indicated that Ha Hlai is a Tai‐Kadai language‐speaking and relatively isolated population which has a relatively close relationship with Hainan Hlai, and that M95 is the dominant haplogroup in Ha Hlai (56.18%). These results demonstrated that gene flow on a small geographic scale, especially in multi‐ethnic mixed areas, could not be ignored, and that the unique gene pool of each population was shaped by the combined effects of geographic, language, and cultural isolation.
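For reference, combined powers of this kind follow the standard definitions CPD = 1 − ∏(1 − PD_i) and CPE = 1 − ∏(1 − PE_i) over independent loci. The sketch below computes the product term directly, because a value such as 1 − 3.13 × 10^−28 rounds to exactly 1.0 in double precision, which is also why such results are written in the form "1 − x". The function name and the per-locus values are hypothetical and are not the values obtained in this study.

```python
import math


def combined_complement(per_locus_powers):
    """Return prod(1 - x_i); the combined power is 1 minus this product."""
    return math.prod(1.0 - x for x in per_locus_powers)


# Hypothetical per-locus power of discrimination values for 23 loci.
pd_values = [0.94] * 23
complement = combined_complement(pd_values)

print(f"CPD = 1 - {complement:.2e}")
```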
CONFLICT OF INTEREST
The authors declare that they have no conflicts of interest.
AUTHOR CONTRIBUTIONS
Conceptualization, Haoliang Fan and Pingming Qiu; Resources, Wenhui Li, Xianwen Wang, and Xiehong Wang; Software, Haoliang Fan; Investigation, Wenhui Li, Xianwen Wang, Xiehong Wang, Fenfen Wang, Zhengming Du, Shuya Wang, and Ziqing Mu; Validation, Chunwei Chen and Xiaomin Hu; Formal Analysis, Haoliang Fan, Fangshu Fu, Wenlong Wu, Jiuyang Ding, and Yunle Meng; Data Curation, Wenhui Li and Xianwen Wang; Writing – Original Draft Preparation, Haoliang Fan; Writing – Review & Editing, Haoliang Fan; Visualization, Haoliang Fan; Supervision, Pingming Qiu; Project Administration, Wenhui Li and Xianwen Wang; Funding Acquisition, Haoliang Fan, Fenfen Wang, Zhengming Du, Shuya Wang, and Pingming Qiu. All authors reviewed the manuscript.
Table S1‐S4
Authors: P Calzada; I Suárez; S García; C Barrot; C Sánchez; M Ortega; J Mas; E Huguet; J Corbella; M Gené Journal: Int J Legal Med Date: 2004-11-24 Impact factor: 2.686
Authors: Muhammad Adnan Shan; Manzoor Hussain; Muhammad Shafique; Muhammad Shahzad; Rukhsana Perveen; Muhammad Idrees Journal: Int J Legal Med Date: 2016-03-17 Impact factor: 2.686