Literature DB >> 32438591

Characterizing Y-STRs in the Evaluation of Population Differentiation Using the Mean of Allele Frequency Difference between Populations.

Yuxiang Zhou1, Yining Yao1, Baonian Liu1, Qinrui Yang1, Zhihan Zhou1, Chengchen Shao1, Shilin Li2, Qiqun Tang3, Jianhui Xie1.   

Abstract

Y-chromosomal short tandem repeats (Y-STRs) are widely used in human research for the evaluation of population substructure or population differentiation. Previous studies show that several haplotype sets can be used for the evaluation of population differentiation. However, little is known about whether each Y-STR in these sets performs well during this procedure. In this study, a total of 20,927 haplotypes of a Yfiler Plus set were collected from 41 global populations. Different configurations were observed in multidimensional scaling (MDS) plots based on pairwise genetic distances evaluated using a Yfiler set and a Yfiler Plus set, respectively. Subsequently, 23 single-copy Y-STRs were characterized in the evaluation of population differentiation using the mean of allele frequency difference (mAFD) between populations. Our results indicated that DYS392 had the largest mAFD value (0.3802) and YGATAH4 had the smallest value (0.1845). On the whole, larger pairwise genetic distances could be obtained using the set with the top fifteen markers from these 23 single-copy Y-STRs, and clear clustering or separation of populations could be observed in the MDS plot in comparison with those using the set with the minimum fifteen markers. In conclusion, the mAFD value is reliable to characterize Y-STRs for efficiency in the evaluation of population differentiation.

Entities:  

Keywords:  Y-STR; allele frequency difference; multidimensional scaling; pairwise genetic distance; population differentiation

Mesh:

Year:  2020        PMID: 32438591      PMCID: PMC7290957          DOI: 10.3390/genes11050566

Source DB:  PubMed          Journal:  Genes (Basel)        ISSN: 2073-4425            Impact factor:   4.096


1. Introduction

Over the past decade, Y-chromosomal short tandem repeats (Y-STRs) have been widely used in human research and forensic practice to supplement information retrieved from autosomal DNA profiling. Y-STRs can be applied for acquiring male genotypes from stains of unbalanced male/female DNA mixture [1,2,3], for testing paternal kinship [3,4], for determining paternal biogeographic ancestry of missing persons [3] and so on. While the primary Y-STR set, namely the minimal haplotype [5], allows the genotyping of only 9 loci, the Yfiler Plus PCR amplification kit, released in 2014, can simultaneously detect 17 Y-STR loci adopted in the Yfiler kit and 10 new Y-STR loci (DYS481, DYS460, DYS533, DYS449, DYS576, DYS627, DYS518, DYS570 and DYF387S1). As for these new markers, there are seven rapidly mutating Y-STR loci and three highly polymorphic Y-STR loci [6]. Because of the male-specific inheritance pattern and haploid nature of the Y chromosome, it is more sensitive to genetic drift and founder effect for Y-linked markers than for autosomal ones. Y-STRs are widely used for the evaluation of population substructure or population differentiation by testing pairwise genetic distances. The clustering or separation of populations can further be visualized using multidimensional scaling (MDS) analysis based on the similarity of pairwise genetic distances between populations. Previous studies show that several haplotype sets, such as the minimal haplotype set, Yfiler set, PowerPlex Y23 set and Yfiler Plus set, can be used in the evaluation of population differentiation [7,8,9,10,11]. In fact, compared with the original population, allelic configuration of Y-STRs in newly established populations might vary from locus to locus. Therefore, the allelic configurations of Y-STRs in newly established populations might be similar to or dramatically different from that in the original population. Generally, the single-locus genetic diversity and haplotype diversity are used to evaluate discrimination capacity of Y-STR markers and marker sets in populations, respectively. Recently, Shannon’s equivocation was reported to quantify the allelic association between Y-STRs to obtain maximally discriminatory marker sets [12]. In contrast, investigations are rarely conducted to characterize Y-STRs for efficiency in the evaluation of population differentiation. In this study, we characterized 23 single-copy Y-STRs adopted in a Yfiler Plus set in the evaluation of population differentiation using the mean of allele frequency difference (mAFD) between populations. Haplotype profiles of the Yfiler Plus set were compiled from 41 global populations, and an evaluation of population differentiation using the Yfiler set and Yfiler Plus set was performed. Further, the mAFD values of 23 Y-STR loci between 41 populations were calculated, respectively, and the reliability of the mAFD to characterize Y-STRs in the evaluation of population differentiation was estimated based on pairwise genetic distances and MDS analysis from a series of fifteen-marker sets.

2. Materials and Methods

2.1. Data-Set Collection

A total of 41 populations with haplotypes of 27 Y-STR loci in a Yfiler Plus set were collected (Table S1). To facilitate the subsequent analysis, haplotypes with null alleles, intermediate alleles, duplicated or triplicatd alleles were removed, which resulted in a final collection of 20,927 haplotypes. Because DYS389I is a part of DYS389II, each DYS389I allele was subtracted from the DYS389II allele to obtain another part of DYS389II.

2.2. Calculation of the Mean of Allele Frequency Difference

The allele frequency difference (AFD) of each Y-STR locus between two populations was defined as follows according to previous reports [13]: where n is the total number of alleles at a locus, and f-terms denote the frequency of the ith allele in the two populations. As for every Y-STR, the mean of the AFD across all population pairs was obtained based on the AFD for every population pair. Here, the mAFD values of every single-copy Y-STRs were computed using an in-house Python script.

2.3. The Evaluation of Population Differentiation

The evaluation of population differentiation using a Yfiler set and a Yifler Plus set was carried out. Pairwise genetic distances of these two marker sets were gauged using Arlequin v3.5 [14], and a line chart was produced simultaneously. In the assessment of pairwise genetic distances, p values were calculated at a significant level of 0.05 using 10,000 permutations. In addition, MDS analysis estimating similarity quantitatively among populations was performed with the software IBM SPSS Statistics version 22 (IBM Corp., Armonk, NY, USA). The output of MDS is a plot that reveals the relational structures of objects, where similar objects cluster together, and dissimilar ones are far from each other. For validating the efficacy of the mAFD in the evaluation of population differentiation, nine marker sets containing 15 Y-STRs were constructed along with the stepwise reduction of mAFD value (step size: one marker) using the 23 Y-STRs evaluated. Pairwise genetic distances between populations for these Y-STR sets were quantified and compared. To visualize the differences in pairwise genetic distances between populations, MDS analysis of all marker sets was conducted.

3. Results and Discussion

3.1. Evaluation of Population Differentiation Using a Yfiler Set and a Yfiler Plus Set

To evaluate the population differentiation, the pairwise genetic distances between 41 populations were calculated based on a Yfiler set and Yfiler Plus set, respectively, and population differentiation was visualized in a MDS plot. On the whole, the distribution of populations was in accordance with the biogeographic pattern based on pairwise genetic distances tested using a Yfiler set or a Yfiler Plus set (Figure 1A,B). However, the difference could be observed between these two MDS plots. For example, the distribution of seven populations marked in red circles in Figure 1A,B, including Kazakh_Kazakhstan, UpperAustrian, Austrian_Salzburg, Belgian, Italian_Sardinia, NorthernItalian and SouthAustralian, was scattered in the MDS plot of the Yfiler set, while two clusters could be clearly observed in the MDS plot of the Yfiler Plus set (Figure 1A,B). Generally, pairwise genetic distances between populations decrease with an increasing number of Y-STRs in a haplotype set (Figure 2) [15,16]. However, approximately one-third of pairwise genetic distances evaluated based on the Yfiler Plus set showed an increase compared with those using the Yfiler set (Tables S2 and S3), which might contribute to the clustering or separation of populations in MDS analysis.
Figure 1

The multidimensional scaling (MDS) plots obtained based on pairwise genetic distances between populations with a Yfiler set (A) and a Yfiler Plus set (B), respectively.

Figure 2

Pairwise populations are ranked in an ascending order of pairwise genetic distances with a Yfiler set and are set as the horizontal axis. Pairwise genetic distances with a Yfiler set and a Yfiler Plus set are indicated, respectively.

In this study, the same population sets were employed to evaluate population differentiation using a Yfiler set and a Yfiler Plus set, which can reduce the bias from sampling using different population sets. The Yfiler Plus set seems to more clearly cluster or separate populations than the Yfiler set in MDS analysis (Figure 1A,B), which shows that the similarity of the population can be different using different marker sets for the evaluation of population differentiation even though the same datasets were used. Therefore, these results imply that each Y-STRs might present different genetic diversity across these populations, which leads to an increase or decrease of pairwise genetic distances when adding new Y-STR loci into a Yfiler set.

3.2. Assessment of the mAFD Values of Y-STRs

To characterize Y-STRs in the evaluation of population differentiation, the mAFD values of 23 Y-STRs were calculated as described in Material and Methods section. As shown in Table 1, the largest mAFD value was observed at DYS392 (mAFD = 0.3802), followed by DYS438 (mAFD = 0.3507) and DYS635 (mAFD = 0.3379). The smallest two ones were at YGATAH4 (mAFD = 0.1845) and DYS391 (mAFD = 0.1891). Within the top 15 loci, 9 loci were from Yfiler set and 6 loci were from 10 newly added loci in the Yfiler Plus set (Table 1). While the values of mAFD vary from 0 to 1 theoretically, according to the evaluation formula, none was above 0.4 in this study, suggesting that the genetic variance between human populations was not remarkable for a single locus. The mAFD should imply the degree of variation for one Y-STR in populations and might represent the power that contributes to the haplotype set in the evaluation of population differentiation. It should be noted that the value of the mAFD mostly relied on tested populations and could be varied because of the number and the distribution patterns of populations tested. In this study, a set of global populations was sampled to diminish the deviation in the evaluation of the mAFD of every Y-STR.
Table 1

Allele frequency difference (mAFD) values and mutation rates of 23 Y-chromosomal short tandem repeats (Y-STRs).

LocimAFDMutation Rates #Binomial 95%CI
1 DYS392 0.38024.96 × 10−42.14 × 10−4–9.77 × 10−4
2 DYS438 0.35073.51 × 10−40.96 × 10−4–8.99 × 10−4
3 DYS635 0.33794.21 × 10−32.97 × 10−3–5.80 × 10−3
4DYS4810.33594.88 × 10−32.52 × 10−3–8.51 × 10−3
5DYS4490.32719.44 × 10−35.78 × 10−3–1.45 × 10−2
6 DYS448 0.32571.39 × 10−36.92 × 10−4–2.48 × 10−3
7DYS5330.31043.68 × 10−31.68 × 10−3–6.97 × 10−3
8 DYS19 0.30532.20 × 10−31.55 × 10−3–3.04 × 10−3
9 DYS393 0.28781.07 × 10−36.11 × 10−4–1.74 × 10−3
10DYS5180.27861.46 × 10−29.86 × 10−3–2.08 × 10−2
11DYS6270.26661.32 × 10−28.95 × 10−3–1.88 × 10−2
12 DYS390 0.25042.08 × 10−31.44 × 10−3–2.91 × 10−3
13 DYS389I 0.24942.72 × 10−31.96 × 10−3–3.69 × 10−3
14DYS5700.23721.14 × 10−27.67 × 10−3–1.62 × 10−2
15 DYS458 0.23456.17 × 10−34.57 × 10−3–8.15 × 10−3
16DYS4600.22785.82 × 10−32.80 × 10−3–1.69 × 10−2
17 DYS437 0.22731.32 × 10−37.39 × 10−4–2.18 × 10−3
18 DYS439 0.22115.46 × 10−34.19 × 10−3–6.99 × 10−3
19DYS5760.21361.26 × 10−28.86 × 10−3–1.73 × 10−2
20 DYS389II 0.21314.33 × 10−33.34 × 10−3–5.51 × 10−3
21 DYS456 0.21064.41 × 10−33.07 × 10−3–6.12 × 10−3
22 DYS391 0.18912.53 × 10−31.82 × 10−3–3.43 × 10−3
23 YGATAH4 0.18453.01 × 10−31.98 × 10−3–4.38 × 10−3

# The data of mutation rates is from the Y Chromosome Haplotype Reference Database (YHRD) website (release 61, Jun 2019) [19]. Y-STRs in the Yfiler set are indicated in bold.

In this study, the mAFD is described to evaluate the variance of genetic markers in populations, which is somewhat similar to Rogers’ distance [17,18]. Rogers’ distance is used to measure genetic distance between two populations using the variance of allelic frequency of genetic markers. In contrast, the mAFD aims to evaluate the variance of genetic markers across populations rather than genetic distance between populations. The evaluation of Rogers’ distance can be impaired due to the increased number of alleles [18]. As for the mAFD, it represents an overall diversity across all tested populations. Therefore, the order of these markers ranked using mAFD values should not be drastically modified by the addition of several new populations. The difference of allelic frequency between populations might be affected by many factors, such as founder effect, genetic drift and mutation rate. The mutation events of each Y-STR occur independently in the population, although the evaluation of genetic distances is based on the Y-STR haplotypes. A low mutation rate for Y-STRs might make the difference of allelic frequency more sensitive to genetic drift and founder effect. In contrast, a high mutation rate for Y-STRs might help to rapidly establish a permanent allelic configuration in the population. In this study, the mutation rate did not show a close correlation with the mAFD (Figure 3). Similarly, the single-locus genetic diversity obtained from all tested populations did not show a close correlation with the mAFD (data not shown). Therefore, genetic drift and founder effect might be important factors for the mAFD.
Figure 3

The mAFD values and mutation rates of 23 Y-STRs. Blue line: mAFD; Orange line: Mutation rates.

3.3. Analysis of Y-STR Marker Sets in the Population Differentiation

To validate the efficacy of the mAFD in the evaluation of population differentiation, 23 Y-STR markers were ranked in a descending order based on the mAFD, and nine sets were established using a sliding-set method with a set of 15 markers and a step size of one marker. The pairwise genetic distances calculated using these sets could be varied (Tables S4–S12), and the tendency was observed that pairwise genetic distances increased with the application of the marker sets with larger mAFD values (Figure 4). It should be noted that a small number of pairwise genetic distances was decreased when a marker with a low mAFD value was replaced by one with a high mAFD value, which suggests that the replacement with the marker with the high mAFD value could result in a lower genetic diversity of haplotypes between two populations.
Figure 4

Pairwise populations are ranked in an ascending order of pairwise genetic distances with the 15-marker set of the minimum mAFD values, and are set as the horizontal axis. Pairwise genetic distances with the 15-marker set of the largest mAFD values and of the minimum mAFD values are indicated, respectively.

To visualize population differentiation, MDS analysis was performed based on pairwise genetic distances obtained using the sets with the top fifteen markers and the minimum fifteen markers, respectively. As shown in Figure 5A, the set with the top fifteen markers could clearly cluster or separate tested populations according to biogeographic patterns, which was similar to the performance of the Yfiler Plus set. In contrast, the set with the minimum fifteen markers did not effectively differentiate these tested populations (Figure 5B). It might be that genetic distances obtained using the set with the top fifteen markers have a relatively slight increase tendency between close populations and a relatively large increase tendency between distant populations, in comparison to those obtained using the set with the minimum fifteen markers (Figure 4). Altogether, these results demonstrate that the use of the mAFD to characterize Y-STRs in the evaluation of population differentiation is reliable.
Figure 5

The MDS plots obtained based on pairwise genetic distances between populations with the marker sets of the largest (A) and the minimum (B) mAFD values, respectively.

4. Conclusions

Genetic distance is defined as the extent of genetic differentiation between populations or species. Previous studies demonstrate the capacity of several haplotype sets in the evaluation of population differentiation through genetic distance, and little attention has been paid to the selection of genetic markers. Unlike autosomal markers, which can reach Hardy–Weinberg equilibrium, Y-STRs are paternally inherited. Currently, it remains unknown whether a Y-STR locus in a haplotype is suitable for the evaluation of population differentiation. Although the method based on the mAFD value for the evaluation of single Y-STR is relatively simple in this study, our results show that the mAFD is suitable for characterizing Y-STRs in the evaluation of population differentiation. In the future, the introduction of more global populations can improve the evaluation of the mAFD value for each Y-STR.
  18 in total

1.  Mutability of Y-chromosomal microsatellites: rates, characteristics, molecular bases, and forensic implications.

Authors:  Kaye N Ballantyne; Miriam Goedbloed; Rixun Fang; Onno Schaap; Oscar Lao; Andreas Wollstein; Ying Choi; Kate van Duijn; Mark Vermeulen; Silke Brauer; Ronny Decorte; Micaela Poetsch; Nicole von Wurmb-Schwark; Peter de Knijff; Damian Labuda; Hélène Vézina; Hans Knoblauch; Rüdiger Lessig; Lutz Roewer; Rafal Ploski; Tadeusz Dobosz; Lotte Henke; Jürgen Henke; Manohar R Furtado; Manfred Kayser
Journal:  Am J Hum Genet       Date:  2010-09-10       Impact factor: 11.025

2.  Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows.

Authors:  Laurent Excoffier; Heidi E L Lischer
Journal:  Mol Ecol Resour       Date:  2010-03-01       Impact factor: 7.090

3.  Evaluation of Y-chromosomal STRs: a multicenter study.

Authors:  M Kayser; A Caglià; D Corach; N Fretwell; C Gehrig; G Graziosi; F Heidorn; S Herrmann; B Herzog; M Hidding; K Honda; M Jobling; M Krawczak; K Leim; S Meuser; E Meyer; W Oesterreich; A Pandya; W Parson; G Penacino; A Perez-Lezaun; A Piccinini; M Prinz; C Schmitt; L Roewer
Journal:  Int J Legal Med       Date:  1997       Impact factor: 2.686

Review 4.  The Y chromosome in forensic analysis and paternity testing.

Authors:  M A Jobling; A Pandya; C Tyler-Smith
Journal:  Int J Legal Med       Date:  1997       Impact factor: 2.686

5.  Genetic polymorphism investigation of the Chinese Yi minority using PowerPlex® Y23 STR amplification system.

Authors:  GuangLin He; PengYu Chen; Xing Zou; Xu Chen; Feng Song; Jing Yan; YiPing Hou
Journal:  Int J Legal Med       Date:  2017-01-16       Impact factor: 2.686

6.  Genetic polymorphisms of 17 Y-chromosomal STRs in the Chengdu Han population of China.

Authors:  Hui Wang; Jiong Mao; Yu Xia; Xiaogang Bai; Wenqing Zhu; Duo Peng; Weibo Liang
Journal:  Int J Legal Med       Date:  2016-12-12       Impact factor: 2.686

7.  Accuracy of estimated phylogenetic trees from molecular data. II. Gene frequency data.

Authors:  M Nei; F Tajima; Y Tateno
Journal:  J Mol Evol       Date:  1983       Impact factor: 2.395

8.  Forensic characteristics and phylogenetic analysis of Hubei Han population in central China using 17 Y-STR loci.

Authors:  Zheng Wang; Weian Du; Guanglin He; Jing Liu; Yiping Hou
Journal:  Forensic Sci Int Genet       Date:  2017-04-20       Impact factor: 4.882

9.  A global analysis of Y-chromosomal haplotype diversity for 23 STR loci.

Authors:  Josephine Purps; Sabine Siegert; Sascha Willuweit; Marion Nagy; Cíntia Alves; Renato Salazar; Sheila M T Angustia; Lorna H Santos; Katja Anslinger; Birgit Bayer; Qasim Ayub; Wei Wei; Yali Xue; Chris Tyler-Smith; Miriam Baeta Bafalluy; Begoña Martínez-Jarreta; Balazs Egyed; Beate Balitzki; Sibylle Tschumi; David Ballard; Denise Syndercombe Court; Xinia Barrantes; Gerhard Bäßler; Tina Wiest; Burkhard Berger; Harald Niederstätter; Walther Parson; Carey Davis; Bruce Budowle; Helen Burri; Urs Borer; Christoph Koller; Elizeu F Carvalho; Patricia M Domingues; Wafaa Takash Chamoun; Michael D Coble; Carolyn R Hill; Daniel Corach; Mariela Caputo; Maria E D'Amato; Sean Davison; Ronny Decorte; Maarten H D Larmuseau; Claudio Ottoni; Olga Rickards; Di Lu; Chengtao Jiang; Tadeusz Dobosz; Anna Jonkisz; William E Frank; Ivana Furac; Christian Gehrig; Vincent Castella; Branka Grskovic; Cordula Haas; Jana Wobst; Gavrilo Hadzic; Katja Drobnic; Katsuya Honda; Yiping Hou; Di Zhou; Yan Li; Shengping Hu; Shenglan Chen; Uta-Dorothee Immel; Rüdiger Lessig; Zlatko Jakovski; Tanja Ilievska; Anja E Klann; Cristina Cano García; Peter de Knijff; Thirsa Kraaijenbrink; Aikaterini Kondili; Penelope Miniati; Maria Vouropoulou; Lejla Kovacevic; Damir Marjanovic; Iris Lindner; Issam Mansour; Mouayyad Al-Azem; Ansar El Andari; Miguel Marino; Sandra Furfuro; Laura Locarno; Pablo Martín; Gracia M Luque; Antonio Alonso; Luís Souto Miranda; Helena Moreira; Natsuko Mizuno; Yasuki Iwashima; Rodrigo S Moura Neto; Tatiana L S Nogueira; Rosane Silva; Marina Nastainczyk-Wulf; Jeanett Edelmann; Michael Kohl; Shengjie Nie; Xianping Wang; Baowen Cheng; Carolina Núñez; Marian Martínez de Pancorbo; Jill K Olofsson; Niels Morling; Valerio Onofri; Adriano Tagliabracci; Horolma Pamjav; Antonia Volgyi; Gusztav Barany; Ryszard Pawlowski; Agnieszka Maciejewska; Susi Pelotti; Witold Pepinski; Monica Abreu-Glowacka; Christopher Phillips; Jorge Cárdenas; Danel Rey-Gonzalez; Antonio Salas; Francesca Brisighelli; Cristian Capelli; Ulises Toscanini; Andrea Piccinini; Marilidia Piglionica; Stefania L Baldassarra; Rafal Ploski; Magdalena Konarzewska; Emila Jastrzebska; Carlo Robino; Antti Sajantila; Jukka U Palo; Evelyn Guevara; Jazelyn Salvador; Maria Corazon De Ungria; Jae Joseph Russell Rodriguez; Ulrike Schmidt; Nicola Schlauderer; Pekka Saukko; Peter M Schneider; Miriam Sirker; Kyoung-Jin Shin; Yu Na Oh; Iulia Skitsa; Alexandra Ampati; Tobi-Gail Smith; Lina Solis de Calvit; Vlastimil Stenzl; Thomas Capal; Andreas Tillmar; Helena Nilsson; Stefania Turrina; Domenico De Leo; Andrea Verzeletti; Venusia Cortellini; Jon H Wetton; Gareth M Gwynne; Mark A Jobling; Martin R Whittle; Denilce R Sumita; Paulina Wolańska-Nowak; Rita Y Y Yong; Michael Krawczak; Michael Nothnagel; Lutz Roewer
Journal:  Forensic Sci Int Genet       Date:  2014-04-28       Impact factor: 4.882

10.  Allele Frequency Difference AFD⁻An Intuitive Alternative to FST for Quantifying Genetic Population Differentiation.

Authors:  Daniel Berner
Journal:  Genes (Basel)       Date:  2019-04-18       Impact factor: 4.096

View more
  1 in total

1.  Special Issue "Forensic Genetics and Genomics".

Authors:  Emiliano Giardina; Michele Ragazzo
Journal:  Genes (Basel)       Date:  2021-01-25       Impact factor: 4.096

  1 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.