Tunde I Huszar, Mark A Jobling, Jon H Wetton.
Abstract
Short tandem repeats on the male-specific region of the Y chromosome (Y-STRs) are permanently linked as haplotypes, and therefore Y-STR sequence diversity can be considered within the robust framework of a phylogeny of haplogroups defined by single nucleotide polymorphisms (SNPs). Here we use massively parallel sequencing (MPS) to analyse the 23 Y-STRs in Promega's prototype PowerSeq™ Auto/Mito/Y System kit (containing the markers of the PowerPlex® Y23 [PPY23] System) in a set of 100 diverse Y chromosomes whose phylogenetic relationships are known from previous megabase-scale resequencing. Including allele duplications and alleles resulting from likely somatic mutation, we characterised 2311 alleles, demonstrating 99.83% concordance with capillary electrophoresis (CE) data on the same sample set. The set contains 267 distinct sequence-based alleles (an increase of 58% compared to the 169 detectable by CE), including 60 novel Y-STR variants phased with their flanking sequences which have not been reported previously to our knowledge. Variation includes 46 distinct alleles containing non-reference variants of SNPs/indels in both repeat and flanking regions, and 145 distinct alleles containing repeat pattern variants (RPV). For DYS385a,b, DYS481 and DYS390 we observed repeat count variation in short flanking segments previously considered invariable, and suggest new MPS-based structural designations based on these. We considered the observed variation in the context of the Y phylogeny: several specific haplogroup associations were observed for SNPs and indels, reflecting the low mutation rates of such variant types; however, RPVs showed less phylogenetic coherence and more recurrence, reflecting their relatively high mutation rates. 
In conclusion, our study reveals considerable additional diversity at the Y-STRs of the PPY23 set via MPS analysis, demonstrates high concordance with CE data, facilitates nomenclature standardisation, and places Y-STR sequence variants in their phylogenetic context.
Keywords: Massively parallel sequencing; PPY23; PowerSeq system; Repeat pattern variation (RPV); Single nucleotide polymorphism (SNP); Y-STRs
Year: 2018 PMID: 29679929 PMCID: PMC6010625 DOI: 10.1016/j.fsigen.2018.03.012
Source DB: PubMed Journal: Forensic Sci Int Genet ISSN: 1872-4973 Impact factor: 4.882
Fig. 1. Observed SNPs and indels in their phylogenetic context.
The phylogenetic tree to the left represents the relationships among 100 diverse Y chromosomes, based on 13,261 high-confidence Y-SNPs previously described [11]. Y-chromosome haplogroups are given in their shorthand formats (Table S1) to the right of the tree. Y-STR names are listed above. Variants are shaded in grey and represented by filled circles if internal to the repeat array, or unfilled diamonds if in the flanking region. Variants are described below, by rs# where available, or otherwise as ‘SNP’ or ‘indel’ (Table S3). Note that ‘multiple SNPs’ internal to DYS635 (which we regard as an RPV − see text) are found in 85/100 samples because the GRCh38 reference assembly carries the same derived state as superhaplogroup P, and hence all deeper-rooting clades bearing the ancestral state are considered as ‘alternative’ rather than ‘reference’ variants. Note that rs370750300 and rs375658920 are listed elsewhere as DYS481-associated SNPs, and thus included in the figure; however, we regard these as an RPV (see text).
Comparison of number of alleles for each Y-STR based on length only (as in CE) and on full sequence information (MPS).
| Y-STR | Count of length-based alleles | Count of sequence-based alleles | Increase in number of alleles (%) | Novel sequence variants in this study |
|---|---|---|---|---|
| DYS389II | 7 | 32 | 357.1 | 4 |
| DYS390 | 8 | 19 | 137.5 | 6 |
| DYS448 | 9 | 19 | 111.1 | 5 |
| DYS391 | 5 | 10 | 100.0 | 6 |
| DYS437 | 5 | 9 | 80.0 | 2 |
| DYS481 | 12 | 21 | 75.0 | 3 |
| DYS458 | 10 | 16 | 60.0 | 9 |
| DYS385a,b | 14 | 22 | 57.1 | 7 |
| DYS635 | 11 | 17 | 54.5 | 4 |
| DYS570 | 8 | 12 | 50.0 | 4 |
| DYS438 | 7 | 10 | 42.9 | 2 |
| DYS389I | 5 | 6 | 20.0 | 0 |
| DYS439 | 5 | 6 | 20.0 | 1 |
| DYS19 | 6 | 7 | 16.7 | 1 |
| DYS393 | 6 | 7 | 16.7 | 0 |
| Y-GATA-H4 | 6 | 7 | 16.7 | 1 |
| DYS533 | 8 | 9 | 12.5 | 2 |
| DYS643 | 9 | 10 | 11.1 | 2 |
| DYS392 | 8 | 8 | 0.0 | 0 |
| DYS456 | 5 | 5 | 0.0 | 0 |
| DYS549 | 6 | 6 | 0.0 | 0 |
| DYS576 | 9 | 9 | 0.0 | 1 |
| Total | 169 | 267 | 58.0 | 60 |
Abbreviations: Y-STR, Y-chromosomal short tandem repeat; CE, capillary electrophoresis; MPS, massively parallel sequencing.
STRs are listed in descending order of percentage increase in number of alleles, based on sequence-level information from MPS.
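The percentage column can be reproduced from the two allele counts. The following is a minimal Python sketch, assuming the increase is computed as (sequence-based − length-based) / length-based × 100 and rounded to one decimal place; the function name `percent_increase` is illustrative and not taken from the study.

```python
def percent_increase(length_based: int, sequence_based: int) -> float:
    """Relative gain in distinct alleles when moving from CE to MPS, in percent."""
    return round((sequence_based - length_based) / length_based * 100, 1)


# Counts copied from the table above: (length-based, sequence-based)
counts = {
    "DYS389II": (7, 32),
    "DYS390": (8, 19),
    "DYS385a,b": (14, 22),
    "Total": (169, 267),
}

# Assumed formula applied to each locus; values should match the table column
increases = {locus: percent_increase(ce, mps) for locus, (ce, mps) in counts.items()}

for locus, value in increases.items():
    print(f"{locus}: {value}%")
```

Under this assumption the overall figure of 58% quoted in the abstract follows from the totals (169 length-based versus 267 sequence-based alleles).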
Sum of isometric allele groups of 23 Y-STRs analysed by MPS found in the sample set.
| # of MPS alleles found per single CE allele | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|
| Total # of isometric allele groups | 41 | 13 | 2 | 1 | – | 1 | 1 | 1 |
| # of Y-STRs with isometric allele groups | 19 | 10 | 2 | 1 | – | 1 | 1 | 1 |
Abbreviations: Y-STR, Y-chromosomal short tandem repeat; MPS, massively parallel sequencing; CE, capillary electrophoresis.
Isometric allele groups are alleles with the same fragment length, but showing different sequences.
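As an illustration of this definition, the sketch below groups sequence-based allele names by their CE length prefix and keeps only lengths represented by more than one distinct sequence. It is a simplified example, not the authors' pipeline: the helper `isometric_groups` is hypothetical, and the input is limited to four allele names for a single locus taken from the list of novel variants in this document, so the group sizes do not correspond to the counts in the table above.

```python
from collections import defaultdict


def isometric_groups(alleles):
    """Group sequence-based allele names by their CE length prefix.

    Returns only the lengths represented by two or more distinct sequences.
    """
    by_length = defaultdict(set)
    for name in alleles:
        # 'CE23_AGAGAT[14]...' -> length label 'CE23', bracketed structure after '_'
        ce_length, _, structure = name.partition("_")
        by_length[ce_length].add(structure)
    return {ce: sorted(seqs) for ce, seqs in by_length.items() if len(seqs) > 1}


# Allele names for one locus, copied from the list of novel variants
observed = [
    "CE23_AGAGAT[14]N[42]AGAGAT[9]",
    "CE23_AGAGAT[15]N[42]AGAGAT[8]",
    "CE19_AGAGAT[13]N[42]AGAGAT[6]",
    "CE13_AGAGAT[5]N[42]AGAGAT[8]",
]

groups = isometric_groups(observed)
print(groups)
```

Here the two CE23 alleles have the same fragment length but different repeat block structures, so CE alone would report them as one allele.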
List of novel Y-STR sequence variants defined by MPS.
| Y-STR | Y-STR definition; Novel sequence variants | Observed # | Aspects of novelty |
|---|---|---|---|
| DYS19 | CE12_TCTA[13]a+b ccta[0] | 1 | SNP internal to repeat array, allele name is a + b − 1 for compatibility with CE |
| DYS385a,b | CE9_AAGG[5]GAAA[10] | 1 | new combination of repeat units; upstream flanking region previously considered non-variable, but shows high level of variation in number of repeats; therefore here considered part of the repeat array as AAGG[5–9] ([23] also found AAGG[9]) |
| DYS385a,b | CE13_AAGG[5]GAAA[14] | 1 | as above |
| DYS385a,b | CE15_AAGG[5]GAAA[16] | 1 | as above |
| DYS385a,b | CE15_AAGG[8]GAAA[13] | 1 | as above |
| DYS385a,b | CE16_AAGG[8]GAAA[14] | 1 | as above |
| DYS385a,b | CE17_AAGG[5]GAAA[18] | 1 | as above |
| DYS385a,b | CE18_AAGG[7]GAAA[17] | 2 | as above |
| DYS389II | CE30_TAGA[11]CAGA[2]N[48]TAGA[13]CAGA[4] | 1 | shorter first CAGA array |
| DYS389II | CE30_TAGA[9]CAGA[3]N[48]TAGA[12]CAGA[6] | 2 | new combination of repeat units |
| DYS389II | CE31_TAGA[10]CAGA[3]N[48]TAGA[11]CAGA[1]TAGA[1]CAGA[5] | 1 | SNP internal to repeat array |
| DYS389II | CE34_TAGA[10]CAGA[3]N[48]TAGA[15]CAGA[6] | 1 | longer second TAGA array |
| DYS390 | CE22_TAGA[14]a+cCAGA[0]CAGA[8]TAGA[2] | 1 | SNP internal to repeat array |
| DYS390 | CE23_TAGA[5]CAGA[1]TAGA[9]CAGA[8]TAGA[2] | 1 | longer first TAGA array |
| DYS390 | CE24_TAGA[4]CAGA[1]TAGA[10]CAGA[10]TAGA[1] | 1 | longer second CAGA/shorter third TAGA array |
| DYS390 | CE24_TAGA[4]CAGA[1]TAGA[11]CAGA[7]TAGA[3] | 1 | longer third TAGA array |
| DYS390 | CE24_TAGA[4]CAGA[1]TAGA[11]CAGA[8]TAGA[1]GAGA[1] | 1 | SNP internal to repeat array |
| DYS390 | CE26_TAGA[4]CAGA[1]TAGA[12]CAGA[9]TAGA[2] | 2 | new combination of repeat units |
| DYS391 | CE8_TCTA[8]_+50C>A rs112815242 @11,982,182 M8738/CTS1866 | 2 | SNP in the flanking region |
| DYS391 | CE9_TCTA[9]_+50C>A rs112815242 @11,982,182 M8738/CTS1866 | 2 | SNP in the flanking region |
| DYS391 | CE10_TCTA[10]_+50C>A rs112815242 @11,982,182 M8738/CTS1866 | 2 | SNP in the flanking region |
| DYS391 | CE11_TCTA[11]_+50C>A rs112815242 @11,982,182 M8738/CTS1866 | 2 | SNP in the flanking region |
| DYS391 | CE11_TCTG[1]TCTA[10] | 1 | SNP internal to repeat array |
| DYS391 | CE12_TCTA[12]_+50C>A rs112815242 @11,982,182 M8738/CTS1866 | 1 | SNP in the flanking region |
| DYS437 | CE15_TCTG[1]TCTA[8]TCTG[2]TCTA[4] | 1 | SNP internal to repeat array |
| DYS437 | CE16_TCTA[6]TCTG[1]TCTA[3]TCTG[2]TCTA[4] | 1 | SNP internal to repeat array |
| DYS438 | CE8_TTTTC[8]_+21T>C rs761843885 @12,825,969 Z10613 | 1 | shorter array; SNP in the flanking region |
| DYS438 | CE11_TTTTC[11]_+7A>C rs760613324 @12,825,955 L255/PF4706 | 1 | SNP in the flanking region |
| DYS439 | CE11_GATA[11]_+3A>T SNP @12,403,567 | 1 | SNP in the flanking region |
| DYS448 | CE13_AGAGAT[5]N[42]AGAGAT[8] | 1 | shorter first AGAGAT array |
| DYS448 | CE19_AGAGAT[13]N[42]AGAGAT[6] | 1 | shorter second AGAGAT array |
| DYS448 | CE20.4_AGAGAT[3]AGAT[1]AGAGAT[9]N[42]AGAGAT[8] | 1 | indel in the repeat array |
| DYS448 | CE23_AGAGAT[14]N[42]AGAGAT[9] | 1 | new combination of repeat units |
| DYS448 | CE23_AGAGAT[15]N[42]AGAGAT[8] | 1 | longer first AGAGAT array |
| DYS458 | CE14_GAAA[13]GGAA[1] | 1 | SNP internal to repeat array |
| DYS458 | CE15_GAAA[14]GGAA[1] | 1 | SNP internal to repeat array |
| DYS458 | CE16_GAAA[15]GGAA[1] | 1 | SNP internal to repeat array |
| DYS458 | CE17_GAAA[17]_+32T>C rs549572931 @7,999,934 M11097 | 1 | SNP in the flanking region |
| DYS458 | CE17.2_GAAA[15]AA[1]GAAA[2] | 1 | indel in the repeat array |
| DYS458 | CE19_GAAA[19]_+32T>C rs549572931 @7,999,934 M11097 | 1 | SNP in the flanking region |
| DYS458 | CE19_GAAG[1]GAAA[18] | 1 | SNP internal to repeat array |
| DYS458 | CE19.2_GAAA[17]AA[1]GAAA[2] | 1 | indel in the repeat array |
| DYS458 | CE20_GAAA[19]GGAA[1] | 1 | SNP internal to repeat array |
| DYS481 | CE26_CTG[0]CTT[27] | 1 | new combination of repeat units |
| DYS481 | CE27_CTG[0]CTT[28] | 2 | new combination of repeat units |
| DYS481 | CE28_CTG[1]CTT[3]CCT[1]CTT[24] | 1 | SNP internal to repeat array |
| DYS533 | CE14.1_TATC[11]_−48.1->CTCTTCTAACTAT indel @16,281,301 | 1 | indel in the flanking region |
| DYS533 | CE15_TATC[15] | 1 | longer repeat unit in array |
| DYS570 | CE16_TTTC[16]_+4T>G rs763920632 @6,993,261 PH250 | 1 | SNP in the flanking region |
| DYS570 | CE17_TTCC[1]TTTC[16] | 1 | SNP internal to repeat array |
| DYS570 | CE17_TTTC[15]CTTC[1]TTTC[1] | 1 | SNP internal to repeat array |
| DYS570 | CE19_TTTC[5]TCTC[1]TTTC[13] | 1 | SNP internal to repeat array |
| DYS576 | CE17.1_AAAG[18]_+3AAA>− indel @7,185,388 | 1 | indel in the flanking region |
| DYS635 | CE18_TAGA[8]TACA[2]TAGA[2]TACA[2]TAGA[4] | 3 | new combination of repeat units |
| DYS635 | CE20_TAGA[8]CAGA[1]TAGA[1]TACA[2]TAGA[2]TACA[2]TAGA[4] | 1 | SNP internal to repeat array |
| DYS635 | CE21_TAGA[9]CAGA[1]TAGA[1]TACA[2]TAGA[2]TACA[2]TAGA[4] | 1 | SNP internal to repeat array |
| DYS635 | CE25_TAGA[14]TACA[3]TAGA[2]TACA[2]TAGA[4] | 1 | SNP internal to repeat array |
| DYS643 | CE11_CTTTT[11]_−7A>G SNP @15,314,125 | 1 | SNP in the flanking region |
| DYS643 | CE15_CTTTT[15] | 1 | longer repeat unit in array |
| Y-GATA-H4 | CE13_TCTA[13]_+36A>G SNP @16,631,756 Y15322/Z34275 | 1 | SNP in the flanking region |
Abbreviations: Y-STR, Y-chromosomal short tandem repeat; MPS, massively parallel sequencing; SNP, single nucleotide polymorphism; CE, capillary electrophoresis.
For DYS19, DYS385a,b, DYS390 and DYS481, uncounted repeat units are denoted with lower-case letters within the Y-STR definition. GRCh38 chrY genomic positions are noted after the ‘@’ signs. rs# or names of SNPs/indels are provided where available.
These sequence variants, in phase with their flanking sequences, to the best of our knowledge, have not been described in the literature previously [16,17,19–29]; strbase.nist.gov, accessed 02-Nov-2017. Comparison is detailed in Table S5.
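The bracketed allele names in this list can be split mechanically into repeat blocks. The following Python sketch is an illustration only: the helpers `parse_allele` and `counted_repeats` are hypothetical, and the rule that the CE allele number equals the sum of all blocks except `N[...]` spacers is an assumption that holds for the two examples shown but not for loci with uncounted repeat units (see the note above on DYS19, DYS385a,b, DYS390 and DYS481).

```python
import re

# One block = a motif of letters followed by a repeat count in square brackets
BLOCK = re.compile(r"([A-Za-z]+)\[(\d+)\]")


def parse_allele(name):
    """Split 'CE<length>_<blocks>' into the CE length string and (motif, count) pairs."""
    ce_part, _, structure = name.partition("_")
    blocks = [(motif, int(count)) for motif, count in BLOCK.findall(structure)]
    return ce_part[2:], blocks  # drop the leading 'CE'


def counted_repeats(blocks):
    """Sum repeat counts, skipping 'N' spacer blocks (assumed not to be counted)."""
    return sum(count for motif, count in blocks if motif != "N")


# Two allele names copied from the list above
ce_a, blocks_a = parse_allele("CE30_TAGA[11]CAGA[2]N[48]TAGA[13]CAGA[4]")
ce_b, blocks_b = parse_allele("CE18_TAGA[8]TACA[2]TAGA[2]TACA[2]TAGA[4]")

print(ce_a, blocks_a, counted_repeats(blocks_a))
print(ce_b, blocks_b, counted_repeats(blocks_b))
```

For both examples the counted repeats agree with the CE prefix (30 and 18), which is the kind of consistency check such a parser could support when comparing MPS and CE calls.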
Summary of MPS sequence variants showing variation in sequences previously considered as non-variable flanking regions.
| Y-STR | Allele | Observed # | General structure of alleles including variable flanking sequences | CE allele name designation | Examples in this study (sample) | Examples in this study (allele) |
|---|---|---|---|---|---|---|
| DYS385a,b | canonical | 193 | | n | CEU-NA12716 | CE11_ |
| DYS385a,b | variant | 4 | | n − 1 | kun-m82 | CE15_ |
| DYS385a,b | variant | 2 | | n + 1 | TSI-NA20805 | CE10_ |
| DYS385a,b | variant | 2 | | n + 2 | bkl-46 | CE15_ |
| DYS385a,b | variant | | | n + 3 | in [23] | |
| DYS481 | canonical | 87 | | n | CEU-NA12716 | CE23_ |
| DYS481 | variant | 9 | | n + 1 | tur-1 | CE21_ |
| DYS481 | variant | 4 | | n − 1 | bak-55 | CE27_ |
| DYS390 | canonical | 97 | TAGA[n]CAGA[o]TAGA[p]CAGA[q] | (n + o + p + q) | CEU-NA12716 | CE24_TAGA[4]CAGA[1]TAGA[11]CAGA[8] |
| DYS390 | variant | 2 | TAGA[n]CAGA[o]TAGA[p]CAGA[q] | (n + o + p + q) − 1 | bhu-1150 | CE24_TAGA[4]CAGA[1]TAGA[10]CAGA[10] |
| DYS390 | variant | 1 | TAGA[n]CAGA[o]TAGA[p]CAGA[q] | (n + o + p + q) + 1 | bav-55 | CE24_TAGA[4]CAGA[1]TAGA[11]CAGA[7] |
Abbreviations: MPS, massively parallel sequencing; Y-STR, Y-chromosomal short tandem repeat; CE, capillary electrophoresis.
The most frequent allele variants are denoted ‘canonical’; repeat units that show additional polymorphism are shown in bold.
Fig. 2. Examples of observed RPVs in their phylogenetic contexts.
A phylogenetic tree is shown to the left, as in Fig. 1. a) Allele structures for DYS635 in all 100 samples. Repeat unit sequences are shown above, and boxes below contain the number of repeat units in each block, coloured by heat-map from blue (shortest) to red (longest). Invariant blocks are not coloured. SNPs and indels are highlighted by green and orange boxes respectively. Bars on the right mark features specifically mentioned in the text, and are coloured black for monophyletic, or grey for polyphyletic examples. Below is represented the reference sequence allele structure (‘ref.’) in GRCh38 chrY. To fully appreciate the colours of the heat-map, please consult the online version of the figure. b) Allele structures for DYS389II; c) Allele structures for DYS481.