Literature DB >> 35784409

Sequencing of human identification markers in an Uyghur population using the MiSeq FGx^TM Forensic Genomics System.

Halimureti Simayijiang^1,2, Niels Morling¹, Claus Børsting¹.

Abstract

Massively parallel sequencing (MPS) offers a useful alternative to capillary electrophoresis (CE) based analysis of human identification markers in forensic genetics. By sequencing short tandem repeats (STRs) instead of determining the fragment lengths by CE, the sequence variation within the repeat region and the flanking regions may be identified. In this study, we typed 264 Uyghur individuals using the MiSeq FGx™ Forensic Genomics System and Primer Mix A of the ForenSeq™ DNA Signature Prep Kit that amplifies 27 autosomal STRs, 25 Y-STRs, seven X-STRs, and 94 HID-SNPs. STRinNGS v.1.0 and GATK 3.6 were used to analyse the STR regions and HID-SNPs, respectively. Increased allelic diversity was observed for 33 STRs with the PCR-MPS assay. The largest increases were found in DYS389II and D12S391, where the numbers of sequenced alleles were 3-4 times larger than those of alleles determined by repeat length alone. A relatively large number of flanking region variants (28 SNPs and three InDels) were observed in the Uyghur population. Seventeen of the flanking region SNPs were rare, and 12 of these SNPs had no accession number in dbSNP. The combined mean match probability and typical paternity index based on 26 sequenced autosomal STRs were 3.85E-36 and 1.49E + 16, respectively. This was 10 000 times lower and 1 000 times higher, respectively, than the same parameters calculated from STR repeat lengths.Key PointsSequencing data on STRs and SNPs used for human identification are presented for the Uyghur population.STRinNGS v.1.0 was used to analyse the flanking regions of STRs.The concordance between PCR-CE and PCR-MPS results was 99.86%.Detection of sequence variation in STRs and their flanking regions increased the allelic diversity.

Entities: Chemical

Keywords: ForenSeq™ DNA Signature Prep Kit; Forensic sciences; Uyghur; forensic genetics; massively parallel sequencing (MPS); short tandem repeat (STR); single nucleotide polymorphism (SNP)

Year: 2020 PMID： 35784409 PMCID： PMC9246034 DOI： 10.1080/20961790.2020.1779967

Source DB: PubMed Journal: Forensic Sci Res ISSN： 2471-1411

Introduction

DNA regions with tandemly arranged nucleotide repeat units (2–7 bp in length), known as short tandem repeats (STRs), are highly variable, which makes them very useful for human identification and relationship casework. Most forensic genetic DNA laboratories utilize capillary electrophoresis (CE) based analysis of PCR products to identify fragment length variation of STR markers [1]. However, CE-based STR typing has relatively low multiplexing capability and cannot identify sequence variation in STRs or STR flanking regions. The use of massively parallel sequencing (MPS) technologies overcomes these limitations [2]. With MPS detection, STR alleles of the same length but with different sequences, and haplotypes consisting of the STR and flanking region single nucleotide polymorphisms (SNPs) or insertion/deletions (InDels) may be identified. Furthermore, it is possible to investigate hundreds of targeted regions (with STRs, SNPs, or InDels) simultaneously in a relatively short time, and to construct PCR-MPS assays with short amplicons that may be amplified from highly degraded DNA or RNA molecules. Several commercial sequencing panels were developed for human identification, including the ForenSeq™ DNA Signature Prep Kit (Verogen®, San Diego, CA, USA), the Precision ID GlobalFiler NGS STR Panel (Thermo Fisher Scientific, Waltham, MA, USA), and the PowerSeq™ 46GY System (Promega, Madison, WI, USA). These assays were tested rigorously by different forensic genetic laboratories, and they produced sequencing results that were concordant with CE-based STR typing assays at a level of sensitivity that is comparable to that of the currently used PCR-CE methods [3-18]. Xinjiang means “New Frontier” in Chinese. In 1955, the Xinjiang Uyghur Autonomous Region was established. It borders eight countries to the north and west of China, and it is the largest region in the People’s Republic of China. Uyghurs are one of China’s 55 officially recognized ethnic minorities, and they primarily live in the “Tarim Basin” in the south-western part of Xinjiang. The Uyghurs constitute approximately half of the 24.9 million people in Xinjiang [19]. In this study, we typed 264 Uyghur individuals from the Xinjiang Uyghur Autonomous Region using the ForenSeqTM DNA Signature Prep Kit. The samples were amplified with Primer Mix A, which includes primer sets for 27 autosomal STRs, seven X-STRs, 25 Y-STRs, and 94 HID-SNPs. The aims of this study were to (1) assess the performance of the MiSeq FGxTM Forensic Genomics System (Verogen®) by assessing the read depth of the markers and allele balances, (2) compare the results with the previously published data generated by PCR-CE, (3) analyse STR sequence diversity, (4) analyse sequence variation in STR flanking regions, and (5) generate allele frequency data for the Uyghur population.

Materials and methods

Samples, DNA extraction, and DNA quantification

Blood samples were collected on FTA cards with written informed consent from 264 unrelated individuals of the Uyghur ethnic groups living in the Urumqi metropolitan area, Xinjiang Uyghur Autonomous Region. All samples were subsequently anonymized. Genomic DNA was extracted as previously described [20]. Extracted DNA was quantified using the Qubit™ dsDNA HS (High Sensitivity) Assay Kit and the Qubit® 3.0 Fluorometer (Invitrogen, Carlsbad, CA, USA) according to the manufacturer’s recommendations.

Sequencing

Samples were amplified with the ForenSeq™ DNA Signature Prep Kit according to the manufacturer’s recommendations using Primer Mix A and approximately 1 ng DNA input. The libraries were sequenced using the MiSeq FGx™ instrument following the protocol of the manufacturer. A total of 96, 95, and 79 libraries were pooled and sequenced in three separate MiSeq runs, respectively. One positive and one negative control were included in each run. The read depth ranged from 31 209 to 141 197 reads per sample with median read depths of 100 116, 95 233, and 52 783 reads in each of the three runs, respectively. The cluster densities were 1 141, 1 201, and 694 k/mm2, and the clusters passing filters were 93.5%, 92.7%, and 96.7% for each of the three runs, respectively. A total of 19 samples with rare flanking variants were typed twice for confirmation.

STR analysis

Fastq.gz files from the MiSeq FGx™ sequencing were analysed with STRinNGS v.1.0 [21]. Hg19 (GRCh37) was used for alignment. The configuration files [11] for the STRinNGS v.1.0 software were changed slightly to incorporate the most recent information of each STR [22-25] (Supplementary material File S1). New STR motifs were named according to the STR sequence guidelines [22]. The output data from STRinNGS were manually reanalyzed to correct for high stutter ratios. DYS461 was amplified on the same fragment as DYS460 and included in the analysis with STRinNGS. Locus balances and heterozygote balances (Hb) were calculated and plotted using ggplot2 [26] in R version 3.4.3 (https://www.r-project.org/). All sequencing runs were also analysed with the ForenSeq™ Universal Analysis Software (UAS) v1.3 using default settings.

GATK SNP analysis

HID-SNP genotypes were obtained using GATK 3.6 (https://software.broadinstitute.org/gatk/). The in-house developed pipeline included four main processes: (1) quality trimming (Q-score = 30) of the reads obtained from the fastq.gz files using AdapterRemoval v.2.1.3 [27], (2) mapping to the hg19 (GRCh37) using BWA-MEM (http://bio-bwa.sourceforge.net/), (3) extraction of the aligned reads using SAMtools [28], and (4) genotype calling using GATK 3.6. The generated genotypes were analysed using two different genotype acceptance criteria. First, we used the following genotype acceptance criteria: minimum locus coverage ≥ 100 reads, and 0.3 ≤ heterozygote balance ≤ 3. An average of 32.7 (34.8%) locus dropouts per sample was observed. Second, we reduced the minimum locus coverage to 45 reads. Furthermore, genotypes with locus coverage between 40 and 45 were accepted if the heterozygote balance was between 0.7 and 1.5, or if the number of noise reads was 0 or 1 for homozygous genotype calls. Three of the 94 HID-SNPs (rs826472, rs2269355, and rs1736442) were not typed for any of the samples. Dropouts were frequently observed in rs1015250 (242 dropouts), rs2920816 (234 dropouts), rs7041158 (233 dropouts), rs1357617 (209 dropouts), rs729172 (191 dropouts), rs1031825 (176 dropouts), rs1493232 (161 dropouts), rs2342747 (131 dropouts), rs2076848 (123 dropouts), rs13182883 (97 dropouts), rs13218440 (96 dropouts), and rs740598 (92 dropouts). The loci with frequent dropouts were excluded in the downstream analysis. An average of 2.1 (2.7%) locus dropouts per sample was observed for the remaining 79 HID-SNPs.

Population genetic analyses

Allele frequencies were calculated using the counting method or the Arlequin v3.5 software [29]. Tests for Hardy-Weinberg equilibrium (HWE) and linkage disequilibrium (LD) were performed for the 26 autosomal STRs in all 264 Uyghur individuals, the six X-STRs in 138 female Uyghur individuals, and the 79 of the 94 HID-SNPs in all Uyghur individuals using the Arlequin v3.5 software. The combined probabilities of exclusion (cPE), the combined matching probabilities (cMP), the combined trio paternity indices (cPItrio), and the combined duo paternity indices (cPIduo) were calculated using DNAVIEW v. 28.103 (http://dna-view.com/). Y-STR haplogroups of 126 male individuals were predicted based on the data of 21 Y-STR loci (DYS390, DYS19, DYS391, DYS385a-b, DYS439, DYS389I, DYS389II, DYS392, DYS437, DYS448, DYS460, Y-GATA-H4, DYS576, DYS570, DYS438, DYS481, DYS461, DYS505, DYS522, DYS533, and DYS643) using the Haplogroup Predictor and 27-haplogroup programme (http://www.hprg.com/hapest5).

Results and discussion

STR allelic diversity

Fifty-nine STR markers (27 autosomal STRs, seven X-STRs, and 25 Y-STRs) were sequenced in 264 Uyghur individuals (126 males and 138 females). The identified alleles and their frequencies are shown in Supplementary material Table S1. The median read depth for the STRs varied from 67 (DXS10103) to 4 782 (TH01) reads (Supplementary material Figure S1). The heterozygote balance (Hb) was calculated as the read count of the longest allele divided by that of the shortest allele among all heterozygous genotypes (Figure 1). For most loci, the median Hb was close to or slightly smaller than 1, indicating that the sequencing assay tended to produce more reads for shorter alleles. This phenomenon was previously observed [11]. D22S1045 was the only STR locus with a median Hb below 0.5. DXS10103 and D22S1045 data were excluded in the downstream analyses because of frequent locus and allele dropouts caused by low read depths and/or skewed Hb.

Figure 1.

Box-and-whisker plot of the heterozygote balance for each STR loci. Outliers are indicated by dots. X-STRs and Y-STRs are indicated in grey and dark grey colours, respectively.

Box-and-whisker plot of the heterozygote balance for each STR loci. Outliers are indicated by dots. X-STRs and Y-STRs are indicated in grey and dark grey colours, respectively. An unusual number of alleles were identified in a few Uyghur individuals. In DYS19, DYS460, and DYS549, six, two, and one individual, respectively, had two Y-STR alleles.

Concordance test

The STR sequence data of 264 individuals typed with the MiSeq FGx™ Forensic Genomics System were compared to PCR-CE data of 14 autosomal STR loci (D1S1656, D2S441, D2S1338, D3S1358, FGA, D8S1179, D10S1248, TH01, D12S391, vWA, D16S539, D18S51, D19S433, and D21S11) generated with the AmpFℓSTR® NGM SElect™ Kit [30]. Discordances were observed in five of the 3 696 comparisons (0.14%). Two of the five genotype mismatches were caused by a 2-bp deletion in the upstream flanking region of D19S433. The alleles were called as 13.2 by both PCR-CE analysis and by the ForenSeq™ UAS. However, the alleles had 14 repeats and should be named D19S433 [CCTT]12 ccta CCTT cttt CCTT del:30417136-7. The three other mismatches (in D2S441, FGA, and D12S391) were caused by allele dropouts in the PCR-CE assay.

Variations in STR and their flanking regions

Sequence variations in the STR region increased the allelic diversity of 19 autosomal STRs, two X-STRs, and 13 Y-STRs when compared to the allele diversity obtained with fragment length analyses (Table 1 and Supplementary material Table S1). Variations in the flanking regions were observed in 22 STRs (Tables 1 and 2). Six of these STRs had only length variation of the STR locus. No increase in the number of alleles was observed in three autosomal STRs, three X-STRs, and 12 Y-STRs. The largest increases in the numbers of alleles were observed in DYS389 (317%) and D12S391 (234%). DYS389 consists of two compound STR regions with the same repeat structure (TAGA[+] CAGA[+]) separated by 48 nucleotides. D12S391 is a compound STR with three variable sub-repeats. As shown in Table 1, complex and compound STRs generally had the largest increases in allelic diversity, whereas the increase was smaller in simple STRs. This has also been observed in other populations [3,7-9,12,15].

Table 1.

Number of observed alleles in the Uyghur population.

		Number of alleles
STR locus	Chromosome	Based on the number of repeats	Based on the STR sequence variation	Based on the sequence variation in the STR and flanks	Increase in number of alleles (%)
Autosomal STRs
D12S391	12	17	55	55	234
D21S11^a	21	15	41	42	180
D2S1338	2	14	34	34	143
D3S1358	3	7	17	17	143
D16S539^a	16	8	10	18	125
D7S820^a	7	9	12	20	122
vWA	12	8	17	17	113
D5S818^a	5	7	8	14	100
D13S317^a	13	8	8	16	100
D8S1179	8	11	21	21	91
D9S1122	9	8	15	15	88
D2S441^a	2	11	16	20	82
D20S482^a	20	9	9	13	44
D1S1656	1	15	21	21	40
D4S2408	4	6	8	8	33
D17S1301	17	10	13	13	30
D6S1043^a	6	18	21	22	22
CSF1PO	5	7	8	8	14
D18S51	18	17	19	19	12
FGA	4	19	21	21	11
Penta D^a	21	11	11	12	9
D19S433^a	19	14	14	15	7
Penta E	15	20	21	21	5
TPOX	2	7	7	7	0
D10S1248	10	9	9	9	0
TH01	11	7	7	7	0
X-STRs
DXS10135^a	X	23	61	67	191
DXS10074^a	X	13	16	19	46
DXS7132^a	X	8	8	11	38
DXS8378	X	7	7	7	0
DXS7423	X	5	5	5	0
HPRTB	X	9	9	9	0
Y-STRs
DYS389^b	Y	6	25	25	317
1st STR region^c	Y	3	3	3	0
2nd STR region	Y	5	12	12	140
DYF387S1	Y	10	33	33	230
DYS460/461^a^,d	Y	7	17	19	171
1st STR region^e	Y	6	6	6	0
2nd STR region^a^,f	Y	4	4	6	50
DYS448^a	Y	6	13	15	150
DYS481^a	Y	10	15	21	110
DYS390^a	Y	6	11	12	100
DYS612^a	Y	9	15	16	78
DYS437^a	Y	4	6	7	75
DYS635	Y	9	13	13	44
DYS438^a	Y	6	7	8	33
DYS19	Y	5	6	6	20
Y-GATA-H4^a	Y	5	5	6	20
DYS643	Y	6	7	7	17
DYS570	Y	8	8	8	0
DYS385a-b	Y	14	14	14	0
DYS391	Y	4	4	4	0
DYS392	Y	7	7	7	0
DYS439	Y	6	6	6	0
DYS505	Y	7	7	7	0
DYS522	Y	6	6	6	0
DYS533	Y	5	5	5	0
DYS549	Y	4	4	4	0
DYS576	Y	8	8	8	0

aLoci with variations in the flanking regions.

bThe DYS389 locus consists of two compound STR regions. The marker conventionally referred to as DYS389II includes both STRs.

cConventionally referred to as DYS389I.

dThe amplicons included two STRs: DYS460 and DYS461.

eReferred to as DYS461.

fReferred to as DYS460.

Table 2.

SNPs and InDels observed in the STR flanking regions.

Locus	STR locus	Upstream/ downstream	Chromosome	Position^a	Most frequent allele	Least frequent allele	Minor allele frequency
rs74640515	D2S441	Upstream	2	68,239,054	G	A	0.080
68239142G > A	D2S441	Downstream	2	68,239,142	G	A	0.002
rs73801920	D5S818	Upstream	5	123,111,246	C	A	0.165
92449928C > T	D6S1043	Upstream	6	92,449,928	C	T	0.004
rs7789995	D7S820	Upstream	7	83,789,520	T	A	0.068
del:83789519	D7S820	Upstream	7	83,789,519	T	del:83,789,519	0.002
rs16887642	D7S820	Downstream	7	83,789,602	G	A	0.157
rs75219269	vWA	Upstream	12	6,093,136	A	G	0.127
rs73250432	D13S317	Upstream	13	82,722,135	C	T	0.008
rs9546005	D13S317	Downstream	13	82,722,204	T	A	0.508
rs202043589	D13S317	Downstream	13	82,722,208	A	T	0.034
rs11642858	D16S539	Downstream	16	86,386,367	A	C	0.288
86386297A > G	D16S539	Upstream	16	86,386,297	A	G	0.002
rs563997442	D16S539	Upstream	16	86,386,298	C	G	0.009
rs745607776	D19S433	Upstream	19	30,417,136-7	CT	del:30,417,136-7	0.004
rs77560248	D20S482	Upstream	20	4,506,326	C	T	0.070
rs561985213	D20S482	Upstream	20	4,506,327	G	A	0.002
20554419C > T	D21S11	Downstream	21	20,554,419	C	T	0.002
rs186259515	Penta D	Downstream	21	45,056,154	A	G	0.004
64655583C > T	DXS7132	Downstream	X	64,655,583	C	T	0.020
rs56195635	DXS10074	Upstream	X	66,977,164	C	G	0.002
rs771349963	DXS10074	Upstream	X	66,977,180	G	A	0.012
del:9306454-6	DXS10135	Downstream	X	9,306,454-6	AGA	del:9,306,454-6	0.047
rs368663163	DYS481	Upstream	Y	8,426,362	G	A	0.079
14467152G > A	DYS437	Downstream	Y	14,467,152	G	A	0.008
14937880A > C	DYS438	Downstream	Y	14,937,880	A	C	0.016
rs758940870	DYS390	Downstream	Y	17,275,043	T	C	0.016
15752741T > C	DYS612	Downstream	Y	15,752,741	T	C	0.008
21050775T > C	DYS460	Upstream	Y	21,050,775	T	C	0.008
21050824T > A	DYS460	Upstream	Y	21,050,824	T	A	0.008
24365062A > G	DYS448	Upstream	Y	24,365,062	A	G	0.032
18743636A > G	Y-GATA-H4	Downstream	Y	18,743,636	A	G	0.008

aPositions in the GRCh37 genome build.

Number of observed alleles in the Uyghur population. aLoci with variations in the flanking regions. bThe DYS389 locus consists of two compound STR regions. The marker conventionally referred to as DYS389II includes both STRs. cConventionally referred to as DYS389I. dThe amplicons included two STRs: DYS460 and DYS461. eReferred to as DYS461. fReferred to as DYS460. SNPs and InDels observed in the STR flanking regions. aPositions in the GRCh37 genome build. We analysed DYS389 as one marker containing both STR regions, which corresponds to the marker known as DYS389II (Table 1). Similarly, we analysed DYS460 and DYS461 as one STR (Table 1). These two STRs are separated by 104 nucleotides but amplified on the same fragment with the ForenSeq™ DNA Signature Prep Kit. The allelic diversity of the complex STRs DYS389 and DYS460/DYS461 were much higher than those of the individual repeat regions (Table 1), which makes the complex STR loci better for human identification than the individual ones. Twenty-eight SNPs and three InDels were identified in the flanking regions of 22 STRs (Tables 1 and 2). Of these, 11 SNPs and one InDel were also found in our previous analysis of STR sequences in 363 Danes, whereas two SNPs with variation in Danes (rs577490589 and rs563636310 in D6S1043 and D10S1248, respectively) were not variable in Uyghurs [12]. In a study of 62 Yavapai Native Americans typed with the ForenSeq™ DNA Signature Prep Kit [8], only 11 SNPs were observed in the flanking regions. Seven of the most frequent variants, e.g. rs9546005 and rs11642858 in the downstream flanks of D13S317 and D16S539, respectively, were also found in our study of Uyghurs. Similarly, the most frequent variants identified with the Precision ID GlobalFiler NGS STR Panel v2 in 496 Spanish individuals [15] and with a custom 23-STR panel in 250 Koreans [31] were also found in Uyghurs. However, there were two important exceptions: rs4847015 and rs25768 in D1S1656 and D5S818, respectively. These polymorphic SNPs could not be identified with the ForenSeq™ DNA Signature Prep Kit because they were positioned in PCR primer binding sites used for the first PCR. The distribution of InDel-STR and/or SNP-STR alleles are shown in Supplementary Figure S2. In 15 of the 22 loci with flanking region variations, the variations were found in combination with sequence variation in the STR repeat region. The highest increase in the numbers of alleles due to variation in the flanking regions were observed in D7S820, D13S317, and D16S539 (eight alleles). Uyghurs is an admixed population with mainly European and East Asian backgrounds [32], and it was not surprising to find more flanking region variants in this population than in Danes [12]. However, 18 of the flanking region variants observed in Uyghurs were rare and only found in one to four individuals (Table 2). Furthermore, 12 of them had no accession number in dbSNP and may have been observed for the first time in this work. These variants were studied in more detail using IGV 2.4.14 [33]. The allele calls were based on a minimum of 100 reads, and all the flanking region variants were also identifiable in the stutter artefacts.

Considerations of variations in STR flanking regions and nomenclature

The STRAND working group under the International Society of Forensic Genetics is currently working on continued collection of STR sequence information and the development of a common STR nomenclature system [9,23,24]. The “Forensic STR Sequence Structure Guide”, accessible from the STRidER homepage (https://strider.online/nomenclature), contains highly useful information on the STRs and known flanking region variants. It also reveals many variants (SNPs and InDels) near the commonly used STRs, e.g. five SNPs and two deletions within 25 nucleotides on either side of the D13S317 locus. In this work, we found 12 additional flanking region variants by typing 57 STRs in 264 individuals from a population that was rarely studied. This indicates that many more variants exist and will be identified as more populations are sequenced. The current STR nomenclature guidelines [9] do not include SNP-STR or InDel-STR haplotype nomenclatures. In Supplementary material Table S1, we added the SNP allele information to the STR name [34]. This was done whether or not the SNP allele was identical to the reference genome to underline that the SNP allele was positioned on the same strand as the STR allele. However, it makes some of the SNP-STR names very long, which may be impractical for case work reporting. Therefore, a simpler nomenclature for reporting SNP-STR haplotypes should be considered. Flanking regions hold less information than the STRs. Nevertheless, flanking region analysis is important. It is especially important to identify InDels because they may affect backward compatibility with older PCR-CE results, as exemplified with the two inconsistencies in D19S433 discussed above. Furthermore, SNPs may provide important information in mixture analyses, and some SNP alleles affect the repeat number, e.g. rs186259515 A > G in Penta D and rs73801920 C > A in D5S818, and may as a consequence affect the read depths of stutter artefacts.

Population genetic analyses and forensic statistic parameters

No statistically significant deviation from HWE was observed (P < 0.0019 and P < 0.0083 for 26 autosomal STRs and six X-STRs, respectively), and no LD (P < 0.00015 and P < 0.0031 for 26 autosomal STRs and six X-STRs, respectively) between loci was found after Bonferroni correction (Supplementary material Tables S2 and S3). Every male individual had a unique Y-STR haplotype. A total of 121 of the 126 males were assigned to a haplogroup with a probability above 0.8 using Haplogroup Predictor. The haplogroups R1a and Q were the most common and were found in 29 (24.0%) and 19 (15.7%) individuals, respectively. Fourteen G2a (11.6%), 12 J1 (9.9%), 12 I2a (xl2a1) (9.9%), 10 L (8.3%), nine R1b (7.4%), seven E1b1b (5.8%), seven J2b (5.8%), one I1 (0.8%), and one H (0.8%) haplogroup were also observed in the Uyghur population. The haplogroup R1a is associated primarily with European and South/Central Asian ancestry [35], and Q is the predominant haplogroup in Native Americans, Central Asians, and Northern Siberians. The observed haplogroups were consistent with our expectations since the Uyghur population is an admixed population of mainly European and East Asian ancestry [32,36,37]. A study of Y-SNPs identified high frequencies of R, C, J, Q, and O haplogroups in the Uyghur population [38], while another study on Y-STRs showed high frequencies of the I and J haplogroups [39]. No East Asian Y-haplogroup was found in our Uyghur population, indicating that the collected samples were unaffected by the recent migration of Han Chinese to Xinjiang [40,41]. The combined match probabilities for the 26 autosomal STR loci were 7.69E−32 based on the numbers of repeats, and 3.85E−36 based on sequence variation in the STR and flanks (Supplementary material Table S4). The typical paternity indices for the combined set of 26 autosomal STR loci were 1.49E + 16 and 2.04E + 13 with sequence- and length-based alleles, respectively. Similar numbers were obtained for Danes in a previous study [12].

HID-SNPs

The ForenSeq™ DNA Signature Prep Kit amplified 94 HID-SNPs [42,43]. However, the read depths of 15 loci were too low in many samples, and dropouts were frequent. Allele frequencies were calculated for the remaining 79 HID-SNPs (Supplementary material Table S5(A)). The observed heterozygosity (Ho) ranged from 0.19697 (rs938283) to 0.52874 (rs354439), and the expected heterozygosity (He) ranged from 0.19593 (rs938283) to 0.50264 (rs1498553). No deviation from HWE was observed for any of the 79 loci after Bonferroni correction (P < 0.001). A total of 118 pairs of HID-SNP loci were in linkage disequilibrium after Bonferroni correction (P < 0.00001623) (Supplementary material Table S5(B)). Three of these pairs were located on the same chromosome. However, the pairs were not in LD in Danes [12].

Conclusion

In this study, we typed 264 Uyghurs using the recommended protocol for the ForenSeq™ DNA Signature Prep Kit. Except for two of the 59 STRs and 15 of the 94 HID-SNPs, the loci were efficiently amplified and sequenced. The longest alleles of D22S1045 were poorly amplified compared to the shortest ones, which resulted in skewed heterozygote balances and made genotype calling problematic. For DXS10103 and the 15 poorly performing HID-SNPs, the read depths were too low for many samples, and we decided not to analyse them further. Detection of the complete sequence variation in the STR and their flanking regions increased the allelic diversity of 33 of the 57 remaining STRs compared to the standard-length based PCR-CE methods usually applied in forensic genetic laboratories. The additional information for these loci increased the power of discrimination, which is one of the important advantages PCR-MPS assays may offer to the forensic genetic community. The concordance between PCR-CE and PCR-MPS results of 14 autosomal STRs was 99.86%. Two discordances in D19S433 were caused by a well-known CT deletion upstream of the STR. These examples underlined the importance of analysing the flanking regions of the STRs to ensure continued backward compatibility with older PCR-CE results. A relatively high number of variants in the flanking regions (28 SNPs and three InDels) were observed in the Uyghur population. This was expected since Uyghurs have both European and East Asian origin, and variants present in populations from both continents may be observed in the individuals from Xinjiang. All the rare variants were confirmed by genotyping the individuals a second time, and some of these variants may have been observed for the first time in this work. Click here for additional data file.

38 in total

1. A multiplex assay with 52 single nucleotide polymorphisms for human identification.

Authors: Juan J Sanchez; Chris Phillips; Claus Børsting; Kinga Balogh; Magdalena Bogus; Manuel Fondevila; Cheryl D Harrison; Esther Musgrave-Brown; Antonio Salas; Denise Syndercombe-Court; Peter M Schneider; Angel Carracedo; Niels Morling
Journal: Electrophoresis Date: 2006-05 Impact factor: 3.535

2. Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows.

Authors: Laurent Excoffier; Heidi E L Lischer
Journal: Mol Ecol Resour Date: 2010-03-01 Impact factor: 7.090

3. Flanking region variation of ForenSeq™ DNA Signature Prep Kit STR and SNP loci in Yavapai Native Americans.

Authors: Frank R Wendt; Jonathan L King; Nicole M M Novroski; Jennifer D Churchill; Jillian Ng; Robert F Oldt; Kelly L McCulloh; Jessica A Weise; David Glenn Smith; Sreetharan Kanthaswamy; Bruce Budowle
Journal: Forensic Sci Int Genet Date: 2017-02-27 Impact factor: 4.882

4. The Danish STR sequence database: duplicate typing of 363 Danes with the ForenSeq™ DNA Signature Prep Kit.

Authors: C Hussing; R Bytyci; C Huber; N Morling; C Børsting
Journal: Int J Legal Med Date: 2018-05-24 Impact factor: 2.686

5. Second generation sequencing of three STRs D3S1358, D12S391 and D21S11 in Danes and a new nomenclature for sequenced STR alleles.

Authors: Chiara Gelardi; Eszter Rockenbauer; Sigrun Dalsgaard; Claus Børsting; Niels Morling
Journal: Forensic Sci Int Genet Date: 2014-05-10 Impact factor: 4.882

6. Massively parallel sequencing of forensic STRs and SNPs using the Illumina^® ForenSeq™ DNA Signature Prep Kit on the MiSeq FGx™ Forensic Genomics System.

Authors: Fei Guo; Jiao Yu; Lu Zhang; Jun Li
Journal: Forensic Sci Int Genet Date: 2017-09-08 Impact factor: 4.882

7. Sequence-based diversity of 23 autosomal STR loci in Koreans investigated using an in-house massively parallel sequencing panel.

Authors: Eun Hye Kim; Hwan Young Lee; So Yeun Kwon; Eun Young Lee; Woo Ick Yang; Kyoung-Jin Shin
Journal: Forensic Sci Int Genet Date: 2017-07-09 Impact factor: 4.882

8. "The devil's in the detail": Release of an expanded, enhanced and dynamically revised forensic STR Sequence Guide.

Authors: C Phillips; K Butler Gettings; J L King; D Ballard; M Bodner; L Borsuk; W Parson
Journal: Forensic Sci Int Genet Date: 2018-02-21 Impact factor: 4.882

9. The Eurasian heartland: a continental perspective on Y-chromosome diversity.

Authors: R S Wells; N Yuldasheva; R Ruzibakiev; P A Underhill; I Evseeva; J Blue-Smith; L Jin; B Su; R Pitchappan; S Shanmugalakshmi; K Balakrishnan; M Read; N M Pearson; T Zerjal; M T Webster; I Zholoshvili; E Jamarjashvili; S Gambarov; B Nikbin; A Dostiev; O Aknazarov; P Zalloua; I Tsoy; M Kitaev; M Mirrakhimov; A Chariev; W F Bodmer
Journal: Proc Natl Acad Sci U S A Date: 2001-08-28 Impact factor: 11.205

10. Developmental validation of the MiSeq FGx Forensic Genomics System for Targeted Next Generation Sequencing in Forensic DNA Casework and Database Laboratories.

Authors: Anne C Jäger; Michelle L Alvarez; Carey P Davis; Ernesto Guzmán; Yonmee Han; Lisa Way; Paulina Walichiewicz; David Silva; Nguyen Pham; Glorianna Caves; Jocelyne Bruand; Felix Schlesinger; Stephanie J K Pond; Joe Varlaro; Kathryn M Stephens; Cydne L Holt
Journal: Forensic Sci Int Genet Date: 2017-01-27 Impact factor: 4.882