Petra Hölzl-Müller, Martin Bodner, Burkhard Berger, Walther Parson.
Abstract
Here, we present the results of a population study that evaluated the performance of massively parallel sequencing (MPS) of short tandem repeats (STRs), with a particular focus on DNA intelligence databasing purposes. To meet this objective, 247 randomly selected reference samples from the Austrian National DNA Database, previously processed with conventional capillary electrophoretic (CE) STR sizing, were reanalyzed with the PowerSeq 46GY kit (Promega). This sample set provides MPS-based population data valid for the Austrian population and adds to the body of known sequence-based STR variation. The study addressed forensically relevant parameters, such as concordance and backward compatibility to extant amplicon-based genotypes, sequence-based stutter ratios, and relative marker performance. Of the 22 autosomal STR loci included in the PowerSeq 46GY panel, 99.98% of the allele calls were concordant between MPS and CE. Moreover, 25 new sequence variants from 15 markers were found in the Austrian dataset that were not yet described in the STRSeq online catalogue; these were submitted for inclusion. Despite the high degree of concordance between MPS- and CE-derived genotypes, our results demonstrate the need for a harmonized allele nomenclature system that is equally applicable to both technologies but can, at the same time, take advantage of the increased information content of MPS. This appears to be particularly important for database applications, both to prevent false exclusions caused by allele naming that varies between analysis platforms and to ensure backward compatibility.
Keywords: Allele frequencies; Autosomal short tandem repeats; Massively parallel sequencing; Reference samples; Sequence-based population data
Year: 2021 PMID: 34436655 PMCID: PMC8523457 DOI: 10.1007/s00414-021-02685-x
Source DB: PubMed Journal: Int J Legal Med ISSN: 0937-9827 Impact factor: 2.686
Run and quality metrics information for five sequencing runs. Sequencing was performed using the PowerSeq 46GY kit (Promega, USA) on a MiSeq FGx instrument (Verogen, USA) according to the manufacturer’s recommendations.
| Run | Cluster density (K/mm²) | Cluster passing filter (PF; %) | Phasing (%) | Pre-phasing (%) | Total no. of reads | Total no. of reads PF | % ≥ Q30 | Number of samples/run (including two controls) |
|---|---|---|---|---|---|---|---|---|
| Run 1 | 495 ± 14 | 97.29 ± 0.18 | 0.107 | 0.134 | 9,739,105 | 9,474,979 | 90.0 | 50 |
| Run 2 | 1466 ± 22 | 82.47 ± 1.45 | 0.124 | 0.116 | 27,486,348 | 22,667,346 | 80.8 | 50 |
| Run 3 | 1141 ± 25 | 89.26 ± 0.58 | 0.127 | 0.121 | 21,667,218 | 19,339,568 | 81.8 | 50 |
| Run 4 | 1202 ± 27 | 87.29 ± 1.39 | 0.114 | 0.125 | 22,705,688 | 19,821,280 | 79.5 | 50 |
| Run 5 | 920 ± 20 | 89.92 ± 1.34 | 0.113 | 0.138 | 17,228,100 | 15,489,251 | 84.0 | 58 |
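As a rough cross-check of the run metrics, the share of reads passing filter can be recomputed from the two read-count columns. The sketch below assumes that this share is simply reads PF divided by total reads; the tabulated "Cluster passing filter" values are given with a ± spread and are presumably averaged differently (for example per tile), so agreement is only expected to within a few hundredths of a percent. The names `RUNS` and `pf_percent` are illustrative.

```python
# Read counts copied from the run metrics table: (total reads, reads passing filter).
RUNS = {
    "Run 1": (9_739_105, 9_474_979),
    "Run 2": (27_486_348, 22_667_346),
    "Run 3": (21_667_218, 19_339_568),
    "Run 4": (22_705_688, 19_821_280),
    "Run 5": (17_228_100, 15_489_251),
}


def pf_percent(total_reads: int, pf_reads: int) -> float:
    """Percentage of reads that passed the filter (assumed definition)."""
    return 100.0 * pf_reads / total_reads


# Per-run percentage and the overall number of PF reads across all five runs.
pf_by_run = {run: round(pf_percent(t, p), 2) for run, (t, p) in RUNS.items()}
total_pf_reads = sum(p for _, p in RUNS.values())

for run, pct in pf_by_run.items():
    print(f"{run}: {pct:.2f}% of reads passed filter")
print(f"All runs: {total_pf_reads:,} reads passing filter")
```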
Fig. 1 Relative marker performance showing box-whisker plots of the PowerSeq 46GY kit comprising 23 markers (including amelogenin). The expected value (dotted line; mean shown as “+”) marks the proportion of reads for a given marker, assuming 100% equally performing markers included in the PowerSeq 46GY panel (n = 23; expected value = 4.35%). The standard deviation for the mean relative marker performance was 1.23%.
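The expected value in the caption follows from dividing 100% evenly across the 23 markers. A minimal sketch of that arithmetic, together with one plausible way to compute a per-sample relative marker performance, is given here; the read counts in the usage example are invented for illustration, and `relative_performance` is a hypothetical helper, not the authors' pipeline.

```python
def expected_share(n_markers: int) -> float:
    """Expected percentage of reads per marker if all markers perform equally."""
    return 100.0 / n_markers


def relative_performance(reads_per_marker: dict) -> dict:
    """Percentage of a sample's reads that fall on each marker."""
    total = sum(reads_per_marker.values())
    return {marker: 100.0 * reads / total for marker, reads in reads_per_marker.items()}


# 23 markers (22 autosomal STRs plus amelogenin), as stated in the caption.
print(f"Expected value: {expected_share(23):.2f}%")

# Hypothetical read counts for three markers of one sample (illustration only).
example = {"TH01": 5200, "FGA": 3900, "Penta D": 2900}
for marker, share in relative_performance(example).items():
    print(f"{marker}: {share:.1f}% of reads")
```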
Global stutter analysis was performed for 22 forensic markers on a subset of the Austrian population sample (n = 50). Samples were selected according to the total number of reads and divided into two categories (selection criteria: category I: ≤ 63,500 reads; category II: ≥ 199,000 reads). Bold numbers denote stutter values exceeding 20%. In general, mean stutter values were comparable to CE-based stutter heights
| | Global stutter analysis, category I | | | | | | | Global stutter analysis, category II | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Marker | n | Mean | SD | Median | Min | Max | Stutter > 20% (n) | n | Mean | SD | Median | Min | Max | Stutter > 20% (n) |
| D22S1045 | 30 | 15.2 | 3.9 | 14.9 | 6.7 | | 2 | 39 | 13.4 | 4.8 | 14.0 | 6.4 | | 2 |
| D18S51 | 40 | 13.1 | 3.4 | 13.4 | 8.5 | | 2 | 44 | 12.2 | 2.8 | 12.0 | 7.8 | 18.8 | |
| D1S1656 | 46 | 14.0 | 3.6 | 13.1 | 9.1 | | 5 | 45 | 14.3 | 3.9 | 13.0 | 8.3 | | 4 |
| D2S1338 | 43 | 13.3 | 3.6 | 12.7 | 7.1 | | 3 | 46 | 12.9 | 3.2 | 12.2 | 8.0 | | 1 |
| D19S433 | 37 | 12.9 | 2.7 | 12.8 | 8.4 | | 1 | 36 | 14.1 | 2.4 | 13.4 | 10.1 | | 2 |
| D10S1248 | 33 | 13.1 | 2.8 | 11.9 | 9.9 | | 2 | 33 | 13.7 | 3.1 | 12.6 | 9.3 | | 1 |
| D12S391 | 48 | 12.2 | 3.5 | 12.3 | 3.3 | 19.5 | | 46 | 11.8 | 4.0 | 12.2 | 3.7 | 19.0 | |
| FGA | 38 | 12.0 | 2.8 | 12.0 | 7.1 | 18.7 | | 36 | 12.4 | 2.5 | 12.5 | 7.8 | 17.0 | |
| D3S1358 | 46 | 12.0 | 2.0 | 12.0 | 8.7 | 18.6 | | 43 | 12.7 | 2.2 | 12.8 | 8.7 | 17.3 | |
| vWA | 33 | 12.2 | 2.8 | 12.1 | 7.7 | 18.3 | | 36 | 12.7 | 2.2 | 12.2 | 7.0 | 18.1 | |
| D8S1179 | 40 | 10.9 | 2.2 | 10.9 | 6.8 | 16.9 | | 40 | 10.5 | 2.1 | 10.5 | 5.8 | 16.5 | |
| D16S539 | 39 | 10.5 | 2.6 | 10.5 | 6.3 | 15.7 | | 38 | 10.7 | 2.4 | 10.9 | 5.7 | 15.3 | |
| D13S317 | 34 | 7.7 | 2.4 | 7.4 | 3.5 | 14.1 | | 44 | 6.7 | 2.5 | 7.1 | 2.9 | 12.5 | |
| D21S11 | 40 | 9.7 | 2.0 | 9.6 | 6.2 | 14.1 | | 45 | 9.4 | 1.5 | 9.4 | 4.4 | 12.9 | |
| CSF1PO | 31 | 8.0 | 1.8 | 7.6 | 5.2 | 13.4 | | 27 | 7.6 | 1.3 | 7.4 | 5.1 | 10.0 | |
| D7S820 | 40 | 8.8 | 2.0 | 8.7 | 4.7 | 13.2 | | 39 | 8.0 | 2.1 | 7.8 | 3.9 | 12.1 | |
| Penta E | 27 | 7.0 | 1.8 | 6.4 | 5.0 | 11.8 | | 43 | 5.7 | 2.3 | 5.8 | 0.9 | 11.3 | |
| TPOX | 29 | 5.7 | 1.9 | 5.4 | 3.3 | 11.7 | | 40 | 5.1 | 1.2 | 5.1 | 3.4 | 8.2 | |
| D5S818 | 44 | 8.5 | 1.7 | 8.7 | 4.8 | 11.6 | | 43 | 9.3 | 1.9 | 9.0 | 5.7 | 14.5 | |
| D2S441 | 35 | 6.0 | 2.1 | 5.7 | 2.4 | 9.4 | | 39 | 6.4 | 2.2 | 6.4 | 3.2 | 10.2 | |
| TH01 | 13 | 6.8 | 1.4 | 6.5 | 4.6 | 9.1 | | 39 | 4.3 | 1.4 | 3.8 | 2.0 | 7.6 | |
| Penta D | 1 | 5.2 | NA | 5.2 | 5.2 | 5.2 | | 39 | 1.9 | 0.7 | 1.9 | 0.6 | 3.8 | |
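The summary statistics in the stutter table can be illustrated with a short sketch. It assumes the common convention that a stutter ratio is the read count of the stutter product divided by the read count of its parent allele, expressed in percent; the source does not spell out its formula, so this definition, the function names, and the example ratios are assumptions for illustration only.

```python
import statistics


def stutter_ratio(stutter_reads: int, parent_reads: int) -> float:
    """Stutter reads as a percentage of parent-allele reads (assumed definition)."""
    return 100.0 * stutter_reads / parent_reads


def summarize(ratios: list, threshold: float = 20.0) -> dict:
    """Per-marker summary in the style of the table columns."""
    return {
        "n": len(ratios),
        "mean": statistics.mean(ratios),
        # A sample SD needs at least two observations; the table reports NA for n = 1.
        "sd": statistics.stdev(ratios) if len(ratios) > 1 else None,
        "median": statistics.median(ratios),
        "min": min(ratios),
        "max": max(ratios),
        "over_threshold": sum(r > threshold for r in ratios),
    }


# Hypothetical stutter ratios (%) for one marker, not taken from the study.
example = [10.0, 12.0, 25.0]
print(summarize(example))
print(summarize([5.2]))  # single observation: SD is undefined
```

Returning `None` for the standard deviation when only one value is available mirrors the "NA" entry for Penta D in category I.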
Overview of sequence variation observed within the Austrian population using the PowerSeq 46GY kit. As expected, MPS revealed increased genetic variation compared to length-based technologies. To characterize the location of sequence variation, we used the repeat and flanking region definitions reported in the updated Forensic STR Sequence Structure Guide v5 (Phillips 2018). Because no harmonized MPS allele nomenclature yet defines flanking region lengths, we considered the full available sequence strings up- and downstream of the repeat region as flanking regions. Size-based STR analysis was performed using the AmpFlSTR NGM SElect Express kit (Thermo Fisher Scientific, USA) and the PowerPlex 16 kit (Promega, USA).
| | Number of different alleles observed | | Increase in sequence variation | | Region of sequence variation | |
|---|---|---|---|---|---|---|
| Marker | Length-based | Sequence-based | No. of alleles | x-fold↑ | Repeat | Flanking |
| D12S391 | 16 | 53 | 37 | 3.3 | ◊ | |
| D2S1338 | 11 | 33 | 22 | 3.0 | ◊ | • |
| D21S11 | 14 | 36 | 22 | 2.6 | ◊ | |
| D3S1358 | 7 | 20 | 13 | 2.9 | ◊ | |
| vWA | 7 | 19 | 12 | 2.7 | ◊ | • |
| D7S820 | 8 | 20 | 12 | 2.5 | | • |
| D5S818 | 7 | 18 | 11 | 2.6 | | • |
| D8S1179 | 10 | 19 | 9 | 1.9 | ◊ | |
| D1S1656 | 16 | 25 | 9 | 1.6 | ◊ | • |
| D13S317 | 9 | 16 | 7 | 1.8 | | • |
| Penta D | 12 | 18 | 6 | 1.5 | | • |
| D16S539 | 9 | 14 | 5 | 1.6 | | • |
| D2S441 | 11 | 15 | 4 | 1.4 | ◊ | • |
| FGA | 15 | 18 | 3 | 1.2 | ◊ | |
| D19S433 | 15 | 18 | 3 | 1.2 | ◊ | • |
| TPOX | 6 | 8 | 2 | 1.3 | | • |
| D18S51 | 14 | 16 | 2 | 1.1 | ◊ | |
| TH01 | 7 | 8 | 1 | 1.1 | ◊ | • |
| CSF1PO | 8 | 9 | 1 | 1.1 | ◊ | • |
| D10S1248 | 9 | 9 | - | - | - | - |
| Penta E | 16 | 16 | - | - | - | - |
| D22S1045 | 9 | 9 | - | - | - | - |
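The two "increase" columns can be reproduced directly from the allele counts: the number of additional alleles is the difference between the sequence-based and length-based counts, and the x-fold value is their ratio rounded to one decimal. The sketch below uses a few rows copied from the table; `variation_gain` is an illustrative helper name.

```python
# (length-based alleles, sequence-based alleles), copied from selected table rows.
ALLELE_COUNTS = {
    "D12S391": (16, 53),
    "D2S1338": (11, 33),
    "D21S11": (14, 36),
    "D7S820": (8, 20),
    "CSF1PO": (8, 9),
    "Penta E": (16, 16),
}


def variation_gain(length_based: int, sequence_based: int) -> tuple:
    """Return (additional alleles, x-fold increase rounded to one decimal)."""
    extra = sequence_based - length_based
    fold = round(sequence_based / length_based, 1)
    return extra, fold


for marker, (by_length, by_sequence) in ALLELE_COUNTS.items():
    extra, fold = variation_gain(by_length, by_sequence)
    print(f"{marker}: +{extra} alleles, {fold}-fold")
```

Markers such as Penta E, for which sequencing revealed no additional alleles, come out as +0 and 1.0-fold here; the table shows a dash in those cells.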
Overview of STR markers showing the total number of observed alleles using two different STR analysis technologies. MPS increased heterozygosity by identifying 181 homozygous genotypes as isometric heterozygotes. Isometric alleles, also known as isoalleles, are alleles of identical length but different internal sequence
| | Total number of alleles obtained | | | Heterozygosity | |
|---|---|---|---|---|---|
| Marker | Length | Sequence | Isoalleles (n) | Length-based | Sequence-based |
| D5S818 | 434 | 468 | 34 | 0.88 | 0.95 |
| D3S1358 | 429 | 454 | 25 | 0.87 | 0.92 |
| D13S317 | 432 | 455 | 23 | 0.87 | 0.92 |
| D21S11 | 449 | 465 | 16 | 0.91 | 0.94 |
| D7S820 | 449 | 464 | 15 | 0.91 | 0.94 |
| D8S1179 | 454 | 468 | 14 | 0.92 | 0.95 |
| D16S539 | 439 | 453 | 14 | 0.89 | 0.92 |
| vWA | 448 | 459 | 11 | 0.91 | 0.93 |
| D2S1338 | 455 | 464 | 9 | 0.92 | 0.94 |
| D2S441 | 431 | 439 | 8 | 0.87 | 0.89 |
| D12S391 | 473 | 479 | 6 | 0.96 | 0.97 |
| D1S1656 | 468 | 472 | 4 | 0.95 | 0.96 |
| TPOX | 417 | 419 | 2 | 0.84 | 0.85 |
| FGA | 455 | 455 | - | 0.92 | 0.92 |
| CSF1PO | 425 | 425 | - | 0.86 | 0.86 |
| D10S1248 | 435 | 435 | - | 0.88 | 0.88 |
| TH01 | 432 | 432 | - | 0.87 | 0.87 |
| Penta E | 465 | 465 | - | 0.94 | 0.94 |
| D18S51 | 463 | 463 | - | 0.94 | 0.94 |
| D19S433 | 449 | 449 | - | 0.91 | 0.91 |
| Penta D | 455 | 455 | - | 0.92 | 0.92 |
| D22S1045 | 433 | 433 | - | 0.88 | 0.88 |
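The figures in this table can be cross-checked with simple arithmetic. The isoallele count per marker is the difference between the sequence-based and length-based allele totals, and these differences sum to the 181 genotypes mentioned in the caption. The tabulated heterozygosity values are also numerically consistent with dividing the allele total by 2 × 247 (two alleles per sample); that relationship is inferred from the numbers and is not a definition stated in the source. The names below are illustrative.

```python
N_SAMPLES = 247  # reference samples in the study

# (length-based total, sequence-based total) for markers with isoalleles, from the table.
ALLELE_TOTALS = {
    "D5S818": (434, 468), "D3S1358": (429, 454), "D13S317": (432, 455),
    "D21S11": (449, 465), "D7S820": (449, 464), "D8S1179": (454, 468),
    "D16S539": (439, 453), "vWA": (448, 459), "D2S1338": (455, 464),
    "D2S441": (431, 439), "D12S391": (473, 479), "D1S1656": (468, 472),
    "TPOX": (417, 419),
}


def isoalleles(length_total: int, sequence_total: int) -> int:
    """Additional alleles resolved by sequence for one marker."""
    return sequence_total - length_total


def allele_ratio(total_alleles: int, n_samples: int = N_SAMPLES) -> float:
    """Allele total divided by 2 x samples (assumed reading of the table values)."""
    return round(total_alleles / (2 * n_samples), 2)


total_isoalleles = sum(isoalleles(l, s) for l, s in ALLELE_TOTALS.values())
print(f"Isoalleles across markers: {total_isoalleles}")
print(f"D5S818: {allele_ratio(434)} (length) vs {allele_ratio(468)} (sequence)")
```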