Dong Wang, Ruiyang Tao, Zhiqiang Li, Dun Pan, Zhuo Wang, Chengtao Li, Yongyong Shi.
Abstract
BACKGROUND: Short tandem repeats (STRs) are important polymorphic markers for human identification and kinship analyses in forensic science. With the continuous development of massively parallel sequencing (MPS), more laboratories have adopted this technology for forensic applications. Existing STR genotyping tools, mostly developed for whole-genome sequencing data, are not effective for MPS data. More importantly, their backward compatibility with conventional capillary electrophoresis (CE) technology has been neither evaluated nor guaranteed.
Keywords: Forensic sequencing; Massively parallel sequencing; STR genotyping; Short tandem repeats; Validation studies
Year: 2020 PMID: 32172688 PMCID: PMC7075041 DOI: 10.1186/s41065-020-00120-6
Source DB: PubMed Journal: Hereditas ISSN: 0018-0661 Impact factor: 3.271
Fig. 1 Summary of the STRsearch pipeline. The resulting report consists of three tables: genotyping results, multiple alleles, and a quality control matrix for each STR locus
Comparison results between STRsearch and CE for 9947A in the Ion S5 dataset
| Marker | STR sequence structure¹ | Alleles (a1, a2) | Supporting reads (a1, a2) | Allele correction² (a1, a2) | Allele sequences (a1, a2) | CE |
|---|---|---|---|---|---|---|
| D1S1677 | [TTCC]n | 13, 14 | 592, 498 | – | [TTCC]13, [TTCC]14 | 13, 14 |
| D1S1656 | CCTA [TCTA]n | 18.3, 19.1 | 1182, 378 | 18.3, 18.3 | CCTA [TCTA]13 TCATCTATCTATCTATCTA, CCTA [TCTA]13 TCATCTATCTATCTATCTACA | 18.3, 18.3 |
| TPOX | [AATG]n | 8, 7 | 2554, 95 | 8, 8 | [AATG]8, [AATG]7 | 8, 8 |
| D2S441 | [TCTA]n | 10, 14 | 1007, 874 | – | [TCTA]8 TCTGTCTA, [TCTA]11 TTTATCTATCTA | 10, 14 |
| D2S1776 | [AGAT]n | 10, 9 | 2825, 172 | 10, 10 | [AGAT]10, [AGAT]9 | 10, 10 |
| D2S1338 | [GGAA]n GGAC [GGAA]n [GGCA]n | 19, 23 | 718, 715 | – | [GGAA]12 [GGCA]7, [GGAA]2 GGAC [GGAA]13 [GGCA]7 | 19, 23 |
| D3S1358 | [TCTA]n [TCTG]n [TCTA]n | 15, 14 | 2052, 1916 | – | [TCTA]1 [TCTG]2 [TCTA]12, [TCTA]1 [TCTG]2 [TCTA]11 | 14, 15 |
| D3S4529 | [GATA]n AATA [GATA]n | 12, 11 | 1886, 65 | 12, 12 | [GATA]4 AATA [GATA]7, [GATA]4 AATA [GATA]6 | 13, 13 |
| D4S2408 | [ATCT]n | 10, 9 | 98, 56 | – | [ATCT]10, [ATCT]9 | 9, 10 |
| FGA | [GGAA]n GGAG [AAAG]n AGAA AAAA [GAAA]n | 7.5, 8.5 | 693, 27 | 7.5, 7.5 | [GGAA]2 GGAG [AAAG]4 AGAA, [GGAA]2 GGAG [AAAG]5 AGAA | 23, 24 |
| D5S2800 | [GGTA]n [GACA]n [GATA]n [GATT]n | 14, 23 | 1130, 876 | – | [GGTA]3 [GACA]6 [GATA]2 [GATT]3, [GGTA]9 [GACA]6 [GATA]3 [GATT]5 | 14, 23 |
| D5S818 | [ATCT]n | 11, 11 | 2373, 295 | 11, 11 | [ATCT]11, [ATCT]11 T | 11, 11 |
| CSF1PO | [ATCT]n | 10, 12 | 1348, 1178 | – | [ATCT]10, [ATCT]12 | 10, 12 |
| D6S1043 | [ATCT]n | 12, 18 | 1693, 1263 | – | [ATCT]12, ATCTATCTATCTATCTATCTATGT [ATCT]12 | 12, 18 |
| D6S474 | [AGAT]n [GATA]n | 14, 18 | 1898, 1304 | – | [AGAT]5 [GATA]9, [AGAT]5 [GATA]13 | 13, 17 |
| D7S820 | [TATC]n | 10, 11 | 1133, 823 | – | [TATC]10, [TATC]11 | 10, 11 |
| D8S1179 | [TCTA]n [TCTG]n [TCTA]n | 13, 13 | 1382, 998 | – | [TCTA]1 [TCTG]1 [TCTA]11, [TCTA]13 | 13, 13 |
| D10S1248 | [GGAA]n | 13, 15 | 815, 811 | – | [GGAA]13, [GGAA]15 | 13, 15 |
| TH01 | [AATG]n ATG [AATG]n | 8, 9.3 | 1728, 1527 | – | [AATG]8, [AATG]6 ATG [AATG]3 | 8, 9.3 |
| vWA | [TAGA]n [CAGA]n TAGA | 17, 18 | 1330, 952 | – | [TAGA]12 [CAGA]4 TAGA, [TAGA]13 [CAGA]4 TAGA | 17, 18 |
| D12S391 | [AGAT]n GAT [AGAT]n [AGAC]n AGAT | 18, 20 | 1171, 846 | – | [AGAT]11 [AGAC]6 AGAT, [AGAT]12 [AGAC]7 AGAT | 18, 20 |
| D12ATA63 | [TTG]n [TTA]n | 13, 12 | 1697, 214 | 13, 13 | [TTG]3 [TTA]10, [TTG]3 [TTA]9 | 13, 13 |
| D13S317 | [TATC]n | 11, 10 | 2216, 177 | 11, 11 | [TATC]11, [TATC]10 | 11, 11 |
| D14S1434 | [CTGT]n [CTAT]n | 11, 13 | 1418, 1094 | – | [CTGT]3 [CTAT]8, [CTGT]3 [CTAT]10 | 11, 13 |
| Penta E | [TCTTT]n | 12, 13 | 443, 425 | – | [TCTTT]12, [TCTTT]13 | 12, 13 |
| D16S539 | [GATA]n | 11, 12 | 2293, 1661 | – | [GATA]11, [GATA]12 | 11, 12 |
| D18S51 | [AGAA]n | 5, 3 | 2030, 159 | 5, 5 | [AGAA]5, [AGAA]3 | 15, 19 |
| D19S433 | [CCTT]n ccta [CCTT]n cttt [CCTT]n | 8, 7 | 859, 485 | – | [CCTT]8, [CCTT]7 | 14, 15 |
| D21S11 | [TCTA]n [TCTG]n [TCTA]n ta [TCTA]n tca [TCTA]n tccata [TCTA]n TA [TCTA]n | 30, 29 | 1450, 166 | 30, 30 | [TCTA]6 [TCTG]5 [TCTA]3 ta [TCTA]3 tca [TCTA]2 tccata [TCTA]11, [TCTA]6 [TCTG]5 [TCTA]3 ta [TCTA]3 tca [TCTA]2 tccata [TCTA]10 | 30, 30 |
| Penta D | [AAAGA]n | 4, 3 | 206, 22 | 4, 4 | [AAAGA]4, [AAAGA]3 | 12, 12 |
| D22S1045 | [ATT]n ACT [ATT]n | 11, 14 | 1033, 616 | – | [ATT]8 ACT [ATT]2, [ATT]11 ACT [ATT]2 | 11, 14 |
¹ Summary of the reference repeat-region sequence structure, based on the most up-to-date forensic STR sequence guide
² Allele correction according to the stutter ratio, which is 0.5 in this study. '–', not applicable
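The footnote gives only the threshold (0.5), not the exact rule. The rows in the table above are consistent with one simple reading: when the read count of the less-supported allele is below 0.5 times that of the better-supported allele, the minor allele is treated as stutter and the locus is reported as homozygous for the major allele. The sketch below illustrates that reading only. The function name `correct_alleles` and its return convention are hypothetical, and the actual STRsearch implementation may differ.

```python
from typing import Optional, Tuple

STUTTER_RATIO = 0.5  # threshold stated in the table footnote


def correct_alleles(
    alleles: Tuple[str, str],
    reads: Tuple[int, int],
    ratio: float = STUTTER_RATIO,
) -> Optional[Tuple[str, str]]:
    """Return the corrected genotype, or None when no correction applies.

    Assumption (not stated in the source): the correction compares the
    minor/major supporting-read ratio against the stutter threshold.
    """
    a1, a2 = alleles
    r1, r2 = reads
    major_reads, minor_reads = max(r1, r2), min(r1, r2)
    if major_reads == 0:
        return None  # no supporting reads, nothing to correct
    if minor_reads / major_reads >= ratio:
        return None  # both alleles well supported: keep the heterozygous call
    major = a1 if r1 >= r2 else a2
    return (major, major)  # minor allele treated as stutter


# Rows taken from the table above.
print(correct_alleles(("18.3", "19.1"), (1182, 378)))  # D1S1656
print(correct_alleles(("10", "9"), (98, 56)))          # D4S2408
```

Alleles are kept as strings so that microvariant designations such as 18.3 or 9.3 are preserved exactly rather than passed through floating-point arithmetic.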
Comparison results between STRScan and CE for 9947A in the Ion S5 dataset
| Marker | Repeat motif¹ | Alleles (a1, a2) | Supporting reads (a1, a2) | Allele correction² (a1, a2) | CE |
|---|---|---|---|---|---|
| D1S1677 | (TTCC)15 | 13, 14 | 1323, 1103 | – | 13, 14 |
| D1S1656 | (CCTA)1(TCTA)16 | 19, 18 | 2493, 532 | 19, 19 | 18.3, 18.3 |
| TPOX | (AATG)8 | 8, 7 | 4552, 184 | 8, 8 | 8, 8 |
| D2S441 | (TCTA)12 | 10, 14 | 2134, 1825 | – | 10, 14 |
| D2S1776 | (AGAT)11 | 10, 9 | 5559, 320 | 10, 10 | 10, 10 |
| D2S1338 | (GGAA)2(GGAC)1(GGAA)13(GGCA)7 | 23, 22 | 752, 95 | 23, 23 | 19, 23 |
| D3S1358 | (TCTA)1(TCTG)1(TCTA)14 | 15, 14 | 3054, 2966 | – | 14, 15 |
| D3S4529 | (GATA)4AATA(GATA)7 | 11, 10 | 2713, 120 | 11, 11 | 13, 13 |
| D4S2408 | (ATCT)9 | 10, 9 | 993, 776 | – | 9, 10 |
| FGA | (GGAA)2GGAG(AAAG)14AGAAAAAA(GAAA)3 | 20, 21 | 229, 124 | – | 23, 24 |
| D5S2800 | (GGTA)3(GACA)8(GATA)3(GATT)3 | 14, 13 | 1615, 66 | 14, 14 | 14, 23 |
| D5S818 | (ATCT)11 | 11, 10 | 3734, 426 | 11, 11 | 11, 11 |
| CSF1PO | (ATCT)14 | 10, 12 | 2442, 1996 | – | 10, 12 |
| D6S1043 | (ATCT)12 | 12, 18 | 3563, 2567 | – | 12, 18 |
| D6S474 | (AGAT)5(GATA)12 | 14, 18 | 3063, 2052 | – | 13, 17 |
| D7S820 | (TATC)13 | 10, 11 | 1544, 554 | 10, 10 | 10, 11 |
| D8S1179 | (TCTA)1(TCTG)1(TCTA)11 | 13, 12 | 2665, 275 | 13, 13 | 13, 13 |
| D10S1248 | (GGAA)13 | 15, 13 | 1327, 1057 | – | 13, 15 |
| TH01 | (AATG)7ATG(AATG)0 | 9, 9 | 3298, 122 | 9, 9 | 8, 9.3 |
| vWA | (TAGA)11(CAGA)5TAGA | NA | 0, 0 | – | 17, 18 |
| D12S391 | (AGAT)11(AGAC)7AGAT | NA | 0, 0 | – | 18, 20 |
| D12ATA63 | (TTG)3(TTA)10 | 12, 11 | 3478, 417 | 12, 12 | 13, 13 |
| D13S317 | (TATC)11 | 11, 10 | 3516, 196 | 11, 11 | 11, 11 |
| D14S1434 | (CTGT)3(CTAT)10 | 11, 13 | 2236, 1679 | – | 11, 13 |
| Penta E | (TCTTT)5 | 13, 12 | 629, 624 | – | 12, 13 |
| D16S539 | (GATA)11 | 11, 12 | 4105, 3085 | – | 11, 12 |
| D18S51 | (AGAA)18 | 15, 19 | 1300, 934 | – | 15, 19 |
| D19S433 | (CCTT)12cctaCCTTctttCCTT | NA | 0, 0 | – | 14, 15 |
| D21S11 | (TCTA)4(TCTG)6(TCTA)3ta(TCTA)3tca(TCTA)2tccata(TCTA)11 | 30, 29 | 2833, 301 | 30, 30 | 30, 30 |
| Penta D | (AAAGA)13 | 12, 13 | 53, 1 | 12, 12 | 12, 12 |
| D22S1045 | (ATT)14ACT(ATT)2 | 10, 13 | 2109, 1390 | – | 11, 14 |
¹ Reference repeat-region sequence structure, based on the latest forensic STR sequence guide and modified to meet the requirements of STRScan
² Allele correction according to the stutter ratio, which is 0.5 in this study. '–', not applicable
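Comparing a tool's call with the CE genotype row by row requires ignoring allele order (for example, "15, 14" and "14, 15" describe the same genotype) and treating "NA" as a failed call. The sketch below shows one way to do this on cells copied from the tables. The helper names `parse_genotype` and `is_concordant` are hypothetical, and using the corrected alleles when a correction is present is an assumption about how concordance was scored, not something stated in this section.

```python
from typing import Optional, Tuple

MISSING = {"NA", "–", "-", ""}  # cell values that carry no genotype


def parse_genotype(cell: str) -> Optional[Tuple[str, ...]]:
    """Turn a table cell such as '15, 14' into an order-independent tuple."""
    cell = cell.strip()
    if cell in MISSING:
        return None
    # Sorting the allele strings makes the comparison order-insensitive.
    return tuple(sorted(part.strip() for part in cell.split(",")))


def is_concordant(alleles: str, correction: str, ce: str) -> bool:
    """Compare the tool's final call with the CE genotype.

    Assumption: the corrected alleles, when present, are the final call.
    """
    called = parse_genotype(correction) or parse_genotype(alleles)
    return called is not None and called == parse_genotype(ce)


# (marker, alleles, correction, CE) copied from the STRScan table above.
rows = [
    ("D3S1358", "15, 14", "–", "14, 15"),
    ("D1S1656", "19, 18", "19, 19", "18.3, 18.3"),
    ("TPOX", "8, 7", "8, 8", "8, 8"),
    ("vWA", "NA", "–", "17, 18"),
]
for marker, alleles, correction, ce in rows:
    print(marker, is_concordant(alleles, correction, ce))
```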
Fig. 2 Distribution plot of the impact each feature has on the model output. Color represents the feature value (red high, blue low). The plot reveals, for example, that a short mean distance between the end of the first allele sequence and the read 3′-end (Dis1_mean_3) lowers the predicted probability of correct genotyping