Martin Bodner, Walther Parson.
Abstract
STRidER, the STRs for Identity ENFSI Reference Database, is a curated, freely and publicly available online allele frequency database, quality control (QC) and software platform for autosomal Short Tandem Repeats (STRs) developed under the endorsement of the International Society for Forensic Genetics. Continuous updates comprise additional STR loci and populations in the frequency database and many further STR-related aspects. One significant innovation is the autosomal STR data QC provided prior to publication of datasets. Such scrutiny was lacking previously, leaving QC to authors, reviewers and editors, which led to an unacceptably high error rate in scientific papers. The results from scrutinizing 184 STR datasets containing >177,000 individual genotypes submitted in the first two years of STRidER QC since 2017 revealed that about two-thirds of the STR datasets were either withdrawn by the authors after initial feedback or rejected based on a conservative error rate. Almost no error-free submissions were received, which clearly shows that centralized QC and data curation are essential to maintain the high-quality standard required in forensic genetics. While many errors had minor impact on the resulting allele frequencies, multiple error categories were commonly found within single datasets. Several datasets contained serious flaws. We discuss the factors that caused the errors to draw attention to redundant pitfalls and thus contribute to better quality of autosomal STR datasets and allele frequency reports.
Keywords: STR profile; allele frequency; database; error; genotype; population data
Year: 2020 PMID: 32784546 PMCID: PMC7463946 DOI: 10.3390/genes11080901
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. Submissions to STRidER (STRs for Identity ENFSI Reference Database) for quality control in the first two years. Bars indicate the 184 submitted datasets in chronological order of submission between July 2017 and July 2019, with dataset size according to the left axis; the line indicates the cumulative number of samples over all submitted datasets according to the right axis.
Statistics of autosomal short tandem repeat (STR) datasets submitted to STRidER in its first two years and quality control outcome.
| | All | CE 1 | MPS 2 |
|---|---|---|---|
| Number of Genotypes | | | |
| Total | 177,595 | 173,709 | 3886 |
| Mean per dataset | 965 | 1053 | 205 |
| Median per dataset | 506 | 522 | 140 |
| Number of Datasets | | | |
| Total | 184 | 165 | 19 |
| Passed QC | 48 | 35 | 13 |
| Withdrawal during QC | 41 | 36 | 5 |
| Rejection by STRidER | 58 | 58 | 0 |
| QC pending | 37 | 36 | 1 |
| Rates (%) | | | |
| Acceptance rate | 32.7 | 27.1 | 72.2 |
| Withdrawal/Rejection rate | 67.3 | 72.9 | 27.8 |
1 Length-based genotypes generated by capillary electrophoresis (CE). 2 Genotypes generated by massively parallel sequencing (MPS). Note: Results as of six months after the initial two-year period. QC, quality control.
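The acceptance and withdrawal/rejection rates in the table can be reproduced from the dataset counts. The short sketch below assumes that datasets with QC still pending are left out of the denominator; this is inferred from the numbers rather than stated in the source, and the names `counts` and `qc_rates` are illustrative only.

```python
# Dataset counts copied from the table above (columns: All, CE, MPS).
counts = {
    "All": {"passed": 48, "withdrawn": 41, "rejected": 58, "pending": 37},
    "CE": {"passed": 35, "withdrawn": 36, "rejected": 58, "pending": 36},
    "MPS": {"passed": 13, "withdrawn": 5, "rejected": 0, "pending": 1},
}


def qc_rates(c):
    """Return (acceptance rate, withdrawal/rejection rate) in percent.

    Assumption: datasets with QC pending are excluded from the denominator.
    """
    completed = c["passed"] + c["withdrawn"] + c["rejected"]
    acceptance = 100 * c["passed"] / completed
    return round(acceptance, 1), round(100 - acceptance, 1)


rates = {name: qc_rates(c) for name, c in counts.items()}
for name, (accepted, not_accepted) in rates.items():
    print(f"{name}: acceptance {accepted}%, withdrawal/rejection {not_accepted}%")
```

Under this assumption the denominator for all datasets is 48 + 41 + 58 = 147, and 48/147 gives the 32.7% acceptance rate reported in the table.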
Statistics of errors found in the 165 autosomal STR datasets generated by capillary electrophoresis (CE) and submitted to STRidER in its first two years.
| | Error category | Datasets | (%) |
|---|---|---|---|
| (i) | Identical genotypes | 63 | 38.2 |
| (ii) | Non-ascending allele pairs | 58 | 35.2 |
| (iii) | Allele nomenclature errors | 29 | 17.6 |
| (iv) | Allele calling errors | 17 | 10.3 |
| (v) | Incomplete genotypes | 16 | 9.7 |
| (vi) | Errors in locus nomenclature | 10 | 6.1 |
| (vii) | Aneuploidy | 9 | 5.5 |
| (viii) | No raw data/shuffled data | 9 | 5.5 |
| (ix) | Identical identifiers | 7 | 4.2 |
| (x) | Information mismatch | 5 | 3.0 |
| (xi) | Locus swapping | 3 | 1.8 |
| (xii) | Loss of intermediate alleles | 2 | 1.2 |
Note: The percentages sum to more than 100% because some datasets harbored multiple error categories. Results as of six months after the initial two-year period.
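Several of the listed categories are formal errors that can be screened for mechanically before submission. The following is a minimal sketch of such a pre-check for categories (i), (ii), (v) and (ix). It is not the STRidER QC procedure: the record layout, the function name `find_issues` and the example samples are hypothetical, and allele labels are assumed to be purely numeric strings such as "9.3".

```python
from collections import Counter


def find_issues(records, loci):
    """Return (category, detail) tuples for a list of (sample_id, genotype) records.

    genotype maps a locus name to a pair of allele labels given as strings,
    e.g. {"TH01": ("6", "9.3")}. Hypothetical layout, numeric labels only.
    """
    findings = []

    # (ix) Identical identifiers: one sample ID used for several records.
    id_counts = Counter(sample_id for sample_id, _ in records)
    findings += [("identical identifier", sid) for sid, n in id_counts.items() if n > 1]

    seen_profiles = {}
    for sample_id, genotype in records:
        complete = True
        for locus in loci:
            alleles = genotype.get(locus, ())
            # (v) Incomplete genotypes: a missing locus or an empty allele field.
            if len(alleles) < 2 or any(a in ("", None) for a in alleles):
                findings.append(("incomplete genotype", (sample_id, locus)))
                complete = False
            # (ii) Non-ascending allele pairs: compare numerically, so that
            # "9.3" sorts before "10" (a plain string comparison would not).
            elif float(alleles[0]) > float(alleles[1]):
                findings.append(("non-ascending allele pair", (sample_id, locus)))

        # (i) Identical genotypes: same alleles at every locus, in any order.
        if complete:
            profile = tuple(tuple(sorted(genotype[l], key=float)) for l in loci)
            if profile in seen_profiles:
                findings.append(("identical genotype", (seen_profiles[profile], sample_id)))
            else:
                seen_profiles[profile] = sample_id
    return findings


records = [
    ("S1", {"TH01": ("6", "9.3"), "D3S1358": ("15", "16")}),
    ("S2", {"TH01": ("9.3", "6"), "D3S1358": ("15", "16")}),  # swapped pair, same profile as S1
    ("S3", {"TH01": ("7", ""), "D3S1358": ("14", "17")}),     # empty allele field
    ("S3", {"TH01": ("8", "10"), "D3S1358": ("16", "18")}),   # reused sample ID
]
issues = find_issues(records, ["TH01", "D3S1358"])
for issue in issues:
    print(issue)
```

Each finding is only a candidate for manual review. Categories such as allele calling errors or aneuploidy cannot be judged from the genotype table alone and are deliberately left out of this sketch.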