Jianye Ge, Arthur Eisenberg, Bruce Budowle.
Abstract
BACKGROUND: Recently, the Combined DNA Index System (CODIS) Core Loci Working Group, established by the US Federal Bureau of Investigation (FBI), reviewed the CODIS core loci and recommended changes. The Working Group identified 20 short tandem repeat (STR) loci, together with the Amelogenin marker, as the new core set; the 20 loci comprise the original CODIS core set loci (minus TPOX), four European recommended loci, PentaE, and DYS391. Before the core loci are selected and finalized, some evaluations are needed to determine the best options for the core set.
Year: 2012 PMID: 22226306 PMCID: PMC3314575 DOI: 10.1186/2041-2223-3-1
Source DB: PubMed Journal: Investig Genet ISSN: 2041-2223
General information on the STR loci selected by Hares [2], including chromosomal location, membership in kits or panels, mutation rates, and match probabilities, based on a Caucasian population⁽¹⁾⁽²⁾⁽³⁾
| Locus | 13 core loci | New FBI core loci⁽⁴⁾ | European loci⁽⁵⁾ | Identifiler | PowerPlex 16 | China⁽⁶⁾ | Location | Position, Mb | Mutation rate, paternal | Mutation rate, maternal | MP⁽⁷⁾ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| D1S1656 | | A | S | | | | 1q42 | 228.972 | 1.54 × 10⁻³ | 3.70 × 10⁻⁴ | 0.019 |
| D2S441 | | A | S | | | | 2p14 | 68.214 | 1.54 × 10⁻³ | 3.70 × 10⁻⁴ | 0.095 |
| D2S1338 | | A | D | √ | | | 2q35 | 218.705 | 1.36 × 10⁻³ | 2.49 × 10⁻⁴ | 0.028 |
| D3S1358 | √ | A | S | √ | √ | √ | 3p21.31 | 45.543 | 1.68 × 10⁻³ | 2.55 × 10⁻⁴ | 0.076 |
| FGA | √ | A | S | √ | √ | √ | 4q28 | 155.866 | 3.71 × 10⁻³ | 4.93 × 10⁻⁴ | 0.038 |
| D5S818 | √ | A | | √ | √ | √ | 5q23.2 | 123.139 | 1.66 × 10⁻³ | 2.69 × 10⁻⁴ | 0.158 |
| CSF1PO | √ | A | | √ | √ | √ | 5q33.1 | 149.436 | 1.98 × 10⁻³ | 3.19 × 10⁻⁴ | 0.118 |
| D7S820 | √ | A | | √ | √ | √ | 7q21.11 | 83.433 | 1.37 × 10⁻³ | 7.23 × 10⁻⁵ | 0.065 |
| D8S1179 | √ | A | S | √ | √ | √ | 8q24.13 | 125.976 | 2.06 × 10⁻³ | 3.33 × 10⁻⁴ | 0.061 |
| D10S1248 | | A | S | | | | 10q26.3 | 130.567 | 1.54 × 10⁻³ | 3.70 × 10⁻⁴ | 0.092 |
| TH01 | √ | A | S | √ | √ | | 11p15.5 | 2.149 | 5.20 × 10⁻⁵ | 6.03 × 10⁻⁵ | 0.081 |
| D12S391 | | A | S | | | | 12p13.2 | 12.215 | 1.54 × 10⁻³ | 3.70 × 10⁻⁴ | 0.02 |
| VWA | √ | A | S | √ | √ | √ | 12p13.31 | 5.963 | 3.25 × 10⁻³ | 4.68 × 10⁻⁴ | 0.063 |
| D13S317 | √ | A | | √ | √ | √ | 13q31.1 | 81.620 | 1.74 × 10⁻³ | 4.03 × 10⁻⁴ | 0.085 |
| PentaE | | A | | | √ | | 15q26.2 | 95.175 | 2.60 × 10⁻⁴ | 2.53 × 10⁻⁴ | 0.02 |
| D16S539 | √ | A | D | √ | √ | √ | 16q24.1 | 84.944 | 1.03 × 10⁻³ | 5.25 × 10⁻⁴ | 0.1 |
| D18S51 | √ | A | S | √ | √ | √ | 18q21.33 | 59.100 | 2.23 × 10⁻³ | 7.93 × 10⁻⁴ | 0.029 |
| D19S433 | | A | D | √ | | | 19q12 | 35.109 | 9.75 × 10⁻⁴ | 5.48 × 10⁻⁴ | 0.088 |
| D21S11 | √ | A | S | √ | √ | √ | 21q21.1 | 19.476 | 1.75 × 10⁻³ | 1.18 × 10⁻³ | 0.046 |
| DYS391 | | A | | | | | Yq11.21 | 14.103 | 1.70 × 10⁻³ | - | 0.455 |
| TPOX | √ | B | | √ | √ | | 2p25.3 | 1.472 | 1.65 × 10⁻⁴ | 1.05 × 10⁻⁴ | 0.195 |
| SE33 | | B | D | | | | 6q14 | 89.043 | 6.40 × 10⁻³ | 3.00 × 10⁻³ | 0.005 |
| PentaD | | B | | | √ | | 21q22.3 | 43.880 | 2.59 × 10⁻⁴ | 2.53 × 10⁻⁴ | 0.049 |
| D22S1045 | | B | S | | | | 22q12.3 | 35.779 | 1.54 × 10⁻³ | 3.70 × 10⁻⁴ | 0.134 |
| Loci, n | 13 | 20 + 4 | 12 + 4 | 15 | 15 | 11 | | | | | |
⁽¹⁾ Amelogenin is not included.
⁽²⁾ '√' means that the locus is in the particular panel.
⁽³⁾ Loci are sorted by chromosome and position, with section A and section B loci listed separately.
⁽⁴⁾ 'New FBI core' refers to the panel described by Hares [2]. 'A' and 'B' denote the loci placed into sections A and B, respectively.
⁽⁵⁾ 'S' denotes the loci in the European Standard Set (ESS); 'D' denotes the additional loci that expand the ESS. The European loci panel differs from the NGM loci [7] by one locus (SE33).
⁽⁶⁾ Eleven STR loci common to five major commercial kits used in China: Identifiler, Sinofiler, PowerPlex16, DNAtyper15, and AGCU (17+1) [24].
⁽⁷⁾ MP, match probability.
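Several loci in the table share a chromosome, which matters whenever per-locus probabilities are multiplied (the Figure 3 caption treats D12S391 as linked with VWA and notes that treating D5S818 and CSF1PO as independent is an assumption). The sketch below uses the band and position columns of the table to list same-chromosome pairs with their physical distance. Physical distance is only a rough proxy for linkage, and the 30 Mb cut-off is an illustrative value chosen here, not a criterion from the paper; the function names are hypothetical.

```python
import re
from itertools import combinations

# (cytogenetic band, position in Mb) per locus, copied from the table above
LOCI = {
    "D1S1656": ("1q42", 228.972), "D2S441": ("2p14", 68.214),
    "D2S1338": ("2q35", 218.705), "D3S1358": ("3p21.31", 45.543),
    "FGA": ("4q28", 155.866), "D5S818": ("5q23.2", 123.139),
    "CSF1PO": ("5q33.1", 149.436), "D7S820": ("7q21.11", 83.433),
    "D8S1179": ("8q24.13", 125.976), "D10S1248": ("10q26.3", 130.567),
    "TH01": ("11p15.5", 2.149), "D12S391": ("12p13.2", 12.215),
    "VWA": ("12p13.31", 5.963), "D13S317": ("13q31.1", 81.620),
    "PentaE": ("15q26.2", 95.175), "D16S539": ("16q24.1", 84.944),
    "D18S51": ("18q21.33", 59.100), "D19S433": ("19q12", 35.109),
    "D21S11": ("21q21.1", 19.476), "DYS391": ("Yq11.21", 14.103),
    "TPOX": ("2p25.3", 1.472), "SE33": ("6q14", 89.043),
    "PentaD": ("21q22.3", 43.880), "D22S1045": ("22q12.3", 35.779),
}


def chromosome(band: str) -> str:
    """Chromosome named at the start of a band string, e.g. '12p13.2' -> '12'."""
    match = re.match(r"\d+|X|Y", band)
    if match is None:
        raise ValueError(f"unrecognized band: {band!r}")
    return match.group(0)


def syntenic_pairs(loci, max_mb=None):
    """All pairs of loci on the same chromosome with their distance in Mb,
    nearest first; if max_mb is given, pairs at least max_mb apart are dropped."""
    pairs = []
    for (a, (band_a, pos_a)), (b, (band_b, pos_b)) in combinations(loci.items(), 2):
        if chromosome(band_a) != chromosome(band_b):
            continue
        distance = round(abs(pos_a - pos_b), 3)
        if max_mb is None or distance < max_mb:
            pairs.append((a, b, distance))
    return sorted(pairs, key=lambda pair: pair[2])


for a, b, mb in syntenic_pairs(LOCI, max_mb=30):
    print(f"{a} / {b}: {mb} Mb apart")
```

With the 30 Mb cut-off this reports D12S391/VWA (6.252 Mb), D21S11/PentaD (24.404 Mb) and D5S818/CSF1PO (26.297 Mb). Judging actual linkage would require recombination data rather than positions alone.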
Expected match probability (EMP) of the kits and panels⁽¹⁾
| Panel (number of STR loci) | Unrelated, Fst = 0⁽²⁾ | Unrelated, Fst = 0.01 | Parent/child, Fst = 0 | Parent/child, Fst = 0.01 | Full sibling, Fst = 0 | Full sibling, Fst = 0.01 |
|---|---|---|---|---|---|---|
| New FBI core (24)⁽³⁾ | 6.28 × 10⁻³⁰ | 5.12 × 10⁻²⁹ | 3.63 × 10⁻¹⁸ | 1.15 × 10⁻¹⁷ | 3.49 × 10⁻¹¹ | 4.86 × 10⁻¹¹ |
| New FBI core section A (20)⁽³⁾ | 9.54 × 10⁻²⁵ | 4.77 × 10⁻²⁴ | 3.83 × 10⁻¹⁵ | 9.37 × 10⁻¹⁵ | 1.74 × 10⁻⁹ | 2.29 × 10⁻⁹ |
| 13-locus CODIS core (13) | 2.34 × 10⁻¹⁵ | 5.83 × 10⁻¹⁵ | 1.74 × 10⁻⁹ | 2.86 × 10⁻⁹ | 3.39 × 10⁻⁶ | 4.05 × 10⁻⁶ |
| Identifiler (15) | 5.93 × 10⁻¹⁸ | 1.73 × 10⁻¹⁷ | 5.04 × 10⁻¹¹ | 9.17 × 10⁻¹¹ | 4.21 × 10⁻⁷ | 5.17 × 10⁻⁷ |
| PowerPlex16 (15) | 2.43 × 10⁻¹⁸ | 7.48 × 10⁻¹⁸ | 3.06 × 10⁻¹¹ | 5.74 × 10⁻¹¹ | 3.61 × 10⁻⁷ | 4.45 × 10⁻⁷ |
| NGM⁽⁴⁾ (15) | 1.12 × 10⁻¹⁹ | 4.15 × 10⁻¹⁹ | 5.68 × 10⁻¹² | 1.17 × 10⁻¹¹ | 2.03 × 10⁻⁷ | 2.52 × 10⁻⁷ |
⁽¹⁾ Caucasian population data were used.
⁽²⁾ Fst is the autosomal short tandem repeat (STR) coancestry coefficient used to correct for population substructure.
⁽³⁾ The DYS391 locus is included only in 'New FBI core' and 'New FBI core section A'; no population substructure correction was applied to this locus.
⁽⁴⁾ NGM has the same STR loci as the 'European loci' panel, excluding SE33.
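How the coancestry correction enters the EMP columns for Fst = 0 and Fst = 0.01 is not spelled out in this section. As a minimal sketch, assume the Balding-Nichols sampling formula: the probability that two people from the same subpopulation share a genotype is built from sequential allele draws, the per-locus EMP sums that probability over all genotypes, and loci are multiplied under an independence assumption. The allele frequencies are hypothetical and the function names are illustrative.

```python
from itertools import combinations_with_replacement
from math import prod


def both_have_genotype(i, j, freqs, theta):
    """Probability that two people from one subpopulation both carry genotype
    {i, j}, using the Balding-Nichols sampling formula with coancestry theta.
    With theta = 0 this reduces to the squared Hardy-Weinberg frequency."""
    denom = (1 + theta) * (1 + 2 * theta)
    if i == j:
        p = freqs[i]
        return (p * (theta + (1 - theta) * p) * (2 * theta + (1 - theta) * p)
                * (3 * theta + (1 - theta) * p)) / denom
    p, q = freqs[i], freqs[j]
    # 4 = 2 allele orders for each of the two heterozygous people
    return (4 * p * q * (1 - theta) * (theta + (1 - theta) * p)
            * (theta + (1 - theta) * q)) / denom


def locus_emp(freqs, theta=0.0):
    """Expected match probability at one locus: sum over every genotype."""
    return sum(both_have_genotype(i, j, freqs, theta)
               for i, j in combinations_with_replacement(range(len(freqs)), 2))


def panel_emp(loci_freqs, theta=0.0):
    """Multi-locus EMP, assuming independence between loci (product rule)."""
    return prod(locus_emp(freqs, theta) for freqs in loci_freqs)


example = [0.5, 0.3, 0.2]  # hypothetical allele frequencies for one locus
print(locus_emp(example, 0.0), locus_emp(example, 0.01))
```

For this hypothetical locus, theta = 0 gives 0.2166 (the sum of squared Hardy-Weinberg genotype frequencies) and theta = 0.01 gives about 0.222, the same direction of change as between the Fst = 0 and Fst = 0.01 columns.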
Average kinship index (AKI) of the short tandem repeat (STR) loci for full-sibling (FS) and parent/child (PC) relationships, using Caucasian population data⁽¹⁾⁽²⁾
| Locus | AKI, PC | AKI, FS |
|---|---|---|
| PentaE | 3.47 | 2.74 |
| D12S391 | 3.37 | 2.75 |
| D1S1656 | 3.37 | 2.67 |
| D2S1338 | 2.91 | 2.41 |
| D18S51 | 2.84 | 2.38 |
| FGA | 2.63 | 2.23 |
| D21S11 | 2.54 | 2.18 |
| D8S1179 | 2.31 | 2.05 |
| D7S820 | 2.09 | 1.84 |
| VWA | 2.08 | 1.85 |
| D19S433 | 2.07 | 1.93 |
| D13S317 | 2.00 | 1.81 |
| D2S441 | 1.99 | 1.76 |
| D3S1358 | 1.92 | 1.71 |
| D10S1248 | 1.88 | 1.74 |
| D16S539 | 1.87 | 1.73 |
| TH01 | 1.85 | 1.68 |
| CSF1PO | 1.79 | 1.64 |
| D5S818 | 1.65 | 1.55 |
| DYS391 | 2.20 | 2.20 |
| SE33 | 6.24 | 4.42 |
| PentaD | 2.32 | 2.04 |
| D22S1045 | 1.74 | 1.63 |
| TPOX | 1.58 | 1.47 |
⁽¹⁾ The AKI values were estimated from 100,000 simulations for each locus.
⁽²⁾ Loci are sorted by parent/child AKI, separately for section A loci (with DYS391 excluded from the sort and listed last) and for section B loci.
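The AKI values above were estimated by simulation, 100,000 pairs per locus. A minimal Monte Carlo sketch of that idea for a single locus, assuming Hardy-Weinberg equilibrium, no substructure correction and no mutation: simulate true parent/child or full-sibling pairs, score each pair with its kinship index, and average. The full-sibling KI uses the standard IBD probabilities (1/4, 1/2, 1/4). Allele frequencies and function names are hypothetical; this is not presented as the authors' code.

```python
import random


def genotype_prob(g, freqs):
    """Hardy-Weinberg genotype probability (no substructure correction)."""
    x, y = g
    return freqs[x] ** 2 if x == y else 2 * freqs[x] * freqs[y]


def ki_parent_child(parent, child, freqs):
    """KI = P(child | parent, true parent/child) / P(child), no mutation."""
    x, y = child
    p_given_parent = 0.0
    for a in parent:  # each parental allele is transmitted with probability 1/2
        if a == x:
            p_given_parent += 0.5 * freqs[y]  # child's other allele y comes from the population
        elif a == y:
            p_given_parent += 0.5 * freqs[x]
    return p_given_parent / genotype_prob(child, freqs)


def ki_full_sibling(g1, g2, freqs):
    """KI for full siblings: k0 + k1 * (IBD-1 ratio) + k2 * (IBD-2 ratio)."""
    same = sorted(g1) == sorted(g2)
    ibd2_ratio = 1 / genotype_prob(g2, freqs) if same else 0.0
    return 0.25 + 0.5 * ki_parent_child(g1, g2, freqs) + 0.25 * ibd2_ratio


def simulate_aki(freqs, relationship, n_sims=100_000, seed=1):
    """Average KI over simulated pairs of true relatives ('PC' or 'FS')."""
    rng = random.Random(seed)
    alleles = range(len(freqs))

    def draw():
        return rng.choices(alleles, weights=freqs)[0]

    total = 0.0
    for _ in range(n_sims):
        if relationship == "PC":
            parent = (draw(), draw())
            child = (rng.choice(parent), draw())
            total += ki_parent_child(parent, child, freqs)
        elif relationship == "FS":
            mother, father = (draw(), draw()), (draw(), draw())
            sib1 = (rng.choice(mother), rng.choice(father))
            sib2 = (rng.choice(mother), rng.choice(father))
            total += ki_full_sibling(sib1, sib2, freqs)
        else:
            raise ValueError("relationship must be 'PC' or 'FS'")
    return total / n_sims
```

For a hypothetical two-allele locus with equal frequencies, the exact expectations work out to 1.25 (PC) and 1.3125 (FS), which `simulate_aki([0.5, 0.5], "PC")` and `simulate_aki([0.5, 0.5], "FS")` approach as the number of simulations grows.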
Figure 1. Log10(KI) distributions for true parent/child (PC) and full-sibling (FS) pairs and for unrelated pairs, showing how often true relatives would be identified as unrelated and unrelated profiles as potential relatives. (A) The new FBI core loci in section A (20 STR loci); (B) the 13 current CODIS core loci. Each distribution is based on 1 million simulations. The KI of the DYS391 locus for true relatives is 2.2. The true parent/child distributions with and without the DYS391 locus are close, as are the true full-sibling distributions.
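The value of 2.2 for DYS391 follows from treating the Y-STR match between true paternal-lineage relatives (for example father and son) as certain, assuming no mutation, so that KI = 1/MP; with the DYS391 MP of 0.455 from the locus table this is about 2.2. Because multi-locus KIs multiply under independence, adding DYS391 shifts every true relative pair's log10(KI) by the same constant. A small sketch of the arithmetic, with illustrative function names:

```python
import math


def y_str_ki_true_paternal(mp: float) -> float:
    """KI of a Y-STR for true paternal-lineage relatives, assuming no mutation:
    the haplotypes are identical, so KI = 1 / MP."""
    return 1.0 / mp


def combined_log10_ki(per_locus_ki):
    """log10 of the multi-locus KI, assuming independent loci (product rule)."""
    return sum(math.log10(ki) for ki in per_locus_ki)


dys391_ki = y_str_ki_true_paternal(0.455)  # Caucasian DYS391 MP from the locus table
shift = combined_log10_ki([dys391_ki])     # added to every true relative's log10(KI)
print(round(dys391_ki, 2), round(shift, 2))
```

This prints `2.2 0.34`: a constant shift of about 0.34 on the log10 scale, consistent with the caption's observation that the distributions with and without DYS391 are close.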
Figure 2. Distributions of the number of included profiles in a two-person mixture, based on autosomal STRs, for four panels: the 13 CODIS core loci; the 19 autosomal loci in section A; the 10 most informative of the 13 CODIS core loci (D18S51, FGA, D21S11, D8S1179, VWA, D7S820, D3S1358, TH01, D13S317, and D16S539); and the 10 least informative of the 13 CODIS core loci (D8S1179, VWA, D7S820, D3S1358, TH01, D13S317, D16S539, CSF1PO, D5S818, and TPOX). The distributions were obtained by simulation: 1 million profiles were first generated as a database, and then 1 million two-person mixtures were randomly generated without replacement. Each mixture was searched against the database to count the candidate contributors other than the two profiles that formed the mixture. The Y-axis shows the proportion of mixtures yielding a given number of candidate contributors. For example, with the 13 CODIS loci, no candidate contributors were identified for 67.7% of mixtures, and only 1.3% and 0.4% of two-person mixtures yielded more than 10 and 20 candidate contributors, respectively. With the 19 loci in section A, nearly all two-person mixtures had no candidate contributors in a database of 1 million profiles.
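The Figure 2 simulation counts database profiles that cannot be excluded from a two-person mixture. Below is a scaled-down sketch of that procedure, assuming a profile is "included" when both of its alleles at every locus appear among the mixture's alleles, and that the two contributors are drawn from the database without replacement. Allele frequencies, database size and mixture count are hypothetical and far smaller than the 1 million used in the paper; the brute-force search is only practical at this scale, and the function names are illustrative.

```python
import random
from collections import Counter


def random_profile(loci_freqs, rng):
    """One genotype (pair of allele indices) per locus, drawn under HWE."""
    return [tuple(rng.choices(range(len(freqs)), weights=freqs, k=2))
            for freqs in loci_freqs]


def mixture_alleles(profile_a, profile_b):
    """Set of alleles observed at each locus of a two-person mixture."""
    return [set(ga) | set(gb) for ga, gb in zip(profile_a, profile_b)]


def is_included(profile, mixture):
    """True if every allele of the profile appears at the matching locus."""
    return all(set(g) <= alleles for g, alleles in zip(profile, mixture))


def simulate_candidates(loci_freqs, db_size, n_mixtures, seed=0):
    """Distribution of the number of included database profiles beyond the
    two true contributors. Brute force: O(n_mixtures * db_size)."""
    rng = random.Random(seed)
    database = [random_profile(loci_freqs, rng) for _ in range(db_size)]
    distribution = Counter()
    for _ in range(n_mixtures):
        i, j = rng.sample(range(db_size), 2)  # two distinct contributors
        mixture = mixture_alleles(database[i], database[j])
        extra = sum(1 for k, profile in enumerate(database)
                    if k not in (i, j) and is_included(profile, mixture))
        distribution[extra] += 1
    return distribution


loci = [[0.125] * 8] * 13  # hypothetical: 13 loci, 8 equally frequent alleles each
dist = simulate_candidates(loci, db_size=1_000, n_mixtures=100, seed=1)
print(sorted(dist.items()))
```

Dividing each count by the number of mixtures gives the proportions plotted on the Y-axis of Figure 2.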
Match probability and mutation rates per Y-STR locus.
| Locus | MP | Mutation rate × 10⁻³, Caucasian | Mutation rate × 10⁻³, YHRD |
|---|---|---|---|
| DYS385 | 0.17 | 1.57 | 2.134 |
| DYS458 | 0.23 | 1.05 | 6.444 |
| DYS456 | 0.27 | 8.36 | 4.243 |
| DYS389II | 0.3 | 1.04 | 3.644 |
| DYS390 | 0.31 | 1.05 | 2.102 |
| DYS439 | 0.35 | 0 | 5.214 |
| DYS635 | 0.37 | 3.13 | 3.467 |
| DYS448 | 0.37 | 2.09 | 1.571 |
| DYS392 | 0.4 | 0 | 4.123 |
| YGATA H4 | 0.41 | 2.09 | 2.434 |
| DYS437 | 0.41 | 2.09 | 1.226 |
| DYS438 | 0.42 | 0 | 3.059 |
| DYS391 | 0.45 | 2.09 | 2.599 |
| DYS19 | 0.46 | 0 | 2.299 |
| DYS389I | 0.48 | 1.05 | 2.523 |
| DYS393 | 0.68 | 2.09 | 1.045 |
The table is sorted by increasing match probability (MP), shown in the second column.
The 'YHRD' column lists mutation rates from http://www.yhrd.org/ [15]; all other match probabilities and mutation rates are from Budowle et al. [9] and Ge et al. [14].
Y-chromosome short tandem repeat (Y-STR) combinations with the minimum match probability (MP) for a given number of Y-STR markers⁽¹⁾
| Number of Y-STRs | Y-STR combination with minimum MP⁽²⁾ | MP | KI = 1/MP⁽³⁾ |
|---|---|---|---|
| 1 | 15 | 0.1748 | 5.72 |
| 2⁽⁴⁾ | 5, 15 | 0.0477 | 20.95 |
| 3 | 3, 5, 15 | 0.0178 | 56.25 |
| 4 | 1, 3, 5, 15 | 0.0083 | 121.1 |
| 5 | 1, 2, 3, 5, 15 | 0.0045 | 223.65 |
| 6 | 1, 2, 3, 5, 13, 15 | 0.0027 | 372.45 |
| 7 | 1, 2, 3, 5, 9, 13, 15 | 0.0020 | 501.29 |
| 8 | 1, 2, 3, 5, 9, 13, 14, 15 | 0.0016 | 620.91 |
| 9 | 1, 2, 3, 5, 9, 10, 13, 14, 15 | 0.0014 | 711.62 |
| 10 | 1, 2, 3, 5, 8, 9, 10, 13, 14, 15 | 0.0013 | 770.27 |
| 11 | 1, 2, 3, 4, 5, 8, 9, 10, 13, 14, 15 | 0.0012 | 819.92 |
| 12 | 0, 1, 2, 3, 4, 5, 8, 9, 10, 13, 14, 15 | 0.0012 | 847.22 |
| 13 | 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 13, 14, 15 | 0.0012 | 866.46 |
| 14 | 0, 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 13, 14, 15 | 0.0011 | 876.41 |
| 15 | 0, 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15 | 0.0011 | 886.59 |
| 16 | 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 | 0.0011 | 891.77 |
⁽¹⁾ This table was generated from the data of Budowle et al. [9].
⁽²⁾ Locus codes: 0 = DYS389I, 1 = DYS389II, 2 = DYS390, 3 = DYS456, 4 = DYS19, 5 = DYS458, 6 = DYS437, 7 = DYS438, 8 = DYS448, 9 = Y GATA H4, 10 = DYS391, 11 = DYS392, 12 = DYS393, 13 = DYS439, 14 = DYS635, and 15 = DYS385.
⁽³⁾ The kinship index (KI) for a true paternal lineage is the inverse of the MP.
⁽⁴⁾ For example, among all 120 two-locus Y-STR haplotype combinations (16 × 15 ÷ 2 = 120), the combination of DYS458 and DYS385 ('5, 15' in the second row) had the lowest MP.
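The combination table comes from an exhaustive search: for k loci out of 16 there are C(16, k) candidate sets (120 for k = 2). Because Y-STRs are inherited together on the non-recombining part of the Y chromosome, the MP of a combination cannot be obtained by multiplying per-locus MPs; a common estimator, assumed here, is the sum of squared frequencies of the multi-locus haplotypes observed in a sample. The haplotype sample below is hypothetical and the functions are illustrative.

```python
from collections import Counter
from itertools import combinations
from math import comb


def haplotype_mp(haplotypes, loci):
    """MP of the sub-haplotype on `loci`: sum of squared frequencies of the
    distinct sub-haplotypes in the sample (loci are not treated as independent)."""
    counts = Counter(tuple(h[i] for i in loci) for h in haplotypes)
    n = len(haplotypes)
    return sum((c / n) ** 2 for c in counts.values())


def best_combination(haplotypes, k):
    """Exhaustive search for the k-locus combination with the lowest MP."""
    n_loci = len(haplotypes[0])
    scored = ((combo, haplotype_mp(haplotypes, combo))
              for combo in combinations(range(n_loci), k))
    return min(scored, key=lambda item: item[1])


# Hypothetical 3-locus haplotypes (repeat numbers) for four men
sample = [(14, 10, 20), (14, 11, 20), (15, 10, 20), (15, 11, 21)]
combo, mp = best_combination(sample, 2)
print(combo, mp, 1 / mp)  # KI for a true paternal lineage is 1/MP
print(comb(16, 2))        # number of two-locus combinations of 16 Y-STRs
```

For this toy sample the search prints `(0, 1) 0.25 4.0` and then `120`: loci 0 and 1 together distinguish all four men, and the KI is the inverse of the MP as in the table.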
Figure 3. Log10(AKI) distributions of full-sibling and parent/child relationships using the most informative autosomal STRs and Y-chromosome STRs (Y-STRs) in Tables 1 and 4. The horizontal-axis labels '14 auto- + 6 Y-STRs' through '18 auto- + 2 Y-STRs' denote combinations of a given number of the most informative autosomal STRs in section A (excluding D12S391, because this locus is linked with VWA) and a given number of the most informative Y-STRs. The label '19 auto- + 1 Y-STRs' refers to all 19 autosomal STR loci in section A plus the most informative Y-STR (DYS385); independence between D12S391 and VWA was assumed in calculating the AKI values for this combination. In all calculations, the D5S818 and CSF1PO loci were assumed to be independent, although current data do not support this assumption. The true AKI of '19 auto- + 1 Y-STRs' should be slightly lower.
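Figure 3 combines autosomal and Y-STR evidence. Under independence, the expected value of a product of per-locus KIs equals the product of the per-locus AKIs, and for true paternal-lineage relatives the Y-STR part is a constant 1/MP (no mutation assumed), applied here to father/son pairs for PC and to brothers for FS. The sketch below computes log10 of the panel AKI, a single point rather than the full distribution the figure plots, for the caption's combinations. Ranking the autosomal loci by increasing MP from the locus table is an assumption based on the caption's reference to Tables 1 and 4; the Y-STR KIs come from the combination table, and, as in the caption, D12S391/VWA and D5S818/CSF1PO are treated as independent.

```python
import math

# (PC AKI, FS AKI) for the 19 autosomal section A loci, from the AKI table
AKI = {
    "PentaE": (3.47, 2.74), "D12S391": (3.37, 2.75), "D1S1656": (3.37, 2.67),
    "D2S1338": (2.91, 2.41), "D18S51": (2.84, 2.38), "FGA": (2.63, 2.23),
    "D21S11": (2.54, 2.18), "D8S1179": (2.31, 2.05), "D7S820": (2.09, 1.84),
    "VWA": (2.08, 1.85), "D19S433": (2.07, 1.93), "D13S317": (2.00, 1.81),
    "D2S441": (1.99, 1.76), "D3S1358": (1.92, 1.71), "D10S1248": (1.88, 1.74),
    "D16S539": (1.87, 1.73), "TH01": (1.85, 1.68), "CSF1PO": (1.79, 1.64),
    "D5S818": (1.65, 1.55),
}
# Assumed ranking: section A autosomal loci by increasing MP (locus table), without D12S391
RANKED = ["D1S1656", "PentaE", "D2S1338", "D18S51", "FGA", "D21S11",
          "D8S1179", "VWA", "D7S820", "D3S1358", "TH01", "D13S317",
          "D19S433", "D10S1248", "D2S441", "D16S539", "CSF1PO", "D5S818"]
# KI = 1/MP of the best k-locus Y-STR combination, from the combination table
Y_KI = {1: 5.72, 2: 20.95, 3: 56.25, 4: 121.1, 5: 223.65, 6: 372.45}


def log10_panel_aki(auto_loci, n_y, relationship="PC"):
    """log10 of the panel AKI, assuming all loci are independent: the product
    of per-locus autosomal AKIs times the constant Y-STR KI."""
    column = {"PC": 0, "FS": 1}[relationship]
    autosomal = sum(math.log10(AKI[locus][column]) for locus in auto_loci)
    return autosomal + math.log10(Y_KI[n_y])


for n_auto in range(14, 19):
    n_y = 20 - n_auto
    pc = log10_panel_aki(RANKED[:n_auto], n_y, "PC")
    fs = log10_panel_aki(RANKED[:n_auto], n_y, "FS")
    print(f"{n_auto} auto- + {n_y} Y-STRs: PC {pc:.2f}, FS {fs:.2f}")

all_19 = RANKED + ["D12S391"]  # D12S391/VWA independence assumed, as in the caption
print(f"19 auto- + 1 Y-STRs: PC {log10_panel_aki(all_19, 1, 'PC'):.2f}")
```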
Match probabilities (MPs) of short tandem repeat (STR) loci combinations.
| STR combinations | MP |
|---|---|
| 14 auto + 6 Y | 1.53 × 10⁻²¹ |
| 15 auto + 5 Y | 2.42 × 10⁻²² |
| 16 auto + 4 Y | 4.48 × 10⁻²³ |
| 17 auto + 3 Y | 9.64 × 10⁻²³ |
| 18 auto + 2 Y | 4.83 × 10⁻²⁴ |
| 19 auto + 1 Y⁽¹⁾ | 3.38 × 10⁻²⁵ |
| Section A | 9.20 × 10⁻²⁵ |
| Identifiler + 5 Y | 7.74 × 10⁻²⁰ |
| PowerPlex16 + 5 Y | 3.34 × 10⁻²⁰ |
| NGM + 5 Y | 2.21 × 10⁻²¹ |
| 7 shared auto + 5 Y⁽²⁾ | 5.37 × 10⁻¹² |
| 11 shared auto + 5 Y⁽³⁾ | 6.82 × 10⁻¹⁶ |
| 6 shared auto + 5 Y⁽⁴⁾ | 6.71 × 10⁻¹¹ |
⁽¹⁾ The Y-STR in this row is DYS385, not the section A locus DYS391.
⁽²⁾ Autosomal loci shared by the 13 CODIS core loci and the European loci.
⁽³⁾ Autosomal loci shared between the USA and China.
⁽⁴⁾ Autosomal loci shared between China and Europe.
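Several rows of this table can be approximated with the product rule, assuming the autosomal loci are independent of one another and of the Y haplotype, and taking the minimum MP of k Y-STRs as 1/KI from the Y-STR combination table. The sketch below does this with the per-locus Caucasian MPs from the locus table. Two choices are assumptions: the autosomal loci are ranked by increasing MP with D12S391 left out, and the '7 shared' set is read as the seven original European Standard Set loci in the CODIS core. For the rows checked here (14 + 6, 15 + 5, 16 + 4, 18 + 2, Section A, and the three shared-loci rows) the products come out within a few percent of the tabulated values; the sketch is a consistency check, not the authors' procedure.

```python
import math

# Autosomal MP per locus (Caucasian), copied from the locus table
MP = {
    "D1S1656": 0.019, "D2S441": 0.095, "D2S1338": 0.028, "D3S1358": 0.076,
    "FGA": 0.038, "D5S818": 0.158, "CSF1PO": 0.118, "D7S820": 0.065,
    "D8S1179": 0.061, "D10S1248": 0.092, "TH01": 0.081, "D12S391": 0.02,
    "VWA": 0.063, "D13S317": 0.085, "PentaE": 0.02, "D16S539": 0.1,
    "D18S51": 0.029, "D19S433": 0.088, "D21S11": 0.046,
}
# Minimum MP of the best k-locus Y-STR combination, as 1/KI from the combination table
Y_MP = {k: 1 / ki for k, ki in
        {1: 5.72, 2: 20.95, 3: 56.25, 4: 121.1, 5: 223.65, 6: 372.45}.items()}


def combined_mp(auto_loci, y_mp=1.0):
    """Product rule: autosomal loci independent of each other and of the Y haplotype."""
    return math.prod(MP[locus] for locus in auto_loci) * y_mp


# Assumed ranking: section A autosomal loci except D12S391, by increasing MP
ranked = sorted((locus for locus in MP if locus != "D12S391"), key=MP.get)

# Assumed reading of the shared-loci footnotes
CODIS_ESS_SHARED = ["D3S1358", "FGA", "D8S1179", "TH01", "VWA", "D18S51", "D21S11"]
USA_CHINA_SHARED = ["D3S1358", "FGA", "D5S818", "CSF1PO", "D7S820", "D8S1179",
                    "VWA", "D13S317", "D16S539", "D18S51", "D21S11"]
CHINA_EUROPE_SHARED = [locus for locus in CODIS_ESS_SHARED if locus != "TH01"]

for n_auto in (14, 15, 16, 18):
    n_y = 20 - n_auto
    print(f"{n_auto} auto + {n_y} Y: {combined_mp(ranked[:n_auto], Y_MP[n_y]):.2e}")
print(f"Section A: {combined_mp(MP, 0.455):.2e}")  # 19 autosomal loci + DYS391
for name, loci in [("7 shared", CODIS_ESS_SHARED), ("11 shared", USA_CHINA_SHARED),
                   ("6 shared", CHINA_EUROPE_SHARED)]:
    print(f"{name} auto + 5 Y: {combined_mp(loci, Y_MP[5]):.2e}")
```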