Mark W Perlin¹, Kiersten Dormer¹, Jennifer Hornyak¹, Lisa Schiermeier-Wood², Susan Greenspoon².
Abstract
Mixtures are a commonly encountered form of biological evidence that contain DNA from two or more contributors. Laboratory analysis of mixtures produces data signals that usually cannot be separated into distinct contributor genotypes. Computer modeling can resolve the genotypes up to probability, reflecting the uncertainty inherent in the data. Human analysts address the problem by simplifying the quantitative data in a threshold process that discards considerable identification information. Elevated stochastic threshold levels potentially discard more information. This study examines three different mixture interpretation methods. In 72 criminal cases, 111 genotype comparisons were made between 92 mixture items and relevant reference samples. TrueAllele computer modeling was done on all the evidence samples, and documented in DNA match reports that were provided as evidence for each case. Threshold-based Combined Probability of Inclusion (CPI) and stochastically modified CPI (mCPI) analyses were performed as well. TrueAllele's identification information in 101 positive matches was used to assess the reliability of its modeling approach. Comparison was made with 81 CPI and 53 mCPI DNA match statistics that were manually derived from the same data. There were statistically significant differences between the DNA interpretation methods. TrueAllele gave an average match statistic of 113 billion, CPI averaged 6.68 million, and mCPI averaged 140. The computer was highly specific, with a false positive rate under 0.005%. The modeling approach was precise, having a factor of two within-group standard deviation. TrueAllele accuracy was indicated by having uniformly distributed match statistics over the data set. The computer could make genotype comparisons that were impossible or impractical using manual methods. 
TrueAllele computer interpretation of DNA mixture evidence is sensitive, specific, precise, accurate and more informative than manual interpretation alternatives. It can determine DNA match statistics when threshold-based methods cannot. Improved forensic science computation can affect criminal cases by providing reliable scientific evidence.
Year: 2014 PMID: 24667531 PMCID: PMC3965478 DOI: 10.1371/journal.pone.0092837
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Distinguishing features of three different DNA mixture interpretation methods.
| | TrueAllele | CPI | mCPI |
| STR data usage | quantitative | qualitative | qualitative |
| | continuous | binary | ternary |
| | used | | |
| | used | | |
| | | analytical | analytical and stochastic |
| Genotype inference | probability model | data above analytical threshold | data above analytical threshold |
| | allele pairs | alleles | alleles |
| | automated | manual | manual |
| | statistical | alleles | alleles |
| | assumed | | |
| Match statistic calculation | with genotype | with alleles | with alleles |
| | all | inclusion | stochastic inclusion |
| | likelihood ratio | inclusion probability | inclusion probability |
| | include, exclude or inconclusive | include | include |
| | information | inclusion | inclusion |
Attributes involving STR data usage, genotype inference and match statistic calculation are shown for the TrueAllele, CPI and mCPI methods.
Figure 1. Mixture data.
Quantitative DNA mixture data are shown at the Penta E STR locus. The x-axis measures allele fragment size (bp), and the y-axis measures DNA quantity (RFU); a boxed peak number denotes allele length. The two-contributor mixture is formed from a 7,14 major genotype and a 10,12 minor genotype. The result is a pattern of peak heights that reflects the underlying genotypes.
Figure 2. Genotype modeling.
Linear combinations of genotype allele pairs can explain the observed quantitative mixture data. Here, a major 7,14 contributor (blue bars) having twice as much DNA as a minor 10,12 contributor (green bars) explains the data well, with a high likelihood value. Alternative genotype choices or combinations would not explain the data as well, and thus have lower likelihood.
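The linear-combination idea in Figure 2 can be illustrated with a small Python sketch. It assumes hypothetical peak heights built to resemble a 2:1 mixture of a 7,14 major and a 10,12 minor contributor (not the case data), a simple Gaussian noise model with a hypothetical spread `sigma`, and helper names (`expected_peaks`, `log_likelihood`, `best_minor_genotype`) invented for illustration. TrueAllele's own probability model is more detailed and is not reproduced here.

```python
from itertools import combinations_with_replacement

# Hypothetical Penta E peak heights (RFU) resembling a 2:1 mixture of a
# 7,14 major and a 10,12 minor contributor. Illustrative only, not case data.
OBSERVED = {7: 1000.0, 10: 500.0, 12: 500.0, 14: 1000.0}


def expected_peaks(genotypes, weights, total):
    """Linear combination: each contributor adds weight * total / 2 RFU per allele copy."""
    peaks = {}
    for (a, b), w in zip(genotypes, weights):
        for allele in (a, b):
            peaks[allele] = peaks.get(allele, 0.0) + w * total / 2
    return peaks


def log_likelihood(observed, expected, sigma):
    """Gaussian log-likelihood (up to a constant) of observed vs. expected heights."""
    alleles = set(observed) | set(expected)
    return sum(
        -0.5 * ((observed.get(x, 0.0) - expected.get(x, 0.0)) / sigma) ** 2
        for x in alleles
    )


def best_minor_genotype(observed, major, weights, sigma=100.0):
    """Score every candidate minor allele pair with the major genotype held fixed."""
    total = sum(observed.values())
    scores = {}
    for minor in combinations_with_replacement(sorted(observed), 2):
        expected = expected_peaks([major, minor], weights, total)
        scores[minor] = log_likelihood(observed, expected, sigma)
    return max(scores, key=scores.get), scores


best, scores = best_minor_genotype(OBSERVED, (7, 14), (2 / 3, 1 / 3))
print(best)  # (10, 12)
```

Keeping the same genotypes but switching to equal weights (0.5, 0.5) gives a lower score, mirroring the point that alternative combinations explain the data less well.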
Genotype probabilities and LR calculations are shown at the Penta E locus for a minor contributor.
| Allele pair | Prior | TrueAllele likelihood | TrueAllele posterior | TrueAllele LR | CPI likelihood | CPI posterior | CPI LR | mCPI likelihood | mCPI posterior | mCPI LR |
| 7,7 | 4.3% | | | | 1 | 17% | | 1 | 67% | |
| 7,10 | 3.3% | 2 | | | 1 | 13% | | | | |
| 7,12 | 7.1% | 2 | 1% | | 1 | 28% | | | | |
| 7,14 | 1.9% | | | | 1 | 8% | | 1 | 30% | |
| 10,10 | 0.6% | 1 | | | 1 | 2% | | | | |
| *10,12* | … | … | … | … | … | … | … | … | … | … |
| 10,14 | 0.7% | | | | 1 | 3% | | | | |
| 12,12 | 2.9% | 8 | 1% | | 1 | 11% | | | | |
| 12,14 | 1.6% | 1 | | | 1 | 6% | | | | |
| 14,14 | 0.2% | | | | 1 | 1% | | 1 | 3% | |
| Total | | | 100% | | | 100% | | | 100% | |
Three different mixture interpretation methods were used: TrueAllele, CPI and mCPI. Over the sample space of possible allele pairs, each method has a likelihood function and a posterior probability distribution. The LR gives the ratio of posterior to prior probability at the comparison allele pair 10,12 (italicized row). TrueAllele’s greater LR indicates that it makes more use of the STR data than CPI does. mCPI discarded too much data and could not yield a match statistic.
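The posterior and LR columns follow from Bayes' rule over the allele-pair sample space. The Python sketch below uses the prior percentages of the recovered rows of the table (the 10,12 row is omitted) and treats an inclusion method as a binary likelihood: 1 for kept pairs, 0 otherwise. The kept set for mCPI is taken to be the three pairs with a nonzero mCPI posterior in the table; the function names are hypothetical.

```python
from typing import Dict, Tuple

Pair = Tuple[int, int]

# Priors for the recovered rows of the Penta E table (10,12 row omitted).
PRIORS: Dict[Pair, float] = {
    (7, 7): 0.043, (7, 10): 0.033, (7, 12): 0.071, (7, 14): 0.019,
    (10, 10): 0.006, (10, 14): 0.007, (12, 12): 0.029, (12, 14): 0.016,
    (14, 14): 0.002,
}

# Binary likelihood: 1 for each pair the inclusion method keeps, 0 otherwise.
# These are the three pairs with a nonzero mCPI posterior in the table.
MCPI_KEPT: Dict[Pair, float] = {(7, 7): 1.0, (7, 14): 1.0, (14, 14): 1.0}


def posterior(prior: Dict[Pair, float], likelihood: Dict[Pair, float]) -> Dict[Pair, float]:
    """Bayes' rule: posterior is proportional to prior x likelihood, normalized to sum to 1."""
    joint = {g: p * likelihood.get(g, 0.0) for g, p in prior.items()}
    total = sum(joint.values())
    return {g: v / total for g, v in joint.items()}


def likelihood_ratio(prior: Dict[Pair, float], likelihood: Dict[Pair, float], pair: Pair) -> float:
    """LR at a comparison pair: posterior probability divided by prior probability."""
    return posterior(prior, likelihood)[pair] / prior[pair]


post = posterior(PRIORS, MCPI_KEPT)
print({g: round(100 * p) for g, p in post.items() if p > 0})
# {(7, 7): 67, (7, 14): 30, (14, 14): 3}
```

This reproduces the 67%, 30% and 3% mCPI posteriors in the table. For a kept pair, the LR reduces to one over the summed prior of all kept pairs (here 1/0.064 ≈ 15.6 at 7,14), which has the form of a reciprocal inclusion probability; a pair outside the kept set has zero posterior and so no positive LR.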
Figure 3. Analytical threshold.
The purpose of this threshold is to distinguish allelic signal from background noise. Applying the threshold (red line) reduces the quantitative peaks to all-or-none putative allele events (blue bars). The analytical threshold operation eliminates individual peak heights, as well as their collective pattern.
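The all-or-none reduction can be sketched in a few lines of Python. The 50 RFU threshold, the `>=` comparison and the function name `call_alleles` are hypothetical choices for illustration. The example shows how two very different quantitative patterns become indistinguishable once only allele presence is kept.

```python
from typing import Dict, Set

# Hypothetical analytical threshold in RFU, for illustration only.
ANALYTICAL_THRESHOLD = 50.0


def call_alleles(peaks: Dict[int, float], threshold: float = ANALYTICAL_THRESHOLD) -> Set[int]:
    """Reduce quantitative peaks (allele -> RFU) to an all-or-none set of alleles."""
    return {allele for allele, height in peaks.items() if height >= threshold}


# Two very different peak patterns collapse to the same binary allele set.
two_to_one = {7: 1000.0, 10: 500.0, 12: 500.0, 14: 1000.0}
balanced = {7: 300.0, 10: 300.0, 12: 300.0, 14: 300.0}
print(call_alleles(two_to_one) == call_alleles(balanced))  # True
```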
Figure 4. Stochastic threshold.
A higher threshold level (red line) is used in manual review to address random peak variation by differentiating more certain peaks (blue bars) from less certain ones. The stochastic threshold removes more STR loci from statistical consideration, so less of the available data is used.
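Adding a stochastic threshold turns the binary call into the ternary one listed for mCPI in the first table. The Python sketch below uses hypothetical thresholds (50 and 200 RFU) and a deliberately simplified locus rule, which is an assumption rather than the laboratory's written protocol: a locus is dropped if any peak falls between the two thresholds, since a partner allele might have dropped out.

```python
from enum import Enum
from typing import Dict

# Hypothetical thresholds in RFU, for illustration only.
ANALYTICAL = 50.0
STOCHASTIC = 200.0


class PeakCall(Enum):
    ABSENT = 0     # below the analytical threshold
    UNCERTAIN = 1  # between the analytical and stochastic thresholds
    CERTAIN = 2    # at or above the stochastic threshold


def classify(height: float, analytical: float = ANALYTICAL, stochastic: float = STOCHASTIC) -> PeakCall:
    """Ternary peak call used in stochastic-threshold review."""
    if height < analytical:
        return PeakCall.ABSENT
    if height < stochastic:
        return PeakCall.UNCERTAIN
    return PeakCall.CERTAIN


def locus_usable(peaks: Dict[int, float]) -> bool:
    """Simplified rule (assumption): keep a locus only if no peak is UNCERTAIN."""
    return all(classify(h) is not PeakCall.UNCERTAIN for h in peaks.values())


print(locus_usable({7: 1000.0, 14: 950.0}))                         # True
print(locus_usable({7: 1000.0, 10: 120.0, 12: 110.0, 14: 950.0}))  # False
```

Under this rule a locus whose minor peaks sit at 110 to 120 RFU is removed entirely, which is one way the stochastic threshold discards loci that a quantitative model would still use.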
The range of biological sample types that were found in the 92 evidence items is shown.
| Sample type | Count |
| blood | 10 |
| epithelial/skin | 30 |
| fingernails | 2 |
| hair | 1 |
| saliva | 4 |
| semen | 3 |
| stain | 1 |
| touch | 41 |
For each sample type, the table records how frequently that type was seen.
For each number of contributors (first column), the first three rows give the estimated number of mixture items (second column) having that many contributors.
| Contributors | Items |
| 2 | 40 |
| 3 | 65 |
| 4 | 8 |
| 2 or 3 | 16 |
| 3 or 4 | 3 |
| 2, 3 or 4 | 1 |
When an item was consistent with more than one possible number of contributors, that item appears in multiple categories. The last three rows cover the overlap situations where the number of contributors (first column) was uncertain, and count the number of items (second column) in each situation.
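The category counts reconcile with the 92 evidence items if each item in an overlap row is counted once in every contributor row it is consistent with, so that a "2 or 3" item is counted twice and the "2, 3 or 4" item three times. Under that reading (an assumption about how the table was compiled), the repeat counts are removed as follows:

```latex
\underbrace{40 + 65 + 8}_{\text{contributor rows}}
\;-\; \underbrace{(16 + 3 + 2 \times 1)}_{\text{extra counts from overlaps}}
\;=\; 113 - 21 \;=\; 92
```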
The frequency distribution of mixture weights as inferred by the computer is shown for the matched genotypes.
| Mixture Weight | Count |
| 0.05 | 3 |
| 0.15 | 13 |
| 0.25 | 5 |
| 0.35 | 12 |
| 0.45 | 18 |
| 0.55 | 12 |
| 0.65 | 11 |
| 0.75 | 12 |
| 0.85 | 12 |
| 0.95 | 4 |
The binning is done by decile, with each row showing the center of its mixture weight range, along with the number of genotypes in that bin.
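Decile binning of inferred mixture weights can be sketched in Python as below. The lower-inclusive bin convention (with a weight of exactly 1.0 placed in the top bin) and the example weights are hypothetical; the table does not state how boundary values were assigned.

```python
from collections import Counter


def decile_center(weight: float) -> float:
    """Map a mixture weight in [0, 1] to the center of its decile bin (0.05 ... 0.95).

    Assumes lower-inclusive bins, with a weight of exactly 1.0 placed in the top bin.
    """
    index = min(int(weight * 10), 9)
    return round((index + 0.5) / 10, 2)


def bin_weights(weights):
    """Count inferred mixture weights per decile center."""
    return Counter(decile_center(w) for w in weights)


# Hypothetical inferred weights, not values from the study.
print(sorted(bin_weights([0.12, 0.18, 0.47, 0.91]).items()))
# [(0.15, 2), (0.45, 1), (0.95, 1)]
```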
Figure 5. Computer specificity.
A histogram shows empirical log(LR) distributions for 101 evidence genotype comparisons relative to 10,000 randomly generated references. There are 1,010,000 data points for each of the three ethnic populations. Note that the negative values are located far to the left of zero.
Specificity results (ban) for TrueAllele mixture interpretation log(LR) values, comparing 101 reported evidence genotypes with 10,000 random genotypes from each of three ethnic populations.
| n = 3,030,000 | Black | Caucasian | Hispanic |
| Minimum | −30.000 | −30.000 | −30.000 |
| Mean | −19.467 | −19.217 | −19.547 |
| Maximum | 2.381 | 2.726 | 3.782 |
| Standard deviation | 6.543 | 6.723 | 6.637 |
| 0 < log(LR) < 1 | 39 | 32 | 29 |
| 1 ≤ log(LR) < 2 | 8 | 11 | 9 |
| 2 ≤ log(LR) < 3 | 2 | 1 | 1 |
| 3 ≤ log(LR) < 4 | 0 | 0 | 1 |
| log(LR) >0 | 49 | 44 | 40 |
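The mean exclusionary log(LR) ranged from −19.5 to −19.2 ban across the three populations, corresponding to average LRs below one over ten billion billion (10⁻¹⁹). Only 133 false positives (log(LR) > 0) were seen in over three million genotype comparisons.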
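The false positive rate quoted in the abstract can be checked from the table counts. This Python sketch copies the positive log(LR) counts and mean log(LR) values from the table; the only assumption is that each population contributes 101 × 10,000 comparisons, as the figure caption states.

```python
# Positive log(LR) counts and mean log(LR) (ban) copied from the specificity table.
FALSE_POSITIVES = {"Black": 49, "Caucasian": 44, "Hispanic": 40}
MEAN_LOG_LR = {"Black": -19.467, "Caucasian": -19.217, "Hispanic": -19.547}
COMPARISONS_PER_POPULATION = 101 * 10_000  # 101 evidence genotypes x 10,000 random references


def false_positive_rate(counts, per_population):
    """Fraction of random-reference comparisons with log(LR) > 0."""
    return sum(counts.values()) / (per_population * len(counts))


rate = false_positive_rate(FALSE_POSITIVES, COMPARISONS_PER_POPULATION)
print(f"{100 * rate:.4f}%")  # 0.0044%
for population, ban in MEAN_LOG_LR.items():
    # A mean of about -19 ban is an LR far below one over a billion billion (1e-18).
    print(population, 10 ** ban < 1e-18)
```

The result, 133 out of 3,030,000, sits under the 0.005% figure given in the abstract.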
Figure 6. Computer precision.
The scatterplot shows log(LR) values for 101 duplicate computer runs on the same evidence. Each point gives the first (x) and second (y) values. The data lie close to the y = x diagonal, which represents exactly replicated results.
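The abstract summarizes precision as a "factor of two within-group standard deviation." One way to estimate such a figure from duplicate runs is the pooled within-pair standard deviation of log(LR), read on the LR scale as 10^SD (a factor of two corresponds to an SD of log10(2) ≈ 0.30 ban). Both the estimator and that reading are assumptions; the sketch uses hypothetical duplicate values.

```python
import math


def within_pair_sd(pairs):
    """Pooled within-group standard deviation of log(LR) from duplicate runs (x, y).

    With two replicates, each group's sample variance is (x - y)**2 / 2;
    pooling averages those variances over the groups.
    """
    variances = [(x - y) ** 2 / 2 for x, y in pairs]
    return math.sqrt(sum(variances) / len(variances))


# Hypothetical duplicate log(LR) values (ban), not the study's data.
runs = [(10.2, 10.5), (6.1, 5.8), (14.0, 14.3)]
sd = within_pair_sd(runs)
print(round(sd, 3), round(10 ** sd, 2))  # 0.212 1.63  (SD in ban, LR factor)
```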
Figure 7. Method sensitivity.
Three histograms show the empirical log(LR) distribution for each mixture interpretation method on the case data. Panel (a) shows TrueAllele inferred genotype match statistics for 101 evidence genotype matches (blue). In panel (b), manual CPI review yielded 81 match statistics (green) that were generally less informative (shifted left) and less varied (clustered). In panel (c), the 53 mCPI match statistics (red) gave less information and had similar values.
The log(LR) DNA match information (ban) for genotype comparisons is shown for three mixture interpretation methods (TrueAllele, CPI and mCPI).
| | TrueAllele | CPI | mCPI |
| Minimum | 1.255 | 0.778 | 0.301 |
| Median | 10.550 | 6.681 | 1.857 |
| Mean | 11.054 | 6.825 | 2.145 |
| Maximum | 22.962 | 16.724 | 6.447 |
| Standard deviation | 5.421 | 2.217 | 1.675 |
| N = | 111 | 81 | 70 |
| Inclusion (≥0) | 101 | 81 | 53 |
| Persuasive (≥6) | 82 | 54 | 2 |
| Inconclusive | | | 17 |
The TrueAllele method preserved more identification information (mean) over a broader range (minimum, maximum) than the two inclusion methods, and produced more inclusions and persuasive match statistics.
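The averages quoted in the abstract (113 billion, 6.68 million and 140) equal 10 raised to the mean log(LR) values in this table, so the "average match statistic" is the anti-log of the mean ban value, that is, a geometric mean of the LRs. A short Python check, with a hypothetical helper name:

```python
# Mean log(LR) values (ban) from the table above.
MEAN_BAN = {"TrueAllele": 11.054, "CPI": 6.825, "mCPI": 2.145}


def ban_to_lr(ban: float) -> float:
    """Convert a base-10 log likelihood ratio (ban) back to a likelihood ratio."""
    return 10 ** ban


for method, ban in MEAN_BAN.items():
    print(f"{method}: {ban_to_lr(ban):.3g}")
# TrueAllele: 1.13e+11
# CPI: 6.68e+06
# mCPI: 140
```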
Figure 8. Method comparison.
Cumulative empirical log(LR) distributions are shown for uniform probability (black), and for each of the three mixture interpretation methods TrueAllele (blue), CPI (green) and mCPI (red). TrueAllele tracks a uniform distribution over a wide information range, whereas CPI and mCPI do not.
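Figure 8 compares cumulative distributions visually. One numeric summary of how closely an empirical distribution tracks a uniform one is the Kolmogorov-Smirnov statistic D, the largest vertical gap between the two cumulative curves. This is a swapped-in illustration, not the paper's stated method, and the uniform range `[low, high]` and example values are hypothetical.

```python
def max_uniform_deviation(values, low, high):
    """Largest vertical gap between the empirical CDF of `values` and the uniform CDF on [low, high].

    This is the Kolmogorov-Smirnov statistic D for a uniform reference distribution.
    """
    xs = sorted(values)
    n = len(xs)
    worst = 0.0
    for i, x in enumerate(xs):
        u = min(max((x - low) / (high - low), 0.0), 1.0)  # uniform CDF at x
        # The empirical CDF steps from i/n to (i+1)/n at x; check both sides of the step.
        worst = max(worst, abs((i + 1) / n - u), abs(u - i / n))
    return worst


spread = [0.5 + k for k in range(10)]  # hypothetical, evenly spread log(LR) values
clustered = [6.0, 6.2, 6.5, 6.8, 7.0]  # hypothetical, tightly clustered values
print(round(max_uniform_deviation(spread, 0, 10), 3))     # 0.05
print(round(max_uniform_deviation(clustered, 0, 10), 3))  # 0.6
```

Evenly spread values stay close to the uniform curve, while clustered values, like the pattern described for CPI and mCPI, produce a large gap.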
Paired comparisons for positive log(LR) values between TrueAllele (TA) and CPI.
| N = 81 | TA | CPI | TA – CPI | test | p-value |
| Mean | 11.623 | 6.825 | 4.798 | t = 8.396 | 1.350×10⁻¹² |
| Median | 10.816 | 6.681 | 4.135 | W = 3047 | 6.664×10⁻¹¹ |
| r = 0.2999 | | | | | |
| r² = 0.0900 | | | | | |
Significance tests were done for means (Student t) and medians (Wilcoxon signed rank W). Correlation coefficients (r) and coefficients of determination (r²) are shown. TrueAllele was significantly more informative than CPI.
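The two paired test statistics can be sketched with the Python standard library. This computes only the statistics, not p-values, and assumes that the reported W is W+, the sum of the ranks of |difference| over positive differences; the function names and example values are hypothetical.

```python
import math
import statistics


def paired_t(x, y):
    """Paired Student t statistic: mean difference over its standard error."""
    d = [a - b for a, b in zip(x, y)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))


def wilcoxon_w_plus(x, y):
    """Wilcoxon signed-rank W+: sum of the ranks of |d| over positive differences.

    Zero differences are dropped and tied |d| values share their average rank.
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(d)), key=lambda k: abs(d[k]))
    ranks = [0.0] * len(d)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        i = j + 1
    return sum(r for r, diff in zip(ranks, d) if diff > 0)


# Hypothetical paired log(LR) values (ban), not the study's data.
ta = [11.2, 9.8, 13.5, 7.4]
cpi = [6.9, 6.1, 8.0, 5.2]
print(round(paired_t(ta, cpi), 2), wilcoxon_w_plus(ta, cpi))
```

If W is W+, its largest possible value for N pairs is N(N + 1)/2. That maximum is 1378 for N = 52 and 1431 for N = 53, exactly the W values reported in the CPI versus mCPI and TrueAllele versus mCPI tables, which would mean every paired difference in those comparisons was positive.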
Paired comparisons for positive log(LR) values between CPI and mCPI.
| N = 52 | CPI | mCPI | CPI – mCPI | test | p-value |
| Mean | 7.069 | 2.180 | 4.889 | t = 17.417 | 4.082×10⁻²³ |
| Median | 6.720 | 2.024 | 4.696 | W = 1378 | 3.497×10⁻¹⁰ |
| r = 0.5188 | | | | | |
| r² = 0.2692 | | | | | |
Significance tests were done for means (Student t) and medians (Wilcoxon signed rank W). Correlation coefficients (r) and coefficients of determination (r²) are shown. CPI was significantly more informative than mCPI.
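The correlation rows pair Pearson's r with its square, the coefficient of determination; for example, 0.5188² ≈ 0.2692 in the CPI versus mCPI table. A minimal Python sketch of r, with a hypothetical function name and example data:

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of paired values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r([1, 2, 3, 4], [1, 3, 2, 4])  # hypothetical values
print(r, round(r ** 2, 2))  # 0.8 0.64
```

A low r² alongside a large, significant mean difference, as in the TrueAllele comparisons, indicates that one method is consistently more informative without its values tracking the other method's closely.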
Paired comparisons for positive log(LR) values between TrueAllele (TA) and mCPI.
| N = 53 | TA | mCPI | TA – mCPI | test | p-value |
| Mean | 12.883 | 2.145 | 10.738 | t = 15.147 | 1.040×10⁻²⁰ |
| Median | 12.537 | 1.857 | 10.679 | W = 1431 | 2.386×10⁻¹⁰ |
| r = 0.2945 | | | | | |
| r² = 0.0867 | | | | | |
Significance tests were done for means (Student t) and medians (Wilcoxon signed rank W). Correlation coefficients (r) and coefficients of determination (r²) are shown. TrueAllele was significantly more informative than mCPI.
Results are shown for ten genotype comparisons where TrueAllele did not report a match, and five others having a small LR value under a thousand.
| Interpretation Method | Data Observations | |||||||
| TrueAllele | CPI | mCPI | allele dropout | allele overlap | low peaks | peak imbalance | infeasible mixture | infeasible pattern |
| −10.64 | 3 | 4 | 1 | 1 | ||||
| −6.52 | 4 | 3 | 1 | 1 | ||||
| −5.05 | 4 | 3 | 1 | 1 | 1 | |||
| −4.87 | 3 | 1 | 1 | 1 | ||||
| −4.86 | 3.48 | 4 | 1 | 1 | ||||
| −3.22 | 6.04 | 6.34 | 2 | 1 | 1 | |||
| −2.99 | 4.23 | 2 | 1 | 1 | 1 | |||
| −2.18 | 2 | 1 | 1 | |||||
| −1.41 | 4.08 | 1 | 1 | 1 | ||||
| −0.67 | 2.95 | 0.60 | 1 | 2 | 1 | |||
| 1.26 | 3.96 | 1 | 4 | 1 | ||||
| 1.76 | 1 | 1 | 1 | |||||
| 2.01 | 2 | 8 | 1 | 1 | ||||
| 2.71 | 2 | 1 | ||||||
| 2.94 | 8 | 1 | ||||||
Allele dropout and allele overlap record the number of locus occurrences.
Allele dropout occurs when a reference allele does not appear at all in the evidence data.
Allele overlap occurs when known contributors and the reference share alleles.
Low peaks: all comparisons had reference-related allele peaks <100 RFU; a 1 indicates peaks <50 RFU.
Peak imbalance: a 1 indicates heterozygote imbalance under 60% at reference alleles.
An infeasible mixture (1) has an inconsistent mixture weight across loci.
An infeasible pattern (1) cannot be constructed quantitatively from contributor genotypes.
Each comparison row gives log(LR) match statistics (ban) for three mixture interpretation methods, and lists observations about how the evidence data interacted with the reference genotype.
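Several of the data observations in the notes can be computed per locus. This Python sketch flags allele dropout, peaks under 50 RFU and heterozygote imbalance under 60%, reading "imbalance" as the smaller reference-allele peak divided by the larger one (an assumption). The helper names, locus data and peak heights are hypothetical, and allele overlap is omitted because it requires the known contributors' genotypes.

```python
from typing import Dict, Tuple

# Cutoffs taken from the table notes: peaks under 50 RFU, balance under 60%.
LOW_PEAK_RFU = 50.0
MIN_BALANCE = 0.60


def heterozygote_balance(h1: float, h2: float) -> float:
    """Peak height ratio: smaller peak divided by larger peak."""
    return min(h1, h2) / max(h1, h2)


def locus_observations(reference: Tuple[int, int], peaks: Dict[int, float]) -> Dict[str, bool]:
    """Hypothetical per-locus flags for a reference genotype against evidence peaks (allele -> RFU)."""
    alleles = sorted(set(reference))
    present = [peaks[a] for a in alleles if a in peaks]
    return {
        # Dropout: a reference allele does not appear at all in the evidence data.
        "allele_dropout": len(present) < len(alleles),
        "low_peak": any(h < LOW_PEAK_RFU for h in present),
        # Imbalance is only defined for a heterozygous reference with both alleles seen.
        "peak_imbalance": len(alleles) == 2 and len(present) == 2
        and heterozygote_balance(*present) < MIN_BALANCE,
    }


profile = {  # hypothetical loci: (reference genotype, evidence peaks)
    "Penta E": ((10, 12), {7: 1000.0, 10: 45.0, 14: 980.0}),
    "D8S1179": ((13, 14), {13: 300.0, 14: 150.0}),
}
flags = {locus: locus_observations(ref, pk) for locus, (ref, pk) in profile.items()}
dropout_loci = sum(f["allele_dropout"] for f in flags.values())
print(dropout_loci)  # 1
```

Summing a flag over loci gives a count of locus occurrences, matching how the notes say allele dropout is recorded.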