Zhengli Wang, Kevin MacMillan, Mark Powell, Lawrence M Wein.
Abstract
Although the backlog of untested sexual assault kits in the United States is starting to be addressed, many municipalities are opting for selective testing of samples within a kit, where only the most probative samples are tested. We use data from the San Francisco Police Department Criminalistics Laboratory, which tests all samples but also collects information on the samples flagged by sexual assault forensic examiners as most probative, to build a standard machine learning model that predicts (based on covariates gleaned from sexual assault kit questionnaires) which samples are most probative. This model is embedded within an optimization framework that selects which samples to test from each kit to maximize the Combined DNA Index System (CODIS) yield (i.e., the number of kits that generate at least one DNA profile for the criminal DNA database) subject to a budget constraint. Our analysis predicts that, relative to a policy that tests only the samples deemed probative by the sexual assault forensic examiners, the proposed policy increases the CODIS yield by 45.4% without increasing the cost. Full testing of all samples has a slightly lower cost-effectiveness than the selective policy based on forensic examiners, but more than doubles the yield. In over half of the sexual assaults, a sample was not collected during the forensic medical exam from the body location deemed most probative by the machine learning model. Our results suggest that electronic forensic records coupled with machine learning and optimization models could enhance the effectiveness of criminal investigations of sexual assaults.
Keywords: crime solving; forensic science; machine learning; optimization; sexual assaults
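The selection problem described in the abstract (choose which samples to test in each kit so that as many kits as possible yield at least one profile, within a budget) can be illustrated with a small sketch. This is not the authors' formulation: it assumes each sample has a predicted probability of being CODIS-uploadable, that samples are independent, that every test costs one unit, and it uses a simple greedy rule. The kit identifiers, locations, and probabilities are hypothetical.

```python
import heapq
from math import prod


def kit_yield(probs):
    """Probability that at least one tested sample is uploadable (independence assumed)."""
    return 1.0 - prod(1.0 - p for p in probs)


def greedy_select(kits, budget):
    """Spend `budget` unit-cost tests, always taking the largest marginal gain in yield.

    kits: {kit_id: {location: predicted_probability}} (hypothetical inputs).
    Returns {kit_id: [locations chosen, in the order they were picked]}.
    """
    ranked = {k: sorted(s.items(), key=lambda kv: kv[1], reverse=True)
              for k, s in kits.items()}
    miss = {k: 1.0 for k in kits}   # probability that the kit has no hit so far
    nxt = {k: 0 for k in kits}      # index of the next untested sample per kit
    chosen = {k: [] for k in kits}
    heap = [(-r[0][1], k) for k, r in ranked.items() if r]
    heapq.heapify(heap)
    while heap and budget > 0:
        _, k = heapq.heappop(heap)
        loc, p = ranked[k][nxt[k]]
        chosen[k].append(loc)
        miss[k] *= 1.0 - p
        nxt[k] += 1
        budget -= 1
        if nxt[k] < len(ranked[k]):
            # Marginal gain of the next sample: P(no hit yet) * P(this sample hits).
            heapq.heappush(heap, (-miss[k] * ranked[k][nxt[k]][1], k))
    return chosen


# Hypothetical predicted probabilities for two kits.
kits = {"A": {"genital": 0.5, "oral": 0.2}, "B": {"body": 0.4, "anal": 0.3}}
chosen = greedy_select(kits, budget=3)
expected_yield = sum(kit_yield([kits[k][loc] for loc in locs])
                     for k, locs in chosen.items())
```

With three tests available, the sketch spends the third test on kit B rather than kit A, because the second sample in B adds more to the expected number of kits with a hit.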
Year: 2020 PMID: 32482858 PMCID: PMC7306798 DOI: 10.1073/pnas.2001103117
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 12.779
List of covariates describing the sexual assault, along with the number of SAKs (out of 868 SAKs) that had these values for each covariate (and the average victim age)
| Covariate | Values |
| Time delay between assault and examination (0 d/1 d/ | 315/298/255 |
| Victim age (y) | Average = 31.6 |
| Victim gender at birth (M/F) | 150/718 |
| Loss of memory (Y/N/U) | 492/332/44 |
| No. of offenders (1/ | 654/100/114 |
| Consensual sex in prior 5 d (Y/N/U) | 201/605/62 |
| Consensual sex in prior 24 h (Y/N/U) | 89/155/624 |
| Known ejaculation (Y/N/U) | 176/102/590 |
| Condom used (Y/N/U) | 71/327/470 |
| Shower or bath before examination (Y/N/U) | 287/510/71 |
| Vaginal penetration of victim by offender (Y/N/U) | 331/217/320 |
| Anal penetration of victim by offender (Y/N/U) | 153/304/411 |
| Oral penetration of victim by offender (Y/N/U) | 110/315/443 |
| Offender’s mouth on genitals (Y/N/U) | 109/318/441 |
| Offender’s mouth on breasts (Y/N/U) | 86/343/439 |
| Offender’s mouth on other body parts (Y/N/U) | 228/225/415 |
| Digital penetration of victim by offender (Y/N/U) | 170/237/461 |
| Oral penetration of offender by victim (Y/N/U) | 170/278/420 |
| Strangled (Y/N/U) | 86/732/50 |
| Punched (Y/N/U) | 92/722/54 |
| Stabbed (Y/N/U) | 3/837/28 |
| Vaginal injury (Y/N/U) | 225/621/22 |
| Other injury (Y/N/U) | 343/507/18 |
Abbreviations: M, male; F, female; Y, yes; N, no; and U, unknown.
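As a quick consistency check on the table above, the counts in each row should add up to the 868 SAKs. The sketch below verifies this for a few Y/N/U rows copied from the table and computes the share of "unknown" answers; the variable and function names are illustrative only.

```python
TOTAL_SAKS = 868

# (yes, no, unknown) counts copied from a few rows of the covariate table.
ynu_counts = {
    "Loss of memory": (492, 332, 44),
    "Consensual sex in prior 24 h": (89, 155, 624),
    "Known ejaculation": (176, 102, 590),
    "Condom used": (71, 327, 470),
    "Strangled": (86, 732, 50),
    "Stabbed": (3, 837, 28),
}


def unknown_share(counts):
    """Fraction of SAKs for which the covariate was recorded as unknown."""
    yes, no, unknown = counts
    return unknown / (yes + no + unknown)


# Every row should account for all 868 kits.
row_sums_ok = all(sum(c) == TOTAL_SAKS for c in ynu_counts.values())
shares = {name: unknown_share(c) for name, c in ynu_counts.items()}
```

In these rows the unknown share ranges from roughly 3% (stabbed) to roughly 72% (consensual sex in prior 24 h), which is worth keeping in mind when treating "unknown" as its own category.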
In total and broken down by location, the number of samples tested, the number of probative samples (as deemed by the SAFE), and the number of CODIS-uploadable samples
| Sample location | No. of samples | No. of probative samples | No. of CODIS-uploadable samples |
| Body surface | 2,364 | 275 | 594 |
| Genital | 1,932 | 1,014 | 298 |
| Oral | 939 | 223 | 71 |
| Anal | 732 | 310 | 87 |
| Clothing | 287 | 19 | 102 |
| Foreign material | 64 | 7 | 7 |
| Total | 6,318 | 1,848 | 1,159 |
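The gap between what examiners flag as probative and what actually yields a CODIS-uploadable profile is easier to see as per-location rates. The sketch below derives them from the counts in the table above; only the arithmetic is added, and the names are illustrative.

```python
# (samples tested, probative per SAFE, CODIS-uploadable), copied from the table.
counts = {
    "Body surface": (2364, 275, 594),
    "Genital": (1932, 1014, 298),
    "Oral": (939, 223, 71),
    "Anal": (732, 310, 87),
    "Clothing": (287, 19, 102),
    "Foreign material": (64, 7, 7),
}

# Column totals, to compare against the "Total" row.
totals = tuple(sum(col) for col in zip(*counts.values()))

# Per-location (probative rate, uploadable rate) as fractions of samples tested.
rates = {loc: (prob / n, codis / n) for loc, (n, prob, codis) in counts.items()}

for loc, (prob_rate, codis_rate) in rates.items():
    print(f"{loc:17s} probative {prob_rate:6.1%}   uploadable {codis_rate:6.1%}")
```

Body-surface samples, for instance, are flagged as probative about 12% of the time but are CODIS-uploadable about 25% of the time, whereas genital samples are flagged about 52% of the time and are uploadable about 15% of the time.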
The ratio of the fixed cost to the variable cost for the four scenarios
| | No prescreening | Prescreening |
| San Francisco salary | 1.85 | 1.18 |
| Average salary | 1.65 | 1.10 |
The upper left scenario corresponds to our base case.
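One way to read these ratios is as a simple per-kit cost model. The sketch below assumes (this is not stated in the text above) that the fixed cost is incurred once per SAK and the variable cost once per tested sample, and it measures cost in units of one sample's variable cost. The function name is hypothetical.

```python
def relative_kit_cost(n_samples, fixed_to_variable_ratio):
    """Cost of processing one kit, in units of the per-sample variable cost.

    Assumes one fixed cost per kit plus one variable cost per tested sample.
    """
    if n_samples < 0:
        raise ValueError("n_samples must be non-negative")
    return fixed_to_variable_ratio + n_samples


BASE_CASE_RATIO = 1.85  # San Francisco salary, no prescreening (from the table)

cost_two = relative_kit_cost(2, BASE_CASE_RATIO)    # selective testing of 2 samples
cost_seven = relative_kit_cost(7, BASE_CASE_RATIO)  # testing 7 samples
```

Under this assumption, testing seven samples costs 8.85 units against 3.85 units for two samples, so the cost grows less than proportionally with the number of samples tested because the fixed component is shared.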
Fig. 1. Under the LASSO-LR model, the CODIS yield (i.e., the probability that a SAK generates at least one CODIS-uploadable sample) vs. the mean cost per SAK, under the SAFE policy (*), the priority policy (green solid line), the nonlinear priority policy (blue dotted line), and the priority policy with additional synthetic samples (red dashed line). The 95% CIs are depicted for each integer value of the parameter for the priority policy and for each integer value of the mean number of samples tested per SAK for the nonlinear priority policy and the priority policy with additional synthetic samples.
Fig. 2. Under the LASSO-LR policy, the marginal benefit-to-cost ratio of the priority policy vs. the proportion of samples tested. The benefit-to-cost ratio is 91.77 for the SAFE policy and 81.34 for full testing.
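To put the two benefit-to-cost ratios quoted in the Fig. 2 caption side by side, the short calculation below expresses full testing relative to the SAFE policy. Only the two quoted numbers are used; the variable names are illustrative.

```python
safe_ratio = 91.77  # benefit-to-cost ratio, SAFE policy (from the caption)
full_ratio = 81.34  # benefit-to-cost ratio, full testing (from the caption)

# Fractional shortfall of full testing relative to the SAFE policy.
relative_drop = 1.0 - full_ratio / safe_ratio
print(f"Full testing is {relative_drop:.1%} less cost-effective than the SAFE policy")
```

The result is a shortfall of about 11%.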