Ewelina Pośpiech, Paweł Teisseyre, Jan Mielniczuk, Wojciech Branicki.
Abstract
The idea of forensic DNA intelligence is to extract from genomic data any information that can help guide the investigation. Clues to the externally visible phenotype are of particular practical importance. The high heritability of the physical phenotype suggests that it should be readily predictable from genetic data, but so far this has been possible only for less polygenic traits. The forensic community has developed DNA-based predictive tools by employing a limited number of the most important markers, analysed with targeted massively parallel sequencing. The genetic complexity of many other appearance phenotypes requires big data coupled with sophisticated machine learning methods to develop accurate genomic predictors. A significant challenge in developing universal genomic predictive methods will be the collection of sufficiently large data sets. These should be created using whole-genome sequencing technology to enable the identification of rare DNA variants implicated in phenotype determination. It is worth noting that the correctness of a forensic sketch generated from DNA data depends on the inclusion of an age factor, which can itself be predicted by analysing epigenetic data. An important limitation preventing whole-genome approaches from being commonly used in forensics is the slow progress in the development and implementation of high-throughput, low DNA input sequencing technologies. The example of palaeoanthropology suggests that such methods may also be developed for forensics.
Keywords: DNA-based prediction; forensic DNA intelligence; forensic genomics; human genome variation; investigative leads; physical appearance
Year: 2022 PMID: 35052461 PMCID: PMC8774670 DOI: 10.3390/genes13010121
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. Procedure for the development and application of a phenotype prediction tool. The main differences in the procedures for developing a predictive model using the standard or alternative approach concern the selection of variables and the number of variables in the model. Consequently, the method of acquiring genetic data in the practical forensic applications of the next-generation predictive models may require whole-genome sequencing methods. Thus: (a) only phenotype-associated SNPs are included in prediction modelling, the models are not very extensive, and the methods of data acquisition can be less complex (SNaPshot, targeted MPS); (b) the selection of relevant variables (SNPs) is targeted towards improving the prediction accuracy of the model, and much more advanced variable selection methods are required. Some complex models may involve many thousands of SNPs, which, in biological traces, must be analysed using whole-genome sequencing methods that are effective for low DNA input samples.
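The application step of a small model of type (a) can be illustrated with a minimal sketch of multinomial logistic regression, the model family used by several of the tools listed in the table below. Everything specific here is an assumption made for illustration: the SNP names `snp_a` and `snp_b`, the three categories, and all coefficients are hypothetical and are not taken from IrisPlex or any published model.

```python
import math

# Hypothetical coefficients, for illustration only (NOT from any published model).
# "brown" is the reference category, so its linear predictor is fixed at 0.
INTERCEPTS = {"blue": -1.0, "intermediate": -2.0}
BETAS = {
    "blue": {"snp_a": 2.0, "snp_b": 0.5},
    "intermediate": {"snp_a": 1.0, "snp_b": 0.25},
}


def predict_probabilities(genotype):
    """Return category probabilities from minor-allele counts (0, 1 or 2) per SNP."""
    for snp, count in genotype.items():
        if count not in (0, 1, 2):
            raise ValueError(f"allele count for {snp} must be 0, 1 or 2")

    # Linear predictor for each category; the reference category stays at 0.
    scores = {"brown": 0.0}
    for category, intercept in INTERCEPTS.items():
        scores[category] = intercept + sum(
            beta * genotype[snp] for snp, beta in BETAS[category].items()
        )

    # Softmax, shifted by the maximum score for numerical stability.
    top = max(scores.values())
    exponentials = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exponentials.values())
    return {c: value / total for c, value in exponentials.items()}


if __name__ == "__main__":
    for genotype in ({"snp_a": 0, "snp_b": 0}, {"snp_a": 2, "snp_b": 2}):
        print(genotype, predict_probabilities(genotype))
```

With so few predictors, a sample only needs to be genotyped at the listed SNPs, which is why targeted assays suffice for models of type (a). A model of type (b) would have the same prediction step but many more terms in the sum.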
Examples of various approaches proposed for genetic prediction of physical traits.
| Physical Trait | Statistical Model | Number of Predictors in the Model | Prediction Accuracy Parameters | Ref. |
|---|---|---|---|---|
| Eye colour | Multinomial logistic regression (IrisPlex)<sup>1</sup> | 6 SNPs | AUC<sub>brown</sub> = 0.93<sup>2</sup> | [ |
| | Likelihood ratio | 4 SNPs | LR<sub>light-dark</sub> depends on genotypes | [ |
| | Multiple linear regression | 3 SNPs | R<sup>2</sup> = 0.764 | [ |
| | No statistical model, classification based on genotypes | 6 SNPs | Overall classification success rate (blue–green–brown): 98.94% | [ |
| | Likelihood ratio | 6 SNPs | LR<sub>light-dark</sub> depends on genotypes | [ |
| | Bayesian naïve classifier (Snipper) | 23 SNPs | Classification success rate: | [ |
| | Multiple response classification tree | 4 SNPs | Classification success rate: | [ |
| | No statistical model, prediction based on genotypes | 5 SNPs | Overall classification success rate (blue–green–brown): 97.64% | [ |
| Hair colour | Multinomial logistic regression + prediction guide (HIrisPlex)<sup>1</sup> | 22 SNPs | Classification success rate: | [ |
| | Bayesian naïve classifier (Snipper) | 12 SNPs | Classification success rate: | [ |
| | Multinomial logistic regression | 270 SNPs | AUC<sub>blond</sub> = 0.74 | [ |
| Skin colour | Multiple linear regression, including interaction | 3 SNPs | R<sup>2</sup> = 0.496 | [ |
| | No statistical model, classification based on genotypes | 5 SNPs | Overall classification success rate (dark–medium–light): 62% | [ |
| | Bayesian naïve classifier (Snipper) | 10 SNPs | AUC<sub>white</sub> = 0.999; AUC<sub>intermediate</sub> = 0.803 | [ |
| | Multinomial logistic regression (HIrisPlex-S)<sup>1</sup> | 36 SNPs | AUC<sub>light</sub> = 0.97 | [ |
| | Multiple linear regression | 9 SNPs | R<sup>2</sup> = 0.65 | [ |
| Freckles | Binomial logistic regression | 34 SNPs + sex | AUC<sub>freckled</sub> = 0.809 | [ |
| | Multinomial logistic regression | 20 SNPs + sex | AUC<sub>non-freckled</sub> = 0.754 | [ |
| Hair loss | Binomial logistic regression | 20 SNPs | AUC<sub>bald</sub> = 0.66 | [ |
| | Binomial logistic regression | 14 SNPs | AUC<sub>early-onset baldness</sub> = 0.74 | [ |
| | Polygenic scores | 261 autosomal SNPs; 70 X chromosomal SNPs | AUC<sub>severe baldness</sub> = 0.748 | [ |
| Hair shape | Binomial logistic regression | 3 SNPs | AUC<sub>straight</sub> = 0.62 | [ |
| | Binomial and multinomial logistic regression | 32 SNPs in binomial model or 33 SNPs in multinomial model | AUC<sub>straight</sub> = 0.66 in Europeans | [ |
| Hair greying | Binary and multi-class neural network | 10 SNPs + age and sex in binary model | AUC<sub>greying</sub> = 0.87 (mostly based on age) or | [ |
| Height | Polygenic scores | 54 SNPs | AUC<sub>tall stature</sub> = 0.65 | [ |
| | Polygenic scores | 180 SNPs | AUC<sub>tall stature</sub> = 0.75 | [ |
| | Polygenic scores | 689 SNPs | AUC<sub>tall stature</sub> = 0.79 | [ |
| | L1-penalized regression (LASSO) | >20,000 SNPs | r = 0.64 | [ |
| Face | Partial least squares regression | Genomic ancestry (68 DNA variants) + sex + 24 SNPs | Genomic ancestry explains 9.6% of the total facial variation; sex independently from ancestry explains 12.9%; SNPs make a small contribution to improving facial distinctiveness | [ |
| | Ridge regression | Genomic ancestry | Genomic ancestry and sex explain large proportion of the predictive accuracy of the model; age and BMI improve the accuracy of the model | [ |
| | Simple quantitative method (principal component analysis and partial least square analysis used to extract new face traits) | 277 SNPs | SSA statistic<sup>3</sup>: | [ |
<sup>1</sup> Forensically validated SNaPshot and MPS genetic tests are available for data collection; <sup>2</sup> AUC: area under the ROC (receiver operating characteristic) curve, which describes the general performance of the model; 1 means perfect prediction and 0.5 means random assignment; <sup>3</sup> SSA: a shape similarity statistic (shape space angle) developed to measure the angle between two shapes in the 3D face modelling data space.
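To make the AUC values in the table easier to interpret, the following minimal sketch computes a polygenic score as a weighted sum of allele counts and then evaluates it with the rank-based definition of AUC: the probability that a randomly chosen individual with the trait scores higher than a randomly chosen individual without it. The SNP names, weights, and the six toy individuals are hypothetical and invented for illustration; they do not come from any of the cited studies.

```python
def polygenic_score(allele_counts, weights):
    """Weighted sum of effect-allele counts (0, 1 or 2) over the SNPs in the model."""
    return sum(weight * allele_counts[snp] for snp, weight in weights.items())


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair, so constant scores give 0.5.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical per-SNP weights and toy genotypes (illustration only).
WEIGHTS = {"snp_a": 0.5, "snp_b": 0.25, "snp_c": -0.25}
INDIVIDUALS = [
    ({"snp_a": 2, "snp_b": 1, "snp_c": 0}, 1),  # (genotype, 1 = tall stature)
    ({"snp_a": 1, "snp_b": 2, "snp_c": 1}, 1),
    ({"snp_a": 0, "snp_b": 1, "snp_c": 2}, 0),
    ({"snp_a": 1, "snp_b": 0, "snp_c": 0}, 0),
    ({"snp_a": 2, "snp_b": 0, "snp_c": 2}, 1),
    ({"snp_a": 0, "snp_b": 0, "snp_c": 1}, 0),
]

if __name__ == "__main__":
    scores = [polygenic_score(g, WEIGHTS) for g, _ in INDIVIDUALS]
    labels = [y for _, y in INDIVIDUALS]
    print("scores:", scores)
    print("AUC:", auc(scores, labels))
```

The pairwise loop is quadratic in sample size and is chosen here for readability; for large cohorts a rank-based implementation would be the more practical choice.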