Chih Lee, Ion I Măndoiu, Craig E Nelson.
Abstract
BACKGROUND: The assignment of DNA samples to coarse population groups can be a useful but difficult task. One such example is the inference of coarse ethnic groupings for forensic applications. Ethnicity plays an important role in forensic investigation and can be inferred with the help of genetic markers. Because it is maternally inherited, present in high copy number, and persists robustly in degraded samples, mitochondrial DNA may be useful for inferring coarse ethnicity. In this study, we compare the performance of methods for inferring ethnicity from the sequence of the hypervariable region of the mitochondrial genome.
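One of the classifiers compared in this study is a nearest-neighbor rule (1NN) scored by 5-fold cross-validation. The sketch below is a minimal illustration, not the authors' implementation: it assumes the hypervariable-region sequences are already aligned to a common length and uses Hamming distance (the number of mismatched positions) as the distance, which is an assumption. The function names and toy sequences are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# Each sample is (aligned HVR sequence, ethnicity label); names here are hypothetical.
Sample = Tuple[str, str]


def hamming(a: str, b: str) -> int:
    """Count aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def predict_1nn(train: Sequence[Sample], query: str) -> str:
    """Label of the training sequence with the smallest Hamming distance (first one wins ties)."""
    _, label = min(train, key=lambda sample: hamming(sample[0], query))
    return label


def k_fold_accuracy(data: List[Sample], k: int = 5, seed: int = 0) -> float:
    """Fraction of samples labelled correctly when each fold is predicted from the other k-1 folds."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in idx if i not in held_out]
        correct += sum(predict_1nn(train, data[i][0]) == data[i][1] for i in fold)
    return correct / len(data)


# Toy, made-up sequences: two well-separated groups "A" and "B".
data = [
    ("ACGTACGT", "A"), ("ACGTACGA", "A"), ("ACGTACCT", "A"),
    ("TGCATGCA", "B"), ("TGCATGCT", "B"), ("TGCATGGA", "B"),
]
print(k_fold_accuracy(data, k=5))  # 1.0
```

Hamming distance is only meaningful on a shared alignment; with unaligned or gapped sequences an edit distance or an alignment step would be needed first.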
Year: 2011 PMID: 21554759 PMCID: PMC3090759 DOI: 10.1186/1753-6561-5-S2-S11
Source DB: PubMed Journal: BMC Proc ISSN: 1753-6561
Comparison of 5-fold cross-validation (CV) accuracy (%) of four classification algorithms on the trimmed forensic dataset. The first four rows give per-group accuracies; the last two give micro- and macro-accuracy over all 4426 samples.

| # Samples | PCA-QDA | PCA-LDA | 1NN | PCA-SVM |
|---|---|---|---|---|
| 1674 | 83.15 | 90.2 | 93.73 | 94.62 |
| 761 | 72.93 | 74.11 | 83.31 | 84.76 |
| 1305 | 84.6 | 88.28 | 86.59 | 89.81 |
| 686 | 71.57 | 68.22 | 72.01 | 72.59 |
| 4426 (micro-accuracy) | 80.03 | 83.46 | 86.47 | 88.10 |
| 4426 (macro-accuracy) | 78.06 | 80.20 | 83.91 | 85.45 |
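The two 4426-sample rows are consistent with the usual definitions of the two summary measures: micro-accuracy weights each group's accuracy by its sample count (equivalently, the overall fraction of correctly classified samples), while macro-accuracy is the unweighted mean of the four per-group accuracies, so small groups count as much as large ones. The short sketch below recomputes both for the PCA-QDA column; the function names are illustrative only.

```python
from typing import Sequence


def micro_accuracy(counts: Sequence[int], per_group_pct: Sequence[float]) -> float:
    """Per-group accuracies (%) weighted by group size, i.e. overall percent correct."""
    return sum(n * acc for n, acc in zip(counts, per_group_pct)) / sum(counts)


def macro_accuracy(per_group_pct: Sequence[float]) -> float:
    """Unweighted mean of per-group accuracies (%)."""
    return sum(per_group_pct) / len(per_group_pct)


# Sample counts and PCA-QDA per-group accuracies from the table above.
counts = [1674, 761, 1305, 686]
pca_qda = [83.15, 72.93, 84.6, 71.57]
print(round(micro_accuracy(counts, pca_qda), 2))  # 80.03
print(round(macro_accuracy(pca_qda), 2))  # 78.06
```

Because the two largest groups also have above-average accuracy here, micro-accuracy exceeds macro-accuracy in every column.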
Figure 1. Effects of incomplete data on accuracy. Comparison of PCA-QDA, PCA-LDA, 1NN, and PCA-SVM 5-fold CV micro-accuracy on regions obtained by iteratively deleting groups of 10% of the polymorphisms, starting from HVR1 towards HVR2 (A) or from HVR2 towards HVR1 (B), and on sliding windows spanning 10% of the nucleotides in HVR1+HVR2 (C).
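The region subsets described in the figure caption can be generated along these lines. This is a minimal sketch under stated assumptions: sites are taken as an ordered list running from HVR1 to HVR2, group and window sizes are rounded to whole positions, and the window step of one position is a hypothetical choice, since the caption does not give one. Each subset would then be scored with the classifiers under 5-fold CV.

```python
from typing import List, Tuple


def deletion_series(sites: List[int], frac: float = 0.10, from_hvr1: bool = True) -> List[List[int]]:
    """Subsets left after deleting 1, 2, 3, ... groups of about frac*len(sites) sites from one end.

    `sites` is assumed ordered from HVR1 to HVR2; from_hvr1=True deletes from the HVR1 end
    (panel A), False deletes from the HVR2 end (panel B).
    """
    group = max(1, round(frac * len(sites)))
    subsets = []
    for removed in range(group, len(sites), group):
        subsets.append(sites[removed:] if from_hvr1 else sites[: len(sites) - removed])
    return subsets


def sliding_windows(n_positions: int, frac: float = 0.10, step: int = 1) -> List[Tuple[int, int]]:
    """Half-open [start, end) windows each spanning about frac*n_positions positions (panel C)."""
    width = max(1, round(frac * n_positions))
    return [(s, s + width) for s in range(0, n_positions - width + 1, step)]


sites = list(range(10))  # hypothetical: 10 polymorphic sites, HVR1 first
print(deletion_series(sites)[0])  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(deletion_series(sites, from_hvr1=False)[-1])  # [0]
print(len(sliding_windows(100)))  # 91
```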
Confusion table of the PCA-SVM test results on the trimmed published dataset
| True Ethnicity | # Samples | Predicted Ethnicity | |||
|---|---|---|---|---|---|
| Caucasian | Asian | African | Hispanic | ||
| 1956 | 5.47 | 1.53 | 0.41 | ||
| 450 | 25.78 | 3.11 | 3.33 | ||
| 134 | 5.22 | 3.73 | 3.73 | ||
| Micro-Accuracy: 87.91% | |||||
| Macro-Accuracy: 82.56% | |||||
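The confusion-table entries appear to be row percentages, that is, the share of each true-ethnicity group assigned to a given predicted group (for example, 5.47% of 1956 samples is about 107 samples). A minimal sketch of building such a row-normalized table from paired true and predicted labels, and of reading micro- and macro-accuracy from it, follows; the four labels come from the table, but the function names and the toy predictions are hypothetical.

```python
from typing import Dict, Sequence, Tuple

LABELS = ["Caucasian", "Asian", "African", "Hispanic"]


def confusion_percent(true: Sequence[str], pred: Sequence[str],
                      labels: Sequence[str] = LABELS) -> Dict[str, Dict[str, float]]:
    """Row-normalized confusion table: percent of each true group assigned to each predicted group."""
    counts = {t: {p: 0 for p in labels} for t in labels}
    for t, p in zip(true, pred):
        counts[t][p] += 1
    table = {}
    for t in labels:
        n = sum(counts[t].values())
        # Groups absent from `true` get an all-zero row instead of dividing by zero.
        table[t] = {p: (100.0 * c / n if n else 0.0) for p, c in counts[t].items()}
    return table


def micro_macro(true: Sequence[str], pred: Sequence[str],
                labels: Sequence[str] = LABELS) -> Tuple[float, float]:
    """Micro: percent of all samples correct. Macro: mean of the diagonal over groups present."""
    micro = 100.0 * sum(t == p for t, p in zip(true, pred)) / len(true)
    table = confusion_percent(true, pred, labels)
    seen = set(true)
    present = [t for t in labels if t in seen]
    macro = sum(table[t][t] for t in present) / len(present)
    return micro, macro


# Hypothetical predictions for six samples.
true = ["Caucasian"] * 4 + ["Asian"] * 2
pred = ["Caucasian", "Caucasian", "Caucasian", "Asian", "Asian", "Caucasian"]
print(confusion_percent(true, pred)["Caucasian"])
# {'Caucasian': 75.0, 'Asian': 25.0, 'African': 0.0, 'Hispanic': 0.0}
print(micro_macro(true, pred))  # micro about 66.67, macro 62.5
```

Each row of the table sums to 100%, and its diagonal entry is that group's accuracy.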