Lauren A Baker, Mehdi Momen, Kore Chan, Nathan Bollig, Fernando Brito Lopes, Guilherme J M Rosa, Rory J Todhunter, Emily E Binversie, Susannah J Sample, Peter Muir.
Abstract
Anterior cruciate ligament (ACL) rupture is a common, debilitating condition that leads to early-onset osteoarthritis and reduced quality of human life. ACL rupture is a complex disease with both genetic and environmental risk factors. Characterizing the genetic basis of ACL rupture would provide the ability to identify individuals that have high genetic risk and allow the opportunity for preventative management. Spontaneous ACL rupture is also common in dogs and shows a similar clinical presentation and progression. Thus, the dog has emerged as an excellent genomic model for human ACL rupture. Genome-wide association studies (GWAS) in the dog have identified a number of candidate genetic variants, but research in genomic prediction has been limited. In this analysis, we explore several Bayesian and machine learning models for genomic prediction of ACL rupture in the Labrador Retriever dog. Our work demonstrates the feasibility of predicting ACL rupture from SNPs in the Labrador Retriever model with and without consideration of non-genetic risk factors. Genomic prediction including non-genetic risk factors approached clinical relevance using multiple linear Bayesian and non-linear models. This analysis represents the first steps toward development of a predictive algorithm for ACL rupture in the Labrador Retriever model. Future work may extend this algorithm to other high-risk breeds of dog. The ability to accurately predict individual dogs at high risk for ACL rupture would identify candidates for clinical trials that would benefit both veterinary and human medicine.
Keywords: ACL rupture; Canine; Dog model; GenPred; Genomic prediction; Shared data resources
Year: 2020 PMID: 32499222 PMCID: PMC7407450 DOI: 10.1534/g3.120.401244
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. Schematic of data analysis and modeling workflow. HWE: Hardy-Weinberg equilibrium; MAF: minor allele frequency; SNP: single nucleotide polymorphism; GWAS: genome-wide association study; ML: machine learning.
Figure 2. 10-fold cross validation of Labrador Retriever SNPs performed with Bayesian genomic prediction models. Averages for model prediction across all folds for 5 repeats per model are reported. Charts show model performance with (base model+COV) and without (base model) inclusion of covariates. The graphs compare model performance with and without removal of highly correlated SNPs prior to analysis.
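The evaluation scheme named in this caption (10 folds, 5 repeats, scores averaged over all folds) can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline: the function names `repeated_kfold` and `cross_validate`, the `fit_and_score` callback, and the unstratified random shuffling are all assumptions made for the example.

```python
import random
from statistics import mean


def repeated_kfold(n_samples, k=10, repeats=5, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)  # a fresh random partition for every repeat
        folds = [idx[f::k] for f in range(k)]  # k folds of nearly equal size
        for f, test_idx in enumerate(folds):
            held_out = set(test_idx)
            train_idx = [i for i in idx if i not in held_out]
            yield r, f, train_idx, test_idx


def cross_validate(n_samples, fit_and_score, k=10, repeats=5, seed=0):
    """Average a per-fold score (for example AUC) over all folds and repeats.

    fit_and_score(train_idx, test_idx) is a hypothetical callback that trains
    a model on the training indices and returns its score on the test indices.
    """
    scores = [
        fit_and_score(train, test)
        for _, _, train, test in repeated_kfold(n_samples, k, repeats, seed)
    ]
    return mean(scores), len(scores)
```

With the defaults, each model is scored on 50 held-out folds (10 folds times 5 repeats), and every dog is held out exactly once per repeat.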
Highest performing machine learning models in 10-fold cross validation for prediction of ACL rupture in Labrador Retriever dogs
| Model | Feature Selection | No. SNPs | AUC (SD) |
|---|---|---|---|
| wRF | GWAS | 7500 | 0.584 (0.048) |
| wRF | meanDiff | 7500 | 0.572 (0.059) |
| GBT | GWAS | 10000 | 0.590 (0.049) |
| GBT | meanDiff | 7500 | 0.588 (0.059) |
| NB | GWAS | 7500 | 0.584 (0.025) |
| NB | meanDiff | 7500 | 0.584 (0.055) |
| KNN | GWAS | 10000 | 0.553 (0.045) |
| KNN | meanDiff | 7500 | 0.564 (0.039) |
| wRF | GWAS | 15000 | 0.599 (0.050) |
| wRF | meanDiff | 7500 | 0.598 (0.056) |
| GBT | GWAS | 7500 | 0.599 (0.039) |
| GBT | meanDiff | 7500 | 0.597 (0.040) |
| NB | GWAS | 750 | 0.587 (0.054) |
| NB | meanDiff | 7500 | 0.576 (0.036) |
| KNN | GWAS | 12500 | 0.565 (0.052) |
| KNN | meanDiff | 5 | 0.567 (0.045) |
| wRF | GWAS | 10 | 0.782 (0.035) |
| wRF | meanDiff | 5 | 0.767 (0.034) |
| GBT | GWAS | 10 | 0.770 (0.050) |
| GBT | meanDiff | 100 | 0.749 (0.037) |
| NB | GWAS | 5 | 0.688 (0.033) |
| NB | meanDiff | 5 | 0.674 (0.038) |
| KNN | GWAS | 7500 | 0.562 (0.034) |
| KNN | meanDiff | 12500 | 0.557 (0.039) |
| wRF | GWAS | 5 | 0.778 (0.025) |
| wRF | meanDiff | 5 | 0.792 (0.027) |
| GBT | GWAS | 10 | 0.757 (0.027) |
| GBT | meanDiff | 5 | 0.777 (0.031) |
| NB | GWAS | 5 | 0.683 (0.031) |
| NB | meanDiff | 5 | 0.699 (0.040) |
| KNN | GWAS | 15000 | 0.569 (0.038) |
| KNN | meanDiff | 7500 | 0.567 (0.044) |
wRF = weighted random forest; GBT = gradient boosted trees; NB = Naïve Bayes; KNN = K nearest neighbors; AUC = Area under the ROC curve.
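The AUC values in the table can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control. The sketch below computes AUC directly from that definition; the function name `roc_auc` and the 0/1 label coding (1 = ACL rupture case, 0 = control) are assumptions for illustration, not taken from the study's code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise case-control comparisons.

    labels: 1 for a case, 0 for a control (assumed coding).
    scores: predicted risk, higher meaning more likely to be a case.
    Tied scores count as half a win.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(cases) * len(controls))
```

An AUC of 0.5 corresponds to ranking cases and controls no better than chance, and 1.0 to ranking every case above every control.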
Figure 3. 10-fold cross validation of Labrador Retriever SNPs was performed with models trained on feature sets from 5 to 15,000 SNPs. Averages for model prediction across all folds over five runs per model are reported. This analysis used n = 247 cases and n = 375 controls. A. Base model performance without LD pruning or covariates. B. Model performance after LD pruning was performed at r² > 0.7. C. Model performance with covariates (weight, sex, neutering) considered as additional features. D. Model performance with LD pruning and covariates. AUC: area under the ROC curve; wRF: weighted subspace random forest; NB: Naïve Bayes; kNN: k-nearest neighbor; GBT: gradient boosted trees.
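LD pruning at an r² threshold of 0.7, as in panels B and D, removes SNPs that are highly correlated with a SNP already retained. The following is a simplified sketch of that idea, assuming genotypes coded as 0/1/2 allele counts and a greedy pass over the whole SNP set. Dedicated tools usually prune within sliding genomic windows, so this is not necessarily the procedure used in the study; `r_squared` and `ld_prune` are hypothetical names.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treated as 0 here
    return (sxy * sxy) / (sxx * syy)


def ld_prune(genotypes, threshold=0.7):
    """Greedily keep SNPs whose r^2 with every kept SNP is <= threshold.

    genotypes: dict mapping SNP id -> list of 0/1/2 allele counts, one per dog.
    Returns the ids of retained SNPs in their original order.
    """
    kept = []
    for snp, g in genotypes.items():
        if all(r_squared(g, genotypes[k]) <= threshold for k in kept):
            kept.append(snp)
    return kept
```

Because the pass is greedy, the result depends on SNP order: the first SNP of any correlated group is the one retained.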
Highest performing ensemble models in 10-fold cross validation
| Ensemble | N (SD) | No. SNPs | AUC (SD) |
|---|---|---|---|
| nAgreement | 5.86 (1.31) | 12500 | 0.598 (0.04) |
| GLM | N/A | 7500 | 0.611 (0.07) |
| RF | N/A | 5 | 0.583 (0.09) |
| nAgreement | 5.68 (1.17) | 4000 | 0.607 (0.04) |
| GLM | N/A | 4000 | 0.611 (0.06) |
| RF | N/A | 5 | 0.579 (0.09) |
| nAgreement | 5.14 (0.76) | 5 | 0.687 (0.07) |
| GLM | N/A | 25 | 0.695 (0.09) |
| RF | N/A | 5 | 0.692 (0.09) |
| nAgreement | 5.24 (0.96) | 10 | 0.694 (0.07) |
| GLM | N/A | 100 | 0.703 (0.08) |
| RF | N/A | 5 | 0.702 (0.09) |
N = average number of models that agreed on the prediction; AUC = area under the ROC curve; GLM = supervisory learning with logistic regression; RF = random forest.
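The footnote defines N as the average number of models that agreed on the prediction. One way to picture an agreement-based ensemble is a simple majority vote that also records how many base models sided with the final call. The exact nAgreement rule is not described here, so the sketch below is an assumption: the function name `majority_vote`, the 0/1 coding of calls, and the handling of ties (resolved toward control) are all illustrative choices.

```python
from statistics import mean


def majority_vote(predictions):
    """Combine per-model 0/1 calls by majority vote.

    predictions: list with one entry per base model, each a list of 0/1 calls
    (1 = predicted case), one call per dog.
    Returns (ensemble_calls, agreement_counts), where agreement_counts[i] is
    the number of models that agreed with the ensemble call for dog i.
    """
    n_models = len(predictions)
    calls, agreement = [], []
    for votes in zip(*predictions):
        n_case = sum(votes)
        call = 1 if 2 * n_case > n_models else 0  # ties go to control (assumption)
        calls.append(call)
        agreement.append(n_case if call == 1 else n_models - n_case)
    return calls, agreement


# Toy example: 4 hypothetical base models, 3 dogs.
toy = [[1, 0, 1], [1, 0, 0], [1, 1, 1], [0, 0, 0]]
toy_calls, toy_agreement = majority_vote(toy)
toy_mean_n = mean(toy_agreement)  # analogous to the N column
```

Under this reading, an N close to the total number of base models means the ensemble's calls were nearly unanimous.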