Timothy Vivian-Griffiths¹, Emily Baker¹, Karl M Schmidt², Matthew Bracher-Smith¹, James Walters¹, Andreas Artemiou², Peter Holmans¹, Michael C O'Donovan¹, Michael J Owen¹, Andrew Pocklington¹, Valentina Escott-Price¹.
Abstract
A major controversy in psychiatric genetics is whether nonadditive genetic interaction effects contribute to the risk of highly polygenic disorders. We applied a support vector machine (SVM) approach, which can build both linear and nonlinear models using kernel methods, to classify cases and controls in a large schizophrenia case-control sample of 11,853 subjects (5,554 cases and 6,299 controls), and compared its prediction accuracy with that of the polygenic risk score (PRS) approach. We also investigated whether SVMs are a suitable approach for detecting nonlinear genetic effects, that is, interactions. We found that PRS provided more accurate case/control classification than either linear or nonlinear SVMs, and we give a tentative explanation of why PRS outperforms both multivariate regression and linear kernel SVMs. In addition, we observed that nonlinear kernel SVMs achieved higher classification accuracy than linear SVMs when a large number of SNPs were entered into the model. We conclude that SVMs are a potential tool for assessing the presence of interactions, prior to searching for them explicitly.
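The abstract attributes the SVM's ability to model nonlinear effects to kernel methods. The sketch below is a hedged illustration of why a nonlinear kernel can capture interactions. It assumes genotypes coded as 0/1/2 risk-allele counts and uses a degree-2 polynomial kernel purely for exposition: this record does not say which nonlinear kernel the study used, and all function names are hypothetical. A degree-2 polynomial kernel equals an inner product in a feature space that contains every pairwise SNP product x_i x_j, whereas the linear kernel only sees single-SNP (additive) terms.

```python
import math
from itertools import combinations_with_replacement


def linear_kernel(x, y):
    """Linear kernel: dot product of two genotype vectors (additive terms only)."""
    return sum(a * b for a, b in zip(x, y))


def poly2_kernel(x, y, c=1.0):
    """Degree-2 polynomial kernel (x . y + c)^2, an illustrative nonlinear kernel."""
    return (linear_kernel(x, y) + c) ** 2


def poly2_features(x, c=1.0):
    """Explicit feature map phi with phi(x) . phi(y) == (x . y + c)^2.

    It holds a constant, every single-SNP term and every product x_i * x_j
    (squares included), i.e. pairwise SNP-by-SNP interaction terms.
    """
    feats = [c]
    feats += [math.sqrt(2 * c) * xi for xi in x]
    for i, j in combinations_with_replacement(range(len(x)), 2):
        scale = 1.0 if i == j else math.sqrt(2.0)
        feats.append(scale * x[i] * x[j])
    return feats


# Two hypothetical subjects genotyped at three SNPs (risk-allele counts)
x, y = [0, 1, 2], [2, 1, 0]
explicit = sum(a * b for a, b in zip(poly2_features(x), poly2_features(y)))
print(linear_kernel(x, y), poly2_kernel(x, y), explicit)
```

Because the kernel value matches the explicit inner product, an SVM using such a kernel can weight SNP-by-SNP products without ever building the quadratically many interaction features itself.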
Keywords: polygenic risk score; schizophrenia; support vector machines
Year: 2018 PMID: 30516002 PMCID: PMC6492016 DOI: 10.1002/ajmg.b.32705
Source DB: PubMed Journal: Am J Med Genet B Neuropsychiatr Genet ISSN: 1552-4841 Impact factor: 3.568
Figure 1. Box plots of the distribution of prediction accuracy (AUC‐ROC score, y‐axis) of the PRS and SVM algorithms in Batch 1 data and in the combined data (Batch 1 + 2), using 125 GWAS‐significant SNPs. In each box plot, the horizontal line is the median, the box boundaries are the first and third quartiles, and the extremes are the minimum and maximum values in the sample.
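Figures 1 and 2 summarise accuracy as AUC‐ROC. The minimal sketch below assumes the common PRS definition: a sum of risk‐allele dosages weighted by log odds ratios from a discovery GWAS. It computes a PRS per subject and the AUC as the Mann-Whitney probability that a randomly chosen case outscores a randomly chosen control. The function names, ORs and genotypes are hypothetical toy values, not data from the study.

```python
import math


def polygenic_risk_score(dosages, odds_ratios):
    """Sum over SNPs of log(OR) times the risk-allele dosage (0, 1 or 2)."""
    return sum(math.log(odds) * d for d, odds in zip(dosages, odds_ratios))


def auc_roc(case_scores, control_scores):
    """AUC-ROC as P(random case scores above random control), ties counted as 1/2."""
    wins = 0.0
    for s_case in case_scores:
        for s_ctrl in control_scores:
            if s_case > s_ctrl:
                wins += 1.0
            elif s_case == s_ctrl:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical toy data: two SNPs with made-up ORs and genotypes
odds_ratios = [1.1, 1.2]
cases = [[2, 1], [1, 2], [0, 1]]
controls = [[0, 0], [1, 0], [1, 1]]
case_prs = [polygenic_risk_score(g, odds_ratios) for g in cases]
control_prs = [polygenic_risk_score(g, odds_ratios) for g in controls]
print(auc_roc(case_prs, control_prs))
```

An AUC of 0.5 corresponds to chance-level separation and 1.0 to perfect separation of cases from controls.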
Figure 2. Distribution of prediction accuracy of the PRS and SVM models in Batch 1 data and in the combined data (Batch 1 + 2), using 4,998 SNPs.
Figure 3. Correlation coefficient r between two independent SNPs in a case/control sample, for varying ORs and MAFs: in all data (black), and in cases (red) and controls (green) separately, for MAF = 0.2 (solid) and MAF = 0.3 (dashed).
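Figure 3 rests on the fact that sampling subjects by disease status can induce correlation between SNPs that are independent in the population. The sketch below computes such correlations exactly, under assumptions not stated in this record: a logistic disease model (additive on the log‐odds scale), Hardy-Weinberg genotype frequencies with the same MAF at both SNPs, a population prevalence of 1%, and a sample that is half cases. All function names are hypothetical, and the study's own setup may differ.

```python
import math
from itertools import product


def hwe(maf):
    """Genotype frequencies for 0, 1, 2 risk alleles under Hardy-Weinberg equilibrium."""
    return [(1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2]


def risk(b0, b1, b2, g1, g2):
    """Assumed logistic disease model, additive on the log-odds scale."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * g1 + b2 * g2)))


def intercept_for_prevalence(prevalence, maf, b1, b2):
    """Bisection for the intercept b0 that reproduces the target population prevalence."""
    freqs = hwe(maf)
    lo, hi = -30.0, 30.0
    for _ in range(100):
        mid = (lo + hi) / 2
        prev = sum(freqs[g1] * freqs[g2] * risk(mid, b1, b2, g1, g2)
                   for g1, g2 in product(range(3), repeat=2))
        lo, hi = (mid, hi) if prev < prevalence else (lo, mid)
    return (lo + hi) / 2


def pearson(dist):
    """Pearson correlation of (g1, g2) under a joint distribution {(g1, g2): probability}."""
    m1 = sum(p * g1 for (g1, _), p in dist.items())
    m2 = sum(p * g2 for (_, g2), p in dist.items())
    cov = sum(p * (g1 - m1) * (g2 - m2) for (g1, g2), p in dist.items())
    v1 = sum(p * (g1 - m1) ** 2 for (g1, _), p in dist.items())
    v2 = sum(p * (g2 - m2) ** 2 for (_, g2), p in dist.items())
    return cov / math.sqrt(v1 * v2)


def ascertainment_correlation(or1, or2, maf, prevalence=0.01, case_fraction=0.5):
    """r between two population-independent SNPs: (all data, cases, controls)."""
    b1, b2 = math.log(or1), math.log(or2)
    b0 = intercept_for_prevalence(prevalence, maf, b1, b2)
    freqs = hwe(maf)
    case_w, ctrl_w = {}, {}
    for g1, g2 in product(range(3), repeat=2):
        pop = freqs[g1] * freqs[g2]  # independence in the population
        p_d = risk(b0, b1, b2, g1, g2)
        case_w[(g1, g2)] = pop * p_d
        ctrl_w[(g1, g2)] = pop * (1 - p_d)
    case_total, ctrl_total = sum(case_w.values()), sum(ctrl_w.values())
    cases = {k: v / case_total for k, v in case_w.items()}
    controls = {k: v / ctrl_total for k, v in ctrl_w.items()}
    combined = {k: case_fraction * cases[k] + (1 - case_fraction) * controls[k]
                for k in cases}
    return pearson(combined), pearson(cases), pearson(controls)


for maf in (0.2, 0.3):
    print(maf, ascertainment_correlation(1.5, 1.5, maf))
```

With both ORs equal to 1 all three correlations vanish. With risk‐increasing ORs the combined sample shows a positive correlation, because cases are enriched for risk alleles at both SNPs at once.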
Figure 4. Comparison of case/control association p‐values when two SNPs are entered into a logistic regression as one predictor variable (PRS; x‐axis) and separately (y‐axis).
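Figure 4 contrasts association p‐values from a logistic regression with two SNPs combined into one PRS predictor versus entered separately. Below is a hedged sketch of that comparison using likelihood‐ratio tests on simulated data. The simulation settings, the use of known log‐OR weights for the PRS, and all function names are assumptions; this record does not describe the exact procedure.

```python
import math
import random


def _solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def logistic_loglik(X, y, iters=15):
    """Fit logistic regression with intercept by Newton-Raphson; return max log-likelihood."""
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # Fisher information X^T W X
        for xi, yi in zip(rows, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for c in range(k):
                    info[a][c] += p * (1 - p) * xi[a] * xi[c]
        beta = [b + s for b, s in zip(beta, _solve(info, grad))]
    ll = 0.0
    for xi, yi in zip(rows, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        ll += yi * eta - math.log1p(math.exp(eta))
    return ll


def simulate(n=2000, maf=0.3, odds_ratio=1.2, seed=1):
    """Hypothetical data: two independent SNPs with equal per-allele ORs."""
    rng = random.Random(seed)
    g1, g2, y = [], [], []
    for _ in range(n):
        a = sum(rng.random() < maf for _ in range(2))
        b = sum(rng.random() < maf for _ in range(2))
        eta = -0.5 + math.log(odds_ratio) * (a + b)
        g1.append(a)
        g2.append(b)
        y.append(1 if rng.random() < 1 / (1 + math.exp(-eta)) else 0)
    return g1, g2, y


def compare_pvalues(g1, g2, y, weights):
    """Likelihood-ratio p-values: SNPs as one PRS predictor (1 df) vs. separately (2 df)."""
    ll0 = logistic_loglik([()] * len(y), y)
    ll_sep = logistic_loglik(list(zip(g1, g2)), y)
    prs = [(weights[0] * a + weights[1] * b,) for a, b in zip(g1, g2)]
    ll_prs = logistic_loglik(prs, y)
    lr_sep = max(2 * (ll_sep - ll0), 0.0)
    lr_prs = max(2 * (ll_prs - ll0), 0.0)
    p_sep = math.exp(-lr_sep / 2)             # chi-square survival function, 2 df
    p_prs = math.erfc(math.sqrt(lr_prs / 2))  # chi-square survival function, 1 df
    return p_prs, p_sep, lr_prs, lr_sep


g1, g2, y = simulate()
print(compare_pvalues(g1, g2, y, (math.log(1.2), math.log(1.2))))
```

Because the PRS model is nested within the two‐predictor model, its likelihood‐ratio statistic can never be larger, but it is referred to a chi‐square distribution with one degree of freedom instead of two. When the weights point in the right direction, this is one plausible way a single PRS predictor can yield smaller p‐values than the SNPs tested separately.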