Daniel Sik Wai Ho, William Schierding, Melissa Wake, Richard Saffery, Justin O'Sullivan.
Abstract
In the past decade, precision genomics-based medicine has emerged to provide tailored and effective healthcare for patients depending upon their genetic features. Genome-wide association studies have also identified population-based genetic risk variants for common and complex diseases. To meet the full promise of precision medicine, research is attempting to leverage our increasing genomic understanding and further develop personalized healthcare through ever more accurate disease risk prediction models. Polygenic risk scoring and machine learning are two primary approaches for disease risk prediction. Despite recent improvements, the results of polygenic risk scoring remain limited by the approaches that are currently used. By contrast, machine learning algorithms have increased predictive abilities for complex disease risk, owing to their ability to handle multi-dimensional data. Here, we provide an overview of polygenic risk scoring and machine learning in complex disease risk prediction. We highlight recent developments in machine learning applications and describe how machine learning approaches can lead to improved complex disease prediction, which will help to incorporate genetic features into future personalized healthcare. Finally, we discuss how the future application of machine learning prediction models might help manage complex disease by providing tissue-specific targets for customized, preventive interventions.
Keywords: complex disease risk; genetic disease risk prediction; machine learning; personalized medicine; polygenic risk score; precision medicine
Year: 2019; PMID: 30972108; PMCID: PMC6445847; DOI: 10.3389/fgene.2019.00267
Source DB: PubMed; Journal: Front Genet; ISSN: 1664-8021; Impact factor: 4.599
Figure 1. Workflow for creating a supervised machine learning model from a genotype dataset.
Figure 2. The strengths and weaknesses of polygenic risk scoring and machine learning models.
A brief view of common machine learning algorithms.
| Algorithm | Description |
|---|---|
| Logistic regression | Uses parametric regression to estimate the probabilities of dichotomous outputs |
| Neural network | Uses multiple layers of non-parametric regressions and transformations to map input data to outputs |
| Support vector machine (SVM) | Uses non-parametric regression to model input data, creating multi-dimensional hyperspaces that discriminate between the outputs |
| Lasso | Applies an L1-penalized loss function in regression |
| Elastic net | Applies L1- and L2-penalized loss functions in regression |
| Decision tree | Uses binary decision-splitting rules to model the relationships between input data and outputs |
| Random forest | Uses an ensemble of randomized decision trees to map input data to outputs |
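
To make the lasso and elastic net entries in the table more concrete, the following minimal sketch shows one way an L1 penalty, or a combined L1 and L2 penalty, could be added to a logistic regression loss. It is an illustration under stated assumptions, not the implementation used in any of the cited studies: the function names, the toy genotype matrix, the labels, the weights, and the penalty strengths `lam`, `lam1`, and `lam2` are all hypothetical, and software packages differ in how they scale and parameterize these penalties.

```python
import math

def predict_proba(weights, bias, features):
    """Logistic regression: probability of the positive (case) class."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(weights, bias, X, y):
    """Mean negative log-likelihood for binary labels (0 = control, 1 = case)."""
    total = 0.0
    for features, label in zip(X, y):
        p = predict_proba(weights, bias, features)
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return total / len(X)

def l1_penalty(weights):
    """Sum of absolute weights; tends to push some weights to exactly zero."""
    return sum(abs(w) for w in weights)

def l2_penalty(weights):
    """Sum of squared weights; shrinks weights without zeroing them."""
    return sum(w * w for w in weights)

def lasso_loss(weights, bias, X, y, lam):
    """Unpenalized loss plus an L1 penalty scaled by lam (assumed form)."""
    return log_loss(weights, bias, X, y) + lam * l1_penalty(weights)

def elastic_net_loss(weights, bias, X, y, lam1, lam2):
    """Unpenalized loss plus both L1 and L2 penalties (assumed form)."""
    return (log_loss(weights, bias, X, y)
            + lam1 * l1_penalty(weights)
            + lam2 * l2_penalty(weights))

# Hypothetical toy data: rows are individuals, columns are minor-allele
# counts (0, 1, or 2) at three SNPs; labels and weights are made up.
X = [[0, 1, 2], [2, 0, 1], [1, 1, 0], [0, 2, 2]]
y = [1, 0, 0, 1]
weights = [-0.5, 0.25, 1.0]
bias = -0.5

print("unpenalized:", log_loss(weights, bias, X, y))
print("lasso:      ", lasso_loss(weights, bias, X, y, lam=0.1))
print("elastic net:", elastic_net_loss(weights, bias, X, y, lam1=0.1, lam2=0.2))
```

In this sketch the bias term is left unpenalized, which is a common convention but still a design choice. Minimizing either penalized loss over the weights, rather than only evaluating it as done here, is what would yield a fitted lasso or elastic net model.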