Yalu Wen, Zihuai He, Ming Li, Qing Lu.
Abstract
With advances in high-throughput sequencing technology, it is now feasible to investigate the role of both common and rare variants in disease risk prediction. While the new technology holds great promise for improving disease prediction, the massive amount of data and the low frequency of rare variants pose great analytical challenges for risk prediction modeling. In this paper, we develop a forward random field (FRF) method for risk prediction modeling using sequencing data. In FRF, subjects' phenotypes are treated as stochastic realizations of a random field on a genetic space formed by subjects' genotypes, and an individual's phenotype can be predicted from adjacent subjects with similar genotypes. The FRF method allows for multiple similarity measures and candidate genes in the model, and adaptively chooses the optimal similarity measure and disease-associated genes to reflect the underlying disease model. It also avoids specifying a frequency threshold for rare variants and allows for different directions and magnitudes of genetic effects. Through simulations, we demonstrate that the FRF method attains accuracy higher than or comparable to that of commonly used support vector machine based methods under various disease models. We further illustrate the FRF method with an application to sequencing data obtained from the Dallas Heart Study.
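The idea of predicting a phenotype from subjects with similar genotypes can be illustrated with a minimal sketch. This is not the FRF estimator described in the paper: it assumes a single fixed identity-by-state (IBS) similarity and a plain similarity-weighted average, whereas FRF adaptively selects among several similarity measures and genes. The function names `ibs_similarity` and `predict_from_neighbors` are hypothetical.

```python
import numpy as np

def ibs_similarity(G):
    """IBS-style similarity for genotypes coded 0/1/2 (subjects x variants).

    Assumption: similarity = 1 - mean(|g_i - g_j|) / 2, which lies in [0, 1].
    """
    G = np.asarray(G, dtype=float)
    # Pairwise absolute genotype differences, shape (n, n, variants)
    diff = np.abs(G[:, None, :] - G[None, :, :])
    return 1.0 - diff.mean(axis=2) / 2.0

def predict_from_neighbors(S, y, i):
    """Predict subject i's phenotype as a similarity-weighted mean of the others."""
    y = np.asarray(y, dtype=float)
    w = np.array(S[i], dtype=float)
    w[i] = 0.0  # leave the subject itself out
    if w.sum() == 0.0:
        # No similar subjects: fall back to the plain mean of the others
        return float(np.delete(y, i).mean())
    return float(w @ y / w.sum())

# Three subjects, two variants; subject 0 is closer to subject 1 than to subject 2
G = [[0, 1], [0, 2], [2, 2]]
y = [0.0, 1.0, 0.0]
S = ibs_similarity(G)
pred0 = predict_from_neighbors(S, y, 0)
```

Here subject 0's prediction is pulled toward subject 1's phenotype because their genotypes are more alike.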
Year: 2016 PMID: 26892725 PMCID: PMC4759688 DOI: 10.1038/srep21120
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Four weight functions considered in our study.
| Un-weighted (UW) | Beta distribution type of weights (BETA) | Weighted sum statistics type of weights (WSS) | Logarithm of MAFs as weights (LOG) |
|---|---|---|---|
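The formulas for the four weight types are not preserved in this record. As a rough sketch only, the block below uses definitions commonly seen in the rare-variant literature, all of which are assumptions here: a constant weight for UW, a Beta(1, 25) density evaluated at the minor allele frequency (MAF) for BETA, 1/sqrt(MAF(1 - MAF)) for WSS, and -log10(MAF) for LOG. The function name `variant_weights` and the parameters `a` and `b` are hypothetical.

```python
import math
import numpy as np

def variant_weights(maf, kind="UW", a=1.0, b=25.0):
    """Per-variant weights as a function of MAF (assumed definitions, see lead-in)."""
    maf = np.asarray(maf, dtype=float)
    if kind == "UW":
        # Every variant counts equally
        return np.ones_like(maf)
    if kind == "BETA":
        # Beta(a, b) density at the MAF; a=1, b=25 up-weights rare variants
        norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
        return norm * maf ** (a - 1.0) * (1.0 - maf) ** (b - 1.0)
    if kind == "WSS":
        # Inverse of the binomial standard deviation of the allele
        return 1.0 / np.sqrt(maf * (1.0 - maf))
    if kind == "LOG":
        # Negative log so that rarer variants receive larger positive weights
        return -np.log10(maf)
    raise ValueError(f"unknown weight type: {kind}")

maf = np.array([0.01, 0.1, 0.5])
weights = {k: variant_weights(maf, k) for k in ("UW", "BETA", "WSS", "LOG")}
```

Under these assumed definitions, every scheme except UW gives rarer variants larger weights, which is the usual motivation for MAF-based weighting.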
Figure 1. The impact of different weights on methods’ performance under various disease models.
Figure 2. The impact of causal/non-causal SNVs ratio on methods’ performance under various disease models.
The impact of varied causal/non-causal SNVs ratio on the performance of three methods under different disease models.
| Disease Model | % of causal markers | FRF | MSVM | SVM | Probability* |
|---|---|---|---|---|---|
| Equal | 100 | 0.863 | 0.658 | 0.717 | 0.941 |
| Equal | 50 | 0.842 | 0.680 | 0.700 | 0.940 |
| Equal | 25 | 0.810 | 0.730 | 0.671 | 0.951 |
| Equal | 16.7 | 0.784 | 0.747 | 0.653 | 0.944 |
| Beta | 100 | 0.837 | 0.895 | 0.730 | 0.947 |
| Beta | 50 | 0.836 | 0.760 | 0.723 | 0.950 |
| Beta | 25 | 0.843 | 0.674 | 0.718 | 0.945 |
| Beta | 16.7 | 0.841 | 0.635 | 0.714 | 0.936 |
| WSS | 100 | 0.798 | 0.784 | 0.697 | 0.951 |
| WSS | 50 | 0.796 | 0.708 | 0.696 | 0.954 |
| WSS | 25 | 0.766 | 0.632 | 0.676 | 0.947 |
| WSS | 16.7 | 0.738 | 0.593 | 0.655 | 0.945 |
| LOG | 100 | 0.808 | 0.811 | 0.730 | 0.945 |
| LOG | 50 | 0.801 | 0.705 | 0.724 | 0.944 |
| LOG | 25 | 0.789 | 0.673 | 0.710 | 0.950 |
| LOG | 16.7 | 0.780 | 0.669 | 0.700 | 0.938 |
*The probability of excluding at least one non-disease-related gene.
Figure 3. The impact of the number of non-causal genes on methods’ performance under various disease models.
Figure 4. ROC curves of three prediction models formed by FRF, SVM and MSVM using the DHS sequencing data.
The AUC values of the three prediction models formed by FRF, SVM and MSVM using the DHS sequencing data.
| | FRF | SVM | MSVM |
|---|---|---|---|
| Mean | 0.570 | 0.528 | 0.529 |
| Standard Error | 0.022 | 0.021 | 0.024 |
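A summary of this kind (mean AUC with a standard error) could be produced along the following lines. This is a sketch under stated assumptions, not the authors' evaluation code: it assumes AUC is computed per replicate as the probability that a case scores above a control (ties counted as one half), and that the standard error is the sample standard deviation of the replicate AUCs divided by the square root of their number. The record does not say how the replicates were formed. The names `auc` and `mean_and_se` are hypothetical.

```python
import numpy as np

def auc(scores, labels):
    """AUC as the fraction of case/control pairs ranked correctly (ties count 0.5)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]  # cases
    neg = scores[labels == 0]  # controls
    # Compare every case score with every control score
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

def mean_and_se(values):
    """Mean and standard error (sample SD / sqrt(n)) of replicate AUC values."""
    v = np.asarray(values, dtype=float)
    return float(v.mean()), float(v.std(ddof=1) / np.sqrt(v.size))

# Toy replicate AUCs, purely illustrative (not data from the study)
replicate_aucs = [0.5, 0.6, 0.7]
mean_auc, se_auc = mean_and_se(replicate_aucs)
```

The pairwise comparison is quadratic in the number of subjects, which is fine for small examples; a rank-based formula would scale better.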