Zhenyu Jiang, Chengan Du, Assen Jablensky, Hua Liang, Zudi Lu, Yang Ma, Kok Lay Teo.
Abstract
Genetic information, such as single nucleotide polymorphism (SNP) data, is widely recognized as useful for predicting disease risk. However, modeling genetic data, which are often categorical, for disease class prediction is complex and challenging. In this paper, we propose a novel class of nonlinear threshold index logistic models to handle the complex, nonlinear effects of categorical/discrete SNP covariates in schizophrenia class prediction. A maximum likelihood methodology is suggested to estimate the unknown parameters in the models. Simulation studies demonstrate that the proposed methodology works well for moderate-size samples. The suggested approach is then applied to schizophrenia classification using a real set of SNP data from the Western Australian Family Study of Schizophrenia (WAFSS). Our empirical findings provide evidence that the proposed nonlinear models clearly outperform the widely used linear and tree-based logistic regression models in class prediction of schizophrenia risk with SNP data, in terms of both Type I/II error rates and ROC curves.
Year: 2014 PMID: 25330160 PMCID: PMC4201476 DOI: 10.1371/journal.pone.0109454
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
WAFSS Study: Estimated coefficients and their standard deviations (s.d.).
| SNP (row) | First coefficient (s.d.) | Second coefficient (s.d.) |
| 1 | 0.0058 (0.0042) | 0.1393 (0.0050) |
| 2 | 0.3166 (0.0052) | 0.1727 (0.0051) |
| 3 | −0.0797 (0.0041) | −0.1082 (0.0044) |
| 4 | −0.0161 (0.0043) | −0.0541 (0.0044) |
| 5 | 0.0004 (0.0048) | 0.1058 (0.0042) |
| 6 | 0.1194 (0.0042) | 0.1804 (0.0047) |
| 7 | −0.0343 (0.0055) | 0.0503 (0.0047) |
| 8 | −0.0905 (0.0047) | 0.0630 (0.0042) |
| 9 | −0.1112 (0.0042) | 0.0810 (0.0048) |
| 10 | 0.1359 (0.0036) | −0.0288 (0.0054) |
| 11 | −0.0203 (0.0040) | −0.0993 (0.0050) |
| 12 | −0.0531 (0.0051) | −0.2190 (0.0054) |
| 13 | −0.2258 (0.0059) | 0.0227 (0.0040) |
| 14 | −0.0350 (0.0047) | 0.0800 (0.0048) |
| 15 | 0.1220 (0.0051) | 0.0241 (0.0042) |
| 16 | −0.1378 (0.0060) | 0.0976 (0.0039) |
| 17 | −0.2081 (0.0056) | 0.1916 (0.0043) |
| 18 | 0.0368 (0.0056) | −0.3270 (0.0042) |
| 19 | 0.1109 (0.0046) | 0.0651 (0.0046) |
| 20 | −0.0826 (0.0050) | −0.2375 (0.0049) |
| 21 | −0.0473 (0.0036) | −0.0544 (0.0052) |
| 22 | −0.2784 (0.0050) | 0.1464 (0.0046) |
| 23 | 0.1016 (0.0052) | −0.0622 (0.0049) |
| 24 | 0.1064 (0.0045) | −0.3077 (0.0042) |
| 25 | −0.1824 (0.0050) | −0.1235 (0.0048) |
| 26 | −0.0405 (0.0060) | −0.4768 (0.0041) |
| 27 | 0.2444 (0.0049) | −0.0024 (0.0051) |
| 28 | −0.1094 (0.0048) | −0.1192 (0.0035) |
| 29 | −0.5100 (0.0053) | −0.0415 (0.0050) |
| 30 | −0.1139 (0.0047) | 0.0162 (0.0047) |
| 31 | −0.0795 (0.0050) | −0.1194 (0.0047) |
| 32 | −0.2502 (0.0048) | 0.1427 (0.0048) |
| 33 | 0.0332 (0.0043) | −0.0568 (0.0039) |
| 34 | 0.0342 (0.0055) | 0.2519 (0.0048) |
| 35 | −0.0555 (0.0048) | −0.1884 (0.0045) |
| 36 | 0.0770 (0.0051) | −0.0204 (0.0051) |
| 37 | −0.1033 (0.0056) | 0.1180 (0.0058) |
| 38 | 0.0845 (0.0049) | 0.2366 (0.0050) |
| 39 | −0.3238 (0.0045) | 0.0979 (0.0043) |
| 40 | 0.0410 (0.0043) | −0.0302 (0.0047) |
Estimated coefficients and their standard deviations, calculated by the bootstrap method, in the TILoR model for the WAFSS schizophrenia data set.
| Estimate | 0.0274 | 0.4358 | −2.7377 | 1.3744 |
| s.d. (bootstrap) | 0.0139 | 0.0873 | 0.0561 | 0.0547 |
| Estimate | 0.0260 | −0.0748 | 2.4239 | 0.4685 |
| s.d. (bootstrap) | 0.0281 | 0.0875 | 0.0629 | 0.0553 |
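The bootstrap standard deviations reported above can be illustrated with a generic sketch. This is not the authors' code: the function name `bootstrap_sd`, the number of resamples, and the toy data are assumptions made for illustration, and the estimator used here is a plain sample mean rather than the TILoR maximum likelihood estimator.

```python
import numpy as np

def bootstrap_sd(data, estimator, n_boot=200, seed=0):
    """Bootstrap standard deviation of `estimator` over resampled rows of `data`.

    Hypothetical helper for illustration only; `n_boot` and `seed` are assumed values.
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)       # draw n row indices with replacement
        estimates.append(estimator(data[idx]))  # re-estimate on the resample
    # Spread of the re-estimated values approximates the estimator's s.d.
    return np.std(np.asarray(estimates), axis=0, ddof=1)

# Toy data (assumed): 400 standard normal draws; the estimator is the sample mean.
toy_rng = np.random.default_rng(1)
sample = toy_rng.normal(loc=0.0, scale=1.0, size=400)
sd_mean = bootstrap_sd(sample, np.mean)
print(f"bootstrap s.d. of the mean: {sd_mean:.4f}")
```

For a sample mean, the result should be close to the sample standard deviation divided by the square root of the sample size, which gives a quick sanity check on the resampling loop. Replacing `np.mean` with a function that refits a model on each resample yields bootstrap standard deviations for that model's parameters.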
WAFSS Study: The components of the estimated coefficients whose absolute values are greater than 0.2.
| Component of X | Gene:SNP | Component of coefficient |
| | APOE:rs439401 | 0.3166 |
| | DAB:rs17424216 | −0.2258 |
| | DISC1:rs9432024 | −0.2081 |
| | DLG2:rs17507049 | −0.2785 |
| | DLG2:rs1943699 | 0.2444 |
| | DLG4:rs17203281 | −0.5099 |
| | NUDEL:rs931671 | −0.2502 |
| | VLDLR:rs1454626 | −0.3238 |
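The selection rule behind this table (keep the components whose absolute value exceeds 0.2) can be checked directly against the 40-row coefficient table above. The sketch below copies that table's first value column in row order; the variable names are hypothetical, and the row number is only the position in the table, not a confirmed component label.

```python
# First-column estimates copied from the 40-row coefficient table, in row order.
first_column = [
    0.0058, 0.3166, -0.0797, -0.0161, 0.0004, 0.1194, -0.0343, -0.0905,
    -0.1112, 0.1359, -0.0203, -0.0531, -0.2258, -0.0350, 0.1220, -0.1378,
    -0.2081, 0.0368, 0.1109, -0.0826, -0.0473, -0.2784, 0.1016, 0.1064,
    -0.1824, -0.0405, 0.2444, -0.1094, -0.5100, -0.1139, -0.0795, -0.2502,
    0.0332, 0.0342, -0.0555, 0.0770, -0.1033, 0.0845, -0.3238, 0.0410,
]

THRESHOLD = 0.2  # cut-off stated in the table caption

# Keep (row position, value) pairs whose absolute value exceeds the cut-off.
selected = [
    (row, value)
    for row, value in enumerate(first_column, start=1)
    if abs(value) > THRESHOLD
]

for row, value in selected:
    print(f"row {row:2d}: {value:+.4f}")
```

Eight rows pass the filter, and their values agree with the eight entries listed in the table, apart from last-digit differences in two of them (−0.2784 versus −0.2785, and −0.5100 versus −0.5099).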
Figure 1: TILoR model for general schizophrenia: plots of the two g functions.
WAFSS Study: Type I and Type II error rates, predictive accuracy rates, and area under the curve (AUC), based on cross-validation estimates, for GLM models, TILoR models, and the random forest (RF) method.
| Model | Measure | Fold 1 | Fold 2 | Fold 3 | Average |
| TILoR | Type I error | 38.59% | 36.84% | 21.05% | 32.16% |
| TILoR | Type II error | 25.92% | 31.48% | 28.70% | 28.70% |
| TILoR | Predictive accuracy | 69.69% | 66.67% | 73.94% | 70.10% |
| TILoR | AUC | 0.812 | 0.812 | 0.791 | 0.805 |
| GLM | Type I error | 52.63% | 57.89% | 70.17% | 60.23% |
| GLM | Type II error | 23.14% | 20.37% | 15.74% | 19.75% |
| GLM | Predictive accuracy | 66.67% | 66.67% | 65.45% | 66.26% |
| GLM | AUC | 0.774 | 0.774 | 0.774 | 0.774 |
| RF | Type I error | 63.16% | 77.19% | 77.19% | 72.51% |
| RF | Type II error | 8.33% | 5.56% | 3.70% | 5.86% |
| RF | Predictive accuracy | 72.73% | 69.70% | 70.91% | 71.11% |
| RF | AUC | 0.688 | 0.702 | 0.732 | 0.707 |
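The measures in this table can be made concrete with a small sketch. It assumes the conventional definitions (Type I error as the share of true class-0 subjects predicted as class 1, Type II error as the share of true class-1 subjects predicted as class 0), which the record itself does not spell out. The function names, the 0.5 cut-off, and the toy labels and scores are hypothetical; only the three TILoR fold AUCs at the end are taken from the table.

```python
def error_rates(y_true, y_pred):
    """Type I error, Type II error and accuracy for 0/1 labels (assumed definitions)."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return {
        "type_I": fp / (fp + tn),    # false positive rate
        "type_II": fn / (fn + tp),   # false negative rate
        "accuracy": (tp + tn) / len(y_true),
    }

def auc(y_true, scores):
    """AUC as the share of (class-1, class-0) pairs ranked correctly; ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy labels and predicted probabilities (assumed, not from the study).
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.6, 0.3, 0.9, 0.8, 0.4, 0.7, 0.55, 0.35]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cut-off

rates = error_rates(y_true, y_pred)
toy_auc = auc(y_true, scores)
print(rates, toy_auc)

# The "Average" column is the plain mean over folds, e.g. for the TILoR AUC row.
tilor_auc_folds = [0.812, 0.812, 0.791]
tilor_auc_mean = sum(tilor_auc_folds) / len(tilor_auc_folds)
print(round(tilor_auc_mean, 3))
```

The two error rates trade off against each other as the cut-off moves, which is why the table reports both alongside the cut-off-free AUC.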
Figure 2: ROC curves for the three methods (TILoR: blue line; GLM: red line; random forest: green line) for folds 1–3.
Figure 3: Boxplot of the estimates of the parameters in the two g functions and the associated coefficients, based on 100 simulations.
Figure 4: Boxplot of the estimates of the parameters in the two g functions and the associated coefficients, based on 100 simulations.
Figure 5: Boxplot of the absolute errors (AEs) of the estimates of the parameters in the two g functions and the associated coefficients, based on 100 simulations.
Figure 6: Boxplot of the absolute errors (AEs) of the estimates of the parameters in the two g functions and the associated coefficients, based on 100 simulations.