Jinseog Kim, Insuk Sohn, Dennis Dong Hwan Kim, Sin-Ho Jung.
Abstract
One of the main objectives of a genome-wide association study (GWAS) is to develop a prediction model for a binary clinical outcome using single-nucleotide polymorphisms (SNPs). Such a model can be used for diagnostic and prognostic purposes and for better understanding of the relationship between the disease and SNPs. Penalized support vector machine (SVM) methods have been widely used toward this end. However, because investigators often ignore the genetic models of SNPs, the final model loses efficiency in predicting the clinical outcome. To overcome this problem, we propose a two-stage method in which the genetic model of each SNP is first identified using the MAX test and a prediction model is then fitted using a penalized SVM method. We apply the proposed method to various penalized SVMs and compare the performance of SVMs using various penalty functions. Results from simulations and real GWAS data analysis show that the proposed method performs better than prediction methods that ignore the genetic models in terms of prediction power and selectivity.
Year: 2013 PMID: 24174989 PMCID: PMC3794570 DOI: 10.1155/2013/340678
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
Genotype frequencies.
| | AA | AB | BB | Total |
|---|---|---|---|---|
| Response | | | | |
| No response | | | | |
| Total | | | | |
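A 2 x 3 genotype table of this kind is the input to the MAX test named in the abstract. The following is a minimal sketch, not the authors' implementation: it assumes B is the risk allele, uses the usual Cochran-Armitage trend scores (0, 0, 1), (0, 0.5, 1), and (0, 1, 1) for the recessive, additive, and dominant models, and picks the model whose trend statistic has the largest absolute value. The names `SCORES`, `trend_z`, and `max_test` are hypothetical.

```python
import math

# Trend scores for genotypes (AA, AB, BB); B as the risk allele is an assumption.
SCORES = {
    "recessive": (0.0, 0.0, 1.0),
    "additive": (0.0, 0.5, 1.0),
    "dominant": (0.0, 1.0, 1.0),
}


def trend_z(cases, controls, scores):
    """Cochran-Armitage trend statistic for one 2 x 3 genotype table.

    cases / controls: counts of (AA, AB, BB) among responders / non-responders.
    """
    r_tot = sum(cases)
    s_tot = sum(controls)
    n_tot = r_tot + s_tot
    col = [r + s for r, s in zip(cases, controls)]  # genotype totals

    num = sum(x * (s_tot * r - r_tot * s)
              for x, r, s in zip(scores, cases, controls))
    sum_x2n = sum(x * x * n for x, n in zip(scores, col))
    sum_xn = sum(x * n for x, n in zip(scores, col))
    var = r_tot * s_tot * (n_tot * sum_x2n - sum_xn ** 2)
    if var <= 0:
        # Degenerate table (e.g. a monomorphic SNP): no evidence of a trend.
        return 0.0
    return math.sqrt(n_tot) * num / math.sqrt(var)


def max_test(cases, controls):
    """Return (best model, MAX statistic, all three trend statistics)."""
    z = {model: trend_z(cases, controls, x) for model, x in SCORES.items()}
    best = max(z, key=lambda model: abs(z[model]))
    return best, abs(z[best]), z


# Illustrative counts only (not data from the study).
model, stat, _ = max_test(cases=(30, 30, 40), controls=(50, 50, 0))
print(model, round(stat, 3))
```

The sketch only ranks the three statistics; obtaining a p-value for the MAX statistic needs its null distribution (for example by permutation), which is left out here.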
Results of simulations with 100 replications. "Selected SNPs" and "Prognostic SNPs" are the average numbers of selected SNPs and selected prognostic SNPs, respectively, in the fitted models; standard errors are reported in parentheses.
| | Genetic model | Selected SNPs | | | Prognostic SNPs | | | Misclassification error | | |
|---|---|---|---|---|---|---|---|---|---|---|
| | | | Enet | SCAD | | Enet | SCAD | | Enet | SCAD |
| 0 | Proposed | 43.10 (0.54) | 48.66 (0.70) | 20.31 (0.65) | 5.11 (0.09) | 5.46 (0.07) | 3.31 (0.14) | 0.1766 (0.0048) | 0.1567 (0.0054) | 0.2736 (0.0062) |
| | Recessive | 40.50 (0.56) | 41.38 (1.11) | 25.28 (0.66) | 4.62 (0.10) | 4.74 (0.10) | 3.66 (0.14) | 0.2408 (0.0054) | 0.2518 (0.0065) | 0.2912 (0.0047) |
| | Additive | 42.71 (0.49) | 45.58 (0.87) | 24.12 (0.91) | 5.23 (0.08) | 5.35 (0.08) | 4.07 (0.18) | 0.2161 (0.0048) | 0.2118 (0.0064) | 0.3272 (0.0076) |
| | Dominant | 41.35 (0.58) | 43.46 (0.98) | 23.99 (0.70) | 4.70 (0.10) | 4.86 (0.09) | 3.38 (0.12) | 0.2457 (0.0056) | 0.2347 (0.0063) | 0.2995 (0.0042) |
| 0.3 | Proposed | 42.92 (0.50) | 45.68 (0.80) | 19.56 (0.48) | 5.12 (0.08) | 5.20 (0.08) | 3.40 (0.10) | 0.1690 (0.0049) | 0.1541 (0.0047) | 0.2833 (0.0060) |
| | Recessive | 39.49 (0.58) | 41.09 (0.87) | 27.23 (0.59) | 4.34 (0.10) | 4.47 (0.11) | 3.03 (0.03) | 0.2383 (0.0057) | 0.2368 (0.0057) | 0.2741 (0.0019) |
| | Additive | 42.07 (0.52) | 43.90 (0.96) | 21.89 (0.67) | 5.06 (0.08) | 5.04 (0.08) | 3.74 (0.08) | 0.2126 (0.0052) | 0.2074 (0.0057) | 0.3338 (0.0056) |
| | Dominant | 39.97 (0.62) | 38.56 (1.03) | 24.97 (0.53) | 4.56 (0.10) | 4.29 (0.11) | 2.04 (0.03) | 0.2502 (0.0065) | 0.2338 (0.0059) | 0.2607 (0.0039) |
Results for the CML data: the number of selected SNPs and the misclassification error are averaged over 100 random partitions; standard errors are reported in parentheses.
| Genetic model | Average number of selected SNPs | | | Misclassification error | | |
|---|---|---|---|---|---|---|
| | | Enet-SVM | SCAD-SVM | | Enet-SVM | SCAD-SVM |
| Proposed | 70.38 (1.29) | 99.80 (4.10) | 55.90 (0.52) | 0.0737 (0.0036) | 0.0590 (0.0062) | 0.1098 (0.0013) |
| Recessive | 55.24 (1.19) | 120.46 (4.73) | 27.82 (2.33) | 0.1184 (0.0048) | 0.0562 (0.0051) | 0.2003 (0.0044) |
| Additive | 66.32 (1.12) | 120.76 (5.00) | 43.50 (0.27) | 0.1063 (0.0051) | 0.0667 (0.0061) | 0.1530 (0.0026) |
| Dominant | 51.90 (0.89) | 91.92 (4.81) | 50.90 (1.30) | 0.1013 (0.0062) | 0.0702 (0.0069) | 0.1663 (0.0044) |
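The averages and standard errors in this table come from repeated random partitions of the data. A minimal sketch of that evaluation loop follows, under stated assumptions: the training fraction of 2/3, the seed, and the `fit_predict` callable (which stands in for any of the penalized SVMs) are hypothetical and not taken from the source, and the standard error is assumed to be the sample standard deviation divided by the square root of the number of partitions.

```python
import math
import random
import statistics


def random_partition(n_samples, train_fraction, rng):
    """Shuffle sample indices and split them into training and test sets."""
    idx = list(range(n_samples))
    rng.shuffle(idx)
    cut = int(round(train_fraction * n_samples))
    return idx[:cut], idx[cut:]


def misclassification_error(y_true, y_pred):
    """Fraction of test samples whose predicted label is wrong."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def mean_and_se(values):
    """Mean and standard error (sample SD / sqrt(n)) over partitions."""
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean, se


def evaluate(fit_predict, X, y, n_partitions=100, train_fraction=2 / 3, seed=0):
    """Average test error of `fit_predict` over random partitions.

    fit_predict(X_train, y_train, X_test) must return predicted labels
    for X_test; it is a placeholder for a fitted classifier.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(n_partitions):
        train, test = random_partition(len(y), train_fraction, rng)
        pred = fit_predict([X[i] for i in train],
                           [y[i] for i in train],
                           [X[i] for i in test])
        errors.append(misclassification_error([y[i] for i in test], pred))
    return mean_and_se(errors)
```

Any classifier can be plugged in by wrapping its fit and predict steps in a single function with the `fit_predict` signature.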
SNPs selected by all three penalized methods (genetic model: D, dominant; R, recessive; A, additive).
| RS ID | Genetic model | | RS ID | Genetic model | | RS ID | Genetic model | |
|---|---|---|---|---|---|---|---|---|
| rs3750551 | D | 0.000510 | rs9289221 | R | 0.000160 | rs6621316 | A | 0.000890 |
| rs3886721 | A | 0.000040 | rs16972014 | A | 0.000170 | rs9890262 | R | 0.000210 |
| rs2938451 | A | 0.000000 | rs3013492 | R | 0.000760 | rs6779769 | A | 0.000510 |
| rs6429646 | R | 0.000050 | rs7095688 | A | 0.000920 | rs9502826 | D | 0.000690 |
| rs6426870 | R | 0.000230 | rs1439691 | R | 0.000100 | rs9896683 | R | 0.000850 |
| rs4784924 | R | 0.000100 | rs7123207 | R | 0.000490 | rs12907966 | D | 0.000220 |
| rs8075266 | R | 0.000190 | rs16830058 | A | 0.000830 | rs5979009 | D | 0.000150 |
| rs4851920 | R | 0.000130 | rs10484180 | R | 0.000930 | rs17157980 | D | 0.000730 |
| rs9809817 | R | 0.000190 | rs1952096 | A | 0.000250 | rs2865510 | R | 0.000160 |
| rs342735 | A | 0.000180 | rs2842068 | D | 0.000600 | rs12457620 | D | 0.000810 |
| rs17066311 | D | 0.000790 | rs420549 | D | 0.000440 | rs4510937 | R | 0.000390 |
| rs6627852 | A | 0.000470 | rs16822723 | A | 0.000590 | rs8073928 | R | 0.000510 |
| rs11841074 | D | 0.000130 | rs2492664 | A | 0.000270 | rs10409991 | R | 0.000290 |
| rs9447907 | R | 0.000650 | rs2029866 | R | 0.000730 | rs1871332 | A | 0.000150 |
| rs16873423 | D | 0.000360 | rs764515 | A | 0.000030 | rs1264547 | D | 0.000670 |
| rs315025 | A | 0.000390 | rs11197596 | A | 0.000240 | rs2016016 | A | 0.000360 |
| rs2355615 | A | 0.000130 | rs9344734 | D | 0.000690 | rs6605081 | R | 0.000150 |
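Once each SNP has a genetic model (D, R, or A), its genotypes can be turned into numeric features for the penalized SVM. The sketch below shows one common coding and is an assumption rather than the coding confirmed by the source: genotypes are given as the number of copies of the B allele (0 = AA, 1 = AB, 2 = BB), and the names `CODING`, `encode_snp`, and `encode_matrix` are hypothetical.

```python
# Feature value for each B-allele count under each genetic model (assumed coding).
CODING = {
    "R": {0: 0.0, 1: 0.0, 2: 1.0},  # recessive: only BB carries the effect
    "A": {0: 0.0, 1: 0.5, 2: 1.0},  # additive: effect grows with each B allele
    "D": {0: 0.0, 1: 1.0, 2: 1.0},  # dominant: AB and BB carry the same effect
}


def encode_snp(genotypes, model):
    """Encode one SNP's genotypes (B-allele counts) under a single model."""
    table = CODING[model]
    return [table[g] for g in genotypes]


def encode_matrix(rows, models):
    """Encode a samples-by-SNPs matrix, using one model code per SNP column."""
    return [[CODING[m][g] for g, m in zip(row, models)] for row in rows]


# Two samples, three SNPs with models D, A, R (illustrative values only).
features = encode_matrix([[0, 1, 2], [2, 1, 0]], ["D", "A", "R"])
print(features)
```

The encoded matrix can then be passed to whichever penalized SVM is being fitted in the second stage.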