Hsin-Hsiung Huang, Tu Xu, Jie Yang.
Abstract
In this paper, we compare logistic regression and two other classification methods for predicting hypertension from genotype information. In the first step, we use logistic regression to detect significant single-nucleotide polymorphisms (SNPs). In the second step, we use the significant SNPs as predictors in logistic regression, support vector machines (SVMs), and a newly developed permanental classification method. We also detect rare variants and investigate their impact on prediction. Our results show that SVMs and permanental classification both outperform logistic regression and are comparable to each other in predicting hypertension status.
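The first (screening) step ranks SNPs by their association p-values. The sketch below is a minimal stand-in, not the authors' procedure: it assumes each SNP's genotype is treated as a categorical predictor with no covariates. In that case the likelihood-ratio test of a single-SNP logistic regression against the intercept-only model equals the G-test of independence on the 2 × k table of hypertension status by genotype. The function names and toy counts are hypothetical, and p-values computed this way need not match those in the SNP table below, which come from the authors' own model.

```python
import math


def g_test_pvalue(table):
    """Likelihood-ratio (G) test of independence for a 2 x k count table.

    table[0] holds counts for non-hypertensive subjects and table[1] for
    hypertensive subjects, one entry per genotype class. Only df <= 2 is
    handled, which covers the three genotype classes of a biallelic SNP.
    """
    # Drop genotype classes that nobody carries; they add no information.
    cols = [col for col in zip(*table) if sum(col) > 0]
    row_totals = [sum(col[i] for col in cols) for i in range(2)]
    n = sum(row_totals)
    g = 0.0
    for col in cols:
        col_total = sum(col)
        for i, observed in enumerate(col):
            if observed > 0:  # 0 * log(0) is taken as 0
                expected = row_totals[i] * col_total / n
                g += 2.0 * observed * math.log(observed / expected)
    g = max(g, 0.0)  # guard against tiny negative values from rounding
    df = len(cols) - 1
    if df <= 0:
        return 1.0
    if df == 1:
        # Chi-square(1) upper tail: P(Z^2 > g) = erfc(sqrt(g / 2)).
        return math.erfc(math.sqrt(g / 2.0))
    if df == 2:
        # Chi-square(2) upper tail has the closed form exp(-g / 2).
        return math.exp(-g / 2.0)
    raise ValueError("only up to three genotype classes are supported")


def screen_snps(tables, top_k):
    """Step 1: rank SNP ids by p-value (smallest first) and keep the top_k."""
    ranked = sorted(tables, key=lambda snp: g_test_pvalue(tables[snp]))
    return ranked[:top_k]


# Hypothetical toy counts: rows are (no hypertension, hypertension).
toy_tables = {
    "snp_a": [[500, 400, 100], [100, 200, 200]],
    "snp_b": [[330, 340, 330], [160, 170, 170]],
    "snp_c": [[600, 300, 100], [250, 180, 70]],
}
selected = screen_snps(toy_tables, top_k=2)
```

The retained SNP ids would then serve as predictors for the second-step classifiers; varying `top_k` corresponds to the "Number of SNPs" settings in the prediction-error tables.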
Year: 2014 PMID: 25519351 PMCID: PMC4143639 DOI: 10.1186/1753-6561-8-S1-S96
Source DB: PubMed Journal: BMC Proc ISSN: 1753-6561
The 18 most significant single-nucleotide polymorphisms (SNPs) from logistic regression on SIMPHEN.1.csv, with corresponding p-values
| Rank | SNP | p-value |
|---|---|---|
| 1 | rs11711953 | 4.4 × 10⁻¹³ |
| 2 | rs11706549 | 1.1 × 10⁻¹² |
| 3 | rs275678 | 6.8 × 10⁻¹¹ |
| 4 | rs9828391 | 9.9 × 10⁻¹¹ |
| 5 | rs7653745 | 1.2 × 10⁻¹⁰ |
| 6 | rs6789918 | 6.4 × 10⁻¹⁰ |
| 7 | rs9829009 | 8.4 × 10⁻¹⁰ |
| 8 | rs11719850 | 1.5 × 10⁻⁹ |
| 9 | rs7645789 | 2.0 × 10⁻⁹ |
| 10 | rs7609918 | 2.1 × 10⁻⁹ |
| 11 | rs6444467 | 2.2 × 10⁻⁹ |
| 12 | rs1471695 | 2.2 × 10⁻⁹ |
| 13 | rs7632157 | 3.0 × 10⁻⁹ |
| 14 | rs17785248 | 3.0 × 10⁻⁹ |
| 15 | rs6777472 | 3.5 × 10⁻⁹ |
| 16 | rs16862782 | 3.7 × 10⁻⁹ |
| 17 | rs12497460 | 4.3 × 10⁻⁹ |
| 18 | rs4680987 | 4.7 × 10⁻⁹ |
Frequency table of hypertension status by rs11711953 genotype
| Hypertension | CC | TC | XX |
|---|---|---|---|
| No | 1858 | 125 | 13 |
| Yes | 536 | 13 | 2 |
CC, cytosine-cytosine pair; TC, thymine-cytosine pair; XX, unknown genotype.
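As a worked reading of this table, the share of hypertensive subjects within each rs11711953 genotype can be computed directly from the counts. The sketch below uses only the numbers in the table; the helper name is hypothetical.

```python
# Counts copied from the rs11711953 frequency table above.
counts = {
    "CC": {"No": 1858, "Yes": 536},
    "TC": {"No": 125, "Yes": 13},
    "XX": {"No": 13, "Yes": 2},
}


def hypertension_rate(no, yes):
    """Proportion of subjects in a group who are hypertensive."""
    return yes / (no + yes)


rates = {g: hypertension_rate(c["No"], c["Yes"]) for g, c in counts.items()}
overall = hypertension_rate(
    sum(c["No"] for c in counts.values()),
    sum(c["Yes"] for c in counts.values()),
)

for genotype, rate in rates.items():
    print(f"{genotype}: {rate:.3f}")  # CC: 0.224, TC: 0.094, XX: 0.133
print(f"overall: {overall:.3f}")  # overall: 0.216
```

About 9.4% of TC carriers are hypertensive versus 22.4% of CC carriers, the kind of difference the first-step logistic regression flags. The XX (unknown) group has only 15 subjects, so its rate is imprecise.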
Prediction errors of logistic regression by data set and number of single-nucleotide polymorphisms (SNPs) included
| Number of SNPs | SIMPHEN.1 | SIMPHEN.2 | SIMPHEN.3 | SIMPHEN.4 | SIMPHEN.5 |
|---|---|---|---|---|---|
| 0 | 0.221 | 0.232 | 0.223 | 0.216 | 0.228 |
| 5 | 0.211 | 0.229 | 0.218 | 0.210 | 0.228 |
| 10 | 0.189 | 0.230 | 0.225 | 0.207 | 0.223 |
| 15 | 0.190 | 0.234 | 0.224 | 0.208 | 0.225 |
| 20 | 0.188 | 0.242 | 0.229 | 0.213 | 0.235 |
SNP, single-nucleotide polymorphism.
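The prediction error in these tables is presumably the misclassification rate on held-out data; the source does not spell out the data split or decision threshold, so the sketch below assumes a 0.5 probability cutoff. It also shows why the 0-SNP row acts as a baseline: an intercept-only logistic regression assigns every subject the training prevalence as predicted probability, so with a prevalence below 0.5 it labels everyone non-hypertensive, and its error equals the proportion of hypertensive subjects in the test set. That is consistent with the roughly 22% prevalence implied by the rs11711953 frequency table. Function names and toy labels are hypothetical.

```python
def misclassification_error(y_true, y_pred):
    """Fraction of subjects whose predicted class differs from the observed class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def intercept_only_predictions(y_train, n_test, threshold=0.5):
    """Class predictions from a logistic regression with no SNPs.

    The maximum-likelihood fit of an intercept-only model gives every subject
    the training prevalence as predicted probability, so all test subjects
    receive the same label.
    """
    prevalence = sum(y_train) / len(y_train)
    label = 1 if prevalence > threshold else 0
    return [label] * n_test


# Hypothetical labels: 1 = hypertensive, 0 = not hypertensive.
y_train = [1] * 22 + [0] * 78
y_test = [1] * 23 + [0] * 77
baseline_error = misclassification_error(
    y_test, intercept_only_predictions(y_train, len(y_test))
)  # 23 of 100 test subjects are misclassified
```

A useful SNP set has to beat this baseline. In the table, the gain over the 0-SNP row is largest for SIMPHEN.1 (0.221 to about 0.19) and small or absent for the other replicates.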
Frequency table of hypertension status and genotype for two rare variants
| HTN | rs9829721 TT | rs9829721 non-TT | rs776105 AA | rs776105 non-AA |
|---|---|---|---|---|
| 0 | 17 | 1857 | 58 | 1816 |
| 1 | 37 | 636 | 62 | 611 |
AA, adenine-adenine pair; HTN, hypertension; TT, thymine-thymine pair.
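For a rare variant, a 2 × 2 odds ratio summarizes this table compactly; its logarithm is what a single-variant logistic regression coefficient on a carrier indicator would estimate. The sketch below assumes HTN = 1 denotes hypertensive subjects and uses a Wald 95% interval on the log scale. It is an illustration, not an analysis reported by the authors, and the function name is hypothetical.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio with a Wald confidence interval for a 2 x 2 table.

    a: hypertensive carriers      b: non-hypertensive carriers
    c: hypertensive non-carriers  d: non-hypertensive non-carriers
    All four counts must be positive.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cell counts must be positive")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (
        math.exp(log_or),
        math.exp(log_or - z * se),
        math.exp(log_or + z * se),
    )


# Counts from the rare-variant table (assuming HTN = 1 means hypertensive).
tt = odds_ratio_wald(a=37, b=17, c=636, d=1857)  # rs9829721 TT vs non-TT
aa = odds_ratio_wald(a=62, b=58, c=611, d=1816)  # rs776105 AA vs non-AA
for name, (est, lo, hi) in [("rs9829721 TT", tt), ("rs776105 AA", aa)]:
    print(f"{name}: OR = {est:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

With these counts the odds ratio is about 6.4 for TT at rs9829721 and about 3.2 for AA at rs776105, but only 54 and 120 subjects carry these genotypes, so the intervals are wide.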
Prediction errors of support vector machine and permanental classification using common variants
| Number of SNPs | 0 | 5 | 10 | 15 | 20 | 50 | 100 | 200 |
|---|---|---|---|---|---|---|---|---|
| Radial kernel SVM (training) | 0.2301 | 0.0291 | 0.0180 | 0.0203 | 0.0130 | 0.0175 | 0.0052 | 0.0008 |
| Radial kernel SVM (testing) | 0.2419 | 0.1460 | 0.1350 | 0.1303 | 0.1272 | 0.1257 | 0.1213 | 0.1248 |
| Permanental classification (training) | 0.2231 | 0.1031 | 0.0971 | 0.0536 | 0.0404 | 0.0334 | 0.0337 | 0.0321 |
| Permanental classification (testing) | 0.2642 | 0.1517 | 0.1433 | 0.1473 | 0.1350 | 0.1347 | 0.1233 | 0.1231 |
SNP, single-nucleotide polymorphism; SVM, support vector machine.
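The radial kernel SVM compares subjects through a Gaussian kernel on their genotype vectors, K(x, y) = exp(-γ‖x − y‖²). The sketch below computes that kernel and a Gram matrix, assuming an additive 0/1/2 coding (copies of a chosen allele) and an arbitrary γ; the paper does not state its coding or tuning, so both are assumptions, and unknown genotypes (XX) would need separate handling such as imputation. All names are hypothetical. Note that the near-zero training errors at 100 to 200 SNPs, alongside testing errors around 0.12, suggest the radial kernel fits the training data almost perfectly, so the testing rows are the ones to compare.

```python
import math


def additive_code(genotype, allele):
    """Hypothetical coding: number of copies of `allele` in a two-letter genotype."""
    return sum(1 for base in genotype if base == allele)


def rbf_kernel(x, y, gamma):
    """Radial (Gaussian) kernel exp(-gamma * ||x - y||^2) between two vectors."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)


def gram_matrix(rows, gamma):
    """Pairwise kernel values; a kernel SVM solver works with this matrix."""
    return [[rbf_kernel(a, b, gamma) for b in rows] for a in rows]


# Hypothetical subjects genotyped at three SNPs, coded by copies of "T".
subjects = [
    [additive_code(g, "T") for g in ("CC", "TC", "TT")],
    [additive_code(g, "T") for g in ("TC", "TC", "CC")],
    [additive_code(g, "T") for g in ("CC", "TC", "TT")],
]
K = gram_matrix(subjects, gamma=0.5)
```

Identical genotype profiles get kernel value 1, and the value decays with the squared genotype distance, so larger γ makes the classifier more local and more prone to the training/testing gap seen in the table.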
Prediction errors of support vector machine and permanental classification using rare variants
| Number of SNPs | 10 | 50 | 100 | 200 |
|---|---|---|---|---|
| Radial kernel SVM (training) | 0.0795 | 0.0087 | 0.0087 | 0.0087 |
| Radial kernel SVM (testing) | 0.2484 | 0.2440 | 0.2432 | 0.2424 |
| Permanental classification (training) | 0.0843 | 0.0637 | 0.0711 | 0.0575 |
| Permanental classification (testing) | 0.2533 | 0.2330 | 0.2331 | 0.2303 |
SNP, single-nucleotide polymorphism; SVM, support vector machine.
Prediction errors of support vector machine and permanental classification using common and rare variants
| Number of (common, rare) variants | (10,10) | (50,50) | (100,100) | (200,200) |
|---|---|---|---|---|
| Radial kernel SVM (training) | 0.0195 | 0.0067 | 0.0067 | 0.0067 |
| Radial kernel SVM (testing) | 0.1350 | 0.1351 | 0.1350 | 0.1301 |
| Permanental classification (training) | 0.0943 | 0.07137 | 0.0631 | 0.0513 |
| Permanental classification (testing) | 0.1433 | 0.1330 | 0.1300 | 0.1300 |
SVM, support vector machine.
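Reading the column header (k, k) as k common variants plus k rare variants, the combined predictor set could be assembled as in the sketch below. The selection rule (take the top k of each ranked list), the genotype coding, and all names are assumptions for illustration; the source does not describe how the rare variants were ranked.

```python
def combined_snp_set(common_ranked, rare_ranked, k):
    """Top-k common plus top-k rare SNP ids, i.e. the (k, k) setting; 2k predictors."""
    if k > len(common_ranked) or k > len(rare_ranked):
        raise ValueError("k exceeds the number of ranked variants available")
    chosen = list(common_ranked[:k]) + list(rare_ranked[:k])
    if len(set(chosen)) != len(chosen):
        raise ValueError("a SNP appears in both the common and rare lists")
    return chosen


def design_matrix(genotypes_by_subject, snp_ids):
    """One row per subject with coded genotypes in a fixed column order."""
    return [[subject[snp] for snp in snp_ids] for subject in genotypes_by_subject]


# Hypothetical ranked lists and coded genotypes (0/1/2 copies of an allele).
common = ["c1", "c2", "c3"]
rare = ["r1", "r2"]
snps = combined_snp_set(common, rare, k=2)
X = design_matrix(
    [{"c1": 0, "c2": 1, "c3": 2, "r1": 0, "r2": 0},
     {"c1": 2, "c2": 1, "c3": 0, "r1": 1, "r2": 0}],
    snps,
)
```

In the tables, the (10,10) testing errors (0.1350 for SVM, 0.1433 for permanental classification) equal the errors with 10 common SNPs alone, so adding rare variants changes little at that size.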