Wei Yu, Tiebin Liu, Rodolfo Valdez, Marta Gwinn, Muin J Khoury.
Abstract
BACKGROUND: We present a potentially useful alternative approach based on support vector machine (SVM) techniques to classify persons with and without common diseases. We illustrate the method by detecting persons with diabetes and pre-diabetes in a cross-sectional representative sample of the U.S. population.
Year: 2010 PMID: 20307319 PMCID: PMC2850872 DOI: 10.1186/1472-6947-10-16
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Table 1. Description of the National Health and Nutrition Examination Survey data set used for the study
| Diagnostic category | Definition | N | Classification Scheme I | Classification Scheme II |
|---|---|---|---|---|
| Diagnosed diabetes | Answered "yes" to question "Have you ever been told by a doctor or health professionals that you had diabetes?" | 1,266 | Cases | Excluded from analysis |
| Undiagnosed diabetes | Answered "no" to question "Have you ever been told by a doctor or health professionals that you had diabetes?" and fasting plasma glucose level ≥126 mg/dl | 195 | Cases | Cases |
| Pre-diabetes | Fasting plasma glucose level 100-125 mg/dl | 1,576 | Non-cases | Cases |
| No diabetes | Fasting plasma glucose level <100 mg/dl | 3,277 | Non-cases | Non-cases |
Notes: Total number of cases for Classification Scheme I = 1,461
Total number of non-cases for Classification Scheme I = 4,853
Total number of cases for Classification Scheme II = 1,709
Total number of non-cases for Classification Scheme II = 3,206
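As a quick arithmetic check, the Scheme I totals in the notes can be reproduced from the category counts in the table. The sketch below is only an illustration: the dictionary keys and the `totals` helper are hypothetical names, not part of the study. Scheme II is left out because the totals in the notes (1,709 and 3,206) are lower than the raw category sums, and the excerpt does not state which records were dropped.

```python
# Category counts copied from the table above.
counts = {
    "diagnosed_diabetes": 1266,
    "undiagnosed_diabetes": 195,
    "pre_diabetes": 1576,
    "no_diabetes": 3277,
}

# Role of each category under Classification Scheme I (from the table).
scheme_i = {
    "diagnosed_diabetes": "case",
    "undiagnosed_diabetes": "case",
    "pre_diabetes": "non-case",
    "no_diabetes": "non-case",
}

def totals(counts, scheme):
    """Sum the category counts by their role (case / non-case) in a scheme."""
    out = {"case": 0, "non-case": 0}
    for category, role in scheme.items():
        out[role] += counts[category]
    return out

scheme_i_totals = totals(counts, scheme_i)
print(scheme_i_totals)  # {'case': 1461, 'non-case': 4853}
```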
Figure 1. Demonstration of finding a separating hyperplane in high-dimensional space vs. in low-dimensional space.
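The idea behind this figure can be sketched with a toy example: points that no single cut can separate on a line become linearly separable after being mapped into a higher-dimensional space. Everything here is an assumption made for illustration (the data, the explicit map `phi(x) = (x, x^2)`, and the chosen hyperplane); an SVM would normally work with a kernel rather than an explicit map.

```python
# Toy 1-D data: label +1 when |x| > 1, else -1 (illustrative values, not study data).
points = [-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0]
labels = [1 if abs(x) > 1 else -1 for x in points]

def separable_by_threshold(xs, ys):
    """Return True if a single cut on the line separates the two classes."""
    pairs = sorted(zip(xs, ys))
    # One threshold suffices only if the label changes at most once along the line.
    changes = sum(1 for a, b in zip(pairs, pairs[1:]) if a[1] != b[1])
    return changes <= 1

def feature_map(x):
    """Hypothetical explicit map into 2-D: phi(x) = (x, x^2)."""
    return (x, x * x)

def hyperplane_side(z, w=(0.0, 1.0), b=-1.0):
    """Sign of w . z + b; with these values the hyperplane is x^2 = 1."""
    score = w[0] * z[0] + w[1] * z[1] + b
    return 1 if score > 0 else -1

low_dim_ok = separable_by_threshold(points, labels)
high_dim_ok = all(
    hyperplane_side(feature_map(x)) == y for x, y in zip(points, labels)
)
print(low_dim_ok, high_dim_ok)  # False True
```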
The performance of support vector machine models with four kernel functions for Classification Schemes I and II
| Model | Linear (AUC) | Polynomial (AUC) | Radial basis function (AUC) | Sigmoid (AUC) |
|---|---|---|---|---|
| Classification Scheme I* | 0.8332 | 0.7655 | 0.8347** | 0.8341 |
| Classification Scheme II* | 0.7318** | 0.6673 | 0.7259 | 0.7273 |
*See Table 1 for the definitions of Classification Schemes I and II.
**Best performance.
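For readers unfamiliar with the four kernel names in the table, the sketch below writes them out in their commonly used textbook forms. The parameter names (`gamma`, `coef0`, `degree`) and their default values are assumptions for illustration only; the excerpt does not report the hyperparameters used in the study.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def linear_kernel(u, v):
    # K(u, v) = u . v
    return dot(u, v)

def polynomial_kernel(u, v, gamma=1.0, coef0=1.0, degree=3):
    # K(u, v) = (gamma * u . v + coef0) ** degree
    return (gamma * dot(u, v) + coef0) ** degree

def rbf_kernel(u, v, gamma=1.0):
    # K(u, v) = exp(-gamma * ||u - v||^2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

def sigmoid_kernel(u, v, gamma=1.0, coef0=0.0):
    # K(u, v) = tanh(gamma * u . v + coef0)
    return math.tanh(gamma * dot(u, v) + coef0)

# Two made-up feature vectors, purely for demonstration.
u, v = (1.0, 2.0), (0.5, -1.0)
for name, kernel in [
    ("linear", linear_kernel),
    ("polynomial", polynomial_kernel),
    ("rbf", rbf_kernel),
    ("sigmoid", sigmoid_kernel),
]:
    print(name, kernel(u, v))
```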
The performance of support vector machine models for Classification Schemes I and II
| Model | Data set | Sensitivity | Specificity | PPV | NPV | AUC |
|---|---|---|---|---|---|---|
| Classification Scheme I* | Test | 0.7715 | 0.7503 | 0.4926 | 0.9127 | 0.8347 |
| | Training | 0.7938 | 0.7169 | 0.4550 | 0.9211 | 0.8383 |
| | 10-fold cross-validation | 0.7765 | 0.7027 | 0.4388 | 0.9130 | 0.8242 |
| Classification Scheme II* | Test | 0.7359 | 0.6254 | 0.5061 | 0.8195 | 0.7318 |
| | Training | 0.7092 | 0.6590 | 0.6729 | 0.8087 | 0.7393 |
| | 10-fold cross-validation | 0.7059 | 0.6589 | 0.5293 | 0.8054 | 0.7357 |
PPV, positive predictive value; NPV, negative predictive value; AUC, area under the curve.
*See Table 1 for the definitions of Classification Schemes I and II.
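The four threshold-dependent measures in the table are all ratios of confusion-matrix counts. The sketch below shows the standard definitions; the function name and the counts passed to it are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard measures from counts of true/false positives and negatives."""
    return {
        "sensitivity": tp / (tp + fn),  # cases correctly flagged
        "specificity": tn / (tn + fp),  # non-cases correctly cleared
        "ppv": tp / (tp + fp),          # flagged persons who are cases
        "npv": tn / (tn + fn),          # cleared persons who are non-cases
    }

# Made-up counts, for illustration only.
metrics = classification_metrics(tp=80, fp=50, tn=150, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Unlike sensitivity and specificity, PPV and NPV depend on how common cases are in the data set being scored, so a lower PPV next to a higher NPV is expected when non-cases outnumber cases.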
Figure 2. ROC curves for Classification Schemes I (a) and II (b) with SVM models and logistic regression models. Note: See Table 1 for the definitions of Classification Schemes I and II.
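The area under an ROC curve can be read as the probability that a randomly chosen case receives a higher model score than a randomly chosen non-case. The sketch below computes AUC that way by direct pair counting. It is a minimal illustration with made-up scores, not the procedure used in the study, and `auc_from_scores` is a hypothetical helper name.

```python
def auc_from_scores(scores, labels):
    """AUC as the fraction of (case, non-case) pairs ranked correctly.

    labels: 1 for a case, 0 for a non-case. Tied scores count as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Made-up classifier scores, for illustration only.
scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.2]
labels = [1, 1, 1, 0, 0, 0]
auc = auc_from_scores(scores, labels)
print(round(auc, 4))  # 0.7778
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of cases from non-cases.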