Hsueh-Wei Chang, Yu-Hsien Chiu, Hao-Yun Kao, Cheng-Hong Yang, Wen-Hsien Ho.
Abstract
An essential task in a genomic analysis of a human disease is limiting the number of strongly associated genes when studying susceptibility to the disease. The goal of this study was to compare computational tools with and without feature selection for predicting osteoporosis outcome in Taiwanese women based on genetic factors such as single nucleotide polymorphisms (SNPs). To elucidate relationships between osteoporosis and SNPs in this population, three classification algorithms were applied: multilayer feedforward neural network (MFNN), naive Bayes, and logistic regression. A wrapper-based feature selection method was also used to identify a subset of major SNPs. Experimental results showed that the MFNN model with the wrapper-based approach was the best predictive model for inferring disease susceptibility based on the complex relationship between osteoporosis and SNPs in Taiwanese women. The findings suggest that patients and doctors can use the proposed tool to enhance decision making based on clinical factors such as SNP genotyping data.
Year: 2013 PMID: 23401685 PMCID: PMC3557627 DOI: 10.1155/2013/850735
Source DB: PubMed Journal: Int J Endocrinol ISSN: 1687-8337 Impact factor: 3.257
Panel of 11 SNPs [9].
| SNP | Gene | rs number | Genotype 1 | Genotype 2 | Genotype 3 |
|---|---|---|---|---|---|
| 1 | TNF | rs1799724 | TT | TC | CC |
| 2 | TGF | rs1800469 | TT | TC | CC |
| 3 | Osteocalcin | rs1800247 | CC | CT | TT |
| 4 | TNF | rs1800629 | AA | AG | GG |
| 5 | PTH (BstB I) | rs6254 | GG | AG | AA |
| 6 | PTH (Dra II) | rs6256 | AA | AC | CC |
| 7 | IL1_raᵇ | VNTRᵃ | A1A1ᵇ | A1A2 | A1A4 |
| 8 | HSP70 hom | rs2227956 | CC | CT | TT |
| 9 | HSP 70-2 | rs1061581 | GG | AG | AA |
| 10 | CTR | rs1801197 | CC | CT | TT |
| 11 | BMP-4 | rs17563 | CC | CT | TT |
ᵃ VNTR: variable number of tandem repeats.
ᵇ IL1_ra genotype: A1: 410 bp; A2: 240 bp; A4: 325 bp.
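The genotype columns of the panel can be read as a lookup from genotype to a code of 1, 2, or 3. The sketch below transcribes that lookup. It assumes the column numbers are the codes fed to the classifiers, which this record does not state, and the names `SNP_PANEL`, `encode_genotype`, and `encode_subject` are illustrative only.

```python
# Genotype-to-code lookup transcribed from the SNP panel table.
# Assumption: the column index (1, 2, 3) serves as the feature value.
SNP_PANEL = {
    "rs1799724": ("TT", "TC", "CC"),
    "rs1800469": ("TT", "TC", "CC"),
    "rs1800247": ("CC", "CT", "TT"),
    "rs1800629": ("AA", "AG", "GG"),
    "rs6254": ("GG", "AG", "AA"),
    "rs6256": ("AA", "AC", "CC"),
    "VNTR": ("A1A1", "A1A2", "A1A4"),
    "rs2227956": ("CC", "CT", "TT"),
    "rs1061581": ("GG", "AG", "AA"),
    "rs1801197": ("CC", "CT", "TT"),
    "rs17563": ("CC", "CT", "TT"),
}


def encode_genotype(snp_id, genotype):
    """Return the table's code (1, 2, or 3), or None if the genotype is unlisted."""
    genotypes = SNP_PANEL[snp_id]
    if genotype in genotypes:
        return genotypes.index(genotype) + 1
    return None


def encode_subject(record):
    """Encode one subject's {snp_id: genotype} mapping as a list in panel order."""
    return [encode_genotype(snp, record.get(snp)) for snp in SNP_PANEL]
```

Missing or unlisted genotypes are returned as `None` rather than guessed, so any imputation step stays explicit in the caller.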
Demographic data for study subjects.
| Factor | Range | Descriptive statistics |
|---|---|---|
| Age (year) | 27–83 | |
| Menopause | Postmenopausal/premenopausal | 247 (83.73%)/48 (16.27%) |
| BMI (kg/m²) | 17.22–35.49 | |
| BMD | High/low | 112 (37.97%)/183 (62.03%) |
BMI: body mass index; BMD: bone mineral density.
Figure 1. Flowchart of wrapper-based approach to feature subset selection [23].
Figure 2. State space search for feature subset selection [23].
Figure 3. In the wrapper-based feature selection approach, genetic factors are evaluated separately with each of the three classifiers: multilayer feedforward neural network (MFNN), naive Bayes, and logistic regression.
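A wrapper approach scores candidate feature subsets by the performance of the classifier itself and searches the space of subsets for a high-scoring one. The sketch below illustrates the idea with a greedy forward search. The search strategy actually used in the study is not given in this record, so forward selection is an assumption, and `forward_wrapper_selection`, `evaluate`, and `toy_score` are hypothetical names. In practice `evaluate` would return a cross-validated score such as AUC for the chosen classifier.

```python
def forward_wrapper_selection(features, evaluate):
    """Greedy forward search: keep adding the feature that most improves the score.

    `evaluate(subset)` is supplied by the caller and should return the score of
    the classifier trained on `subset` (assumed here: higher is better).
    """
    selected = []
    best_score = float("-inf")
    remaining = list(features)
    while remaining:
        # Score every one-feature extension of the current subset.
        scored = [(evaluate(selected + [f]), f) for f in remaining]
        score, feature = max(scored, key=lambda pair: pair[0])
        if score <= best_score:
            break  # no extension improves the score, so stop searching
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


# Toy stand-in for a cross-validated score: rewards two "useful" features and
# penalises subset size. Purely illustrative, not study data.
def toy_score(subset):
    useful = {"snp_b", "snp_d"}
    return len(useful & set(subset)) - 0.1 * len(subset)


subset, score = forward_wrapper_selection(
    ["snp_a", "snp_b", "snp_c", "snp_d"], toy_score
)
```

Because the score comes from the classifier, the selected subset can differ between MFNN, naive Bayes, and logistic regression even on the same data.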
Results of repeated 10-fold cross-validation experiment using multilayer feedforward neural network (MFNN), naive Bayes, and logistic regression without feature selection.
| Algorithm | AUC | Sensitivity | Specificity | Number of SNPs |
|---|---|---|---|---|
| MFNN | 0.489 | 0.400 | 0.629 | 11 |
| Naive Bayes | 0.462 | 0.296 | 0.612 | 11 |
| Logistic regression | 0.485 | 0.333 | 0.615 | 11 |
AUC: area under the ROC curve.
Results of repeated 10-fold cross-validation experiment using multilayer feedforward neural network (MFNN), naive Bayes, and logistic regression with wrapper-based feature selection approach.
| Algorithm | AUC | Sensitivity | Specificity | Number of SNPs |
|---|---|---|---|---|
| MFNN | 0.631 | 0.579 | 0.689 | 4 (rs1800469, VNTR, rs2227956, rs1801197) |
| Naive Bayes | 0.569 | 0 | 0.620 | 3 (rs1800469, rs1800247, rs1801197) |
| Logistic regression | 0.620 | 0.407 | 0.623 | 8 (rs1800469, rs1800629, rs6254, rs6256, rs2227956, rs1061581, rs1801197, rs17563) |
AUC: area under the ROC curve.
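For reference, the three metrics reported in the tables can be computed as in the minimal sketch below. It assumes binary labels with 1 marking the positive class. Which outcome the study treats as positive is not stated in this record, and the function names are illustrative. AUC is computed here as the fraction of positive-negative pairs ranked correctly, with ties counted as half, so a value near 0.5 corresponds to chance-level ranking.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP). Labels are 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


def auc(y_true, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return float("nan")  # AUC is undefined without both classes
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

In a repeated 10-fold cross-validation, these functions would be applied to the held-out predictions of each fold and the results averaged.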