Lung-Cheng Huang, Sen-Yen Hsu, Eugene Lin.
Abstract
BACKGROUND: In genomic studies, it is essential to select a small number of genes that are more significant than the others for association studies of disease susceptibility. In this work, our goal was to compare computational tools with and without feature selection for predicting chronic fatigue syndrome (CFS) using genetic factors such as single nucleotide polymorphisms (SNPs).
Year: 2009 PMID: 19772600 PMCID: PMC2765429 DOI: 10.1186/1479-5876-7-81
Source DB: PubMed Journal: J Transl Med ISSN: 1479-5876 Impact factor: 5.531
Demographic information of study subjects.
| Characteristic | Value |
| CFS/non-fatigue (n) | 55/54 |
| Age (year) | 50.5 ± 8.5 |
| Male/Female (n) | 16/93 |
| Race; white/black/other (n) | 104/4/1 |
CFS = chronic fatigue syndrome.
Age is presented as mean ± standard deviation; all other values are counts.
A panel of 42 SNPs from the CDC Chronic Fatigue Syndrome Research Group.
| Gene | SNPs (rs number) |
| COMT | rs4646312, rs740603, rs6269, rs4633, rs165722, rs933271, rs5993882 |
| CRHR1 | rs110402, rs1396862, rs242940, rs173365, rs242924, rs7209436 |
| CRHR2 | rs2267710, rs2267714, rs2284217 |
| MAOA | rs1801291, rs979606, rs979605 |
| MAOB | rs3027452, rs2283729, rs1799836 |
| NR3C1 | rs2918419, rs1866388, rs860458, rs852977, rs6196, rs6188, rs258750 |
| POMC | rs12473543 |
| SLC6A4 | rs2066713, rs4325622, rs140701 |
| TH | rs4074905, rs2070762 |
| TPH2 | rs2171363, rs4760816, rs4760750, rs1386486, rs1487280, rs1872824, rs10784941 |
The "rs" number is the NCBI SNP ID.
COMT = catechol-O-methyltransferase, CRHR1 = corticotropin releasing hormone receptor 1, CRHR2 = corticotropin releasing hormone receptor 2, MAOA = monoamine oxidase A, MAOB = monoamine oxidase B, NR3C1 = nuclear receptor subfamily 3, group C, member 1 (glucocorticoid receptor), POMC = proopiomelanocortin, SLC6A4 = solute carrier family 6 member 4, SNP = single nucleotide polymorphism, TH = tyrosine hydroxylase, TPH2 = tryptophan hydroxylase 2.
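For downstream analysis, the panel above can be kept as a gene-to-SNP mapping. The sketch below copies the IDs from the table and checks that they add up to 42 distinct SNPs across 10 genes. The names `PANEL` and `panel_size` are illustrative and do not come from the original work.

```python
# Gene -> SNP IDs, copied from the panel table above.
PANEL = {
    "COMT": ["rs4646312", "rs740603", "rs6269", "rs4633",
             "rs165722", "rs933271", "rs5993882"],
    "CRHR1": ["rs110402", "rs1396862", "rs242940", "rs173365",
              "rs242924", "rs7209436"],
    "CRHR2": ["rs2267710", "rs2267714", "rs2284217"],
    "MAOA": ["rs1801291", "rs979606", "rs979605"],
    "MAOB": ["rs3027452", "rs2283729", "rs1799836"],
    "NR3C1": ["rs2918419", "rs1866388", "rs860458", "rs852977",
              "rs6196", "rs6188", "rs258750"],
    "POMC": ["rs12473543"],
    "SLC6A4": ["rs2066713", "rs4325622", "rs140701"],
    "TH": ["rs4074905", "rs2070762"],
    "TPH2": ["rs2171363", "rs4760816", "rs4760750", "rs1386486",
             "rs1487280", "rs1872824", "rs10784941"],
}


def panel_size(panel):
    """Total number of SNPs across all genes in the panel."""
    return sum(len(snps) for snps in panel.values())


print(len(PANEL), panel_size(PANEL))  # 10 42
```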
The result of a repeated 10-fold cross-validation experiment using naive Bayes, support vector machine (SVM), and C4.5 decision tree without feature selection.
| Algorithm | AUC | Sensitivity | Specificity | Number of SNPs |
| Naive Bayes | 0.60 ± 0.17 | 0.64 ± 0.20 | 0.52 ± 0.21 | 42 |
| SVM with linear kernel | 0.55 ± 0.14 | 0.55 ± 0.21 | 0.56 ± 0.21 | 42 |
| SVM with polynomial kernel | 0.59 ± 0.13 | 0.46 ± 0.24 | 0.71 ± 0.21 | 42 |
| SVM with sigmoid kernel | 0.61 ± 0.13 | 0.62 ± 0.20 | 0.61 ± 0.19 | 42 |
| SVM with Gaussian radial basis function kernel | 0.62 ± 0.13 | 0.60 ± 0.20 | 0.64 ± 0.19 | 42 |
| C4.5 decision tree | 0.50 ± 0.16 | 0.52 ± 0.21 | 0.48 ± 0.21 | 11 |
AUC = the area under the receiver operating characteristic curve, SNP = single nucleotide polymorphism.
Data are presented as mean ± standard deviation.
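The evaluation protocol named in the caption above, repeated 10-fold cross-validation summarized as mean ± standard deviation, can be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation. The number of repeats, the stratified fold assignment, the 0/1 class coding (1 = CFS), the 0.5 decision threshold, and all function names are assumptions. AUC is computed as the fraction of (case, control) pairs that the scorer ranks correctly, which equals the area under the ROC curve.

```python
import random
from statistics import mean, stdev


def stratified_folds(labels, k, rng):
    """Assign sample indices to k folds, keeping class proportions roughly equal."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(y_true, y_pred):
    """Return (true positive rate, true negative rate) for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec


def repeated_cv(X, y, fit, k=10, repeats=10, seed=0):
    """Run k-fold CV `repeats` times; `fit(X_train, y_train)` returns a scorer."""
    rng = random.Random(seed)
    results = {"auc": [], "sensitivity": [], "specificity": []}
    for _ in range(repeats):
        for test_idx in stratified_folds(y, k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(y)) if i not in held_out]
            score = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
            y_test = [y[i] for i in test_idx]
            scores = [score(X[i]) for i in test_idx]
            preds = [int(s >= 0.5) for s in scores]
            sens, spec = sensitivity_specificity(y_test, preds)
            results["auc"].append(auc(y_test, scores))
            results["sensitivity"].append(sens)
            results["specificity"].append(spec)
    return {name: (mean(v), stdev(v)) for name, v in results.items()}


# Toy check with a stand-in scorer that simply returns the single feature.
X_toy = [[i % 2] for i in range(40)]
y_toy = [i % 2 for i in range(40)]
summary = repeated_cv(X_toy, y_toy, lambda X_train, y_train: (lambda row: row[0]))
print(summary["auc"])  # (1.0, 0.0)
```

Any classifier can be plugged in through `fit`, as long as it returns a function that maps one sample to a score where higher means "more likely CFS".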
The result of a repeated 10-fold cross-validation experiment using naive Bayes, support vector machine (SVM), and C4.5 decision tree with the hybrid feature selection approach that combines the chi-squared and information-gain methods.
| Algorithm | AUC | Sensitivity | Specificity | Number of SNPs |
| Naive Bayes | 0.70 ± 0.16 | 0.65 ± 0.21 | 0.60 ± 0.20 | 12 |
| SVM with linear kernel | 0.67 ± 0.13 | 0.62 ± 0.20 | 0.73 ± 0.19 | 14 |
| SVM with polynomial kernel | 0.62 ± 0.13 | 0.56 ± 0.21 | 0.68 ± 0.18 | 9 |
| SVM with sigmoid kernel | 0.64 ± 0.13 | 0.62 ± 0.20 | 0.67 ± 0.19 | 4 |
| SVM with Gaussian radial basis function kernel | 0.64 ± 0.13 | 0.58 ± 0.20 | 0.71 ± 0.18 | 3 |
| C4.5 decision tree | 0.64 ± 0.13 | 0.80 ± 0.16 | 0.46 ± 0.20 | 2 |
AUC = the area under the receiver operating characteristic curve, SNP = single nucleotide polymorphism.
Data are presented as mean ± standard deviation.
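The two filter scores behind the hybrid approach can be illustrated with a short sketch. The chi-squared statistic and information gain below are the standard definitions for a categorical feature against a class label. How the two rankings are combined is not stated in this section, so the rule used here (keep SNPs that fall in the top `top_k` of both rankings) is an assumption, as are the 0/1/2 genotype coding, the function names, and the hypothetical SNP labels `snp_a`, `snp_b`, and `snp_c`.

```python
import math
from collections import Counter


def chi_squared(feature, labels):
    """Pearson chi-squared statistic for a categorical feature vs. the class label."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    stat = 0.0
    for f, nf in Counter(feature).items():
        for c, nc in Counter(labels).items():
            expected = nf * nc / n
            stat += (joint[(f, c)] - expected) ** 2 / expected
    return stat


def entropy(values):
    """Shannon entropy in bits of a list of categorical values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(feature, labels):
    """Reduction in class entropy after splitting on the feature."""
    n = len(labels)
    conditional = 0.0
    for f, nf in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == f]
        conditional += (nf / n) * entropy(subset)
    return entropy(labels) - conditional


def hybrid_select(columns, labels, top_k):
    """Keep features ranked in the top_k by BOTH scores (assumed combination rule)."""
    names = list(columns)
    by_chi = sorted(names, key=lambda s: chi_squared(columns[s], labels),
                    reverse=True)[:top_k]
    by_ig = sorted(names, key=lambda s: information_gain(columns[s], labels),
                   reverse=True)[:top_k]
    return [s for s in by_chi if s in by_ig]


# Toy data: genotypes coded 0/1/2 (assumed coding), label 1 = CFS.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "snp_a": [2, 2, 2, 2, 0, 0, 0, 0],  # perfectly associated
    "snp_b": [0, 1, 0, 1, 0, 1, 0, 1],  # unrelated to the label
    "snp_c": [2, 2, 2, 0, 0, 0, 0, 0],  # partially associated
}
print(hybrid_select(columns, labels, top_k=2))  # ['snp_a', 'snp_c']
```

Because both scores are computed from the feature and label alone, this kind of filter runs once before any classifier is trained.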
The result of a repeated 10-fold cross-validation experiment using naive Bayes, support vector machine (SVM), and C4.5 decision tree with the wrapper-based feature selection method.
| Algorithm | AUC | Sensitivity | Specificity | Number of SNPs |
| Naive Bayes | 0.70 ± 0.16 | 0.64 ± 0.20 | 0.63 ± 0.19 | 8 |
| SVM with linear kernel | 0.63 ± 0.14 | 0.71 ± 0.20 | 0.55 ± 0.21 | 9 |
| SVM with polynomial kernel | 0.63 ± 0.12 | 0.43 ± 0.20 | 0.82 ± 0.16 | 12 |
| SVM with sigmoid kernel | 0.64 ± 0.13 | 0.59 ± 0.21 | 0.70 ± 0.18 | 6 |
| SVM with Gaussian radial basis function kernel | 0.63 ± 0.13 | 0.60 ± 0.20 | 0.66 ± 0.19 | 7 |
| C4.5 decision tree | 0.59 ± 0.16 | 0.65 ± 0.21 | 0.55 ± 0.22 | 6 |
AUC = the area under the receiver operating characteristic curve, SNP = single nucleotide polymorphism.
Data are presented as mean ± standard deviation.
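A wrapper method scores candidate SNP subsets by the performance of the classifier itself rather than by a standalone statistic. The sketch below pairs a small categorical naive Bayes with a greedy forward search. It is an illustration under stated assumptions, not the authors' procedure: the search strategy, the leave-one-out accuracy used as the subset score, the Laplace smoothing, the 0/1/2 genotype coding, and all function names are choices made here for brevity.

```python
import math
from collections import Counter, defaultdict


def nb_fit(X, y, alpha=1.0):
    """Categorical naive Bayes with Laplace smoothing; returns a predict function."""
    classes = sorted(set(y))
    prior = Counter(y)
    counts = defaultdict(Counter)      # (feature index, class) -> value counts
    values = [set() for _ in X[0]]     # distinct values seen per feature
    for row, c in zip(X, y):
        for j, v in enumerate(row):
            counts[(j, c)][v] += 1
            values[j].add(v)

    def predict(row):
        def log_posterior(c):
            lp = math.log(prior[c] / len(y))
            for j, v in enumerate(row):
                lp += math.log((counts[(j, c)][v] + alpha)
                               / (prior[c] + alpha * len(values[j])))
            return lp
        return max(classes, key=log_posterior)

    return predict


def loo_accuracy(X, y, subset):
    """Leave-one-out accuracy of naive Bayes restricted to the given feature indices."""
    hits = 0
    for i in range(len(y)):
        X_train = [[r[j] for j in subset] for k, r in enumerate(X) if k != i]
        y_train = [c for k, c in enumerate(y) if k != i]
        hits += nb_fit(X_train, y_train)([X[i][j] for j in subset]) == y[i]
    return hits / len(y)


def forward_select(X, y, evaluate=loo_accuracy):
    """Greedy forward search: add the best feature until the score stops improving."""
    selected, best = [], 0.0
    remaining = set(range(len(X[0])))
    while remaining:
        scored = [(evaluate(X, y, selected + [j]), j) for j in sorted(remaining)]
        score, j = max(scored, key=lambda t: t[0])  # ties go to the lowest index
        if score <= best:
            break
        selected.append(j)
        remaining.remove(j)
        best = score
    return selected, best


# Toy data: only the middle feature tracks the label (1 = CFS).
y_toy = [1, 1, 1, 1, 0, 0, 0, 0]
X_toy = [[0, 2, 1], [1, 2, 1], [0, 2, 0], [1, 2, 0],
         [0, 0, 1], [1, 0, 1], [0, 0, 0], [1, 0, 0]]
print(forward_select(X_toy, y_toy))  # ([1], 1.0)
```

Because the classifier is retrained for every candidate subset, a wrapper costs far more than a filter, and the selected subset depends on which classifier is wrapped.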