Minta Thomas, Kris De Brabanter, Johan A K Suykens, Bart De Moor.
Abstract
BACKGROUND: Clinical data, such as patient history, laboratory analysis, and ultrasound parameters, which are the basis of day-to-day clinical decision support, are often used to guide the clinical management of cancer in the presence of microarray data. Several data fusion techniques are available to integrate genomics or proteomics data, but only a few studies have created a single prediction model using both gene expression and clinical data. These studies often remain inconclusive about whether prediction performance actually improved. To improve clinical management, these data should be fully exploited, which requires efficient algorithms to integrate the data sets and design a final classifier. LS-SVM classifiers and generalized eigenvalue/singular value decompositions have been used successfully in many bioinformatics applications for prediction tasks. Building on the benefits of these two techniques, we propose a machine learning approach, a weighted LS-SVM classifier, to integrate two data sources: microarray and clinical parameters.
Year: 2014 PMID: 25551433 PMCID: PMC4308909 DOI: 10.1186/s12859-014-0411-1
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Summary of the 5 breast cancer data sets

| | | | | Clinical variables |
|---|---|---|---|---|
| Case I | 85 | 25 | 5000 | Age, Ethnicity, ER status, PR status, Radiation treatment, Chemotherapy, Hormonal therapy, Nodal status, Metastasis, Tumor stage, Tumor size, Tumor grade. |
| Case II | 33 | 96 | 6000 | Age, Ethnicity, Pretreatment tumor stage, Nodal status, Nuclear grade, ER status, PR status, HER2 status. |
| Case III | 112 | 65 | 5000 | Age, Tumor size, Nodal status, ER status, Tamoxifen treatment. |
| Case IV | 46 | 51 | 12192 | Age, Tumor size, Grade, ERp, Angioinvasion, Lymphocytic infiltrate, PRp. |
| Case V | 58 | 193 | 20055 | Age, Tumor size, Grade, ER, PRp, Lymph node. |
Figure 1. Overview of the algorithm. The data sets are represented as matrices, with rows corresponding to patients and columns corresponding to genes (first data set) or clinical parameters (second data set). LOO-CV is applied to select the optimal parameters.
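The parameter selection step named in the caption can be illustrated with a minimal sketch. It assumes a standard LS-SVM classifier with a single RBF kernel, labels in {-1, +1}, and LOO-CV accuracy as the selection criterion; it is not the weighted two-source classifier proposed in the paper. The function names, the regularization value `gamma=10.0`, and the candidate grid are all illustrative assumptions.

```python
import math

def rbf(x, z, sigma):
    """RBF kernel value between two feature vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def lssvm_train(X, y, sigma, gamma):
    """Solve the LS-SVM linear system [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1]."""
    n = len(X)
    A = [[0.0] + [float(v) for v in y]]
    for i in range(n):
        row = [float(y[i])]
        for j in range(n):
            k = y[i] * y[j] * rbf(X[i], X[j], sigma)
            row.append(k + (1.0 / gamma if i == j else 0.0))
        A.append(row)
    sol = solve(A, [0.0] + [1.0] * n)
    return sol[0], sol[1:]  # bias b, dual variables alpha

def lssvm_predict(x, X, y, b, alpha, sigma):
    """Predict the class (-1 or +1) of a single sample x."""
    s = sum(a * yi * rbf(x, xi, sigma) for a, yi, xi in zip(alpha, y, X))
    return 1 if s + b >= 0 else -1

def loo_accuracy(X, y, sigma, gamma):
    """Leave-one-out accuracy: train on all samples but one, test on the held-out one."""
    hits = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        b, alpha = lssvm_train(X_tr, y_tr, sigma, gamma)
        hits += lssvm_predict(X[i], X_tr, y_tr, b, alpha, sigma) == y[i]
    return hits / len(X)

def select_sigma(X, y, sigmas, gamma=10.0):
    """Return the candidate kernel width with the best LOO-CV accuracy."""
    return max(sigmas, key=lambda s: loo_accuracy(X, y, s, gamma))
```

Each training fold must contain both classes, otherwise the linear system is singular. On real data one would typically tune the regularization constant together with the kernel width.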
Comparisons of different classifiers: test AUC(std) of breast cancer cases

| | Case I | Case II | Case III | Case IV | Case V |
|---|---|---|---|---|---|
| CL+LS-SVM | | | | | |
| test AUC | 0.7795(0.0687) | 0.7772(0.0554) | 0.6152(0.0565) | 0.6622(0.0628) | 0.7740(0.0833) |
| p-value | 0.0039 | 1.48E-04 | 0.0086 | 5.21E-06 | 0.1602 |
| MA+LS-SVM | | | | | |
| test AUC | 0.7001(0.0559) | 0.8065(0.0730) | 0.6217(0.0349) | 0.7357(0.0085) | 0.6166(0.0508) |
| p-value | 0.0059 | 0.0140 | 0.0254 | 2.41E-04 | 0.0020 |
| GEVD+LS-SVM | | | | | |
| test AUC | 0.7801(0.0717) | 0.7673(0.0548) | 0.6196(0.0829) | 0.7730(0.1011) | 0.8001(0.0648) |
| p-value | 0.0137 | 3.41E-05 | 0.0040 | 0.1558 | 0.0840 |
| KGEVD+LS-SVM | | | | | |
| test AUC | 0.7982(0.0927) | 0.8210(0.0670) | 0.6437(0.0313) | 0.7901(0.0917) | 0.8031(0.0624) |
| p-value | 0.0195 | 0.1144 | 0.0020 | 0.6162 | 0.0720 |
| weighted LS-SVM | | | | | |
| test AUC | | | | | |
p-value: obtained from a paired test, the Wilcoxon signed-rank test.
CL and MA denote the clinical and microarray kernels, both computed with RBF kernel functions.
Figure 2. Comparison of the prediction accuracy of the classifiers. Boxplots of the test AUC values obtained in 100 repetitions for the 5 breast cancer cases: (a) Case I, (b) Case II, (c) Case III, (d) Case IV, (e) Case V.
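The "test AUC(std)" entries in the tables summarize AUC values over repeated runs. A minimal sketch of both steps follows: computing one AUC from classifier scores, then a mean and standard deviation across repetitions. The function names are illustrative, and the use of the sample standard deviation is an assumption, since the source does not say which estimator was used.

```python
import statistics

def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    Labels equal to 1 are treated as positive, everything else as negative.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l != 1]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample from each class")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def summarize(auc_values):
    """Mean and sample standard deviation of AUC values over repetitions."""
    return statistics.mean(auc_values), statistics.stdev(auc_values)
```

With 100 random train/test splits, `auc` would be evaluated once per split on the held-out samples and `summarize` applied to the resulting 100 values.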
Performance of Naive Bayes classifiers on the clinical and microarray data sets: test AUC(std)

| | Case I | Case II | Case III | Case IV | Case V |
|---|---|---|---|---|---|
| Clinical data | 0.6235(0.0912) | 0.739(0.0722) | 0.5533(0.0438) | 0.7156(0.0503) | 0.6767(0.0513) |
| Microarray | 0.5028(0.037) | 0.6662(0.088) | 0.5324(0.0616) | 0.6011(0.0699) | 0.5189(0.0412) |
Comparison of RBF and clinical kernel functions in terms of LOO-CV performance

| | Case I | Case II | Case III | Case IV | Case V |
|---|---|---|---|---|---|
| Clinical kernel | 0.8108(0.0351) | | | 0.7385(0.1100) | 0.7673(0.0213) |
| RBF | | 0.8202(0.0100) | 0.7143(0.0217) | | |