Kim-Anh Lê Cao, Emmanuelle Meugnier, Geoffrey J McLachlan.
Abstract
MOTIVATION: Microarrays are being increasingly used in cancer research to better characterize and classify tumors by selecting marker genes. However, as very few of these genes have been validated as predictive biomarkers so far, it is mostly conventional clinical and pathological factors that are being used as prognostic indicators of clinical course. Combining clinical data with gene expression data may add valuable information, but it is a challenging task due to their categorical versus continuous characteristics. We have further developed the mixture of experts (ME) methodology, a promising approach to tackle complex non-linear problems. Several variants are proposed in integrative ME as well as the inclusion of various gene selection methods to select a hybrid signature.
Year: 2010 PMID: 20223834 PMCID: PMC2859127 DOI: 10.1093/bioinformatics/btq107
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. General principle of ME networks.
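As a rough illustration of the general ME principle, the sketch below combines the outputs of several expert models using weights produced by a gating network. It is a generic textbook formulation, not the integrative ME variants developed in the article: the softmax gate, the logistic experts, the choice to feed the gate and the experts separate inputs, and all function and parameter names are assumptions made for this example.

```python
import math


def softmax(scores):
    """Turn raw gate scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def dot(w, v):
    return sum(wi * vi for wi, vi in zip(w, v))


def me_predict(gate_input, expert_input, gate_weights, expert_weights):
    """Return P(y = 1) as a gate-weighted average of expert probabilities.

    gate_weights and expert_weights each hold one weight vector per expert
    (hypothetical parameters; in practice they would be estimated from data).
    """
    gates = softmax([dot(w, gate_input) for w in gate_weights])
    experts = [sigmoid(dot(w, expert_input)) for w in expert_weights]
    return sum(g * p for g, p in zip(gates, experts))
```

With all weights set to zero the gate is uniform and every expert returns 0.5, so `me_predict([1.0], [1.0], [[0.0], [0.0]], [[0.0], [0.0]])` evaluates to 0.5.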
Table 1. Mean error rate percentage using 10-CV (10 validation trials) obtained on X* and Z alone
| Method | Prostate | Breast | CNS |
|---|---|---|---|
| RF | 29.36 (1.43) | 29.96 (1.15) | 49 (3.61) |
| multinom | 29.74 (2.93) | 28.16 (1.18) | 40.33 (3.49) |
| RF | 27.72 (6.35) | 33.94 (2.60) | 41.83 (3.88) |
| RFE | 39.11 (3.93) | 29.49 (0.2) | 41.83 (4.61) |
| NSC | 35.44 (0.6) | 31.79 (0.6) | 36.67 (3.14) |
| PLS-RF | 33.64 (4.51) | 28.58 (0.84) | 36.99 (4.43) |
| ME/ | 26.45 (2.89) | 28.83 (1.01) | 40.67 (4.17) |
| RF (2) | 28.86 (5.53) | 34.88 (1.82) | 46.17 (7.03) |
| sPLS (3) | 26.71 (2.42) | 28.94 (2.22) | 41.33 (6.75) |
For X*, the wrapper approaches perform internal variable selection, with p* = 5.
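The error rates in these tables come from 10 repetitions of 10-fold cross-validation. The sketch below shows one way such a figure could be computed. It assumes the bracketed number is the standard deviation across trials, and it uses a plain nearest-centroid classifier as a stand-in, not any of the methods compared in the article. All function names are hypothetical.

```python
import random
import statistics


def kfold_indices(n, k, rng):
    """Shuffle the sample indices and deal them into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nearest_centroid_fit(X, y):
    """Stand-in classifier: store the mean feature vector of each class."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, x):
    """Predict the class whose centroid is closest in squared distance."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist2(centroids[label]))


def repeated_cv_error(X, y, k=10, trials=10, seed=0):
    """Return (mean, standard deviation) of the error rate in percent.

    Each trial reshuffles the samples, runs k-fold cross-validation and
    records the share of held-out samples that were misclassified.
    """
    rng = random.Random(seed)
    trial_errors = []
    for _ in range(trials):
        wrong = 0
        for fold in kfold_indices(len(X), k, rng):
            held_out = set(fold)
            train = [i for i in range(len(X)) if i not in held_out]
            model = nearest_centroid_fit([X[i] for i in train],
                                         [y[i] for i in train])
            wrong += sum(nearest_centroid_predict(model, X[i]) != y[i]
                         for i in fold)
        trial_errors.append(100.0 * wrong / len(X))
    return statistics.mean(trial_errors), statistics.stdev(trial_errors)
```

Calling `repeated_cv_error(X, y)` on a list of feature vectors `X` and class labels `y` returns a pair that could be printed in the same "mean (SD)" form as the table entries.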
Table 2. Mean error rate percentage using 10-CV (10 validation trials) obtained on both datasets X*Z together
| Gene selection | RF (2) | sPLS (3) | ||
|---|---|---|---|---|
| Prostate | PLS-RF | 28.98 (4.16) | ||
| cforest | ||||
| multinom | 27.09 (2.47) | 26.96 (4.74) | ||
| ME-indep | 29.24 (4.44) | 28.86 (5.53) | 31.64 (3.48) | |
| ME-multinom | 27.59 (5.62) | |||
| ME-loc | ||||
| Breast | PLS-RF | 28.20 (1.86) | ||
| cforest | 31.15 (1.60) | |||
| multinom | 31.33 (1.53) | 28.59 (2.02) | ||
| ME-indep | 30.19 (1.73) | |||
| ME-multinom | 29.88 (1.80) | |||
| ME-loc | 29.92 (1.25) | |||
| CNS | PLS-RF | 38.68 (2.18) | ||
| cforest | 38.33 (6.90) | |||
| multinom | 36.83 (7.51) | 37.17 (4.16) | ||
| ME-indep | 43.83 (6.5) | 42.81 (6.57) | 41.83 (4.81) | |
| ME-multinom | 39.67 (6.08) | |||
| ME-loc |
Classification performances which were as good or better than X* or Z alone (see Table 1) are indicated in bold (p* = 5).
Fig. 2. Average error rates of different methods on prostate (a), breast (b) and CNS (c) with respect to the number of selected genes. Black lines, wrapper approaches on X* only; red lines, integrative ME on X* and Z; blue line, logistic regression on Z only.
Table 3. Mean error rate percentage using 10-CV (10 validation trials) obtained when variable selection is performed on both X*Z* (p* = 5 and q* = 3)
| Gene selection | RF (2) | sPLS (3) | ||
|---|---|---|---|---|
| Prostate | PLS-RF | 29.36 (5.99) | ||
| cforest | 24.72 (3.62) | |||
| ME-indep | ||||
| ME-multinom | ||||
| Breast | PLS-RF | 27.15 (1.73) | ||
| cforest | 31.60 (1.74) | |||
| ME-indep | ||||
| ME-multinom | 28.05 (2.16) | 31.76 (2.51) | 27.46 (1.93) | |
| CNS | PLS-RF | 38.68 (5.64) | ||
| cforest | ||||
| ME-indep | 42.83 (8.92) | |||
| ME-multinom | 38 (3.49) | 36.5 (5.58) |
Classification performances which were as good or better than X*Z (see Table 2) are indicated in bold.
Table 4. Comparison of sensitivity and specificity percentages using 10-CV (10 validation trials) obtained on X*, Z and X*Z using different approaches (p* = 5)
| Sensitivity | Specificity | ||||||
|---|---|---|---|---|---|---|---|
| Prostate | Breast | CNS | Prostate | Breast | CNS | ||
| RF | 66.22 | 15.24 | 20.93 | 70.26 | |||
| RF | 81.27 | 33.81 | 74.28 | 29.47 | 70.00 | ||
| PLS-RF | 64.05 | 36.67 | 72.86 | 35.2 | |||
| ME+(1) | 65.94 | 75.19 | 70.95 | 69.49 | |||
| ME+(2) | 67.02 | 71.71 | 31.90 | 74.76 | 49.20 | 65.64 | |
| ME+(3) | 70.81 | 78.45 | 43.33 | 53.20 | 66.92 | ||
| PLS-RF | 62.97 | 10.95 | 77.62 | 28.40 | |||
| cforest | 71.35 | 11.42 | 21.07 | 84.87 | |||
| ME-indep (1) | 67.56 | 84.42 | 42.38 | 73.57 | 42.67 | 63.85 | |
| ME-indep (2) | 67.02 | 83.87 | 42.38 | 74.76 | 35.87 | 63.85 | |
| ME-indep (3) | 64.05 | 84.75 | 42.38 | 72.42 | 42.00 | 66.67 | |
| ME-multi (1) | 64.86 | 41.20 | |||||
| ME-multi (2) | 69.72 | 84.31 | 39.52 | 78.09 | 35.87 | 71.54 | |
| ME-multi (3) | 66.76 | 84.81 | 40.95 | 77.38 | 41.47 | 76.92 | |
| ME-loc (1) | 75.67 | 84.42 | 40.95 | 73.57 | 42.53 | 76.92 | |
| ME-loc (2) | 75.67 | 84.42 | 40.95 | 73.57 | 35.47 | 76.92 | |
| ME-loc (3) | 84.70 | 76.42 | 73.33 | ||||
(1), (2) and (3) indicate the variable selection procedure used (see Section 2.3).
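For reference, sensitivity and specificity percentages of this kind can be derived from true and predicted class labels as sketched below. Which class counts as "positive" for each dataset is not stated here, so the `positive` argument is an assumption, and the function name is hypothetical.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) in percent for a two-class problem.

    Sensitivity is the share of truly positive samples predicted positive;
    specificity is the share of truly negative samples predicted negative.
    A value is NaN when the corresponding class is absent from y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sens = 100.0 * tp / (tp + fn) if (tp + fn) else float("nan")
    spec = 100.0 * tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec
```

Under 10-CV the two rates would typically be computed from the held-out predictions of each trial and then averaged over trials.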
Table 5. Most relevant selected genes with a potential biomarker status in prostate
| Gene name (symbol) | Expression level | Gene selection method [rank] |
|---|---|---|
| Etoposide induced 2.4 mRNA (EI24) | + | |
| Erythrocyte membrane protein band 4.9 (EPB4.9) | − | |
| CHMP1A | − | |
| ASNS | + | RF[4] |
| PTMA | + | RF[5] |
Expression level in subjects with respect to class ‘recurrent’ is indicated: overexpressed (+), underexpressed (−).