Putri W Novianti, Victor L Jong, Kit C B Roes, Marinus J C Eijkemans.
Abstract
BACKGROUND: Class prediction models have been shown to perform inconsistently across clinical gene expression datasets. Previous evaluation studies, mostly in the field of cancer, showed that the accuracy of class prediction models differs from dataset to dataset and depends on the type of classification function. While much is known about the characteristics of classification functions, little has been done to determine which characteristics of gene expression data affect the performance of a classifier. This study aims to empirically identify data characteristics that affect the predictive accuracy of classification models outside the field of cancer.
Year: 2015 PMID: 26093633 PMCID: PMC4475623 DOI: 10.1186/s12859-015-0610-4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Individual random-effects meta-regression
| Study factor | Coef* | AIC | P value | Individual explained variation |
|---|---|---|---|---|
| Cell type | 0.24 + | 137.9 | 0.44 | 4.87% |
| Medical question | −0.32 ++ | 137.8 | 0.38 | 2.55% |
| Sample size | −0.01 | 135.9 | 0.10 | 12.06% |
| The number of differentially expressed genes | 0.21 | 116.0 | <0.001 | 72.16% |
| Fold change | 1.42 | 118.1 | <0.001 | 57.31% |
| Within class correlation | 1.74 | 137.5 | 0.31 | 5.80% |
* Coefficient of the corresponding study factor in the random-effects logistic regression
+ Coefficient for the non-blood category in the Cell Type study factor
++ Coefficient for the non-diagnostic category in the Medical Question study factor
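Because the coefficients above are on the logit scale of a random-effects logistic regression, exponentiating one gives the multiplicative change in the odds of the modelled outcome per one-unit increase in the study factor (or, for the two categorical factors, for the category named in the footnotes). The section does not state the outcome explicitly; reading it as correct classification is an assumption. This minimal sketch only converts the published coefficients; the names `COEFFICIENTS` and `odds_ratio` are hypothetical.

```python
import math

# Published coefficients from the meta-regression table (logit scale).
# Labels are paraphrased; the categorical contrasts follow the table footnotes.
COEFFICIENTS = {
    "Cell type (non-blood)": 0.24,
    "Medical question (non-diagnostic)": -0.32,
    "Sample size": -0.01,
    "Number of DE genes": 0.21,
    "Fold change": 1.42,
    "Within-class correlation": 1.74,
}


def odds_ratio(coef: float) -> float:
    """Convert a logit-scale coefficient into an odds ratio."""
    return math.exp(coef)


if __name__ == "__main__":
    for name, coef in COEFFICIENTS.items():
        print(f"{name:35s} OR = {odds_ratio(coef):.2f}")
```

For example, the fold-change coefficient of 1.42 corresponds to an odds ratio of exp(1.42) ≈ 4.14 per unit increase in average fold change. Only the number of differentially expressed genes and the fold change have P < 0.05 in the table, so the remaining odds ratios should not be over-interpreted.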
Fig. 1 The individual explained variation of study factors. Abbreviations: the number of differentially expressed genes on the log scale (pDEG), the fold change (fc), the sample size (n), the average within-class correlation coefficient (withincor), the cell type (celltype), and the medical question (medques)
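The section does not define how the individual explained variation was computed. One common definition in random-effects meta-regression is the proportional reduction in the between-study variance τ² when a single study factor is added to an intercept-only model, R² = (τ²₀ − τ²₁) / τ²₀. The sketch below assumes that definition; the function name and the example τ² values are hypothetical and are not estimates from this study.

```python
def explained_variation(tau2_null: float, tau2_model: float) -> float:
    """Percentage of between-study variance explained by one study factor.

    Assumed definition: proportional reduction in tau^2 relative to the
    intercept-only random-effects model, truncated at 0 because adding a
    factor can occasionally increase the estimated tau^2.
    """
    if tau2_null <= 0:
        raise ValueError("tau2_null must be positive")
    if tau2_model < 0:
        raise ValueError("tau2_model must be non-negative")
    reduction = (tau2_null - tau2_model) / tau2_null
    return 100.0 * max(reduction, 0.0)


# Hypothetical tau^2 estimates, for illustration only.
print(explained_variation(tau2_null=0.5, tau2_model=0.25))  # prints 50.0
```

Truncating at zero keeps the measure interpretable as a percentage; an untruncated version would report a negative value whenever a factor fits worse than the null model.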
Fig. 2 Visualization of the generated gene expression datasets for the scenario fc1 = +, fc2 = −, cc1 = cc2 = +. Abbreviations: fc1 (fc2): fold change of gene 1 (gene 2); cc1 (cc2): correlation coefficient of gene 1 (gene 2)
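A two-gene, two-class dataset of the kind shown in Fig. 2 can be simulated by shifting the class-2 mean of gene 1 upward (fc1 = +) and that of gene 2 downward (fc2 = −) while inducing a positive within-class correlation between the genes. The caption does not specify the distribution, the effect sizes, or exactly how cc1 and cc2 enter the simulation; this sketch assumes bivariate normal data with unit variances and a single correlation `rho` shared by both classes. `simulate_two_genes` is a hypothetical helper, not the authors' code.

```python
import math
import random


def simulate_two_genes(n_per_class, fc1, fc2, rho, seed=0):
    """Simulate (gene1, gene2, label) rows for two classes.

    Class 0 is centred at (0, 0) and class 1 at (fc1, fc2); within each
    class both genes have unit variance and correlation rho.
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    rng = random.Random(seed)
    rows = []
    for label, (mu1, mu2) in enumerate([(0.0, 0.0), (fc1, fc2)]):
        for _ in range(n_per_class):
            z1 = rng.gauss(0.0, 1.0)
            z2 = rng.gauss(0.0, 1.0)
            # Cholesky construction of a correlated standard-normal pair:
            # Var(g2) = rho^2 + (1 - rho^2) = 1 and Cov(g1, g2) = rho.
            g1 = mu1 + z1
            g2 = mu2 + rho * z1 + math.sqrt(1.0 - rho ** 2) * z2
            rows.append((g1, g2, label))
    return rows


# Scenario fc1 = +, fc2 = -, cc1 = cc2 = + with illustrative magnitudes.
data = simulate_two_genes(n_per_class=50, fc1=1.0, fc2=-1.0, rho=0.6)
```

Varying the signs of `fc1`, `fc2` and `rho` reproduces the other sign combinations of the same design.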
Characteristics of the gene expression experiments
| Disease | ID+ | Medical question | Disease class | Cell/Tissue type | Affymetrix platform | Citation * | N | p | Ndeg | fc | cc |
|---|---|---|---|---|---|---|---|---|---|---|---|
| UC1 | E-GEOD-14580 | Response to treatment (non-/responder) | Inflammation | Colonic mucosal biopsies | HG U133 Plus 2.0 | yes | 24 (16,8) | 4650 | 623 | 1.551 | 0.162 |
| UC2 | E-GEOD-21231 | Response to treatment (non-/responder) | Inflammation | Blood | HG 1.0 ST | yes | 40 (20,20) | 3388 | 0 | 0.207 | 0.112 |
| UC3 | E-GEOD-36807 | Diagnostic (UC/CD) | Inflammation | Intestinal biopsy | HG U133 Plus 2.0 | no | 28 (15,13) | 6541 | 21 | 2.222 | 0.305 |
| UC4 | E-GEOD-23597 | Response to treatment (non-/responder) | Inflammation | Colonic biopsy | HG U133 Plus 2.0 | yes | 14 (7,7) | 4793 | 0 | 1.119 | 0.298 |
| UC5 | E-MTAB-331 | Diagnostic (UC/CD) | Inflammation | CD8+ T cell | HG 1.0 ST and HG 1.1 ST | yes | 59 (30,29) | 1402 | 312 | 0.714 | 0.164 |
| UC6 | E-GEOD-9452 | Diagnostic (with/without inflammation) | Inflammation | Colon | HG U133 Plus 2.0 | yes | 17 (8,9) | 3702 | 2401 | 3.697 | 0.165 |
| UC7 | E-GEOD-6731 | Diagnostic (UC/CD) | Inflammation | Colon | HG U95AV2 | yes | 30 (11,19) | 1055 | 0 | 0.485 | 0.228 |
| AST1 | E-GEOD-27011 | Diagnostic (mild/severe) | Inflammation | Blood | HG 1.0 ST | no | 36 (19,17) | 1293 | 39 | 0.302 | 0.113 |
| AST2 | E-GEOD-51392 | Diagnostic (asthma/rhinitis) | Inflammation | Bronchial epithelial cells | HG U133 Plus 2.0 | no | 11 (6,5) | 3969 | 0 | 1.805 | 0.171 |
| AST3 | E-GEOD-31773 | Diagnostic (non/severe) | Inflammation | CD4 T cells | HG U133 Plus 2.0 | no | 12 (4,8) | 18321 | 14488 | 16.964 | 0.317 |
| DYS | E-GEOD-19419 | Diagnostic (carrier/symp) | Infection | Blood | HG 1.0 ST | yes | 45 (22,23) | 2811 | 0 | 0.182 | 0.153 |
| HIV1 | E-GEOD-35864 | Diagnostic (HIV/HIV with complication) | Infection | Basal ganglia | HG U133 Plus 2.0 | no | 18 (6,12) | 8737 | 0 | 1.14 | 0.346 |
| HIV2 | E-GEOD-14278 | Prognostic (resistant/susceptible) | Infection | Peripheral blood | HG U133 Plus 2.0 | no | 18 (9,9) | 11286 | 4 | 0.58 | 0.12 |
| HIV3 | E-GEOD-6740 | Diagnostic (chronic/non chronic) | Infection | CD4 T cell | HG U133A | yes | 15 (10,5) | 865 | 5 | 0.74 | 0.168 |
| PSO | E-GEOD-18948 | Response to treatment (non-/responder) | Immune | Blood | HG U95 | yes | 16 (7,9) | 1987 | 34 | 1.131 | 0.369 |
| KD | E-GEOD-16797 | Response to treatment (IVIG responsive/non) | Immune | Blood | HG U133 Plus 2.0 | yes | 12 (6,6) | 11043 | 5 | 1.688 | 0.224 |
| Dia1 | E-GEOD-18732 | Diagnostic (type 2 diabetes/intolerant) | Immune | Skeletal muscle | HG U133 Plus 2.0 | no | 71 (45,26) | 2038 | 10 | 0.279 | 0.16 |
| Dia2 | E-CBIL-30 | Diagnostic (diabetes type 2/abnormal glucose) | Immune | Skeletal muscle | HG U133A | yes | 26 (18,8) | 1749 | 0 | 0.269 | 0.435 |
| ALZ1 | E-GEOD-1297 | Diagnostic (severe/not severe) | Degenerative | Hippocampus | HG U133A | yes | 22 (7,15) | 2295 | 13 | 0.693 | 0.287 |
| ALZ2 | E-MEXP-2280 | Diagnostic (Alz/Pick's disease) | Degenerative | Medial temporal lobe | HG U133 Plus 2.0 | yes | 19 (7,12) | 6899 | 1592 | 1.086 | 0.231 |
| PARKI | E-GEOD-6613 | Diagnostic (Parkinson/non-Parkinson) | Degenerative | Blood | HG U133A | yes | 83 (50,33) | 638 | 0 | 0.192 | 0.361 |
| HF | E-GEOD-26887 | Diagnostic (with/-out Diabetes) | Degenerative | Left ventricle cardiac biopsies | HG 1.0 ST | yes | 19 (7,12) | 2068 | 0 | 0.374 | 0.131 |
| GAU | E-GEOD-21899 | Diagnostic (type 1/3) | Hereditary | Skin | HG U133A 2.0 | no | 10 (5,5) | 2017 | 4 | 1.807 | 0.143 |
| CS | E-MEXP-2236 | Diagnostic (Apert/Muenke) | Hereditary | Skin | HG U133 Plus 2.0 | yes | 20 (10,10) | 5422 | 21 | 0.59 | 0.255 |
| CF | E-GEOD-10406 | Diagnostic (Chronic rhinosinusitis/+Cystic fibrosis) | Hereditary | Sinus mucosa | HG U133 Plus 2.0 | no | 15 (9,6) | 7604 | 0 | 0.786 | 0.206 |
+ : The ArrayExpress accession ID
* : Paper availability
Ndeg : The number of differentially expressed probesets
fc : The average fold change across all probesets
cc : The average within-class correlation across all probesets
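The footnotes define fc and cc only loosely. A minimal sketch, assuming log2-scale expression values, fc as the mean absolute difference in class means over all probesets, and cc as the mean pairwise Pearson correlation between probesets within each class, averaged over the two classes. The section does not say whether signed or absolute correlations were averaged; the sketch uses signed values. Ndeg is left out because it depends on a differential-expression test and multiple-testing correction that this section does not describe. All function names are hypothetical.

```python
import math
from itertools import combinations
from statistics import fmean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant probeset")
    return sxy / math.sqrt(sxx * syy)


def average_fold_change(expr, labels):
    """Mean |class-1 mean - class-0 mean| over probesets.

    expr: list of probesets, each a list of log2 values (one per sample).
    labels: 0/1 class label per sample.
    """
    diffs = []
    for row in expr:
        g0 = [v for v, c in zip(row, labels) if c == 0]
        g1 = [v for v, c in zip(row, labels) if c == 1]
        diffs.append(abs(fmean(g1) - fmean(g0)))
    return fmean(diffs)


def average_within_class_correlation(expr, labels):
    """Mean pairwise probeset correlation within each class, averaged over classes."""
    per_class = []
    for cls in (0, 1):
        idx = [i for i, c in enumerate(labels) if c == cls]
        rows = [[row[i] for i in idx] for row in expr]
        corrs = [pearson(a, b) for a, b in combinations(rows, 2)]
        per_class.append(fmean(corrs))
    return fmean(per_class)
```

The pairwise step is quadratic in the number of probesets, so for the several thousand probesets per dataset in the table above a vectorised implementation would be the practical choice.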