Vânia Rodrigues, Sérgio Deusdado.
Abstract
The discovery of diagnostic or prognostic biomarkers is fundamental to optimizing therapeutics for patients. By enhancing the interpretability of the prediction model, this work aims to optimize leukemia diagnosis while retaining high performance in the identification of informative genes. For this purpose, we applied an optimally parameterized Kernel Logistic Regression (KLR) method to the classification of leukemia microarray gene expression data, using metalearners to select attributes and thereby reduce the data dimensionality before passing the data to the classifier. Pearson correlation and the chi-squared statistic were the attribute evaluators applied in the metalearners, with information gain as the single-attribute evaluator. The implemented models were evaluated with 10-fold cross-validation. The metalearner approach identified 12 common genes, with a highest average merit of 0.999. The practical work was developed using the public data mining software WEKA.
Keywords: informative genes; leukemia; machine learning; metalearning; microarray
Year: 2020 PMID: 32383690 PMCID: PMC7734502 DOI: 10.1515/jib-2019-0069
Source DB: PubMed Journal: J Integr Bioinform ISSN: 1613-4516
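The attribute-selection step described in the abstract can be illustrated with a minimal sketch that ranks attributes by the absolute Pearson correlation between their expression values and a numerically coded class. It is not the WEKA implementation used in the study: the function names, the 0/1 class coding, and the toy expression values are assumptions made for illustration. In a cross-validated setting, such a ranking would typically be computed on each training fold only, so that the held-out fold does not influence which attributes are kept.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant attribute carries no linear signal
    return cov / math.sqrt(var_x * var_y)

def rank_by_correlation(expression, labels, top_k):
    """Return the top_k attribute names ranked by |Pearson r| with the class."""
    scores = {name: abs(pearson(values, labels))
              for name, values in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Toy data (hypothetical values, not taken from the study):
# six samples, class coded as 0/1, three made-up attributes.
labels = [0, 0, 0, 1, 1, 1]
expression = {
    "gene_a": [1.0, 1.2, 0.9, 3.1, 2.9, 3.3],  # rises with the class
    "gene_b": [2.0, 2.1, 1.9, 2.0, 2.1, 1.9],  # unrelated to the class
    "gene_c": [5.0, 4.8, 5.1, 1.0, 1.2, 0.9],  # falls with the class
}
selected = rank_by_correlation(expression, labels, top_k=2)
```

Here `selected` contains `gene_a` and `gene_c`, the two attributes that track the class, while the uninformative `gene_b` is dropped before any classifier would see the data.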
Figure 1: Procedures workflow.
Results achieved with 10-fold cross-validation.
| Metric | KLR | MetaLearner (correlation-KLR) | MetaLearner (chi-squared-KLR) |
|---|---|---|---|
| ACC (%) (st. dev.) | 98.17 (8.17) | 98.50 (7.53) | 98.50 (7.53) |
| | 0.95 (0.20) | 0.97 (0.14) | 0.97 (0.16) |
| MAE (st. dev.) | 0.02 (0.06) | 0.01 (0.05) | 0.01 (0.05) |
| Recall (st. dev.) | 1 | 1 | 0.98 (0.11) |
| F-measure (st. dev.) | 0.99 (0.06) | 0.99 (0.07) | 0.99 (0.06) |
| Area under ROC (st. dev.) | 1 | 1 | 1 |
*Statistically different at significance level 0.05.
Figure 2: Average merit from information gain attribute selection after applying the metalearner correlation-KLR with 10-fold cross-validation.
Figure 3: Average merit from information gain attribute selection after applying the metalearner chi-squared-KLR with 10-fold cross-validation.
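The merit values in these figures come from an information gain evaluator. As a rough illustration of what that quantity measures, the sketch below computes the information gain of a single attribute with respect to the class, assuming the attribute has already been discretized into categories. The function names, the class labels `A` and `B`, and the toy values are hypothetical, and this is not the WEKA code used in the study.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())

def information_gain(attribute_values, labels):
    """Class entropy minus the expected class entropy after splitting
    the samples by the (already discretized) attribute value."""
    total = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(group) / total * entropy(group)
                      for group in groups.values())
    return entropy(labels) - conditional

# Toy data (hypothetical): four samples from two balanced classes.
labels = ["A", "A", "B", "B"]
informative = ["high", "high", "low", "low"]    # separates the classes
uninformative = ["high", "low", "high", "low"]  # independent of the class

gain_informative = information_gain(informative, labels)
gain_uninformative = information_gain(uninformative, labels)
```

For a two-class problem the gain cannot exceed the class entropy, which is at most 1 bit, so an attribute that separates two balanced classes perfectly scores 1 and an attribute independent of the class scores 0.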
Features with the highest average merit from information gain attribute selection after applying the metalearner correlation-KLR with 10-fold cross-validation.
| Feature | Gene name (Protein) | |
|---|---|---|
| 39318_at | TCL1A (T cell leukemia/lymphoma 1A) | 1.38057E-13 |
| 1389_at | MME (membrane metallo-endopeptidase) | 8.81046E-06 |
| 31797_at | TBPL1 (TBP-like 1) | 4.08536E-07 |
| 1456_s_at | IFI16 (Gamma-interferon-inducible protein) | 1.67549E-06 |
| 37508_f_at | FUBP3 (Far upstream element-binding protein) | 4.83968E-09 |
| 37988_at | CD79B (B-cell antigen receptor complex-associated protein beta chain) | 6.735E-07 |
| 38242_at | SLP65 (B-cell linker protein) | 5.73938E-06 |
| 32541_at | PPP3CC (protein phosphatase 3 (formerly 2B) | 4.01276E-06 |
| 34168_at | DNTT (DNA deoxynucleotidyltransferase) | 8.78887E-08 |
| 32315_at | RPS24 (ribosomal protein S24) | 9.8389E-10 |
| 266_s_at | CD24 (Signal transducer CD24) | 6.64401E-11 |
| 40701_at | USP13 (Ubiquitin carboxyl-terminal hydrolase) | 8.47601E-07 |
Features with the highest average merit from information gain attribute selection after applying the metalearner chi-squared-KLR.
| Feature | Gene name (Protein) | |
|---|---|---|
| 32872_at | TCF4 (Transcription factor 4) | 9.4977E-05 |
| 36239_at | POU2AF1 (POU domain class 2-associating factor 1) | 2.42789E-05 |
| 40505_at | UBE2L6 (Ubiquitin-conjugating enzyme E2L 6) | 7.0848E-05 |
| 266_s_at | CD24 (Signal transducer CD24) | 6.64401E-11 |
| 34168_at | DNTT (DNA deoxynucleotidyltransferase) | 8.78887E-08 |
| 35164_at | WFS1 (Wolframin) | 0.003024172 |
| 1389_at | MME (Neprilysin) | 8.81046E-06 |
| 1456_s_at | IFI16 (Gamma-interferon-inducible protein 16) | 1.67549E-06 |
| 39318_at | TCL1A (T cell leukemia/lymphoma 1A) | 1.38057E-13 |
| 33154_at | PSMB4 (proteasome subunit beta 4) | 4.26805E-06 |
| 37988_at | CD79B (CD79B antigen immunoglobulin-associated beta) | 6.735E-07 |
| 32315_at | RPS24 (ribosomal protein S24) | 9.8389E-10 |
| 33374_at | C2 (complement component 2) | 0.000998743 |
| 32847_at | MYLK (Myosin light chain kinase, smooth muscle) | 0.000681112 |
| 754_s_at | BCR (Breakpoint cluster region protein) | 5.61522E-06 |
| 40701_at | USP13 (Ubiquitin carboxyl-terminal hydrolase) | 8.47601E-07 |
| 32579_at | SMARCA4 (Transcription activator BRG1) | 7.58503E-05 |
| 31797_at | TBPL1 (TBP-like 1) | 4.08536E-07 |
| 35775_at | SMYD2 ( | 8.45417E-06 |
| 31855_at | SRPX (Sushi repeat-containing protein SRPX) | 1.76652E-06 |
| 37508_f_at | FUBP3 (Far upstream element-binding protein 3) | 4.83968E-09 |
| 38242_at | SLP65 (B-cell linker protein) | 5.73938E-06 |
| 34322_r_at | FAM3C (Protein FAM3C) | 0.002052207 |
| 32541_at | PPP3CC (Serine/threonine-protein phosphatase 2B catalytic subunit gamma isoform) | 4.01276E-06 |
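The abstract reports 12 genes common to both metalearners. That overlap can be checked directly from the probe identifiers listed in the two feature tables, for example with a simple set intersection. The variable names below are illustrative; the identifiers are copied from the tables.

```python
# Probe identifiers from the correlation-KLR feature table.
correlation_features = {
    "39318_at", "1389_at", "31797_at", "1456_s_at", "37508_f_at",
    "37988_at", "38242_at", "32541_at", "34168_at", "32315_at",
    "266_s_at", "40701_at",
}

# Probe identifiers from the chi-squared-KLR feature table.
chi_squared_features = {
    "32872_at", "36239_at", "40505_at", "266_s_at", "34168_at",
    "35164_at", "1389_at", "1456_s_at", "39318_at", "33154_at",
    "37988_at", "32315_at", "33374_at", "32847_at", "754_s_at",
    "40701_at", "32579_at", "31797_at", "35775_at", "31855_at",
    "37508_f_at", "38242_at", "34322_r_at", "32541_at",
}

# Features selected by both metalearners.
common = correlation_features & chi_squared_features
print(len(common))  # 12
```

Every feature in the correlation-KLR table also appears in the chi-squared-KLR table, so the intersection is exactly the 12 features of the first table.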