Joaquim Carreras, Naoya Nakamura, Rifat Hamoudi.
Abstract
Mantle cell lymphoma (MCL) is a subtype of mature B-cell non-Hodgkin lymphoma characterized by a poor prognosis. First, we analyzed a series of 123 cases (GSE93291). An algorithm using multilayer perceptron artificial neural networks, radial basis function networks, gene set enrichment analysis (GSEA), and conventional statistics correlated 20,862 genes with 28 MCL prognostic genes for dimensionality reduction, to predict the patients' overall survival and highlight new markers. As a result, 58 genes predicted survival with high accuracy (area under the curve = 0.9). Further reduction identified 10 genes: KIF18A, YBX3, PEMT, GCNA, and POGLUT3, which were associated with poor survival; and SELENOP, AMOTL2, IGFBP7, KCTD12, and ADGRG2, which were associated with favorable survival. Correlation with the proliferation index (Ki67) was also assessed. Interestingly, these genes, which were related to cell cycle, apoptosis, and metabolism, also predicted the survival of diffuse large B-cell lymphoma (GSE10846, n = 414) and of a pan-cancer series of The Cancer Genome Atlas (TCGA, n = 7289), which included the most relevant cancers (lung, breast, colorectal, prostate, stomach, liver, etc.). Second, survival was predicted using 10 oncology panels (transcriptome, cancer progression and pathways, metabolic pathways, immuno-oncology, and host response), and TYMS was highlighted. Finally, among the machine learning techniques, the C5 tree and the Bayesian network had the highest prediction accuracy, and correlation with the LLMPP MCL35 proliferation assay and RGS1 was assessed. In conclusion, artificial intelligence analysis predicted the overall survival of MCL with high accuracy, and highlighted genes that predicted the survival of a large pan-cancer series.
Keywords: MCL35 assay; artificial intelligence; artificial neural network; deep learning; gene expression; immuno-oncology; machine learning; mantle cell lymphoma; multilayer perceptron; overall survival
Year: 2022 PMID: 35052318 PMCID: PMC8775707 DOI: 10.3390/healthcare10010155
Source DB: PubMed Journal: Healthcare (Basel) ISSN: 2227-9032
Prognostic and pathogenic genes of mantle cell lymphoma.
| Genes ( |
|---|
Eighty-six genes with predictive and pathogenic role in MCL were selected from the literature. These genes were later tested for overall survival in the GSE93291 series. Only significant ones were chosen for the neural network analysis.
Pathogenic genes of mantle cell lymphoma (GSE93291 series) (Method 1).
| Gene | Keyword | Function | Beta | P | HR |
|---|---|---|---|---|---|
| | Apoptosis | B-cell apoptotic process | 1.0 | <0.01 | 2.7 |
| | Regulation of gene expression | Component of the Polycomb group (PcG) multiprotein PRC1-like complex, negative regulation of gene expression, epigenetic | −0.5 | 0.042 | 0.6 |
| | Lysosomes | BORC complex, role in lysosomes movement and localization at the cell periphery | −1.0 | <0.01 | 0.4 |
| | Cell cycle | Positive regulation of G1/S transition of the mitotic cell cycle | 1.1 | <0.01 | 3.1 |
| | Cell cycle, apoptosis | Positive regulation of G1/S transition of the mitotic cell cycle, negative regulation of apoptosis | −0.7 | 0.018 | 0.5 |
| | Cell cycle, apoptosis | Negative regulation of G1/S transition of the mitotic cell cycle, positive regulation of apoptotic process | 1.4 | <0.01 | 4.0 |
| | Cell cycle, NF-kB, apoptosis | Negative regulation of G1/S transition of the mitotic cell cycle, negative regulation of NF-kB, positive regulation of apoptotic process | 1.0 | <0.01 | 2.7 |
| | Cell cycle | Negative regulation of G1/S transition of the mitotic cell cycle | 1.0 | <0.01 | 2.8 |
| | Cell cycle, DNA repair, apoptosis | Positive regulation of cell cycle, DNA damage checkpoint and repair, apoptosis | 1.1 | <0.01 | 3.0 |
| | Cell cycle, DNA repair, apoptosis | Positive regulation of cell cycle, DNA damage checkpoint and repair, apoptosis | 0.8 | <0.01 | 2.1 |
| | Chemotaxis, apoptosis | Cell chemotaxis, defense response, negative regulation of apoptotic process, DNA damage | −0.6 | 0.014 | 0.5 |
| | Cell differentiation and proliferation | Cell differentiation, cell proliferation, positive regulation of mRNA splicing | 0.8 | 0.016 | 2.3 |
| | Cell cycle | Negative regulation of cell growth, cooperates with TP53 | −1.1 | <0.01 | 0.3 |
| | Cell proliferation | rRNA transcription | 1.5 | <0.01 | 4.4 |
| | Cell proliferation | Transcription factor that binds DNA and activates transcription of growth-related genes (positive regulation of gene expression), negative regulation of apoptotic process | 0.9 | <0.01 | 2.5 |
| | Gene expression | Regulation of gene expression, DNA-binding | −0.5 | 0.052 | 0.6 |
| | Multiple negative regulations | Affects the implementation of differentiation, proliferation, angiogenesis, and apoptotic programs. Multiple negative regulations | −0.8 | <0.01 | 0.5 |
| | Multiple regulations | Affects the implementation of differentiation, proliferation and apoptotic programs | 0.6 | 0.020 | 1.8 |
| | B-cell development | Histone methyltransferase, B-cell development (B1), and B2 activation, humoral immune response, isotype class switch recombination, germinal center formation | 1.0 | <0.01 | 2.7 |
| | B-cell development | The commitment of lymphoid progenitors to B-lymphocyte lineage, promotes development of the mature B-cell stage. | −0.7 | 0.010 | 0.5 |
| | ERBB2 signaling, apoptosis | Cell migration, ERBB2 signaling pathway, negative regulation of apoptosis | 0.5 | 0.042 | 1.7 |
| | B-cell development and function | Mediates immune responses. Contributes to B-cell development, proliferation, migration, and function. Required for B-cell receptor (BCR) signaling | 0.5 | 0.025 | 1.7 |
| | Cell cycle, tumor suppressor gene | Negative regulation of G1/S transition of the mitotic cell cycle | −0.8 | 0.012 | 0.5 |
| | Multiple regulations | Regulation of cell migration, adhesion, cell cycle progression, cell proliferation, apoptosis, MAPK/ERK1 pathway, MDM2 and TP53 recruitment | 0.5 | 0.035 | 1.7 |
| | Cell cycle, tumor suppressor gene | Tumor suppressor that is a key regulator of the G1/S transition of the cell cycle | −0.5 | 0.043 | 0.6 |
| | Cytoskeleton | Cytoskeleton-nuclear membrane anchor activity, maintaining of subcellular spatial organization | −0.6 | <0.01 | 0.5 |
| | Telomerase, multiple functions | Telomerase, negative regulation apoptosis, positive regulation G1/S transition of the mitotic cell cycle, negative regulation of gene expression | 0.7 | <0.01 | 2.0 |
| | Multiple functions, regulation of caspases and apoptosis | Multi-functional protein that regulates not only caspases and apoptosis, but also modulates inflammatory signaling and immunity, copper homeostasis, mitogenic kinase signaling, cell proliferation, as well as cell invasion and metastasis | −0.8 | <0.01 | 0.5 |
From an initial set of 86 genes with a known pathogenic role in MCL, a final set of 28 genes was selected because of their predictive value for overall survival, assessed with Kaplan–Meier analysis and the log-rank test in the GSE93291 series. P, p value; HR, hazard ratio. The gene information is based on UniProt [54] and GeneCards [55].
Figure 1. General architecture for multilayer perceptron (MLP) networks. A neural network is a set of non-linear data modeling tools consisting of an input layer, one or two hidden layers, and an output layer. The multilayer perceptron procedure is a feedforward architecture. Compared with RBF, the MLP can find more complex relationships, but it is slower to compute. The MLP network is a function of one or more predictors (also called inputs or independent variables) that minimizes the prediction error of one or more target variables (also called outputs) [32,33,60].
Figure 2. General architecture for radial basis function (RBF) networks. A radial basis function (RBF) network is a feed-forward, supervised learning network with only one hidden layer, called the radial basis function layer [32,33,60].
Figure 3. Sensitivity analysis (independent variable importance analysis). The sensitivity analysis computes the importance of each predictor in determining the neural network [32,33,60].
Figure 4. Summary of the analysis methodology. The analysis comprised two methods, one based on the analysis of 20,862 genes and a second based on 10 immuno-oncology panels. This research used artificial neural networks and several machine learning techniques to identify genes associated with the overall survival of the patients. Correlation with known MCL pathogenic genes and the LLMPP MCL35 proliferation assay was also assessed.
Figure 5. Artificial neural network analysis for the prediction of the overall survival of mantle cell lymphoma (Method 1). Starting from 20,862 genes, several neural networks, together with a correlation between the overall survival outcome and several mantle cell lymphoma pathogenic genes, reduced the input to a final set of 10 genes. These 10 genes correlated with the survival of the patients and also with the proliferation index as expressed by the MKI67 gene. MLP, multilayer perceptron; RBF, radial basis function; OS, overall survival; DA, dead/alive; GSEA, gene set enrichment analysis; AUC, area under the curve.
Figure 6. Multilayer perceptron analysis using the selected 58 genes (Method 1 continuation). As shown in Figure 4, the neural networks reduced the initial input of 20,862 genes to 58 predictive genes. Next, the overall survival outcome (dead/alive) was predicted using the 58 genes and a neural network. Several parameters display the network performance: model summary; classification results; receiver operating characteristic (ROC) curve; cumulative gains chart; lift chart; predicted-by-observed chart; and the independent variable importance analysis. The ROC analysis displays a curve for each category of the categorical dependent variable, and the area under each curve [34,35,36,44,45,55,56]. The genes were ranked according to their normalized importance for predicting the overall survival outcome as a dichotomous variable (dead vs. alive). A GSEA analysis confirmed the association toward a dead outcome. The characteristics of the network were as follows. Case processing: training n = 93 (76%); testing n = 30 (24%). Units n = 58. Rescaling = standardized. Hidden layer: number = 1; units = 2; activation function = hyperbolic tangent. Output layer: dependent variables = 1 (overall survival outcome dead/alive); units = 2; activation function = softmax; error function = cross-entropy. Model summary: training, cross-entropy error = 30.8, 14% of incorrect predictions; testing, cross-entropy error = 14.5, 23% of incorrect predictions. Classification: training, 86% overall correct (93.8% alive, 82% dead); testing, 77% overall correct (82% alive, 74% dead). Area under the curve = 0.9. The top 10 most relevant genes were RAB13, ZFYVE19, FANCG, KIF18A, RPGRIP1L, YBX3, ZCCHC4, NCLN, OLFM1, and PDZRN3. A complete description of the multilayer perceptron is available in our recent publication (Carreras J. et al. Artificial Neural Networks Predicted the Overall Survival and Molecular Subtypes of Diffuse Large B-Cell Lymphoma Using a Pan-cancer Immune-Oncology Panel. Cancers 2021, 13, 6384; https://doi.org/10.3390/cancers13246384) [58].
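The network configuration listed in the Figure 6 caption (58 standardized inputs, one hidden layer with two hyperbolic tangent units, a two-unit softmax output, and a cross-entropy error) can be illustrated with a single forward pass. This is only a minimal sketch: the weights are random placeholders rather than the fitted values, the input vector is simulated, and the function names `mlp_forward` and `cross_entropy` are hypothetical.

```python
import math
import random

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: inputs -> tanh hidden layer -> softmax output."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(w_out, b_out)]
    # Numerically stable softmax over the output units
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_class):
    """Cross-entropy error for one case with a known class index."""
    return -math.log(probs[true_class])

random.seed(0)
n_inputs, n_hidden, n_classes = 58, 2, 2  # sizes taken from the caption

# Placeholder weights (assumption): a trained network would supply these.
w_hidden = [[random.gauss(0, 0.1) for _ in range(n_inputs)] for _ in range(n_hidden)]
b_hidden = [0.0] * n_hidden
w_out = [[random.gauss(0, 0.1) for _ in range(n_hidden)] for _ in range(n_classes)]
b_out = [0.0] * n_classes

# Simulated standardized expression values for one hypothetical patient
x = [random.gauss(0, 1) for _ in range(n_inputs)]

probs = mlp_forward(x, w_hidden, b_hidden, w_out, b_out)  # [P(class 0), P(class 1)]
loss = cross_entropy(probs, true_class=1)
```

Summing `cross_entropy` over all cases in a sample gives a total of the kind reported as the cross-entropy error in the model summary.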
Figure 7. Overall survival analysis (Method 1 continuation). As a result of the neural network analysis and dimensionality reduction (Figure 4 and Figure 5), a final set of 10 genes associated with overall survival was highlighted. These genes correlated not only with the clinical outcome but also with the proliferation index, as expressed by MKI67. Of note, Ki67 is a marker routinely used for prediction in mantle cell lymphoma, and the most relevant marker of the LLMPP MCL35 proliferation assay.
Figure 8. Artificial neural network analysis for the prediction of the overall survival of mantle cell lymphoma using several immuno-oncology panels (Method 2). Overall survival was predicted using 10 immuno-oncology panels. After several multilayer perceptron analyses, a set of 125 genes predicted the overall survival outcome (dead/alive) with high accuracy. Among the most relevant genes, TYMS was highlighted. The GSEA analysis had a sinusoidal-like pattern, with some genes enriched toward the dead outcome and others toward the alive outcome.
Figure 9. Overall survival in a pan-cancer series. The multilayer perceptron using the 20,862 genes identified a final set of 19 genes with prognostic value in mantle cell lymphoma. Starting from the gene expression of this set of 19 genes and using a risk-score formula [36,46], we confirmed that these genes also contributed to the overall survival of diffuse large B-cell lymphoma (DLBCL). Additionally, these genes could also predict the overall survival of a pan-cancer series of 7289 cases from The Cancer Genome Atlas (TCGA) program that included the most frequent human cancers. Of note, the weight and direction of the overall survival association were different in each subtype of neoplasia. Risk scores were calculated by multiplying the beta value of each gene from the multivariate Cox regression analysis for overall survival by the expression value of the corresponding gene, as previously described [58].
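The risk-score calculation described in the Figure 9 caption (Cox beta multiplied by gene expression, combined across genes) can be sketched as follows. The gene names, beta values, expression values, and the cut-off of 0 used to split low-risk from high-risk cases are all hypothetical; summing the per-gene products into one score is the usual form of such a formula and is assumed here, since the caption does not spell it out.

```python
def risk_score(betas, expression):
    """Sum of (Cox beta x expression) over the genes in the signature."""
    return sum(betas[gene] * expression[gene] for gene in betas)

def risk_group(score, cutoff):
    """Assign a case to a risk group using a chosen cut-off (assumption)."""
    return "high-risk" if score > cutoff else "low-risk"

# Hypothetical two-gene signature: one adverse gene, one favorable gene
betas = {"GENE_A": 0.5, "GENE_B": -1.0}
patient = {"GENE_A": 2.0, "GENE_B": 3.0}

score = risk_score(betas, patient)   # 0.5*2.0 + (-1.0)*3.0 = -2.0
group = risk_group(score, cutoff=0.0)
```

A positive beta raises the score as expression increases, and a negative beta lowers it, which is how genes with opposite prognostic directions are combined into one value.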
Figure 10. Overall survival in a pan-cancer series.
Figure 11. Bayesian network. A Bayesian network successfully modeled the overall survival outcome (dead/alive) using the 19 genes previously identified in the neural network analysis (Figure 5, Method 1). The Bayesian network enables you to build a probability model by combining observed and recorded evidence with “common-sense” real-world knowledge to establish the likelihood of occurrences by using seemingly unlinked attributes. The node focuses on Tree Augmented Naïve Bayes (TAN) and Markov Blanket networks, which are primarily used for classification. This graphical model shows the variables (nodes) and the probabilistic, or conditional, independencies between them. The links of the network (arcs) may represent causal relationships, but they do not necessarily represent direct cause and effect. This Bayesian network is used to calculate the probability of a patient being alive or dead, given the expression of the 19 genes, if the probabilistic independencies between the gene expression and the overall survival outcome displayed on the graph hold true. Bayesian networks are robust to missing data.
Figure 12. C5.0 decision tree model. A decision tree successfully modeled the overall survival outcome (dead/alive) using the 19 genes previously identified in the neural network analysis (Figure 5, Method 1). This model uses the C5.0 algorithm to build either a decision tree or a rule set. A C5.0 model works by splitting the sample based on the field that provides the maximum information gain. Each subsample defined by the first split is then split again, usually based on a different field, and the process repeats until the subsamples cannot be split any further. Finally, the lowest-level splits are reexamined, and those that do not contribute significantly to the value of the model are removed. In this model, the target field (variable) must be categorical (i.e., nominal or ordinal, such as the overall survival outcome as dead vs. alive). The input fields (predictors) can be of any type (in our analysis, the 19 genes were entered as quantitative gene expression). C5.0 models are quite robust in the presence of problems such as missing data and large numbers of input fields. The C5.0 tree shows that the overall survival outcome (dead or alive) can be predicted with high accuracy using the expression of only 9 genes.
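The splitting criterion mentioned in the Figure 12 caption, choosing the split with the maximum information gain, can be illustrated for a single quantitative predictor. This is a simplified sketch, not the C5.0 implementation, which adds refinements such as pruning that are not shown; the expression values and labels are invented, and the helper names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels, threshold):
    """Entropy reduction from splitting cases at value <= threshold."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing gains nothing
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted

def best_threshold(values, labels):
    """Candidate threshold with the largest information gain (None if no split)."""
    candidates = sorted(set(values))[:-1]
    return max(candidates,
               key=lambda t: information_gain(values, labels, t),
               default=None)

# Invented expression values for one gene and the observed outcomes
expr = [1.0, 2.0, 3.0, 4.0]
outcome = ["alive", "alive", "dead", "dead"]

split = best_threshold(expr, outcome)
gain = information_gain(expr, outcome, split)
```

In a full tree, the same search would be repeated over every input field and then recursively within each resulting subsample.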
Figure 13. Addition of the MCL35 proliferation signature in a Bayesian network. Bayesian network modeling was performed using the highlighted genes of both Method 1 (19 genes) and Method 2 (15 genes) together with the MCL35 signature, the prognostic genes of MCL previously identified by the LLMPP. Some of the most relevant genes are highlighted (red for poor-prognosis genes, green for good-prognosis genes), together with their interrelationships (arrows).
Figure 14. Overall survival according to the immunohistochemical expression of RGS1.
Figure A1. Differential gene expression of the set of 19 genes per cancer subtype. Based on a risk-score formula and the gene expression of 19 genes, the overall survival for each risk group could be calculated. The contribution of each gene to the prognosis is shown on the right. This figure is complementary to Figure 9.
Multilayer Perceptron Neural Network Analysis of Mantle Cell Lymphoma (Method 1).
| Gene | Num. Genes Top 70% | Training Cases (Num.) | Training Cases (%) | Testing Cases (Num.) | Testing Cases (%) | Input Units | Hidden Num. | Hidden Units | Output Num. | Output Units | Training Cross Entropy Error | Training Incorrect Predictions % | Training Time | Testing Cross Entropy Error | Testing Incorrect Predictions % | Training % Correct (Observed 0) | Training % Correct (Observed 1) | Training % Correct (Overall) | Testing % Correct (Observed 0) | Testing % Correct (Observed 1) | Testing % Correct (Overall) | Area under the Curve (AUC) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Dead/Alive | 80 | 84 | 68.3 | 39 | 31.7 | 20863 | 1 | 6 | 1 | 2 | 38.2 | 21.4 | 01:04.9 | 10.4 | 12.8 | 67.6 | 86 | 78.6 | 88.9 | 86.7 | 87.2 | 0.90 |
| | 6 | 90 | 73.2 | 33 | 26.8 | 20862 | 1 | 12 | 1 | 2 | 38.5 | 18.9 | 01:05.8 | 8.8 | 9.1 | 59.3 | 90.5 | 81.1 | 66.7 | 96.3 | 90.9 | 0.86 |
| | 80 | 87 | 70.7 | 36 | 29.3 | 20862 | 1 | 11 | 1 | 2 | 32.0 | 14.9 | 01:06.3 | 6.4 | 5.6 | 64 | 93.5 | 85.1 | 83.3 | 96.7 | 94.4 | 0.92 |
| | 154 | 85 | 69.1 | 38 | 30.9 | 20862 | 1 | 8 | 1 | 2 | 37.5 | 27.1 | 01:01.5 | 14.4 | 13.2 | 36.4 | 85.7 | 72.9 | 66.7 | 93.1 | 86.8 | 0.82 |
| | 56 | 87 | 70.7 | 36 | 29.3 | 20862 | 1 | 8 | 1 | 2 | 40.5 | 19.5 | 00:57.4 | 10.1 | 8.3 | 44 | 95.2 | 80.5 | 83.3 | 93.3 | 91.7 | 0.83 |
| | 20 | 84 | 68.3 | 39 | 31.7 | 20862 | 1 | 9 | 1 | 2 | 29.9 | 20.2 | 00:58.2 | 11.8 | 17.9 | 92.3 | 36.8 | 79.8 | 93.1 | 50 | 82.1 | 0.90 |
| | 47 | 87 | 70.7 | 36 | 29.3 | 20862 | 1 | 11 | 1 | 2 | 30.4 | 13.8 | 00:51.2 | 13.8 | 22.2 | 91.3 | 66.7 | 86.2 | 100 | 27.3 | 77.8 | 0.89 |
| | 25 | 93 | 75.6 | 30 | 24.4 | 20862 | 1 | 8 | 1 | 2 | 53.0 | 26.9 | 00:56.3 | 13.2 | 16.7 | 71.7 | 74.5 | 73.1 | 93.8 | 71.4 | 83.3 | 0.81 |
| | 94 | 76 | 61.8 | 47 | 38.2 | 20862 | 1 | 10 | 1 | 2 | 36.3 | 17.1 | 00:52.7 | 22.7 | 27.7 | 50 | 93.1 | 82.9 | 30.8 | 88.2 | 72.3 | 0.76 |
| | 38 | 91 | 74 | 32 | 26 | 20862 | 1 | 9 | 1 | 2 | 43.0 | 20.9 | 01:04.7 | 15.1 | 15.6 | 82.4 | 75 | 79.1 | 91.7 | 80 | 84.4 | 0.86 |
| | 6 | 93 | 75.6 | 30 | 24.4 | 20862 | 1 | 13 | 1 | 2 | 40.2 | 16.1 | 01:07.3 | 7.9 | 10 | 97.1 | 43.5 | 83.9 | 91.3 | 85.7 | 90 | 0.85 |
| | 4 | 76 | 61.8 | 47 | 38.2 | 20862 | 1 | 10 | 1 | 2 | 26.4 | 13.2 | 00:52.4 | 17.7 | 12.8 | 94.8 | 61.1 | 86.8 | 94.3 | 66.7 | 87.2 | 0.88 |
| | 86 | 91 | 74 | 32 | 26 | 20862 | 1 | 9 | 1 | 2 | 45.3 | 27.5 | 00:58.7 | 12.9 | 18.8 | 68.8 | 76.7 | 72.5 | 92.9 | 72.2 | 81.3 | 0.85 |
| | 8 | 90 | 73.2 | 33 | 26.8 | 20862 | 1 | 10 | 1 | 2 | 39.8 | 18.9 | 01:07.6 | 13.0 | 15.2 | 77.3 | 84.8 | 81.1 | 83.3 | 86.7 | 84.8 | 0.88 |
| | 50 | 82 | 66.7 | 41 | 33.3 | 20862 | 1 | 10 | 1 | 2 | 17.6 | 11.0 | 01:08.1 | 14.6 | 14.6 | 90.9 | 86.8 | 89 | 90.9 | 78.9 | 85.4 | 0.96 |
| | 22 | 85 | 69.1 | 38 | 30.9 | 20862 | 1 | 12 | 1 | 2 | 40.2 | 18.8 | 00:49.9 | 17.7 | 23.7 | 83.7 | 78.6 | 81.2 | 85.7 | 64.7 | 76.3 | 0.87 |
| | 23 | 88 | 71.5 | 35 | 28.5 | 20862 | 1 | 7 | 1 | 2 | 45.3 | 27.3 | 00:55.2 | 13.0 | 8.6 | 20 | 93.7 | 72.7 | 50 | 100 | 91.4 | 0.75 |
| | 12 | 71 | 57.7 | 52 | 42.3 | 20862 | 1 | 5 | 1 | 2 | 29.9 | 19.7 | 00:50.1 | 24.2 | 23.1 | 92.6 | 41.2 | 80.3 | 94.9 | 23.1 | 76.9 | 0.82 |
| | 12 | 85 | 69.1 | 38 | 30.9 | 20862 | 1 | 11 | 1 | 2 | 39.2 | 21.2 | 00:53.3 | 11.6 | 10.5 | 40.9 | 92.1 | 78.8 | 55.6 | 100 | 89.5 | 0.83 |
| | 86 | 84 | 68.3 | 39 | 31.7 | 20862 | 1 | 10 | 1 | 2 | 36.0 | 20.2 | 00:57.0 | 12.2 | 7.7 | 92.1 | 42.9 | 79.8 | 93.3 | 88.9 | 92.3 | 0.85 |
| | 10 | 84 | 68.3 | 39 | 31.7 | 20862 | 1 | 9 | 1 | 2 | 28.9 | 16.7 | 00:56.2 | 14.2 | 20.5 | 87.7 | 68.4 | 83.3 | 96.4 | 36.4 | 79.5 | 0.90 |
| | 23 | 87 | 70.7 | 36 | 29.3 | 20862 | 1 | 8 | 1 | 2 | 38.3 | 23.0 | 01:03.5 | 6.7 | 2.8 | 92.3 | 31.8 | 77 | 96.4 | 100 | 97.2 | 0.89 |
| | 2 | 93 | 75.6 | 30 | 24.4 | 20862 | 1 | 10 | 1 | 2 | 40.2 | 20.4 | 01:04.6 | 11.7 | 16.7 | 78 | 81.4 | 79.6 | 85.7 | 81.3 | 83.3 | 0.89 |
| | 46 | 76 | 61.8 | 47 | 38.2 | 20862 | 1 | 9 | 1 | 2 | 32.4 | 21.1 | 00:54.9 | 17.7 | 14.9 | 90.7 | 50 | 78.9 | 92.3 | 50 | 85.1 | 0.84 |
| | 112 | 91 | 74 | 32 | 26 | 20862 | 1 | 14 | 1 | 2 | 22.0 | 9.9 | 00:53.6 | 11.3 | 21.9 | 94.4 | 73.7 | 90.1 | 91.3 | 44.4 | 78.1 | 0.93 |
| | 6 | 90 | 73.2 | 33 | 26.8 | 20862 | 1 | 8 | 1 | 2 | 46.7 | 26.7 | 00:58.1 | 13.5 | 15.2 | 67.4 | 78.7 | 73.3 | 89.5 | 78.6 | 84.8 | 0.85 |
| | 205 | 82 | 66.7 | 41 | 33.3 | 20862 | 1 | 9 | 1 | 2 | 34.6 | 20.7 | 01:00.8 | 14.9 | 19.5 | 93.7 | 31.6 | 79.3 | 93.3 | 45.5 | 80.5 | 0.85 |
| | 15 | 85 | 69.1 | 38 | 30.9 | 20862 | 1 | 11 | 1 | 2 | 32.4 | 17.6 | 00:49.1 | 16.3 | 21.1 | 88.2 | 58.8 | 82.4 | 88.5 | 58.3 | 78.9 | 0.85 |
| | 47 | 88 | 71.5 | 35 | 28.5 | 20862 | 1 | 12 | 1 | 2 | 48.9 | 27.3 | 00:56.3 | 14.3 | 17.1 | 65.1 | 80 | 72.7 | 78.9 | 87.5 | 82.9 | 0.83 |
| | 18 | 91 | 74 | 32 | 26 | 20835 | 1 | 8 | 29 | 58 | 1348.9 | 25.7 | 01:22.2 | 525.3 | 29.4 | - | - | 74.3 | - | - | 70.6 | - |
| Average | | 85.9 | 70.1 | 37.1 | 30.2 | 20861 | 1 | 9.6 | - | - | 80.4 | 20.1 | - | 30.6 | 15.8 | 75.0 | 70.8 | 79.9 | 84.2 | 73.5 | 84.2 | 0.9 |
Input layer: standardized rescaling method for covariates. Hidden layer: hyperbolic tangent activation function. Output layer: softmax activation function, cross-entropy error function. Model summary, training, one consecutive step(s) with no decrease in error (error computations are based on the testing sample) as stopping rule.
Radial Basis Function Neural Network Analysis of Mantle Cell Lymphoma (Method 1).
| Gene | Num. Genes Top 70% | Training Cases (Num.) | Training Cases (%) | Testing Cases (Num.) | Testing Cases (%) | Input Units | Hidden Num. | Hidden Units | Output Num. | Output Units | Training Sum of Squares Error | Training Incorrect Predictions % | Training Time | Testing Sum of Squares Error | Testing Incorrect Predictions % | Training % Correct (Observed 0) | Training % Correct (Observed 1) | Training % Correct (Overall) | Testing % Correct (Observed 0) | Testing % Correct (Observed 1) | Testing % Correct (Overall) | Area under the Curve (AUC) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Dead/Alive | 37 | 92 | 74.8 | 31 | 25.2 | 20863 | 1 | 8 | 1 | 2 | 16.9 | 27.2 | 04:13.3 | 6.7 | 38.7 | 45.5 | 88.1 | 72.8 | 10.0 | 85.7 | 61.3 | 0.73 |
| | 18 | 85 | 69.1 | 38 | 30.9 | 20862 | 1 | 8 | 1 | 2 | 10.4 | 17.6 | 02:46.3 | 7.4 | 23.7 | 40.9 | 96.8 | 82.4 | 27.3 | 96.3 | 76.3 | 0.79 |
| | 28 | 80 | 65 | 43 | 35 | 20862 | 1 | 6 | 1 | 2 | 8.2 | 16.3 | 02:24.1 | 3.1 | 9.3 | 81.8 | 84.5 | 83.8 | 100.0 | 88.2 | 90.7 | 0.93 |
| | 48 | 82 | 66.7 | 41 | 33.3 | 20862 | 1 | 6 | 1 | 2 | 11.1 | 20.7 | 02:32.2 | 7.4 | 31.7 | 30.0 | 95.2 | 79.3 | 9.1 | 90.0 | 68.3 | 0.78 |
| | 50 | 82 | 66.7 | 41 | 33.3 | 20862 | 1 | 5 | 1 | 2 | 12.7 | 22.0 | 02:39.9 | 8.2 | 26.8 | 10.0 | 100.0 | 78.0 | 0.0 | 100.0 | 73.2 | 0.74 |
| | 29 | 92 | 74.8 | 31 | 25.2 | 20862 | 1 | 10 | 1 | 2 | 11.7 | 15.2 | 03:18.6 | 4.9 | 25.8 | 98.6 | 35.0 | 84.8 | 100.0 | 11.1 | 74.2 | 0.80 |
| | 16 | 82 | 66.7 | 41 | 33.3 | 20862 | 1 | 10 | 1 | 2 | 11.4 | 20.7 | 02:21.8 | 4.9 | 17.1 | 98.3 | 27.3 | 79.3 | 100.0 | 0.0 | 82.9 | 0.83 |
| | 41 | 90 | 73.2 | 33 | 26.8 | 20862 | 1 | 5 | 1 | 2 | 20.0 | 34.4 | 03:21.6 | 7.4 | 39.4 | 77.6 | 51.2 | 65.6 | 100.0 | 35.0 | 60.6 | 0.70 |
| | 40 | 79 | 64.2 | 44 | 35.8 | 20862 | 1 | 4 | 1 | 2 | 14.8 | 26.6 | 02:14.7 | 7.6 | 22.7 | 0.0 | 100.0 | 73.4 | 0.0 | 100.0 | 77.3 | 0.60 |
| | 39 | 92 | 74.8 | 31 | 25.2 | 20862 | 1 | 10 | 1 | 2 | 13.6 | 20.7 | 03:11.6 | 4.1 | 9.7 | 85.7 | 72.1 | 79.3 | 85.7 | 94.1 | 90.3 | 0.88 |
| | 19 | 90 | 73.2 | 33 | 26.8 | 20862 | 1 | 3 | 1 | 2 | 16.2 | 24.4 | 03:15.7 | 5.8 | 24.2 | 100.0 | 0.0 | 75.6 | 100.0 | 0.0 | 75.8 | 0.64 |
| | 46 | 79 | 64.2 | 44 | 35.8 | 20862 | 1 | 8 | 1 | 2 | 12.5 | 24.1 | 02:23.1 | 7.7 | 25.0 | 93.3 | 21.1 | 75.9 | 100.0 | 0.0 | 75.0 | 0.74 |
| | 51 | 92 | 74.8 | 31 | 25.2 | 20862 | 1 | 8 | 1 | 2 | 16.4 | 26.1 | 03:12.5 | 7.0 | 41.9 | 78.6 | 70.0 | 73.9 | 50.0 | 72.7 | 58.1 | 0.80 |
| | 80 | 88 | 71.5 | 35 | 28.5 | 20862 | 1 | 9 | 1 | 2 | 13.5 | 25.0 | 02:57.1 | 5.9 | 22.9 | 59.1 | 90.9 | 75.0 | 66.7 | 88.2 | 77.1 | 0.86 |
| | 47 | 79 | 64.2 | 44 | 35.8 | 20862 | 1 | 3 | 1 | 2 | 12.1 | 20.3 | 02:15.3 | 8.0 | 27.3 | 66.7 | 90.7 | 79.7 | 63.3 | 92.9 | 72.9 | 0.83 |
| | 89 | 79 | 64.2 | 44 | 35.8 | 20862 | 1 | 8 | 1 | 2 | 10.7 | 17.7 | 02:20.4 | 11.0 | 43.2 | 88.4 | 75.0 | 82.3 | 66.7 | 47.8 | 56.8 | 0.80 |
| | 81 | 89 | 72.4 | 34 | 27.6 | 20862 | 1 | 9 | 1 | 2 | 14.5 | 24.7 | 02:55.3 | 6.0 | 26.5 | 13.0 | 97.0 | 75.3 | 0.0 | 96.2 | 73.5 | 0.71 |
| | 28 | 88 | 71.5 | 35 | 28.5 | 20862 | 1 | 8 | 1 | 2 | 10.9 | 14.8 | 02:51.2 | 4.1 | 14.3 | 100.0 | 43.5 | 85.2 | 96.4 | 42.9 | 85.7 | 0.86 |
| | 41 | 86 | 69.9 | 37 | 30.1 | 20862 | 1 | 3 | 1 | 2 | 13.8 | 23.3 | 02:45.9 | 5.8 | 18.9 | 19.0 | 95.4 | 76.7 | 30.0 | 100.0 | 81.1 | 0.76 |
| | 23 | 92 | 74.8 | 31 | 25.2 | 20862 | 1 | 7 | 1 | 2 | 11.1 | 16.3 | 03:14.2 | 3.5 | 12.9 | 95.4 | 55.6 | 83.7 | 92.9 | 33.3 | 87.1 | 0.84 |
| | 18 | 92 | 74.8 | 31 | 25.2 | 20862 | 1 | 9 | 1 | 2 | 9.8 | 16.3 | 03:31.2 | 4.1 | 25.8 | 91.8 | 52.6 | 83.7 | 95.0 | 36.4 | 74.2 | 0.90 |
| | 42 | 82 | 66.7 | 41 | 33.3 | 20862 | 1 | 10 | 1 | 2 | 11.2 | 19.5 | 02:29.4 | 6.0 | 26.8 | 88.3 | 59.1 | 80.5 | 87.9 | 12.5 | 73.2 | 0.81 |
| | 37 | 90 | 73.2 | 33 | 26.8 | 20862 | 1 | 10 | 1 | 2 | 12.6 | 21.1 | 03:00.8 | 5.0 | 21.2 | 88.0 | 67.5 | 78.9 | 78.6 | 78.9 | 78.8 | 0.89 |
| | 40 | 79 | 64.2 | 44 | 35.8 | 20862 | 1 | 4 | 1 | 2 | 12.3 | 24.1 | 02:14.5 | 7.6 | 25.0 | 100.0 | 0.0 | 75.9 | 100.0 | 0.0 | 75.0 | 0.74 |
| | 56 | 92 | 74.8 | 31 | 25.2 | 20862 | 1 | 6 | 1 | 2 | 14.1 | 20.7 | 03:02.7 | 5.0 | 25.8 | 97.2 | 15.0 | 79.3 | 100.0 | 0.0 | 74.2 | 0.73 |
| | 34 | 88 | 71.5 | 35 | 28.5 | 20862 | 1 | 9 | 1 | 2 | 17.6 | 21.6 | 02:50.9 | 8.9 | 34.3 | 86.8 | 72.0 | 78.4 | 58.3 | 81.8 | 65.7 | 0.78 |
| | 58 | 79 | 64.2 | 44 | 35.8 | 20862 | 1 | 10 | 1 | 2 | 10.3 | 17.7 | 02:17.2 | 10.0 | 27.3 | 93.7 | 37.5 | 82.3 | 100.0 | 14.3 | 72.7 | 0.71 |
| | 71 | 79 | 64.2 | 44 | 35.8 | 20862 | 1 | 3 | 1 | 2 | 12.4 | 22.8 | 02:14.6 | 7.3 | 25.0 | 100.0 | 0.0 | 77.2 | 100.0 | 0.0 | 75.0 | 0.74 |
| | 87 | 89 | 72.4 | 34 | 27.6 | 20862 | 1 | 2 | 1 | 2 | 22.2 | 47.2 | 02:55.3 | 8.7 | 55.9 | 100.0 | 0.0 | 52.8 | 100.0 | 0.0 | 44.1 | 0.49 |
| | 87 | 93 | 75.6 | 30 | 24.4 | 20835 | 1 | 14 | 29 | 58 | 366.4 | 20.4 | 09:53.4 | 147.2 | 23.7 | - | - | 79.6 | - | - | 76.3 | - |
| Average | | 86.0 | 69.9 | 37.0 | 30.1 | 20861 | 1 | 7.2 | | | 25.0 | 22.3 | | 11.2 | 26.4 | 73.4 | 58.4 | 77.7 | 69.6 | 51.7 | 73.6 | 0.77 |
Input layer: standardized rescaling method for covariates. Hidden layer: softmax activation function. Output layer: identity activation function, sum of squares error function. Model summary, testing, sum of square error (the number of hidden units is determined by the testing data criterion: The “best” number of hidden units is the one that yields the smallest error in the testing data).
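The RBF configuration in the note above (softmax activation in the hidden layer, identity activation in the output layer) corresponds to what is often called a normalized radial basis function network. The sketch below shows one forward pass under that reading; the centers, the shared width, and the output weights are invented for illustration, and the name `rbf_forward` is hypothetical.

```python
import math

def rbf_forward(x, centers, width, w_out, b_out):
    """Normalized RBF hidden layer (softmax over Gaussian scores) + linear output."""
    # Gaussian score of the input against each center: -||x - c||^2 / (2 * width^2)
    scores = [-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2 * width ** 2)
              for c in centers]
    # Softmax normalization makes the hidden activations sum to 1
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    hidden = [e / total for e in exps]
    # Identity activation: the output is a plain weighted sum of hidden units
    outputs = [sum(w * h for w, h in zip(row, hidden)) + b
               for row, b in zip(w_out, b_out)]
    return hidden, outputs

# Invented two-dimensional example with two hidden units
centers = [[0.0, 0.0], [3.0, 3.0]]
w_out = [[1.0, 0.0], [0.0, 1.0]]   # placeholder output weights
b_out = [0.0, 0.0]

hidden, outputs = rbf_forward([0.0, 0.0], centers, width=1.0,
                              w_out=w_out, b_out=b_out)
```

Because the input coincides with the first center, the first hidden unit dominates, and with the placeholder identity weights the outputs simply repeat the hidden activations.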
Multivariate Cox regression analysis for predicting overall survival outcome (Method 1).
| Num | Gene | B | SE | Wald | df | Sig. | HR | 95.0% CI for HR (Lower) | 95.0% CI for HR (Upper) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | | 2.7 | 0.3 | 58.3 | 1 | <0.001 | 14.2 | 7.2 | 28.1 |
| 2 | | 0.8 | 0.2 | 19.0 | 1 | <0.001 | 2.2 | 1.6 | 3.2 |
| 3 | | 0.9 | 0.2 | 14.6 | 1 | <0.001 | 2.5 | 1.6 | 4.1 |
| 4 | | 1.2 | 0.3 | 13.4 | 1 | <0.001 | 3.2 | 1.7 | 6.0 |
| 5 | | 0.9 | 0.3 | 10.1 | 1 | 0.001 | 2.5 | 1.4 | 4.3 |
| 6 | | 1.2 | 0.4 | 9.8 | 1 | 0.002 | 3.3 | 1.6 | 7.0 |
| 7 | | 1.1 | 0.3 | 9.5 | 1 | 0.002 | 2.9 | 1.5 | 5.7 |
| 8 | | 0.6 | 0.2 | 8.4 | 1 | 0.004 | 1.9 | 1.2 | 2.8 |
| 9 | | 0.8 | 0.4 | 4.7 | 1 | 0.029 | 2.2 | 1.1 | 4.4 |
| 10 | | 0.6 | 0.3 | 3.9 | 1 | 0.048 | 1.8 | 1.0 | 3.1 |
| 11 | | 0.7 | 0.4 | 3.5 | 1 | 0.063 | 1.9 | 1.0 | 3.9 |
| 12 | | 0.4 | 0.2 | 2.8 | 1 | 0.094 | 1.5 | 0.9 | 2.3 |
| 13 | | −1.5 | 0.3 | 20.3 | 1 | <0.001 | 0.2 | 0.1 | 0.4 |
| 14 | | −1.6 | 0.4 | 18.9 | 1 | <0.001 | 0.2 | 0.1 | 0.4 |
| 15 | | −1.0 | 0.2 | 15.6 | 1 | <0.001 | 0.4 | 0.2 | 0.6 |
| 16 | | −0.5 | 0.1 | 10.5 | 1 | 0.001 | 0.6 | 0.5 | 0.8 |
| 17 | | −0.8 | 0.3 | 7.5 | 1 | 0.006 | 0.4 | 0.3 | 0.8 |
| 18 | | −1.2 | 0.5 | 7.5 | 1 | 0.006 | 0.3 | 0.1 | 0.7 |
| 19 | | −0.4 | 0.2 | 4.5 | 1 | 0.034 | 0.7 | 0.5 | 1.0 |
Cox regression, backward conditional.
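In a Cox regression table, the hazard ratio is the exponential of the coefficient B, and an approximate 95% confidence interval is obtained from B ± 1.96 × SE. The sketch below applies this standard relationship to a coefficient of 0.8 with a standard error of 0.2, as in row 2 of the table. Because the table reports rounded B and SE values, the recomputed interval can differ slightly from the printed one; the function name `hazard_ratio` is hypothetical.

```python
import math

def hazard_ratio(b, se, z=1.96):
    """Return (HR, lower, upper) from a Cox coefficient and its standard error."""
    hr = math.exp(b)
    lower = math.exp(b - z * se)
    upper = math.exp(b + z * se)
    return hr, lower, upper

# Rounded values as printed in the table (row 2)
hr, lower, upper = hazard_ratio(0.8, 0.2)
```

A positive B gives a hazard ratio above 1 (worse survival with higher expression), and a negative B gives a hazard ratio below 1.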
Kaplan–Meier analysis for prediction of overall survival outcome (Method 1).
| Num | Gene | Cut-Off | Log-Rank (P) | Breslow (P) | Hazard Risk (95% CI) | Correlation with High: OR (95% CI) | Correlation with High: P |
|---|---|---|---|---|---|---|---|
| 1 | | 8.71 | <0.001 | <0.001 | 3.5 (2.1–5.8) | 1.3 (0.6–3.0) | 0.499 |
| 2 | | 11.83 | 0.001 | 0.002 | 2.3 (1.4–3.8) | 2.3 (0.9–5.3) | 0.056 |
| 3 | | 8.75 | 0.015 | 0.016 | 1.9 (1.1–3.1) | 1.1 (0.5–2.5) | 0.798 |
| 4 | | 7.66 | 0.037 | 0.137 | 1.8 (1.0–3.3) | 2.1 (0.9–4.9) | 0.077 |
| 5 | | 8.81 | 0.034 | 0.014 | 1.6 (1.0–2.5) | 0.9 (0.4–1.7) | 0.649 |
| 6 | | 12.81 | 0.028 | 0.048 | 0.6 (0.4–0.9) | 0.2 (0.1–0.5) | 0.001 |
| 7 | | 8.99 | 0.039 | 0.029 | 0.5 (0.3–0.9) | 0.5 (0.2–1.1) | 0.068 |
| 8 | | 13.37 | 0.019 | 0.042 | 0.5 (0.3–0.9) | 0.2 (0.1–0.4) | <0.001 |
| 9 | | 12.02 | 0.022 | 0.042 | 0.5 (0.3–0.9) | 0.2 (0.1–0.5) | 0.01 |
| 10 | | 9.95 | <0.001 | <0.001 | 0.3 (0.2–0.6) | 0.2 (0.1–0.5) | 0.001 |
This is a univariate analysis.
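The Kaplan–Meier estimate behind such a table can be sketched with the product-limit formula: at each event time, the survival probability is multiplied by one minus the fraction of at-risk cases that died. The follow-up times and event indicators below are invented, and the function name `kaplan_meier` is hypothetical; in the analysis above, a curve like this would be computed separately for the groups above and below each gene's cut-off and the curves compared with the log-rank test, which is not shown.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, survival) at each event time.

    times  -- follow-up time per case
    events -- 1 if the case died at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all cases that share the same follow-up time
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored cases both leave the risk set
        at_risk -= removed
    return curve

# Invented example: the second case is censored at time 2
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

The censored case contributes to the risk set up to time 2 but does not lower the curve, which is why the estimate drops by half at time 3 when only two cases remain at risk.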
Multivariate Cox regression overall survival analysis between MKI67 and the 10 highlighted genes (Method 1).
| Gene | B | SE | Wald | df | Sig. | HR | 95.0% CI for HR (Lower) | 95.0% CI for HR (Upper) |
|---|---|---|---|---|---|---|---|---|
| | 1.3 | 0.3 | 20.5 | 1 | <0.001 | 3.8 | 2.1 | 6.8 |
| | 0.9 | 0.3 | 11.3 | 1 | 0.001 | 2.6 | 1.5 | 4.4 |
| | −0.5 | 0.3 | 3.0 | 1 | 0.085 | 0.6 | 0.3 | 1.1 |
| | 0.6 | 0.2 | 6.9 | 1 | 0.009 | 1.9 | 1.2 | 3.1 |
| | −0.7 | 0.3 | 4.5 | 1 | 0.035 | 0.5 | 0.2 | 0.9 |
| | 0.8 | 0.3 | 5.3 | 1 | 0.021 | 2.2 | 1.1 | 4.2 |
| | 1.5 | 0.3 | 26.6 | 1 | <0.001 | 4.3 | 2.5 | 7.6 |
| | 0.8 | 0.3 | 6.6 | 1 | 0.010 | 2.1 | 1.2 | 3.8 |
Multivariate Cox regression analysis, backward conditional. HR, hazard ratio. Note: only 8 genes are listed because the backward conditional method eliminates the nonsignificant variables from the multivariate model.
Multilayer perceptron analysis of the immuno-oncology pathways (Method 2).
| Pathway | Num. Genes Top 70% | Training Cases (Num.) | Training Cases (%) | Testing Cases (Num.) | Testing Cases (%) | Input Units | Hidden Num. | Hidden Units | Output Num. | Output Units | Training Cross Entropy Error | Training Incorrect Predictions % | Training Time | Testing Cross Entropy Error | Testing Incorrect Predictions % | Training % Correct (Observed Alive) | Training % Correct (Observed Dead) | Training % Correct (Overall) | Testing % Correct (Observed Alive) | Testing % Correct (Observed Dead) | Testing % Correct (Overall) | Area under the Curve (AUC) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Cancer Transcriptome | 13 | 84 | 68.3 | 39 | 31.7 | 1785 | 1 | 6 | 1 | 2 | 41.1 | 27.4 | 00:03.9 | 17.6 | 23.1 | 58.8 | 82.0 | 72.6 | 55.6 | 83.3 | 76.9 | 0.84 |
| Pan Cancer Human IO360 | 15 | 84 | 68.3 | 39 | 31.7 | 727 | 1 | 8 | 1 | 2 | 22.5 | 13.1 | 00:01.4 | 14.7 | 15.4 | 82.4 | 90.0 | 86.9 | 88.9 | 83.3 | 84.6 | 0.94 |
| Pan Cancer Immune Profiling | 1 | 84 | 68.3 | 39 | 31.7 | 707 | 1 | 5 | 1 | 2 | 44.9 | 26.2 | 00:01.5 | 15.0 | 12.8 | 64.7 | 80.0 | 73.8 | 88.9 | 86.7 | 87.2 | 0.82 |
| Pan Cancer Progression | 18 | 84 | 68.3 | 39 | 31.7 | 715 | 1 | 11 | 1 | 2 | 51.2 | 32.1 | 00:01.7 | 18.7 | 12.8 | 29.4 | 94.0 | 67.9 | 66.7 | 93.3 | 87.2 | 0.74 |
| Pan Cancer Pathways | 6 | 84 | 68.3 | 39 | 31.7 | 712 | 1 | 8 | 1 | 2 | 36.9 | 21.4 | 00:01.8 | 16.8 | 15.4 | 67.6 | 86.0 | 78.6 | 77.8 | 86.7 | 84.6 | 0.89 |
| Metabolic Pathways | 27 | 84 | 68.3 | 39 | 31.7 | 737 | 1 | 14 | 1 | 2 | 39.8 | 22.6 | 00:01.6 | 13.7 | 17.9 | 55.9 | 92.0 | 77.4 | 66.7 | 86.7 | 82.1 | 0.87 |
| Immune Exhaustion | 12 | 84 | 68.3 | 39 | 31.7 | 720 | 1 | 10 | 1 | 2 | 47.2 | 31.0 | 00:01.6 | 18.2 | 17.9 | 50.0 | 82.0 | 69.0 | 66.7 | 86.7 | 82.1 | 0.79 |
| Human Inflammation | 23 | 84 | 68.3 | 39 | 31.7 | 247 | 1 | 9 | 1 | 2 | 33.7 | 17.9 | 00:00.6 | 16.6 | 23.1 | 73.5 | 88.0 | 82.1 | 55.6 | 83.3 | 76.9 | 0.89 |
| Host Response | 8 | 84 | 68.3 | 39 | 31.7 | 747 | 1 | 9 | 1 | 2 | 41.1 | 21.4 | 00:01.6 | 18.1 | 20.5 | 67.6 | 86.0 | 78.6 | 66.7 | 83.3 | 79.5 | 0.83 |
| Autoimmune | 13 | 84 | 68.3 | 39 | 31.7 | 719 | 1 | 10 | 1 | 2 | 11.9 | 6.0 | 00:01.5 | 12.5 | 10.3 | 88.2 | 98.0 | 94.0 | 88.9 | 90.0 | 89.7 | 0.98 |
| Organ Transplantation | 12 | 84 | 68.3 | 39 | 31.7 | 728 | 1 | 11 | 1 | 2 | 41.5 | 21.4 | 00:01.6 | 15.7 | 10.3 | 64.7 | 88.0 | 78.6 | 88.9 | 90.0 | 89.7 | 0.85 |
Input layer: standardized rescaling method for covariates. Hidden layer: hyperbolic tangent activation function. Output layer: softmax activation function, cross-entropy error function. Model summary, training, one consecutive step(s) with no decrease in error (error computations are based on the testing sample) as stopping rule.
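The architecture described in this footnote can be sketched as a single forward pass. This is a minimal illustration and not the software used in the study: the toy data, the layer sizes, and the random untrained weights are all hypothetical, training (weight fitting and the stopping rule) is omitted, and summing the cross-entropy over cases is an assumption.

```python
import math
import random

def standardize(column):
    """Rescale one covariate to zero mean and unit standard deviation."""
    mean = sum(column) / len(column)
    var = sum((x - mean) ** 2 for x in column) / (len(column) - 1)
    sd = math.sqrt(var) or 1.0  # guard against a constant column
    return [(x - mean) / sd for x in column]

def softmax(scores):
    """Convert output scores to class probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: hyperbolic tangent hidden layer, softmax output layer."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    scores = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(w_out, b_out)]
    return softmax(scores)

def cross_entropy(probs, labels):
    """Sum over cases of -log(probability assigned to the observed class)."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels))

# Hypothetical toy data: 6 cases x 3 genes; labels 0 = alive, 1 = dead
random.seed(0)
raw = [[random.gauss(8, 2) for _ in range(3)] for _ in range(6)]
labels = [0, 1, 1, 0, 1, 0]

# Standardize each gene (column), then rebuild the case-by-gene rows
columns = [standardize(list(col)) for col in zip(*raw)]
cases = [list(row) for row in zip(*columns)]

# Random untrained weights: 3 input units -> 4 hidden units -> 2 output units
w_hidden = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(4)]
b_hidden = [0.0] * 4
w_out = [[random.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(2)]
b_out = [0.0] * 2

probs = [forward(x, w_hidden, b_hidden, w_out, b_out) for x in cases]
error = cross_entropy(probs, labels)
predicted = [p.index(max(p)) for p in probs]
incorrect_pct = 100.0 * sum(a != b for a, b in zip(predicted, labels)) / len(labels)
```

In this sketch, `error` and `incorrect_pct` play the role of the "Cross Entropy Error" and "Incorrect Predictions %" columns, and the two output units correspond to the alive and dead classes.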
Overall survival of the pan-cancer series using the risk scores.
| Subtype | Overall (n) | Low-Risk (n) | High-Risk (n) | K–M Log-Rank (p) | Cox (p) | Cox HR | 95% CI for HR (Lower) | 95% CI for HR (Higher) |
|---|---|---|---|---|---|---|---|---|
| Breast | 962 | 821 | 141 | 4.0 × 10^−17 | 6.5 × 10^−15 | 4.0 | 2.8 | 5.6 |
| Lung | 475 | 426 | 49 | 1.0 × 10^−10 | 1.1 × 10^−9 | 3.3 | 2.3 | 4.9 |
| Prostate | 497 | 446 | 51 | 1.5 × 10^−4 | 2.0 × 10^−3 | 9.2 | 2.3 | 37.2 |
| Colorectal | 466 | 415 | 51 | 1.4 × 10^−5 | 3.3 × 10^−5 | 2.9 | 1.7 | 4.8 |
| Cervix | 191 | 169 | 22 | 3.4 × 10^−10 | 8.9 × 10^−8 | 7.7 | 3.6 | 16.2 |
| Stomach | 440 | 293 | 147 | 2.6 × 10^−4 | 3.1 × 10^−4 | 1.8 | 1.3 | 2.4 |
| Skin (melanoma) | 335 | 177 | 158 | 3.2 × 10^−10 | 1.3 × 10^−9 | 2.6 | 1.9 | 3.5 |
| Bladder | 389 | 207 | 182 | 9.2 × 10^−13 | 9.7 × 10^−12 | 3.0 | 2.2 | 4.1 |
| Ovary | 247 | 217 | 30 | 0.6 × 10^−5 | 1.5 × 10^−5 | 2.9 | 1.8 | 4.6 |
| DLBCL | 414 | 289 | 125 | 3.3 × 10^−16 | 1.5 × 10^−14 | 3.3 | 2.5 | 4.5 |
| Kidney | 792 | 470 | 322 | 5.9 × 10^−17 | 2.5 × 10^−15 | 3.2 | 2.4 | 4.3 |
| Uterus (endometrium) | 247 | 214 | 33 | 5.5 × 10^−11 | 2.4 × 10^−8 | 7.4 | 3.7 | 15.0 |
| Leukemia (AML) | 149 | 115 | 34 | 1.9 × 10^−14 | 7.0 × 10^−12 | 5.5 | 3.4 | 9.0 |
| Pancreas | 176 | 109 | 67 | 0.4 × 10^−5 | 9.0 × 10^−6 | 2.6 | 1.7 | 3.9 |
| Thyroid | 489 | 434 | 55 | 9.9 × 10^−12 | 6.4 × 10^−7 | 17.4 | 5.6 | 53.5 |
| Liver | 361 | 197 | 164 | 6.7 × 10^−10 | 4.0 × 10^−9 | 3.0 | 2.1 | 4.3 |
| CNS (GBM) | 659 | 209 | 450 | 2.6 × 10^−17 | 8.9 × 10^−15 | 4.5 | 3.1 | 6.6 |
| Overall | 7289 | 5208 | 2081 | 2.8 × 10^−178 | 2.5 × 10^−159 | 3.3 | 2.9 | 3.6 |
K–M, Kaplan–Meier; HR, hazard ratio; DLBCL, diffuse large B-cell lymphoma; AML, acute myeloid leukemia; CNS, central nervous system; GBM, glioblastoma multiforme. This analysis is univariate.
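The Kaplan–Meier estimate behind a low-risk versus high-risk comparison can be sketched as follows. This is a minimal illustration only: the follow-up times and event indicators are hypothetical toy values, the function name `kaplan_meier` is an assumption, and the log-rank test and Cox regression reported in the table are not implemented here.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with at least one death.

    times: follow-up time per patient; events: 1 = death, 0 = censored.
    """
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        # Patients still under observation just before time t
        at_risk = sum(1 for ti in times if ti >= t)
        # Deaths observed exactly at time t (censored cases do not count)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
    return curve

# Hypothetical follow-up data (months) for two risk groups
low_risk = kaplan_meier([12, 30, 45, 60, 60, 72], [0, 1, 0, 0, 1, 0])
high_risk = kaplan_meier([3, 8, 8, 15, 20, 24], [1, 1, 1, 0, 1, 1])

for name, curve in (("low-risk", low_risk), ("high-risk", high_risk)):
    print(name, [(t, round(s, 3)) for t, s in curve])
```

Censored patients leave the risk set without lowering the curve, which is why the estimate drops only at times where a death is recorded.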
Machine learning and neural network analysis of the combined Methods 1 and 2 with the MCL35 signature.
| Model | Overall Accuracy for Predicting the Overall Survival (%) | No. of Genes Used in the Final Model | Gene Names |
|---|---|---|---|
| Logistic regression | 100 | 50 | All the 50 |
| Bayesian network | 92 | 50 | All the 50 |
| Discriminant | 86 | 50 | All the 50 |
| CHAID | 85 | 6 | |
| C&R tree | 85 | 21 | |
| SVM | 81 | 50 | All the 50 |
| KNN algorithm | 78 | 50 | All the 50 |
| Neural network | 76 | 50 | All the 50 |
| C5 | 76 | 3 | |
| Quest | 65 | 50 | All the 50 |
In this analysis, several methods were tested, including C5, logistic regression, Bayesian network, discriminant analysis, KNN algorithm, LSVM, random trees, SVM, Tree-AS, CHAID, Quest, C&R tree, and neural networks. Among them, logistic regression and Bayesian network had the best overall accuracy for predicting the overall survival (dead vs. alive). The analysis used a custom field (gene) assignment. The target variable was the overall survival as a dichotomous (binary) variable (dead vs. alive). The inputs (predictive genes) were the most relevant genes (n = 50) previously identified in Method 1 (n = 19), Method 2 (n = 15), and the MCL35 signature (n = 17), as follows: ADAMDEC1, ADGRG2, AHR, AMOTL2, AR, ATL1, BST2, CCNB2, CD8B, CDC20, CDKN3, CEACAM6, CFB, CSF1, E2F2, ESPL1, FABP5, FAM83D, FMNL3, FOXM1, GCNA, GLIPR1, ID1, IGFBP7, IL6ST, ITGAX, KCTD12, KIF18A, KIF2C, MKI67, NCAPG, PALLD, PCK2, PEMT, PIK3CD, POGLUT3, RAB13, RGS1, ROBO4, RPGRIP1L, RRAS, SELENOP, TAMM41, TMEM176B, TOP2A, TYMS, YBX3, ZCCHC4, ZDHHC21, and ZWINT. A total of 13 models were selected and ranked according to their overall accuracy for predicting the overall survival. In the modeling, every possible combination of options was tested, and the best models were saved. Of note, not all the genes were necessary for, or contributed to, the final models, and only the best combinations were selected (e.g., 50 genes in the Bayesian network but only 6 in the CHAID tree).
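Ranking models by overall accuracy on a dichotomous target (dead vs. alive) can be sketched as below. The observed outcomes, the predictions, and the model names `model_a` to `model_c` are hypothetical placeholders and are not results from this study; only the accuracy definition (percentage of cases classified correctly) is illustrated.

```python
def overall_accuracy(observed, predicted):
    """Percentage of cases whose predicted status matches the observed status."""
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)

# Hypothetical outcomes for 8 patients and predictions from three placeholder models
observed = ["dead", "alive", "dead", "dead", "alive", "alive", "dead", "alive"]
predictions = {
    "model_a": ["dead", "alive", "dead", "dead", "alive", "alive", "dead", "alive"],
    "model_b": ["dead", "alive", "alive", "dead", "alive", "dead", "dead", "alive"],
    "model_c": ["alive", "alive", "dead", "alive", "alive", "dead", "dead", "dead"],
}

# Rank the models from the highest to the lowest overall accuracy
ranking = sorted(
    ((name, overall_accuracy(observed, pred)) for name, pred in predictions.items()),
    key=lambda item: item[1],
    reverse=True,
)

for name, accuracy in ranking:
    print(f"{name}: {accuracy:.0f}%")
```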
Function and association of the highlighted genes in neoplasia.
| Gene | Function | Role in Cancer |
|---|---|---|
| KIF18A | Microtubule motor activity, role in mitosis | Overexpressed in various types of cancer; inhibitors are available |
| YBX3 | Translation repression, negative regulation of intrinsic apoptosis signaling | Related to myelodysplastic syndromes and acute myeloid leukemia |
| PEMT | Negative regulation of cell proliferation, positive regulation of lipoprotein metabolic process | Critical role in breast cancer progression |
| GCNA | Acidic repeat-containing protein, expressed in germ cells (testis) | Regulates genome stability |
| POGLUT3 | Protein glucosyltransferase, specifically targets extracellular EGF repeats of proteins (NOTCH1 and NOTCH3) | Related to glioblastoma multiforme tumorigenesis |
| SELENOP | Transport of selenium, response to oxidative stress | Prostate cancer recurrence |
| AMOTL2 | Actin cytoskeleton organization, angiogenesis, cell migration, Wnt-signaling pathway | Angiogenesis in pancreatic cancer, and proliferation in lung cancer |
| IGFBP7 | Cell adhesion, metabolic process (retinoic acid, cortisol), regulation of cell growth | Prognosis of acute lymphoblastic leukemia |
| KCTD12 | GABA-B receptors auxiliary subunit | Proliferation in breast cancer |
| ADGRG2 | G protein-coupled receptor signaling pathway | Tumor suppressor in endometrial cancer |
| TYMS | Regulation of mitotic cell cycle (G1/S transition) | Association with non-Hodgkin lymphomas, prognosis of pancreatic cancer |
The gene information is based on UniProt [54] and GeneCards [55]. TYMS was highlighted in Method 2; the remaining genes were highlighted in Method 1.