Shelli R Kesler1,2,3, Rebecca A Harrison4,3, Melissa L Petersen4, Vikram Rao1,2, Hannah Dyson4, Kristin Alfaro-Munoz4, Shiao-Pei Weathers4, John de Groot4.
Abstract
BACKGROUND: Gliomas are the most common type of malignant brain tumor. Clinical outcomes depend on many factors including tumor molecular characteristics. Mutation of the isocitrate dehydrogenase (IDH) gene confers significant benefits in terms of survival and quality of life. Preoperative determination of IDH genotype can facilitate surgical planning, allow for novel clinical trial designs, and assist clinical counseling surrounding the individual patient's disease.
Keywords: IDH; MRI; connectomics; glioma; machine learning
Year: 2019 PMID: 31741712 PMCID: PMC6849657 DOI: 10.18632/oncotarget.27301
Source DB: PubMed Journal: Oncotarget ISSN: 1949-2553
Patient characteristics
| Characteristic | Value |
|---|---|
| IDH (Mutant) | 148 (63%) |
| Age (years) | Mean = 43.85 ± 15.12; Range = 18–82 |
| Sex (Male) | 146 (62%) |
| Grade II | 101 (43%) |
| Grade III | 63 (27%) |
| Grade IV | 70 (30%) |
| Oligodendroglioma | 57 (24%) |
| Astrocytoma | 170 (73%) |
| Oligoastrocytoma | 7 (3%) |
| Tumor Hemisphere (Left) | 168 (72%) |
| Tumor Location (Primary) | |
| Frontal | 117 (50%) |
| Insular | 17 (7%) |
| Occipital | 1 (0.4%) |
| Parietal | 34 (15%) |
| Temporal | 65 (28%) |
| Multifocal Tumor | 33 (14%) |
| MGMT Promoter Methylation ( | Positive: 24 (59%); Negative: 17 (41%) |
| KPS ( | 100: 15 (29%); 90: 21 (41%); 80: 11 (22%); 70: 4 (8%) |
Machine learning model performance
| Features | Model | Accuracy | Sensitivity | Specificity | AUC |
|---|---|---|---|---|---|
| 90 connectome efficiencies, brain volume, network degree, network size | RF | 86% | 89% | 83% | .94 |
| | SVM | 77% | 79% | 75% | .77 |
| | LR | 78% | 84% | 73% | .76 |
| | MLP | 80% | 84% | 77% | .85 |
| 90 connectome efficiencies, brain volume, network degree, network size, age, tumor hemisphere, tumor lobe | RF | 89% | 90% | 89% | .95 |
| Age, tumor hemisphere, tumor lobe | RF | 77% | 79% | 76% | .87 |
RF = random forest, SVM = support vector machine, LR = logistic regression, MLP = multilayer perceptron, AUC = area under the curve.
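The accuracy, sensitivity, and specificity columns above follow from a binary confusion matrix. The sketch below shows the arithmetic only. It assumes IDH-mutant is treated as the positive class, and the counts in the usage lines are hypothetical values chosen for illustration, not data from the study.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute accuracy, sensitivity, and specificity from confusion counts.

    Assumption: the positive class is IDH-mutant, so
    tp = mutant predicted mutant, fn = mutant predicted wild-type,
    tn = wild-type predicted wild-type, fp = wild-type predicted mutant.
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,   # fraction of all cases classified correctly
        "sensitivity": tp / (tp + fn),   # fraction of positives detected
        "specificity": tn / (tn + fp),   # fraction of negatives correctly rejected
    }


# Hypothetical counts for illustration only (not taken from the paper).
example = classification_metrics(tp=8, fn=2, tn=7, fp=3)
print(example)
```

AUC is not derivable from a single confusion matrix; it summarizes the ROC curve across all decision thresholds.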
Figure 1. Receiver operator characteristic (ROC) curves for machine learning models predicting IDH genotype from connectome features.
RF = random forest, MLP = multilayer perceptron, LR = logistic regression, SVM = support vector machine.
Figure 2. Violin plots for RF model AUCs including nested recursive feature elimination (RFE) or elastic net (EN) regression.
Machine learning approaches
| Classifier | Description | Advantages | Tuning Parameters |
|---|---|---|---|
| Random Forest (RF) [ | Ensemble of decision trees each trained on a random subset of features | Aggregates multiple independent classifiers, scale invariant, implicit feature selection, resistant to overfitting | ntree = 1000; mtry = 7 [log2(nfeats)+1] |
| Support Vector Machine (SVM) [ | Defines an optimal hyperplane that maximizes the margin between classes | Kernel trick can solve complex problems, can handle imbalanced classes by weighting misclassification penalty | C = 1.0 |
| Logistic Regression (LR) [ | Multinomial logistic regression model with a ridge estimator | Simple, highly interpretable | ridge value = 1.0E-8 |
| Multilayer Perceptron (MLP) [ | Simple model of a biological brain that implements backpropagation | Can generalize in non-local ways akin to intelligent behavior, inherent feature selection | learning rate = 0.3; momentum = 0.2 |
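The tuning parameters in the table can be collected in one place, and the RF value mtry = 7 can be checked against the stated rule. The sketch below assumes the square brackets in "[log2(nfeats)+1]" denote truncation to an integer, and that nfeats is 93 (90 connectome efficiencies plus brain volume, network degree, and network size). The function and dictionary names are hypothetical, not taken from the authors' software.

```python
import math


def default_mtry(n_features: int) -> int:
    """Number of features sampled per split: integer part of log2(n_features) + 1.

    Assumption: the brackets in the source table mean truncation to an integer.
    """
    return math.floor(math.log2(n_features) + 1)


# 90 regional efficiencies + brain volume + network degree + network size
N_CONNECTOME_FEATURES = 90 + 3

# Tuning parameters as listed in the table (key names are illustrative).
TUNING = {
    "RF": {"ntree": 1000, "mtry": default_mtry(N_CONNECTOME_FEATURES)},
    "SVM": {"C": 1.0},
    "LR": {"ridge": 1.0e-8},
    "MLP": {"learning_rate": 0.3, "momentum": 0.2},
}

print(TUNING["RF"])
```

Under the same rule, adding age, tumor hemisphere, and tumor lobe (96 features) also gives mtry = 7.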
Figure 3. MRI preprocessing and connectome construction steps.
First, voxel-based morphometry (VBM) involves standard procedures to extract gray matter volumes, including reorientation to the anterior and posterior commissures for improved spatial normalization, automated removal of the skull, probabilistic segmentation into tissue classes (gray matter, white matter, CSF), creation of a sample-specific template via DARTEL, spatial normalization to standard Montreal Neurological Institute (MNI) space, modulation using the Jacobian determinant, and quality assurance checks. Second, modulated and normalized gray matter volumes were used to construct a connectome map for each patient from the correlation coefficients, r, between the voxel values captured by all pairs of 3 × 3 × 3 voxel cubes (nodes) spanning the entire volume. This correlation, or similarity, matrix was then thresholded to remove false positives, resulting in a binary matrix in which a connection (edge) between two nodes = 1. Third, the binary similarity matrix was submitted to graph theoretical analysis. Efficient information exchange is assumed to follow the shortest path between regions. As illustrated here, the shortest, most efficient path from node a to c is marked in red. Efficiency (E) is defined as the average inverse shortest path length across all regions in the network, $E = \frac{1}{n(n-1)} \sum_{i \neq j} \frac{1}{d(v_i, v_j)}$, where n is the number of nodes and $d(v_i, v_j)$ is the length of the shortest path between nodes i and j. Efficiencies were averaged across all cubic nodes with MNI coordinates within one of 90 discrete anatomic regions defined by the Automated Anatomical Labeling Atlas (AAL).
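The thresholding and efficiency steps described in this caption can be illustrated on a toy network. The sketch below is a minimal pure-Python version, assuming a fixed correlation cutoff (the source does not state how the threshold was chosen) and computing network-wide efficiency as the average inverse shortest path length over all ordered node pairs. The function names, the cutoff value, and the 3-node similarity matrix are hypothetical, and this is not the authors' pipeline.

```python
from collections import deque


def threshold(similarity, cutoff):
    """Binarize a symmetric similarity matrix: edge = 1 where r > cutoff, no self-loops."""
    n = len(similarity)
    return [
        [1 if i != j and similarity[i][j] > cutoff else 0 for j in range(n)]
        for i in range(n)
    ]


def shortest_path_lengths(adj, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v, connected in enumerate(adj[u]):
            if connected and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def global_efficiency(adj):
    """Average of 1/d(i, j) over all ordered pairs i != j.

    Unreachable pairs contribute 0 (their path length is treated as infinite).
    """
    n = len(adj)
    if n < 2:
        return 0.0
    total = 0.0
    for i in range(n):
        dist = shortest_path_lengths(adj, i)
        total += sum(1.0 / d for j, d in dist.items() if j != i)
    return total / (n * (n - 1))


# Toy 3-node similarity matrix (hypothetical values); nodes a, b, c = 0, 1, 2.
similarity = [
    [1.0, 0.9, 0.2],
    [0.9, 1.0, 0.8],
    [0.2, 0.8, 1.0],
]
adjacency = threshold(similarity, cutoff=0.5)  # cutoff is an illustrative assumption
print(adjacency)
print(global_efficiency(adjacency))
```

In this toy network a and c are not directly connected, so the shortest path from a to c runs through b and has length 2, contributing 1/2 to the sum.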