Nidan Qiao.
Abstract
INTRODUCTION: Machine learning methods in sellar region diseases present a particular challenge because of their complexity and the need for reproducibility. This systematic review aims to compile the current literature on sellar region diseases that utilized machine learning methods and to propose a quality assessment tool and reporting checklist for future studies.
Keywords: artificial intelligence; craniopharyngioma; growth; pituitary; prediction
Year: 2019 PMID: 31234143 PMCID: PMC6612064 DOI: 10.1530/EC-19-0156
Source DB: PubMed Journal: Endocr Connect ISSN: 2049-3614 Impact factor: 3.335
Quality assessment of machine learning studies.
| Categories | Items | Description | Reported |
|---|---|---|---|
| Unmet need | Limits in current non-machine-learning approach | Low diagnostic accuracy, low human-level prediction accuracy or prolonged diagnostic procedure | Yes/no |
| Reproducibility | Feature engineering methods | How features were generated before model training | Yes/no |
| Reproducibility | Platforms/packages | Both platforms and packages should be reported | Yes/no |
| Reproducibility | Hyperparameters | All hyperparameters which are needed for study replication | Yes/no |
| Robustness | Valid methods to overcome over-fit | Leave-one-out or k-fold cross-validation or bootstrap | Yes/no |
| Robustness | The stability of results | Calculated variation in the validation statistics | Yes/no |
| Generalizability | External data validation | Validation in settings different from the research framework | Yes/no |
| Clinical significance | Predictors explanation | Explanation of the importance of each predictor | Yes/no |
| Clinical significance | Suggested clinical use | Proposed possible applications in clinical care | Yes/no |
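As a minimal sketch of the two robustness items, the Python below runs k-fold cross-validation and reports the spread of the validation statistic across folds. The data are synthetic, and the one-feature nearest-mean classifier and all function names are illustrative assumptions, not methods taken from the reviewed studies.

```python
import random
import statistics


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and deal them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_nearest_mean(xs, ys):
    """Toy model (an assumption for illustration): mean feature value per class."""
    return {
        label: statistics.mean(x for x, y in zip(xs, ys) if y == label)
        for label in set(ys)
    }


def predict_nearest_mean(class_means, x):
    """Assign the class whose mean is closest to x."""
    return min(class_means, key=lambda label: abs(x - class_means[label]))


def cross_validate(xs, ys, k, seed=0):
    """Return one accuracy per fold; each fold is the validation set exactly once."""
    accuracies = []
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = fit_nearest_mean(train_x, train_y)
        correct = sum(predict_nearest_mean(model, xs[i]) == ys[i] for i in fold)
        accuracies.append(correct / len(fold))
    return accuracies


# Synthetic one-feature data: 20 "healthy" (0) and 20 "disease" (1) samples.
rng = random.Random(1)
xs = [rng.gauss(0.0, 1.0) for _ in range(20)] + [rng.gauss(2.0, 1.0) for _ in range(20)]
ys = [0] * 20 + [1] * 20

fold_accuracies = cross_validate(xs, ys, k=5)
print("mean accuracy:", statistics.mean(fold_accuracies))
print("s.d. across folds:", statistics.stdev(fold_accuracies))
```

Setting `k` equal to the number of samples turns the same routine into leave-one-out cross-validation, in which case each fold accuracy is either 0 or 1.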
Summary of studies on sellar region disease using machine learning methods.
| Study | Cohort selection: sample size | Cohort selection: diagnosis | Predictors | Outcomes: outcomes and controls | Outcomes: distribution | Models (parameters) | Performance statistics: discrimination | Performance statistics: CV method | Performance statistics: variation in validation |
|---|---|---|---|---|---|---|---|---|---|
| Learned-Miller 2006 (6) | 49 | Acromegaly | Parameters from 3D shape of face | Acromegaly/healthy | 24:25 | SVM (linear or quadratic kernel) | Acc: 85.7% | LOOCV | NA |
| Kitajima 2009 (7) | 43 | Sellar mass | Age and 9 MRI features | Pituitary adenoma/craniopharyngioma/Rathke’s cyst | 20:11:12 | NN (FC(7)*1) | AUC: 0.990 | LOOCV | NA |
| Lalys 2011 (8) | 500 | Pituitary adenoma | Features in surgical images | Six surgical phases: nasal incision/retract/tumor removal/column replacement/suture/nose compress | NA | SVM (linear kernel), HMM | Acc: 87.6% | 10-fold CV | |
| Hu 2012 (9) | 68 | Pituitary adenoma | 9 serum proteins | NFPA/healthy | 34:34 | Decision tree (Gini index) | Sen: 82.4% | 10-fold CV | NA |
| Steiner 2012 (10) | 15 | Pituitary adenoma | Spectrum from histology | GH+/GH−/non-tumor cells | 1000:1000:1000 | k-means (k = 10), LDA | Acc: 85.3% | LOOCV | |
| Calligaris 2015 (11) | 45 | Pituitary adenoma | Protein signature in mass spectrometry from histology | ACTH pituitary tumor/GH pituitary tumor/PRL pituitary tumor/pituitary gland | 6:9:9:6 | SVM | Sen: 83.0% | NA | NA |
| Paul 2017 (12) | 233 | Brain tumors | Pixels in MRI images | Meningioma/glioma/pituitary tumor | 208:492:289 | CNN ((Cov(64)-Max)*2 + FC(800)*2), NN, SVM | Acc: 94.0% | 5-fold CV | |
| Kong 2018 (13) | 1123 | Acromegaly | Features in photos | Acromegaly/healthy | 527:596 | Ensemble | Acc: 95.5% | NA | NA |
| Zhang 2018 (14) | 112 | Pituitary adenoma | Features in MRI images | Null cell adenoma/other subtypes | 46:66 | SVM (radial kernel) | AUC: 0.804 | Bootstrap | NA |
| Murray 2018 (15) | 124 | Growth hormone deficiency | Age, sex, IGF1, gene expressions | Growth hormone deficiency/healthy | 98:26 | RF | AUC: 0.990 | Out-of-bag (3-fold CV) | NA |
| Yang 2018 (16) | 168 | Craniopharyngioma | Expression levels of signature genes | Craniopharyngioma/other brain or brain tumor samples | 24:144 | SVM (radial kernel) | AUC: 0.850 | NA | NA |
| Hollon 2018 (17) | 400 | Pituitary adenoma | 26 patient characteristics | Poor early postoperative outcome/good | 124:276 | Elastic net, NB, SVM, RF | Acc: 87.0% | NA | NA |
| Staartjes 2018 (18) | 140 | Pituitary adenoma | Patient characteristics, MRI features | Gross-total resection/not | 95:45 | NN (FC(5)*NA) | AUC: 0.96 | 5-fold CV without holdout | |
| Kocak 2018 (19) | 47 | Acromegaly | Features in MRI images | Response to somatostatin analogs/resistant | 24:23 | k-NN (k = 5) | Acc: 85.1% | 10-fold CV | |
| Ortea 2018 (20) | 30 | Growth hormone deficiency | Three serum proteins | Growth hormone deficiency/healthy | 15:15 | RF, SVM | Acc: 100% | Bootstrap | NA |
| Smyczynska 2018 (21) | 272 | Growth hormone deficiency with GH treatment | Patient characteristics, GH level, IGF-1 level, GH dose | Height change after GH treatment | 0.66 ± 0.57 | NN (FC(2)*1) | RMSE: 0.267 | NA | NA |
Acc, accuracy; ACTH, adrenocorticotropic hormone; AUC, area under curve; BoVW, bag-of-visual-word; CNN, convolutional neural network; Cov, convolutional layer; CV, cross-validation; FC, fully-connected neural network; GH, growth hormone; HMM, hidden Markov model; IGF1, insulin-like growth factor 1; LDA, linear discriminant analysis; LOOCV, leave-one-out cross-validation; Max, max pooling layer; MRI, magnetic resonance imaging; NA, not available; NB, naïve Bayesian; NFPA, non-functional pituitary adenoma; NN, neural network; PRL, prolactin; RF, random forest; RMSE, root mean square error; s.d., standard deviation; Sen, sensitivity; Spe, specificity; SVM, support vector machine.
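The performance statistics abbreviated above can be computed from model predictions as in the following sketch. The labels, scores and the 0.5 threshold are invented for illustration, and the function names are assumptions. AUC is computed here through its pairwise-ranking (Mann-Whitney) interpretation: the fraction of positive/negative pairs in which the positive sample receives the higher score.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for a binary outcome."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(tp, fn):
    return tp / (tp + fn)


def specificity(tn, fp):
    return tn / (tn + fp)


def auc(y_true, scores, positive=1):
    """Fraction of positive/negative pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rmse(observed, predicted):
    """Root mean square error for a continuous outcome."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical example: 3 diseased (1) and 3 healthy (0) subjects.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 is an arbitrary cut-off

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print("Acc:", accuracy(tp, tn, fp, fn))
print("Sen:", sensitivity(tp, fn))
print("Spe:", specificity(tn, fp))
print("AUC:", auc(y_true, scores))
```

Accuracy, sensitivity and specificity depend on the chosen cut-off, whereas AUC summarizes discrimination across all cut-offs.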
Figure 1. The scheme of a machine learning study. The process can be categorized into four steps: a good clinical question, pre-processed data, training and validation of the model, and significance in clinical applications.
Notations of special machine learning terms.
| Terms | Explanations |
|---|---|
| Unsupervised learning | A subgroup of machine learning models whose purpose is to find similarities among samples when no outcomes are available |
| Supervised learning | A subgroup of machine learning models with both predictors and outcomes, whose purpose is to learn the mapping function from the predictors to the outcomes |
| Feature | Predictors in a machine learning algorithm |
| Categorization | Transforming a continuous variable into a categorical variable |
| One-hot encoding | Using a vector (all the elements of the vector are 0 except one) to re-code a categorical variable |
| Standardization | Rescaling data to a common scale, e.g., subtracting the mean and dividing by the standard deviation |
| Normalization | Transforming data so that they more closely follow a normal distribution, e.g., logarithm transformation |
| Over-fit | The established model corresponds too exactly to the training dataset, and may therefore fail to predict future unseen observations |
| Imputation | Assigning a value to a missing data point, e.g., using the mean of the existing data |
| Dimension reduction | Representing the original data with fewer dimensions |
| Training | The learning process of the data pattern by a model |
| LASSO | Least Absolute Shrinkage and Selection Operator: A regression analysis method that performs both variable selection and regularization |
| SVM | Support Vector Machine: Finding the best hyperplane to separate data in a high dimensional space |
| Naïve Bayes | A simple probabilistic classifier based on Bayes’ theorem |
| kNN | k Nearest Neighbor: Classification of a sample according to the distance to other samples in the multidimensional space |
| Neural network | A family of models inspired by biological neural networks |
| Tree | A tree-like graph model of decisions and their possible consequences |
| Ensemble | Combining several different models: the predictions from these models are used as weighted inputs to another regression model, which produces the final prediction |
| Parameters | Coefficients of a model formula that need to be learned from the data |
| Hyperparameters | All the configuration variables of a model which are often set manually by the practitioner |
| Validation | Calculating the performance of a trained model on a separate dataset |
| Discrimination | The ability of a model to separate individuals into multiple classes |
| Calibration | How well a model’s predicted probabilities concur with the actual probabilities |
| Cross-validation | The data are randomly partitioned into k (often 5 or 10) equally sized parts, with one part used as the validation dataset and the others as the training dataset. This process is repeated k times, with each part used exactly once as the validation dataset |
| Leave-one-out | Leaving one sample out each time and training the model on the remaining samples. The process is repeated until every sample has been left out once |
| Bootstrapping | New datasets are created by randomly sampling from the whole original data with replacement (a patient can be sampled multiple times). Training and validation are based on the new data, and the resampling process is repeated multiple times |
| Robust | The stability of a model in cross-validation or in sensitivity analysis |
| Feature importance | How much the accuracy decreases when the feature is excluded |
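Several of the pre-processing terms in the table can be illustrated in a few lines of Python. This is a sketch only: the ages and diagnosis labels are invented, the function names are assumptions, and standardization is shown in its common z-score form (subtract the mean, divide by the standard deviation).

```python
import math
import statistics


def impute_mean(values):
    """Imputation: replace missing entries (None) with the mean of observed ones."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Standardization as z-scores: subtract the mean, divide by the s.d."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def log_transform(values):
    """Normalization by logarithm transformation (values must be positive)."""
    return [math.log(v) for v in values]


def one_hot(labels):
    """One-hot encoding: one 0/1 column per category, in sorted category order."""
    categories = sorted(set(labels))
    encoded = [[1 if label == c else 0 for c in categories] for label in labels]
    return encoded, categories


# Hypothetical predictor values for four patients.
ages = [30, None, 50, 40]
ages_imputed = impute_mean(ages)
ages_z = standardize(ages_imputed)

diagnoses = ["adenoma", "cyst", "adenoma"]
encoded, categories = one_hot(diagnoses)

print(ages_imputed)
print(ages_z)
print(categories, encoded)
```

To avoid leaking information from the validation data, the mean and standard deviation used here would normally be estimated on the training portion only and then applied to the validation portion.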
Quality assessment of machine learning studies in sellar region disease.
| Study | Unmet need: limits in current non-machine-learning approach | Reproducibility: feature engineering | Reproducibility: platforms, packages | Reproducibility: hyperparameters | Robustness: valid methods for over-fitting | Robustness: stability of results | Generalizability: external data validation | Clinical significance: predictors explanation | Clinical significance: suggested clinical use |
|---|---|---|---|---|---|---|---|---|---|
| Learned-Miller 2006 (6) | Yes | No | Yes | No | Yes | No | No | No | Yes |
| Kitajima 2009 (7) | Yes | Yes | No | Yes | Yes | No | No | No | Yes |
| Lalys 2011 (8) | No | Yes | No | Yes | Yes | Yes | No | No | Yes |
| Hu 2012 (9) | No | NA | Yes | Yes | Yes | No | No | No | Yes |
| Steiner 2012 (10) | Yes | Yes | No | Yes | Yes | Yes | No | No | Yes |
| Calligaris 2015 (11) | Yes | NA | No | No | No | No | No | No | Yes |
| Paul 2017 (12) | Yes | Yes | No | Yes | Yes | Yes | No | No | No |
| Kong 2018 (13) | Yes | Yes | No | Yes | No | No | No | No | Yes |
| Zhang 2018 (14) | Yes | Yes | Yes | No | Yes | No | No | No | Yes |
| Murray 2018 (15) | Yes | Yes | Yes | No | Yes | No | No | Yes | Yes |
| Yang 2018 (16) | Yes | Yes | Yes | Yes | No | No | Yes | No | No |
| Hollon 2018 (17) | No | No | Yes | No | No | No | No | Yes | No |
| Staartjes 2018 (18) | Yes | Yes | Yes | No | No | Yes | No | Yes | Yes |
| Kocak 2018 (19) | Yes | Yes | Yes | Yes | Yes | Yes | No | No | No |
| Ortea 2018 (20) | Yes | NA | Yes | No | Yes | No | No | No | Yes |
| Smyczynska 2018 (21) | Yes | Yes | No | Yes | No | No | No | Yes | No |
NA, no need.
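The table above can be tallied per item with a short script. The sketch below transcribes its Yes/No/NA entries (the compact `Y`/`N`/`NA` coding and the function name are illustrative choices) and counts, for each item, how many studies reported it, leaving NA entries out of the denominator.

```python
ITEMS = [
    "Limits in current non-machine-learning approach",
    "Feature engineering",
    "Platforms, packages",
    "Hyperparameters",
    "Valid methods for over-fitting",
    "Stability of results",
    "External data validation",
    "Predictors explanation",
    "Suggested clinical use",
]

# Y = Yes, N = No, NA = no need; one entry per item, in the order of ITEMS.
ASSESSMENT = {
    "Learned-Miller 2006": "Y N Y N Y N N N Y",
    "Kitajima 2009": "Y Y N Y Y N N N Y",
    "Lalys 2011": "N Y N Y Y Y N N Y",
    "Hu 2012": "N NA Y Y Y N N N Y",
    "Steiner 2012": "Y Y N Y Y Y N N Y",
    "Calligaris 2015": "Y NA N N N N N N Y",
    "Paul 2017": "Y Y N Y Y Y N N N",
    "Kong 2018": "Y Y N Y N N N N Y",
    "Zhang 2018": "Y Y Y N Y N N N Y",
    "Murray 2018": "Y Y Y N Y N N Y Y",
    "Yang 2018": "Y Y Y Y N N Y N N",
    "Hollon 2018": "N N Y N N N N Y N",
    "Staartjes 2018": "Y Y Y N N Y N Y Y",
    "Kocak 2018": "Y Y Y Y Y Y N N N",
    "Ortea 2018": "Y NA Y N Y N N N Y",
    "Smyczynska 2018": "Y Y N Y N N N Y N",
}


def tally(assessment, items):
    """Map each item to (studies reporting it, studies where it applies)."""
    counts = {}
    for position, item in enumerate(items):
        answers = [row.split()[position] for row in assessment.values()]
        applicable = [a for a in answers if a != "NA"]
        counts[item] = (applicable.count("Y"), len(applicable))
    return counts


counts = tally(ASSESSMENT, ITEMS)
for item, (reported, applicable) in counts.items():
    print(f"{item}: {reported}/{applicable}")
```

With the entries as transcribed, external data validation is reported by 1 of the 16 studies and stability of results by 5 of 16.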
A proposed reporting checklist of future studies using machine learning.
| Section | Items to report |
|---|---|
| Background | Results of human intelligence or non-machine-learning approach |
| Background | A summarized research question |
| Methods | Diagnoses of the cohort |
| Methods | Locations and time span of the patients included |
| Methods | How the control group was determined |
| Methods | All the variables used as predictors |
| Methods | Data coding and data transformation methods |
| Methods | Missing data imputation methods |
| Methods | Any censored data |
| Methods | The reason for choosing a specific model |
| Methods | The platform and the package for model building |
| Methods | All the hyperparameters in the model if applicable |
| Results | The rate of binary outcome or the distribution of categorical or continuous outcome |
| Results | The appropriate validation statistic based on the clinical question |
| Results | 95% confidence interval of validation statistic by cross-validation or bootstrapping |
| Results | Whether an external validation was obtained |
| Discussion | The rationale for any arbitrarily chosen cut-off value |
| Discussion | Clinical meaning of the discrimination or calibration statistics |
| Discussion | Explanation of the model (provide coefficients or feature importance if possible) |
| Discussion | Discussion on how the model will be integrated in clinical care |
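One way to obtain the 95% confidence interval of a validation statistic asked for in the checklist is a percentile bootstrap. The sketch below assumes the validation results are available as one correct/incorrect indicator per patient; the 34-of-40 example, the number of resamples and the function name are hypothetical.

```python
import random


def bootstrap_accuracy_ci(correct, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for accuracy.

    `correct` holds one 0/1 indicator per patient (1 = prediction was right).
    Patients are resampled with replacement, so one patient can appear
    several times in a resample.
    """
    rng = random.Random(seed)
    n = len(correct)
    resampled_accuracies = []
    for _ in range(n_resamples):
        sample = [correct[rng.randrange(n)] for _ in range(n)]
        resampled_accuracies.append(sum(sample) / n)
    resampled_accuracies.sort()
    lower = resampled_accuracies[round((alpha / 2) * n_resamples)]
    upper = resampled_accuracies[round((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical validation result: 34 of 40 patients classified correctly.
correct = [1] * 34 + [0] * 6
point_estimate = sum(correct) / len(correct)
lower, upper = bootstrap_accuracy_ci(correct)
print(f"accuracy {point_estimate:.3f}, 95% CI {lower:.3f} to {upper:.3f}")
```

The same resampling loop applies to other validation statistics, such as sensitivity or AUC, by replacing the accuracy calculation inside the loop.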