| Literature DB >> 36233640 |
Abstract
BACKGROUND: It is important to be able to predict, for each individual patient, the likelihood of later metastatic occurrence, because the prediction can guide treatment plans tailored to a specific patient to prevent metastasis and help avoid under-treatment or over-treatment. Deep neural network (DNN) learning, commonly referred to as deep learning, has become popular due to its success in image detection and prediction, but questions such as whether deep learning outperforms other machine learning methods when using non-image clinical data remain unanswered. Grid search has been introduced to deep learning hyperparameter tuning to improve prediction performance, but its effect on other machine learning methods is under-studied. In this research, we take an empirical approach to study the performance of deep learning and other machine learning methods when using non-image clinical data to predict the occurrence of breast cancer metastasis (BCM) 5, 10, or 15 years after the initial treatment. We developed prediction models using the deep feedforward neural network (DFNN) method, as well as models using nine other machine learning methods: naïve Bayes (NB), logistic regression (LR), support vector machine (SVM), LASSO, decision tree (DT), k-nearest neighbor (KNN), random forest (RF), AdaBoost (ADB), and XGBoost (XGB). We used grid search to tune hyperparameters for all methods. We then compared our feedforward deep learning models to the models trained using the nine other machine learning methods.
Keywords: DNN; EHR; breast cancer; clinical; deep learning; machine learning; metastasis; metastatic breast cancer; non-image; prediction
Year: 2022 PMID: 36233640 PMCID: PMC9570670 DOI: 10.3390/jcm11195772
Source DB: PubMed Journal: J Clin Med ISSN: 2077-0383 Impact factor: 4.964
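The study tunes each of the ten methods with grid search before comparing them by test AUC. As a minimal illustration of that workflow (not the paper's actual pipeline; the synthetic data and the logistic-regression grid here are placeholders), scikit-learn's `GridSearchCV` can sweep a hyperparameter grid with cross-validated AUC scoring:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Placeholder data standing in for the non-image clinical features;
# weights=[0.9, 0.1] mimics a class-imbalanced outcome
X, y = make_classification(n_samples=300, n_features=10,
                           weights=[0.9, 0.1], random_state=0)

# Score each candidate setting by cross-validated AUC, as in the paper
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1, 10]},
    scoring="roc_auc",
    cv=5,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

The same pattern extends to any estimator with a parameter grid, which is how a single tuning protocol can be applied uniformly across all ten methods.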
Case counts of the LSM datasets (LSM: Lynn Sage Dataset for Metastasis; #: number).
| Dataset | Total # of Cases | # Positive Cases | # Negative Cases |
|---|---|---|---|
| LSM-5year | 4189 | 437 | 3752 |
| LSM-10year | 1827 | 572 | 1255 |
| LSM-15year | 751 | 608 | 143 |
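The positive-case fraction grows sharply with the follow-up horizon, which matters when evaluating the models. Computing it from the table's counts:

```python
# (total, positive, negative) case counts from the LSM table above
lsm = {
    "LSM-5year": (4189, 437, 3752),
    "LSM-10year": (1827, 572, 1255),
    "LSM-15year": (751, 608, 143),
}

positive_rate = {name: pos / total for name, (total, pos, _neg) in lsm.items()}
for name, rate in positive_rate.items():
    print(f"{name}: {rate:.1%} positive")
```

The 5-year dataset is dominated by negatives (about 10% positive), while the 15-year dataset is imbalanced in the opposite direction (about 81% positive).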
Figure 1. A DFNN (deep feedforward neural network) model that contains n hidden layers.
Description of the DFNN (deep feedforward neural network) hyperparameters and their values tested (note: # represents the word “number” in this table).
| Hyperparameter | Description | Values |
|---|---|---|
| # of Hidden Layers | The depth of a DFNN | 1, 2, 3, 4 |
| # of Hidden Nodes | Number of neurons in a hidden layer | 10, 20, …, 70, 75, 80, 90, … 120, 200, 300, …, 1100 |
| Optimizer | Optimizes internal model parameters towards minimizing the loss | SGD (stochastic gradient descent), AdaGrad |
| Learning rate | Used by both SGD and AdaGrad | 0.001 to 0.3, step size: 0.001 |
| Momentum | Smooths out the curve of gradients by moving average. Used by SGD. | 0, 0.4, 0.5, 0.9 |
| Iteration-based decay | Updates the learning rate by a decreasing factor in each epoch | 0, 0.0001, 0.0002, …, 0.001, 0.002, …, 0.01 |
| Dropout rate | Manage overfitting and training time by randomly selecting nodes to ignore | 0, 0.4, 0.5 |
| Epochs | Number of complete passes through the training set during training | 20, 30, 50, 80, 100, 200, …, 800 |
| Batch_size | Unit number of samples fed to the optimizer before updating weights | 1, 10, 20, …, 100 |
| L1 (Lebesgue 1) | Sparsity regularization | 0, 0.0005, 0.0008, 0.001, 0.002, 0.005, 0.008, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5 |
| L2 (Lebesgue 2) | Weight decay regularization; penalizes large weights to adjust the weight-updating step | 0, 0.0005, 0.0008, 0.001, 0.002, 0.005, 0.008, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5 |
| L1ORL2 | Using L1 and L2 combinations to regularize overfitting | L1 only, L2 only, L1 and L2 |
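Grid search over a table like this enumerates the Cartesian product of the listed values, training one model per combination. A sketch over a small slice of the grid (the values shown are drawn from the table above; the full grid is far larger):

```python
from itertools import product

# A small slice of the DFNN hyperparameter grid from the table above
grid = {
    "hidden_layers": [1, 2, 3, 4],
    "optimizer": ["SGD", "AdaGrad"],
    "momentum": [0, 0.4, 0.5, 0.9],
    "dropout_rate": [0, 0.4, 0.5],
}

# Every combination of one value per hyperparameter
configs = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(configs))  # 4 * 2 * 4 * 3 = 96 candidate configurations
```

Even this four-parameter slice yields 96 configurations, which is why the full sweep reported later required tens of thousands of trained models.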
The mean test AUCs and mean train AUCs of the best-performing models (LSM: Lynn Sage Dataset for Metastasis; DFNN: Deep feedforward neural network; NB: Naïve Bayes; LR: Logistic regression; DT: Decision tree; SVM: Support vector machine; LASSO: Least absolute shrinkage and selection operator; KNN: K-nearest neighbor; RF: Random forest; ADB: AdaBoost; XGB: XGBoost).
| Mean Test AUC/Mean Train AUC | LSM-5 Year | LSM-10 Year | LSM-15 Year |
|---|---|---|---|
| DFNN | 0.769/0.806 | 0.793/0.830 | 0.842/0.873 |
| NB | 0.751/0.753 | 0.797/0.798 | 0.763/0.826 |
| LR | 0.771/0.773 | 0.777/0.809 | 0.844/0.884 |
| DT | 0.762/0.780 | 0.783/0.827 | 0.783/0.838 |
| SVM | 0.739/0.811 | 0.771/0.808 | 0.845/0.867 |
| LASSO | 0.772/0.774 | 0.778/0.806 | 0.844/0.887 |
| KNN | 0.789/0.816 | 0.793/0.819 | 0.799/0.832 |
| RF | 0.789/0.801 | 0.804/0.840 | 0.802/0.849 |
| ADB | 0.759/0.754 | 0.792/0.800 | 0.796/0.829 |
| XGB | 0.793/0.813 | 0.806/0.845 | 0.800/0.854 |
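The AUCs above can be read as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case (the Mann-Whitney formulation of AUC). A self-contained sketch of that computation:

```python
def auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) score pairs ranked
    correctly; tied pairs count as half a win."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))

# Perfectly separated scores give AUC = 1.0; one mis-ranked pair lowers it
print(auc([0.9, 0.8], [0.1, 0.2]))
print(auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))
```

Under this reading, a test AUC near 0.8 means roughly four out of five positive/negative pairs are ranked correctly by the model.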
The hyperparameter values of the best-performing DFNN models learned from 5-year, 10-year, and 15-year datasets, respectively (LSM: Lynn Sage Dataset for Metastasis).
| Hyperparameter Values of the Best-Performing Model | LSM-5 Year | LSM-10 Year | LSM-15 Year |
|---|---|---|---|
| Number of hidden layers | 2 | 1 | 3 |
| Number of hidden nodes | {75, 75} | {75} | {300, 300, 300} |
| Kernel initializer | he_normal | he_normal | he_normal |
| Optimizer | SGD | SGD | SGD |
| Learning rate | 0.005 | 0.01 | 0.005 |
| Momentum | 0.9 | 0.9 | 0.9 |
| Iteration-based decay | 0.01 | 0.01 | 0.01 |
| Dropout rate | 0.5 | 0.5 | 0.5 |
| Epochs | 100 | 100 | 100 |
| L1 (Lebesgue 1) | 0 | 0 | 0 |
| L2 (Lebesgue 2) | 0.008 | 0.008 | 0.008 |
| L1 and L2 combined | No | No | No |
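All three best models use SGD with momentum 0.9 and iteration-based decay 0.01. One common time-based schedule (as in classic Keras SGD; that the paper uses exactly this formula is an assumption) divides the initial learning rate by a factor that grows with each update:

```python
def decayed_lr(lr0, decay, step):
    # Time-based decay: lr_t = lr0 / (1 + decay * t)
    return lr0 / (1 + decay * step)

# With the 5-year model's settings (lr0 = 0.005, decay = 0.01),
# the learning rate halves after 100 updates
print(decayed_lr(0.005, 0.01, 0), decayed_lr(0.005, 0.01, 100))
```

The schedule leaves early training close to the tuned rate while steadily damping later updates, which complements the dropout and L2 settings in the table.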
Experiment time per model per dataset, number of models trained, and total experiment time (#: number; LSM: Lynn Sage Dataset for Metastasis; DFNN: Deep feedforward neural network; NB: Naïve Bayes; LR: Logistic regression; DT: Decision tree; SVM: Support vector machine; LASSO: Least absolute shrinkage and selection operator; KNN: K-nearest neighbor; RF: Random forest; ADB: AdaBoost; XGB: XGBoost).
| Method | LSM-5 (Sec) | LSM-10 (Sec) | LSM-15 (Sec) | # of Models Trained | Total Time (Days) |
|---|---|---|---|---|---|
| DFNN | 117.430 | 45.021 | 20.212 | 24,111 | 50.974 |
| NB | 0.060 | 0.046 | 0.026 | 18,109 | 0.028 |
| LR | 0.563 | 0.353 | 0.253 | 22,399 | 0.303 |
| DT | 0.048 | 0.037 | 0.032 | 107,351 | 0.145 |
| LASSO | 0.860 | 0.372 | 0.189 | 1024 | 0.017 |
| SVM | 12.197 | 2.876 | 0.362 | 1799 | 0.321 |
| KNN | 1.636 | 0.436 | 0.132 | 42,341 | 1.080 |
| RF | 0.774 | 0.603 | 0.549 | 27,000 | 0.602 |
| ADB | 0.655 | 0.508 | 0.403 | 13 | 0.000 |
| XGB | 4.710 | 4.566 | 3.850 | 46,980 | 7.137 |
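The "Total Time" column is consistent with (# of models trained) × (sum of the three per-dataset times per model), converted from seconds to days. Verifying for three rows of the table:

```python
# (seconds per model on LSM-5/10/15, number of models trained), from the table
timing = {
    "DFNN": ((117.430, 45.021, 20.212), 24111),
    "NB": ((0.060, 0.046, 0.026), 18109),
    "XGB": ((4.710, 4.566, 3.850), 46980),
}

SECONDS_PER_DAY = 86400
total_days = {m: n * sum(secs) / SECONDS_PER_DAY for m, (secs, n) in timing.items()}
for m, d in total_days.items():
    print(f"{m}: {d:.3f} days")
```

This reproduces the reported 50.974 days for DFNN, 0.028 for NB, and 7.137 for XGB, and makes the cost gap concrete: a single DFNN grid point costs seconds to minutes, versus milliseconds for NB.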
Figure 2. ROC curves of the best-performing models of all methods for predicting 5-year metastasis (ROC: receiver operating characteristic; DFNN: Deep feedforward neural network; NB: Naïve Bayes; LR: Logistic regression; DT: Decision tree; SVM: Support vector machine; LASSO: Least absolute shrinkage and selection operator; KNN: K-nearest neighbor; RF: Random forest; ADB: AdaBoost; XGB: XGBoost).
Figure 3. ROC curves of the best-performing models of all methods for predicting 10-year metastasis (ROC: receiver operating characteristic; DFNN: Deep feedforward neural network; NB: Naïve Bayes; LR: Logistic regression; DT: Decision tree; SVM: Support vector machine; LASSO: Least absolute shrinkage and selection operator; KNN: K-nearest neighbor; RF: Random forest; ADB: AdaBoost; XGB: XGBoost).
Figure 4. ROC curves of the best-performing models of all methods for predicting 15-year metastasis (ROC: receiver operating characteristic; DFNN: Deep feedforward neural network; NB: Naïve Bayes; LR: Logistic regression; DT: Decision tree; SVM: Support vector machine; LASSO: Least absolute shrinkage and selection operator; KNN: K-nearest neighbor; RF: Random forest; ADB: AdaBoost; XGB: XGBoost).
Figure 5. Boxplots comparing the mean test AUCs of all methods (AUC: area under the ROC curve; DFNN: Deep feedforward neural network; NB: Naïve Bayes; LR: Logistic regression; DT: Decision tree; SVM: Support vector machine; LASSO: Least absolute shrinkage and selection operator; KNN: K-nearest neighbor; RF: Random forest; ADB: AdaBoost; XGB: XGBoost).
Figure 6. Side-by-side comparisons of the mean test AUCs of all methods when predicting 5-, 10-, and 15-year breast cancer metastasis (AUC: area under the ROC curve; DFNN: Deep feedforward neural network; NB: Naïve Bayes; LR: Logistic regression; DT: Decision tree; SVM: Support vector machine; LASSO: Least absolute shrinkage and selection operator; KNN: K-nearest neighbor; RF: Random forest; ADB: AdaBoost; XGB: XGBoost).
Figure 7. A comparison of the class imbalance of the LSM-5-year, 10-year, and 15-year datasets (LSM: Lynn Sage Dataset for Metastasis).