Ze Yu, Xuan Ye, Hongyue Liu, Huan Li, Xin Hao, Jinyuan Zhang, Fang Kou, Zeyuan Wang, Hai Wei, Fei Gao, Qing Zhai.
Abstract
Lapatinib is used to treat metastatic HER2-positive (HER2+) breast cancer. We aimed to establish a model that predicts the lapatinib dose using machine learning and deep learning techniques, based on a real-world study. A total of 149 breast cancer patients were enrolled at Fudan University Shanghai Cancer Center from July 2016 to June 2017. A sequential forward selection algorithm based on random forest was applied for variable selection. Twelve machine learning and deep learning algorithms were compared in terms of predictive ability: logistic regression, SVM, random forest, AdaBoost, XGBoost, GBDT, LightGBM, CatBoost, TabNet, ANN, Super TML, and Wide&Deep. TabNet, which performed best (accuracy = 0.82, AUC = 0.83), was chosen to construct the prediction model. Four variables strongly correlated with lapatinib dose were then ranked by importance score: treatment protocols, weight, number of chemotherapy treatments, and number of metastases. Finally, the confusion matrix was used to validate the model for the 1,250 mg lapatinib regimen (precision = 81%, recall = 95%) and for the 1,000 mg lapatinib regimen (precision = 87%, recall = 64%). In conclusion, we established a deep learning model that predicts lapatinib dose from important influencing variables selected from real-world evidence, with good predictive performance, to support an optimal individualized dose regimen.
Keywords: TabNet; breast cancer; deep learning; individualized medication model; lapatinib; machine learning; real-world study
Year: 2022 PMID: 35719963 PMCID: PMC9203846 DOI: 10.3389/fonc.2022.893966
Source DB: PubMed Journal: Front Oncol ISSN: 2234-943X Impact factor: 5.738
Figure 1. Workflow of data processing and model establishment.
Prediction performance of different algorithms with six-fold cross-validation.
| Algorithm | Dose regimenᵃ | Precision (mean ± std, P) | Recall (mean ± std, P) | F1 score (mean ± std, P) | Support | Accuracy (mean ± std, P) | AUC (mean ± std, P) |
|---|---|---|---|---|---|---|---|
| LR | 0 | 0.47 ± 0.34 (0.317) | 0.15 ± 0.11 (0.785) | 0.23 ± 0.16 (0.668) | 11 | 0.68 ± 0.06 (0.463) | 0.59 ± 0.11 (0.489) |
| | 1 | 0.65 ± 0.03 (0.482) | 0.91 ± 0.05 (0.317) | 0.78 ± 0.04 (0.409) | 19 | | |
| SVM | 0 | 0.92 ± 0.09 (0.317) | 0.31 ± 0.10 (0.585) | 0.42 ± 0.14 (0.48) | 11 | 0.71 ± 0.07 (0.405) | 0.30 ± 0.13 (0.408) |
| | 1 | 0.71 ± 0.06 (0.429) | 0.99 ± 0.02 (0.317) | 0.81 ± 0.04 (0.377) | 19 | | |
| RF | 0 | 0.81 ± 0.15 (0.317) | 0.42 ± 0.18 (0.525) | 0.56 ± 0.17 (0.484) | 11 | 0.75 ± 0.05 (0.424) | 0.79 ± 0.03 (0.397) |
| | 1 | 0.73 ± 0.05 (0.418) | 0.94 ± 0.04 (0.317) | 0.83 ± 0.03 (0.388) | 19 | | |
| AdaBoost | 0 | 0.81 ± 0.07 (0.431) | 0.42 ± 0.10 (0.435) | 0.58 ± 0.07 (0.305) | 11 | 0.76 ± 0.04 (0.418) | 0.77 ± 0.06 (0.363) |
| | 1 | 0.74 ± 0.06 (0.394) | 0.92 ± 0.09 (0.373) | 0.82 ± 0.09 (0.381) | 19 | | |
| XGBoost | 0 | 0.80 ± 0.14 (0.317) | 0.44 ± 0.07 (0.585) | 0.58 ± 0.09 (0.48) | 11 | 0.76 ± 0.05 (0.405) | 0.76 ± 0.04 (0.406) |
| | 1 | 0.74 ± 0.03 (0.429) | 0.93 ± 0.04 (0.317) | 0.83 ± 0.04 (0.377) | 19 | | |
| GBDT | 0 | 0.77 ± 0.17 (0.317) | 0.59 ± 0.14 (0.467) | 0.67 ± 0.14 (0.4) | 11 | 0.79 ± 0.08 (0.368) | 0.81 ± 0.11 (0.357) |
| | 1 | 0.79 ± 0.06 (0.388) | 0.88 ± 0.06 (0.317) | 0.84 ± 0.06 (0.354) | 19 | | |
| LightGBM | 0 | 0.69 ± 0.16 (0.391) | 0.44 ± 0.07 (0.585) | 0.54 ± 0.11 (0.505) | 11 | 0.72 ± 0.08 (0.424) | 0.76 ± 0.06 (0.413) |
| | 1 | 0.74 ± 0.09 (0.434) | 0.87 ± 0.12 (0.343) | 0.80 ± 0.07 (0.391) | 19 | | |
| CatBoost | 0 | 0.84 ± 0.11 (0.317) | 0.53 ± 0.11 (0.467) | 0.65 ± 0.09 (0.4) | 11 | 0.79 ± 0.09 (0.368) | 0.78 ± 0.04 (0.362) |
| | 1 | 0.77 ± 0.04 (0.388) | 0.93 ± 0.05 (0.317) | 0.85 ± 0.04 (0.354) | 19 | | |
| TabNet | 0 | 0.87 ± 0.03 (0.317) | 0.64 ± 0.01 (0.525) | 0.73 ± 0.01 (0.461) | 11 | 0.82 ± 0.05 (0.405) | 0.83 ± 0.01 (0.397) |
| | 1 | 0.81 ± 0.02 (0.413) | 0.95 ± 0.03 (0.317) | 0.87 ± 0.02 (0.38) | 19 | | |
| ANN | 0 | 0.42 ± 0.11 (0.549) | 0.45 ± 0.15 (0.585) | 0.44 ± 0.13 (0.568) | 11 | 0.55 ± 0.06 (0.484) | 0.58 ± 0.07 (0.482) |
| | 1 | 0.67 ± 0.05 (0.453) | 0.54 ± 0.06 (0.43) | 0.62 ± 0.04 (0.442) | 19 | | |
| Super TML | 0 | 0.78 ± 0.11 (0.427) | 0.34 ± 0.11 (0.415) | 0.48 ± 0.09 (0.361) | 11 | 0.71 ± 0.05 (0.385) | 0.56 ± 0.07 (0.354) |
| | 1 | 0.70 ± 0.04 (0.323) | 0.93 ± 0.05 (0.367) | 0.80 ± 0.04 (0.381) | 19 | | |
| Wide&Deep | 0 | 0.69 ± 0.09 (0.549) | 0.43 ± 0.12 (0.585) | 0.54 ± 0.11 (0.568) | 11 | 0.72 ± 0.07 (0.328) | 0.74 ± 0.12 (0.389) |
| | 1 | 0.72 ± 0.03 (0.355) | 0.87 ± 0.04 (0.243) | 0.79 ± 0.04 (0.355) | 19 | | |
ᵃ Regimen of 1,000 mg lapatinib corresponds to “0,” and regimen of 1,250 mg lapatinib corresponds to “1.”
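Each cell in the performance table reports a metric as mean ± standard deviation over the six cross-validation folds. The sketch below shows one way to produce that summary from per-fold scores. The fold values are hypothetical, and because the source does not state whether a population or sample standard deviation was used, the function defaults to the population form (`statistics.pstdev`) and offers the sample form (`statistics.stdev`) as an option.

```python
from __future__ import annotations

import statistics


def mean_std(fold_scores: list[float], sample: bool = False) -> str:
    """Summarize per-fold scores as 'mean ± std' rounded to two decimals.

    sample=False uses the population standard deviation (divide by n);
    sample=True uses the sample standard deviation (divide by n - 1).
    Which form the original analysis used is an assumption here.
    """
    mu = statistics.mean(fold_scores)
    sd = statistics.stdev(fold_scores) if sample else statistics.pstdev(fold_scores)
    return f"{mu:.2f} ± {sd:.2f}"


# Hypothetical accuracies from six folds (not taken from the table).
fold_accuracy = [0.70, 0.75, 0.80, 0.85, 0.70, 0.80]
print(mean_std(fold_accuracy))               # → 0.77 ± 0.06
print(mean_std(fold_accuracy, sample=True))  # → 0.77 ± 0.06
```

With only six folds the two forms can differ noticeably for more dispersed scores, so the choice is worth stating when reporting results.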
Feature importance from TabNet with six-fold cross-validation.
| Feature | Importance (mean ± std) | P value |
|---|---|---|
| Treatment protocols | 0.47 ± 0.05 | 0.612 |
| Weight | 0.23 ± 0.07 | 0.345 |
| Number of chemotherapy treatments | 0.16 ± 0.05 | 0.472 |
| Number of metastases | 0.05 ± 0.09 | 0.247 |
Treatment protocols: protocol_1, combination regimen of lapatinib + capecitabine; protocol_2, combination regimen of paclitaxel + carboplatin + Herceptin + lapatinib; protocol_3, combination regimen of vinorelbine + lapatinib; protocol_4, other combination regimens.
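The feature ranking above orders variables by their mean importance across folds. A minimal sketch of that aggregation follows, assuming each fold yields one global importance value per feature. Each fold's vector is first normalized to sum to 1 (TabNet global importances are commonly reported on that scale; normalizing here simply makes the sketch independent of that detail), then features are sorted by mean importance. The feature names and per-fold values are hypothetical, and only three folds are shown for brevity.

```python
from __future__ import annotations

import statistics


def rank_features(
    fold_importances: list[dict[str, float]],
) -> list[tuple[str, float, float]]:
    """Return (feature, mean, population std) tuples sorted by mean importance."""
    # Normalize each fold so that its importances sum to 1.
    normalized = []
    for fold in fold_importances:
        total = sum(fold.values())
        normalized.append({name: value / total for name, value in fold.items()})

    rows = []
    for name in normalized[0]:
        values = [fold[name] for fold in normalized]
        rows.append((name, statistics.mean(values), statistics.pstdev(values)))
    # Highest mean importance first.
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical importances from three folds (illustrative only).
folds = [
    {"treatment_protocol": 0.50, "weight": 0.20, "n_chemo": 0.20, "n_metastases": 0.10},
    {"treatment_protocol": 0.40, "weight": 0.30, "n_chemo": 0.20, "n_metastases": 0.10},
    {"treatment_protocol": 0.45, "weight": 0.25, "n_chemo": 0.10, "n_metastases": 0.20},
]
for name, mean, std in rank_features(folds):
    print(f"{name}: {mean:.2f} ± {std:.2f}")
```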
Description of demographic and clinical characteristics.
| Categories | Variables | Cases (N = 149) | Missing rate |
|---|---|---|---|
| Lapatinib information | Initial dose regimenᵃ, n (%) | | 0 |
| | 0 | 55 (36.90%) | |
| | 1 | 94 (63.10%) | |
| Demographic information | Age, year, median (IQR) | 51 (42.0–58.0) | 0 |
| | Height, cm, median (IQR) | 160.2 (158.0–162.0) | 0 |
| | Weight, kg, median (IQR) | 58.3 (53.0–64.0) | 0 |
| | Age ≥ 52 years, n (%) | 77 (51.68%) | 0 |
| Drug combination | Prior use of anthracycline, n (%) | 101 (67.79%) | 0 |
| | Prior use of taxane, n (%) | 134 (89.93%) | 0 |
| | Prior use of platinum, n (%) | 63 (42.28%) | 0 |
| | Prior use of fluorouracil, n (%) | 75 (50.34%) | 0 |
| | Prior use of trastuzumab, n (%) | 136 (91.28%) | 0 |
| Physiopathological condition | Hypertension, n (%) | 18 (12.08%) | 0 |
| | Diabetes, n (%) | 6 (4.03%) | 0 |
| | Heart disease, n (%) | 5 (3.36%) | 0 |
| | Other underlying diseases, n (%) | 14 (9.40%) | 0 |
| | Postmenopausal, n (%) | 89 (59.73%) | 2.7% |
| Treatment information | Number of chemotherapy treatments, n (%) | | 0 |
| | <3 | 72 (48.32%) | |
| | ≥3 | 77 (51.68%) | |
| | Ki-67, median (IQR) | 38.1 (20.0–50.0) | 8.7% |
| | Prior endocrine therapy, n (%) | 54 (36.24%) | 0 |
| | ER, n (%) | | 0 |
| | 0 | 67 (44.97%) | |
| | 1 | 82 (55.03%) | |
| | PR, n (%) | | 0 |
| | 0 | 47 (31.54%) | |
| | 1 | 102 (68.46%) | |
| | Stage, n (%) | | 0 |
| | 2 | 7 (4.70%) | |
| | 3 | 29 (19.46%) | |
| | 4 | 113 (75.84%) | |
| | Operation, n (%) | | 0 |
| | 0 | 13 (8.72%) | |
| | 1 | 131 (87.92%) | |
| | 2 | 2 (1.34%) | |
| | ECOG, n (%) | | 2.0% |
| | 1 | 145 (97.32%) | |
| | 2 | 4 (2.68%) | |
| | Number of metastases, n (%) | | 0 |
| | 0 | 36 (24.16%) | |
| | 1 | 60 (40.27%) | |
| | 2 | 33 (22.15%) | |
| | 3 | 14 (9.40%) | |
| | 4 | 6 (4.03%) | |
| | Metastases, n (%) | | 0 |
| | <2 | 96 (64.43%) | |
| | ≥2 | 53 (35.57%) | |
| | Lung metastases, n (%) | 63 (42.28%) | 0 |
| | Liver metastases, n (%) | 40 (26.85%) | 0 |
| | Bone metastases, n (%) | 47 (31.54%) | 0 |
| | Brain metastases, n (%) | 27 (18.12%) | 0 |
| | Other metastases, n (%) | 35 (23.49%) | 0 |
| | Treatment protocolsᵇ, n (%) | | 0 |
| | Protocol_1 | 100 (67.11%) | |
| | Protocol_2 | 24 (16.11%) | |
| | Protocol_3 | 10 (6.71%) | |
| | Protocol_4 | 15 (10.07%) | |
| Safety and effectiveness | Safety and effectivenessᶜ, n (%) | | 0 |
| | 1 | 46 (30.87%) | |
| | 0 | 103 (69.13%) | |
ᵃ Regimen of 1,250 mg lapatinib corresponds to “1,” and regimen of 1,000 mg lapatinib corresponds to “0.”
ᵇ Protocol_1 indicates the combination regimen of lapatinib + capecitabine; protocol_2, paclitaxel + carboplatin + Herceptin + lapatinib; protocol_3, vinorelbine + lapatinib; and protocol_4, other combination regimens.
ᶜ Patients showing both safety and effectiveness are coded “1”; all other patients (showing only safety, only effectiveness, or neither) are coded “0.”
IQR, interquartile range; ER, estrogen receptors; PR, progesterone receptors; ECOG, Eastern Cooperative Oncology Group.
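The table footnotes define how the outcome and several categorical variables are coded. A minimal sketch of that coding for a single patient record follows. The field names (`dose_mg`, `protocol`, `n_chemo`, `safe`, `effective`) are hypothetical. Representing the treatment protocol as a single integer category code, rather than one-hot columns, is an assumption chosen because the feature-importance table lists treatment protocols as one variable; likewise, whether the model used the <3 / ≥3 split of chemotherapy treatments or the raw count is not stated in the source.

```python
# Footnote a: 1,000 mg -> 0, 1,250 mg -> 1.
DOSE_LABEL = {1000: 0, 1250: 1}
# Footnote b categories; the integer coding itself is an assumption.
PROTOCOL_CODE = {"protocol_1": 0, "protocol_2": 1, "protocol_3": 2, "protocol_4": 3}


def encode_patient(record: dict) -> dict:
    """Encode one hypothetical patient record into model-ready fields."""
    return {
        "dose_label": DOSE_LABEL[record["dose_mg"]],
        "protocol": PROTOCOL_CODE[record["protocol"]],
        # Descriptive table groups chemotherapy treatments as <3 vs >=3.
        "n_chemo_ge_3": int(record["n_chemo"] >= 3),
        # Footnote c: 1 only when the patient shows both safety and effectiveness.
        "safe_and_effective": int(record["safe"] and record["effective"]),
    }


example = {"dose_mg": 1250, "protocol": "protocol_2", "n_chemo": 4,
           "safe": True, "effective": False}
print(encode_patient(example))
# → {'dose_label': 1, 'protocol': 1, 'n_chemo_ge_3': 1, 'safe_and_effective': 0}
```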
Figure 2. F1 score of the RF model versus the number of ranked variables. RF, random forest.
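Figure 2 relates F1 to the number of variables retained. The greedy loop behind sequential forward selection can be sketched as follows: starting from an empty set, each step adds the candidate variable whose inclusion gives the highest score, and the score after every step is recorded, giving a curve comparable to the one plotted. In the study the score came from a random forest; here `toy_score` and its per-variable gains are hypothetical placeholders for, for example, the mean cross-validated F1 of a random forest trained on the subset. scikit-learn's `SequentialFeatureSelector` with `direction="forward"` implements the same greedy idea with a real estimator.

```python
from __future__ import annotations

from typing import Callable, Sequence


def sequential_forward_selection(
    candidates: Sequence[str],
    score: Callable[[list[str]], float],
) -> list[tuple[list[str], float]]:
    """Greedily add the variable that maximizes score(selected + [variable]).

    Returns the selected subset and its score after each step.
    """
    remaining = list(candidates)
    selected: list[str] = []
    history: list[tuple[list[str], float]] = []
    while remaining:
        # Evaluate every remaining candidate once and keep the best one.
        scored = [(score(selected + [c]), c) for c in remaining]
        best_score, best = max(scored, key=lambda pair: pair[0])
        selected.append(best)
        remaining.remove(best)
        history.append((list(selected), best_score))
    return history


# Hypothetical per-variable gains standing in for a random forest's
# cross-validated F1; the small penalty discourages extra variables.
TOY_GAIN = {"treatment_protocol": 0.30, "weight": 0.15, "n_chemo": 0.10,
            "n_metastases": 0.05, "age": 0.01}


def toy_score(subset: list[str]) -> float:
    return sum(TOY_GAIN[v] for v in subset) - 0.02 * len(subset)


history = sequential_forward_selection(list(TOY_GAIN), toy_score)
for subset, f1 in history:
    print(len(subset), subset[-1], round(f1, 2))
best_subset, best_f1 = max(history, key=lambda step: step[1])
```

Each step re-scores every remaining candidate, so the cost grows quadratically with the number of variables; with a random forest inside a cross-validation loop, that cost is usually the dominant concern.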
Figure 3. Confusion matrix of the TabNet model.
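The per-regimen precision and recall reported in the abstract follow directly from a confusion matrix. Precision for a class divides the correct predictions of that class by all predictions of that class (the column total), and recall divides them by all patients truly in that class (the row total). The sketch below computes both, plus F1, using the coding 0 = 1,000 mg and 1 = 1,250 mg. The counts are hypothetical: only the row totals were chosen to equal the support column of the performance table (11 and 19), and they are not the values shown in Figure 3.

```python
from __future__ import annotations


def per_class_metrics(cm: list[list[int]]) -> dict[int, dict[str, float]]:
    """Precision, recall, and F1 per class.

    cm[i][j] counts patients whose true class is i and predicted class is j.
    """
    n = len(cm)
    metrics = {}
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column total
        actual_k = sum(cm[k])  # row total
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[k] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


# Hypothetical counts; rows = true regimen, columns = predicted regimen.
cm = [[6, 5],   # true 1,000 mg (class 0): 11 patients
      [2, 17]]  # true 1,250 mg (class 1): 19 patients
for label, m in per_class_metrics(cm).items():
    print(label, {name: round(value, 2) for name, value in m.items()})
```

For these counts, class 0 has precision 6/8 = 0.75 and recall 6/11 ≈ 0.55, while class 1 has precision 17/22 ≈ 0.77 and recall 17/19 ≈ 0.89, illustrating how a minority class can show high precision but low recall.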