Jaesik Kim, Kyung-Ah Sohn, Jung-Hak Kwak, Min Jung Kim, Seung-Bum Ryoo, Seung-Yong Jeong, Kyu Joo Park, Hyun-Cheol Kang, Eui Kyu Chie, Sang-Hyuk Jung, Dokyoon Kim, Ji Won Park.
Abstract
BACKGROUND: Preoperative chemoradiotherapy (CRT) is a standard treatment for locally advanced rectal cancer (LARC). However, responses to preoperative CRT vary between patients. The aim of this study is to develop a machine learning-derived scoring system that uses blood features to predict the response to preoperative CRT in LARC.
Keywords: early-treatment blood features; machine learning; pathologic response; prediction; preoperative chemoradiotherapy; rectal cancer
Year: 2021; PMID: 34912724; PMCID: PMC8666428; DOI: 10.3389/fonc.2021.790894
Source DB: PubMed; Journal: Front Oncol; ISSN: 2234-943X; Impact factor: 6.244
Figure 1. Overview of the entire study. The best model is selected by evaluating candidate machine learning models over N repeats, and a novel scoring system with significant predictive power and simplicity is designed using the N best models. The order of important features is determined by generalized feature importance (β), and the significance of the score candidates is compared to find the optimal number of important features (K).
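The ranking and scoring steps summarized in the Figure 1 caption can be sketched as follows. This is a minimal, hypothetical sketch, not the study's code: it assumes the generalized feature importance β of a feature is its mean absolute coefficient across the N repeated best models, that the features are already standardized, and that the score for a given K is the sum of the top-K feature values weighted by their mean signed coefficients. The function names `rank_features` and `top_k_score` and the toy coefficients are illustrative only.

```python
from statistics import mean

def rank_features(coef_runs):
    """Rank features by an assumed generalized importance (beta): the mean
    absolute coefficient of each feature across the N repeated best models."""
    names = list(coef_runs[0])
    beta = {name: mean(abs(run[name]) for run in coef_runs) for name in names}
    # Most important first; ties broken by name so the order is deterministic.
    ranked = sorted(names, key=lambda name: (-beta[name], name))
    return ranked, beta

def top_k_score(patient, coef_runs, ranked, k):
    """Hypothetical score: sum of the top-k standardized feature values, each
    weighted by its mean signed coefficient across the repeated models."""
    score = 0.0
    for name in ranked[:k]:
        weight = mean(run[name] for run in coef_runs)
        score += weight * patient[name]
    return score

# Toy coefficients from two repeated models (hypothetical values).
coef_runs = [
    {"a": 0.5, "b": -0.25, "c": 0.25},
    {"a": 0.25, "b": -0.75, "c": 0.0},
]
ranked, beta = rank_features(coef_runs)
print(ranked)  # ['b', 'a', 'c']
print(top_k_score({"a": 2.0, "b": 1.0, "c": 4.0}, coef_runs, ranked, k=2))  # 0.25
```

Averaging absolute values for ranking but signed values for weighting keeps a feature that is consistently negative (like `b` above) near the top of the ranking while still letting it lower the score.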
Performance comparison of a total of 18 models over 1,000 repeats.
| Model | Training AUROC (mean ± std) | Training AUPRC (mean ± std) | Tuning AUROC (mean ± std) | Tuning AUPRC (mean ± std) | Validation AUROC (mean ± std) | Validation AUPRC (mean ± std) |
|---|---|---|---|---|---|---|
| Logistic regression | 0.6212 ± 0.0206 | 0.5352 ± 0.0279 | 0.5448 ± 0.0835 | 0.5301 ± 0.1062 | 0.5353 ± 0.0785 | 0.5325 ± 0.1039 |
| Ridge regression | 0.6210 ± 0.0205 | 0.5353 ± 0.0280 | 0.5458 ± 0.0836 | 0.5313 ± 0.1063 | 0.5339 ± 0.0784 | 0.5312 ± 0.1039 |
| Lasso regression | 0.5857 ± 0.0521 | 0.5762 ± 0.0848 | 0.5687 ± 0.0653 | 0.6035 ± 0.1137 | 0.5133 ± 0.0634 | 0.5675 ± 0.1311 |
| Gradient boosting | 0.9524 ± 0.0731 | 0.9479 ± 0.0776 | 0.5743 ± 0.0784 | 0.5369 ± 0.1035 | 0.5034 ± 0.0806 | 0.4917 ± 0.0956 |
| Random forest | 0.8594 ± 0.1227 | 0.8461 ± 0.1341 | 0.5791 ± 0.0775 | 0.5352 ± 0.1023 | 0.5123 ± 0.0774 | 0.4942 ± 0.0940 |
| Two-layer neural network | 0.6031 ± 0.0510 | 0.5366 ± 0.0441 | 0.6166 ± 0.0724 | 0.5922 ± 0.1077 | 0.5175 ± 0.0812 | 0.5144 ± 0.1006 |
| | | | | | | |
| Logistic regression | 0.6529 ± 0.0210 | 0.5593 ± 0.0282 | 0.5117 ± 0.0844 | 0.4875 ± 0.0993 | 0.5081 ± 0.0788 | 0.4961 ± 0.0974 |
| Ridge regression | 0.6487 ± 0.0205 | 0.5572 ± 0.0283 | 0.5303 ± 0.0830 | 0.5056 ± 0.1021 | 0.5108 ± 0.0787 | 0.5044 ± 0.0999 |
| Lasso regression | 0.5933 ± 0.0637 | 0.5956 ± 0.0824 | 0.5586 ± 0.0605 | 0.5995 ± 0.1178 | 0.4957 ± 0.0623 | 0.5587 ± 0.1404 |
| Gradient boosting | 0.9938 ± 0.0191 | 0.9930 ± 0.0217 | 0.5825 ± 0.0745 | 0.5432 ± 0.1039 | 0.5027 ± 0.0821 | 0.4972 ± 0.1001 |
| Random forest | 0.9779 ± 0.0517 | 0.9737 ± 0.0621 | 0.5858 ± 0.0780 | 0.5475 ± 0.1066 | 0.5280 ± 0.0812 | 0.5193 ± 0.1011 |
| Two-layer neural network | 0.8048 ± 0.1393 | 0.7687 ± 0.1628 | 0.6094 ± 0.0675 | 0.5726 ± 0.1025 | 0.5089 ± 0.0878 | 0.5068 ± 0.1010 |
| | | | | | | |
| Logistic regression w/FS | 0.6463 ± 0.0286 | 0.5770 ± 0.0348 | 0.4881 ± 0.0815 | 0.4772 ± 0.0954 | 0.4856 ± 0.0854 | 0.4852 ± 0.0955 |
| Ridge regression w/FS | 0.6450 ± 0.0276 | 0.5753 ± 0.0335 | 0.4954 ± 0.0810 | 0.4820 ± 0.0954 | 0.4827 ± 0.0868 | 0.4832 ± 0.0955 |
| Lasso regression | 0.6983 ± 0.1125 | 0.6915 ± 0.0604 | 0.5764 ± 0.0647 | 0.5961 ± 0.1110 | 0.5022 ± 0.0715 | 0.5471 ± 0.1293 |
| Gradient boosting w/FS | 0.9848 ± 0.0413 | 0.9815 ± 0.0489 | 0.5757 ± 0.0825 | 0.5392 ± 0.1052 | 0.4983 ± 0.0916 | 0.4919 ± 0.1028 |
| Random forest w/FS | 0.9338 ± 0.0967 | 0.9193 ± 0.1173 | 0.5502 ± 0.0840 | 0.5194 ± 0.1035 | 0.4910 ± 0.0893 | 0.4856 ± 0.0998 |
| Two-layer neural network | 0.6937 ± 0.1320 | 0.6416 ± 0.1496 | 0.5827 ± 0.0703 | 0.5531 ± 0.0989 | 0.4917 ± 0.0870 | 0.4879 ± 0.0955 |
| | | | | | | |
| Logistic regression w/FS | 0.6870 ± 0.0243 | 0.6032 ± 0.0281 | 0.5894 ± 0.0864 | 0.5508 ± 0.1087 | 0.5975 ± 0.0804 | 0.5681 ± 0.1052 |
| Ridge regression w/FS | 0.6711 ± 0.0227 | 0.5845 ± 0.0281 | 0.6400 ± 0.0800 | 0.5959 ± 0.1087 | 0.6322 ± 0.0771 | 0.5965 ± 0.1067 |
| Lasso regression | 0.6923 ± 0.0568 | 0.6357 ± 0.0415 | 0.6285 ± 0.0764 | 0.5913 ± 0.1082 | 0.5844 ± 0.0815 | 0.5663 ± 0.1113 |
| Gradient boosting w/FS | 0.9911 ± 0.0171 | 0.9898 ± 0.0194 | 0.5968 ± 0.0738 | 0.5616 ± 0.1027 | 0.5226 ± 0.0897 | 0.5160 ± 0.1034 |
| Random forest w/FS | 0.8629 ± 0.0948 | 0.8395 ± 0.1120 | 0.6189 ± 0.0763 | 0.5830 ± 0.1075 | 0.5857 ± 0.0836 | 0.5723 ± 0.1087 |
| Two-layer neural network | 0.7053 ± 0.0879 | 0.6484 ± 0.1040 | 0.6514 ± 0.0770 | 0.6126 ± 0.1084 | 0.5836 ± 0.0901 | 0.5636 ± 0.1084 |
| | | | | | | |
| Logistic regression w/FS | 0.7145 ± 0.0241 | 0.6283 ± 0.0315 | 0.5690 ± 0.0837 | 0.5303 ± 0.1025 | 0.5742 ± 0.0840 | 0.5451 ± 0.1036 |
| Ridge regression w/FS | 0.6982 ± 0.0228 | 0.6023 ± 0.0302 | 0.6242 ± 0.0807 | 0.5734 ± 0.1064 | 0.6151 ± 0.0819 | 0.5757 ± 0.1060 |
| Lasso regression | 0.7118 ± 0.0857 | 0.6697 ± 0.0677 | 0.6131 ± 0.0718 | 0.5862 ± 0.1058 | 0.5610 ± 0.0856 | 0.5558 ± 0.1134 |
| Gradient boosting w/FS | 0.9970 ± 0.0079 | 0.9965 ± 0.0091 | 0.6265 ± 0.0759 | 0.5875 ± 0.1038 | 0.5581 ± 0.0921 | 0.5427 ± 0.1065 |
| Random forest w/FS | 0.9295 ± 0.0772 | 0.9114 ± 0.0979 | 0.6300 ± 0.0789 | 0.5825 ± 0.1064 | 0.5918 ± 0.0830 | 0.5630 ± 0.1078 |
| Two-layer neural network | 0.8001 ± 0.1160 | 0.7583 ± 0.1411 | 0.6420 ± 0.0741 | 0.6015 ± 0.1054 | 0.5622 ± 0.0882 | 0.5346 ± 0.1042 |
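w/FS denotes “with feature selection”; AUROC, area under the receiver operating characteristic curve; AUPRC, area under the precision-recall curve; std, standard deviation.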
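Each table cell summarizes one metric over the 1,000 repeats as mean ± std. The sketch below shows, using only the Python standard library, how an AUROC cell could be computed: AUROC is taken as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case (ties count as one half), and the per-repeat values are then summarized. The use of the sample standard deviation is an assumption, AUPRC is not covered, and the helper names `auroc` and `summarize` and the toy data are hypothetical.

```python
from statistics import mean, stdev

def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def summarize(values):
    """Format per-repeat metrics as 'mean ± std' (sample std is an assumption)."""
    return f"{mean(values):.4f} ± {stdev(values):.4f}"

# Hypothetical validation-set AUROCs from three repeats.
runs = [
    auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]),  # 0.75
    auroc([1, 0, 1, 0], [0.8, 0.3, 0.6, 0.7]),  # 0.75
    auroc([1, 0], [0.5, 0.5]),                  # 0.5 (tie)
]
print(summarize(runs))  # 0.6667 ± 0.1443
```

The pairwise loop is quadratic in cohort size, which is acceptable for a cohort-sized validation split; a rank-based formula gives the same value faster on large data.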
Figure 2. P-values from testing the difference in the score between the ground-truth label groups as blood features are added to the scoring system in order of importance (most important first).
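The K-selection scan described in the Figure 2 caption can be sketched as below. This is a hypothetical illustration, not the study's procedure: the caption does not name the statistical test, so a two-sided Mann-Whitney U test (normal approximation, average ranks for ties, no tie correction in the variance) is assumed, and K is assumed to be chosen at the smallest P-value. The input `contrib` (each patient's weighted contribution from each feature, already in importance order) and all function names are illustrative.

```python
import math

def average_ranks(values):
    """1-based ranks with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U p-value, normal approximation without tie correction."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))

def scan_k(contrib, labels):
    """contrib[i][j] is patient i's weighted contribution from the j-th most
    important feature; returns {K: p-value} for scores built from the top K."""
    pvalues = {}
    for k in range(1, len(contrib[0]) + 1):
        scores = [sum(row[:k]) for row in contrib]
        group1 = [s for s, y in zip(scores, labels) if y == 1]
        group0 = [s for s, y in zip(scores, labels) if y == 0]
        pvalues[k] = mann_whitney_p(group1, group0)
    return pvalues

# Toy data: the first feature separates the groups; adding the second blurs them.
contrib = [[3, 0], [4, 0], [5, 0], [6, 0], [0, 4], [1, 4], [2, 4], [-1, 4]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
pvalues = scan_k(contrib, labels)
best_k = min(pvalues, key=pvalues.get)
print(pvalues, best_k)
```

In this toy case the single most important feature gives a P-value of about 0.02, while adding the second feature makes the two groups' scores identical (P = 1.0), so the scan keeps K = 1.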
Figure 3. Coefficients of the five most important early-CRT blood features in the ridge regression models across 1,000 repeats.
Figure 4. ROC curves for binary prediction of the true label by significant single blood features and systemic inflammatory and nutritional indicators at pre- or early-CRT, and by the RPS system. Values in parentheses are AUROCs.
Figure 5. Stratification of CRT outcomes by quartile grouping using the RPS system: (A) tumor regression grade (TRG), (B) overall downstaging, (C) T-downstaging, and (D) N-downstaging.
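The quartile stratification in Figure 5 can be sketched for a binary outcome such as downstaging. This is a minimal sketch under stated assumptions: quartile cut points are taken from the cohort's own score distribution with `statistics.quantiles` (default "exclusive" method), a score equal to a cut point falls into the higher group, and the outcome is coded 1 when achieved. The study's actual cut-point method is not given here, an ordinal outcome such as TRG would need a different summary, and the function names and data are hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles

def quartile_groups(scores):
    """Assign each score to quartile group 1-4 using cohort cut points.
    A score equal to a cut point falls into the higher group."""
    cuts = quantiles(scores, n=4)  # three cut points, default 'exclusive' method
    return [bisect_right(cuts, s) + 1 for s in scores]

def outcome_rate_by_group(groups, outcomes):
    """Fraction of patients with the outcome (coded 1) in each quartile group."""
    rates = {}
    for g in sorted(set(groups)):
        members = [o for grp, o in zip(groups, outcomes) if grp == g]
        rates[g] = sum(members) / len(members)
    return rates

# Hypothetical scores and binary outcomes (e.g. 1 = downstaging achieved).
scores = [1, 2, 3, 4, 5, 6, 7, 8]
outcomes = [0, 0, 0, 1, 1, 1, 1, 1]
groups = quartile_groups(scores)
print(groups)                                   # [1, 1, 2, 2, 3, 3, 4, 4]
print(outcome_rate_by_group(groups, outcomes))  # {1: 0.0, 2: 0.5, 3: 1.0, 4: 1.0}
```

A rising outcome rate from the lowest to the highest quartile is the kind of pattern a useful score would show; with real data the groups may be unequal in size when scores are tied.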