Hsin-Yao Wang¹, Chia-Hsun Hsieh², Chiao-Ni Wen¹, Ying-Hao Wen¹, Chun-Hsien Chen³, Jang-Jih Lu¹,⁴.
Abstract
BACKGROUND: Analytic measurement of serum tumour markers is one of the commonly used methods for cancer risk management in certain areas of the world (e.g. Taiwan). Recently, cancer screening based on multiple serum tumour markers has been frequently discussed. However, its risk-benefit outcomes appear unfavourable for patients because of low sensitivity and specificity. In this study, cancer screening models based on multiple serum tumour markers were designed using machine learning methods, namely support vector machine (SVM), k-nearest neighbour (KNN), and logistic regression (LR), to improve screening performance for multiple cancers in a large asymptomatic population.
Year: 2016 PMID: 27355357 PMCID: PMC4927114 DOI: 10.1371/journal.pone.0158285
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
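The abstract names k-nearest neighbour (KNN) as one of the screening classifiers. The sketch below shows, under stated assumptions, how a KNN screen over several serum markers could work; it is not the authors' implementation. The marker values, the choice of k = 3, and the z-score standardisation step are illustrative assumptions. Standardisation matters here because the markers sit on very different scales (in the male training set the mean AFP is 2176.31 ng/mL while the mean SCC is 0.71 ng/mL), so without it a single marker would dominate the distance.

```python
import math
from collections import Counter


def fit_scaler(rows):
    """Per-marker (mean, sample SD), used to z-score every marker."""
    n = len(rows)
    stats = []
    for col in zip(*rows):
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        stats.append((mean, sd or 1.0))  # guard against a constant marker
    return stats


def z_score(x, stats):
    return [(v - mean) / sd for v, (mean, sd) in zip(x, stats)]


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training subjects closest to `query` (Euclidean, z-scored)."""
    stats = fit_scaler(train_x)
    q = z_score(query, stats)
    neighbours = sorted(
        (math.dist(z_score(x, stats), q), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data: [CEA, CYFRA21-1, PSA]; label 1 = cancer, 0 = no cancer.
train_x = [
    [1.5, 1.2, 0.8], [2.0, 1.4, 1.0], [1.8, 1.3, 0.9],
    [6.0, 3.5, 12.0], [8.0, 4.0, 20.0], [5.5, 3.0, 9.0],
]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_x, train_y, [7.0, 3.8, 15.0]))  # 1
print(knn_predict(train_x, train_y, [1.7, 1.3, 0.9]))   # 0
```

The scaler is fitted on the training rows only and then applied to the query, so a new subject is measured against the training distribution rather than influencing it.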
Clinicopathological Information for the Training and Validation Sets.
| Variable | Training Set | Validation Set | p-value |
|---|---|---|---|
| **Male** | | | |
| No. of patients | 134 | 3097 | - |
| Age (yr) | 57.19 (57) ± 14.05 | 50.59 (50) ± 12.33 | <0.001* |
| AFP (ng/mL) | 2176.31 (3.44) ± 25124.34 | 77.83 (3.24) ± 4111.53 | 0.336 |
| CEA (ng/mL) | 6.40 (2.3) ± 26.45 | 2.23 (1.84) ± 2.09 | 0.07 |
| CA19-9 (U/mL) | 15.72 (7.23) ± 32.25 | 8.38 (5.00) ± 12.01 | 0.01* |
| CYFRA21-1 (ng/mL) | 2.07 (1.80) ± 1.15 | 1.68 (1.47) ± 0.98 | <0.001* |
| SCC (ng/mL) | 0.71 (0.40) ± 0.91 | 0.62 (0.40) ± 0.48 | 0.243 |
| PSA (ng/mL) | 15.06 (1.04) ± 140.91 | 1.34 (0.80) ± 2.24 | 0.262 |
| **Female** | | | |
| No. of patients | 116 | 3801 | - |
| Age (yr) | 51.16 (51) ± 12.44 | 48.25 (48) ± 11.57 | 0.014* |
| AFP (ng/mL) | 4.06 (3.09) ± 6.59 | 3.54 (2.96) ± 4.07 | 0.405 |
| CEA (ng/mL) | 5.88 (1.47) ± 28.70 | 1.85 (1.28) ± 17.56 | 0.136 |
| CA19-9 (U/mL) | 11.58 (6.72) ± 13.09 | 16.15 (6.43) ± 302.69 | 0.367 |
| CYFRA21-1 (ng/mL) | 1.92 (1.46) ± 2.21 | 1.40 (1.23) ± 0.84 | 0.014* |
| SCC (ng/mL) | 0.60 (0.40) ± 0.89 | 0.51 (0.30) ± 0.59 | 0.325 |
| CA125 (U/mL) | 17.45 (9.875) ± 23.79 | 14.54 (10) ± 24.62 | 0.197 |
| CA15-3 (U/mL) | 10.36 (8.65) ± 5.07 | 9.75 (8.50) ± 4.77 | 0.206 |
Data are presented as mean (median) ± standard deviation. Significant differences are denoted by * (P < .05).
Occult Cancer Tumour Types for the Training and Validation Sets.
| Tumour Type | Training Set: no. of tumours/total no. | Training Set (%) | Validation Set: no. of tumours/total no. | Validation Set (%) | p-value |
|---|---|---|---|---|---|
| **Male** | | | | | |
| Lung | 7/67 | 10.45% | 6/33 | 18.18% | 0.35 |
| Liver | 7/67 | 10.45% | 2/33 | 6.06% | 0.71 |
| Colorectal | 9/67 | 13.43% | 1/33 | 3.03% | 0.16 |
| Prostate | 14/67 | 20.90% | 3/33 | 9.09% | 0.17 |
| Thyroid | 4/67 | 5.97% | 2/33 | 6.06% | 1.00 |
| Gastric | 4/67 | 5.97% | 1/33 | 3.03% | 1.00 |
| Pancreas | 3/67 | 4.48% | 1/33 | 3.03% | 1.00 |
| Bile duct | 1/67 | 1.49% | 1/33 | 3.03% | 1.00 |
| Head & Neck | 5/67 | 7.46% | 5/33 | 15.15% | 0.29 |
| Urinary | 3/67 | 4.48% | 5/33 | 15.15% | 0.11 |
| Hematopoietic & lymphoid | 3/67 | 4.48% | 3/33 | 9.09% | 0.39 |
| Skin | 4/67 | 5.97% | 1/33 | 3.03% | 1.00 |
| CNS | 1/67 | 1.49% | 0/33 | 0.00% | 1.00 |
| Thymus | 0/67 | 0.00% | 1/33 | 3.03% | 0.33 |
| Unknown primary | 2/67 | 2.99% | 1/33 | 3.03% | 1.00 |
| **Female** | | | | | |
| Lung | 5/58 | 8.62% | 2/29 | 6.90% | 1.00 |
| Liver | 1/58 | 1.72% | 1/29 | 3.45% | 1.00 |
| Colorectal | 3/58 | 5.17% | 2/29 | 6.90% | 1.00 |
| Breast | 15/58 | 25.86% | 10/29 | 34.48% | 0.46 |
| Gynecologic | 13/58 | 22.41% | 7/29 | 24.14% | 1.00 |
| Thyroid | 7/58 | 12.07% | 6/29 | 20.69% | 0.34 |
| Gastric | 4/58 | 6.90% | 0/29 | 0.00% | 0.30 |
| Pancreas | 1/58 | 1.72% | 0/29 | 0.00% | 1.00 |
| Bile duct | 1/58 | 1.72% | 0/29 | 0.00% | 1.00 |
| Head & Neck | 2/58 | 3.45% | 0/29 | 0.00% | 0.55 |
| Urinary | 2/58 | 3.45% | 1/29 | 3.45% | 1.00 |
| Hematopoietic & lymphoid | 1/58 | 1.72% | 0/29 | 0.00% | 1.00 |
| Skin | 1/58 | 1.72% | 0/29 | 0.00% | 1.00 |
| Unknown primary | 2/58 | 3.45% | 0/29 | 0.00% | 0.55 |
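The table does not state which test produced the p-values. A two-sided Fisher's exact test on the 2×2 table (this tumour type vs. all other tumour types, training vs. validation) reproduces several of the reported values, for example 0.33 for thymus (0/67 vs. 1/33) and 0.30 for female gastric cancer (4/58 vs. 0/29). The sketch below assumes that test and uses only the standard library; it is an illustration, not the authors' code.

```python
from math import comb


def fisher_exact_two_sided(a, n1, c, n2):
    """Two-sided Fisher's exact test for a/n1 vs. c/n2.

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    k = a + c            # tumours of this type in both sets
    n = n1 + n2          # all tumours in both sets
    denom = comb(n, k)

    def prob(x):         # P(x of the k tumours fall in the first set)
        return comb(n1, x) * comb(n2, k - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, k - n2), min(k, n1)
    # Small relative tolerance so tables tied with the observed one are included.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(p, 1.0)


print(round(fisher_exact_two_sided(0, 67, 1, 33), 2))  # thymus: 0.33
print(round(fisher_exact_two_sided(4, 58, 0, 29), 2))  # female gastric: 0.3
print(round(fisher_exact_two_sided(4, 67, 2, 33), 2))  # male thyroid: 1.0
```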
Results of the Multivariate LR Analysis (Male).
| Variable | Coefficient | SE | Odds Ratio | 95% CI | p-value |
|---|---|---|---|---|---|
| AFP | .000 | .006 | 1.000 | .989–1.011 | 0.987 |
| CEA | .185 | .029 | 1.203 | 1.137–1.274 | <0.001* |
| CA19-9 | .005 | .002 | 1.005 | 1.001–1.009 | 0.020* |
| CYFRA21-1 | .409 | .063 | 1.505 | 1.331–1.701 | <0.001* |
| SCC | -.276 | .200 | .759 | .512–1.124 | 0.169 |
| PSA | .073 | .014 | 1.076 | 1.047–1.106 | <0.001* |
| Constant | -5.934 | .221 | .003 | - | <0.001* |
Significant differences are denoted by * (P < .05). (SE: standard error; CI: confidence interval)
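Each odds ratio in the table is the exponentiated coefficient, and the reported 95% CIs match the usual Wald interval exp(b ± 1.96·SE). The short check below recomputes the CEA row (b = .185, SE = .029) from the printed values; small discrepancies in other rows can arise because the printed coefficients are rounded.

```python
import math


def odds_ratio_wald_ci(coef, se, z=1.96):
    """Odds ratio exp(coef) with a Wald 95% CI exp(coef ± z*se)."""
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)


or_, lo, hi = odds_ratio_wald_ci(0.185, 0.029)   # CEA, male model
print(f"{or_:.3f} ({lo:.3f}-{hi:.3f})")           # 1.203 (1.137-1.274)
```

The same exponentiation gives the constant's odds of .003 from its coefficient of -5.934.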
Results of the Multivariate LR Analysis (Female).
| Variable | Coefficient | SE | Odds Ratio | 95% CI | p-value |
|---|---|---|---|---|---|
| AFP | .005 | .016 | 1.005 | .973–1.038 | 0.744 |
| CEA | -.003 | .003 | .997 | .991–1.003 | 0.282 |
| CA19-9 | .002 | .001 | 1.002 | .999–1.004 | 0.186 |
| CYFRA21-1 | .280 | .050 | 1.323 | 1.199–1.460 | <0.001* |
| SCC | .048 | .071 | 1.050 | .914–1.206 | 0.493 |
| CA125 | -.001 | .001 | .999 | .997–1.001 | 0.502 |
| CA15-3 | .038 | .017 | 1.039 | 1.006–1.074 | 0.020* |
| Constant | -5.799 | .235 | .003 | - | <0.001* |
Significant differences are denoted by * (P < .05). (SE: standard error; CI: confidence interval)
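A fitted logistic regression turns marker values into a score through the logistic function p = 1 / (1 + exp(-(b0 + Σ bᵢxᵢ))). The sketch below plugs in the male coefficients exactly as printed in the male table. It assumes the coefficients apply to raw marker values in the units of the clinicopathological table; this section does not say whether inputs were transformed, nor which probability cut-off was used to call a screen positive, so treat the output as an illustration of the model form only.

```python
import math

# Male model, coefficients as printed (AFP, CEA, CA19-9, CYFRA21-1, SCC, PSA).
MALE_COEFS = {
    "AFP": 0.000, "CEA": 0.185, "CA19-9": 0.005,
    "CYFRA21-1": 0.409, "SCC": -0.276, "PSA": 0.073,
}
MALE_INTERCEPT = -5.934


def predicted_probability(markers, coefs=MALE_COEFS, intercept=MALE_INTERCEPT):
    """Logistic function of the linear predictor; every marker in `coefs` is required."""
    logit = intercept + sum(b * markers[name] for name, b in coefs.items())
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical man whose marker values equal the male validation-set medians.
subject = {"AFP": 3.24, "CEA": 1.84, "CA19-9": 5.00, "CYFRA21-1": 1.47, "SCC": 0.40, "PSA": 0.80}
print(round(predicted_probability(subject), 4))
```

Because CYFRA21-1 has the largest positive coefficient, raising it while holding the other markers fixed increases the score the most per ng/mL.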
Fig 1. (a) Variable Selection (Male). Youden index values (expressed as Youden index + 1) for various combinations of tumour markers are displayed as the mean ± standard deviation for each combination (as indicated). Significant differences are denoted by ★ (P < .05). (b) Variable Selection (Female). Youden index values (expressed as Youden index + 1) for various combinations of tumour markers are displayed as the mean ± standard deviation for each combination (as indicated). Significant differences are denoted by ★ (P < .05).
Fig 2. (a) ROC Curves of the Various Machine Learning Models for Cancer Screening (Male). (b) ROC Curves of the Various Tumour Markers for Cancer Screening (Male). (c) ROC Curves of the Various Machine Learning Models for Cancer Screening (Female). (d) ROC Curves of the Various Tumour Markers for Cancer Screening (Female).
AUC Values of Various Classifiers and Tumour Markers for Cancer Screening (Male).
| Classifier/Tumour marker | Area under the curve | 95% CI |
|---|---|---|
| SVM | .726 | .621-.831 |
| KNN | .727 | .630-.825 |
| LR | .766 | .676-.856 |
| CYFRA21-1 | .657 | .562-.752 |
| CEA | .639 | .538-.741 |
| AFP | .607 | .507-.706 |
| CA19-9 | .599 | .498-.701 |
| PSA | .568 | .454-.682 |
| SCC | .514 | .418-.609 |
CI: confidence interval
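The AUC values summarise the ROC curves of Fig 2. This section does not state how the AUCs or their CIs were computed; a common empirical definition is the probability that a randomly chosen cancer case scores higher than a randomly chosen cancer-free subject, with ties counted as one half (the Mann-Whitney formulation). A minimal sketch with made-up scores:

```python
def empirical_auc(case_scores, control_scores):
    """P(score of a random case > score of a random control); ties count 1/2."""
    wins = 0.0
    for s in case_scores:
        for t in control_scores:
            if s > t:
                wins += 1.0
            elif s == t:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical scores for three cases and four controls.
cases = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.2, 0.1]
print(empirical_auc(cases, controls))  # 11/12 ≈ 0.917
```

An AUC of .5 means no discrimination, which is roughly where SCC (.514) sits in the male table. The pairwise loop is quadratic, but at the scale of this validation set (33 cases × 3064 controls) that is only about 10⁵ comparisons.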
AUC Values of the Various Classifiers and Tumour Markers for Cancer Screening (Female).
| Classifier/Tumour marker | Area under the curve | 95% CI |
|---|---|---|
| SVM | .650 | .529-.771 |
| KNN | .699 | .594-.804 |
| LR | .649 | .528-.770 |
| CYFRA21-1 | .651 | .530-.771 |
| SCC | .610 | .518-.703 |
| CA15-3 | .583 | .459-.708 |
| CA125 | .576 | .472-.679 |
| CA19-9 | .572 | .456-.688 |
| CEA | .531 | .394-.668 |
| AFP | .504 | .403-.605 |
CI: confidence interval
Performance of the Various Methods for Cancer Screening (Male).
| Sensitivity (95% CI) | Specificity (95% CI) | PPV (95% CI) | NPV (95% CI) | Youden Index (95% CI) |
|---|---|---|---|---|
| .758 (.612-.904) | .757 (.742-.772) | .032 (.020-.044) | .997 (.994-.999) | .514 (.403-.626) ** |
| .515 (.345-.686) | .862 (.850-.874) | .039 (.020-.057) | .994 (.991-.997) | .377 (.230-.524) ** |
| .485 (.315-.656) | .859 (.847-.871) | .036 (.019-.053) | .994 (.991-.997) | .344 (.197-.490) |
| .515 (.345-.686) | .851 (.838-.864) | .036 (.019-.052) | .994 (.991-.997) | .366 (.220-.511) |
The Youden index values of the SVM, KNN, and LR models were each compared with that of the combined test; significantly higher values are denoted by ** (P < .01). (PPV: positive predictive value; NPV: negative predictive value; CI: confidence interval)
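All five columns derive from the 2×2 screening table. The sketch below computes them from counts; the counts are a hypothetical reconstruction, not reported values: with 33 cancers among the 3097 men of the validation set, 25 true positives, 8 false negatives, 745 false positives, and 2319 true negatives are consistent with the first row (sensitivity .758, specificity .757, PPV .032, NPV .997, Youden index .514).

```python
from dataclasses import dataclass


@dataclass
class ScreeningCounts:
    tp: int  # cancer, screen positive
    fn: int  # cancer, screen negative
    fp: int  # no cancer, screen positive
    tn: int  # no cancer, screen negative

    def sensitivity(self):
        return self.tp / (self.tp + self.fn)

    def specificity(self):
        return self.tn / (self.tn + self.fp)

    def ppv(self):
        return self.tp / (self.tp + self.fp)

    def npv(self):
        return self.tn / (self.tn + self.fn)

    def youden(self):
        # Youden index J = sensitivity + specificity - 1
        return self.sensitivity() + self.specificity() - 1


# Hypothetical counts consistent with the first row of the male table.
c = ScreeningCounts(tp=25, fn=8, fp=745, tn=2319)
for name in ("sensitivity", "specificity", "ppv", "npv", "youden"):
    print(f"{name}: {getattr(c, name)():.3f}")
```

The counts also show why PPV stays near .03 despite sensitivity and specificity above .75: with a prevalence of about 1.1% (33/3097), false positives outnumber true positives roughly 30 to 1.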
Performance of the Various Methods for Cancer Screening (Female).
| Sensitivity (95% CI) | Specificity (95% CI) | PPV (95% CI) | NPV (95% CI) | Youden Index (95% CI) |
|---|---|---|---|---|
| .517 (.335-.699) | .816 (.804-.828) | .016 (.007-.025) | .996 (.994-.998) | .347 (.198-.500) ** |
| .655 (.482-.828) | .691 (.676-.706) | .021 (.013-.029) | .995 (.993-.998) | .333 (.213-.453) ** |
| .517 (.335-.699) | .758 (.744-.772) | .016 (.008-.024) | .995 (.992-.998) | .275 (.137-.414) ** |
| .345 (.172-.518) | .880 (.870-.890) | .022 (.009-.035) | .994 (.991-.997) | .225 (.073-.377) |
The Youden index values of the SVM, KNN, and LR models were each compared with that of the combined test; significantly higher values are denoted by ** (P < .01). (PPV: positive predictive value; NPV: negative predictive value; CI: confidence interval)
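This section does not name the interval method for these proportions. A normal-approximation (Wald) interval, p ± 1.96·sqrt(p(1 − p)/n), reproduces the first-row sensitivity interval of the female table (.335-.699) when the sensitivity is taken as 15 of the 29 cancers in the female validation set; the sketch below assumes this method. Clipping to [0, 1] is an added safeguard, since a Wald interval can otherwise extend past the valid range for proportions near 0 or 1.

```python
import math


def wald_ci(successes, n, z=1.96):
    """Normal-approximation 95% CI for a proportion, clipped to [0, 1]."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half), min(1.0, p + half)


p, lo, hi = wald_ci(15, 29)  # female table, first-row sensitivity
print(f"{p:.3f} ({lo:.3f}-{hi:.3f})")  # 0.517 (0.335-0.699)
```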
Relative Risk Reduction (RRR), Absolute Risk Reduction (ARR), and Absolute Risk Increase (ARI) of the Various Machine Learning Methods and the Combined Test (Male).
| RRR (95% CI) | ARR (95% CI) | ARI (95% CI) |
|---|---|---|
| .758 (.623-.845) | .008 (.004-.012) | .241 (.226-.256) |
| .515 (.317-.655) | .006 (.003-.008) | .137 (.124-.149) |
| .485 (.280-.632) | .005 (.003-.008) | .140 (.128-.152) |
| .515 (.317-.655) | .006 (.003-.008) | .148 (.135-.160) |
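This section does not define how RRR, ARR, and ARI were computed. The first male row (.758, .008, .241) is consistent with the following reading, which the sketch assumes: without screening every cancer goes undetected, so the baseline risk of undetected cancer is (TP + FN)/N; with screening only false negatives are missed, giving FN/N. Then ARR = TP/N, RRR = ARR divided by the baseline risk (which equals the sensitivity), and ARI = FP/N, the share of subjects exposed to a false-positive result. The counts (25 true positives, 8 false negatives, 745 false positives, 2319 true negatives among 3097 men) are a hypothetical reconstruction consistent with that row, not reported figures.

```python
def risk_measures(tp, fn, fp, tn):
    """RRR, ARR and ARI under the assumed undetected-cancer reading (not confirmed by the source)."""
    n = tp + fn + fp + tn
    risk_without = (tp + fn) / n   # every cancer undetected without screening
    risk_with = fn / n             # only false negatives undetected with screening
    arr = risk_without - risk_with  # equals tp / n
    rrr = arr / risk_without        # equals sensitivity
    ari = fp / n                    # false-positive harm added by screening
    return rrr, arr, ari


rrr, arr, ari = risk_measures(tp=25, fn=8, fp=745, tn=2319)
print(f"RRR {rrr:.3f}, ARR {arr:.3f}, ARI {ari:.3f}")  # RRR 0.758, ARR 0.008, ARI 0.241
```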
Relative Risk Reduction (RRR), Absolute Risk Reduction (ARR), and Absolute Risk Increase (ARI) of the Various Machine Learning Methods and the Combined Test (Female).
| RRR (95% CI) | ARR (95% CI) | ARI (95% CI) |
|---|---|---|
| .517 (.303-.665) | .004 (.002-.006) | .183 (.171-.195) |
| .655 (.478-.772) | .005 (.003-.007) | .306 (.291-.321) |
| .517 (.303-.665) | .004 (.002-.006) | .240 (.226-.254) |
| .345 (.086-.531) | .003 (.001-.005) | .119 (.109-.129) |