Chunyan Qiu1, Lingong Jiang1, Yangsen Cao1, Can Hu1, Yiyi Yu2, Huojun Zhang1.
Abstract
BACKGROUND: De novo metastasis of breast cancer is a complex clinical issue that remains to be characterized. This study was the first to construct and compare artificial neural network (ANN) and logistic regression (LR) models in order to identify important factors associated with the occurrence of de novo metastasis in invasive breast cancer.
Keywords: Artificial neural network (ANN); de novo metastatic disease; invasive breast cancer; logistic regression (LR)
Year: 2019 PMID: 35116736 PMCID: PMC8797980 DOI: 10.21037/tcr.2019.01.01
Source DB: PubMed Journal: Transl Cancer Res ISSN: 2218-676X Impact factor: 1.241
Figure 1. Flow chart of model construction and evaluation. LR, logistic regression; ANN, artificial neural network.
Patients’ characteristics stratified by de novo metastatic disease
| Variables | Non-metastasis (N=38,617) | Metastasis (N=2,282) | P value# |
|---|---|---|---|
| Race | | | <0.001 |
| White | 30,653 (79.4) | 1,727 (75.7) | |
| Black | 4,128 (10.7) | 374 (16.4) | |
| Asian | 3,539 (9.2) | 175 (7.7) | |
| Unknown | 297 (0.8) | 6 (0.3) | |
| Histology | | | <0.001 |
| Invasive ductal carcinoma | 30,272 (78.4) | 1,472 (64.5) | |
| Invasive lobular carcinoma | 3,475 (9.0) | 254 (11.1) | |
| Invasive tubular carcinoma | 248 (0.6) | 0 (0.0) | |
| Invasive mixed ductal, lobular carcinoma | 2,238 (5.8) | 93 (4.1) | |
| Mucinous carcinoma | 785 (2.0) | 10 (0.4) | |
| Other invasive carcinoma | 1,592 (4.1) | 451 (19.8) | |
| Paget’s disease | 7 (0.0) | 2 (0.1) | |
| Primary site | | | <0.001 |
| Nipple and central portion | 1,966 (5.1) | 138 (6.0) | |
| Upper-inner quadrant | 4,531 (11.7) | 132 (5.8) | |
| Lower-inner quadrant | 2,214 (5.7) | 70 (3.1) | |
| Upper-outer quadrant | 13,114 (34.0) | 483 (21.2) | |
| Lower-outer quadrant | 2,815 (7.3) | 88 (3.9) | |
| Axillary tail | 186 (0.5) | 14 (0.6) | |
| Overlapping | 8,169 (21.2) | 417 (18.3) | |
| Unknown | 5,622 (14.6) | 940 (41.2) | |
| Tumor grade | | | <0.001 |
| Well differentiated | 8,067 (20.9) | 124 (5.4) | |
| Moderately differentiated | 15,694 (40.6) | 697 (30.5) | |
| Poorly differentiated | 12,177 (31.5) | 818 (35.8) | |
| Undifferentiated, anaplastic | 174 (0.5) | 25 (1.1) | |
| Unknown | 2,505 (6.5) | 618 (27.1) | |
| Laterality | | | <0.001 |
| Left side | 19,482 (50.4) | 1,053 (46.1) | |
| Right side | 18,701 (48.4) | 1,069 (46.8) | |
| Bilateral | 9 (0.0) | 22 (1.0) | |
| Unknown | 425 (1.1) | 138 (6.0) | |
| Regional lymph node status | | | <0.001 |
| N0 | 24,806 (64.2) | 439 (19.2) | |
| N1 | 8,254 (21.4) | 696 (30.5) | |
| N2 | 2,111 (5.5) | 198 (8.7) | |
| N3 | 1,293 (3.3) | 612 (26.8) | |
| Nx | 2,153 (5.6) | 337 (14.8) | |
| ER status | | | <0.001 |
| Negative | 6,505 (16.8) | 493 (21.6) | |
| Positive | 30,096 (77.9) | 1,478 (64.8) | |
| Borderline | 36 (0.1) | 1 (0.0) | |
| Unknown | 1,980 (5.1) | 310 (13.6) | |
| PR status | | | <0.001 |
| Negative | 10,539 (27.3) | 806 (35.3) | |
| Positive | 25,725 (66.6) | 1,137 (49.8) | |
| Borderline | 114 (0.3) | 9 (0.4) | |
| Unknown | 2,239 (5.8) | 330 (14.5) | |
| Her-2 status | | | <0.001 |
| Negative | 29,313 (75.9) | 1,376 (60.3) | |
| Positive | 5,335 (13.8) | 448 (19.6) | |
| Borderline | 908 (2.4) | 70 (3.1) | |
| Unknown | 3,061 (7.9) | 388 (17.0) | |
| Tumor size (mm) | 22.38±26.03 | 42.48±38.11 | <0.001 |
| Bloom-Richardson (Nottingham) score | 6.31±1.53 | 6.53±1.18 | <0.001 |
| Number of positive ipsilateral axillary lymph nodes | 1.17±3.07 | 2.19±4.02 | <0.001 |
| Lymph node ratio (LNR) | 0.11±0.21 | 0.24±0.28 | <0.001 |
Data are presented as n (%) or mean ± std. #, P values for categorical variables were calculated with the chi-square test; P values for continuous variables were calculated with the t-test.
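As an illustration of the chi-square test named in the footnote, the following minimal sketch recomputes the Pearson statistic for the Race rows of the table using only the Python standard library. The function name `chi_square_statistic` is hypothetical, and the critical value 16.266 (df = 3, alpha = 0.001) is taken from standard chi-square tables; this is not the authors' code, which is not shown in the source.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column variables.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


# Race counts from the table: [non-metastasis, metastasis].
race_counts = [
    [30653, 1727],  # White
    [4128, 374],    # Black
    [3539, 175],    # Asian
    [297, 6],       # Unknown
]

stat, dof = chi_square_statistic(race_counts)

# Critical value for df = 3 at alpha = 0.001 (standard table value).
CRITICAL_0_001_DF3 = 16.266
print(f"chi-square = {stat:.2f}, df = {dof}, "
      f"exceeds critical value: {stat > CRITICAL_0_001_DF3}")
```

A statistic above the critical value is consistent with the reported P value of <0.001 for Race.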
Comparison of the LR model and ANN models
| Variable | LR models (mean ± std) | ANN models (mean ± std) |
|---|---|---|
| Sensitivity | 99.5%±0.1% | 83.1%±0.9% |
| Specificity | 16.7%±2.3% | 88.0%±2.9% |
| PPV | 95.4%±0.4% | 99.1%±0.2% |
| NPV | 66.6%±6.6% | 23.7%±1.9% |
| Accuracy | 95.0%±0.4% | 83.4%±0.8% |
| AUC | 0.844±0.011 | 0.917±0.01 |
LR, logistic regression; ANN, artificial neural network; PPV, positive predictive value; NPV, negative predictive value; AUC, area under the receiver operating characteristic (ROC) curve.
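The threshold-based metrics in the table above follow from the four cells of a confusion matrix. The sketch below shows the standard definitions; the function name `diagnostic_metrics` and the example counts are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts, for illustration only (not study data).
example = diagnostic_metrics(tp=80, fp=10, tn=90, fn=20)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Because PPV and NPV depend on class prevalence, a model can combine high accuracy with low specificity when one class dominates, which is worth keeping in mind when reading the table.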
Multivariate logistic regression analysis of factors associated with de novo metastasis
| Variable | B | S.E. | Sig. | Exp(B) | 95% CI for Exp(B) |
|---|---|---|---|---|---|
| Race: black (contrast: white) | 0.179 | 0.072 | 0.013 | 1.196 | 1.037–1.375 |
| Histology: invasive lobular carcinoma (contrast: invasive ductal carcinoma) | 0.400 | 0.084 | <0.001 | 1.492 | 1.262–1.756 |
| Primary site: upper-outer quadrant (contrast: nipple and central portion) | −0.381 | 0.116 | 0.001 | 0.683 | 0.546–0.861 |
| Tumor grade: undifferentiated, anaplastic (contrast: well differentiated) | 1.491 | 0.287 | <0.001 | 4.440 | 2.474–7.657 |
| Laterality: bilateral (contrast: left side) | 2.783 | 0.473 | <0.001 | 16.176 | 6.665–43.674 |
| Regional lymph nodes status: N1 (contrast: N0) | 1.431 | 0.073 | <0.001 | 4.183 | 3.629–4.825 |
| ER status: ER (+) [contrast: ER(−)] | 0.244 | 0.086 | 0.005 | 1.276 | 1.078–1.509 |
| PR status: PR (+) [contrast: PR(−)] | −0.293 | 0.073 | <0.001 | 0.746 | 0.646–0.862 |
| Her-2 status: Her-2(+) [contrast: Her-2(−)] | 0.245 | 0.069 | <0.001 | 1.278 | 1.114–1.463 |
| Tumor size | 0.006 | 0.001 | <0.001 | 1.006 | 1.005–1.008 |
| Bloom-Richardson (Nottingham) score | −0.083 | 0.025 | <0.001 | 0.921 | 0.878–0.967 |
| Number of positive ipsilateral axillary lymph nodes | −0.165 | 0.011 | <0.001 | 0.848 | 0.830–0.866 |
| Lymph node ratio (LNR) | 0.525 | 0.121 | <0.001 | 1.691 | 1.332–2.136 |
B, coefficient value; S.E., standard error; Sig., significance (P value); Exp(B), odds ratio; CI, confidence interval.
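The Exp(B) column is the exponential of the coefficient B, and a Wald-type 95% confidence interval is commonly obtained as exp(B ± 1.96 × S.E.). The sketch below applies this to the first row of the table (Race: black); the function name `odds_ratio_with_ci` is hypothetical, and the Wald interval is an assumption about how the reported intervals were derived. Small differences from the published bounds are expected because B and S.E. are rounded to three decimals.

```python
import math


def odds_ratio_with_ci(b, se, z=1.96):
    """Odds ratio exp(b) with a Wald confidence interval exp(b +/- z * se)."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)


# First row of the table: Race, black vs. white (B = 0.179, S.E. = 0.072).
odds_ratio, ci_low, ci_high = odds_ratio_with_ci(0.179, 0.072)
print(f"OR = {odds_ratio:.3f}, 95% CI = {ci_low:.3f} to {ci_high:.3f}")
```

The result agrees with the reported Exp(B) of 1.196 and is close to the reported interval of 1.037–1.375.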
Figure 2. The structure of the ANN model. ANN, artificial neural network.
Figure 3. The normalized importance of input variables in predicting metastasis in the ANN model. ANN, artificial neural network.
Figure 4. Comparison of logistic regression and artificial neural network models. LR, logistic regression; ANN, artificial neural network; PPV, positive predictive value; NPV, negative predictive value; AUC, area under ROC curve. ***, P<0.001.
Experiment environment and processing time of LR and ANN models
| Aspects | LR models | ANN models |
|---|---|---|
| Experiment environment | | |
| System | Windows 7, 64-bit | |
| Processor | Intel® Core™ i5-6200U CPU @ 2.30 GHz | |
| RAM | 8.00 GB | |
| Software | R version 3.0.0 | |
| Processing time | 15 minutes for 10-fold cross-validation | 14,400 minutes for 10-fold cross-validation |
LR, logistic regression; ANN, artificial neural network.
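The 10-fold cross-validation referred to in the table can be sketched as follows: the data are split into ten disjoint folds, each fold serves once as the test set, and a metric is summarized as mean ± std across folds. The helper names `k_fold_indices` and `summarize`, the shuffling seed, and the use of the sample standard deviation are assumptions for illustration; the study's models were built in R and their code is not shown in the source.

```python
import random
import statistics


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


def summarize(scores):
    """Mean and sample standard deviation of per-fold scores."""
    return statistics.mean(scores), statistics.stdev(scores)


splits = list(k_fold_indices(n_samples=100, k=10))
print(len(splits), "folds;", len(splits[0][0]), "train /", len(splits[0][1]), "test")

# Hypothetical per-fold scores, for illustration only (not study data).
mean_score, std_score = summarize([0.90, 0.92, 0.91])
print(f"{mean_score:.3f} ± {std_score:.3f}")
```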