Chiwon Lee, Jung Chan Lee, Boyoung Park, Jonghee Bae, Min Hyuk Lim, Daehee Kang, Keun-Young Yoo, Sue K Park, Youdan Kim, Sungwan Kim.
Abstract
Breast cancer is the second most common cancer among Korean women, and its incidence rate has been increasing annually. If early diagnosis could be supported with epidemiologic data, women could easily assess their breast cancer risk over the internet. The National Cancer Institute in the United States has released a web-based Breast Cancer Risk Assessment Tool based on the Gail model. However, this tool cannot be applied directly to Korean women, because breast cancer risk depends on race, and its accuracy is low (58%-59%). In this study, breast cancer discrimination models for Korean women were developed using only epidemiologic case-control data (n = 4,574). The models were built with three classification techniques: support vector machine, artificial neural network, and Bayesian network. Repeated random sub-sampling validation (1,000 repetitions) was performed for each parameter condition, and performance was evaluated and compared using the area under the receiver operating characteristic curve (AUC). AUC, accuracy, sensitivity, specificity, and calculation time were calculated for all models and compared by age group and classification technique. Although the support vector machine took the longest calculation time, it achieved the highest classification performance, for women aged 50 yr or older (AUC = 64%). The proposed model relies on demographic characteristics, reproductive factors, and lifestyle habits, without any clinical or genetic test. The model is expected to be implementable as a web-based discrimination tool for breast cancer, and such a tool could encourage women at elevated risk to visit a hospital for diagnostic tests.
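The validation scheme in the abstract (repeated random sub-sampling, summarized by AUC) can be sketched in Python as below. This is a minimal illustration, not the authors' code: the 70/30 split, the stratification by case/control status (added so every test set contains both classes), and the names `auc`, `repeated_subsampling_auc`, and `fit` are assumptions. The AUC is computed in its rank-based form, that is, the probability that a randomly chosen case scores higher than a randomly chosen control.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple

# One record: (risk-factor vector, label) with label 1 = case, 0 = control.
Row = Tuple[Sequence[float], int]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC: P(case score > control score), counting ties as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def repeated_subsampling_auc(
    data: List[Row],
    fit: Callable[[List[Row]], Callable[[Sequence[float]], float]],
    n_repeats: int = 1000,
    train_fraction: float = 0.7,  # assumed split ratio (not given in the source)
    seed: int = 0,
) -> float:
    """Mean test-set AUC over n_repeats independent random train/test splits."""
    rng = random.Random(seed)
    cases = [r for r in data if r[1] == 1]
    controls = [r for r in data if r[1] == 0]
    n_case_train = int(len(cases) * train_fraction)
    n_ctrl_train = int(len(controls) * train_fraction)
    aucs = []
    for _ in range(n_repeats):
        # Stratified split (an assumption): shuffle each class separately, then cut.
        rng.shuffle(cases)
        rng.shuffle(controls)
        train = cases[:n_case_train] + controls[:n_ctrl_train]
        test = cases[n_case_train:] + controls[n_ctrl_train:]
        model = fit(train)  # fit returns a scoring function
        aucs.append(auc([model(x) for x, _ in test], [y for _, y in test]))
    return mean(aucs)


# Toy check: a "model" that ignores the training data and scores by feature sum.
toy = [([1.0, 1.0], 1)] * 10 + [([0.0, 0.0], 0)] * 10
print(repeated_subsampling_auc(toy, fit=lambda train: sum, n_repeats=50))
```

Passing a real learner (for example, an SVM or neural network trained inside `fit`) would mirror the structure of the procedure; the toy scorer here only checks the bookkeeping.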
Keywords: Breast Neoplasms; Computers; Neural Networks; Support Vector Machines
Year: 2015 PMID: 26240478 PMCID: PMC4520931 DOI: 10.3346/jkms.2015.30.8.1025
Source DB: PubMed Journal: J Korean Med Sci ISSN: 1011-8934 Impact factor: 2.153
Fig. 1. Incidence rates of breast cancer in 2008: Korean women vs. white women in the USA (14, 15).
Fig. 2. Artificial neural network (ANN) structure. AFFP, age of first full-term pregnancy; NOC, number of children; AOMn, age of menarche; BMI, body mass index; FMH, family medical history of breast cancer; MS, menopausal status; RM, regular mammography; RE, regular exercise; ED, estrogen duration.
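Fig. 2 shows the nine risk factors feeding an artificial neural network. As a rough illustration of how such a network turns those inputs into a risk score, the sketch below runs a forward pass through one hidden layer (tanh) and a sigmoid output. The hidden-layer size, activations, input scaling, and random weights are hypothetical; the source does not give these details, and training (for example, by backpropagation) is omitted.

```python
import math
import random
from typing import List, Sequence

# Input nodes as listed in Fig. 2.
INPUTS = ["AFFP", "NOC", "AOMn", "BMI", "FMH", "MS", "RM", "RE", "ED"]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class FeedForwardNet:
    """One tanh hidden layer and one sigmoid output unit (hypothetical sizes and weights)."""

    def __init__(self, n_inputs: int, n_hidden: int = 5, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Random placeholder weights; a real model would learn these from training data.
        self.w1: List[List[float]] = [[rng.gauss(0.0, 0.5) for _ in range(n_inputs)]
                                      for _ in range(n_hidden)]
        self.b1: List[float] = [0.0] * n_hidden
        self.w2: List[float] = [rng.gauss(0.0, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x: Sequence[float]) -> float:
        """Return the output-unit activation, read as a case probability."""
        if len(x) != len(self.w1[0]):
            raise ValueError("expected one input value per risk factor")
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        return sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)


net = FeedForwardNet(n_inputs=len(INPUTS))
x = [0.5, 0.2, 0.4, 0.3, 0.0, 1.0, 1.0, 0.0, 0.6]  # hypothetical inputs scaled to [0, 1]
print(f"case probability: {net.forward(x):.3f}")
```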
Fig. 3. Naive structure of a Bayesian network (BN).
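In a naive structure like that of Fig. 3, the class node (case vs. control) is the sole parent of every risk-factor node, so the factors are treated as conditionally independent given the class. A minimal sketch of such a classifier over categorical factors follows; the Laplace (add-one) smoothing, the class name `NaiveBayesNetwork`, and the toy data are assumptions, not details from the source.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Set


class NaiveBayesNetwork:
    """Naive-structure Bayesian network: class node -> each categorical risk factor."""

    def fit(self, X: List[Sequence[str]], y: List[int], alpha: float = 1.0) -> "NaiveBayesNetwork":
        self.alpha = alpha  # Laplace smoothing strength (assumed)
        self.n = len(y)
        self.class_counts = Counter(y)
        # counts[j][c][v]: number of class-c rows in which factor j equals v
        self.counts: List[Dict[int, Counter]] = [defaultdict(Counter) for _ in X[0]]
        self.values: List[Set[str]] = [set() for _ in X[0]]
        for row, c in zip(X, y):
            for j, v in enumerate(row):
                self.counts[j][c][v] += 1
                self.values[j].add(v)
        return self

    def log_joint(self, row: Sequence[str], c: int) -> float:
        """log P(class = c) + sum over j of log P(factor_j = row[j] | class = c)."""
        lp = math.log(self.class_counts[c] / self.n)
        for j, v in enumerate(row):
            num = self.counts[j][c][v] + self.alpha
            den = self.class_counts[c] + self.alpha * len(self.values[j])
            lp += math.log(num / den)
        return lp

    def predict_proba(self, row: Sequence[str]) -> float:
        """Posterior probability that the row is a case (class 1)."""
        l1, l0 = self.log_joint(row, 1), self.log_joint(row, 0)
        m = max(l1, l0)  # subtract the max for numerical stability
        return math.exp(l1 - m) / (math.exp(l1 - m) + math.exp(l0 - m))


# Hypothetical toy data with two factors: FMH (family history) and RM (mammography).
X = [["yes", "no"], ["yes", "no"], ["no", "no"],
     ["no", "regular"], ["no", "regular"], ["no", "regular"]]
y = [1, 1, 1, 0, 0, 0]
bn = NaiveBayesNetwork().fit(X, y)
print(round(bn.predict_proba(["yes", "no"]), 3))
```

The naive structure keeps fitting to simple counting, which is consistent with the BN models having the shortest calculation times in the results table.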
Major risk factors selected from the trial set of the Seoul Breast Cancer Study (SeBCS)
| Risk factors | Total set (n=4,574): cases | Total set (n=4,574): controls | U50 (n=2,622): cases | U50 (n=2,622): controls | O50 (n=1,952): cases | O50 (n=1,952): controls |
|---|---|---|---|---|---|---|
| Age of first full-term pregnancy, yr, No. (%) | ||||||
| No children | 208 (9.08) | 183 (8.02) | 157 (11.95) | 161 (12.31) | 51 (5.22) | 22 (2.26) |
| <24 (early pregnancy) | 459 (20.03) | 579 (25.36) | 154 (11.72) | 231 (17.66) | 305 (31.22) | 348 (35.69) |
| 24 to 28 | 1,191 (51.99) | 1,157 (50.68) | 708 (53.88) | 667 (50.99) | 483 (49.44) | 490 (50.26) |
| ≥28 (late pregnancy) | 433 (18.9) | 364 (15.94) | 295 (22.45) | 249 (19.04) | 138 (14.12) | 115 (11.79) |
| Number of children, No. (%) | ||||||
| 0 (no childbirth) | 239 (10.43) | 204 (8.94) | 178 (13.55) | 176 (13.45) | 61 (6.24) | 28 (2.87) |
| 1 to 2 | 1,458 (63.64) | 1,392 (60.97) | 993 (75.57) | 965 (73.78) | 465 (47.6) | 427 (43.8) |
| ≥3 (many childbirths) | 594 (25.93) | 687 (30.09) | 143 (10.88) | 167 (12.77) | 451 (46.16) | 520 (53.33) |
| Age of menarche, yr (mean±SD) | 15.03±1.75 | 15.23±1.82 | 14.57±1.55 | 14.69±1.67 | 15.67±1.81 | 15.96±1.77 |
| Age of menarche, yr, No. (%) | ||||||
| ≤15 (early menarche) | 1,492 (65.12) | 1,351 (59.18) | 1,006 (76.56) | 951 (72.71) | 486 (49.74) | 400 (41.03) |
| >15 (late menarche) | 799 (34.88) | 932 (40.82) | 308 (23.44) | 357 (27.29) | 491 (50.26) | 575 (58.97) |
| Body mass index (mean±SD) | 23.16±3.11 | 23.04±2.99 | 22.40±2.89 | 22.44±2.93 | 24.17±3.10 | 23.83±2.88 |
| Body mass index, No. (%) | ||||||
| <25 | 1,709 (74.6) | 1,753 (76.78) | 1,081 (82.27) | 1,087 (83.11) | 628 (64.28) | 666 (68.31) |
| 25 to 30 | 520 (22.7) | 486 (21.29) | 216 (16.44) | 202 (15.44) | 304 (31.11) | 284 (29.13) |
| ≥30 (obese) | 62 (2.7) | 44 (1.93) | 17 (1.29) | 19 (1.45) | 45 (4.61) | 25 (2.56) |
| Family medical history of breast cancer, No. (%) | ||||||
| Yes | 97 (4.23) | 52 (2.28) | 58 (4.41) | 30 (2.29) | 39 (3.99) | 22 (2.26) |
| No | 2,194 (95.77) | 2,231 (97.72) | 1,256 (95.59) | 1,278 (97.71) | 938 (96.01) | 953 (97.74) |
| Menopausal status, No. (%) | ||||||
| Premenopausal | 165 (7.2) | 295 (12.92) | 112 (8.52) | 165 (12.61) | 53 (5.42) | 130 (13.33) |
| Postmenopausal | 2,126 (92.8) | 1,988 (87.08) | 1,202 (91.48) | 1,143 (87.39) | 924 (94.58) | 845 (86.67) |
| Regular mammography, No. (%) | ||||||
| No | 1,275 (55.65) | 957 (41.92) | 709 (53.96) | 537 (41.06) | 566 (57.93) | 420 (43.08) |
| Regular | 1,016 (44.35) | 1,326 (58.08) | 605 (46.04) | 771 (58.94) | 411 (42.07) | 555 (56.92) |
| Regular exercise, No. (%) | ||||||
| No | 1,429 (62.37) | 1,283 (56.2) | 801 (60.96) | 737 (56.35) | 628 (64.28) | 546 (56) |
| Regular | 862 (37.63) | 1,000 (43.8) | 513 (39.04) | 571 (43.65) | 349 (35.72) | 429 (44) |
| Estrogen duration, yr (mean±SD) | 28.57±7.46 | 27.80±7.14 | 24.61±5.54 | 24.40±5.48 | 33.88±6.32 | 32.35±6.56 |
| Estrogen duration, yr, No. (%) | ||||||
| ≤10 | 13 (0.57) | 12 (0.53) | 11 (0.84) | 11 (0.84) | 2 (0.2) | 1 (0.1) |
| 10 to 15 | 61 (2.66) | 70 (3.07) | 56 (4.26) | 60 (4.59) | 5 (0.51) | 10 (1.03) |
| 15 to 20 | 230 (10.04) | 245 (10.73) | 217 (16.51) | 217 (16.59) | 13 (1.33) | 28 (2.87) |
| 20 to 25 | 393 (17.15) | 427 (18.7) | 343 (26.1) | 358 (27.37) | 50 (5.12) | 69 (7.08) |
| 25 to 30 | 560 (24.44) | 621 (27.2) | 432 (32.88) | 444 (33.94) | 128 (13.1) | 177 (18.15) |
| 30 to 35 | 626 (27.32) | 619 (27.11) | 247 (18.8) | 218 (16.67) | 379 (38.79) | 401 (41.13) |
| 35 to 40 | 296 (12.92) | 211 (9.24) | 8 (0.61) | 0 (0) | 288 (29.48) | 211 (21.64) |
| 40 to 45 | 66 (2.88) | 44 (1.93) | 0 (0) | 0 (0) | 66 (6.76) | 44 (4.51) |
| 45 to 50 | 32 (1.4) | 23 (1.01) | 0 (0) | 0 (0) | 32 (3.27) | 23 (2.36) |
| 50 to 55 | 7 (0.31) | 5 (0.22) | 0 (0) | 0 (0) | 7 (0.72) | 5 (0.51) |
| >55 | 7 (0.31) | 6 (0.26) | 0 (0) | 0 (0) | 7 (0.72) | 6 (0.62) |
U50, group under 50 yr old; O50, group 50 yr old or older.
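The risk-factor table above groups several continuous variables into categories. The sketch below shows one way raw questionnaire values might be mapped onto those categories. The bins follow the table, but adjacent categories share boundary values (for example, a first pregnancy at exactly 28 could fall in "24 to 28" or "≥28"), so half-open intervals such as [24, 28) are assumed here; the function names are hypothetical.

```python
from typing import Optional


def affp_category(age_first_pregnancy: Optional[float]) -> str:
    """Age of first full-term pregnancy (None = no children)."""
    if age_first_pregnancy is None:
        return "no children"
    if age_first_pregnancy < 24:
        return "<24"
    if age_first_pregnancy < 28:  # assumed half-open interval [24, 28)
        return "24 to 28"
    return ">=28"


def noc_category(n_children: int) -> str:
    """Number of children."""
    if n_children == 0:
        return "0"
    return "1 to 2" if n_children <= 2 else ">=3"


def menarche_category(age_at_menarche: float) -> str:
    """Age of menarche: early (<=15) vs. late (>15)."""
    return "<=15" if age_at_menarche <= 15 else ">15"


def bmi_category(weight_kg: float, height_m: float) -> str:
    """Body mass index in kg/m^2, binned as in the table."""
    bmi = weight_kg / height_m ** 2
    if bmi < 25:
        return "<25"
    return "25 to 30" if bmi < 30 else ">=30"  # assumed half-open interval [25, 30)


print(affp_category(26), noc_category(3), menarche_category(14), bmi_category(60, 1.6))
```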
Optimal combinations of risk factors. Accuracy, sensitivity, and specificity are presented as the mean and 95% confidence interval (CI) of the maximum values in each receiver operating characteristic (ROC) analysis. The area under the ROC curve (AUC) and the single iterative calculation time are presented by classification algorithm and age division model.
| CA | Age group | AFFP | NOC | AOMn | BMI | FMH | MS | RM | RE | ED | NSF | Accuracy (mean) | Sensitivity (mean) | Specificity (mean) | AUC (mean) | SICT (s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SVM | ALL | O | O | O | O | O | O | O | O | O | 9 | 0.6041 | 0.5506 | 0.6578 | 0.6213 | 16.1134 |
| SVM | U50 | O | O | O | O | X | O | O | O | O | 8 | 0.5944 | 0.6106 | 0.5781 | 0.6076 | 4.6627 |
| SVM | O50 | O | O | O | O | O | O | O | O | X | 8 | 0.6133 | 0.5871 | 0.6394 | 0.6415 | 2.5091 |
| ANN | ALL | O | O | O | O | O | O | O | O | O | 9 | 0.6013 | 0.5523 | 0.6505 | 0.6173 | 9.9536 |
| ANN | U50 | O | O | O | X | O | O | O | O | O | 8 | 0.5977 | 0.6096 | 0.5858 | 0.6060 | 5.4916 |
| ANN | O50 | O | O | O | O | O | X | O | O | O | 8 | 0.6230 | 0.5711 | 0.6750 | 0.6383 | 3.9561 |
| BN | ALL | O | X | O | O | O | O | O | O | O | 8 | 0.5948 | 0.5694 | 0.6204 | 0.6101 | 2.9548 |
| BN | U50 | O | X | O | X | O | O | O | O | O | 7 | 0.5928 | 0.6192 | 0.5663 | 0.6027 | 1.5560 |
| BN | O50 | X | O | O | O | O | O | O | O | O | 8 | 0.6117 | 0.5401 | 0.6833 | 0.6290 | 1.2727 |
CA, classification algorithm; AFFP, age of first full-term pregnancy; NOC, number of children; AOMn, age of menarche; BMI, body mass index; FMH, family medical history of breast cancer; MS, menopausal status; RM, regular mammography; RE, regular exercise; ED, estrogen duration; NSF, number of selected factors; SICT, single iterative calculation time; SVM, support vector machine; ANN, artificial neural network; BN, Bayesian network; ALL, all ages; U50, group under 50 years old; O50, group 50 years old or older; O, risk factor included in the model; X, risk factor not included in the model.
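The caption of the optimal-combination table reports accuracy, sensitivity, and specificity at the maximum value found in each ROC analysis. One plausible reading, sketched below, is to treat each observed score as a candidate threshold (one ROC point each), compute the three metrics, and keep the threshold with the highest accuracy. Whether the authors maximized accuracy or another criterion is not stated, so this choice, the function names, and the toy scores are assumptions.

```python
from typing import Dict, Sequence


def metrics_at(scores: Sequence[float], labels: Sequence[int], threshold: float) -> Dict[str, float]:
    """Classify score >= threshold as a case; return accuracy, sensitivity, specificity."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    return {
        "threshold": threshold,
        "accuracy": (tp + tn) / len(labels),
        "sensitivity": tp / (tp + fn),  # true-positive rate among cases
        "specificity": tn / (tn + fp),  # true-negative rate among controls
    }


def best_accuracy_point(scores: Sequence[float], labels: Sequence[int]) -> Dict[str, float]:
    """Sweep every observed score as a threshold and keep the most accurate point."""
    candidates = [metrics_at(scores, labels, t) for t in sorted(set(scores))]
    return max(candidates, key=lambda m: m["accuracy"])


# Hypothetical model scores for 4 cases (label 1) and 4 controls (label 0).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
print(best_accuracy_point(scores, labels))
```

When several thresholds tie on accuracy, `max` keeps the first (lowest) one; a different tie-breaking rule would shift the reported sensitivity/specificity balance.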
Fig. 4. Receiver operating characteristic (ROC) curves by classification algorithm and age division model. (A) Support vector machine (SVM). (B) Artificial neural network (ANN). (C) Bayesian network (BN).
Fig. 5. Contribution of individual risk factors to the area under the curve (AUC). AFFP, age of first full-term pregnancy; NOC, number of children; AOMn, age of menarche; BMI, body mass index; FMH, family medical history of breast cancer; MS, menopausal status; RM, regular mammography; RE, regular exercise; ED, estrogen duration; SVM, support vector machine; ANN, artificial neural network; BN, Bayesian network; U50, group under 50 yr old; O50, group 50 yr old or older.
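Fig. 5 reports how much individual risk factors contribute to the AUC. The source does not describe the exact procedure, so the sketch below assumes a common drop-one analysis: evaluate the model with all factors, re-evaluate without one factor, and record the difference. The `evaluate_auc` callback is a hypothetical stand-in for a full fit-and-validate run, and the toy increments are made up for illustration, not results from the study.

```python
from typing import Callable, Dict, FrozenSet, Sequence

FACTORS = ["AFFP", "NOC", "AOMn", "BMI", "FMH", "MS", "RM", "RE", "ED"]


def drop_one_contributions(
    factors: Sequence[str],
    evaluate_auc: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """For each factor f: AUC(all factors) - AUC(all factors except f)."""
    full = frozenset(factors)
    base = evaluate_auc(full)
    return {f: base - evaluate_auc(full - {f}) for f in factors}


# Hypothetical stand-in for a fit-and-validate run: each factor adds a
# made-up increment to a chance-level AUC of 0.5 (not values from the study).
TOY_GAIN = {f: 0.01 * (i + 1) for i, f in enumerate(FACTORS)}


def toy_evaluate_auc(subset: FrozenSet[str]) -> float:
    return 0.5 + sum(TOY_GAIN[f] for f in subset)


contrib = drop_one_contributions(FACTORS, toy_evaluate_auc)
for factor, delta in sorted(contrib.items(), key=lambda kv: -kv[1]):
    print(f"{factor}: {delta:+.3f}")
```

In a real run each call to `evaluate_auc` would repeat the full sub-sampling validation, so the analysis costs one extra validation per factor.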