
An app to classify a 5-year survival in patients with breast cancer using the convolutional neural networks (CNN) in Microsoft Excel: Development and usability study.

Cheng-Yao Lin^1,2,3, Tsair-Wei Chien^4, Yen-Hsun Chen^5, Yen-Ling Lee^6,7, Shih-Bin Su^8.

Abstract

BACKGROUND: Breast cancer (BC) is the most common malignant cancer in women. A predictive model is needed to predict the 5-year survival in patients with BC (5YSPBC) and to improve treatment quality by increasing their survival rate. However, no apps developed for medical practice to classify the 5YSPBC have been reported in the literature. This study aimed to build a model and develop an app for the automatic and accurate classification of the 5YSPBC.
METHODS: A total of 1810 patients with BC were recruited from the secondary data of a hospital in Taiwan, coded on 53 characteristic variables endorsed by professional staff clerks as of December 31, 2019. Five models (i.e., convolutional neural network [CNN], artificial neural network, Naïve Bayes, K-nearest Neighbors Algorithm, and Logistic regression) and 3 tasks (i.e., extraction of feature variables, model comparison in accuracy [ACC] and stability, and app development) were performed to achieve the goal of developing an app to predict the 5YSPBC. The sensitivity, specificity, and area under the receiver operating characteristic (ROC) curve of the models were compared across 2 scenarios: the training (70%) and testing (30%) sets. An app incorporating the estimated model parameters was developed to predict the 5YSPBC on a website.
RESULTS: We observed that the 15-variable CNN model yields higher ACC rates (0.87 and 0.86) with areas under ROC curves of 0.80 and 0.78 (95% confidence intervals 0.78–0.82 and 0.74–0.81) based on 1357 training and 540 testing cases, and an app for patients to predict the 5YSPBC was successfully developed and demonstrated in this study.
CONCLUSION: The 15-variable CNN model, with 38 parameters estimated using CNN to improve the ACC of the 5YSPBC classification, was demonstrated in Microsoft Excel. The app developed to help clinicians assess the 5YSPBC should be applied in clinical settings in the future.

Year: 2022 PMID: 35089226 PMCID: PMC8797502 DOI: 10.1097/MD.0000000000028697

Source DB: PubMed Journal: Medicine (Baltimore) ISSN: 0025-7974 Impact factor: 1.889

Using convolutional neural networks (CNN) to develop an app that classifies the 5-year survival of patients with breast cancer has not been reported in the literature. To support the effective implementation of shared decision-making in medicine (SDM) in healthcare settings, this study provides an app that physicians and technicians can use in the hospital as a communication tool when discussing procedures with patients and their family members. The approach used here to evaluate survival in patients with breast cancer (a CNN-based app predicting the 5YSPBC) can be applied to patients with other cancers in the future.

Introduction

Breast cancer (BC) is the most common malignant cancer in women. Over 173,541 articles were found in a search for “breast cancer” in the title, and 70 documents addressed prediction and survival rate as of September 18, 2020.[ However, none used the neural network technique to predict the survival of patients with BC several years (e.g., 5 years) after the initial diagnosis.

The 5-year survival in patients with BC

BC is the leading cause of cancer deaths in women, accounting for >1.6% of deaths worldwide,[ and seems to be the most common cause of death in female patients aged 40 to 55 years.[ Around 5.8 million women will die from BC by 2025, which is equivalent to 28 women being diagnosed every day and nearly 2000 deaths from BC every day in the United States.[ Building on a study[ that proposed a BC prediction and diagnosis system based on the Rough Set, this study examines whether a predictive model, along with an app for classifying the 5-year survival in patients with BC (5YSPBC), can be developed for use in clinical settings.

Factors are complex in the 5YSPBC prediction

BC incidence increases with age, and comorbidities significantly influence patients' treatment alternatives and overall survival.[ Prognostic factors and survival for patients with BC are important, but their causes are difficult to understand. The causes of BC survival are worth investigating and exploring, as is app development for clinicians and patients. Correct diagnosis and treatment together contribute to a good prognosis.[ Nevertheless, no treatment is medically guaranteed to be 100% effective for survival. An accurate classification of survival 5 years after BC surgery (and/or other factors) is therefore required to support shared decision-making in medicine (SDM), in which both the patient and the physician contribute to the decision-making process and agree on treatment decisions.[ In this way, healthcare providers can explain treatments and alternatives to patients and help patients and their family members choose the treatments that best align with their preferences as well as their unique cultural and personal beliefs.[ Accordingly, an app is required to assist SDM communication between patients and professional clinicians.

In recent years, with the progress of information technology, computer science, particularly machine learning, has been widely applied in medicine.[ Computer algorithms with high accuracy (ACC) in prediction can be designed and used by professionals for the timely detection of BC survival.[ Applying information technology to build empirical medicine, establish clinical treatment guidelines, and predict disease prognosis (or survival, as proposed in this study) is urgently needed to assist patients and physicians in clinical decision-making.[

Online classification using smartphones

The use of sophisticated modeling approaches based on machine learning (ML),[ such as artificial neural network (ANN),[ convolutional neural network (CNN),[ Naïve Bayes (NB),[ K-nearest Neighbors Algorithm (KNN),[ and logistic regression (LR),[ is required to extract meaningful information from biological datasets, and these approaches play an important role in healthcare settings.[ Whether CNN can achieve a higher prediction ACC than other algorithms in classifying the 5YSPBC, as in a study reporting a 7.14% increase in ACC with CNN,[ is worth studying and verifying. The first hypothesis, that CNN has a higher predictive ACC than other model algorithms, was therefore proposed for verification. As with all forms of web-based technology, advances in mobile health communication technology are rapidly increasing.[ Until now, no smartphone app to classify the 5YSPBC using ML has been proposed in the literature. Therefore, ML is used to create a model to predict the 5YSPBC and improve treatment quality by increasing the survival rate of patients with BC. The second hypothesis, that an app can be developed to help clinicians assess the 5YSPBC in hospital settings, was also proposed in this study.

Study aims

This study aimed to build an ML model and develop an app to automatically and accurately classify the 5YSPBC, thereby improving treatment quality and increasing the survival rate of patients with BC. Two hypotheses were tested: that CNN has a higher predictive ACC than other model algorithms, and that an app can be developed and designed to help clinicians assess the 5YSPBC.

Methods

Data source

A total of 1810 patients with BC were included in this study. The secondary data came from the cancer registry center of Liuying Chi-Mei hospital in southern Taiwan, with complete variable assessments evaluated by professional staff clerks as of December 31, 2019; the 53 characteristic variables are listed in Supplemental Digital Content 1. Feature variables were extracted from these 53 eligible variables. The coding scheme (assigned by professional staff clerks) is illustrated in Table 1 and Supplemental Digital Content 1. All missing data were filled in by the study team through discussion and professional judgment.

Table 1

Illustration of the coding schema of response in this study.

Variable and coding schema	n	%
Age
1. <40 yrs	261	14.42
2. 41–50 yrs	614	33.92
3. 51–60 yrs	512	28.29
4. >60 yrs	423	23.37
Pathologic M
1: M0	985	54.42
2: M1	33	1.82
3: M1b uncertain	706	39.01
4: M1c (found during or after surgery but not confirmed by pathology M1c)	1	0.06
5: Null	85	4.7
Clinical N
1: N0	1144	63.2
2: N1	381	21.05
3: N2	50	2.76
4: N2a	4	0.22
5: N3	19	1.05
6: N3a	2	0.11
7: N3b	3	0.17
8: N3c	12	0.66
9: Null	195	10.77
Clinical M
1: M0	1533	84.7
2: M1	105	5.8
3: Null	172	9.5
Cancer status
1: There is no evidence of this primary cancer	1488	82.21
2: There is such primary cancer clinically	322	17.79
Vital status
0: Non-survival	298	16.46
1: Survival	1512	83.54

Note. The coding schema is provided in Supplemental Digital Content 1.

This study was approved and monitored by the institutional review board (11004-L04) from Liuying Chi-Mei hospital. All patient identifiers were stripped before conducting this study.

Feature variables (Task 1)

A multidisciplinary team was established for this study on the artificial intelligence implementation of a BC survival prediction model, including physicians and specialists in BC, data scientists, information engineers, nurse practitioners (e.g., case-management clerks for patients with BC), and quality managers. Inclusion criteria were determined by the multidisciplinary team; patients without complete records in the first 5 years after diagnosis were excluded. Feature variables were extracted from the 53 items (shown in Supplemental Digital Content 1) by the multidisciplinary team using the Weka software[ (Orlando, FL, USA) with the following steps: standardize each variable to a mean of 0 and a standard deviation (SD) of 1; in the Select attributes panel, choose InfoGainAttributeEval as the Attribute Evaluator and Ranker as the Search Method; use the full training set; and select the suggested feature items.[ Forest plots[ were drawn to present the extracted feature variables. The standardized mean difference (SMD) method was used in the forest plot to compare differences in variables alone (as in the t test) and across hospital types (as in an analysis of variance).[ The Chi-square test was conducted to assess heterogeneity between variables. The forest plots (confidence interval [CI] plots) display the effect estimates and their CIs for each variable.
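The Weka ranking step can be approximated outside Weka. The following minimal Python sketch assumes the variables are already categorical codes (as in Table 1) and the label is vital status coded 0/1; it scores each variable by information gain, the quantity behind InfoGainAttributeEval, and sorts the scores in descending order as the Ranker search does. The function names are illustrative and are not part of Weka or the authors' module.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Reduction in label entropy after splitting on a categorical feature."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def rank_features(columns, labels):
    """Return (name, gain) pairs sorted from most to least informative."""
    scores = {name: information_gain(col, labels) for name, col in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy example (hypothetical data): 1 = survival, 0 = non-survival.
y = [1, 1, 1, 0, 0, 0]
columns = {
    "informative": [1, 1, 1, 2, 2, 2],    # code separates the classes perfectly
    "uninformative": [1, 2, 3, 1, 2, 3],  # each code holds one case of each class
}
print(rank_features(columns, y))
```

In this toy data the first variable carries 1 bit of information about the label and the second carries none, so a cut-off on the ranked list would keep only the first.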

Comparison of model accuracy and prediction stability in prediction and classification (Task 2)

Comparison of 5 proposed models

Five predictive models (i.e., ANN,[ CNN,[ NB,[ KNN,[ and LR[) were compared across 2 scenarios (i.e., accuracies in the training and testing sets, which account for 70% and 30% of the cases, respectively). All 5 modules, their algorithmic processes, and an MP4 abstract video are deposited in Supplemental Digital Content 1.

Model building and scenarios in comparison

This study focused on model ACC (e.g., >0.8) and prediction stability (or generalizability, i.e., the consistency between the training and test sets, e.g., >0.7) from various perspectives, such as model feasibility, efficacy, and efficiency, selecting the model with the maximum area under the receiver operating characteristic curve (AUC) when the training cases are used to predict the testing cases. The steps to create the prediction models and design the comparison scenarios are as follows:

Models in comparison

The 5 models were analyzed using the 2 scenarios mentioned above. A total of 1810 BC cases were randomly split into training and test sets in a proportion of 70% (n = 1267) to 30% (n = 543); the models fitted on the training set were used to predict the testing set. Accuracy (e.g., sensitivity [SENS], specificity [SPEC], ACC, precision [PREC], and AUC [>0.80]) and stability (or generalizability, assessed by the AUC obtained when the training-set model predicts the test set, e.g., AUC >0.70) were verified. The training and testing sets are provided in Supplemental Digital Content 1.
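The random 70/30 split can be sketched as follows. This is a minimal illustration, assuming cases are held in a Python sequence; the seed value is an arbitrary assumption and not the one used in the study.

```python
import random


def train_test_split(records, train_fraction=0.7, seed=2019):
    """Randomly split records into training and testing sets.

    The seed is a hypothetical choice that only makes the split reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


train, test = train_test_split(range(1810))
print(len(train), len(test))  # 1267 543
```

Rounding 0.7 × 1810 gives exactly the 1267/543 partition reported above; every case lands in exactly one of the two sets.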

Data presentations of model accuracy in models

Accuracy was determined by a high AUC along with other indicators, such as SENS, SPEC, and ACC. The definitions are listed as follows:
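The standard confusion-matrix definitions of these indicators can be sketched in Python, assuming survival (coded 1 in Table 1) is the positive class. The AUC is computed here as the probability that a randomly chosen survivor receives a higher score than a randomly chosen non-survivor, with ties counted as one half. This is an illustrative sketch, not the authors' Excel formulas.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix indicators; all denominators are assumed to be nonzero."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    sens = tp / (tp + fn)                 # sensitivity (recall)
    spec = tn / (tn + fp)                 # specificity
    prec = tp / (tp + fp)                 # precision
    f1 = 2 * prec * sens / (prec + sens)  # harmonic mean of PREC and SENS
    acc = (tp + tn) / len(pairs)          # overall accuracy
    return {"SENS": sens, "SPEC": spec, "PREC": prec, "F1": f1, "ACC": acc}


def auc(y_true, scores, positive=1):
    """AUC as P(score of a positive > score of a negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(classification_metrics([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 1]))
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

Because the AUC uses the ranking of predicted scores rather than a single threshold, it is less distorted by the 83.54%/16.46% class imbalance than ACC.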

Selection of prediction models referring to accuracy in algorithms

The accuracies in the training and testing sets were compared across the 5 models using the AUC to better understand their effectiveness and efficacy. That is, all comparisons are based on a high AUC rather than on ACC in Eq. (12), because the data are class-imbalanced (83.54% survival vs 16.46% non-survival; Table 1). With traditional approaches, class-imbalanced data are expected to yield high ACC rates with imbalanced SENS and SPEC.[ Accordingly, as in a previous study,[ we minimized the average model residuals in both classes (i.e., minimized the sum of the average residual in the survival group and the average residual in the non-survival group) to obtain balanced SENS and SPEC and to overcome the misleadingly high ACC rates. Notably, balanced SENS and SPEC are hard to obtain with professional ML software when the class sizes are imbalanced, unless the user controls the residual-minimization method, as in the module provided in Supplemental Digital Content 1.
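The residual-balancing objective described above can be sketched as follows. The source does not state which residual is used, so absolute residuals between the 0/1 label and the predicted probability are an assumption here. The toy example shows why the balanced objective matters on imbalanced data: a model that predicts 0.9 for every case has a smaller plain mean residual than a model that actually separates the classes, while the balanced objective ranks them the other way.

```python
def mean_residual(y_true, prob):
    """Plain average absolute residual over all cases (dominated by the majority class)."""
    return sum(abs(t - p) for t, p in zip(y_true, prob)) / len(y_true)


def balanced_residual(y_true, prob):
    """Average absolute residual in the survival group (1) plus that in the non-survival group (0)."""
    surv = [abs(t - p) for t, p in zip(y_true, prob) if t == 1]
    non = [abs(t - p) for t, p in zip(y_true, prob) if t == 0]
    return sum(surv) / len(surv) + sum(non) / len(non)


# Hypothetical imbalanced data: 9 survivors, 1 non-survivor.
y = [1] * 9 + [0]
trivial = [0.9] * 10            # ignores the minority class
separating = [0.8] * 9 + [0.2]  # distinguishes the two classes

print(mean_residual(y, trivial), mean_residual(y, separating))          # ~0.18 vs ~0.20
print(balanced_residual(y, trivial), balanced_residual(y, separating))  # ~1.0 vs ~0.4
```

Minimizing the balanced version (e.g., with a numerical optimizer over the model parameters) penalizes errors in the small non-survival group as heavily as errors in the large survival group.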

Developing an app for classifying the 5YSPBC (Task 3)

An app for classifying the 5YSPBC was designed and developed. The estimated model parameters were embedded in the app's computational module, and the classification result (i.e., survival or non-survival) appears instantly on the smartphone. The binary category probabilities (i.e., survival and non-survival) are visualized on a dashboard displayed on Google Maps.

Statistical tools and data analysis

International Business Machines Statistical Package for the Social Sciences 22.0 for Windows (SPSS Inc., Chicago, IL) and MedCalc 9.5.0.0 for Windows (MedCalc Software, Ostend, Belgium) were used to obtain descriptive statistics and frequency distributions among groups and to compute the model prediction indicators expressed in Eqs. (1)–(12). The significance level for type I errors was set at 0.05. All proposed models were implemented in Microsoft Excel (see Supplemental Digital Content 1). The classification was visualized with 2 curves based on the probability theory of the Rasch model.[ The 3 tasks and the data representations under the 5 proposed models used to obtain the results are shown in the study flowchart (Fig. 1).

Figure 1

Study flowchart.

Results

Demographic patient characteristics

A total of 1810 patients with BC (5-year survival rate of 83.54%, n = 1512) were recruited in this study. The largest groups were patients aged 41 to 50 years (33.92%, n = 614), with a body mass index of 18.5 to 24 (30.22%, n = 547), with a first malignant tumor or carcinoma in the patient's life (92.87%, n = 1681), with the primary site at the upper-outer quadrant of the breast (30.61%, n = 554), with pathologic stage IIA (22.04%, n = 399), and with pathologic M0 (54.42%, n = 985). Other demographic characteristics are included in Supplemental Digital Content 1.

Task 1: Feature variables extracted from data

Of the original 53 variables, 15 feature variables (Table 2) were extracted using the Weka software.[ Figure 2 shows the forest plot produced with the SMD method used in meta-analysis.[ The 3 variables in the top panel of Fig. 2 (denoted by P in the far-left column) tend toward higher values in the non-survival group, 9 variables (denoted by N in the far-left column) are negatively related to non-survival (i.e., the smaller, the better), and 3 variables are delay-effect indicators that are used not for predicting BC but for classifying BC survival, and they are also negatively related to non-survival. The overall effect shows a statistically significant difference between the survival and non-survival groups.
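A minimal sketch of the SMD computation behind a forest plot such as Fig. 2 follows. It assumes Cohen's d with a pooled standard deviation and the common large-sample approximation for its standard error; the study does not state which SMD variant was used, and the example data are hypothetical.

```python
import math
from statistics import mean, variance


def standardized_mean_difference(group_a, group_b):
    """Cohen's d: difference in group means divided by the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (
        n_a + n_b - 2
    )
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def smd_confidence_interval(d, n_a, n_b, z=1.96):
    """Approximate 95% CI for d using its large-sample standard error."""
    se = math.sqrt((n_a + n_b) / (n_a * n_b) + d ** 2 / (2 * (n_a + n_b)))
    return d - z * se, d + z * se


# Hypothetical coded values of one variable in the two outcome groups.
non_survival = [2, 4, 6]
survival = [1, 3, 5]
d = standardized_mean_difference(non_survival, survival)
print(d, smd_confidence_interval(d, len(non_survival), len(survival)))
```

Each row of a forest plot is one such d with its interval; a positive d here means the variable tends to be higher in the non-survival group.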

Table 2

The 15 variables extracted from the 53 eligible variables.

Category	No.	Variable	Definition
TNM	1	Pathologic M	Refers to whether there is distant metastasis
Characteristics	2	Body mass index (BMI)	kg/m²
TNM	3	Scope of regional lymph node surgery at this facility	The scope of simultaneous removal, biopsy, or aspiration of regional lymph nodes during the primary site operation or another independent operation in the reporting hospital
Treatment	4	Whether surgery of the primary site was performed	Clear surgical records and dates exist
TNM	5	Clinical N	Refers to whether there is regional lymph node metastasis and the extent of metastasis
TNM	6	Regional lymph nodes positive	Total number of regional lymph nodes found positive by a pathologist
TNM	7	Clinical M	Refers to whether there is distant metastasis
Treatment	8	Reason for no surgery of primary site	The reason why the primary site was not operated on in any medical institution
Cancer	9	Grade/Differentiation	Describes the similarity between tumor and normal tissues
Cancer	10	Tumor size	The largest size or diameter of the primary tumor
Treatment	11	Surgical margins of the primary site	The final state of the surgical margin after resection of the primary tumor
TNM	12	Clinical stage group	Determined from clinical T, N, and M to describe the degree of disease invasion at the anatomical site
Recurrence	13	Date of first recurrence	The time from the date of first diagnosis to the date on which the relapse diagnosis was confirmed
Recurrence	14	Type of first recurrence	The type of the first relapse recorded after a period of disease-free intermission or remission
Recurrence	15	Cancer status	Whether the case had cancer at the “last contact or death date”

Note. The coding schema is provided in Supplemental Digital Content 1.

Figure 2

Feature variable comparison between two groups (survival vs non-survival) using the forest plot.


Task 2: Comparisons of accuracies in training and test samples

When the 5 models were compared on the 1267 training and 543 testing cases, the KNN3 model (i.e., the 3 nearest neighbors, the neighborhood size with the highest ACC among all candidate sizes, with survival or non-survival determined by the mode of the neighbors' labels using the mode function in Microsoft Excel) had a higher AUC (0.85) than the others, indicating that KNN3 had the highest ACC in the training set. However, CNN had the highest prediction stability (AUC = 0.78) among the models (see the AUC in the testing rows of Table 3).
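The KNN3 rule can be sketched as follows, assuming Euclidean distance on standardized feature vectors (the distance metric is not stated in the source). Here `statistics.mode` plays the role of Excel's mode function, and the training points are hypothetical two-dimensional examples rather than the 15-variable data.

```python
import math
from statistics import mode


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by the most common label among its k nearest training cases."""
    # Sort (distance, label) pairs by Euclidean distance to the query.
    distances = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    nearest_labels = [y for _, y in distances[:k]]
    return mode(nearest_labels)


X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = [1, 1, 1, 0, 0, 0]  # 1 = survival, 0 = non-survival
print(knn_predict(X, y, (0.2, 0.2)))  # 1
print(knn_predict(X, y, (5.5, 5.5)))  # 0
```

With binary labels and an odd k such as 3, the majority vote cannot tie, which is one practical reason for preferring odd neighborhood sizes.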

Table 3

Comparison of predictive models across indicators of accuracy and AUC.

Model	n	Sensitivity	Specificity	Precision	F₁ score	Accuracy	AUC	95% CI
ANN
Training	1357	0.87	0.58	0.88	0.87	0.8	0.72	0.7–0.75
Testing	540	0.83	0.49	0.9	0.86	0.78	0.66	0.62–0.70
CNN
Training	1357	0.92	0.68	0.91	0.78	0.87	0.8	0.78–0.82
Testing	540	0.9	0.66	0.93	0.76	0.86	0.78	0.74–0.81
KNN3
Training	1357	0.93	0.77	0.94	0.93	0.89	0.85	0.83–0.87
Testing	540	0.87	0.6	0.93	0.9	0.82	0.73	0.69–0.77
LR
Training	1357	0.97	0.52	0.88	0.68	0.87	0.75	0.72–0.77
Testing	540	0.96	0.54	0.92	0.69	0.89	0.75	0.71–0.79
Bayes
Training	1357	0.83	0.61	0.88	0.7	0.78	0.72	0.69–0.74
Testing	540	0.72	0.76	0.94	0.74	0.72	0.74	0.70–0.78

The CNN model shows the highest model stability (AUC = 0.78) when the model fitted on the training set predicts the testing set.

ANN = artificial neural network, AUC = area under the receiver operating characteristic curve, CI = confidence interval, CNN = convolutional neural network, KNN = k-nearest neighbors algorithm, LR = logistic regression.


Task 3: App created to predict the 5YSPBC

An app to predict the 5YSPBC was developed for clinicians at Liuying Chi-Mei hospital in southern Taiwan (Fig. 3). Readers are invited to click on the link in the reference[ to try the app themselves. Notably, all 38 model parameters of the 15-variable CNN model are embedded in the app, which classifies a case as either non-survival− or survival+ once all 15 items have been answered.
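How embedded parameters turn 15 answers into a class can be illustrated with a small one-dimensional convolutional forward pass. This is a hypothetical sketch: the kernel width, number of kernels, global average pooling, sigmoid output, and 0.5 threshold are all assumptions and do not reproduce the authors' 38-parameter Excel model (two kernels of width 3 here give only 11 parameters).

```python
import math


def conv1d_relu(x, kernels, biases):
    """Valid 1-D convolution (cross-correlation) of x with each kernel, followed by ReLU."""
    maps = []
    for kernel, bias in zip(kernels, biases):
        k = len(kernel)
        fmap = [
            max(0.0, sum(w * v for w, v in zip(kernel, x[i:i + k])) + bias)
            for i in range(len(x) - k + 1)
        ]
        maps.append(fmap)
    return maps


def cnn_survival_probability(x, kernels, conv_biases, dense_weights, dense_bias):
    """Forward pass: convolution + ReLU -> global average pooling -> dense -> sigmoid."""
    maps = conv1d_relu(x, kernels, conv_biases)
    pooled = [sum(m) / len(m) for m in maps]  # one summary value per kernel
    logit = sum(w * p for w, p in zip(dense_weights, pooled)) + dense_bias
    return 1.0 / (1.0 + math.exp(-logit))


def classify(p, threshold=0.5):
    """Hypothetical decision rule mapping P(survival) to the app's two labels."""
    return "survival+" if p >= threshold else "non-survival-"


# Hypothetical standardized answers to the 15 items and made-up parameters.
x = [1.0] * 15
p = cnn_survival_probability(x, kernels=[[1.0, 1.0, 1.0]], conv_biases=[0.0],
                             dense_weights=[1.0], dense_bias=-2.0)
print(round(p, 4), classify(p))
```

In a deployed app, the parameters would be the values estimated during training and would remain fixed; only the 15 inputs change from patient to patient.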

Figure 3

Data entry and assessment result.

One example result is presented in Fig. 3 (right): the high probability of non-survival (0.71) is shown on the fail curve running from the top-left to the bottom-right corner, and the low probability of survival+ (0.29) is shown on the success curve. The two probabilities (survival+ and survival−) sum to 1.0. The odds are computed as P/(1 − P) = 0.29/0.71 = 0.41, indicating that the patient has a moderately high probability of, or tendency toward, non-survival. An app to predict the 5YSPBC was developed involving 56 models with 56 estimated parameters for website classification.
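The two curves follow the dichotomous Rasch model, in which P(success) = exp(θ − δ)/(1 + exp(θ − δ)) and P(fail) = 1 − P(success). The minimal sketch below assumes that θ is the case's logit location and that δ = 0; both are hypothetical parametrizations of how the app places a case on the curves. It reproduces the worked odds above.

```python
import math


def rasch_probabilities(theta, delta=0.0):
    """Dichotomous Rasch model: (P_success, P_fail) for location theta and difficulty delta."""
    p_success = 1.0 / (1.0 + math.exp(-(theta - delta)))
    return p_success, 1.0 - p_success


def odds(p):
    """Odds P / (1 - P) for a probability 0 < p < 1."""
    return p / (1.0 - p)


# Worked example from the text: P(survival+) = 0.29, P(non-survival) = 0.71.
print(round(odds(0.29), 2))  # 0.41

# Logit location at which the success curve reads 0.29 (with delta = 0).
theta = math.log(odds(0.29))
p_success, p_fail = rasch_probabilities(theta)

# Tabulate both curves over a logit range, e.g., for plotting on a dashboard:
# the success curve rises from left to right while the fail curve falls.
curves = [(t, *rasch_probabilities(t)) for t in range(-4, 5)]
```

Because the odds equal exp(θ − δ), an odds value below 1 (here 0.41) places the case to the left of the point where the two curves cross at 0.5.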

Online dashboards shown on Google Maps

Two QR codes shown in the figures (or links)[ are provided so that readers can interact with the dashboards themselves.

Discussion

Principal findings

We observed that the 15-variable CNN model yields higher ACC rates (0.87 and 0.86) with AUCs of 0.80 and 0.78 (95% CI 0.78–0.82 and 0.74–0.81) on the 1267 training and 543 testing cases, and an app to predict the 5YSPBC was successfully developed and demonstrated in this study. The aim of this study was thus achieved, and both hypotheses were supported: CNN has a higher predictive ACC than other model algorithms, and an app can be developed and designed to help clinicians assess the 5YSPBC.

Review of research findings

The incidence of BC increases with age. In patients with BC, comorbidities significantly influence treatment alternatives and overall survival.[ As such, prognostic factors are important for identifying BC survival. BC is a fairly complex disease in which to identify prognostic factors.[ Early diagnosis of BC may increase the chance of a treatment effect and is crucial to treatment outcomes.[ Understanding the prognostic factors, as we did in this study, can also help increase the 5YSPBC. Differences in prognosis are still observed even when patients with the same stage receive the same treatment.[ Accordingly, studying prognostic cancer factors (e.g., the 3 types of variables in Fig. 2) for clinical treatment can help not only to understand the degree of disease deterioration but also to provide patients with information about treatment effects and appropriate treatment strategies, ultimately improving patient survival. Improvements in BC mortality can be attributed to diagnosis (e.g., mammographic screening[) and treatment.[ We focused only on the feature variables of BC treatment in the first 5 years after diagnosis, using the variable assessments evaluated by professional staff clerks in the hospital. Most of the stage-specific survival improvement in women younger than 70 years is unexplained by tumor size and estrogen receptor (ER) status, suggesting a key role for treatment; in contrast, among women aged ≥70 years with local and regional stage, tumor size contributed importantly, and stratification by tumor size and ER status explained even more of the survival improvement in the first 5 years after diagnosis.[ Furthermore, 3.3% of deaths within 5 years of diagnosis were avoided among patients with cancer, with female BC accounting for the largest share, up to 28% (= 4822/17,041).[ As such, the feature variables extracted from the data (Fig. 2) are worth investigating.
SDM (i.e., both the patient and the physician contribute to the medical decision-making process and agree on treatment decisions[) can help patients and their family members choose the treatment option that best aligns with their preferences as well as their unique cultural and personal beliefs,[ using the app to simulate the alternatives and outcomes for patients with BC. Over 70 published articles had adopted tools to predict survival rates as of June 3, 2021.[ However, none has used a neural network technique to classify the 5YSPBC for patients after an initial diagnosis of BC. Different types of classification algorithms exist in ML,[ such as Support Vector Machines,[ NB,[ KNN,[ LR,[ ANN,[ and CNN.[ In terms of stability, the CNN model was the best classifier in this study, with an AUC of 0.78 on the 543 testing cases, higher than the other models.

Implications and applications of this study

CNN can improve prediction ACC (by up to 7.14%).[ In this study, CNN was a superior classifier to the others, although no differences among the 5 models were found when their 95% CIs were compared. No study in the literature has used the CNN approach to predict the 5YSPBC, which is a breakthrough and the first feature of this study. Over 1282 articles were found in PubMed Central (PMC) using the keyword “convolutional neural network” (Title) as of June 3, 2021, fewer than the 2048 articles found in PMC using the keyword “ANN” (Title). Until now, no one has used Microsoft Excel to perform CNN to classify the 5YSPBC. Interpretations of the CNN concept and process, and even the parameter estimation, are shown in studies[ and in Supplemental Digital Content 1; this is the second feature of this study and is rarely seen in the literature. The principle of focusing on the vital few rather than the trivial many is frequently emphasized in quality control.[ Therefore, the CNN-based app is proposed as the third feature, as shown in Fig. 3. Furthermore, category probability curves based on the Rasch rating scale model[ were used to interpret the classification in the app. Binary categories (e.g., success and failure on an assessment in the psychometric field) have been applied to health-related outcomes.[ However, none has provided an animation-type dashboard shown on Google Maps for patients to predict the 5YSPBC, as we did in Fig. 3.

Strengths of this study

We used the CNN algorithm, along with 38 model parameters, to design the routine of an app that predicts the 5YSPBC in hospitals, which has not been reported before in the literature. As with all forms of web-based technology, advances in health communication technology are rapidly emerging.[ The mobile online 5YSPBC app is promising, and the model is worth generalizing to many other fields of health assessment. The online app can quickly inform patients, their family members, or physicians with the cancer case manager about the 5YSPBC and prognostic factors, providing patients with information about treatment effects and appropriate treatment strategies and ultimately improving patient survival, as discussed for SDM in previous studies.[ The 5YSPBC app is also worth using to promote patient health literacy through the animation-type assessment on smartphones. Readers are recommended to click on the link in the reference[ and watch the MP4 video in Supplemental Digital Content 1. This 5YSPBC app might serve as an example of SDM[ for use in numerous disciplines in hospital settings. The CNN module in Microsoft Excel is unique and innovative. Users who are not familiar with CNN software (e.g., Python) can apply our Excel-VBA module to conduct CNN-related research in the future if the sample size is not large (e.g., fewer than 10,000 cases). The module is not limited to binary classification; multi-class classification can be achieved by adding layers to the CNN. Other types of self-assessment, such as predicting the 14-day hospital readmission of patients with pneumonia,[ predicting which active NBA players are most likely to be inducted into the Basketball Hall of Fame,[ and screening for BC,[ can apply the CNN model to predict and classify levels of harmfulness and disease in the future.

Limitations and suggestions

Our study has some limitations. First, the psychometric properties of the variables were validated for predicting the 5YSPBC after invalid variables were removed from the 53 original variables; however, no evidence supports the validity of the 15-variable set for use in our predictive models, although the correlations of the variables with the 5YSPBC were examined before they were included in the models. Hence, additional studies using other selection techniques to choose suitable feature variables (e.g., variables whose correlation with the survival/non-survival label reaches 0.1 or above[) are recommended. Second, ways to effectively increase the predictive ACC were not discussed. For instance, whether other feature variables applied to the CNN 5YSPBC model can increase the ACC rate is worth further study. Future studies are encouraged to look for other feature variables that are not included in the cancer registry database coded by professional clerks and that could improve the predictive power of the 5YSPBC model. Third, the study was based on patients treated in a single hospital in Taiwan, with all data collected from 2004 to 2019. If the environment or conditions change (e.g., other countries and hospitals), the results (e.g., the model parameters) are likely to differ from those of the current study. Fourth, based on the model ACC and prediction stability in Table 3, CNN rather than KNN was selected for app development, because KNN is strongly affected by the choice of neighborhood size and by simple majority voting within neighborhoods, especially when the sample size is small and outliers exist.[ CNN is thus reliable and viable in this study. Fifth, the sample size (n = 1810) is not large enough to further increase the model prediction accuracy; larger samples, not limited to BC, are required in future research.
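The correlation screen suggested in the first limitation can be sketched as follows. This is a minimal illustration, assuming Pearson correlation with the 0/1 label (which equals the point-biserial correlation) and an absolute-value threshold of 0.1; constant columns are treated as uncorrelated, and the example columns are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def screen_features(columns, labels, threshold=0.1):
    """Keep variables whose absolute correlation with the binary label is >= threshold."""
    return [name for name, col in columns.items() if abs(pearson_r(col, labels)) >= threshold]


labels = [1, 1, 0, 0]  # 1 = survival, 0 = non-survival
columns = {"a": [3, 4, 1, 2], "b": [1, 2, 1, 2]}
print(screen_features(columns, labels))  # ['a']
```

A correlation screen considers one variable at a time, so it can be used alongside, rather than instead of, the information-gain ranking applied in Weka.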
Sixth, the AUC is a common indicator for evaluating model accuracy. The phi coefficient (ϕ), with a large effect size (≥0.50) for the Chi-square test, could be a supplemental alternative to the AUC (≥0.75) in future related studies; see the results of a small study in Supplemental Digital Content 2. Seventh, readers might question why the treatment approach (e.g., the choice of radiotherapy or chemotherapy drugs, which may significantly affect the survival rate) was not included as a feature variable to assess the 5-year survival. The feature variables were extracted from the 53 items using the Weka software[ (see Feature variables (Task 1) in Methods). The treatment variables (e.g., #28 to #33 in Supplemental Digital Content 1) were not selected, implying that these treatment approaches are relatively less important than the 15 selected variables. It is worth validating these results further using the Weka software[ or other tools in future studies. Finally, the study examined the prediction ACC of 5 models (i.e., CNN, KNN, LR, NB, and ANN) across 2 scenarios (the training and testing sets). Numerous other classification algorithms are available in ML; whether another algorithm has higher ACC than CNN under equal conditions (e.g., 15 feature variables) should be verified in future 5YSPBC-related research.
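The phi coefficient mentioned above can be computed directly from a 2×2 confusion matrix, and for a 2×2 table without continuity correction the Pearson chi-square equals n·ϕ². The sketch below assumes a cell layout of predicted rows by observed columns; the counts are hypothetical.

```python
import math


def phi_coefficient(tp, fp, fn, tn):
    """Phi for a 2x2 table with rows (predicted +, predicted -) and columns (observed +, observed -)."""
    denom = math.sqrt((tp + fp) * (fn + tn) * (tp + fn) * (fp + tn))
    return (tp * tn - fp * fn) / denom


def chi_square_2x2(tp, fp, fn, tn):
    """Pearson chi-square (no continuity correction) via chi2 = n * phi^2."""
    n = tp + fp + fn + tn
    return n * phi_coefficient(tp, fp, fn, tn) ** 2


# Hypothetical counts: 40 true positives, 10 false positives, 10 false negatives, 40 true negatives.
phi = phi_coefficient(40, 10, 10, 40)
print(phi, phi >= 0.50)          # 0.6 True: a large effect by the >= 0.50 rule
print(chi_square_2x2(40, 10, 10, 40))  # 36.0
```

Unlike the AUC, ϕ depends on the chosen classification threshold, so the two indicators summarize different aspects of the same model.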

Conclusion

The features and contributions of this study are as follows: CNN was performed in Microsoft Excel, an online app was demonstrated that displays results on a visual dashboard using Google Maps, and category probability curves based on the Rasch model were observed in the CNN prediction model. The novel app with the CNN algorithm improves the predictive ACC of the 5YSPBC, and we expect it to help BC case managers assess the 5YSPBC in hospital settings in the future.

Acknowledgments

The authors thank Enago (www.enago.tw) for the English language review of this manuscript.

Author contributions

Tsair-Wei Chien developed the study concept and design. Cheng-Yao Lin, Yen-Hsun Chen, Shih-Bin Su, and Yen-Ling Lee analyzed and interpreted the data. Shih-Bin Su monitored the process of this study and helped in responding to the reviewers’ advice and comments. Tsair-Wei Chien drafted the manuscript, and all authors provided critical revisions for important intellectual content. The study was supervised by Shih-Bin Su. All authors read and approved the final manuscript. Conceptualization: Cheng-Yao Lin, Yen-Hsun Chen. Data curation: Yen-Ling Lee. Investigation: Shih-Bin Su. Methodology: Tsair-Wei Chien. Supervision: Shih-Bin Su.

References (46 in total)

1. UK and USA breast cancer deaths down 25% in year 2000 at ages 20-69 years.

Authors: R Peto; J Boreham; M Clarke; C Davies; V Beral
Journal: Lancet; Date: 2000-05-20

2. Survival and reduction in mortality from breast cancer. Impact of mammographic screening is not clear.

Authors: M Baum
Journal: BMJ; Date: 2000-12-09

3. Improvements in US Breast Cancer Survival and Proportion Explained by Tumor Size and Estrogen-Receptor Status.

Authors: Ju-Hyun Park; William F Anderson; Mitchell H Gail
Journal: J Clin Oncol; Date: 2015-07-20

4. Shared decision making: examining key elements and barriers to adoption into routine clinical practice. (Review)

Authors: France Légaré; Holly O Witteman
Journal: Health Aff (Millwood); Date: 2013-02

5. Clinical decision-making: predictors of patient participation in nursing care.

Authors: Jan Florin; Anna Ehrenberg; Margareta Ehnfors
Journal: J Clin Nurs; Date: 2008-11

6. Deep Learning for Health Informatics. (Review)

Authors: Daniele Ravi; Charence Wong; Fani Deligianni; Melissa Berthelot; Javier Andreu-Perez; Benny Lo; Guang-Zhong Yang
Journal: IEEE J Biomed Health Inform; Date: 2016-12-29

7. Sleep Quality Prediction From Wearable Data Using Deep Learning.

Authors: Aarti Sathyanarayana; Shafiq Joty; Luis Fernandez-Luque; Ferda Ofli; Jaideep Srivastava; Ahmed Elmagarmid; Teresa Arora; Shahrad Taheri
Journal: JMIR Mhealth Uhealth; Date: 2016-11-04

8. An App for Detecting Bullying of Nurses Using Convolutional Neural Networks and Web-Based Computerized Adaptive Testing: Development and Usability Study.

Authors: Shu-Ching Ma; Willy Chou; Tsair-Wei Chien; Huan-Fang Lee; Julie Chi Chow; Yu-Tsen Yeh; Po-Hsin Chou
Journal: JMIR Mhealth Uhealth; Date: 2020-05-20

9. An App for Classifying Personal Mental Illness at Workplace Using Fit Statistics and Convolutional Neural Networks: Survey-Based Quantitative Study.

Authors: Yu-Hua Yan; Tsair-Wei Chien; Willy Chou; Shu-Chen Hsing; Yu-Tsen Yeh
Journal: JMIR Mhealth Uhealth; Date: 2020-07-31

10. Comparison of Support Vector Machine, Naïve Bayes and Logistic Regression for Assessing the Necessity for Coronary Angiography.

Authors: Parastoo Golpour; Majid Ghayour-Mobarhan; Azadeh Saki; Habibollah Esmaily; Ali Taghipour; Mohammad Tajfard; Hamideh Ghazizadeh; Mohsen Moohebati; Gordon A Ferns
Journal: Int J Environ Res Public Health; Date: 2020-09-04
