Hanyin Wang, Yikuan Li, Seema A Khan, Yuan Luo.
Abstract
Distant recurrence of breast cancer is associated with high lifetime risk and low 5-year survival rates. Early prediction of distant recurrence could facilitate intervention and improve patients' quality of life. In this study, we designed an EHR-based predictive model to estimate the probability of distant recurrence in breast cancer patients. We studied the pathology reports and progress notes of 6,447 patients who were diagnosed with breast cancer at Northwestern Memorial Hospital between 2001 and 2015. Clinical notes were mapped to concept unique identifiers (CUIs) using natural language processing tools. Bag-of-words and pre-trained embeddings were employed to vectorize word and CUI sequences. These features, integrated with clinical features from structured data, were fed into conventional machine learning classifiers and a Knowledge-guided Convolutional Neural Network (K-CNN). The best configuration of our model yielded an AUC of 0.888 and an F1-score of 0.5. Our work provides an automated method to predict distant recurrence of breast cancer using natural language processing and deep learning approaches. We expect that better predictive performance could be achieved through more advanced feature engineering.
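The abstract reports two metrics that can diverge sharply when the positive class is rare: AUC measures how well scores rank recurrent above non-recurrent patients, while F1 depends on a specific decision threshold. The following is a minimal sketch of both computations on invented toy data; the function names, labels, scores, and the 0.5 threshold are illustrative assumptions and are not the authors' evaluation code or data.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_at_threshold(labels, scores, threshold=0.5):
    """F1 of the positive class after thresholding the scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented toy cohort: 2 recurrences (label 1) among 10 patients.
toy_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.4, 0.8, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05]

toy_auc = roc_auc(toy_labels, toy_scores)         # 14 of 16 pairs ranked correctly
toy_f1 = f1_at_threshold(toy_labels, toy_scores)  # precision 1/3, recall 1/2
print(f"AUC = {toy_auc:.3f}, F1 = {toy_f1:.3f}")
```

On this toy cohort the ranking is good (AUC 0.875) while F1 at the 0.5 threshold is only 0.4, because a few false positives weigh heavily against a small number of true positives.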
Keywords: Breast cancer; Distant recurrence; Entity embeddings; Knowledge-guided convolutional neural network; Word embeddings
Year: 2020; PMID: 33250149; PMCID: PMC7983067; DOI: 10.1016/j.artmed.2020.101977
Source DB: PubMed; Journal: Artif Intell Med; ISSN: 0933-3657; Impact factor: 5.326