Natural language processing and recurrent network models for identifying genomic mutation-associated cancer treatment change from patient progress notes.

Meijian Guan^1,2, Samuel Cho^1,3, Robin Petro^2, Wei Zhang^2,4, Boris Pasche^2,4, Umit Topaloglu^2,4.

Abstract

OBJECTIVES: Natural language processing (NLP) and machine learning approaches were used to build classifiers to identify genomic-related treatment changes in the free-text visit progress notes of cancer patients.
METHODS: We obtained 5889 deidentified progress reports (2439 words on average) for 755 cancer patients who had undergone clinical next generation sequencing (NGS) testing at Wake Forest Baptist Comprehensive Cancer Center. An NLP system was implemented to process the free-text data and extract NGS-related information. Three types of recurrent neural network (RNN), namely gated recurrent unit, long short-term memory (LSTM), and bidirectional LSTM (LSTM_Bi), were applied to classify documents into treatment-change and no-treatment-change groups. Further, we compared the performance of the RNNs to 5 machine learning algorithms: Naive Bayes, K-nearest Neighbor, Support Vector Machine for classification, Random forest, and Logistic Regression.
RESULTS: Our results suggested that, overall, RNNs outperformed traditional machine learning algorithms, and LSTM_Bi showed the best performance among the RNNs in terms of accuracy, precision, recall, and F1 score. In addition, pretrained word embedding can improve the accuracy of LSTM by 3.4% and reduce the training time by more than 60%.
DISCUSSION AND CONCLUSION: NLP and RNN-based text mining solutions have demonstrated advantages in information retrieval and document classification tasks for unstructured clinical progress notes.

Keywords: cancer; electronic health records; genomics; machine learning; natural language processing

Year: 2019 PMID: 30944913 PMCID: PMC6435007 DOI: 10.1093/jamiaopen/ooy061

Source DB: PubMed Journal: JAMIA Open ISSN: 2574-2531

INTRODUCTION

The advent of next generation sequencing (NGS) technologies and their continually declining costs have resulted in the accumulation of very large sets of genetic data and facilitated identification of actionable genetic alterations in different tumor types. The dramatic growth in the availability and affordability of such testing has also brought challenges, including the need to evaluate the effectiveness and actionability of genetic testing, which could be invaluable for assisting tumor diagnosis and prognosis and for directing patient treatment. Additionally, with the widespread use of electronic health record (EHR) systems in clinical care, clinically relevant information on cancer patients is available for secondary use in biomedical research, including comparative effectiveness, patient reported outcomes, clinical actionability of genomic profiling, and precision medicine. However, in contrast to structured data, a sizable percentage of patient data is unstructured (or semistructured), which makes it hard for machines and software to parse. Therefore, harnessing the potential of clinical narratives in the EHR requires strategies for efficient and automated information extraction and understanding. Natural language processing (NLP) and machine learning techniques could map unstructured text into structured (or semistructured) form and enable automatic identification and extraction of relevant information. Additionally, such an automated system would significantly reduce delays in EHR processing and allow more accurate extraction of embedded information. Many clinical NLP systems have been developed and widely adopted in biomedical settings, for example, the Mayo Clinical Text Analysis and Knowledge Extraction System (cTAKES), MetaMap, and Noble Tools.
However, these approaches mainly focus on utilizing medical vocabularies such as the Unified Medical Language System (UMLS) to perform concept recognition and information extraction. Many conventional machine learning algorithms have been used in clinical text mining; however, these models require human experts to encode domain knowledge through feature engineering, and have so far had mixed results modeling sequential events or time dependencies. More recently, multilayer neural networks, or deep learning, have been applied to gain actionable insights from heterogeneous clinical data. The major differences between deep learning and conventional neural networks (NNs) are the number of hidden layers and their capability to learn meaningful abstractions of the input. Deep learning has been applied to process aggregated EHR documents, including both structured (eg diagnosis, laboratory tests) and unstructured data (eg medical notes, images). Several studies used deep learning to predict diseases from patient clinical notes; for example, Cheng et al used a 4-layer convolutional neural network (CNN) to predict congestive heart failure and chronic obstructive pulmonary disease and showed promising performance. Word embedding, learned in an unsupervised manner, has proven to be a successful word representation method in numerous NLP tasks in recent years. Unlike traditional word representation methods, such as bag-of-words and one-hot encoding, word embedding can capture the semantic meanings of words within numeric vectors. Words that are semantically similar are closer to each other in the vector space, while words that are semantically different are farther apart.
Word embedding has been utilized extensively in biomedical named entity recognition (NER) tasks, such as medical synonym extraction, relation extraction including chemical-disease relations, drug-drug interactions, protein-protein interactions, biomedical information retrieval (IR), and medical abbreviation disambiguation. In this project, we explored how word embedding and deep learning techniques can help to efficiently extract information from free-text EHR documents (eg progress notes) and evaluate the effectiveness and actionability of genetic testing in assisting cancer patient treatment adjustment. A total of 5889 deidentified progress reports for 755 cancer patients who had undergone clinical NGS testing at Wake Forest Baptist Comprehensive Cancer Center were included in our data analyses. The primary goals of this project are to (1) identify the section of the progress report that discusses genomic testing results and treatment information, (2) predict whether there is a treatment change based on the extracted information using deep learning and word embedding, and (3) compare the performance of 4 recurrent neural network (RNN)-based approaches and 5 conventional machine learning algorithms on a text classification task using clinical progress reports.

METHODS

Progress reports and preprocessing

The progress reports for cancer patients were obtained from the Translational Data Warehouse at Wake Forest Baptist Health upon institutional review board approval for the study. The study corpus contains 5889 progress reports (2439 words on average) that were charted for the 755 NGS patients after their NGS tests. We excluded 28 patients who had NGS testing completed twice. A text preprocessing pipeline was implemented to perform cleaning and reformatting. All English letters were converted to lowercase. We removed English stop words, special characters and punctuation, empty spaces, and strings with length <2. Numbers were also excluded since they usually do not carry relevant information in this type of analysis. Contractions were replaced with the full terms, for example, “'ve” was replaced with “have,” “'re” was replaced with “are,” and “'ll” was replaced with “will.” We also performed word stemming for non-NN machine learning models. We identified the section of each report that discusses genomic testing results based on keyword searches. A list of keywords including genes, mutations, and treatment names was populated from the information provided by the NGS vendors, namely Foundation Medicine (https://www.foundationmedicine.com/), Caris (https://www.carislifesciences.com/), and Guardant (http://www.guardanthealth.com/), as well as our local database. A 400-word text window was extracted for each report surrounding the location of the first keyword. By extracting the target section, we reduced the size of the reports from an average of ∼5000 words to 289, which eliminated much of the redundant content and improved the efficiency of our training.
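The cleaning and window-extraction steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's pipeline: the stop-word, contraction, and keyword lists are short hypothetical stand-ins, stemming is omitted, and centering the window on the first keyword is an assumption (the text only says the window surrounds it).

```python
import re

# Hypothetical, abbreviated lists; the study's actual stop words and keywords are not reproduced here.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "is", "was", "for", "in", "on", "with"}
CONTRACTIONS = {"'ve": " have", "'re": " are", "'ll": " will"}
KEYWORDS = {"egfr", "kras", "foundation", "erlotinib"}


def preprocess(text):
    """Lowercase, expand contractions, drop digits/punctuation, stop words, and tokens shorter than 2."""
    text = text.lower().replace("\u2019", "'")  # normalize curly apostrophes
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    text = re.sub(r"[^a-z\s]", " ", text)  # removes numbers, punctuation, special characters
    return [t for t in text.split() if len(t) >= 2 and t not in STOP_WORDS]


def extract_window(tokens, keywords, size=400):
    """Return up to `size` tokens around the first keyword hit, or None if no keyword occurs.

    Assumption: the window is centered on the keyword and clipped at the document start.
    """
    for i, tok in enumerate(tokens):
        if tok in keywords:
            start = max(0, i - size // 2)
            return tokens[start:start + size]
    return None


note = "Patient's Foundation report showed an EGFR mutation; we'll start erlotinib in 2 weeks."
tokens = preprocess(note)
window = extract_window(tokens, KEYWORDS, size=6)
print(window)
```

Because the window is clipped at the document boundaries, extracted windows can be shorter than the nominal size, which is consistent with the reported average of 289 words for a 400-word window.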

Establish true labels

A subset (44) of 755 cancer patients (452 reports) were manually classified as genomic-related treatment change and nontreatment-change groups by our precision medicine nurse. These annotations served as “true” labels to evaluate the performance of our classification task. Additionally, a “rule-based” annotation method was also implemented to label the group of the reports based on her experience, vendors’ name, cancer gene, mutation, as well as therapeutic-related keywords (Supplementary Material). We further evaluated the performance of the generated labels using a clustering algorithm to compare the natural separation and the labeled groups. The 452 manually annotated labels, as well as the generated labels using the “rule-based” method were used as “ground truth” to evaluate the machine learning algorithms.

Word representations

Two types of word representation techniques were used to convert word tokens in each report into numerical vectors: term frequency-inverse document frequency (TF-IDF) for conventional machine learning models and word embedding (Word2vec) for RNNs. The TF-IDF weight is a statistical measure of how important a word is to a document in a collection or corpus. Word2vec takes a large corpus of text as its input and produces a high-dimensional vector space in which each unique word in the corpus is assigned a corresponding vector. Word2vec can utilize a continuous bag-of-words (CBOW) architecture to predict the current word from a window of surrounding context words; in this architecture, the order of the context words does not matter. In this study, we applied 2 methods to generate word embeddings: (1) using Word2vec with the CBOW architecture to pretrain word embeddings on our entire corpus, and (2) including an embedding layer in the network and training the word embeddings on the fly. We then compared the performance of the NNs with and without pretrained word embeddings.
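To make the TF-IDF weighting concrete, the sketch below computes it by hand on three made-up token lists. It assumes one common variant (term frequency as count divided by document length, inverse document frequency as log(N/df)); the study does not state which variant or smoothing it used, so the exact numbers are illustrative only.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in counts.items()})
    return weights


# Made-up example documents (already preprocessed into tokens).
docs = [
    ["egfr", "mutation", "erlotinib", "started"],
    ["kras", "mutation", "no", "change"],
    ["mutation", "stable", "no", "change"],
]
w = tfidf(docs)
print(w[0])
```

Under this variant a term that occurs in every document, such as "mutation" here, receives a weight of zero, while a term confined to one document, such as "egfr", receives the largest weight.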

Recurrent neural networks

RNNs are neural networks that add additional weights to the network to create cycles, in an effort to model time dependencies and sequential events. Variations of the RNN, long short-term memory (LSTM) networks and the gated recurrent unit (GRU), were invented to better handle the vanishing gradient problem. LSTM was used to create DeepCare, which is an end-to-end deep dynamic network that infers current illness states and predicts future medical outcomes using EHR. Another variation of RNN, GRU, was used to develop Doctor AI, which is another model that uses patient history to predict diagnoses and medications for subsequent encounters. A modified version of LSTM, bidirectional LSTM (LSTM_Bi), which allows analyzing sequential data from both directions, has been used to process medical text data and achieved better performance than nondeep learning tools in NER tasks. Since the RNN architecture is designed to model sequential events, such as word sequences, it is particularly suitable for capturing meaningful linguistic patterns across long sequences of words within a document. We implemented 4 variations of RNN in this study: (1) LSTM with word embedding trained on the fly (LSTM_onFly); (2) LSTM with word embedding pretrained on the entire corpus (LSTM_Pre); (3) LSTM_Bi with pretrained word embedding (LSTM_Bi); and (4) GRU, a simplified version of LSTM, with pretrained word embedding. We evaluated the performance of these 4 RNN models for information extraction and text classification in this study.

Convolutional layer

CNN has been successfully applied in image processing and NLP. We incorporated a 1D-convolution layer with 32 filters, a kernel size of 3, and a stride of 1 word, followed by a max-pooling layer, in our RNNs. The convolutional layer, together with the max-pooling layer, can help to learn useful word representations and reduce the dimensions of the input.
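The effect of a kernel size of 3 and a stride of 1 on sequence length can be illustrated with a small pure-Python sketch. It uses a single filter over scalar values for readability; in the actual networks each step is an embedding vector and there are 32 filters. The absence of padding and the pooling size of 2 are assumptions, since the text does not state them.

```python
def conv1d_output_length(n_steps, kernel_size=3, stride=1):
    """Sequence length after a 1D convolution without padding (assumed)."""
    return (n_steps - kernel_size) // stride + 1


def conv1d(seq, kernel, stride=1):
    """Slide one filter over a 1D sequence of numbers (no padding, no bias)."""
    k = len(kernel)
    return [
        sum(seq[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(seq) - k + 1, stride)
    ]


def max_pool1d(seq, pool_size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(seq[i:i + pool_size]) for i in range(0, len(seq) - pool_size + 1, pool_size)]


signal = [1, 3, 2, 5, 4, 0, 1]          # made-up input values
features = conv1d(signal, kernel=[1, 0, -1])
pooled = max_pool1d(features, pool_size=2)
print(features, pooled)
print(conv1d_output_length(400))        # length for a 400-word window
```

Under these assumptions a 400-word window yields 398 convolution outputs per filter, and pooling with size 2 roughly halves that again before the sequence reaches the recurrent layer.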

Non-neural network models

We compared the performance of the RNNs against the performance of several conventional predictive models that can also be used for text classification. These include Naive Bayes (NB), K-nearest Neighbor (KNN), Support Vector Machine for classification (SVC), Random forest (RF), and logistic regression (LR). We generated TF-IDF vectors on the processed text using unigrams with a minimum document frequency of 5, and a maximum document frequency of 80%. Singular-value decomposition (SVD) was applied to reduce the dimension of the input matrix.

Hyperparameter optimization

We used a grid search technique to perform hyperparameter optimization for non-NN algorithms. Specifically, smoothing parameter alpha of NB, number of neighbors of KNN, penalty parameter C, kernel types (linear or radial basis function), and kernel coefficient gamma of SVC, the maximum depth of a tree, the minimum number of samples required to split an internal node, the minimum number of samples required to be at a leaf node of RF, and the L2 penalty parameter C of LR, were optimized using the grid search method. A 3-fold cross-validation was used during hyperparameter optimization to evaluate the performance of each version of the algorithms.
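The grid search with 3-fold cross-validation can be sketched as follows. This is a toy illustration only: it uses a hand-written 1D nearest-neighbor classifier and made-up data rather than Scikit-Learn, and the fold construction (striding through the data) is an assumption chosen for brevity.

```python
from collections import Counter
from itertools import product


def knn_predict(train, x, k):
    """Majority label among the k training points nearest to x (1D features)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cv_accuracy(data, k, n_folds=3):
    """Mean accuracy over n_folds folds built by striding through the data."""
    scores = []
    for f in range(n_folds):
        test = data[f::n_folds]
        train = [p for i, p in enumerate(data) if i % n_folds != f]
        hits = sum(knn_predict(train, x, k) == y for x, y in test)
        scores.append(hits / len(test))
    return sum(scores) / n_folds


def grid_search(data, grid, n_folds=3):
    """Score every parameter combination and return (best_params, best_score)."""
    names = list(grid)
    best = (None, -1.0)
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = cv_accuracy(data, n_folds=n_folds, **params)
        if score > best[1]:  # ties keep the earlier combination
            best = (params, score)
    return best


# Toy, made-up data: (feature, label) pairs forming two well-separated groups.
data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0), (0.5, 0), (0.6, 0),
        (1.1, 1), (1.2, 1), (1.3, 1), (1.4, 1), (1.5, 1), (1.6, 1)]
best_params, best_score = grid_search(data, {"k": [1, 3, 5, 7]})
print(best_params, best_score)
```

The same loop extends to several hyperparameters at once by adding more keys to the grid, at the cost of one cross-validated fit per combination.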

Model setup and evaluation metrics

To be consistent, we split the data into 0.66/0.33 train/test datasets, without any overlapping patients between train and test, for each model. In addition, we performed stratified 5-fold cross-validation during training to evaluate the model performance. Binary cross entropy was used as the loss function for all the classifiers. For RNN algorithms, we implemented an early stopping mechanism: the model stops training when the loss function does not improve for 5 epochs on the validation dataset. After training, the performance of each model was tested on the test set. We used 4 evaluation metrics to compare the performance of the models: accuracy, precision, recall, and F1 score.
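A split without overlapping patients can be obtained by sampling patient identifiers rather than documents. The sketch below is one way to do this; the `(patient_id, text)` record layout, the function name, and the fixed seed are hypothetical, and the fraction is applied to patients, so the document-level proportion will only be approximately 0.66/0.33.

```python
import random


def split_by_patient(reports, test_fraction=0.33, seed=0):
    """Split reports so that no patient contributes to both train and test.

    `reports` is a list of (patient_id, text) pairs; this layout is hypothetical.
    """
    patient_ids = sorted({pid for pid, _ in reports})
    rng = random.Random(seed)
    rng.shuffle(patient_ids)
    n_test = max(1, round(len(patient_ids) * test_fraction))
    test_ids = set(patient_ids[:n_test])
    train = [r for r in reports if r[0] not in test_ids]
    test = [r for r in reports if r[0] in test_ids]
    return train, test


# Six made-up patients with three notes each.
reports = [(f"p{i % 6}", f"note {i}") for i in range(18)]
train, test = split_by_patient(reports)
print(len(train), len(test))
```

Splitting at the patient level matters here because several reports from the same patient tend to share wording, so a document-level split would leak near-duplicates into the test set.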

Open source platforms

We used high-level NN API Keras (https://keras.io/) running on top of Tensorflow (https://github.com/tensorflow/tensorflow) to set up our neural network structures. Non-NN models, as well as clustering and parameter search algorithms were derived from Scikit-Learn (http://scikit-learn.org/). Word embedding was performed using Gensim (https://radimrehurek.com/gensim/).

EXPERIMENTAL RESULTS

Study samples

The flow chart of the study design is shown in Figure 1. Briefly, in this study, we processed 5889 free-text clinical reports from 755 patients. Target text windows from the reports were extracted for the subsequent classification task. We implemented a total of 9 classifiers, both RNN-based and traditional machine learning algorithms, to classify each document by treatment change. A word embedding matrix was pretrained on the whole text corpus for some of the RNN-based models. A subset (44) of the cancer patients (452 reports) was annotated by clinical experts. These manually generated labels, along with the labels generated by a rule-based keyword search method, were used as “true” labels to evaluate the model performance. A total of 3736 documents were labeled as treatment-change and 2153 documents as no-treatment-change.

Figure 1.

Workflow of text processing and document classification using machine learning models.

To explore additional insight about the progress reports and the separability of the 2 labeled groups, we performed an SVD on the TF-IDF representation of the reports. The top 2 singular vectors of the SVD were used to plot the similarity between the 2 target groups (Figure 2). From the plot, we note that natural clustering occurs between progress reports corresponding to the labeled groups. This technique can also help to better understand document misclassifications in our classification task.

Figure 2.

Dimensional reduction of term frequency-inverse document frequency (TF-IDF) representation of the documents via singular-value decomposition (SVD). Data points are colored by treatment-change (1) and nontreatment-change (0) groups.

Key hyperparameters for each machine learning algorithm were optimized using a grid search method. For traditional models, smoothing parameter alpha of NB was optimized to be 0; the best number of neighbors for KNN was 7; linear kernel and a penalty parameter of 30 were selected for SVC; the maximum depth of a tree, the minimum number of samples required to split an internal node, and the minimum number of samples required to be at a leaf node were optimized as 6, 5, and 5, respectively, for RF; and an L2 penalty parameter of 10 was picked for LR (Table 1).

Table 1.

Best hyperparameters for the classifiers

Classifier	Hyperparameters
Deep learning classifiers
LSTM_onFly	Optimizer=Adam, batch size=64, dropout rate=0, word embedding=trained on the fly, recurrent layer=single directional LSTM
LSTM_Pre	Optimizer=Adam, batch size=64, dropout rate=0, word embedding=pretrained on the whole corpus, recurrent layer=single directional LSTM
LSTM_Bi	Optimizer=Adam, batch size=64, dropout rate=0, word embedding=pretrained on the whole corpus, recurrent layer=bidirectional LSTM
GRU	Optimizer=Adam, batch size=64, dropout rate=0, word embedding=pretrained on the whole corpus, recurrent layer=single directional LSTM
Conventional classifiers
KNN	Number of neighbors=7
LR	L2 penalty parameter=10
NB	Smoothing parameter alpha=0
RF	Maximum depth of a tree=6, minimum number of samples required to split an internal node=5, minimum number of samples required to be at a leaf node=5
SVC	Kernel=linear, L2 penalty parameter=30

GRU: gated recurrent unit; KNN: K-nearest Neighbor; LR: logistic regression; LSTM: long short-term memory; NB: Naive Bayes; RF: random forest; SVC: Support Vector Machine for classification.

For RNN-based models (Figure 3), we selected the Adam optimization algorithm as the default optimizer. Except for 1 LSTM model (LSTM_onFly), all the models used a pretrained word embedding matrix as the input. Based on parameter tuning, we chose a batch size of 64 and a dropout rate of 0.

Figure 3.

Architecture of RNN models. GRU: gated recurrent unit; LSTM: long short-term memory; LSTM_Bi: bidirectional LSTM; RNN: recurrent neural network.

Classification performance

Four performance evaluation metrics on the task were included in Table 2 and Figure 4, including accuracy, precision, recall, and F1 score.

Table 2.

Performance of classifiers on the document classification repeated for 100 times

Classifier	Accuracy (mean±SD)	Precision (mean±SD)	Recall (mean±SD)	F1 score (mean±SD)
Deep learning classifiers
LSTM_onFly	0.821±0.026	0.850±0.029	0.872±0.040	0.860±0.023
LSTM_Pre	0.849±0.015	0.874±0.023	0.890±0.022	0.882±0.013
LSTM_Bi	0.862±0.019	0.885±0.020	0.900±0.026	0.892±0.015
GRU	0.859±0.014	0.882±0.021	0.899±0.022	0.890±0.012
Conventional classifiers
KNN	0.806±0.016	0.834±0.022	0.913±0.024	0.829±0.015
LR	0.829±0.015	0.836±0.022	0.904±0.023	0.826±0.014
NB	0.772±0.016	0.875±0.016	0.811±0.023	0.806±0.016
RF	0.809±0.015	0.804±0.023	0.926±0.017	0.809±0.015
SVC	0.826±0.014	0.814±0.024	0.830±0.019	0.772±0.016

GRU: gated recurrent unit; KNN: K-nearest Neighbor; LR: logistic regression; LSTM: long short-term memory; NB: Naive Bayes; RF: random forest; SD: standard deviation; SVC: Support Vector Machine for classification.

Figure 4.

Performance comparisons of 9 Machine Learning algorithms based on (A) a single run, and (B) models repeated for 100 times. Mean metrics (dots) and their standard deviations (bars) were included.

Overall, RNN-based classifiers outperformed the traditional machine learning algorithms. In a single run (Figure 4A), LSTM_Bi with pretrained word embedding and a 1D-convolution layer followed by max-pooling outperformed all other models in accuracy (0.886), precision (0.878), and F1 score (0.909). RF had the highest recall, with a score of 0.972, followed by LSTM_Bi (0.943). Because of the stochastic nature of machine learning algorithms, we repeated each model 100 times and calculated the average metrics and corresponding standard deviations. Again, LSTM_Bi outperformed all the others in accuracy (0.862 ± 0.019), precision (0.885 ± 0.020), and F1 score (0.892 ± 0.015), while RF led in recall (0.926 ± 0.017) (Figure 4B).

RNN-based models training

Accuracy and model loss-based training curves of the RNN-based classifiers are shown in Figure 5. LSTM without pretrained word embedding (LSTM_onFly) converged fastest (its shorter learning curve was due to early stopping), followed by GRU. The LSTMs with pretrained word embeddings had similar convergence curves. However, the LSTM_onFly model quickly overfitted after the first epoch and had the largest discrepancy between training and validation data, while LSTM_Bi had the smallest discrepancy.
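The early stopping rule that shortens some of these curves (training halts once the validation loss has not improved for 5 epochs) can be expressed as a small function. This is a framework-independent sketch with made-up loss values; the function name is hypothetical and it is not the Keras callback itself.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return the 1-based epoch at which training stops.

    Training halts once the validation loss has failed to improve on its best
    value for `patience` consecutive epochs; otherwise it runs to the last epoch.
    """
    best = float("inf")
    stale = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return len(val_losses)


# Made-up validation losses: best value at epoch 2, then steady degradation.
losses = [0.60, 0.45, 0.47, 0.50, 0.52, 0.55, 0.58, 0.61]
print(early_stopping_epoch(losses))
```

A model that overfits early, as described for LSTM_onFly, reaches its best validation loss within the first epochs and therefore triggers this rule sooner than the others.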

Figure 5.

Training curves of the first 15 epochs for RNN-based models, where the upper panel is the model accuracy for training and validation datasets, and lower panel is the model loss for training and validation dataset. RNN: recurrent neural network.

Error analysis

We analyzed the confusion matrices of the 9 classifiers based on their classification performance on 1982 testing documents, with 1202 documents labeled as treatment-change and 780 documents labeled as no-treatment-change. LSTM_Bi, which had the highest accuracy (0.886), had 69 false negatives and 157 false positives (Figure 6). The other 2 LSTM variations, LSTM_onFly and LSTM_Pre, resulted in a significantly higher number of misclassifications, especially in the false negative category, where the misclassifications nearly doubled. The GRU model, on the other hand, had a similar number of false negatives (72) compared to LSTM_Bi; however, it mistakenly classified 202 no-treatment-change documents as treatment-change (false positives).

Figure 6.

Confusion matrix of (A) RNN-based models, and (B) conventional machine learning models. GRU: gated recurrent unit; LSTM: long short-term memory; NB: Naive Bayes; RF: random forest; RNN: recurrent neural network; SVC: Support Vector Machine for classification.

For the conventional classifiers, KNN achieved the highest accuracy (0.824), correctly identifying 1026 treatment-change documents and 607 no-treatment-change documents. Notably, RF correctly classified 1168 treatment-change documents, which was the highest among all 9 models. However, it misclassified 380 no-treatment-change documents as treatment-change, which was also the highest. This is consistent with what we observed in Table 2 and Figure 4: RF has the highest recall (0.972) but the lowest precision (0.755).
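The reported single-run metrics follow directly from the confusion-matrix counts given in the error analysis. The short check below recomputes them for LSTM_Bi and RF from the test-set composition (1202 treatment-change and 780 no-treatment-change documents); the helper name is hypothetical, and true positives and true negatives are derived by subtraction rather than quoted from the figure.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from binary confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


POSITIVES, NEGATIVES = 1202, 780  # test documents per class, from the text

# LSTM_Bi: 69 false negatives and 157 false positives.
lstm_bi = classification_metrics(tp=POSITIVES - 69, fp=157, fn=69, tn=NEGATIVES - 157)

# RF: 1168 true positives and 380 false positives.
rf = classification_metrics(tp=1168, fp=380, fn=POSITIVES - 1168, tn=NEGATIVES - 380)

for name, m in (("LSTM_Bi", lstm_bi), ("RF", rf)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```

Rounded to three decimals, these counts give 0.886 accuracy, 0.878 precision, 0.943 recall, and 0.909 F1 for LSTM_Bi, and 0.755 precision with 0.972 recall for RF, matching the values quoted in the text.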

DISCUSSION

We have successfully applied NLP and machine learning methods to extract information from clinical progress reports and classify them into treatment-change and no-treatment-change groups. RNN-based algorithms with pretrained word embedding, especially LSTM_Bi, demonstrated significantly better performance on the classification task than conventional machine learning algorithms with TF-IDF features. This is most likely because the RNN structure can capture linguistic patterns across long sequences of words and because the word embeddings were pretrained on the entire text corpus. In addition, we noticed that KNN and NB outperformed SVC in this study, possibly because the decision planes of SVC were not able to sufficiently separate classes due to the data structure. We first compared the performance of LSTM with and without pretrained word embedding. Based on the results in Table 2 and Figure 4, LSTM_Pre outperformed LSTM_onFly in 3 of 4 evaluation metrics, the exception being recall, where these 2 models had comparable results. This may be because LSTM_onFly trains word embeddings only on the smaller extracted text windows, which are not sufficient to model linguistic patterns accurately. We then compared 3 RNN models with pretrained word embeddings, LSTM_Pre, LSTM_Bi, and GRU. LSTM_Bi outperformed the other 2 models in all 4 metrics. LSTM_Bi allows analyzing sequential data from both directions, and has been used to process medical text data and achieved better performance than nondeep learning tools in NER. Interestingly, the simplified variation of LSTM, GRU, showed better results than LSTM_Pre in 3 of 4 evaluations, the exception being precision. This observation is consistent with previous explorations, where GRU has yielded similar performance compared with LSTM, although GRU could perform better on smaller datasets. TF-IDF based non-NN classifiers overall had poorer performance in this study.
One reason is that vector space word representations, such as TF-IDF and bag-of-words, cannot take the context of each word into account; instead, they rely on the ordering of words within a small text window. Cancer progress reports typically include complex information and structures, which are usually challenging to capture sufficiently with vector space word representations. Our results suggest that pretrained word embeddings on a large related corpus can extract information more efficiently and improve the subsequent classification tasks. Furthermore, the RNN architecture is designed to model sequential events, such as word sequences. In our study, this architecture is able to capture meaningful linguistic patterns across long sequences of words within a document. It provides a method to extract higher-level information and make decisions based on the context of each word. Therefore, the combination of RNN and pretrained word embeddings further boosted the model performance. Due to the stochastic nature of machine learning algorithms, evaluating their performance based on a single model is not always accurate. Randomness can be introduced at any stage of the study, such as data processing, data splitting, word representation methods, weight initialization, and random seeds. To reduce the effect of randomness and evaluate the models more accurately, we repeated each model 100 times. The ranges, means, and standard deviations of the evaluation metrics for each model were calculated. Our results in Table 2 and Figure 4B indicated that the performance of the machine learning models is consistent and reproducible. One goal of this study is to implement an automated system to reduce the time required for progress report annotation. However, NLP and machine learning models, especially deep learning models, suffer from long processing and training time. We thus implemented several methods to improve our model efficiency.
The first method was to extract a target text window from each document instead of using the whole progress report, which is usually very complex and redundant. Moreover, pretrained word embeddings significantly reduced the training time for RNN-based models, since they avoided training word embeddings on the fly. In addition, 2 dimension-reduction methods, a 1D-convolution layer with max pooling and SVD, were used to further reduce the training time for RNN-based and non-NN classifiers, respectively. These methods ensure our model can make decisions more efficiently and reduce the burden of manually annotating the reports by medical experts. One important limitation of our study is that most of the progress reports lack true labels. Reading progress reports and correctly labeling them is time-consuming and challenging even for human experts. However, we generated labels for a subset of the reports to validate and improve our rule-based labeling method. Another limitation is the small sample size of our dataset: only 755 qualified cancer patients were available for this study. Although we included reports from multiple visits for each patient, 5889 documents are not likely to be enough to reach the full effectiveness of RNN models. In addition, using a dataset with a small number of samples but multiple documents for each sample would increase the risk of model overfitting. To reduce overfitting, we split the training and test datasets based on unique samples, which prevented the classifier from seeing the reports from 1 patient in both training and testing phases. To our knowledge, this is the first study extracting genomics-related information from clinical progress reports using NLP and deep learning. Our goal is to implement an automated annotation system for clinical progress reports that can improve the annotation accuracy and reduce the time required.
Moving forward, we will extend this NLP and RNN analysis pipeline to perform more tasks, for example, classifying cancer stages, predicting survival rates, deep phenotyping, and annotating unknown genomic mutations. Another important future direction is to generalize this pipeline to read data from multiple research facilities and multiple resources, such as pathology reports, radiology reports, medical images, and NGS results. In addition, during genomic testing, thousands of genetic alterations are identified with unknown pathogenic impacts on specific cancer types. Distinguishing the alterations that contribute to cancer risk from the neutral alterations is very challenging and time-consuming since it is mainly done manually. Thus, an automated genetic alteration interpretation system based on our NLP and RNN methods could be developed to incorporate relevant information from text-based sources such as pathology reports and progress notes.

CONCLUSIONS

An automated NLP and deep learning solution has demonstrated advantages and potential in information retrieval and document classification tasks for unstructured clinical progress notes. It will help to evaluate the impact of genomic testing in clinical practice.

SUPPLEMENTARY MATERIAL

Supplementary material is available at Journal of the American Medical Informatics Association online.

CONTRIBUTORS

MG designed the data cleaning, processing, and analysis pipelines, and drafted and revised the manuscript. UT contributed to data collection and conception of the work. SC participated in study design, revised the analysis pipeline, and reviewed the draft. RP contributed to data collection and ground truth generation. WZ and BP reviewed drafts and provided feedback on study design.

REFERENCES

1. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications.

Authors: Guergana K Savova; James J Masanz; Philip V Ogren; Jiaping Zheng; Sunghwan Sohn; Karin C Kipper-Schuler; Christopher G Chute
Journal: J Am Med Inform Assoc Date: 2010 Sep-Oct

2. An overview of MetaMap: historical perspective and recent advances.

Authors: Alan R Aronson; François-Michel Lang
Journal: J Am Med Inform Assoc Date: 2010 May-Jun

3. Deep learning.

Authors: Yann LeCun; Yoshua Bengio; Geoffrey Hinton
Journal: Nature Date: 2015-05-28

4. Deep learning for healthcare: review, opportunities and challenges.

Authors: Riccardo Miotto; Fei Wang; Shuang Wang; Xiaoqian Jiang; Joel T Dudley
Journal: Brief Bioinform Date: 2018-11-27

5. Text Mining for Precision Medicine: Bringing Structure to EHRs and Biomedical Literature to Understand Genes and Health.

Authors: Michael Simmons; Ayush Singhal; Zhiyong Lu
Journal: Adv Exp Med Biol Date: 2016

6. Mining the electronic health record for disease knowledge.

Authors: Elizabeth S Chen; Indra Neil Sarkar
Journal: Methods Mol Biol Date: 2014

7. Leveraging EHR data for outcomes and comparative effectiveness research in oncology.

Authors: Frank J Manion; Marcelline R Harris; Ayse G Buyuktur; Patricia M Clark; Lawrence C An; David A Hanauer
Journal: Curr Oncol Rep Date: 2012-12

8. Deep Learning for Health Informatics.

Authors: Daniele Ravi; Charence Wong; Fani Deligianni; Melissa Berthelot; Javier Andreu-Perez; Benny Lo; Guang-Zhong Yang
Journal: IEEE J Biomed Health Inform Date: 2016-12-29

9. Drug-Drug Interaction Extraction via Convolutional Neural Networks.

Authors: Shengyu Liu; Buzhou Tang; Qingcai Chen; Xiaolong Wang
Journal: Comput Math Methods Med Date: 2016-01-31

10. NOBLE - Flexible concept recognition for large-scale biomedical natural language processing.

Authors: Eugene Tseytlin; Kevin Mitchell; Elizabeth Legowski; Julia Corrigan; Girish Chavan; Rebecca S Jacobson
Journal: BMC Bioinformatics Date: 2016-01-14
