PURPOSE: Cancer research using electronic health records and genomic data sets requires clinical outcomes data, which may be recorded only in unstructured text by treating oncologists. Natural language processing (NLP) could substantially accelerate extraction of this information.

METHODS: Patients with lung cancer who had tumor sequencing as part of a single-institution precision oncology study from 2013 to 2018 were identified. Medical oncologists' progress notes for these patients were reviewed. For each note, curators recorded whether the assessment/plan indicated any cancer, progression/worsening of disease, and/or response to therapy or improving disease. Next, a recurrent neural network was trained using unlabeled notes to extract the assessment/plan from each note. Finally, convolutional neural networks were trained on labeled assessments/plans to predict the probability that each curated outcome was present. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC) among a held-out test set of 10% of patients. Associations between curated response or progression end points and overall survival were measured using Cox models among patients receiving palliative-intent systemic therapy.

RESULTS: Medical oncologist notes (n = 7,597) were manually curated for 919 patients. In the 10% test set, NLP models replicated human curation with AUROCs of 0.94 for the any-cancer outcome, 0.86 for the progression outcome, and 0.90 for the response outcome. Progression/worsening events identified using NLP models were associated with shortened survival (hazard ratio [HR] for mortality, 2.49; 95% CI, 2.00 to 3.09); response/improvement events were associated with improved survival (HR, 0.45; 95% CI, 0.30 to 0.67).

CONCLUSION: NLP models based on neural networks can extract meaningful outcomes from oncologist notes at scale. Such models may facilitate identification of clinical and genomic features associated with response to cancer treatment.
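The METHODS describe a held-out test set of 10% of *patients* rather than 10% of notes, which keeps all notes from a given patient on one side of the train/test boundary. A minimal sketch of such a patient-level split, assuming notes are simply (patient_id, note_text) pairs (the identifiers and note texts below are illustrative, not from the study):

```python
import random

def split_by_patient(notes, test_frac=0.10, seed=42):
    """Partition notes so each patient's notes fall wholly in train or test."""
    # Split at the patient level: sample patient IDs, then route every note
    # for a sampled patient into the test set. This prevents leakage of one
    # patient's notes across the train/test boundary.
    patients = sorted({pid for pid, _ in notes})
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_test = max(1, round(test_frac * len(patients)))
    test_ids = set(patients[:n_test])
    train = [n for n in notes if n[0] not in test_ids]
    test = [n for n in notes if n[0] in test_ids]
    return train, test

# Toy example: 8 notes from 5 patients, 20% of patients held out.
notes = [(pid, f"note {i}") for i, pid in enumerate([1, 1, 2, 3, 3, 3, 4, 5])]
train, test = split_by_patient(notes, test_frac=0.20)
```

Splitting by note instead would let the same patient's (often highly similar) documentation appear in both sets and inflate apparent model performance.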
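Model performance in the RESULTS is summarized as AUROC, which equals the probability that a randomly chosen note with the outcome receives a higher predicted probability than a randomly chosen note without it (ties counting half). A self-contained sketch of that pairwise definition, with made-up labels and scores for illustration:

```python
def auroc(labels, scores):
    """Area under the ROC curve for binary labels (1 = outcome present)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Count positive-negative pairs ranked correctly; ties count half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative curated labels and model-predicted probabilities.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.91, 0.20, 0.75, 0.55, 0.40, 0.10, 0.88, 0.60]
print(round(auroc(labels, scores), 3))  # prints 0.938
```

An AUROC of 0.5 corresponds to chance ranking and 1.0 to perfect separation, so values of 0.86 to 0.94 indicate the models rank outcome-positive notes well above outcome-negative ones.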