PURPOSE: SEER registries do not report the results of epidermal growth factor receptor (EGFR) and anaplastic lymphoma kinase (ALK) mutation tests. To facilitate population-based research in molecularly defined subgroups of non-small-cell lung cancer (NSCLC), we assessed the validity of natural language processing (NLP) for ascertaining EGFR and ALK testing from electronic pathology (e-path) reports of NSCLC cases in two SEER registries: the Cancer Surveillance System (CSS) and the Kentucky Cancer Registry (KCR).
METHODS: We obtained 4,278 e-path reports from 1,634 CSS patients diagnosed with stage IV nonsquamous NSCLC between September 1, 2011, and December 31, 2013. We used 855 CSS reports to train NLP systems to ascertain EGFR and ALK test status (reported v not reported) and test results (positive v negative). We assessed sensitivity, specificity, and positive and negative predictive values in an internal validation sample of the remaining 3,423 CSS e-path reports and repeated the analysis in an external sample of 1,041 e-path reports from 565 KCR patients. Two oncologists manually reviewed all e-path reports to generate gold-standard data sets.
RESULTS: In CSS e-path reports, the NLP systems yielded internal validity metrics ranging from 0.95 to 1.00 for EGFR and ALK test status and results. NLP also showed high internal accuracy for ascertaining EGFR and ALK in CSS patients, with F scores of 0.95 and 0.96, respectively. In the external validation analysis, NLP yielded metrics ranging from 0.02 to 0.96 in KCR reports and F scores of 0.70 (EGFR) and 0.72 (ALK) in KCR patients.
CONCLUSION: NLP is an internally valid method for ascertaining EGFR and ALK test information from e-path reports available in SEER registries, but further work is needed to improve its external validity.
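The validity metrics named above all derive from a 2 x 2 comparison of NLP output against the oncologists' gold standard. The sketch below illustrates that arithmetic for a single binary task (for example, EGFR test reported v not reported). It assumes one boolean NLP label and one boolean gold-standard label per report or patient. The function and field names are hypothetical; this is not the authors' code, and it does not reproduce their report-to-patient aggregation, which the abstract does not describe. The F score is computed as the harmonic mean of PPV (precision) and sensitivity (recall), written as 2TP / (2TP + FP + FN).

```python
from dataclasses import dataclass


@dataclass
class ValidityMetrics:
    sensitivity: float  # TP / (TP + FN), also called recall
    specificity: float  # TN / (TN + FP)
    ppv: float          # TP / (TP + FP), also called precision
    npv: float          # TN / (TN + FN)
    f_score: float      # harmonic mean of PPV and sensitivity


def _safe_div(num: int, den: int) -> float:
    # Return NaN instead of raising when a metric is undefined (empty denominator).
    return num / den if den else float("nan")


def validity_metrics(predicted: list[bool], gold: list[bool]) -> ValidityMetrics:
    """Compare hypothetical NLP labels with gold-standard labels for one binary task."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    pairs = list(zip(predicted, gold))
    # Cells of the 2 x 2 table (True counts as 1 when summed).
    tp = sum(p and g for p, g in pairs)
    fp = sum(p and not g for p, g in pairs)
    fn = sum((not p) and g for p, g in pairs)
    tn = sum((not p) and (not g) for p, g in pairs)
    return ValidityMetrics(
        sensitivity=_safe_div(tp, tp + fn),
        specificity=_safe_div(tn, tn + fp),
        ppv=_safe_div(tp, tp + fp),
        npv=_safe_div(tn, tn + fn),
        f_score=_safe_div(2 * tp, 2 * tp + fp + fn),
    )


if __name__ == "__main__":
    # Toy, made-up labels for eight reports (not study data).
    nlp_labels = [True, True, True, False, False, False, False, False]
    gold_labels = [True, True, False, True, False, False, False, False]
    print(validity_metrics(nlp_labels, gold_labels))
```

With the toy labels (2 TP, 1 FP, 1 FN, 4 TN), sensitivity, PPV, and F score are each 2/3, and specificity and NPV are each 0.8. Returning NaN for an empty denominator keeps a rare-outcome stratum (for example, few ALK-positive reports) from raising an error. It also makes an undefined metric visible instead of silently reporting it as 0.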