Anthony Rios 1, Eric B Durbin 2, Isaac Hands 3, Susanne M Arnold 4, Darshil Shah 5, Stephen M Schwartz 6, Bernardo H L Goulart 6, Ramakanth Kavuluru 7.
1. Department of Information Systems and Cyber Security, University of Texas at San Antonio, USA.
2. Division of Biomedical Informatics, Dept. of Internal Medicine, University of Kentucky, USA; Kentucky Cancer Registry, Lexington, KY, USA.
3. Kentucky Cancer Registry, Lexington, KY, USA.
4. Markey Cancer Center, University of Kentucky, Lexington, KY, USA.
5. Ironwood Cancer and Research Centers, Avondale, AZ, USA.
6. Fred Hutchinson Cancer Research Center, Seattle, WA, USA.
7. Division of Biomedical Informatics, Dept. of Internal Medicine, University of Kentucky, USA; Computer Science Department, University of Kentucky, USA. Electronic address: ramakanth.kavuluru@uky.edu.
Abstract
OBJECTIVE: We study the performance of machine learning (ML) methods, including neural networks (NNs), for extracting mutational test results from pathology reports collected by cancer registries. Given the lack of hand-labeled datasets for mutational test result extraction, we focus on the particular use case of extracting Epidermal Growth Factor Receptor (EGFR) mutation results in non-small cell lung cancers. We explore how well NNs generalize across registries, with two goals: (1) to assess how well models trained on one registry's data port to test data from a different registry, and (2) to assess whether, and to what extent, such models can be improved using state-of-the-art neural domain adaptation techniques under different assumptions about what is available (labeled vs. unlabeled data) at the target registry site.
MATERIALS AND METHODS: We collected data from two registries: the Kentucky Cancer Registry (KCR) and the Fred Hutchinson Cancer Research Center (FH) Cancer Surveillance System. We combine NNs with adversarial domain adaptation to improve cross-registry performance. We compare against other classifiers in the standard supervised classification, unsupervised domain adaptation, and supervised domain adaptation scenarios.
RESULTS: The performance of ML methods varied between registries. For extracting positive results, the basic convolutional neural network (CNN) had an F1 of 71.5% on the KCR dataset and 95.7% on the FH dataset. On the KCR dataset, the CNN F1 was low when the model was trained on FH data (Positive F1: 23%). Using our proposed adversarial CNN, without any labeled data from the target registry, we match the F1 of the models trained directly on each target registry's data. The adversarial CNN F1 improved when trained on FH and applied to the KCR dataset (Positive F1: 70.8%). We found similar performance improvements when we trained on KCR and tested on FH reports (Positive F1: 45% to 96%).
CONCLUSION: Adversarial domain adaptation improves the performance of NNs applied to pathology reports. In the unsupervised domain adaptation setting, we match the performance of models trained directly on the target registry's data by using the source registry's labeled data together with unlabeled examples from the target registry.
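The results above are reported as F1 on the positive class. As a minimal sketch of how such a score can be computed, the following Python function derives it from gold and predicted labels. The label strings ("positive", "negative", "unknown") and the function name `positive_f1` are hypothetical, chosen for illustration; the abstract does not state the label set or the evaluation code used.

```python
def positive_f1(gold, predicted, positive_label="positive"):
    """F1 for one class, computed from parallel lists of labels.

    The label names are illustrative assumptions, not taken from the study.
    """
    pairs = list(zip(gold, predicted))
    # True positives: report is positive and the model says positive.
    tp = sum(1 for g, p in pairs if g == positive_label and p == positive_label)
    # False positives: model says positive, but the report is not.
    fp = sum(1 for g, p in pairs if g != positive_label and p == positive_label)
    # False negatives: report is positive, but the model misses it.
    fn = sum(1 for g, p in pairs if g == positive_label and p != positive_label)

    if tp == 0:
        # Precision or recall is zero (or undefined), so F1 is taken as 0.
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy example with made-up labels for five reports.
gold = ["positive", "negative", "positive", "unknown", "positive"]
predicted = ["positive", "positive", "negative", "unknown", "positive"]
score = positive_f1(gold, predicted)
```

In this toy example there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3. Scoring only the positive class means that correct predictions on the other classes do not raise the score.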