Merlijn Sevenster (1), Jeffrey Bozeman (2), Andrea Cowhy (2), William Trost (2).
(1) Clinical Informatics, Interventional & Translational Solutions, Philips Research North America, 345 Scarborough Road, Briarcliff Manor, NY, USA. Electronic address: Merlijn.sevenster@philips.com.
(2) Department of Medicine, University of Chicago, Chicago, IL, USA.
Abstract
OBJECTIVE: To standardize and objectivize treatment response assessment in oncology, guidelines have been proposed that are driven by radiological measurements, which are typically communicated in free-text reports defying automated processing. We study through inter-annotator agreement and natural language processing (NLP) algorithm development the task of pairing measurements that quantify the same finding across consecutive radiology reports, such that each measurement is paired with at most one other ("partial uniqueness").

METHODS AND MATERIALS: Ground truth is created based on 283 abdomen and 311 chest CT reports of 50 patients each. A pre-processing engine segments reports and extracts measurements. Thirteen features are developed based on volumetric similarity between measurements, semantic similarity between their respective narrative contexts and structural properties of their report positions. A Random Forest classifier (RF) integrates all features. A "mutual best match" (MBM) post-processor ensures partial uniqueness.

RESULTS: In an end-to-end evaluation, RF has precision 0.841, recall 0.807, F-measure 0.824 and AUC 0.971; with MBM, which performs above chance level (P<0.001), it has precision 0.899, recall 0.776, F-measure 0.833 and AUC 0.935. RF (RF+MBM) has error-free performance on 52.7% (57.4%) of report pairs.

DISCUSSION: Inter-annotator agreement of three domain specialists with the ground truth (κ>0.960) indicates that the task is well defined. Domain properties and inter-section differences are discussed to explain superior performance in abdomen. Enforcing partial uniqueness has mixed but minor effects on performance.

CONCLUSION: A combined machine learning-filtering approach is proposed for pairing measurements, which can support prospective (supporting treatment response assessment) and retrospective purposes (data mining).
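The abstract does not spell out how the "mutual best match" (MBM) post-processor works, so the following is only a minimal sketch of one plausible reading: a candidate pair is kept when each measurement is the other's highest-scoring partner. The function name `mutual_best_match`, the `threshold` parameter, and the example scores are hypothetical and are not taken from the paper.

```python
from typing import Dict, Hashable, Set, Tuple

# A candidate pair: (measurement in prior report, measurement in current report).
Pair = Tuple[Hashable, Hashable]


def mutual_best_match(scores: Dict[Pair, float], threshold: float = 0.5) -> Set[Pair]:
    """Keep a pair only if each side is the other's best-scoring partner.

    `scores` maps candidate pairs to a classifier confidence (assumed here
    to come from something like the Random Forest). `threshold` is a
    hypothetical cut-off below which candidates are ignored.
    """
    best_for_prior: Dict[Hashable, Hashable] = {}
    best_for_current: Dict[Hashable, Hashable] = {}

    for (prior, current), score in scores.items():
        if score < threshold:
            continue
        # Track the best current-report partner for each prior measurement.
        if (prior not in best_for_prior
                or score > scores[(prior, best_for_prior[prior])]):
            best_for_prior[prior] = current
        # Track the best prior-report partner for each current measurement.
        if (current not in best_for_current
                or score > scores[(best_for_current[current], current)]):
            best_for_current[current] = prior

    # A pair survives only when the preference is mutual, so every
    # measurement ends up in at most one pair ("partial uniqueness").
    return {
        (prior, current)
        for prior, current in best_for_prior.items()
        if best_for_current[current] == prior
    }


# Made-up scores for three prior and two current measurements.
scores = {
    ("p1", "c1"): 0.9,
    ("p1", "c2"): 0.8,
    ("p2", "c1"): 0.7,
    ("p2", "c2"): 0.6,
    ("p3", "c2"): 0.4,
}
pairs = mutual_best_match(scores)
```

In this toy input only `("p1", "c1")` survives: `p2` prefers `c1`, which prefers `p1`, so the plausible pair `("p2", "c2")` is dropped. That is consistent with the reported trade-off, where adding MBM raises precision and lowers recall.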