Dung H M Nguyen, Jon D Patrick. School of Information Technologies, University of Sydney, Sydney, New South Wales, Australia.
Abstract
OBJECTIVE: This paper presents an automated system for classifying the results of imaging examinations (CT, MRI, positron emission tomography) into reportable and non-reportable cancer cases. This system is part of an industrial-strength processing pipeline built to extract content from radiology reports for use in the Victorian Cancer Registry.
MATERIALS AND METHODS: In addition to traditional supervised learning methods such as conditional random fields and support vector machines, active learning (AL) approaches were investigated to optimize training production and further improve classification performance. The project involved two pilot sites in Victoria, Australia, namely Lake Imaging (Ballarat) and Peter MacCallum Cancer Centre (Melbourne), and, in collaboration with the NSW Central Registry, one pilot site at Westmead Hospital (Sydney).
RESULTS: The reportability classifier achieved 98.25% sensitivity and 96.14% specificity on the cancer registry's held-out test set. AL can save up to 92% of the training data needed for supervised machine learning.
DISCUSSION: AL is a promising method for optimizing the supervised training production used in classification of radiology reports. When an AL strategy is applied during the data selection process, the cost of manual classification can be reduced significantly.
CONCLUSIONS: The most important practical application of the reportability classifier is that it can dramatically reduce the human effort needed to identify relevant reports in the large imaging pool for further investigation of cancer. The classifier is built on a large real-world dataset and can achieve high performance in filtering relevant reports to support cancer registries.
Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
Keywords:
Classification; Radiology Information Systems; active learning; machine learning
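The abstract does not say which AL query strategy was used, so the following is only a minimal sketch of one common choice, pool-based uncertainty sampling, together with the sensitivity and specificity measures reported in the results. Everything in it is an illustrative assumption rather than the authors' pipeline: a toy nearest-centroid model stands in for the conditional random field and support vector machine classifiers, 2-D points stand in for report features, label 1 means reportable and 0 non-reportable, and the names `fit_centroids`, `margin`, `active_learn` and `sensitivity_specificity` are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def fit_centroids(X, y):
    """Return {label: mean feature vector} for the labelled examples."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, x):
    """Assign x to the class with the nearest centroid."""
    return min(centroids, key=lambda label: dist(x, centroids[label]))


def margin(centroids, x):
    """Gap between the distances to the two class centroids (labels 0 and 1).
    A small gap means the model is uncertain about x."""
    return abs(dist(x, centroids[0]) - dist(x, centroids[1]))


def active_learn(pool, oracle, seed, budget):
    """Pool-based uncertainty sampling (illustrative assumption).
    pool   : list of feature vectors
    oracle : function index -> true label, standing in for the human annotator
    seed   : indices labelled up front; must cover both classes
    budget : number of additional labels to request"""
    labelled = {i: oracle(i) for i in seed}
    for _ in range(budget):
        model = fit_centroids([pool[i] for i in labelled], list(labelled.values()))
        candidates = [i for i in range(len(pool)) if i not in labelled]
        if not candidates:
            break
        # Ask the annotator about the example the current model is least sure of.
        query = min(candidates, key=lambda i: margin(model, pool[i]))
        labelled[query] = oracle(query)
    model = fit_centroids([pool[i] for i in labelled], list(labelled.values()))
    return model, labelled


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (made up for illustration): 1 = reportable, 0 = non-reportable.
pool = [(0.0, 0.0), (1.0, 0.5), (0.5, 1.0), (2.0, 2.0),
        (3.2, 3.2), (4.5, 5.0), (5.0, 4.5), (5.0, 5.0)]
truth = [0, 0, 0, 0, 1, 1, 1, 1]

model, labelled = active_learn(pool, lambda i: truth[i], seed=[0, 7], budget=2)
predictions = [predict(model, x) for x in pool]
print(sorted(labelled), sensitivity_specificity(truth, predictions))
```

In this toy run only four of the eight pool items are ever labelled, and the two queried items are the ones closest to the class boundary. That is the mechanism by which an AL strategy can reduce manual classification effort, although the size of the saving on real radiology reports depends on the data and the classifier.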