
Classification of cancer-related death certificates using machine learning.

Luke Butt, Guido Zuccon, Anthony Nguyen, Anton Bergheim, Narelle Grayson.

Abstract

BACKGROUND: Cancer monitoring and prevention rely on timely notification of cancer cases. However, abstracting and classifying cancer from the free text of pathology reports and other relevant documents, such as death certificates, is complex and time-consuming.
AIMS: This paper investigates approaches for automatically detecting notifiable cancer as the cause of death in free-text death certificates supplied to Cancer Registries.
METHOD: Several machine learning classifiers were studied. Features were extracted using natural language processing techniques and the Medtex toolkit. The feature sets included stemmed words, bi-grams, and concepts from the SNOMED CT medical terminology. The baseline was a keyword spotter using keywords extracted from the long descriptions of ICD-10 cancer-related codes.
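To make the feature types and the baseline concrete, the following is a minimal sketch in Python. It is not the Medtex pipeline: the suffix stripper is a toy stand-in for a real stemmer, SNOMED CT concept extraction is omitted, and the keyword list and all function names are hypothetical (the actual keywords came from ICD-10 long descriptions, which are not reproduced here).

```python
import re

# Hypothetical keyword list for illustration only; the study derived its
# keywords from the long descriptions of ICD-10 cancer-related codes.
CANCER_KEYWORDS = {"carcinoma", "neoplasm", "malignant", "melanoma", "lymphoma"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(token):
    """Toy suffix stripper; an assumption standing in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def extract_features(text):
    """Return stemmed tokens followed by bi-grams of adjacent stems."""
    stems = [crude_stem(t) for t in tokenize(text)]
    bigrams = [f"{a}_{b}" for a, b in zip(stems, stems[1:])]
    return stems + bigrams


def keyword_spotter(text):
    """Baseline: flag a certificate if any raw token is a known keyword."""
    return any(tok in CANCER_KEYWORDS for tok in tokenize(text))
```

For example, `extract_features("Metastatic lung carcinomas")` yields the three stems plus the bi-grams `metastatic_lung` and `lung_carcinoma`, which could then be passed to any classifier.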
RESULTS: Death certificates listing a notifiable cancer as the cause of death can be effectively identified with the methods studied in this paper. A Support Vector Machine (SVM) classifier achieved the best performance, with an overall F-measure of 0.9866 when evaluated on a set of 5,000 free-text death certificates using the token stem feature set. The SNOMED CT concept plus token stem feature set reached the lowest variance (0.0032) and false negative rate (0.0297) while achieving an F-measure of 0.9864. The SVM classifier accounts for the first 18 of the top 40 evaluated runs and is the most robust classifier, with a variance of 0.001141, half the variance of the other classifiers.
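The reported metrics can be illustrated with a short Python sketch. It assumes a binary setting (notifiable cancer versus not) and the balanced F1 form of the F-measure; the abstract does not state how the "overall" F-measure was averaged or whether variance was computed as a population or sample statistic, so those choices, and the function names, are assumptions.

```python
from statistics import pvariance


def confusion_counts(y_true, y_pred):
    """Count true positives, false positives and false negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    return tp, fp, fn


def f_measure(y_true, y_pred):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    tp, fp, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def false_negative_rate(y_true, y_pred):
    """Share of true cancer cases that the classifier missed."""
    tp, _, fn = confusion_counts(y_true, y_pred)
    return fn / (tp + fn) if (tp + fn) else 0.0


def run_variance(scores):
    """Population variance of F-measures across runs (an assumed choice)."""
    return pvariance(scores)
```

A low false negative rate matters for a registry because each false negative is a notifiable cancer death that goes unreported, while a low variance across runs indicates that the classifier is less sensitive to the choice of feature set and weighting.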
CONCLUSION: The selection of features had the greatest influence on classifier performance, although the type of classifier employed also affected performance. In contrast, the feature weighting schema had a negligible effect. Specifically, stemmed tokens, with or without SNOMED CT concepts, form the most effective feature set when combined with an SVM classifier.
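As an illustration of what a feature weighting schema is, the sketch below contrasts three common options in Python: binary presence, raw term frequency, and a smoothed TF-IDF. The abstract does not name the schemas that were compared, so these three, the smoothing formula, and the function names are assumptions chosen for illustration.

```python
import math
from collections import Counter


def binary_weights(doc):
    """Weight 1.0 for every feature present in the document."""
    return {t: 1.0 for t in set(doc)}


def tf_weights(doc):
    """Weight each feature by its raw count in the document."""
    return {t: float(c) for t, c in Counter(doc).items()}


def tfidf_weights(doc, corpus):
    """Term frequency times a smoothed inverse document frequency."""
    n = len(corpus)
    # Document frequency: number of corpus documents containing each feature.
    df = Counter(t for d in corpus for t in set(d))
    return {
        t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
        for t, c in Counter(doc).items()
    }
```

Death certificate text is short, so most features occur once per document and the schemas produce similar vectors; that is one plausible reading of the negligible effect reported, not a claim made in the abstract.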


Keywords:  Cancer Registry; Death certificates; SNOMED CT; cancer monitoring and reporting; machine learning; natural language processing

Year: 2013; PMID: 23745151; PMCID: PMC3674421; DOI: 10.4066/AMJ.2013.1654

Source DB: PubMed; Journal: Australas Med J; ISSN: 1836-1935



