Automatic ICD-10 classification of cancers from free-text death certificates.

Bevan Koopman, Guido Zuccon, Anthony Nguyen, Anton Bergheim, Narelle Grayson.

Abstract

OBJECTIVE: Death certificates provide an invaluable source for cancer mortality statistics; however, this value can only be realised if accurate, quantitative data can be extracted from certificates, an aim hampered by both the volume and the variable nature of certificates written in natural language. This paper proposes an automatic classification system for identifying cancer-related causes of death from death certificates.
METHODS: Detailed features, including terms, n-grams and SNOMED CT concepts, were extracted from a collection of 447,336 death certificates. These features were used to train Support Vector Machine classifiers (one classifier for each cancer type). The classifiers were deployed in a cascaded architecture: the first level identified the presence of cancer (i.e., binary cancer/no cancer) and the second level identified the type of cancer (according to the ICD-10 classification system). A held-out test set was used to evaluate the effectiveness of the classifiers according to precision, recall and F-measure. In addition, detailed feature analysis was performed to reveal the characteristics of a successful cancer classification model.
RESULTS: The system was highly effective at identifying cancer as the underlying cause of death (F-measure 0.94). The system was also effective at determining the type of cancer for common cancers (F-measure 0.7). Rare cancers, for which there was little training data, were difficult to classify accurately (F-measure 0.12). Factors influencing performance were the amount of training data and certain ambiguous cancers (e.g., those in the stomach region). The feature analysis revealed that a combination of features was important for cancer type classification, with SNOMED CT concept and oncology-specific morphology features proving the most valuable.
CONCLUSION: The system proposed in this study provides automatic identification and characterisation of cancers from large collections of free-text death certificates. This allows organisations such as Cancer Registries to monitor and report on cancer mortality in a timely and accurate manner. In addition, the methods and findings are generally applicable beyond cancer classification and to other sources of medical text besides death certificates.
Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
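
The cascaded architecture and the evaluation measures described in METHODS can be illustrated with a minimal Python sketch. It is not the authors' implementation: the trained Support Vector Machines are replaced by hypothetical keyword rules (`toy_is_cancer`, `toy_type_scorers`), SNOMED CT concept features are omitted, and every function name and the four example certificates are assumptions made for illustration. Only the shape of the pipeline follows the abstract: n-gram features, a binary cancer/no cancer first level, one scorer per ICD-10 cancer type at the second level, and precision, recall and F-measure computed per label.

```python
from collections import Counter


def ngram_features(text, n_max=2):
    """Count word n-grams (n = 1..n_max) in a lower-cased certificate."""
    tokens = text.lower().split()
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats


def cascade(text, is_cancer, type_scorers):
    """Level 1 decides cancer / no cancer; level 2 picks the best-scoring type."""
    feats = ngram_features(text)
    if not is_cancer(feats):
        return None  # no cancer: the second level is never consulted
    scores = {code: scorer(feats) for code, scorer in type_scorers.items()}
    return max(scores, key=scores.get)


def precision_recall_f1(gold, predicted, label):
    """Precision, recall and F-measure for one label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical stand-ins for the trained classifiers (assumption, not from the paper).
CANCER_TERMS = {"carcinoma", "cancer", "malignant", "metastatic"}


def toy_is_cancer(feats):
    return any(term in feats for term in CANCER_TERMS)


toy_type_scorers = {
    "C34": lambda f: f["lung"] + f["bronchus"],  # ICD-10 C34: bronchus and lung
    "C50": lambda f: f["breast"],                # ICD-10 C50: breast
}

# Invented example certificates with their gold-standard labels.
certificates = [
    "metastatic carcinoma of the lung",
    "acute myocardial infarction",
    "breast cancer",
    "malignant neoplasm of stomach",
]
gold = ["C34", None, "C50", "C16"]

predicted = [cascade(c, toy_is_cancer, toy_type_scorers) for c in certificates]
precision, recall, f1 = precision_recall_f1(gold, predicted, "C34")
print(predicted)
print(f"C34: precision={precision:.2f} recall={recall:.2f} F={f1:.2f}")
```

In this toy run the stomach certificate has no second-level scorer of its own, so it is assigned to another type and lowers that type's precision. This mirrors, in a very simplified way, the reported sensitivity to the amount of training data available for each cancer type.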

Keywords:  Cancer classification; Death certificates; Machine learning; Natural language processing

Year:  2015        PMID: 26323193     DOI: 10.1016/j.ijmedinf.2015.08.004

Source DB:  PubMed          Journal:  Int J Med Inform        ISSN: 1386-5056            Impact factor:   4.046
