Automatic ICD-10 classification of cancers from free-text death certificates.

Bevan Koopman, Guido Zuccon, Anthony Nguyen, Anton Bergheim, Narelle Grayson.

Abstract

OBJECTIVE: Death certificates provide an invaluable source for cancer mortality statistics; however, this value can only be realised if accurate, quantitative data can be extracted from certificates, an aim hampered by both the volume and variable nature of certificates written in natural language. This paper proposes an automatic classification system for identifying cancer-related causes of death from death certificates.
METHODS: Detailed features, including terms, n-grams and SNOMED CT concepts, were extracted from a collection of 447,336 death certificates. These features were used to train Support Vector Machine classifiers (one classifier for each cancer type). The classifiers were deployed in a cascaded architecture: the first level identified the presence of cancer (i.e., binary cancer/no cancer) and the second level identified the type of cancer (according to the ICD-10 classification system). A held-out test set was used to evaluate the effectiveness of the classifiers according to precision, recall and F-measure. In addition, detailed feature analysis was performed to reveal the characteristics of a successful cancer classification model.
RESULTS: The system was highly effective at identifying cancer as the underlying cause of death (F-measure 0.94). The system was also effective at determining the type of cancer for common cancers (F-measure 0.7). Rare cancers, for which there was little training data, were difficult to classify accurately (F-measure 0.12). Factors influencing performance were the amount of training data and certain ambiguous cancers (e.g., those in the stomach region). The feature analysis revealed that a combination of features was important for cancer type classification, with SNOMED CT concept and oncology-specific morphology features proving the most valuable.
CONCLUSION: The system proposed in this study provides automatic identification and characterisation of cancers from large collections of free-text death certificates. This allows organisations such as Cancer Registries to monitor and report on cancer mortality in a timely and accurate manner. In addition, the methods and findings are generally applicable beyond cancer classification and to other sources of medical text besides death certificates.
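
The cascaded design described in the METHODS paragraph can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: a simple perceptron stands in for the Support Vector Machines, only word n-grams are used (no SNOMED CT concepts or morphology features), and the four training certificates, the class names (`Perceptron`, `Cascade`) and the helper functions are invented for the example. C34 and C50 are the ICD-10 categories for lung and breast cancer.

```python
from collections import Counter


def ngram_features(text, n_max=2):
    """Bag of word n-grams (n = 1..n_max) for one certificate text."""
    tokens = text.lower().split()
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats


class Perceptron:
    """Stand-in linear binary classifier (assumption: the paper used SVMs)."""

    def __init__(self, epochs=10):
        self.epochs = epochs
        self.weights = Counter()
        self.bias = 0.0

    def score(self, feats):
        return self.bias + sum(self.weights[f] * v for f, v in feats.items())

    def fit(self, X, y):
        for _ in range(self.epochs):
            for feats, label in zip(X, y):
                target = 1 if label else -1
                if target * self.score(feats) <= 0:  # misclassified: update
                    for f, v in feats.items():
                        self.weights[f] += target * v
                    self.bias += target
        return self


class Cascade:
    """Level 1: cancer vs. no cancer. Level 2: one classifier per cancer type."""

    def __init__(self):
        self.level1 = Perceptron()
        self.level2 = {}  # ICD-10 code -> one-vs-rest classifier

    def fit(self, texts, labels):
        # labels: an ICD-10 code for cancer deaths, None for everything else
        X = [ngram_features(t) for t in texts]
        self.level1.fit(X, [lab is not None for lab in labels])
        cancer = [(x, lab) for x, lab in zip(X, labels) if lab is not None]
        for code in sorted({lab for _, lab in cancer}):
            clf = Perceptron()
            clf.fit([x for x, _ in cancer], [lab == code for _, lab in cancer])
            self.level2[code] = clf
        return self

    def predict(self, text):
        x = ngram_features(text)
        if self.level1.score(x) <= 0:
            return None  # level 1 says "no cancer": skip level 2
        return max(self.level2, key=lambda code: self.level2[code].score(x))


def precision_recall_f1(gold, predicted, positive):
    """Precision, recall and F-measure for one class label."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented toy data, for illustration only.
TOY_TEXTS = [
    "carcinoma of lung",
    "metastatic breast carcinoma",
    "acute myocardial infarction",
    "pneumonia",
]
TOY_LABELS = ["C34", "C50", None, None]

model = Cascade().fit(TOY_TEXTS, TOY_LABELS)
for text in ["carcinoma of the lung", "breast carcinoma", "myocardial infarction"]:
    print(text, "->", model.predict(text))
```

Because level 2 is trained only on certificates labelled as cancer, each type classifier has to separate one cancer from the others rather than from all causes of death. The per-class `precision_recall_f1` function mirrors the evaluation measures named in the abstract and would be applied to a held-out set rather than to the training texts.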

Entities: Disease

Keywords: Cancer classification; Death certificates; Machine learning; Natural language processing

Year: 2015; PMID: 26323193; DOI: 10.1016/j.ijmedinf.2015.08.004

Source DB: PubMed; Journal: Int J Med Inform; ISSN: 1386-5056; Impact factor: 4.046

Cited (22 in total)

1. Computer-Assisted Diagnostic Coding: Effectiveness of an NLP-based approach using SNOMED CT to ICD-10 mappings.

Authors: Anthony N Nguyen; Donna Truran; Madonna Kemp; Bevan Koopman; David Conlan; John O'Dwyer; Ming Zhang; Sarvnaz Karimi; Hamed Hassanzadeh; Michael J Lawley; Damian Green
Journal: AMIA Annu Symp Proc Date: 2018-12-05

2. Can structured EHR data support clinical coding? A data mining approach.

Authors: José Carlos Ferrão; Mónica Duarte Oliveira; Filipe Janela; Henrique M G Martins; Daniel Gartner
Journal: Health Syst (Basingstoke) Date: 2020-03-01

3. Clinical Natural Language Processing in 2015: Leveraging the Variety of Texts of Clinical Interest.

Authors: A Névéol; P Zweigenbaum
Journal: Yearb Med Inform Date: 2016-11-10

4. EHR problem list clustering for improved topic-space navigation.

Authors: Markus Kreuzthaler; Bastian Pfeifer; Jose Antonio Vera Ramos; Diether Kramer; Victor Grogger; Sylvia Bredenfeldt; Markus Pedevilla; Peter Krisper; Stefan Schulz
Journal: BMC Med Inform Decis Mak Date: 2019-04-04 Impact factor: 2.796

5. ICD Coding from Clinical Text Using Multi-Filter Residual Convolutional Neural Network.

Authors: Fei Li; Hong Yu
Journal: Proc Conf AAAI Artif Intell Date: 2020-04-03

6. AUTOMATIC ICD-10 CODING USING PRESCRIBED DRUGS DATA.

Authors: Alexander Dokumentov; Yassien Shaalan; Piyapong Khumrin; Krit Khwanngern; Anawat Wisetborisut; Thanakom Hatsadeang; Nattapat Karaket; Witthawin Achariyaviriya; Sansanee Auephanwiriyakul; Nipon Theera-Umpon; Terence Siganakis
Journal: Perspect Health Inf Manag Date: 2021-07-01

7. Natural language processing systems for capturing and standardizing unstructured clinical information: A systematic review. [Review]

Authors: Kory Kreimeyer; Matthew Foster; Abhishek Pandey; Nina Arya; Gwendolyn Halford; Sandra F Jones; Richard Forshee; Mark Walderhaug; Taxiarchis Botsis
Journal: J Biomed Inform Date: 2017-07-17 Impact factor: 6.317

8. Neural Machine Translation-Based Automated Current Procedural Terminology Classification System Using Procedure Text: Development and Validation Study.

Authors: Hyeon Joo; Michael Burns; Sai Saradha Kalidaikurichi Lakshmanan; Yaokun Hu; V G Vinod Vydiswaran
Journal: JMIR Form Res Date: 2021-05-26

9. Automatic classification of histopathological diagnoses for building a large scale tissue catalogue.

Authors: Robert Reihs; Heimo Müller; Stefan Sauer; Kurt Zatloukal
Journal: Health Technol (Berl) Date: 2016-12-22

10. Automatic ICD-10 coding algorithm using an improved longest common subsequence based on semantic similarity.

Authors: YunZhi Chen; HuiJuan Lu; LanJuan Li
Journal: PLoS One Date: 2017-03-17 Impact factor: 3.240