Mario Almagro, Raquel Martínez, Soto Montalvo, Víctor Fresno.
Abstract
Automatic ICD-10 coding remains an unresolved Machine Learning task. Although hospitals generate an enormous number of clinical documents, the data are considerably sparse and follow a very skewed, unbalanced code distribution, which entails reduced interoperability. In addition, in some languages the availability of coded documents is very limited. This paper proposes a cross-lingual approach based on Machine Translation methods to code death certificates with ICD-10 using supervised learning. The aim of this approach is to increase the availability of coded documents by combining collections in different languages, which may also help reduce possible bias in their ICD distribution, i.e. avoid favoring a subset of codes due to service or environmental factors. A significant improvement in system performance is achieved for labels with few occurrences.
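The pooling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy certificates, the `TOY_LEXICON` lookup standing in for a machine translation system, and the function names `translate_to_english` and `pool_corpora` are all hypothetical. It only shows how translating a second collection into the target language and merging it can add training examples for codes that are rare or absent in the original collection.

```python
from collections import Counter

# Hypothetical toy data: (certificate text, ICD-10 code) pairs.
english_docs = [
    ("acute myocardial infarction", "I21"),
    ("pneumonia", "J18"),
    ("pneumonia", "J18"),
]
french_docs = [
    ("infarctus aigu du myocarde", "I21"),
    ("septicémie", "A41"),
]

# Stand-in for a machine translation system (assumption for illustration);
# a real pipeline would call an MT model here instead of a lookup table.
TOY_LEXICON = {
    "infarctus aigu du myocarde": "acute myocardial infarction",
    "septicémie": "sepsis",
}


def translate_to_english(text):
    """Return the toy translation, or the text unchanged if unknown."""
    return TOY_LEXICON.get(text, text)


def pool_corpora(target_docs, source_docs):
    """Translate the source collection and append it to the target one."""
    translated = [(translate_to_english(text), code) for text, code in source_docs]
    return target_docs + translated


pooled = pool_corpora(english_docs, french_docs)

# Compare how many training examples each code has before and after pooling.
before = Counter(code for _, code in english_docs)
after = Counter(code for _, code in pooled)
for code in sorted(after):
    print(code, before[code], "->", after[code])
```

The pooled list could then be passed to any supervised text classifier; the choice of classifier is left open here because the abstract does not specify it.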
Keywords: Cross-lingual approach; Electronic health records; ICD-10 coding; Machine translation; Text mining
Year: 2019 PMID: 31077817 DOI: 10.1016/j.jbi.2019.103207
Source DB: PubMed Journal: J Biomed Inform ISSN: 1532-0464 Impact factor: 6.317