Aurélie Névéol, Hercules Dalianis, Sumithra Velupillai, Guergana Savova, Pierre Zweigenbaum.
Abstract
BACKGROUND: Natural language processing applied to clinical text or aimed at a clinical outcome has been thriving in recent years. This paper offers the first broad overview of clinical Natural Language Processing (NLP) for languages other than English. Recent studies are summarized to offer insights and outline opportunities in this area.

MAIN BODY: We envision three groups of intended readers: (1) NLP researchers leveraging experience gained in other languages, (2) NLP researchers faced with establishing clinical text processing in a language other than English, and (3) clinical informatics researchers and practitioners looking for resources in their languages in order to apply NLP techniques and tools to clinical practice and/or investigation. We review work in clinical NLP in languages other than English. We classify these studies into three groups: (i) studies describing the development of new NLP systems or components de novo, (ii) studies describing the adaptation of NLP architectures developed for English to another language, and (iii) studies focusing on a particular clinical application.
Keywords: Clinical Decision-Making; Languages other than English; Natural Language Processing
Year: 2018 PMID: 29602312 PMCID: PMC5877394 DOI: 10.1186/s13326-018-0179-8
Source DB: PubMed Journal: J Biomed Semantics
Table 1. Number of publications returned by a PubMed search for “Natural Language Processing AND *language* [tiab]”, where *language* is instantiated with a specific language name, on January 13, 2017, along with references cited in this review for each language. The last row (bolded) presents overall information for all languages studied in this review.
| Language (ISO 639-1 language code) | PubMed Count | Cited in this review |
|---|---|---|
| French (FR) | 111 | [ |
| German (DE) | 69 | [ |
| Chinese (ZH) | 54 | [ |
| Spanish (ES) | 39 | [ *[ |
| Japanese (JA) | 30 | [ |
| Dutch (DU) | 20 | [ |
| Swedish (SV) | 15 | [ |
| Portuguese (PT) | 14 | [ |
| Greek (EL) | 14 | [ |
| Italian (IT) | 12 | [ |
| Korean (KO) | 11 | [ |
| Arabic (AR) | 9 | [ |
| Finnish (FI) | 9 | [ |
| Czech (CS), Russian (RU) | 7 | [ |
| Polish (PL) | 6 | [ |
| Hebrew (HE) | 5 | [ |
| Danish (DA) | 4 | [ |
| Turkish (TR) | 3 | [ |
| Bulgarian (BG) | 2 | [ |
| Basque (EU) | 1 | [ |
| Georgian (KA) | 1 | [ |
| Hungarian (HU) | 0 | [ |
Note that some included articles are not indexed in MEDLINE but appear in other publication venues such as ACL. A star indicates work that addresses several languages.
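The per-language search described in the table caption can be sketched as follows. This is a minimal illustration, not the authors' actual procedure: it assumes the query template is used verbatim, builds request URLs for the public NCBI E-utilities `esearch` endpoint, and does not send any request. The helper names `build_query` and `build_count_url` are hypothetical. Counts obtained today would differ from the January 13, 2017 figures reported in the table.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint (assumption: the review does not
# say which interface was used to run the PubMed searches).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query(language: str) -> str:
    """Instantiate the caption's query template with one language name."""
    return f"Natural Language Processing AND {language} [tiab]"


def build_count_url(language: str) -> str:
    """Build an esearch URL that asks only for the number of matching records."""
    params = {
        "db": "pubmed",                 # search the PubMed database
        "term": build_query(language),  # the instantiated query
        "rettype": "count",             # return the hit count only
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Print the query and request URL for the three most frequent languages.
    for name in ("French", "German", "Chinese"):
        print(build_query(name))
        print(build_count_url(name))
```

Fetching each URL and reading the returned count would reproduce the "PubMed Count" column for a given date. Restricting results to a cut-off date would need additional date parameters, which are left out of this sketch.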
Fig. 1 Growth of bio-clinical NLP publications in MEDLINE over the past decade, for the top 5 studied languages other than English.
Table 2. List of studies presented in this review, categorized by NLP method used and language(s) addressed
| Method/Task | Language/reference cited in this review |
|---|---|
| Core NLP | |
| - Morphology | FR [ |
| - Part of Speech tagging | PT [ |
| - Parsing | FI [ |
| - Segmentation | DE [ |
| Resource development | |
| - Lexicons | BG [ |
| - Corpora and annotation | EL [ |
| | ES [ |
| - Models, methods | DE [ |
| De-identification | FR [ |
| Information extraction | |
| - Medical Concepts | BG [ |
| | IT [ |
| - Findings/Symptoms | DE [ |
| - Drugs/Adverse events | BG [ |
| - Specific characteristics | EN-{ZH,FR,DE,JA,ES} [ |
| - Relations | BG [ |
| Classification | |
| - Phenotyping from EHR text | BG [ |
| | SV [ |
| - Indexing and coding | EN-FR [ |
| - Patient-authored text | JA [ |
| - Cohort stratification | DA [ |
| Context Analysis | DU [ |
| - Negation detection | BG [ |
| | FR,DE,SV [ |
| - Uncertainty/Assertion | SV [ |
| - Temporality | FR [ |
| - Abbreviation | DE [ |
| - Experiencer | DU [ |
| Multilingual tasks | |
| - Translation | EN-ES [ |
| | EN-{FR,DE,HU,PL,ES,TU} [ |
| - Information Retrieval | AR [ |
| - Cultural analysis | DE [ |
| Shared tasks | |
| - CLEF-ER2013 | DE,DU,FR,ES- [ |
| - CLEF eHealth 2015, 2016 | FR [ |
| - NTCIR 2014, 2016 | JA [ |
The two-letter language codes are introduced in Table 1. When multiple languages are addressed in one paper, we provide a comma-separated list; dashes mark language pairs in multilingual work.
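The language notation used in the table above can be read mechanically. The sketch below is one possible interpretation. It assumes that a braced form such as `EN-{FR,DE}` abbreviates one language pair per listed code; the table note does not state this explicitly. The function name `expand_languages` is hypothetical.

```python
import re


def expand_languages(spec: str) -> list[str]:
    """Expand a language spec from the table into single codes or pairs."""
    spec = spec.strip()
    # Braced form, e.g. EN-{FR,DE}: assumed to mean one pair per listed code.
    match = re.fullmatch(r"([A-Z]{2})-\{([A-Z,]+)\}", spec)
    if match:
        source = match.group(1)
        targets = match.group(2).split(",")
        return [f"{source}-{target}" for target in targets]
    # Otherwise: a comma-separated list of codes, or a single pair like EN-FR.
    return [part for part in spec.split(",") if part]


if __name__ == "__main__":
    print(expand_languages("EN-{ZH,FR,DE,JA,ES}"))
    print(expand_languages("FR,DE,SV"))
    print(expand_languages("EN-FR"))
```

Under this reading, `FR,DE,SV` denotes one study covering three languages separately, while `EN-FR` denotes a single study working across an English-French pair.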