Egoitz Laparra¹, Steven Bethard¹, Timothy A Miller²,³
Abstract
Building clinical natural language processing (NLP) systems that work on widely varying data is an absolute necessity because of the expense of obtaining new training data. While domain adaptation research can have a positive impact on this problem, the most widely studied paradigms do not take into account the realities of clinical data sharing. To address this issue, we lay out a taxonomy of domain adaptation, parameterizing by what data is shareable. We show that the most realistic settings for clinical use cases are seriously under-studied. To support research in these important directions, we make a series of recommendations, not just for domain adaptation but for clinical NLP in general, that ensure that data, shared tasks, and released models are broadly useful, and that initiate research directions where the clinical NLP community can lead the broader NLP and machine learning fields.
Keywords: domain adaptation; machine learning; natural language processing; shared resources
Year: 2020 PMID: 32734151 PMCID: PMC7382626 DOI: 10.1093/jamiaopen/ooaa010
Source DB: PubMed Journal: JAMIA Open ISSN: 2574-2531
A proposed categorization of the space of domain adaptation algorithms
| Source shares | Target has | Target shares | Best methods |
|---|---|---|---|
| Labeled text | Labeled text | – | Neural feature augmentation; parameter transfer; prior knowledge; instance weighting and selection |
| Labeled text | Raw text | – | Neural feature correspondence learning; re-training embeddings; bootstrapping; adversarial training; auto-encoders |
| Labeled features | Labeled text | – | Feature augmentation |
| Labeled features | Raw text | – | Feature correspondence learning |
| Trained models | Labeled text | – | Fine-tuning; adaptive off-the-shelf |
| Trained models | Labeled text | Models | – |
| Trained models | Raw text | – | Online self-training |
| Trained models | Raw text | Models | Pseudo in-domain data selection |
Notes: It is assumed that there is always labeled data available in the source domain. “Source shares” describes what the source site is able to share with the target site. “Target has” describes what data are available at the target site. “Target shares” describes what the target site is able to share with the source site. “Best methods” gives names for the types of methods in each configuration, and citations to examples of such work.
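The following rough sketch is not taken from the paper. It encodes the categorization as a lookup keyed by the three sharing columns (source shares, target has, target shares). With this structure you can ask which method families the table lists for a given data-sharing arrangement, and which arrangements have none listed. The names `TAXONOMY`, `methods_for`, and `unstudied_settings` are hypothetical. The empty entry reflects the configuration where the source shares trained models, the target has labeled text, and the target shares models back; the methods cell for that row is blank.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding of the table: key = (source shares, target has, target shares).
# "-" stands for the table's dash (nothing shared back). An empty list marks a
# configuration for which the table lists no methods.
TAXONOMY: Dict[Tuple[str, str, str], List[str]] = {
    ("labeled text", "labeled text", "-"): [
        "neural feature augmentation", "parameter transfer",
        "prior knowledge", "instance weighting and selection",
    ],
    ("labeled text", "raw text", "-"): [
        "neural feature correspondence learning", "re-training embeddings",
        "bootstrapping", "adversarial training", "auto-encoders",
    ],
    ("labeled features", "labeled text", "-"): ["feature augmentation"],
    ("labeled features", "raw text", "-"): ["feature correspondence learning"],
    ("trained models", "labeled text", "-"): ["fine-tuning", "adaptive off-the-shelf"],
    ("trained models", "labeled text", "models"): [],
    ("trained models", "raw text", "-"): ["online self-training"],
    ("trained models", "raw text", "models"): ["pseudo in-domain data selection"],
}


def methods_for(source_shares: str, target_has: str, target_shares: str = "-") -> List[str]:
    """Return the method families listed for one configuration (case-insensitive)."""
    key = (source_shares.lower(), target_has.lower(), target_shares.lower())
    if key not in TAXONOMY:
        raise KeyError(f"configuration not in the table: {key}")
    return TAXONOMY[key]


def unstudied_settings() -> List[Tuple[str, str, str]]:
    """Return configurations for which the table lists no methods."""
    return [key for key, methods in TAXONOMY.items() if not methods]


if __name__ == "__main__":
    print(methods_for("Trained models", "Raw text"))
    print(unstudied_settings())
```

Keying on the full triple, rather than nesting dictionaries, keeps each table row as one entry. It also makes gaps in the space easy to enumerate, which matches the paper's point that the model-sharing settings are the least studied.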
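For the setting where the source site shares labeled feature vectors and the target site has labeled text, the table lists feature augmentation. One well-known form of feature augmentation is Daumé III's 2007 "frustratingly easy" scheme. It copies each feature into a shared block and a domain-specific block. The sketch that follows assumes that scheme, which may not be the specific method the paper cites. It also assumes that both sites extract features into the same d-dimensional space. The function names `augment` and `build_training_set` are hypothetical.

```python
from typing import List, Sequence, Tuple

# (feature vector, label) pair; assumes source and target use the same feature space.
Example = Tuple[Sequence[float], str]


def augment(x: Sequence[float], domain: str) -> List[float]:
    """Expand a d-dim vector to 3d dims laid out as [shared | source-only | target-only]."""
    values = [float(v) for v in x]
    zeros = [0.0] * len(values)
    if domain == "source":
        return values + values + zeros  # <x, x, 0>
    if domain == "target":
        return values + zeros + values  # <x, 0, x>
    raise ValueError("domain must be 'source' or 'target'")


def build_training_set(
    source: List[Example], target: List[Example]
) -> Tuple[List[List[float]], List[str]]:
    """Pool augmented source and target examples into one training set."""
    X: List[List[float]] = []
    y: List[str] = []
    for features, label in source:
        X.append(augment(features, "source"))
        y.append(label)
    for features, label in target:
        X.append(augment(features, "target"))
        y.append(label)
    return X, y


if __name__ == "__main__":
    # Toy two-feature examples; real clinical feature vectors are far larger.
    source_rows = [([1.0, 0.0], "positive"), ([0.0, 1.0], "negative")]
    target_rows = [([1.0, 1.0], "positive")]
    X, y = build_training_set(source_rows, target_rows)
    for row, label in zip(X, y):
        print(label, row)
```

A linear classifier trained on the pooled vectors can place weight on the shared block for behavior that transfers across sites, and on a domain-specific block for behavior that does not. In this arrangement only feature vectors and labels cross the site boundary, so the source's raw text does not need to be shared. That is what distinguishes the "labeled features" rows from the "labeled text" rows.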