Michele Filannino1,2, Özlem Uzuner1,2.
Abstract
OBJECTIVES: To review the latest scientific challenges organized in clinical Natural Language Processing (NLP) by highlighting the tasks, the most effective methodologies used, the data, and the sharing strategies.
Year: 2018 PMID: 30157522 PMCID: PMC6115235 DOI: 10.1055/s-0038-1667079
Source DB: PubMed Journal: Yearb Med Inform ISSN: 0943-4747
Clinical NLP Challenges, the tasks they posed, and the number of participating teams, since 2015, ordered by data sensitivity.
| Category | Year | Challenge name | Task description | Data type | Data source | Teams: Academia | Teams: Industry | Teams: Joint | Teams: Total |
|---|---|---|---|---|---|---|---|---|---|
| 2015 |
TREC Clinical Decision Support (CDS)
| Patient-centered information retrieval | Medical case narratives | Synthetic, PubMed | 33 | 3 | 0 | 36 | |
|
TREC Precision Medicine
| |||||||||
| Synthetic | 2017 | > Track 1 | Patient-centered literature article retrieval Patient-centered clinical trials retrieval | Semi-structured cases |
Synthetic, PubMed,
| 27 | 5 | 0 | 32 |
| 2016 |
CLEF eHealth
| Information extraction | Nursing handover notes | NICTA synthetic nursing handover notes | 4 | 0 | 0 | 4 | |
|
Text Analysis Conference (TAC) Adverse Drug Reaction Extraction from Drug Labels (ADR)
| |||||||||
| > Track 1 | ADR mentions and modifiers extraction | ||||||||
| Prescription drug labels | 2017 | > Track 2 | Relation extraction Positive ADR filtering Positive ADR normalization | Drug labels |
| 6 | 3 | 1 | 10 |
| 2015 |
CLPsych: Depression and PTSD on Twitter
| Binary classification of depression and PTSD users | Social media | 3 | 0 | 0 | 3 | ||
|
Social Media Mining (SMM)
| |||||||||
| 2016 | > Track 1 | ADR classification Information extraction Concept normalization | Social media | 9 | 2 | 0 | 11 | ||
| Online social data | 2017 |
Social Media Mining for Health Applications (SMM4HA)
| ADR classification Classification of medication intake Concept normalization | Social media | 12 | 1 | 0 | 13 | |
| 2016 |
CLPsych: Triaging content in online peer-support forums
| Classification of mental health severity in 4 levels | Forum | ReachOut | 13 | 1 | 1 | 15 | |
| 2017 |
CLPsych: Triaging content in online peer-support forums
| Classification of mental health severity in 4 levels | Forum | ReachOut | 12 |
| 1 | 15 | |
| 2017 |
NTCIR-13 MedWeb
| 8-class classification of diseases and symptoms | Multilingual Social media | 7 | 1 | 1 | 9 | ||
|
Analysis of Clinical Text (ACT)
| |||||||||
| 2015 | > Track 1 | Disorder NER and normalization | Clinical notes | ShARe corpus (MIMIC) | 18 | 3 | 0 | 21 | |
| 2016 |
TREC Clinical Decision Support (CDS)
| Patient-centered IR | Nursing admission notes | MIMIC, PubMed | 21 |
|
| 26 | |
| Medication and Adverse Drug Events (MADE1.0) | |||||||||
| 2017 | > Track 1 | Medication, ADE, sign and symptom identification Relation extraction | Clinical notes | UMass Memorial Medical Center | |||||
|
Clinical TempEval
| |||||||||
| > Track 1 | Time expression extraction | ||||||||
| 2015 | > Track 2 | Event extraction | Pathology reports | Mayo Clinic | 3 | 0 | 0 | 3 | |
| Clinical data |
Clinical TempEval
| Time expression extraction | |||||||
| 2016 | > Track 2 | Event extraction | Pathology reports | Mayo Clinic | 11 | 3 | 0 | 14 | |
|
Clinical TempEval
| |||||||||
| 2017 | > Track 1 | Time expression extraction (cross-domain) Event extraction (cross-domain) | Pathology reports, Clinical notes | Mayo Clinic | 9 | 2 | 0 | 11 | |
| > Track 3 | Relation extraction (wrt DCT) | ||||||||
|
Centers for Excellence in Genomics N-GRID (CEGS-NGRID)
| |||||||||
| 2016 | > Track 1a | De-identification (cross-domain) De-identification | Psychiatric evaluation records | Partners Healthcare and Harvard Medical School | 23 | 5 | 3 | 31 | |
List of shared tasks with data source, data size, sub-task descriptions, and best-performance score (metrics differ per challenge). The table also indicates whether the data remain available after the challenge, whether they have been de-identified, and whether a data use agreement (DUA) must be signed to access them.
| Category | Year | Challenge name | Task description | Data type | Data source | Data size | De-identification / anonymization | DUA | Currently Available? | Best Performance | Measure |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2015 |
TREC Clinical Decision Support (CDS)
| Patient-centered information retrieval | Medical case narratives | Synthetic, PubMed | 30 topics, 730K articles | no | no | yes | 38.21% | infNDCG | |
|
TREC Precision Medicine
| |||||||||||
| Synthetic | 2017 | > Track 1 | Patient-centered literature article retrieval | Semi-structured cases |
Synthetic, PubMed,
| 30 topics, 27M abstracts, 241K trials | no | no | yes | 63.10% | P@10 |
| 2016 |
CLEF eHealth
| Information extraction | Nursing handover notes | NICTA synthetic nursing handover notes | 300 notes | no | no | yes | 38.20% | F1 (macro avg.) | |
|
Text Analysis Conference (TAC) Adverse Drug Reaction Extraction from Drug Labels (ADR)
| |||||||||||
| Prescription drug labels | 2017 | > Track 1 | ADR mentions and modifiers extraction | Drug labels |
| 2309 labels | no | no | yes | 82.48% | F1 |
| 2015 |
CLPsych: Depression and PTSD on Twitter
| Binary classification of depression and PTSD users | Social media | 7.8M tweets | yes | yes | yes | 80.00% | Avg. Precision | ||
|
Social Media Mining (SMM)
| |||||||||||
| 2016 | > Track 1 | ADR classification | Social media | Twitter | 10,882 tweets | no | no | yes | 41.95% | F1 | |
|
Social Media Mining for Health Applications (SMM4HA)
| |||||||||||
| Online social data | 2017 | > Track 1 | ADR classification | Social media | 15,777 tweets | no | no | yes | 43.50% | F1 | |
| 2016 |
CLPsych: Triaging content in online peer-support forums
| Classification of mental health severity in 4 levels | Forum | ReachOut | 65,024 (1,227 annotated) | yes | yes | yes, on request | 42.00% | F1 (macro avg.) | |
| 2017 |
CLPsych: Triaging content in online peer-support forums
| Classification of mental health severity in 4 levels | Forum | ReachOut | 157,963 posts (1,588 annotated) | yes | yes | yes, on request | 46.70% | F1 (macro avg.) | |
| 2017 |
NTCIR-13 MedWeb
| 8-class classification of diseases and symptoms | Multilingual Social media | 2560 tweets | yes | yes | yes, on request | - | |||
|
Analysis of Clinical Text (ACT)
| |||||||||||
| 2015 | > Track 1 | Disorder NER and normalization | Clinical notes | ShARe corpus (MIMIC) | 531 summaries | yes | yes | yes | 75.70% | F1 (strict) | |
| 2016 |
TREC Clinical Decision Support (CDS)
| Patient-centered IR | Nursing admission notes | MIMIC, PubMed | 30 notes, 1.25M abstracts | 40.33% | P@10 | ||||
| Medication and Adverse Drug Events (MADE1.0) | |||||||||||
| 2017 | > Track 1 | Medication, ADE, sign and symptom identification | Clinical notes | UMass Memorial Medical Center | 1092 records | yes | yes | no | - | ||
|
Clinical TempEval
| |||||||||||
| 2015 | > Track 1 | Time expression extraction | Pathology reports | Mayo Clinic | 600 notes | yes | yes | yes, on request | 72.50% | F1 | |
| Clinical data |
Clinical TempEval
| ||||||||||
| 2016 | > Track 1 | Time expression extraction | Pathology reports | Mayo Clinic | 600 notes | yes | yes | yes, on request | 79.50% | F1 | |
|
Clinical TempEval
| |||||||||||
| 2017 | > Track 1 | Time expression extraction (cross-domain) | Pathology reports, Clinical notes | Mayo Clinic | 1216 notes | yes | yes | yes, on request | 57.00% | F1 | |
|
Centers for Excellence in Genomics N-GRID (CEGS-NGRID)
| |||||||||||
| 2016 | > Track 1a | De-identification (cross-domain) | Psychiatric evaluation records | Partners Healthcare and Harvard Medical School | 1000 records | yes | yes | yes, on request | 79.85% | F1 | |
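
The "Measure" column mixes retrieval metrics (P@10) with classification metrics (F1, macro-averaged F1), so the scores in different rows are not directly comparable. The sketch below illustrates how two of these measures are commonly defined. It is not the official scorer of any challenge listed here: the function names `precision_at_k` and `macro_f1` and the toy data are hypothetical, and individual challenges may differ in details such as which classes enter the macro average.

```python
from typing import Hashable, Iterable, Optional, Sequence, Set


def precision_at_k(ranked_ids: Sequence[Hashable],
                   relevant_ids: Set[Hashable],
                   k: int = 10) -> float:
    """Fraction of the top-k ranked items that are relevant.

    Assumption: the denominator is always k, even when fewer than
    k items were retrieved (a common convention, not the only one).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


def macro_f1(gold: Sequence[Hashable],
             predicted: Sequence[Hashable],
             labels: Optional[Iterable[Hashable]] = None) -> float:
    """Unweighted mean of per-class F1 scores.

    `labels` restricts the average to a subset of classes; by default
    every class seen in gold or predicted is included.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if labels is None:
        # Preserve first-seen order without requiring sortable labels.
        labels = dict.fromkeys(list(gold) + list(predicted))
    scores = []
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is absent.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Toy retrieval run: 2 of the top 4 documents are relevant.
    print(precision_at_k(["a", "b", "c", "d"], {"a", "c"}, k=4))
    # Toy 2-class classification run.
    print(macro_f1(["x", "x", "y", "y"], ["x", "y", "y", "y"]))
```

Because the macro average weights every class equally, a system that does poorly on a rare class is penalized as heavily as one that does poorly on a frequent class, which is one reason macro-averaged scores tend to be lower than accuracy on imbalanced data.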