Martijn G Kersloot, Florentien J P van Putten, Ameen Abu-Hanna, Ronald Cornet, Derk L Arts.
Abstract
BACKGROUND: Free-text descriptions in electronic health records (EHRs) can be of interest for clinical research and care optimization. However, free text cannot be readily interpreted by a computer and, therefore, has limited value. Natural Language Processing (NLP) algorithms can make free text machine-interpretable by attaching ontology concepts to it. However, implementations of NLP algorithms are not evaluated consistently. Therefore, the objective of this study was to review the current methods used for developing and evaluating NLP algorithms that map clinical text fragments onto ontology concepts. To standardize the evaluation of algorithms and reduce heterogeneity between studies, we propose a list of recommendations.
Keywords: Annotation; Concept mapping; Entity linking; Evaluation studies; Named-entity recognition; Natural language processing; Ontologies; Recommendations for future studies
Year: 2020 PMID: 33198814 PMCID: PMC7670625 DOI: 10.1186/s13326-020-00231-z
Source DB: PubMed Journal: J Biomed Semantics
Fig. 1 PRISMA flow diagram
Induced objective tasks with their definition and an example
| Induced NLP task(s) | Description | Example |
|---|---|---|
| Concept detection 1 | Assign ontology concepts to phrases in free text (i.e., entity linking or annotation) | “Systolic blood pressure” can be represented as SNOMED-CT concept |
| Event detection | Detect events in free text | “Patient visited the outpatient clinic in January 2020” is an event of type |
| Relationship detection | Detect semantic relationships between concepts in free text | The concept |
| Text normalization | Transform free text into a single canonical form | “This patient was diagnosed with influenza last year.” becomes “This patient be diagnose with influenza last year.” |
| Text summarization | Create a short summary of free text and possibly restructure the text based on this summary | “Last year, this patient visited the clinic and was diagnosed with diabetes mellitus type 2, and in addition to his diabetes, the patient was also diagnosed with hypertension” becomes “Last year, this patient was diagnosed with diabetes mellitus type 2 and hypertension”. |
| Classification | Assign categories to free text | A report containing the text “This patient is not diagnosed yet” will be assigned to the category |
| Prediction | Create a predictive model based on free text | Predict the outcome of the APACHE score based on the (free-text) content in a patient chart. |
| Identification | Identify documents (e.g., reports or patient charts) that match a specific condition based on the contents of the document | Find all patient charts that describe patients with hypertension and a BMI above 30. |
| Software development | Develop new or build upon existing NLP software | A new algorithm was developed to map ontology concepts to free text in clinical reports. |
| Software evaluation | Evaluate the effectiveness of NLP software | The mapping algorithm has an F-score of 0.874. |
1. Also known as Medical Entity Linking and Medical Concept Normalization
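The concept detection task in the table above can be illustrated with a minimal dictionary-lookup sketch. This is an assumption-laden toy, not a method taken from any of the reviewed studies: the lexicon, the function name `detect_concepts`, and the `CONCEPT:` identifiers are all hypothetical placeholders and are not real SNOMED CT codes.

```python
import re

# Hypothetical mini-lexicon: surface phrase -> placeholder concept ID.
# The IDs are illustrative only and are NOT real SNOMED CT codes.
LEXICON = {
    "systolic blood pressure": "CONCEPT:0001",
    "blood pressure": "CONCEPT:0002",
    "influenza": "CONCEPT:0003",
}


def detect_concepts(text, lexicon=LEXICON):
    """Return (start, end, matched_text, concept_id) tuples.

    Longer phrases are tried first, and a span that overlaps an
    earlier match is skipped, so the longest phrase wins.
    """
    taken = [False] * len(text)
    hits = []
    for phrase in sorted(lexicon, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            if any(taken[m.start():m.end()]):
                continue  # already covered by a longer phrase
            for i in range(m.start(), m.end()):
                taken[i] = True
            hits.append((m.start(), m.end(), m.group(0), lexicon[phrase]))
    return sorted(hits)


if __name__ == "__main__":
    note = "Systolic blood pressure was high; influenza suspected."
    for hit in detect_concepts(note):
        print(hit)
```

Real systems covered by the review typically add steps such as negation handling, abbreviation expansion, and disambiguation, none of which this sketch attempts.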
Induced objective categories with their definition and associated NLP task(s)
| Induced category | Induced NLP task(s) | Definition |
|---|---|---|
| Computer-assisted coding | Concept detection | Perform semi-automated annotation (i.e., with a human in the loop) |
| Information comparison | Concept detection, Event detection, Relationship detection | Compare extracted structured information to information available in free-text form |
| Information enrichment | Concept detection, Event detection, Relationship detection, Text normalization, Text summarization | Extract structured information from free text and attach this new information to the source |
| Information extraction | Concept detection, Event detection, Relationship detection | Extract structured information from free text |
| | Classification, Prediction, Identification | Use structured information to classify free-text reports, predict outcomes, or identify cases |
| Software development and evaluation | Software development, Software evaluation | Develop new NLP software or evaluate new or existing NLP software |
| Text processing | Text normalization, Text summarization | Transform free text into a new, more comprehensible form |
Included publications and their first author, year, title, and country
| Author | Year | Country | Challenge | Induced objective | Data origin | Dataset | Data language | Used system | Term. Sys. | In use | Source code | Ref |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Afshar | 2019 | USA | No | Information extraction | Clinical Data Warehouse Data | Own | English | New (+ existing) | UMLS (CPT, HCPCS, ICD-10, ICD10CM / ICD9CM, LOINC, MeSH, SNOMED-CT, RxNorm) | Not listed | No, only links to cTAKES source code | [ |
| Alnazzawi | 2016 | UK | No | Information enrichment | PhenoCHF corpus 1 | Existing | English | Existing | UMLS | Not listed | Not applicable | [ |
| Atutxa | 2018 | Spain | No | Information enrichment | EHR documents | Own | Spanish | New | ICD (SNOMED-CT for normalization) | Not yet, aim to embed it in human-supervised loop | Not listed | [ |
| Barrett | 2013 | USA | No | Information extraction | Palliative care consult letters | Own | English | New | SNOMED CT | Not listed | No, but planned | [ |
| Becker | 2016 | Germany | No | Information extraction | ShARe/CLEF corpus (2013) 2 | Existing | German | Existing | SNOMED CT (English), UMLS (German) | Not yet, still under development | Not applicable | [ |
| Becker | 2019 | Germany | No | Information extraction | Clinical notes of patients with known colorectal cancer | Own | German | New (+ existing) | UMLS | Yes, led to improved quality of care for colorectal patients | Not listed | [ |
| Bejan | 2015 | USA | No | Information extraction | Discharge summaries and i2b2/VA challenge dataset (2010) 3 | Own + Existing | English | Existing | UMLS | No | Not applicable | [ |
| Castro | 2010 | Spain | No | Information extraction | Clinical notes with ‘most relevant information’ | Own | Spanish | Existing | SNOMED CT | Not listed | Not applicable | [ |
| Catling | 2018 | UK | No | Software development and evaluation | MIMIC-III dataset 4 | Existing | English | New | ICD-9-CM | Not listed | Not listed | [ |
| Chapman | 2004 | USA | No | Information extraction | Emergency department reports | Own | English | Existing | UMLS | Not listed | Not applicable | [ |
| Chen | 2016 | USA | No | Information enrichment | Discharge summaries and progress notes | Own | English | New (+ existing) | UMLS | Not listed | Not listed | [ |
| Chiaramello | 2016 | Italy | No | Information extraction | Clinical notes (cardiology, diabetology, hepatology, nephrology, and oncology) | Own | Italian | Existing | UMLS | Not listed | Not applicable | [ |
| Chodey | 2016 | USA | SemEval (2014) | Information extraction | ICU Data: Discharge summaries, ECG, echo, and radiology | Existing | English | New (+ existing) | UMLS | Not listed | Not listed | [ |
| Chung | 2005 | USA | No | Information extraction | Echocardiogram reports | Own | English | New (+ existing) | UMLS | Not yet, it will be used to populate a registry | Not listed | [ |
| Combi | 2018 | Italy | No | Information extraction | VigiSegn (adverse drug reactions) reports | Own | Italian + English | New | MedDRA | Yes, implemented in VigiFarmaco | Pseudocode | [ |
| De Bruijn | 2011 | Canada | i2b2/VA (2010) | Information extraction | Hospital discharge summaries and progress reports | Existing | English | New (+ existing) | UMLS | Not listed | Not listed | [ |
| Deisseroth | 2019 | USA | No | Information extraction | Six sets of real patient data from four different medical centers. | Own | English | New | HPO | Not listed | Yes | [ |
| Demner-Fushman | 2017 | USA | No | Software development and evaluation | BioScope 5, NCBI disease corpus 6, i2b2/VA challenge corpus (2010) 3, ShARe corpus 7, LHC test collection (biological/clinical journal abstracts) | Existing | English | New (+ existing) | UMLS | Yes, used in other papers identified in literature search | Yes | [ |
| Divita | 2014 | USA | Parts: i2b2/VA (2010) | Software development and evaluation | Randomly selected clinical records from the most frequent document types | Own | English | New | UMLS (level 0 + 9) | Yes, used by VA Informatics and Computing Infrastructure | Yes | [ |
| Duarte | 2018 | Portugal | No | Information enrichment | Death certificates, clinical bulletins, and autopsy reports | Own | Portuguese | New | ICD-10 | Yes, used by the Portuguese Ministry of Health for near real-time death cause surveillance | Not listed | [ |
| Falis | 2019 | UK | No | Information extraction | MIMIC-III dataset 4 | Existing | English | New | ICD-9 | Not listed | Not listed | [ |
| Ferrão | 2013 | Portugal | No | Information enrichment | Inpatient adult episodes from the EHR | Own | Portuguese | New | ICD-9-CM | Not listed | Not listed | [ |
| Gerbier | 2011 | France | No | Information extraction | Computerized emergency department medical records | Own | French | New | ICD-10, CCAM, SNOMED CT, ATC, MeSH, ICPC-2, DCR | Not yet, will be integrated into a CDSS | Not listed | [ |
| Goicoechea Salazar | 2013 | Spain | No | Information enrichment | Diagnostic text from patient records | Own | Spanish | New | ICD-9-CM | Not listed | Not listed | [ |
| Hamid | 2013 | USA | No | Classification | Notes of Iraq and Afghanistan veterans from the VA national clinical database | Own | English | Existing | UMLS | Not listed | Not applicable | [ |
| Hassanzadeh | 2016 | Australia | No | Information extraction | ShARe/CLEF corpus (2013) 2 | Existing | English | Existing | UMLS, SNOMED CT | Not applicable | Not applicable | [ |
| Helwe | 2017 | Lebanon | No | Computer-assisted coding | MIMIC-III dataset | Existing | English | New | UMLS, ICD | Not listed | Not listed | [ |
| Hersh | 2001 | USA | No | Information enrichment | Radiology image reports | Own | English | Existing | UMLS | No, still in development/testing | Pseudocode | [ |
| Hoogendoorn | 2015 | Netherlands | No | Prediction | Consultation notes of patients in a primary care setting | Own | Dutch | New | SNOMED-CT, UMLS, ICPC | Not listed | Not listed | [ |
| Jindal | 2013 | USA | i2b2 (2012) | Information extraction | i2b2 challenge corpus (2012) 8 | Existing | English | New (+ existing) | UMLS, SNOMED CT, MeSH | Not listed | Not listed | [ |
| Kang | 2009 | Korea | No | Information extraction | Discharge summaries | Own | Korean | New | KOMET, UMLS | Not listed | Not listed | [ |
| Kersloot | 2019 | Netherlands | No | Information extraction | (Non-small cell) Lung cancer charts | Own | English | New (+ existing) | SNOMED CT | Not listed | Yes | [ |
| König | 2019 | Germany | No | Software development and evaluation | Discharge letters from BASE-II study | Own | German | New (+ existing) | Wingert-Nomenclature | No, still has to prove its value | Not listed | [ |
| Li | 2015 | USA | No | Information comparison | Clinical notes and discharge prescription lists | Own | English | New (+ existing) | UMLS, SNOMED CT, RxNorm | Not yet, plans to move to production | Pseudocode | [ |
| Li | 2019 | USA | No | Information extraction | EHR notes | Own | English | New (+ existing) | UMLS, SNOMED CT, MedDRA | Not listed | Not listed | [ |
| Lingren | 2016 | USA | No | Classification | Structured and unstructured data from two EHR databases | Own | English | New (+ existing) | UMLS, ICD-9, RxNorm | Not listed | Not listed | [ |
| Liu | 2019 | USA | No | Information extraction | Clinical notes from different institutions + PubMed Case report abstracts | Own + Existing | English | Existing | HPO | Not listed | Not applicable | [ |
| Lowe | 2009 | USA | No | Information extraction | Single-specimen pathology reports | Own | English | Existing | UMLS, SNOMED CT | Not listed | Not applicable | [ |
| Luo | 2014 | USA | No | Information extraction | Pathology reports | Own | English | New (+ existing) | UMLS, SNOMED CT | Yes, currently working on project in multiple hospitals | Not listed | [ |
| Meystre | 2006 | USA | No | Information enrichment | Clinical documents from adult inpatients in a cardiovascular unit | Own | English | New (+ existing) | UMLS (level 0), SNOMED CT | Not yet, testing in practice | Not listed | [ |
| Meystre | 2010 | USA | i2b2 (2009) | Information extraction | i2b2 challenge dataset (2009) 9 | Existing | English | New | UMLS | Not yet, possible integration in research infrastructure | Not listed | [ |
| Minard | 2011 | France | i2b2/VA (2010) | Information extraction | i2b2/VA challenge corpus (2010) 3 | Existing | English | New (+ existing) | UMLS | Not listed | Not listed | [ |
| Mishra | 2019 | USA | No | Information extraction | Clinical notes from NIH Clinical Center data warehouse | Own | English | Existing | UMLS, HPO | Not listed | Not applicable | [ |
| Nguyen | 2018 | Australia | No | Computer-assisted coding | Hospital progress notes | Own | English | New (+ existing) | SNOMED CT, ICD-10-AM | Not listed | Not listed | [ |
| Oellrich | 2015 | UK | No | Information extraction | PubMed abstracts, clinical trial information, i2b2/VA challenge corpus (2010) 3, ShARe/CLEF corpus (2013) 2 | Existing | English | Existing | UMLS | Not listed | Not applicable | [ |
| Patrick | 2011 | Australia | i2b2/VA (2010) | Information extraction | i2b2/VA challenge corpus (2010) 3 | Existing | English | New | UMLS, SNOMED CT | Not listed | Not listed | [ |
| Pérez | 2018 | Spain | No | Text processing | Spontaneous DTs randomly selected entries | Own | Spanish | New | ICD | Not listed | Not listed | [ |
| Reátegui | 2018 | Canada | No | Information extraction | i2b2 challenge corpus (2008) 10 | Existing | English | New (+ existing) | UMLS, SNOMED CT, RxNorm | Not listed | Not listed | [ |
| Roberts | 2011 | USA | i2b2/VA (2010) | Information extraction | i2b2/VA challenge corpus (2010) 3 | Existing | English | New (+ existing) | UMLS, ICD-9 | Not listed | Not listed | [ |
| Rousseau | 2019 | USA | No | Information comparison | ED encounters for patients with headaches who received head CT | Own | English | Existing | UMLS: SNOMED CT, RadLex | Not listed | Not applicable | [ |
| Savova | 2010 | USA | i2b2 (2006, 2008) | Information extraction | Subset of clinical notes from the EMR | Own | English | New (+ existing) | UMLS, SNOMED CT, RxNorm | Yes, used in other papers identified in literature search | Yes | [ |
| Shivade | 2015 | USA | i2b2/UTHealth (2014) | Classification | i2b2 challenge corpus (2014) 11 | Existing | English | Existing | UMLS | Not listed | Not applicable | [ |
| Shoenbill | 2019 | USA | No | Information extraction | EHR notes from hypertension patients | Own | English | Existing | UMLS, SNOMED CT | Not listed | Not applicable | [ |
| Sohn | 2014 | USA | No | Information extraction | Clinical notes with medication mentions | Own | English | New | RxNorm | Not listed | Yes | [ |
| Solti | 2008 | USA | No | Information enrichment | Cardiology ambulatory progress notes | Own | English | Existing | UMLS | Not listed | Not applicable | [ |
| Soriano | 2019 | Spain | No | Information extraction | Clinical emergency discharge reports | Own | Spanish | New | SNOMED CT | Not yet | Yes | [ |
| Soysal | 2018 | USA | Parts: i2b2 (2009 + 2010), ShARe/CLEF (2013), SemEval (2014) | Software development and evaluation | Discharge summaries from the i2b2/VA challenge corpus (2010) 3, outpatient clinic visit notes, mock clinical documents | Own + Existing | English | New | UMLS | Yes, used by various institutions and industrial entities | Yes | [ |
| Spasić | 2015 | UK | No | Information extraction | MRI reports of patients | Own | English | New (+ existing) | TRAK, UMLS, MEDCIN, RadLex | Not listed | Yes | [ |
| Strauss | 2013 | USA | No | Information extraction | Pathology reports of breast and prostate cancer patients | Own | English | New | SNOMED CT | Not listed | Yes | [ |
| Sung | 2018 | Taiwan | No | Information extraction | Cases of adult patients with AIS | Own | English | Existing | UMLS | Not listed | Not applicable | [ |
| Tchechmedjiev | 2018 | France | No | Information extraction | Quaero (French MEDLINE abstract titles + EMEA drug labels) + CépiDC (ICD-10 coding of death certificates) | Existing | French | New (+ existing) | UMLS terminologies (ICD-10) | Yes, available in SIFR BioPortal | Yes | [ |
| Ternois | 2018 | France | No | Classification | Endoscopy reports written between 2015 and 2016 | Own | French | New | CCAM | Not listed | Not listed | [ |
| Travers | 2004 | USA | No | Information extraction | Chief complaint text entries for all emergency department visits | Own | English | New | UMLS | Not listed | Not listed | [ |
| Tulkens | 2019 | Belgium | No | Information extraction | i2b2/VA challenge corpus (2010) 3 | Existing | English | New (+ existing) | UMLS | Not listed | Yes | [ |
| Usui | 2018 | Japan | No | Prediction | Electronic medication history data from pharmacy | Own | Japanese | New | ICD-10 | Not yet, expect to use it | Not listed | [ |
| Valtchinov | 2019 | USA | No | Classification | Radiology reports, emergency department notes + other clinical reports | Own | English | Existing | SNOMED CT, RadLex | Not listed | Not applicable | [ |
| Wadia | 2018 | USA | No | Classification | Chest CT reports | Own | English | Existing | SNOMED CT, UMLS | Not listed | Not applicable | [ |
| Walker | 2019 | USA | No | Information extraction | Treatment sites from EMR | Own | English | New | UMLS | Not listed | Not listed | [ |
| Xie | 2019 | China | No | Information extraction | MIMIC-III dataset 4 | Existing | English | New | ICD-9-CM, ICD-10 | Not listed | Not listed | [ |
| Xu | 2011 | USA | No | Classification | CRC patient cases from the Synthetic Derivative database | Own | English | Existing | UMLS | No, still under development | Not applicable | [ |
| Yadav | 2013 | USA | No | Prediction | Emergency department CT imaging reports | Own | English | Existing | UMLS | Not listed | Yes, command line command | [ |
| Yao | 2019 | USA | No | Prediction | i2b2 challenge corpus (2008) 10 | Existing | English | New (+ existing) | UMLS | Not listed | Part (Solr) | [ |
| Zeng | 2018 | USA | No | Classification | Progress notes and breast cancer surgical pathology reports | Own | English | New (+ existing) | UMLS | Not listed | Not listed | [ |
| Zhang | 2013 | USA | No | Information extraction | i2b2/VA challenge corpus (2010) 3 and GENIA corpus (MEDLINE abstracts) | Existing | English | New | UMLS | Not listed | Not listed | [ |
| Zhou | 2006 | USA | No | Information extraction | Records of patients with breast complaints | Own | English | New | UMLS | No, still under development | Not listed | [ |
| Zhou | 2011 | USA | No | Software development and evaluation | COPD and CAD patients | Own | English | New | SNOMED CT, RxNorm, UMLS, PPL, MDD, HL7 value sets | Yes, described in other paper [103] | Not listed | [ |
| Zhou | 2014 | USA | No | Information extraction | Admission notes and discharge summaries | Own | English | Existing | SNOMED CT, HL7 RoleCodes | Not listed | Not applicable | [ |
1. PhenoCHF corpus: narrative reports from electronic health records (EHRs) and literature articles
2. ShARe/CLEF corpus (2013): narrative clinical reports
3. i2b2/VA challenge dataset (2010): discharge summaries and progress reports
4. MIMIC-III dataset: demographics, vital sign measurements, laboratory test results, procedures, medications, caregiver notes, imaging reports, and mortality
5. BioScope corpus: medical free texts, biological full papers and biological scientific abstracts
6. NCBI disease corpus: PubMed abstracts
7. ShARe corpus: deidentified clinical free-text notes from the MIMIC II database
8. i2b2 challenge corpus (2012): discharge summaries
9. i2b2 challenge dataset (2009): de-identified hospital discharge summaries
10. i2b2 challenge corpus (2008): discharge summaries of overweight and diabetic patients
11. i2b2 challenge corpus (2014): longitudinally ordered clinical notes from three cohorts of diabetic patients
Included publications and their evaluation methodologies
| Author | Year | Ref. std. | Validation | External | Generalizability | Ref |
|---|---|---|---|---|---|---|
| Afshar | 2019 | Existing EHR data | Hold-out validation (train, test, development) | No | No, validation is needed | [ |
| Alnazzawi | 2016 | Existing annotated corpus | External | ShARe/CLEF, NCBI disease, Heart failure and pulmonary embolism corpora | Yes, achieves competitive performance on other corpora | [ |
| Atutxa | 2018 | Manual retrospective review | Hold-out validation (train, test, development) | No | Yes, easily portable to other languages | [ |
| Barrett | 2013 | Manual annotations | 10-fold cross validation | Multiple datasets (different provider) | Yes, expect that it is generalizable | [ |
| Becker | 2016 | Existing annotated corpus | Not used | No | Not listed | [ |
| Becker | 2019 | Manual annotations | Hold-out validation (train, test, development) | No | Not listed | [ |
| Bejan | 2015 | Manual annotations | External | i2b2 data (2010) | Yes, good performance on the i2b2 dataset, even though not optimized on it | [ |
| Castro | 2010 | Manual annotations | Not used | No | Not listed | [ |
| Catling | 2018 | Existing annotated corpus | Hold-out validation (train, test, development) | No | Not listed | [ |
| Chapman | 2004 | Manual annotations | Not used | No | Yes, generalizable to other domains within and outside of bio surveillance | [ |
| Chen | 2016 | Manual annotations | 10-fold cross validation | No | Not listed | [ |
| Chiaramello | 2016 | Manual annotations | Not used | No | Not listed | [ |
| Chodey | 2016 | Existing annotated corpus | Hold-out validation (train, test) | No | Not listed | [ |
| Chung | 2005 | Manual annotations | Hold-out validation (train, test) | Reports from a second hospital | Not listed | [ |
| Combi | 2018 | Manual annotations | Not used | No | Not listed | [ |
| De Bruijn | 2011 | Existing annotated corpus | 15-fold cross validation | No | Not listed | [ |
| Deisseroth | 2019 | Manual annotations | Hold-out validation (train, test) | Data from a second hospital | Yes, it can be immediately incorporated into clinical practice | [ |
| Demner-Fushman | 2017 | Existing annotated corpus | External | Multiple datasets | Not listed | [ |
| Divita | 2014 | Manual annotations | Not used | No | Not listed | [ |
| Duarte | 2018 | Manual annotations | Hold-out validation (train, test) | Second dataset | Not listed | [ |
| Falis | 2019 | Existing annotated corpus | Hold-out validation (train, test, development) | No | Yes, method is not specific to an ontology, and could be used for a graph of any formation | [ |
| Ferrão | 2013 | Existing EHR data | Hold-out validation (train, test) | No | Not listed | [ |
| Gerbier | 2011 | Manual annotations | Hold-out validation (train, test) | No | Yes, it could also serve other types of clinical decision support systems | [ |
| Goicoechea Salazar | 2013 | Manual annotations | Hold-out validation (train, test) | No | Not listed | [ |
| Hamid | 2013 | Manual annotations | 10-fold cross validation | No | Possible, the classifier may be applicable in academic hospital samples | [ |
| Hassanzadeh | 2016 | Existing annotated corpus | Hold-out validation (train, test) | No | Not applicable | [ |
| Helwe | 2017 | Existing annotated corpus | Hold-out validation (train, test, development) | No | Not listed | [ |
| Hersh | 2001 | Manual annotations | Hold-out validation (train, test) | No | Not listed | [ |
| Hoogendoorn | 2015 | Existing EHR data | 5-fold cross validation | No | Not listed | [ |
| Jindal | 2013 | Existing annotated corpus | Hold-out validation (train, test) | No | Yes, broad applicability | [ |
| Kang | 2009 | Manual annotations | Hold-out validation (train, test) | No | Yes, extensible to other languages | [ |
| Kersloot | 2019 | Manual annotations | Hold-out validation (development, test) | No | Possible, but external validation is needed | [ |
| König | 2019 | Existing EHR data | Not used | No | Still to be tested | [ |
| Li | 2015 | Manual annotations | 10-fold cross validation | No | Not listed | [ |
| Li | 2019 | Existing annotated corpus | Hold-out validation (train, test, development) | No | Not listed | [ |
| Lingren | 2016 | Manual annotations | Hold-out validation (train, test, development) | No | Not listed | [ |
| Liu | 2019 | Manual annotations | Not used | No (but multiple datasets / non-trained) | No, limited because of NYP/CUIMC and Mayo notes. | [ |
| Lowe | 2009 | Manual retrospective review | Hold-out validation (train, test) | No | Yes, has the potential to index other classes of clinical documents | [ |
| Luo | 2014 | Existing EHR data | 10-fold cross validation | No | No, challenging, not currently working on it | [ |
| Meystre | 2006 | Manual retrospective review | Not used | No | Not listed | [ |
| Meystre | 2010 | Existing annotated corpus | Hold-out validation (train, test) | No | Not listed | [ |
| Minard | 2011 | Existing annotated corpus | Hold-out validation (train, test, development) | No | Not listed | [ |
| Mishra | 2019 | Manual annotations | Not used | No | Not listed | [ |
| Nguyen | 2018 | Existing EHR data | Not listed | No | Not listed | [ |
| Oellrich | 2015 | Existing annotated corpus | External | Multiple datasets | Not listed | [ |
| Patrick | 2011 | Existing annotated corpus | 10-fold cross validation | No | Yes, adaptable to different requirements in clinical information extraction and classification by choosing relevant feature sets | [ |
| Pérez | 2018 | Existing annotated corpus | Hold-out validation (train, test, development) | No | Yes, extensible to different hospital-sections and hospitals | [ |
| Reátegui | 2018 | Existing annotated corpus | Not used | No | Not listed | [ |
| Roberts | 2011 | Existing annotated corpus | Hold-out validation (train, test) | No | Not listed | [ |
| Rousseau | 2019 | Manual annotations | Not used | No | Not listed | [ |
| Savova | 2010 | Manual annotations | 10-fold cross validation | No | Yes, implemented in several applications | [ |
| Shivade | 2015 | Manual annotations | Hold-out validation (train, test) | No | Not listed | [ |
| Shoenbill | 2019 | Manual annotations | Hold-out validation (train, test) | No | Yes, can allow further evaluation and improvement in care delivery models and treatment approaches to multiple chronic illnesses | [ |
| Sohn | 2014 | Manual annotations | Hold-out validation (train, test, development) | No | Yes, with adaptations: create flexible mechanism for adaptation process | [ |
| Solti | 2008 | Manual annotations | Hold-out validation (train, test) | No | Not listed | [ |
| Soriano | 2019 | Manual annotations | Not listed | No | Not listed | [ |
| Soysal | 2018 | Existing annotated corpus | Hold-out validation (train, test) | No | Yes, can be used to quickly develop customized clinical information extraction pipelines | [ |
| Spasić | 2015 | Manual annotations | Hold-out validation (train, test) | No | Not listed | [ |
| Strauss | 2013 | Manual annotations | Not used | No | Yes, can be shared between institutions and used to support clinical + epidemiological research | [ |
| Sung | 2018 | Manual annotations | Not listed | No | Not listed | [ |
| Tchechmedjiev | 2018 | Existing annotated corpus | Hold-out validation (train, test, development) | No | Yes, but not universally | [ |
| Ternois | 2018 | Existing EHR data | 5-fold cross validation + Hold-out validation (train, test) | No | Not listed | [ |
| Travers | 2004 | Manual retrospective review | Not used | No | Not listed | [ |
| Tulkens | 2019 | Existing annotated corpus | Hold-out validation (train, test, development) | No | Not listed | [ |
| Usui | 2018 | Manual annotations | Not used | No | Not listed | [ |
| Valtchinov | 2019 | Manual annotations | Not used | No | No | [ |
| Wadia | 2018 | Manual annotations | Not used | No | Not listed | [ |
| Walker | 2019 | Manual retrospective review | Hold-out validation (development, test) | No | Yes, it can be incorporated in institutional data warehouse | [ |
| Xie | 2019 | Existing annotated corpus | Hold-out validation (train, test, development) | No | Not listed | [ |
| Xu | 2011 | Manual annotations | Hold-out validation (train, test) | No | Yes, generalizable approach to combine information from heterogeneous data sources in EHRs | [ |
| Yadav | 2013 | Manual annotations | Not used | | Yes, should be broadly applicable to outcomes of clinical interest | [ |
| Yao | 2019 | Existing annotated corpus | Hold-out validation (train, test) | No | Not listed | [ |
| Zeng | 2018 | Manual annotations | 5-fold cross validation + Hold-out validation (train, test) | No | Yes, potential to be replicated | [ |
| Zhang | 2013 | Existing annotated corpus | External | Two different sets with same settings | Yes, can be adapted to different semantic categories and text genres | [ |
| Zhou | 2006 | Manual annotations | 5-fold cross validation | No | Not listed | [ |
| Zhou | 2011 | Manual retrospective review | Hold-out validation (train, test) | No | Not listed | [ |
| Zhou | 2014 | Manual annotations | Not used | No | Not listed | [ |
a As reported by authors
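The two validation schemes that dominate the table above, hold-out validation with train, development, and test partitions and k-fold cross-validation, can be sketched as follows. This is a generic illustration using only the Python standard library; the function names, split fractions, and seed are assumptions and do not describe how any individual study partitioned its data.

```python
import random


def holdout_split(items, fractions=(0.7, 0.15), seed=0):
    """Shuffle once, then cut into train / development / test partitions.

    `fractions` gives the train and development shares; the test
    partition receives whatever remains.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * fractions[0])
    n_dev = round(len(shuffled) * fractions[1])
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]
    return train, dev, test


def k_fold(items, k=10, seed=0):
    """Yield (train, test) pairs; each item lands in exactly one test fold."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]


if __name__ == "__main__":
    documents = [f"doc{i}" for i in range(100)]
    train, dev, test = holdout_split(documents)
    print(len(train), len(dev), len(test))
    for train_fold, test_fold in k_fold(documents, k=5):
        print(len(train_fold), len(test_fold))
```

External validation, as reported in the "External" column, would instead evaluate on data from a different source than the one used for development, so no split of a single dataset can substitute for it.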
Characteristics of the included studies
| Description | n (%) | References |
|---|---|---|
| Information extraction | 45 (58%) | [ |
| Information enrichment | 9 (12%) | [ |
| Classification | 8 (10%) | [ |
| Software development and evaluation | 6 (7.8%) | [ |
| Prediction | 4 (5.2%) | [ |
| Information comparison | 2 (2.6%) | [ |
| Computer-assisted coding | 2 (2.6%) | [ |
| Text processing | 1 (1.3%) | [ |
| i2b2 (Informatics for Integrating Biology and the Bedside) | 10 (13%) | [ |
| Entire system | 8 (10%) | [ |
| Parts of the system | 2 (2.6%) | [ |
| SemEval (Semantic Evaluation) | 2 (2.6%) | [ |
| Entire system | 1 (1.3%) | [ |
| Parts of the system | 1 (1.3%) | [ |
| ShARe/CLEF (Shared Annotated Resources/Conference and Labs of the Evaluation Forum) | 1 (1.3%) | [ |
| Parts of the system | 1 (1.3%) | [ |
| English | 60 (78%) | [ |
| Spanish | 5 (6.5%) | [ |
| French | 3 (3.9%) | [ |
| German | 3 (3.9%) | [ |
| Italian | 2 (2.6%) | [ |
| Portuguese | 2 (2.6%) | [ |
| Dutch | 1 (1.3%) | [ |
| Japanese | 1 (1.3%) | [ |
| Korean | 1 (1.3%) | [ |
| Data present in institute | 55 (71%) | [ |
| Existing dataset | 25 (33%) | [ |
| Included reference to dataset | 21 (27%) | [ |
| Trained | 47 (61%) | [ |
| Not listed | 3 (3.9%) | [ |
| Use of development set | 16 (21%) | [ |
| Not listed | 4 (5.2%) | [ |
| New NLP system or algorithm | 29 (38%) | [ |
| New NLP system or algorithm with existing components | 25 (33%) | [ |
| Existing NLP system or algorithm | 23 (30%) | [ |
| Plans to implement / still under development and testing | 12 (16%) | [ |
| Implemented in practice | 10 (13%) | [ |
| Published algorithm or source code | 15 (20%) | [ |
| Pseudocode in manuscript | 3 (3.9%) | [ |
| Planning to publish algorithm or source code | 1 (1.3%) | [ |
| Not applicable, used an existing system | 20 (26%) | [ |
Evaluation methods of the included studies
| Description | n (%) | References |
|---|---|---|
| Manual annotations | 40 (52%) | [ |
| Existing annotated corpus | 24 (31%) | [ |
| Existing EHR data | 7 (9.1%) | [ |
| Manual retrospective review | 6 (7.8%) | [ |
| Hold-out validation | 40 (52%) | [ |
| Cross-validation | 12 (16%) | [ |
| External validation | 9 (12%) | [ |
| Solely external validation | 5 (6.5%) | [ |
| In addition to another type of validation | 4 (5.2%) | [ |
| Not performed or not listed | 22 (29%) | [ |
| Claimed | 23 (30%) | [ |
| Externally validated | 5 (6.5%) | [ |
| Compared to other existing algorithms or models | 24 (31%) | [ |
| Tested difference in outcomes for statistical significance | 4 (5.2%) | [ |
Performance measures used in the included studies
| Description | Formula | n (%) | References |
|---|---|---|---|
| Confusion Matrix | Lists the True Positives (TP), True Negatives (TN), False Positives (FP), False Negatives (FN), and the Total (n) amount in a 2 × 2 contingency Table. TP: Text annotated with ontology concept when ontology concept is present in reference standard TN: Text not annotated with ontology concept when ontology concept is absent in reference standard FP: Text annotated with ontology concept when ontology concept is absent in reference standard FN: Text not annotated with ontology concept when ontology concept is present in reference standard | 12 (16%) | [ |
| Recall | $\frac{TP}{TP + FN}$ | 68 (88%) | [ |
| Precision | $\frac{TP}{TP + FP}$ | 66 (86%) | [ |
| F-score | $\frac{(1 + \beta^2) \cdot \text{precision} \cdot \text{recall}}{\beta^2 \cdot \text{precision} + \text{recall}}$ | 57 (74%) | [ |
| Accuracy | $\frac{TP + TN}{n}$ | 11 (14%) | [ |
| Specificity | $\frac{TN}{TN + FP}$ | 6 (7.8%) | [ |
| AUC | Not applicable | 5 (6.5%) | [ |
| Kappa | | 3 (3.9%) | [ |
| Processing time | Not applicable | 3 (3.9%) | [ |
| Negative Predictive Value | $\frac{TN}{TN + FN}$ | 3 (3.9%) | [ |
| False Positive Rate | $\frac{FP}{FP + TN}$ | 1 (1.3%) | [ |
| False Negative Rate | $\frac{FN}{FN + TP}$ | 1 (1.3%) | [ |
| Information entropy | | 1 (1.3%) | [ |
| Mean Reciprocal Rank | | 1 (1.3%) | [ |
| Initial annotator agreement | Not applicable | 1 (1.3%) | [ |
| Match/no match (%) | Not applicable | 1 (1.3%) | [ |
| Overgeneration | | 1 (1.3%) | [ |
| Undergeneration | | 1 (1.3%) | [ |
| Error | | 1 (1.3%) | [ |
| Fallout | | 1 (1.3%) | [ |
| Mean Standard Error | | 1 (1.3%) | [ |
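
The most frequently reported measures in the table can all be derived from the four cells of the confusion matrix. The sketch below shows this with the standard textbook definitions. The class name, method names, and example counts are illustrative assumptions, and the code does not guard against empty denominators.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # annotated, concept present in reference standard
    tn: int  # not annotated, concept absent in reference standard
    fp: int  # annotated, concept absent in reference standard
    fn: int  # not annotated, concept present in reference standard

    @property
    def n(self) -> int:
        return self.tp + self.tn + self.fp + self.fn

    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def f_score(self, beta: float = 1.0) -> float:
        # beta > 1 weights recall more heavily, beta < 1 weights precision.
        p, r = self.precision(), self.recall()
        b2 = beta ** 2
        return (1 + b2) * p * r / (b2 * p + r)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / self.n

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def negative_predictive_value(self) -> float:
        return self.tn / (self.tn + self.fn)


cm = ConfusionMatrix(tp=80, tn=90, fp=20, fn=10)  # made-up counts
print(f"recall={cm.recall():.3f} precision={cm.precision():.3f} "
      f"F1={cm.f_score():.3f} accuracy={cm.accuracy():.3f}")
```

Because every measure here is a function of the same four counts, reporting the confusion matrix itself lets readers recompute any measure a study omitted.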
Recommendation regarding the use of systems and algorithms
1. Describe the system or algorithm that is used, or the system that is developed, for the specific NLP task.
   1. When an existing NLP system or algorithm is used, describe how it is set up, how it is implemented in practice, and whether and how the implementation differs from the original implementation.
   2. When a new system is developed, describe the components and features used in the system, and preferably include a flow chart that explains how these elements work together.
2. Include the source code of the developed algorithm as supplementary material to the publication or upload the source code to a repository such as GitHub.
3. Specify which ontologies are used in the encoding task, including the version of the ontology.
   1. If a new ontology is developed for the encoding task, report on the development and content of the ontology and the rationale for developing a new ontology instead of using an existing one. The MIRO guidelines could be used to structure the report [

Recommendation regarding the use of data
1. To ensure that new algorithms can be compared against your system, aim to publish the training, development, and validation data used in a data repository.
   1. If the data cannot be published, determine whether the data can be accessed on request or can be used in a federated learning approach (i.e., a learning process in which data owners collaboratively train a model without any data owner exposing its data to others [).
2. If a reference standard is used, include information about the origin of the data (external dataset, subset of the dataset) and the characteristics of the data in the dataset. If possible, reference the dataset using a DOI or URL.
3. If an external dataset is used, give a short description of the data present in the dataset and reference the source of the dataset.

Recommendation regarding the evaluation and validation of Natural Language Processing algorithms
1. Perform an evaluation using generic performance measures (i.e., precision, recall, and F-score) and appropriate aspects of evaluation, including discrimination, calibration, and preferably the accuracy of predictions (e.g., AUC, calibration graphs, and the Brier score).
   1. Include a motivation for the choice of measures, with references to existing literature where appropriate (e.g., Sokolova and Lapalme’s analysis of performance measures [).
2. Perform an error analysis and discuss the errors in the Discussion section of the paper. Include possible changes to the algorithm that could improve its performance for these specific errors.
3. When using a non-probabilistic NLP method, determine the cut-off value for a ‘good’ test result a priori, that is, before evaluating the algorithm. Explain why this cut-off value is chosen.

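
For algorithms that output a probability or score per candidate annotation, the AUC and the Brier score named in the recommendation above can be computed directly from the predictions and the reference standard. The sketch below uses the standard definitions: the Brier score is the mean squared difference between predicted probability and observed outcome, and the AUC is the fraction of positive-negative pairs in which the positive case receives the higher score. The function names and example values are illustrative assumptions.

```python
def brier_score(probabilities, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if not probabilities or len(probabilities) != len(outcomes):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(probabilities, outcomes)) / len(outcomes)


def auc(scores, outcomes):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    positives = [s for s, y in zip(scores, outcomes) if y == 1]
    negatives = [s for s, y in zip(scores, outcomes) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in positives
        for q in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted probabilities and reference-standard labels.
predicted = [0.9, 0.4, 0.6, 0.1]
reference = [1, 1, 0, 0]
print(f"Brier score: {brier_score(predicted, reference):.3f}")
print(f"AUC: {auc(predicted, reference):.2f}")
```

A lower Brier score indicates better-calibrated and sharper predictions, whereas the AUC reflects discrimination only. The pairwise AUC computation is quadratic in the number of cases and is meant for illustration rather than large datasets.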
Recommendation regarding the presentation of results
1. Report the outcomes of the evaluation in a clear manner, preferably in a table accompanied by a textual description of the outcomes.
   1. Aim to include a confusion matrix in the reporting of the outcomes.
2. Use figures if they make the results more readable and understandable for the reader. If a figure is used, make sure that the data is also available in the text or in a table.

Recommendation regarding the generalizability of results
1. Compare the results of the evaluated algorithm with other algorithms by using the same dataset as reported in the publication of the other algorithm or by processing the same dataset with another algorithm available through the literature. Report the outcomes of both experiments and test for statistical significance.
2. Describe the setting in which the research is performed. State whether the research is part of a challenge (e.g., i2b2 challenge) or is carried out in a specific institute or department.
3. Before claiming generalizability, perform external validation by testing the algorithm on a different, external dataset from other research projects or other publicly available datasets. Aim to use a dataset with a different case mix, different individuals, and different types of text.
4. Determine and describe whether there are potential sources of bias in data selection, data use by the NLP algorithm or system, and evaluation.
5. When claiming generalizability, clearly describe the conditions under which the algorithm can be used in a different setting. Describe for which population, domain, and type and language of data the algorithm can be used.

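
The first recommendation above asks for a significance test when two algorithms are compared on the same dataset, but does not prescribe one. One common choice for paired correct/incorrect outcomes is McNemar's exact test, sketched below as an assumption rather than as the method the authors recommend. The function name and example data are hypothetical.

```python
from math import comb


def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact McNemar test on paired per-item correctness flags.

    Returns (b, c, p_value), where b counts items only algorithm A got right
    and c counts items only algorithm B got right.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both algorithms must be scored on the same items")
    b = sum(1 for a, o in zip(correct_a, correct_b) if a and not o)
    c = sum(1 for a, o in zip(correct_a, correct_b) if o and not a)
    n = b + c
    if n == 0:
        return b, c, 1.0  # the algorithms never disagree
    # Under the null hypothesis, discordant items split as Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Hypothetical results: 30 items both right, 10 only A, 2 only B, 8 both wrong.
correct_a = [True] * 30 + [True] * 10 + [False] * 2 + [False] * 8
correct_b = [True] * 30 + [False] * 10 + [True] * 2 + [False] * 8
b_count, c_count, p_value = mcnemar_exact(correct_a, correct_b)
print(b_count, c_count, round(p_value, 4))
```

Only the discordant items enter the test, which is why both algorithms must be run on exactly the same items for the comparison to be valid.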