Thibaut Pressat-Laffouilhère, Pierre Balayé, Badisse Dahamna, Romain Lelong, Kévin Billey, Stéfan J Darmoni, Julien Grosjean.
Abstract
BACKGROUND: Unstructured data from electronic health records represent a wealth of information. Doc'EDS is a pre-screening tool based on textual and semantic analysis; it provides a graphical user interface for searching documents in French. The aim of this study was to present the Doc'EDS tool and to provide a formal evaluation of its semantic features.
Keywords: Clinical data warehouse; Cohort identification; Electronic health record; Information retrieval; Semantics
Year: 2022 PMID: 35135538 PMCID: PMC8822768 DOI: 10.1186/s12911-022-01762-4
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 3.298
Fig. 1 Document processing workflow, from extraction to indexing
Fig. 2 Screenshot of the Doc'EDS main page: (1) the query form is on the left side; in addition to keywords, other fields can be used, e.g. document date and type, or patient age and sex; (2) the numbers of patients and documents retrieved are displayed; (3) a visualization screen lets users consult documents (to refine queries or collect specific data). In this example, some portions of text (dates) have been blanked to preserve patient anonymity
Fig. 3 Example of the analysis panel for the query “psychomotor regression” AND epilepsy: (1) sex distribution, (2) descriptive statistics for age, (3) age distribution by sex
Fig. 4 Screenshot of the semantic assistant that helps the user enhance a query: (1) type words to search in HeTOP and select a concept; (2) select the context to search, including negations, hypotheses or family history; (3) select the segment(s) to query; (4) choose relevant terms (synonyms or hyponyms extracted from HeTOP) to add to the query; (5) check, in real time, the number of documents found for each proposed term; and (6) if needed, repeat the operation with a new concept (combined with OR, AND or NOT)
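The query-building steps of the semantic assistant can be pictured as OR-ing a concept with its selected synonyms and hyponyms, then joining concepts with AND, OR or NOT. The following is a minimal sketch assuming a Lucene-style Boolean syntax; the functions `expand_concept` and `combine` and the synonym list are hypothetical placeholders, not HeTOP output or Doc'EDS's internal query language. It rebuilds the Fig. 3 query:

```python
# Hypothetical sketch: assumes a Lucene-style Boolean query syntax.
# Term lists are illustrative placeholders, not real HeTOP output.


def quote(term: str) -> str:
    """Quote multi-word terms so they are matched as phrases."""
    return f'"{term}"' if " " in term else term


def expand_concept(preferred: str, related: list[str]) -> str:
    """OR together a concept and its chosen synonyms/hyponyms (duplicates dropped)."""
    terms = list(dict.fromkeys([preferred, *related]))  # keeps first-seen order
    return "(" + " OR ".join(quote(t) for t in terms) + ")"


def combine(left: str, op: str, right: str) -> str:
    """Join two sub-queries with one of the assistant's operators."""
    if op not in {"AND", "OR", "NOT"}:
        raise ValueError(f"unsupported operator: {op}")
    return f"{left} {op} {right}"


# "seizure disorder" is a placeholder synonym chosen for illustration.
epilepsy = expand_concept("epilepsy", ["seizure disorder", "epilepsy"])
regression = expand_concept("psychomotor regression", [])
query = combine(regression, "AND", epilepsy)
print(query)
```

Keeping each concept's expansion in its own parentheses means that adding a new concept in step 6 never changes how the earlier OR groups bind.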
Fig. 5 Example of Doc’EDS automatic analysis. The system highlights special content it detects (negations, hypotheses, family history, or segments): (1) family history (“brothers deceased after lung cancer”), (2) negation (“no material assistance for locomotion”), (3) hypothesis (“suspect lesion”), and (4) a false negative, where the system failed to identify a negation (“no dysphagia reported”)
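Negation highlighting of the kind shown in Fig. 5 can be illustrated with a trigger-and-window rule in the spirit of the NegEx algorithm. This is a generic sketch, not Doc'EDS's implementation: the English trigger list, the window size and the sentence-splitting rule are all assumptions (Doc'EDS processes French text, and its rules are not described here).

```python
import re

# Hypothetical English trigger list and window size, for illustration only.
NEGATION_TRIGGERS = {"no", "not", "without", "denies"}
WINDOW = 5  # tokens after a trigger treated as negated (assumed value)


def negated_concepts(text: str, concepts: list[str]) -> set[str]:
    """Return the concepts found within WINDOW tokens after a negation trigger.

    The negation scope never crosses a sentence boundary (., ;, ! or ?).
    Concepts are assumed to be plain words separated by single spaces.
    """
    found = set()
    for sentence in re.split(r"[.;!?]", text):
        tokens = re.findall(r"\w+", sentence.lower())
        for i, token in enumerate(tokens):
            if token not in NEGATION_TRIGGERS:
                continue
            # Scope: up to WINDOW tokens following the trigger.
            scope = " ".join(tokens[i + 1 : i + 1 + WINDOW])
            for concept in concepts:
                pattern = r"\b" + re.escape(concept.lower()) + r"\b"
                if re.search(pattern, scope):
                    found.add(concept)
    return found


print(negated_concepts("No dysphagia reported. Suspect lesion.", ["dysphagia", "lesion"]))
```

Splitting on sentence punctuation first keeps a trigger in one sentence from negating a concept in the next; in the example, "lesion" stays un-negated.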
Example of the data frame used for the formal evaluation, based on manually extracted clinical concepts (TN: true negative, TP: true positive, FP: false positive, FN: false negative, Y: yes, N: no)
| Concepts | Negation | Hypothesis | Family medical history | Well segmented |
|---|---|---|---|---|
| Left heart failure | TN | TN | FP | Y |
| Atrial fibrillation | TN | TN | TN | Y |
| Atrial fibrillation anticoagulant | TN | FN | TN | Y |
| Known allergy | TP | TN | TN | Y |
| “Sudden” dyspnea | TN | TN | TN | N |
| Lung cancer | TN | TN | TP | Y |
| Hematemesis | TN | TP | TN | Y |
| Narcolepsy | TP | TP | TN | Y |
| Asthma | TP | TN | TP | Y |
| Prealbumin: 0.16 | FP | TN | TN | Y |
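Per-concept labels in a frame like this are aggregated into confusion counts for each tag before any metric is computed. A minimal sketch using only the ten example rows shown (so the resulting counts are illustrative, not study results); the names `ROWS`, `TAGS` and `tally` are hypothetical:

```python
from collections import Counter

# The ten example rows from the table above:
# (concept, negation, hypothesis, family medical history, well segmented).
ROWS = [
    ("Left heart failure", "TN", "TN", "FP", "Y"),
    ("Atrial fibrillation", "TN", "TN", "TN", "Y"),
    ("Atrial fibrillation anticoagulant", "TN", "FN", "TN", "Y"),
    ("Known allergy", "TP", "TN", "TN", "Y"),
    ('"Sudden" dyspnea', "TN", "TN", "TN", "N"),
    ("Lung cancer", "TN", "TN", "TP", "Y"),
    ("Hematemesis", "TN", "TP", "TN", "Y"),
    ("Narcolepsy", "TP", "TP", "TN", "Y"),
    ("Asthma", "TP", "TN", "TP", "Y"),
    ("Prealbumin: 0.16", "FP", "TN", "TN", "Y"),
]
TAGS = ("negation", "hypothesis", "family_history")


def tally(rows):
    """Count TP/FP/TN/FN labels per tag, plus the number of well-segmented concepts."""
    counts = {tag: Counter() for tag in TAGS}
    well_segmented = 0
    for _concept, *labels, segmented in rows:
        for tag, label in zip(TAGS, labels):
            counts[tag][label] += 1
        well_segmented += int(segmented == "Y")
    return counts, well_segmented


counts, well_segmented = tally(ROWS)
for tag in TAGS:
    print(tag, dict(counts[tag]))
print("well segmented:", well_segmented, "/", len(ROWS))
```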
Concordance results between the two readers (TN: true negative, TP: true positive, FP: false positive, FN: false negative)
| | FN | FP | TN | TP | Kappa (95% CI) |
|---|---|---|---|---|---|
| FN | 14 | 0 | 7 | 0 | 0.88 [0.84; 0.91] |
| FP | 0 | 14 | 6 | 2 | |
| TN | 14 | 5 | 1711 | 4 | |
| TP | 2 | 7 | 10 | 204 | |
| FN | 14 | 0 | 23 | 2 | 0.70 [0.62; 0.77] |
| FP | 0 | 15 | 0 | 3 | |
| TN | 19 | 0 | 1886 | 0 | |
| TP | 0 | 1 | 4 | 33 | |
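The kappa values in this table agree with Cohen's kappa computed from each 4 × 4 block: observed agreement (the diagonal) minus chance agreement (from row and column totals), divided by one minus chance agreement. A minimal sketch; `cohen_kappa` is a hypothetical helper, and the two matrices are the table's upper and lower blocks with labels ordered FN, FP, TN, TP:

```python
def cohen_kappa(matrix: list[list[int]]) -> float:
    """Cohen's kappa for a square agreement matrix (one reader per axis)."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance if the two readers labelled independently.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n**2
    return (observed - expected) / (1 - expected)


# Upper and lower blocks of the table (2,000 concepts each).
first = [[14, 0, 7, 0], [0, 14, 6, 2], [14, 5, 1711, 4], [2, 7, 10, 204]]
second = [[14, 0, 23, 2], [0, 15, 0, 3], [19, 0, 1886, 0], [0, 1, 4, 33]]
print(round(cohen_kappa(first), 3), round(cohen_kappa(second), 3))
```

This gives about 0.879 and 0.705, close to the reported 0.88 and 0.70. Because kappa is unchanged when the matrix is transposed, it does not matter which reader is placed on the rows.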
Evaluation of (a) negation tags and (b) hypothesis tags: resident versus Doc’EDS
| Negative concepts | Reader + | Reader − | |
|---|---|---|---|
| Doc’EDS + | TP = 551 | FP = 60 | Precision = 0.90 [0.87; 0.92] |
| Doc’EDS − | FN = 68 | TN = 4,598 | NPV = 0.98 |
| | Recall = 0.89 [0.86; 0.91] | Specificity = 0.98 | F = 0.89 |
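The summary metrics follow directly from the four cells of the 2 × 2 table. A minimal sketch using the negation counts (TP = 551, FP = 60, FN = 68, TN = 4,598); the `Confusion` class is a hypothetical helper, and F is taken to be the F1 score (the harmonic mean of precision and recall). The confidence intervals are not reproduced because the method used to compute them is not stated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """2x2 confusion counts for one tag (Doc'EDS versus the reference reader)."""

    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def precision(self) -> float:  # positive predictive value
        return self.tp / (self.tp + self.fp)

    @property
    def recall(self) -> float:  # sensitivity
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def npv(self) -> float:  # negative predictive value
        return self.tn / (self.tn + self.fn)

    @property
    def f1(self) -> float:  # harmonic mean of precision and recall
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r)


negation = Confusion(tp=551, fp=60, fn=68, tn=4598)
for name in ("precision", "recall", "specificity", "npv", "f1"):
    print(f"{name}: {getattr(negation, name):.3f}")
```

The computed values (about 0.902, 0.890, 0.987, 0.985 and 0.896) agree with the table's two-decimal figures when truncated.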