Florent Baty, Jemima Hegermann, Tiziana Locatelli, Claudio Rüegg, Christian Gysin, Frank Rassouli, Martin Brutsche.
Abstract
BACKGROUND: Text mining can be applied to automate knowledge extraction from unstructured data in medical reports and to generate quality indicators applicable to medical documentation. The primary objective of this study was to apply text mining methodology to the analysis of polysomnographic medical reports in order to quantify sources of variation (here, diagnostic precision vs. inter-rater variability) in the work-up of sleep-disordered breathing. The secondary objective was to assess the impact of text block standardization on the diagnostic precision of polysomnography reports in an independent test set.
Keywords: Electronic medical reports; Polysomnography; Text mining
Year: 2022 PMID: 35101128 PMCID: PMC8805265 DOI: 10.1186/s13326-022-00259-3
Source DB: PubMed Journal: J Biomed Semantics
Fig. 1 Correspondence analysis (CA) of the term-document matrix. The left panel (a) displays the term scores on the first 2 CA axes. The right panel (b) shows a ‘spider’ diagram connecting each level of the explanatory variables (apnea severity, physicians, type of apnea and technicians) to its group centroid on both main axes of the canonical correspondence analysis
Fig. 2 Variation partitioning before and after text block standardization of the polysomnography reports. The left panel (a) shows the effect of text block standardization on the percentage of variance explained by the raters and disease characteristics. The right panel (b) provides details on the fractions of variation between the different explanatory variables before (upper plot) and after (lower plot) text block standardization using Venn diagram representations
Cross-validated confusion matrix summarizing the predictive value of the standardization procedure
| Prediction (rows) vs. Reference (columns) | Central SA | Mixed SA | Undetected | OSAS/light | OSAS/mild | OSAS/severe |
|---|---|---|---|---|---|---|
| Central SA | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1.33 |
| Mixed SA | 0.33 | 1.67 | 0.00 | 0.00 | 0.00 | 0.00 |
| Undetected | 0.00 | 0.00 | 14.33 | 2.33 | 0.67 | 0.00 |
| OSAS/light | 1.00 | 0.00 | 0.33 | 10.67 | 0.00 | 0.00 |
| OSAS/mild | 0.00 | 0.00 | 0.33 | 0.00 | 17.33 | 0.00 |
| OSAS/severe | 2.67 | 2.33 | 1.00 | 0.00 | 0.00 | 43.67 |
| Pred. accuracy (%) | 0.0 | 41.7 | 89.6 | 82.1 | 96.3 | 97.0 |
The table entries report the average cell counts across resamples, expressed as percentages, following a 10-fold cross-validation with 3 repetitions. The bottom row provides the class-wise prediction accuracy
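
The class-wise accuracies in the bottom row can be reproduced from the cell values by dividing each diagonal entry by its column (reference class) total. The following is a minimal sketch of that arithmetic, assuming NumPy is available; the variable names are illustrative and are not taken from the study. Because the published cells are rounded to two decimals, the recomputed values may differ from the reported row in the last digit.

```python
import numpy as np

# Class labels in the order used by the table (rows = prediction, columns = reference).
classes = ["Central SA", "Mixed SA", "Undetected",
           "OSAS/light", "OSAS/mild", "OSAS/severe"]

# Cell values copied from the published table (percentages, rounded to 2 decimals).
confusion = np.array([
    [0.00, 0.00,  0.00,  0.00,  0.00,  1.33],
    [0.33, 1.67,  0.00,  0.00,  0.00,  0.00],
    [0.00, 0.00, 14.33,  2.33,  0.67,  0.00],
    [1.00, 0.00,  0.33, 10.67,  0.00,  0.00],
    [0.00, 0.00,  0.33,  0.00, 17.33,  0.00],
    [2.67, 2.33,  1.00,  0.00,  0.00, 43.67],
])

# Class-wise accuracy: correctly predicted share of each reference class,
# i.e. diagonal entry divided by the column total.
per_class = 100 * np.diag(confusion) / confusion.sum(axis=0)

for name, acc in zip(classes, per_class):
    print(f"{name}: {acc:.1f}%")
```

Read this way, the row shows that the two rarest reference classes (central and mixed sleep apnea, each about 4% of reports) are the ones most often misclassified, mostly as OSAS/severe.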