Kristiina Rannikmäe, Honghan Wu, Steven Tominey, William Whiteley, Naomi Allen, Cathie Sudlow.
Abstract
BACKGROUND: Better phenotyping of routinely collected coded data would be useful for research and health improvement. For example, the precision of coded data for hemorrhagic stroke (intracerebral hemorrhage [ICH] and subarachnoid hemorrhage [SAH]) may be as poor as <50%. This work aimed to investigate the feasibility and added value of automated methods applied to clinical radiology reports to improve stroke subtyping.
Keywords: Brain scan; Cerebral hemorrhage; Disease subtyping; Natural language processing; Stroke
Year: 2021 PMID: 34130677 PMCID: PMC8204419 DOI: 10.1186/s12911-021-01556-0
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 Selection of included UK Biobank (UKB) participants. GP, general practitioner; NHS, National Health Service. Information for code validation refers to the participant having any information on the hospital electronic patient record system to allow an expert stroke physician to confirm or reject the accuracy of the coded diagnosis [2]
Fig. 2 Pipeline for automated disease subtyping based on clinical scan reports. The medical student who undertook the scan report annotations to train the Sem-EHR tool for the current task was a final year medical student who had completed their clinical neurology and stroke modules. They spent time reading the literature around the topic and practiced scan report annotation under the training of a neurologist before the study
Entity label-level precision and recall estimates
| Concept mentions | Precision (%) | Recall (%) |
|---|---|---|
| Metastatic tumor | 93 | 87 |
| Aneurysm | 97 | 100 |
| Intracerebral haemorrhage | 95 | 100 |
| Time old (temporal words/phrases indicating old events, e.g., | 78 | 83 |
| Subdural haematoma | 97 | 83 |
| Contusion | 100 | 100 |
| Subarachnoid haemorrhage | 90 | 100 |
| Related to (words/phrases indicating relations between two events, e.g., bleeding | 78 | 100 |
| Ischaemic stroke | 90 | 99 |
| Time recent (temporal words/phrases indicating recent events, e.g., | 91 | 97 |
| Meningioma | 100 | 100 |
| Transformation | 88 | 100 |
| Traumatic | 100 | 75 |
Numbers are mean values of tenfold cross validation
Domain-expert rules to combine entity labels into a single diagnostic label for each scan report
| Diagnostic labels | Inclusion reasons | Exclusion reasons |
|---|---|---|
| ICH | Presence of entity label: (a) intracerebral haemorrhage | Presence of ≥ 1 entity labels: (a) metastatic tumour or tumour; (b) contusion; (c) time recent and ischaemic stroke; (d) transformation; (e) subarachnoid haemorrhage + aneurysm; (f) subdural haematoma |
| SAH | Presence of entity label: (a) subarachnoid haemorrhage | Presence of ≥ 1 entity labels: (a) metastatic tumour or tumour; (b) contusion; (c) transformation; (d) intracerebral haemorrhage if no mention of aneurysm; (e) subdural haematoma |
| IS | Presence of entity labels: (a) time recent and (b) ischaemic stroke | |
British English spelling was used for entity labels in the original study. ICH, primary intracerebral hemorrhage; SAH, primary subarachnoid hemorrhage; IS, primary ischemic stroke
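The rules in the table above can be sketched as a small function that maps the set of entity labels found in one scan report to diagnostic labels. This is a minimal illustration only, not the study's implementation: the label identifiers (for example `intracerebral_haemorrhage`) and the function name `diagnostic_labels` are hypothetical, and returning a set of labels (rather than a single label) is an assumption, since the table does not say how a report meeting more than one rule is resolved.

```python
from typing import Set

# Hypothetical identifiers for the entity labels named in the rules table.
TUMOUR_LABELS = {"metastatic_tumour", "tumour"}


def diagnostic_labels(entities: Set[str]) -> Set[str]:
    """Apply the inclusion/exclusion rules to one report's entity labels."""
    has = entities.__contains__

    recent_ischaemic = has("time_recent") and has("ischaemic_stroke")
    # Exclusion reasons that the ICH and SAH rows have in common.
    shared_exclusion = (
        bool(entities & TUMOUR_LABELS)
        or has("contusion")
        or has("transformation")
        or has("subdural_haematoma")
    )

    labels: Set[str] = set()

    # ICH: intracerebral haemorrhage present and no exclusion reason.
    ich_excluded = (
        shared_exclusion
        or recent_ischaemic
        or (has("subarachnoid_haemorrhage") and has("aneurysm"))
    )
    if has("intracerebral_haemorrhage") and not ich_excluded:
        labels.add("ICH")

    # SAH: subarachnoid haemorrhage present and no exclusion reason.
    sah_excluded = shared_exclusion or (
        has("intracerebral_haemorrhage") and not has("aneurysm")
    )
    if has("subarachnoid_haemorrhage") and not sah_excluded:
        labels.add("SAH")

    # IS: both "time recent" and "ischaemic stroke"; no exclusions are listed.
    if recent_ischaemic:
        labels.add("IS")

    return labels
```

Under these assumptions, a report mentioning both subarachnoid and intracerebral haemorrhage is labelled SAH when an aneurysm is also mentioned and ICH when it is not, which follows from exclusion reasons (e) for ICH and (d) for SAH.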
Participant-level diagnostic label precision and recall estimates against reference standard (i.e. expert physician adjudications based on the complete EMR)
| Stroke subtype | Precision (i.e. positive predictive value), from codes (based on previous work [2]) | Precision (i.e. positive predictive value), from automated method | Recall (i.e. sensitivity), from codes (based on previous work [2]) | Recall (i.e. sensitivity), from automated method |
|---|---|---|---|---|
| ICH | 42% (31–54%) (11/26) | 89% (52–100%) (8/9) | 100% (72–100%) (11/11) | 89% (52–100%) (8/9) |
| SAH | 71% (54–83%) (17/24) | 82% (57–96%) (14/17) | 100% (80–100%) (17/17) | 82% (57–96%) (14/17) |
| IS | 83% (75–89%) (73/88) | 73% (65–81%) (91/124) | 49% (41–57%) (73/149) | 64% (56–72%) (91/142) |
| IS (including cases with an unspecified subtype assigned as IS) | 80% (76–83%) (147/184) | 77% (71–83%) (141/182) | 99% (95–100%) (147/149) | 99% (96–100%) (141/142) |
ICH, intracerebral hemorrhage; SAH, subarachnoid hemorrhage; IS, ischemic stroke. IS (including cases with an unspecified subtype assigned as IS): all cases where a stroke subtype could not be assigned with automated methods, or where the code did not specify a stroke subtype, were assumed to be ischemic stroke. Precision = positive predictive value (proportion of true-positive cases among all cases assigned the subtype). Recall = sensitivity (proportion of true-positive cases identified among all true cases of the subtype in the reference standard). Absolute numbers of cases are provided in brackets. The dataset used for the precision and recall calculation from codes in our previous work [2] included a total of 225 participants with a stroke code. The dataset used for the precision and recall calculation from the automated method in this study includes a total of 207 participants with a stroke code: the subset of the 225 who also had a relevant clinical brain scan report available. The remaining 18 of the 225 participants did not have a brain scan available and were hence excluded from this study
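Each cell in the table above is a proportion with its count and a 95% confidence interval. The sketch below shows one way such a cell could be computed from the bracketed counts, using an exact (Clopper-Pearson) binomial interval found by bisection. The interval method is an assumption: the text here does not state which method the authors used, and the function names are illustrative.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _root(f) -> float:
    """Bisection root of a function that increases from negative to positive on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_interval(k: int, n: int, alpha: float = 0.05):
    """Clopper-Pearson interval for k successes out of n (assumed method)."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _root(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2
    )
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _root(
        lambda p: alpha / 2 - binom_cdf(k, n, p)
    )
    return lower, upper


def proportion(true_positives: int, denominator: int) -> float:
    """Precision if denominator = cases assigned the label;
    recall if denominator = true cases in the reference standard."""
    return true_positives / denominator


# ICH from the automated method: 8 true positives among 9 assigned cases.
ich_precision = proportion(8, 9)
ich_lower, ich_upper = exact_interval(8, 9)
```

For 8/9 this gives a point estimate of about 89% and an interval of roughly 52% to 100%, in line with the ICH row for the automated method. Precision and recall use the same arithmetic and differ only in the denominator.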
Participant-level diagnostic label precision and recall estimates among those with a hemorrhagic stroke code
| Stroke subtype | Precision (95% CI) | Recall (95% CI) |
|---|---|---|
| ICH | 89% (52–100%) (8/9) | 89% (52–100%) (8/9) |
| SAH | 88% (62–98%) (14/16) | 82% (57–96%) (14/17) |
ICH, intracerebral hemorrhage; SAH, subarachnoid hemorrhage. Precision = positive predictive value (proportion of true-positive cases among all cases assigned the subtype). Recall = sensitivity (proportion of true-positive cases identified among all true cases of the subtype in the reference standard). Absolute numbers of cases are provided in brackets