Danielle L Mowery, Brian E Chapman, Mike Conway, Brett R South, Erin Madden, Salomeh Keyhani, Wendy W Chapman.
Abstract
BACKGROUND: In the United States, 795,000 people suffer strokes each year; 10-15 % of these strokes can be attributed to stenosis caused by plaque in the carotid artery, a major stroke phenotype risk factor. Studies comparing treatments for the management of asymptomatic carotid stenosis are challenging for at least two reasons: 1) administrative billing codes (i.e., Current Procedural Terminology (CPT) codes) that identify carotid images do not denote which neurovascular arteries are affected and 2) the majority of the image reports are negative for carotid stenosis. Studies that rely on manual chart abstraction can be labor-intensive, expensive, and time-consuming. Natural Language Processing (NLP) can expedite the process of manual chart abstraction by automatically filtering reports with no/insignificant carotid stenosis findings and flagging reports with significant carotid stenosis findings, thus potentially reducing effort, costs, and time.
Keywords: Information extraction; Natural language processing; Phenotype; Stroke
Year: 2016 PMID: 27175226 PMCID: PMC4863379 DOI: 10.1186/s13326-016-0065-1
Source DB: PubMed Journal: J Biomed Semantics
Fig. 1 Sample texts by report type. Each text contains fictional, but realistic information
Structure types with example sentences
| Structure type | Example sentence |
|---|---|
| | “30–45 % stenosis in the right ICA.” |
| | “1. Both ICAs are occluded.” |
| | “95 % RICA 50 % LICA 75 % LECA” |
| | “Right: ICA: stenosis >70 %.” |
| | Any structures not listed above |
Expression types with example sentences
| Expression type | Example sentence |
|---|---|
| Category | “severe stenosis” |
| Range | “stenosis ranging from 40 to 70 %” |
| Exact | “60 % stenosis” |
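The three expression types can be told apart with simple pattern matching. The following is a minimal sketch, not the authors' implementation: the function name `expression_type`, the regular expressions, and the list of category words are all assumptions made for illustration.

```python
import re

# Hypothetical patterns for illustration only; the lexicon and rules used
# in the study are not reproduced here.
RANGE = re.compile(r"\d+\s*(?:-|–|to)\s*\d+\s*%")   # e.g. "40 to 70 %", "30–45 %"
EXACT = re.compile(r"\d+\s*%")                       # e.g. "60 %"
CATEGORY = re.compile(
    r"\b(?:mild|moderate|severe|critical|significant)\b", re.IGNORECASE
)  # assumed category vocabulary


def expression_type(text):
    """Return 'Range', 'Exact', 'Category', or None for a stenosis mention."""
    # Check Range first: a range also contains a bare percentage,
    # so testing Exact first would misclassify it.
    if RANGE.search(text):
        return "Range"
    if EXACT.search(text):
        return "Exact"
    if CATEGORY.search(text):
        return "Category"
    return None
```

Applied to the example sentences in the table, `expression_type("severe stenosis")` yields `"Category"`, the "ranging from 40 to 70 %" sentence yields `"Range"`, and `"60 % stenosis"` yields `"Exact"`.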
Fig. 2 Schema representing findings as well as semantic and linguistic modifiers and their possible normalized value sets
Fig. 3 Illustration of pyConText’s pipeline encoding a sentence and classifying the document from Fig. 1 RAD report example 1. Some modifiers, e.g., temporality and exam, are not displayed for brevity. Blue mentions indicate templated mentions classified as no/insignificant stenosis; red mentions indicate templated mentions classified as significant stenosis
Fig. 4 The resulting RAD report example 1 processed by pyConText from Fig. 3
According to report type, overall frequency of at least one carotid mention within sections, types of structures for all carotid mentions, and types of expressions for all carotid mentions
| Information type | Information subtype | RAD | TIU |
|---|---|---|---|
| Sections | Findings Total | 368 | 106 |
| | Impressions Total | 488 | 173 |
| | Findings Only | 9 | 39 |
| | Impressions Only | 129 | 106 |
| | Both | 359 | 67 |
| | Neither/Not Applicable | 1 | 286 |
| Structures | | 706 | 294 |
| | | 256 | 76 |
| | | 0 | 36 |
| | | 46 | 152 |
| | | 2 | 6 |
| Expressions | Category | 713 | 344 |
| | Range | 254 | 314 |
| | Exact | 48 | 19 |
Findings Total = Findings only + Both; Impressions Total = Impressions only + Both. Neither = report has Findings and Impressions, but does not contain carotid mentions; Not Applicable = report does not have Findings and Impressions
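The footnote's definitions can be checked directly against the section counts in the table. A small sketch, using a hypothetical `counts` dictionary and helper `derived_totals` (both names are illustrative) populated from the table's RAD and TIU columns:

```python
# Section counts copied from the table above; the dictionary layout and
# key names are assumptions for this example.
counts = {
    "RAD": {"findings_only": 9, "impressions_only": 129, "both": 359,
            "findings_total": 368, "impressions_total": 488},
    "TIU": {"findings_only": 39, "impressions_only": 106, "both": 67,
            "findings_total": 106, "impressions_total": 173},
}


def derived_totals(c):
    """Apply the footnote: Total = Only + Both, for Findings and Impressions."""
    findings_total = c["findings_only"] + c["both"]
    impressions_total = c["impressions_only"] + c["both"]
    return findings_total, impressions_total


for report_type, c in counts.items():
    findings_total, impressions_total = derived_totals(c)
    # Compare the derived totals with the totals reported in the table.
    consistent = (findings_total == c["findings_total"]
                  and impressions_total == c["impressions_total"])
    print(report_type, findings_total, impressions_total, consistent)
```

For both report types the derived totals match the reported Findings Total and Impressions Total rows.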
Structure type usage according to sections and report type
| Report type and section | | | | | |
|---|---|---|---|---|---|
| RAD | | | | | |
| Findings | 306 | 3 | 0 | 66 | 3 |
| Impressions | 352 | 127 | 0 | 22 | 0 |
| TIU | | | | | |
| Findings | 25 | 6 | 33 | 43 | 0 |
| Impressions | 88 | 21 | 13 | 48 | 0 |
Structure type usage between Findings (rows) and Impressions (columns) for repetitive mentions by report type
| Findings (rows) / Impressions (columns) | | | | | |
|---|---|---|---|---|---|
| RAD | | | | | |
| | 233 (61 %) | 73 (19 %) | 0 (0 %) | 1 (<1 %) | 0 (0 %) |
| | 1 (<1 %) | 1 (<1 %) | 0 (0 %) | 0 (0 %) | 0 (0 %) |
| | 0 (0 %) | 0 (0 %) | 0 (0 %) | 0 (0 %) | 0 (0 %) |
| | 35 (9 %) | 27 (7 %) | 0 (0 %) | 5 (1 %) | 0 (0 %) |
| | 2 (<1 %) | 1 (<1 %) | 0 (0 %) | 0 (0 %) | 0 (0 %) |
| TIU | | | | | |
| | 12 (23 %) | 4 (7 %) | 0 (0 %) | 3 (6 %) | 0 (0 %) |
| | 0 (0 %) | 0 (0 %) | 0 (0 %) | 0 (0 %) | 0 (0 %) |
| | 15 (28 %) | 0 (0 %) | 1 (2 %) | 0 (0 %) | 0 (0 %) |
| | 0 (0 %) | 9 (17 %) | 0 (0 %) | 9 (17 %) | 0 (0 %) |
| | 0 (0 %) | 0 (0 %) | 0 (0 %) | 0 (0 %) | 0 (0 %) |
Expression type usage by sections and report type
| Report type and section | Category | Range | Exact |
|---|---|---|---|
| RAD | | | |
| Findings | 330 | 73 | 25 |
| Impressions | 381 | 178 | 23 |
| TIU | | | |
| Findings | 73 | 59 | 8 |
| Impressions | 116 | 110 | 5 |
Expression type usage between Findings (rows) and Impressions (columns) for repetitive mentions by report type
| Findings (rows) / Impressions (columns) | Category | Range | Exact |
|---|---|---|---|
| RAD | | | |
| Category | 278 (53 %) | 108 (20 %) | 14 (3 %) |
| Range | 35 (7 %) | 53 (10 %) | 2 (<1 %) |
| Exact | 16 (3 %) | 6 (1 %) | 14 (3 %) |
| TIU | | | |
| Category | 30 (29 %) | 23 (22 %) | 1 (<1 %) |
| Range | 26 (25 %) | 13 (12 %) | 3 (3 %) |
| Exact | 3 (3 %) | 4 (4 %) | 2 (2 %) |
pyConText performance according to report type
| Report type and section | | | | |
|---|---|---|---|---|
| RAD | | | | |
| Findings | 57 | 67 | 88 | 83 |
| Impressions | 74 | | | 90 |
| Full report | | 70 | 84 | |
| TIU | | | | |
| Findings | 60 | 55 | 88 | 89 |
| Impressions | 19 | | | 82 |
| Full report | | 58 | 87 | |
For each metric and report type, the highest metric value is bolded