Ellen L Palmer, Saeed Hassanpour, John Higgins, Jennifer A Doherty, Tracy Onega.
Abstract
BACKGROUND: Using structured fields in electronic health records (EHRs) to ascertain smoking history is important, but these fields fail to capture the nuances of smoking behaviors. Knowledge of smoking behaviors, such as pack-year history and most recent cessation date, allows care providers to select the best care plan for patients at risk of smoking-attributable diseases.
Keywords: Electronic health records; Informatics pipeline; Natural language processing; Smokers registry
Year: 2019 PMID: 31340796 PMCID: PMC6657102 DOI: 10.1186/s12911-019-0863-3
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Summary of Testing and Training Data Available for Algorithm Development
| Smoking status | i2b2: smoking status, train | i2b2: smoking status, test | Local EHR: smoking status, train | Local EHR: smoking status, test | Local EHR: pack years, train | Local EHR: pack years, test | Local EHR: cessation date, train | Local EHR: cessation date, test |
|---|---|---|---|---|---|---|---|---|
| Never | 66 | 16 | 117 | 51 | – | – | – | – |
| Ever | 80 | 25 | 139 | 64 | 84 | 26 | 54 | 19 |
| Former | 36 | 11 | 71 | 30 | 38 | 12 | 54 | 19 |
| Current | 35 | 11 | 58 | 31 | 39 | 23 | – | – |
| Smoker | 9 | 3 | 10 | 3 | 7 | 1 | – | – |
| Unknown | 252 | 63 | 277 | 108 | – | – | – | – |
Distribution of annotations for smoking status, pack years, and cessation date in the training and testing data from the i2b2 Challenge and our local EHR. Smoking status was determined by manual review, with notes classified as never smoker, former smoker, current smoker, smoker with temporality unknown (referred to as smoker), or no smoking status information (referred to as unknown). For the local EHR pack year and cessation date counts, we indicate the number of notes in which this information was identified by manual review.
Fig. 1. Visualization of information extraction for smoking status classification. Visualization of the process used by the smoking status algorithm. Notes were manually annotated for smoking behavior concepts in 756 local records. Many of these notes were semi-structured and contained clearly defined sections. Smoking behaviors were most often found in the “social history” and impression sections, followed by the “impressions” and “health summary” sections.
Fig. 2. Flowchart for informatics pipeline information identification. Flowchart of the application of the pipeline to clinical notes. All notes were subjected to the smoking status algorithm. If the status assigned was never smoker or unknown, no further assessments were done. If the status assigned was current smoker, former smoker, or smoker temporality unknown, the note was assessed for pack year history and cessation date.
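The gating logic described in the Fig. 2 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the status labels, and the pack-year pattern are all hypothetical, since the record does not give the actual classifier or regular expressions. The status classifier and cessation date extractor are passed in as placeholders.

```python
import re
from typing import Callable, Dict, Optional

# Statuses that trigger the follow-up extractors, per the Fig. 2 caption.
# "smoker" stands for "smoker, temporality unknown". Labels are assumptions.
SMOKER_STATUSES = {"current", "former", "smoker"}

# Hypothetical pattern; the source does not list its regular expressions.
PACK_YEAR_RE = re.compile(r"(\d+(?:\.\d+)?)\s*-?\s*pack[\s-]?years?", re.IGNORECASE)


def extract_pack_years(note: str) -> Optional[float]:
    """Return the first pack-year figure mentioned in the note, if any."""
    match = PACK_YEAR_RE.search(note)
    return float(match.group(1)) if match else None


def run_pipeline(
    note: str,
    classify_status: Callable[[str], str],
    extract_cessation_date: Callable[[str], Optional[str]],
) -> Dict[str, object]:
    """Classify every note; run the extractors only for smoker statuses."""
    status = classify_status(note)
    result: Dict[str, object] = {
        "status": status,
        "pack_years": None,
        "cessation_date": None,
    }
    if status in SMOKER_STATUSES:
        result["pack_years"] = extract_pack_years(note)
        result["cessation_date"] = extract_cessation_date(note)
    return result
```

Notes classified as never smoker or unknown skip both extractors, so a stray pack-year mention in such a note is never reported.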
Summary Statistics from Smoking Status Algorithm Testing
a) i2b2 trained and tested

| Smoking status | Precision | Recall | F1-Score | N notes | Sensitivity | Specificity |
|---|---|---|---|---|---|---|
| Never | 0.94 | 0.94 | 0.94 | 16 | 94% | 94% |
| Ever | – | – | – | 25 | 94% | 99% |
| Former | 0.73 | 0.73 | 0.73 | 11 | 73% | 97% |
| Current | 0.62 | 0.73 | 0.67 | 11 | 73% | 95% |
| Smoker | 0.00 | 0.00 | 0.00 | 3 | 0% | 99% |
| Unknown | 1.00 | 1.00 | 1.00 | 63 | 100% | 100% |

b) Local record trained and tested

| Smoking status | Precision | Recall | F1-Score | N notes | Sensitivity | Specificity |
|---|---|---|---|---|---|---|
| Never | 0.83 | 0.98 | 0.90 | 51 | 98% | 94% |
| Ever | – | – | – | 64 | 90% | 96% |
| Former | 0.93 | 0.83 | 0.88 | 30 | 83% | 99% |
| Current | 0.79 | 0.84 | 0.81 | 31 | 84% | 96% |
| Smoker | 0.33 | 0.33 | 0.33 | 3 | 33% | 99% |
| Unknown | 0.99 | 0.92 | 0.95 | 108 | 92% | 99% |

c) Local record trained, i2b2 record tested

| Smoking status | Precision | Recall | F1-Score | N notes | Sensitivity | Specificity |
|---|---|---|---|---|---|---|
| Never | 0.75 | 0.94 | 0.83 | 16 | 94% | 94% |
| Ever | – | – | – | 25 | 80% | 94% |
| Former | 0.88 | 0.64 | 0.74 | 11 | 64% | 99% |
| Current | 0.86 | 0.55 | 0.67 | 11 | 55% | 99% |
| Smoker | 0.00 | 0.00 | 0.00 | 3 | 0% | 94% |
| Unknown | 1.00 | 1.00 | 1.00 | 63 | 100% | 100% |
Overall F1-score and, by smoking status, precision, recall, F1-score, sensitivity, and specificity for the a) i2b2 note trained and tested, b) local note trained and tested, and c) local note trained and i2b2 note tested data sets.
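The F1-scores in these tables are the harmonic mean of precision and recall, and the sensitivity column restates recall as a percentage. As a quick consistency check, the sketch below recomputes F1 for the panel a) rows from their reported precision and recall. Because the inputs are already rounded to two decimals, the comparison allows a small tolerance; the function name and tolerance are illustrative choices.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from panel a) above.
panel_a = {
    "Never": (0.94, 0.94, 0.94),
    "Former": (0.73, 0.73, 0.73),
    "Current": (0.62, 0.73, 0.67),
    "Smoker": (0.00, 0.00, 0.00),
    "Unknown": (1.00, 1.00, 1.00),
}

for status, (precision, recall, reported) in panel_a.items():
    computed = f1_score(precision, recall)
    # Inputs are rounded to two decimals, so allow a tolerance of 0.01.
    agrees = abs(computed - reported) <= 0.01
    print(f"{status}: computed F1 = {computed:.2f}, reported = {reported:.2f}, agrees = {agrees}")
```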
Summary Statistics from the Pack Year and Cessation Date Algorithms
a) Pack years regular expression extraction (sensitivity: 91.7%) and b) cessation date regular expression extraction (sensitivity: 63.2%)

| Annotation | a) Pack year history found | a) Pack year history not found | Annotation | b) Cessation date found | b) Cessation date not found |
|---|---|---|---|---|---|
| Pack year history recorded | 33 | 3 | Cessation date recorded | 12 | 7 |
| No pack year history recorded | 8 | 179 | Cessation date not recorded | 11 | 193 |
Sensitivity, specificity, and crosstabulation of the annotation vs. the classification of a) the pack year algorithm and b) the cessation date algorithm
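The sensitivities quoted with this table follow directly from the crosstabulated counts. The sketch below recomputes them and also derives specificity from the same counts. The specificity figures are computed here for illustration and are not quoted in this record; the class and field names are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Crosstab:
    tp: int  # recorded in the annotation, found by the algorithm
    fn: int  # recorded in the annotation, not found by the algorithm
    fp: int  # not recorded in the annotation, found by the algorithm
    tn: int  # not recorded in the annotation, not found by the algorithm

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


# Counts copied from the crosstabulation above.
pack_years = Crosstab(tp=33, fn=3, fp=8, tn=179)
cessation_date = Crosstab(tp=12, fn=7, fp=11, tn=193)

for name, table in [("Pack years", pack_years), ("Cessation date", cessation_date)]:
    print(f"{name}: sensitivity {table.sensitivity():.1%}, "
          f"specificity {table.specificity():.1%}")
# Pack years: sensitivity 91.7%, specificity 95.7%
# Cessation date: sensitivity 63.2%, specificity 94.6%
```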