Liwei Wang, Xiaoyang Ruan, Ping Yang, Hongfang Liu.
Abstract
OBJECTIVE: The primary aim was to compare independent and joint performance of retrieving smoking status through different sources, including narrative text processed by natural language processing (NLP), patient-provided information (PPI), and diagnosis codes (ie, International Classification of Diseases, Ninth Revision [ICD-9]). We also compared the performance of retrieving smoking strength information (ie, heavy/light smoker) from narrative text and PPI.
Keywords: ICD-9; natural language processing; patient-provided information; smoking status; smoking strength
Year: 2016 PMID: 27980387 PMCID: PMC5147453 DOI: 10.4137/CIN.S40604
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
Figure 1. Workflow of NLP-based identification of smoking-related information.
Smoking status and strength by manual review.
| MEASURE | CATEGORY | COUNT | PERCENTAGE |
|---|---|---|---|
| Detailed smoking status | Current smoker (a) | 143 | 26 |
| | Former smoker | 86 | 15 |
| | Smoker current status unknown | 126 | 22 |
| | Never smoker | 206 | 37 |
| | Unknown if ever smoked | 0 | 0 |
| Smoking strength | Heavy tobacco smoker (b) | 293 | 52 |
| | Light tobacco smoker | 36 | 6 |

Notes:
(a) Includes current every day or some day smoker.
(b) Smokes 10 or more cigarettes per day.
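The heavy-smoker threshold in the notes can be written as a one-line rule. This is a minimal sketch, not the authors' code: the function name is hypothetical, and treating every known smoker below 10 cigarettes per day as "light" is an assumption, since the notes define only the heavy category.

```python
def smoking_strength(cigarettes_per_day: float) -> str:
    """Classify a known smoker by daily cigarette count.

    "heavy" follows the note (10 or more cigarettes per day);
    "light" as the complement is an assumption of this sketch.
    """
    if cigarettes_per_day < 0:
        raise ValueError("cigarettes_per_day must be non-negative")
    return "heavy" if cigarettes_per_day >= 10 else "light"


print(smoking_strength(10))  # boundary value counts as heavy
print(smoking_strength(5))
```

The rule is meant for patients already identified as smokers; it says nothing about never smokers or patients with unknown status.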
Performance of NLP, ICD-9, and PPI in identifying smoking status and strength.
| | | MANUAL | | SENSITIVITY (95% CI) | SPECIFICITY (95% CI) | ACCURACY (95% CI) |
|---|---|---|---|---|---|---|
| Smoking status (561 total) | | Ever | Never | | | |
| NLP | Ever | 311 | 53 | 0.97 (0.95–0.99) | 0.70 (0.63–0.77) | 0.88 (0.84–0.90) |
| | Never | 8 | 124 | | | |
| ICD-9 | Ever | 89 | 4 | 0.25 (0.2–0.3) | 0.98 (0.95–1) | 0.52 (0.48–0.56) |
| | Never | 266 | 202 | | | |
| PPI | Ever | 223 | 51 | 0.73 (0.68–0.78) | 0.72 (0.64–0.78) | 0.73 (0.68–0.77) |
| | Never | 82 | 129 | | | |
| NLP + ICD-9 | Ever | 315 | 53 | 0.89 (0.85–0.92) | 0.74 (0.68–0.80) | 0.83 (0.80–0.86) |
| | Never | 40 | 153 | | | |
| NLP \| ICD-9 | Ever | 313 | 53 | 0.88 (0.84–0.91) | 0.74 (0.68–0.80) | 0.83 (0.80–0.86) |
| | Never | 42 | 153 | | | |
| NLP + PPI | Ever | 333 | 82 | 0.97 (0.94–0.98) | 0.58 (0.51–0.65) | 0.83 (0.79–0.86) |
| | Never | 12 | 114 | | | |
| NLP \| PPI | Ever | 329 | 57 | 0.95 (0.93–0.97) | 0.71 (0.64–0.77) | 0.87 (0.83–0.89) |
| | Never | 16 | 139 | | | |
| NLP + PPI + ICD-9 | Ever | 334 | 82 | 0.94 (0.91–0.96) | 0.6 (0.53–0.67) | 0.82 (0.78–0.85) |
| | Never | 21 | 124 | | | |
| NLP \| PPI \| ICD-9 | Ever | 329 | 57 | 0.93 (0.89–0.95) | 0.72 (0.66–0.78) | 0.85 (0.82–0.88) |
| | Never | 26 | 149 | | | |
| Smoking strength | | Heavy | Light | | | |
| NLP | Heavy | 123 | 2 | 0.74 (0.66–0.80) | 0.88 (0.64–0.99) | 0.75 (0.68–0.81) |
| | Light | 44 | 15 | | | |
| PPI | Heavy | 45 | 0 | 0.66 (0.54–0.77) | 1.0 (0.59–1) | 0.69 (0.58–0.79) |
| | Light | 23 | 7 | | | |
| NLP \| PPI | Heavy | 136 | 2 | 0.73 (0.66–0.80) | 0.9 (0.68–0.99) | 0.74 (0.68–0.80) |
| | Light | 52 | 18 | | | |
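Notes:
"+" combinations (eg, NLP + ICD-9): categorized as ever smoker when either NLP or ICD-9 identifies the patient as an ever smoker. The same rule applies to the other "+" combinations.
"|" combinations (eg, NLP | ICD-9): used NLP as the primary source and, if it was not available, used ICD-9. The same rule applies to the other "|" combinations.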
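The two combination rules in the notes, and the point estimates reported in the table, can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, each source is assumed to report "ever", "never", or `None` when it has no information, and the behavior of the "+" rule when no source says "ever" is an assumption. The counts passed to `metrics` are the NLP row of the table, with "ever" as the positive class.

```python
from typing import Optional

Status = Optional[str]  # "ever", "never", or None when the source is silent


def combine_either(*sources: Status) -> Status:
    """'+' rule: ever smoker if any source says ever.

    Falling back to "never" when some source says never, and to None
    when all sources are silent, is an assumption of this sketch.
    """
    if any(s == "ever" for s in sources):
        return "ever"
    if any(s == "never" for s in sources):
        return "never"
    return None


def combine_fallback(*sources: Status) -> Status:
    """'|' rule: take the first source, in priority order, that has a value."""
    for s in sources:
        if s is not None:
            return s
    return None


def metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Point estimates from a 2x2 table (no confidence intervals)."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# NLP row: predicted ever -> 311 manual ever, 53 manual never;
#          predicted never -> 8 manual ever, 124 manual never.
nlp = metrics(tp=311, fp=53, fn=8, tn=124)
print({name: round(value, 2) for name, value in nlp.items()})
```

Rounded to two decimals, the computed values agree with the NLP row (0.97, 0.70, 0.88). The four NLP cells sum to 496 rather than 561, which suggests that patients without an NLP result are left out of that row's denominators.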
Performance of NLP and PPI in identifying accurate smoking status.
| SOURCE | PREDICTED STATUS | MANUAL: CURRENT SMOKER | MANUAL: FORMER SMOKER | MANUAL: SMOKER CURRENT STATUS UNKNOWN | MANUAL: NEVER | UNKNOWN | EXACT MATCH |
|---|---|---|---|---|---|---|---|
| NLP | Current smoker | … | 15 | 83 | 11 | 0 | 0.51 |
| | Former smoker | 36 | … | 22 | 39 | 0 | |
| | Smoker current status unknown | 7 | 9 | … | 3 | 0 | |
| | Never | 2 | 5 | 1 | … | 0 | |
| | Unknown | 3 | 2 | 3 | 7 | … | |
| PPI | Current smoker | … | 17 | 49 | 40 | 0 | 0.42 |
| | Former smoker | 48 | … | 28 | 7 | 0 | |
| | Smoker current status unknown | 2 | 4 | … | 4 | 0 | |
| | Never | 31 | 33 | 18 | … | 0 | |
| | Unknown | 0 | 0 | 0 | 0 | … | |
| NLP \| PPI | Current smoker | … | 17 | 86 | 13 | 0 | 0.53 |
| | Former smoker | 41 | … | 23 | 39 | 0 | |
| | Smoker current status unknown | 7 | 9 | … | 5 | 0 | |
| | Never | 4 | 7 | 5 | … | 0 | |
| | Unknown | 0 | 0 | 0 | 0 | … | |

Note: In the source, numbers in bold indicate accurate results (predicted status equal to manual status). Those diagonal values are missing here and are shown as "…".
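One way to read the exact match column is as the share of patients whose predicted detailed status equals the manually reviewed status. The sketch below illustrates that reading with toy labels; it is an assumption rather than the authors' definition, the function name is hypothetical, and whether patients with an "unknown" prediction stay in the denominator is also assumed.

```python
from collections import Counter
from typing import Sequence


def exact_match(predicted: Sequence[str], manual: Sequence[str]):
    """Return (agreement rate, counts of (predicted, manual) pairs)."""
    if len(predicted) != len(manual):
        raise ValueError("predicted and manual must have the same length")
    if not predicted:
        raise ValueError("at least one patient is required")
    pairs = Counter(zip(predicted, manual))
    agree = sum(n for (p, m), n in pairs.items() if p == m)
    return agree / len(predicted), pairs


# Toy labels for illustration only; these are not data from the study.
predicted = ["current", "former", "never", "current", "unknown"]
manual = ["current", "current", "never", "former", "never"]

rate, pairs = exact_match(predicted, manual)
print(rate)  # 2 of the 5 toy patients agree
print(pairs[("former", "current")])  # off-diagonal cell: predicted former, manual current
```

The `pairs` counter holds the same cells as the table above: keys with equal labels are the diagonal, and all other keys are the off-diagonal disagreements.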