Jeremy Petch, Jane Batt, Joshua Murray, Muhammad Mamdani.
Abstract
BACKGROUND: The increasing adoption of electronic health records (EHRs) in clinical practice holds the promise of improving care and advancing research by serving as a rich source of data, but most EHRs allow clinicians to enter data in a text format without much structure. Natural language processing (NLP) may reduce reliance on manual abstraction of these text data by extracting clinical features directly from unstructured clinical digital text data and converting them into structured data.
Keywords: electronic health record; natural language processing; tuberculosis
Year: 2019 PMID: 31682579 PMCID: PMC6913750 DOI: 10.2196/12575
Source DB: PubMed Journal: JMIR Med Inform
Figure 1. Natural language processing (NLP) workflow using the DARWEN tool. PRP: pronoun; VB: verb; RB: adverb; JJ: adjective; CC: coordinating conjunction; DT: determiner; NN: noun; IN: preposition; TB: tuberculosis; TST: Tuberculin Skin Test.
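To make the idea of converting free-text notes into structured fields concrete, the following is a minimal rule-based sketch. It is not the DARWEN tool and does not reproduce its workflow; the drug list, the regular expressions, the function name `extract_features`, and the sample note are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical drug vocabulary (illustration only, not the study's drug list).
DRUGS = ["isoniazid", "rifampin", "ethambutol", "pyrazinamide"]

# Hypothetical patterns for a categorical feature; the first match wins.
SMOKING_PATTERNS = [
    ("former smoker", re.compile(r"\b(former|ex)[- ]smoker\b|\bquit smoking\b", re.I)),
    ("current smoker", re.compile(r"\bcurrent(ly)? smok\w+", re.I)),
    ("never smoker", re.compile(r"\bnever smoked\b|\bnon[- ]?smoker\b", re.I)),
]


def extract_features(note: str) -> dict:
    """Map a free-text note to a small structured record."""
    # Text mapped to a drug list: keep every listed drug mentioned in the note.
    drugs = [d for d in DRUGS if re.search(rf"\b{d}\b", note, re.I)]

    # Categorical feature: take the first pattern that matches, else None.
    smoking = None
    for label, pattern in SMOKING_PATTERNS:
        if pattern.search(note):
            smoking = label
            break

    return {"smoking_status": smoking, "drug_treatment": drugs}


if __name__ == "__main__":
    note = "Former smoker. Started on isoniazid and rifampin; tolerating well."
    print(extract_features(note))
```

Simple keyword rules like these ignore negation and context (for example, "not a current smoker" would be misclassified), which is one reason a fuller NLP pipeline with linguistic analysis may be needed for more complex features.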
Feature categorization based on a priori assessment of clinical and linguistic complexity.
| Feature complexity and feature | Type | Examples |
| --- | --- | --- |
| Simple | | |
| Country of birth | Country | India; Indonesia |
| Date of immigration | Date | 30/06/2013 |
| Smoking status | Categorical | Current smoker; former smoker |
| Drug treatment | Text mapped to drug list | Isoniazid; rifampin |
| Moderate | | |
| HIV status | Binary | Positive/negative |
| Known TBᵃ exposure | Binary | Yes/no |
| Previous TB | Binary | Yes/no |
| Method of diagnosis | Categorical | Culture positive; polymerase chain reaction positive |
| TB sensitivities | Categorical | Fully sensitive; isoniazid resistant |
| Complex | | |
| Diagnosis | Categorical | Active TB; latent TB infection |
| Sputum conversion date | Date | 22/07/2016 |
| Adverse drug reactions | Categorical | Peripheral neuropathy; rash |
| Medical risk factors | Categorical | Chemotherapy; renal failure |
| Social risk factors | Categorical | Refugee camp resident; jail inmate |
| Disease extent | Categorical | Pulmonary acid fast bacilli smear positive; disseminated |

ᵃTB: tuberculosis.
Primary and secondary outcomes for natural language processing (index analysis) compared with manual chart review (reference standard analysis).
| Feature complexity | Primary outcome: overall accuracy, % (95% CI) | Sensitivity/recall, % (SD) | Specificity, % (SD) | Positive predictive value/precision, % (SD) | Negative predictive value, % (SD) |
| --- | --- | --- | --- | --- | --- |
| Simple | 96.3 (94.3-97.6) | 93.8 (7.7) | 99.7 (0.5) | 96.4 (6.4) | 99.0 (1.7) |
| Moderate | 92.9 (91.1-94.4) | 60.2 (38.6) | 94.2 (5.0) | 70.2 (33.7) | 95.6 (6.6) |
| Complex | 90.6 (87.3-93.1) | 73.8 (45.7) | 89.2 (8.3) | 53.6 (37.4) | 98.4 (2.9) |
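The outcome measures reported here can all be derived from a 2×2 comparison of the NLP output against the chart-review reference. The sketch below shows the standard definitions; the function names and the example counts are hypothetical and are not taken from the study data. Ratios with a zero denominator are returned as `None`, which is one plausible way for "not applicable" entries to arise when a feature has no positive cases.

```python
from typing import Dict, Optional


def safe_div(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator, or None when the ratio is undefined."""
    return numerator / denominator if denominator else None


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, Optional[float]]:
    """Standard 2x2 metrics for an index test against a reference standard.

    tp/fp/fn/tn are counts of true positives, false positives,
    false negatives, and true negatives.
    """
    return {
        "accuracy": safe_div(tp + tn, tp + fp + fn + tn),
        "sensitivity": safe_div(tp, tp + fn),   # recall
        "specificity": safe_div(tn, tn + fp),
        "ppv": safe_div(tp, tp + fp),           # precision
        "npv": safe_div(tn, tn + fn),
    }


if __name__ == "__main__":
    # Hypothetical counts for one binary feature.
    for name, value in diagnostic_metrics(tp=8, fp=2, fn=1, tn=39).items():
        print(name, None if value is None else round(value, 2))
```

With a rare feature, accuracy and negative predictive value can stay high even when sensitivity is poor, which is why the secondary outcomes are worth reading alongside the primary one.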
Primary and secondary outcomes for natural language processing (index analysis) compared with manual chart review (reference standard analysis) at the clinical feature level.
| Feature | Primary outcome: overall accuracy (95% CI) | Sensitivity/recall (SD)ᵃ | Specificity (SD)ᵃ | Positive predictive value/precision (SD)ᵃ | Negative predictive value (SD)ᵃ |
| --- | --- | --- | --- | --- | --- |
| | | | | | |
| Country of birth | 0.92 (0.80-0.98) | 0.88 (0.32) | 0.99 (0.01) | 0.97 (0.11) | 0.99 (0.01) |
| Year of immigration | 0.90 (0.78-0.97) | 0.89 (0.29) | 0.99 (0.02) | 0.98 (0.08) | 0.99 (0.01) |
| Smoking status | 0.94 (0.83-0.99) | 0.92 (0.08) | 0.98 (0.03) | 0.85 (0.30) | 0.97 (0.02) |
| Sputum conversion date | 0.98 (0.89-0.99) | 0.80 (0.45) | 0.99 (0.01) | 0.99 (0.01) | 0.99 (0.01) |
| Pyrazinamide | 0.96 (0.86-0.99) | 1.00 | 0.85 | 0.95 | 1.00 |
| Moxifloxacin | 1.00 (0.93-1.00) | 1.00 | 1.00 | 1.00 | 1.00 |
| Vitamin B6 | 0.92 (0.80-0.98) | 1.00 | 0.86 | 0.84 | 1.00 |
| Rifampin | 1.00 (0.93-1.00) | 1.00 | 1.00 | 1.00 | 1.00 |
| Ethambutol | 1.00 (0.93-1.00) | 1.00 | 1.00 | 1.00 | 1.00 |
| Isoniazid | 1.00 (0.93-1.00) | 1.00 | 1.00 | 1.00 | 1.00 |
| Levofloxacin | 0.98 (0.89-0.99) | N/Aᵇ | 0.98 | N/A | N/A |
| | | | | | |
| HIV status | 0.94 (0.83-0.99) | 0.94 | 0.94 | 0.89 | 0.97 |
| TBᶜ contact | 0.82 (0.68-0.91) | 0.80 | 0.82 | 0.67 | 0.90 |
| Old TB | 0.94 (0.83-0.99) | 0.71 | 0.98 | 0.83 | 0.95 |
| Culture positive | 0.88 (0.75-0.95) | 0.33 | 1.00 | 1.00 | 0.87 |
| Polymerase chain reaction positive | 1.00 (0.93-1.00) | 1.00 | 1.00 | 1.00 | 1.00 |
| Clinical diagnosis | 1.00 (0.93-1.00) | 1.00 | 1.00 | 1.00 | 1.00 |
| Drug sensitivity | 0.92 (0.80-0.98) | 0.81 (0.27) | 0.97 (0.04) | 0.73 (0.25) | 0.91 (0.14) |
| Corticosteroids | 0.98 (0.89-0.99) | N/A | 0.98 | N/A | N/A |
| Chemotherapy | 0.94 (0.83-0.99) | 0.50 | 0.96 | 0.33 | 0.98 |
| Other immunosuppressive drugs | 0.76 (0.61-0.87) | 0.08 | 0.97 | 0.50 | 0.77 |
| Cancer | 0.92 (0.80-0.98) | 1.00 | 0.91 | 0.33 | 1.00 |
| Diabetes | 0.98 (0.89-0.99) | 0.86 | 1.00 | 1.00 | 0.98 |
| Malnutrition | 0.94 (0.83-0.99) | 0.00 | 0.98 | 0.00 | 0.96 |
| Other immunosuppressive conditions | 0.82 (0.68-0.91) | 0.10 | 1.00 | 1.00 | 0.81 |
| Marginalized | 0.96 (0.86-0.99) | 0.66 (0.57) | 0.93 (0.12) | 0.99 (0.02) | 0.91 (0.14) |
| Health care facility | 0.90 (0.78-0.97) | 0.38 (0.48) | 0.95 (0.08) | 0.95 (0.08) | 0.97 (0.03) |
| | | | | | |
| Positive | 0.92 (0.80-0.98) | 0.25 | 0.98 | 0.50 | 0.93 |
| Negative | 0.96 (0.86-0.99) | 1.00 | 0.96 | 0.67 | 1.00 |
| Extrapulmonary (other than lymphadenitis) | 0.88 (0.75-0.96) | 0.00 | 0.96 | 0.00 | 0.91 |
| Lymphadenitis | 0.94 (0.83-0.99) | N/A | 0.94 | N/A | N/A |
| Disseminated | 0.96 (0.86-0.99) | 0.00 | 1.00 | N/A | 0.96 |
| | | | | | |
| Active TB disease | 1.00 (0.93-1.00) | 1.00 | 1.00 | 1.00 | 1.00 |
| Latent TB infection | 0.84 (0.70-0.93) | 0.90 | 0.79 | 0.76 | 0.92 |
| Pulmonary nontuberculous mycobacteria | 0.88 (0.75-0.95) | 1.00 | 0.87 | 0.25 | 1.00 |
| | | | | | |
| Gastrointestinal | 0.84 (0.70-0.93) | 1.00 | 0.76 | 0.65 | 1.00 |
| Peripheral neuropathy | 0.96 (0.86-0.99) | 1.00 | 0.95 | 0.78 | 1.00 |
| Rash | 0.90 (0.78-0.97) | 1.00 | 0.89 | 0.50 | 1.00 |
| Other | 0.94 (0.83-0.99) | 0.00 | 0.98 | 0.00 | 0.96 |
| Ocular toxicity | 0.90 (0.75-0.97) | 0.00 | 0.92 | 0.00 | 0.98 |

ᵃValues in parentheses are standard deviations.
ᵇN/A: not applicable.
ᶜTB: tuberculosis.
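Each accuracy value above is reported with a 95% confidence interval. The text shown here does not state which interval method was used, so the sketch below uses the Wilson score interval purely as one common choice for a binomial proportion; the function name and the counts in the example are hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")

    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom

    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half_width), min(1.0, center + half_width)


if __name__ == "__main__":
    # Hypothetical example: 46 of 50 records classified correctly.
    low, high = wilson_interval(46, 50)
    print(f"accuracy = {46 / 50:.2f}, 95% CI = ({low:.2f}-{high:.2f})")
```

Other methods, such as the exact Clopper-Pearson interval, give slightly different bounds for the same counts, so small discrepancies from published intervals would not by themselves indicate an error.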
Primary and secondary outcomes for natural language processing compared with manual chart review, adjusted for results of adjudication.
| Feature complexity | Primary outcome: overall accuracy, % (95% CI) | Sensitivity/recall, % (SD) | Specificity, % (SD) | Positive predictive value/precision, % (SD) | Negative predictive value, % (SD) |
| --- | --- | --- | --- | --- | --- |
| Simple | 97.8 (96.1-98.7) | 96.4 (5.4) | 99.8 (0.5) | 98.3 (4.5) | 99.2 (1.7) |
| Moderate | 96.2 (94.8-97.3) | 78.2 (25.0) | 93.3 (4.7) | 92.7 (14.7) | 97.2 (3.2) |
| Complex | 94.1 (91.3-96.1) | 86.3 (35.0) | 92.8 (8.2) | 70.5 (34.2) | 98.7 (2.9) |
Primary and secondary outcomes for natural language processing compared with manual chart review, adjusted for results of adjudication at the clinical feature level.
| Feature | Primary outcome: overall accuracy (95% CI) | Sensitivity/recall (SD)ᵃ | Specificity (SD)ᵃ | Positive predictive value/precision (SD)ᵃ | Negative predictive value (SD)ᵃ |
| --- | --- | --- | --- | --- | --- |
| | | | | | |
| Country of birth | 0.94 (0.83-0.99) | 0.91 (0.28) | 0.99 (0.01) | 0.98 (0.10) | 0.99 (0.01) |
| Year of immigration | 0.92 (0.80-0.98) | 0.92 (0.23) | 0.99 (0.02) | 0.99 (0.06) | 0.99 (0.01) |
| Smoking status | 0.94 (0.83-0.99) | 0.92 (0.08) | 0.98 (0.03) | 0.85 (0.30) | 0.97 (0.02) |
| Sputum year | 1.00 (0.93-1.00) | 1.00 | 1.00 | 1.00 | 1.00 |
| Pyrazinamide | 0.96 (0.86-0.99) | 1.00 | 0.85 | 0.95 | 1.00 |
| Moxifloxacin | 1.00 (0.93-1.00) | 1.00 | 1.00 | 1.00 | 1.00 |
| Vitamin B6 | 0.92 (0.80-0.98) | 1.00 | 0.86 | 0.84 | 1.00 |
| Rifampin | 1.00 (0.93-1.00) | 1.00 | 1.00 | 1.00 | 1.00 |
| Ethambutol | 1.00 (0.93-1.00) | 1.00 | 1.00 | 1.00 | 1.00 |
| Isoniazid | 1.00 (0.93-1.00) | 1.00 | 1.00 | 1.00 | 1.00 |
| Levofloxacin | 1.00 (0.93-1.00) | 1.00 | 1.00 | 1.00 | 1.00 |
| | | | | | |
| HIV status | 0.98 (0.89-0.99) | 0.95 | 1.00 | 1.00 | 0.97 |
| TBᵇ contact | 0.86 (0.73-0.94) | 0.92 | 0.83 | 0.67 | 0.97 |
| Old TB | 0.96 (0.86-0.99) | 0.75 | 1.00 | 1.00 | 0.95 |
| Culture positive | 0.88 (0.75-0.95) | 0.33 | 1.00 | 1.00 | 0.87 |
| Polymerase chain reaction positive | 1.00 (0.93-1.00) | 1.00 | 1.00 | 1.00 | 1.00 |
| Clinical diagnosis | 1.00 (0.93-1.00) | 1.00 | 1.00 | 1.00 | 1.00 |
| Drug sensitivity | 0.96 (0.86-0.99) | 0.98 (0.03) | 0.99 (0.01) | 0.80 (0.26) | 0.94 (0.10) |
| Corticosteroids | 1.00 (0.93-1.00) | 1.00 | 1.00 | 1.00 | 1.00 |
| Chemotherapy | 0.98 (0.89-0.99) | 0.75 | 1.00 | 1.00 | 0.98 |
| Other immunosuppressive drugs | 0.98 (0.89-0.99) | 0.67 | 1.00 | 1.00 | 0.98 |
| Cancer | 1.00 (0.93-1.00) | 1.00 | 1.00 | 1.00 | 1.00 |
| Diabetes | 0.98 (0.89-0.99) | 0.86 | 1.00 | 1.00 | 0.98 |
| Malnutrition | 0.94 (0.83-0.99) | 0.00 | 0.98 | 0.00 | 0.96 |
| Other immunosuppressive conditions | 0.98 (0.89-0.99) | 0.50 | 1.00 | 1.00 | 0.98 |
| Marginalized | 0.98 (0.89-0.99) | 0.75 (0.50) | 0.95 (0.10) | 0.99 (0.01) | 0.99 (0.01) |
| Health care facility | 0.92 (0.80-0.97) | 0.50 (0.50) | 0.86 (0.29) | 0.95 (0.06) | 0.97 (0.03) |
| | | | | | |
| Positive | 0.92 (0.80-0.98) | 0.25 | 0.98 | 0.50 | 0.93 |
| Negative | 0.96 (0.86-0.99) | 1.00 | 0.95 | 0.67 | 1.00 |
| Extrapulmonary (other than lymphadenitis) | 0.96 (0.86-0.99) | 0.50 | 1.00 | 1.00 | 0.95 |
| Lymphadenitis | 1.00 (0.93-1.00) | 1.00 | 1.00 | 1.00 | 1.00 |
| Disseminated | 1.00 (0.93-1.00) | N/Aᶜ | 1.00 | N/A | N/A |
| | | | | | |
| Active TB disease | 1.00 (0.93-1.00) | 1.00 | 1.00 | 1.00 | 1.00 |
| Latent TB infection | 0.84 (0.70-0.93) | 0.90 | 0.79 | 0.76 | 0.92 |
| Pulmonary nontuberculous mycobacteria | 1.00 (0.93-1.00) | 1.00 | 1.00 | 1.00 | 1.00 |
| | | | | | |
| Gastrointestinal | 0.90 (0.78-0.97) | 1.00 | 0.84 | 0.78 | 1.00 |
| Peripheral neuropathy | 1.00 (0.93-1.00) | 1.00 | 1.00 | 1.00 | 1.00 |
| Rash | 0.90 (0.78-0.97) | 1.00 | 0.89 | 0.50 | 1.00 |
| Other | 0.97 (0.89-0.99) | 1.00 | 0.98 | 0.50 | 1.00 |
| Ocular toxicity | 0.90 (0.75-0.97) | 0.00 | 0.92 | 0.00 | 0.98 |

ᵃValues in parentheses are standard deviations.
ᵇTB: tuberculosis.
ᶜN/A: not applicable.