Huaixiao Tou1, Lu Yao2, Zhongyu Wei3, Xiahai Zhuang1, Bo Zhang4.
Abstract
BACKGROUND: Making accurate patient care decisions as early as possible is a constant challenge, especially for physicians in the emergency department. The increasing volume of electronic medical records (EMRs) opens new horizons for automatic diagnosis. In this paper, we propose to use machine learning approaches for automatic infection detection based on EMRs. Five categories of information are utilized for prediction: personal information, admission notes, vital signs, diagnostic test results and medical image diagnoses.
Keywords: Automatic disease detection; Electronic medical records; Infection detection; Machine learning; Natural language processing
Year: 2018 PMID: 29671399 PMCID: PMC5907141 DOI: 10.1186/s12859-018-2101-x
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 The diagnostic process in clinic
The diagnostic terms defined as infection
| Diagnostic term |
|---|
| Infection |
| Abscess |
| Necrosis |
| Gangrene |
| Pyogenic |
| Sepsis |
| Erysipelatous |
| Pneumonia |
| Pyothorax |
| Mastitis |
| Perforation |
| Peritonitis |
| Acute cholecystitis |
| Gangrenous cholecystitis |
| Acute attacking of chronic cholecystitis |
| Acute cholangitis |
| Acute suppurative cholangitis |
| Acute gangrenous cholangitis |
| Biliary pancreatitis |
| Acute appendicitis |
| Acute suppurated appendicitis |
| Acute gangrened appendicitis |
| Acute purulent gangrenous appendicitis |
| Acute phlegmonous appendicitis |
| Systemic inflammatory response syndrome |
| Sepsis |
| Septic shock |
| Acute attacking of chronic appendicitis |
Performance of automatic annotation approach compared with manual annotation
| Category | PPV | Sensitivity | F1-score |
|---|---|---|---|
| No infection | 0.84 | 0.83 | 0.84 |
| Infection | 0.83 | 0.84 | 0.83 |
| Avg / Total | 0.84 | 0.83 | 0.84 |
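
The three metrics in this table are related by standard definitions: PPV is the fraction of predicted positives that are correct, sensitivity is the fraction of actual positives that are found, and the F1-score is their harmonic mean. A minimal sketch of that arithmetic follows; the function names and the confusion counts in the usage lines are hypothetical and are not taken from the paper.

```python
def ppv(tp: int, fp: int) -> float:
    """Positive predictive value: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def sensitivity(tp: int, fn: int) -> float:
    """Sensitivity (recall): TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(ppv_value: float, sens_value: float) -> float:
    """Harmonic mean of PPV and sensitivity."""
    total = ppv_value + sens_value
    return 2 * ppv_value * sens_value / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical counts, chosen only for illustration.
    p = ppv(tp=84, fp=16)
    s = sensitivity(tp=84, fn=16)
    print(round(p, 2), round(s, 2), round(f1_score(p, s), 2))
```

Because the table reports values rounded to two decimals, recomputing an F1-score from the rounded PPV and sensitivity can differ from the reported figure in the last digit.
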
Fig. 2 The distribution of patients for different infectious diseases in the EMRs
Fig. 3 The distribution of patients in terms of ages by number (above) and by proportion (below)
Fig. 4 The pipeline for preprocessing and feature extraction on EMRs
Fig. 5 Example for EMR narratives tokenization and n-gram generation
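
Tokenization followed by n-gram generation can be sketched as below. This is an illustration only: it assumes English text split on non-alphanumeric characters, whereas the tokenizer actually used on the EMR narratives is not specified in this record. The helper names `tokenize`, `ngrams`, and `ngram_features` are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens: list[str], n: int) -> list[str]:
    """Return all contiguous n-token windows, joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(text: str, max_n: int = 2) -> list[str]:
    """Collect all n-grams of the text for n = 1 .. max_n."""
    tokens = tokenize(text)
    features: list[str] = []
    for n in range(1, max_n + 1):
        features.extend(ngrams(tokens, n))
    return features


if __name__ == "__main__":
    # Phrase borrowed from the feature table; used here only as sample text.
    print(ngram_features("Shifting pain in right quadrant"))
```

Each distinct n-gram can then serve as one text feature, for example as a binary presence indicator per record.
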
Distribution of features in terms of data types
| Feature category | Number | Type |
|---|---|---|
| Personal information | 6 | Number |
| Admission note | 2276 | Text |
| Vital signs | 14 | Number |
| Diagnostic tests | 175 | Number |
| Medical image diagnose | 19 | Text |
| Total | 2490 | Number/Text |
The top six features associated with infection
| Features | EMR components | Correlation coefficient | MIC |
|---|---|---|---|
| Shifting pain in right quadrant | Admission note | 0.32 | 0.09 |
| Malignant tumor (MT) | Admission note | 0.32 | 0.09 |
| Mcburney point | Admission note | 0.23 | 0.04 |
| Fluctuation | Admission note | 0.22 | 0.04 |
| White blood cell count (WBC) | Diagnose test | 0.20 | 0.03 |
| Lymphocytes percentage (LYMPH%) | Diagnose test | -0.12 | 0.04 |
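
A correlation coefficient between one feature and the infection label can be computed as sketched below. The sketch assumes a Pearson coefficient, which this table does not confirm, and it does not compute MIC. The function name and the toy vectors are hypothetical and are not data from the paper.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy data: 1 = feature present / infection, 0 = absent / no infection.
    feature = [1, 1, 0, 0, 1, 0]
    label = [1, 1, 0, 0, 0, 1]
    print(round(pearson(feature, label), 2))
```

A positive value means the feature co-occurs with the infection label more often than not, and a negative value, as for lymphocyte percentage in the table, means the opposite.
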
The results of different feature categories with different models
| Model | Feature type | AUC | F1-score | PPV | Sensitivity |
|---|---|---|---|---|---|
| Random Forest | Personal information (I) | 0.62 | 0.56 | 0.61 | 0.52 |
| | Admission notes (II) | 0.83 | 0.81 | 0.86 | 0.77 |
| | Vital signs (III) | 0.61 | 0.55 | 0.61 | 0.5 |
| | Diagnose tests (IV) | 0.51 | 0.07 | 0.56 | 0.04 |
| | Medical image diagnoses (V) | 0.51 | 0.06 | 0.76 | 0.03 |
| | I & III | 0.68 | 0.63 | 0.68 | 0.59 |
| | I & IV | 0.63 | 0.57 | 0.61 | 0.53 |
| | II & V | 0.84 | 0.81 | 0.86 | 0.77 |
| | II & IV & V | 0.83 | 0.81 | 0.86 | 0.77 |
| | I & II & V | 0.83 | 0.81 | 0.86 | 0.77 |
| | I & II & IV & V | 0.84 | 0.82 | 0.86 | 0.78 |
| | Total (I & II & III & IV & V) | 0.84 | 0.82 | 0.86 | 0.79 |
| Logistic Regression CV | Personal information (I) | 0.67 | 0.62 | 0.65 | 0.59 |
| | Admission note (II) | 0.87 | 0.85 | 0.85 | 0.86 |
| | Vital signs (III) | 0.59 | 0.48 | 0.6 | 0.4 |
| | Diagnose test (IV) | 0.51 | 0.09 | 0.54 | 0.05 |
| | Medical image (V) | 0.51 | 0.06 | 0.8 | 0.03 |
| | I & III | 0.68 | 0.65 | 0.66 | 0.64 |
| | I & IV | 0.68 | 0.65 | 0.65 | 0.65 |
| | II & V | 0.87 | 0.85 | 0.85 | 0.86 |
| | II & IV & V | 0.87 | 0.86 | 0.85 | 0.87 |
| | I & II & V | 0.87 | 0.85 | 0.85 | 0.86 |
| | I & II & IV & V | 0.87 | 0.85 | 0.85 | 0.86 |
| | Total (I & II & III & IV & V) | 0.87 | 0.86 | 0.86 | 0.87 |
| Bernoulli NB | Personal information (I) | 0.58 | 0.58 | 0.53 | 0.65 |
| | Admission note (II) | 0.65 | 0.69 | 0.55 | 0.93 |
| | Vital signs (III) | 0.6 | 0.52 | 0.59 | 0.46 |
| | Diagnose test (IV) | 0.55 | 0.63 | 0.48 | 0.9 |
| | Medical image (V) | 0.51 | 0.06 | 0.71 | 0.03 |
| | I & III | 0.6 | 0.52 | 0.59 | 0.46 |
| | I & IV | 0.55 | 0.63 | 0.48 | 0.9 |
| | II & V | 0.66 | 0.7 | 0.56 | 0.93 |
| | II & IV & V | 0.67 | 0.71 | 0.57 | 0.93 |
| | I & II & V | 0.66 | 0.7 | 0.56 | 0.93 |
| | I & II & IV & V | 0.67 | 0.71 | 0.57 | 0.93 |
| | Total (I & II & III & IV & V) | 0.68 | 0.71 | 0.58 | 0.93 |
| Gradient Boosting Classifier | Personal information (I) | 0.66 | 0.62 | 0.66 | 0.58 |
| | Admission note (II) | 0.87 | 0.85 | 0.85 | 0.86 |
| | Vital signs (III) | 0.65 | 0.6 | 0.63 | 0.58 |
| | Diagnose test (IV) | 0.51 | 0.09 | 0.59 | 0.05 |
| | Medical image (V) | 0.51 | 0.06 | 0.78 | 0.03 |
| | I & III | 0.72 | 0.69 | 0.69 | 0.69 |
| | I & IV | 0.67 | 0.63 | 0.65 | 0.62 |
| | II & V | 0.87 | 0.85 | 0.85 | 0.86 |
| | II & IV & V | 0.87 | 0.86 | 0.85 | 0.87 |
| | I & II & V | 0.87 | 0.86 | 0.86 | 0.86 |
| | I & II & IV & V | 0.87 | 0.86 | 0.86 | 0.87 |
| | Total (I & II & III & IV & V) | 0.88 | 0.86 | 0.86 | 0.87 |