Michelle R Ananda-Rajah, David Martinez, Monica A Slavin, Lawrence Cavedon, Michael Dooley, Allen Cheng, Karin A Thursky.
Abstract
PURPOSE: Prospective surveillance of invasive mold diseases (IMDs) in haematology patients should be standard of care but is hampered by the absence of a reliable laboratory prompt and the difficulty of manual surveillance. We used a high-throughput technology, natural language processing (NLP), to develop a machine learning classifier that screens computed tomography (CT) reports for findings supportive of IMDs. PATIENTS AND METHODS: We conducted a retrospective case-control study of CT reports from the clinical encounter and up to 12 weeks after, from a random subset of 79 of 270 case patients, 33 of whom had probable/proven IMDs by international definitions, and 68 of 257 uninfected control patients identified from 3 tertiary haematology centres. The classifier was trained and tested on a reference standard of 449 physician-annotated reports, including a development subset (n = 366), from a total of 1880 reports, using 10-fold cross-validation, comparing binary and probabilistic predictions to the reference standard to generate sensitivity, specificity and area under the receiver operating characteristic (ROC) curve.
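The 10-fold cross-validation mentioned in the methods can be illustrated with a minimal sketch. The function name `k_fold_indices`, the seed, and the plain random shuffle are assumptions made for illustration. The abstract does not say whether folds were stratified by label or grouped by patient, and this sketch does neither.

```python
import random

def k_fold_indices(n_items, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    folds = [idx[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        # Training set is every index that is not in the current test fold.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

# 366 is the size of the development subset reported in the abstract.
splits = list(k_fold_indices(366, k=10))
fold_sizes = [len(test) for _, test in splits]
print(fold_sizes)
```

Each report appears in exactly one test fold, so every report receives one out-of-fold prediction that can be compared with the reference standard.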
Year: 2014 PMID: 25250675 PMCID: PMC4175456 DOI: 10.1371/journal.pone.0107797
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Characteristics of patients with and without invasive mold diseases (IMDs).
| Characteristic | IMD group n (%) | Control group n (%) |
| No. of patients | 79 | 68 |
| No. of clinical encounters | 79 (51) | 75 (49) |
| Male gender | 48 (61) | 35 (51) |
| Age, mean (range) years | 53 (20–89) | 51 (18–89) |
| Underlying disease | ||
| AML | 32 (41) | 35 (51) |
| ALL | 14 (18) | 14 (19) |
| Lymphoma | 15 (19) | 12 (16) |
| Chronic leukaemia | 7 (8.9) | 1 (1.3) |
| MDS/transformed MDS | 6 (7.6) | 2 (2.7) |
| Multiple myeloma | 3 (3.8) | 3 (4) |
| Other | 2 (2.5) | 5 (6.7) |
| Neutropenia (≤0.5 × 10⁹ cells/L) present | 65 (82) | 56 (75) |
| Median duration of neutropenia (IQR), days | 18 (8–45) | 19 (5–39) |
| HSCT | 36 (46) | 39 (52) |
| Allogeneic | 31/36 (86) | 30/39 (77) |
| Autologous | 5/36 (14) | 9/39 (23) |
| Characteristics of IMDs, n = 79 | NA | |
| Probable/proven IMDs | 33 (42) | |
| Possible IMDs | 46 (58) | |
| Site of infection | ||
| Lung | 67 (85) | |
| Sino-pulmonary | 3 (3.8) | |
| Sinus | 2 (2.5) | |
| Hepatosplenic | 2 (2.5) | |
| Disseminated | 4 (5.1) | |
| Organism | ||
| [name missing] | 13 | |
| Non-fumigatus | 4 | |
| Fungal hyphae resembling … | 3 | |
| [name missing] | 4 | |
| Any positive PCR | 2 | |
| [name missing] | 4 | |
| Other molds (…) | 2 | |
| [name missing] | 1 | |
Abbreviations: AML, acute myeloid leukemia; ALL, acute lymphoblastic leukemia; MDS, myelodysplastic syndrome; IQR, inter-quartile range; HSCT, haematopoietic stem cell transplant.
A clinical encounter was defined as the period from admission to discharge, death or transfer, and for up to 12 weeks afterwards where applicable.
Characteristics of the physician expert annotated and unannotated reports.
| Characteristic | Annotated reports n (%) | Unannotated reports n (%) |
| No. of reports | 449 | 1431 |
| Held-out reports | 83 (18) | NA |
| No. of patients total | 147 | 380 |
| No. of IMD patients | 79 | 191 |
| No. reports from IMD patients | 294 (65) | 905 (63) |
| No. reports from control patients | 155 (35) | 526 |
| Chest (alone or in combination with sinus, abdo/pelvis, brain etc) | 375 (84) | 865 (60) |
| Sinus (alone or brain-sinus, orbits, abdo/pelvis) | 38 (8.5) | 44 |
| Other (abdo, abdo pelvis, liver, aorta, neck) | 36 (8.0) | 408 |
| No. of reports according to study site | ||
| Hospital A | 226 (50) | 713 |
| Hospital B | 131 (29) | 422 |
| Hospital C | 92 (20) | 296 |
| No. of words per report according to study site | ||
| Hospital A | 211 | 229 |
| Hospital B | 126 | 128 |
| Hospital C | 314 | 348 |
Abbreviation: IMD, invasive mold disease.
Held-out reports were annotated at scan level only as being supportive, equivocal or negative for IMD.
Performance characteristics of the classifier.
| Characteristic | TP | FP | TN | FN | Sn, % (95%CI) | Sp, % (95%CI) |
| Development dataset, reports n = 366 | 197 | 32 | 117 | 20 | 91 (86 to 94) | 79 (71 to 84) |
| Held-out dataset, reports n = 83 | 35 | 13 | 30 | 5 | 88 (74 to 95) | 70 (55 to 81) |
| All reports, n = 449 | 232 | 45 | 147 | 25 | 90 (86 to 93) | 77 (70 to 82) |
The held-out dataset was annotated at report level only as being positive, negative or equivocal for IMD.
Abbreviations: TP, true positives; FP, false positives; TN, true negatives; FN, false negatives; Sn, sensitivity; Sp, specificity; CI, confidence interval.
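The sensitivity and specificity columns follow directly from the confusion-matrix counts: Sn = TP / (TP + FN) and Sp = TN / (TN + FP). The sketch below recomputes them from the tabulated counts. The record does not name the confidence interval method, so the Wilson score interval used here is an assumption; under that assumption the rounded output agrees with the tabulated values. The helper names `wilson_interval` and `summarise` are illustrative.

```python
import math

def wilson_interval(successes, total, z=1.96):
    """Two-sided Wilson score interval for a binomial proportion (assumed method)."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half

def summarise(tp, fp, tn, fn):
    """Return Sn and Sp, each as (estimate, lower, upper) in whole percent."""
    sn = (tp / (tp + fn), *wilson_interval(tp, tp + fn))
    sp = (tn / (tn + fp), *wilson_interval(tn, tn + fp))
    return tuple(round(100 * x) for x in sn), tuple(round(100 * x) for x in sp)

# Counts (TP, FP, TN, FN) copied from the performance table.
counts = {
    "development": (197, 32, 117, 20),
    "held_out": (35, 13, 30, 5),
    "all": (232, 45, 147, 25),
}
results = {name: summarise(*c) for name, c in counts.items()}
for name, (sn, sp) in results.items():
    print(f"{name}: Sn {sn[0]} ({sn[1]} to {sn[2]}), Sp {sp[0]} ({sp[1]} to {sp[2]})")
```

The Wilson interval behaves better than the simple normal approximation for proportions close to 0 or 1 and for small samples such as the held-out set.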
Figure 1. Receiver operating characteristic (ROC) curve for 321 inpatient reports comparing the probabilistic output of the classifier to expert opinion.
Area under the ROC curve = 0.90 (95%CI 0.86 to 0.93). Abbreviation: CI, confidence interval.
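The area under the ROC curve equals the probability that a randomly chosen positive report receives a higher classifier score than a randomly chosen negative one, with ties counted as one half. A minimal sketch of that calculation follows. The labels and scores are hypothetical toy values, not study data, and `roc_auc` is an illustrative helper rather than the authors' code.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores (Mann-Whitney)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # A positive outranking a negative scores 1, a tie scores 0.5, otherwise 0.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical expert labels (1 = supportive of IMD) and classifier probabilities.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.95, 0.80, 0.70, 0.65, 0.40, 0.35, 0.20, 0.10]
auc = roc_auc(labels, scores)
print(auc)
```

The pairwise form is quadratic in the number of reports, which is acceptable for a few hundred reports; a rank-based formulation would be preferable for much larger sets.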
Figure 2. Error analysis of reports annotated supportive for invasive mold disease (IMD) but missed by the classifier.
Abbreviation: CT, computed tomography.
Major systematic errors in the false notifications (false positives) for invasive mold diseases among computed tomography reports by the classifier.
| Reason for misinterpretation | No. of reports | Characteristics |
| Inconsequential nodules | 10 | <1 cm nodules, granulomas |
| Abdominal scans | 9 | Non-specific hepatic or splenic lesions |
| Progress scans | 9 | Report focused on change in lesions rather than diagnosis, so experts annotated these reports as negative |
| Non-specific pulmonary/thoracic lesion | 8 | Atelectasis, scarring, mediastinal neoplastic mass |
| Misclassification | 3 | Pulmonary oedema, septic emboli, pulmonary lesions consistent with graft versus host disease |