| Literature DB >> 27267768 |
Aaron J Masino1, Robert W Grundmeier2,3, Jeffrey W Pennington2, John A Germiller4,5, E Bryan Crenshaw4,5.
Abstract
BACKGROUND: Radiology reports are a rich resource for biomedical research. Prior to utilization, trained experts must manually review reports to identify discrete outcomes. The Audiological and Genetic Database (AudGenDB) is a public, de-identified research database that contains over 16,000 radiology reports. Because the reports are unlabeled, it is difficult to select those with specific abnormalities. We implemented a classification pipeline using a human-in-the-loop machine learning approach and open source libraries to label the reports with one or more of four abnormality region labels: inner, middle, outer, and mastoid, indicating the presence of an abnormality in the specified ear region.Entities:
Keywords: Audiology; Human-in-the-loop; Machine learning; Natural language processing; Radiology
Year: 2016 PMID: 27267768 PMCID: PMC4896018 DOI: 10.1186/s12911-016-0306-3
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 Web service and classification pipeline architecture. Client requests include radiology reports that are first normalized and then classified by four region-specific models. Label values are returned to the client via an HTTP response
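The architecture in Fig. 1 can be sketched as a simple normalize-then-classify loop over the four region models. This is an illustrative sketch, not the paper's implementation: the `normalize` rules, function names, and model interface are assumptions.

```python
# Hypothetical sketch of the Fig. 1 pipeline: normalize each report, then
# apply four independent binary classifiers (one per ear region).
# Normalization rules and names are illustrative, not from the paper.

REGIONS = ["inner", "middle", "outer", "mastoid"]

def normalize(report: str) -> str:
    """Lowercase and collapse whitespace (illustrative normalization)."""
    return " ".join(report.lower().split())

def classify(report: str, models: dict) -> dict:
    """Return a {region: 0/1 abnormality label} dict for one report.

    Each model is assumed to expose a scikit-learn-style predict() that
    accepts a list of texts and returns one label per text.
    """
    text = normalize(report)
    return {region: models[region].predict([text])[0] for region in REGIONS}
```

In a web-service setting, `classify` would be called once per report in the client request and the resulting label dict serialized into the HTTP response.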
Abnormal annotation distribution
| Region | Training set | Test set |
|---|---|---|
| At least one | 62.41 % (362) | 62.33 % (91) |
| Inner | 26.72 % (155) | 26.03 % (38) |
| Middle | 37.59 % (218) | 40.41 % (59) |
| Outer | 13.79 % (80) | 14.38 % (21) |
| Mastoid | 30.86 % (179) | 36.99 % (54) |
Column values indicate the percentage of documents (values in parentheses indicate the absolute number of documents) that were labeled as abnormal for the given region. A document as a whole was considered abnormal if it contained an abnormality in at least one region. The training and test sets contain a total of 580 and 146 documents, respectively
Best classifier hyperparameters by ear region
| Region | Best classifier | n-gram Range | Word/Character | Model Hyperparameters |
|---|---|---|---|---|
| Inner Ear | SVM (Linear) | 1–2 | Word | Cost parameter, C = 0.1 |
| Middle Ear | Logistic Regression | 1–3 | Word | Regularization cost, λ = 0.1 |
| Outer Ear | SVM (Linear) | 1–3 | Word | Cost parameter, C = 0.333 |
| Mastoid | Decision Tree | 1–3 | Character | Max depth = 2 |
The n-gram range and word/character analyzer columns are feature vector hyperparameters; the final column gives the learning algorithm's hyperparameters
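The table above maps directly onto per-region text-classification pipelines. The sketch below assumes scikit-learn (the paper only says "open source libraries") and a plain count vectorizer; the actual feature weighting and solver settings used by the authors are not specified here.

```python
# Sketch of the per-region models with the reported hyperparameters,
# assuming scikit-learn components (an assumption; the paper does not
# name the library or the exact vectorizer).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

models = {
    # Inner ear: word 1-2 grams, linear SVM with cost C = 0.1
    "inner": Pipeline([
        ("vec", CountVectorizer(analyzer="word", ngram_range=(1, 2))),
        ("clf", LinearSVC(C=0.1)),
    ]),
    # Middle ear: word 1-3 grams, logistic regression
    "middle": Pipeline([
        ("vec", CountVectorizer(analyzer="word", ngram_range=(1, 3))),
        ("clf", LogisticRegression()),
    ]),
    # Outer ear: word 1-3 grams, linear SVM with cost C = 0.333
    "outer": Pipeline([
        ("vec", CountVectorizer(analyzer="word", ngram_range=(1, 3))),
        ("clf", LinearSVC(C=0.333)),
    ]),
    # Mastoid: character 1-3 grams, depth-2 decision tree
    "mastoid": Pipeline([
        ("vec", CountVectorizer(analyzer="char", ngram_range=(1, 3))),
        ("clf", DecisionTreeClassifier(max_depth=2)),
    ]),
}
```

Each pipeline is trained independently on the region's binary normal/abnormal labels, which is what makes a single report able to carry multiple region labels at once.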
Best classifier test set performance metrics
| Region | Inner | Middle | Outer | Mastoid |
|---|---|---|---|---|
| Accuracy | 90 % (+16.0) | 90 % (+30.4) | 93 % (+7.38) | 82 % (+19.0) |
| F1 Score | 0.82 | 0.85 | 0.71 | 0.74 |
| NPV | 0.94 | 0.85 | 0.93 | 0.83 |
| PPV | 0.82 | 1.0 | 0.92 | 0.80 |
| Sensitivity | 0.82 | 0.75 | 0.57 | 0.69 |
| Specificity | 0.94 | 1.0 | 0.99 | 0.90 |
The values in parentheses in the accuracy row are the percent difference compared to the majority class. NPV is negative predictive value, PPV is positive predictive value. The best classifiers by region were SVM (linear) for the inner and outer ear, logistic regression for the middle ear, and decision tree for the mastoid
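The parenthetical gains in the accuracy row can be reproduced from the abnormal annotation distribution table: since "normal" is the majority class in every region, the majority-class baseline accuracy is 100 % minus the test-set abnormal prevalence. A quick arithmetic check:

```python
# Reproducing the "percent difference vs. majority class" values in the
# accuracy row from the test-set abnormal prevalences reported earlier.
test_prevalence = {"inner": 26.03, "middle": 40.41,
                   "outer": 14.38, "mastoid": 36.99}  # % abnormal
accuracy = {"inner": 90.0, "middle": 90.0,
            "outer": 93.0, "mastoid": 82.0}           # % correct

for region, prev in test_prevalence.items():
    majority = 100.0 - prev             # accuracy of always predicting "normal"
    gain = accuracy[region] - majority  # improvement over that baseline
    print(f"{region}: baseline {majority:.2f} %, gain +{gain:.2f}")
```

For example, the middle ear baseline is 100 − 40.41 = 59.59 %, so the classifier's 90 % accuracy is a +30.41 point gain, matching the +30.4 in the table.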
Best classifier confusion matrices by ear region
| Inner Ear: SVM Linear Kernel | Predicted Normal | Predicted Abnormal |
|---|---|---|
| Actual Normal | 101 | 7 |
| Actual Abnormal | 7 | 31 |

| Middle Ear: Logistic Regression | Predicted Normal | Predicted Abnormal |
|---|---|---|
| Actual Normal | 87 | 0 |
| Actual Abnormal | 15 | 44 |

| Outer Ear: SVM Linear Kernel | Predicted Normal | Predicted Abnormal |
|---|---|---|
| Actual Normal | 124 | 1 |
| Actual Abnormal | 9 | 12 |

| Mastoid: Decision Tree | Predicted Normal | Predicted Abnormal |
|---|---|---|
| Actual Normal | 83 | 9 |
| Actual Abnormal | 17 | 37 |
Test set confusion matrices for the best learning algorithm for each ear region. Each matrix gives the counts of correctly and incorrectly labeled normal and abnormal documents, comparing the classifier's output against the expert (actual) labels
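All of the metrics in the performance table follow directly from these confusion matrix counts. Taking the inner ear matrix (TN = 101, FP = 7, FN = 7, TP = 31) as a worked example:

```python
# Deriving the reported inner-ear metrics from its confusion matrix.
tn, fp, fn, tp = 101, 7, 7, 31

sensitivity = tp / (tp + fn)                # 31/38   -> reported 0.82
specificity = tn / (tn + fp)                # 101/108 -> reported 0.94
ppv = tp / (tp + fp)                        # positive predictive value, 0.82
npv = tn / (tn + fn)                        # negative predictive value, 0.94
accuracy = (tp + tn) / (tp + tn + fp + fn)  # 132/146 -> reported 90 %
f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean, 0.82
```

The same arithmetic applied to the other three matrices reproduces the remaining rows of the metrics table (e.g. the middle ear's zero false positives give its perfect specificity and PPV of 1.0).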
Fig. 2 Best classifier learning curves. From top left to bottom right, best classifier model learning curves for the inner ear (linear SVM), middle ear (logistic regression), outer ear (linear SVM), and mastoid (decision tree). The curves show the training and validation accuracy as a function of the training set size. Performance is evaluated by 5-fold cross validation. The green (red) curves indicate performance on the training (cross-validation) report sets. Each data point (circles) is the average accuracy value over the 5 folds. The shaded region indicates the standard deviation
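Curves like those in Fig. 2 are commonly generated with scikit-learn's `learning_curve` utility; this is a sketch of that approach, not the authors' code, and the estimator and train-size grid here are assumptions.

```python
# Sketch of computing learning-curve points as in Fig. 2, assuming
# scikit-learn's learning_curve (the paper does not name the function).
import numpy as np
from sklearn.model_selection import learning_curve
from sklearn.svm import LinearSVC

def curve_points(X, y):
    """Mean/std train and validation accuracy vs. training set size."""
    sizes, train_scores, val_scores = learning_curve(
        LinearSVC(C=0.1), X, y, cv=5, scoring="accuracy",
        train_sizes=np.linspace(0.1, 1.0, 10))
    # Means give the plotted points; stds give the shaded regions in Fig. 2.
    return (sizes,
            train_scores.mean(axis=1), train_scores.std(axis=1),
            val_scores.mean(axis=1), val_scores.std(axis=1))
```

Plotting training versus validation accuracy this way is what lets the authors judge whether more labeled reports would help (curves still converging) or whether the model has plateaued.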
Keyword and ICD9 search method performance
| Region | Best classifier (Accuracy) | Keyword (Accuracy) | ICD9 (Accuracy) | Best classifier (F1) | Keyword (F1) | ICD9 (F1) |
|---|---|---|---|---|---|---|
| Inner | 90 % | 75 % | 60 % | 0.82 | 0.42 | 0.60 |
| Middle | 90 % | 67 % | 62 % | 0.85 | 0.67 | 0.15 |
| Outer | 93 % | 86 % | 86 % | 0.71 | 0.33 | 0.0 |
| Mastoid | 82 % | 65 % | 68 % | 0.73 | 0.14 | 0.28 |
Fig. 3 Logistic regression receiver operating characteristic (ROC) by region. The ROC curves for the best performing logistic regression model for each ear region. The dashed line is the expected performance of a random binary classifier. Area Under the Curve (AUC) values closer to 1.0 indicate higher performance, with low false positive and false negative rates
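ROC curves like these are computed by sweeping a decision threshold over the model's predicted abnormal-class probabilities. A minimal sketch, assuming scikit-learn's `roc_curve` and `auc` utilities (the paper does not state which tooling produced Fig. 3):

```python
# Sketch of an ROC/AUC computation for one region's logistic regression,
# assuming scikit-learn metrics utilities.
from sklearn.metrics import auc, roc_curve

def region_roc(y_true, y_prob):
    """ROC points and AUC from abnormal-class probabilities.

    y_true: 0/1 labels (1 = abnormal); y_prob: predicted P(abnormal).
    """
    fpr, tpr, thresholds = roc_curve(y_true, y_prob)
    return fpr, tpr, auc(fpr, tpr)
```

In practice `y_prob` would come from the fitted model's `predict_proba(...)[:, 1]`, and the returned `fpr`/`tpr` arrays are exactly the points traced by each curve in Fig. 3.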