Joshua Levy1,2,3, Nishitha Vattikonda4, Christian Haudenschild5, Brock Christensen2,6,7, Louis Vaickus1.
Abstract
BACKGROUND: Pathology reports serve as an auditable trail of a patient's clinical narrative, containing text pertaining to diagnosis, prognosis, and specimen processing. Recent works have utilized natural language processing (NLP) pipelines, which include rule-based or machine-learning analytics, to uncover textual patterns that inform clinical endpoints and biomarker information. Although deep learning methods have come to the forefront of NLP, there have been limited comparisons with the performance of other machine-learning methods in extracting key insights for the prediction of medical procedure information, which is used to inform reimbursement for pathology departments. In addition, the utility of combining and ranking information from multiple report subfields, as compared with exclusively using the diagnostic field, for the prediction of Current Procedural Terminology (CPT) codes and signing pathologists remains unclear.
Keywords: BERT; XGBoost; current procedural terminology; deep learning; machine learning; pathology reports
Year: 2022 PMID: 35127232 PMCID: PMC8802304 DOI: 10.4103/jpi.jpi_52_21
Source DB: PubMed Journal: J Pathol Inform
Recording the percent missingness of each report subsection before removing reports lacking a diagnostic section. Summary measures (median, 1st quartile, 3rd quartile) for the number of words in each document subsection (where the subfield existed) and the percentage of documents whose length exceeded 512 words
| Report subfield | Missingness before removal | Median word count | 1st Q Word count | 3rd Q Word count | Exceeds BERT max words |
|---|---|---|---|---|---|
| ADDENDUM DISCUSSION | 96.2% | 84 | 47 | 121 | 0.000% |
| ADDITIONAL STUDIES | 86.6% | 78 | 11 | 92 | 0.039% |
| CLINICAL INFORMATION | 5.3% | 25 | 18 | 61 | 0.000% |
| DIAGNOSIS | 3.5% | 23 | 13 | 28 | 0.022% |
| DISCUSSION | 81.7% | 36 | 16 | 68 | 0.017% |
| FINAL DIAGNOSIS | 99.9% | 235 | 186 | 313 | 0.000% |
| FROZEN SECTION | 99.4% | 2 | 1 | 11 | 0.000% |
| FROZEN SECTION DIAGNOSIS | 99.3% | 20 | 12 | 33 | 0.000% |
| INTERPRETATION | 99.9% | 135 | 91 | 141 | 0.000% |
| RESULTS | 97.9% | 248 | 216 | 268 | 2.198% |
| SPECIMEN PROCESSING | 34.4% | 38 | 27 | 64 | 0.389% |
| Complete Text ( | 0% | 119 | 68 | 158 | 1.768% |
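The summary columns in this table can be reproduced for any single subfield with a few lines of standard-library Python. The sketch below is an illustration only: the function name `summarize_subfield`, the whitespace-based word split, and the "inclusive" quartile method are assumptions, not details confirmed by the source; the 512-word threshold is taken from the table caption.

```python
import statistics

BERT_MAX_WORDS = 512  # threshold named in the table caption


def summarize_subfield(texts):
    """Summarize word counts for one report subfield.

    `texts` holds one entry per report; None or "" marks a missing subfield.
    Word counts use a plain whitespace split (an assumption for illustration).
    """
    present = [t for t in texts if t]
    counts = [len(t.split()) for t in present]
    # Quartiles are computed only over reports where the subfield exists.
    q1, median, q3 = statistics.quantiles(counts, n=4, method="inclusive")
    return {
        "missingness_pct": 100.0 * (len(texts) - len(present)) / len(texts),
        "median": median,
        "q1": q1,
        "q3": q3,
        "exceeds_max_pct": 100.0 * sum(c > BERT_MAX_WORDS for c in counts) / len(counts),
    }


# Hypothetical toy corpus: six reports, two of which lack this subfield.
example = summarize_subfield(["a b c", None, "a", "a b", "a b c d e", ""])
```

Note that `statistics.quantiles` needs at least two non-missing reports, so very sparse subfields would need separate handling.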
Figure 1. Model Descriptions: Graphics depicting: (A) SVM, where a hyperplane linearly separates pathology reports, which are represented by individual datapoints; (B) XGBoost, which sequentially fits decision trees based on residuals from the sum of conditional means of previous trees and outcomes; (C) All-Fields BERT model, where a diagnosis-specific neural network extracts relevant features from the diagnostic field, whereas a neural network trained on a separate clinical corpus extracts features for the remaining subfields; subfields are weighted and summed via the attention mechanism, indicated in red; subfields are combined with diagnostic features and fine-tuned with a multilayer perceptron for the final prediction
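The "weighted and summed via the attention mechanism" step in panel (C) can be pictured as a softmax-weighted sum of per-subfield feature vectors. The sketch below is a generic illustration of that idea, not the authors' implementation: how the raw attention scores are produced is not described here, so they are treated as given inputs, and the names `softmax` and `attention_pool` are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(subfield_vectors, scores):
    """Weight each subfield vector by its attention weight and sum them.

    `subfield_vectors`: one feature vector per report subfield (equal length).
    `scores`: one raw attention score per subfield (assumed to be given).
    """
    weights = softmax(scores)
    dim = len(subfield_vectors[0])
    pooled = [
        sum(w * vec[i] for w, vec in zip(weights, subfield_vectors))
        for i in range(dim)
    ]
    return pooled, weights


# Two hypothetical subfields with equal scores contribute equally.
pooled, weights = attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

A subfield with a higher score receives a larger weight, which is what the red shading in the figure is meant to convey.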
Changes in primary CPT code assignment over time; model fits for several logistic regression models, modeling time as years since 2017 (continuous) and whether a CPT code was assigned on a specific day as the dichotomous outcome variable
| CPT Code | B | SE | P-value | CI [2.5%] | CI [97.5%] |
|---|---|---|---|---|---|
| 88300 | -0.050 | 0.037 | 0.172 | -0.122 | 0.022 |
| 88302 | 0.135 | 0.053 | 0.012 | 0.030 | 0.239 |
| 88304 | 0.096 | 0.020 | <0.001 | 0.056 | 0.136 |
| 88305 | 0.004 | 0.009 | 0.687 | -0.014 | 0.021 |
| 88307 | -0.004 | 0.018 | 0.818 | -0.039 | 0.031 |
| 88309 | -0.043 | 0.041 | 0.298 | -0.123 | 0.038 |
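Because these are logistic regression fits, each coefficient B is on the log-odds scale, so exp(B) gives the multiplicative change in the odds of a code being assigned per additional year. The sketch below converts one table row into an odds ratio and a Wald-style 95% interval. Whether the authors used Wald intervals is an assumption; the reported bounds are numerically consistent with B ± 1.96·SE. The helper name `wald_summary` is hypothetical.

```python
import math


def wald_summary(beta, se, z=1.96):
    """Return the Wald interval for a log-odds coefficient and its odds-ratio form."""
    lo, hi = beta - z * se, beta + z * se
    return {
        "ci": (lo, hi),                       # interval on the log-odds scale
        "odds_ratio": math.exp(beta),         # odds multiplier per year
        "or_ci": (math.exp(lo), math.exp(hi)),
    }


# Values for CPT 88304 taken from the table: B = 0.096, SE = 0.020.
row_88304 = wald_summary(0.096, 0.020)
```

For CPT 88304, this yields an odds ratio of roughly 1.10 per year, with an interval that excludes 1, in line with the small P-value in the table.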
Figure 2. Pathology report corpus characterization: (A and B) Word clouds depicting words with the highest aggregated tf-idf scores across the corpus of: (A) diagnostic text only, (B) all report subfields (all-fields); important words across the corpus are indicated by the relative size of the word in the word cloud; (C and D) UMAP projection of the tf-idf matrix, with clustering and noise removal via HDBSCAN, for: (C) diagnostic texts only, and (D) all report subfields (all-fields)
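To make "aggregated tf-idf scores" concrete, the sketch below sums a simple tf-idf weight for each word over all documents. The exact tf-idf variant used for the figure is not stated, so the formula here (relative term frequency times the unsmoothed log inverse document frequency) and the function name `aggregated_tfidf` are assumptions for illustration.

```python
import math
from collections import Counter


def aggregated_tfidf(documents):
    """Sum each word's tf-idf weight across all documents in the corpus."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each word occur?
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    totals = Counter()
    for tokens in tokenized:
        for word, count in Counter(tokens).items():
            tf = count / len(tokens)
            idf = math.log(n_docs / doc_freq[word])
            totals[word] += tf * idf
    return totals


# Hypothetical three-report corpus.
scores = aggregated_tfidf([
    "benign skin shave",
    "benign colon polyp",
    "benign colon adenoma",
])
```

In this toy corpus, a word present in every report ("benign") scores zero, while rarer words score higher, which is why corpus-wide filler terms tend to shrink in such word clouds.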
Correlation between length of the word document and the number of uniquely assigned codes; broken down by a reported cluster using the diagnostic fields and all report fields
| Cluster | Diagnostic clusters: Correlation | Diagnostic clusters: p-value | All-field clusters: Correlation | All-field clusters: p-value |
|---|---|---|---|---|
| 1 | -0.09 | 1.6E-26 | 0.39 | 5.7E-178 |
| 2 | 0.07 | 1.6E-02 | -0.05 | 7.1E-02 |
| 3 | -0.30 | 3.4E-85 | 0.01 | 7.1E-01 |
| 4 | -0.05 | 2.8E-02 | 0.02 | 3.9E-01 |
| 5 | 0.18 | 6.9E-93 | 0.00 | 9.6E-01 |
| 6 | 0.21 | 9.4E-37 | 0.01 | 2.9E-01 |
| 7 | 0.27 | 2.5E-26 | 0.08 | 1.7E-05 |
| 8 | 0.14 | 6.3E-137 | 0.57 | 4.9E-93 |
| 9 | 0.10 | 1.0E-28 | ||
| 10 | 0.31 | 5.3E-113 | ||
| 11 | 0.32 | 1.2E-33 | ||
| 12 | 0.16 | 1.0E-23 | ||
| 13 | 0.48 | 5.1E-98 | ||
| 14 | 0.09 | 3.9E-07 | ||
| 15 | 0.32 | 0.0E+00 | ||
Figure 3. LDA Topic Words: Important words found for three select LDA Topics from: (A) diagnostic text only and (B) all report subfields (all-fields); important words across the corpus indicated by relative size of the word in the word cloud
Top 10 words found for each LDA topic (“topic descriptors”); 10 topics were discovered for the diagnostic text; and 10 additional topics were discovered for all of the report subfields (All Fields)
| Diagnostic text | Topic 1 | Topic 2 | Topic 3 | Topic 4 | Topic 5 | Topic 6 | Topic 7 | Topic 8 | Topic 9 | Topic 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | tumor | tissue | test | mucosa | cervical | colon | nevus | cells | shave | fragments |
| 1 | lymph | right | cancer | gastric | results | polypectomy | shave | placenta | cell | benign |
| 2 | carcinoma | left | lesion | esophagus | cancer | tubular | excision | cord | carcinoma | endocervical |
| 3 | grade | benign | cervical | chronic | please | adenoma | left | umbilical | left | evidence |
| 4 | nodes | excision | please | normal | guidelines | polyp | right | vessel | right | cervical |
| 5 | prostatic | soft | results | within | test | hyperplastic | melanocytic | acute | specimen | effect |
| 6 | left | breast | consensus | limits | consensus | ascending | changes | seen | basal | squamous |
| 7 | right | fallopian | management | abnormality | screening | sigmoid | compound | three | squamous | dysplasia |
| 8 | identified | inflammation | diagnostic | | management | fragments | specimen | grams | discussion | mucosa |
| 9 | invasive | resection | guidelines | seen | cells | transverse | back | villous | peripheral | hpv |
| All fields | Topic 1 | Topic 2 | Topic 3 | Topic 4 | Topic 5 | Topic 6 | Topic 7 | Topic 8 | Topic 9 | Topic 10 |
| 0 | pap | skin | tissue | tissue | pap | tissue | biopsy | positive | clinical | skin |
| 1 | hpv | specimen | biopsy | polyp | hist | lymph | diagnosis | antibody | pertinent | shave |
| 2 | test | clinical | formalin | submitted | hpv | margin | specimen | tissue | total | biopsy |
| 3 | hist | ’clock | quantity/size | colon | test | specimen | clinical | clinical | received | left |
| 4 | screening | excision | sections/processing | formalin | screening | tumor | see | studies | fluid | clinical |
| 5 | cervical | submitted | submitted | clinical | cervical | right | case | diagnostic | source | right |
| 6 | clinical | tissue | labeled/fixative | soft | clinical | left | punch | formalin | specimen | submitted |
| 7 | therapy | left | description labeled/ | fixative | therapy | node | discussion | staining | preparation | specimen |
| 8 | cancer | nevus | soft | history | cancer | submitted | submitted | core | description | tissue |
Predictive performances for primary CPT code algorithms
| Approach | Type | Macro-F1 ± SE | κ ± SE | AUC ± SE | Spearman ± SE |
|---|---|---|---|---|---|
| BERT | Diagnosis | 0.825 ± 0.0064 | 0.852 ± 0.0033 | 0.99 ± 0.0008 | 0.84 ± 0.0044 |
| BERT | All fields | 0.828 ± 0.0062 | 0.855 ± 0.0032 | 0.99 ± 0.0006 | 0.843 ± 0.0044 |
| XGBoost | Diagnosis | 0.807 ± 0.0069 | 0.835 ± 0.0034 | 0.99 ± 0.0007 | 0.824 ± 0.0045 |
| XGBoost | All fields | 0.832 ± 0.0069 | 0.863 ± 0.0032 | 0.994 ± 0.0004 | 0.855 ± 0.0042 |
| SVM | Diagnosis | 0.497 ± 0.0047 | 0.644 ± 0.0043 | 0.554 ± 0.0021 | 0.637 ± 0.0056 |
| SVM | All fields | 0.518 ± 0.0048 | 0.668 ± 0.0044 | 0.554 ± 0.0014 | 0.652 ± 0.0058 |
Macro-F1 and AUC measures are agnostic to the ordering of the CPT code complexity, whereas linear-weighted kappa (κ) and Spearman correlation coefficients respect the CPT code ordering (88302, 88304, 88305, 88307, and 88309)
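The difference between order-agnostic and order-aware metrics can be seen directly from a confusion matrix. The following sketch implements macro-F1 and linear-weighted kappa from their standard definitions (rows are true classes, columns are predicted classes, both in ordinal order). The two toy matrices are hypothetical and differ only in how far a single misclassification lands from the true class.

```python
def macro_f1(cm):
    """Unweighted mean of per-class F1 scores; ignores class ordering."""
    k = len(cm)
    f1_scores = []
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[r][c] for r in range(k)) - tp
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / k


def linear_weighted_kappa(cm):
    """Cohen's kappa with disagreement weights |i - j| (ordinal-aware).

    Undefined (division by zero) if all expected disagreement is zero,
    e.g. when every report falls in a single class.
    """
    k = len(cm)
    n = sum(map(sum, cm))
    row = [sum(cm[i]) for i in range(k)]
    col = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    observed = sum(abs(i - j) * cm[i][j] for i in range(k) for j in range(k)) / n
    expected = sum(
        abs(i - j) * row[i] * col[j] for i in range(k) for j in range(k)
    ) / (n * n)
    return 1.0 - observed / expected


# One error lands one level away (near) or two levels away (far).
cm_near = [[4, 1, 0], [0, 5, 0], [0, 0, 5]]
cm_far = [[4, 0, 1], [0, 5, 0], [0, 0, 5]]
```

Both matrices give the same macro-F1, but the "far" error lowers the weighted kappa more, which is the sense in which kappa respects CPT code ordering.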
Figure 4. Primary CPT Code Model Performance: (A and B) Grouped boxenplots demonstrating the performance of machine-learning models (BERT, XGBoost) for the prediction of primary CPT codes (bootstrapped performance statistics), given analysis of either the diagnostic text (blue) or all report subfields (orange): (A) macro-averaged F1-score; (B) linear-weighted kappa for performance across different levels of complexity, which takes into account the ordinal nature of the outcome, reported across five CPT codes; (C) UMAP projection of All-Fields BERT embedding vectors after applying the attention mechanism across report subfields; each point is reported with information aggregated from all report subfields; individual points represent reports, colored by the CPT code; large thick circles represent the report centroids for each CPT code; note how CPT codes 88302 and 88304 cluster together and, separately, CPT codes 88307 and 88309 cluster together, whereas CPT code 88305 sits between the clusters of low- and high-complexity reports
Confusion matrices for each of the modeling approaches for primary CPT code prediction, aggregated across test sets of cross-validation folds; note how, for the BERT and XGBoost models, misclassifications are mostly to codes of a similar case complexity
| True code | Predicted 88302 (Diagnosis) | Predicted 88304 (Diagnosis) | Predicted 88305 (Diagnosis) | Predicted 88307 (Diagnosis) | Predicted 88309 (Diagnosis) | Predicted 88302 (All Fields) | Predicted 88304 (All Fields) | Predicted 88305 (All Fields) | Predicted 88307 (All Fields) | Predicted 88309 (All Fields) |
|---|---|---|---|---|---|---|---|---|---|---|
| 88302 | | 25 | 27 | 3 | 0 | | 29 | 24 | 3 | 0 |
| 88304 | 19 | | 322 | 92 | 2 | 18 | | 321 | 118 | 4 |
| 88305 | 21 | 664 | | 392 | 20 | 16 | 614 | | 388 | 23 |
| 88307 | 3 | 148 | 257 | | 36 | 3 | 131 | 253 | | 55 |
| 88309 | 2 | 10 | 29 | 96 | | 1 | 3 | 27 | 94 | |
| | | | | | | | | | | |
| 88302 | | 29 | 45 | 0 | 0 | | 33 | 44 | 1 | 0 |
| 88304 | 10 | | 616 | 78 | 3 | 12 | | 515 | 82 | 2 |
| 88305 | 19 | 387 | | 250 | 8 | 8 | 376 | | 221 | 5 |
| 88307 | 2 | 149 | 522 | | 29 | 0 | 116 | 366 | | 31 |
| 88309 | 1 | 9 | 46 | 100 | | 0 | 5 | 38 | 97 | |
| | | | | | | | | | | |
| 88302 | | 59 | 289 | 19 | 0 | | 84 | 243 | 29 | 0 |
| 88304 | 20 | | 436 | 189 | 0 | 3 | | 618 | 96 | 0 |
| 88305 | 16 | 1516 | | 1287 | 0 | 13 | 1068 | | 1068 | 0 |
| 88307 | 5 | 299 | 774 | | 0 | 11 | 373 | 736 | | 0 |
| 88309 | 1 | 18 | 81 | 160 | | 0 | 20 | 98 | 142 | |
Summary of distribution of AUCs across ancillary CPT codes for BERT, XGBoost, and SVM prediction models for diagnostic and all-fields text
| Model | Report subfields | Median | 1st Quartile | 3rd Quartile |
|---|---|---|---|---|
| BERT | Diagnosis | 0.990 | 0.973 | 0.995 |
| BERT | All-Fields | 0.995 | 0.985 | 0.999 |
| XGBoost | Diagnosis | 0.985 | 0.974 | 0.994 |
| XGBoost | All-Fields | 0.997 | 0.994 | 0.999 |
| SVM | Diagnosis | 0.966 | 0.954 | 0.984 |
| SVM | All-Fields | 0.977 | 0.957 | 0.992 |
Wilcoxon tests for significance of relative performance gains (distribution of paired AUC differences for codes between two algorithms/report subfield combinations); all Wilcoxon tests were one-sided (algorithm 1 / selected subfields performance greater than algorithm 2 / selected subfields performance) to see which models perform the best for CPT code prediction
| Algorithm 1 | Report fields (algorithm 1) | Algorithm 2 | Report fields (algorithm 2) | P-value |
|---|---|---|---|---|
| XGBoost | All fields | BERT | All fields | |
| XGBoost | Diagnosis | BERT | Diagnosis | 6.4E-01 |
| BERT | All fields | BERT | Diagnosis | |
| XGBoost | All fields | XGBoost | Diagnosis | |
| BERT | All fields | SVM | All fields | |
| BERT | Diagnosis | SVM | Diagnosis | |
| SVM | All fields | SVM | Diagnosis | |
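A one-sided Wilcoxon signed-rank test on paired per-code AUCs can be sketched with the standard library alone. The version below computes an exact p-value by enumerating every sign pattern, which is only practical for a small number of pairs; for the full set of codes, a normal approximation or a statistics library would be needed instead. The function names and the toy AUC values are hypothetical, and dropping zero differences is one common convention rather than a detail confirmed by the source.

```python
from itertools import product


def average_ranks(values):
    """Rank values ascending (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def wilcoxon_greater(auc_1, auc_2):
    """Exact one-sided p-value for H1: paired auc_1 values tend to exceed auc_2."""
    diffs = [a - b for a, b in zip(auc_1, auc_2) if a != b]  # drop zero differences
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    # Under H0 each difference is equally likely to be positive or negative,
    # so count the sign patterns whose positive-rank sum is at least w_plus.
    at_least = sum(
        1
        for signs in product((0, 1), repeat=len(ranks))
        if sum(r for r, s in zip(ranks, signs) if s) >= w_plus
    )
    return at_least / 2 ** len(ranks)


# Hypothetical paired AUCs for four codes under two model configurations.
p_value = wilcoxon_greater([0.99, 0.98, 0.97, 0.96], [0.95, 0.90, 0.96, 0.91])
```

With four pairs that all favor the first configuration, the smallest attainable one-sided p-value is 1/16, which illustrates why many codes are needed for small p-values.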
Additional performance statistics: First three numerical columns: Averaged sensitivity and specificity across the XGBoost and BERT algorithms to denote overall predictive performance for each CPT code; Average Youden calculated from the sensitivity and specificity; Final three numerical columns: Changes in sensitivity, specificity, and Youden when utilizing all report subfields versus the diagnostic text alone
| Code | Average sensitivity | Average specificity | Average Youden | ΔSensitivity | ΔSpecificity | Δ Youden |
|---|---|---|---|---|---|---|
| 85060 | 1.00 | 1.00 | 0.99 | 0.00 | 0.00 | 0.00 |
| 85097 | 0.99 | 0.99 | 0.99 | 0.01 | 0.01 | 0.02 |
| 87491 | 0.99 | 0.99 | 0.97 | 0.02 | 0.02 | 0.04 |
| 87591 | 0.99 | 0.99 | 0.98 | 0.00 | 0.02 | 0.02 |
| 87624 | 0.98 | 0.99 | 0.97 | 0.00 | 0.00 | 0.00 |
| 88108 | 0.95 | 0.97 | 0.92 | 0.08 | 0.04 | 0.12 |
| 88112 | 0.99 | 0.98 | 0.97 | 0.01 | 0.02 | 0.04 |
| 88141 | 1.00 | 1.00 | 1.00 | 0.00 | 0.00 | 0.00 |
| 88142 | 0.98 | 0.94 | 0.92 | 0.01 | 0.06 | 0.07 |
| 88172 | 0.95 | 0.97 | 0.92 | 0.08 | 0.04 | 0.12 |
| 88173 | 1.00 | 0.98 | 0.98 | 0.00 | 0.03 | 0.03 |
| 88175 | 0.98 | 0.99 | 0.97 | 0.00 | 0.00 | 0.00 |
| 88177 | 0.95 | 0.97 | 0.93 | 0.09 | 0.05 | 0.14 |
| 88184 | 0.93 | 0.95 | 0.88 | 0.07 | 0.04 | 0.11 |
| 88185 | 0.90 | 0.94 | 0.84 | 0.14 | 0.05 | 0.19 |
| 88188 | 0.93 | 0.92 | 0.85 | 0.06 | 0.05 | 0.11 |
| 88189 | 0.89 | 0.88 | 0.77 | 0.12 | 0.16 | 0.28 |
| 88271 | 0.95 | 0.96 | 0.91 | 0.04 | 0.04 | 0.07 |
| 88274 | 0.97 | 0.97 | 0.94 | 0.02 | 0.04 | 0.06 |
| 88300 | 0.99 | 0.99 | 0.98 | 0.00 | 0.00 | 0.00 |
| 88302 | 0.94 | 0.97 | 0.91 | 0.02 | 0.00 | 0.02 |
| 88304 | 0.96 | 0.96 | 0.92 | 0.00 | 0.00 | 0.01 |
| 88305 | 0.94 | 0.92 | 0.86 | 0.02 | 0.03 | 0.04 |
| 88307 | 0.97 | 0.97 | 0.94 | 0.01 | 0.00 | 0.01 |
| 88309 | 0.97 | 0.97 | 0.94 | 0.00 | 0.00 | 0.01 |
| 88311 | 0.98 | 0.98 | 0.97 | 0.02 | 0.01 | 0.02 |
| 88312 | 0.92 | 0.93 | 0.85 | 0.07 | 0.07 | 0.14 |
| 88313 | 0.92 | 0.93 | 0.86 | 0.06 | 0.05 | 0.12 |
| 88321 | 0.98 | 0.97 | 0.96 | 0.03 | 0.04 | 0.07 |
| 88331 | 0.96 | 0.97 | 0.93 | 0.05 | 0.04 | 0.09 |
| 88332 | 0.95 | 0.95 | 0.90 | 0.04 | 0.03 | 0.07 |
| 88333 | 0.98 | 0.98 | 0.95 | 0.03 | 0.03 | 0.06 |
| 88341 | 0.89 | 0.88 | 0.78 | 0.09 | 0.09 | 0.18 |
| 88342 | 0.92 | 0.91 | 0.83 | 0.11 | 0.12 | 0.23 |
| 88344 | 0.95 | 0.97 | 0.92 | 0.02 | 0.02 | 0.04 |
| 88346 | 0.98 | 0.97 | 0.95 | 0.03 | 0.04 | 0.08 |
| 88350 | 0.95 | 0.98 | 0.93 | 0.10 | 0.03 | 0.13 |
| 88360 | 0.94 | 0.95 | 0.89 | 0.04 | 0.03 | 0.08 |
Classification reports for pathologist prediction models (BERT, XGBoost, SVM) for reported subfields (diagnostic/all fields)
| BERT | |||||||
|---|---|---|---|---|---|---|---|
| Diagnosis | | | | All fields | | | |
| Pathologist | Precision | Recall | F1-Score | Pathologist | Precision | Recall | F1-Score |
| 1 | 0.94 | 0.94 | 0.94 | 1 | 0.95 | 0.94 | 0.94 |
| 2 | 0.49 | 0.82 | 0.61 | 2 | 0.61 | 0.84 | 0.70 |
| 3 | 0.94 | 0.86 | 0.89 | 3 | 0.99 | 0.98 | 0.98 |
| 4 | 0.77 | 0.76 | 0.77 | 4 | 0.81 | 0.81 | 0.81 |
| 5 | 0.80 | 0.85 | 0.82 | 5 | 0.88 | 0.88 | 0.88 |
| 6 | 0.93 | 0.98 | 0.95 | 6 | 0.96 | 0.96 | 0.96 |
| 7 | 0.81 | 0.82 | 0.81 | 7 | 0.87 | 0.87 | 0.87 |
| 8 | 0.36 | 0.91 | 0.51 | 8 | 0.41 | 0.80 | 0.55 |
| 9 | 0.86 | 0.78 | 0.82 | 9 | 0.86 | 0.80 | 0.83 |
| 10 | 0.78 | 0.61 | 0.69 | 10 | 0.74 | 0.68 | 0.71 |
| 11 | 0.67 | 0.71 | 0.69 | 11 | 0.71 | 0.73 | 0.72 |
| 12 | 0.84 | 0.77 | 0.80 | 12 | 0.87 | 0.83 | 0.85 |
| 13 | 0.80 | 0.91 | 0.85 | 13 | 0.86 | 0.91 | 0.88 |
| 14 | 0.72 | 0.74 | 0.73 | 14 | 0.83 | 0.85 | 0.84 |
| 15 | 0.83 | 0.74 | 0.78 | 15 | 0.84 | 0.83 | 0.83 |
| 16 | 0.56 | 0.25 | 0.34 | 16 | 0.54 | 0.35 | 0.42 |
| 17 | 0.89 | 0.96 | 0.93 | 17 | 0.93 | 0.96 | 0.94 |
| 18 | 0.58 | 0.14 | 0.22 | 18 | 0.45 | 0.27 | 0.34 |
| 19 | 0.71 | 0.72 | 0.71 | 19 | 0.71 | 0.74 | 0.72 |
| 20 | 0.84 | 0.39 | 0.53 | 20 | 0.74 | 0.43 | 0.54 |
| Accuracy | 0.74 | 0.74 | 0.74 | Accuracy | 0.79 | 0.79 | 0.79 |
| Macro Avg | 0.76 | 0.73 | 0.72 | Macro Avg | 0.78 | 0.77 | 0.77 |
| Weighted Avg | 0.77 | 0.74 | 0.74 | Weighted Avg | 0.80 | 0.79 | 0.79 |
| XGBoost | |||||||
| Diagnosis | | | | All fields | | | |
| Pathologist | Precision | Recall | F1-Score | Pathologist | Precision | Recall | F1-Score |
| 1 | 0.92 | 0.88 | 0.90 | 1 | 0.94 | 0.89 | 0.91 |
| 2 | 0.67 | 0.66 | 0.67 | 2 | 0.68 | 0.76 | 0.72 |
| 3 | 0.90 | 0.85 | 0.88 | 3 | 1.00 | 1.00 | 1.00 |
| 4 | 0.81 | 0.76 | 0.78 | 4 | 0.80 | 0.83 | 0.81 |
| 5 | 0.74 | 0.89 | 0.81 | 5 | 0.86 | 0.91 | 0.88 |
| 6 | 0.94 | 0.98 | 0.96 | 6 | 0.97 | 0.98 | 0.97 |
| 7 | 0.88 | 0.77 | 0.82 | 7 | 0.92 | 0.88 | 0.90 |
| 8 | 0.36 | 0.87 | 0.51 | 8 | 0.51 | 0.73 | 0.60 |
| 9 | 0.72 | 0.88 | 0.79 | 9 | 0.79 | 0.86 | 0.82 |
| 10 | 0.80 | 0.62 | 0.70 | 10 | 0.76 | 0.67 | 0.72 |
| 11 | 0.75 | 0.77 | 0.76 | 11 | 0.78 | 0.76 | 0.77 |
| 12 | 0.73 | 0.81 | 0.77 | 12 | 0.79 | 0.87 | 0.83 |
| 13 | 0.83 | 0.76 | 0.79 | 13 | 0.92 | 0.87 | 0.90 |
| 14 | 0.78 | 0.68 | 0.73 | 14 | 0.91 | 0.83 | 0.87 |
| 15 | 0.75 | 0.73 | 0.74 | 15 | 0.83 | 0.82 | 0.82 |
| 16 | 0.50 | 0.32 | 0.39 | 16 | 0.56 | 0.47 | 0.51 |
| 17 | 0.69 | 0.52 | 0.59 | 17 | 0.88 | 0.76 | 0.81 |
| 18 | 0.69 | 0.21 | 0.32 | 18 | 0.58 | 0.42 | 0.48 |
| 19 | 0.71 | 0.74 | 0.72 | 19 | 0.71 | 0.75 | 0.73 |
| 20 | 0.83 | 0.38 | 0.53 | 20 | 0.70 | 0.51 | 0.59 |
| Accuracy | 0.73 | 0.73 | 0.73 | Accuracy | 0.80 | 0.80 | 0.80 |
| Macro Avg | 0.75 | 0.70 | 0.71 | Macro Avg | 0.79 | 0.78 | 0.78 |
| Weighted Avg | 0.75 | 0.73 | 0.72 | Weighted Avg | 0.80 | 0.80 | 0.80 |
| SVM | |||||||
| Diagnosis | | | | All fields | | | |
| Pathologist | Precision | Recall | F1-Score | Pathologist | Precision | Recall | F1-Score |
| 1 | 0.59 | 0.62 | 0.60 | 1 | 0.45 | 0.50 | 0.47 |
| 2 | 0.38 | 0.36 | 0.37 | 2 | 0.10 | 0.00 | 0.00 |
| 3 | 0.56 | 0.57 | 0.56 | 3 | 0.86 | 0.84 | 0.85 |
| 4 | 0.33 | 0.16 | 0.22 | 4 | 0.20 | 0.18 | 0.19 |
| 5 | 0.39 | 0.52 | 0.44 | 5 | 0.24 | 0.73 | 0.36 |
| 6 | 0.36 | 0.56 | 0.44 | 6 | 0.34 | 0.65 | 0.45 |
| 7 | 0.09 | 0.04 | 0.05 | 7 | 0.00 | 0.00 | 0.00 |
| 8 | 0.36 | 0.80 | 0.49 | 8 | 0.34 | 0.92 | 0.49 |
| 9 | 0.49 | 0.67 | 0.57 | 9 | 0.28 | 0.79 | 0.41 |
| 10 | 0.34 | 0.09 | 0.14 | 10 | 0.18 | 0.05 | 0.07 |
| 11 | 0.44 | 0.32 | 0.38 | 11 | 0.23 | 0.32 | 0.26 |
| 12 | 0.24 | 0.36 | 0.29 | 12 | 0.21 | 0.24 | 0.23 |
| 13 | 0.00 | 0.00 | 0.00 | 13 | 0.00 | 0.00 | 0.00 |
| 14 | 0.00 | 0.00 | 0.00 | 14 | 0.00 | 0.00 | 0.00 |
| 15 | 0.23 | 0.42 | 0.30 | 15 | 0.26 | 0.11 | 0.16 |
| 16 | 0.30 | 0.18 | 0.23 | 16 | 0.28 | 0.04 | 0.07 |
| 17 | 0.00 | 0.00 | 0.00 | 17 | 0.00 | 0.00 | 0.00 |
| 18 | 0.06 | 0.02 | 0.02 | 18 | 0.18 | 0.03 | 0.05 |
| 19 | 0.32 | 0.49 | 0.38 | 19 | 0.00 | 0.00 | 0.00 |
| 20 | 0.07 | 0.02 | 0.03 | 20 | 0.11 | 0.00 | 0.00 |
| Accuracy | 0.35 | 0.35 | 0.35 | Accuracy | 0.32 | 0.32 | 0.32 |
| Macro Avg | 0.28 | 0.31 | 0.28 | Macro Avg | 0.21 | 0.27 | 0.20 |
| Weighted Avg | 0.29 | 0.35 | 0.30 | Weighted Avg | 0.24 | 0.32 | 0.24 |
Figure 5. SHAP interpretation of XGBoost predictions: Word clouds demonstrating words found to be important using the XGBoost algorithm (All-Fields) for the prediction of primary CPT codes, found via Shapley attribution; important words pertinent to each CPT code are indicated by the relative size of the word in the word cloud; word clouds visualized for word importance (A) across all five primary CPT codes and (B–F) for the following CPT codes: (B) CPT code 88302; (C) CPT code 88304; (D) CPT code 88305; (E) CPT code 88307; and (F) CPT code 88309; note that the size of the word considers strength but not directionality of the relationship with the code, which may be negatively associated in some cases
SHAP coefficients depicting relationships between the top 30 words that distinguish the primary CPT codes and their related CPT codes: a positive value indicates a positive association, whereas a negative value indicates a negative association between word and code; top words determined by summing absolute SHAP values across CPT codes and the test cohort
| Word | 88302 | 88304 | 88305 | 88307 | 88309 |
|---|---|---|---|---|---|
| Myocyte | 2.337 | ||||
| Excision pilomatricoma | -1.537 | 0.006 | |||
| Endocervical | -1.515 | 0.001 | 0.0 | ||
| Ureter fresh | 1.313 | ||||
| Left ankle | 0.45 | -0.813 | |||
| Products conception | 0.029 | -1.161 | |||
| Biopsy | -0.376 | 0.054 | 0.466 | 0.235 | 0.045 |
| Specimen cm | -1.059 | 0.108 | |||
| Mesh | 0.201 | -0.001 | -0.929 | ||
| Spleen | 0.0 | -1.085 | |||
| Diagnosis skin | 0.025 | 0.006 | 0.168 | -0.836 | |
| Reduction | 1.081 | ||||
| Termination | 0.234 | -0.846 | |||
| Toe clinical | -1.044 | ||||
| Mucocele | 1.013 | -0.001 | |||
| Hemorrhoid | 0.818 | -0.177 | |||
| Fixative pilonidal | 0.958 | 0.0 | |||
| Valve | -0.945 | 0.004 | |||
| Irregular | 0.217 | 0.684 | -0.048 | -0.003 | |
| Representative | 0.032 | 0.259 | 0.484 | 0.015 | 0.148 |
| Metatarsal resection | 0.937 | ||||
| Submitted skin | -0.69 | 0.064 | -0.044 | -0.118 | |
| Angioleiomyoma | -0.358 | 0.54 | |||
| Ovary serous | 0.897 | ||||
| Foreskin clinical | 0.879 | ||||
| Capsule excision | 0.878 | ||||
| Diagnosis fibroma | 0.874 | ||||
| Transected | 0.658 | 0.046 | -0.159 | ||
| Mass provided | -0.756 | 0.083 | |||
| Excision suggestive | -0.819 |
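The ranking rule in the caption (sum the absolute SHAP values for each word) can be illustrated with a few rows from this table. This is a simplified sketch: it sums only across the per-code values shown for each word, whereas the caption describes summation across CPT codes and the test cohort, and the function name `rank_by_total_abs_shap` is hypothetical.

```python
# Values copied from four rows of the table; each list holds the word's
# SHAP coefficients for whichever CPT codes have a value.
shap_rows = {
    "Myocyte": [2.337],
    "Biopsy": [-0.376, 0.054, 0.466, 0.235, 0.045],
    "Representative": [0.032, 0.259, 0.484, 0.015, 0.148],
    "Excision suggestive": [-0.819],
}


def rank_by_total_abs_shap(rows):
    """Order words by the sum of absolute SHAP values, largest first."""
    totals = {word: sum(abs(v) for v in values) for word, values in rows.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranking = rank_by_total_abs_shap(shap_rows)
```

Taking absolute values means a word that is strongly negative for one code still ranks highly, which matches the note that word size reflects strength rather than direction.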
Figure 6. Embedding and Interpretation of BERT Predictions: (A, C, and E) UMAP projection of All-Fields BERT embedding vectors after applying the attention mechanism across report subfields; each point is reported with information aggregated from all report subfields; (B, D, and F) Select diagnostic text from individual reports interpreted by Integrated Gradients to elucidate words positively and negatively associated with calling the CPT code; Integrated Gradients was performed on the diagnostic text BERT models; Utilized CPT codes: (A and B) CPT code 88307, (C and D) CPT code 88342, and (E and F) CPT code 88360
Figure 7. BERT Diagnostic Model Self-Attention: Output of self-attention maps for select self-attention heads/layers from the BERT diagnostic text model visualizes various layers of complex word-to-word relationships for the assessment of a select pathology report that was found to report CPT code 88307
Figure 8. BERT All-Fields Model Interpretation: Visualization of importance scores assigned to pathology report subfields outside of the diagnostic section for three separate pathology reports (A–C) that were assigned CPT code 88360 by raters; information from report subfields that appear more red was utilized more by the model for the final prediction of the code; attention scores are listed below the text from the subfields, and the title of each subfield is supplied
Sensitivity/specificity for each algorithm/report subfield(s), averaged across cross-validation folds for each CPT code after optimization of Youden’s index to select the sensitivity/specificity
| Code | BERT Diagnosis Sensitivity | BERT Diagnosis Specificity | BERT All fields Sensitivity | BERT All fields Specificity | XGBoost Diagnosis Sensitivity | XGBoost Diagnosis Specificity | XGBoost All fields Sensitivity | XGBoost All fields Specificity | SVM Diagnosis Sensitivity | SVM Diagnosis Specificity | SVM All fields Sensitivity | SVM All fields Specificity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 85060 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 85097 | 0.98 | 0.98 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.99 | 1.00 |
| 87491 | 0.96 | 0.98 | 0.99 | 1.00 | 0.99 | 0.98 | 1.00 | 1.00 | 0.99 | 0.98 | 0.99 | 0.97 |
| 87591 | 0.99 | 0.98 | 0.99 | 1.00 | 0.99 | 0.98 | 1.00 | 1.00 | 0.99 | 0.98 | 0.99 | 0.97 |
| 87624 | 0.98 | 0.99 | 0.98 | 0.99 | 0.98 | 0.99 | 0.98 | 0.99 | 0.98 | 0.97 | 0.97 | 0.98 |
| 88108 | 0.84 | 0.95 | 0.99 | 0.99 | 0.99 | 0.95 | 0.99 | 1.00 | 0.99 | 0.95 | 1.00 | 0.99 |
| 88112 | 0.97 | 0.96 | 0.99 | 0.99 | 0.99 | 0.97 | 1.00 | 0.99 | 0.99 | 0.97 | 0.99 | 0.99 |
| 88141 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.99 | 0.99 |
| 88142 | 0.95 | 0.89 | 0.99 | 0.97 | 0.99 | 0.93 | 0.97 | 0.97 | 0.99 | 0.93 | 0.94 | 0.95 |
| 88172 | 0.81 | 0.96 | 0.99 | 0.99 | 1.00 | 0.95 | 0.99 | 0.99 | 1.00 | 0.95 | 0.99 | 0.98 |
| 88173 | 1.00 | 0.97 | 1.00 | 0.99 | 1.00 | 0.97 | 1.00 | 0.99 | 0.99 | 0.97 | 0.99 | 0.99 |
| 88175 | 0.98 | 0.99 | 0.98 | 0.99 | 0.98 | 0.99 | 0.98 | 0.99 | 0.98 | 1.00 | 0.97 | 0.98 |
| 88177 | 0.83 | 0.95 | 1.00 | 0.99 | 0.99 | 0.95 | 1.00 | 1.00 | 0.98 | 0.95 | 1.00 | 0.99 |
| 88184 | 0.85 | 0.92 | 0.95 | 0.96 | 0.94 | 0.93 | 0.98 | 0.98 | 0.96 | 0.93 | 0.96 | 0.94 |
| 88185 | 0.71 | 0.90 | 0.95 | 0.95 | 0.94 | 0.93 | 0.98 | 0.97 | 0.96 | 0.92 | 0.95 | 0.95 |
| 88188 | 0.87 | 0.88 | 0.95 | 0.94 | 0.93 | 0.91 | 0.96 | 0.96 | 0.96 | 0.93 | 0.92 | 0.93 |
| 88189 | 0.77 | 0.73 | 0.94 | 0.95 | 0.88 | 0.88 | 0.95 | 0.97 | 0.95 | 0.95 | 0.92 | 0.95 |
| 88271 | 0.92 | 0.94 | 0.95 | 0.97 | 0.94 | 0.95 | 0.98 | 0.99 | 0.96 | 0.96 | 0.97 | 0.98 |
| 88274 | 0.96 | 0.94 | 0.97 | 0.99 | 0.95 | 0.95 | 0.99 | 0.99 | 0.97 | 0.97 | 0.98 | 0.99 |
| 88300 | 0.99 | 0.99 | 0.98 | 1.00 | 0.99 | 0.99 | 0.99 | 0.99 | 0.97 | 0.98 | 0.96 | 0.98 |
| 88302 | 0.90 | 0.98 | 0.94 | 0.96 | 0.96 | 0.97 | 0.96 | 0.97 | 0.95 | 0.93 | 0.93 | 0.92 |
| 88304 | 0.95 | 0.96 | 0.96 | 0.96 | 0.96 | 0.95 | 0.96 | 0.96 | 0.93 | 0.93 | 0.92 | 0.92 |
| 88305 | 0.94 | 0.90 | 0.96 | 0.92 | 0.93 | 0.91 | 0.95 | 0.95 | 0.20 | 0.68 | 0.18 | 0.70 |
| 88307 | 0.97 | 0.97 | 0.97 | 0.96 | 0.97 | 0.96 | 0.98 | 0.97 | 0.94 | 0.94 | 0.95 | 0.93 |
| 88309 | 0.96 | 0.97 | 0.96 | 0.97 | 0.97 | 0.97 | 0.98 | 0.98 | 0.94 | 0.95 | 0.96 | 0.95 |
| 88311 | 0.97 | 0.98 | 0.98 | 0.98 | 0.98 | 0.98 | 0.99 | 0.99 | 0.89 | 0.95 | 0.97 | 0.95 |
| 88312 | 0.84 | 0.88 | 0.93 | 0.94 | 0.93 | 0.92 | 0.98 | 0.98 | 0.85 | 0.86 | 0.84 | 0.84 |
| 88313 | 0.89 | 0.90 | 0.94 | 0.95 | 0.89 | 0.91 | 0.97 | 0.97 | 0.87 | 0.89 | 0.87 | 0.90 |
| 88321 | 0.98 | 0.95 | 0.99 | 0.99 | 0.96 | 0.95 | 1.00 | 1.00 | 0.91 | 0.91 | 0.99 | 0.99 |
| 88331 | 0.93 | 0.94 | 0.98 | 0.98 | 0.94 | 0.96 | 1.00 | 1.00 | 0.92 | 0.93 | 0.94 | 0.92 |
| 88332 | 0.92 | 0.93 | 0.96 | 0.95 | 0.94 | 0.94 | 0.99 | 0.99 | 0.87 | 0.95 | 0.95 | 0.96 |
| 88333 | 0.95 | 0.95 | 0.98 | 0.99 | 0.97 | 0.96 | 1.00 | 1.00 | 0.98 | 0.96 | 0.97 | 0.99 |
| 88341 | 0.84 | 0.83 | 0.91 | 0.91 | 0.85 | 0.85 | 0.96 | 0.95 | 0.80 | 0.80 | 0.90 | 0.89 |
| 88342 | 0.86 | 0.85 | 0.97 | 0.96 | 0.86 | 0.84 | 0.98 | 0.97 | 0.80 | 0.77 | 0.90 | 0.93 |
| 88344 | 0.95 | 0.97 | 0.96 | 0.97 | 0.94 | 0.94 | 0.97 | 0.98 | 0.95 | 0.97 | 0.94 | 0.98 |
| 88346 | 0.92 | 0.91 | 0.99 | 0.99 | 0.99 | 1.00 | 1.00 | 1.00 | 0.98 | 0.97 | 0.98 | 1.00 |
| 88350 | 0.80 | 0.94 | 1.00 | 1.00 | 0.99 | 1.00 | 1.00 | 1.00 | 0.97 | 0.98 | 0.99 | 1.00 |
| 88360 | 0.91 | 0.93 | 0.95 | 0.96 | 0.93 | 0.93 | 0.97 | 0.97 | 0.92 | 0.94 | 0.94 | 0.94 |
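Selecting an operating point by Youden's index means choosing the score threshold that maximizes J = sensitivity + specificity − 1. The sketch below shows that selection for a single binary code on one set of predictions; the function name, the tie-breaking rule (keep the lowest threshold), and the toy labels and scores are assumptions for illustration, and the averaging across cross-validation folds described in the caption is not reproduced.

```python
def youden_optimal_threshold(y_true, y_score):
    """Return the threshold that maximizes sensitivity + specificity - 1.

    A report is called positive when its score is >= threshold.
    Assumes y_true contains at least one 0 and one 1.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    best = None
    for threshold in sorted(set(y_score)):
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= threshold)
        tn = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s < threshold)
        sensitivity = tp / positives
        specificity = tn / negatives
        youden = sensitivity + specificity - 1
        # Strict '>' keeps the lowest threshold when several tie.
        if best is None or youden > best["youden"]:
            best = {
                "threshold": threshold,
                "sensitivity": sensitivity,
                "specificity": specificity,
                "youden": youden,
            }
    return best


# Hypothetical labels (1 = code assigned) and model scores for six reports.
best = youden_optimal_threshold([0, 0, 0, 1, 1, 1], [0.1, 0.2, 0.6, 0.4, 0.7, 0.9])
```

Because sensitivity and specificity are read off at the J-maximizing threshold, the paired values in each row of the table describe one operating point per model and code rather than the whole ROC curve.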
Confidence intervals of 1000-sample nonparametric bootstrap of area under the receiver operating characteristic curve for each algorithm (BERT, XGBoost and SVM) and for each report type (Diagnosis and All-Fields); each AUC was averaged across the 5 cross-validation folds with the same random seed set for sampling values within each CV fold for each code/group of pathologists; ancillary CPT code and descriptions of codes listed on the left, in addition to the weighted AUC across 20 pathologists
| Code | Description | BERT Diagnosis (AUC ± SE) | BERT All-Fields (AUC ± SE) | XGBoost Diagnosis (AUC ± SE) | XGBoost All-Fields (AUC ± SE) | SVM Diagnosis (AUC ± SE) | SVM All-Fields (AUC ± SE) |
|---|---|---|---|---|---|---|---|
| 85060 | | 0.998 ± 0.0002 | 0.9994 ± 0.0001 | 0.9989 ± 0.0002 | 0.9996 ± 0.0001 | 0.9983 ± 0.0002 | 0.9968 ± 0.0012 |
| 85097 | | 0.9996 ± 0.0001 | 0.9994 ± 0.0001 | 0.9989 ± 0.0005 | 0.9997 ± 0.0 | 0.9985 ± 0.0001 | 0.9941 ± 0.0014 |
| 87491 | | 0.9905 ± 0.0008 | 0.9984 ± 0.0008 | 0.9898 ± 0.001 | 0.9996 ± 0.0002 | 0.9872 ± 0.0013 | 0.9819 ± 0.0042 |
| 87591 | | 0.9905 ± 0.0008 | 0.9994 ± 0.0001 | 0.9898 ± 0.001 | 0.9996 ± 0.0002 | 0.9872 ± 0.0013 | 0.9819 ± 0.0042 |
| 87624 | | 0.9968 ± 0.0006 | 0.9973 ± 0.0003 | 0.9958 ± 0.0004 | 0.9984 ± 0.0002 | 0.9778 ± 0.0017 | 0.988 ± 0.0016 |
| 88108 | | 0.9802 ± 0.0017 | 0.999 ± 0.0003 | 0.9808 ± 0.0008 | 0.9975 ± 0.0015 | 0.9717 ± 0.0026 | 0.9989 ± 0.0001 |
| 88112 | | 0.9934 ± 0.0005 | 0.9991 ± 0.0001 | 0.9935 ± 0.0002 | 0.9995 ± 0.0 | 0.9887 ± 0.0004 | 0.9959 ± 0.0008 |
| 88141 | | 1.0 ± 0.0 | 0.9998 ± 0.0001 | 0.9996 ± 0.0001 | 0.9999 ± 0.0 | 0.9988 ± 0.0004 | 0.9923 ± 0.0014 |
| 88142 | | 0.9886 ± 0.0017 | 0.9938 ± 0.0016 | 0.9826 ± 0.0017 | 0.9951 ± 0.0018 | 0.9663 ± 0.0018 | 0.9501 ± 0.0131 |
| 88172 | | 0.9825 ± 0.0011 | 0.999 ± 0.0002 | 0.9837 ± 0.0011 | 0.999 ± 0.0006 | 0.9749 ± 0.0015 | 0.9903 ± 0.001 |
| 88173 | | 0.9867 ± 0.0024 | 0.9988 ± 0.0002 | 0.9899 ± 0.0005 | 0.9996 ± 0.0 | 0.9818 ± 0.001 | 0.997 ± 0.0006 |
| 88175 | | 0.998 ± 0.0005 | 0.9976 ± 0.0003 | 0.9972 ± 0.0003 | 0.9981 ± 0.0003 | 0.9932 ± 0.0009 | 0.9847 ± 0.002 |
| 88177 | | 0.9774 ± 0.0023 | 0.9993 ± 0.0001 | 0.9783 ± 0.0031 | 0.9998 ± 0.0 | 0.9624 ± 0.0044 | 0.9955 ± 0.0003 |
| 88184 | | 0.9731 ± 0.0082 | 0.9848 ± 0.0022 | 0.9738 ± 0.0025 | 0.9942 ± 0.0012 | 0.9699 ± 0.0029 | 0.9708 ± 0.0033 |
| 88185 | | 0.9629 ± 0.0075 | 0.9841 ± 0.0022 | 0.9711 ± 0.0027 | 0.994 ± 0.0008 | 0.9594 ± 0.003 | 0.9692 ± 0.0034 |
| 88188 | | 0.9428 ± 0.0121 | 0.9773 ± 0.0029 | 0.9589 ± 0.0041 | 0.9875 ± 0.0024 | 0.9593 ± 0.0041 | 0.9486 ± 0.0029 |
| 88189 | | 0.9043 ± 0.0295 | 0.9753 ± 0.0052 | 0.9199 ± 0.0101 | 0.9785 ± 0.0073 | 0.9611 ± 0.0074 | 0.9471 ± 0.0118 |
| 88271 | | 0.9943 ± 0.002 | 0.9906 ± 0.0025 | 0.9735 ± 0.0055 | 0.995 ± 0.0024 | 0.9717 ± 0.0062 | 0.9768 ± 0.0061 |
| 88274 | | 0.9951 ± 0.0011 | 0.9943 ± 0.003 | 0.9755 ± 0.0059 | 0.9941 ± 0.0036 | 0.9775 ± 0.0058 | 0.9922 ± 0.0029 |
| 88300 | | 0.9983 ± 0.0011 | 0.9969 ± 0.0008 | 0.9967 ± 0.0012 | 0.9978 ± 0.0009 | 0.9846 ± 0.0025 | 0.9868 ± 0.0023 |
| 88302 | | 0.9768 ± 0.0083 | 0.9824 ± 0.0036 | 0.9887 ± 0.0028 | 0.9934 ± 0.0019 | 0.9581 ± 0.0047 | 0.9643 ± 0.0042 |
| 88304 | | 0.991 ± 0.0011 | 0.9877 ± 0.0007 | 0.987 ± 0.0009 | 0.9907 ± 0.0006 | 0.9534 ± 0.0019 | 0.9509 ± 0.0021 |
| 88305 | | 0.9726 ± 0.0012 | 0.9775 ± 0.0005 | 0.97 ± 0.0006 | 0.9889 ± 0.0003 | 0.1087 ± 0.0012 | 0.0807 ± 0.001 |
| 88307 | | 0.9942 ± 0.0006 | 0.9928 ± 0.0004 | 0.9925 ± 0.0004 | 0.995 ± 0.0003 | 0.9614 ± 0.0015 | 0.968 ± 0.0013 |
| 88309 | | 0.9966 ± 0.0009 | 0.9885 ± 0.0021 | 0.9949 ± 0.0008 | 0.9967 ± 0.0007 | 0.9608 ± 0.0034 | 0.9777 ± 0.0022 |
| 88311 | | 0.9906 ± 0.0033 | 0.9972 ± 0.0003 | 0.9943 ± 0.0009 | 0.9991 ± 0.0002 | 0.9316 ± 0.0035 | 0.9741 ± 0.0019 |
| 88312 | | 0.9766 ± 0.0025 | 0.9792 ± 0.0012 | 0.9692 ± 0.0017 | 0.9972 ± 0.0004 | 0.8974 ± 0.0038 | 0.9063 ± 0.0031 |
| 88313 | | 0.9577 ± 0.0065 | 0.9854 ± 0.0013 | 0.9619 ± 0.0023 | 0.9953 ± 0.0006 | 0.9163 ± 0.0039 | 0.9234 ± 0.0036 |
| 88321 | | 0.9945 ± 0.0007 | 0.998 ± 0.0007 | 0.9889 ± 0.001 | 0.9994 ± 0.0001 | 0.9483 ± 0.0033 | 0.9931 ± 0.0013 |
| 88331 | | 0.949 ± 0.0135 | 0.9958 ± 0.0012 | 0.9834 ± 0.0019 | 0.9996 ± 0.0002 | 0.9465 ± 0.0044 | 0.9592 ± 0.0024 |
| 88332 | | 0.8971 ± 0.0485 | 0.9821 ± 0.0063 | 0.974 ± 0.0059 | 0.9972 ± 0.0008 | 0.9077 ± 0.0186 | 0.9666 ± 0.0084 |
| 88333 | | 0.9924 ± 0.0011 | 0.9963 ± 0.0018 | 0.9883 ± 0.0027 | 0.999 ± 0.0008 | 0.9827 ± 0.0021 | 0.979 ± 0.0076 |
| 88341 | | 0.9353 ± 0.0034 | 0.96 ± 0.0012 | 0.9273 ± 0.0017 | 0.9901 ± 0.0004 | 0.8514 ± 0.0031 | 0.9262 ± 0.0022 |
| 88342 | | 0.9384 ± 0.0024 | 0.9925 ± 0.0003 | 0.9319 ± 0.0011 | 0.9955 ± 0.0002 | 0.8404 ± 0.0021 | 0.9471 ± 0.0015 |
| 88344 | | 0.9833 ± 0.0117 | 0.9824 ± 0.0075 | 0.9747 ± 0.0061 | 0.9942 ± 0.0028 | 0.9664 ± 0.0091 | 0.9627 ± 0.0091 |
| 88346 | | 0.9971 ± 0.0028 | 0.9972 ± 0.0018 | 0.9966 ± 0.0026 | 0.9977 ± 0.0023 | 0.987 ± 0.0045 | 0.989 ± 0.005 |
| 88350 | | 0.9999 ± 0.0001 | 0.9998 ± 0.0 | 0.9993 ± 0.0004 | 0.9999 ± 0.0 | 0.9852 ± 0.0048 | 0.9933 ± 0.0037 |
| 88360 | | 0.7182 ± 0.0282 | 0.9853 ± 0.0022 | 0.9761 ± 0.0027 | 0.9944 ± 0.0013 | 0.9578 ± 0.0042 | 0.9564 ± 0.0048 |
| Pathologists (weighted AUC) | | 0.984 ± 0.0002 | 0.9877 ± 0.0002 | 0.9823 ± 0.0002 | 0.99 ± 0.0001 | 0.3778 ± 0.0007 | 0.3726 ± 0.0007 |
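A seeded nonparametric bootstrap of the AUC, as described in the caption, can be sketched as follows. This is an illustration under stated assumptions: the AUC is computed from its pairwise-ranking definition, resamples that lack one of the two classes are redrawn, the percentile indices are a simple approximation, and the function names and toy data are hypothetical. The averaging across five cross-validation folds mentioned in the caption is not reproduced.

```python
import random
import statistics


def auc(y_true, y_score):
    """AUC as the probability that a positive outscores a negative (ties count half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(
        1.0 if p > q else (0.5 if p == q else 0.0) for p in pos for q in neg
    )
    return wins / (len(pos) * len(neg))


def bootstrap_auc(y_true, y_score, n_boot=1000, seed=0):
    """Resample reports with replacement and summarize the AUC distribution."""
    rng = random.Random(seed)  # fixed seed makes the resampling reproducible
    n = len(y_true)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        labels = [y_true[i] for i in idx]
        if 0 < sum(labels) < n:  # both classes must be present
            estimates.append(auc(labels, [y_score[i] for i in idx]))
    estimates.sort()
    return {
        "mean": statistics.mean(estimates),
        "se": statistics.stdev(estimates),
        "ci95": (estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot) - 1]),
    }


# Hypothetical labels and scores for eight reports.
labels = [0, 0, 1, 1, 0, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7]
result = bootstrap_auc(labels, scores, n_boot=200, seed=1)
```

The standard deviation of the bootstrap estimates plays the role of the "± SE" values in the table, and reusing the same seed gives identical resamples across runs.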