Literature DB >> 32755460

Web Application for the Automated Extraction of Diagnosis and Site From Pathology Reports for Keratinocyte Cancers.

Bridie S Thompson1, Sam Hardy2, Nirmala Pandeya1,3, Jean Claude Dusingize1, Adele C Green1,4, Athon Millane3, Daniel Bourke5, Ronald Grande5, Cameron D Bean5, Catherine M Olsen1,6, David C Whiteman1,6.   

Abstract

PURPOSE: Keratinocyte cancers are exceedingly common in high-risk populations, but accurate measures of incidence are seldom derived because the burden of manually reviewing pathology reports to extract relevant diagnostic information is excessive. Thus, we sought to develop supervised learning algorithms for classifying basal and squamous cell carcinomas and other diagnoses, as well as disease site, and incorporate these into a Web application capable of processing large numbers of pathology reports.
METHODS: Participants in the QSkin study were recruited in 2011 and comprised men and women age 40-69 years at baseline (N = 43,794) who were randomly selected from a population register in Queensland, Australia. Histologic data were manually extracted from free-text pathology reports for participants with histologically confirmed keratinocyte cancers for whom a pathology report was available (n = 25,786 reports). This provided a training data set for the development of algorithms capable of deriving diagnosis and site from free-text pathology reports. We calculated agreement statistics between algorithm-derived classifications and 3 independent validation data sets of manually abstracted pathology reports.
RESULTS: The agreement for classifications of basal cell carcinoma (κ = 0.97 and κ = 0.96) and squamous cell carcinoma (κ = 0.93 for both) was almost perfect in 2 validation data sets but was slightly lower for a third (κ = 0.82 and κ = 0.90, respectively). Agreement for total counts of specific diagnoses was also high (κ > 0.8). Similar levels of agreement between algorithm-derived and manually extracted data were observed for classifications of keratoacanthoma and intraepidermal carcinoma.
CONCLUSION: Supervised learning methods were used to develop a Web application capable of accurately and rapidly classifying large numbers of pathology reports for keratinocyte cancers and related diagnoses. Such tools may provide the means to accurately measure subtype-specific skin cancer incidence.

Year:  2020        PMID: 32755460      PMCID: PMC7469600          DOI: 10.1200/CCI.19.00152

Source DB:  PubMed          Journal:  JCO Clin Cancer Inform        ISSN: 2473-4276


INTRODUCTION

Among fair-skinned populations, keratinocyte cancers are more numerous than any other cancer type.[1] Because of volume and limited resources, keratinocyte cancers are either excluded from cancer registration[1,2] or registration is limited to the first incident basal cell carcinoma (BCC) or squamous cell carcinoma (SCC) for each person.[3] Incidence estimates and population trends are typically derived from administrative data sets of treatment information that do not discriminate between subtypes.[1,4] This is a major restriction to the optimal allocation of health resources.

Key Objective: To determine the accuracy of a supervised learning algorithm for the automated extraction of key diagnostic information about keratinocyte cancers from free-text pathology reports.
Knowledge Generated: Validated against manually extracted reports, the algorithm classified basal and squamous cell carcinomas with almost perfect accuracy in two data sets (κ > 0.92) and with very high accuracy in a third data set of complex reports (κ = 0.82-0.90).
Relevance: In the absence of population-based registration, this supervised learning algorithm can efficiently process large numbers of pathology reports, permitting users to accurately estimate subtype-specific keratinocyte cancer incidence. Such measures are essential for health care planning.

Pathology reports provide information on the definitive diagnosis of keratinocyte cancers. Skin cancer pathology is usually reported in a free-text format, and reports often include histologic assessments for multiple lesions. Histology of skin lesions can be complex; a single lesion may show characteristics of more than one diagnosis. Interpretation and data extraction from pathology reports for skin cancers are therefore time consuming and require a high-level ability to codify complex clinical information.
Automated encoding of data from free-text pathology reports has been recognized as a useful tool to identify new cancer diagnoses and for cancer registration.[5-7] A variety of machine learning methods have been used to reliably and accurately extract information from free-text pathology reports and from clinical narratives for cancers.[8] At least one study has used natural language processing to identify keratinocyte cancers from pathology reports, although that algorithm did not extract diagnosis or site details.[9]

Globally, Australia experiences the highest incidence rates of skin cancers,[10] and Queensland experiences the highest rates of skin cancers within Australia.[11] The QSkin study is a large, population-based, longitudinal study of residents of Queensland, Australia. Large numbers of pathology reports for skin cancers from study participants provided an opportunity to investigate automated extraction of diagnostic information from free-text pathology reports. The ability to automatically process free-text pathology reports on a large scale has the potential for accurately tracking the incidence of keratinocyte skin cancers in various clinical settings, including hospitals and cancer registries. Here, we describe the development and validation of a Web application that uses supervised learning methods to automatically classify BCC, SCC, and related diagnoses from free-text pathology reports.

METHODS

We obtained pathology reports from participants of the QSkin study who had a skin cancer excised between recruitment in 2011 and June 30, 2014.[12] Details of the QSkin study have been described previously.[13] Medically trained staff reviewed each pathology report (n = 25,786 reports) and entered diagnostic information for each lesion into a database (n = 41,356 lesions). This manually extracted data set was considered the gold standard and provided the training data set to develop the supervised learning algorithm. After data cleaning and exclusion of diagnoses with insufficient examples, there were 36,281 lesions in the final data set.

Supervised machine learning algorithms are developed using training data sets (typically numbering in the thousands of independent records) that contain the input variables along with the relevant outcomes. A machine learning algorithm is applied to the training data set and iteratively improved to reduce the error of outcome prediction using optimization techniques.[14] The larger the training data set, the more examples there are with which to develop the algorithm, thereby reducing the degree of error in prediction.

The training data set used in this study included the free text as well as the known outcome for a large number of pathology reports; to these we applied supervised learning methods to develop an algorithm that classifies diagnosis (BCC, SCC, keratoacanthoma, and intraepidermal carcinoma [IEC]), number of lesions, and site of lesions from free-text pathology reports. Diagnosis and site were modeled as separate multiclass classification problems in which a single label can be assigned to each lesion text. The training data set included all pathology reports for participants (including nonskin lesions, benign skin lesions, and melanoma).
More than a third of the pathology reports in the training data set contained descriptions and diagnoses for multiple skin lesions that had been excised at the same visit; each lesion required identification of a site and diagnosis. These were processed as multilabel classifications, in which a model can return multiple labels, given a single text input. Using regular expressions and Python dictionaries, the report text was split into lesion-specific text. The Web application first processes the free text within a pathology report to identify and split multiple lesions, and then separate algorithms for diagnosis and site are run on the individual lesions.
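The lesion-splitting step can be illustrated with a minimal sketch. This is not the study's actual preprocessing code: the specimen labels ("A.", "B.") and report layout are assumptions chosen for illustration, since multi-lesion reports are commonly structured as labeled specimens.

```python
import re

def split_lesions(report_text):
    """Split a multi-lesion pathology report into lesion-specific texts.

    Illustrative sketch only: assumes each lesion is introduced by a
    specimen label such as "A." or "B." at the start of a line, which is
    a common but not universal reporting convention.
    """
    # Capture a leading specimen label (single capital letter + period)
    # at the start of a line, then take everything up to the next label
    # (or the end of the report).
    pattern = re.compile(r"^([A-Z])\.\s*(.*?)(?=^[A-Z]\.\s|\Z)", re.M | re.S)
    return {label: text.strip() for label, text in pattern.findall(report_text)}

report = (
    "A. Skin, left forearm: basal cell carcinoma, nodular type.\n"
    "B. Skin, nose: squamous cell carcinoma, well differentiated.\n"
)
lesions = split_lesions(report)  # {"A": "...", "B": "..."}
```

Each lesion-specific text can then be passed independently to the diagnosis and site classifiers.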

Development and Internal Validation

Separate linear support vector machines (LSVMs) were developed for each classification task (ie, diagnosis and site). The data set was split into randomly shuffled train/test splits of 70/30, equating to 25,397 lesions used to derive and train the algorithm and 10,884 lesions used to test the algorithm. A term frequency-inverse document frequency (TF-IDF) matrix was created using word-based n-grams of length 1 or 2 (short, 1- or 2-word phrases). Words appearing in < 10% or > 90% of the reports were ignored, as were common, information-poor stop words (eg, “the,” “a,” “in”). A hyperparameter grid search was performed, optimizing for the best F1-macro score (a function of both precision and recall; Table 1). Each parameter combination was evaluated using 3-fold cross-validation. The best-performing LSVM model was then evaluated against the held-out test data set. This evaluation measured both completeness of classification (sensitivity, or recall) and correctness of classification (positive predictive value, or precision).
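The modeling pipeline described above (TF-IDF over 1- and 2-word n-grams with frequency cutoffs and stop-word removal, feeding a linear SVM tuned by cross-validated grid search) can be sketched with scikit-learn, the library the study used. The toy lesion texts, labels, and hyperparameter grid below are invented for illustration; the study's actual grids and data differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy lesion-level texts standing in for the training data set;
# phrasing and labels are invented for illustration.
texts = [
    "nodular basal cell carcinoma completely excised",
    "superficial basal cell carcinoma margins clear",
    "infiltrating basal cell carcinoma present",
    "basal cell carcinoma nodular type excised",
    "well differentiated squamous cell carcinoma excised",
    "moderately differentiated squamous cell carcinoma",
    "invasive squamous cell carcinoma margins involved",
    "squamous cell carcinoma keratinising type excised",
] * 3  # repeated so each class has enough examples for 3-fold CV
labels = (["BCC"] * 4 + ["SCC"] * 4) * 3

# 70/30 randomly shuffled train/test split, as in the study.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.3, random_state=0, stratify=labels)

# TF-IDF on 1- and 2-word n-grams; terms in < 10% or > 90% of documents
# and English stop words are dropped, then a linear SVM is fitted.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=0.1,
                              max_df=0.9, stop_words="english")),
    ("svm", LinearSVC()),
])

# Hyperparameter grid search optimizing F1-macro with 3-fold CV
# (the grid here is a placeholder, not the study's grid).
search = GridSearchCV(pipeline, {"svm__C": [0.1, 1.0, 10.0]},
                      scoring="f1_macro", cv=3)
search.fit(X_train, y_train)
preds = search.predict(X_test)
```

The best estimator found by the grid search is then evaluated on the held-out test split for recall (sensitivity) and precision (positive predictive value).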
TABLE 1.

Calculations Used in the Experiment and Validations

The trained models for each classification problem were then used as the basis for a Web application to upload pathology reports and analyze the free text. The Web application can parse and analyze reports across a range of formats, commensurate with the different formats used by various laboratories. The output variables are listed in Table 2.
TABLE 2.

Data Fields in Output Data From Pathology Classifier Web Application

We developed the Web application using Python 3.6 on a machine with Ubuntu Linux (Canonical, London, United Kingdom) that has 16 cores and 8 GB of memory. The following libraries were used: Pandas, sklearn, spaCy, various Python 3.6 standard libraries (including regex), and Jupyter notebooks.

External Validation

To assess the real-world performance of the algorithm beyond the historical data set used for training, we compared the classifiers’ predictions on 3 independent samples of pathology reports: a random sample of 400 new pathology reports from QSkin participants; 2,345 pathology reports for QSkin participants from pathology laboratories not represented in the training data set; and 42 pathology reports from high-risk transplantation recipients enrolled in the Skin Tumors in Allograft Recipients (STAR) study.[15] The text reports were first reviewed by a medically trained staff member who entered diagnosis and site details into a database; we considered these summary measures to be the gold standard data with which to compare the algorithm-derived measures. Separately and independently of this review, the first author (B.S.T.) uploaded the same reports in their various formats (Excel, comma-separated values, PDF, and Word) into the Web application. The manually extracted data were not always entered in numerical order; therefore, we could not match on individual lesions but instead matched at the report level, including counts of lesions for each histologic type. We calculated standard measures of agreement (kappa score) between manually extracted and algorithm-derived classifications for histology (at least one correct classification for each diagnosis) and for histologic-specific lesion count (0, 1, 2, and ≥ 3 lesions occurring within a single report). Agreement was calculated for each of the 3 independent validation samples.
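The agreement calculation can be sketched with Cohen's kappa as implemented in scikit-learn. The per-report indicators below (whether a report contains at least one BCC, from manual review versus the algorithm) are invented for illustration, not the study's data.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical per-report classifications: 1 = report contains at least
# one BCC, 0 = it does not. "manual" is the gold-standard review;
# "algorithm" is the Web application's output. Values are illustrative.
manual    = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
algorithm = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0]

# Cohen's kappa corrects the observed agreement for the agreement
# expected by chance given each rater's marginal frequencies.
kappa = cohen_kappa_score(manual, algorithm)
```

The same calculation extends directly to the multi-category case (eg, histologic-specific lesion counts of 0, 1, 2, and ≥ 3) by passing categorical labels instead of binary indicators.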

RESULTS

Development

The algorithm achieved high recall (> 0.9), precision, and F1 scores in the evaluation of the parameter combinations for BCC and SCC within the train/test splits. Agreement measures for diagnosing keratoacanthoma and IEC were slightly lower, with F1 scores of 0.89 and 0.86, respectively (Table 3). See Appendix Tables A1 and A2 and Appendix Figures A1 and A2 for detailed results.
TABLE 3.

Accuracy of Final Algorithm for Diagnosis Classification in Test Split of Training Data Set in Development

TABLE A1.

Test Results for Accuracy of Algorithm Prediction for Diagnosis

TABLE A2.

Test Results for Accuracy of Algorithm Prediction for Site

FIG A1.

Test results for agreement (F1 score) and discordance of diagnoses between the predicted labels (algorithm-derived classification) and true labels (actual diagnosis). Histologic names for labels are detailed in Table A1.

FIG A2.

Test results for agreement (F1 score) and discordance of site between the predicted labels (algorithm-predicted site) and true labels (actual site). Anatomic site names for labels are detailed in Table A2.


Validation

We observed high accuracy for classifying histologic subtypes of skin cancer across the 3 validation data sets. Kappa scores for validation data sets 1 and 2 were almost perfect for BCC, SCC, and keratoacanthoma (> 0.9) and were high for IEC (0.89; Table 4). However, approximately 7% of pathology reports from validation data set 2 could not be processed because of formatting irregularities.
TABLE 4.

Accuracy of Classifying at Least One Case of the Diagnosis in Each Report and Agreement Between Algorithm-Derived and Manual Review (gold standard) Sample of Reports in QSkin Study Participants and External Study Participants (STAR study)

Although agreement indices were slightly lower for validation data set 3 (the cohort of organ transplantation recipients with high incidence and multiplicity of skin cancer), kappa scores were high for BCC (0.82), SCC (0.90), and IEC (0.89). A lower sensitivity was found for BCCs in this data set (83%), largely because the application could not separate 8 BCCs diagnosed in one pathology report. Across all 3 validation data sets, accuracy of histologic-specific lesion counts was slightly lower than for histologic classification. Even so, kappa scores generally remained higher than 0.8 (Appendix Table A3).
TABLE A3.

Count of Each Diagnosis for Each Person and Agreement Between Algorithm-Derived Extraction and Manual Review (gold standard) From 3 Validation Sources

Kappa scores for site of lesion were high for validation data set 1 but lower for some sites in validation data set 2 for head and neck (0.89 v 0.78, respectively), torso (0.83 v 0.69, respectively), and limbs (0.91 v 0.74, respectively). Further agreement calculations and agreement for face-specific sites are provided in Appendix Table A4. A gold standard for site of lesion was not available for validation data set 3.
TABLE A4.

Accuracy of Classifying at Least One Keratinocyte Cancer at Each Site in a Report and Agreement Between Algorithm-Derived Extraction and Manual Review (gold standard) Sample of Reports

DISCUSSION

We developed a Web application to automatically extract diagnostic information from free-text pathology reports. The application underwent extensive validation and was found to be highly accurate for classifying diagnoses of keratinocyte cancers within a large, prospective study. Its utility among transplantation patients with complex pathology reports was slightly lower. However, it must be noted that the reports in this group frequently described > 10 lesions in a single report. In addition to overall accuracy, sensitivity and positive predictive value for BCC and SCC were particularly high, indicating high ascertainment and few false negatives.

Agreement between algorithm-derived and manually extracted information on the site of lesion was slightly lower than that observed for type of lesion. This is likely because of inconsistencies in the collection of this data item. Expert reviewers were required to allocate the site of a lesion from an extensive, but not exhaustive, list. As an example, a lesion on the lower neck or upper back region may have been entered as neck, shoulder, or upper back. Similarly, a lesion described as located on the hip could potentially be entered as being on the buttock, torso, or thigh. This inconsistency likely affected the ability of the algorithm to accurately determine site.

To the best of our knowledge, this is the only automated method for extracting diagnostic information from free-text pathology reports for keratinocyte cancers. Eide et al[9] used natural language processing to identify incident cases of keratinocyte cancers from pathology reports appropriate for registration but did not extract pathology data using these methods. The automated extraction of information from cancer histopathology reports is complex.
Free-text reporting by pathologists results in wide and complex variation in the language used to describe a diagnosis (or lack of diagnosis).[16,17] The main challenge for the automated algorithm arises from multiple lesions being described in a single pathology report. To overcome this, we developed rules in the application to separately extract information specific to each lesion and then map the components together again. Similar to Currie et al,[18] the Web application generates an alert to flag the small number of reports that failed processing.

Strengths of the study include full manual reviews of > 25,000 pathology reports, yielding a training data set of sufficient quality and size for supervised learning development. However, the application is limited in that it can assign only one diagnosis to a single lesion. For example, “squamous cell carcinoma arising in a keratoacanthoma” was classified as SCC, whereas a medical reviewer would classify this lesion as both SCC and keratoacanthoma. This occurred in approximately 1% of lesions classified by the application. For the purposes of defining skin cancer incidence in a population, we contend that the coding rules developed here are acceptable.

Unlike other attempts to automate the extraction of information from pathology reports,[16,18] we report our detailed methods and used open-source software. Thus, although the findings in this report are specific to the format and language used in pathology reports for keratinocyte cancers in the study population, the preprocessing rules can easily be adapted to suit different text formats, and the supervised learning methods could be applied to a different training data set. In conclusion, a supervised learning Web application can process large numbers of pathology reports and classify and count diagnoses of keratinocyte cancers described in free-text histopathology reports with a high degree of accuracy.
This tool was developed primarily for compiling statistical summary information in settings where such data cannot currently be recorded because of the volume and complexity of the data. Similar applications could be implemented in cancer registries and hospitals, enabling the measurement of histologic type-specific keratinocyte cancer incidence rates.
References: 16 in total (10 shown)

1.  Non-melanoma skin cancer in Australia.

Authors:  Marloes Fransen; Amalia Karahalios; Niyati Sharma; Dallas R English; Graham G Giles; Rodney D Sinclair
Journal:  Med J Aust       Date:  2012-11-19       Impact factor: 7.738

2.  Automated Extraction of Grade, Stage, and Quality Information From Transurethral Resection of Bladder Tumor Pathology Reports Using Natural Language Processing.

Authors:  Alexander P Glaser; Brian J Jordan; Jason Cohen; Anuj Desai; Philip Silberman; Joshua J Meeks
Journal:  JCO Clin Cancer Inform       Date:  2018-12

3.  Prevalence of Skin Cancer and Related Skin Tumors in High-Risk Kidney and Liver Transplant Recipients in Queensland, Australia.

Authors:  Michelle R Iannacone; Sudipta Sinnya; Nirmala Pandeya; Nikky Isbel; Scott Campbell; Jonathan Fawcett; Peter H Soyer; Lisa Ferguson; Marcia Davis; David C Whiteman; Adèle C Green
Journal:  J Invest Dermatol       Date:  2016-03-09       Impact factor: 8.551

4.  Automated classification of free-text pathology reports for registration of incident cases of cancer.

Authors:  V Jouhet; G Defossez; A Burgun; P le Beux; P Levillain; P Ingrand; V Claveau
Journal:  Methods Inf Med       Date:  2011-07-26       Impact factor: 2.176

5.  Assessing the Utility of Automatic Cancer Registry Notifications Data Extraction from Free-Text Pathology Reports.

Authors:  Anthony N Nguyen; Julie Moore; John O'Dwyer; Shoni Philpot
Journal:  AMIA Annu Symp Proc       Date:  2015-11-05

6.  Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries.

Authors:  Freddie Bray; Jacques Ferlay; Isabelle Soerjomataram; Rebecca L Siegel; Lindsey A Torre; Ahmedin Jemal
Journal:  CA Cancer J Clin       Date:  2018-09-12       Impact factor: 508.702

7.  Medicare claims data reliably identify treatments for basal cell carcinoma and squamous cell carcinoma: a prospective cohort study.

Authors:  Bridie S Thompson; Catherine M Olsen; Padmini Subramaniam; Rachel E Neale; David C Whiteman
Journal:  Aust N Z J Public Health       Date:  2015-11-11       Impact factor: 2.939

8.  Text mining of cancer-related information: review of current status and future directions.

Authors:  Irena Spasić; Jacqueline Livsey; John A Keane; Goran Nenadić
Journal:  Int J Med Inform       Date:  2014-06-24       Impact factor: 4.046

9.  The feasibility of using natural language processing to extract clinical information from breast pathology reports.

Authors:  Julliette M Buckley; Suzanne B Coopey; John Sharko; Fernanda Polubriaginof; Brian Drohan; Ahmet K Belli; Elizabeth M H Kim; Judy E Garber; Barbara L Smith; Michele A Gadd; Michelle C Specht; Constance A Roche; Thomas M Gudewicz; Kevin S Hughes
Journal:  J Pathol Inform       Date:  2012-06-30

10.  Validation of claims data algorithms to identify nonmelanoma skin cancer.

Authors:  Melody J Eide; J Mark Tuthill; Richard J Krajenta; Gordon R Jacobsen; Marc Levine; Christine C Johnson
Journal:  J Invest Dermatol       Date:  2012-04-05       Impact factor: 8.551

Cited by: 2 in total

1.  Temporal trends in the incidence rates of keratinocyte carcinomas from 1978 to 2018 in Tasmania, Australia: a population-based study.

Authors:  Bruna S Ragaini; Leigh Blizzard; Leah Newman; Brian Stokes; Tim Albion; Alison Venn
Journal:  Discov Oncol       Date:  2021-08-31

2.  Searching Full-Text Anatomic Pathology Reports Using Business Intelligence Software.

Authors:  Simone Arvisais-Anhalt; Christoph U Lehmann; Justin A Bishop; Jyoti Balani; Laurie Boutte; Marjorie Morales; Jason Y Park; Ellen Araj
Journal:  J Pathol Inform       Date:  2022-02-07
