Carlos R Oliveira, Patrick Niccolai, Anette Michelle Ortiz, Sangini S Sheth, Eugene D Shapiro, Linda M Niccolai, Cynthia A Brandt.
Abstract
BACKGROUND: Accurate identification of new diagnoses of human papillomavirus-associated cancers and precancers is an important step toward the development of strategies that optimize the use of human papillomavirus vaccines. The diagnosis of human papillomavirus cancers hinges on a histopathologic report, which is typically stored in electronic medical records as free-form, or unstructured, narrative text. Previous efforts to perform surveillance for human papillomavirus cancers have relied on the manual review of pathology reports to extract diagnostic information, a process that is both labor- and resource-intensive. Natural language processing can be used to automate the structuring and extraction of clinical data from unstructured narrative text in medical records and may provide a practical and effective method for identifying patients with vaccine-preventable human papillomavirus disease for surveillance and research.
Keywords: HPV; accuracy; anal cancer; automated data extraction; cancer; cervical cancer; human papillomavirus; natural language processing; pathology reporting; precancer; surveillance
Year: 2020; PMID: 32469840; PMCID: PMC7671846; DOI: 10.2196/20826
Source DB: PubMed; Journal: JMIR Med Inform
Figure 1. Diagrammatic representation of the classification process for pathology reports (colored indicates abnormal pathology). AIN: anal intraepithelial lesion; AIS: adenocarcinoma in situ; ASC-US: atypical squamous cells of undetermined significance; ASC-H: atypical squamous cells—cannot exclude high-grade squamous intraepithelial lesion; CIN: cervical intraepithelial lesion; HSIL: high-grade squamous intraepithelial lesion; LSIL: low-grade squamous intraepithelial lesion; NIEL: negative for intraepithelial lesion; SCC: squamous cell carcinoma.
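To make the idea of classifying free-text pathology reports concrete, the following is a minimal keyword-matching sketch built only from the category names listed in the Figure 1 caption. It is not the NLP algorithm evaluated in the study; the rule list, its ordering, and the names `RULES`, `classify_report`, and `is_abnormal` are assumptions made for illustration.

```python
import re

# Hypothetical keyword rules (an assumption, not the study's rule set).
# Order matters: the first matching rule wins, so the "cannot exclude
# high-grade" rule (ASC-H) is checked before the plain HSIL rule.
RULES = [
    ("NIEL", r"negative for intraepithelial lesion|\bNIEL\b"),
    ("ASC-H", r"cannot exclude high[- ]grade|\bASC-H\b"),
    ("SCC", r"squamous cell carcinoma|\bSCC\b"),
    ("AIS", r"adenocarcinoma in situ|\bAIS\b"),
    ("HSIL", r"high[- ]grade squamous intraepithelial lesion|\bHSIL\b"),
    ("LSIL", r"low[- ]grade squamous intraepithelial lesion|\bLSIL\b"),
    ("ASC-US", r"undetermined significance|\bASC-US\b"),
]
COMPILED = [(label, re.compile(pattern, re.IGNORECASE)) for label, pattern in RULES]


def classify_report(text):
    """Return the label of the first rule that matches, else 'UNCLASSIFIED'."""
    for label, pattern in COMPILED:
        if pattern.search(text):
            return label
    return "UNCLASSIFIED"


def is_abnormal(label):
    """Treat every recognized category other than NIEL as abnormal."""
    return label not in ("NIEL", "UNCLASSIFIED")


if __name__ == "__main__":
    sample = "Atypical squamous cells, cannot exclude high-grade squamous intraepithelial lesion."
    label = classify_report(sample)
    print(label, is_abnormal(label))
```

A production system would also need to handle negation, multiple findings in one report, and specimen site, none of which this sketch attempts.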
Summary of results from the manual review of the validation set.
| Test | Cervical (n=769), n (%) | Anal (n=180), n (%) | Total (N=949), n |
| Cytology | 403 (81.1) | 94 (18.9) | 497 |
| Negative for intraepithelial lesion | 255 (84.4) | 47 (15.6) | 302 |
| Atypical squamous cells of undetermined significance | 44 (68.8) | 20 (31.3) | 64 |
| Atypical squamous cells—cannot exclude high-grade squamous intraepithelial lesion | 57 (98.3) | 1 (1.7) | 58 |
| Low-grade squamous intraepithelial lesion | 16 (84.2) | 3 (15.8) | 19 |
| Glandular abnormality | 14 (82.4) | 3 (17.6) | 17 |
| Unsatisfactory specimen | 17 (45.9) | 20 (54.1) | 37 |
| HPVa test performed | 206 (85.8) | 34 (14.2) | 240 |
| Positive | 91 (84.3) | 17 (15.7) | 108 |
| Histology | 366 (81.0) | 86 (19.0) | 452 |
| Benign | 153 (77.3) | 45 (22.7) | 198 |
| Squamous intraepithelial lesion grade 1 | 138 (84.1) | 26 (15.9) | 164 |
| Squamous intraepithelial lesion grade 2+ | 75 (83.3) | 15 (16.7) | 90 |
aHPV: human papillomavirus.
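The percentages in this table are row percentages: each one is the site's share of the row total, not a share of the column totals (n=769 or n=180). The short check below recomputes a few of them from the counts in the table. The half-up rounding mode is an assumption (the source does not state how values were rounded), and `row_percent` and `ROWS` are illustrative names.

```python
from decimal import Decimal, ROUND_HALF_UP


def row_percent(count, row_total):
    """Percentage of a row total, rounded half-up to one decimal place."""
    pct = Decimal(100 * count) / Decimal(row_total)
    return pct.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


# (label, cervical count, anal count) copied from the table above.
ROWS = [
    ("Cytology", 403, 94),
    ("Atypical squamous cells of undetermined significance", 44, 20),
    ("Histology", 366, 86),
]

for label, cervical, anal in ROWS:
    total = cervical + anal
    print(label, total, row_percent(cervical, total), row_percent(anal, total))
```

The cytology and histology row totals (497 and 452) sum to the overall N of 949, and the cervical and anal counts in those two rows sum to 769 and 180, respectively.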
Performance of the NLP algorithm on the validation set (N=949).
| Variable | Precision (95% CI) | Recall (95% CI) | F-measure (95% CI) | Accuracy (95% CI) |
| Cervical | 0.98 (0.95-0.99) | 1.00 (0.97-1.00) | 0.99 (0.98-1.00) | 0.99 (0.98-1.00) |
| Anal | 0.93 (0.76-0.99) | 1.00 (0.86-1.00) | 0.96 (0.91-1.00) | 0.98 (0.93-0.99) |
| Positive | 0.95 (0.89-0.98) | 1.00 (0.97-1.00) | 0.97 (0.95-0.99) | 0.99 (0.98-1.00) |
| CINc grade 2+ | 0.89 (0.80-0.95) | 0.93 (0.85-0.98) | 0.91 (0.86-0.96) | 0.96 (0.94-0.98) |
| AINd grade 2+ | 0.87 (0.59-0.98) | 0.68 (0.43-0.87) | 0.76 (0.61-0.92) | 0.91 (0.82-0.96) |
| Abnormal test | 0.94 (0.91-0.97) | 0.96 (0.92-0.98) | 0.94 (0.93-0.97) | 0.97 (0.96-0.98) |
aAbnormalities include atypical squamous cells of undetermined significance, atypical squamous cells—cannot exclude high-grade squamous intraepithelial lesion, low-grade squamous intraepithelial lesion, and glandular cell abnormalities.
bHPV: human papillomavirus.
cCIN: cervical intraepithelial lesion.
dAIN: anal intraepithelial lesion.
eIncludes results from both cytology and histology.
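For readers unfamiliar with the metrics in the performance table, the sketch below shows how precision, recall, F-measure, and accuracy are conventionally derived from a 2x2 confusion matrix, with manual review as the reference standard. The counts are hypothetical (the source does not report the underlying confusion matrices), the F-measure is assumed to be the balanced F1 score, and confidence intervals are not computed because the interval method is not stated.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard metrics from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives relative to the reference standard.
    """
    precision = tp / (tp + fp)   # share of flagged reports that are truly positive
    recall = tp / (tp + fn)      # share of truly positive reports that were flagged
    f_measure = 2 * precision * recall / (precision + recall)  # F1 (assumed)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Hypothetical counts for illustration only; not taken from the study.
metrics = classification_metrics(tp=70, fp=9, fn=5, tn=282)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Because recall depends only on the truly positive reports, a category with few positives (such as the anal histology rows) yields wide confidence intervals even when the point estimate is high.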