Hong-Jun Yoon, Alina Peluso, Eric B Durbin, Xiao-Cheng Wu, Antoinette Stroup, Jennifer Doherty, Stephen Schwartz, Charles Wiggins, Linda Coyle, Lynne Penberthy.
Abstract
Objectives: The International Classification of Childhood Cancer (ICCC) supports consistent classification of the heterogeneous group of cancers that occur in the pediatric population. However, no machine learning models have been developed for ICCC classification. We previously developed deep learning-based models that extract information from cancer pathology reports according to the ICD-O-3 coding standard. In this article, we describe extending those models to perform ICCC classification.
Keywords: cancer pathology reports; information extraction; machine learning; pediatric cancer
Year: 2022 PMID: 35721398 PMCID: PMC9202570 DOI: 10.1093/jamiaopen/ooac049
Source DB: PubMed Journal: JAMIA Open ISSN: 2574-2531
ICCC (a) main and (b) subgroup codes and definitions based on the ICCC third edition
(a) Main codes
| Code | Description |
|---|---|
| 01 | Leukemias, myeloproliferative, and myelodysplastic diseases |
| 02 | Lymphomas and reticuloendothelial neoplasms |
| 03 | CNS and miscellaneous intracranial and intraspinal neoplasms |
| 04 | Neuroblastoma and other peripheral nervous cell tumors |
| 05 | Retinoblastoma |
| 06 | Renal tumors |
| 07 | Hepatic tumors |
| 08 | Malignant bone tumors |
| 09 | Soft tissue and other extraosseous sarcomas |
| 10 | Germ cell tumors, trophoblastic tumors, and neoplasms of gonads |
| 11 | Other malignant epithelial neoplasms and malignant melanomas |
| 12 | Other and unspecified malignant neoplasms |
| 999 | Not classified by SEER or in situ |
Figure 1. Number of childhood cancer pathology reports by ICCC main and subgroup codes.
Figure 2. Number of childhood cancer pathology reports by ICCC main codes and age at diagnosis.
List of cancers diagnosed between ages 20 and 39 that could be regarded as pediatric cancers per the Childhood Cancer Data Initiative (CCDI), and the number of such cases added to the training data of Model 2(b)
| Code | Description | No. of cases |
|---|---|---|
| 061 | Nephroblastoma and other nonepithelial renal tumors | 26 |
| 071 | Hepatoblastoma and mesenchymal tumors of the liver | 11 |
| 081 | Osteosarcoma | 301 |
| 082 | Chondrosarcoma | 232 |
| 083 | Ewing tumors and related sarcoma of the bone | 193 |
| 084 | Other specified malignant bone tumors | 90 |
| 085 | Unspecified malignant bone tumors | 23 |
| 091 | Rhabdomyosarcomas | 179 |
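The augmentation summarized in this table can be read as a simple filter over the report corpus: keep cases diagnosed at ages 20 to 39 whose ICCC subgroup code is one of those listed. The following is a minimal sketch under that reading, not the authors' pipeline; the record fields `age` and `iccc` and the function name are hypothetical.

```python
# ICCC subgroup codes listed in the table above.
AUGMENT_CODES = {"061", "071", "081", "082", "083", "084", "085", "091"}


def select_augmentation_cases(reports, codes=AUGMENT_CODES, min_age=20, max_age=39):
    """Return reports in the age window (inclusive) that carry a listed code."""
    return [
        r for r in reports
        if min_age <= r["age"] <= max_age and r["iccc"] in codes
    ]


# Hypothetical records; the field names "age" and "iccc" are assumptions.
reports = [
    {"id": 1, "age": 25, "iccc": "081"},  # kept: listed code, age in window
    {"id": 2, "age": 45, "iccc": "081"},  # dropped: outside the age window
    {"id": 3, "age": 30, "iccc": "011"},  # dropped: code not in the list
    {"id": 4, "age": 20, "iccc": "091"},  # kept: boundary age is inclusive
]

augmented = select_augmentation_cases(reports)
print([r["id"] for r in augmented])
```

The selected cases would then be appended to the childhood training set before fitting; how they are weighted or sampled is not stated here and is left open in the sketch.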
Figure 3. Model architecture for ICCC classification from childhood cancer pathology reports. (A) Model 1: ICD-O-3 classification then ICCC recoding. (B) Model 2: direct ICCC classification.
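The difference between the two designs in Figure 3 can be sketched in a few lines. In Model 1 a classifier predicts an ICD-O-3 code and a fixed lookup recodes it to ICCC; in Model 2 the classifier outputs the ICCC code directly. The sketch below is illustrative only: the classifiers are toy stand-ins, all function names are hypothetical, and the recode table holds just three rows (histology 9490 and 9500 to subgroup 041, and 9510 to 9514 to subgroup 050, as these ranges are commonly given for ICCC-3). The full recode has many more rows and, for several groups, also depends on primary site.

```python
# Deliberately incomplete recode table: (first histology, last histology, ICCC code).
# Assumption: these rows should be checked against the official ICCC-3 recode table.
RECODE_ROWS = [
    (9490, 9490, "041"),
    (9500, 9500, "041"),
    (9510, 9514, "050"),
]


def icd_o_3_to_iccc(histology):
    """Recode an ICD-O-3 histology code; None means 'not covered by this sketch'."""
    for low, high, code in RECODE_ROWS:
        if low <= histology <= high:
            return code
    return None


def model_1(report_text, icd_o_3_classifier):
    """Model 1: classify into ICD-O-3 first, then recode to ICCC."""
    return icd_o_3_to_iccc(icd_o_3_classifier(report_text))


def model_2(report_text, iccc_classifier):
    """Model 2: classify directly into ICCC."""
    return iccc_classifier(report_text)


# Toy stand-ins for the trained deep learning models (hypothetical).
def toy_icd_o_3_classifier(text):
    return 9510 if "retinoblastoma" in text.lower() else 9500


def toy_iccc_classifier(text):
    return "050" if "retinoblastoma" in text.lower() else "041"


report = "Findings consistent with retinoblastoma."
print(model_1(report, toy_icd_o_3_classifier))
print(model_2(report, toy_iccc_classifier))
```

One consequence of the two-stage design is that any ICD-O-3 error made by the first stage can propagate through the lookup, whereas the direct model is trained against the ICCC label itself.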
Classification performance (F1 score) for each ICCC code
| Code | 1(a) | 1(b) | 2(a) | 2(b) | # cases | UQ | # UQ |
|---|---|---|---|---|---|---|---|
| 011 | 0.94 | 0.95 | 0.97 | 0.97 | 8042 | 0.99 | 7171 |
| 012 | 0.88 | 0.91 | 0.93 | 0.93 | 1694 | 0.98 | 1450 |
| 013 | 0.90 | 0.89 | 0.94 | 0.93 | 294 | 0.98 | 248 |
| 014 | 0.62 | 0.57 | 0.71 | 0.66 | 137 | 0.92 | 61 |
| 015 | 0.33 | 0.49 | 0.76 | 0.75 | 212 | 0.92 | 127 |
| 021 | 0.94 | 0.94 | 0.96 | 0.96 | 1530 | 0.99 | 1380 |
| 022 | 0.75 | 0.80 | 0.87 | 0.88 | 1993 | 0.96 | 1498 |
| 023 | 0.84 | 0.86 | 0.90 | 0.90 | 801 | 0.98 | 641 |
| 024 | 0.96 | 0.96 | 0.98 | 0.97 | 316 | 1.00 | 299 |
| 025 | 0.07 | 0.00 | 0.15 | 0.17 | 25 | 0.25 | 14 |
| 031 | 0.89 | 0.89 | 0.93 | 0.93 | 391 | 0.99 | 333 |
| 032 | 0.88 | 0.90 | 0.92 | 0.93 | 1441 | 0.99 | 1238 |
| 033 | 0.90 | 0.90 | 0.93 | 0.92 | 803 | 0.99 | 681 |
| 034 | 0.52 | 0.58 | 0.76 | 0.76 | 278 | 0.93 | 156 |
| 035 | 0.82 | 0.85 | 0.91 | 0.92 | 881 | 0.98 | 704 |
| 036 | 0.00 | 0.24 | 0.60 | 0.61 | 36 | 0.90 | 16 |
| 041 | 0.96 | 0.97 | 0.98 | 0.98 | 1639 | 1.00 | 1558 |
| 042 | 0.64 | 0.55 | 0.73 | 0.74 | 26 | 1.00 | 12 |
| 050 | 0.92 | 0.95 | 0.99 | 0.99 | 71 | 1.00 | 68 |
| 061 | 0.97 | 0.97 | 0.98 | 0.98 | 736 | 1.00 | 694 |
| 062 | 0.87 | 0.89 | 0.94 | 0.95 | 99 | 0.99 | 87 |
| 071 | 0.96 | 0.96 | 0.98 | 0.97 | 334 | 0.99 | 315 |
| 072 | 0.89 | 0.87 | 0.92 | 0.92 | 87 | 0.98 | 73 |
| 081 | 0.96 | 0.97 | 0.98 | 0.98 | 617 | 1.00 | 581 |
| 082 | 0.80 | 0.53 | 0.81 | 0.80 | 40 | 0.93 | 35 |
| 083 | 0.79 | 0.83 | 0.87 | 0.89 | 627 | 0.97 | 433 |
| 084 | 0.73 | 0.45 | 0.76 | 0.85 | 30 | 0.98 | 20 |
| 085 | 0.16 | 0.23 | 0.38 | 0.60 | 25 | 0.82 | 10 |
| 091 | 0.93 | 0.93 | 0.96 | 0.95 | 840 | 0.99 | 764 |
| 092 | 0.71 | 0.74 | 0.83 | 0.82 | 159 | 0.97 | 102 |
| 093 | 0.83 | 0.00 | 0.29 | 0.50 | 6 | 0.00 | 1 |
| 094 | 0.72 | 0.73 | 0.83 | 0.84 | 772 | 0.97 | 508 |
| 095 | 0.57 | 0.58 | 0.77 | 0.78 | 225 | 0.96 | 128 |
| 101 | 0.70 | 0.72 | 0.84 | 0.85 | 155 | 0.99 | 105 |
| 102 | 0.77 | 0.83 | 0.86 | 0.87 | 152 | 0.98 | 101 |
| 103 | 0.94 | 0.94 | 0.96 | 0.96 | 688 | 1.00 | 619 |
| 104 | 0.56 | 0.65 | 0.80 | 0.84 | 44 | 0.96 | 24 |
| 105 | 0.44 | 0.36 | 0.80 | 0.73 | 18 | 1.00 | 6 |
| 111 | 0.64 | 0.73 | 0.80 | 0.76 | 16 | 0.77 | 8 |
| 112 | 0.99 | 0.99 | 0.99 | 0.99 | 1112 | 1.00 | 1096 |
| 113 | 0.92 | 0.89 | 0.95 | 0.93 | 46 | 0.99 | 40 |
| 114 | 0.95 | 0.89 | 0.98 | 0.97 | 427 | 0.99 | 392 |
| 115 | 0.59 | 0.59 | 0.86 | 0.77 | 23 | 1.00 | 9 |
| 116 | 0.88 | 0.92 | 0.96 | 0.96 | 715 | 1.00 | 639 |
| 121 | 0.59 | 0.58 | 0.88 | 0.82 | 75 | 0.99 | 44 |
| 122 | 0.00 | 0.09 | 0.58 | 0.54 | 20 | 1.00 | 3 |
| 999 | 0.75 | 0.72 | 0.90 | 0.89 | 508 | 0.97 | 400 |
| Micro-F1 | 0.882 | 0.896 | 0.935 | 0.936 | 29 206 | 0.987 | 24 892 |
| Macro-F1 | 0.701 | 0.719 | 0.837 | 0.843 | 29 206 | 0.935 | 24 892 |
Note: Columns 1(a), 1(b), 2(a), and 2(b) give the scores of Models 1(a), 1(b), 2(a), and 2(b), respectively. “# cases” is the number of classified cases in the data corpus. UQ gives the scores of Model 2(b) after excluding the cases on which it abstained according to softmax-based uncertainty quantification (UQ), and “# UQ” is the number of cases the UQ model still classified. The last two rows give the micro-averaged and macro-averaged F1 scores.
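To make the table's summary rows concrete, the sketch below computes per-class, micro-averaged, and macro-averaged F1 from single-label predictions, and shows one common form of softmax-based abstention: decline to predict when the largest softmax probability falls below a threshold. The threshold value of 0.9 and all function names are assumptions for illustration; the exact abstention rule behind the UQ column is not given here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_or_abstain(logits, labels, threshold=0.9):
    """Return the top label, or None when the top probability is below threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best] if probs[best] >= threshold else None


def f1_scores(y_true, y_pred):
    """Return (per_class_f1, micro_f1, macro_f1) for single-label predictions."""
    classes = sorted(set(y_true) | set(y_pred))
    tp = {c: 0 for c in classes}
    fp = {c: 0 for c in classes}
    fn = {c: 0 for c in classes}
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    per_class = {c: f1(tp[c], fp[c], fn[c]) for c in classes}
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(per_class.values()) / len(classes)
    return per_class, micro, macro


# Toy data: a frequent class predicted well and a rare class always missed.
y_true = ["011", "011", "011", "011", "025"]
y_pred = ["011", "011", "011", "011", "011"]
per_class, micro, macro = f1_scores(y_true, y_pred)
print(per_class, micro, macro)

# Abstention: a confident case is labeled, an ambiguous one is not.
labels = ["011", "012", "025"]
print(predict_or_abstain([10.0, 0.0, 0.0], labels))
print(predict_or_abstain([2.0, 1.9, 0.1], labels))
```

In the toy data the micro average (0.8) sits well above the macro average (about 0.44) because the rare class contributes an F1 of zero with equal weight in the macro average. The same effect explains why the table's Macro-F1 row is lower than its Micro-F1 row: small subgroups such as 025 or 122 pull the macro average down. Scoring only the non-abstained cases, as in the UQ column, amounts to dropping the pairs whose prediction is `None` before calling `f1_scores`.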