Marie-Pier Gauthier, Jennifer H Law, Lisa W Le, Janice J N Li, Sajda Zahir, Sharon Nirmalakumar, Mike Sung, Christopher Pettengell, Steven Aviv, Ryan Chu, Adrian Sacher, Geoffrey Liu, Penelope Bradbury, Frances A Shepherd, Natasha B Leighl.
Abstract
Introduction: Real-world evidence is important in regulatory and funding decisions. Manual data extraction from electronic health records (EHRs) is time-consuming and challenging to maintain. Automated extraction using natural language processing (NLP) and artificial intelligence may facilitate this process. Although NLP is faster than manual extraction, the validity of the extracted data remains in question. The current study compared manual and automated data extraction from the EHRs of patients with advanced lung cancer.
Keywords: Artificial intelligence; Health records; Natural language processing; Real-world data; Real-world evidence; Validation
Year: 2022; PMID: 35719866; PMCID: PMC9201015; DOI: 10.1016/j.jtocrr.2022.100340
Source DB: PubMed; Journal: JTO Clin Res Rep; ISSN: 2666-3643
Figure 1. Study population. NLP, natural language processing; PM, Princess Margaret Cancer Centre.
Metastatic Sites of Disease
| Site of Metastasis | Positive Cases | NLP Accuracy (%) | Manual Accuracy (%) | Concordance (%) |
|---|---|---|---|---|
| Abdominal | 15 | 88.0 | 86.0 | 74.0 |
| Adrenal | 18 | 96.0 | 77.0 | 73.0 |
| Brain | 29 | 99.0 | 71.0 | 70.0 |
| Bone | 52 | 95.0 | 81.0 | 76.0 |
| Liver | 23 | 96.0 | 95.0 | 91.0 |
| Lung | 55 | 71.0 | 87.0 | 58.0 |
| Lymph | 57 | 66.0 | 92.0 | 58.0 |
| Pericardium | 3 | 99.0 | 100 | 99.0 |
| Renal | 3 | 99.0 | 99.0 | 98.0 |
| Spleen | 4 | 99.0 | 97.0 | 96.0 |
NLP, natural language processing.
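The excerpt does not spell out how accuracy and concordance were computed. As a minimal sketch, assuming accuracy means agreement between one extraction method and a reference standard, and concordance means agreement between the NLP and manual extractions, both could be calculated as simple percent agreement. The function name and the five-patient toy data below are hypothetical and are not taken from the study.

```python
from typing import Sequence


def percent_agreement(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Percentage of positions at which two equal-length label lists agree."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical toy data (NOT from the study): does each of 5 patients
# have a metastasis at a given site?
reference = [True, True, False, False, True]  # assumed reference standard
nlp = [True, False, False, False, True]       # automated extraction
manual = [True, True, False, True, True]      # manual extraction

# Assumed definitions: accuracy compares a method with the reference,
# concordance compares the two methods with each other.
nlp_accuracy = percent_agreement(nlp, reference)
manual_accuracy = percent_agreement(manual, reference)
concordance = percent_agreement(nlp, manual)

print(nlp_accuracy, manual_accuracy, concordance)  # 80.0 80.0 60.0
```

In this toy case each method makes one error on a different patient, so the two methods disagree on two patients and concordance (60%) falls below either accuracy (80%).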
Figure 2. Concordance between NLP data extraction and manual data extraction results. (A) Demographics and disease characteristics, (B) Metastatic sites, (C) Biomarker testing performed, (D) Biomarker status, (E) Systemic therapy type (first line), and (F) Systemic therapy type (any line). The dashed lines indicate % concordance between NLP and manual data extraction results. NLP, natural language processing; PD-L1, programmed death-ligand 1.
Demographics and Disease Characteristics
| Characteristic | Number of Cases | NLP Accuracy (%) | Manual Accuracy (%) | Concordance (%) |
|---|---|---|---|---|
| Age at diagnosis | 100 | 100 | 99.0 | 99.0 |
| Sex | | 100 | 100 | 100 |
| Male | 54 | |||
| Female | 46 | |||
| Date of diagnosis (±30 d) | 100 | 94.0 | 83.0 | 77.0 |
| ECOG PS at diagnosis | | 93.0 | 78.0 | 71.0 |
| 0 | 16 | |||
| 1 | 54 | |||
| 2 | 14 | |||
| 3 | 13 | |||
| 4 | 1 | |||
| Unknown | 2 | |||
| Smoking status | | 88.0 | 94.0 | 82.0 |
| Nonsmoker | 35 | |||
| Former smoker | 34 | |||
| Smoker | 31 | |||
| Histologic subtype | | 98.0 | 98.0 | 96.0 |
| Adenocarcinoma | 66 | |||
| Large cell | 4 | |||
| Non–small cell | 3 | |||
| Small cell | 21 | |||
| Squamous | 6 | |||
| First-line treatment | ||||
| Chemotherapy | 59 | 95.0 | 96.0 | 92.0 |
| Immunotherapy | 6 | 99.0 | 100 | 99.0 |
| Targeted therapy | 36 | 99.0 | 99.0 | 98.0 |
| Treatment (any line) | ||||
| Chemotherapy | 69 | 94.0 | 94.0 | 88.0 |
| Immunotherapy | 12 | 98.0 | 98.0 | 96.0 |
| Targeted therapy | 40 | 99.0 | 84.0 | 83.0 |
ECOG, Eastern Cooperative Oncology Group; NLP, natural language processing; PS, performance status.
One patient received combination therapy as first-line treatment.
Figure 3. Sensitivity and specificity of (A) biomarker status results and (B) metastasis site results. PD-L1, programmed death-ligand 1.
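For readers who want to reproduce metrics of the kind named in the Figure 3 caption, the sketch below uses the standard definitions: sensitivity is the share of reference-positive cases that the extraction also flags as positive, and specificity is the share of reference-negative cases that it flags as negative. The function name and toy data are hypothetical, and the study's own analysis code is not shown in this excerpt.

```python
from typing import Optional, Sequence, Tuple


def sensitivity_specificity(
    predicted: Sequence[bool], reference: Sequence[bool]
) -> Tuple[Optional[float], Optional[float]]:
    """Return (sensitivity %, specificity %) of predicted vs. reference labels.

    A metric is None when its denominator is zero (no reference positives
    or no reference negatives).
    """
    if len(predicted) != len(reference):
        raise ValueError("inputs must be of equal length")
    pairs = list(zip(predicted, reference))
    tp = sum(p and r for p, r in pairs)              # true positives
    fn = sum((not p) and r for p, r in pairs)        # false negatives
    tn = sum((not p) and (not r) for p, r in pairs)  # true negatives
    fp = sum(p and (not r) for p, r in pairs)        # false positives
    sensitivity = 100.0 * tp / (tp + fn) if (tp + fn) else None
    specificity = 100.0 * tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


# Hypothetical toy data (NOT from the study): 3 reference-positive and
# 5 reference-negative patients for one metastatic site.
reference = [True, True, True, False, False, False, False, False]
predicted = [True, True, False, False, False, False, False, True]

sens, spec = sensitivity_specificity(predicted, reference)
print(round(sens, 1), round(spec, 1))  # 66.7 80.0
```

Sites with very few positive cases, such as the pericardium, renal, and spleen rows in the metastasis table, make sensitivity estimates unstable, because a single missed case moves the percentage a long way.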
Biomarker Testing and Results
| Biomarker | Number Tested | Testing Performed: NLP Accuracy (%) | Testing Performed: Manual Accuracy (%) | Testing Performed: Concordance (%) | Positive Cases | Results Captured: NLP Accuracy (%) | Results Captured: Manual Accuracy (%) | Results Captured: Concordance (%) |
|---|---|---|---|---|---|---|---|---|
| ALK | 71 | 99.0 | 97.0 | 96.0 | 8 | 98.6 | 98.6 | 97.2 |
| BRAF | 19 | 99.0 | 98.0 | 97.0 | 1 | 94.7 | 100 | 94.7 |
| EGFR | 72 | 98.0 | 98.0 | 96.0 | 29 | 100 | 98.6 | 98.6 |
| KRAS | 19 | 99.0 | 98.0 | 97.0 | 3 | 94.7 | 94.7 | 89.5 |
| PD-L1 | 29 | 98.0 | 100 | 98.0 | 20 | 86.2 | 100 | 86.2 |
| ROS1 | 4 | 98.0 | 100 | 98.0 | 1 | 100 | 100 | 100 |
NLP, natural language processing; PD-L1, programmed death-ligand 1.
Biomarker results (positive cases, accuracy, and concordance) are reported out of the corresponding number of patients tested.
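The footnote above explains why the biomarker result percentages carry decimals: their denominator is the number of patients tested for each biomarker rather than the full cohort. As a rough consistency check, the sketch below back-calculates the number of correct cases implied by a few table rows. The helper name is hypothetical, and the implied counts are derived here, not reported in the excerpt.

```python
def implied_correct(percent: float, n_tested: int) -> int:
    """Nearest whole number of cases implied by a percentage of n_tested."""
    return round(percent * n_tested / 100)


# (biomarker, number tested, NLP accuracy % for captured results),
# copied from the Biomarker Testing and Results table.
rows = [("ALK", 71, 98.6), ("BRAF", 19, 94.7), ("KRAS", 19, 94.7), ("PD-L1", 29, 86.2)]

for name, n_tested, pct in rows:
    k = implied_correct(pct, n_tested)
    # Recompute the percentage from the implied count to confirm it
    # rounds back to the tabulated value.
    print(f"{name}: {k}/{n_tested} = {round(100 * k / n_tested, 1)}%")
```

Each recomputed percentage matches the tabulated value, which is consistent with the per-biomarker denominator the footnote describes (for example, 70 of 71 ALK results gives 98.6%).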