Yiqing Zhao, Anastasios Dimou, Feichen Shen, Nansu Zong, Jaime I Davila, Hongfang Liu, Chen Wang.
Abstract
BACKGROUND: Next-generation sequencing provides comprehensive information about individuals' genetic makeup and is commonplace in precision oncology practice. Because individual patients' disease conditions and treatment journeys are heterogeneous, targeted therapies are not always initiated even when actionable mutations are present. To better understand and support the clinical decision-making process in precision oncology, there is a need to examine real-world associations between patients' genetic information and treatment choices.
Keywords: Electronic health records; Precision oncology; Real-world evidence; Resource description framework
Year: 2022 PMID: 35907849 PMCID: PMC9338627 DOI: 10.1186/s12920-022-01314-9
Source DB: PubMed Journal: BMC Med Genomics ISSN: 1755-8794 Impact factor: 3.622
Fig. 1 Workflow of RDF representation of real-world precision oncology data. (1) Data retrieval: ‘patient’, ‘gene’, ‘variant’, ‘drug’, and ‘disease’ information was retrieved from multiple data sources. (2) Data normalization: raw data retrieved from these sources were mapped to standardized terminologies such as UMLS. (3) Data integration: data were integrated using a schema from the Genetic Testing Ontology. (4) RDF generation: PO2RDF was generated using D2RQ
Data retrieval sources
| | Gene | Variant | Disease | Drug |
|---|---|---|---|---|
| Genetic reports | Y | Y | Y | |
| UDP | Y | Y | ||
| Clinical notes | Y |
Description of data properties and related object properties
| Class | Data property | Related object property |
|---|---|---|
| Patient | Patient_ID, Date_of_Birth, Race, Ethnicity, Sex, Death | HasMutGene, HasVariant, HasDisease, TreatedBy |
| Gene | Gene_Name, UMLS_CUI, OMIM_ID, CIViC_Gene_ID, OncoKB_Gene_ID, PharmGKB_Gene_ID | AssociatedWithGene, AssociatedWithVariant, MayTargetedBy |
| Variant | Var_Name, UMLS_CUI, ClinVar_ID, dbSNP_ID, CIViC_Var_ID, OncoKB_Var_ID | AssociatedWithVariant |
| Disease | Disease_Name, UMLS_CUI, OMIM_ID, CIViC_DOID, OncoKB_Disease_ID, PharmGKB_Disease_ID, Stage_At_Diagnosis | AssociatedWithGene, MayTreatedBy, HasContraindicationWith |
| Drug | Drug_Name, Brand_Name, Drug_Category, UMLS_CUI, NUI (NDF-RT Unique Identifier), CIViC_Drug_ID, OncoKB_Drug_ID, PharmGKB_Drug_ID | MayTreatedBy, HasContraindicationWith, MayTargetedBy |
Fig. 2 Example of RDF representation of two patients’ data (purple square: ‘Patient’, red circle: ‘Gene’, orange circle: ‘Drug’, blue circle: ‘Disease’). Patient 1 was diagnosed with lung adenocarcinoma, had variants in the EGFR, TP53, and CHEK2 genes, and was prescribed Osimertinib after receiving the genetic report. Patient 2 was diagnosed with melanoma, had variants in the EGFR, TP53, DNMT3A, CDKN2A/B, and RAF1 genes, and was prescribed Pembrolizumab after receiving the genetic report
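The two patients in Fig. 2 can be written out as subject-predicate-object triples to make the graph structure concrete. The sketch below is only an illustration: the patient, gene, disease, and drug values come from the caption and the predicate names from the object-property table, but the plain-tuple layout and the identifiers `Patient1` and `Patient2` are assumptions, not the PO2RDF serialization.

```python
# Values are taken from the Fig. 2 caption; the tuple layout and the
# "Patient1"/"Patient2" identifiers are illustrative assumptions.
PATIENT_GENES = {
    "Patient1": ["EGFR", "TP53", "CHEK2"],
    "Patient2": ["EGFR", "TP53", "DNMT3A", "CDKN2A/B", "RAF1"],
}
PATIENT_DISEASE = {"Patient1": "Lung adenocarcinoma", "Patient2": "Melanoma"}
PATIENT_DRUG = {"Patient1": "Osimertinib", "Patient2": "Pembrolizumab"}


def build_triples():
    """Return (subject, predicate, object) tuples for both patients."""
    triples = []
    for patient, genes in PATIENT_GENES.items():
        for gene in genes:
            triples.append((patient, "HasMutGene", gene))
        triples.append((patient, "HasDisease", PATIENT_DISEASE[patient]))
        triples.append((patient, "TreatedBy", PATIENT_DRUG[patient]))
    return triples


def objects(triples, subject, predicate):
    """Collect every object linked to `subject` through `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}


triples = build_triples()
shared_genes = objects(triples, "Patient1", "HasMutGene") & objects(
    triples, "Patient2", "HasMutGene"
)
print(sorted(shared_genes))  # ['EGFR', 'TP53']
```

The shared gene nodes are what connect the two patient subgraphs in the figure, even though the diagnoses and prescribed drugs differ.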
Fig. 3 Distribution of major cancer types in the institutional oncology cohort (N = 2593)
Cohort demographic distribution
| Characteristic | Cohort (n = 2593) |
|---|---|
| Average age at initial diagnosis at Mayo Clinic (years) | 58 |
| Average age at first test (years) | 62 |
| Sex (% female) | 51.4% |
| Race (% white) | 88.7% |
| Ethnicity (% Hispanic) | 3.5% |
Statistical results for data collection
| | Total number of occurrences | Total number of UMLS-identifiable occurrences | Unique concepts | Unique UMLS-identifiable concepts |
|---|---|---|---|---|
| Gene | 17,100 | 17,018 (99.5%) | 417 | 415 |
| Variant | 16,196 | 3,158 (19.5%) | 5,497 | 285 |
| Disease | 109,030 | 107,106 (98.2%) | 8,449 | 8,102 |
| Drug | 249,995 | 249,853 (99.9%) | 389 | 368 |
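The percentages in the table are the share of occurrences that could be mapped to UMLS. As a quick arithmetic check, the sketch below recomputes them from the counts in the table and applies the same ratio to the unique-concept columns; the unique-concept percentages are derived here and are not reported in the source. The function and variable names are illustrative.

```python
# Counts copied from the table: (total, UMLS-identifiable).
OCCURRENCES = {
    "Gene": (17100, 17018),
    "Variant": (16196, 3158),
    "Disease": (109030, 107106),
    "Drug": (249995, 249853),
}
UNIQUE_CONCEPTS = {
    "Gene": (417, 415),
    "Variant": (5497, 285),
    "Disease": (8449, 8102),
    "Drug": (389, 368),
}


def coverage_percent(total, identifiable):
    """Percentage of items that are UMLS-identifiable, to one decimal."""
    return round(100 * identifiable / total, 1)


occurrence_coverage = {k: coverage_percent(*v) for k, v in OCCURRENCES.items()}
concept_coverage = {k: coverage_percent(*v) for k, v in UNIQUE_CONCEPTS.items()}

for entity in OCCURRENCES:
    print(entity, occurrence_coverage[entity], concept_coverage[entity])
```

The variant row stands out in both columns: most variant mentions and most distinct variants have no UMLS identifier, in contrast to genes, diseases, and drugs.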
SPARQL query to extract EGFR related information
| SPARQL query | Results |
|---|---|
| SELECT distinct ?Gene ?property ?hasValue WHERE { ?Gene a po2rdf:Gene. FILTER regex(str(?Gene), "EGFR") ?Gene ?property ?hasValue } | Gene_Name: EGFR; UMLS_CUI: C1414313; OMIM_ID: 131550; CIViC_Gene_ID: 1956; OncoKB_Gene_ID: 2; PharmGKB_Gene_ID: PA7360. Disease_Name: 1. Lung cancer, 2. Colorectal cancer, 3. Melanoma, 4. Esophagus adenocarcinoma, 5. Glioma. Drugs_Name: 1. Gefitinib, 2. Osimertinib, 3. Afatinib, 4. Erlotinib, 5. Dacomitinib. Patient_ID: 3, 15, 21, 44, 65, 73… |
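The query selects every resource typed as `po2rdf:Gene` whose identifier matches "EGFR" and returns all of its property-value pairs. The sketch below mimics that matching logic over a small in-memory list of triples using only the standard library. It is not a SPARQL engine, and the IRI layout (`po2rdf:Gene/EGFR`) and the TP53 and Osimertinib rows are assumptions for illustration; the EGFR identifier values are copied from the results column.

```python
import re

# EGFR values come from the results column; the subject IRIs and the
# TP53 and Osimertinib rows are hypothetical.
TRIPLES = [
    ("po2rdf:Gene/EGFR", "rdf:type", "po2rdf:Gene"),
    ("po2rdf:Gene/EGFR", "Gene_Name", "EGFR"),
    ("po2rdf:Gene/EGFR", "UMLS_CUI", "C1414313"),
    ("po2rdf:Gene/EGFR", "OMIM_ID", "131550"),
    ("po2rdf:Gene/TP53", "rdf:type", "po2rdf:Gene"),
    ("po2rdf:Gene/TP53", "Gene_Name", "TP53"),
    ("po2rdf:Drug/Osimertinib", "rdf:type", "po2rdf:Drug"),
]


def describe_genes(triples, pattern):
    """Return all triples whose subject is a Gene matching `pattern`."""
    # Step 1: "?Gene a po2rdf:Gene" plus the regex FILTER on the subject.
    genes = {
        s
        for s, p, o in triples
        if p == "rdf:type" and o == "po2rdf:Gene" and re.search(pattern, s)
    }
    # Step 2: "?Gene ?property ?hasValue" for every matching gene.
    return sorted({(s, p, o) for s, p, o in triples if s in genes})


rows = describe_genes(TRIPLES, "EGFR")
for subject, prop, value in rows:
    print(subject, prop, value)
```

As in the original query, the type triple itself is part of the result, because the second pattern matches every property of the selected gene.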
Fig. 4 Visualization of RDF representation of real-world associations among “Gene” (purple), “Disease” (green), and “Drug” (orange) in precision oncology. Node size represents the degree of each node (number of unique co-occurring nodes). Edge thickness represents the weight of each edge (total count of co-occurrences)
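The two visual encodings in Fig. 4 correspond to two simple counts over co-occurrence records: degree counts distinct neighbors, while edge weight counts repeated co-occurrences. A minimal sketch of both follows, assuming each record is a (gene, drug) pair; the records are hypothetical and do not reflect the cohort's data.

```python
from collections import Counter, defaultdict

# Hypothetical co-occurrence records, one per observed (gene, drug) pair.
RECORDS = [
    ("EGFR", "Osimertinib"),
    ("EGFR", "Osimertinib"),
    ("EGFR", "Gefitinib"),
    ("TP53", "Pembrolizumab"),
]


def graph_stats(records):
    """Return (degree per node, weight per edge) for an undirected graph."""
    # Edge weight: total number of times the same pair was observed.
    weights = Counter(records)
    # Degree: number of distinct neighbors, regardless of repetition.
    neighbors = defaultdict(set)
    for a, b in records:
        neighbors[a].add(b)
        neighbors[b].add(a)
    degree = {node: len(adjacent) for node, adjacent in neighbors.items()}
    return degree, weights


degree, weights = graph_stats(RECORDS)
print(degree["EGFR"])                    # 2
print(weights[("EGFR", "Osimertinib")])  # 2
```

A node can therefore be large but have thin edges (many distinct partners, each seen rarely), or small with one thick edge (a single partner seen often).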
Fig. 5 Association rule analysis results (confidence) for associations between drugs and (a) EGFR and (b) ALK mutations
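Confidence in association rule analysis is the fraction of records containing the antecedent that also contain the consequent, i.e. confidence(A → B) = support(A and B) / support(A). The sketch below computes it for a mutation-to-drug rule over hypothetical per-patient item sets; the transactions are invented for illustration, and the rule direction used in the figure is not stated in this excerpt, so mutation → drug is an assumption.

```python
# Hypothetical per-patient item sets (mutated genes plus prescribed drugs).
TRANSACTIONS = [
    {"EGFR", "Osimertinib"},
    {"EGFR", "Gefitinib"},
    {"EGFR", "TP53", "Osimertinib"},
    {"TP53", "Pembrolizumab"},
    {"EGFR"},
]


def confidence(transactions, antecedent, consequent):
    """confidence(antecedent -> consequent) = P(consequent | antecedent)."""
    with_antecedent = [t for t in transactions if antecedent in t]
    if not with_antecedent:
        return 0.0  # rule is undefined without support; report 0 here
    hits = sum(consequent in t for t in with_antecedent)
    return hits / len(with_antecedent)


egfr_osimertinib = confidence(TRANSACTIONS, "EGFR", "Osimertinib")
print(egfr_osimertinib)  # 0.5
```

Here four of the five hypothetical patients carry an EGFR mutation and two of those four received Osimertinib, giving a confidence of 0.5. The patient with an EGFR mutation and no drug lowers the value, which is the kind of untreated actionable mutation the abstract refers to.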
Fig. 6 Survival analysis for lung cancer patients (a) with or (b) without TP53 mutation