Marta Contreiras Silva, Patrícia Eugénio, Daniel Faria, Catia Pesquita.
Abstract
The complexity of cancer research stems from its reliance on several biomedical disciplines for relevant sources of data, many of which are complex in their own right. A holistic view of cancer, which is critical for precision medicine approaches, hinges on integrating a variety of heterogeneous data sources under a cohesive knowledge model, a role that biomedical ontologies can fill. This study reviews the application of ontologies and knowledge graphs in cancer research. In total, our review encompasses 141 published works, which we categorized under 14 hierarchical categories according to their usage of ontologies and knowledge graphs. We also review the most commonly used ontologies and newly developed ones. Our review highlights the growing traction of ontologies in biomedical research in general, and cancer research in particular. Ontologies enable data accessibility, interoperability and integration, support data analysis, and facilitate data interpretation and data mining. More recently, with the emergence of the knowledge graph paradigm, they also support the application of Artificial Intelligence methods to unlock new knowledge from a holistic view of the large volumes of heterogeneous data available.
Keywords: cancer; knowledge graph; oncology; ontologies; review; semantic technologies
Year: 2022 PMID: 35454813 PMCID: PMC9029532 DOI: 10.3390/cancers14081906
Source DB: PubMed Journal: Cancers (Basel) ISSN: 2072-6694 Impact factor: 6.575
Figure 1. Knowledge graph representing a small network that includes renal cell carcinoma, the MET gene, antineoplastic agent and protein tyrosine kinase, with instances of a Patient X and the drug Sunitinib. All concepts are derived from the class owl:Thing. Adapted from the NCIt.
Figure 2. PRISMA flowchart with the steps taken to reach the final list of articles for categorization.
Figure 3. Classification schema for the works included in this article.
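To make the knowledge graph of Figure 1 more concrete, it can be written down as a set of subject-predicate-object triples and queried by pattern matching. This is a minimal plain-Python sketch, not the NCIt encoding. Apart from `rdf:type` and `rdfs:subClassOf`, the predicate names (`hasDiagnosis`, `treatedWith`, `inhibits`, `encodes`) and the specific edges are hypothetical, chosen only to connect the entities named in the caption. They are not biological claims.

```python
# A minimal sketch of a knowledge graph as a set of (subject, predicate, object) triples.
# Entity names follow Figure 1; predicates other than rdf:type / rdfs:subClassOf
# and the specific edges are hypothetical, illustrative assumptions.
TRIPLES = {
    # Classes: every concept ultimately derives from owl:Thing.
    ("Renal_Cell_Carcinoma", "rdfs:subClassOf", "owl:Thing"),
    ("MET_Gene", "rdfs:subClassOf", "owl:Thing"),
    ("Antineoplastic_Agent", "rdfs:subClassOf", "owl:Thing"),
    ("Protein_Tyrosine_Kinase", "rdfs:subClassOf", "owl:Thing"),
    # Instances and (hypothetical) relations between them.
    ("Sunitinib", "rdf:type", "Antineoplastic_Agent"),
    ("Patient_X", "hasDiagnosis", "Renal_Cell_Carcinoma"),
    ("Patient_X", "treatedWith", "Sunitinib"),
    ("Sunitinib", "inhibits", "Protein_Tyrosine_Kinase"),
    ("MET_Gene", "encodes", "Protein_Tyrosine_Kinase"),
}


def objects(triples, subject, predicate):
    """All o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def subjects(triples, predicate, obj):
    """All s such that (s, predicate, obj) is in the graph."""
    return {s for s, p, o in triples if p == predicate and o == obj}


# Two-hop pattern query over the toy edges: drugs given to Patient X whose
# "inhibits" edge points at a class that MET_Gene is linked to via "encodes".
drugs = objects(TRIPLES, "Patient_X", "treatedWith")
met_products = objects(TRIPLES, "MET_Gene", "encodes")
relevant = {d for d in drugs if objects(TRIPLES, d, "inhibits") & met_products}
print(relevant)  # -> {'Sunitinib'}
```

The same pattern-matching idea underlies SPARQL queries over RDF stores. A real system would use an RDF library and the actual NCIt IRIs rather than plain strings.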
New ontologies.
| Ref | Objective | Ontology Name | Domain | Reused Ontologies | Language |
|---|---|---|---|---|---|
| [ | Model lung cancer for the clinical decision support application Lung Cancer Assistant | LUCADA ontology | Clinical | SNOMED-CT | OWL |
| [ | Use a hybrid approach to build a breast cancer ontology | N/A | Breast Cancer | N/A | OWL |
| [ | Describe cancer cells and capture the properties of tumorigenesis | OncoCL | Cell Lines | CL, UBERON, BTO, Pathway Ontology, PATO, CPO, SO | OWL |
| [ | Represent the project domain and link the NeoMark data to other domains | NeoMark ontology | Clinical | BFO, RO | OWL |
| [ | Cancer reclassification and drug inference | N/A | Pharmacology | N/A | N/A |
| [ | Drug target prediction | CRC ontology | Colorectal Cancer | PharmGKB | OWL |
| [ | Assist medical students and professionals in the breast cancer domain | OntoMama | Clinical | N/A | N/A |
| [ | Development of an ontology-driven survivor engagement framework for mobile apps | POCS | Social | FOAF | OWL |
| [ | Creation of TNM-O | TNM-O | Anatomical | FMA, BioTopLite 2 | OWL |
| [ | Represent obesity-related cancer (ORC) ontology to organize information and allow data querying | FOORC | Obesity Related Cancer | DOID | OWL |
| [ | Extraction of association rules from large datasets on gastric cancer patients | Gastric cancer ontology | Clinical | N/A | N/A |
| [ | Aid data integration; enable association between SE variables and health outcomes | OCRSEV | Social-Ecological Factors | BFO | OWL |
| [ | Interoperability across quantitative histopathological imaging data sets | QHIO | Imaging | OBI | OWL |
| [ | Design of a semantic model for local cancer registries | N/A | Epidemiology | SIO, OBI | OWL |
| [ | Development of ontologies for the public health domain | N/A | Public Health | N/A | OWL |
| [ | Understand cellular responses to different perturbations | LINCS-CLOview | Cell Lines | CLO | OWL |
| [ | Integrate heterogeneous datasets | OCRV | Cancer Outcomes | BFO, NCIt, TEO | OWL |
| [ | Define a specific terminological system to standardize data collection for head and neck cancer patients | ENT COBRA ontology | Clinical | N/A | N/A |
| [ | Use structured knowledge representation with concepts of treatment end points | CCTOO | Clinical | NCIt, CTCAE | OBO |
| [ | Represent the data elements identified by the synoptic worksheets of College of American Pathologists | SNOMED CT observable ontology | Clinical | SNOMED CT, LOINC | N/A |
| [ | Create a standardized hierarchic ontology of cancer treatments, mapped to standard nomenclatures | N/A | Cancer Treatments | HemOnc | OWL |
| [ | Increase interoperability between data sources to allow the creation of Big Data studies involving several treatment centers | ROS | Radiation Oncology | FMA | OWL |
| [ | Create a temporal ontology of survival outcome measures of clinical trials in oncology | TOCSOC | Clinical | EFO, CCTOO, IOBC, NCIt | OWL |
| [ | Provide an ontological representation of immunophenotyping cell types found in hematologic malignancies | CCL | Hematologic Malignancies | CL | OWL |
| [ | Semi-automatic development of CHV for breast cancer | MuEVo | Clinical | MeSH, MedDRA, SNOMEDint | SKOS |
| [ | Offer an ontology-based approach to modeling HCC tumors | OntHCC | Liver Cancer | N/A | OWL |
| [ | Support integrative data analysis in cancer outcomes research | ODVDS | Risk Factors | BFO | OWL |
| [ | Cytological tissue image analysis of cervical cancer | CCOWL | Cervical Cancer | N/A | OWL |
| [ | Standardize the terminology used in the selection and integration steps of RF variables and data sources | OD-ATTEST | Risk Factors | BFO, others in NCBO (not specified) | OWL |
| [ | Standardize data collection for non-melanoma skin cancer patients treated with brachytherapy | SKIN-COBRA ontology | Clinical | N/A | N/A |
| [ | Analyze social media data to identify information needs and emotions related to cancer | N/A | Social | LCO, BCO, GCO, SOSW | N/A |
| [ | Address the heterogeneity and diversity of different data types related to prostate cancer by establishing a standardized lifestyle ontology | PCLiON | Risk Factors | NCIt, WordNet, SNOMED CT, The Cochrane Library, FooDB, ChEBI | OWL |
| [ | Build a knowledge graph that represents causal associations between incidence of breast cancer and risk factors | RiskExplorer | Clinical | UMLS | N/A |
| [ | Facilitate the integrity and maintenance of ENCR core data set. | ENCR core-data | Epidemiology | N/A | OWL |
| [ | Minimizing vagueness in the formalization of medical knowledge | BCFO | Clinical | DO | OWL |
| [ | Predict side effects of bladder cancer treatments | N/A | Bladder Cancer | N/A | OWL |
| [ | Provide a generalizing pattern of more concise definitions to correctly classify all tumor configurations | N/A | Gastrointestinal Tumors | BioTopLite 2 | N/A |
Terminology-focused applications.
| Ref | Summary | Ontologies | Data | Tag | Cancer Type |
|---|---|---|---|---|---|
| [ | Ontology for a clinical decision support system to produce treatment recommendations | SNOMED-CT, New ontology | N/A | Database Interface | Lung |
| [ | Ontology-based querying for cancer research data | NCIt | N/A | Database Interface | Various |
| [ | Mining of genetic marker data in a journal | SNOMED-CT, HUGO | NEJM | NLP | Various |
| [ | Automatic translation of NeoMark relational database | BFO, RO, OBI, OGMS, HDO | NeoMark database | Data Integration | OSCC |
| [ | Manual identification and inference of associations between breast cancer drugs | New ontology | PharmGKB, NCI | Data Annotation | Breast |
| [ | Genome-wide functional predictions of lncRNAs | GO | Gencode, Ensembl, ENCODE project LncRNA Ontology | Data Integration | Various |
| [ | Extraction of semantic entities in eligibility criteria and annotation | UMLS | CTG | Data Integration, Database Interface, NLP | Breast |
| [ | Development of an ontology-driven survivor engagement framework for mobile apps | FOAF | N/A | Database Interface, Data Annotation | POCS |
| [ | Prediction of clinical outcomes from a graph-based approach with multi-omics and genetic data | GO | TCGA | Data Integration | Ovarian |
| [ | Development of a focused view within the DO from cancer datasets | DO | COSMIC, TCGA, ICGC, TARGET, IO, EDRN | Data Integration | Various |
| [ | Development of a platform for analysis and visualization of data | ICD10, ICD-O-3, TNM staging, SIO, OBI, OQuaRE | NCRI | Data Annotation, Database Interface | Various |
| [ | Automatic annotation of cancer hallmarks on biomedical literature | MeSH | N/A | Data Annotation, NLP | Various |
| [ | Connection of predictors with cancer survival with a use-case ontology | OCRV | FCDS 2000 U.S. census, BRFSS | Data Integration | Various |
| [ | Data integration of several databases with ontologies to enable querying of patient data | DO, UBERON | TCIA, TCGA, LIDC-IDRI, Head-Neck-PET-CT | Data Integration | Various |
| [ | Construction of OCRV based on data analysis needs | NCIt, TEO, ICD-O-3, ICD-9-CM | UF Health CCCA, FCDS, ATSDR, USCB, BRFSS, County Health Ranking & Roadmaps | Data Integration | Various |
| [ | Manual representation of semantic temporal components of CDEs | TEO | NCI, caDSR | Data Annotation | Various |
| [ | Ontology built following the MethOntology methodology [ | DICOM | University Hospital of Clermont-Ferrand | Data Annotation | HCC |
| [ | Semi-automatic development of CHV for breast cancer | INDC dictionary | N/A | NLP | Various |
| [ | KG of cancer registry data, with data analysis and visualization | New ontology | LTR | Data Integration, Database Interface | Various |
| [ | Development of an ontology to understand information needs and emotions | LCO, BCO, GCO, SOSW | N/A | NLP | Various |
| [ | KGHC is a KG constructed from publicly available clinical data | UMLS | PubMed, UpToDate, CTG, SemMedDB | Data Integration | HCC |
| [ | Functional annotation of circRNAs obtained from sequencing lung cell lines | GO | Lung cell lines sequencing data | Database Interface | Lung |
| [ | IMI is a web-based system that creates mappings from the NAACCR data dictionary to NCIt | NAACCR data dictionary, NCIt | KCR | Data Integration, Database Interface | Various |
| [ | Comparative analysis of cancer hallmark mapping strategies | GO | MSigDB, KEGG, cancer hallmark mapping schemes, TCGA | Data Integration | Various |
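Several of the terminology-focused works above annotate local data with ontology concepts or map a data dictionary onto an ontology (for example, the NAACCR-to-NCIt mapping). Below is a minimal sketch of the simplest form of this: exact matching on normalized labels and synonyms. The concept IDs (`ONT:0001` and so on) and their labels are hypothetical placeholders, not real NCIt codes. The cited tools may use different and more robust matching strategies.

```python
import re

# Hypothetical ontology fragment: concept ID -> preferred label followed by synonyms.
ONTOLOGY = {
    "ONT:0001": ["Hepatocellular Carcinoma", "HCC", "Liver Cell Carcinoma"],
    "ONT:0002": ["Lung Carcinoma", "Lung Cancer"],
    "ONT:0003": ["Breast Carcinoma", "Breast Cancer"],
}


def normalize(text):
    """Lower-case and keep only alphanumeric tokens, joined by single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


# Index from normalized label/synonym to concept ID.
# If two concepts shared a synonym, the later one would silently win here.
LABEL_INDEX = {
    normalize(label): concept_id
    for concept_id, labels in ONTOLOGY.items()
    for label in labels
}


def annotate(values):
    """Map each free-text value to a concept ID, or None when no label matches exactly."""
    return {value: LABEL_INDEX.get(normalize(value)) for value in values}


print(annotate(["LUNG  CANCER", "hcc", "Breast-Carcinoma", "melanoma"]))
```

Exact lexical matching is precise but brittle. It misses spelling variants and abbreviations that are not listed as synonyms, which is why mapping pipelines typically add fuzzy string matching or manual curation on top.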
Semantic-focused applications: reasoning with ontologies.
| Ref | Objective | Input Ontologies | Reasoner | Tag | Cancer Type |
|---|---|---|---|---|---|
| [ | Determine cancer type and stage of the patient to recommend treatments | LuCO, BCO, LCO | FaCT++ | New Knowledge Inference | Various |
| [ | Identification of new indications for existing drugs | New ontology | Automated semantic inference (Protégé) | New Knowledge Inference | Breast |
| [ | Prediction of new drug targets | New ontology | Pellet (Protégé) | New Knowledge Inference | Colorectal |
| [ | Extraction of association rules from large datasets on gastric cancer patients | GCO | Apriori algorithm | New Knowledge Inference | Gastric |
| [ | Provide a generalizing pattern of more concise definitions to correctly classify all tumor configurations | New ontology | HermiT DL (Protégé) | Error Detection | Various |
| [ | Creation of TNM-O | FMA, BioTopLite 2 | HermiT DL | Error Detection | Various |
| [ | Predict side effects of bladder cancer treatments | New ontology | Pellet (Protégé) | New Knowledge Inference + Error Detection | Bladder |
| [ | Signal rule violations when validating records against the international rules for multiple primary tumors | ICD-O-3 | FaCT++, HermiT | New Knowledge Inference + Error Detection | Multiple primary tumors |
| [ | Facilitate the integrity and maintenance of ENCR core data set | New ontology | FaCT++ (Protégé) | Error Detection | Various |
| [ | Minimizing vagueness in the formalization of medical knowledge | DO | Fuzzy DL, HermiT/Pellet (Protégé) | Error Detection | Breast |
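The reasoners listed above (FaCT++, HermiT, Pellet) implement full OWL DL reasoning. A much simpler illustration of the two tags used in this table follows. It computes the transitive closure of subclass axioms to infer an individual's implied types (new knowledge inference). It then flags individuals whose implied types fall under two disjoint classes (error detection). The class hierarchy is a hypothetical toy example, and this covers only a tiny fragment of what a DL reasoner does.

```python
# Toy ontology (hypothetical): direct subClassOf axioms, child -> set of parents.
SUBCLASS_OF = {
    "Breast_Carcinoma": {"Carcinoma"},
    "Carcinoma": {"Malignant_Neoplasm"},
    "Malignant_Neoplasm": {"Neoplasm"},
    "Fibroadenoma": {"Benign_Neoplasm"},
    "Benign_Neoplasm": {"Neoplasm"},
}

# Disjointness axioms: no individual may belong to both classes of a pair.
DISJOINT = [frozenset({"Malignant_Neoplasm", "Benign_Neoplasm"})]


def ancestors(cls):
    """cls plus all of its superclasses, following subClassOf transitively."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_types(asserted):
    """Types entailed for an individual by its asserted classes."""
    types = set()
    for cls in asserted:
        types |= ancestors(cls)
    return types


def violations(asserted):
    """Disjoint class pairs that the individual's entailed types would violate."""
    types = inferred_types(asserted)
    return [pair for pair in DISJOINT if pair <= types]


print(sorted(inferred_types({"Breast_Carcinoma"})))
# -> ['Breast_Carcinoma', 'Carcinoma', 'Malignant_Neoplasm', 'Neoplasm']
print(violations({"Breast_Carcinoma", "Fibroadenoma"}))  # one violated disjointness pair
```

A record coded with both classes is flagged because its entailed types include both `Malignant_Neoplasm` and `Benign_Neoplasm`. In a DL reasoner, the same situation surfaces as an inconsistent ontology.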
Semantic-focused applications: mining and analyzing multimodal data with ontologies.
| Ref | Objective | Method | Input Ontologies | Input Data | Tag | Cancer Type |
|---|---|---|---|---|---|---|
| [ | Mining of genetic marker data in a journal | MCVS NLP engine | SNOMED CT, HUGO | NEJM | ML | Various |
| [ | Ontology-based querying for cancer research data | Construction of an OWL Generation facility | NCIt | caGrid | ML | Various |
| [ | Represent the project domain and link the NeoMark data to other domains | Bayesian Networks, ANN, SVMs, Decision Trees, Random Forests | BFO, RO, OBI, OGMS, HDO | N/A | ML | OSCC |
| [ | Cancer reclassification and drug inference | Vazquez Bayesian clustering algorithm | N/A | HemOnc.org | ML | Various |
| [ | Ontological application in Clinical Decision Support | CBR and MAS | UML | Patient Health Records | ML | Gastric |
| [ | Prediction of new drug targets | KEGG functional PharmGKB drug annotation. Network neighborhood modeling ranking | New ontology, ATC | PharmGKB, GAD, CGC, OMIM, NCI, DrugBank, TTD | ML | Colorectal |
| [ | Design of a semantic model for local cancer registries | Ontology-driven search filters and aggregates properties of interest | ICD10, ICD-O-3, TNM staging, SIO, OBI, OQuaRE | NCRI | Filtering | Various |
| [ | Discover patterns related to the patients’ ability to perform daily living activities | AQ21—multi-task ML and data mining system | UMLS | Surveillance, Epidemiology, and End Results—Medicare HOS | ML | Various |
| [ | Automatic annotation of cancer hallmarks on biomedical literature | United Decision Tree and Random Forest | MeSH | PubMed abstracts | ML | Various |
| [ | Prediction of microRNA related to glucocorticoid resistance | Manual background literature search. Semantic searches in resulting subset | OMIT, NCRO, MeSH | PubMed | Filtering | Pediatric ALL |
| [ | Cancer-related gene prioritization | Fuzzy similarity | GO | GSEA website, TCGA, SNP4Disease | Similarity | PAC, Breast |
| [ | Predict drug synergy in cancer treatment | Stacked Restricted Boltzmann machine | GO, Ontology Fingerprints | AstraZeneca-Sanger Drug Combination Prediction Challenge, GDSC, KEGG | ML | Various |
| [ | Identification of cancer driver genes with role distinction | Neuro-symbolic deep learning on semantic knowledge representation on genetic information | CMPO, GO, MP | UniProt, MGI database, Mutational Cancer Drivers Database, CPD | ML | Nasopharyngeal, Colorectal |
| [ | Identification of relevant, non-redundant cancer gene markers from expression data | Unsupervised Multi-View Multi-Objective clustering | GO | Gene expression datasets from the authors' lab | ML | Prostate, DLBCL, FL |
| [ | Predict cervical cancer cells from cytological tissue images | DNN | New ontology | Hospital cervical cancer data, Kaggle data repository | ML | Cervical |
| [ | Complement system role inference from immunofunctionome analysis | SVMs | GO | GEO database | ML | OCCC |
| [ | Cancer detection based on gene expression data | Multilayer Perceptrons | GO | Affymetrix HG-U133Plus2 chip arrays, TCGA | ML | Various |
| [ | Tolerating missing data in breast cancer diagnosis from clinical ultrasound reports | KG embeddings | BI-RADS | Ultrasound reports | ML | Breast |
| [ | Real-time inference on a lung KG | GAT | New ontology | KEGG, Uniprot, DrugBank, TCGA | ML | Lung |
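Several works in this table rank genes by how similar their ontology annotations are, for instance the gene prioritization entry tagged Similarity. That work uses a fuzzy similarity measure. The sketch below swaps in a simpler technique. It propagates each gene's annotations to all ancestor terms (the GO "true path rule") and scores gene pairs with the Jaccard index over the resulting sets. Candidate genes are then ranked by their best score against a set of known seed genes. The term hierarchy, gene names, and annotations are hypothetical placeholders, not real GO data.

```python
# Hypothetical mini is_a hierarchy (child -> parents); not real GO terms or IDs.
IS_A = {
    "regulation_of_cell_cycle": {"cell_cycle_process"},
    "DNA_repair": {"DNA_metabolic_process"},
    "cell_cycle_process": {"biological_process"},
    "DNA_metabolic_process": {"biological_process"},
    "apoptotic_process": {"biological_process"},
}

# Hypothetical direct annotations per gene.
ANNOTATIONS = {
    "GENE_1": {"regulation_of_cell_cycle", "DNA_repair"},
    "GENE_2": {"regulation_of_cell_cycle"},
    "GENE_3": {"apoptotic_process"},
}


def closure(terms):
    """Propagate annotations upward: the terms plus all of their ancestors."""
    out, stack = set(terms), list(terms)
    while stack:
        for parent in IS_A.get(stack.pop(), ()):
            if parent not in out:
                out.add(parent)
                stack.append(parent)
    return out


def similarity(gene_a, gene_b):
    """Jaccard index of the two genes' ancestor-closed annotation sets."""
    a, b = closure(ANNOTATIONS[gene_a]), closure(ANNOTATIONS[gene_b])
    return len(a & b) / len(a | b)


def rank_candidates(seeds, candidates):
    """Score each candidate by its best similarity to any seed, highest first."""
    scores = {c: max(similarity(c, s) for s in seeds) for c in candidates}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


print(rank_candidates(["GENE_1"], ["GENE_2", "GENE_3"]))  # GENE_2 ranks above GENE_3
```

Unweighted Jaccard treats every term equally. Information-content-weighted measures such as Resnik or simGIC down-weight very general terms like the root, which would otherwise inflate the similarity between unrelated genes.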