
Ontologies and Knowledge Graphs in Oncology Research.

Marta Contreiras Silva¹, Patrícia Eugénio¹, Daniel Faria¹, Catia Pesquita¹.

Abstract

The complexity of cancer research stems from its reliance on several biomedical disciplines for relevant sources of data, many of which are complex in their own right. A holistic view of cancer, which is critical for precision medicine approaches, hinges on integrating a variety of heterogeneous data sources under a cohesive knowledge model, a role that biomedical ontologies can fill. This study reviews the application of ontologies and knowledge graphs in cancer research. In total, our review encompasses 141 published works, which we categorized under 14 hierarchical categories according to their usage of ontologies and knowledge graphs. We also review the most commonly used ontologies and newly developed ones. Our review highlights the growing traction of ontologies in biomedical research in general, and cancer research in particular. Ontologies enable data accessibility, interoperability and integration, support data analysis, and facilitate data interpretation and data mining. More recently, with the emergence of the knowledge graph paradigm, they also support the application of Artificial Intelligence methods to unlock new knowledge from a holistic view of the large volumes of heterogeneous data available.

Entities: Chemical

Keywords: cancer; knowledge graph; oncology; ontologies; review; semantic technologies

Year: 2022 PMID: 35454813 PMCID: PMC9029532 DOI: 10.3390/cancers14081906

Source DB: PubMed Journal: Cancers (Basel) ISSN: 2072-6694 Impact factor: 6.575

1. Introduction

Understanding complex phenomena that cannot be modeled purely mathematically is a challenge common to all biomedical research. Ultimately, it all boils down to the complex interplay between genes and environment, which manifests in the interactions between the cells in an organism, between host and pathogen, and between drug and body. From its genesis, medicine has focused on understanding the phenomena that can be generalized across individuals, dating back to the first texts on anatomy by the Ancient Egyptians. Indeed, nomenclature and classification are the first steps towards understanding complex phenomena, and are inextricable from modern medicine, which relies on its precise terminology and its compendium of pathogens, diseases, symptoms, genes and mutations, and drugs and therapies, as well as of the relationships between them. Over the last three decades, the rise of the digital age and the subsequent computerization of clinical records and biomedical research drove the encoding of terminologies, classification schemes and knowledge models into digital machine-readable formats (often captured under the umbrella term ‘ontology’) to promote standardization, support information systems, and enable knowledge discovery. One of the first major efforts to this effect in the biomedical domain was the compilation of the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) [1] to support the standardization and interoperability of clinical information systems and electronic health records. Another major effort was the classification and cross-species standardization of gene functional characteristics under the Gene Ontology (GO) [2]. In the footsteps of these efforts, several hundred other ontologies have been developed for the biomedical domain over the years [3], among which we must note the National Cancer Institute Thesaurus (NCIt), a compendium of terminology spanning all aspects of cancer research and health care [4].
More recently, medicine has been witnessing a shift towards the particular, enabled by the decreasing costs of acquiring genetic information, and driven by the understanding that tailored treatments that take into account the genetic makeup of the patient will likely be more effective and less prone to harmful side effects. Cancer is the family of diseases that benefits the most from these precision (or personalized) medicine approaches because, despite commonalities, each cancer is genetically unique and can react very differently to different types of treatment. Moreover, understanding the fine differences between cancer cells and healthy cells can be the key to more successful and less aggressive treatments. Yet the precision medicine paradigm places additional emphasis on having a holistic understanding of the gene–environment interplay in all its manifestations, which requires the integrative analysis of large volumes of heterogeneous data that are individually already complex (e.g., clinical records, medical imaging, transcriptomic data, immunopeptidomic data) [5]. Here too, ontologies have been playing an important role in enabling data integration and facilitating data analysis. In this article, we review the applications of ontologies in cancer research over the past decade, summarizing published works within this time frame, and categorizing them with respect to their usage of ontologies. Section 2 details core concepts underlying this review article, Section 3 outlines the methodology adopted to conduct the review, Section 4 summarizes both the ontologies reused in the works and the ones created for them, Section 5 reviews and categorizes the aforementioned published works, and Section 6 presents our perspective on the present and future use of ontologies in cancer research.

2. Background

2.1. Ontologies

The term “ontology” was borrowed from philosophy by computer science to signify a machine-readable formalization of a conceptualization pertaining to a particular domain of knowledge [6]. That is to say, an ontology is a digital artifact that can be interpreted by both humans and computers and which encodes the terminology and the semantic relations between concepts in a given domain. The term “ontology” is often used with some latitude, also encompassing thesauri [7]. While our review of published works adopts the same encompassing perspective, it is important to make a formal distinction between ontologies proper and thesauri due to their different purposes and applications. Ontologies proper are typically encoded in the Web Ontology Language (OWL), developed by the W3C OWL Working Group [8], which includes various serializations, namely the Open Biomedical Ontologies (OBO) format or the more popular Resource Description Framework (RDF) format, in which statements are expressed as triples of the form (subject, predicate, object).
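
As an illustration of how RDF-style statements can be represented as (subject, predicate, object) triples, the following is a minimal sketch in plain Python. It does not use an RDF library, and every identifier in it (the `ex:` prefix, the class names, the `ex:hasPrimarySite` relation, and the `superclasses` helper) is a hypothetical placeholder rather than a term from NCIt, DO, or any other ontology discussed in this review.

```python
from typing import NamedTuple, Set


class Triple(NamedTuple):
    """One statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical identifiers, for illustration only (not real ontology terms).
TRIPLES: Set[Triple] = {
    Triple("ex:LungAdenocarcinoma", "rdfs:subClassOf", "ex:LungCarcinoma"),
    Triple("ex:LungCarcinoma", "rdfs:subClassOf", "ex:LungCancer"),
    Triple("ex:LungCancer", "rdfs:subClassOf", "ex:Cancer"),
    Triple("ex:LungCancer", "ex:hasPrimarySite", "ex:Lung"),
}


def superclasses(term: str, triples: Set[Triple]) -> Set[str]:
    """Collect every ancestor of `term` by following rdfs:subClassOf links."""
    found: Set[str] = set()
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for t in triples:
            if (
                t.subject == current
                and t.predicate == "rdfs:subClassOf"
                and t.obj not in found
            ):
                found.add(t.obj)
                frontier.append(t.obj)
    return found


if __name__ == "__main__":
    print(sorted(superclasses("ex:LungAdenocarcinoma", TRIPLES)))
```

Because each statement is an independent triple, statements from different sources can be pooled into one set and traversed together, which is one reason the triple form lends itself to data integration. A dedicated RDF library and an OWL reasoner would be the usual choice in practice; the loop above only follows explicit subclass links and performs no logical inference.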

Ref	Objective	Ontology Name	Domain	Reused Ontologies	Language
[51]	Model lung cancer for the clinical decision support application Lung Cancer Assistant	LUCADA ontology	Clinical	SNOMED-CT	OWL
[33]	Use a hybrid approach to build a breast cancer ontology	N/A	Breast Cancer	N/A	OWL
[32]	Describe cancer cells and capture the properties of tumorigenesis	OncoCL	Cell Lines	CL, UBERON, BTO, Pathway Ontology, PATO, CPO, SO	OWL
[11]	Represent the project domain and link the NeoMark data to other domains	NeoMark ontology	Clinical	BFO, RO	OWL
[50]	Cancer reclassification and drug inference	N/A	Pharmacology	N/A	N/A
[54]	Drug target prediction	CRC ontology	Colorectal Cancer	PharmGKB	OWL
[63]	Assist medical students and professionals in the breast cancer domain	OntoMama	Clinical	N/A	N/A
[34]	Development of an ontology-driven survivor engagement framework for mobile apps	POCS	Social	FOAF	OWL
[46]	Creation of TNM-O	TNM-O	Anatomical	FMA, BioTopLite 2	OWL
[41]	Represent obesity-related cancer (ORC) ontology to organize information and allow data querying	FOORC	Obesity Related Cancer	DOID	OWL
[49]	Extraction of association rules from large datasets on gastric cancer patients	Gastric cancer ontology	Clinical	N/A	N/A
[55]	Aid data integration; enable association between SE variables and health outcomes	OCRSEV	Social-Ecological Factors	BFO	OWL
[45]	Interoperability across quantitative histopathological imaging data sets	QHIO	Imaging	OBI	OWL
[39]	Design of a semantic model for local cancer registries	N/A	Epidemiology	SIO, OBI	OWL
[40]	Development of ontologies for the public health domain	N/A	Public Health	N/A	OWL
[61]	Understand cellular responses to different perturbations	LINCS-CLOview	Cell Lines	CLO	OWL
[53]	Integrate heterogeneous datasets	OCRV	Cancer Outcomes	BFO, NCIt, TEO	OWL
[47]	Define a specific terminological system to standardized data collection for head and neck cancer patients	ENT COBRA ontology	Clinical	N/A	N/A
[10]	Use structured knowledge representation with concepts of treatment end points	CCTOO	Clinical	NCIt, CTCAE	OBO
[62]	Represent the data elements identified by the synoptic worksheets of College of American Pathologists	SNOMED CT observable ontology	Clinical	SNOMED CT, LOINC	N/A
[35]	Create a standardized hierarchic ontology of cancer treatments, mapped to standard nomenclatures	N/A	Cancer Treatments	HemOnc	OWL
[56]	Increase interoperability between data sources to allow the creation of Big Data studies involving several treatment centers	ROS	Radiation Oncology	FMA	OWL
[36]	Create temporal ontology of survival outcome measures of clinical trials in oncology	TOCSOC	Clinical	EFO, CCTOO, IOBC, NCIt	OWL
[60]	Provide an ontological representation of immunophenotyping cell types found in hematologic malignancies	CCL	Hematologic Malignancies	CL	OWL
[42]	Semi-automatic development of CHV for breast cancer	MuEVo	Clinical	MeSH, MedDRA, SNOMEDint	SKOS
[44]	Offer ontology-based approach modeling HCC tumors	OntHCC	Liver Cancer	N/A	OWL
[57]	Support integrative data analysis in cancer outcomes research	ODVDS	Risk Factors	BFO	OWL
[58]	Cytological tissue image analysis of cervical cancer	CCOWL	Cervical Cancer	N/A	OWL
[31]	Standardize the terminology used in the selection and integration steps of RF variables and data sources	OD-ATTEST	Risk Factors	BFO, others in NCBO (not specified)	OWL
[48]	Standardize data collection for non-melanoma skin cancer patients treated with brachytherapy	SKIN-COBRA ontology	Clinical	N/A	N/A
[43]	Analyze social media data to identify information needs and emotions related to cancer	N/A	Social	LCO, BCO, GCO, SOSW	N/A
[37]	Solve the heterogeneity and diversity of different data types related to prostate cancer by establishing a standardized lifestyle ontology	PCLiON	Risk Factors	NCIt, WordNet, SNOMED CT, The Cochrane Library, FooDB, ChEBI	OWL
[59]	Build a knowledge graph that represents causal associations between incidence of breast cancer and risk factors	RiskExplorer	Clinical	UMLS	N/A
[30]	Facilitate the integrity and maintenance of ENCR core data set	ENCR core-data	Epidemiology	N/A	OWL
[14]	Minimizing vagueness in the formalization of medical knowledge	BCFO	Clinical	DO	OWL
[52]	Predict side effects of bladder cancer treatments	N/A	Bladder Cancer	N/A	OWL
[38]	Provide a generalizing pattern of more concise definitions to correctly classify all tumor configurations	N/A	Gastrointestinal Tumors	BioTopLite2	N/A

Ref	Summary	Ontologies	Data	Tag	Cancer Type
[51]	Ontology for a clinical decision support system to produce treatment recommendations	SNOMED-CT, New ontology	N/A	Database Interface	Lung
[74]	Ontology-based querying for cancer research data	NCIt	N/A	Database Interface	Various
[77]	Mining of genetic marker data in a journal	SNOMED-CT, HUGO	NEJM	NLP	Various
[11]	Automatic translation of NeoMark relational database	BFO, RO, OBI, OGMS, HDO	NeoMark database	Data Integration	OSCC
[15]	Manual identification and inference of associations between breast cancer drugs	New ontology	PharmGKB, NCI	Data Annotation	Breast
[65]	Genome-wide functional predictions of lncRNAs	GO	Gencode, Ensembl, ENCODE project LncRNA Ontology	Data Integration	Various
[66]	Extraction of semantic entities in eligibility criteria and annotation	UMLS	CTG	Data Integration, Database Interface, NLP	Breast
[34]	Development of an ontology-driven survivor engagement framework for mobile apps	FOAF	N/A	Database Interface, Data Annotation	POCS
[67]	Prediction of clinical outcomes from a graph-based approach with multi-omics and genetic data	GO	TCGA	Data Integration	Ovarian
[68]	Development of a focused view within the DO from cancer datasets	DO	COSMIC, TCGA, ICGC, TARGET, IO, EDRN	Data Integration	Various
[39]	Development of a platform for analysis and visualization of data	ICD10, ICD-O-3, TNM staging, SIO, OBI, OQuaRE	NCRI	Data Annotation, Database Interface	Various
[13]	Automatic annotation of cancer hallmarks on biomedical literature	MeSH	N/A	Data Annotation, NLP	Various
[70]	Connection of predictors with cancer survival with a use-case ontology	OCRV	FCDS 2000 U.S. census, BRFSS	Data Integration	Various
[69]	Data integration of several databases with ontologies to enable querying of patient data	DO, UBERON	TCIA, TCGA, LIDC-IDRI, Head-Neck-PET-CT	Data Integration	Various
[78]	Construction of OCRV based on data analysis needs	NCIt, TEO, ICD-O-3, ICD-9-CM	UF Health CCCA, FCDS, ATSDR, USCB, BRFSS, County Health Ranking & Roadmaps	Data Integration	Various
[64]	Manual representation of semantic temporal components of CDEs	TEO	NCI, caDSR	Data Annotation	Various
[44]	Ontology built following the MethOntology methodology [79]	DICOM	University Hospital of Clermont-Ferrand	Data Annotation	HCC
[42]	Semi-automatic development of CHV for breast cancer	INDC dictionary	N/A	NLP	Various
[71]	KG of cancer registry data, with data analysis and visualization	New ontology	LTR	Data Integration, Database Interface	Various
[43]	Development of an ontology to understand information needs and emotions	LCO, BCO, GCO, SOSW	N/A	NLP	Various
[72]	KGHC is a KG constructed from clinical data available publicly	UMLS	PubMed, UpToDate, CTG, SemMedDB	Data Integration	HCC
[76]	Functional annotation of circRNAs obtained from sequencing lung cell lines	GO	Lung cell lines sequencing data	Database Interface	Lung
[12]	IMI is a web-based system that creates mappings from the NAACCR data dictionary to NCIt	NAACCR data dictionary, NCIt	KCR	Data Integration, Database Interface	Various
[73]	Comparative analysis of cancer hallmark mapping strategies	GO	MSigDB, KEGG, cancer hallmark mapping schemes, TCGA	Data Integration	Various

Ref	Objective	Input Ontologies	Reasoner	Tag	Cancer Type
[81]	Determine cancer type and stage of the patient to recommend treatments	LuCO, BCO, LCO	FaCT++	New Knowledge Inference	Various
[15]	Identification of new indications for existing drugs	New ontology	Automated semantic inference (Protégé)	New Knowledge Inference	Breast
[82]	Prediction of new drug targets	New ontology	Pellet (Protégé)	New Knowledge Inference	Colorectal
[49]	Extraction of association rules from large datasets on gastric cancer patients	GCO	Apriori algorithm	New Knowledge Inference	Gastric
[38]	Provide a generalizing pattern of more concise definitions to correctly classify all tumor configurations	New ontology	HermiT DL (Protégé)	Error Detection	Various
[46]	Creation of TNM-O	FMA, BioTopLite 2	HermiT DL	Error Detection	Various
[52]	Predict side effects of bladder cancer treatments	New ontology	Pellet (Protégé)	New Knowledge Inference + Error Detection	Bladder
[83]	Signal rule violations in a validation process of multiple primary tumors international rules	ICD-O-3	FaCT++, HermiT	New Knowledge Inference + Error Detection	Multiple primary tumors
[30]	Facilitate the integrity and maintenance of ENCR core data set	New ontology	FaCT++ (Protégé)	Error Detection	Various
[14]	Minimizing vagueness in the formalization of medical knowledge	DO	Fuzzy DL, HermiT/Pellet (Protégé)	Error Detection	Breast

Ref	Objective	Method	Input Ontologies	Input Data	Tag	Cancer Type
[77]	Mining of genetic marker data in a journal	MCVS NLP engine	SNOMED CT, HUGO	NEJM	ML	Various
[74]	Ontology-based querying for cancer research data	Construction of an OWL generation facility	NCIt	caGrid	ML	Various
[11]	Represent the project domain and link the NeoMark data to other domains	Bayesian Networks, ANN, SVMs, Decision Trees, Random Forests	BFO, RO, OBI, OGMS, HDO	N/A	ML	OSCC
[50]	Cancer reclassification and drug inference	Vazquez Bayesian clustering algorithm	N/A	HemOnc.org	ML	Various
[19]	Ontological application in Clinical Decision Support	CBR and MAS	UML	Patient Health Records	ML	Gastric
[82]	Prediction of new drug targets	KEGG functional PharmGKB drug annotation. Network neighborhood modeling ranking	New ontology, ATC	PharmGKB, GAD, CGC, OMIM, NCI, DrugBank, TTD	ML	Colorectal
[39]	Design of a semantic model for local cancer registries	Ontology-driven search filters and aggregates properties of interest	ICD10, ICD-O-3, TNM staging, SIO, OBI, OQuaRE	NCRI	Filtering	Various
[90]	Discover patterns related to the patients’ ability to perform daily living activities	AQ21—multi-task ML and data mining system	UMLS	Surveillance, Epidemiology, and End Results—Medicare HOS	ML	Various
[13]	Automatic annotation of cancer hallmarks on biomedical literature	United Decision Tree and Random Forest	MeSH	PubMed abstracts	ML	Various
[85]	Prediction of microRNA related to glucocorticoid resistance	Manual background literature search. Semantic searches in resulting subset	OMIT, NCRO, MeSH	PubMed	Filtering	Pediatric ALL
[17]	Cancer-related gene prioritization	Fuzzy similarity	GO	GSEA website, TCGA, SNP4Disease	Similarity	PAC, Breast
[161]	Predict drug synergy in cancer treatment	Stacked Restricted Boltzmann machine	GO, Ontology Fingerprints	AstraZeneca-Sanger Drug Combination Prediction Challenge, GDSC, KEGG	ML	Various
[18]	Identification of cancer driver genes with role distinction	Neuro-symbolic deep learning on semantic knowledge representation on genetic information	CMPO, GO, MP	Uniprot, MGI database, Mutational Cancer Drivers Database, CPD	ML	Naso-pharyngeal, Colorectal
[87]	Identification of relevant, expression data non-redundant cancer gene markers	Unsupervised Multi-View Multi-Objective clustering	GO	Gene expression datasets from own lab	ML	Prostate, DLBCL, FL
[58]	Predict cervical cancer cells from cytological tissue images	DNN	New ontology	Hospital cervical cancer data, Kaggle data repository	ML	Cervical
[88]	Complement system role inference from immunofunctionome analysis	SVMs	GO	GEO database	ML	OCCC
[89]	Cancer detection based on gene expression data	Multilayer Perceptrons	GO	Affymetrix HG-U133Plus2 chip arrays, TCGA	ML	Various
[91]	Tolerating missing data in breast cancer diagnosis from clinical ultrasound reports	KG embeddings	BI-RADS	Ultrasound reports	ML	Breast
[92]	Real-time inference on a lung KG	GAT	New ontology	KEGG, Uniprot, DrugBank, TCGA	ML	Lung

1. Introduction

2. Background

2.1. Ontologies

2.2. Ontologies in Cancer Research

3. Materials and Methods

3.1. Initial Search and Screening

3.2. Categorization

4. Ontologies in Oncology

4.1. Ontologies Used in the Reviewed Applications

4.2. Ontologies Created for the Reviewed Applications

5. Ontologies and Knowledge Graph Applications in Cancer Research

5.1. Terminology-Focused Applications

5.1.1. Data Annotation

5.1.2. Data Integration

5.1.3. Database Interfaces

5.1.4. Natural Language Processing

5.2. Semantic-Focused Applications

5.2.1. Formalized Definitions and Axioms: Reasoning with Ontologies

5.2.2. Mining and Analyzing Multimodal Data with Ontologies

6. Conclusions
