As Ontologies Reach Maturity, Artificial Intelligence Starts Being Fully Efficient: Findings from the Section on Knowledge Representation and Management for the Yearbook 2018.

Ferdinand Dhombres^1,2, Jean Charlet^1,3.

Abstract

OBJECTIVES: To select, present, and summarize the best papers published in 2017 in the field of Knowledge Representation and Management (KRM).
METHODS: A comprehensive and standardized review of the medical informatics literature was performed to select the most interesting KRM papers published in 2017, based on a PubMed query.
RESULTS: In direct line with the research on data integration presented in the KRM section of the 2017 edition of the International Medical Informatics Association (IMIA) Yearbook, the five best papers for 2018 demonstrate even further the added-value of ontology-based integration approaches for phenotype-genotype association mining. Additionally, among the 15 preselected papers, two aspects of KRM are in the spotlight: the design of knowledge bases and new challenges in using ontologies.
CONCLUSIONS: Ontologies are demonstrating their maturity for integrating medical data and are beginning to support clinical practice. New challenges have emerged: the query on distributed semantically annotated datasets, the efficiency of semantic annotation processes, the semantic representation of large textual datasets, the control of biases associated with semantic annotations, and the computation of Bayesian indicators on data annotated with ontologies. Georg Thieme Verlag KG Stuttgart.

Year: 2018 PMID: 30157517 PMCID: PMC6115232 DOI: 10.1055/s-0038-1667078

Source DB: PubMed Journal: Yearb Med Inform ISSN: 0943-4747

Introduction

The year 2017 produced a large number of publications related to Knowledge Representation and Management (KRM) in medicine. KRM focuses on the development of techniques to be used and leveraged in other medical informatics domains. In recent years, and especially over the last two, we have observed an increasing number of works combining ontology engineering and supervised learning technologies. In this context, the nature and impact of ontologies are discussed in several papers. In this review, we present a selection of some of the best papers published in 2017 in the KRM domain, based either on their impact or on the novelty of the approach proposed in the medical knowledge representation and management field.

Paper Selection Method

We conducted the selection of KRM papers in PubMed/MEDLINE based on the same query as in the previous edition of the International Medical Informatics Association (IMIA) Yearbook [1]. We followed a generic method, commonly used in all sections of the Yearbook since 2013. As in the last four years, the search was performed on MEDLINE by querying PubMed. Our query includes MeSH descriptors related to the KRM field in the context of medical informatics, with a restriction to international peer-reviewed papers, including international conference proceedings indexed in PubMed. Only original research articles published in 2017 (from 01/01/2017 to 12/31/2017) were considered; we excluded the following publication types: reviews, editorials, comments, and letters to the editor. We limited the search to major MeSH descriptors (for example “biomedical ontologies [MAJR]”) to avoid retrieving too large a set of articles, and we complemented it with non-MeSH terms searched in the titles and abstracts of the articles (for example “terminologies [TIAB]”). The selection of best papers was performed in a three-step process among the papers retrieved by the query. In the first step, section editors reviewed all titles, abstracts, and types of publications to establish a short list of 15 candidate best papers. In the second step, five experts (including the two section editors) reviewed the candidate best papers using the IMIA Yearbook quality criteria scoring method. More specifically, the following aspects of the papers were evaluated: significance, quality of scientific content, originality and innovativeness, coverage of the related literature, and organization and quality of the presentation. The final step of the selection took place during a meeting of the whole editorial board, based on the external reviews and the report of the two section editors.
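As an illustration of the query strategy described above, the following minimal Python sketch assembles a PubMed search string from major MeSH descriptors, title/abstract terms, a publication date range, and excluded publication types. The exact query used for the Yearbook is not reproduced here: the function name `build_krm_query`, the Boolean layout, and the term lists (beyond the two examples quoted in the text) are assumptions made for illustration only.

```python
def build_krm_query(majr_terms, tiab_terms, start, end, excluded_types):
    """Assemble a PubMed query string (illustrative layout, not the Yearbook query)."""
    # Major MeSH descriptors and free-text title/abstract terms are OR-ed together.
    topic = " OR ".join(
        [f'"{t}"[MAJR]' for t in majr_terms] + [f'"{t}"[TIAB]' for t in tiab_terms]
    )
    # Publication date range, using the [DP] (date of publication) field tag.
    dates = f"{start}:{end}[DP]"
    # Publication types to exclude, using the [PT] field tag.
    excluded = " OR ".join(f'"{t}"[PT]' for t in excluded_types)
    return f"({topic}) AND ({dates}) NOT ({excluded})"


query = build_krm_query(
    majr_terms=["biomedical ontologies"],  # example quoted in the text
    tiab_terms=["terminologies"],          # example quoted in the text
    start="2017/01/01",
    end="2017/12/31",
    excluded_types=["review", "editorial", "comment", "letter"],
)
print(query)
```

The sketch only builds the string; submitting it to PubMed and handling the returned citations are left out on purpose.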

Results

The KRM query retrieved 1,998 citations from PubMed, a 41% increase over the results of the same query applied to papers published in 2016. Section editors made a first selection of 100 papers based on titles and abstracts. After a second review of this set of papers, including full-text reviews, a selection of 15 candidate best papers was established [2-16]. Five reviewers scored these 15 pre-selected papers and selected the five final best papers [2-6].

In direct line with the research on data integration presented last year [1], the best papers of 2017 demonstrate even further the added value of ontology-based integration approaches for phenotype-genotype association mining. Three papers describe systems [2, 3, 6]. The first paper presents a system that exploits semantic technologies with automated reasoning over genotype-phenotype relations to prioritize variants in whole exome and whole genome sequencing datasets [2]. The second paper describes a system that predicts the associations between human genes and phenotypes in the Human Phenotype Ontology (HPO) based on the human protein–protein interaction network [6]. The third paper presents a system that assembles high-throughput sequencing and microarray data by measuring the semantic similarity between different samples, with the aim of combining experiments associated with similar semantic annotations [3]. The fourth paper introduces a novel method to capture the hierarchical relationships between ontology classes during the learning process, thus allowing the prediction of novel associations between genes and abnormal phenotypes, with a significant reduction of computational costs [5]. In the fifth paper, the authors introduce a query federation engine that enables policy-aware access to sensitive healthcare datasets represented as Resource Description Framework (RDF) data cubes [4].
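The idea of policy-aware federation mentioned for the fifth paper can be pictured with a deliberately naive Python sketch: endpoints are first filtered by an access policy, and a SPARQL 1.1 federated query with one SERVICE block per permitted endpoint is then built over RDF Data Cube observations. This is not the algorithm of the paper; the policy model, the endpoint URLs, the user names, and both function names are hypothetical. Only the `qb:` namespace, `qb:Observation`, `qb:dataSet`, and the SERVICE keyword come from the W3C specifications.

```python
CUBE_NS = "http://purl.org/linked-data/cube#"  # RDF Data Cube Vocabulary namespace

# Hypothetical access policies: endpoint URL -> users allowed to query it.
POLICIES = {
    "https://example.org/hospital-a/sparql": {"alice", "bob"},
    "https://example.org/hospital-b/sparql": {"bob"},
}


def allowed_endpoints(user, policies):
    """Return the endpoints whose policy lists the user (assumed policy model)."""
    return [endpoint for endpoint, users in policies.items() if user in users]


def federated_query(endpoints):
    """Build one SPARQL 1.1 query with a SERVICE block per permitted endpoint."""
    if not endpoints:
        raise PermissionError("no endpoint is accessible for this user")
    blocks = [
        f"  {{ SERVICE <{ep}> {{ ?obs a qb:Observation ; qb:dataSet ?ds . }} }}"
        for ep in endpoints
    ]
    body = "\n  UNION\n".join(blocks)
    return f"PREFIX qb: <{CUBE_NS}>\nSELECT ?obs ?ds WHERE {{\n{body}\n}}"


alice_query = federated_query(allowed_endpoints("alice", POLICIES))
print(alice_query)
```

In this toy setting, the query built for `alice` never mentions the second endpoint, so the restriction is enforced before any remote call is made.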
Among the ten other selected papers for 2017, we observed two other directions in the research conducted in the KRM field, in addition to the research on ontology-based data integration for phenotype-genotype association mining. One is the design of knowledge bases and their application, a “traditional” topic in our field. The other comprises a series of emerging challenges associated with the growing use of ontologies. In the next paragraphs, we group the selected papers along these three dimensions of KRM research: the mining of genotype-phenotype associations, the design of knowledge bases, and the new challenges in using ontologies.

Ontology-based Integration Approaches for Phenotype-genotype Association Mining

Four of the five best papers present research in this specific domain [2, 3, 5, 6]. The papers are summarized in the Appendix of this synopsis. HPO was the reference for phenotyping in three papers. In the paper from Galeota and Pelizzola [3], tools based on Open Biological Ontologies (OBO) ontologies showed better results than Unified Medical Language System (UMLS)-based tools for the phenotype annotations. This might be a consequence of using UMLS version 2014AA, released before the integration of HPO into UMLS (in version 2015AB and later), and also a consequence of using topic-specific OBO ontologies. The paper of Boudellioua et al. [2] was the best rated of the selection. It should be noted that it is the one that built the most complex models, by using several ontologies and a rather complex process to integrate those ontologies into a formally valid final file. The originality of the paper of Galeota and Pelizzola [3] lies in the comparison of the annotation resources used. Although the comparison between UMLS and OBO ontologies is open to discussion, it raises the question of the viability of specifically developed resources versus initiatives such as UMLS, which seek to integrate a wide range of specific resources with long-term maintenance. From this point of view, the authors of the former article [2] chose to perform the integration of all needed resources themselves. It is important to note that the article by Notaro et al. [5] and the one by Petegrosso et al. [6] relate to very similar subjects, which lead the authors to develop learning algorithms that take into account the hierarchical structure of ontologies, be it HPO or Gene Ontology (GO). Overall, these four papers confirm a fair maturity in available resources for the operational semantic description of phenotypes. In the same direction, the paper from Alonso-Calvo et al. [7] addresses the integration of genomic and clinical data in European data centers, by developing a semantic interoperability layer which uses standard terminologies and standards for information management (e.g. Health Level Seven (HL7), RDB to RDF Mapping Language (R2RML)).
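To make the notion of hierarchy-aware semantic similarity more concrete, here is a minimal Python sketch on a toy ontology. It propagates each annotation to all of its ancestors and compares two annotated samples with a Jaccard index over the resulting term sets. This is one simple, commonly used measure chosen for illustration, and not the measure used in any of the papers discussed above; the term identifiers and the `PARENTS` structure are hypothetical.

```python
# Toy ontology as a DAG: term -> set of direct parents (hypothetical identifiers).
PARENTS = {
    "T:root": set(),
    "T:heart": {"T:root"},
    "T:lung": {"T:root"},
    "T:valve": {"T:heart"},
    "T:septum": {"T:heart"},
}


def ancestors(term, parents=PARENTS):
    """Return the term together with all of its ancestors in the DAG."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def closure(annotations, parents=PARENTS):
    """Propagate annotations upward: each term implies all of its ancestors."""
    result = set()
    for term in annotations:
        result |= ancestors(term, parents)
    return result


def jaccard_similarity(a, b, parents=PARENTS):
    """Jaccard index between the ancestor closures of two annotation sets."""
    ca, cb = closure(a, parents), closure(b, parents)
    union = ca | cb
    return len(ca & cb) / len(union) if union else 0.0


print(jaccard_similarity({"T:valve"}, {"T:septum"}))  # siblings under T:heart
print(jaccard_similarity({"T:valve"}, {"T:lung"}))    # only the root is shared
```

Because the two sibling terms share both `T:heart` and `T:root`, they score higher than the pair that shares only the root, which is the behavior a hierarchy-aware measure is expected to show.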

Ontologies in Action: Knowledge Base Design and Applications

Zhang et al. [16] introduced an ontology-based framework to integrate patient data, medical domain knowledge, and patient assessment criteria for chronic disease patient follow-up assessments. This framework was instantiated using real clinical data (115K follow-up assessment records of 36K type 2 diabetic patients) and resulted in a clinical decision support system (CDSS) for the automatic selection and adaptation of standard assessment protocols to suit patients' personal conditions. The system demonstrated strong performance (accuracy of 99.93%, completeness of 95.00%), thus contributing to the improvement of the accessibility, the efficiency, and the quality of patient follow-up services. Since this approach to knowledge sharing and reuse for patient-centered chronic disease management is generic, this work paves the way for the development of CDSS for the care of patients with other chronic diseases. In the same vein of leveraging semantic representation to support clinical practices, Esteban-Gil et al. [10] presented a semantic interoperability platform with a real focus on usability for clinicians. The platform design is detailed with relevant choices of Semantic Web techniques supporting interactively manipulatable decision support (and visualizations) derived from cancer registry data. As of today, this promising platform has been assessed only on simulation data and awaits clinical evaluation. These two systems demonstrated the efficacy of using semantic representations of data to develop CDSS. In 2017 however, it appears that these systems, some of them being fully functional, are still at early stages of deployment, with poor usability in clinical practice.

In contrast, the development of knowledge bases and ontological resources has become usual in the KRM medical community. Four years after the initial publication of the Protein Ontology (PRO), Natale et al. [13] presented the recent developments of this OBO resource. This group developed a standardized description of proteoforms, based on a specific syntax, that enforces the interoperability with Reactome (a biological pathway reference resource) and consequently with Reactome-related resources (Open Targets, Chemical Entities of Biological Interest (ChEBI), and UniProt). This significant evolution enhanced the coverage of PRO up to ~60% of Reactome proteoforms. Interoperability with other resources was also implemented via new pipelines, including dynamic generation of terms. Among many evolutions, the PRO team has addressed community needs and provided a Web Ontology Language (OWL) version and a SPARQL Protocol and RDF Query Language (SPARQL) endpoint. The Immune Epitope Database (IEDB) project is another example of a knowledge base. It was presented by Vita et al. [15], illustrating the practical consequences of good practices in controlled vocabulary integration, with simplified curation processes and efficient interoperability between the IEDB and other resources. Similarly, Gipson and his group [11] developed a terminology for pediatric adverse events (PAEs). Although without significant novelty from the perspective of research in terminology design, this paper is a good illustration of an international collaboration for achieving a shared PAE terminology with an appropriate integration within the Medical Dictionary for Regulatory Activities (MedDRA) and other UMLS terminologies.

New Challenges in Using Ontologies

With the wide adoption of ontologies in KRM processes, new technical and methodological challenges have arisen. Two technical challenges were addressed in 2017: the need for efficient access and query on distributed semantically annotated datasets, and the need for efficient semantic annotation processes. Three methodological challenges were identified: semantic representation of large textual datasets, potential biases associated with semantic annotations, and computation of Bayesian indicators on data annotated with ontologies.

In one of the five best papers selected this year, Khan et al. [4] addressed the first technical challenge with SAFE, a SPARQL-federated query system for RDF data cubes with access control. Prior research exists on SPARQL-federated query systems and on access control for SPARQL query engines, and the RDF Data Cube Vocabulary already exists as a World Wide Web Consortium (W3C) recommendation for describing data cubes as RDF. However, the authors successfully integrated these three elements in SAFE and compared its performance against existing SPARQL-federated query systems, showing a clear advantage over them. Beyond the raw performance demonstrated in this work, no actual experiment in medicine has yet proven its usefulness, although good results are expected. The second technical challenge is semantic annotation, and Cuzzola et al. [9] gave a good overview of existing approaches. Although at the frontier with the clinical Natural Language Processing (NLP) selection of papers for this edition of the Yearbook, this work is also relevant to the KRM community through its discussion and methods. Semantic annotation is crucial for many KRM processes, and the presented annotator (RysannMD) exhibited very promising results (precision, recall, F1 measure, and processing time), as demonstrated in a benchmarking experiment with other modern annotators (cTAKES, MetaMap, NOBLE Coder, and Neji). One key feature of this tool is an efficient semantic disambiguation that relies on the UMLS Semantic Network®. Additionally, this tool is immediately applicable.

Among the methodological challenges, Shi et al. [14] established a novel approach to integrate textual medical knowledge. With a specific model and NLP techniques, they converted medical texts into conceptual graphs and pruned meaningless inferences with an experimental algorithm. Although proper experiments on real datasets (electronic health records) were not presented in the paper, this approach represents a significant contribution for semantic processes over large medical corpora. Kulmanov et al. [12] investigated the impact of the annotation size on a large number of measures of semantic similarity. This has become of major interest given the growing number of methods relying on similarity measures, in particular in the field of omics data semantic analysis. They concluded that most measures were sensitive to the number of annotations per entity, to the difference in annotation sizes among compared entities, and to the concepts’ depth in the ontology. However, this work does not discuss the negative impact of these biases, nor does it present methods to control them. In any case, further work on the potential solutions that could be used to quantify and control the effect of annotation size on similarity measures would be of major interest to the KRM community. The last methodological challenge was addressed by Barton et al. [8], who provided an elegant theoretical basis for the use of ontologies for Bayesian indicator calculation, accounting for the granularity represented in these ontologies (i.e. the “spectrum effect”). This work introduced a meaningful method to derive the usual Bayesian indicators of performance (i.e. sensitivity, specificity, positive predictive value, and negative predictive value), which are mandatory indicators in medical research, when data are annotated with ontologies.
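For readers less familiar with the four indicators named above, the following minimal Python sketch computes them from the counts of a 2x2 confusion matrix, using their standard textbook definitions. It does not model ontology granularity and is not the method of Barton et al.; the function name and the example counts are hypothetical.

```python
def diagnostic_indicators(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV, and NPV from confusion-matrix counts."""

    def ratio(numerator, denominator):
        # An indicator is undefined when its denominator is zero.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # true positives among diseased
        "specificity": ratio(tn, tn + fp),  # true negatives among non-diseased
        "ppv": ratio(tp, tp + fp),          # true positives among positive tests
        "npv": ratio(tn, tn + fn),          # true negatives among negative tests
    }


# Hypothetical counts, for illustration only.
result = diagnostic_indicators(tp=40, fp=10, fn=10, tn=140)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

When data are annotated with an ontology, the counts themselves depend on the level of granularity at which two annotations are considered to match, which is the question the cited work addresses.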

Conclusions

After the refinement in ontology design and the numerous initiatives for integrating ontologies into knowledge-based systems observed in 2016, significant results followed in 2017. The first major advances were in the field of genetics, where ontologies appeared fully instrumental to phenotype-genotype association mining, mainly supported by semantic similarity measurements. The “routine” work in ontology and knowledge base design remains a significant part of KRM research, albeit with very high methodological quality. The growing use of ontologies has led to identifying new challenges for 2018 KRM research: the query on distributed semantically annotated datasets, the efficiency of semantic annotation processes, the semantic representation of large textual datasets, the control of biases associated with semantic annotations, and the computation of Bayesian indicators on data annotated with ontologies.

Table 1

Best paper selection of articles for the IMIA Yearbook of Medical Informatics 2018 in the section 'Knowledge Representation and Management'. The articles are listed in alphabetical order of the first author's surname.

Section: Knowledge Representation and Management
▪ Boudellioua I, Mahamad Razali RB, Kulmanov M, Hashish Y, Bajic VB, Goncalves-Serra E, Schoenmakers N, Gkoutos GV, Schofield PN, Hoehndorf R. Semantic prioritization of novel causative genomic variants. PLoS Comput Biol 2017;13(4):e1005500.
▪ Galeota E, Pelizzola M. Ontology-based annotations and semantic relations in large-scale (epi)genomics data. Brief Bioinform 2017;18(3):403-12.
▪ Khan Y, Saleem M, Mehdi M, Hogan A, Mehmood Q, Rebholz-Schuhmann D, Sahay R. SAFE: SPARQL Federation over RDF Data Cubes with Access Control. J Biomed Semantics 2017;8(1):5.
▪ Notaro M, Schubach M, Robinson PN, Valentini G. Prediction of Human Phenotype Ontology terms by means of hierarchical ensemble methods. BMC Bioinformatics 2017;18(1):449.
▪ Petegrosso R, Park S, Hwang TH, Kuang R. Transfer learning across ontologies for phenome-genome association prediction. Bioinformatics 2017;33(4):529-36.
