
HPO2Vec+: Leveraging heterogeneous knowledge resources to enrich node embeddings for the Human Phenotype Ontology.

Feichen Shen, Suyuan Peng, Yadan Fan, Andrew Wen, Sijia Liu, Yanshan Wang, Liwei Wang, Hongfang Liu.

Abstract

BACKGROUND: In precision medicine, deep phenotyping is defined as the precise and comprehensive analysis of phenotypic abnormalities, with the aim of better understanding the natural history of a disease and its genotype-phenotype associations. Detecting phenotypic relevance is an important task when translating precision medicine into clinical practice, especially for patient stratification based on deep phenotyping. In our previous work, we developed node embeddings for the Human Phenotype Ontology (HPO) that support phenotypic relevance measurement with distributed semantic representations. However, those HPO embeddings encode only IS-A relationships among nodes, which limits how fully the graph can be explored.
METHODS: In this study, we developed a framework, HPO2Vec+, that enriches HPO embeddings with heterogeneous knowledge resources (i.e., DECIPHER, OMIM, and Orphanet) for detecting phenotypic relevance. Specifically, we parsed the disease-phenotype associations contained in these three resources to add non-inheritance relationships among phenotypic nodes in the HPO. To generate node embeddings, node2vec was applied to each enriched HPO graph: random walks sampled sequences of nodes, and feature learning over the sampled nodes produced the enriched node embeddings. Four HPO embeddings were generated from different graph structures, hereafter labeled HPOEmb-Original, HPOEmb-DECIPHER, HPOEmb-OMIM, and HPOEmb-Orphanet. We evaluated the derived embeddings quantitatively through an HPO link prediction task with four edge embedding operations and six machine learning algorithms. The best-performing embeddings were then evaluated for patient stratification of 10 rare diseases using electronic health records (EHR) collected at Mayo Clinic. We assessed the framework qualitatively by visualizing phenotypic clusters and by conducting a use case study on primary hyperoxaluria (PH), a rare disease, on the task of inferring relevant phenotypes given 22 annotated PH-related phenotypes.
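The enrichment and node-sampling steps can be sketched in Python. This is a minimal sketch under stated assumptions, not the HPO2Vec+ implementation: the abstract does not say how disease-phenotype associations become graph edges, so the hypothetical `build_enriched_graph` assumes that two phenotypes annotated to the same disease are linked by a non-inheritance edge. `biased_walk` follows node2vec's second-order random walk, in which the return parameter `p` and the in-out parameter `q` reweight the next step; the values used below are arbitrary, not the study's settings. All function names and the toy IDs (`P1`..`P4`, `D1`) are illustrative.

```python
import random
from collections import defaultdict
from itertools import combinations


def build_enriched_graph(is_a_edges, disease_to_phenotypes):
    """Return an undirected adjacency map over phenotype IDs."""
    adj = defaultdict(set)
    # IS-A edges from the ontology, stored in both directions.
    for child, parent in is_a_edges:
        adj[child].add(parent)
        adj[parent].add(child)
    # Assumption (hypothetical, not stated in the abstract): phenotypes
    # annotated to the same disease receive a non-inheritance edge.
    for phenotypes in disease_to_phenotypes.values():
        for a, b in combinations(sorted(set(phenotypes)), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def biased_walk(adj, start, length, p, q, rng):
    """One node2vec-style second-order random walk on an unweighted graph."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(adj[cur])
        if not neighbors:
            break
        if len(walk) == 1:
            # First step: there is no previous node, so pick uniformly.
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)  # step back to the previous node
            elif nxt in adj[prev]:
                weights.append(1.0)  # stay at distance 1 from prev
            else:
                weights.append(1.0 / q)  # move away from prev
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


def sample_walks(adj, walks_per_node, length, p=1.0, q=1.0, seed=0):
    """Start walks_per_node walks from every node, in shuffled order."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(biased_walk(adj, node, length, p, q, rng))
    return walks


# Toy example with hypothetical phenotype IDs P1..P4 and disease D1.
is_a = [("P2", "P1"), ("P3", "P1"), ("P4", "P2")]
disease_to_phenotypes = {"D1": ["P3", "P4"]}
adj = build_enriched_graph(is_a, disease_to_phenotypes)
walks = sample_walks(adj, walks_per_node=2, length=5, p=1.0, q=0.5)
```

In node2vec, the sampled walks are then treated as sentences and passed to a skip-gram model (for example gensim's `Word2Vec`) to learn the embeddings; that feature-learning step is omitted here. Setting `p = q = 1` makes every neighbor equally likely, which reduces the walk to a uniform random walk.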
RESULTS: In the quantitative link prediction task, HPOEmb-Orphanet achieved the best AUROC (0.92), the best average precision (0.94), and the best F1 score (0.86). In the quantitative patient similarity task, HPOEmb-Orphanet achieved the highest average detection rate for similar patients across the 10 rare diseases and outperformed the similarity measures implemented in an existing tool, HPOSim, especially for patient pairs that share fewer phenotypes. The qualitative evaluation shows that the enriched HPO embeddings are generally able to detect relationships among nodes at fine granularity, and that HPOEmb-Orphanet is particularly good at associating phenotypes across different disease systems. For the use case of detecting relevant phenotypic characterizations for the given PH-related phenotypes, HPOEmb-Orphanet outperformed the other three HPO embeddings, achieving the highest average P@5 of 0.81 and the highest P@10 of 0.79. Compared with seven conventional similarity measures provided by HPOSim, HPOEmb-Orphanet detected more relevant phenotypic pairs, especially pairs not in inheritance relationships.
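Two of the evaluation steps can be illustrated with a short sketch. The abstract does not name the four edge embedding operations; the sketch assumes they are the four binary operators defined in the original node2vec paper (average, Hadamard, weighted-L1, weighted-L2), each of which combines two node vectors into one edge feature vector for a downstream classifier. It also assumes that P@k is computed by ranking candidate phenotypes by cosine similarity to a query phenotype and counting how many of the top k fall in a set of relevant phenotypes; the hard-coded toy set below stands in for whatever relevance judgments the study used. Function names, IDs, and embeddings are hypothetical.

```python
import math


# Binary operators that turn two node embeddings into one edge feature vector.
def average(u, v):
    return [(a + b) / 2 for a, b in zip(u, v)]


def hadamard(u, v):
    return [a * b for a, b in zip(u, v)]


def weighted_l1(u, v):
    return [abs(a - b) for a, b in zip(u, v)]


def weighted_l2(u, v):
    return [(a - b) ** 2 for a, b in zip(u, v)]


EDGE_OPERATORS = {
    "average": average,
    "hadamard": hadamard,
    "weighted_l1": weighted_l1,
    "weighted_l2": weighted_l2,
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_by_similarity(query, embeddings):
    """Rank all other phenotypes by cosine similarity to the query (ties keep input order)."""
    others = [n for n in embeddings if n != query]
    return sorted(others, key=lambda n: cosine(embeddings[query], embeddings[n]), reverse=True)


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked phenotypes that are in the relevant set."""
    return sum(1 for n in ranked[:k] if n in relevant) / k


# Toy 2-d embeddings with hypothetical phenotype IDs.
emb = {"P1": [1.0, 0.0], "P2": [0.9, 0.1], "P3": [0.0, 1.0], "P4": [0.7, 0.7]}
ranked = rank_by_similarity("P1", emb)  # ["P2", "P4", "P3"]
p_at_2 = precision_at_k(ranked, relevant={"P2", "P3"}, k=2)  # 0.5
edge_features = EDGE_OPERATORS["hadamard"](emb["P1"], emb["P2"])  # [0.9, 0.0]
```

Averaging `precision_at_k` over all query phenotypes gives a mean P@k. For link prediction, edge features would be computed for both observed and negative phenotype pairs and fed to a classifier, whose scores then yield AUROC and average precision.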
CONCLUSION: We drew the following conclusions from the evaluation results. First, with additional non-inheritance edges, enriched HPO embeddings can detect more associations between fine-grained phenotypic nodes regardless of their topological structure in the HPO graph. Second, HPOEmb-Orphanet not only achieves the best performance in link prediction and in patient stratification based on phenotypic similarity, but also detects relevant phenotypes that are closer to domain experts' judgments than those detected by the other embeddings and by conventional similarity measures. Third, incorporating heterogeneous knowledge resources does not necessarily improve performance in detecting relevant phenotypes. From a clinical perspective, in our use case study, clinically oriented knowledge resources (e.g., Orphanet) can achieve better performance in detecting relevant phenotypic characterizations than biomedically oriented knowledge resources (e.g., DECIPHER and OMIM).
Copyright © 2019 Elsevier Inc. All rights reserved.


Keywords:  Deep phenotyping; Enriched node embeddings; Heterogeneous knowledge resources; Human Phenotype Ontology; Phenotypic relevance detection

Year:  2019        PMID: 31255713      PMCID: PMC6710011          DOI: 10.1016/j.jbi.2019.103246

Source DB:  PubMed          Journal:  J Biomed Inform        ISSN: 1532-0464            Impact factor:   6.317

