
Capturing phenotypes for precision medicine.

Peter N Robinson¹, Christopher J Mungall², Melissa Haendel³.

Abstract

Deep phenotyping followed by integrated computational analysis of genotype and phenotype is becoming ever more important for many areas of genomic diagnostics and translational research. The overwhelming majority of clinical descriptions in the medical literature are available only as natural language text, meaning that searching, analysis, and integration of medically relevant information in databases such as PubMed is challenging. The new journal Cold Spring Harbor Molecular Case Studies will require authors to select Human Phenotype Ontology terms for research papers that will be displayed alongside the manuscript, thereby providing a foundation for ontology-based indexing and searching of articles that contain descriptions of phenotypic abnormalities. This is an important step toward improving the ability of researchers and clinicians to get biomedical information that is critical for clinical care or translational research.


Year: 2015 PMID: 27148566 PMCID: PMC4850887 DOI: 10.1101/mcs.a000372

Source DB: PubMed Journal: Cold Spring Harb Mol Case Stud ISSN: 2373-2873

A phenotypic abnormality is defined in medical settings as a deviation from normal morphology, physiology, or behavior, and good phenotyping is a cornerstone of a doctor's daily work (Baynam et al. 2015). Next-generation sequencing, proteomics, and metabolomics data as well as information technologies are bringing about a paradigm shift in translational research and also clinical care. Although the coming era will allow physicians and patients to access large-scale data with the potential to stratify and thereby improve medical treatment, the ability to find correct and up-to-date information with sufficiently detailed and accurate phenotypic descriptions will be essential to exploit this data to its fullest (Fernald et al. 2011). In this article, we will discuss the role of deep phenotyping in translational research and the challenges in using this information for integrated computational analysis of “omics” data in the medical arena, as well as the application of the Human Phenotype Ontology (HPO) as a standardized, comprehensive nomenclature for disease-associated phenotypic abnormalities.

DEEP PHENOTYPING

Deep phenotyping can be defined as the precise and comprehensive analysis of phenotypic abnormalities in which the individual components of the phenotype are observed and described, generally in such a way as to be computationally accessible. Precision medicine has the goal of providing the best available care for each patient based on stratification into disease subclasses for which there is a common biological basis. The comprehensive discovery of such subclasses, as well as the translation of this knowledge into clinical care, will depend critically upon computational resources to capture, store, and exchange deep phenotypic data. Further, sophisticated algorithms to integrate deep phenotype data with genomic variation and additional clinical information will be required in support of precision medicine (Robinson 2012).

TEXT MINING PHENOTYPE DATA

A “traditional” method of retrieving phenotype data from the medical literature or Electronic Health Records for computational analysis is text mining. However, the overwhelming majority of clinical descriptions in the medical literature are simply natural language text, and thus automated searching, analysis, and integration of medical information from databases such as PubMed remains challenging (Taboada et al. 2014). For instance, in the phrase “short long bones” the word “long” is part of the concept long bone (e.g., the femur and the humerus are long bones, but the skull is not). However, in the phrase “long metacarpal,” the word “long” is used to denote an abnormally increased length of metacarpal bones. Similarly, the medical literature abounds in phrases such as “the patient was still ambulatory after 25 years,” or “segmentation defects appear to affect L4-S1” that can be very evocative to trained physicians but are next to impossible for text mining to interpret correctly. Therefore, although sophisticated concept recognition algorithms have been developed to improve the results of text mining for phenotype data (Groza et al. 2015), it remains difficult to extract the clinical information from an article in a correct and comprehensive fashion. The need for improved online search tools to find and analyze publications on patients with similar clinical characteristics is especially critical and challenging for rare diseases, where publications of large series are scarce.

There are several current clinical nomenclatures for phenotyping, such as Medical Subject Headings (MeSH), the ICD-10, the National Cancer Institute's (NCI) Thesaurus, SNOMED CT, and the Unified Medical Language System (UMLS). However, phenotypic concepts are covered inconsistently and incompletely in most currently used clinical terminologies (Winnenburg and Bodenreider 2014). For example, MeSH provides little semantic distinction between disease entities and phenotypic manifestations of diseases: even though MeSH category C is described as comprising Diseases, it contains many entries that describe phenotypic features of diseases rather than actual disease entities, such as Cheyne–Stokes Respiration (MeSH: D002639), which is an abnormal pattern of breathing that can be observed in diseases such as central sleep apnea syndrome. Even with these clinical nomenclatures, it can be difficult for clinicians and researchers to find relevant biomedical articles on a phenotypic abnormality using PubMed or Google Scholar.
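The ambiguity of the word “long” in the two phrases discussed above can be illustrated with a small sketch. The lexicon, the function names, and the longest-match strategy are assumptions made for illustration only; this is not the concept recognition method of any of the cited tools, and real systems must deal with far more variation than this toy example.

```python
# Toy lexicon, hypothetical and for illustration only (not taken from any real terminology).
ANATOMY = {
    ("long", "bone"): "long bone",
    ("long", "bones"): "long bone",
    ("metacarpal",): "metacarpal bone",
}
QUALIFIERS = {"short": "decreased length", "long": "increased length"}


def naive_qualifiers(phrase):
    """Word-by-word matching: every qualifier word is read as a qualifier."""
    return [QUALIFIERS[t] for t in phrase.lower().split() if t in QUALIFIERS]


def interpret(phrase):
    """Prefer the longest multi-word anatomical concept before reading qualifiers."""
    tokens = phrase.lower().split()
    reading, i = [], 0
    while i < len(tokens):
        # Try the longest span starting at position i first.
        for n in range(len(tokens) - i, 0, -1):
            key = tuple(tokens[i:i + n])
            if key in ANATOMY:
                reading.append(("entity", ANATOMY[key]))
                i += n
                break
        else:
            # No anatomical concept starts here: fall back to single-word reading.
            if tokens[i] in QUALIFIERS:
                reading.append(("quality", QUALIFIERS[tokens[i]]))
            else:
                reading.append(("unknown", tokens[i]))
            i += 1
    return reading


print(naive_qualifiers("short long bones"))  # both 'short' and 'long' read as qualifiers
print(interpret("short long bones"))         # 'long' absorbed into the concept 'long bone'
print(interpret("long metacarpal"))          # 'long' read as increased length
```

Word-by-word matching reports both a decreased and an increased length for “short long bones,” whereas the longest-match reading treats “long bones” as a single anatomical concept. Phrases such as “still ambulatory after 25 years” lie well beyond what a lexicon lookup of this kind can resolve.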

A NEW APPROACH TO CAPTURING PHENOTYPES

To overcome the limitations above, a structured, comprehensive, and well-defined phenotyping terminology is needed. The Human Phenotype Ontology (HPO), available at www.human-phenotype-ontology.org, provides a set of more than 11,000 terms describing human phenotypic abnormalities. The HPO provides both a set of terms that describe concepts of human phenotypes as well as a logical (computational) representation of the interrelationships between the terms. The HPO is arranged as a hierarchy with the most specific terms being at the greatest distance from the root term (Fig. 1). A recent study comparing HPO with alternate terminologies found that the ICD-10 covered 9%, the NCI thesaurus 16%, MeSH 19%, SNOMED CT 30%, and the UMLS 54% of the concepts in the HPO (Winnenburg and Bodenreider 2014).
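The consequence of arranging terms as a subclass hierarchy can be sketched in a few lines, using only the three terms named in the Figure 1 caption. Treating these three terms as a direct parent chain is an assumption made for illustration (the caption states only that one implies the others), and the names `PARENTS`, `ancestors`, and `is_leaf` are hypothetical. In the full ontology a term may have several parents, which is why parents are stored as lists.

```python
# Excerpt based on the three terms named in the Figure 1 caption.
# Assumption: they form a direct child -> parent chain; higher ancestors are omitted.
PARENTS = {
    "HP:0011906": ["HP:0005560"],  # Reduced beta/alpha synthesis ratio
    "HP:0005560": ["HP:0011902"],  # Imbalanced hemoglobin synthesis
    "HP:0011902": [],              # Abnormal hemoglobin
}


def ancestors(term, parents=PARENTS):
    """Return every term implied by an annotation to `term` (excluding itself)."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_leaf(term, parents=PARENTS):
    """A term is a leaf of this excerpt if no other term lists it as a parent."""
    return not any(term in ps for ps in parents.values())


print(sorted(ancestors("HP:0011906")))  # the two more general hemoglobin terms
print(is_leaf("HP:0011902"))            # False: it has a more specific child here
```

Within this excerpt, an annotation to the most specific term carries the two more general terms with it, so nothing is lost by annotating as specifically as possible. Whether a term is a leaf must be checked against the full ontology, not against a three-term excerpt like this one.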

Figure 1.

An excerpt of the hierarchical structure of the Human Phenotype Ontology. The terms of the HPO are arranged in a subclass hierarchy. For instance, any patient annotated to the HPO term “Reduced beta/alpha synthesis ratio” (HP:0011906) can also be said to have “Imbalanced hemoglobin synthesis” (HP:0005560), “Abnormal hemoglobin” (HP:0011902), and so on. Note that when selecting HPO terms for Cold Spring Harbor Molecular Case Studies submissions, authors may select leaf terms (i.e., the most specific terms possible). For example, “Hemoglobin H” is a leaf term, but “Abnormal hemoglobin” is not.

The HPO is developed in the context of the Monarch Initiative (monarchinitiative.org/), whereby HPO classes are logically related to terms from other ontologies for anatomy, cell types, function (Gene Ontology), embryology, pathology, and other domains. The links provide computational definitions for HPO terms that can be used both for quality control and for sophisticated computational phenotypic comparison. The logical links enable interoperability with numerous resources, including human genotype–phenotype resources such as OMIM (Amberger et al. 2015) and ClinVar (Landrum et al. 2014), but also those containing phenotype information on model organisms such as mouse and zebrafish (Gkoutos et al. 2009; Washington et al. 2009; Mungall et al. 2010; Köhler et al. 2013; Robinson and Webber 2014). Furthermore, human disease models that are annotated with HPO terms can be related to mouse and zebrafish models at databases including the Mouse Genome Database (Blake et al. 2014) and ZFIN (Howe et al. 2013). Integration of patient deep-phenotyping data with the landscape of both clinical and basic research informatics resources is key to effectively leveraging a much wider diversity of relevant data for the purposes of precision medicine.

A number of tools have been developed to help physicians and researchers annotate patients with HPO terms. 
For example, PhenoTips provides a secure, web-based interface that closely mirrors clinician workflows to facilitate the recording of phenotypic abnormalities for patients with genetic disorders, as well as a variety of other relevant information including family and medical history (Girdea et al. 2013). PhenoDB is another useful web-based tool initially developed for the Centers of Mendelian Genomics project for storing and analyzing phenotypic information from families or cohorts (Hamosh et al. 2013). Phenotypic features are hierarchically organized according to the major headings and subheadings of the Online Mendelian Inheritance in Man (OMIM) clinical synopses. The terms of PhenoDB have been mapped to HPO terms, enabling interoperability with other resources. The HPO is being used by a number of groups in human genetics to annotate and analyze phenotypic features of patients against the background of knowledge about human diseases and animal models of disease in order to prioritize novel disease genes and perform genomic diagnostics (Riggs et al. 2012; Sifrim et al. 2013; Javed et al. 2014; Petrovski and Goldstein 2014; Robinson et al. 2014; Singleton et al. 2014; Soden et al. 2014; Zemojtel et al. 2014; Wright et al. 2015). Among the groups and projects using the HPO are the U.K. 100,000 Genomes Project (rare diseases), the Canadian CARE for RARE, PhenomeCentral (https://phenomecentral.org/), the case matching system GeneYenta (Gottlieb et al. 2015), the U.S. National Institutes of Health Undiagnosed Diseases Program and Network, and the Sanger Institute's Deciphering Developmental Disorders (DDD) (Wright et al. 2015) and DECIPHER (Firth et al. 2009) databases. Therefore, annotations of articles in Cold Spring Harbor Molecular Case Studies with HPO terms will open up the possibility of interlinking data with an ever-richer ecosystem of phenotypic data and sophisticated computational algorithms. 
Cold Spring Harbor Molecular Case Studies requires authors to select HPO terms during submission that will be displayed alongside the manuscript to improve visibility. Authors should annotate only abnormal phenotypes for the case(s) described in their articles and use a precise and comprehensive set of HPO terms to maximize the ability of search engines to find their article. As the number of articles increases, researchers and physicians will be able to search for articles that describe patients with a set of phenotypic abnormalities and, it is hoped, take advantage of semantic comparison algorithms such as the Phenomizer (Köhler et al. 2009) rather than relying on single-phrase matching alone. Articles in Cold Spring Harbor Molecular Case Studies will present clinical and molecular data obtained by -omics and related approaches with the goal of elucidating disease pathogenesis and gaining insights into therapeutic strategies. Encoding the salient aspects of the clinical presentation using Human Phenotype Ontology terms will enable the articles to be searched according to phenotypic presentations, something that is currently difficult to do with standard search engines. Ontology-based indexing of articles thus represents an important step toward improving the ability of researchers and clinicians to get biomedical information that is critical for clinical care or translational research, and toward realizing the goal of precision medicine.
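How ontology-based indexing differs from single-phrase matching can be illustrated with a minimal sketch. The article identifiers and their annotations are invented for illustration, the parent chain reuses the three terms from the Figure 1 caption under the assumption that they are direct parents of one another, and the matching rule is plain subsumption. It is not the semantic similarity scoring used by the Phenomizer.

```python
# Assumed child -> parent chain built from the terms named in the Figure 1 caption.
PARENTS = {
    "HP:0011906": ["HP:0005560"],  # Reduced beta/alpha synthesis ratio
    "HP:0005560": ["HP:0011902"],  # Imbalanced hemoglobin synthesis
    "HP:0011902": [],              # Abnormal hemoglobin
}

# Hypothetical articles and annotations, invented for this example.
ARTICLES = {
    "article-A": {"HP:0011906"},
    "article-B": {"HP:0011902"},
}


def closure(term):
    """Return the term together with all of its ancestors."""
    result, stack = {term}, [term]
    while stack:
        for parent in PARENTS.get(stack.pop(), []):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def search(query_terms, articles=ARTICLES):
    """Find articles whose annotations imply every query term."""
    hits = []
    for article_id, annotations in articles.items():
        implied = set()
        for term in annotations:
            implied |= closure(term)
        if set(query_terms) <= implied:
            hits.append(article_id)
    return sorted(hits)


print(search({"HP:0011902"}))  # a general query also finds the specifically annotated article
print(search({"HP:0005560"}))  # only the article annotated at or below this term
```

A query for the general term retrieves both articles, although only one of them is annotated with that exact identifier. An exact string match on the term would have missed the more specifically annotated article.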

ADDITIONAL INFORMATION

Acknowledgments

This work was supported by grants from the Bundesministerium für Bildung und Forschung (Förderkennziffer FKZ 1315848A), the European Commission's Seventh Framework Program (SYBIL, 602300), and the National Institutes of Health (grant 5R24OD011883).

Competing Interest Statement

The authors have declared no competing interest.

References (28 in total)

1. Phenotypic information in genomic variant databases enhances clinical care and research: the International Standards for Cytogenomic Arrays Consortium experience.

Authors: Erin Rooney Riggs; Laird Jackson; David T Miller; Steven Van Vooren
Journal: Hum Mutat Date: 2012-03-20 Impact factor: 4.878

2. Phenomics and the interpretation of personal genomes.

Authors: Slavé Petrovski; David B Goldstein
Journal: Sci Transl Med Date: 2014-09-17 Impact factor: 17.956

Review 3. Deep phenotyping for precision medicine.

Authors: Peter N Robinson
Journal: Hum Mutat Date: 2012-05 Impact factor: 4.878

4. Effectiveness of exome and genome sequencing guided by acuity of illness for diagnosis of neurodevelopmental disorders.

Authors: Sarah E Soden; Carol J Saunders; Laurel K Willig; Emily G Farrow; Laurie D Smith; Josh E Petrikin; Jean-Baptiste LePichon; Neil A Miller; Isabelle Thiffault; Darrell L Dinwiddie; Greyson Twist; Aaron Noll; Bryce A Heese; Lee Zellmer; Andrea M Atherton; Ahmed T Abdelmoity; Nicole Safina; Sarah S Nyp; Britton Zuccarelli; Ingrid A Larson; Ann Modrcin; Suzanne Herd; Mitchell Creed; Zhaohui Ye; Xuan Yuan; Robert A Brodsky; Stephen F Kingsmore
Journal: Sci Transl Med Date: 2014-12-03 Impact factor: 17.956

5. Phen-Gen: combining phenotype and genotype to analyze rare disorders.

Authors: Asif Javed; Saloni Agrawal; Pauline C Ng
Journal: Nat Methods Date: 2014-08-03 Impact factor: 28.547

6. DECIPHER: Database of Chromosomal Imbalance and Phenotype in Humans Using Ensembl Resources.

Authors: Helen V Firth; Shola M Richards; A Paul Bevan; Stephen Clayton; Manuel Corpas; Diana Rajan; Steven Van Vooren; Yves Moreau; Roger M Pettett; Nigel P Carter
Journal: Am J Hum Genet Date: 2009-04-02 Impact factor: 11.025

7. PhenoTips: patient phenotyping software for clinical and research use.

Authors: Marta Girdea; Sergiu Dumitriu; Marc Fiume; Sarah Bowdin; Kym M Boycott; Sébastien Chénier; David Chitayat; Hanna Faghfoury; M Stephen Meyn; Peter N Ray; Joyce So; Dimitri J Stavropoulos; Michael Brudno
Journal: Hum Mutat Date: 2013-05-24 Impact factor: 4.878

8. Construction and accessibility of a cross-species phenotype ontology along with gene annotations for biomedical research.

Authors: Sebastian Köhler; Sandra C Doelken; Barbara J Ruef; Sebastian Bauer; Nicole Washington; Monte Westerfield; George Gkoutos; Paul Schofield; Damian Smedley; Suzanna E Lewis; Peter N Robinson; Christopher J Mungall
Journal: F1000Res Date: 2013-02-01

9. Improved exome prioritization of disease genes through cross-species phenotype comparison.

Authors: Peter N Robinson; Sebastian Köhler; Anika Oellrich; Kai Wang; Christopher J Mungall; Suzanna E Lewis; Nicole Washington; Sebastian Bauer; Dominik Seelow; Peter Krawitz; Christian Gilissen; Melissa Haendel; Damian Smedley
Journal: Genome Res Date: 2013-10-25 Impact factor: 9.043

10. Automatic concept recognition using the human phenotype ontology reference and test suite corpora.

Authors: Tudor Groza; Sebastian Köhler; Sandra Doelken; Nigel Collier; Anika Oellrich; Damian Smedley; Francisco M Couto; Gareth Baynam; Andreas Zankl; Peter N Robinson
Journal: Database (Oxford) Date: 2015-02-27 Impact factor: 3.451
