The Human Phenotype Ontology in 2017.

Sebastian Köhler¹, Nicole A Vasilevsky², Mark Engelstad², Erin Foster², Julie McMurry², Ségolène Aymé³, Gareth Baynam^4,5, Susan M Bello⁶, Cornelius F Boerkoel⁷, Kym M Boycott⁸, Michael Brudno⁹, Orion J Buske⁹, Patrick F Chinnery^10,11, Valentina Cipriani^12,13, Laureen E Connell¹⁴, Hugh J S Dawkins¹⁵, Laura E DeMare¹⁴, Andrew D Devereau¹⁶, Bert B A de Vries¹⁷, Helen V Firth¹⁸, Kathleen Freson¹⁹, Daniel Greene^20,21, Ada Hamosh²², Ingo Helbig^23,24, Courtney Hum²⁵, Johanna A Jähn²⁴, Roger James^11,21, Roland Krause²⁶, Stanley J F Laulederkind²⁷, Hanns Lochmüller²⁸, Gholson J Lyon²⁹, Soichi Ogishima³⁰, Annie Olry³¹, Willem H Ouwehand²¹, Nikolas Pontikos^12,13, Ana Rath³¹, Franz Schaefer³², Richard H Scott¹⁶, Michael Segal³³, Panagiotis I Sergouniotis³⁴, Richard Sever¹⁴, Cynthia L Smith⁶, Volker Straub²⁸, Rachel Thompson²⁸, Catherine Turner²⁸, Ernest Turro^20,21, Marijcke W M Veltman¹¹, Tom Vulliamy³⁵, Jing Yu³⁶, Julie von Ziegenweidt²⁰, Andreas Zankl^37,38, Stephan Züchner³⁹, Tomasz Zemojtel⁴⁰, Julius O B Jacobsen¹⁶, Tudor Groza^41,42, Damian Smedley¹⁶, Christopher J Mungall⁴³, Melissa Haendel², Peter N Robinson^44,45.

Abstract

Deep phenotyping has been defined as the precise and comprehensive analysis of phenotypic abnormalities in which the individual components of the phenotype are observed and described. The three components of the Human Phenotype Ontology (HPO; www.human-phenotype-ontology.org) project are the phenotype vocabulary, disease-phenotype annotations and the algorithms that operate on these. These components are being used for computational deep phenotyping and precision medicine as well as integration of clinical data into translational research. The HPO is being increasingly adopted as a standard for phenotypic abnormalities by diverse groups such as international rare disease organizations, registries, clinical labs, biomedical resources, and clinical software tools and will thereby contribute toward nascent efforts at global data exchange for identifying disease etiologies. This update article reviews the progress of the HPO project since the debut Nucleic Acids Research database article in 2014, including specific areas of expansion such as common (complex) disease, new algorithms for phenotype driven genomic discovery and diagnostics, integration of cross-species mapping efforts with the Mammalian Phenotype Ontology, an improved quality control pipeline, and the addition of patient-friendly terminology.

Year: 2016 PMID: 27899602 PMCID: PMC5210535 DOI: 10.1093/nar/gkw1039

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

The Human Phenotype Ontology (HPO) provides comprehensive bioinformatic resources for the analysis of human diseases and phenotypes, offering a computational bridge between genome biology and clinical medicine. The HPO was initially published in 2008 (1) with the goal of enabling the integration of phenotype information across scientific fields and databases. Since then, the project has grown in terms of coverage, scope and sophistication, and has become a core component of the Monarch Initiative, allowing computational cross-species analysis (2). The HPO has also become part of the core Orphanet (3) rare disease database content. The Orphanet nomenclature of rare diseases is being annotated with HPO terms in order to allow for deep phenotyping of rare diseases (RD) in health records and registries; the European Commission Expert Group on Rare Diseases has recommended adoption of this nomenclature for the codification of RD patients in health information systems (recommendation on ways to improve codification for rare diseases in health information systems: http://ec.europa.eu/health/rare_diseases/docs/recommendation_coding_cegrd_en.pdf). The description of phenotypic variation has become a central topic for translational research and genomic medicine (4–7), and ‘computable’ descriptions of human disease using HPO phenotypic profiles (also known as ‘annotations’) have become a key element in a number of algorithms being used to support genomic discovery and diagnostics. Here, we describe the latest improvements to the tools and resources being developed by the HPO Consortium and the Monarch Initiative, and provide an overview of external tools and databases that are using the HPO for translational research and diagnostic decision support.

HPO: NEW TERMS, ANNOTATIONS AND ONTOLOGY INTEGRATION

The HPO is organized as independent subontologies that cover different categories. The largest category is Phenotypic abnormality. The Mode of inheritance subontology allows disease models to be defined according to Mendelian or non-Mendelian inheritance modes. The Mortality/Aging subontology similarly allows the age of death typically associated with a disease or observed in a specific individual to be annotated. Finally, the clinical modifier subontology is designed to provide terms to characterize and specify the phenotypic abnormalities defined in the Phenotypic abnormality subontology, with respect to severity, laterality, age of onset, and other aspects.

Ontology

The HPO has grown substantially since the first Nucleic Acids Research database article in 2014 (Version: 30 July 2013) (8) to the September 2016 release (Version: 3 September 2016). There are 1725 additional terms (10 088 in 30 July 2013 versus 11 813 in 3 September 2016, see Figure 1) and 2269 additional subclass relationships (13 326 versus 15 595). We obsoleted 82 HPO classes (44 versus 126). We have added 2024 textual definitions (6603 versus 8627) and 8063 synonyms (6265 versus 14 328). Logical definitions were constructed for an additional 1126 HPO classes, bringing the total number to 5717. These definitions refer to ontologies for biochemistry, gene function, anatomy, and others, and allow cross-species mapping by means of automated semantic reasoning. There are now 123 724 annotations of HPO terms to rare diseases and 132 620 to common diseases.
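The growth figures above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch that only uses the counts quoted in the text; the dictionary keys and the function name `release_deltas` are illustrative choices, not part of any HPO tooling.

```python
# Counts quoted in the text for the two releases being compared.
RELEASE_2013 = {
    "terms": 10_088,
    "subclass_relationships": 13_326,
    "obsolete_classes": 44,
    "textual_definitions": 6_603,
    "synonyms": 6_265,
}
RELEASE_2016 = {
    "terms": 11_813,
    "subclass_relationships": 15_595,
    "obsolete_classes": 126,
    "textual_definitions": 8_627,
    "synonyms": 14_328,
}


def release_deltas(old, new):
    """Return the per-category increase between two releases."""
    return {key: new[key] - old[key] for key in old}


deltas = release_deltas(RELEASE_2013, RELEASE_2016)
for name, delta in deltas.items():
    print(f"{name}: +{delta}")
```

The computed differences agree with the increments stated in the paragraph (1725 terms, 2269 subclass relationships, 82 obsoleted classes, 2024 definitions and 8063 synonyms).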

Figure 1.

Distribution of HPO class additions per general category of phenotypic abnormalities. The figure shows the number of terms added per category since the previous Nucleic Acids Research database article in 2014 (8).

Annotations

The main application domain of the HPO has, to date, been rare disorders, and we have in the past provided a large corpus of disease-HPO annotation profiles using OMIM, Orphanet and DECIPHER for disease entities (8). With recent advances in personalized medicine, it is becoming increasingly important to provide a computational foundation for phenotype-driven analysis of genomes and other translational research in other fields of medicine. Consequently, we have extended our work to common human disease phenotypes by applying a text-mining approach (9) to the 2014 PubMed corpus, which allowed us to infer 132 620 HPO annotations for 3145 common diseases (10). These annotations were validated against a manually curated subset of disorders, and experimental results showed an overall precision of 67%. We showed statistically significant phenotypic overlap between common diseases that share one or more associated genetic variants (a ‘genome-wide association study [GWAS] hit’), as well as phenotypic overlaps between rare and common diseases that are linked to the same genes (10). The HPO has also been adopted by several resources for genotype-phenotype data in the field of complex disease and GWAS analysis, including GWAS Central (11) and GWASdb (12), and is likely to be adopted for phenome-wide association studies with electronic health records in the future (13).

Precision annotation of deep phenotyping data

The performance of computational search algorithms within and across species can improve if a comprehensive list of phenotypic features is recorded. It is helpful if the person annotating thinks of the set of annotations as a query against all known phenotype profiles. Therefore, the set of phenotypes chosen for the annotation must be as specific as possible, and represent the most salient and important observable phenotypes. The Monarch Initiative has developed an annotation sufficiency meter that assesses the breadth and depth of the phenotype annotation profile using a five-star rating system for a given patient in the context of all curated human and model organism phenotypes, with the goal of helping the annotator to generate an annotation profile specific enough to exclude similar diseases and to identify model organisms with similar phenotypes that may have mutations in relevant genes or pathways (14). The Monarch annotation sufficiency meter is displayed within PhenoTips (15) and PhenomeCentral (16).

Integration

The scope and specificity of phenotypes useful for diagnosis and clinical decision support differ considerably from those of phenotypes useful for medical billing and quality-of-care assessment. What sets the HPO apart from other ontologies is that it is purpose-built for the diagnosis and care use case and that it is designed to facilitate cross-species comparisons so that non-human data can be brought to bear as well. Moreover, to accomplish this task the HPO must also have extremely broad coverage of concepts. In an evaluation of HPO content versus the numerous vocabularies integrated within the Unified Medical Language System (UMLS), Winnenburg and Bodenreider showed that the coverage of HPO phenotype concepts in the UMLS is 54% and only 30% in SNOMED CT (17). The UMLS is a terminology integration system developed by the U.S. National Library of Medicine that integrates many standard biomedical terminologies (18). In order to improve the coverage of phenotype data, the UMLS has integrated the entire HPO starting with the 2015AB release. This makes it easier to map HPO-encoded data to standard health-care terminologies such as SNOMED CT (19).
The HPO has contributed to the establishment of the International Consortium of Human Phenotype Terminologies (ICHPT; http://www.ichpt.org) to provide the community with standards that achieve interoperability among databases incorporating human phenotypic features. The outcome is a set of over 2300 terms that should be incorporated in any terminology and that is fully cross-referenced with HPO terms. These terms are not arranged in a hierarchy and so can be mapped to or incorporated into any ontology.
The HPO project data are available at http://www.human-phenotype-ontology.org. Requests for new terms or other amendments can be made using the GitHub issue tracker at https://github.com/obophenotype/human-phenotype-ontology/issues. Further information on HPO-related publications and general announcements can be found on the HPO website and on the HPO Twitter feed @hp_ontology.

CLINICAL UTILITY

Although exome sequencing and other forms of genomic diagnostics have greatly accelerated the pace of discovery of novel disease-associated genes and have begun to be implemented in diagnostic settings in medical genetics, the overall diagnostic yield can still be low. It has been estimated that the genetic cause of only about half of the currently named ∼7000 rare diseases has been identified (20,21); in order to confidently assert that pathogenic variants in a given gene are associated with a given Mendelian disease, the community norm is to require the identification of at least two unrelated cases. The HPO team therefore continues to collaborate with clinical groups to refine and extend current terms and annotations to support efforts to match patient phenotype and genotype data. Table 1 provides an overview of public-facing clinical databases that use HPO to annotate patient data.

Table 1.

A selection of public-facing clinical databases using HPO to annotate patient data for disease-gene discovery projects

Name	URL	Ref
PhenomeCentral	phenomecentral.org	(16)
DDD (Deciphering Developmental Disorders)	www.ddduk.org	(61,62)
DECIPHER (DatabasE of genomiC varIation and Phenotype in Humans using Ensembl Resources)	decipher.sanger.ac.uk	(63)
ECARUCA (European Cytogeneticists Association Register of Unbalanced Chromosome Aberrations)	http://umcecaruca01.extern.umcn.nl:8080/ecaruca/ecaruca.jsp	(64)
The 100 000 Genomes Project	https://www.genomicsengland.co.uk/	(65)
Geno2MP (Exome sequencing data linked to phenotypic information from a wide variety of Mendelian gene discovery projects)	http://geno2mp.gs.washington.edu	(21)
NIH UDP (Undiagnosed Diseases Program)	available via phenomecentral.org	(66)
NIH UDN (Undiagnosed Diseases Network)	available via phenomecentral.org	(16)
HDG (Human Disease Gene Website series)	www.humandiseasegenes.com	
Phenopolis (An open platform for harmonization and analysis of sequencing and phenotype data)	https://phenopolis.github.io	
GenomeConnect (Patient portal developed by ClinGen (67))	www.genomeconnect.org	(68)
FORGE Canada & Care4Rare Consortium	available via phenomecentral.org	(69)
RD-Connect	platform.rd-connect.eu	(28)
Genesis	thegenesisprojectfoundation.org	

The HPO has been extensively applied to the phenotypic characterization of bone dysplasias (rare genetic bone disorders). The Bone Dysplasia Ontology (BDO) (22) is an ontological representation of the International Skeletal Dysplasia Society's Nosology of Genetic Skeletal Disorders, the de facto standard classification for human bone dysplasias. The BDO uses HPO terms for the phenotypic description of each disorder. Using the BDO and HPO, decision support methods were developed to predict the correct bone dysplasia diagnosis from a set of HPO terms, and these methods outperformed many clinicians (23).
DECIPHER (https://decipher.sanger.ac.uk) was established in 2004 as a web-based system for interpretation and sharing of genomic variants and their associated phenotypes. DECIPHER now supports sequence variation and copy number variation in the nuclear and mitochondrial genomes. DECIPHER was an early adopter of the HPO and is the platform through which data from the Deciphering Developmental Disorders study (DDD study) are shared (24). At the outset of the project, the DDD study (www.ddduk.org) funded a week-long workshop to improve the HPO by reducing redundancy of terms and improving coverage in the rare disease space. DECIPHER currently has 21 689 open-access patient records annotated with 60 521 HPO-encoded phenotype observations.
PhenoTips (15) is an open-source clinical phenotype and genotype data collection tool. It provides simple user interfaces to select and explore HPO annotations and suggest diagnoses from OMIM. Records within PhenoTips can be de-identified and pushed to PhenomeCentral (16) to participate in phenotypic and genotypic matching with other cases in PhenomeCentral and in connected databases through the Matchmaker Exchange. PhenomeCentral makes use of HPO terms to measure semantic similarity between patient phenotypes and to prioritize exome data using the Exomiser. At the time of this writing, PhenomeCentral contains 2640 matchable cases, of which 2059 have at least one HPO term, 172 are from the NIH UDP and 28 are from the NIH UDN.
Patient Archive (PA) (2) is a clinical-grade phenotype-oriented platform for managing patient data; PA combines the richness of the HPO with highly intuitive user interfaces to aid the discovery and decision-making process in the context of clinical genomics. PA enables clinicians to use free-text clinical notes as the starting point for structured HPO-centric patient phenotyping to support clinical diagnostics and care. To this end, an instance has been installed in the Western Australian Department of Health for clinical genetic use, both within and outside of the Undiagnosed Diseases Program (UDP) Western Australia, a clinical public health service. It has also been nominated as the platform of choice for the UDP Australia, which participates in the Undiagnosed Diseases Network International (25). Relatedly, and building on the principles of founding work (26), the integration of automated annotation of HPO terms to 3D facial images as part of a suite of approaches in the clinical workflow continues to be developed through the Rare and Undiagnosed Diseases Diagnostic Service at Genetic Services of Western Australia (27).
Phenopolis is an interactive platform built on genomic and phenotypic data from over 4000 patients. With the help of phenotype quantification using the HPO, Phenopolis is able to prioritize causative genes using prior knowledge from OMIM, PubMed publications and existing tools such as Exomiser. Additionally, it supports novel gene discovery by looking for potential gene-HPO relationships among the patients without using any prior knowledge. This unbiased approach may provide valuable information for hospitals and researchers to optimize their resources for diagnosis and functional studies of the relevant genetic diseases.
Numerous rare-disease research consortia are using the HPO for patient annotation and analysis. In order to review and expand the HPO to better represent specific disease areas, the HPO consortium has conducted workshops with consortia including the European FP7 projects RD-Connect (28), EURenOmics and NeurOmics. Using advanced omics technologies, NeurOmics, an EU-funded translational research project, aims to characterize the causes, pathomechanisms and clinical features across ten major neurodegenerative and neuromuscular disease groups affecting the brain and spinal cord, peripheral nerves and muscle. EURenOmics is using high-throughput technologies to characterize new genes causing or predisposing to kidney diseases, concentrating on five groups of renal disease. RD-Connect is an integrated platform connecting databases, registries, biobanks and clinical bioinformatics for rare disease research that brings together multiple datasets on patients with rare diseases at a per-patient level. Deep phenotyping of affected individuals is an essential component of these projects, and is being addressed by using the HPO as a mechanism for linking a computationally accessible phenotypic record with a genomic dataset. The projects performed a review of available ontologies at an early stage and concluded that the HPO was the most appropriate ontology for their gene discovery focus (28). Both NeurOmics and EURenOmics performed mapping exercises in order to transform data items suggested by clinicians as essential to record for each patient presenting with a particular clinical profile into HPO terms, and in most cases this mapping produced exact matches to an existing HPO term. Missing areas were then addressed in the expert workshops described above. Several of these projects make use of PhenoTips (15) in order to capture clinical data at a highly granular level through an interface that is user-friendly for clinicians. Independently of the data entry mechanism, the use of the HPO means that the data generated by these consortia are fully interoperable with other datasets internationally. Currently, the RD-Connect platform contains ∼2000 exome, genome and panel sequencing datasets linked with HPO-coded phenotypic profiles from a range of rare diseases.
The HPO was used within the EuroEPINOMICS-Rare Epilepsy Syndrome (RES) project to systematically assess phenotypes in patients with epileptic encephalopathies (29,30). A first analysis of clustering of epilepsy phenotypes was presented as a poster at the 2012 European Congress of Epileptology (31), while a more comprehensive analysis of the obtained HPO terms including exome sequencing data is currently underway. Clustering of patient phenotypes in 171 patients with epileptic encephalopathies identified a subgroup of eight patients with closely related phenotypes. A review of manually curated phenotype data suggested these patients had a subset of Infantile Spasms with a good outcome. This preliminary analysis suggested that the use of HPO terms in patients with epilepsy is worthwhile, given that the identified epilepsy phenotype was both homogeneous and clinically meaningful. The use of HPO terms for patients with epilepsy is nevertheless challenging. In contrast to many other genetic disorders, the phenotypic features in epilepsy patients are dynamic, and specific features such as a complex seizure semiology are often difficult to fully include in systematic phenotype ontologies. For example, a patient with simple febrile seizures may have self-limiting febrile seizures (FS), may have recurrent febrile seizures past the age of six years (FS+), or may develop the intractable, fever-related epilepsy of Dravet Syndrome over time. All three entities are distinct but, depending on the age of the patients, may be coded identically in the HPO if modifiers coding the patient's age are not used. The dilemma of fully representing dynamic neurological phenotypes emphasizes the need for the ongoing use of HPO modifiers to achieve dimensionality in phenotype data.
The HPO has been used to incorporate clinical data into the analysis of a diagnostic next-generation sequencing panel with nearly all known Mendelian disease-associated genes; the algorithm, Phenotypic Interpretation of Exomes (PhenIX), contributed to a diagnostic rate of 28% in children in whom previous extensive workups had failed to reveal a diagnosis (32). Using the HPO to generate individualized phenotype-driven gene panels for diagnostics led to an increase in the diagnostic yield (33). The ThromboGenomics Consortium reported that computational prioritization of candidate rare variants identified in patients with bleeding, thrombotic or platelet disorders using HPO-coded phenotypes assigned the highest scores to pathogenic or likely pathogenic variants in 85% of cases, demonstrating that HPO-based algorithms can make multidisciplinary diagnostic meetings more efficient (34).
Once a causative link between rare pathogenic variants in a given gene and a disease has been established, it is essential to assess the clinical variability attributed to other mutations in that gene. For this, several novel approaches have recently been developed, such as the Human Disease Gene Website series (HDG). HDG is an international library of websites (www.humandiseasegenes.com) for professional information about genes and copy number variants and their clinical consequences, using the HPO to annotate the phenotype. Here, professionals will find relevant information that helps with the interpretation of variants and the counseling of patients and families with such a rare genetic disorder, and they also have the opportunity to share clinical data. Moreover, patients, parents and caregivers will find useful information on the rare genetic disease in their family.
Sanford Health, one of the largest non-profit rural health care systems in the United States, has embarked on clinical genotyping of a substantial portion of its patient population to provide precision prevention and pharmacogenetics. As part of this process, it has incorporated tools within the patient portal of the electronic medical record (EMR) to enable patients to characterize themselves in HPO terms. Similarly, it has incorporated PhenoTips within the EMR to enable clinical staff to characterize in HPO terms all patients prescribed diagnostic molecular testing. For both the patient self-characterization and the clinician characterization, the Monarch Initiative sufficiency score is used to guide the depth of characterization. The HPO terms, data within the EMR and molecular test results are integrated to define diagnoses and best practice guidelines entered into the EMR.
The 100 000 Genomes Project (www.genomicsengland.co.uk) is sequencing 100 000 whole genomes from NHS patients in England with rare diseases or cancer. Recruitment to the Rare Disease Programme currently occurs across approximately 200 diseases. A vital aspect of the project is to link rare disease participants’ genomes with their phenotype profile to enable genome diagnostics and in-depth genotype-phenotype analyses. The phenotype profiles need to be detailed, specific, consistently applied, computationally accessible and concordant with existing standards. The project has developed HPO-based models for each rare disease. These typically comprise 20–40 HPO terms that describe the key features of the disease. These are presented to recruiting clinicians as a questionnaire; additional HPO terms can also be entered. This approach requires less prior knowledge of the HPO to achieve in-depth phenotyping than simple ‘free entry’, and encourages recording of the absence of phenotypes as well as their presence. The models are typically developed by mapping HPO terms to an existing case report form, published review or registry schema, or through interaction with clinical experts. Models are analysed to ensure practicality, consistency and specificity using the Monarch annotation sufficiency score described above. Where clinical terms that are not contained in the HPO are identified during model development, they are submitted for inclusion. The collected phenotypes for each programme participant are used extensively in analysis pipelines, both for manual clinical interpretation and for automated prioritization using algorithms such as Exomiser (35) and Phevor (36).
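The questionnaire-style disease models described for the 100 000 Genomes Project can be pictured as a simple data structure in which every model term receives an explicit present, absent or unknown status. The sketch below is only an illustration under stated assumptions: the names `Status`, `record_responses` and `summarize` are hypothetical, the term labels are placeholders rather than an actual project model, and real systems would store HPO identifiers instead of plain labels.

```python
from enum import Enum


class Status(Enum):
    PRESENT = "present"
    ABSENT = "absent"
    UNKNOWN = "unknown"


def record_responses(model_terms, answers, extra_present=()):
    """Build a phenotype profile from questionnaire answers.

    Every term of the disease model gets an explicit status, so that
    absence is recorded as well as presence. Terms entered freely by
    the clinician (extra_present) are added as present.
    """
    profile = {term: answers.get(term, Status.UNKNOWN) for term in model_terms}
    for term in extra_present:
        profile[term] = Status.PRESENT
    return profile


def summarize(profile):
    """Count how many terms fall into each status."""
    counts = {status: 0 for status in Status}
    for status in profile.values():
        counts[status] += 1
    return counts


# Hypothetical three-term model (real models are said to hold 20-40 terms).
model = ["Seizures", "Global developmental delay", "Microcephaly"]
answers = {"Seizures": Status.PRESENT, "Microcephaly": Status.ABSENT}
profile = record_responses(model, answers, extra_present=["Hearing impairment"])
counts = summarize(profile)
print(counts)
```

Keeping an explicit `UNKNOWN` status separates "not assessed" from "assessed and absent", which is the distinction the questionnaire approach is meant to encourage.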

USE OF HPO IN GENE IDENTIFICATION RESEARCH

The HPO has been used in many ways in research on disease pathophysiology, diagnostics and gene-discovery projects. It has been used to provide lists of genes associated with one or more HPO terms in order to filter lists of candidate genes (37–39), to prioritize candidate genes in exome-sequencing studies via PhenIX, Phevor or Exomiser (40–43), and to identify known or novel disease genes or to analyze structural variation in large cohorts (44–46). The Deciphering Developmental Disorders (DDD) study analyzed 4125 families with diverse developmental disorders and identified four novel disease-gene associations by combined analysis of the genotypes and the phenotypic similarity of patients with recessive variants in the same candidate gene (47). The BRIDGE-BPD Consortium (48) used genome sequencing combined with HPO coding to identify a gain-of-function variant in DIAPH1 in two unrelated pedigrees with deafness and macrothrombocytopenia (49). This finding was supported by Phenotype Similarity Regression (SimReg), an algorithm for identifying composite phenotypes associated with rare variation in specific genes (50). HPO-based phenotype analysis also allowed matching of human phenotypes to mouse phenotypes by cross-species analysis and thereby aided the discovery of a dominant gain-of-function mutation in SRC that causes thrombocytopenia, myelofibrosis, bleeding and bone abnormalities (51).

The Matchmaker Exchange (MME) platform provides a systematic approach to rare disease-gene discovery with a federated network of phenotype-genotype databases that enable data sharing and discovery of relevant data (52,53) over a secure API (54). The HPO is the standard vocabulary for communicating phenotype data. The MME currently connects over 30 000 rare disease cases across six different patient databases.
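The idea behind case matching across databases can be illustrated with a toy example: two cases are considered a candidate match when they share a candidate gene and their phenotype sets overlap sufficiently. This is a deliberately simplified sketch and not the Matchmaker Exchange API or its scoring; the `Case` class, the `find_matches` function, the Jaccard overlap measure, the threshold and all case data are assumptions made for illustration. Plain labels stand in for HPO identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    case_id: str
    database: str
    candidate_gene: str
    phenotypes: frozenset


def jaccard(a, b):
    """Overlap of two term sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_matches(query, cases, min_overlap=0.3):
    """Return (case_id, score) pairs sharing the query's candidate gene."""
    matches = []
    for case in cases:
        if (case.database, case.case_id) == (query.database, query.case_id):
            continue  # do not match a case against itself
        if case.candidate_gene != query.candidate_gene:
            continue
        score = jaccard(query.phenotypes, case.phenotypes)
        if score >= min_overlap:
            matches.append((case.case_id, score))
    return sorted(matches, key=lambda m: m[1], reverse=True)


# Hypothetical cases held in three different databases.
query = Case("a1", "db_A", "GENE1",
             frozenset({"Hearing impairment", "Thrombocytopenia"}))
others = [
    Case("b1", "db_B", "GENE1",
         frozenset({"Hearing impairment", "Thrombocytopenia", "Bleeding"})),
    Case("b2", "db_B", "GENE2",
         frozenset({"Hearing impairment", "Thrombocytopenia"})),
    Case("c1", "db_C", "GENE1", frozenset({"Seizures"})),
]
matches = find_matches(query, others)
print(matches)
```

Set overlap ignores the ontology hierarchy; a real matcher would more plausibly use a semantic similarity measure so that closely related but non-identical terms still contribute to the score.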

TRANSLATIONAL RESEARCH AND DIAGNOSTICS WITH HPO: ALGORITHMS AND TOOLS

The HPO is a computational resource that allows algorithms to ‘compute over’ clinical phenotype data in an increasing number of contexts through a growing number of tools from the HPO Consortium and other groups (Table 2). The tools use the ontological structure of the HPO that allows individual terms to be associated with an information content, a measure of specificity (55), or with the underlying logical definitions of the terms, such that HPO terms can be linked to other resources such as model organisms (56,57).
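Information content as a measure of term specificity can be illustrated on a toy ontology. In the sketch below, the information content of a term is the negative logarithm of the fraction of annotated diseases that carry the term or one of its descendants, and the similarity of two terms is taken as the information content of their most informative common ancestor (a Resnik-style measure, chosen here as one common option and not necessarily what any tool in Table 2 implements). The ontology, the disease annotations and all names are hypothetical.

```python
import math

# Hypothetical toy ontology: each term maps to its direct parents.
PARENTS = {
    "root": [],
    "eye": ["root"],
    "heart": ["root"],
    "cataract": ["eye"],
    "myopia": ["eye"],
}

# Hypothetical disease-to-term annotations.
ANNOTATIONS = {
    "disease_A": {"cataract"},
    "disease_B": {"myopia"},
    "disease_C": {"heart"},
    "disease_D": {"cataract", "heart"},
}


def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """IC(t) = -ln(fraction of diseases annotated with t or a descendant)."""
    hits = sum(
        1
        for terms in ANNOTATIONS.values()
        if any(term in ancestors(t) for t in terms)
    )
    if hits == 0:
        raise ValueError(f"{term} has no annotations in this toy corpus")
    return -math.log(hits / len(ANNOTATIONS))


def resnik(term_a, term_b):
    """Information content of the most informative common ancestor."""
    common = ancestors(term_a) & ancestors(term_b)
    return max(information_content(t) for t in common)


for name in PARENTS:
    print(name, round(information_content(name), 3))
```

In this toy corpus the root term annotates every disease and therefore carries no information, while a term used by a single disease is the most specific; two eye phenotypes are scored as more similar to each other than an eye phenotype is to a heart phenotype.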

Table 2.

Tools and applications using HPO

Tool	Reference
Phenotype-driven differential diagnosis
Phenomizer	(70)
BOQA	(71)
FACE2GENE	(72)
Phenolyzer	(73)
Phenotype-driven exome/genome analysis
Exomiser	(35,74)
PhenIX	(32)
Phevor	(36)
PhenoVar	(75)
eXtasy	(76)
OMIMExplorer	(77)
Phen-Gen	(78)
Geno2MP	(21)
Genomiser	(79)
SimReg	(50)
ontologySimilarity	*
Functional and network analysis
ToppGene/ToppFun	(80)
WebGestalt	(81)
SUPERFAMILY	(82)
GREAT	(83)
Random walk on heterogeneous network	(84)
PANDA	(85)
PREDICT	(86)
Clinical data management and analysis
PhenoTips	(15)
Patient Archive	(2)
GENESIS (GEM.app)	(87)
Cross-species phenotype analysis
PhenoDigm	(88)
MouseFinder	(89)
Monarch	(2,53)
PhenomeNet	(90)
UberPheno	(56)
MORPHIN	(91)
PhenogramViz	(92)
Phenotype knowledge resources and databases
Orphanet	(3)
MalaCards	(93)
NIH genetic testing registry	(94)
OMIM	(95)
dcGO	(96)
ClinVar	(97)
GeneSetDB	(98)
MSeqDR	(99)
DIDA (digenic diseases database)	(100)
Genetic and Rare Diseases (GARD) Information Center	(101)
Visualization
PhenoStacks	(102)
PhenoBlocks	(103)
DECIPHER (phenogram)	(63)
phenogrid	(2)
ontologyPlot	*

*Greene, D., Richardson, S. and Turro, E. OntologyX: a suite of R packages for working with ontological data, under review.

PUBLISHING PROCESSES AND DATA EXCHANGE

It is non-trivial to collect patient phenotypes reliably, whether retrospectively from existing medical data or prospectively. The overwhelming majority of clinical descriptions in the medical literature are available only as natural language text, meaning that searching, analysis and integration of medically relevant information is challenging. An important step to increase the amount and quality of phenotype data in databases is to obtain the relevant information from authors upon submission of articles. The journal Cold Spring Harbor Molecular Case Studies requires authors to select HPO terms for research papers that are displayed alongside the manuscript and that can be used to search journal content for other cases with overlapping HPO terms (58). Short Reports in Clinical Genetics require authors to submit HPO-coded phenotype data to PhenomeCentral (16). An important goal of the HPO and the Monarch Initiative is to provide computational standards that will allow for exchange of detailed genotype and phenotype data by means of the emerging PhenoPackets standard (http://phenopackets.org).

PATIENT PHENOTYPING

Patient-reported phenotype data in patient registries such as J-RARE for rare diseases have been increasingly exploited in scientific research, for instance to identify symptoms still unknown to physicians. A barrier to the use of patient-reported data for understanding the natural history and phenotypic spectrum of diseases lies in the fact that clinical terminology is often unfamiliar to patients. The HPO consortium has therefore increased the usability of the HPO by patients, as well as scientists and clinicians, by systematically adding new, ‘plain language’ terms, either as synonyms to existing classes or by tagging existing HPO class labels as ‘layperson’. These layperson terms provide increased access to the HPO; for example, a patient may know they are ‘color-blind’, but may not be familiar with the clinical term ‘Dyschromatopsia’. As a result of this effort, the HPO now contains over 6000 layperson terms that can be integrated into patient registries, making the terminology useful for data interoperability across clinicians and patients. Future work will include validation studies using data from patient registries to demonstrate the utility of the HPO layperson synonyms in informing rare disease diagnosis (59).
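How layperson synonyms could be used in a patient-facing registry can be sketched as a lookup from plain-language wording to the clinical class label. The example reuses the ‘color-blind’ and ‘Dyschromatopsia’ pair from the text; the function names, the in-memory dictionary and the case-insensitive matching are assumptions for illustration and do not reflect how the HPO files or any registry actually store these synonyms.

```python
# Clinical label -> layperson synonyms. Only the pair mentioned in the
# text is included; a real table would be derived from the HPO release.
LAYPERSON_SYNONYMS = {
    "Dyschromatopsia": ["color-blind"],
}


def build_lay_index(term_synonyms):
    """Invert the table so that lay wording maps to the clinical label."""
    index = {}
    for label, synonyms in term_synonyms.items():
        for synonym in synonyms:
            index[synonym.casefold()] = label
    return index


def to_clinical_label(patient_text, index):
    """Return the clinical label for a lay term, or None if unknown."""
    return index.get(patient_text.strip().casefold())


lay_index = build_lay_index(LAYPERSON_SYNONYMS)
print(to_clinical_label(" Color-Blind ", lay_index))
print(to_clinical_label("night sweats", lay_index))
```

Returning `None` for unrecognized wording keeps unmapped patient input visible, so that it can be reviewed rather than silently dropped.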

HPO: an assessment by the NIHR rare disease initiatives

HPO is used as the system to capture phenotypic information for the UK's National Institute for Health Research (NIHR) Rare Disease initiatives on projects such as the NIHR RD-TRC (Rare Disease Translational Research Collaboration, http://rd.trc.nihr.ac.uk) and the NIHR BioResource Rare Disease (NIHR BR-RD). HPO is employed in all of these wide-ranging studies, which integrate data from a variety of sources, such as multiple EHR systems, in a variety of locations and specialities. In some disease areas, for example bleeding and platelet disorders, HPO has been the platform for new gene discovery and innovative research findings (44); the advantage of HPO is its support for statistically powered association analyses across phenotypes, across different diseases and in different branches of the ontology. The NIHR Rare Disease initiatives use a common infrastructure and clinical coding for the RD-TRC (56 studies), the BR-RD (14 studies) and also in our contribution to the 100 000 Genomes Project (160+ targeted diseases, as mentioned above). This produces a large and diverse dataset with a growing ‘data dictionary’ that contains terms mapped across different systems and coding schemes and includes clinically relevant signs outside of HPO, for example lab test results or exercise questionnaires. In a short update, it is difficult to present the breadth of the contribution HPO makes to NIHR-RD research, which is growing as more diseases are characterized and encoded using HPO. The HPO is now being employed in numerous NIHR-RD studies and it is anticipated that its use will be extended into all studies in which phenotype data are captured. The NIHR RD-TRC has developed a qualitative scale for the maturity of HPO across different disease areas, which adopts a four-stage assessment (Table 3). The current, subjective assessment of HPO maturity by the NIHR RD-TRC is shown in Table 4. The assessment will be used to prioritize the areas requiring most attention in our future work.

Table 3.

NIHR-RD-TRC assessment scale

Stage	Description	Example
Foundation	The basis of characterizing the disease in HPO needs to be developed	HPO is good for describing dysmorphologies especially across species: how do you model and use dyslexia?
Formulation	The theory is defined but key details still need to be specified and handled in the ontology computations	HPO models biology; where diseases are caused by environmental factors, e.g. cancers, how can an environment ontology be included?
Refinement	The key data sets and definitions for the disease are identified and available but require ‘translation’	Theme-based registry systems hold collections of data in other coding systems (registry-specific or ICD): how can these be mapped onto HPO?
Maturity	The HPO framework is in place and productive results are being obtained; the HPO term set continues to evolve	The HPO basics and a set of phenotypes are in place: do we need more terms or do existing terms need modification?

Table 4.

NIHR-RD-TRC assessment of HPO maturity

Theme	Foundation	Formulation	Refinement	Maturity
Cancer	✓✓✓✓	✓✓
Cardiovascular	✓✓✓✓✓	✓✓✓	✓✓
Central Nervous System	✓✓✓
Eye Diseases	✓✓✓✓✓	✓✓✓✓✓	✓✓✓✓✓	✓✓
Gastrointestinal	✓✓✓✓	✓✓✓
Immunological Disorders	✓✓✓✓✓	✓✓✓	✓✓
Paediatric (cross-cutting)	✓✓✓✓✓	✓✓✓	✓✓✓	✓
Metabolic & Endocrine Diseases	✓✓✓✓✓	✓✓
Musculoskeletal Disorders	✓✓✓✓✓	✓✓✓✓✓	✓✓✓
Muscle & Nerve Diseases	✓✓✓✓✓	✓✓✓	✓
Non-malignant Haematology	✓✓✓✓✓	✓✓✓✓✓	✓✓✓✓✓	✓✓✓
Renal	✓✓✓✓✓	✓✓✓✓✓	✓✓✓	✓
Respiratory Diseases	✓✓✓	✓
Skin Diseases	✓✓✓✓✓	✓✓✓✓✓	✓✓✓
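
As an illustration of how the assessment in Table 4 could feed into prioritization, the following minimal Python sketch transcribes the check-mark counts per theme and ranks the themes from least to most mature. The scoring rule (the total number of check marks across the four stages) and the names TABLE_4 and prioritize are assumptions made for this example only; the NIHR RD-TRC assessment is qualitative and subjective, and the source does not define a numerical score. The sketch also assumes that empty cells in Table 4 belong to the later stages.

```python
# Check-mark counts transcribed from Table 4, in the order
# (Foundation, Formulation, Refinement, Maturity).
# Assumption: cells missing from a row are the later stages (count 0).
STAGES = ("Foundation", "Formulation", "Refinement", "Maturity")

TABLE_4 = {
    "Cancer": (4, 2, 0, 0),
    "Cardiovascular": (5, 3, 2, 0),
    "Central Nervous System": (3, 0, 0, 0),
    "Eye Diseases": (5, 5, 5, 2),
    "Gastrointestinal": (4, 3, 0, 0),
    "Immunological Disorders": (5, 3, 2, 0),
    "Paediatric (cross-cutting)": (5, 3, 3, 1),
    "Metabolic & Endocrine Diseases": (5, 2, 0, 0),
    "Musculoskeletal Disorders": (5, 5, 3, 0),
    "Muscle & Nerve Diseases": (5, 3, 1, 0),
    "Non-malignant Haematology": (5, 5, 5, 3),
    "Renal": (5, 5, 3, 1),
    "Respiratory Diseases": (3, 1, 0, 0),
    "Skin Diseases": (5, 5, 3, 0),
}


def prioritize(table):
    """Rank themes from least to most mature.

    Hypothetical scoring: the sum of check marks over all four stages.
    Ties are broken alphabetically by theme name.
    """
    totals = [(theme, sum(counts)) for theme, counts in table.items()]
    return sorted(totals, key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    for theme, total in prioritize(TABLE_4):
        print(f"{total:2d}  {theme}")
```

Under this illustrative scoring, the themes with the fewest check marks appear first in the ranking and those with the most appear last; a different weighting of the four stages would change the order.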

FUTURE DEVELOPMENTS AND OUTLOOK

Development of the HPO has continued steadily since its initial publication in 2008 (1). The work has focused on providing a well-defined, comprehensive and interoperable resource for computational analysis of human disease phenotypes, and the HPO now serves as the basis for a wide range of analysis tools used in clinical and research settings. The HPO has been adopted by a growing number of groups internationally, and efforts are underway to translate it into six languages, which we will report on in the future. Orphanet serves as a reference portal for rare diseases, populated by literature curation and validated by international experts (3). The HPO project and Orphanet are working to create an integrated RD-specific informatics ecosystem that will build on the HPO as well as the Orphanet Rare Disease Ontology (ORDO), an open-access ontology developed from the Orphanet information system (60). While the initial focus of the HPO was on rare, mainly Mendelian diseases, HPO annotations are now also available for 3145 common diseases (10). Ongoing work will extend HPO resources for precision medicine, cancer, and disorders such as congenital heart malformations that are characterized by non-Mendelian inheritance.

