
The Monarch Initiative: an integrative data and analytic platform connecting phenotypes to genotypes across species.

Christopher J Mungall¹, Julie A McMurry², Sebastian Köhler³, James P Balhoff⁴, Charles Borromeo⁵, Matthew Brush², Seth Carbon¹, Tom Conlin², Nathan Dunn¹, Mark Engelstad², Erin Foster², J P Gourdine², Julius O B Jacobsen⁶, Dan Keith², Bryan Laraway², Suzanna E Lewis¹, Jeremy NguyenXuan¹, Kent Shefchek², Nicole Vasilevsky², Zhou Yuan⁵, Nicole Washington¹, Harry Hochheiser⁵, Tudor Groza⁷, Damian Smedley⁶, Peter N Robinson³,⁸, Melissa A Haendel⁹.

Abstract

The correlation of phenotypic outcomes with genetic variation and environmental factors is a core pursuit in biology and biomedicine. Numerous challenges impede our progress: patient phenotypes may not match known diseases, candidate variants may be in genes that have not been characterized, model organisms may not recapitulate human or veterinary diseases, filling evolutionary gaps is difficult, and many resources must be queried to find potentially significant genotype-phenotype associations. Non-human organisms have proven instrumental in revealing biological mechanisms. Advanced informatics tools can identify phenotypically relevant disease models in research and diagnostic contexts. Large-scale integration of model organism and clinical research data can provide a breadth of knowledge not available from individual sources and can provide contextualization of data back to these sources. The Monarch Initiative (monarchinitiative.org) is a collaborative, open science effort that aims to semantically integrate genotype-phenotype data from many species and sources in order to support precision medicine, disease modeling, and mechanistic exploration. Our integrated knowledge graph, analytic tools, and web services enable diverse users to explore relationships between phenotypes and genotypes across species.

Year: 2016 PMID: 27899636 PMCID: PMC5210586 DOI: 10.1093/nar/gkw1128

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

A fundamental axiom of biology is that phenotypic manifestations of an organism are due to interaction between genotype and environmental factors over time. In the rapidly advancing era of genomic medicine, a critical challenge is to identify the genetic etiologies of Mendelian disease, cancer, and common and complex diseases, and to translate basic science into better treatments. Currently available human data associate ∼51% of known human coding genes with phenotype data (based on OMIM (1), ClinVar (2), Orphanet (3), CTD (4) and the GWAS catalog (5)). See Table 1 for a list of database abbreviations. This coverage can be extended to ∼89% if phenotypic information from orthologous genes from five of the most well-studied model organisms is included (Figure 1). Similarly, 72% of the 3230 genes in ExAC with near-complete depletion of predicted protein-truncating variants have no currently established human disease phenotype (6), yet 88% of these genes without a human phenotype have a phenotype in a non-human organism. However, leveraging these model data for computational use is non-trivial, primarily because the relationships between gene and disease (7) and between model system and disease phenotypes (8) are not straightforward.

Table 1.

Glossary of acronyms

Acronym	Name	URL	Ref
Bgee	BgeeDb	http://bgee.org/	(55)
BioGrid	Biological General Repository for Interaction Datasets.	https://thebiogrid.org/	(33)
CL	Cell Ontology	http://obofoundry.org/ontology/cl.html	(62)
ClinVar	ClinVar	https://www.ncbi.nlm.nih.gov/clinvar/	(2)
CTD	Comparative Toxicogenomics Database	http://ctdbase.org/	(4)
ECO	Evidence and Conclusions Ontology	http://obofoundry.org/ontology/eco.html	(21)
ExAC	Exome Aggregation Consortium	http://exac.broadinstitute.org/	(6)
FlyBase	FlyBase	http://flybase.org	(63)
GeneNetwork	Gene Network	http://genenetwork.org	(54)
GENO	Genotype Ontology	https://github.com/monarch-initiative/GENO-ontology/	(19)
GO	Gene Ontology	http://geneontology.org	(37)
GWAS	GWAS Catalog	https://www.ebi.ac.uk/gwas/	(5)
HP	Human Phenotype Ontology	http://human-phenotype-ontology.org/	(30)
KEGG	Kyoto Encyclopedia of Genes and Genomes	http://www.kegg.jp/	(31)
MGI	Mouse Genome Informatics	http://www.informatics.jax.org/	(36)
MonDO	Monarch Merged Disease Ontology	https://github.com/monarch-initiative/monarch-disease-ontology/	(26)
MP	Mammalian Phenotype Ontology	http://obofoundry.org/ontology/mp.html	(20)
MPD	Mouse phenome database	http://phenome.jax.org/	(53)
MyGene	MyGene	http://mygene.info	(32)
OMIA	Online Mendelian Inheritance in Animals	http://omia.angis.org.au/home/	(41)
OMIM	Online Mendelian Inheritance in Man	http://omim.org	(1)
OrphaNet	Portal for rare diseases and orphan drugs	http://www.orpha.net	(3)
Panther	PantherDB	http://pantherdb.org	(34)
RO	Relation Ontology	http://obofoundry.org/ontology/ro.html	(17)
SEPIO	Scientific Evidence and Provenance Information Ontology	https://github.com/monarch-initiative/SEPIO-ontology/	(59)
SO	Sequence Ontology	http://www.sequenceontology.org/	(27)
Uberon	Uber-anatomy ontology	http://uberon.org	(23)
Upheno	Unified Phenotype Ontology	https://github.com/obophenotype/upheno/	(25)
WormBase	WormBase	http://wormbase.org	(64)
ZFIN	Zebrafish Information Network	http://zfin.org	(35)

Figure 1.

The phenotype annotation coverage of human coding genes. Yellow bars show that 51% of those genes have at least one phenotype association reported in humans (HPO annotations of OMIM, ClinVar, Orphanet, CTD and GWAS). The blue bars show that 58% of human coding genes have orthologs with causal phenotypic associations reported in at least one non-human model (MGI, Wormbase, Flybase and ZFIN). The green bars show that 40% of human coding genes have annotations both in human and in non-human orthologs. There are phenotypic associations from humans and/or non-human orthologs that cover 89% of human coding genes.

In recent years, there has been a growth in the number of genotype–phenotype databases available, covering a diversity of domain areas for human, model organisms, and veterinary species. While providing quality inventories of the relevant species and phenotypic data types, most resources are limited to a single species or limit cross-species comparison to direct assertions (e.g. Organism X is a model of Disease Y) or to inferences based upon orthology relations (e.g. organism Z is a model of Disease Y due to A and A′ being orthologs). While great strides have been made in text-based search engines, phenotype data remain difficult to search and use computationally because of their complexity and the use of different phenotype standards and terminologies. Such barriers have made linking and integration with the precision and richness needed for mechanistic discovery across species a significant challenge (9). A newer method to aid in identifying models of disease and discovering underlying mechanisms is to use ontologies to describe the set of phenotypes that present for a given genotype or disease, what we call a ‘phenotypic profile’.
A phenotypic profile is the subject of non-exact matching within and across species using ontology integration and semantic similarity algorithms (10,11) in software applications such as Exomiser (12) and Genomiser (13), and this approach has been shown to assist disease diagnosis (14–16). The Monarch Initiative uses an ontology-based strategy to deeply integrate genotype–phenotype data from many species and sources, thereby enabling computational interrogation of disease models and complex relationships between genotype and phenotype to be revealed. The name ‘Monarch Initiative’ was chosen because it is a community effort to create paths for diverse data to be put to use for disease discovery, not unlike the navigation routes that a monarch butterfly would take.

Data architecture

The overall data architecture for Monarch is shown in Figure 2. The bulk of the data integration is carried out using our Data Ingest Pipeline (Dipper) tool (https://github.com/monarch-initiative/dipper), which maps a variety of external data sources and databases to RDF (Resource Description Framework) graphs. RDF provides a flexible way of modeling a variety of complex datatypes, and allows entities from different databases to be connected via common instance or class URIs (Uniform Resource Identifiers). We use relationship types from the Relation Ontology (RO; https://github.com/oborel/obo-relations) (17) and other vocabularies to connect entities together, along with a number of Open Biological Ontologies (18) (OBOs) to classify these entities. For example, a mouse genotype can be related to a phenotype using the has_phenotype relation (RO:0002200), with the genotype classified using a term from the Genotype Ontology (GENO) (19), and the phenotype classified using the Mammalian Phenotype Ontology (MP) (20). We use the Open Biomedical Annotations (OBAN; https://github.com/EBISPOT/OBAN) vocabulary to associate evidence and provenance metadata with each edge, using the Evidence and Conclusions Ontology (ECO) for types of evidence (21). The graphs produced by Dipper are available as a standalone resource in RDF/turtle format at http://data.monarchinitiative.org/ttl.
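As a rough illustration of the genotype-to-phenotype modeling described above, the following sketch expands prefixed identifiers to URIs and writes one has_phenotype edge as N-Triples using only the Python standard library. It is not Dipper output: the `EX` namespace, the genotype identifier `EX:g1`, and the specific GENO and MP class identifiers are placeholders chosen for the example. Only the relation RO:0002200 and the OBO PURL pattern are taken as given.

```python
# Minimal sketch: one genotype -> phenotype edge serialized as N-Triples.
OBO = "http://purl.obolibrary.org/obo/"

# Prefix -> URI base. "EX" is a hypothetical namespace for example instances.
PREFIXES = {
    "RO": OBO + "RO_",
    "MP": OBO + "MP_",
    "GENO": OBO + "GENO_",
    "EX": "http://example.org/genotype/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand(curie):
    """Expand a prefixed identifier such as 'RO:0002200' to a full URI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


def ntriple(subject, predicate, obj):
    """Format one triple of prefixed identifiers as an N-Triples line."""
    return "<%s> <%s> <%s> ." % (expand(subject), expand(predicate), expand(obj))


triples = [
    ("EX:g1", "rdf:type", "GENO:0000000"),   # placeholder genotype class
    ("EX:g1", "RO:0002200", "MP:0000001"),   # has_phenotype; placeholder MP class
]

lines = [ntriple(*t) for t in triples]
print("\n".join(lines))
```

Evidence and provenance metadata (OBAN, ECO) are omitted here; in a fuller sketch each edge would be described by an additional association node.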

Figure 2.

Monarch Data Architecture. Structured and unstructured data sources are loaded into SciGraph via Dipper. Ontologies are also loaded into SciGraph, resulting in a combined knowledge and data graph. Data is disseminated via SciGraph Services, an ontology-enhanced Solr instance called GOlr, and to the OwlSim semantic similarity software. Monarch applications and end users access the services for graph querying, application population and phenotype matching.

We also import a number of external and in-house ontologies, for data description and data integration. As these ontologies are all available from the OBO Library in Web Ontology Language (OWL), no additional transformation is necessary. The combined corpus of graphs ingested using Dipper and from ontologies is referred to as the Monarch Knowledge Graph.

The data integrated within Monarch encompasses a wide range of sources, and includes human clinical knowledge sources as well as genetic and genomic resources covering organismal biology. The list of data sources and ontologies integrated is shown in Figure 3, with a species distribution illustrated in Figure 4.

The knowledge graph is loaded into an instance of a SciGraph database (https://github.com/SciGraph/SciGraph/), which embeds and extends a Neo4J database, allowing for complex queries and ontology-aware data processing and Named Entity Recognition. We provide two public endpoints for client software to query these services: https://scigraph-ontology.monarchinitiative.org/scigraph/docs (for ontology access) and https://scigraph-data.monarchinitiative.org/scigraph/docs (for ontology plus data access).

Figure 3.

Data types, sources, and the ontologies used for their integration into the Monarch knowledge graph. Each data source uses or is mapped to a suite of different ontologies or vocabularies. These are in turn integrated into bridging ontologies for Genetics (GENO), Anatomy (Uberon/CL), Phenotypes (UPheno) and Diseases (MonDO).

Figure 4.

Distribution of phenotypic annotations across species in Monarch, broken down by the top levels of the phenotype ontology. The graph can be interactively explored at https://monarchinitiative.org/phenotype/. Note that annotations are currently dominated by human, mouse, zebrafish and C. elegans (top panel); the chart is faceted allowing individual species to be switched on and off to see contributions for less data-rich species such as veterinary animals and monkeys (middle panel). Clicking on a given phenotype text allows drilling down to its subtypes (lower panel).

These SciGraph instances provide powerful graph querying capabilities over the complete knowledge graph. Many of the common query patterns are executed in advance and stored in an Apache Solr index, making use of the Gene Ontology ‘GOlr’ indexing strategy, allowing for fast queries of ontology-indexed associations. Finally, we also load a subset of the graph into an OwlSim instance, which provides phenotype matching services as well as the ability to perform fuzzy phenotype searches based on a phenotype profile. We also provide phenotype matching services via the Global Alliance for Genomics and Health (GA4GH) Matchmaker Exchange (MME) API (22), available at https://mme.monarchinitiative.org.

Many of the data sources we integrate make use of their own terminologies and ontologies. We aggregate these into a unified ontology (https://github.com/monarch-initiative/monarch-ontology/) and make use of bridging ontologies and our curated integrative ontologies to connect these together.
In particular:

The Uber-anatomy ontology (Uberon) bridges species-specific and clinical anatomical and tissue ontologies (23).

The unified phenotype ontology bridges model organism and human phenotype ontologies and terminologies, using techniques described in (24,25).

The Monarch Merged Disease Ontology (MonDO) uses a Bayes ontology merging algorithm (26) to integrate multiple human disease resources into a single ontology, and additionally includes animal diseases from OMIA.

The Genotype Ontology (GENO) (19) defines genotypic elements and bridges the Sequence Ontology (SO) (27) and FALDO (28). GENO allows the propagation of phenotypes that are annotated to genotypic elements.

Entity resolution and unification

One of the many challenges faced when integrating bioinformatics resources is the presence of the same entity in multiple databases, designated by different identifiers (29). This problem is compounded by the different ways the same identifier can be written, using different prefixes or no prefix at all. Take the Monarch page for a single gene, for example ‘fibrinogen gamma chain’, FGG (https://monarchinitiative.org/gene/NCBIGene:2266). Monarch has integrated data from a variety of human, model organism, and other biomedical sources such as OMIM (1), Orphanet (3), ClinVar (2), HPO (30), KEGG (31), CTD (4), MyGene (32) and BioGrid (33), and, via orthology in PantherDB (34), we also incorporate Fgg gene data from ZFIN (35) and from MGI (36). No two of these sources represent the identifier for FGG in precisely the same way. As part of our data ingest process, we normalize all identifiers using a curated set of database prefixes, each of which has a defined mapping to an http URI. These curated prefixes have been deposited in the Prefix Commons (https://github.com/prefixcommons), which similarly contains identifier prefixes used within the Gene Ontology (37) and Bio2RDF (38). To handle equivalent identifiers, we perform clique-merging as a post-processing step (https://github.com/SciGraph/SciGraph/wiki/Post-processors). We take all edges labeled with either the owl:sameAs or owl:equivalentClass property and calculate equivalence cliques, based on the symmetric and transitive nature of these properties. We then merge these cliques together, taking a designated ‘clique leader’ (for instance, NCBI for genes) and remapping all edges in the Monarch graph such that they point to a clique leader.
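The two steps described above, prefix normalization and clique merging, can be sketched roughly as follows. This is an illustrative reimplementation using a union-find structure, not the SciGraph post-processor itself. The prefix spellings, the leader priority order, and every identifier other than NCBIGene:2266 are assumptions made for the example (the `EXAMPLE` local identifiers are deliberate placeholders).

```python
# Sketch of identifier normalization followed by clique merging.
# Prefix spellings and leader priority are illustrative assumptions.
CANONICAL_PREFIX = {
    "ncbigene": "NCBIGene",
    "entrez": "NCBIGene",
    "hgnc": "HGNC",
    "omim": "OMIM",
}
LEADER_PRIORITY = ["NCBIGene", "HGNC", "OMIM"]  # assumed preference order


def normalize(identifier):
    """Rewrite a prefixed identifier with its canonical prefix."""
    prefix, local = identifier.split(":", 1)
    return CANONICAL_PREFIX[prefix.lower()] + ":" + local


def cliques(equivalences):
    """Group identifiers linked by symmetric, transitive equivalence edges."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in equivalences:
        parent[find(a)] = find(b)
    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


def leader(clique):
    """Pick the member whose prefix ranks first in LEADER_PRIORITY."""
    def rank(curie):
        prefix = curie.split(":", 1)[0]
        order = LEADER_PRIORITY.index(prefix) if prefix in LEADER_PRIORITY else len(LEADER_PRIORITY)
        return (order, curie)
    return min(clique, key=rank)


def merge(edges, equivalences):
    """Repoint every edge endpoint at the leader of its clique."""
    to_leader = {}
    for clique in cliques(equivalences):
        lead = leader(clique)
        for member in clique:
            to_leader[member] = lead
    return sorted({(to_leader.get(s, s), p, to_leader.get(o, o)) for s, p, o in edges})


equivalences = [
    (normalize("entrez:2266"), normalize("hgnc:EXAMPLE")),
    (normalize("HGNC:EXAMPLE"), normalize("omim:EXAMPLE")),
]
edges = [("OMIM:EXAMPLE", "RO:0002200", "HP:EXAMPLE")]
merged = merge(edges, equivalences)
print(merged)
```

Because equivalence is treated as symmetric and transitive, the three gene identifiers fall into one clique even though no single edge links the NCBI and OMIM forms directly, and the edge originally attached to the OMIM form ends up on the NCBI leader.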

In-house curation

In addition to ingest of external sources and ontologies, we perform in-house data and ontology curation. For curation of ontology-based genotype–phenotype associations (including disease-phenotypic profiles), we are transitioning to the WebPhenote platform (http://create.monarchinitiative.org), which allows a variety of disease entities to be connected to phenotypic descriptors. We also make use of text mining to create seed disease-phenotype associations using the Bio-Lark toolkit (39), which are then manually curated. Most recently, we have performed a large-scale annotation of PubMed to extract common disease-phenotype associations (40). Most of the in-house curation work involves making smaller resources with free text descriptions of phenotypic information computable, for example, the Online Mendelian Inheritance in Animals (OMIA) resource, with whom we have been collaborating to support this curation (41).

Quality control

External resources and datasets that are incorporated into Monarch are evaluated before incorporation into the Dipper pipeline—we primarily integrate high-quality curated resources. For all ontologies we bring in, we apply automated reasoning to detect inconsistencies between different ontologies. For each release, we perform high-level checks on each integrated resource to ensure no errors in the extraction process occurred, but we do not perform in-depth curation checks of integrated resources. Each release happens once every one to two months. In order to measure annotation richness, we have also created an annotation sufficiency meter web service (42) available at https://monarchinitiative.org/page/services; this service determines whether a given phenotype profile for any organism is sufficiently broad and deep to be of diagnostic utility. The sufficiency score can be displayed as a five star scale as in PhenoTips (43) and in the Monarch web portal (see below) to aid curation or data entry, and can also be used to suggest additional phenotypic assays to be performed—whether in a patient or in a model organism.

Monarch web portal

The Monarch portal is designed with a number of different use cases in mind, including:

A researcher interested in a human gene, its phenotypes, and the phenotypes of orthologs in model organisms and other species.

Patients or researchers interested in a particular disease or phenotype (or groups of these), together with information on all implicated genes.

A clinical scenario in which a patient has an undiagnosed disease showing a spectrum of phenotypes, with no definitive candidate gene demonstrated by sequencing; in this scenario the clinician wishes to search for either known diseases that have a similar presentation, or model organism genes that demonstrate homologous phenotypes when the gene is perturbed.

A researcher looking for diseases that have phenotypic features similar to those of a newly identified model organism mutant found in a screen.

Researchers or clinicians who need to identify potentially informative phenotyping assays for differential diagnosis or to identify candidate genes.

Features

Integrated information on entities of interest

We provide overview pages for entities such as genes, diseases, phenotypes, genotypes, variants and publications. Each page highlights the provenance of the data from the diverse clinical, model organism, and non-model organism sources. These pages can be found either via search (see below) or through an entity resolver. For example, the URL https://monarchinitiative.org/OMIM:266510 will redirect to a page about the disease ‘Peroxisome biogenesis disorder type 3B’ from the OMIM resource, showing its relationships to other content within the Monarch knowledge graph, such as phenotypes and genes associated with the disease. We make use of MonDO (the Monarch merged disease ontology (26)) to group similar diseases together. Figure 5 shows an example page for Marfan syndrome with related phenotype, gene, model and variant data.

Figure 5.

Annotated Monarch webpage for Marfan and Marfan Related syndrome. This group of syndromic diseases has a number of different associations spanning multiple entity types—disease phenotypes, implicated human genes, variants and animal models and other model systems. An abstraction of the contents and features of the tabs is shown in the lower panel. Actual contents of the tabs are best viewed in the context of the web app at https://monarchinitiative.org/DOID:14323.

Basic Search

The portal provides different means of searching over integrated content. In cases where a user is interested in a specific disease, gene, phenotype etc., these can usually be found via autocomplete. Site-wide synonym-aware text search can also be used to find pages of interest. Because the knowledgebase combines information from multiple species, entities such as genes often have ambiguous symbols. We provide species information to help disambiguate in a search.

Search by phenotype profile

One of the most innovative features of Monarch is the ability to query within and across species to look for diseases or organisms that share a similar but non-exact set of phenotypes (phenotypic profile). This feature uses a semantic similarity algorithm available from the OwlSim package (http://owlsim.org). Users can launch searches against specific targets: organisms, sets of named gene models, or all models and diseases available in the Monarch repository. The Monarch Analyze Phenotypes interface (https://monarchinitiative.org/analyze/phenotypes) allows the user to build up a ‘cart’ of phenotypes, and then perform a comparison against phenotypes related to genes and diseases. Results are ranked according to closeness of match, partitioned by species, and are displayed both as a list and in the PhenoGrid widget (below).
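To give a feel for how non-exact profile matching can work, the sketch below scores two phenotype terms by the information content (IC) of their most informative common ancestor and scores two profiles by a best-match average. This is one common family of semantic similarity measures and is offered only as an illustration; it should not be read as the OwlSim implementation. The toy hierarchy, term names, and annotation counts are all invented.

```python
import math

# Toy is-a hierarchy (child -> parents); all terms and counts are invented.
PARENTS = {
    "phenotype": [],
    "abnormal jaw": ["phenotype"],
    "small mandible": ["abnormal jaw"],
    "micrognathia": ["small mandible"],
    "abnormal heart": ["phenotype"],
    "aortic dilation": ["abnormal heart"],
}
# Number of annotated entities per term, already propagated to ancestors.
COUNTS = {
    "phenotype": 100, "abnormal jaw": 20, "small mandible": 5,
    "micrognathia": 2, "abnormal heart": 25, "aortic dilation": 4,
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen


def ic(term):
    """Information content: rarer terms are more informative."""
    return math.log(COUNTS["phenotype"] / COUNTS[term])


def term_sim(a, b):
    """IC of the most informative ancestor shared by a and b."""
    return max(ic(t) for t in ancestors(a) & ancestors(b))


def profile_sim(query, target):
    """Average, over query terms, of the best match in the target profile."""
    return sum(max(term_sim(q, t) for t in target) for q in query) / len(query)


patient = ["micrognathia", "aortic dilation"]
model_a = ["small mandible", "abnormal heart"]
model_b = ["abnormal heart"]
print(profile_sim(patient, model_a), profile_sim(patient, model_b))
```

In this toy data, ‘micrognathia’ and ‘small mandible’ score highly because their shared ancestor is specific, whereas a jaw term and a heart term share only the root and score zero.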

PhenoGrid

Given a set of input phenotypes, as associated with a patient or a disease, Monarch phenotypic profile similarity calculations can generate results involving hundreds of diseases and models. The PhenoGrid visualization widget (Figure 6) provides an overview of these similarity results, implemented using the D3 javascript library (44). Phenotypes and models are frequently too numerous to fit on the initial display; thus scrolling, dragging, and filtering have been implemented. PhenoGrid is available as an open-source widget suitable for integration in third-party web sites, such as for model organism databases as done in the International Mouse Phenotyping Consortium (IMPC) or clinical comparison tools. Download and installation instructions are available on the Monarch Initiative web site.

Figure 6.

Partial screenshot of PhenoGrid showing Marfan syndrome. PhenoGrid shows input phenotypes in rows and models in columns, with cell contents color-coded so that greater saturation indicates greater similarity. Here, disease phenotypes are shown as rows, and phenotypically matching human diseases and model organism genes are shown as columns; the saturation of a cell correlates with the strength of the phenotypic match. Mouse-over tooltips highlight diseases associated with a selected phenotype (or vice-versa), or details (including similarity scores) of any match between a phenotype and a model. User controls support the selection of alternative sort orders, similarity metrics, and displayed organism(s) (mouse, human, zebrafish or the 10 most similar models for each). Here, we see all diseases or genes that exhibit ‘Hypoplasia of the mandible’ with the matching mouse gene Tgfb2. Actual PhenoGrid data is best viewed in the context of the web app at https://monarchinitiative.org/Orphanet:284993#compare. Note that matches do not need to be exact: here the mouse phenotype of ‘small mandible’ (Mammalian Phenotype Ontology) has a high scoring match to ‘micrognathia’ (Human Phenotype Ontology) based on the fact that both phenotypes are related to ‘small mandible’ (Mammalian Phenotype Ontology). Advanced PhenoGrid features (not displayed) include the ability to alter the scoring and sorting methods, as well as zoomed-out map-style navigation.

Text annotation

The Monarch annotation service allows a user to enter free text (e.g. a paper abstract or a clinical narrative) and perform an automated annotation on this text, with entities in the text marked up with terms from the Monarch knowledge graph, such as genes, diseases and phenotypes. Once the text is marked up, the user has the option of turning the recognized phenotype terms into a phenotype profile and performing a profile search, or of following links to any of the entity pages identified in the annotation. This tool is also available via services.
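The annotate-then-profile workflow described above can be approximated with a simple dictionary lookup, as in the sketch below. This is only a conceptual stand-in and makes no claim about how the Monarch annotation service recognizes entities. The lexicon is hypothetical: DOID:14323 and NCBIGene:2266 are identifiers mentioned elsewhere in this article, while `HP:EXAMPLE` is a placeholder.

```python
import re

# Hypothetical mini-lexicon: lower-cased surface form -> (identifier, category).
LEXICON = {
    "marfan syndrome": ("DOID:14323", "disease"),
    "micrognathia": ("HP:EXAMPLE", "phenotype"),
    "fgg": ("NCBIGene:2266", "gene"),
}


def annotate(text):
    """Return (start, end, matched text, identifier, category) for each hit."""
    # Longest surface forms first so multi-word terms win over shorter ones.
    forms = sorted(LEXICON, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(f) for f in forms) + r")\b",
        re.IGNORECASE,
    )
    hits = []
    for m in pattern.finditer(text):
        ident, category = LEXICON[m.group(1).lower()]
        hits.append((m.start(), m.end(), m.group(1), ident, category))
    return hits


def phenotype_profile(hits):
    """Collect the distinct phenotype identifiers found in the text."""
    return sorted({ident for _, _, _, ident, cat in hits if cat == "phenotype"})


text = "Patient with Marfan syndrome and micrognathia; FGG was sequenced."
hits = annotate(text)
for hit in hits:
    print(hit)
print(phenotype_profile(hits))
```

The resulting list of phenotype identifiers is the kind of input a profile search would take.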

Inferring causative variants

The Exomiser (12) and, more recently, Genomiser (45) tools make use of the Monarch platform and phenotype matching algorithms to rank putative causative variants using a combined variant and phenotype score. These tools have been used to diagnose patients as part of the NIH Undiagnosed Diseases Project (14) and are the first examples of using model organism phenotype data to aid rare disease diagnostics.
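The idea of ranking candidates by a combined variant and phenotype score can be illustrated with the toy sketch below. The weighted average used here is an assumption made purely for illustration and is not the scoring function used by Exomiser or Genomiser; the gene names and scores are invented, and both scores are assumed to lie between 0 and 1.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    gene: str
    variant_score: float    # assumed 0..1: evidence the variant is damaging
    phenotype_score: float  # assumed 0..1: phenotypic match for the gene


def combined(candidate, weight=0.5):
    """Weighted average of the two scores (illustrative choice only)."""
    return weight * candidate.variant_score + (1 - weight) * candidate.phenotype_score


def rank(candidates, weight=0.5):
    """Order candidates from highest to lowest combined score."""
    return sorted(candidates, key=lambda c: combined(c, weight), reverse=True)


candidates = [
    Candidate("GENE_A", variant_score=0.9, phenotype_score=0.2),
    Candidate("GENE_B", variant_score=0.7, phenotype_score=0.8),
    Candidate("GENE_C", variant_score=0.4, phenotype_score=0.5),
]
ranked = rank(candidates)
print([c.gene for c in ranked])
```

In this toy example the variant with the most damaging prediction is outranked by one whose gene is a much better phenotypic match, which is the motivation for combining the two sources of evidence.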

DISCUSSION

The Monarch Initiative provides a system to organize and harmonize the heterogeneous genotype–phenotype data found across clinical and model and non-model organism resources (such as veterinary species), creating a unified overview of this rich landscape of data sources. Among the challenges we have had to address are that each resource shares data via different mechanisms and uses a different data model. It is particularly important to note that each organism resource annotates phenotypic data to different aspects of the genotype: one resource might annotate to a gene, another to an allele, another to a set of alleles, a full genotype or a SNP. This not only makes data integration difficult, but it also means that computation over the genotype–phenotype associations must be done with care. Similar issues at MGI have been described (46). In addition, since most anatomy, phenotype, and disease ontologies describe the biology of one species, it has traditionally been quite difficult to ‘map’ across species. Some examples are the Human Phenotype Ontology (HPO) (30) and the Mouse Anatomy Ontology (47). Monarch uses four species-neutral ontologies that unify their species-specific counterparts (as shown in Figure 3): GENO for genotypes (19), UPheno for phenotypes (25), UBERON for anatomy (23), and MonDO for diseases (26). Prior efforts to map or integrate species-specific anatomical ontologies (24,48), for example, have been utilized in the construction of these species-neutral ontologies. The end result is a translational platform that allows a unified view of human, model and non-model organism biology.

A comparison between Monarch and existing resources is warranted. InterMine is an open-source data warehouse system used for disseminating data from large, complex biological heterogeneous data sources (49).
InterMine provides sophisticated web services to support denormalized query and has been used to improve query and data access to model organism databases (50) and non-model organisms (51). InterMine is a federated approach in which individual databases can each adopt and populate their own object-oriented data model, but can also align on certain aspects, such as aligning genomic data models using the SO. However, genotype and phenotype modeling is not yet aligned, and InterMine does not provide disease matching or phenotypic search. We are currently working with InterMine to achieve harmonization in this area. Other resources, such as KaBOB (52) and Bio2RDF (38), semantically integrate various resources into large triplestores. Bio2RDF typically retains the source vocabulary of the integrated resources, whereas KaBOB is more similar to Monarch in that it maps OBO ontologies (18). Other data integration approaches include the BioThings API, exemplified by the MyVariant system (32), which aggregates variant data from multiple sources. We are currently working with the BioThings API developers to integrate these different approaches within the Dipper framework. Monarch is unique in that it aims to align both genotypic and phenotypic modeling across species and sources.

Future directions

Future directions include bringing in phenotypic data from specialized sources and databases, incorporating a wider range of datatypes, and extending and improving analytic methods for making cross-species inferences. Currently the core of Monarch includes primarily qualitative phenotypes described using terms from existing phenotypic vocabularies; we are starting to bring in more quantitative data, from sources such as the MPD (53) and GeneNetwork (54), in addition to expression data annotated to Uberon in BgeeDb (55). We are also extending our phenotypic search methods to incorporate Phenologs, phenotypic groupings inferred on the basis of orthologous genes (56,57). Early comparisons suggest that addition of phenologs to our suite of tools to enable genotype–phenotype inquiry across species will extend our reach in a synergistic manner (58). We therefore plan to implement this type of approach in the Monarch tool suite and website. One of the most important realizations we came to in constructing the Monarch platform was the need to better represent scientific evidence of genotype–phenotype associations. We are currently developing a Scientific Evidence and Provenance Information Ontology (SEPIO) (59) in collaboration with the Evidence and Conclusion Ontology consortium (21) and ClinGen (60) in order to classify associations as complementary, confirmatory, or contradictory. SEPIO will also integrate biological assays from the Ontology of Biomedical Investigations (61). Monarch has also been collaborating with the US National Cancer Institute's Thesaurus (NCIT) team to integrate cancer phenotypes. Finally, Monarch has been working in the context of the Global Alliance for Genomics and Health (GA4GH) to develop a formal phenotype exchange format (www.phenopackets.org) that can aid phenotypic data sharing in numerous contexts such as clinical, model organism research, biodiversity, veterinary, and evolutionary biology.

