The Monarch Initiative: an integrative data and analytic platform connecting phenotypes to genotypes across species.

Christopher J Mungall1, Julie A McMurry2, Sebastian Köhler3, James P Balhoff4, Charles Borromeo5, Matthew Brush2, Seth Carbon1, Tom Conlin2, Nathan Dunn1, Mark Engelstad2, Erin Foster2, J P Gourdine2, Julius O B Jacobsen6, Dan Keith2, Bryan Laraway2, Suzanna E Lewis1, Jeremy NguyenXuan1, Kent Shefchek2, Nicole Vasilevsky2, Zhou Yuan5, Nicole Washington1, Harry Hochheiser5, Tudor Groza7, Damian Smedley6, Peter N Robinson3,8, Melissa A Haendel9.   

Abstract

The correlation of phenotypic outcomes with genetic variation and environmental factors is a core pursuit in biology and biomedicine. Numerous challenges impede our progress: patient phenotypes may not match known diseases, candidate variants may be in genes that have not been characterized, model organisms may not recapitulate human or veterinary diseases, filling evolutionary gaps is difficult, and many resources must be queried to find potentially significant genotype-phenotype associations. Non-human organisms have proven instrumental in revealing biological mechanisms. Advanced informatics tools can identify phenotypically relevant disease models in research and diagnostic contexts. Large-scale integration of model organism and clinical research data can provide a breadth of knowledge not available from individual sources and can provide contextualization of data back to these sources. The Monarch Initiative (monarchinitiative.org) is a collaborative, open science effort that aims to semantically integrate genotype-phenotype data from many species and sources in order to support precision medicine, disease modeling, and mechanistic exploration. Our integrated knowledge graph, analytic tools, and web services enable diverse users to explore relationships between phenotypes and genotypes across species.
© The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

Year:  2016        PMID: 27899636      PMCID: PMC5210586          DOI: 10.1093/nar/gkw1128

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

A fundamental axiom of biology is that phenotypic manifestations of an organism are due to interaction between genotype and environmental factors over time. In the rapidly advancing era of genomic medicine, a critical challenge is to identify the genetic etiologies of Mendelian disease, cancer, and common and complex diseases, and translate basic science to better treatments. Currently, available human data associate ∼51% of known human coding genes with phenotype data (based on OMIM (1), ClinVar (2), Orphanet (3), CTD (4) and the GWAS catalog (5)). See Table 1 for a list of database abbreviations. This coverage can be extended to ∼89% if phenotypic information from orthologous genes from five of the most well-studied model organisms is included (Figure 1). Similarly, 72% of the 3230 genes in ExAC with ‘near-complete depletion of predicted protein-truncating variants’ have no currently established human disease phenotype (6), yet 88% of these genes without a human phenotype have a phenotype in a non-human organism. However, leveraging these model data for computational use is non-trivial, primarily because the relationships between gene and disease (7) and between model system and disease phenotypes (8) are not straightforward.
Table 1. Glossary of acronyms

Acronym | Name | URL | Ref
Bgee | BgeeDb | http://bgee.org/ | (55)
BioGrid | Biological General Repository for Interaction Datasets | https://thebiogrid.org/ | (33)
CL | Cell Ontology | http://obofoundry.org/ontology/cl.html | (62)
ClinVar | ClinVar | https://www.ncbi.nlm.nih.gov/clinvar/ | (2)
CTD | Comparative Toxicogenomics Database | http://ctdbase.org/ | (4)
ECO | Evidence and Conclusions Ontology | http://obofoundry.org/ontology/eco.html | (21)
ExAC | Exome Aggregation Consortium | http://exac.broadinstitute.org/ | (6)
FlyBase | FlyBase | http://flybase.org | (63)
GeneNetwork | Gene Network | http://genenetwork.org | (54)
GENO | Genotype Ontology | https://github.com/monarch-initiative/GENO-ontology/ | (19)
GO | Gene Ontology | http://geneontology.org | (37)
GWAS | GWAS Catalog | https://www.ebi.ac.uk/gwas/ | (5)
HP | Human Phenotype Ontology | http://human-phenotype-ontology.org/ | (30)
KEGG | Kyoto Encyclopedia of Genes and Genomes | http://www.kegg.jp/ | (31)
MGI | Mouse Genome Informatics | http://www.informatics.jax.org/ | (36)
MonDO | Monarch Merged Disease Ontology | https://github.com/monarch-initiative/monarch-disease-ontology/ | (26)
MP | Mammalian Phenotype Ontology | http://obofoundry.org/ontology/mp.html | (20)
MPD | Mouse Phenome Database | http://phenome.jax.org/ | (53)
MyGene | MyGene | http://mygene.info | (32)
OMIA | Online Mendelian Inheritance in Animals | http://omia.angis.org.au/home/ | (41)
OMIM | Online Mendelian Inheritance in Man | http://omim.org | (1)
Orphanet | Portal for rare diseases and orphan drugs | http://www.orpha.net | (3)
Panther | PantherDB | http://pantherdb.org | (34)
RO | Relation Ontology | http://obofoundry.org/ontology/ro.html | (17)
SEPIO | Scientific Evidence and Provenance Information Ontology | https://github.com/monarch-initiative/SEPIO-ontology/ | (59)
SO | Sequence Ontology | http://www.sequenceontology.org/ | (27)
Uberon | Uber-anatomy ontology | http://uberon.org | (23)
UPheno | Unified Phenotype Ontology | https://github.com/obophenotype/upheno/ | (25)
WormBase | WormBase | http://wormbase.org | (64)
ZFIN | Zebrafish Information Network | http://zfin.org | (35)

Figure 1. The phenotype annotation coverage of human coding genes. Yellow bars show that 51% of those genes have at least one phenotype association reported in humans (HPO annotations of OMIM, ClinVar, Orphanet, CTD and GWAS). The blue bars show that 58% of human coding genes have orthologs with causal phenotypic associations reported in at least one non-human model (MGI, WormBase, FlyBase and ZFIN). The green bars show that 40% of human coding genes have annotations both in human and in non-human orthologs. There are phenotypic associations from humans and/or non-human orthologs that cover 89% of human coding genes.

In recent years, there has been a growth in the number of genotype–phenotype databases available, covering a diversity of domain areas for human, model organisms, and veterinary species. While providing quality inventories of the relevant species and phenotypic data types, most resources are limited to a single species or limit cross-species comparison to direct assertions (e.g. Organism X is a model of Disease Y) or to inferences based upon orthology relations (e.g. Organism Z is a model of Disease Y due to A and A′ being orthologs). While great strides have been made in text-based search engines, phenotype data remain difficult to search and use computationally due to their complexity and the use of different phenotype standards and terminologies. Such barriers have made linking and integration with the precision and richness needed for mechanistic discovery across species a significant challenge (9). A newer method to aid in identifying models of disease and discovering underlying mechanisms is to use ontologies to describe the set of phenotypes that present for a given genotype or disease, which we call a ‘phenotypic profile’.
A phenotypic profile is the subject of non-exact matching within and across species using ontology integration and semantic similarity algorithms (10,11) in software applications such as Exomiser (12) and Genomiser (13), and this approach has been shown to assist disease diagnosis (14–16). The Monarch Initiative uses an ontology-based strategy to deeply integrate genotype–phenotype data from many species and sources, thereby enabling computational interrogation of disease models and complex relationships between genotype and phenotype to be revealed. The name ‘Monarch Initiative’ was chosen because it is a community effort to create paths for diverse data to be put to use for disease discovery, not unlike the navigation routes that a monarch butterfly would take.

Data architecture

The overall data architecture for Monarch is shown in Figure 2. The bulk of the data integration is carried out using our Data Ingest Pipeline (Dipper) tool (https://github.com/monarch-initiative/dipper), which maps a variety of external data sources and databases to RDF (Resource Description Framework) graphs. RDF provides a flexible way of modeling a variety of complex datatypes, and allows entities from different databases to be connected via common instance or class URIs (Uniform Resource Identifiers). We use relationship types from the Relation Ontology (RO; https://github.com/oborel/obo-relations) (17) and other vocabularies to connect entities together, along with a number of Open Biological Ontologies (OBOs) (18) to classify these entities. For example, a mouse genotype can be related to a phenotype using the has_phenotype relation (RO:0002200), with the genotype classified using a term from the Genotype Ontology (GENO) (19), and the phenotype classified using the Mammalian Phenotype Ontology (MP) (20). We use the Open Biomedical Annotations (OBAN; https://github.com/EBISPOT/OBAN) vocabulary to associate evidence and provenance metadata with each edge, using the Evidence and Conclusions Ontology (ECO) for types of evidence (21). The graphs produced by Dipper are available as a standalone resource in RDF/Turtle format at http://data.monarchinitiative.org/ttl.
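The genotype-to-phenotype example above can be sketched as a handful of RDF-style triples. This is a minimal illustration using only the Python standard library, not Dipper's actual output: the OBAN namespace and property names, the use of RO:0002558 as a 'has evidence' relation, and the example.org genotype and association URIs are all assumptions made for the sketch. The MP and ECO terms used are simply the root terms of those ontologies, standing in for real annotations.

```python
# Minimal sketch: one genotype-phenotype edge plus an OBAN-style association
# node carrying evidence, emitted as N-Triples-like lines (stdlib only).

OBO = "http://purl.obolibrary.org/obo/"
OBAN = "http://purl.org/oban/"  # assumed OBAN namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

HAS_PHENOTYPE = OBO + "RO_0002200"  # has_phenotype, as named in the text
HAS_EVIDENCE = OBO + "RO_0002558"   # assumed 'has evidence' relation


def obo_uri(curie: str) -> str:
    """Expand an OBO-style CURIE such as 'MP:0000001' to its PURL."""
    prefix, local_id = curie.split(":", 1)
    return f"{OBO}{prefix}_{local_id}"


def association_triples(assoc_uri, subject, predicate, obj, evidence):
    """Return the direct edge plus triples that describe it as an association."""
    return [
        (subject, predicate, obj),
        (assoc_uri, RDF_TYPE, OBAN + "association"),
        (assoc_uri, OBAN + "association_has_subject", subject),
        (assoc_uri, OBAN + "association_has_predicate", predicate),
        (assoc_uri, OBAN + "association_has_object", obj),
        (assoc_uri, HAS_EVIDENCE, evidence),
    ]


def to_ntriples(triples) -> str:
    """Serialize (subject, predicate, object) URI triples, one per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


genotype = "https://example.org/genotype/EXAMPLE-1"   # hypothetical genotype URI
assoc = "https://example.org/association/EXAMPLE-1"   # hypothetical association URI
phenotype = obo_uri("MP:0000001")                     # MP root term as a stand-in
evidence = obo_uri("ECO:0000000")                     # ECO root term as a stand-in

triples = association_triples(assoc, genotype, HAS_PHENOTYPE, phenotype, evidence)
print(to_ntriples(triples))
```

Keeping the association as its own node is what lets evidence and provenance attach to the edge itself, not to the genotype or the phenotype.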
Figure 2. Monarch Data Architecture. Structured and unstructured data sources are loaded into SciGraph via Dipper. Ontologies are also loaded into SciGraph, resulting in a combined knowledge and data graph. Data is disseminated via SciGraph Services, an ontology-enhanced Solr instance called GOlr, and to the OwlSim semantic similarity software. Monarch applications and end users access the services for graph querying, application population and phenotype matching.

We also import a number of external and in-house ontologies for data description and data integration. As these ontologies are all available from the OBO Library in Web Ontology Language (OWL), no additional transformation is necessary. The combined corpus of graphs ingested using Dipper and from ontologies is referred to as the Monarch Knowledge Graph. The data integrated within Monarch encompasses a wide range of sources, and includes human clinical knowledge sources as well as genetic and genomic resources covering organismal biology. The list of data sources and ontologies integrated is shown in Figure 3, with a species distribution illustrated in Figure 4. The knowledge graph is loaded into an instance of a SciGraph database (https://github.com/SciGraph/SciGraph/), which embeds and extends a Neo4j database, allowing for complex queries, ontology-aware data processing and Named Entity Recognition. We provide two public endpoints for client software to query these services: https://scigraph-ontology.monarchinitiative.org/scigraph/docs (for ontology access) and https://scigraph-data.monarchinitiative.org/scigraph/docs (for ontology plus data access).
Figure 3. Data types, sources, and the ontologies used for their integration into the Monarch knowledge graph. Each data source uses or is mapped to a suite of different ontologies or vocabularies. These are in turn integrated into bridging ontologies for Genetics (GENO), Anatomy (Uberon/CL), Phenotypes (UPheno) and Diseases (MonDO).

Figure 4. Distribution of phenotypic annotations across species in Monarch, broken down by the top levels of the phenotype ontology. The graph can be interactively explored at https://monarchinitiative.org/phenotype/. Note that annotations are currently dominated by human, mouse, zebrafish and C. elegans (top panel); the chart is faceted, allowing individual species to be switched on and off to see contributions for less data-rich species such as veterinary animals and monkeys (middle panel). Clicking on a given phenotype text allows drilling down to its subtypes (lower panel).

These SciGraph instances provide powerful graph querying capabilities over the complete knowledge graph. Many of the common query patterns are executed in advance and stored in an Apache Solr index, making use of the Gene Ontology ‘GOlr’ indexing strategy, allowing for fast queries of ontology-indexed associations. Finally, we also load a subset of the graph into an OwlSim instance, which provides phenotype matching services as well as the ability to perform fuzzy phenotype searches based on a phenotype profile. We also provide phenotype matching services via the Global Alliance for Genomics and Health (GA4GH) Matchmaker Exchange (MME) API (22), available at https://mme.monarchinitiative.org. Many of the data sources we integrate make use of their own terminologies and ontologies. We aggregate these into a unified ontology (https://github.com/monarch-initiative/monarch-ontology/) and make use of bridging ontologies and our curated integrative ontologies to connect these together.
In particular:
- The Uber-anatomy ontology (Uberon) bridges species-specific and clinical anatomical and tissue ontologies (23).
- The unified phenotype ontology bridges model organism and human phenotype ontologies and terminologies, using techniques described in (24,25).
- The Monarch Merged Disease Ontology (MonDO) uses a Bayes ontology merging algorithm (26) to integrate multiple human disease resources into a single ontology, and additionally includes animal diseases from OMIA.
- The Genotype Ontology (GENO) (19) defines genotypic elements and bridges the Sequence Ontology (SO) (27) and FALDO (28). GENO allows the propagation of phenotypes that are annotated to genotypic elements.

Entity resolution and unification

One of the many challenges faced when integrating bioinformatics resources is the presence of the same entity in multiple databases, designated by different identifiers (29). This problem is compounded by the different ways the same identifier can be written, using different prefixes or no prefix at all. Take the Monarch page for a single gene, for example ‘fibrinogen gamma chain’, FGG (https://monarchinitiative.org/gene/NCBIGene:2266). Monarch has integrated data from a variety of human, model organism, and other biomedical sources such as OMIM (1), Orphanet (3), ClinVar (2), HPO (30), KEGG (31), CTD (4), MyGene (32) and BioGrid (33); via orthology in PantherDB (34), we also incorporate Fgg gene data from ZFIN (35) and from MGI (36). No two of these sources represent the identifier for FGG in precisely the same way. As part of our data ingest process, we normalize all identifiers using a curated set of database prefixes, each of which has a defined mapping to an http URI. These curated prefixes have been deposited in the Prefix Commons (https://github.com/prefixcommons), which similarly contains identifier prefixes used within the Gene Ontology (37) and Bio2RDF (38). To reconcile equivalent identifiers, we perform clique-merging as a post-processing step (https://github.com/SciGraph/SciGraph/wiki/Post-processors). We take all edges labeled with either the owl:sameAs or owl:equivalentClass property and calculate equivalence cliques, based on the symmetric and transitive nature of these properties. We then merge these cliques together, taking a designated ‘clique leader’ (for instance, NCBI for genes) and mapping all edges in the Monarch graph such that they point to a clique leader.
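The two steps described above, prefix normalization followed by clique-merging, can be sketched in a few lines of Python. This is an illustrative reconstruction, not the SciGraph post-processor: the prefix table, the leader priority order, and every identifier other than NCBIGene:2266 are hypothetical placeholders. Equivalence cliques are computed with a union-find structure, which is one standard way to take the symmetric, transitive closure of a set of pairwise equivalences.

```python
from collections import defaultdict

# Hypothetical curated prefix table: spelling variants -> canonical prefix.
CANONICAL_PREFIX = {
    "ncbigene": "NCBIGene",
    "ncbi_gene": "NCBIGene",
    "hgnc": "HGNC",
    "ensembl": "ENSEMBL",
}

# Assumed leader preference, best first (the text names NCBI as leader for genes).
LEADER_PRIORITY = ["NCBIGene", "HGNC", "ENSEMBL"]


def normalize(curie: str) -> str:
    """Rewrite 'prefix:id' so that the prefix uses its canonical spelling."""
    prefix, local_id = curie.split(":", 1)
    return f"{CANONICAL_PREFIX[prefix.lower()]}:{local_id}"


def find(parent: dict, node: str) -> str:
    """Union-find lookup with path halving."""
    parent.setdefault(node, node)
    while parent[node] != node:
        parent[node] = parent[parent[node]]
        node = parent[node]
    return node


def clique_leaders(equivalences) -> dict:
    """Map every identifier to the designated leader of its equivalence clique."""
    parent = {}
    for a, b in equivalences:
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_a] = root_b
    cliques = defaultdict(list)
    for node in list(parent):
        cliques[find(parent, node)].append(node)
    leader_of = {}
    for members in cliques.values():
        leader = min(members, key=lambda m: (LEADER_PRIORITY.index(m.split(":")[0]), m))
        for member in members:
            leader_of[member] = leader
    return leader_of


def merge_edges(edges, leader_of):
    """Re-point both ends of each edge at its clique leader, dropping duplicates."""
    return sorted({(leader_of.get(s, s), p, leader_of.get(o, o)) for s, p, o in edges})


# Pairwise equivalences (e.g. owl:sameAs edges); EXAMPLE ids are placeholders.
equivalences = [
    (normalize("ncbigene:2266"), normalize("hgnc:EXAMPLE1")),
    (normalize("ensembl:EXAMPLE2"), normalize("HGNC:EXAMPLE1")),
]
leader_of = clique_leaders(equivalences)

edges = [
    ("HGNC:EXAMPLE1", "RO:0002200", "HP:EXAMPLE"),
    ("ENSEMBL:EXAMPLE2", "RO:0002200", "HP:EXAMPLE"),
]
merged = merge_edges(edges, leader_of)
print(merged)
```

In this toy input, two edges asserted against different identifiers for the same gene collapse into a single edge on the clique leader, which is the effect the merging step is meant to achieve.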

In-house curation

In addition to ingest of external sources and ontologies, we perform in-house data and ontology curation. For curation of ontology-based genotype–phenotype associations (including disease-phenotypic profiles), we are transitioning to the WebPhenote platform (http://create.monarchinitiative.org), which allows a variety of disease entities to be connected to phenotypic descriptors. We also make use of text mining to create seed disease-phenotype associations using the Bio-LarK toolkit (39), which are then manually curated. Most recently, we have performed a large-scale annotation of PubMed to extract common disease-phenotype associations (40). Most of the in-house curation work involves making computable those smaller resources that hold phenotypic information as free-text descriptions, for example the Online Mendelian Inheritance in Animals (OMIA) resource, with which we have been collaborating to support this curation (41).

Quality control

External resources and datasets that are incorporated into Monarch are evaluated before incorporation into the Dipper pipeline—we primarily integrate high-quality curated resources. For all ontologies we bring in, we apply automated reasoning to detect inconsistencies between different ontologies. For each release, we perform high-level checks on each integrated resource to ensure no errors in the extraction process occurred, but we do not perform in-depth curation checks of integrated resources. Each release happens once every one to two months. In order to measure annotation richness, we have also created an annotation sufficiency meter web service (42) available at https://monarchinitiative.org/page/services; this service determines whether a given phenotype profile for any organism is sufficiently broad and deep to be of diagnostic utility. The sufficiency score can be displayed as a five star scale as in PhenoTips (43) and in the Monarch web portal (see below) to aid curation or data entry, and can also be used to suggest additional phenotypic assays to be performed—whether in a patient or in a model organism.

Monarch web portal

The Monarch portal is designed with a number of different use cases in mind, including:
- A researcher interested in a human gene, its phenotypes, and the phenotypes of orthologs in model organisms and other species.
- Patients or researchers interested in a particular disease or phenotype (or groups of these), together with information on all implicated genes.
- A clinical scenario in which a patient has an undiagnosed disease showing a spectrum of phenotypes, with no definitive candidate gene demonstrated by sequencing; in this scenario the clinician wishes to search for either known diseases that have a similar presentation, or model organism genes that demonstrate homologous phenotypes when the gene is perturbed.
- A researcher looking for diseases that have similar phenotypic features to a newly identified model organism mutant identified in a screen.
- Researchers or clinicians who need to identify potentially informative phenotyping assays for differential diagnosis or to identify candidate genes.

Features

Integrated information on entities of interest

We provide overview pages for entities such as genes, diseases, phenotypes, genotypes, variants and publications. Each page highlights the provenance of the data from the diverse clinical, model organism, and non-model organism sources. These pages can be found either via search (see below) or through an entity resolver. For example, the URL https://monarchinitiative.org/OMIM:266510 will redirect to a page about the disease ‘Peroxisome biogenesis disorder type 3B’ from the OMIM resource, showing its relationships to other content within the Monarch knowledge graph, such as phenotypes and genes associated with the disease. We make use of MonDO (the Monarch merged disease ontology (26)) to group similar diseases together. Figure 5 shows an example page for Marfan syndrome with related phenotype, gene, model and variant data.
Figure 5. Annotated Monarch webpage for Marfan and Marfan-related syndrome. This group of syndromic diseases has a number of different associations spanning multiple entity types: disease phenotypes, implicated human genes, variants, and animal models and other model systems. An abstraction of the contents and features of the tabs is shown in the lower panel. Actual contents of the tabs are best viewed in the context of the web app at https://monarchinitiative.org/DOID:14323.

Basic Search

The portal provides different means of searching over integrated content. In cases where a user is interested in a specific disease, gene, phenotype etc., these can usually be found via autocomplete. Site-wide synonym-aware text search can also be used to find pages of interest. Because the knowledgebase combines information from multiple species, entities such as genes often have ambiguous symbols. We provide species information to help disambiguate in a search.

Search by phenotype profile

One of the most innovative features of Monarch is the ability to query within and across species for diseases or organisms that share a similar, but not necessarily identical, set of phenotypes (a phenotypic profile). This feature uses a semantic similarity algorithm available from the OWLsim package (http://owlsim.org). Users can launch searches against specific targets: organisms, sets of named gene models, or all models and diseases available in the Monarch repository. The Monarch Analyze Phenotypes interface (https://monarchinitiative.org/analyze/phenotypes) allows the user to build up a ‘cart’ of phenotypes, and then perform a comparison against phenotypes related to genes and diseases. Results are ranked according to closeness of match, partitioned by species, and displayed both as a list and in the PhenoGrid widget (below).
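To make the idea of non-exact profile matching concrete, the following toy sketch ranks candidate profiles against a query profile. It is not the OWLsim algorithm: Jaccard overlap of ancestor sets and a symmetric best-match average are used here as simple stand-ins, and the small hierarchy, its term names, and the two candidate models are all invented for illustration.

```python
# Toy is-a hierarchy (child -> parents); every term name here is hypothetical.
PARENTS = {
    "phenotype": [],
    "abnormal jaw": ["phenotype"],
    "small mandible": ["abnormal jaw"],
    "micrognathia (human)": ["small mandible"],
    "small mandible (mouse)": ["small mandible"],
    "abnormal heart": ["phenotype"],
    "aortic dilation (human)": ["abnormal heart"],
    "enlarged heart (mouse)": ["abnormal heart"],
}


def ancestors(term: str) -> set:
    """Return the term together with all of its ancestors."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def jaccard(term_a: str, term_b: str) -> float:
    """Shared ancestors divided by all ancestors of either term."""
    anc_a, anc_b = ancestors(term_a), ancestors(term_b)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


def profile_similarity(query, target) -> float:
    """Symmetric best-match average between two phenotype profiles."""
    def one_way(source, dest):
        return sum(max(jaccard(s, d) for d in dest) for s in source) / len(source)
    return (one_way(query, target) + one_way(target, query)) / 2


def rank(query, candidates):
    """Return candidate names ordered from most to least similar."""
    scored = [(profile_similarity(query, profile), name)
              for name, profile in candidates.items()]
    return [name for _, name in sorted(scored, reverse=True)]


patient = ["micrognathia (human)", "aortic dilation (human)"]
candidates = {
    "mouse model A": ["small mandible (mouse)", "enlarged heart (mouse)"],
    "mouse model B": ["enlarged heart (mouse)"],
}
ranking = rank(patient, candidates)
print(ranking)
```

No term in the patient profile appears verbatim in either mouse profile, yet model A ranks first because its terms share closer common ancestors with the query. That is the sense in which matches are similar without being exact.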

PhenoGrid

Given a set of input phenotypes, as associated with a patient or a disease, Monarch phenotypic profile similarity calculations can generate results involving hundreds of diseases and models. The PhenoGrid visualization widget (Figure 6), implemented using the D3 JavaScript library (44), provides an overview of these similarity results. Phenotypes and models are frequently too numerous to fit on the initial display; thus scrolling, dragging, and filtering have been implemented. PhenoGrid is available as an open-source widget suitable for integration in third-party web sites, such as model organism databases (as done by the International Mouse Phenotyping Consortium, IMPC) or clinical comparison tools. Download and installation instructions are available on the Monarch Initiative web site.
Figure 6. Partial screenshot of PhenoGrid showing Marfan syndrome. PhenoGrid shows input phenotypes in rows, models in columns, and cell contents color-coded with greater saturation indicating greater similarity. Disease phenotypes are shown as rows, and phenotypically matching human diseases and model organism genes are shown as columns; the saturation of a cell correlates with the strength of the phenotypic match. Mouse-over tooltips highlight diseases associated with a selected phenotype (or vice versa), or details (including similarity scores) of any match between a phenotype and a model. User controls support the selection of alternative sort orders, similarity metrics, and displayed organism(s) (mouse, human, zebrafish or the 10 most similar models for each). Here, we see all diseases or genes that exhibit ‘Hypoplasia of the mandible’ with the matching mouse gene Tgfb2. Actual PhenoGrid data is best viewed in the context of the web app at https://monarchinitiative.org/Orphanet:284993#compare. Note that matches do not need to be exact: here the mouse phenotype of ‘small mandible’ (Mammalian Phenotype Ontology) has a high-scoring match to ‘micrognathia’ (Human Phenotype Ontology) based on the fact that both phenotypes are related to ‘small mandible’ (Mammalian Phenotype Ontology). Advanced PhenoGrid features (not displayed) include the ability to alter the scoring and sorting methods, as well as zoomed-out map-style navigation.

Text annotation

The Monarch annotation service allows a user to enter free text (e.g. a paper abstract or a clinical narrative) and perform an automated annotation on this text, with entities in the text marked up with terms from the Monarch knowledge graph, such as genes, diseases and phenotypes. Once the text is marked up, the user has the option of turning the recognized phenotype terms into a phenotype profile, and performing a profile search, or to link to any of the entity pages identified in the annotation. This tool is also available via services.
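The workflow described above (mark up entities, then turn the recognized phenotypes into a profile) can be illustrated with a simple dictionary lookup. This sketch assumes a plain longest-label-first string match and is not the Monarch annotation service or its API; the lexicon, its categories, and the `EX:` identifiers are hypothetical.

```python
import re

# Hypothetical lexicon: lower-cased label -> (category, identifier).
LEXICON = {
    "marfan syndrome": ("disease", "EX:disease-1"),
    "micrognathia": ("phenotype", "EX:phenotype-1"),
    "small mandible": ("phenotype", "EX:phenotype-2"),
    "fbn1": ("gene", "EX:gene-1"),
}

# Longest labels first, so multi-word labels are preferred over shorter ones.
_LABELS = sorted(LEXICON, key=len, reverse=True)
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(label) for label in _LABELS) + r")\b",
    re.IGNORECASE,
)


def annotate(text: str):
    """Return (start, end, matched text, category, identifier) for each hit."""
    hits = []
    for match in _PATTERN.finditer(text):
        category, identifier = LEXICON[match.group(1).lower()]
        hits.append((match.start(), match.end(), match.group(1), category, identifier))
    return hits


def phenotype_profile(hits):
    """Collect distinct phenotype identifiers in first-seen order."""
    profile = []
    for _, _, _, category, identifier in hits:
        if category == "phenotype" and identifier not in profile:
            profile.append(identifier)
    return profile


text = "The patient with suspected Marfan syndrome shows micrognathia; FBN1 was sequenced."
hits = annotate(text)
profile = phenotype_profile(hits)
print(hits)
print(profile)
```

The character offsets returned by `annotate` are what would allow a front end to highlight the recognized spans in place, while `phenotype_profile` yields the list that would be handed to a profile search.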

Inferring causative variants

The Exomiser (12) and more recently, Genomiser tools (45) make use of the Monarch platform and phenotype matching algorithms to rank putative causative variants using a combined variant and phenotype score. These tools have been used to diagnose patients as part of the NIH Undiagnosed Diseases Project (14) and are the first examples of using model organism phenotype data to aid rare disease diagnostics.

DISCUSSION

The Monarch Initiative provides a system to organize and harmonize the heterogeneous genotype–phenotype data found across clinical, model organism and non-model organism resources (such as veterinary species), creating a unified overview of this rich landscape of data sources. One challenge we have had to address is that each resource shares data via different mechanisms and uses a different data model. It is particularly important to note that each resource annotates phenotypic data to different aspects of the genotype: one resource might annotate to a gene, another to an allele, another to a set of alleles, a full genotype or a SNP. This not only makes data integration difficult, but also means that computation over the genotype–phenotype associations must be done with care. Similar issues at MGI have been described (46). In addition, since most anatomy, phenotype, and disease ontologies describe the biology of one species, it has traditionally been quite difficult to ‘map’ across species. Some examples are the Human Phenotype Ontology (HPO) (30) and the Mouse Anatomy Ontology (47). Monarch uses four species-neutral ontologies that unify their species-specific counterparts (as shown in Figure 3): GENO for genotypes (19), UPheno for phenotypes (25), Uberon for anatomy (23), and MonDO for diseases (26). Prior efforts to map or integrate species-specific anatomical ontologies (24,48), for example, have been utilized in the construction of these species-neutral ontologies. The end result is a translational platform that allows a unified view of human, model and non-model organism biology.

A comparison between Monarch and existing resources is warranted. InterMine is an open-source data warehouse system used for disseminating data from large, complex biological heterogeneous data sources (49). InterMine provides sophisticated web services to support denormalized query and has been used to improve query and data access to model organism databases (50) and non-model organisms (51). InterMine is a federated approach in which each individual database can adopt and populate its own object-oriented data model, but can also align on certain aspects, such as aligning genomic data models using the SO. However, genotype and phenotype modeling is not yet aligned, and InterMine does not provide disease matching or phenotypic search. We are currently working with InterMine to achieve harmonization in this area. Other resources, such as KaBOB (52) and Bio2RDF (38), semantically integrate various resources into large triplestores. Bio2RDF typically retains the source vocabulary of the integrated resources, whereas KaBOB is more similar to Monarch in that it maps to OBO ontologies (18). Other data integration approaches include the BioThings API, exemplified by the MyVariant system (32), which aggregates variant data from multiple sources. We are currently working with the BioThings API developers to integrate these different approaches within the Dipper framework. Monarch is unique in that it aims to align both genotypic and phenotypic modeling across species and sources.

Future directions

Future directions include bringing in phenotypic data from specialized sources and databases, incorporating a wider range of datatypes, and extending and improving analytic methods for making cross-species inferences. Currently the core of Monarch includes primarily qualitative phenotypes described using terms from existing phenotypic vocabularies; we are starting to bring in more quantitative data from sources such as the MPD (53) and GeneNetwork (54), in addition to expression data annotated to Uberon in BgeeDb (55). We are also extending our phenotypic search methods to incorporate phenologs, phenotypic groupings inferred on the basis of orthologous genes (56,57). Early comparisons suggest that adding phenologs to our suite of tools for genotype–phenotype inquiry across species will extend our reach in a synergistic manner (58). We therefore plan to implement this type of approach in the Monarch tool suite and website. One of the most important realizations we came to in constructing the Monarch platform was the need to better represent scientific evidence for genotype–phenotype associations. We are currently developing a Scientific Evidence and Provenance Information Ontology (SEPIO) (59) in collaboration with the Evidence and Conclusion Ontology consortium (21) and ClinGen (60) in order to classify associations as complementary, confirmatory, or contradictory. SEPIO will also integrate biological assays from the Ontology of Biomedical Investigations (61). Monarch has also been collaborating with the US National Cancer Institute's Thesaurus (NCIT) team to integrate cancer phenotypes. Finally, Monarch has been working in the context of the Global Alliance for Genomics and Health (GA4GH) to develop a formal phenotype exchange format (www.phenopackets.org) that can aid phenotypic data sharing in numerous contexts such as clinical, model organism research, biodiversity, veterinary, and evolutionary biology.
  54 in total

1.  A Whole-Genome Analysis Framework for Effective Identification of Pathogenic Regulatory Variants in Mendelian Disease.

Authors:  Damian Smedley; Max Schubach; Julius O B Jacobsen; Sebastian Köhler; Tomasz Zemojtel; Malte Spielmann; Marten Jäger; Harry Hochheiser; Nicole L Washington; Julie A McMurry; Melissa A Haendel; Christopher J Mungall; Suzanna E Lewis; Tudor Groza; Giorgio Valentini; Peter N Robinson
Journal:  Am J Hum Genet       Date:  2016-08-25       Impact factor: 11.025

2.  PhenoTips: patient phenotyping software for clinical and research use.

Authors:  Marta Girdea; Sergiu Dumitriu; Marc Fiume; Sarah Bowdin; Kym M Boycott; Sébastien Chénier; David Chitayat; Hanna Faghfoury; M Stephen Meyn; Peter N Ray; Joyce So; Dimitri J Stavropoulos; Michael Brudno
Journal:  Hum Mutat       Date:  2013-05-24       Impact factor: 4.878

3.  KaBOB: ontology-based semantic integration of biomedical databases.

Authors:  Kevin M Livingston; Michael Bada; William A Baumgartner; Lawrence E Hunter
Journal:  BMC Bioinformatics       Date:  2015-04-23       Impact factor: 3.169

4.  OMIM.org: Online Mendelian Inheritance in Man (OMIM®), an online catalog of human genes and genetic disorders.

Authors:  Joanna S Amberger; Carol A Bocchini; François Schiettecatte; Alan F Scott; Ada Hamosh
Journal:  Nucleic Acids Res       Date:  2014-11-26       Impact factor: 19.160

5.  Automatic concept recognition using the human phenotype ontology reference and test suite corpora.

Authors:  Tudor Groza; Sebastian Köhler; Sandra Doelken; Nigel Collier; Anika Oellrich; Damian Smedley; Francisco M Couto; Gareth Baynam; Andreas Zankl; Peter N Robinson
Journal:  Database (Oxford)       Date:  2015-02-27       Impact factor: 3.451

6.  Expanding the mammalian phenotype ontology to support automated exchange of high throughput mouse phenotyping data generated by large-scale mouse knockout screens.

Authors:  Cynthia L Smith; Janan T Eppig
Journal:  J Biomed Semantics       Date:  2015-03-25

Review 7.  Mouse Genome Informatics (MGI): reflecting on 25 years.

Authors:  Janan T Eppig; Joel E Richardson; James A Kadin; Martin Ringwald; Judith A Blake; Carol J Bult
Journal:  Mamm Genome       Date:  2015-08-04       Impact factor: 2.957

8.  Data, information, knowledge and principle: back to metabolism in KEGG.

Authors:  Minoru Kanehisa; Susumu Goto; Yoko Sato; Masayuki Kawashima; Miho Furumichi; Mao Tanabe
Journal:  Nucleic Acids Res       Date:  2013-11-07       Impact factor: 16.971

9.  Unification of multi-species vertebrate anatomy ontologies for comparative biology in Uberon.

Authors:  Melissa A Haendel; James P Balhoff; Frederic B Bastian; David C Blackburn; Judith A Blake; Yvonne Bradford; Aurelie Comte; Wasila M Dahdul; Thomas A Dececchi; Robert E Druzinsky; Terry F Hayamizu; Nizar Ibrahim; Suzanna E Lewis; Paula M Mabee; Anne Niknejad; Marc Robinson-Rechavi; Paul C Sereno; Christopher J Mungall
Journal:  J Biomed Semantics       Date:  2014-05-19

10.  The Cell Ontology 2016: enhanced content, modularization, and ontology interoperability.

Authors:  Alexander D Diehl; Terrence F Meehan; Yvonne M Bradford; Matthew H Brush; Wasila M Dahdul; David S Dougall; Yongqun He; David Osumi-Sutherland; Alan Ruttenberg; Sirarat Sarntivijai; Ceri E Van Slyke; Nicole A Vasilevsky; Melissa A Haendel; Judith A Blake; Christopher J Mungall
Journal:  J Biomed Semantics       Date:  2016-07-04
