Qian Zhu1, Dac-Trung Nguyen2, Ivan Grishagin2, Noel Southall2, Eric Sid3, Anne Pariser3.
Abstract
BACKGROUND: The Genetic and Rare Diseases (GARD) Information Center was established by the National Institutes of Health (NIH) to provide freely accessible consumer health information on over 6500 genetic and rare diseases. As the cumulative scientific understanding and underlying evidence for these diseases have expanded over time, existing practices for generating knowledge from these publications and resources have not been able to keep pace. By determining the applicability of computational approaches to enhance or replace manual curation tasks, we aim both to improve the sustainability and relevance of consumer health information and to develop a foundational database from which translational science researchers may start to unravel disease characteristics that are vital to the research process.
Keywords: Data integration; GARD; Knowledge graph; Ontology; Rare diseases
Year: 2020 PMID: 33183351 PMCID: PMC7663894 DOI: 10.1186/s13326-020-00232-y
Source DB: PubMed Journal: J Biomed Semantics
Examples of FDA orphan drug designations
| # | Generic Name | Orphan Designation | Designation Date | Designation Status |
|---|---|---|---|---|
| 1 | ((1r, 4r)-N1-(2-benzyl-7-(2-methyl-2H-tetrazol-5-yl)-9H-pyrimido[4,5-b]indol-4-yl)cyclohexane-1,4-diamine dihydrobromide dihydrate)-Expanded cord blood | Prevention of Graft-versus-Host-Disease | 12/13/2018 | Designated |
| 2 | ((4-(3-benzyl-4-hydroxybenzyl)-3,5-dimethylphenoxy)methyl)phosphonic acid | Treatment of X-linked adrenoleukodystrophy | 12/05/2016 | Designated |
| 3 | (+)-(2S)-2-(4-chloro-2-methoxyphenyl)-2-{[3-methoxy-5-(methylsulfonyl)phenyl]amino}-1-[5-(trifluoromethoxy)-1H-indol-3-yl]ethanone | Treatment of dengue virus infection | 12/26/2017 | Designated |
Primary classes and the corresponding data sources
| Classes | Primary data resources | Abbreviations used in this study |
|---|---|---|
| Condition and Designated | • Inxight Drugs • FDA Orphan Drug Designations | • S_RANCHO-DISEASE-DRUG_2018-12-18_13–30 • S_FDAORPHANGARD_20190216 |
| Rare Diseases and 32 different rare disease categories from GARD | • GARD • MONDO • Orphanet • OMIM | • S_GARD • S_MONDO • S_ORDO • S_OMIM |
| Human Phenotype | • HPO • Phenotype And Trait Ontology • Mammalian Phenotype Ontology | • S_HPO • S_PATO • S_MP |
| Drug | • Inxight Drugs • FDA Orphan Drug Designations • VA National Drug File (VANDF)^a • MeSH^a | • S_RANCHO-DISEASE-DRUG_2018-12-18_13–30 • S_FDAORPHANGARD_20190216 • S_VANDF • S_MESH |
| Chemical | • ChEBI • Thesaurus^a | • S_CHEBI • S_THESAURUS |
| Gene | • GHR • Ontology of Genes and Genomes • MedGen^a • Thesaurus^a | • S_GHR • S_OGG • S_MEDGEN • S_THESAURUS |
| Protein | • Protein Ontology | • S_CL • S_CLO • S_MP • S_HP |
| Cell | • Cell Ontology • Cell Line Ontology • Thesaurus^a • MedGen^a | • S_CL • S_CLO • S_THESAURUS • S_MEDGEN |
| Tissue | • Thesaurus^a | • S_THESAURUS |
| DATA | • All data properties are stored in this class | • DATA |
^a Semantic types have been adopted to represent different classes from VANDF, MeSH, MedGen, and Thesaurus, such as T109 representing “Organic Chemical”, T121 representing “Pharmacologic Substance”, T025 for “Cell”, T028 for “Gene or Genome”, etc.
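
As an illustration of the footnote above, the following minimal sketch shows how semantic type codes could be used to route a concept from a mixed resource into one of the graph classes. The four codes and their labels are those quoted in the footnote; the assignment of each code to a graph class (`SEMANTIC_TYPE_TO_CLASS`) and the function name `classify_concept` are assumptions made for this example, not the study's actual implementation.

```python
# Semantic type codes and labels quoted in the table footnote.
SEMANTIC_TYPE_LABELS = {
    "T109": "Organic Chemical",
    "T121": "Pharmacologic Substance",
    "T025": "Cell",
    "T028": "Gene or Genome",
}

# Hypothetical routing of semantic types to graph classes (an assumption).
SEMANTIC_TYPE_TO_CLASS = {
    "T109": "Chemical",
    "T121": "Drug",
    "T025": "Cell",
    "T028": "Gene",
}


def classify_concept(semantic_types):
    """Return the set of graph classes implied by a concept's semantic types.

    Codes that are not in the routing table are ignored.
    """
    return {
        SEMANTIC_TYPE_TO_CLASS[code]
        for code in semantic_types
        if code in SEMANTIC_TYPE_TO_CLASS
    }


# A concept tagged with both T109 and T121 would land in two classes.
classes = classify_concept(["T109", "T121", "T999"])
```

A concept carrying several semantic types can belong to more than one class, which is why the sketch returns a set rather than a single label.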
Object properties
| Object Property | Relationships |
|---|---|
| has_phenotype | Disease and Phenotype |
| subClassOf | Parent and Child concepts |
| equivalentClass | Equivalence (in terms of their class extension) of two named classes. |
| exactMatch | Two concepts that, with a high degree of confidence, can be used interchangeably. |
| R_rel | Relationships derived from other resources, such as “has_inheritance_type” from the HPO |
| N_Name | Mappings based on concepts names and/or their synonyms. |
| I_Code | Mappings based on identifiers, such as UMLSCUI, MONDO ID, HPO ID. |
| I_GENE | Mappings based on Gene symbols |
| PAYLOAD | Concept and DATA node |
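
To make the `N_Name` and `I_Code` properties in the table above concrete, here is a minimal sketch of how two concept records could be linked by shared names or shared identifiers. The record layout (`name`, `synonyms`, `xrefs`), the `normalize` rule, and the `EXAMPLE:` identifiers are all hypothetical placeholders chosen for illustration; the source does not specify the matching procedure.

```python
def normalize(name):
    """Lowercase a name and collapse punctuation and whitespace to single spaces."""
    cleaned = "".join(c if c.isalnum() else " " for c in name.lower())
    return " ".join(cleaned.split())


def find_mappings(a, b):
    """Return (object_property, evidence) pairs linking two concept dicts."""
    edges = []

    # N_Name: any overlap between normalized names and synonyms.
    names_a = {normalize(n) for n in [a["name"], *a.get("synonyms", [])]}
    names_b = {normalize(n) for n in [b["name"], *b.get("synonyms", [])]}
    shared_names = names_a & names_b
    if shared_names:
        edges.append(("N_Name", sorted(shared_names)))

    # I_Code: any overlap between cross-reference identifiers.
    shared_ids = set(a.get("xrefs", [])) & set(b.get("xrefs", []))
    if shared_ids:
        edges.append(("I_Code", sorted(shared_ids)))

    return edges


# Hypothetical records; the identifiers are placeholders, not real codes.
gard = {"name": "Wilson disease",
        "synonyms": ["Hepatolenticular degeneration"],
        "xrefs": ["EXAMPLE:0001"]}
other = {"name": "WILSON DISEASE",
         "synonyms": [],
         "xrefs": ["EXAMPLE:0001", "EXAMPLE:0002"]}

edges = find_mappings(gard, other)
```

Keeping the evidence (the shared names or identifiers) alongside each edge makes it easier to review why two concepts were linked during manual curation.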
Data properties
| Data property | Corresponding class | Explanation |
|---|---|---|
| ConditionDoId, ConditionDoValue, ConditionMeshId, ContitionName, ConditionFdaUse, ConditionComment | Condition | • ConditionDoId: Mapped Disease Ontology ID; • ConditionMeshId: Mapped MeSH ID |
| gard_id, Categories, is_rare, name, synonyms, xrefs, Sign and symptom, Treatment, Diagnosis, etc. | GARD Rare Diseases | • is_rare: An indicator of “RARE” disease; • xrefs: mappings to other resources, including MONDO, Orphanet |
| CompoundName, CompoundSmiles, CAS, UNII, OfflabelUseComment | Drug | |
| ID, Label, URI, IAO_0000115 | Gene | • IAO_0000115: Definition of the concept |
| ID, IAO_0000115, Label, Synonym, uri, Gene Symbol | Protein | • IAO_0000115: Definition of the concept • Id: Protein Ontology identifier |
| IAO_0000115, hasDbXref, hasRelatedSynonym, label, uri | Tissue | • IAO_0000115: Definition of the concept • hasDbXref: external references |
| hasDbXref, IAO_0000115, id, label, uri | Human Phenotype | • Id: HPO identifier |
| Annotation properties and object properties are adopted from NCI Thesaurus | Chemical | |
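
The tables above describe concepts whose data properties are held in a separate DATA node linked by the `PAYLOAD` object property. The following is a minimal in-memory sketch of that arrangement for a GARD disease. The class names `DataNode` and `ConceptNode`, the helper `make_gard_disease`, and the placeholder identifier are assumptions for illustration only; the property names (`gard_id`, `name`, `is_rare`, `synonyms`, `xrefs`) are taken from the data properties table.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """DATA node: stores the data properties of one concept."""
    properties: dict = field(default_factory=dict)


@dataclass
class ConceptNode:
    """Concept of a given class, linked to its DATA node (the PAYLOAD edge)."""
    node_class: str
    payload: DataNode


def make_gard_disease(gard_id, name, is_rare, synonyms=(), xrefs=()):
    """Build a GARD disease concept with its data properties in a DATA node."""
    data = DataNode({
        "gard_id": gard_id,
        "name": name,
        "is_rare": is_rare,
        "synonyms": list(synonyms),
        "xrefs": list(xrefs),
    })
    return ConceptNode("GARD Rare Diseases", data)


# Placeholder identifier, not a real GARD ID.
disease = make_gard_disease(
    "GARD:EXAMPLE",
    "Wilson disease",
    is_rare=True,
    synonyms=["Hepatolenticular degeneration"],
)
```

Separating the concept from its payload keeps the mapping edges between concepts lightweight, while the full property set remains reachable through a single link.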
Statistical results of GARD data
| Sections of GARD profile | Number of GARD diseases |
|---|---|
| Summary | 3077 |
| Symptoms | 868 |
| Cause | 862 |
| Inheritance | 729 |
| Diagnosis | 615 |
| Treatment | 1058 |
| Prognosis | 602 |
Mapping results for FDA orphan drug designations to GARD
| Mapping methods | #Mappings | #FDA orphan designations | #GARD diseases |
|---|---|---|---|
| Automation | 1449 | 1162 | 482 |
| Manual process: Done | 1041 | 859 | 491 |
| Manual process: Approximate | 492 | 339 | 220 |
| Manual process: Failed | 618 | 618 | NA |
Mapping results for FDA designated Drugs to UNII
| Drug mappings | |
|---|---|
| # Unique designated drugs mapped to UNII | 3322 |
| # Unique designated drugs unable to map to UNII | 525 |
Statistical results for curated drug-disease-associations from Inxight Drugs
| Statistic | Count |
|---|---|
| Total number of nodes | 12,138 |
| • Number of drugs | 8218 |
| • Number of conditions | 3920 |
| Total number of relationships | 180,363 |
Statistical results of some primary resources from the knowledge graph
| Datasets | Number of Nodes |
|---|---|
| BRENDA Tissue & Enzyme Source Ontology | 6352 |
| Human Phenotype Ontology (HPO) | 40,260 |
| Genetics Home Reference (GHR) | 1307 |
| National Organization for Rare Disorders (NORD) | 1281 |
| Medical Subject Headings (MeSH) | 279,463 |
| Monarch Disease Ontology (MONDO) | 118,962 |
| Online Mendelian Inheritance in Man (OMIM) | 109,624 |
| Orphanet | 43,610 |
| Ontology of Genes & Genomes (OGG) | 69,973 |
| Chemical Entities of Biological Interest (ChEBI) | 134,358 |
| VA National Drug File (VANDF) | 28,278 |
| Phenotype And Trait Ontology (PATO) | 3504 |
| Inxight Drugs | 19,817 |
| FDA Orphan Drug Designations | 6074 |
| GARD | 6323 |
Disease mappings across multiple disease resources
| | GARD | Orphanet | MONDO | OMIM | NORD | MeSH | NCI-t^a | DO^b | GHR | MedGen |
|---|---|---|---|---|---|---|---|---|---|---|
| GARD | | 4198 | 5698 | 3481 | 1166 | 3914 | 1442 | 2920 | 860 | 4733 |
| Orphanet | 4260 | | 8670 | 4475 | 672 | 3618 | 1476 | 3619 | 628 | 6726 |
| MONDO | 6613 | 11,612 | | 8954 | 1683 | 8697 | 3549 | 11,678 | 1262 | 14,652 |
| OMIM | 4824 | 7445 | 11,655 | | 1089 | 8530 | 2771 | 6575 | 869 | 13,599 |
| NORD | 1065 | 649 | 1190 | 725 | | 959 | 647 | 826 | 382 | 1070 |
| MeSH | 3836 | 3328 | 7935 | 5872 | 1027 | | 2147 | 3526 | 860 | 8935 |
| NCI-t^a | 1506 | 1471 | 3483 | 1722 | 728 | 2523 | | 2588 | 509 | 4642 |
| DO^b | 2925 | 4403 | 9987 | 4358 | 915 | 3771 | 2401 | | 737 | 6782 |
| GHR | 852 | 628 | 1138 | 566 | 383 | 899 | 499 | 736 | | 1069 |
| MedGen | 6600 | 10,717 | 18,004 | 11,067 | 1678 | 12,239 | 5041 | 10,503 | 1251 | |

^a NCI-t: NCI Thesaurus; ^b DO: Disease Ontology
Fig. 1 Disease profile for “WILSON DISEASE” (large yellow nodes denote GARD diseases; blue nodes denote phenotypes; purple nodes denote drugs; red nodes denote genes; green nodes denote mappings to other resources)
Fig. 2 Mapping examples as guidance for data harmonization (yellow nodes denote GARD diseases; green nodes denote concepts from other resources, such as OMIM)
Fig. 3 Demonstration of potential disease pathogenesis discovery for rare diseases (large yellow nodes denote GARD diseases; small yellow nodes denote conditions; purple nodes denote drugs; red nodes denote chemicals)