Jean-Philippe F Gourdine1,2,3, Matthew H Brush1,3, Nicole A Vasilevsky1,3, Kent Shefchek3,4, Sebastian Köhler3,5, Nicolas Matentzoglu3,6, Monica C Munoz-Torres3,4, Julie A McMurry3,4, Xingmin Aaron Zhang3,7, Peter N Robinson3,7, Melissa A Haendel1,3,4.
Abstract
While abnormalities related to carbohydrates (glycans) are frequent in patients with rare and undiagnosed diseases as well as in many common diseases, these glycan-related phenotypes (glycophenotypes) are not well represented in knowledge bases (KBs). If glycan-related diseases were more robustly represented and curated with glycophenotypes, these could be used for molecular phenotyping to help realize the goals of precision medicine. Diagnosis of rare diseases by computational cross-species comparison of genotype-phenotype data has been facilitated by leveraging ontological representations of clinical phenotypes, using the Human Phenotype Ontology (HPO) and model organism ontologies such as the Mammalian Phenotype Ontology (MP), in the context of the Monarch Initiative. In this article, we discuss the importance and complexity of glycobiology and review the structure of glycan-related content from existing KBs and biological ontologies. We show how semantically structuring knowledge about the annotation of glycophenotypes could enhance disease diagnosis, and propose a solution to integrate glycophenotypes and related diseases into the Unified Phenotype Ontology (uPheno), HPO, Monarch and other KBs. We encourage the community to practice good identifier hygiene for glycans in support of semantic analysis, and clinicians to add glycomics to their diagnostic analyses of rare diseases.
Year: 2019 PMID: 31735951 PMCID: PMC6859258 DOI: 10.1093/database/baz114
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. An example of a classification of relevant glycans involved in human diseases, based on Essentials of Glycobiology and the CHEBI ontology. Glycans can be free or conjugated to macromolecules (proteins, lipids). Free glycans can be monosaccharides (n = 1), oligosaccharides (2 < n < 10) or polysaccharides (n > 10), and their derivatives (e.g. acetylated, sulfated). Thirteen exemplary disease names, along with the mutated genes and MONDO IDs (in parentheses), are indicated for 13 classes of glycans. Up and down orange arrows and orange equal signs indicate, respectively, the involvement of the gene products in glycan biosynthesis, degradation or transport. Based on our disease curation, there are 176 glycan-related diseases (CDG and diseases in which glycophenotypes are detectable; see Table S1 in the online supplementary material).
Glycan roles, exemplary HPO terms and glycophenotypes associated with six genetic diseases
| Glycan roles | Specific role | Glycan-related group and pathways | Mutated gene | Disease (Mondo identifier) | Abnormal glycophenotypes | Exemplary anatomical, infectious and behavioral phenotypes |
|---|---|---|---|---|---|---|
| Structural | Physical barrier | Glycosaminoglycans (HS polymerization) | | Hereditary multiple osteochondromas (MONDO:0005508) | Decreased circulating HS level (HP:0410343) | Abnormality of the humerus (HP:0003063); Multiple exostoses (HP:0002762) |
| Structural | Protein folding | O-glycans synthesis; protein folding | | Peters Plus syndrome (MONDO:0009856) | Shortened O-fucosylated glycan on properdin (HP:0410344) | Anterior chamber synechiae (HP:0007833); Brachycephaly (HP:0000248) |
| Structural | Energy storage | Polysaccharide (glycogen degradation) | | Pompe disease (MONDO:0009290) | Increase of urinary polyhexose glycans (HP:0410345) | Cardiomegaly (HP:0001640); Cognitive impairment (HP:0100543) |
| Modulatory | Signaling | O-glycans synthesis (O-fucosylation); Notch signaling | | Spondylocostal dysostosis 3 (MONDO:0012349) | Decreased glycosyltransferase O-fucosylpeptide 3-β- | Scoliosis (HP:0002650); Slender finger (HP:0001238) |
| Recognition | Intrinsic | O-glycans synthesis (O-mannosylation); laminin-dystroglycan binding | | Muscular dystrophy-dystroglycanopathy type A1 (MONDO:0014023) | Hypoglycosylation of alpha-dystroglycan (HP:0030046) | Cataract (HP:0000518); Intellectual disability, severe (HP:0010864) |
| Recognition | Extrinsic | GBP to pathogen; Toll-like receptor signaling; creation of C4 and C2 activators | | Mannose-binding lectin (MBL) deficiency (MONDO:0013714) | Decreased mannose-binding protein level (HP:0032305) | Recurrent |
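The identifiers in this table follow the compact `PREFIX:local_id` form, so the identifier hygiene encouraged in the abstract can be partly checked by machine. The sketch below is illustrative only and is not part of the original article: the function name `parse_curie` is hypothetical, and the pattern assumes that HP and MONDO local identifiers are exactly seven digits long.

```python
import re

# Assumption: HP and MONDO local identifiers are exactly seven digits.
CURIE_PATTERN = re.compile(r"^(HP|MONDO):(\d{7})$")


def parse_curie(curie):
    """Return (prefix, local_id) for a well-formed identifier, else None."""
    match = CURIE_PATTERN.match(curie.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


# Two identifiers from the table, plus one deliberately malformed example.
for candidate in ["MONDO:0005508", "HP:0410343", "HP:12345"]:
    status = "ok" if parse_curie(candidate) else "malformed"
    print(candidate, status)
```

A check like this only catches formatting problems such as a dropped digit. It does not confirm that an identifier exists in the ontology or that it names the intended term.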
Figure 2. Examples of glycophenotypes that can be captured from various laboratory techniques. From a patient’s anatomical structures (indicated in orange boxes, e.g. blood), glycans such as free oligosaccharides (F.O.S.) and glycan-related molecules can be analyzed by standard glycomics assays (indicated in green boxes, e.g. GBP or antibody assay). Patients’ glycophenotypes, indicated in the blue boxes, can be captured from publications: decreased circulating HS level (82), increased urinary polyhexose glycans (36), increased dermatan sulfate in the CSF (83), decreased O-fucosylpeptide 3-beta-N-acetylglucosaminyltransferase activity (84) and increased Tn antigen in white blood cells (85). In our preliminary work, we have logically defined design patterns (86) that would generate hundreds of classes, but they are not yet fully integrated into the HPO.
Figure 3. Improvement of disease diagnosis with molecular glycophenotypes for fucosidosis. Panel A lists 18 phenotypes frequently associated with fucosidosis. The columns in Panels B and C illustrate simulated patient phenotype profiles composed of a random selection of 10 of these 16 phenotypes. The profiles in Panel C include glycophenotypes (bottom, in orange), whereas those in Panel B do not. Panel D shows that when these two groups of 1000 simulated profiles are compared for their diagnostic utility, the profiles that contain glycophenotypes (C) significantly outperform those that do not (B) (Fisher exact P-value = 8.5e-47). Moreover, more specific glycophenotypes are more diagnostically useful than more general ones. This underscores the importance of harmonizing glycophenotypes across data resources as well as collecting them from patients.
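The comparison in Panel D rests on a Fisher exact test between two groups of 1000 simulated profiles. The sketch below shows how such a test can be computed for a 2x2 table using only the Python standard library. The caption does not give the underlying counts or the exact scoring procedure, so the counts used here are hypothetical placeholders, and treating "diagnostic utility" as a correct/incorrect outcome per profile is an assumption.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def table_probability(x):
        # Hypergeometric probability of x in the top-left cell when all
        # row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every possible table that is no more probable than the observed one.
    return sum(
        p
        for p in map(table_probability, range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# HYPOTHETICAL counts (not from the article): profiles for which the
# correct disease was or was not recovered, per group of 1000.
correct_with, wrong_with = 900, 100        # profiles with glycophenotypes
correct_without, wrong_without = 600, 400  # profiles without glycophenotypes

p_value = fisher_exact_two_sided(
    correct_with, wrong_with, correct_without, wrong_without
)
print(f"Fisher exact P-value: {p_value:.3g}")
```

Exact integer binomial coefficients are used so that very small probabilities are not lost to rounding before the final division. In practice, a library routine such as `scipy.stats.fisher_exact` would be the more usual choice.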
Description of the KBs and ontologies. Overview of relevant knowledge bases and ontologies based on their contents and glycan-related data (glycophenotypes, glycan-related diseases, genes, GBP, etc.)
| Resources | Domains | Descriptions |
|---|---|---|
| CAZY | Glyco-genes | CAZY has curated data from publications on carbohydrate-active enzymes responsible for the synthesis and breakdown of glycoconjugates, oligosaccharides and polysaccharides. It provides a classification of these glyco-enzymes based on their activities (glycoside hydrolases, glycosyltransferases, polysaccharide lyases, carbohydrate esterases and auxiliary activities) and a browser for glycan-related genes in different species |
| CFG | Glyco-genes, GBP, glycans, diseases | The CFG has generated and collected publicly available data on GBPs (glycan array), glycan profiles in cells and tissue, and phenotypic analyses of transgenic mouse lines with knocked-out glycan-related genes (histology, immunology, hematology and metabolism/behavior) |
| GlyConnect | Glyco-genes, GBP, glycans, diseases | GlyConnect integrates information about protein glycosylation for different species based on taxonomy, protein, tissue, composition, disease, glycosylation sites, peptides and references |
| GlycoSciences | Glycans, diseases | GlycoSciences provides experimental information for glycans such as structure, composition, motifs and biophysical experiments on glycans, and curates a comprehensive repository of cluster of differentiation (CD) antigens |
| GlyTouCan | Glycans | GlyTouCan is a free glycan repository that provides unique accession numbers to any glycan independently of experimental information |
| JCGGDB | Glyco-genes, GBP, glycans, diseases | JCGGDB is an integrative database for glycan information and diseases using different resources. It has compiled information related to glycan-related genes or GlycoGene (enzymes, transporters, etc.) and glycan diseases (e.g. CDG-Ia), pathosis, links to other KBs, associated gene descriptions (e.g. PMM2) and a genetic glyco-diseases ontology that provides a hierarchical classification of the diseases |
| KEGG | Glyco-genes, GBP, glycans, diseases | KEGG is a KB that includes a module dedicated to glycobiology (KEGG-glycan), which provides glycan identifiers, glycan pathways, genes and links to other glycan databases. It allows for the search of glycan terms (abbreviations and synonyms) and gives composition, identifiers, reactions, pathways, etc. |
| Monarch | Glyco-genes, GBP, glycans, diseases | The Monarch Initiative is a platform that provides analytic tools and web services for cross-species comparison of genotype–phenotype associations, disease modeling and precision medicine using semantically integrated data |
| OMIM | Glyco-genes, GBP, diseases | OMIM is a resource containing information about human genes and genetic disorders. It provides information on genetic diseases and associated phenotypes, including disease names and synonyms, unique, phenotype-gene relationships, descriptions of diseases (diagnosis, pathogenesis), clinical and biochemical features, genetic information as well as animal models |
| PubChem | Glyco-genes, GBP, glycans, diseases | PubChem is an open KB from the NIH for chemical structures, identifiers, chemical and physical properties (biological activities, patents, health, safety, toxicity data, etc.). Data can be queried online or downloaded (JSON, XML, ASN.1 files) |
| Reactome | Glyco-genes, GBP, glycans, diseases | Reactome is an open-source and peer-reviewed pathway KB that allows search based on biological terms. Reactome has a repertoire of diseases of glycosylation (related to GAG, |
| UniLectin | GBP, glycans | UniLectin is an interactive KB that classifies and curates GBPs (or lectins) and their ligands |
| CHEBI | Glycans | CHEBI is a dictionary for small molecules developed by the European Bioinformatics Institute using sources from KEGG and developed with an ontology framework. It provides an identifier, name, annotation rating, structure, molecular formula, charge, average mass, ontology, etc. |
| GO | Glyco-genes, GBP, glycans | The GO consortium is an initiative for the computational representation of genes and their biological functions at the molecular, cellular and histological levels. It provides gene annotations, ontology, mapping and tools such as gene enrichment analysis |
| NCIt | Glyco-genes, GBP, glycans | NCIt is a thesaurus from the National Cancer Institute Enterprise Vocabulary Services. It provides concepts, terminology and therapies related to cancer and related biomedical topics |
Review of KBs and ontologies. We reviewed relevant knowledge bases and ontologies based on criteria such as the presence of human- and machine-readable formats, phenotype algorithms, numbers of glycan-related terms, etc. Some KBs are richer than others; nevertheless, none of them covers all the criteria
| Resources | Human and machine-readable formats | Queryable data store | Phenotype algorithms | Standardized terminologies and ontologies | Type of data | Glycans-related terms (glycan, sugar, carbohydrate, glycoproteins, glycolipid, glycosyltransferase and lectin) |
|---|---|---|---|---|---|---|
| CAZY | No | Yes | No | Many | Curated | 333 |
| CFG | No | Yes | No | Many | Raw & curated | >1000 |
| GlyConnect | Yes | Yes | No | Many | Curated | >1000 |
| GlycoSciences | Yes | Yes | No | Many | Curated | >1000 |
| GlyTouCan | Yes | Yes | No | Many | Curated | >1000 |
| JCGGDB | Yes | Yes | No | Many | Curated | >1000 |
| KEGG glycan | Yes | Yes | No | Many | Curated | >1000 |
| Monarch | Yes | Yes | Yes | Many | Curated | 54 |
| OMIM | Yes | Yes | No | Many | Curated | 227 |
| PubChem | Yes | Yes | No | Many | Curated | 252 |
| Reactome | Yes | Yes | Yes | Many | Curated | 352 |
| UniLectin | No | Yes | No | Many | Curated | 50 |
| CHEBI | Yes | Yes | No | Many | Curated | >1000 |
| GO | Yes | Yes | No | Many | Curated | >1000 |
| NCIt | Yes | Yes | No | Many | Curated | 364 |
Figure 4. Potential KBs and data sources for improving the representation of glycophenotypes in HPO and Monarch. Glycophenotypes related to diseases that are reported in publications and KBs could be used to enhance glycan-related knowledge in Monarch.
Figure 5. Example of omics integration with ontologies related to glycans: graph representation of the impact of a dysfunctional C1GALT1C1 on health. C1GALT1C1 encodes Cosmc, a molecular chaperone for a glycosyltransferase that initiates O-GalNAc glycan synthesis (T-synthase) (127). Dysfunctional Cosmc can lead to improper T-synthase folding and thus to abnormal O-glycans, with the abnormal glycophenotype: increase of sTn/Tn antigen (SNFG symbols, respectively, yellow square for Tn and a purple diamond/yellow square for sTn) (128). Dysfunctional Cosmc (129) can be due to mutations or epigenetic factors; for instance, hypermethylation of C1GALT1C1’s promoter can lead to an increase of sTn/Tn antigen. These two glycophenotypes are also common in many cancers (130). Mouse models have shown that C1GALT1C1 mutation can lead to abnormal O-glycans on platelets, generating bleeding disorders similar to Bernard-Soulier syndrome (MONDO:0009276) (131), inflammatory bowel disease similar to Crohn’s colitis (132) and abnormal microbiota (133). In fact, human gut microbiota (HGM) feeds on normal MUC2 glycans (134–136). Hence, the disruption of MUC2 glycosylation due to C1GALT1C1 mutation could potentially lead to microbiota and host physiology issues (137).
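The chain of effects described in this caption can be written down as a small directed graph and traversed programmatically. The sketch below is a rough illustration only: the node labels paraphrase the caption, the predicate names (`leads_to`, `has_glycophenotype`) are hypothetical rather than terms from any ontology, and the `downstream` helper is not part of Monarch or any other tool.

```python
from collections import deque

# Statements paraphrased from the caption; predicate names are hypothetical.
TRIPLES = [
    ("C1GALT1C1 dysfunction", "leads_to", "Cosmc dysfunction"),
    ("Cosmc dysfunction", "leads_to", "improper T-synthase folding"),
    ("improper T-synthase folding", "leads_to", "abnormal O-glycans"),
    ("abnormal O-glycans", "has_glycophenotype", "increased sTn/Tn antigen"),
    ("C1GALT1C1 dysfunction", "leads_to", "abnormal O-glycans on platelets"),
    ("abnormal O-glycans on platelets", "leads_to",
     "bleeding disorder similar to Bernard-Soulier syndrome (MONDO:0009276)"),
    ("C1GALT1C1 dysfunction", "leads_to", "inflammatory bowel disease"),
    ("C1GALT1C1 dysfunction", "leads_to", "abnormal microbiota"),
]


def downstream(start, triples=TRIPLES):
    """Return every node reachable from `start`, in breadth-first order."""
    neighbours = {}
    for subject, _predicate, obj in triples:
        neighbours.setdefault(subject, []).append(obj)
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


for effect in downstream("C1GALT1C1 dysfunction"):
    print(effect)
```

Storing the statements as subject, predicate, object triples keeps the example close to how ontology-backed KBs typically model such associations, although a real integration would use ontology identifiers rather than free-text labels.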