Luc Thomès, Rebekka Burkholz, Daniel Bojar.
Abstract
While glycans are crucial for biological processes, existing analysis modalities make it difficult for researchers with limited computational background to incorporate these diverse carbohydrates into their workflows. Here, we present glycowork, an open-source Python package designed for glycan-related data science and machine learning by end users. Glycowork includes functions to, for instance, automatically annotate glycan motifs and analyze their distributions via heatmaps and statistical enrichment. We also provide visualization methods, routines to interact with stored databases, trained machine learning models and learned glycan representations. We envision that glycowork can extract further insights from glycan datasets and demonstrate this with workflows that analyze glycan motifs in various biological contexts. Glycowork can be freely accessed at https://github.com/BojarLab/glycowork/.
Keywords: Python; data science; glycobioinformatics; glycobiology; machine learning
Year: 2021 PMID: 34192308 PMCID: PMC8600276 DOI: 10.1093/glycob/cwab067
Source DB: PubMed Journal: Glycobiology ISSN: 0959-6658 Impact factor: 4.313
Fig. 1. Structure of the glycowork package. (A) Modular structure of glycowork. Modules are depicted as boxes containing submodules. Dependencies between modules are indicated by connecting lines. (B) Workflow of the glycan_to_nxGraph function from the glycowork.motif.graph submodule. An example glycan in IUPAC-condensed notation is converted into a graph. The resulting edge, node and position lists are shown, with “T” indicating a terminal position and “I” indicating an internal position. (C) Workflow of the annotate_glycan function from the glycowork.motif.annotate submodule. Glycan graphs and graphs for known motifs are used to identify occurring motifs via subgraph isomorphism tests.
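The two steps described in panels B and C of Fig. 1 can be illustrated with a minimal, standard-library sketch. This is not glycowork's implementation: the function names `glycan_to_graph` and `contains_motif` are hypothetical, plain lists and dictionaries stand in for a networkx graph, a residue is marked “T” here simply when no other residue is attached to it (an assumption about what “terminal” means), and the motif test is a small backtracking subgraph search over residue and linkage labels.

```python
import re


def glycan_to_graph(glycan):
    """Parse an IUPAC-condensed string into (labels, edges, positions).

    labels[i]    : monosaccharide name of node i
    edges        : {frozenset({child, parent}): linkage string}
    positions[i] : "T" if no residue is attached to node i, else "I"
    """
    # Tokens are "[", "]", or a residue with an optional "(linkage)" suffix.
    tokens = re.findall(r"\[|\]|[^\[\]()]+(?:\([^()]*\))?", glycan)
    labels, edges, has_child = [], {}, set()
    pending, stack = [], []  # residues still waiting for their parent
    for tok in tokens:
        if tok == "[":       # a branch opens: set the main chain aside
            stack.append(pending)
            pending = []
        elif tok == "]":     # a branch closes: both chains await one parent
            pending = stack.pop() + pending
        else:
            name, _, rest = tok.partition("(")
            node = len(labels)
            labels.append(name)
            for child, linkage in pending:
                edges[frozenset((child, node))] = linkage
                has_child.add(node)
            pending = [(node, rest[:-1])] if rest else []
    positions = ["I" if i in has_child else "T" for i in range(len(labels))]
    return labels, edges, positions


def contains_motif(glycan, motif):
    """Return True if the motif graph occurs as a subgraph of the glycan graph."""
    g_labels, g_edges, _ = glycan_to_graph(glycan)
    m_labels, m_edges, _ = glycan_to_graph(motif)

    def compatible(mapping, m, g):
        # Every motif edge between m and an already placed node must exist
        # in the glycan with the same linkage label.
        for prev, g_prev in enumerate(mapping):
            linkage = m_edges.get(frozenset((prev, m)))
            if linkage is not None and g_edges.get(frozenset((g_prev, g))) != linkage:
                return False
        return True

    def extend(mapping):
        m = len(mapping)  # index of the next motif node to place
        if m == len(m_labels):
            return True
        for g, label in enumerate(g_labels):
            if g in mapping or label != m_labels[m]:
                continue
            if compatible(mapping, m, g) and extend(mapping + [g]):
                return True
        return False

    return extend([])
```

For example, `contains_motif("Neu5Ac(a2-3)Gal(b1-4)[Fuc(a1-3)]GlcNAc", "Gal(b1-4)[Fuc(a1-3)]GlcNAc")` reports that the branched Lewis x trisaccharide is present, whereas a motif with a different linkage, such as `"Gal(b1-3)GlcNAc"`, is not found.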
Fig. 2. Example workflows from glycowork. (A) Investigating N-linked glycans in animals. The plot_embeddings function displays N-linked glycans, with colors corresponding to taxonomic phyla. The make_heatmap function displays glycan motif distributions for each phylum. (B) Analysis of rhamnose sequence neighborhood in bacteria. Bacterial glycans are colored based on the presence of rhamnose (Rha). Proportions of rhamnose and its variants (left) and their observed neighboring monosaccharides (right), as stacked bar graphs, are visualized via the characterize_monosaccharide function. (C) Glycan-binding specificities of influenza viruses. Measured glycan-binding of various influenza strains is represented as a heatmap. The get_pvals_motifs function displays, for each motif, a P-value and a corrected P-value. Shown are the top 10 motifs, with the full table available in Table SI. (D) Glycan classification using machine learning. The train_ml_model function constructs a model to discriminate between “animal” and “non-animal” glycans. The analyze_ml_model function displays important criteria for glycan classification. Full-scale heatmaps shown in A and C are found in Figures S1 and S2.
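Panel C of Fig. 2 reports a P-value and a corrected P-value per motif. The caption does not name the correction method, so the sketch below uses the Benjamini-Hochberg procedure purely as a common illustrative choice; it should not be read as what get_pvals_motifs does internally. The function name `benjamini_hochberg` and the input values are hypothetical.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    n = len(pvals)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P-values for four motifs (illustrative numbers only).
raw = [0.01, 0.04, 0.03, 0.005]
corrected = benjamini_hochberg(raw)
```

Each raw value is scaled by the number of tests divided by its rank, and the running minimum keeps the corrected values from decreasing as the raw values increase.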