Sofya I Scherbinina, Philip V Toukach.
Abstract
Analysis and systematization of accumulated data on carbohydrate structural diversity is a subject of great interest for structural glycobiology. Despite being a challenging task, development of computational methods for efficient treatment and management of spatial (3D) structural features of carbohydrates breaks new ground in modern glycoscience. This review is dedicated to chemo- and glycoinformatics approaches to 3D structural data generation, deposition and processing for carbohydrates and their derivatives. Databases, molecular modeling and experimental data validation services, and structure visualization facilities developed over the last five years are reviewed.
Keywords: PDB glycans; carbohydrate; database; glycoinformatics; model build; molecular modeling; spatial structure; structure validation; structure visualization; web-tool
Year: 2020 PMID: 33081008 PMCID: PMC7593929 DOI: 10.3390/ijms21207702
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. Typical components of a carbohydrate 3D structure exemplified on sucrose: (a) primary structure (in Symbol Nomenclature for Glycans (SNFG)); (b) superimposed conformational states and Cremer–Pople diagram; (c) conformational space of a two-torsion glycosidic linkage (Ramachandran plot); (d) transitions of glycosidic dihedrals.
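The glycosidic dihedrals shown in panels (c) and (d) are ordinary torsion angles defined by four atoms. As a minimal sketch (not taken from the review), the signed angle can be computed from Cartesian coordinates as below. The function name `dihedral_deg` and the test coordinates are hypothetical, and the choice of the four atoms that define φ and ψ depends on the convention in use (for example, ring-oxygen-based versus hydrogen-based definitions), which this sketch leaves to the caller.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral_deg(p0, p1, p2, p3):
    """Signed torsion angle in degrees for the atom chain p0-p1-p2-p3.

    The result lies in (-180, 180]; a clockwise rotation from p0 to p3,
    viewed along the p1 -> p2 bond, is positive.
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal to the plane p0-p1-p2
    n2 = _cross(b2, b3)  # normal to the plane p1-p2-p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates: central bond along z, outer atoms 90 degrees apart.
angle = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

Applying such a function to every frame of a trajectory yields the (φ, ψ) pairs that populate a Ramachandran-type map for a glycosidic linkage.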
Figure 2. Networking between glycoinformatics projects and related services that promotes data integration in glycomics. Reproduced with permission from [29], © 2020 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
Table 1. Carbohydrate databases with 3D structure support.
| Database | Years a | Description b | Data Coverage | Carbohydrate 3D Structures | References |
|---|---|---|---|---|---|
| Carbohydrate Structure Database (CSDB) | 2005–present | structures from prokaryotes, plants, and fungi; taxonomy; diseases; bibliography | 24669 structures; 12521 organisms; 9353 publications; 2096 glycosyltransferase activities; 13378 NMR spectra (1H, 13C) | 1327 disaccharide conformational maps; 3D atomic coordinate generation | [ |
| Glycosciences.DE | 1997–present | taxonomy; bibliography | 26559 structures; 20211 publications; 3434 NMR spectra (1H, 13C) | 13599 3D structure models; 12098 PDB entries (1880 distinct glycan structures); 2585 conformational maps; 3D atomic coordinate generation | [ |
| Glyco3D | 2015–present | taxonomy; bibliography | 245 monosaccharides; 125 disaccharides; 314 bio-oligosaccharides; 140 polysaccharides; 415 GT structures; 88 mAb structures; 46 GAG structures; 1662 lectin structures; X-ray data *; NMR data *; molecular modeling data * | 3035 3D structures *; PDB entries *; disaccharide conformational maps *; 3D atomic coordinate generation | [ |
| PolySac3DB | 2012–present | polysaccharides; taxonomy; bibliography | 157 structures; 84 publications; X-ray data *; NMR data *; molecular modeling data * | 157 3D structures; PDB entries *; conformational maps * | [ |
| EK3D | 2016–present | bibliography | molecular modeling data; protein data | 72 3D structures; 3D atomic coordinate generation | [ |
| 3DSDSCAR | 2010–present | sialic acid-containing oligosaccharides; aqueous molecular dynamics simulations | 27 structures | 92 3D conformational models | [ |
| MatrixDB | 2011–present | protein–polysaccharide interactions; taxonomy; genetic data; bibliography | 58 GAG sequences; proteoglycan structures *; 1507 experiments; 1058 experimentally supported associations; 269 publications | 3D structures *; PDB entries *; 3D atomic coordinate generation (GAGs) | [ |
| EPS-DB | 2017–present | bacterial exopolysaccharides; functional properties; genetic data; taxonomy; bibliography | 105 structures | 85 3D structure models; 3D atomic coordinate generation | [ |
| GlyMDB | 2020–present | glycan microarrays | 5203 glycan microarray samples | 1965 3D structures (PDB entries); 771 3D structures with glycan ligands (PDB entries) | [ |
| CFG Glycan Structures Database | 2006–present | mammalian glycan arrays; taxonomy; biological sources; diseases; bibliography | N-glycans *; O-glycans * | 3D atomic coordinate generation | [ |
| GlycoNAVI Tcarp | 2020–present | diseases; genetic data; taxonomy; bibliography | 2723 unique analyzed glycans; 5814 glycoproteins; 712 lectins | 3D structures *; 15003 PDB entries; 3D atomic coordinate generation | [ |
| GlyCosmos | 2017–present | diseases; genetic data; taxonomy | 109854 glycans | 3D structures (PDB and UniProtKB entries) * | [ |
| SugarBind | 2010–present | adherence to pathogens; taxonomy; diseases; bibliography | 739 lectins; 204 glycan ligands; 567 pathogenic agents; 1266 bindings; 183 publications | 3D lectin structures (PDB entries) * | [ |
| GlyConnect | 2019–present | protein glycosylation; taxonomy; biological sources; diseases; bibliography | 2662 glycoproteins; 3609 glycans; 246 organisms; 5675 sites; 913 publications | 3D glycoprotein structures (PDB entries) * | [ |
| ProGlycProt | 2012–present | prokaryotes; taxonomy; bibliography; homology models * | crystal structures; 61 glycoproteins; 62 glycosyltransferases; 38 enzymes/proteins involved in protein glycosylation; 518 publications | 3D structures (PDB entries) *; 3D homology models (UniProtKB entries) * | [ |
| ProCarbDB | 2020–present | protein-carbohydrate complexes; taxonomy; bibliography; binding affinities | 5254 complexes; 867 ligand monomers; X-ray data | 5254 3D structures (PDB entries) | [ |
| Procaff | 2019–present | protein-carbohydrate complexes; taxonomy; bibliography | 3122 entries; 228 publications; 125 organisms; 354 proteins; 835 carbohydrates; thermodynamic data | 335 3D structures (PDB entries) | [ |
| GBSDB | 2020–present | protein-carbohydrate complexes | 6402 carbohydrate-containing PDB structures; 12075 binding sites | 6402 3D structures (PDB entries) | [ |
| PROCARB | 2010–present | protein-carbohydrate complexes | 604 complexes; 48 modeled glycoproteins; 100 unique carbohydrate ligands | 604 complexes 3D structures (PDB entries); 26 N-linked 3D homology models; 22 O-linked 3D homology models | [ |
| UniLectin3D | 2019–present | lectins; taxonomy; bibliography | 2207 structures (1401 interacting with glycan); 535 distinct lectin sequences; 228 distinct glycans; 896 publications; X-ray data | 3D structures (PDB entries) * | [ |
| Lectin Frontier | 2015–present | lectins; taxonomy; bibliography | 398 structures; binding affinities | 3D structures (PDB entries) * | [ |
| LectinDB | 2006–present | lectins; taxonomy (all domains, incl. viruses); bibliography | 789 organisms; 821 PDB entries | PDB entries * | [ |
| GlycoEpitope | 2006–present | epitopes; taxonomy; diseases; functions; receptors; bibliography | 178 epitopes; 624 antibodies | PDB entries (epitopes) * | [ |
| GlycoCD | 2012–present | glycan CD antigens; bibliography | 19 glycan CDs; 44 CRD-CDs | PDB entries * | [ |
| SACS | 2002–present | antibodies; automatically-updated | 3994 entries; crystal/EM structure data | PDB entries * | [ |
| SabDab | 2014–present | antibodies; automatically-updated; taxonomy; binding affinities | 4223 entries; 111 carbohydrate-containing antigen types; experimental data | 111 3D structures (PDB entries) | [ |
| CAZy | 1998–present | carbohydrate-active enzymes and carbohydrate-binding modules; taxonomy; genetic data; bibliography | CAZy structures *; CAZy activities * | 7500 c 3D structures bearing glycan-containing ligand or a glycan analog revealing enzyme-glycan interactions (PDB entries) | [ |
| dbPTM | 2006–present | protein post-translational modifications; taxonomy; diseases; genetic data; bibliography | 32 C-linked glycosylations; 3289 N-linked glycosylations; 1860 O-linked glycosylations; 6 S-linked glycosylations | 3D structures (UniProtKB entries) * | [ |
| SWISS-MODEL Repository | 2004–present | 3D protein homology models; taxonomy; regularly updated | glycoprotein structures *; 1698194 models from SWISS-MODEL for UniProtKB; 158670 structures from PDB with mapping to UniProtKB | 3D structures (PDB and UniProtKB entries) * | [ |
| GlycoMaps DB | 2004–present | di- to pentasaccharides | | conformational maps for 2585 glycosidic linkages | [ |
| GFDB | 2013–present | glycosidic torsion angles; clustering analysis | 1754 c unique glycan sequences in PDB; 9055 c unique fragments with chemical modifications; 127202 c fragment structures | PDB entries *; 3D atomic coordinate generation | [ |
| GLYCAM-Web | 2013–present | mammalian glycans | pre-built libraries of predicted 3D structures of common bioglycans | 3D structure models *; 3D atomic coordinate generation | ( |
a Where unknown, the year of the first publication is given.
b Database is marked as curated if manual verification of data was reported in the original publication or at the database web site.
c Published coverage data can be outdated; database interface provides no statistics on current coverage.
* Database provides no search facilities for indicated carbohydrate 3D structural data.
Figure 3. NMR-validated conformational analysis of high-mannose oligosaccharide GM9 based on replica-exchange molecular dynamics (REMD) simulation results. (a) Superimposition of 260 GM9 conformers extracted from REMD trajectory (black: GlcNAc, green: Man, blue: Glc). (b) Primary structure of the GM9 oligosaccharide (SNFG representation). (c) REMD density maps for φ-ψ torsions of GM9 branch (Glc1Man3). Red dots locate glycosidic torsion angles derived from crystallographic data of Glc1Man3 tetrasaccharide ligand complexed with the lectin domain of calreticulin (PDB ID: 3O0W). Panels (a) and (c) were reproduced with permission from [149], © 2020 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
Figure 4. Citations of dedicated force fields in carbohydrate studies over the last five years, according to Scopus. Outer circle shows total citations (number of citing publications) of force fields in 2015–2020. Inner circle shows citations in articles filtered by a carbohydrate topic. See detailed data, references to original publications, absolute values, and carbohydrate filter details in Supplementary Table S2.
Figure 5. Digest of the most commonly used carbohydrate force fields with parameterization protocol comparison. Reproduced with permission from [138], © 2020 Elsevier Inc.
Table 2. Informatics tools for carbohydrate and glycoprotein modeling, 3D structure prediction and analysis.
| Tool | Description | Type a | Reference |
|---|---|---|---|
| CHARMM-GUI Glycan Modeler | | Web-service | [ |
| CHARMM-GUI Glycolipid/LPS Modeler | Glycolipid and lipoglycan structure modeling | Web-service | [ |
| Glycosylator | Rapid modeling of glycans and glycoproteins (including glycosylation) based on CHARMM force field | Python framework | [ |
| RosettaCarbohydrate | Modeling a wide variety of saccharide and glycoconjugate structures (including loop modeling, glyco-ligand docking and glycosylation) | Python framework | [ |
| Azahar | Monte Carlo conformational search and trajectory analysis of glycans | Python framework; PyMol plugin | [ |
| Shape | Carbohydrate-dedicated fully automated MM3-based conformation simulation | Standalone software | [ |
| Glydict | MM3-based N-glycan structure prediction based on MD simulations | Web-service | [ |
| GLYGAL | MM3-based conformational analysis of oligosaccharides | Standalone software | [ |
| Fast Sugar Structure Prediction Software (FSPS) | Automatic structure prediction tool for oligo- and polysaccharides in solution | Standalone software | [ |
| GLYCAM-Web Glycoprotein Builder | Attaching a glycan (user input) to a protein (PDB file) | Web-service | ( |
| GlyProt | | Web-service | [ |
| Phenix CarboLoad | Loading a carbohydrate structure into protein model and PDB file generation | Python framework | [ |
| GLYCAM-Web GlySpec (Grafting) | Prediction of glycan specificity by integrating glycan array screening data and 3D structure | Web-service | [ |
| CHARMM-GUI Membrane Builder | Building complex glycolipid-/LPS-/LOS-containing biological membrane systems | Web-service | [ |
| GNOMM (gram-negative outer membrane modeler) | Automated building of lipopolysaccharide-rich bacterial outer membranes (3D model preparation for MD simulations in GROMACS) | Standalone software | [ |
| Micelle Maker | Micelle building based on broad range of starting lipids and glycolipids (3D model preparation using AMBER software package and GLYCAM library) | Web-service | [ |
| Cheminformatics Tool for Probabilistic Identification of Carbohydrates (CTPIC) | Identification of small saccharides and their derivatives (input in SDF or MOL format) | Web-service | [ |
| Sails | Automated identification of linked sugars | Python framework | ( |
| GlyFinder | Locating relevant carbohydrate-containing structures in Protein Data Bank | Part of web-service pipeline | [ |
| pdb2linucs | Extraction of carbohydrate data from a PDB file | Web-tool | [ |
| GLYCAM-Web PDB-preprocessor | Processing of PDB files with (glyco-)proteins for AMBER-style output | Web-service | ( |
| Sugar identification program | Identifying the residue names of carbohydrates in a PDB file | Standalone software | ( |
| Glycan Reader | Automated sugar identification and simulation preparation for carbohydrates and glycoproteins in PDB files | Web-service | [ |
| doGlycans | Preparing carbohydrate structures (including polysaccharides, glycolipids and glycoproteins) for GROMACS atomistic simulations | Python framework | [ |
| GLYCAM-Web Carbohydrate builder | 3D structure prediction of carbohydrates and related macromolecules using GLYCAM06 force field and MD in AMBER (successor of GLYCAM Biomolecule Builder ( | Web-service | [ |
| SWEET-II | Rapid 3D model construction of oligo- and polysaccharides with MM3 optimization | Web-service | [ |
| REStLESS API | 3D structure generation of carbohydrates and derivatives from CSDB Linear notation with MMFF94 optimization (including aglycone moiety) | Web-service | [ |
| POLYS | 3D structure generation of poly- and complex oligosaccharides from MM2-precalculated glycosidic linkage torsions and energy minimization | Web-service | [ |
| CarbBuilder | Building of 3D structures of polysaccharides in CHARMM force field from pre-calculated glycosidic linkage torsions | Standalone software | [ |
| GAG-builder | Translation of GAG sequences into 3D models based on POLYS glycan builder | Web-service | [ |
| GLYCAM-Web GAG Builder | Modeling of GAG 3D structure in GLYCAM06 force field using AMBER MD package | Web-service | [ |
| BALLDock/SLICK | Protein-carbohydrate complex docking software | Standalone software, a module in docking software | [ |
| HADDOCK | Modeling of biomolecular complexes with support of glycosylated proteins | Web-service | [ |
| Vina-Carb | CHI-energy functions implemented in AutoDock Vina software | Standalone software | [ |
| GLYCAM-Web Antibody docking | Docking of an antibody (from a PDB file) to a glycan antigen (from a library or user input) | Web-service | ( |
| Cluspro | Sulfated GAG docking (as one of options) | Web-service | [ |
| GAGDock (DarwinDock) | Modification of DarwinDock method for sulfated glycosaminoglycans | Algorithm | [ |
| GlycoTorch Vina | Docking of sulfated glycosaminoglycans based on Vina-Carb | Standalone software | [ |
| Conformational Analysis Tool (CAT) | Analysis of carbohydrate molecular trajectory data derived from MD simulations | Standalone software | [ |
| Best-fit, Four-Membered Plane (BFMP) | Analysis of conformational data from crystal structures and MD simulations of carbohydrates | Standalone software | [ |
| Distance Mapping | Estimation of nuclear Overhauser effects in disaccharides | Web-tool | ( |
| MD2NOE | Calculation of Nuclear Overhauser effect build-up curves from long MD trajectories | Standalone software | [ |
| GS-align | Glycan structure alignment and similarity calculation | Standalone software | [ |
| GlyTorsion | Analysis of torsion angles in carbohydrates from Protein Data Bank | Web-tool | [ |
| GlyVicinity | Analysis of amino acids in the vicinity of carbohydrate residues derived from Protein Data Bank | Web-tool | [ |
a Web-service implies an automated pipeline for running specific software (e.g., molecular modeling, structure building, carbohydrate coordinate extraction, format conversion). It produces 3D structural data output starting from primary structure input or an uploaded atomic coordinate file. Web-tool is employed for 3D structural data processing and analysis without 3D structural data output; it is a simpler application designed primarily for statistics and visualization. Other types are self-explanatory.
Figure 6. Interplay of the instrumental and computational methods in the 3D structure determination of carbohydrates, proteins, and protein–glycoconjugate complexes. Reproduced from [285] © 2020 The authors. Published by Wiley-VCH Verlag GmbH & Co. KGaA.
Table 3. Tools for structural validation of carbohydrates.
| Tool | Description | Type a | Reference |
|---|---|---|---|
| CNS | Macromolecular structure determination and refinement (including carbohydrates and glycoproteins) based on X-ray and NMR data | Standalone software | [ |
| pdb-care | Identification and assignment of carbohydrate structures using atom types and coordinates from PDB files | Web-tool | [ |
| CARP | Glycoprotein 3D quality evaluation based on the analysis of glycosidic torsion angles from PDB | Web-tool | [ |
| GlyProbity | Accuracy and internal consistency check of carbohydrate 3D structures | Part of web-service pipeline | [ |
| PDB2Glycan | 3D structure analysis and validation of glycoprotein PDB entries | Part of web-service pipeline | [ |
| PDB-REDO | Glycoprotein structure model improvement and validation | Web-service; standalone software | [ |
| Coot | Refinement and validation of glycoprotein 3D structure from cryoEM and X-ray crystallography data | Standalone software | [ |
| Rosetta Carbohydrate | Refinement of glycoprotein 3D structure from cryoEM and X-ray crystallography data, based on correction of conformational and configurational errors in carbohydrates | Python framework | [ |
| Privateer | Automated validation of carbohydrate conformation data based on 3D structure analysis | Standalone software | [ |
| Phenix | Determination, refinement and validation of macromolecular structure (including carbohydrates and glycoproteins) from cryoEM, X-ray diffraction and neutron diffraction crystallography data | Standalone software | [ |
| Motive Validator | Automatic custom residue validation in biomolecules, including carbohydrates | Web-service | [ |
| ValidatorDB | Pre-computed validation results of ligands and non-standard residues in PDB (including carbohydrates) | Web-service | [ |
a See footnote a to Table 2.
Figure 7. X-ray diffraction data refinement of N-glycan moiety from PDB ID 2Z62. 2mFo–DFc electron density map contoured at 1σ is displayed in grey; positive and negative mFo–DFc difference electron density maps contoured at 3σ are displayed in green and red, respectively. (a) Original glycan structure model from the PDB entry. (b) PDB-REDO model with properly renamed fucose residue and improved fit to the electron density. (c) Manually rebuilt model based on PDB-REDO results. (d) CARP distribution plot for glycosidic φ-ψ torsions of FUC(1-6)NAG (from panel (a)) in PDB. Characteristic points: R, model refined with PDB-REDO; P, original PDB model; M, manually rebuilt model. Reproduced from [295], © 2020 The authors. Published by John Wiley & Sons, Inc.
Figure 8. M. catarrhalis lgt2Δ structure validation based on NOE data analysis. (a) Characteristic proton-proton contacts; (b) NOE-filtered (blue boxes) sampling of proton-proton distances from MD simulation (grey shades). Reproduced from [314], © 2020 The authors. Licensee MDPI, Basel, Switzerland.
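When proton-proton distances sampled from an MD trajectory are compared with NOE data, a commonly used approximation is to average the distances as ⟨r⁻⁶⟩^(−1/6), because NOE intensity falls off steeply with distance. The following is a minimal sketch of that averaging only. It is not necessarily the procedure used in [314], and the function name `noe_effective_distance` and the sample distances are hypothetical.

```python
def noe_effective_distance(distances):
    """Return the r^-6-averaged distance, <r^-6>^(-1/6).

    `distances` is a non-empty sequence of positive proton-proton
    distances (for example in angstroms), one per trajectory frame.
    """
    if not distances:
        raise ValueError("at least one distance is required")
    mean_inv6 = sum(r ** -6 for r in distances) / len(distances)
    return mean_inv6 ** (-1.0 / 6.0)


# Hypothetical two-frame sample: one short and one long contact.
r_eff = noe_effective_distance([2.0, 4.0])
```

Because short distances dominate the average, the effective distance is always at most the arithmetic mean, so a contact that is close in only part of the trajectory can still be consistent with an observed NOE.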
Figure 9. Distribution of D- (shown in blue) and L-pyranoside (shown in yellow) ring conformations as a function of resolution for all sugar moieties in N-glycosylated proteins in PDB (as of April 2019) solved with (a) X-ray crystallography and (b) electron cryo-microscopy. Non-chair conformations are bordered by dotted line boxes for 0.0-6.0 Å (green) and 6.0-10.0 Å (red) resolution ranges; the percentage of structures is given in the boxes. Reproduced with permission from [301], © 2020 Elsevier Ltd.
Figure 10. Deposition statistics of carbohydrate-containing structures in Protein Data Bank based on carbohydrate remediated list data. Data for 2020 cover seven of twelve months. See detailed data in Supplementary Tables S3–S4.
Figure 11. Glycan structure colored according to SNFG, or superimposed with 3D SNFG, as implemented in SweetUnityMol (a), GLYCOSCIENCES.de (via JSmol) (b), and CSDB (via JSmol) (c,d), see text. Panel (a) was reproduced with permission from [372], © Springer Japan 2017.
Figure 12. Glycan structure colored according to SNFG, or superimposed with 3D SNFG, as implemented in 3D-SNFG (a), LiteMol (b), Mol* (c); monosaccharide presentation in Glycoblocks (d). Panel (a) was reproduced with permission from [366], © 2020, Oxford University Press. Panel (d) was reproduced from [369], © 2020 The authors. Published by John Wiley & Sons, Inc.