
doGlycans-Tools for Preparing Carbohydrate Structures for Atomistic Simulations of Glycoproteins, Glycolipids, and Carbohydrate Polymers for GROMACS.

Reinis Danne1, Chetan Poojari1,2, Hector Martinez-Seara1,3, Sami Rissanen1, Fabio Lolicato1,2, Tomasz Róg1,2, Ilpo Vattulainen1,2,4.   

Abstract

Carbohydrates constitute a structurally and functionally diverse group of biological molecules and macromolecules. In cells they are involved in, e.g., energy storage, signaling, and cell-cell recognition. All of these phenomena take place at atomistic scales, so atomistic simulation would be the method of choice to explore how carbohydrates function. However, progress in the field is limited by the lack of appropriate tools for preparing carbohydrate structures and the related topology files for simulation models. Here we present tools that fill this gap. Applications where the tools discussed in this paper are particularly useful include, among others, the preparation of structures for glycolipids, nanocellulose, and glycans linked to glycoproteins. The molecular structures and simulation files generated by the tools are compatible with GROMACS.


Year:  2017        PMID: 28906114      PMCID: PMC5662928          DOI: 10.1021/acs.jcim.7b00237

Source DB:  PubMed          Journal:  J Chem Inf Model        ISSN: 1549-9596            Impact factor:   4.956


Introduction

Carbohydrates constitute a large and diverse class of chemical compounds involved in all aspects of life including energy storage and conversion, cellular signaling, and recognition. The function of carbohydrates is often based on highly specific interactions and seemingly tiny but extremely important structural details. Given this, atomistic simulations would be an exceptionally useful technique to explore such specific features. However, the progress in the field of computer simulations to address questions related to carbohydrates has been slow, and the reason is quite simple: the lack of accurate force fields. The force fields commonly used in biomolecular simulations, such as CHARMM36 or OPLS,[1,2] have been optimized for proteins, and this situation has lasted for decades. Recently, lipids and nucleic acids have also been given the attention that they deserve, resulting in high-quality force fields for these molecule types.[3−9] Meanwhile, the development of force fields for carbohydrates has not been given the same weight. For instance, carbohydrate parametrization in the OPLS force field is based on a single glucose unit only,[10] which demonstrates how substandard the situation has been. Another example is the GROMOS force field, which has an extension for hexopyranoses only.[11,12] Considerable improvement has taken place only through the recent development of GLYCAM,[13] a force field based on AMBER, which includes parameters for describing a quite large set of carbohydrate types. Thanks to the emerging progress in improving the quality of carbohydrate force fields, the next task is to develop practical tools for the design and preparation of carbohydrate structures. The first thing the tool should do is to define the molecule in terms of its chemical structure, thus constructing the molecular topology. 
This is not an issue with proteins and nucleic acids, for instance, since practically all packages for molecular dynamics (MD) simulations have tools for automatic preparation of topologies for these molecules. However, for carbohydrates found in, e.g., glycoproteins, the present state of the art is less advanced. The only tools currently available for generating the molecular topology are GLYCAM-Web[14] and the Glycan Reader part of CHARMM-GUI.[15,16] GLYCAM-Web generates a topology file for glycoprotein simulations, but the files can only be used for MD simulations in the AMBER program,[17] and converting the topology file for other MD simulation packages is not a simple feat. Meanwhile, the Glycan Reader tool (used with CHARMM) requires the input protein coordinate file to include carbohydrates for generating the molecular topology file. This is a severe limitation, since one of the main problems in glycolipid and glycoprotein modeling is the lack of three-dimensional (3D) structures of carbohydrate branches, which due to considerable thermal fluctuations are not resolved in crystal structures or are removed from the protein prior to crystallization. The same issue concerns highly flexible protein loops. However, while for protein loops there are numerous tools allowing one to put them back into the protein structure, there is no analogous software for the carbohydrate branches of glycoproteins, with one exception. The GlyProt tool enables addition of N-glycans to proteins; however, it lacks the feature of adding O-glycans, and it does not generate the molecular topology file for MD simulations.[18] There are also other challenges. In particular, while there are several tools available for constructing and predicting 3D structures of glycans, such as SWEET-II,[19] Glydict,[20] POLYS,[21] and Shape,[22] none of these tools work in conjunction with proteins or generate the topology files required for MD simulations.
In the context of glycolipids, the only tool available is the Glycolipid Modeler part of CHARMM-GUI.[15] However, this tool can use only a set of predefined glycolipids and glycan sequences. Possibilities to build glycolipids of one's own interest are therefore limited. In this article, we present new Open Babel[23] and Python-based tools for the preparation of carbohydrate structures that can be simulated with the GROMACS software package.[24] The tools generate topologies of carbohydrates in the GROMACS format and then prepare 3D structures of carbohydrates covalently linked to the given lipids and proteins. The tools allow one to construct both linear and branched carbohydrates based on a user-defined glycan sequence. They have already been tested and validated in a number of case studies (see Figure 1).
Figure 1

Snapshots of simulations of different carbohydrate systems that have been built with the present tool set, used as examples in this article. (a) Crystalline cellulose fragment (Reprinted with permission from ref (25). Copyright 2011 American Chemical Society.); (b) CD59 protein showing N- and O-glycan branches rendered as blue and red licorice, respectively;[26] (c) EGFR monomer showing N-linked glycans rendered as blue licorice;[27] and (d) structure of the GM1 glycolipid together with a snapshot of a lipid bilayer with 5 mol % of GM1 (Reprinted with permission from ref (28). Copyright 2011 Nature Publishing Group).


Software Functionalities

Here we present three illustrative examples of the main functionalities of the software package that we introduce in this work. We first discuss how the tools can be used to build elongated carbohydrate polymers. In this context, the molecule of interest is cellulose, which is highly abundant in plant cells and has many potential applications. In the second example, we discuss how to construct carbohydrate branches of glycoproteins. Given that glycosylation is very common among the proteins found, e.g., in plasma membranes, and given that this topic has not been explored much until now, there is reason to assume that it will receive considerable attention in future simulation studies. In the third and final example, we show how the tools can be employed to build carbohydrate units found in glycolipids. This topic is also important, given that glycolipids modulate quite a few cellular functions. In all three examples, we show every step needed to prepare a model for a given system. Below, for each of the topics discussed in this paper, we first briefly outline the biological relevance of studying these molecules or their molecular complexes. Then, we show concretely how the application prepreader.py (using only lowercase font in the command name) can be used to prepare carbohydrate chains for polymer simulations, and the tool doglycans.py (again using only lowercase) to prepare models for glycoproteins and glycolipids. Together, these constitute the doGlycans tool set.

Carbohydrate Polymers

Long carbohydrate polymers play an important role in the functions of animals, plants, and microorganisms. Key examples in this context are cellulose and hemicellulose, which are the main components of rigid plant cell walls. Cellulose is composed of long unbranched glucose chains having up to thousands of monomers, while hemicellulose chains, which include small numbers of other hexose units, may be moderately branched. Cellulose is the most common organic polymer on earth and an important means for storing glucose. It is therefore highly relevant to the feeding of ruminant animals, and it is also a potential source of fuel and renewable energy. Cellulose has numerous applications in the paper and textile industries, the pharmaceutical industry, and food processing,[29] and it is also a promising material in nanotechnology.[30,31] In our previous atomistic MD simulation studies of cellulose nanofibers, where we used the present tool to build the carbohydrate units, we identified the cause of cellulose twisting,[25] elucidated the role of amorphous cellulose regions in the elastic properties of cellulose nanofibers,[32] and showed how cellulase enzymes interact with differently ordered regions of cellulose nanofibers.[33] To set up related cellulose simulations, all one has to do is to prepare a sequence file with the required number of D-glucose units and then run the prepreader.py script. Detailed instructions are given in section 3 of the doGlycans Manual. The procedure is shown schematically in Figure 2.
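As a rough illustration of the sequence-preparation step, the following Python sketch writes out a linear β(1→4)-linked glucose chain of a chosen length. The function name cellulose_sequence and the condensed output string are assumptions made for illustration only. They do not reproduce the sequence-file syntax expected by prepreader.py, which is documented in section 3 of the doGlycans Manual.

```python
def cellulose_sequence(n_units: int) -> str:
    """Return a condensed description of a linear beta(1->4) glucan.

    Hypothetical notation for illustration; not the doGlycans file format.
    """
    if n_units < 1:
        raise ValueError("a chain needs at least one glucose unit")
    # Every unit except the last (reducing-end) one carries a b1-4 linkage
    # to the next unit in the chain.
    return "Glc(b1-4)" * (n_units - 1) + "Glc"


if __name__ == "__main__":
    # A four-unit fragment; real cellulose chains may reach thousands of units.
    print(cellulose_sequence(4))
```

The point of the sketch is only that an unbranched homopolymer is fully specified by one residue type, one linkage type, and a repeat count, which is why a short sequence file suffices for cellulose.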
Figure 2

Functionality of the doGlycans tool set described in a schematic manner for glycoproteins, glycolipids, and carbohydrate polymers, which are the main application targets. The part of the diagram shown in pink refers to the use of the CHARMM force field, which requires the use of CHARMM-GUI. (bottom-left inset) Cellulase Cel5A complex with cellulose nanofiber (Reprinted with permission from ref (33). Copyright 2015 Springer.).


Glycosylation of Proteins

Glycosylation is a complex cotranslational and posttranslational process, found in all species, in which glycans (oligosaccharides) are covalently attached to proteins.[34] A majority of proteins (>50%) secreted by eukaryotic cells undergo glycosylation, and in general, all cells in nature are coated with distinct glycans that are biologically important for cell function.[35] There are five different classes of glycosylation, namely (a) N-linked glycosylation, (b) O-linked glycosylation, (c) phospho-serine glycosylation, (d) C-mannosylation, and (e) glypiation (formation of glycosylphosphatidylinositol (GPI) anchors).[36] Each glycan added to a protein is involved in a wide range of functions discussed elsewhere.[37−41] Here we focus on glycosylation in the context of signaling. Membrane receptors governing many signaling processes have been studied quite extensively through atomistic simulations, yet the role of glycosylation in receptor function has received exceptionally little attention in simulations. This is largely due to the lack of structural information about the carbohydrate chains. Most of the glycoprotein structures that have been determined are based on extensively manipulated proteins, where in particular the carbohydrate chains are often truncated from the studied receptor before its 3D structure is determined (see, e.g., ref (39)), since the soft carbohydrates often hinder or even block crystallization. Given this issue, there is considerable demand for a tool that would prepare carbohydrate structures to be attached to the glycosylation sites of glycoproteins. The present tool makes this possible and thereby helps bring simulations closer to the systems explored in experiments.
In our recent studies on the epidermal growth factor receptor (EGFR), we used the tool to elucidate the role of N-linked glycans in the structural arrangement and interactions of EGFR with a lipid membrane.[27] We glycosylated EGFR with Man3GlcNAc2, a minimal N-linked glycan core which is independent of cell type and is essential for protein folding in the endoplasmic reticulum.[42,43] We found that the presence of the Man3GlcNAc2 glycan on the EGFR ectodomain significantly altered the alignment of the receptor subdomains on the membrane surface by lifting the subdomains from the membrane surface. Glycosylation-induced elevation of EGFR from the membrane surface therefore exposes the extracellular subdomains for ligand binding, which is a necessary step in subsequent EGFR dimerization and signal transduction processes. In section 4 (see the Manual), we show how doglycans.py can be used to glycosylate EGFR by adding N-linked glycans to the receptor. The description demonstrates the glycosylation in three steps (see Figure 2), where one first processes the protein data bank (PDB) file, then prepares the sequence file, and finally runs the script for glycosylation. The script can also be used to covalently link O-linked glycans. As a test case we demonstrate (see the SI) the linking of both N- and O-glycans to the human CD59 glycoprotein, which is a small GPI-anchored glycoprotein whose function is to protect host cells by interfering with membrane attack complex (MAC) structure formation.[44]
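To make the notion of a branched glycan sequence concrete, the sketch below represents the Man3GlcNAc2 core as a small tree in Python and prints it in an IUPAC-like condensed form. The Residue class, the helper functions, and the output notation are illustrative assumptions and are not the sequence-file format read by doglycans.py (see section 4 of the Manual for that). The linkages used are those of the standard N-glycan core, with the reducing-end GlcNAc as the root.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Residue:
    """One sugar unit; `linkage` describes the bond to its parent residue."""
    name: str
    linkage: str = ""  # empty for the root (reducing-end) residue
    children: List["Residue"] = field(default_factory=list)


def count_residues(node: Residue) -> int:
    """Count all sugar units in the tree rooted at `node`."""
    return 1 + sum(count_residues(child) for child in node.children)


def condensed(node: Residue) -> str:
    """Render the tree with the first child as main chain, others in brackets."""
    def with_link(child: Residue) -> str:
        return f"{condensed(child)}({child.linkage})"

    if not node.children:
        return node.name
    main, *branches = node.children
    side = "".join(f"[{with_link(b)}]" for b in branches)
    return with_link(main) + side + node.name


# Man3GlcNAc2: two GlcNAc units, one beta-mannose, two alpha-mannose branches.
core = Residue("GlcNAc", children=[
    Residue("GlcNAc", "b1-4", [
        Residue("Man", "b1-4", [
            Residue("Man", "a1-6"),
            Residue("Man", "a1-3"),
        ]),
    ]),
])

if __name__ == "__main__":
    print(count_residues(core))
    print(condensed(core))
```

A tree is used rather than a flat list because a branch point, such as the central mannose here, has more than one child residue.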

Glycolipids

Glycolipids constitute a large and important class of lipids. However, they are poorly characterized from the biophysical point of view. MD simulations are not an exception here, as the number of simulation studies of glycolipids is very limited.[45,46] The three main classes of glycolipids are glycerol-based galactolipids typical of plants, ceramide-based glycolipids typical of animals, and lipopolysaccharides (LPSs) typical of Gram-negative bacteria. Galactolipids are the main component of thylakoids, the photosynthesizing organelles, where their content is as large as 75 mol %. Effectively 50% of all lipids in plant cells are galactolipids. Ceramide-based glycolipids in animals are present in small concentrations (a few mole percent) and are located predominantly in the extracellular leaflet of cell membranes. Their functions are mostly related to cell signaling, as they act as receptors on the cell surface. For instance, ganglioside GM1, known to be a receptor for bacterial toxins such as cholera toxin, regulates the function of membrane proteins such as EGFR.[47,48] Gangliosides are known to take part in cell–cell recognition and to play an important role particularly in the development of the central nervous system.[49] Meanwhile, LPS is a complex molecule with three functional parts: lipid A, a carbohydrate core, and O antigen. Lipid A is based on a phosphorylated glucosamine disaccharide with 6–8 hydrocarbon chains attached, while the carbohydrate core is attached to lipid A and is conserved within the species. O antigen is the most variable part of the molecule and is characteristic of a given strain of bacteria, being therefore of high relevance in medical diagnostics. LPS and its fragments are recognized by the innate immune system, thus inducing a strong immune response.
To facilitate the preparation of related glycolipid structures (see Figure 2), the software package provides previously constructed topologies and structure files for the most relevant glycolipid (GM1) as well as examples of the commands and sequence files needed to prepare them. Predefined lipid bases are also provided. Instructions to this end are given in section 5 of the doGlycans Manual.

Force Field and Software Limitations

Although the doGlycans tool set is highly flexible, it has certain limitations. Perhaps most importantly, only the sugar units defined in GLYCAM can be used. For instance, in the current version of GLYCAM, common bacterial sugars present in LPS such as KDO and amino sugars are still missing. Moreover, the topologies generated by doGlycans for glycosylated proteins are currently compatible only with the OPLS and AMBER force fields, the topologies generated for glycolipids are compatible with the OPLS force field, and the topology for carbohydrate polymers is based on the GLYCAM force field. There is, however, an extension to AMBER: to generate glycolipid topology and structure files for simulations with the AMBER force field, the only input that the user has to provide to the doGlycans tool set is the ceramide topology. The carbohydrate polymer topology for simulations with OPLS can be generated based on a structure file built by doGlycans, using for example the MKTOP script.[50] Structure files generated by doGlycans for glycoproteins, glycolipids, and carbohydrate polymers can be used in combination with CHARMM-GUI to generate topologies for simulations with the CHARMM force field. This is, however, limited to the components defined in CHARMM-GUI.
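The compatibility rules stated above can be summarized as a small lookup table. The Python sketch below is only a restatement of this paragraph in code; the names DIRECT_SUPPORT, EXTRA_ROUTES, CHARMM_ROUTE, and routes are hypothetical and are not part of the doGlycans package.

```python
# Force fields for which doGlycans writes topologies directly (per the text).
DIRECT_SUPPORT = {
    "glycoprotein": ("OPLS", "AMBER"),
    "glycolipid": ("OPLS",),
    "carbohydrate polymer": ("GLYCAM",),
}

# Additional routes that need extra user input or an external tool.
EXTRA_ROUTES = {
    "glycolipid": {"AMBER": "user supplies the ceramide topology"},
    "carbohydrate polymer": {
        "OPLS": "topology built from the structure file, e.g. with MKTOP"
    },
}

# Available for all three system types, within CHARMM-GUI's own limits.
CHARMM_ROUTE = "structure file passed to CHARMM-GUI (defined components only)"


def routes(system: str) -> dict:
    """Map each usable force field to how its topology is obtained."""
    result = {ff: "generated directly by doGlycans"
              for ff in DIRECT_SUPPORT[system]}
    result.update(EXTRA_ROUTES.get(system, {}))
    result["CHARMM"] = CHARMM_ROUTE
    return result


if __name__ == "__main__":
    for system in DIRECT_SUPPORT:
        print(system, "->", sorted(routes(system)))
```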

Conclusions

In spite of the exceptional variety of carbohydrates and their functions, the progress in atomistic simulation studies of carbohydrates has been slowed down due to practical limitations. One of the key issues has been the limited availability of user-friendly tools to generate the structures and topology files needed in atomistic MD simulations of carbohydrate polymers, glycoproteins, and glycolipids. To overcome these problems, we developed tools for atomistic MD simulations of carbohydrates and carbohydrate conjugates (glycoproteins and glycolipids). These tools allow their users to easily build and simulate carbohydrates with varying complexity. The structures and simulation files generated by the tools are compatible with the GROMACS package. The functionalities of the tools include the preparation of 3D structures of carbohydrates and the generation of topologies that are consistent with the GROMACS format. The most important function of the tools is the preparation of glycoproteins, where the glycans missing in protein crystal structures are added to the protein in question. We extensively tested, validated, and applied the tools in our previous simulation studies.[27] Given this, the doglycans.py application (together with prepreader.py) will help scientists to foster their work in exploring the rich variety of carbohydrate functions through atomistic MD simulations. Given that ∼50% of all eukaryotic proteins are glycosylated, the tool may have a quite considerable impact in strengthening the progress in the field.