Literature DB >> 32356985

Identification and Modeling of a GT-A Fold in the α-Dystroglycan Glycosylating Enzyme LARGE1.

Benedetta Righino¹, Manuela Bozzi^1,2, Davide Pirolli², Francesca Sciandra², Maria Giulia Bigotti^3,4, Andrea Brancaccio^2,4, Maria Cristina De Rosa².

Abstract

The acetylglucosaminyltransferase-like protein LARGE1 is an enzyme that is responsible for the final steps of the post-translational modifications of dystroglycan (DG), a membrane receptor that links the cytoskeleton with the extracellular matrix in the skeletal muscle and in a variety of other tissues. LARGE1 acts by adding the repeating disaccharide unit [-3Xyl-α1,3GlcAβ1-] to the extracellular portion of the DG complex (α-DG); defects in the LARGE1 gene result in an aberrant glycosylation of α-DG and consequent impairment of its binding to laminin, eventually affecting the connection between the cell and the extracellular environment. In the skeletal muscle, this leads to degeneration of the muscular tissue and muscular dystrophy. So far, a few missense mutations have been identified within the LARGE1 protein and linked to congenital muscular dystrophy, and because no structural information is available on this enzyme, our understanding of the molecular mechanisms underlying these pathologies is still very limited. Here, we generated a 3D model structure of the two catalytic domains of LARGE1, combining different molecular modeling approaches. Furthermore, by using molecular dynamics simulations, we analyzed the effect on the structure and stability of the first catalytic domain of the pathological missense mutation S331F that gives rise to a severe form of muscle-eye-brain disease.

Entities: Chemical

Mesh：

Substances：

Year: 2020 PMID： 32356985 PMCID： PMC7340341 DOI： 10.1021/acs.jcim.0c00281

Source DB: PubMed Journal: J Chem Inf Model ISSN： 1549-9596 Impact factor: 4.956

Introduction

Glycosylation is one of the most common post-translational modifications of proteins in the cell and plays a key role in physiological and pathological cellular functions.[1−3] Glycosylation patterns are altered in a number of diseases including cancer and neurodegenerative and autoimmune disorders.[4] Dystroglycan (DG) is a highly glycosylated membrane receptor, formed by the two subunits α- and β-dystroglycan (hereinafter α-DG and β-DG), that connects the extracellular matrix with the cytoskeleton in many tissues, such as muscle and brain.[5] α-DG is a peripheral membrane protein formed by two globular domains, the N- and C-terminal domains, separated by an elongated mucin-like region.[6] α-DG binds to a set of extracellular matrix molecules that harbor one or more laminin globular domains (LG-domains),[7] such as laminin,[8] agrin,[9] perlecan,[10] pikachurin,[11] neurexin,[12] and slit.[13] The binding properties of α-DG depend on a complex O-mannosyl glycosylation, a multi-step process that involves at least 17 different enzymes.[14] Mutations in any of these enzymes are linked to a number of muscular dystrophies, collectively known as secondary dystroglycanopathies, whose clinical presentations can vary from severe congenital muscular dystrophies to milder limb-girdle muscular dystrophy with manifestation in adulthood.[15] The hallmark of secondary dystroglycanopathies is the absence or the significant reduction of α-DG glycosylation.[16] The α-DG O-mannosyl modification starts with the attachment of a trisaccharide molecule GalNAc-β3-GlcNAc-β4-Man to Ser/Thr residues in the α-DG mucin-like region.[17] The mannose is then phosphorylated by POMK to allow further glycan elongation and phosphorylation by enzymes including fukutin, FKRP, TMEM5, B4GAT1, and LARGE1.[18−21] In particular, LARGE1 is a type II transmembrane protein, localized in the Golgi apparatus,[22] characterized by two distinct domains, that can be found within the Golgi lumen, with glycosyltransferase activities: a xylosyltransferase activity (domain 1) and a glucuronyltransferase activity (domain 2)[18] (Figure ).

Figure 1

Schematic representation of LARGE1 domains.

Schematic representation of LARGE1 domains. Indeed, LARGE1 catalyzes the addition to its substrate of several units of the disaccharide [-3Xyl-α1,3GlcAβ1-], known as matriglycan, which is the functional motif that ensures the binding of α-DG to the extracellular matrix proteins.[14,23] During DG maturation, LARGE1 binds to the N-terminal domain of α-DG, and this binding is an essential anchor for the enzyme to specifically modify the mucin-like region.[19,24] Different mutations in LARGE1 have been identified in patients affected by severe congenital muscular dystrophy with central nervous system involvement.[25−29] These mutations include missense and frameshift mutations in both domain 1 and domain 2, as well as intragenic deletions/insertions. Overexpression of LARGE1 in cells from patients affected by different α-dystroglycanopathies restored α-DG laminin binding, indicating that the modulation of LARGE1 activity may represent a therapeutic strategy for the treatment of secondary dystroglycanopathies.[30,31] In this context, the effort to reach a fundamental understanding of the molecular details of the glycosylation process is undermined by the lack of high-resolution structural data on LARGE1. In order to start filling this gap, molecular modelling can be used to construct a working three-dimensional model structure of the α-DG glycosylating enzyme. When structural homologs are not available, as in the case of LARGE1, ab initio modelling may guide a conformational search based on a designed energy function, but, in spite of recent advances, such an approach has shown low accuracy in prediction.[32] Traditional physics-based potentials for molecular mechanics and conformational searches, first used for organic compounds,[33,34] have not yet been developed to a point where reliable model structures can be generated except for a limited number of proteins.[32] Knowledge-based energy functions were demonstrated to perform better,[35] and among them, the TASSER approach ranked as the top method for automated protein structure prediction in the latest CASP experiments for three-dimensional structure prediction.[36,37] TASSER and its development, I-TASSER gateway, which involves template threading, structural fragment assembly, model refinement, and structure-based protein function annotation, indicate that ab initio and template-based modelling may successfully combine.[38] To date, the only in silico analysis of LARGE1, confined to the study of LARGE1 domain 2, is that of Bhattacharya et al.[39] In this study, we present a three-dimensional model for both domain 1 and 2 of LARGE1 and analyze their structural features using the best performing methods for fold recognition and domain boundary prediction.[40,41] The generated three-dimensional structures were refined, and the best reliable models were selected and subjected to molecular dynamics (MD) simulations. MD calculations are useful tools for evaluating the stability of predicted domains as well as the effect of a pathological mutation,[42−44] and we employed them in order to provide insight into the outcome of the S331F mutation found in a patient affected by congenital muscular dystrophy with eye and brain involvement.[26]

Materials and Methods

Functional Domains Prediction and Models Building

The 756 amino acid long sequence of LARGE1 from Mus musculus was retrieved from the UniProt database (http://www.uniprot.org/)[45] (entry code Q9Z1M7), and the domain analysis was performed using the Conserved Domain Database (CDD) server.[46] Following a successful modelling strategy,[47] two different protein-modelling approaches were employed to build the molecular models of the identified LARGE1 protein domains: HHPRED[48] in combination with MODELLER[49] and the I-TASSER server.[50] The web-service HHpred, available at https://toolkit.tuebingen.mpg.de/tools/hhpred, offers a threading approach that uses the hidden Markov models[51] and was employed to align the sequences and search for suitable template structures. The PDB70 profile database was used for the template search, and the crystallographic structure showing the best HHpred probability score, the largest coverage, and the best resolution was chosen as a template. The resulting alignment was submitted to the program MODELLER v 9.15 as implemented in Discovery Studio 2016 (Dassault Systèmes BIOVIA, Discovery Studio Modelling Environment, Release 2016, San Diego: Dassault Systèmes, 2016) to build the three-dimensional structure of the identified LARGE1 domains. Fifty models with a degree of optimization scored as “high” were generated by the “Build Homology Model” protocol of Discovery Studio and ranked by the PDF total energy. The best-ranked models were then submitted to the “Loop Refinement” protocol of the program generating five models for each loop, with a “high” optimization level. The refined models with the lowest PDF total energy score were chosen as the final models. For comparison, the I-TASSER web-service was also used, a meta-server that automatically employs ten threading algorithms in combination with ab initio modelling to build the tertiary structure of a protein as well as replica-exchange Monte Carlo dynamics (REMD) simulations for the atomic-level refinement. PROCHECK[52] and ProSA-Web[53] were employed to evaluate the quality of the model. The structure of the mutant S331F of LARGE1 domain 1 was carried out with the “Build Mutant” protocol of the BIOVIA Discovery Studio suite (Dassault Systèmes BIOVIA, Discovery Studio Modelling Environment, Release 2016, San Diego: Dassault Systèmes, 2016). Fifty models were generated with a “high” optimization level and with no restraints and evaluated based on their PDF total energy and DOPE score.

Molecular Dynamics Simulations of WT and of the S331F Mutant

Molecular dynamics simulations were executed using the version 4.8 of Desmond (D. E. Shaw Research, New York) employing the OPLS2005 force field.[54] An all-hydrogen model of the proteins was first generated using the Protein Preparation Wizard tool[55] available in the Maestro software (Schrödinger, LLC, New York, NY, 2017), for hydrogen insertion and force field atom-types and partial charges assignment. The systems for the simulations were built with the “System Builder” tool of the Maestro suite, drawing a triclinic box around the domain 1 of LARGE1 with a distance buffer of 10 Å between the protein and the side of the box, and filling it with SPC water molecules. A 0.15 mol/L NaCl salt concentration was added, and additional Cl– ions were added to neutralize the system (Table ). The resulting systems both consisted of a 71 × 76 × 71 Å-sized box containing a total of about 30000 atoms, among which ∼8800 were water molecules and ∼4590 were protein atoms (Table ).

Table 1

Details of the Starting Structures for MD Simulations

system property	WT	S331F mutant
number of atoms	30,017	30,963
protein residues	276	276
protein atoms	4585	4594
water molecules	8773	8752
Cl^– ions	31	31
Na⁺ ions	24	24
Mn²⁺ ion	1	1
size (Å)	71.2 × 66.4 × 70.5	71.2 × 66.4 × 70.5
ligand atoms	55	55
protein charge	+7	+7
ligand charge	–2	–2

The two systems were first minimized in order to relax the molecules into a local energy minimum and to remove the steric clashes, using the default protocol consisting of an initial steepest descent phase followed by a minimization with the LBFGS method, until convergence. The systems were then equilibrated using the Desmond standard NPT relaxation protocol with default parameters. We performed duplicate MD simulations of 500 ns each, using different random seeds for the assignment of the initial velocities, at a constant pressure of 1 atm and 300 K, saving the energies and the trajectories every 20 ps. The Martyna–Tobias–Klein method was used for pressure coupling, and the Nose–Hoover thermostat was employed for the temperature bath, using the default settings for both, while a cut-off of 9 Å was used for the non-bond interactions. The analysis of trajectories was performed with the “simulation event analysis” tool of Maestro (Maestro-Desmond Interoperability Tools, Schrödinger, New York, NY, 2017), and the solvent accessible surface area (SASA) over the simulation time was calculated with VMD.[56] Discovery Studio (Dassault Systèmes BIOVIA, Discovery Studio, 2018, San Diego: Dassault Systèmes, 2018), Maestro, and VMD were used for the visual inspection of the three-dimensional structures and for the simulations results. The convergence of the MD simulations was assessed by calculating the root mean square inner product (RMSIP) using Bio3D library as implemented in the version 3.5.2 of the R package.[57,58] The protein motion essential for the biological function of LARGE1 was analyzed by principal component analysis (PCA)-based dimensionality reduction of the atomic fluctuations of the Cα atoms in the simulated trajectory. The Bio3D library,[58] as implemented in the version 3.5.2 of the R package, was employed to perform and graphically represent the PCA of the WT and the S331F mutant trajectories. The conformations from the last 210 ns replica MD simulations were used for the wild-type (WT) and mutant enzymes. All the calculations were carried out employing an NVIDIA M4000 GPU and an Intel Xeon X5660 processor, on a HPZ820 workstation running Linux Centos 7 operating system. FoldX v5.0 was used to estimate the impact of the S331F mutation on protein stability (http://foldxsuite.crg.eu).[59] The algorithm calculates the free energy of folding of the WT and the mutated protein and estimates whether the mutation has a destabilizing (ΔΔG > 0) or stabilizing (ΔΔG < 0) effect.

Cloning and Expression of LARGE1 Domain 1 in Escherichia coli

The nucleotide sequence of LARGE1 domain 1 (residues 138–413) from M. musculus was synthesized and optimized for expression in E. coli by Invitrogen (GeneArt gene synthesis). The synthetic gene was cloned with extremities 5′BamHI-3′EcoRI downstream the thioredoxin gene in the vector pHisTrx, for expression in E. coli as a fusion product; the expression vector thus obtained was transformed into E. coli BL21 (DE3). A single colony from the transformants was picked and grown o.n. in LB + Amp 100 mg/L, then diluted 100-fold in fresh LB + Amp 100 mg/mL, and grown at 30 °C shaking at 220 rpm until OD600 ≈ 0.8; the expression was induced by adding IPTG to 0.5 mM final concentration. Protein production was checked on single aliquots collected at specific times after induction by lysing the cell pellets and running them on sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE). The bulk of cells was collected 3 h after induction, and the cell pellet was resuspended and lysed; the soluble proteins were separated from the insoluble ones by high-speed centrifugation and finally checked on SDS-PAGE in order to identify which fraction contained the fusion product. To increase the solubility of LARGE1 domain 1, the same construct was used to transform Arctic Xpress E. coli BL21 (DE3) competent cells (Agilent Technologies) following the manufacturer instructions. A single colony from the transformants was picked and grown o.n. in LB containing ampicillin 100 mg/L and gentamicin 20 mg/L, then diluted 200× in fresh LB containing no selection antibiotic, and grown at 30 °C shaking at 220 rpm for 3 h. The expression was then induced by adding IPTG to 1 mM final concentration and growing the cells for additional 24 h at 12 °C shaking at 200 rpm. Cells were harvested, resuspended in binding buffer (20 mM Tris HCl, 0.5 M NaCl, 15 mM imidazole, 2 mM Mn2+), and lysed in a cell disruptor (Constant Systems, model Z plus 1.1 KW). The soluble and insoluble fractions were separated by centrifugation, and the supernatant was loaded on a 5 mL His trap Ni2+ column equilibrated in binding buffer, for isolation of the His-tagged fusion product by binding and imidazole elution. Expression and purification outcomes were checked on SDS-PAGE.

Results and Discussion

Molecular Modelling

A schematic summarizing the workflow of our molecular modelling approach is provided in Figure .

Figure 2

Schematic of the methodology employed for modelling LARGE1 domains 1 and 2.

Molecular Modelling of LARGE1 Domain 1

We decided to use the murine LARGE1 sequence for molecular modelling due to the very high degree of sequence similarity between murine and human variants (100% sequence identity for domain 1 and 98.5 for domain 2, Supporting Information, Figure S1). It is worth noting that we have recently shown that the X-ray structures of murine and human N-terminal of α-dystroglycan (displaying the same degree of sequence similarity found between human and murine LARGE1) are identical.[60] As for the human point mutation of LARGE1 that we have introduced and analyzed in the model and in agreement with the above, all the residues in the region specifically perturbed by the mutation S331F in murine LARGE1 are identical to those in its human counterpart. The xylosyltransferase and glucuronyltransferase activities of LARGE1 have been attributed to two different domains of the protein.[18] Indeed, the identification of functional domains by the CDD server revealed the presence of two domains, namely, domain 1 (residues 138–413) and domain 2 (residues 473–742), both predicted to belong to two glycosyltransferase families (family 8 and 49, respectively), in agreement with the data reported in the literature.[18,61] The absence of a suitable template structure in the Protein Data Bank for homology modelling of the whole-length protein led us to generate three-dimensional model structures of the two identified domains individually. The HHpred analysis on domain 1 (residues 138–413) identified three potential templates in the Protein Data Bank: the crystal structure of a protein belonging to the glycosyltransferase family 8 from Anaerococcus prevotii (PDB code 3TZT(62)), the crystal structure of the galactosyltransferase LgtC from Neisseria meningitidis (PDB code 1G9R(63)), and the crystal structure of the xyloside xylosyltransferase 1 from M. musculus (PDB code 4WMA(64)). Amongst these, 1G9R (2 Å resolution, 19% sequence identity) was selected as the template structure because it exhibited the lowest number of missing residues. The HHpred alignment was then used by MODELLER for model building. Structural models were generated, and the highest-scoring model (lowest PDF total energy) was selected for further analysis. The quality of the model was evaluated considering the overall stereochemistry by PROCHECK, and the resulting Ramachandran plot showed that 95.7% of residues was located in the most favorable regions (core and allowed), whereas the 3.9% was located within the generously allowed region and only the 0.4%, confined to Lys 386, in the disallowed region. The interaction energy of each residue with the structural surrounding environment, calculated with PROSA-Web, gave a global Z-score of −5.6 (276 residues), which falls well within the accepted range for proteins of the same size and indicates an overall good model quality (Supporting Information, Figure S2A). The PROSA-Web local quality analysis of the model, represented in the energy profile plot, shows that all the residues, except those in regions 265–285, have negative scores, where the more negative is the score, the more correctly modeled is the region (Supporting Information, Figure S2B). For comparison, an alternative approach was used and the model of domain 1 was also generated on the I-TASSER server, which is based on a combination of multiple-threading alignments, iterative template fragment assembly simulations, and ab initio modelling algorithms. The template search stage of the I-TASSER procedure identified the crystal structures of the galactosyltransferase LgtC from N. meningitidis (PDB entry codes: 1G9R,[63]1GA8,[63] and 1SS9(65)) of a glycosyltransferase family 8 from A. prevotii (PDB code 3TZT(62)) and of the xyloside-α-1,3-xylosyltransferase 1 (XXYLT1) from Homo sapiens (PDB code 4WM0(64)) as the best templates. In agreement with the HHpred results, all the templates detected by I-TASSER displayed the typical GT-A fold, consisting of two α/β/α domains with a continuous central β-sheet,[66] all belonging to family 8. Identified templates share sequence identity with the target sequence ranging from 19 to 24% (Supporting Information, Table S1). Five I-TASSER models were obtained, and the best scoring model displayed a TM-score of 0.79 ± 0.09, an estimated root-mean-square deviation (rmsd) of 4.8 ± 3.1 Å and, on a scale of accuracy ranging from −5 (lowest) to 2 (highest), a C-score value of 0.57. The model structure obtained from the I-TASSER procedure featured a lower percentage of residues (87%) in the core and allowed regions of the Ramachandran plot and 6.7 and 5.9% residues in the generously allowed and disallowed regions, respectively. The ProSA-web local quality analysis of this model reported negative energy values for all the residues, with the exception of the region 271–296, and the global quality analysis (276 residues) gave a Z-score of −6.92, which falls within the range of the experimentally resolved structures of the same size (Supporting Information, Figure S3A,B). The models generated by I-TASSER and HHpred were submitted to the COFACTOR software,[67] which identifies the most structurally similar enzymes in the Protein Data Bank, performing a local and global structure match. This analysis showed that the two generated models share the highest structural similarity with the crystallographic structure of the glycosyltransferase LgtC from N. meningitidis (PDB code number 1SS9), displaying a TM-score of 0.89 and 0.92 for the I-TASSER and HHpred model, respectively, as well as a Cα-rmsd lower than 2 and a sequence coverage of 94.6% for both the model structures. Given that a TM-score greater than 0.7 indicates that the structures have a 90% probability of being in the same topology family,[68] the two models of domain 1 can thus confidently be assigned to the GT-A glycosyltransferase family. For comparison, the calculated Cα-rmsd values between the MODELLER and I-TASSER LARGE1 models are 2.3 Å (domain 1) and 4.1 Å (domain 2), indicative of similar models. According to the validation results, the model structure obtained by the combined HHpred/MODELLER approach was selected as the best model for LARGE1 domain 1 for further analysis. The model structure of domain 1 consists of a central seven-stranded mainly parallel β-sheet (Figure A), sandwiched between 14 α-helices and a small β-sheet, which is conserved in all the proteins with a GT-A fold (Figure A,B).

Figure 3

Structure and topology of LARGE1 domain 1. (A) Topology diagram of the domains. Helices are shown as red spheres, and β-strands are shown as yellow triangles. (B) Structural model represented as a cartoon, and the secondary structure is colored (helices in red, β-strands in yellow, and random coils in magenta). Individual side chains are represented in green as balls and sticks. On one side of the β-sheet, helix-C contacts the β1 and β2 strands, whereas helices G, J, and K contact the rest of the β-sheet. On the other side, a small β-sheet (strands β5 and β9) is located between helices A, B, and N that are in contact with the β1, β2, and β3 strands, whereas helices D, L, M, and E are in contact with the β6 and β8 strands (Figure ). As found in other GT-A folded proteins, the small β-sheet formed by two antiparallel strands (β5 and β9) is linked to the β4 strand by a loop (L4) that contains a conserved DXD motif likely to bind the manganese cation (Mn2+)[69] responsible for the catalytic activity of domain 1.[18,22] The Mn2+ ion has a crucial role in the catalytic activity of LARGE1, and it is coordinated by the DXD motif residues as well as by the α- and β-phosphate moieties of the substrate.[63] In the predicted model structure of LARGE1 domain 1, the Mn2+ ion is coordinated by the NE2 atom of His380 of the β9 strand and by the carboxylate oxygen atoms of Asp242 and Asp244 belonging to the β4−β5 linker loop L4 (Figure B). Analogous interactions are established by the Mn2+ ion in the neighbor crystallographic structure of LgtC from N. meningitidis (1G9R) where the cation is coordinated by His244, Asp103, and Asp105, respectively (ref (63), Figure ).

Figure 4

Structure-based sequence alignment of LARGE1 domain 1 and LgtC from N. meningitidis. Numbering is relative to the sequence of LARGE1, and the secondary structure elements of LgtC are based on Persson et al.[63] The figure was prepared using the ESPript server (http://espript.ibcp.fr). Identical residues are shown in white on red, and homologous residues are in red letters. Secondary structure details are represented (helices: squiggles, beta strands: arrows, and turns: TT letters). Full inspection of the structural alignment reveals that all major secondary structure elements of the model of LARGE1 domain 1 superimpose almost exactly with those of the LgtC glycosyl transferase, with only few exceptions related to the helical content (Figure ).

Molecular Modelling of LARGE1 Domain 2

An analogous analysis was performed for LARGE1 domain 2. The top-ranking templates identified with a 99.7% probability by the template search methods of HHpred were all N-acetylgalactosaminyl transferases, namely, 1XHB,[70]6E4R,[71]6H0B,[72]6IWR,[73] and 2D7I.[74]1XHB, the crystal structure of the UDP-GalNAc: polypeptide α-N-acetylgalactosaminyltransferase-T1 from M. musculus, was selected as the best template because it exhibited the highest P-value (2.0 × 10–19), highest probability (99.7%), and a sequence identity of 10% with the target. The stereochemical quality of the model with the lowest PDF value, generated by MODELLER on the basis of the HHpred alignment, was evaluated using PROCHECK. The Ramachandran plot calculation revealed that 91.5% of the residues are found in the most favorable regions (core and allowed), 3.7% in the generously allowed region, and the remaining 4.9% in the disallowed region. ProSA-Web analysis of the model (270 residues) revealed a Z-score value of the target protein of −2.69, that is, within the range of the experimental protein structures of the same size, which indicates an acceptable overall quality of the model structure. However, the ProSA-Web analysis revealed that about 50% of the residues display positive values of interaction energy, suggesting errors in some regions of the model (Supporting Information, Figure S4A,B). The crystallographic templates found by I-TASSER to be suitable for a threading-ab initio model were the crystal structure of chondroitin polymerase from E. coli (PDB code 2Z86(75)), the structure of the human UDP-GalNAc:polypeptide α-N-acetylgalactosaminyltransferase-T2 (PDB codes 2FFU,[76]5AJO[77] and 4D0T(78)), and the crystal structure of the human pp-GalNAc-T10 (pdb code 2D7I(74)). Notably, the PDB entry 4D0T was also used as a template for LARGE1 domain 2 model building in the work published by Bhattacharya.[39] Sequence identities between identified templates and LARGE1 domain 2 fall in the range 17–21% (see Supporting Information, Table S2, for details). The top scoring model of domain 2 produced by I-TASSER had a C-score value equal to −0.82, an estimated TM-score of 0.61 ± 0.14 Å and an estimated rmsd of 7.8 ± 4.4 Å. The closest structural analogue to the model of domain 2, found by I-TASSER in the Protein Data Bank is 2FFU, with a TM-score of 0.87 (rmsd 1.61 Å), where a TM-score higher than 0.5 generally indicates the same fold in SCOP/CATH. In agreement with the study of Bhattacharya and colleagues,[39] the crystal structure with PDB code 4D0T, that is, the human glycosyltransferase GalNAc-T2 in complex with UDP-GalNAc, the EA2 peptide, and manganese ion, was also found to have a close structural similarity to the model of LARGE1 domain 2 (TM-score of 0.84 and rmsd of 1.66 Å). The analysis of the stereochemical quality of this model, assessed with PROCHECK, showed that 95.5%, 1.6, and 2.8% of the residues are located within the core and allowed, generously allowed, and disallowed regions of the Ramachandran plot, respectively. The overall model quality shows a ProSA-web Z-score of −7.2, in agreement with the Z-scores calculated for the experimentally resolved structures in the Protein Data Bank (Supporting Information Figure S5A). The ProSA-web server showed that the interaction energy of each residue with the rest of the protein has negative values, apart from the residues within the two small regions 584–590 and 597–601 (Supporting Information, Figure S5B). Comparison of the results obtained using the two modelling approaches led to the selection of the structural model built with I-TASSER as the best model of LARGE1 domain 2 (Figure ).

Figure 5

Structure and topology of LARGE1 domain 2. (A) Topology diagram of the domains. Helices are shown as red spheres, and β-strands are shown as yellow triangles. (B) Structural model is represented as a cartoon, and the secondary structure is colored (helices in red, β-strands in green, and random coils in magenta). Individual side chains are represented as green balls and sticks. Although the I-TASSER server performed better than HHpred in terms of quality of the final model, the results from both methods agree on pointing toward domain 2 to be classified as an enzyme of the GT-A family, with a Rossman-like fold consisting of a two sandwiched β-sheets interposed between four α-helices on both sides (Figure A). Interestingly, although the fold, especially the arrangement of the β-sheets, is conserved between the two domains of LARGE1, the three-dimensional model of domain 2 shows a lower number of strands with respect to domain 1 (eight vs nine) (Figures A and 5A). Strand β8 undergoes a distortion at the level of Pro699 that makes its C-terminal portion parallel to the β6 strand and its N-terminal portion antiparallel to the β5 strand, thus creating the double β-sheet structure shown in Figure B. Interestingly, the third DXD motif, that in domain 2 is DID (563–565), essential for the glucuronyltransferase activity of LARGE1,[18,22] is located in a small unstructured turn between the strands β4 and β5, and it is structurally aligned with the DXH motif of the structural neighbor UDP-GalNAc:polypeptide α-N-acetylgalactosaminyltransferase-T2 from H. sapiens. The turn region is located deep into the binding cleft of the enzyme, with the two aspartates (Asp563 and Asp565) pointing toward the solvent and the isoleucine (Ile564) buried in the protein core, indicating an orientation that could favor the coordination of the carboxyl groups of the aspartates to a metal ion cofactor such as Mn2+[79] (Figure B). It is noteworthy that His705, in the β8 strand, is conserved in 2FFU (His359, see sequence alignment in Figure ) and that in the crystal structure, this residue coordinates the metal ion.

Figure 6

Structure-based sequence alignment of LARGE1 domain 2 and UDP-GalNAc:Polypeptide α-N-acetylgalactosaminyltransferase-2. The figure was prepared using the ESPript server (http://espript.ibcp.fr). Identical residues are shown in white on red, and homologous residues are in red letters. Secondary structure details are represented (helices: squiggles, beta strands: arrows, and turns: TT letters). His705 in the model of domain 2 is in close contact with the aspartates 563 and 565 (distance shorter than 4 Å), suggesting an involvement of this basic residue in the catalytic cluster (Figure B).

Effects of the S331F Mutation on the Structure and Dynamics of LARGE1 Domain 1

The mutation of the LARGE1 residue Ser331 to Phe has been known for some time to cause a congenital muscular dystrophy phenotype characterized by eye malformations and mental retardation.[26] According to our in silico generated structure of domain 1, Ser331 is located in a random coil region (residues 325–333) between helices I and J, on the rim of the active-site cleft and close to Asn213, which belongs to the coil region between strand β3 and helix C (residues 200–219) (Figure B). Molecular dynamics simulations of the WT and of the S331F mutant were carried out to explore the conformational changes that occur upon S331F replacement and to assess its structural and dynamics effects on the enzyme function, with the final aim of understanding the molecular mechanism at the basis of the pathological alterations induced by this mutation. Two different MD replicates were performed for both WT and S331F mutant. The stability of the two systems has been evaluated by measuring the rmsd of the alpha carbons with respect to their positions at time 0, over the simulation time. After an equilibration of 40 ns, the rmsd reached a plateau with a stable value of 4.4 ± 0.2 and of 4.5 ± 0.2 Å for WT and S311F, respectively (Figure A), until the end of the simulations. An analysis of the Cα atom root mean square fluctuation (RMSF) with respect to their average positions calculated over the last 200 ns was then performed, in order to identify the regions of the protein whose movements are mostly affected by the replacement of Ser311 with a Phe residue (Figure B). The results indicate that most of the residues share similar fluctuations. The differences in RMSF between the two simulated systems are shown in Figure C, which reveals larger fluctuations from the averaged MD conformations in the unstructured regions spanning residues 178–180, 290–292, 324, 367–368, and 410–413.

Figure 7

Analysis of MD trajectories. (A) Cα-rmsd from the starting structure. (B) Residue-based Cα-RMSF relative to the average structure. (C) ΔRMSF: difference between the RMSF of the mutant S331F and that of the WT for each residue. A positive value indicates larger fluctuations for the mutant, whereas a negative value means larger fluctuations for the WT. Plots relative to the WT system are colored in blue and those relative to the S331F mutant in orange. Although several residues were found to display significant differences in RMSF upon mutation, the important catalytic residues such as the two DXD motifs and His380 were not affected in their atomic fluctuations. An alteration of total, hydrophobic, and hydrophilic SASAs in the mutant protein with respect to the WT was also observed. A representative plot of SASA for all-atom, hydrophobic, and polar residues against time is shown in Figure .

Figure 8

Analysis of MD trajectories. Time series of the SASA of LARGE1 domain 1. The evolution of the total, hydrophobic, and hydrophilic SASA over the simulation time is shown in panels (A–C), respectively. (D) Percentage of hydrophobic SASA (hyd-SASA) and hydrophilic SASA (phil-SASA) with respect to the total SASA. Each time-series refers to both the simulation replicas. For color code, see Figure . Compared to the WT, the S331F mutant showed higher all-atom SASAs, which converged to an average value of 15145 ± 248 Å2 (14777 ± 327 Å2 in the WT protein). Notably, the hydrophobic SASA of the mutant increased about 12.7% over the simulation time with respect to the WT protein (5115 ± 265 and 4538 ± 166 Å2, respectively) (Figure ), a feature that could significantly affect the interactions at the protein surface. In order to check the convergence of the MD simulations, we have performed the RMSIP analysis on two equilibrated regions of the trajectories. The obtained high RMSIP values of 0.80 (replica#1), 0.82 (replica#2) and 0.82 (replica#1), 0.83 (replica#2) for the WT and mutant proteins, respectively, are indicative of an adequate level of convergence. The local conformational changes that occur upon S331F mutation have been inspected by analyzing the non-bonding interactions established by Ser/Phe331. In the WT simulations, Ser331 is hydrogen bonded, by means of its OG and O backbone oxygen atoms, to Asn213 (O backbone oxygen, ND2 and OD1 atoms), whereas in the mutant, Phe331 moves away from Asn213, thus causing a conformational change in loops 200–219 and 325–333 (Figures and 10). All plots of the duplicate trajectories were similar and are shown in the Supporting Information (Figures S6–S8).

Figure 9

Figure 10

Analysis of MD trajectories. Frequency of the hydrogen bonds (%) between Ser331 and Asn213 in domain 1 WT (A) and S331F (B). Time evolution of the distances between the geometric centroids of the Phe331 and Asn213 side chain atoms (C) and between the O backbone atoms of Phe331 and Asn213 (black and grey, respectively).

Representative structures of the simulated LARGE1 WT and S331F mutant. The structurally closest frame to the averaged structure from the MD simulations (obtained selecting the frame with the lowest Cα-rmsd with respect to the average Cα-coordinates calculated over the equilibrated trajectory) is reported for each simulated system. Panel (A) shows the average structures from the WT simulation, and panel (B) shows the average structures from the S331F mutant simulation. The backbone is represented as tube colored in grey, the atomic details are represented as sticks colored by atom type, and the black dashed line represents the hydrogen bond interaction. Analysis of MD trajectories. Frequency of the hydrogen bonds (%) between Ser331 and Asn213 in domain 1 WT (A) and S331F (B). Time evolution of the distances between the geometric centroids of the Phe331 and Asn213 side chain atoms (C) and between the O backbone atoms of Phe331 and Asn213 (black and grey, respectively). A weak destabilizing effect of the S331F mutation (ΔΔG = 0.21 kcal/mol) was also calculated by FoldX. This ΔΔG value, which cannot be directly related to a change in enzyme function,[80] suggests that a phenylalanine in position 331 is not fully compatible with the surrounding chemical environment, as already observed in MD. Time evolution of the secondary structure elements along the MD simulation was also analyzed, showing the stability of the β-strand elements during the entire MD simulations and the structural transition of few α-helices into 310-helices (Supporting Information, Figure S9). In order to probe the changes in conformational dynamics induced by the S331F mutation, analysis of the MD trajectory by PCA using the Ca atoms was also performed. Results indicate that the first twenty principal components (PCs) account for 90.7% (WT) and 92.8% (mutant) of the motions observed in the last 400 ns of the trajectories, whereas the first three PCs account for the 74.5% (WT) and the 82.3% (mutant) motions (Supporting Information, Figure S10A). The trace values of the diagonalized covariance matrix were found to be 1706.9 Å2 (WT) and 2145.8 Å2 (mutant), indicating an increased flexibility of S331F in the collective motion as compared to the WT (Supporting Information, Figure S10A,B). Analysis of the contribution of each residue to the first principal component revealed that the Ser 331 to Phe mutation alters the protein motion in the loop regions surrounding the residue 331. Notably, the peaks corresponding to residues 211–216 and 282–291 are shifted backward (to 201–210 and 276–288, respectively) in the mutated protein (Supporting Information, Figure S10C). This analysis suggests that the S331F mutation introduces local differences in the motion of the active site rim of domain 1, which might affect the interaction with the substrate.

Cloning and Expression of LARGE1 Domain 1 in E. coli

A series of first attempts at heterologous expression of domain 1 as revealed by our modelling results were carried out. The DNA sequence coding for the region comprising residues 138–413 of M. musculus LARGE1 was custom-synthesized and optimized for the expression in E. coli. The synthetic gene thus obtained was cloned into the pHisTrx vector for expression downstream the fusion partner His6-tagged thioredoxin and subsequently transformed into E. coli strain BL21 (DE3). Although preliminary, the evidence that the protein can be recovered in the soluble form upon heterologous expression in the presence of helper proteins such as the chaperonin GroEL (see Supporting Information, Figures S11–S13) reinforce the notion that the subdomain 1 represents an autonomous folding unit, as suggested by our modelling results. Further experiments are granted aimed at producing quantitative amounts of domain 1 for future biochemical and structural studies. Recombinant expression of individual LARGE1 domains 1 and 2, as well of the two domains together in a full-length LARGE1 construct, would allow a thorough analysis of the enzymatic proprieties and activity of this glycosyltransferase. The two catalytic domains of LARGE1 are connected by a flexible linker (ranging from amino acid 414 to 472) that due to its nature could not be accurately modeled. A very interesting question is whether the two domains establish long-range interactions and work somehow cooperatively during the sequential synthesis of each added disaccharide block (i.e. the final product of every full LARGE1 enzymatic cycle) or rather function in an independent fashion. Whether the binding of each substrate to each catalytic domain induces any allosteric long-range effects influencing the enzymatic behavior of the other catalytic domain remains to be determined at the current stage. Noteworthy, also the N-terminal region of α-dystroglycan[81] harbors two different domains (IG1 and S6), and it is intriguing to foresee a scenario in which the two DG domains would be involved reciprocally in chaperoning the activity of the two catalytic domains of LARGE1.

Conclusions

Our combined use of molecular modelling approaches allowed for a three-dimensional description of the two domains respectively responsible for the xylosyltransferase and glucuronyltransferase activities of LARGE1. This work characterized the three DXD motifs, each likely to bind a manganese cation (Mn2+), that constitute the active sites for the glycosyltransferase activities of the two domains,[22,69,82] thus offering some initial insight into the possible mechanism of binding of the sugar donor and acceptor. Molecular dynamic simulations of the S331F mutant, carried out in comparison with the WT counterpart, points to a more flexible and less stable enzyme, as suggested by an increased RMSF and all-atom and hydrophobic SASA parameters, compared to the WT protein. Indeed, a reduced stability could be at the basis of the poorer enzymatic activity of this variant of LARGE. This would eventually lead to the hypoglycosylation of α-DG observed in a patient affected by a severe form of congenital muscular dystrophy with anomalies of eye and brain linked to this missense mutation.[26] Our study demonstrates that in the absence of experimental structural data on LARGE1, template-based three-dimensional modelling of the enzyme can offer some insight into its overall structural features, and the analysis of its dynamic behavior by in silico methods can help to elucidate the molecular mechanism underneath the diseases linked to missense mutations of the enzyme. In addition, our modelling study might pave the road for the analysis of potential complexes formed by LARGE1 and the α-dystroglycan N-terminal domain (αDGN), whose structure we have previously solved.[81]

78 in total

Review 1. Glycosyltransferase structure and mechanism.

Authors: U M Unligil; J M Rini
Journal: Curr Opin Struct Biol Date: 2000-10 Impact factor: 6.809

2. How significant is a protein structure similarity with TM-score = 0.5?

Authors: Jinrui Xu; Yang Zhang
Journal: Bioinformatics Date: 2010-02-17 Impact factor: 6.937

Review 3. Protein sequence comparison and fold recognition: progress and good-practice benchmarking.

Authors: Johannes Söding; Michael Remmert
Journal: Curr Opin Struct Biol Date: 2011-03-31 Impact factor: 6.809

4. The I-TASSER Suite: protein structure and function prediction.

Authors: Jianyi Yang; Renxiang Yan; Ambrish Roy; Dong Xu; Jonathan Poisson; Yang Zhang
Journal: Nat Methods Date: 2015-01 Impact factor: 28.547

5. Congenital muscular dystrophies with defective glycosylation of dystroglycan: a population study.

Authors: E Mercuri; S Messina; C Bruno; M Mora; E Pegoraro; G P Comi; A D'Amico; C Aiello; R Biancheri; A Berardinelli; P Boffi; D Cassandrini; A Laverda; M Moggio; L Morandi; I Moroni; M Pane; R Pezzani; A Pichiecchio; A Pini; C Minetti; T Mongini; E Mottarelli; E Ricci; A Ruggieri; S Saredi; C Scuderi; A Tessa; A Toscano; G Tortorella; C P Trevisan; C Uggetti; G Vasco; F M Santorelli; E Bertini
Journal: Neurology Date: 2009-03-18 Impact factor: 9.910

6. Brain involvement in muscular dystrophies with defective dystroglycan glycosylation.

Authors: Emma Clement; Eugenio Mercuri; Caroline Godfrey; Janine Smith; Stephanie Robb; Maria Kinali; Volker Straub; Kate Bushby; Adnan Manzur; Beril Talim; Frances Cowan; Ros Quinlivan; Andrea Klein; Cheryl Longman; Robert McWilliam; Haluk Topaloglu; Rachael Mein; Stephen Abbs; Kathryn North; A James Barkovich; Mary Rutherford; Francesco Muntoni
Journal: Ann Neurol Date: 2008-11 Impact factor: 10.422

7. SGK196 is a glycosylation-specific O-mannose kinase required for dystroglycan function.

Authors: Takako Yoshida-Moriguchi; Tobias Willer; Mary E Anderson; David Venzke; Tamieka Whyte; Francesco Muntoni; Hane Lee; Stanley F Nelson; Liping Yu; Kevin P Campbell
Journal: Science Date: 2013-08-08 Impact factor: 47.728

Review 8. Matriglycan: a novel polysaccharide that links dystroglycan to the basement membrane.

Authors: Takako Yoshida-Moriguchi; Kevin P Campbell
Journal: Glycobiology Date: 2015-04-16 Impact factor: 4.313

Review 9. Analysis of α-Dystroglycan/LG Domain Binding Modes: Investigating Protein Motifs That Regulate the Affinity of Isolated LG Domains.

Authors: Christopher E Dempsey; Maria Giulia Bigotti; Josephine C Adams; Andrea Brancaccio
Journal: Front Mol Biosci Date: 2019-03-29

10. Structural basis of laminin binding to the LARGE glycans on dystroglycan.

Authors: David C Briggs; Takako Yoshida-Moriguchi; Tianqing Zheng; David Venzke; Mary E Anderson; Andrea Strazzulli; Marco Moracci; Liping Yu; Erhard Hohenester; Kevin P Campbell
Journal: Nat Chem Biol Date: 2016-08-15 Impact factor: 15.040

4 in total

1. Ligand-Induced Conformational and Dynamical Changes in a GT-B Glycosyltransferase: Molecular Dynamics Simulations of Heptosyltransferase I Complexes.

Authors: Bakar A Hassan; Jozafina Milicaj; Carlos Andres Ramirez-Mondragon; Yuk Yin Sham; Erika A Taylor
Journal: J Chem Inf Model Date: 2021-12-30 Impact factor: 4.956

2. Muscular dystrophy-dystroglycanopathy in a family of Labrador retrievers with a LARGE1 mutation.

Authors: G Diane Shelton; Katie M Minor; Ling T Guo; Steven G Friedenberg; Jonah N Cullen; Jeffrey M Hord; David Venzke; Mary E Anderson; Megan Devereaux; Sally J Prouty; Caryl Handelman; Kevin P Campbell; James R Mickelson
Journal: Neuromuscul Disord Date: 2021-07-28 Impact factor: 4.296

3. Conserved Conformational Hierarchy across Functionally Divergent Glycosyltransferases of the GT-B Structural Superfamily as Determined from Microsecond Molecular Dynamics.

Authors: Carlos A Ramirez-Mondragon; Megin E Nguyen; Jozafina Milicaj; Bakar A Hassan; Frank J Tucci; Ramaiah Muthyala; Jiali Gao; Erika A Taylor; Yuk Y Sham
Journal: Int J Mol Sci Date: 2021-04-28 Impact factor: 5.923

4. High degree of conservation of the enzymes synthesizing the laminin-binding glycoepitope of α-dystroglycan.

Authors: Maria Giulia Bigotti; Andrea Brancaccio
Journal: Open Biol Date: 2021-09-29 Impact factor: 6.411

4 in total