Benedetta Righino1, Manuela Bozzi1,2, Davide Pirolli2, Francesca Sciandra2, Maria Giulia Bigotti3,4, Andrea Brancaccio2,4, Maria Cristina De Rosa2. 1. Dipartimento di Scienze Biotecnologiche di Base, Cliniche Intensivologiche e Perioperatorie, Università Cattolica del Sacro Cuore, L.go F. Vito 1, 00168 Rome, Italy. 2. Institute of Chemical Sciences and Technologies "Giulio Natta" (SCITEC)-CNR, L.go F. Vito 1, 00168 Rome, Italy. 3. School of Translational Health Sciences, Research Floor Level 7, Bristol Royal Infirmary, Upper Maudlin Street, BS2 8HW Bristol, U.K. 4. School of Biochemistry, University Walk, University of Bristol, BS8 1TD Bristol, U.K.
Abstract
The acetylglucosaminyltransferase-like protein LARGE1 is an enzyme that is responsible for the final steps of the post-translational modifications of dystroglycan (DG), a membrane receptor that links the cytoskeleton with the extracellular matrix in the skeletal muscle and in a variety of other tissues. LARGE1 acts by adding the repeating disaccharide unit [-3Xyl-α1,3GlcAβ1-] to the extracellular portion of the DG complex (α-DG); defects in the LARGE1 gene result in an aberrant glycosylation of α-DG and consequent impairment of its binding to laminin, eventually affecting the connection between the cell and the extracellular environment. In the skeletal muscle, this leads to degeneration of the muscular tissue and muscular dystrophy. So far, a few missense mutations have been identified within the LARGE1 protein and linked to congenital muscular dystrophy, and because no structural information is available on this enzyme, our understanding of the molecular mechanisms underlying these pathologies is still very limited. Here, we generated a 3D model structure of the two catalytic domains of LARGE1, combining different molecular modeling approaches. Furthermore, by using molecular dynamics simulations, we analyzed the effect on the structure and stability of the first catalytic domain of the pathological missense mutation S331F that gives rise to a severe form of muscle-eye-brain disease.
The acetylglucosaminyltransferase-like protein LARGE1 is an enzyme that is responsible for the final steps of the post-translational modifications of dystroglycan (DG), a membrane receptor that links the cytoskeleton with the extracellular matrix in the skeletal muscle and in a variety of other tissues. LARGE1 acts by adding the repeating disaccharide unit [-3Xyl-α1,3GlcAβ1-] to the extracellular portion of the DG complex (α-DG); defects in the LARGE1 gene result in an aberrant glycosylation of α-DG and consequent impairment of its binding to laminin, eventually affecting the connection between the cell and the extracellular environment. In the skeletal muscle, this leads to degeneration of the muscular tissue and muscular dystrophy. So far, a few missense mutations have been identified within the LARGE1 protein and linked to congenital muscular dystrophy, and because no structural information is available on this enzyme, our understanding of the molecular mechanisms underlying these pathologies is still very limited. Here, we generated a 3D model structure of the two catalytic domains of LARGE1, combining different molecular modeling approaches. Furthermore, by using molecular dynamics simulations, we analyzed the effect on the structure and stability of the first catalytic domain of the pathological missense mutation S331F that gives rise to a severe form of muscle-eye-brain disease.
Glycosylation
is one of the most common post-translational modifications
of proteins in the cell and plays a key role in physiological and
pathological cellular functions.[1−3] Glycosylation patterns are altered
in a number of diseases including cancer and neurodegenerative and
autoimmune disorders.[4] Dystroglycan (DG)
is a highly glycosylated membrane receptor, formed by the two subunits
α- and β-dystroglycan (hereinafter α-DG and β-DG),
that connects the extracellular matrix with the cytoskeleton in many
tissues, such as muscle and brain.[5] α-DG
is a peripheral membrane protein formed by two globular domains, the
N- and C-terminal domains, separated by an elongated mucin-like region.[6] α-DG binds to a set of extracellular matrix
molecules that harbor one or more laminin globular domains (LG-domains),[7] such as laminin,[8] agrin,[9] perlecan,[10] pikachurin,[11] neurexin,[12] and slit.[13] The binding properties of α-DG depend
on a complex O-mannosyl glycosylation, a multi-step
process that involves at least 17 different enzymes.[14] Mutations in any of these enzymes are linked to a number
of muscular dystrophies, collectively known as secondary dystroglycanopathies,
whose clinical presentations can vary from severe congenital muscular
dystrophies to milder limb-girdle muscular dystrophy with manifestation
in adulthood.[15] The hallmark of secondary
dystroglycanopathies is the absence or the significant reduction of
α-DG glycosylation.[16] The α-DG O-mannosyl modification starts with the attachment of a
trisaccharide molecule GalNAc-β3-GlcNAc-β4-Man to Ser/Thr
residues in the α-DG mucin-like region.[17] The mannose is then phosphorylated by POMK to allow further glycan
elongation and phosphorylation by enzymes including fukutin, FKRP,
TMEM5, B4GAT1, and LARGE1.[18−21] In particular, LARGE1 is a type II transmembrane
protein, localized in the Golgi apparatus,[22] characterized by two distinct domains, that can be found within
the Golgi lumen, with glycosyltransferase activities: a xylosyltransferase
activity (domain 1) and a glucuronyltransferase activity (domain 2)[18] (Figure ).
Figure 1
Schematic representation of LARGE1 domains.
Schematic representation of LARGE1 domains.Indeed, LARGE1 catalyzes the addition to its substrate of several
units of the disaccharide [-3Xyl-α1,3GlcAβ1-], known as
matriglycan, which is the functional motif that ensures the binding
of α-DG to the extracellular matrix proteins.[14,23] During DG maturation, LARGE1 binds to the N-terminal domain of α-DG,
and this binding is an essential anchor for the enzyme to specifically
modify the mucin-like region.[19,24]Different mutations
in LARGE1 have been identified
in patients affected by severe congenital muscular dystrophy with
central nervous system involvement.[25−29] These mutations include missense and frameshift mutations
in both domain 1 and domain 2, as well as intragenic deletions/insertions.
Overexpression of LARGE1 in cells from patients affected by different
α-dystroglycanopathies restored α-DG laminin binding,
indicating that the modulation of LARGE1 activity may represent a
therapeutic strategy for the treatment of secondary dystroglycanopathies.[30,31]In this context, the effort to reach a fundamental understanding
of the molecular details of the glycosylation process is undermined
by the lack of high-resolution structural data on LARGE1. In order
to start filling this gap, molecular modelling can be used to construct
a working three-dimensional model structure of the α-DG glycosylating
enzyme. When structural homologs are not available, as in the case
of LARGE1, ab initio modelling may guide a conformational
search based on a designed energy function, but, in spite of recent
advances, such an approach has shown low accuracy in prediction.[32] Traditional physics-based potentials for molecular
mechanics and conformational searches, first used for organic compounds,[33,34] have not yet been developed to a point where reliable model structures
can be generated except for a limited number of proteins.[32] Knowledge-based energy functions were demonstrated
to perform better,[35] and among them, the
TASSER approach ranked as the top method for automated protein structure
prediction in the latest CASP experiments for three-dimensional structure
prediction.[36,37] TASSER and its development, I-TASSER
gateway, which involves template threading, structural fragment assembly,
model refinement, and structure-based protein function annotation,
indicate that ab initio and template-based modelling
may successfully combine.[38] To date, the
only in silico analysis of LARGE1, confined to the
study of LARGE1 domain 2, is that of Bhattacharya et al.[39]In this study, we present a three-dimensional
model for both domain
1 and 2 of LARGE1 and analyze their structural features using the
best performing methods for fold recognition and domain boundary prediction.[40,41] The generated three-dimensional structures were refined, and the
best reliable models were selected and subjected to molecular dynamics
(MD) simulations. MD calculations are useful tools for evaluating
the stability of predicted domains as well as the effect of a pathological
mutation,[42−44] and we employed them in order to provide insight
into the outcome of the S331F mutation found in a patient affected
by congenital muscular dystrophy with eye and brain involvement.[26]
Materials and Methods
Functional Domains Prediction
and Models Building
The
756 amino acid long sequence of LARGE1 from Mus musculus was retrieved from the UniProt database (http://www.uniprot.org/)[45] (entry code Q9Z1M7), and the domain analysis
was performed using the Conserved Domain Database (CDD) server.[46] Following a successful modelling strategy,[47] two different protein-modelling approaches were
employed to build the molecular models of the identified LARGE1 protein
domains: HHPRED[48] in combination with MODELLER[49] and the I-TASSER server.[50] The web-service HHpred, available at https://toolkit.tuebingen.mpg.de/tools/hhpred, offers a threading approach that uses the hidden Markov models[51] and was employed to align the sequences and
search for suitable template structures. The PDB70 profile database
was used for the template search, and the crystallographic structure
showing the best HHpred probability score, the largest coverage, and
the best resolution was chosen as a template. The resulting alignment
was submitted to the program MODELLER v 9.15 as implemented in Discovery
Studio 2016 (Dassault Systèmes BIOVIA, Discovery Studio Modelling
Environment, Release 2016, San Diego: Dassault Systèmes, 2016)
to build the three-dimensional structure of the identified LARGE1
domains. Fifty models with a degree of optimization scored as “high”
were generated by the “Build Homology Model” protocol
of Discovery Studio and ranked by the PDF total energy. The best-ranked
models were then submitted to the “Loop Refinement”
protocol of the program generating five models for each loop, with
a “high” optimization level. The refined models with
the lowest PDF total energy score were chosen as the final models.
For comparison, the I-TASSER web-service was also used, a meta-server
that automatically employs ten threading algorithms in combination
with ab initio modelling to build the tertiary structure
of a protein as well as replica-exchange Monte Carlo dynamics (REMD)
simulations for the atomic-level refinement. PROCHECK[52] and ProSA-Web[53] were employed
to evaluate the quality of the model. The structure of the mutant
S331F of LARGE1 domain 1 was carried out with the “Build Mutant”
protocol of the BIOVIA Discovery Studio suite (Dassault Systèmes
BIOVIA, Discovery Studio Modelling Environment, Release 2016, San
Diego: Dassault Systèmes, 2016). Fifty models were generated
with a “high” optimization level and with no restraints
and evaluated based on their PDF total energy and DOPE score.
Molecular
Dynamics Simulations of WT and of the S331F Mutant
Molecular
dynamics simulations were executed using the version
4.8 of Desmond (D. E. Shaw Research, New York) employing the OPLS2005
force field.[54] An all-hydrogen model of
the proteins was first generated using the Protein Preparation Wizard
tool[55] available in the Maestro software
(Schrödinger, LLC, New York, NY, 2017), for hydrogen insertion
and force field atom-types and partial charges assignment. The systems
for the simulations were built with the “System Builder”
tool of the Maestro suite, drawing a triclinic box around the domain
1 of LARGE1 with a distance buffer of 10 Å between the protein
and the side of the box, and filling it with SPC water molecules.
A 0.15 mol/L NaCl salt concentration was added, and additional Cl– ions were added to neutralize the system (Table ). The resulting systems
both consisted of a 71 × 76 × 71 Å-sized box containing
a total of about 30000 atoms, among which ∼8800 were water
molecules and ∼4590 were protein atoms (Table ).
Table 1
Details of the Starting
Structures
for MD Simulations
system property
WT
S331F mutant
number of atoms
30,017
30,963
protein residues
276
276
protein atoms
4585
4594
water molecules
8773
8752
Cl– ions
31
31
Na+ ions
24
24
Mn2+ ion
1
1
size
(Å)
71.2 × 66.4 × 70.5
71.2 × 66.4 × 70.5
ligand atoms
55
55
protein charge
+7
+7
ligand charge
–2
–2
The two systems were first minimized in order to relax the molecules
into a local energy minimum and to remove the steric clashes, using
the default protocol consisting of an initial steepest descent phase
followed by a minimization with the LBFGS method, until convergence.
The systems were then equilibrated using the Desmond standard NPT
relaxation protocol with default parameters. We performed duplicate
MD simulations of 500 ns each, using different random seeds for the
assignment of the initial velocities, at a constant pressure of 1
atm and 300 K, saving the energies and the trajectories every 20 ps.
The Martyna–Tobias–Klein method was used for pressure
coupling, and the Nose–Hoover thermostat was employed for the
temperature bath, using the default settings for both, while a cut-off
of 9 Å was used for the non-bond interactions. The analysis of
trajectories was performed with the “simulation event analysis”
tool of Maestro (Maestro-Desmond Interoperability Tools, Schrödinger,
New York, NY, 2017), and the solvent accessible surface area (SASA)
over the simulation time was calculated with VMD.[56] Discovery Studio (Dassault Systèmes BIOVIA, Discovery
Studio, 2018, San Diego: Dassault Systèmes, 2018), Maestro,
and VMD were used for the visual inspection of the three-dimensional
structures and for the simulations results. The convergence of the
MD simulations was assessed by calculating the root mean square inner
product (RMSIP) using Bio3D library as implemented in the version
3.5.2 of the R package.[57,58]The protein motion
essential for the biological function of LARGE1
was analyzed by principal component analysis (PCA)-based dimensionality
reduction of the atomic fluctuations of the Cα atoms in the
simulated trajectory. The Bio3D library,[58] as implemented in the version 3.5.2 of the R package, was employed
to perform and graphically represent the PCA of the WT and the S331F
mutant trajectories. The conformations from the last 210 ns replica
MD simulations were used for the wild-type (WT) and mutant enzymes.
All the calculations were carried out employing an NVIDIA M4000 GPU
and an Intel Xeon X5660 processor, on a HPZ820 workstation running
Linux Centos 7 operating system. FoldX v5.0 was used to estimate the
impact of the S331F mutation on protein stability (http://foldxsuite.crg.eu).[59] The algorithm calculates the free energy of
folding of the WT and the mutated protein and estimates whether the
mutation has a destabilizing (ΔΔG >
0)
or stabilizing (ΔΔG < 0) effect.
Cloning and Expression of LARGE1 Domain 1 in Escherichia
coli
The nucleotide sequence
of LARGE1 domain 1 (residues 138–413) from M.
musculus was synthesized and optimized for expression
in E. coli by Invitrogen (GeneArt gene
synthesis). The synthetic gene was cloned with extremities 5′BamHI-3′EcoRI
downstream the thioredoxin gene in the vector pHisTrx, for expression
in E. coli as a fusion product; the
expression vector thus obtained was transformed into E. coli BL21 (DE3). A single colony from the transformants
was picked and grown o.n. in LB + Amp 100 mg/L, then diluted 100-fold
in fresh LB + Amp 100 mg/mL, and grown at 30 °C shaking at 220
rpm until OD600 ≈ 0.8; the expression was induced by adding
IPTG to 0.5 mM final concentration. Protein production was checked
on single aliquots collected at specific times after induction by
lysing the cell pellets and running them on sodium dodecyl sulfate-polyacrylamide
gel electrophoresis (SDS-PAGE). The bulk of cells was collected 3
h after induction, and the cell pellet was resuspended and lysed;
the soluble proteins were separated from the insoluble ones by high-speed
centrifugation and finally checked on SDS-PAGE in order to identify
which fraction contained the fusion product. To increase the solubility
of LARGE1 domain 1, the same construct was used to transform Arctic
Xpress E. coli BL21 (DE3) competent
cells (Agilent Technologies) following the manufacturer instructions.
A single colony from the transformants was picked and grown o.n. in
LB containing ampicillin 100 mg/L and gentamicin 20 mg/L, then diluted
200× in fresh LB containing no selection antibiotic, and grown
at 30 °C shaking at 220 rpm for 3 h. The expression was then
induced by adding IPTG to 1 mM final concentration and growing the
cells for additional 24 h at 12 °C shaking at 200 rpm. Cells
were harvested, resuspended in binding buffer (20 mM Tris HCl, 0.5
M NaCl, 15 mM imidazole, 2 mM Mn2+), and lysed in a cell
disruptor (Constant Systems, model Z plus 1.1 KW). The soluble and
insoluble fractions were separated by centrifugation, and the supernatant
was loaded on a 5 mL His trap Ni2+ column equilibrated
in binding buffer, for isolation of the His-tagged fusion product
by binding and imidazole elution. Expression and purification outcomes
were checked on SDS-PAGE.
Results and Discussion
Molecular
Modelling
A schematic summarizing the workflow
of our molecular modelling approach is provided in Figure .
Figure 2
Schematic of the methodology
employed for modelling LARGE1 domains
1 and 2.
Schematic of the methodology
employed for modelling LARGE1 domains
1 and 2.
Molecular Modelling of
LARGE1 Domain 1
We decided to
use the murine LARGE1 sequence for molecular modelling due to the
very high degree of sequence similarity between murine and human variants
(100% sequence identity for domain 1 and 98.5 for domain 2, Supporting Information, Figure S1). It is worth
noting that we have recently shown that the X-ray structures of murine
and human N-terminal of α-dystroglycan (displaying the same
degree of sequence similarity found between human and murine LARGE1)
are identical.[60] As for the human point
mutation of LARGE1 that we have introduced and analyzed in the model
and in agreement with the above, all the residues in the region specifically
perturbed by the mutation S331F in murine LARGE1 are identical to
those in its human counterpart.The xylosyltransferase and glucuronyltransferase
activities of LARGE1 have been attributed to two different domains
of the protein.[18] Indeed, the identification
of functional domains by the CDD server revealed the presence of two
domains, namely, domain 1 (residues 138–413) and domain 2 (residues
473–742), both predicted to belong to two glycosyltransferase
families (family 8 and 49, respectively), in agreement with the data
reported in the literature.[18,61] The absence of a suitable
template structure in the Protein Data Bank for homology modelling
of the whole-length protein led us to generate three-dimensional model
structures of the two identified domains individually. The HHpred
analysis on domain 1 (residues 138–413) identified three potential
templates in the Protein Data Bank: the crystal structure of a protein
belonging to the glycosyltransferase family 8 from Anaerococcus prevotii (PDB code 3TZT(62)), the crystal structure of the galactosyltransferase LgtC
from Neisseria meningitidis (PDB code 1G9R(63)), and the crystal structure of the xyloside xylosyltransferase
1 from M. musculus (PDB code 4WMA(64)). Amongst these, 1G9R (2 Å resolution, 19% sequence identity) was selected
as the template structure because it exhibited the lowest number of
missing residues. The HHpred alignment was then used by MODELLER for
model building. Structural models were generated, and the highest-scoring
model (lowest PDF total energy) was selected for further analysis.
The quality of the model was evaluated considering the overall stereochemistry
by PROCHECK, and the resulting Ramachandran plot showed that 95.7%
of residues was located in the most favorable regions (core and allowed),
whereas the 3.9% was located within the generously allowed region
and only the 0.4%, confined to Lys 386, in the disallowed region.
The interaction energy of each residue with the structural surrounding
environment, calculated with PROSA-Web, gave a global Z-score of −5.6 (276 residues), which falls well within the
accepted range for proteins of the same size and indicates an overall
good model quality (Supporting Information, Figure S2A). The PROSA-Web local quality analysis of the model,
represented in the energy profile plot, shows that all the residues,
except those in regions 265–285, have negative scores, where
the more negative is the score, the more correctly modeled is the
region (Supporting Information, Figure
S2B).For comparison, an alternative approach was used and the
model
of domain 1 was also generated on the I-TASSER server, which is based
on a combination of multiple-threading alignments, iterative template
fragment assembly simulations, and ab initio modelling
algorithms. The template search stage of the I-TASSER procedure identified
the crystal structures of the galactosyltransferase LgtC from N. meningitidis (PDB entry codes: 1G9R,[63]1GA8,[63] and 1SS9(65)) of a glycosyltransferase
family 8 from A. prevotii (PDB code 3TZT(62)) and of the xyloside-α-1,3-xylosyltransferase 1 (XXYLT1)
from Homo sapiens (PDB code 4WM0(64)) as the best templates. In agreement with the HHpred results,
all the templates detected by I-TASSER displayed the typical GT-A
fold, consisting of two α/β/α domains with a continuous
central β-sheet,[66] all belonging
to family 8. Identified templates share sequence identity with the
target sequence ranging from 19 to 24% (Supporting Information, Table S1). Five I-TASSER models were obtained,
and the best scoring model displayed a TM-score of 0.79 ± 0.09,
an estimated root-mean-square deviation (rmsd) of 4.8 ± 3.1 Å
and, on a scale of accuracy ranging from −5 (lowest) to 2 (highest),
a C-score value of 0.57. The model structure obtained
from the I-TASSER procedure featured a lower percentage of residues
(87%) in the core and allowed regions of the Ramachandran plot and
6.7 and 5.9% residues in the generously allowed and disallowed regions,
respectively. The ProSA-web local quality analysis of this model reported
negative energy values for all the residues, with the exception of
the region 271–296, and the global quality analysis (276 residues)
gave a Z-score of −6.92, which falls within
the range of the experimentally resolved structures of the same size
(Supporting Information, Figure S3A,B).
The models generated by I-TASSER and HHpred were submitted to the
COFACTOR software,[67] which identifies the
most structurally similar enzymes in the Protein Data Bank, performing
a local and global structure match. This analysis showed that the
two generated models share the highest structural similarity with
the crystallographic structure of the glycosyltransferase LgtC from N. meningitidis (PDB code number 1SS9), displaying a TM-score
of 0.89 and 0.92 for the I-TASSER and HHpred model, respectively,
as well as a Cα-rmsd lower than 2 and a sequence coverage of
94.6% for both the model structures. Given that a TM-score greater
than 0.7 indicates that the structures have a 90% probability of being
in the same topology family,[68] the two
models of domain 1 can thus confidently be assigned to the GT-A glycosyltransferase
family. For comparison, the calculated Cα-rmsd values between
the MODELLER and I-TASSER LARGE1 models are 2.3 Å (domain 1)
and 4.1 Å (domain 2), indicative of similar models. According
to the validation results, the model structure obtained by the combined
HHpred/MODELLER approach was selected as the best model for LARGE1
domain 1 for further analysis. The model structure of domain 1 consists
of a central seven-stranded mainly parallel β-sheet (Figure A), sandwiched between
14 α-helices and a small β-sheet, which is conserved in
all the proteins with a GT-A fold (Figure A,B).
Figure 3
Structure and topology of LARGE1 domain
1. (A) Topology diagram
of the domains. Helices are shown as red spheres, and β-strands
are shown as yellow triangles. (B) Structural model represented as
a cartoon, and the secondary structure is colored (helices in red,
β-strands in yellow, and random coils in magenta). Individual
side chains are represented in green as balls and sticks.
Structure and topology of LARGE1 domain
1. (A) Topology diagram
of the domains. Helices are shown as red spheres, and β-strands
are shown as yellow triangles. (B) Structural model represented as
a cartoon, and the secondary structure is colored (helices in red,
β-strands in yellow, and random coils in magenta). Individual
side chains are represented in green as balls and sticks.On one side of the β-sheet, helix-C contacts the β1
and β2 strands, whereas helices G, J, and K contact the rest
of the β-sheet. On the other side, a small β-sheet (strands
β5 and β9) is located between helices A, B, and N that
are in contact with the β1, β2, and β3 strands,
whereas helices D, L, M, and E are in contact with the β6 and
β8 strands (Figure ). As found in other GT-A folded proteins, the small β-sheet
formed by two antiparallel strands (β5 and β9) is linked
to the β4 strand by a loop (L4) that contains a conserved DXD
motif likely to bind the manganese cation (Mn2+)[69] responsible for the catalytic activity of domain
1.[18,22] The Mn2+ ion has a crucial role
in the catalytic activity of LARGE1, and it is coordinated by the
DXD motif residues as well as by the α- and β-phosphate
moieties of the substrate.[63] In the predicted
model structure of LARGE1 domain 1, the Mn2+ ion is coordinated
by the NE2 atom of His380 of the β9 strand and by the carboxylate
oxygen atoms of Asp242 and Asp244 belonging to the β4−β5
linker loop L4 (Figure B). Analogous interactions are established by the Mn2+ ion in the neighbor crystallographic structure of LgtC from N. meningitidis (1G9R) where the cation is coordinated by His244,
Asp103, and Asp105, respectively (ref (63), Figure ).
Figure 4
Structure-based sequence alignment of LARGE1 domain 1 and LgtC
from N. meningitidis. Numbering is
relative to the sequence of LARGE1, and the secondary structure elements
of LgtC are based on Persson et al.[63] The
figure was prepared using the ESPript server (http://espript.ibcp.fr). Identical
residues are shown in white on red, and homologous residues are in
red letters. Secondary structure details are represented (helices:
squiggles, beta strands: arrows, and turns: TT letters).
Structure-based sequence alignment of LARGE1 domain 1 and LgtC
from N. meningitidis. Numbering is
relative to the sequence of LARGE1, and the secondary structure elements
of LgtC are based on Persson et al.[63] The
figure was prepared using the ESPript server (http://espript.ibcp.fr). Identical
residues are shown in white on red, and homologous residues are in
red letters. Secondary structure details are represented (helices:
squiggles, beta strands: arrows, and turns: TT letters).Full inspection of the structural alignment reveals that
all major
secondary structure elements of the model of LARGE1 domain 1 superimpose
almost exactly with those of the LgtC glycosyl transferase, with only
few exceptions related to the helical content (Figure ).
Molecular Modelling of LARGE1 Domain 2
An analogous
analysis was performed for LARGE1 domain 2. The top-ranking templates
identified with a 99.7% probability by the template search methods
of HHpred were all N-acetylgalactosaminyl transferases,
namely, 1XHB,[70]6E4R,[71]6H0B,[72]6IWR,[73] and 2D7I.[74]1XHB, the crystal structure
of the UDP-GalNAc: polypeptide α-N-acetylgalactosaminyltransferase-T1
from M. musculus, was selected as the
best template because it exhibited the highest P-value
(2.0 × 10–19), highest probability (99.7%),
and a sequence identity of 10% with the target. The stereochemical
quality of the model with the lowest PDF value, generated by MODELLER
on the basis of the HHpred alignment, was evaluated using PROCHECK.
The Ramachandran plot calculation revealed that 91.5% of the residues
are found in the most favorable regions (core and allowed), 3.7% in
the generously allowed region, and the remaining 4.9% in the disallowed
region. ProSA-Web analysis of the model (270 residues) revealed a Z-score value of the target protein of −2.69, that
is, within the range of the experimental protein structures of the
same size, which indicates an acceptable overall quality of the model
structure. However, the ProSA-Web analysis revealed that about 50%
of the residues display positive values of interaction energy, suggesting
errors in some regions of the model (Supporting Information, Figure S4A,B). The crystallographic templates
found by I-TASSER to be suitable for a threading-ab initio model were the crystal structure of chondroitin polymerase from E. coli (PDB code 2Z86(75)), the structure
of the human UDP-GalNAc:polypeptide α-N-acetylgalactosaminyltransferase-T2
(PDB codes 2FFU,[76]5AJO[77] and 4D0T(78)), and the crystal structure of the human pp-GalNAc-T10
(pdb code 2D7I(74)). Notably, the PDB entry 4D0T was also used as
a template for LARGE1 domain 2 model building in the work published
by Bhattacharya.[39] Sequence identities
between identified templates and LARGE1 domain 2 fall in the range
17–21% (see Supporting Information, Table S2, for details). The top scoring model of domain 2 produced
by I-TASSER had a C-score value equal to −0.82, an estimated
TM-score of 0.61 ± 0.14 Å and an estimated rmsd of 7.8 ±
4.4 Å. The closest structural analogue to the model of domain
2, found by I-TASSER in the Protein Data Bank is 2FFU, with a TM-score
of 0.87 (rmsd 1.61 Å), where a TM-score higher than 0.5 generally
indicates the same fold in SCOP/CATH. In agreement with the study
of Bhattacharya and colleagues,[39] the crystal
structure with PDB code 4D0T, that is, the human glycosyltransferase GalNAc-T2
in complex with UDP-GalNAc, the EA2 peptide, and manganese ion, was
also found to have a close structural similarity to the model of LARGE1
domain 2 (TM-score of 0.84 and rmsd of 1.66 Å). The analysis
of the stereochemical quality of this model, assessed with PROCHECK,
showed that 95.5%, 1.6, and 2.8% of the residues are located within
the core and allowed, generously allowed, and disallowed regions of
the Ramachandran plot, respectively. The overall model quality shows
a ProSA-web Z-score of −7.2, in agreement
with the Z-scores calculated for the experimentally
resolved structures in the Protein Data Bank (Supporting Information Figure S5A). The ProSA-web server showed
that the interaction energy of each residue with the rest of the protein
has negative values, apart from the residues within the two small
regions 584–590 and 597–601 (Supporting Information, Figure S5B). Comparison of the results obtained
using the two modelling approaches led to the selection of the structural
model built with I-TASSER as the best model of LARGE1 domain 2 (Figure ).
Figure 5
Structure and topology
of LARGE1 domain 2. (A) Topology diagram
of the domains. Helices are shown as red spheres, and β-strands
are shown as yellow triangles. (B) Structural model is represented
as a cartoon, and the secondary structure is colored (helices in red,
β-strands in green, and random coils in magenta). Individual
side chains are represented as green balls and sticks.
Structure and topology
of LARGE1 domain 2. (A) Topology diagram
of the domains. Helices are shown as red spheres, and β-strands
are shown as yellow triangles. (B) Structural model is represented
as a cartoon, and the secondary structure is colored (helices in red,
β-strands in green, and random coils in magenta). Individual
side chains are represented as green balls and sticks.Although the I-TASSER server performed better than HHpred
in terms
of quality of the final model, the results from both methods agree
on pointing toward domain 2 to be classified as an enzyme of the GT-A
family, with a Rossman-like fold consisting of a two sandwiched β-sheets
interposed between four α-helices on both sides (Figure A). Interestingly, although
the fold, especially the arrangement of the β-sheets, is conserved
between the two domains of LARGE1, the three-dimensional model of
domain 2 shows a lower number of strands with respect to domain 1
(eight vs nine) (Figures A and 5A). Strand β8 undergoes
a distortion at the level of Pro699 that makes its C-terminal portion
parallel to the β6 strand and its N-terminal portion antiparallel
to the β5 strand, thus creating the double β-sheet structure
shown in Figure B.Interestingly, the third DXD motif, that in domain 2 is DID (563–565),
essential for the glucuronyltransferase activity of LARGE1,[18,22] is located in a small unstructured turn between the strands β4
and β5, and it is structurally aligned with the DXH motif of
the structural neighbor UDP-GalNAc:polypeptide α-N-acetylgalactosaminyltransferase-T2 from H. sapiens. The turn region is located deep into the binding cleft of the enzyme,
with the two aspartates (Asp563 and Asp565) pointing toward the solvent
and the isoleucine (Ile564) buried in the protein core, indicating
an orientation that could favor the coordination of the carboxyl groups
of the aspartates to a metal ion cofactor such as Mn2+[79] (Figure B). It is noteworthy that His705, in the β8 strand,
is conserved in 2FFU (His359, see sequence alignment in Figure ) and that in the crystal structure, this
residue coordinates the metal ion.
Figure 6
Structure-based sequence alignment of
LARGE1 domain 2 and UDP-GalNAc:Polypeptide
α-N-acetylgalactosaminyltransferase-2. The
figure was prepared using the ESPript server (http://espript.ibcp.fr). Identical
residues are shown in white on red, and homologous residues are in
red letters. Secondary structure details are represented (helices:
squiggles, beta strands: arrows, and turns: TT letters).
Structure-based sequence alignment of
LARGE1 domain 2 and UDP-GalNAc:Polypeptide
α-N-acetylgalactosaminyltransferase-2. The
figure was prepared using the ESPript server (http://espript.ibcp.fr). Identical
residues are shown in white on red, and homologous residues are in
red letters. Secondary structure details are represented (helices:
squiggles, beta strands: arrows, and turns: TT letters).His705 in the model of domain 2 is in close contact with
the aspartates
563 and 565 (distance shorter than 4 Å), suggesting an involvement
of this basic residue in the catalytic cluster (Figure B).
Effects of the S331F Mutation on the Structure
and Dynamics
of LARGE1 Domain 1
The mutation of the LARGE1 residue Ser331
to Phe has been known for some time to cause a congenital muscular
dystrophy phenotype characterized by eye malformations and mental
retardation.[26] According to our in silico generated structure of domain 1, Ser331 is located
in a random coil region (residues 325–333) between helices
I and J, on the rim of the active-site cleft and close to Asn213,
which belongs to the coil region between strand β3 and helix
C (residues 200–219) (Figure B). Molecular dynamics simulations of the WT and of
the S331F mutant were carried out to explore the conformational changes
that occur upon S331F replacement and to assess its structural and
dynamics effects on the enzyme function, with the final aim of understanding
the molecular mechanism at the basis of the pathological alterations
induced by this mutation. Two different MD replicates were performed
for both WT and S331F mutant. The stability of the two systems has
been evaluated by measuring the rmsd of the alpha carbons with respect
to their positions at time 0, over the simulation time. After an equilibration
of 40 ns, the rmsd reached a plateau with a stable value of 4.4 ±
0.2 and of 4.5 ± 0.2 Å for WT and S311F, respectively (Figure A), until the end
of the simulations. An analysis of the Cα atom root mean square
fluctuation (RMSF) with respect to their average positions calculated
over the last 200 ns was then performed, in order to identify the
regions of the protein whose movements are mostly affected by the
replacement of Ser311 with a Phe residue (Figure B). The results indicate that most of the
residues share similar fluctuations. The differences in RMSF between
the two simulated systems are shown in Figure C, which reveals larger fluctuations from
the averaged MD conformations in the unstructured regions spanning
residues 178–180, 290–292, 324, 367–368, and
410–413.
Figure 7
Analysis of MD trajectories. (A) Cα-rmsd from the
starting
structure. (B) Residue-based Cα-RMSF relative to the average
structure. (C) ΔRMSF: difference between the RMSF of the mutant
S331F and that of the WT for each residue. A positive value indicates
larger fluctuations for the mutant, whereas a negative value means
larger fluctuations for the WT. Plots relative to the WT system are
colored in blue and those relative to the S331F mutant in orange.
Analysis of MD trajectories. (A) Cα-rmsd from the
starting
structure. (B) Residue-based Cα-RMSF relative to the average
structure. (C) ΔRMSF: difference between the RMSF of the mutant
S331F and that of the WT for each residue. A positive value indicates
larger fluctuations for the mutant, whereas a negative value means
larger fluctuations for the WT. Plots relative to the WT system are
colored in blue and those relative to the S331F mutant in orange.Although several residues were found to display
significant differences
in RMSF upon mutation, the important catalytic residues such as the
two DXD motifs and His380 were not affected in their atomic fluctuations.
An alteration of total, hydrophobic, and hydrophilic SASAs in the
mutant protein with respect to the WT was also observed. A representative
plot of SASA for all-atom, hydrophobic, and polar residues against
time is shown in Figure .
Figure 8
Analysis of MD trajectories. Time series of the SASA of LARGE1
domain 1. The evolution of the total, hydrophobic, and hydrophilic
SASA over the simulation time is shown in panels (A–C), respectively.
(D) Percentage of hydrophobic SASA (hyd-SASA) and hydrophilic SASA
(phil-SASA) with respect to the total SASA. Each time-series refers
to both the simulation replicas. For color code, see Figure .
Analysis of MD trajectories. Time series of the SASA of LARGE1
domain 1. The evolution of the total, hydrophobic, and hydrophilic
SASA over the simulation time is shown in panels (A–C), respectively.
(D) Percentage of hydrophobic SASA (hyd-SASA) and hydrophilic SASA
(phil-SASA) with respect to the total SASA. Each time-series refers
to both the simulation replicas. For color code, see Figure .Compared to the WT, the S331F mutant showed higher all-atom SASAs,
which converged to an average value of 15145 ± 248 Å2 (14777 ± 327 Å2 in the WT protein).
Notably, the hydrophobic SASA of the mutant increased about 12.7%
over the simulation time with respect to the WT protein (5115 ±
265 and 4538 ± 166 Å2, respectively) (Figure ), a feature that
could significantly affect the interactions at the protein surface.In order to check the convergence of the MD simulations, we have
performed the RMSIP analysis on two equilibrated regions of the trajectories.
The obtained high RMSIP values of 0.80 (replica#1), 0.82 (replica#2)
and 0.82 (replica#1), 0.83 (replica#2) for the WT and mutant proteins,
respectively, are indicative of an adequate level of convergence.The local conformational changes that occur upon S331F mutation
have been inspected by analyzing the non-bonding interactions established
by Ser/Phe331. In the WT simulations, Ser331 is hydrogen bonded, by
means of its OG and O backbone oxygen atoms, to Asn213 (O backbone
oxygen, ND2 and OD1 atoms), whereas in the mutant, Phe331 moves away
from Asn213, thus causing a conformational change in loops 200–219
and 325–333 (Figures and 10). All plots of the duplicate
trajectories were similar and are shown in the Supporting Information (Figures S6–S8).
Figure 9
Representative structures
of the simulated LARGE1 WT and S331F
mutant. The structurally closest frame to the averaged structure from
the MD simulations (obtained selecting the frame with the lowest Cα-rmsd
with respect to the average Cα-coordinates calculated over the
equilibrated trajectory) is reported for each simulated system. Panel
(A) shows the average structures from the WT simulation, and panel
(B) shows the average structures from the S331F mutant simulation.
The backbone is represented as tube colored in grey, the atomic details
are represented as sticks colored by atom type, and the black dashed
line represents the hydrogen bond interaction.
Figure 10
Analysis
of MD trajectories. Frequency of the hydrogen bonds (%)
between Ser331 and Asn213 in domain 1 WT (A) and S331F (B). Time evolution
of the distances between the geometric centroids of the Phe331 and
Asn213 side chain atoms (C) and between the O backbone atoms of Phe331
and Asn213 (black and grey, respectively).
Representative structures
of the simulated LARGE1 WT and S331F
mutant. The structurally closest frame to the averaged structure from
the MD simulations (obtained selecting the frame with the lowest Cα-rmsd
with respect to the average Cα-coordinates calculated over the
equilibrated trajectory) is reported for each simulated system. Panel
(A) shows the average structures from the WT simulation, and panel
(B) shows the average structures from the S331F mutant simulation.
The backbone is represented as tube colored in grey, the atomic details
are represented as sticks colored by atom type, and the black dashed
line represents the hydrogen bond interaction.Analysis
of MD trajectories. Frequency of the hydrogen bonds (%)
between Ser331 and Asn213 in domain 1 WT (A) and S331F (B). Time evolution
of the distances between the geometric centroids of the Phe331 and
Asn213 side chain atoms (C) and between the O backbone atoms of Phe331
and Asn213 (black and grey, respectively).A weak destabilizing effect of the S331F mutation (ΔΔG = 0.21 kcal/mol) was also calculated by FoldX. This ΔΔG value, which cannot be directly related to a change in
enzyme function,[80] suggests that a phenylalanine
in position 331 is not fully compatible with the surrounding chemical
environment, as already observed in MD. Time evolution of the secondary
structure elements along the MD simulation was also analyzed, showing
the stability of the β-strand elements during the entire MD
simulations and the structural transition of few α-helices into
310-helices (Supporting Information, Figure S9).In order to probe the changes in conformational
dynamics induced
by the S331F mutation, analysis of the MD trajectory by PCA using
the Ca atoms was also performed. Results indicate that the first twenty
principal components (PCs) account for 90.7% (WT) and 92.8% (mutant)
of the motions observed in the last 400 ns of the trajectories, whereas
the first three PCs account for the 74.5% (WT) and the 82.3% (mutant)
motions (Supporting Information, Figure
S10A). The trace values of the diagonalized covariance matrix were
found to be 1706.9 Å2 (WT) and 2145.8 Å2 (mutant), indicating an increased flexibility of S331F in the collective
motion as compared to the WT (Supporting Information, Figure S10A,B). Analysis of the contribution of each residue to
the first principal component revealed that the Ser 331 to Phe mutation
alters the protein motion in the loop regions surrounding the residue
331. Notably, the peaks corresponding to residues 211–216 and
282–291 are shifted backward (to 201–210 and 276–288,
respectively) in the mutated protein (Supporting Information, Figure S10C). This analysis suggests that the
S331F mutation introduces local differences in the motion of the active
site rim of domain 1, which might affect the interaction with the
substrate.
Cloning and Expression of LARGE1 Domain 1
in E. coli
A series of first
attempts at heterologous
expression of domain 1 as revealed by our modelling results were carried
out. The DNA sequence coding for the region comprising residues 138–413
of M. musculus LARGE1 was custom-synthesized
and optimized for the expression in E. coli. The synthetic gene thus obtained was cloned into the pHisTrx vector
for expression downstream the fusion partner His6-tagged
thioredoxin and subsequently transformed into E. coli strain BL21 (DE3). Although preliminary, the evidence that the protein
can be recovered in the soluble form upon heterologous expression
in the presence of helper proteins such as the chaperonin GroEL (see Supporting Information, Figures S11–S13)
reinforce the notion that the subdomain 1 represents an autonomous
folding unit, as suggested by our modelling results.Further
experiments are granted aimed at producing quantitative amounts of
domain 1 for future biochemical and structural studies. Recombinant
expression of individual LARGE1 domains 1 and 2, as well of the two
domains together in a full-length LARGE1 construct, would allow a
thorough analysis of the enzymatic proprieties and activity of this
glycosyltransferase. The two catalytic domains of LARGE1 are connected
by a flexible linker (ranging from amino acid 414 to 472) that due
to its nature could not be accurately modeled. A very interesting
question is whether the two domains establish long-range interactions
and work somehow cooperatively during the sequential synthesis of
each added disaccharide block (i.e. the final product
of every full LARGE1 enzymatic cycle) or rather function in an independent
fashion. Whether the binding of each substrate to each catalytic domain
induces any allosteric long-range effects influencing the enzymatic
behavior of the other catalytic domain remains to be determined at
the current stage. Noteworthy, also the N-terminal region of α-dystroglycan[81] harbors two different domains (IG1 and S6),
and it is intriguing to foresee a scenario in which the two DG domains
would be involved reciprocally in chaperoning the activity of the
two catalytic domains of LARGE1.
Conclusions
Our
combined use of molecular modelling approaches allowed for
a three-dimensional description of the two domains respectively responsible
for the xylosyltransferase and glucuronyltransferase activities of
LARGE1. This work characterized the three DXD motifs, each likely
to bind a manganese cation (Mn2+), that constitute the
active sites for the glycosyltransferase activities of the two domains,[22,69,82] thus offering some initial insight
into the possible mechanism of binding of the sugar donor and acceptor.
Molecular dynamic simulations of the S331F mutant, carried out in
comparison with the WT counterpart, points to a more flexible and
less stable enzyme, as suggested by an increased RMSF and all-atom
and hydrophobic SASA parameters, compared to the WT protein. Indeed,
a reduced stability could be at the basis of the poorer enzymatic
activity of this variant of LARGE. This would eventually lead to the
hypoglycosylation of α-DG observed in a patient affected by
a severe form of congenital muscular dystrophy with anomalies of eye
and brain linked to this missense mutation.[26]Our study demonstrates that in the absence of experimental
structural
data on LARGE1, template-based three-dimensional modelling of the
enzyme can offer some insight into its overall structural features,
and the analysis of its dynamic behavior by in silico methods can help to elucidate the molecular mechanism underneath
the diseases linked to missense mutations of the enzyme. In addition,
our modelling study might pave the road for the analysis of potential
complexes formed by LARGE1 and the α-dystroglycan N-terminal
domain (αDGN), whose structure we have previously solved.[81]
Authors: E Mercuri; S Messina; C Bruno; M Mora; E Pegoraro; G P Comi; A D'Amico; C Aiello; R Biancheri; A Berardinelli; P Boffi; D Cassandrini; A Laverda; M Moggio; L Morandi; I Moroni; M Pane; R Pezzani; A Pichiecchio; A Pini; C Minetti; T Mongini; E Mottarelli; E Ricci; A Ruggieri; S Saredi; C Scuderi; A Tessa; A Toscano; G Tortorella; C P Trevisan; C Uggetti; G Vasco; F M Santorelli; E Bertini Journal: Neurology Date: 2009-03-18 Impact factor: 9.910
Authors: Emma Clement; Eugenio Mercuri; Caroline Godfrey; Janine Smith; Stephanie Robb; Maria Kinali; Volker Straub; Kate Bushby; Adnan Manzur; Beril Talim; Frances Cowan; Ros Quinlivan; Andrea Klein; Cheryl Longman; Robert McWilliam; Haluk Topaloglu; Rachael Mein; Stephen Abbs; Kathryn North; A James Barkovich; Mary Rutherford; Francesco Muntoni Journal: Ann Neurol Date: 2008-11 Impact factor: 10.422
Authors: Takako Yoshida-Moriguchi; Tobias Willer; Mary E Anderson; David Venzke; Tamieka Whyte; Francesco Muntoni; Hane Lee; Stanley F Nelson; Liping Yu; Kevin P Campbell Journal: Science Date: 2013-08-08 Impact factor: 47.728
Authors: David C Briggs; Takako Yoshida-Moriguchi; Tianqing Zheng; David Venzke; Mary E Anderson; Andrea Strazzulli; Marco Moracci; Liping Yu; Erhard Hohenester; Kevin P Campbell Journal: Nat Chem Biol Date: 2016-08-15 Impact factor: 15.040
Authors: Bakar A Hassan; Jozafina Milicaj; Carlos Andres Ramirez-Mondragon; Yuk Yin Sham; Erika A Taylor Journal: J Chem Inf Model Date: 2021-12-30 Impact factor: 4.956
Authors: G Diane Shelton; Katie M Minor; Ling T Guo; Steven G Friedenberg; Jonah N Cullen; Jeffrey M Hord; David Venzke; Mary E Anderson; Megan Devereaux; Sally J Prouty; Caryl Handelman; Kevin P Campbell; James R Mickelson Journal: Neuromuscul Disord Date: 2021-07-28 Impact factor: 4.296
Authors: Carlos A Ramirez-Mondragon; Megin E Nguyen; Jozafina Milicaj; Bakar A Hassan; Frank J Tucci; Ramaiah Muthyala; Jiali Gao; Erika A Taylor; Yuk Y Sham Journal: Int J Mol Sci Date: 2021-04-28 Impact factor: 5.923