Zhi Ping1, Feng Zhou1, Xin Lin1, Haibin Su1,2. 1. Institute of Advanced Studies, Nanyang Technological University, 60 Nanyang View, 639673 Singapore. 2. Department of Chemistry, The Hong Kong University of Science and Technology, Hong Kong, China.
Abstract
Aquaporins are transmembrane channel proteins whose key function is the transport of water or other small substrates. Escherichia coli Aqp Z transports only water molecules, whereas Glp F is permeable to glycerol. It is intriguing to explore whether glycerol permeability can be induced in Aqp Z by targeted mutations. Aqp Z mutants with mutated selectivity filter (SF) residues exhibit poor permeability to both glycerol and water. To address the complexity of protein systems, pair correlation information from protein sequence analyses is instructive for identifying residues that are coupled by coevolution and motion. In this study, we analyze the correlations between residues and unravel the clustering patterns of coupled residues, beyond the SF residues, in aquaglyceroporins (AQGPs). We propose introducing the identified coupled motifs into the aquaporin Aqp Z to confer glycerol permeability. These residues are located in the vicinity of the SF region, the C-loop, and the M6–M7 linker domain. All-atom replica exchange molecular dynamics simulations show a significant enlargement of the SF pore in the proposed Aqp Z mutant, which is critical for substantial glycerol passage, as characterized by calculated free-energy landscapes. Clearly, the hidden connections among residues play crucial roles in water/glycerol selectivity. In contrast, a single-site mutation-based scheme may even lead to undesirable effects in AQGPs, such as the blocking of water transport by an aromatic π-stacked gate. As demonstrated in this work, rational mutagenesis guided by pair correlation analysis provides a feasible strategy to modulate protein function.
Aquaporins are a subfamily of transmembrane channel proteins belonging to the major intrinsic proteins.[1] They exist in most prokaryotic and eukaryotic organisms.[2] The main function of aquaporins is the selective permeation of water and other small substrates across the biological membrane. Orthodox aquaporins (referred to as AQPs) are exclusively permeable to water, whereas the aquaglyceroporins (referred to as AQGPs), or glycerol facilitator proteins (GLPs), are permeable to both water and glycerol. Since the crystal structure of aquaporin 1 was determined in 2000,[2] more structures belonging to the aquaporin family have been determined over the last two decades.[3−6] Aquaporins play crucial roles in water transport in metabolic pathways. In particular, clinical and therapeutic studies have established an intimate correlation between aquaporin-deficient mutations at particular sites and various diseases, including cataracts,[7] nephrogenic diabetes insipidus,[8] Alzheimer’s disease,[9,10] and Parkinson’s
disease.[11]
Aquaporins share very similar structures. They form tetrameric channels, each monomer having six transmembrane helices and two half-helices (Figure 1A). Asn-Pro-Ala (NPA) motifs, located in the two half-helices (M3 and M7) in the middle of the channel lumen, are highly conserved in the aquaporin family. The asparagines reorient water molecules passing through the channel by electrostatic forces, which is necessary for breaking the alternating donor–acceptor arrangement during water transport.[12,13] The most critical part of orthodox aquaporins for water selectivity is the aromatic/arginine (ar/R) region, commonly referred to as the selectivity filter (SF, Figure 1B), which is the narrowest part of the channel.[12,14] This region, with a radius slightly smaller than that of a water molecule, usually consists of four amino acids, including the core residue Arg(R), which acts as the “gate” of the AQP channel.[15] In AQGPs, however, the filter is considerably wider, allowing glycerol to permeate. Moreover, molecular dynamics (MD) simulations of an aquaporin Z mutant have shown that aromatic residues introduced into the SF region form a “hidden gate” through π-stacking interactions, which blocks water passage.[16] Interestingly, both an AQP and an AQGP exist in Escherichia coli: aquaporin Z (Aqp Z)[17] and glycerol facilitator protein F (Glp F),[18] respectively. Their three-dimensional (3D) crystal structures were determined in 2002 and 2003 by Stroud et al.[12,19] (PDB IDs: 1RC2 for Aqp Z and 1LDA for Glp F), providing a solid structural basis for further efforts to modify aquaporin permeability to different substrates by introducing mutations into the selectivity filter. For instance, mutations were introduced into the SF of Aqp Z to induce glycerol permeability, based on the sequence alignment of Aqp Z and Glp F.[20] None of the three sets of mutations and co-mutations yielded the expected result. The SF radius became even smaller after these mutations, which reduced water permeability, let alone enabling glycerol passage. Clearly, beyond the residues in the vicinity of the SF, there exist important residues that, although not interacting directly with substrates, must be taken into consideration to modulate overall structure and selectivity. In the rational design of protein structure and function, varying highly conserved critical residues that are intimately related to protein function or structure is a popular strategy for altering protein function.[21,22] To account explicitly for the pair correlations among residues, the interactions between residues or residue clusters can be represented as a complex network[23] built from high-throughput protein sequence data.[24,25] For instance, the statistical coupling analysis (SCA)[26] method was developed to characterize coevolving residues by analyzing the entropy information encoded in protein sequences, both to predict protein structure[27] and to identify motifs that mediate allosteric communication in proteins.[28]
Figure 1
Structures of Aqp Z and Glp F. (A) Superposition of the 3D structures of Aqp Z (gray) and Glp F (cyan); the two NPA motifs located on M3 and M7 are shown in green. (B) Selectivity filter region of Glp F, consisting of W48, G191, F200, and R206, with a projected pore cross-sectional area of 9.3 Å². (C) Selectivity filter region of Aqp Z, consisting of F43, H174, T183, and R189, with a projected area of 3.9 Å².[20]
In this work, we perform SCA to establish the evolutionary connections among residues of aquaporin proteins and to better understand their effects on water/glycerol transport at the molecular level. This is an important extension of the previous scheme applied to Aqp Z mutants, which focused on SF residues.[20] The identified critical coupled residues and the unraveled hidden correlation patterns are further analyzed with the aid of 3D structures to address their roles in modulating structure and the related permeability. Subsequent in silico studies with fully atomistic replica exchange molecular dynamics (REMD)[29,30] simulations are conducted on Aqp Z mutants to directly measure the residues’ influence on the transport channel. Finally, quantitative free-energy landscapes along the channels of Aqp Z mutants are computed by the potential of mean force (PMF) approach to evaluate the permeability of substrates passing through the channels.[31,32]
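Replica exchange MD runs copies (replicas) of the system at different temperatures and periodically tries to swap the configurations of neighboring replicas. The Python sketch below shows the standard Metropolis acceptance test for such a swap, together with a geometric temperature ladder. The function names, temperature range, and replica count are illustrative assumptions, not the settings used in this work; production MD codes implement this step internally.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant, kcal/(mol K)


def geometric_ladder(t_min, t_max, n_replicas):
    """Temperatures spaced geometrically from t_min to t_max (a common heuristic, not a rule)."""
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio**k for k in range(n_replicas)]


def swap_probability(e_i, t_i, e_j, t_j):
    """Metropolis acceptance probability for exchanging the configurations of replicas i and j.

    Implements min(1, exp[(beta_i - beta_j) * (E_i - E_j)]) with beta = 1 / (k_B T).
    """
    delta = (1.0 / (K_B * t_i) - 1.0 / (K_B * t_j)) * (e_i - e_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_swap(energies, temps, i, rng=random):
    """Attempt one exchange between neighboring replicas i and i + 1; return True if accepted."""
    p = swap_probability(energies[i], temps[i], energies[i + 1], temps[i + 1])
    return rng.random() < p
```

For example, `attempt_swap(energies, geometric_ladder(300.0, 400.0, 8), 0)` attempts an exchange between the two coldest replicas, where `energies` holds hypothetical potential energies in kcal/mol. A swap that moves a higher-energy configuration to the colder temperature is always accepted; the reverse is accepted only with probability below one.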
Results and Discussion
A data set of 305 bacterial aquaporin protein sequences (220 aquaglyceroporins and 85 orthodox aquaporins), processed by multiple sequence alignment, was constructed based on previous work by Lin et al.[33] Each aligned sequence has 192 columns (positions). For convenience, positions are labeled by the corresponding residues in Aqp Z and Glp F, the two representative structures of AQPs and AQGPs, respectively. A detailed cross-reference table is provided in the Supporting Information (SI).
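Such a cross-reference between alignment columns and residue numbers can be generated by walking along each aligned sequence and counting only non-gap characters. A minimal Python sketch, assuming gaps are written as "-" and that the number of the first residue of each sequence is known; the helper names and toy sequences are hypothetical, and the table actually used here is the one in the SI.

```python
def column_to_residue(aligned_seq, first_residue=1, gap="-"):
    """Map each 0-based alignment column to a residue number in one aligned sequence.

    Gap columns map to None; the residue counter advances only at non-gap characters.
    """
    mapping = {}
    number = first_residue
    for col, char in enumerate(aligned_seq):
        if char == gap:
            mapping[col] = None
        else:
            mapping[col] = number
            number += 1
    return mapping


def cross_reference(aligned_a, aligned_b, first_a=1, first_b=1):
    """List (column, residue number in A, residue number in B) for two rows of one alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same number of columns")
    map_a = column_to_residue(aligned_a, first_a)
    map_b = column_to_residue(aligned_b, first_b)
    return [(col, map_a[col], map_b[col]) for col in range(len(aligned_a))]
```

Applied to the Aqp Z and Glp F rows of the alignment, each tuple pairs the two residue numbers that share an alignment column, which is how a position can be named after its counterpart in either protein.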
Strongly Coupled Residues in AQPs
To identify the critical positions within the proteins, we choose the correlated pairs with the highest SCA scores to build a much smaller network (referred to as the “SCA network”). For AQPs, the 50 correlated residue pairs with the highest SCA scores form a 23-node network (Figure 2B). Two residues, Trp(W)14 and Trp209, have the most connections among all nodes in this network, indicating their structural/functional importance.
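Building the SCA network from the highest-scoring pairs and locating its hub positions needs only a sort and a degree count. A minimal Python sketch, in which the node labels and scores are invented placeholders rather than SCA output from this study (in the text, the top 50 AQP pairs span 23 nodes):

```python
from collections import Counter


def sca_network(pair_scores, k=50):
    """Keep the k highest-scoring position pairs as the edges of the SCA network."""
    ranked = sorted(pair_scores.items(), key=lambda item: item[1], reverse=True)
    return [pair for pair, _score in ranked[:k]]


def degree_table(edges):
    """Number of connections of every node in an undirected edge list."""
    degree = Counter()
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return degree


def hubs(edges):
    """Nodes sharing the largest degree (the hub positions)."""
    degree = degree_table(edges)
    top = max(degree.values())
    return sorted(node for node, d in degree.items() if d == top)
```

With real data, `pair_scores` would map (position_i, position_j) tuples to SCA scores; `len(degree_table(sca_network(scores)))` then gives the node count of the network.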
Figure 2
SCA network formed by
highly coupled critical residues in monomer
and tetramer of AQPs. (A) Sequence alignment of AQPs across different
species with critical residues highlighted in red. (B) SCA network
of the 50 highest scored pairs of residues in Aqp Z. Both Trp14 and
Trp209 have the most connections (labeled by dashed boxes). (C) Tetramer
structure of Aqp Z (PDB ID: 1RC2). Trp209 is located in the peripheral part of the
tetramer acting as an anchor to reduce the mobility of the structure.
Trp14 is located in the region close to the adjacent monomer to facilitate
the formation and stabilization of tetramer.
Trp14 in monomeric Aqp Z is located in the peripheral domain, relatively distant from the selectivity filter and the channel. Importantly, Trp14 in M1 of each monomer is the only residue close (<5.5 Å) to M5 and M6 of its adjacent monomer (Figure 2C), which suggests that this residue contributes to the formation and stabilization of the tetramer.[34] In Glp F, the corresponding residue is Leu(L)20, a nonpolar aliphatic rather than aromatic residue, which could make the structural fluctuations relevant to permeability more pronounced in Glp F. Trp209 in Aqp Z is also located in the peripheral domain of the protein. Unlike Trp14, Trp209 is close to the NPA motif, at a shortest distance of ∼3 Å. This allows Trp209 to stabilize the NPA motif by forming a T-shaped π-stacking interaction with Pro(P)187.[35] Furthermore, Trp209 is close to Arg189 of the SF and in contact with Thr(T)191 of M7, supporting the structural integrity of the SF. In Glp F, the corresponding residue is Leu237, which immediately follows Pro236. Because Leu is smaller than Trp, it leaves room for flexibility after the tight turn introduced by Pro236.
Strongly Coupled Residues in AQGPs
Similarly, the 50 highest-scoring correlated residue pairs from SCA of the AQGP sequences form a network with 29 nodes. When drawn with a force-directed graph layout algorithm,[36] this network exhibits three clusters and two isolated correlated pairs (Figure 3B), a pattern very different from that of the AQP SCA network, which is hierarchical, with two “hub” positions and other sparsely connected positions. To some extent, the network patterns of AQPs and AQGPs reflect the evolutionary and functional divergence of the two subfamilies. The dominant hierarchical structure of the AQP network suggests a high degree of specialization for water permeability. In contrast, the AQGP channel allows multiple substrates to pass. Thus, coupled mutations of residues are desirable to accommodate these multifunctional needs.
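The clusters above were identified from a force-directed drawing. As a simple, layout-independent check, and an assumption on our part rather than the procedure used here, one can count the connected components of the edge list: components with more than two nodes correspond to clusters and two-node components to isolated correlated pairs. Clusters joined by a few edges would instead require a community-detection method. A minimal Python sketch:

```python
from collections import defaultdict


def connected_components(edges):
    """Group the nodes of an undirected edge list into connected components (iterative DFS)."""
    graph = defaultdict(set)
    for i, j in edges:
        graph[i].add(j)
        graph[j].add(i)
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components


def summarize(components):
    """Return (number of clusters with >2 nodes, number of isolated two-node pairs)."""
    clusters = [c for c in components if len(c) > 2]
    isolated_pairs = [c for c in components if len(c) == 2]
    return len(clusters), len(isolated_pairs)
```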
Figure 3
SCA network
formed by highly coupled critical residues identified
in the AQGP monomer. (A) Sequence alignment of AQGPs across different
species, with critical residues highlighted in red. (B) Top 50 scored
pairs of residues in Glp F. Three major clusters appear in this network.
Residues close to the SF (distance <10 Å) are distributed
in the yellow-colored cluster. (C) Critical residues in Glp F 3D structure
(PDB ID: 1LDA). Channel lumen is represented by blue dots.
All constituents of the SF in Glp F, i.e., Trp48, Gly(G)191, Phe(F)200, and Arg206, are distributed across the three clusters. Other identified important residues include Phe135, Thr137, Asp(D)207, Lys(K)211, and Pro236. Phe135 and Thr137 belong to the highly conserved FST triad motif located on the C-loop near the SF.[37] Mutation of this motif results in a loss of water permeability.[38] Asp207, Lys211, and Pro236 have been reported to affect the physicochemical properties of AQPs and AQGPs.[39] Interestingly, in an insect aquaporin, mutating Trp237 to Leu237 together with Tyr236 to Pro236 results in glycerol transport accompanied by a loss of water permeability.[40] Because these residues are structurally close to Arg206, they are likely involved in the gating mechanism.[41] Their interactions with nearby residues determine the orientation of the Arg side chain and hence the channel state (open or closed) during substrate transport.[20] Lys83 in Glp F is located at the junction between M3 and M4, relatively far from the SF, so its role could be related to the structural dynamics involved in water/glycerol transport. Here, we focus on 15 residues located within a distance of about three amino acids (∼10 Å) from the SF. These residues are located in various regions of the protein, as listed in Table 1.
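If the ∼10 Å criterion is applied as a spatial cutoff (our reading of the text), selecting residues near the SF reduces to a minimum interatomic distance test. A minimal Python sketch, assuming each residue is given as a list of atom coordinates in Å; the residue names and coordinates are toy values, whereas real input would be read from a structure such as 1LDA.

```python
import math


def min_distance(atoms_a, atoms_b):
    """Smallest Euclidean distance between any atom of residue A and any atom of residue B."""
    return min(math.dist(p, q) for p in atoms_a for q in atoms_b)


def residues_near(residues, sf_residues, cutoff=10.0):
    """Names of non-SF residues with at least one atom within `cutoff` Å of any SF atom."""
    sf_atoms = [atom for atoms in sf_residues.values() for atom in atoms]
    return sorted(
        name
        for name, atoms in residues.items()
        if name not in sf_residues and min_distance(atoms, sf_atoms) <= cutoff
    )
```

Here `residues` and `sf_residues` are dictionaries mapping a residue label to its atom coordinates; `math.dist` requires Python 3.8 or later.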
Table 1
Critical Residues with Pronounced Effect on SF Pore Size in Glp F and Aqp Z
Important Residues in the Vicinity of SF
The conformation of Asp207, located in M7, governs the open or closed state of the channel. Strong electrostatic attraction from the Asp207 side chain and the C-loop keeps the Arg206 side chain oriented toward the extracellular part of the protein. This orientation of the Arg206 side chain makes the Glp F pore larger than that of Aqp Z. Furthermore, the carboxyl group of Asp207 electrostatically pulls the amide group of Arg206 toward M7, further expanding the SF in Glp F (Figure 4A). In Aqp Z, the corresponding residue is the uncharged Ser(S)190, so the outward pulling effect is negligible compared to Glp F. Oliva et al. reported a correlation between electrostatic patterns and the substrate permeability of AQPs and AQGPs.[42] When Ser is replaced with Asp in Aqp Z, the Asp can act as a “second shell” that compensates for the positive charge of Arg, maintaining a neutral electrostatic profile in the channel by forming two salt bridges with the adjacent Arg and Lys.[42,43] This mutation-induced shift of the electrostatic profile from AQP-like to AQGP-like further facilitates glycerol transport in Aqp Z mutants.
Figure 4
Structural comparison
of Glp F and Aqp Z. (A) Asp207 side chain
electrostatically attracts Arg206 of SF in Glp F, leading the side
chain of Arg206 to point to the extracellular side (left), whereas
the corresponding residue Ser190 side chain is located relatively
far from Arg189 in Aqp Z. The side chain of Arg189 then reduces the
pore size of SF (right). The channel lumen is represented by light
blue-colored dots. (B) Ala201 (spheres) interacts with Glu152 instead
of Asp207 in Glp F (left), whereas the corresponding residue Ser184
(spheres) has polar contact with Glu138 as well as Ser190 in Aqp Z,
leading to a more compact channel (right). (C) Superposition of structures
in helix M7 in Glp F (green) and Aqp Z (magenta) with critical residues
highlighted. In Glp F, K211 attracts D207 with the rigidity provided
by Pro210.
Ala(A)201 in Glp F, whose counterpart in Aqp Z is Ser184, is located adjacent to Phe200, one of the SF constituents. In both Glp F and Aqp Z, these residues are in close contact with the same conserved residue, a Glu(E) located in M5 (Figure 4B). In Aqp Z, Ser184 is also in contact with Ser190 in M7, which packs nearby parts of the helix more tightly and thereby decreases the SF pore size. Furthermore, as a hydrogen bond acceptor, Ser184 plays an important role in water transport.[6] In contrast, Ala201 in Glp F interacts much more weakly with water. Therefore, water permeability in Aqp Z is much higher than in Glp F, owing to the efficient water permeation pathway formed by a series of hydrogen bonds in Aqp Z. Compared with their counterparts Ala194 and Gln(Q)197 near the SF in Aqp Z, Lys211 and Ala214 in Glp F notably alter the local interactions. For instance, Lys211 interacts with Asp207 through electrostatic attraction, which orients the Asp carboxyl group toward the extracellular side. Because Asp207 attracts the amide group of Arg206 in Glp F, Lys211 influences the orientation of Arg206 through Asp207. To accommodate the extra space required by substituting Lys for Ala194 in Aqp Z, a complementary replacement of Gln197 by Ala is introduced to minimize local stress (Figure 4C). We note that Pro210 in Glp F is located near Asp207 and Lys211 (Figure 4C). Compared with its counterpart Val(V)193 in Aqp Z, Pro introduces a tight turn in the helix. Proline often disrupts regular protein secondary structure because its cyclic side chain locks the backbone dihedral angle φ at about −60°. This rigidity of Pro makes nearby structures less flexible and partially sets the orientations of Asp207 and Lys211. Hence, the SF gate favors an open state owing to the strong electrostatic attraction, and the local rigidity further stabilizes this open state.
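The statement that proline's cyclic side chain fixes the backbone dihedral φ near −60° can be checked on any structure by computing φ of residue i from the atoms C(i−1), N(i), Cα(i), and C(i). A minimal Python sketch of the standard four-point signed dihedral formula (IUPAC sign convention); the points in the checks are toy coordinates, not atoms from 1LDA or 1RC2.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees (IUPAC convention) defined by four points.

    For the backbone angle phi of residue i, pass C(i-1), N(i), CA(i), C(i).
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

Using `atan2` rather than `acos` keeps the sign of the angle and avoids precision loss near 0° and 180°.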
Residues in C-Loop and M6–M7 Linker
The C-loop and M6–M7 linker play important roles in water/glycerol delivery.[20] In a recent study, a single mutation (Glu125) on the C-loop of a Plasmodium AQGP abolished water conduction without affecting glycerol permeability.[44] Another study suggests that C-loop residues promote water conduction through hydrogen bonding.[45] Aligning the C-loop across aquaporins is difficult because its amino acid composition varies widely among proteins. In our data set, the average C-loop in AQPs is five amino acids shorter than in AQGPs. This suggests that the shorter C-loop makes the AQP structure more compact than that of AQGPs, so that the AQP pore is smaller. The SCA network of AQGPs indicates that the Phe and Thr of the highly conserved FST triad[14] are coupled to Trp48 and Asp207. Phe135 contributes to the gating mechanism by stabilizing the orientation of the Arg206 side chain. The interaction between Thr137 and the Asp207 side chain (distance ∼2.6 Å) orients the Asp207 side chain toward the extracellular side. Together, these delicate interactions determine the orientation of the Arg206 side chain (Figure 5A).
As the counterparts of Phe135 and Thr137 in Glp F, Ala117 and Asn(N)119 in Aqp Z have polar interactions with Arg189 (Figure 5B). However, these interactions are not strong enough to pull the Arg side chain upward. Leu234 in Glp F, located near the C-loop, reduces the bulkiness of this region. This provides more flexibility for C-loop residues by reducing π-stacking interactions among the surrounding aromatic amino acids. In contrast, in Aqp Z, the corresponding residue Trp206, with aromatic residues (Phe207, Phe208, and Trp209) nearby, makes the structure more compact near the C-loop as well as in the SF, and limits the size of permeable molecules (Figure 5C,D).
Figure 5
Interactions of C-loop residues with the Arg gate of the SF in Glp F and Aqp Z. (A) In Glp F, Phe135 attracts R206 so that the carboxyl group points upward toward the extracellular side. Thr137 is not in close contact with R206; instead, it couples with D207 (left corner), which in turn interacts with R206. (B) In Aqp Z, both Ala117 and Asn119 interact with R189, and no upward orientation of the R189 side chain is observed. Leu234 in Glp F (C), in place of Trp206 in Aqp Z (D), provides more flexibility for the C-loop (orange).
Two Gly residues in Glp F, Gly195 and Gly199, are strongly coupled to Asp207 in the SCA network of AQGPs (Figure 3B). Preceding the SF residue Phe200, these two residues are located in the M6–M7 linker region of the aquaporin family. These two Gly residues in Glp F, in place of Ile(I) and Asn in Aqp Z, make the M6–M7 linker more flexible (Figure 6A).[20] This flexibility disrupts the π–π stacking interaction between Phe200 and Trp48, which is considered another hidden gate of this channel.[16] In Aqp Z, the bulky M6–M7 linker forces Thr183 into a highly constrained conformation, straining the SF structure. Unlike Asn182 in Aqp Z, which acts as a hydrogen bond acceptor in the permeation pathway,[16] Gly199 in Glp F, which lacks a side chain, cannot form side-chain hydrogen bonds with water. This further suggests an active role of the M6–M7 linker in water transport.
Figure 6
Critical residues located
on M6–M7 linker and M8. Both G195
and G199 in Glp F (A) reduce the bulkiness of the turning region,
thus providing more flexibility to the structure compared to I178
and N182 in Aqp Z (B). P236 in Glp F (C) increases the flexibility
of the surrounding residues compared to F208 in Aqp Z (D).
Finally, at the start of M8, Pro236 in Glp F breaks the regular secondary structure and thus increases the flexibility of nearby residues, owing to the specific structure of proline, whose cyclic side chain locks the backbone dihedral angle φ at about −60° (Figure 6C). The equivalent residue in Aqp Z, Phe208, has no such effect (Figure 6D). Instead, together with nearby aromatic residues, Phe208 increases the bulkiness of the local environment and reduces the flexibility of this region.
Replica Exchange Molecular Dynamics (REMD) Simulations and Potential of Mean Force (PMF) Calculations of Mutated Aqp Z and Glp F Structures
In this work, we focus on the local structural variation of the protein channel rather than on the tetrameric protein complex. Thus, a monomer embedded in a dipalmitoylphosphatidylcholine (DPPC) bilayer is used in fully atomistic REMD simulations to determine the stable structures of the Aqp Z mutant carrying the identified critical residues (referred to as mAqpZ) and to directly measure the residues’ influence on the channel structure. A series of MD simulations is also performed on Aqp Z and Glp F, as well as on the Aqp Z mutant with three mutated SF residues (PDB ID: 3NKC). The starting structures are crystal structures obtained from the PDB (PDB IDs: 1RC2 for Aqp Z and 1LDA for Glp F). The lumen radii of these proteins along the channel axis are calculated using HOLE2 (Figure 7A).[46]
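HOLE2 reports a pore radius at each position along the channel axis. Given such profiles, the SF pore size is the minimum radius, and two profiles can be compared through a root-mean-square difference. A minimal Python sketch, assuming both profiles are already sampled on the same z grid; the helper names and test values are placeholders, and whether the RMS reported for mAqpZ and Glp F is computed exactly this way is our assumption.

```python
import math


def pore_size(radii):
    """Narrowest pore radius along the channel (the SF pore size as defined in the text)."""
    return min(radii)


def narrowest_position(z, radii):
    """z coordinate at which the radius profile reaches its minimum."""
    k = min(range(len(radii)), key=radii.__getitem__)
    return z[k]


def rms_difference(radii_a, radii_b):
    """Root-mean-square difference between two radius profiles on the same z grid."""
    if len(radii_a) != len(radii_b):
        raise ValueError("profiles must be sampled on the same z grid")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(radii_a, radii_b)) / len(radii_a))
```

If the two profiles come on different grids, they would first need to be interpolated onto a common set of z values.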
Figure 7
Pore radii
of 3NKC, Aqp
Z, Glp F, and mAqpZ along the channel. (A) Channel radii determined
by HOLE2 for all proteins plotted as a function of positions along
the channel (z direction). 3NKC (blue), in which only the SF residues are mutated,
has the smallest pore size. The mAqpZ (pink) with
identified critical residues exhibits a pore size comparable to the
Glp F channel (red) and larger than that of original Aqp Z (blue).
(B) SF of Glp F (green) and mAqpZ (cyan). The channel sizes are comparable,
and the positions of the SF residues are similar. (C) SF of Aqp Z
(green) and 3NKC (cyan). The SF pore size of 3NKC is even smaller than that of the original
Aqp Z.
The SF pore size (the narrowest radius along the channel) of the Aqp Z mutant with three mutated SF residues (0.51 Å) is smaller than that of wild-type Aqp Z (0.60 Å), consistent with previous studies.[20,15] Interestingly, the pore size of mAqpZ (0.98 Å) is much larger than that of wild-type Aqp Z and closer to that of Glp F (1.23 Å), with an RMS of 0.41 Å. Furthermore, the positions of the SF residues in mAqpZ and Glp F agree better (Figure 7B) than in the 3NKC case (Figure 7C). These results indicate that the critical residues identified by SCA actively modulate the complex interactions that determine SF pore size and the delicate structural details near the SF.
Finally, quantitative free-energy landscapes along the channels are computed by the potential of mean force (PMF) approach to evaluate the permeability of substrates passing through them. The PMFs are computed using umbrella sampling and the weighted histogram analysis method (WHAM), which characterize less frequently sampled, high-energy states.[47] As shown in Figure 8, the free-energy barriers for glycerol permeation in Glp F and mAqpZ are ∼10 kcal/mol (comparable to the 7.3 kcal/mol reported by Jensen et al.[48]) and ∼5 kcal/mol, respectively. For water passage, mAqpZ has a barrier of ∼3 kcal/mol, compared with ∼8 kcal/mol for Glp F. Besides pore size, the charges[42] and hydrogen bond acceptors[12,48] of amino acids on and near the mAqpZ channel also contribute to the lower barriers for both water and glycerol conduction relative to Glp F (see SI, Sections 4 and 5). Overall, the PMF calculations suggest that mAqpZ is more permeable to both water and glycerol than Glp F.
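To give a feel for what these barrier differences mean, a barrier can be read off a PMF profile as its maximum relative to the bulk value at the channel ends, and two barriers can be compared through a Boltzmann factor. This is a rough transition-state-style estimate that assumes equal prefactors and T = 300 K; both are assumptions, and this is not how permeability was evaluated in this work.

```python
import math

K_B = 0.0019872041  # Boltzmann constant, kcal/(mol K)


def barrier_height(pmf):
    """Barrier (kcal/mol): PMF maximum relative to the mean of the two end (bulk) values."""
    bulk = 0.5 * (pmf[0] + pmf[-1])
    return max(pmf) - bulk


def relative_rate(barrier_a, barrier_b, temperature=300.0):
    """Crossing-rate ratio k_b / k_a = exp[-(G_b - G_a) / (k_B T)], assuming identical prefactors."""
    return math.exp((barrier_a - barrier_b) / (K_B * temperature))
```

With the ∼10 and ∼5 kcal/mol glycerol barriers above, `relative_rate(10.0, 5.0)` is about 4 × 10³. This only illustrates why a ∼5 kcal/mol reduction matters; real permeabilities also depend on diffusion within the pore and on channel occupancy.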
Figure 8
Free-energy profiles (PMF) for glycerol and water passage along the channel (z axis) of Glp F (red) and the Aqp Z mutant (pink). The positions of glycerol/water (red and white) in the protein channel (cyan) along the z axis are shown below the graph. For glycerol transport (A), the free-energy barrier is ∼10 kcal/mol for Glp F but only ∼5 kcal/mol for mAqpZ. For water transport, the barrier is ∼8 kcal/mol for Glp F and ∼3 kcal/mol for mAqpZ.
Conclusions
In summary, highly correlated pairs of residues in bacterial orthodox aquaporins and aquaglyceroporins are identified by statistical correlation analysis of bacterial aquaporin family sequences. This unravels “hidden” connections between residues that, although not directly involved in substrate interactions, contribute to the function and permeability of aquaporins and aquaglyceroporins. A set of coupled mutation sites contributing to molecular selectivity is scrutinized through their detailed interactions in the 3D structures of Aqp Z and Glp F. Fully atomistic REMD simulations demonstrate the enlargement of the SF, with a desirable structural arrangement, in mAqpZ. PMF calculations further reveal improved permeability of mAqpZ to both water and glycerol. Similar techniques that incorporate network analysis hold great promise for establishing relationships between correlated amino acid positions and the function and/or structure of aquaporin subfamilies and other protein families.
Materials and Methods
Statistical Coupling Analysis
The statistical coupling analysis (SCA) defines a statistical energy that represents the coupling between
sites i and j, where $f_i^{(a)}$ is the frequency of amino acid a at site i, $D_j^{(a)}$ is a measure of the positional conservation of amino acid a at site j, and $C_{ij}$ is the positional correlation between sites i and j, given as a reduced weight matrix.
The higher the statistical energy of two sites, the stronger the correlation
between them. An adapted version of the SCA Toolbox distribution,
SCA v3.0, was used for all calculations.[49]
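For orientation, the sketch below follows the published SCA v3.0 formulation in which the pairwise correlation $C_{ij}^{ab} = f_{ij}^{ab} - f_i^{(a)} f_j^{(b)}$ is weighted by the gradient of the conservation $D_i^{(a)}$ with respect to frequency and then reduced over all amino acid pairs (a, b) with a Frobenius norm. It is an assumption that the adapted toolbox used here follows exactly this scheme. The uniform background frequencies, the pseudocount λ = 0.03, the omission of sequence weighting, and all function names are illustrative choices, not the toolbox's API.

```python
import math
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def site_frequencies(msa, lam=0.03):
    """Regularized single-site frequencies f_i^a (gaps act as a 21st state)."""
    n_seq = len(msa)
    freqs = []
    for i in range(len(msa[0])):
        column = [seq[i] for seq in msa]
        freqs.append({a: (1 - lam) * column.count(a) / n_seq + lam / 21
                      for a in AMINO_ACIDS})
    return freqs


def pair_frequency(msa, i, j, a, b, lam=0.03):
    """Regularized joint frequency f_ij^ab of amino acids a at i and b at j."""
    count = sum(1 for seq in msa if seq[i] == a and seq[j] == b)
    return (1 - lam) * count / len(msa) + lam / 21 ** 2


def conservation(f, q):
    """Relative entropy D of observing frequency f against background q."""
    return f * math.log(f / q) + (1 - f) * math.log((1 - f) / (1 - q))


def weight(f, q):
    """phi = dD/df = ln[f(1 - q) / ((1 - f) q)], the conservation gradient."""
    return math.log(f * (1 - q) / ((1 - f) * q))


def sca_matrix(msa, q=None, lam=0.03):
    """Reduced, conservation-weighted correlation matrix C~_ij.

    C~_ij = sqrt(sum_ab (phi_i^a * phi_j^b * C_ij^ab)^2); larger values
    indicate stronger statistical coupling between sites i and j.
    """
    q = q or {a: 1 / 20 for a in AMINO_ACIDS}  # assumed uniform background
    f = site_frequencies(msa, lam)
    n = len(msa[0])
    phi = [{a: weight(f[i][a], q[a]) for a in AMINO_ACIDS} for i in range(n)]
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            total = 0.0
            for a, b in product(AMINO_ACIDS, repeat=2):
                c_ab = pair_frequency(msa, i, j, a, b, lam) - f[i][a] * f[j][b]
                total += (phi[i][a] * phi[j][b] * c_ab) ** 2
            C[i][j] = C[j][i] = math.sqrt(total)
    return C
```

In a toy alignment such as `["AKLG", "AKVG", "DEVG", "DELG", "AKIG", "DEIG"]`, sites 0 and 1 covary perfectly while site 2 varies independently, so `sca_matrix` returns a much larger value for the (0, 1) pair than for (0, 2).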
Network Construction and Structural Analysis
The network was constructed and visualized, and its properties were calculated, using Cytoscape 2.8.2.[50,51] Structures
of E. coli Aqp Z and Glp F were downloaded
from the Protein Data Bank[52] and
were viewed and analyzed using PyMOL 1.5.[53]
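Cytoscape was used interactively here. As a tool-agnostic illustration of the same idea, the sketch below turns a pairwise coupling matrix into an undirected residue network by keeping only pairs above a cutoff, then reports node degree and connected clusters. The cutoff, the residue labels, and the function names are hypothetical; the text does not state how edges were selected.

```python
from collections import defaultdict


def build_network(coupling, labels, threshold):
    """Undirected residue network: residues i and j (i != j) are joined
    whenever their coupling value exceeds `threshold`."""
    edges = defaultdict(set)
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            if coupling[i][j] > threshold:
                edges[labels[i]].add(labels[j])
                edges[labels[j]].add(labels[i])
    return edges


def degree(edges):
    """Number of coupled partners per residue (isolated residues are absent)."""
    return {node: len(nbrs) for node, nbrs in edges.items()}


def clusters(edges):
    """Connected components, found by iterative depth-first search."""
    seen, components = set(), []
    for start in edges:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(edges[node] - component)
        seen |= component
        components.append(sorted(component))
    return components


# Hypothetical 4-site coupling matrix with two strongly coupled pairs.
coupling = [
    [0.0, 0.9, 0.1, 0.0],
    [0.9, 0.0, 0.2, 0.1],
    [0.1, 0.2, 0.0, 0.8],
    [0.0, 0.1, 0.8, 0.0],
]
net = build_network(coupling, ["s1", "s2", "s3", "s4"], threshold=0.5)
print(degree(net), clusters(net))
```

Connected components are the simplest notion of a "cluster" of coupled residues; Cytoscape offers richer measures (e.g., betweenness), which this sketch does not reproduce.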
Molecular Dynamics Simulation
Protein
structures of wild-type E. coli Aqp
Z and Glp F, as well as of the Aqp Z mutant with only the SF residues mutated,
were downloaded from the PDB (PDB IDs: Aqp Z, 1RC2; Glp F, 1LDA; SF-mutated Aqp Z,
3NKC). Aqp Z mutants carrying the identified mutations were created
from the wild-type Aqp Z structure by manual mutation in PyMOL 1.5.[53] The proteins were solvated with water
(spc216[54]) in a simulation box of approximately
8 nm × 8 nm × 10 nm and embedded
in a lipid bilayer of 116 DPPC molecules. The initial DPPC
bilayer structure was downloaded from the Tieleman group: http://people.ucalgary.ca/~tieleman/download.html. Both conventional and REMD simulations were performed with
GROMACS 4.6.5[55] using the OPLS-AA force
field[56] for 1 ns in the NPT ensemble (T = 300 K, P = 1 atm). Each system was first
energy-minimized and equilibrated. Periodic boundary
conditions were applied in all simulations, and electrostatic interactions were computed with the particle-mesh Ewald method[57] using a real-space cutoff of 0.9 nm.
Lennard-Jones interactions
were switched off beyond 1.2 nm. An integration time step
of 2 fs was used, and coordinates were
saved every 1 ps.
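The stated run settings translate directly into step counts: 1 ns at a 2 fs time step is 500,000 steps, and saving every 1 ps means writing coordinates every 500 steps. The sketch below derives these numbers and prints an illustrative fragment of GROMACS .mdp keys. It is not the authors' input file: the thermostat, barostat, temperature-coupling groups, and van der Waals switching settings (e.g., `vdwtype`, `rvdw_switch`) are not given in the text and are deliberately omitted, and writing full-precision output via `nstxout` is an assumption.

```python
# Simulation settings stated in the text.
SIM_LENGTH_PS = 1000.0   # 1 ns
TIME_STEP_PS = 0.002     # 2 fs
SAVE_EVERY_PS = 1.0      # coordinates recorded every 1 ps
ATM_TO_BAR = 1.01325     # GROMACS pressures are given in bar


def md_parameters():
    """Derive step counts from physical times; incomplete .mdp fragment."""
    nsteps = round(SIM_LENGTH_PS / TIME_STEP_PS)
    nstxout = round(SAVE_EVERY_PS / TIME_STEP_PS)
    return {
        "integrator": "md",
        "dt": TIME_STEP_PS,
        "nsteps": nsteps,
        "nstxout": nstxout,
        "pbc": "xyz",
        "coulombtype": "PME",
        "rcoulomb": 0.9,   # PME real-space cutoff (nm)
        "rvdw": 1.2,       # Lennard-Jones cutoff (nm); switching details omitted
        "ref_t": 300,
        "ref_p": round(ATM_TO_BAR, 5),
    }


def to_mdp(params):
    """Format key/value pairs as .mdp lines."""
    return "\n".join(f"{key:<12} = {value}" for key, value in params.items())


print(to_mdp(md_parameters()))
```

With these values a 1 ns run yields roughly 1,000 saved frames.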
Replica Exchange Molecular Dynamics
Conventional simulations were performed at 300 K. For REMD, a total of 16
replicas spanning temperatures from 300 to 500 K were used, and exchanges between replicas were attempted every 2 ps.
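The text does not say how the 16 temperatures were distributed between 300 and 500 K. The sketch below assumes a geometric ladder, a common choice, and applies the standard Metropolis criterion for temperature REMD, accepting a swap between replicas at temperatures T_i and T_j with probability min{1, exp[(β_i − β_j)(U_i − U_j)]}. Function names are hypothetical; an exchange interval of 2 ps at a 2 fs time step corresponds to 1,000 MD steps.

```python
import math
import random

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol*K)


def geometric_ladder(t_min, t_max, n_replicas):
    """Temperatures spaced geometrically from t_min to t_max (inclusive)."""
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio ** k for k in range(n_replicas)]


def exchange_probability(t_i, t_j, u_i, u_j):
    """Metropolis acceptance for swapping the configurations of replicas at
    temperatures t_i and t_j with potential energies u_i and u_j (kJ/mol)."""
    delta = (1.0 / (K_B * t_i) - 1.0 / (K_B * t_j)) * (u_i - u_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_exchange(t_i, t_j, u_i, u_j, rng=random):
    """Return True if the swap is accepted."""
    return rng.random() < exchange_probability(t_i, t_j, u_i, u_j)


temps = geometric_ladder(300.0, 500.0, 16)
steps_between_exchanges = round(2.0 / 0.002)  # 2 ps at a 2 fs step
print([round(t, 1) for t in temps], steps_between_exchanges)
```

Geometric spacing tends to give roughly uniform acceptance between neighboring replicas when the heat capacity varies little over the temperature range; other ladders are equally valid.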
Potential of Mean Force Calculation
To obtain configurations with the substrate at different positions
in the channel, a single water or glycerol
molecule was first placed at the channel entrance, on the pore axis.
A harmonic pulling force along the channel direction
was then applied to pull the molecule through the respective
protein channel. From the resulting trajectories,
80 windows (∼1 Å apart) were selected.
For each window, an umbrella sampling simulation was performed by
applying a harmonic restraint along the pore coordinate (z axis) with a force
constant of 1000 kJ mol⁻¹ nm⁻². Population histograms along the reaction
coordinate were then obtained from each umbrella sampling simulation, and the PMF profile
was generated with the weighted histogram analysis method (WHAM).[47]
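The umbrella sampling and WHAM steps can be sketched as follows, assuming a one-dimensional reaction coordinate z in nm, harmonic restraints with the stated force constant of 1000 kJ mol⁻¹ nm⁻², and histograms already collected per window on a shared set of bins. This is a minimal textbook WHAM iteration, not the specific WHAM implementation cited; the function names, bin layout, and convergence tolerance are illustrative assumptions.

```python
import math

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol*K)


def harmonic_bias(z, center, k=1000.0):
    """Umbrella restraint 0.5*k*(z - center)^2 in kJ/mol (z in nm)."""
    return 0.5 * k * (z - center) ** 2


def wham(histograms, centers, bins, k=1000.0, temperature=300.0,
         tol=1e-7, max_iter=10000):
    """1D WHAM: combine biased histograms into an unbiased PMF in kJ/mol.

    histograms[i][b] is the count of window i in bin b; bins are bin centres.
    The PMF is returned relative to its minimum; unsampled bins are inf.
    """
    beta = 1.0 / (K_B * temperature)
    n_win, n_bins = len(histograms), len(bins)
    totals = [sum(h) for h in histograms]                              # N_i
    counts = [sum(h[b] for h in histograms) for b in range(n_bins)]    # sum_i n_i(b)
    # c[i][b] = exp(-beta * w_i(z_b)): Boltzmann factor of window i's bias.
    c = [[math.exp(-beta * harmonic_bias(z, centers[i], k)) for z in bins]
         for i in range(n_win)]
    f = [0.0] * n_win  # per-window free-energy offsets f_i
    for _ in range(max_iter):
        weights = [totals[i] * math.exp(beta * f[i]) for i in range(n_win)]
        # Unbiased probability: P(b) = sum_i n_i(b) / sum_i N_i exp(beta f_i) c_ib
        prob = []
        for b in range(n_bins):
            denom = sum(weights[i] * c[i][b] for i in range(n_win))
            prob.append(counts[b] / denom if denom > 0 else 0.0)
        # Self-consistency: exp(-beta f_i) = sum_b P(b) c_ib
        new_f = [-math.log(sum(p * ci for p, ci in zip(prob, c[i]))) / beta
                 for i in range(n_win)]
        shift = new_f[0]  # fix the arbitrary additive constant
        new_f = [x - shift for x in new_f]
        delta = max(abs(x - y) for x, y in zip(new_f, f))
        f = new_f
        if delta < tol:
            break
    pmf = [-math.log(p) / beta if p > 0 else math.inf for p in prob]
    ref = min(pmf)
    return [x - ref for x in pmf]
```

To compare the output with the kcal/mol barriers in Figure 8, divide the returned kJ/mol values by 4.184. Window spacing matters: with k = 1000 kJ mol⁻¹ nm⁻² at 300 K, each window samples a Gaussian of width about 0.05 nm, so ∼1 Å spacing leaves neighboring histograms overlapping, which WHAM needs to stitch the windows together.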
Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T. Genome Res. 2003 Nov.
Savage DF, O'Connell JD, Miercke LJW, Finer-Moore J, Stroud RM. Proc Natl Acad Sci U S A. 2010 Sep 20.
Beitz E, Pavlovic-Djuranovic S, Yasui M, Agre P, Schultz JE. Proc Natl Acad Sci U S A. 2004 Jan 20.