Literature DB >> 30276247

Free Energy Landscape of the Complete Transport Cycle in a Key Bacterial Transporter.

Balaji Selvam¹, Shriyaa Mittal¹, Diwakar Shukla¹.

Abstract

PepTSo is a proton-coupled bacterial symporter, from the major facilitator superfamily (MFS), which transports di-/tripeptide molecules. The recently obtained crystal structure of PepTSo provides an unprecedented opportunity to gain an understanding of functional insights of the substrate transport mechanism. Binding of the proton and peptide molecule induces conformational changes into occluded (OC) and outward-facing (OF) states, which we are able to characterize using molecular dynamics (MD) simulations. The structural knowledge of the OC and OF state is important to fully understand the major energy barrier associated with the transport cycle. In order to gain functional insight into the interstate dynamics, we performed extensive all atom MD simulations. The Markov state model was constructed to identify the free energy barriers between the states, and kinetic information on intermediate pathways was obtained using the transition pathway theory (TPT). TPT shows that the OF state is obtained by the movement of TM1 and TM7 at the extracellular side approximately 12-16 Å away from each other, and the inward movement of TM4 and TM10 at the intracellular halves to 3-4 Å characterizes the OC state. Helix distance distributions obtained from MD simulations were compared with experimental double electron-electron resonance spectroscopy and were found to be in excellent agreement with previous studies. We also predicted the optimal positions for placement of methane thiosulfonate spin label probes to capture the slowest protein dynamics. Our finding sheds light on the conformational cycle of this key membrane transporter and the functional relationships between the multiple intermediate states.

Entities: Chemical Disease Gene Mutation Species

Year: 2018 PMID： 30276247 PMCID： PMC6161048 DOI： 10.1021/acscentsci.8b00330

Source DB: PubMed Journal: ACS Cent Sci ISSN： 2374-7943 Impact factor: 14.553

Introduction

Understanding the exchange of biological molecules across the phospholipid bilayer is of fundamental interest to the scientific community. The transport of a large variety of molecules is essential for the pathological physiological functions of the cell. In particular, peptide transporter proteins act as carriers that facilitate the transport of small groups of amino acids across the cell membrane. These transporters belong to the peptide transporter (PTR) family which are members of the major facilitator superfamily (MFS) of secondary active membrane transporters.[1] To understand the functional mechanism of MFS transporters, Kaback et al. proposed an alternate access model that involves alternate accessibility to either side of the membrane via distinct inward facing (IF), occluded (OC), and outward facing (OF) conformational states.[2] Another model is the rocker-switch model where the substrate is posited to bind to the center of the transporter, causing rigid body motion of the N and C domains to alternate access between the intracellular and extracellular sides.[3] The third hypothesis is the elevator-like model where the substrate binds to a single domain, locks the pore channel, slides downward, and opens up to release the substrate.[4−6] Even with progress in X-ray crystallographic techniques, atomic level structural information on the distinct states and other intermediate states is still unknown. In humans, peptide transporters PepT1 and PepT2 are expressed in the intestine and kidney and are actively involved in the uptake of dietary peptide molecules.[7] Hence, peptide transporters are considered as crucial targets for drug delivery and a means to improve the pharmacokinetics of drug molecules.[8] Despite the known importance of peptide transporters, structural changes involved in the overall functional dynamics have not been studied comprehensively. In this work, we have chosen a bacterial peptide transporter from Shewanella oneidensis, PepTSo, to understand the conformational dynamics and mechanistic transport in this peptide transporter and by extension of the proton coupled oligopeptide transporter (POT) family. PepTSo is a bacterial POT consisting of 12 transmembrane helices (TMs) divided into N (TM1–TM6) and C (TM7–TM12) terminal domains, each with six helices.[9,10] The N and C domains are connected by two short helices (SHs) that are closely packed to the C-terminal domain. PepTSo uses an inward electrochemical potential as the driving force to transport di-/tripeptide molecules into the cell against their concentration gradient. PepTSo shares close sequence similarity and structural homology with other peptide transporters PepTSt,[11−13] GkPOT,[14] PepTSo2,[15,16] YePEPT,[17] and PepTXc.[18] PepTSo and YePEPT were crystallized in the apo form (other peptide transporters were crystallized as holo structures) in the IF state. Biochemical and biophysical experiments provide insights into the functional behavior of the transporter proteins; dynamics information on the conformational transition between different structural states is missing. The repeat swap method (RSM) was used as an alternative protocol to obtain the OF conformational state of PepTSo.[19] The method involves swapping the N and C domains to create a template, aligning the swaps to the template coordinate file and finally constructing the homology model to obtain the RSM model. However, the some of the limitations of this technique are that it requires careful assessment of the sequences for structural alignments of the internal repeats and cannot provide an array of intermediate states, and no information is gained regarding the dynamics of the process of peptide transport. Limited computational methods have been employed to study the functional mechanisms and conformational dynamics of transporters.[20−23] These methods contributed to the investigation of substrate driven structural changes of transporters, but failed to obtain the kinetics of the crucial conformational transitions which would be critical to identify the transitions among key intermediates states that are essential to the functional mechanism. Evidently, these transporters exhibit large conformational changes, which cannot be understood using the static snapshots provided by X-ray crystallographic studies, and hence the mechanism of peptide transport is still unknown. Molecular Dynamics (MD) simulations are an appealing methodology to determine the conformational changes and provide a mechanistic insight into PepTSo. In our study, we performed atomistic MD simulations over a duration of ∼54 μs using an adaptive sampling approach (see Methods). Further we constructed a Markov state model (MSM), a technique that has been used extensively to study conformational diversity in biological systems.[24−28] Using MSMs we were able to combine a large number of shorter simulations and perform efficient analysis on the huge amount of the data. To estimate the time scales of the transitions between the intermediates in the transport cycle, trajectories were reconstructed using the kinetic Monte Carlo approach. MD simulations have unraveled the OC and OF states of PepTSo along with other intermediate states that the protein can adopt to enable transport. We compared the helix distance distributions from our MD simulation ensemble to the experimental double electron–electron resonance (DEER) spectroscopy results. To our knowledge, this is the first long time scale MD simulations based thermodynamics and kinetics study that captures key intermediate states and the functional landscape of PepTSo using unbiased computational methods.

Results

MD Simulations Reveal the Conformational Substates of PepTSo.

The IF state crystal structure of PepTSo (PDB: 4UVM(10)) was used as the starting conformation for simulations. The MD simulations were conducted using an adaptive sampling approach (see Methods). From our simulation data, we were successfully able to identify the functionally important intermediate states. The free energy landscape projected onto the extracellular and intracellular distances, weighted by the MSM equilibrium probability distribution, reveals the energy minima corresponding to conformations states, IF (1), partial IF–OC (2), OC (3), partial OC–OF (4), OF (5), and wide open intermediate states (6 and 7) (Figure ). The extracellular and intracellular side distances determined between the residue pairs Arg32-CZ (TM1)–Asp316-CG (TM7) and Ser131-CO (TM4)–Tyr431-OH (TM10), respectively, show that the helices of the N and C domains are ∼7.0 and ∼10.5 Å apart in the PepTSo crystallized IF state (PDB: 4UVM). We observe that the intracellular distance between TM4 and TM10 may increase up to ∼20 Å, albeit resulting in a high energy unstable state. PepTSo adopts a partial IF-OC state as the intracellular helical distance between TM4–TM10 reduces to 5–6 Å by a movement of both helices inward toward each other. Further, this distance reduces to 3 Å, leading to the OC state. The transition from OC to OF involves two stages: first, the moving apart of TM1 and TM7 at the extracellular side to a distance of 7–8 Å by forming an intermediate partial OC–OF state, and second, extending the distance of the extracellular vestibule of TM1 and TM7 up to 12–16 Å (Figures S1 and S2). The IF, OC, and OF states can be distinguished by passing a spherical probe from one side of the protein to the other, calculated using the HOLE program.[29] The probe radius through the different states is visualized in Figure , and the residues used to determine intracellular and extracellular distances in our study are indicated, as they constrict the IF, OC, and OF states.

Figure 1

Figure 2

Distinct conformational states of PepTSo are visualized by passing a spherical probe from one side of the protein to the other through (A) the crystal structure IF state, and for the MD simulations predicted structures for (B) OC and (C) OF states, calculated using the HOLE program.[29] The gating residues Arg32 (TM1)–Asp316 (TM7) and Ser131 (TM4)–Tyr431 (TM10) that act as bottlenecks for the conformational transition are indicated. (D) The pore radius along the protein for the three conformational states.

Conformational landscape of PepTSo. The conformational landscape is generated using the extracellular and intracellular side distances measured between atom pairs Arg32-CZ (TM1)–Asp316-CG (TM7) and Ser131-CO (TM4)–Tyr431-OH (TM10), respectively. The conformational states are depicted as IF (1), partial IF–OC (2), OC (3), partial OC–OF (4), OF (5), and wide open states (6 and 7). The black dots indicate the PepTSo crystal structures available in the Protein Data Bank. Distinct conformational states of PepTSo are visualized by passing a spherical probe from one side of the protein to the other through (A) the crystal structure IF state, and for the MD simulations predicted structures for (B) OC and (C) OF states, calculated using the HOLE program.[29] The gating residues Arg32 (TM1)–Asp316 (TM7) and Ser131 (TM4)–Tyr431 (TM10) that act as bottlenecks for the conformational transition are indicated. (D) The pore radius along the protein for the three conformational states. To determine the conformational exchange between these intermediate states, we performed transition pathway theory (TPT) analysis on MSM (see Methods).[30] All high-ranked paths undergo a transition from the IF to OF via the OC state and other intermediate states. The free energy barrier for the transition from IF to OC via the partial IF–OC state is ∼2–2.5 kcal/mol, and for subsequent transition to OF through states partial OC–OF or a wide open state is ∼1.5–2 kcal/mol. Hence, the total free energy barrier of ∼4 kcal/mol was determined for one complete cycle from IF to OF at equilibrium.

Structural Characteristics of the OC State

The predicted OC state shows large deviations in the intracellular side of N and C domains as compared to the IF state. The intracellular halves of the helices rearrange in several positions as compared to the crystal structure—TM1 moves inward and closely packs with TM3 and TM4; TM10 and TM11 undergo inward movement and interact with TM2 and TM4, respectively; TM4 and TM5 are tilted ∼5° and ∼10°, respectively, and move inward closer to the center of the transporter. TM2 becomes more straight compared to the kinked helix in the IF crystal structure. On the extracellular half, TM7 and TM8 are rotated by ∼15° and ∼8°, respectively, and move closer to TM1; TM9 and TM10 move up to 6 and 11 Å outward and form extensive contacts with the loops joining TM7 and TM8 (Figure S3). The OC state is stabilized by extensive intramolecular hydrogen bonds between the N and C domains. At the extracellular side, the residues Asn33 (TM1), Ser165 (TM5), Ser320 (TM7), and Gln341 (TM8) form a hydrogen bond network that acts as the lid by packing the helices close to each other (Figure S4). The closure of the pore channel is further stabilized by Arg32 (TM1)–Asp316 (TM7) salt bridge interaction on this end of the transporter. Asp316 of this ionic residue pair is known to play a major role in proton driven peptide transport.[9,12,19,31] The Glu419 (TM10) interaction with Lys318 (TM7), Thr416 (TM10), and Asn344 (TM8) stabilizes the conformation of TM7 and 8. The Glu419 (TM10) is a conserved residue in the POT family, and its mutation to Gln results in loss of proton driven peptide transport.[12] Another conserved motif, ExxERxxY on TM1 (Figure S5), and its interaction with Lys127 (TM4) play a crucial role in peptide transport, and its mutation abolishes the transport function.[12] On the intracellular side, the Pro127 (TM4) introduces a helix kink which facilitates the inward movement of TM4 and TM5. The glycine residues (Gly418 (TM10), Gly426 (TM10), and Gly440 (TM11)) increase the structural flexibility allowing the helices to twist. Tyr431-OH (TM10) establishes a hydrogen bond contact with Ser131-CO (TM4) and stabilizes the OC state (Figure S4). Our predictions conform with the hypothesis of Stelzl et al. for the extracellular and intracellular gating residues that are critical for the conformational switch and functional mechanism of transporters.[32] A comparison of the predicted OC state has been made with the multidrug transporter EmrD (PDB: 2GFP,[33]Figures S6 and S7) and xylose transporter XylE (PDB: 4GBY,[34]Figures S8 and S9), MFS family OC structures in the Protein Data Bank. The observed gating residues Thr119 (TM4)–Phe311 (TM10) in EmrD OC and Met149(TM4)–Ser396 (TM10) in XylE OC state are comparable to PepTSo.

Structural Characteristics of the OF State

The predicted OF structure reveals dramatic rearrangements at the extracellular side compared to IF (crystal structure) and OC (predicted structure) whereas only subtle changes at the intracellular side compared to OC (Figure S10). TM7 rotates ∼7° as compared to IF, and Asp316 (TM10) moves ∼12 Å away from Arg32 (TM1) forming a new contact with Asn454 (TM11), which then interacts with His61 (TM2). Our predictions are consistent with the structure proposed by Parker et al.[18] His57 (His61 in PepTSo) is involved in proton driven peptide transport in PepT1.[35−37] Thus, the extracellular binding partners move away from each other and increase the channel viability and hence adopt the OF state. TM7 is stabilized by an interaction between Lys318 (TM7) and Glu419 (TM10) at the extracellular part of PepTSo. The ExxERxxxY motif forms similar contacts to those seen in the OC state. The kink caused by Pro71 in TM2 results in slight bending of the helix allowing Trp76-O (TM2) to form a contact with Thr441 (TM11). The interaction of Tyr431-OH (TM10) with Ser131-CO and Gly135-O on TM4 stabilizes the intracellular half of the OF state. Our predicted OF structure from MD simulations shows good agreement with the fucose transporter (FucP, PDB: 3O7Q), which was crystallized in the OF state,[38] with an RMSD of 2.9 Å (Figures S11 and S12). The intermediate states 6 and 7 are predicted to be wide open states that may also lead to transitions to the OF state. The PepTSo OF state predicted using RSM[10] was compared to the predicted OF structure and has an RMSD of 3.6 Å (Figures S13 and S14).

Switching of Gating Residues Determines Conformational Changes in PepTSo.

From TPT, we identified that the interaction between Arg32-CZ (TM1)–Asp316-CG (TM7) acts as the gating bottleneck on the extracellular side of the transporter. In the IF crystal structure (PDB: 4UVM), the distance between these atoms is 7.2 Å. However, our simulations reveal that these residues come closer and form a salt bridge interaction locking TM1 and TM7 to characterize an IF state (Figure A). Previous studies have also shown that the IF state is stabilized by the formation of this ionic lock.[12] On the intracellular side, Newstead et al. observed that the hydrophobic gate between Phe150-CB (TM5) and Met443-CB (TM11) form the intracellular gate to characterize the OC state.[9] We find these residues determine only the partial IF–OC state (Figure B, Figures S15 and S16). After PepTSo adopts the partial IF–OC state, an additional hydrogen bond between Ser131-CO (TM4) and Tyr431-OH (TM10) results in complete transition to the OC state (Figure C).

Figure 3

PepTSo is shown in the center with TM 1, 2, 4, 7, 8, and 10 in green, magenta, blue, cyan, yellow, and orange, respectively. The short helices (SH) which join N and C domains are not shown for clarity. The remaining six helices are in gray. All state conformations are the extracellular view of the protein. (A) IF state is stabilized by ion lock at the extracellular side, and the intracellular half is wide open. (B) Inward movement of TM4 and TM10 determines the partial IF–OC state. (C) Further inward movement leads to formation of hydrogen bond interaction between Tyr431-Ser131 in the OC state. (D) Gating residues at the intracellular side weaken the extracellular interaction to form a partial OC–OF state. (E) Helices TM1 and TM7 move far away to 15 Å in the OF state. Next, the partial OC–OF state is obtained as the distance between Arg32-CZ (TM1) and Asp316-CG (TM7) increases up to 7–8 Å. The weakening of ionic interaction between Arg32 (TM1) and Asp316 (TM7) leads Asp316 to form a new polar contact with Tyr68 (TM2). Further, it establishes a contact with Asn454 (TM11), which results in complete loss of interaction with Arg32 (Figure D). The extracellular halves of the N and C terminal domains start moving further away from each other (on the extracellular side) and results in loss of Tyr68 interaction with Asp316. His61 (TM2), Asn454 (TM11), and Asp316 (TM7) form a hydrogen bond triad, thus adopting the final OF state (Figure E). The predicted OF state reveals that the distance between the residue pairs Arg32 (TM1) and Asp316 (TM7) may increase up to 12–16 Å to recognize the substrate molecule and initiate the transport cycle. Using a kinetic Monte Carlo reconstructed trajectory of length 25 μs from the constructed data set, the mean passage time for the transition from IF to OF, via OC was found to be ∼1 μs (Figure S17). This synthetic trajectory reveals that the rates of transitions between alternate opening of extracellular and intracellular gates shows faster dynamics as compared to previous studies where they have obtained the transition from IF to partial IF–OC and to OF in ∼0.6 μs.[10,18]

Comparison of Predicted OC and OF Structures with Experiments

DEER spectroscopy is a biophysical technique that allows determination of residue pair distance distributions between two cysteins that have been modified via site-directed spin labeling (SDSL). The technique has been used for widely used for the study of membrane proteins[39] where multiple peaks in the distributions indicate a diverse conformational composition of the protein during the experiment. Atom-pair distances can also be obtained from the ensemble information from MD simulations and hence provide an avenue for comparison with DEER spectroscopy distance distributions. We compared the helix distance distribution measurements obtained from our simulations with experimental data (Figures and S18) by constructing augmented Markov models (AMMs)[40,41] and the RotamerConvolveMD Python package[32] that maps a rotamer library of the spin labels on the residues to estimate the distribution (see Methods). The overall distributions were found to be in good agreement with SDSL DEER experiments.[10] For the residues pairs Ser141–Ser432, Ser141–Met438, Ile47–Val330, and Arg201–Glu364 the distance distribution ranges are larger for the simulation data than the experimental observations. This increase in distance distribution shows that the transporter may adopt a wide range of flexibility in order to transport diverse substrate molecules. Further, we compared the experimental data with distance distributions obtained from representative structures of IF, OC, and OF states individually (Figure S19) as well as other intermediate states (Figure S20) in our simulations data, to characterize the distance ranges spanned by the predicted free energy minima.

Figure 4

MD simulation predicted DEER distance distribution ranges (green) are compared to the experimental DEER distance distribution range (blue).

MD simulation predicted DEER distance distribution ranges (green) are compared to the experimental DEER distance distribution range (blue). Crucial to this experimental technique is the choice of residue pairs to label. For a 500 residue protein, like PepTSo, one must choose a set of labeled residues pairs from ∼125 000 possibilities, which leads to combinatorial choices counting to a billion. We developed a method that could predict the smallest set of residue pairs for spin labeling that best captures the slow dynamics in a membrane protein using the large amount of MD simulation data generated and a hyper parameter optimization strategy for MSMs.[42−44] To obtain an optimal set of residue pairs, we assigned an “optimal probe score” to each of the possible residue pair sets (see Methods).[45] A higher score indicates slower kinetics of the MSM based on the chosen residue pair distances and hence can be considered as a better model to understand the underlying dynamics of the system. The residues pairs that are high ranked can be used for SDSL DEER spectroscopy experiments in membrane proteins. The probe scores for all constructed MSMs range from 1.24 to 5.89 (Figure A), and these choices vary in hugely in the number of residue–residue distances measured from 1 to 12. The residue pairs chosen for DEER experiments in PepTSo measured 8 residue pair distances with 12 probes which we used to construct an MSM yielding a lower probe score of 5.61 (red line in Figure A,B).[10] The highest ranked choice of distances (Figure C) comprised of six distances, three on the extracellular side of the protein (among residues Ala178, Leu403, and Pro467) and three on the intracellular side (among residues Lys200, Ala437, and Arg374). One of the low ranked (probe score 1.24) sets with two probe positions is shown in Figure S21. It is evident that this choice would not give the necessary dynamic information on the system as, in the case of all MFS transporters, the correlated motions of helices in the N and C domain are important for functional dynamics. The distance between the residues Glu49–Tyr175 (Figure S21A) and Tyr85–Ser141 (Figure S21B) where both the potential probes are on the same domain would not provide any essential kinetic information that is not already captured in a slower MSM.

Figure 5

(A) Cross-validated scores for 4300 possible DEER residue-pair sets. The red line is the score of the MSM using experimental residue distance pairs. (B) The residues used for experimental DEER[10] and (C) highest ranked predicted DEER residue-pairs choices. It is also of note that the sets where we chose to measure distances on either the extracellular side of the protein or the intracellular side are low ranked as compared to cases where we measure distances on both ends (Figure S22). Clearly from our previous observation of the switching of gating residues, understanding the complete dynamics of PepTSo requires residue pair distances on both extracellular and intracellular sides of the protein. For the residue set with the highest score, for the constructed MSM, distance distribution plots have been obtained (Figure S23) intracellular residue pair distances are shifted toward larger values for the IF state, and extracellular distributions are shifted to larger values for the OF state.

Discussion

Our results reveal the conformational changes that characterize various states of PepTSo and transitions between them. Our analysis indicates that hydrogen bonds and hydrophobic and aromatic interactions act as gating mechanisms to stabilize the key functional conformation states. The residue pair Arg32 (TM1)–Asp316 (TM7) forms the salt bridge interaction at the extracellular side and locks the PepTSo in the IF state. We show that the formation of OC state involves two steps, (i) the helices involving residues Phe150 (TM5) and Met443 (TM11) come closer to ∼5–6 Å, and (ii) the following residues from helices TM4 and TM10 form additional hydrogen bonds between Ser131-CO (TM4) and Tyr431-OH (TM10). The OF state is obtained by breakage of ionic lock between Arg32 (TM1)–Asp316 (TM7) and movement of TM1 and TM7 to 12–16 Å away from each other. Water molecules around the lipid bilayer and the transporter protein could also play an important role in the conformational change. Figure S30 shows that water molecules have a higher preference for the phosphate group in the phosphatidylcholine (POPC) lipid head groups as compared to the nitrogen due to the hydrophobicity posed by three methyl groups around it.[46] Lipid bilayers properties in our simulations, area per lipid, and membrane thickness (Figure S31) indicate that the lipid bilayer remains in the same configuration throughout the simulation time. Overall, our predicted values agree well with experimental studies,[47] where the area per lipid value is 68.3 ± 1.5 Å2 and membrane thickness value is 37 Å, although a slight variation in the computed values compared to experimental observations could be due to differences in temperature and physiological conditions. Moreover, there is usually at least one water molecule between the lipid molecules that prevents them from coming close to each other (Figure S32) and hence stabilizing the lipid bilayer. Water molecules are cotransported along with substrate molecules, and this property has been well studied for membrane transporters.[48] We calculated the hydration level in three states and observed the fluctuation of number of permeating water molecules (Figure S33). A large number of water molecules were noticed for the OF state compared to OC and IF states. We also compared the water molecules in the IF state with other available crystal structures from the POT transporter family (Figure S34). The water permeation could also be associated with the transitions among the different conformational states of the protein. The comparison of simulation DEER distance distributions exhibit good agreement with experimentally obtained DEER data. Using the Optimal Probe method,[45] we also predict the best DEER SDSL positions for future experiments. We suggest six distance measurements, three on either of the sides of PepTSo which can capture the dynamics involved in the movement of the helices of this protein. In general, this method can be used to predict measurable DEER distances for any protein of interest. PepTSo and other POT family members PepTSt, PepTSo2, PepTXc, and GkPOT have conserved sequence motifs and structural folds, suggesting that the mechanistic basis of substrate transport will be universal in this family. We posit in the OF state the binding of a proton and a peptide molecule should increase the structural plasticity of the extracellular side of the transporter and initiate structural rearrangements of helices. The movements of helices are driven through a network of hydrogen bonding interactions involving TM1, TM2, TM7, and TM11, resulting in the OC state. Tyr68 (TM2) and Asn454 (TM11) act as a key residues that drives Asp316 (TM7) to form a salt bridge with Arg32 (TM1). Biophysical studies also show that Tyr68 is critical for affinity and specificity for peptides.[12] The closure of extracellular part results in formation of the OC state and the peptide molecule moves into the central cavity. The increase in strength of the salt bridge at the extracellular side results in weakening of the intracellular gating residues. Helices TM4, TM5, TM10, and TM11 increase in structural flexibility and thereby determine the functionally important conformational states to allow substrate transport. The proton and substrate translocates to the conserved ExxERxxxY motif and finally leaves the transporter into the cytoplasm of the cell. Our study is a first large-scale simulation of a member of the POT family. It is a first extensive analysis of the diversity of the conformational states of the protein and the many rare intermediate states. However, our study is based on equilibrium simulations in the apo state of PepTSo. The varying dynamics in the presence of the proton and peptide are yet to be demonstrated and understood in great detail. Our study opens new dimensions to obtain a mechanistic understanding into the POT family of proteins and to enable design and transport of peptido-mimetic drugs.

Methods

Molecular Dynamics Simulation

The crystal structure of PepTSo was used as a starting structure for MD simulation. The three-dimensional (3D) coordinates (PDB: 4UVM(10)) were obtained from the Protein Data Bank. The tleap program in AmberTools14[49] was used to build the MD system. The protein was solvated in a phospholipid bilayer (POPC) in an orthorhombic box containing TIP3P water molecules[50] in a periodic box size 98 × 98 × 119 Å3. A salt (NaCl) concentration of 0.15 M was used to neutralize the MD system. All chain termini were capped with neutral acetyl and methylamide groups. The standard protonation states was used for the titratable groups, and the final MD system contained approximately 110 000 atoms. The MD system was energy minimized for 20 000 steps using the conjugate gradient method, slowly heated from 0 to 300 K and equilibrated for 40 ns. The MD simulations were performed in constant NPT conditions at 300 K and 1 atm. The temperature was controlled using a Berendsen thermostat, and the pressure was maintained using a Berendsen barostat.[51] Long range electrostatic interaction was treated with the Particle Mesh Ewald method,[52] and bonds involving hydrogens were constrained using the SHAKE algorithm.[53] The nonbonded distance cutoff was set to 10 Å, and an integration step of 2 fs was used. All simulations were performed using the AMBER FF14SB force field.[54] An adaptive sampling approach was used to select the new starting structures for the subsequent MD runs to enhance the conformational sampling of the free energy landscape. For each round, the previous sampled data were clustered using the K-means algorithm based on the extracellular and intracellular experimental DEER residue pair distances, and the least populated states were chosen to conduct the next round of simulations. The sampling bias introduced in the data set from seeding new trajectories in this manner is eliminated in the way a Markov state model (discussed below) is constructed on the data.[55] Adaptive sampling is a widely used sampling methodology and has been used to predict novel conformations of the proteins, pathways of conformational change, protein folding, and even protein–protein association.[25,26,28,56−58] In addition to unbiased MD simulation data obtained as above, 5 μs of accelerated MD (aMD) simulation were also performed also using the adaptive sampling protocol (Table S1). For aMD, a boost potential (4 kcal/mol) was added to the dihedrals of the protein, and a further boost potential (0.2 kcal/mol) was added to the entire MD system. The integration step chosen is 3 fs.[59] The free energy landscape in shown in Figure S24. The aMD simulation data were clustered, and the starting structures were chosen for classical MD (cMD) to sample the conformational landscape efficiently (Table S2). The final cMD was performed for a total duration of ∼54 μs. Each individual MD trajectory is of ∼34 ns.

Markov State Model

Markov state models (MSM) are a powerful kinetic modeling technique that discretize the landscape explored using MD simulations into microstates and determine the transition probabilities between them. The trajectories are clustered into N microstates based on a geometric criterion. Next, a transition probability matrix, T(τ) is obtained for a lag-time, τ such that p(t + τ) = p(t)T(τ) where p(t) and p(t + τ) are vectors denoting the probability of each of the N states at times t and t + τ. We obtained the dependence of the slowest implied relaxation time-scales of the eigenvectors of T(τ) for multiple values of τ, which provides an estimate of the best Markovian lag-time to build our MSM for analysis. The MSM Builder3.4 Python package[60] was used to build the MSM on the PepTSo trajectory data. The seven transmembrane helical distances and three extracellular and intracellular residue pair distances were chosen as featurization metrics to construct an MSM (Figure S25). Twenty-four nanoseconds was determined to be a Markovian lag time from the implied time-scales plot (Figure S26). The number of clusters was chosen to be 200 as it yielded the highest Generalized Matrix Rayleigh Quotient (GMRQ) score while building multiple MSMs on varying this hyper-parameter (Figure S27).[44] TPT analysis was performed to obtain the top flux pathway (Figures S28 and S29).

Kinetic Monte Carlo

Kinetic Monte Carlo is a method for sampling from a kinetic model, which can be used to create trajectories of state-to-state dynamics. For any chosen initial state i, a transition to any state j from the set of all states in the MSM occurs with probability p from the MSM’s reversible maximum-likelihood transition matrix. This is implemented as follows: (1) generate a pseudorandom number between 0 and 1, (2) take a cumulative sum of p values over all possible j (S = ∑p), and if the pseudorandom number lies between S and S, (3) transition to state j = n + 1. This state, j, is added to the trajectory, and the process is repeated for the desired number of steps.

DEER Distance Distribution Analysis

To validate our predicted structures with the DEER experiments, we constructed an AMM[40] for each experimental DEER distribution as constraints using pyEMMA v2.4 + 936.g26d8e55.[41] The list of constraints is provided in Table S3. We extracted 50 structures from the highest weighted clusters (in the MSM or the AMM) for each of the following conformations: IF, OC, OF, partial IF-OC, partial OC-OF, and two wide-open states. Using the Python library RotamerConvolveMD,[32] we calculate the distance between the methane thiosulfonate spin label (MTSSL) probe conformations, mapped onto the rotamer library, for each pair of residues on all the structures. The raw distance information is then plotted as a histogram, and the distance range spanned by the distance distribution in order to compare to the experimental information.

Optimal DEER Residue-Pair Predictions

Although the DEER experimental method serves as an effective method to understand the dynamics of the proteins, the technique is quite challenging to the study of membrane proteins and requires substantial effort ranging from obtaining a large quantity of the expressed protein to choosing an appropriate residue for site directed mutation to attach MTSSL probe. In this study, we developed a method to obtain a set of residue pairs for cysteine mutation to attach the spin label MTSSL for DEER experiments. A total of ∼4300 sets of residue pairs were obtained, which also took into account the distance constraints of DEER experiments which do not allow the placement of probes on residues that would come closer than 15 Å or farther than 60 Å in any of the ∼54 μs of our MD simulation data set. 200 clusters were obtained by clustering the MD data based on the set of distances between the residue pairs using K-means algorithm. Next, an MSM was constructed at the previously chosen Markovian lag time of 24 ns. We obtained an optimal probe score for each of the 4300 MSMs constructed. A higher score indicates slower kinetics of the MSM and can be considered as a better model to understand the underlying dynamics of the system. The set of residues pairs that led to the highest ranked MSM was chosen as the optimal residue positions to attach probes. To predict best probe positions for future experiments, we have taken into consideration the limitations and constraints of the DEER experimental technique. Experiments can measure residues in the range of 15–60 Å.[61] It cannot measure the distribution across the membrane protein by placing probes on both intracellular and extracellular side. MTSSL probes can only be placed on solvent exposed residues and not within the transmembrane region as they would collide with the helical residues and lipid molecules of the membrane. Residue–residue distances are determined by the CA atom distances of the residues pairs through our ∼54 μs of MD simulation data. We divide the transport protein into two parts, the intracellular side and the extracellular side. On either side, the loop regions and ends of the helices are the possible residues where the probes can be attached. On the basis of the mentioned constraints, we create an N × N adjacency matrix where N is the number of residues. The matrix element N is 0, if probes cannot be positioned on residues i and j. The value is 1 if the probes can be positioned on residues i and j after all the checks. Further, a least effort DEER experiment would require that the number of mutations required be limited to as few as possible and hence place the least number of probes. Thus, from our adjacency matrix we determined a long list of 2000 sets, each of residues that can be simultaneously be spin labeled for DEER signal calculation. These sets are then assigned an optimal probe score. Up to this step, all our choices were with two or three residues because they were limited by residue pair sets which included only residues that are either on the intracellular half of the protein or on the extracellular half. To obtain a mixed set of probes (on both ends of the protein, but does not require any measurement across the protein), we chose the highest scored set of two residues and highest scored set of three residues. Both were on the intracellular side. We then created another list of mixed probe positions with these on the intracellular side and all possible choices from the extracellular side. We ranked these mixed sets using the same protocol. A similar methodology as described in the previous section has been followed to predict the distance distribution for each of the optimal probe position sets obtained using the Optimal Probes algorithm. We extracted 100 structures from each of the 7 minimum free energy states observed on the free energy landscape for PepTSo. Using the Python library RotamerConvolveMD,[32] we calculated the distances between the MTSSL probe conformations, mapped onto the rotamer library, for each pair of residues on all the structures. The raw distance information is then plotted as a histogram and compared to the experimental information.

Optimal Probe Score

The optimal probe score is the GMRQ (generalized matrix Rayleigh quotient) score, which is a method to assign scores to an MSM based on the hyper-parameters used for the MSM construction.[42,43] The objective of the GMRQ score is to assign a score to every MSM such that the dynamics observed by the MSM maximize the sum of the first m eigenvalues of the transition probability matrix. A higher ranked MSM will capture the slow kinetics of the system as compared to an MSM with lower GMRQ. To avoid overfitting, a cross-validation methodology is followed where 50% of the data was used to train the model, and the rest of the data are used to test the model. We exploit this method to rank our featurization metric of the MSM and hence obtain the best set of distances that would lead to a high ranked MSM. We have used the mean of five cross-validated test set GMRQ scores as the optimal probe score. We then suggest these residue pairs as a metric for future experiments. A Web site for the implementation of the method is available at https://github.com/ShuklaGroup/optimalProbes/wiki.

53 in total

Review 1. Ins and outs of major facilitator superfamily antiporters.

Authors: Christopher J Law; Peter C Maloney; Da-Neng Wang
Journal: Annu Rev Microbiol Date: 2008 Impact factor: 15.500

2. Identification of the histidine residues involved in substrate recognition by a rat H+/peptide cotransporter, PEPT1.

Authors: T Terada; H Saito; M Mukai; K I Inui
Journal: FEBS Lett Date: 1996-09-30 Impact factor: 4.124

3. HOLE: a program for the analysis of the pore dimensions of ion channel structural models.

Authors: O S Smart; J G Neduvelil; X Wang; B A Wallace; M S Sansom
Journal: J Mol Graph Date: 1996-12

Review 4. Water-transporting proteins.

Authors: Thomas Zeuthen
Journal: J Membr Biol Date: 2009-11-30 Impact factor: 1.843

5. Coarse-grained simulations of proton-dependent conformational changes in lactose permease.

Authors: Yead Jewel; Prashanta Dutta; Jin Liu
Journal: Proteins Date: 2016-05-03

Review 6. Di/tri-peptide transporters as drug delivery targets: regulation of transport under physiological and patho-physiological conditions.

Authors: C U Nielsen; B Brodin
Journal: Curr Drug Targets Date: 2003-07 Impact factor: 3.465

7. Crystal structure of a prokaryotic homologue of the mammalian oligopeptide-proton symporters, PepT1 and PepT2.

Authors: Simon Newstead; David Drew; Alexander D Cameron; Vincent L G Postis; Xiaobing Xia; Philip W Fowler; Jean C Ingram; Elisabeth P Carpenter; Mark S P Sansom; Michael J McPherson; Stephen A Baldwin; So Iwata
Journal: EMBO J Date: 2010-12-03 Impact factor: 11.598

8. Protein conformational plasticity and complex ligand-binding kinetics explored by atomistic simulations and Markov models.

Authors: Nuria Plattner; Frank Noé
Journal: Nat Commun Date: 2015-07-02 Impact factor: 14.919

9. Proton movement and coupling in the POT family of peptide transporters.

Authors: Joanne L Parker; Chenghan Li; Allete Brinth; Zhi Wang; Lutz Vogeley; Nicolae Solcan; Gregory Ledderboge-Vucinic; Jessica M J Swanson; Martin Caffrey; Gregory A Voth; Simon Newstead
Journal: Proc Natl Acad Sci U S A Date: 2017-11-27 Impact factor: 11.205