
Topography prediction of helical transmembrane proteins by a new modification of the sliding window method.

Maria N Simakova, Nikolai N Simakov.

Abstract

A protein's function is determined by its three-dimensional structure, which is usually obtained by X-ray crystallography. Because membrane proteins are difficult to handle experimentally, structures have so far been determined for only a very small fraction of them (<4%). Nevertheless, investigating the structure and function of membrane proteins is important for medicine and pharmacology and is therefore of significant interest. Computer modeling methods based on the primary protein structure, that is, the symbolic amino acid sequence, have become a viable alternative to X-ray crystallography for investigating the structure of membrane proteins. Here we present the results of a study of 35 transmembrane proteins, mainly GPCRs, using a novel method of cascade averaging of the hydrophobicity function within a sliding window. The proposed method revealed 139 of the 140 transmembrane domains (99.3%) identified by other methods. In addition, 236 of 280 transmembrane domain boundary positions (84%) were predicted with a deviation from the predictions of other methods that does not exceed the detection error of the proposed method.

Year:  2014        PMID: 24900999      PMCID: PMC4034515          DOI: 10.1155/2014/921218

Source DB:  PubMed          Journal:  Biomed Res Int            Impact factor:   3.411


1. Introduction

The study of membrane proteins, including GPCRs, is relevant for the following reasons. Membrane proteins are responsible for many cellular functions and processes, in particular the selective exchange of substances between the cell and its environment, the maintenance of the electric potential inside and outside the cell, and the transfer of electric signals into and out of the cell. They participate in nearly all energy transduction processes in the organism. A protein's function is determined by its three-dimensional structure, which is usually obtained by X-ray crystallography [1, 2]. This method is applied directly to protein crystals, which must be produced beforehand using a very complex and laborious technique. Membrane proteins are difficult to handle during production, purification, and crystallization because of their instability, unfolding, aggregation, and heterogeneity; as a result, their structures are hard to solve experimentally, and to date structures have been determined for only a very small fraction of membrane proteins (<4%). It is assumed that all information about the final structure of a protein is contained in its amino acid sequence. Therefore, computer modeling methods based on the primary protein structure, that is, the symbolic amino acid sequence, have become a viable alternative to X-ray crystallography for studying the structure of membrane proteins [3]. Among the variety of membrane proteins, the group of integral polytopic proteins (transmembrane proteins, TMPs), which have multiple hydrophobic domains spanning the membrane, is of considerable interest. Many of these proteins function as gateways or “loading docks” that transport specific substances and relay signals across the biological membrane.
A characteristic and inherent feature of α-helical membrane proteins is the (possibly periodic) repetition of transmembrane domains consisting of hydrophobic amino acids (15–30 aa in length) [4]. If this repetition is periodic, it can be detected by the Fourier transform applied to a numerical representation of the symbolic amino acid sequence of the protein, as was done in our previous works [4, 5]. If the repetition of transmembrane regions is aperiodic, it can be revealed by another method: repeated (four to five times) averaging of the protein hydrophobicity function within a window of up to 9–11 amino acids that moves along the sequence. This method is an advanced version of the well-known sliding window method; it was proposed and used in our previous work [4] to investigate the secondary structure of various membrane proteins. The aim of the present work is to apply this method to predict the characteristics of unknown secondary structures of TMPs, mainly GPCRs; these characteristics determine the functional properties of the proteins. G protein-coupled receptors (GPCRs), also known as seven-transmembrane domain receptors, comprise the largest family of membrane proteins in the human genome and the richest source of targets for the pharmaceutical industry [6]. Over 800 unique GPCRs have been identified by analysis of the human genome sequence, approximately 460 of which are predicted to be olfactory receptors [7, 8]. The physiological function of a large fraction of these 800 GPCRs is unknown. There are many obstacles to obtaining GPCR structures by X-ray crystallography; the major difficulties include poor protein stability and a lack of homogeneity during crystallization due to inherent properties of these receptors [6, 9, 10]. Therefore, novel approaches to resolving structural aspects of their biology are needed [11-13].
One such approach is to screen these proteins with the help of structural bioinformatics and computer modeling in order to identify those with the best characteristics for structural studies and crystallography trials.

2. Materials and Methods

We used the method of repeated averaging of the hydrophobicity function within a sliding window moving over the amino acid sequence. Since TM domains (TMDs) consist predominantly of hydrophobic amino acids, the average hydrophobicity of such a region, specified along the protein sequence by a function f(k) = H[i(k)] of the amino acid position k, must be higher than that of the two hydrophilic topological domains (TPDs) adjacent to it. Furthermore, this local property does not depend on whether the TMDs and TPDs are arranged periodically in the amino acid sequence. Here, i(k) = 1, 2, …, 20 is the index (Table 1) of the amino acid, one of the 20 standard ones, located at position k in the protein sequence.
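Table 1

Hydrophobicity scales H(i).

i | Code | Abbreviation | Name | H_1(i) [14] | H_2(i) [19] | H_3(i) [19] | H_4(i) [17, 19] | H_5(i) [18] | H_6(i) [16] | H_7(i) [20]
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
1 | A | Ala | Alanine | 1.8 | 0 | 0 | 0.74 | 0.62 | 1.60 | −0.17
2 | C | Cys | Cysteine | 2.5 | 1 | 1 | 0.91 | 0.29 | 2.00 | 0.24
3 | D | Asp | Aspartic acid | −3.5 | 0 | −1 | 0.62 | −0.90 | −9.20 | −1.23
4 | E | Glu | Glutamic acid | −3.5 | 0 | −1 | 0.62 | −0.74 | −8.20 | −2.02
5 | F | Phe | Phenylalanine | 2.8 | 1 | 1 | 0.88 | 1.19 | 3.70 | 1.13
6 | G | Gly | Glycine | −0.4 | 0 | −1 | 0.72 | 0.48 | 1.00 | −0.01
7 | H | His | Histidine | −3.2 | 0 | 0 | 0.78 | −0.40 | −3.00 | −0.96
8 | I | Ile | Isoleucine | 4.5 | 1 | 1 | 0.88 | 1.38 | 3.10 | 0.31
9 | K | Lys | Lysine | −3.9 | 0 | −1 | 0.52 | −1.50 | −8.80 | −0.99
10 | L | Leu | Leucine | 3.8 | 1 | 1 | 0.85 | 1.06 | 2.80 | 0.56
11 | M | Met | Methionine | 1.9 | 1 | 1 | 0.85 | 0.64 | 3.40 | 0.23
12 | N | Asn | Asparagine | −3.5 | 0 | −1 | 0.63 | −0.78 | −4.80 | −1.23
13 | P | Pro | Proline | −1.6 | 0 | −1 | 0.64 | 0.12 | −0.20 | −0.45
14 | Q | Gln | Glutamine | −3.5 | 0 | −1 | 0.62 | −0.85 | −4.10 | −0.58
15 | R | Arg | Arginine | −4.5 | 0 | −1 | 0.64 | −2.53 | −12.3 | −0.81
16 | S | Ser | Serine | −0.8 | 0 | −1 | 0.66 | −0.18 | 0.60 | −0.13
17 | T | Thr | Threonine | −0.7 | 0 | −1 | 0.70 | −0.05 | 1.20 | −0.14
18 | V | Val | Valine | 4.2 | 1 | 1 | 0.86 | 1.08 | 2.60 | −0.07
19 | W | Trp | Tryptophan | −0.9 | 1 | 1 | 0.85 | 0.81 | 1.90 | 1.85
20 | Y | Tyr | Tyrosine | −1.3 | 0 | 0 | 0.76 | 0.26 | −0.70 | 0.94
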
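
As a minimal sketch of how the function f(k) = H[i(k)] can be built from a sequence, the Python snippet below maps one-letter amino acid codes to the H_1(i) column of Table 1. The function name `hydrophobicity_profile` and the short example peptide are illustrative assumptions and do not come from the original work.

```python
# H_1(i) values copied from Table 1 (scale of [14]), keyed by one-letter code.
H1 = {
    "A": 1.8, "C": 2.5, "D": -3.5, "E": -3.5, "F": 2.8,
    "G": -0.4, "H": -3.2, "I": 4.5, "K": -3.9, "L": 3.8,
    "M": 1.9, "N": -3.5, "P": -1.6, "Q": -3.5, "R": -4.5,
    "S": -0.8, "T": -0.7, "V": 4.2, "W": -0.9, "Y": -1.3,
}


def hydrophobicity_profile(sequence, scale=H1):
    """Return f(k) = H[i(k)] for every position k of the sequence."""
    return [scale[residue] for residue in sequence.upper()]


# Hypothetical four-residue peptide, used only to show the mapping.
profile = hydrophobicity_profile("MLVK")
print(profile)
```

Any other column of Table 1 can be passed as `scale` in the same way.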
This idea was first implemented in [14], where the function f(k) was averaged within a segment, or window, of width d = 5, 7, 9, 11, or 13 amino acids moving along the amino acid sequence. The result of averaging was assigned to the member of a new numerical sequence f_1(k) whose number k corresponds to the current position of the window midpoint. The hydrophobicity scale H(i) used in this method can be specified in different ways (Table 1), depending on the physically measured quantity that characterizes this property [14-20]. In [14-16], the change in the free energy of amino acid side groups upon their transfer from a hydrophobic medium into water was used as a measure of hydrophobicity. In [17, 19], the hydrophobicity scale was defined as the function H_4(i) = 1 − 〈A〉/A_0 (Table 1), based on the amino acid surface area A_0(i) accessible to solvent in the standard state and the mean solvent-accessible surface area 〈A(i)〉 in a folded protein conformation. In [17], a correlation between the free energy value and the solvent-accessible surface area was established. The set of 20 amino acids can be divided into a few characteristic groups by degree of hydrophobicity in different ways. Following [19], we divided the 20 amino acids into three groups: hydrophobic (C, F, I, L, M, V, and W, seven in total), hydrophilic (D, E, G, K, N, P, Q, R, S, and T, ten in total), and neutral (A, H, and Y, three in total). The hydrophobic amino acids were assigned a value of +1, the hydrophilic ones −1, and the neutral ones 0, which gives the crude scale H_3(i) in Table 1. On another crude scale, H_2(i), the hydrophobic amino acids were assigned a value of +1 and all remaining amino acids a value of 0.
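
The two crude scales can be derived directly from the three residue groups. The sketch below is one possible way to do this in Python; the names `HYDROPHOBIC`, `HYDROPHILIC`, `NEUTRAL`, and `crude_scales` are assumptions introduced for illustration.

```python
# Residue groups as listed in the text, following [19].
HYDROPHOBIC = set("CFILMVW")     # seven residues
HYDROPHILIC = set("DEGKNPQRST")  # ten residues
NEUTRAL = set("AHY")             # three residues


def crude_scales():
    """Build the crude scales H_2 (+1 / 0) and H_3 (+1 / 0 / -1)."""
    h2, h3 = {}, {}
    for residue in HYDROPHOBIC | HYDROPHILIC | NEUTRAL:
        if residue in HYDROPHOBIC:
            h2[residue], h3[residue] = 1, 1
        elif residue in HYDROPHILIC:
            h2[residue], h3[residue] = 0, -1
        else:  # neutral residues
            h2[residue], h3[residue] = 0, 0
    return h2, h3


H2, H3 = crude_scales()
print(H3["W"], H3["G"], H3["A"])
```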
In our previous work [4], we proposed a procedure, different from that used in [14], for averaging the function f(k) on the scale H(i). The averaging was carried out not once but repeatedly, each new averaging being performed on the previous function f_{n−1}(k) over a wider window of width d = 2n + 1:

$$f_n(k) = \frac{1}{2n+1} \sum_{j=-n}^{n} f_{n-1}(k+j), \quad n = 1, 2, \ldots \tag{1}$$

Thus, the first averaging was over three elements, the second over five elements, and so on. In our opinion, the best result was obtained at n = 4, with the final averaging over a window of width d = 9 amino acids (sometimes at n = 5 and d = 11 amino acids). It is instructive to compare the values of the functions f_n(k) with a characteristic value of the initial hydrophobicity function f_0(k) = f(k), namely its arithmetic mean calculated over the entire length L of the protein chain:

$$u = \langle f(k) \rangle = \frac{1}{L} \sum_{k=1}^{L} f_0(k).$$

For the major part of each hydrophobic region, in particular a TMD, the relation f_n(k) > u must hold, whereas in a hydrophilic region (TPD) the opposite relation f_n(k) < u must hold. The hydrophobicity scale and function can be specified in different ways (more than 30 scales are known). A comparison of different hydrophobicity scales and functions carried out in our previous work [4] showed that the numbers and arrangements of transmembrane regions obtained with them were often almost identical, even for very simple (rough) scales such as H_2(i) and H_3(i) (see Table 1). However, a particular scale can sometimes be preferable for a given protein because it better resolves closely spaced TMDs.
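
The repeated averaging can be sketched in a few lines of Python. This is a reading of the verbal description (successive centered windows of width 3, 5, …, 2n + 1, each applied to the result of the previous pass), not the authors' implementation. The function names and the treatment of the chain ends, where the window is simply truncated, are assumptions.

```python
def cascade_average(f0, n):
    """Apply n successive centered moving averages of widths 3, 5, ..., 2n + 1."""
    f = list(f0)
    length = len(f)
    for m in range(1, n + 1):
        smoothed = []
        for k in range(length):
            # Assumption: near the chain ends the window is truncated
            # to the positions that actually exist.
            lo = max(0, k - m)
            hi = min(length, k + m + 1)
            window = f[lo:hi]
            smoothed.append(sum(window) / len(window))
        f = smoothed
    return f


def mean_level(f0):
    """Arithmetic mean u of the initial hydrophobicity function."""
    return sum(f0) / len(f0)


# Toy profile with a single spike, used only to show one smoothing pass.
toy = [0, 0, 0, 3, 0, 0, 0]
print(cascade_average(toy, 1))
print(mean_level(toy))
```

With n = 4 the last pass uses a window of 9 positions, which corresponds to the setting the text reports as giving the best results.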

3. Results and Discussion

3.1. Testing of the Improved Method of a Sliding Window on Proteins with Known Structure

The improved sliding window method proposed in [4], algorithm (1), was applied in this work to a group of membrane proteins comprising GPCRs and some other transmembrane α-helical proteins. To test its predictions, the method was first applied to 5 proteins of known structure (Table 2).
Table 2

Comparison of TMD boundaries calculated upon processing of hydrophobicity functions f_n(k) at n = 3, 4, 5 on H_N(i) (N = 3 and 5) scales for GPCRs with known data from [21].

Protein name, code, length | Data source or scale, level | TMD 1 | TMD 2 | TMD 3 | TMD 4 | TMD 5 | TMD 6 | TMD 7
--- | --- | --- | --- | --- | --- | --- | --- | ---
GLR_HUMAN, P47871, 477 aa | [21, 22] | 137–161 | 174–198 | 226–249 | 264–285 | 304–326 | 351–369 | 382–402
 | H_5(i), n = 4, u = 0.266 | 143–166 | 180–192 | 218–257 | 261–288 | 303–327 | 353–368 | 384–401
CRFR1_HUMAN, P34998, 444 aa | [21, 23] | 112–142 | 179–203 | 219–247 | 255–282 | 299–324 | 336–360 | 368–397
 | H_3(i), n = 5, u = −0.052 | 116–146 | 178–204 | 217–247 | 255–280 | 302–325 | 344–362 | 370–397
ADRB1_MELGA, P07700, 483 aa | [21, 24] | 39–67 | 77–103 | 116–137 | 156–179 | 206–231 | 286–315 | 321–343
 | H_3(i), n = 4, u = 0.1 | 44–64 | 81–99 | 108–138 | 160–181 | 214–229 | 293–314 | 320–331
5HT1B_HUMAN, P28222, 390 aa | [21, 25] | 50–75 | 85–110 | 124–145 | 166–187 | 206–228 | 316–336 | 350–371
 | H_5(i), n = 4, u = 0.243 | 46–72 | 86–109 | 119–145 | 168–185 | 205–230 | 316–340 | 343–369
 | H_5(i), n = 3, u = 0.182 | 45–73 | 85–110 | 118–145 | 168–185 | 205–229 | 316–340 | 344–370
5HT2B_HUMAN, P41595, 481 aa | [21, 26] | 57–79 | 91–113 | 130–151 | 172–192 | 217–239 | 325–345 | 361–382
 | H_5(i), n = 5, u = 0.164 | 54–81 | 89–109* | 131–151* | 173–194 | 215–243 | 325–352 | 356–381

* Boundaries restored from the unresolved combined segment 89–151 according to Remark 1 (bold font in the original table).

Figure 1 shows the results of averaging the hydrophobicity function for the protein sequence P47871 on the scale H_5(i) in Table 1. A hydrophobic segment in the form of a narrow peak, corresponding to the signal peptide (SP), is clearly present at the left edge of the graph of the function f_4(k). If this peak is excluded, the remaining seven wide peaks that exceed the mean level u = const = 0.27 correspond exactly to the 7 TMDs in the resolved structure of this protein [21, 22]. In the graph of the function f_2(k), the 2nd, 3rd, 5th, and 7th TMDs are not yet resolved, and several narrow peaks appear in their places.
Figure 1

Hydrophobicity functions f_n(k) for the protein P47871 in Table 2 after averaging at n = 2 and n = 4 on the scale H_5(i) in Table 1; the dotted line shows the level u = const = 0.266.

Figure 2 shows the results obtained for the protein sequence P34998 using the relatively rough hydrophobicity scale H_3(i) in Table 1. A hydrophobic segment corresponding to the SP is revealed at the left edge of the graph of the function f_5(k) above the mean level u = 〈f(k)〉 = −0.05; in addition, in contrast to the function f_2(k), all 7 TMDs known from the structure of P34998 [21, 23] are resolved.
Figure 2

Hydrophobicity functions f_n(k) for the protein P34998 in Table 2 after averaging at n = 2 and n = 5 on the scale H_3(i) in Table 1; the dotted line shows the level u = const = 〈f(k)〉 = −0.052.

The boundaries of the TMDs of different proteins were determined from the intersections of the graph of the function f_n(k) with the straight line of some level u = const (e.g., the mean level u = 〈f(k)〉 for the whole protein sequence). They are summarized in Table 2 for 5 known proteins, together with the TMD boundaries from [21] for comparison. Given the error Δk ≈ d/2 ≈ 5–6 in detecting a TMD boundary position k, the calculated boundary positions agree well with the data from [21]. Indeed, according to Table 2, 34 of 35 TMDs (97%) were resolved, and for 62 of 70 boundaries (89%) the deviation of the obtained position does not exceed the detection error (Δk ≤ 6).
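
A minimal Python sketch of this boundary detection and of the agreement check is given below. The function names are assumptions, and the example boundaries are the GLR_HUMAN (P47871) rows of Table 2: the data from [21, 22] and the prediction on the H_5(i) scale at n = 4.

```python
def segments_above(f, u):
    """Return 1-based (start, end) runs of positions where f(k) > u."""
    segments, start = [], None
    for k, value in enumerate(f, start=1):
        if value > u and start is None:
            start = k                       # run begins
        elif value <= u and start is not None:
            segments.append((start, k - 1))  # run ended at previous position
            start = None
    if start is not None:                   # run reaches the end of the chain
        segments.append((start, len(f)))
    return segments


def boundaries_within(predicted, reference, tolerance=6):
    """Count boundaries deviating from the reference by at most `tolerance`."""
    hits = 0
    for (p_start, p_end), (r_start, r_end) in zip(predicted, reference):
        hits += int(abs(p_start - r_start) <= tolerance)
        hits += int(abs(p_end - r_end) <= tolerance)
    return hits


reference = [(137, 161), (174, 198), (226, 249), (264, 285),
             (304, 326), (351, 369), (382, 402)]
predicted = [(143, 166), (180, 192), (218, 257), (261, 288),
             (303, 327), (353, 368), (384, 401)]
hits = boundaries_within(predicted, reference)
print(hits, "of", 2 * len(reference), "boundaries within the tolerance")
```

The comparison pairs domains by their order, so it assumes that both lists contain the same number of domains.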

Remark 1

In the protein P41595, the 2nd and 3rd domains were not resolved in the calculation and merged into a combined segment of 89–151 aa. They can be recovered from the outer boundaries of this segment by adding the estimated average domain length of 20 aa to the left border k = 89 and subtracting it from the right border k = 151, as highlighted in Table 2. In [21], a signal peptide (SP) comprising residues 1–25 is indicated in the structure of the protein P47871; in this part of the chain, the proposed method detected a hydrophobic region of 11–23 aa. Similarly, the sequence of the protein P34998 [21] contains a signal peptide comprising residues 1–23, where the proposed method revealed a hydrophobic region of 9–19 aa. It is worth noting that repeated (four to five times) averaging of the hydrophobicity function f(k) on different scales (the rough scales H_2(i) and H_3(i) or the more precise scales H_4(i)–H_7(i)) produces different values for the TMD boundaries. Sometimes these differences are minor, but sometimes they are significant [4].
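
The recovery rule of Remark 1 amounts to simple arithmetic on the outer borders of the merged segment. A short Python sketch follows; the function name `split_merged_segment` is an assumption, and the default of 20 aa is the estimated average domain length stated in the remark.

```python
def split_merged_segment(left, right, domain_length=20):
    """Recover two domains from the outer borders of a merged segment."""
    first = (left, left + domain_length)     # extends inward from the left border
    second = (right - domain_length, right)  # extends inward from the right border
    return first, second


# Merged segment 89-151 aa of the protein P41595.
print(split_merged_segment(89, 151))
```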

3.2. Comparison of Protein Secondary Structure Predictions Made by the Proposed Method and Other Techniques

The secondary structure predictions made by the proposed method for a set of 20 membrane proteins of the GPCR class were compared with the predictions made by other methods (Table 3).
Table 3

Comparison of TMD boundaries calculated upon processing of hydrophobicity functions f_n(k) at n = 4, 5 on H_N(i) (N = 3, 5, 6) scales for GPCRs with known data from [21].

Protein name, code, length | Data source or scale, level | TMD 1 | TMD 2 | TMD 3 | TMD 4 | TMD 5 | TMD 6 | TMD 7
--- | --- | --- | --- | --- | --- | --- | --- | ---
S1PR1_HUMAN, P21453, 382 aa | [21], by similarity | 47–71 | 79–107 | 122–140 | 160–185 | 202–222 | 256–277 | 294–314
 | H_5(i), n = 5, u = 0.25 | 48–69 | 83–107 | 122–142 | 160–195 | 199–223 | 255–281 | 293–310
ACM2_HUMAN, P08172, 466 aa | [21], by similarity | 23–45 | 60–80 | 98–119 | 140–162 | 185–207 | 389–409 | 424–443
 | H_3(i), n = 5, u = 0.07 | 21–48 | 60–85 | 90–122 | 142–167 | 192–208 | 389–415 | 422–429
ACM3_RAT, P08483, 589 aa | [21], by similarity | 67–90 | 104–124 | 142–163 | 184–206 | 229–251 | 492–512 | 527–546
 | H_5(i), n = 5, u = 0.30 | 62–92 | 105–128 | 137–161 | 187–208 | 221–249 | 492–515 | 526–541
CXCR1_HUMAN, P25024, 350 aa | [21], potential | 40–66 | 76–96 | 112–133 | 155–174 | 200–220 | 243–264 | 286–308
 | H_3(i), n = 4, u = −0.05 | 39–67 | 76–96 | 102–141 | 152–175 | 199–230 | 241–267 | 291–308
CCR5_HUMAN, P51681, 352 aa | [21], potential | 31–58 | 69–89 | 103–124 | 142–166 | 199–218 | 236–260 | 278–301
 | H_5(i), n = 5, u = 0.25 | 33–56 | 68–93 | 100–136 | 141–164 | 196–218 | 238–264 | 288–299
HRH1_HUMAN, P35367, 487 aa | [21], potential | 30–49 | 64–83 | 102–123 | 146–165 | 190–210 | 419–438 | 451–470
 | H_5(i), n = 5, u = 0.17 | 25–50 | 63–93 | 96–122 | 147–167 | 188–212 | 418–442 | 449–469
OPRK_HUMAN, P41145, 380 aa | [21], potential | 59–85 | 96–117 | 133–154 | 174–196 | 223–247 | 276–299 | 312–333
 | H_6(i), n = 4, u = 0.50 | 56–83 | 99–122 | 143–151 | 180–195 | 227–248 | 277–300 | 302–320
OPRM_MOUSE, P42866, 398 aa | [21], potential | 65–94 | 104–121 | 144–163 | 194–209 | 235–257 | 281–303 | 312–328
 | H_3(i), n = 4, u = −0.02 | 68–95 | 105–114 | 136–162 | 187–205 | 229–262 | 280–306 | 317–325
OPRD_MOUSE, P32300, 372 aa | [21], potential | 46–75 | 85–102 | 125–144 | 175–190 | 216–238 | 262–284 | 294–310
 | H_5(i), n = 5, u = 0.213 | 44–74 | 85–102 | 112–142 | 167–187 | 211–236 | 263–286 | 296–319
OPRX_HUMAN, P41146, 370 aa | [21], potential | 51–77 | 88–109 | 125–146 | 166–188 | 212–236 | 265–288 | 301–322
 | H_3(i), n = 5, u = 0.011 | 42–79 | 90–107 | 112–130 | 172–186 | 212–241 | 263–284 | 301–335
NTR1_RAT, P20789, 424 aa | [21], potential | 65–87 | 97–121 | 144–165 | 189–210 | 236–260 | 309–330 | 349–372
 | H_5(i), n = 5, u = 0.144 | 63–86 | 103–139 | 154–172 | 191–208 | 220–268 | 306–324 | 338–374
PAR1_HUMAN, P25116, 425 aa | [21], potential | 103–128 | 138–157 | 177–198 | 219–239 | 269–288 | 312–334 | 351–374
 | H_3(i), n = 4, u = 0.100 | 101–133 | 136–158 | 175–208 | 221–238 | 270–296 | 313–338 | 350–371
O51E1_HUMAN, Q8TCB6, 317 aa | [21], potential | 28–48 | 57–77 | 102–122 | 142–162 | 199–219 | 239–259 | 275–295
 | H_5(i), n = 5, u = 0.300 | 12–49 | 60–77 | 80–120 | 146–166 | 198–227 | 243–260 | 276–292
SMO_HUMAN, Q99835, 787 aa | [21], potential | 234–254 | 263–283 | 315–335 | 359–379 | 403–423 | 452–472 | 525–545
 | H_3(i), n = 5, u = 0.00 | 236–251 | 264–283 | 313–340 | 362–380 | 403–425 | 451–473 | 519–545
GP160_HUMAN, Q9UJ42, 338 aa | [21], potential | 24–44 | 59–79 | 94–114 | 137–157 | 178–198 | 245–265 | 269–289
 | H_5(i), n = 5, u = 0.420 | 26–40 | 59–81 | 97–118 | 139–157 | 182–202 | 244–271 | 274–292
HRH3_HUMAN, Q9Y5N1, 445 aa | [21], potential | 40–60 | 71–91 | 109–129 | 157–177 | 197–217 | 360–380 | 396–416
 | H_3(i), n = 5, u = 0.00 | 33–61 | 72–95 | 105–132 | 155–173 | 191–222 | 360–388 | 395–416
HRH4_HUMAN, Q9H3N8, 390 aa | [21], potential | 20–40 | 53–73 | 88–108 | 132–152 | 173–193 | 305–325 | 342–362
 | H_5(i), n = 5, u = 0.25 | 16–41 | 55–79 | 83–107 | 130–153 | 169–198 | 305–331 | 341–357
RAI3_HUMAN, Q8NFJ5, 357 aa | [21], potential | 34–54 | 69–89 | 98–118 | 130–150 | 177–197 | 213–233 | 248–268
 | H_5(i), n = 4, u = 0.195 | 26–53 | 68–92 | 96–118 | 130–155 | 178–202 | 213–233 | 246–265
VN1R1_HUMAN, Q9GZP7, 353 aa | [21], potential | 57–77 | 85–105 | 133–153 | 170–190 | 227–247 | 275–295 | 304–324
 | H_4(i), n = 5, u = 0.754 | 53–77 | 90–103 | 122–145 | 165–188 | 222–245 | 274–301 | 306–338
APJ_HUMAN, P35414, 380 aa | [21], potential | 27–51 | 67–91 | 101–125 | 145–166 | 201–221 | 245–271 | 285–308
 | H_3(i), n = 5, u = −0.090 | 30–52 | 67–85 | 98–135 | 147–167 | 208–228 | 246–266* | 292–312*

* Boundaries restored from the unresolved combined segment 246–312 according to Remark 1 (bold font in the original table).

As can be seen from Table 3, the proposed method revealed 139 of the 140 TMDs (99.3%) identified by other methods. In the protein P35414 (the last one in Table 3), the 6th and 7th domains merged into one long stretch of 246–312 aa. However, following Remark 1, the boundaries of these two domains can be recovered from the outer boundaries of the combined segment by adding the estimated average domain length of 20 aa to the left border k = 246 and subtracting it from the right border k = 312, as highlighted in Table 3. In total, 236 of 280 TMD boundary positions (84%) were predicted by the proposed method with a deviation from the predictions of other methods that does not exceed the detection error of the method (Δk ≤ 6). In [21], a signal peptide (SP) comprising residues 1–21 is indicated in the structure of the protein P25116; in this part of the chain, the proposed method detected a hydrophobic region of 6–17 aa. Similarly, the sequence of the protein Q99835 [21] contains a signal peptide comprising residues 1–27, where the proposed method revealed a hydrophobic region of 13–23 aa.

3.3. Predictions of Unknown Secondary Structure of GPCRs and Other Membrane Proteins

The proposed method of multiple averaging of the hydrophobicity function was then used to predict the location of hydrophobic regions, including TMDs, in several GPCRs of unknown structure. The results are shown in Table 4.
Table 4

Prediction of TMD boundaries calculated upon processing of hydrophobicity functions f_n(k) at n = 4, 5 on H_N(i) (N = 3, 4, 5) scales for GPCRs.

Protein name, code, length | Scale, level | Region 1 | Region 2 | Region 3 | Region 4 | Region 5 | Region 6 | Region 7
--- | --- | --- | --- | --- | --- | --- | --- | ---
A4D1U0_HUMAN, A4D1U0, 299 aa | H_5(i), n = 5, u = 0.439 | 7–28 | 45–70 | 82–102 | 127–147 | 173–194 | 222–240 | 253–274
 | H_3(i), n = 5, u = 0.057 | 7–29 | 46–71 | 77–103 | 124–144 | 179–194 | 222–237 | 258–275
A5Z1T7_HUMAN, A5Z1T7, 300 aa | H_4(i), n = 5, u = 0.755 | 7–27 | 43–57 | 75–100 | 121–146 | 185–210 | 225–240 | 263–274
 | H_3(i), n = 5, u = −0.043 | 7–26 | 41–64 | 75–97 | 123–144 | 185–209 | 225–238 | 264–274
B5B0C2_HUMAN, B5B0C2, 337 aa | H_5(i), n = 5, u = 0.142 | 14–40 | 49–72 | 85–122 | 132–155 | 189–201 | 227–255 | 275–293
 | H_3(i), n = 5, u = −0.030 | 14–39 | 51–71 | 89–109* | 134–154* | 193–205 | 226–256 | 277–292
M9TID6_9BETA, M9TID6, 347 aa | H_3(i), n = 4, u = 0.055 | 43–57 | 69–88 | 97–123 | 149–161 | 188–219 | 232–262 | 265–288
 | H_5(i), n = 5, u = 0.191 | 33–56 | 66–87 | 100–123 | 148–164 | 186–217 | 233–253* | 275–295*
Q76L88_HUMAN, Q76L88, 321 aa | H_5(i), n = 5, u = 0.201 | 11–40 | 54–78 | 93–116 | 156–178 | 196–223 | 251–270 |
 | H_3(i), n = 5, u = −0.050 | 13–37 | 55–78 | 90–117 | 153–174 | 198–225 | 248–282 |

The columns list the hydrophobic regions, including TMDs, in order along the sequence.
* Boundaries restored from an unresolved combined segment (89–154 for B5B0C2, 233–295 for M9TID6) according to Remark 1 (bold font in the original table).

At least two hydrophobicity scales H(i) were applied to make predictions for each of the 5 proteins. These predictions are consistent with each other for most of the domain boundaries, given the detection error Δk = ±6. For the protein B5B0C2, the calculation on the H_5(i) scale resolved the 3rd and 4th domains, whereas on the H_3(i) scale they merged into a single domain. The reverse was observed for the 6th and 7th TMDs of the protein M9TID6. Following Remark 1, the boundaries of unresolved domains can be restored, as highlighted in Table 4. Surprisingly, for the protein Q76L88 only 6 domains, rather than 7 as for the other proteins in Table 4, were reliably detected by the criterion that f_n(k) exceeds the mean level u = 〈f(k)〉. Table 5 shows the TMDs predicted by the proposed method for 4 α-helical membrane proteins of unknown structure. The first two proteins (P71044 and P49785) belong to the group of intercellular channels, the third (Q8TMG0) to the group of methyltransferases, and the fourth (P77335) to the group of adventitious membrane proteins (alpha-helical pore-forming toxins).
Table 5

Prediction of hydrophobic regions and TMDs calculated upon processing of hydrophobicity functions f_n(k) at n = 4, 5 on H_N(i) (N = 3, 4, 5) scales for α-helical membrane proteins. For each protein, the rows give the data source or the scale and level, followed by the boundaries of the hydrophobic regions, including TMDs, in order along the sequence.

SP2Q_BACSU, P71044, 283 aa
  [21], potential: 22–42
  H_5(i), n = 5, u = −0.127: 20–47, 70–94, 107–124, 130–175, 207–229
  H_4(i), n = 5, u = 0.696: 16–48, 70–94, 109–121, 132–174, 197–225

SP3AH_BACSU, P49785, 218 aa
  [21], potential: 7–26
  H_5(i), n = 5, u = −0.137: 3–31, 92–106, 146–179, 193–211
  H_4(i), n = 5, u = 0.692: 3–30, 95–113, 146–179, 193–211

Q8TMG0_METAC, Q8TMG0, 194 aa
  H_5(i), n = 5, u = 0.232: 7–20, 49–67, 76–93, 130–162
  H_3(i), n = 5, u = 0.04: 10–22, 45–62, 77–91, 127–163

HLYE_ECOLI, P77335, 303 aa
  [21], potential: 183–203
  H_3(i), n = 5, u = −0.248: 0–17, 24–38, 82–103, 114–123, 180–209, 242–247, 264–280
  H_5(i), n = 4, u = 0.029: 5–26, 32–40, 81–102, 115–123, 179–208, 242–253, 267–275

Here, as in Table 4, the predictions were made on at least two hydrophobicity scales H(i). These predictions are consistent with each other for all domain boundaries, given the detection error Δk = ±6. The individual single domains predicted earlier by other methods [21] were also identified by the proposed method. Table 6 compares the data from [21] with the TMDs predicted by the proposed method for a long (L = 2424 aa) α-helical membrane protein from the group of adventitious membrane proteins (alpha-helical pore-forming toxins). The two predictions agree for most of the TMDs, given the error Δk ≤ 6 in determining their boundaries.
Table 6

Prediction of TMDs calculated upon processing of hydrophobicity functions f_n(k) at n = 5 on the scale H_5(i) for the long α-helical membrane protein CAC1A_RABIT, P27884, 2424 aa. The rows give the data source or the scale and level, followed by the TMD boundaries in order.

TMDs 1–6
  [21], potential: 99–117, 136–155, 168–185, 191–209, 229–248*, 336–360
  H_5(i), u = 0.305: 101–116, 141–158, 172–185, 210–249*, 302–317, 336–358
TMDs 7–12
  [21], potential: 488–506, 522–541, 550–568, 579–597, 617–636*, 690–714
  H_5(i), u = 0.305: 491–507, 518–537, 554–577, 609–638*, 654–665, 685–714
TMDs 13–18
  [21], potential: 1254–1272, 1289–1308, 1321–1339, 1351–1369, 1389–1408*, 1496–1520
  H_5(i), u = 0.305: 1255–1270, 1293–1312, 1323–1339, 1384–1408*, 1456–1467, 1497–1522
TMDs 19–24
  [21], potential: 1576–1604, 1610–1629, 1638–1656, 1666–1684, 1704–1723*, 1796–1820
  H_5(i), u = 0.305: 1575–1599, 1607–1633, 1641–1660, 1691–1725*, 1794–1820

* Highlighted (bold font) in the original table.

In the calculation by the proposed method of multiple averaging of the hydrophobicity function over a sliding window, a hydrophobic region of 16–28 aa was identified in addition to the domains indicated in Table 6; it may belong to a signal peptide (SP) or may be the 1st of the 24 TMDs of this protein. Moreover, the TMDs numbered 5, 11, 17, and 23 in [21], which are highlighted in Table 6, have numbers one less in our prediction, whereas other domains, which are not specified in [21], have numbers that are one greater. Thus, the two predictions in Table 6 show great similarities as well as notable differences.

4. Conclusions

The first membrane protein topology prediction algorithms were based solely on hydrophobicity plots, for example, [14, 16, 18], and the performance of these early methods appeared rather poor in practice. Hence, they were soon superseded by statistical, machine-learning methods, which use hundreds of free parameters extracted from databases of experimentally mapped topologies [13, 27]. However, as stated in [27], the translocons (cellular machineries) responsible for membrane-protein biogenesis do not have access to statistical data but rather exploit molecular interactions to ensure that membrane proteins attain their correct topology. Therefore, as concluded in [13], methods based on the same physical properties that determine translocon-mediated membrane insertion, using properly scaled hydrophobicity values, may reach the same level of prediction accuracy as the best statistical methods. Accordingly, we have presented here the results of a study of 35 transmembrane proteins using cascade averaging of the hydrophobicity function within a sliding window, as expressed in formula (1). In [4], the proposed method was successfully applied to predict the location of TMDs, which are secondary structure elements, in a number of membrane proteins, in particular bacteriorhodopsin, halorhodopsin, sensory rhodopsin 2, some connexins, and others. In the current work, the method was used to analyze the arrangement of hydrophobic regions, including transmembrane domains, in another protein class, primarily GPCRs. First, the method was tested on 5 proteins of this class with known structure. Then the TMD locations predicted by the proposed method were compared with those obtained by other methods [21] for 20 proteins of the same class. These verifications confirmed the applicability of the proposed method for the stated purposes.
Subsequently, the method was used to predict the TMDs in proteins of unknown structure, namely, 5 GPCRs and 5 α-helical transmembrane proteins of other classes. For 9 of these 10 proteins (Tables 4 and 5), concordant predictions were made using at least two different hydrophobicity scales. The prediction made by the proposed method for a very long protein (Table 6) is largely consistent with the prediction made by another method [21]. These facts indicate the applicability and usefulness of the method presented in our work [4] and applied here.
  23 in total

1.  Functionally different agonists induce distinct conformations in the G protein coupling domain of the beta 2 adrenergic receptor.

Authors:  P Ghanouni; Z Gryczynski; J J Steenhuis; T W Lee; D L Farrens; J R Lakowicz; B K Kobilka
Journal:  J Biol Chem       Date:  2001-04-24       Impact factor: 5.157

2.  [Computational methods for prediction of structure of membrane proteins using their amino acids sequences].

Authors:  M N Simakova; N N Simakov
Journal:  Mol Biol (Mosk)       Date:  2013 Mar-Apr

Review 3.  Experimentally determined hydrophobicity scale for proteins at membrane interfaces.

Authors:  W C Wimley; S H White
Journal:  Nat Struct Biol       Date:  1996-10

Review 4.  Identifying nonpolar transbilayer helices in amino acid sequences of membrane proteins.

Authors:  D M Engelman; T A Steitz; A Goldman
Journal:  Annu Rev Biophys Biophys Chem       Date:  1986

5.  The apolar surface area of amino acids and its empirical correlation with hydrophobic free energy.

Authors:  C Frömmel
Journal:  J Theor Biol       Date:  1984-11-21       Impact factor: 2.691

6.  Analysis of membrane and surface protein sequences with the hydrophobic moment plot.

Authors:  D Eisenberg; E Schwarz; M Komaromy; R Wall
Journal:  J Mol Biol       Date:  1984-10-15       Impact factor: 5.469

7.  Hydrophobicity of amino acid residues in globular proteins.

Authors:  G D Rose; A R Geselowitz; G J Lesser; R H Lee; M H Zehfus
Journal:  Science       Date:  1985-08-30       Impact factor: 47.728

8.  A simple method for displaying the hydropathic character of a protein.

Authors:  J Kyte; R F Doolittle
Journal:  J Mol Biol       Date:  1982-05-05       Impact factor: 5.469

9.  Structure of the human glucagon class B G-protein-coupled receptor.

Authors:  Fai Yiu Siu; Min He; Chris de Graaf; Gye Won Han; Dehua Yang; Zhiyun Zhang; Caihong Zhou; Qingping Xu; Daniel Wacker; Jeremiah S Joseph; Wei Liu; Jesper Lau; Vadim Cherezov; Vsevolod Katritch; Ming-Wei Wang; Raymond C Stevens
Journal:  Nature       Date:  2013-07-17       Impact factor: 49.962

10.  The G-protein-coupled receptors in the human genome form five main families. Phylogenetic analysis, paralogon groups, and fingerprints.

Authors:  Robert Fredriksson; Malin C Lagerström; Lars-Gustav Lundin; Helgi B Schiöth
Journal:  Mol Pharmacol       Date:  2003-06       Impact factor: 4.436

  2 in total

1.  Corrigendum to "Topography Prediction of Helical Transmembrane Proteins by a New Modification of the Sliding Window Method".

Authors:  Maria N Simakova; Nikolai N Simakov
Journal:  Biomed Res Int       Date:  2017-11-05       Impact factor: 3.411

2.  Search for Highly Divergent Tandem Repeats in Amino Acid Sequences.

Authors:  Valentina Rudenko; Eugene Korotkov
Journal:  Int J Mol Sci       Date:  2021-07-01       Impact factor: 5.923
