LearnCoil-VMF: computational evidence for coiled-coil-like motifs in many viral membrane-fusion proteins.

M Singh, B Berger, P S Kim.

Abstract

Crystallographic studies have shown that the coiled-coil motif occurs in several viral membrane-fusion proteins, including HIV-1 gp41 and influenza virus hemagglutinin. Here, LearnCoil-VMF was designed as a specialized program for identifying coiled-coil-like regions in viral membrane-fusion proteins. Based upon the use of LearnCoil-VMF, as well as other computational tools, we report detailed sequence analyses of coiled-coil-like regions in retrovirus, paramyxovirus and filovirus membrane-fusion proteins. Additionally, sequence analyses of these proteins outside their putative coiled-coil domains illustrate some structural differences between them. Complementing previous crystallographic studies, the coiled-coil-like regions detected by LearnCoil-VMF provide further evidence that the three-stranded coiled coil is a common motif found in many diverse viral membrane-fusion proteins. The abundance and structural conservation of this motif, even in the absence of sequence homology, suggest that it is critical for viral-cellular membrane fusion. The LearnCoil-VMF program is available at http://web.wi.mit.edu/kim. Copyright 1999 Academic Press.

Year:  1999        PMID: 10438601      PMCID: PMC7125536          DOI: 10.1006/jmbi.1999.2796

Source DB:  PubMed          Journal:  J Mol Biol        ISSN: 0022-2836            Impact factor:   5.469


Introduction

The surface glycoproteins of enveloped viruses are essential to viral cell entry and replication. These envelope proteins mediate both the initial attachment of the virion to the cell and the subsequent fusion of viral and cellular membranes; these processes result in the release of viral contents into the host cell. Insight into the membrane fusion process has been advanced by structural studies (Wilson et al 1981, Bullough et al 1994, Rey et al 1995, Fass et al 1996, Chan et al 1997, Weissenhorn et al 1997, Weissenhorn et al 1998b, Malashkevich et al 1998, Malashkevich et al 1999, Caffrey et al 1998). Remarkably, these studies suggest that rather diverse enveloped viruses share similar mechanisms for membrane fusion (Chan and Kim 1998, Hughson 1997). The structures of the membrane-fusion proteins of influenza virus (hemagglutinin HA2), Moloney murine leukemia virus (MoMLV TM), human and simian immunodeficiency viruses (HIV-1 and SIV gp41), and Ebola virus (Ebola GP2) reveal significant similarity: in their putative fusogenic (i.e., fusion-active) conformations, all five proteins contain a parallel homotrimeric coiled coil adjacent to the fusion-peptide regions (Figure 1). At the carboxyl end of this coiled coil, the polypeptide chains reverse direction, and the base of the coiled coil is supported by additional structures. In HIV-1 and SIV gp41, influenza HA2 and Ebola GP2, this support comes from three extended helices packed on the exterior of the coiled coil. In MoMLV TM, a short helix and a more extended region in each monomer serve to stabilize the coiled coil. In all five structures, the C-terminal residues extend back towards the N-terminal end of the coiled coil to form a hairpin-like structure.
Figure 1

Common structural elements between MoMLV TM, HIV gp41 and influenza HA2. The top panel shows schematic maps of the MoMLV, HIV and influenza sequences. For each, the position of the coiled coil is shown in red, and the position of the supporting structures is shown in blue. The bottom panel shows the structures of MoMLV TM residues 45–98 (Fass et al 1996), HIV gp41 residues 546–581 (Chan et al 1997), and influenza HA2 residues 40–129 (Bullough et al 1994). The interior coiled coils are shown in red, and the exterior supporting structures are shown in blue.

Do other enveloped viruses share similar structures for membrane fusion? Computational methods provide a quick way to begin addressing this question for a large, diverse group of viruses. The lack of sequence similarity among viral membrane-fusion proteins requires computational tools more sensitive than alignment-based methods. Fortunately, the coiled-coil structure common to HIV-1 gp41, SIV gp41, influenza HA2, MoMLV TM, and Ebola GP2 is a motif for which several prediction schemes have already been developed (Parry 1982, Lupas et al 1991, Berger et al 1995, Wolf et al 1997). Nevertheless, these methods cannot be used alone to recognize the coiled coils in membrane-fusion proteins. For example, the coiled-coil region in HIV-1 (Chan et al 1997) is scored with low likelihood (⩽10 %) using any of PairCoil (Berger et al 1995), MultiCoil (Wolf et al 1997) or COILS (Lupas et al 1991). Several authors have previously noted heptad repeat patterns visually in viral membrane-fusion proteins (Groot et al 1987, Gallaher et al 1989, Delwart et al 1990, Chambers et al 1990). However, the probability of a heptad repeat pattern occurring by chance in a protein sequence is significant (Brendel & Karlin, 1989), and predicting coiled coils “by eye” may therefore lead to significant over-prediction. For this reason, reliable computational methods are necessary in evaluating whether these patterns actually correspond to coiled-coil-like structures.
Here, an iterative algorithm, LearnCoil (Berger and Singh 1997, Singh et al 1998), is used as the primary method to detect potential coiled-coil-like regions in viral membrane-fusion proteins (see Methods). In the algorithm, an initial evaluation of viral membrane-fusion protein sequences was made using the PairCoil algorithm and a probability table estimated from a two-stranded and three-stranded coiled-coil database (Berger et al 1995, Wolf et al 1997). Then, those sequences with likelihoods above zero were selected by a randomized procedure to update the table. This process was repeated until convergence. After convergence, the final tables built from the viral membrane-fusion proteins were incorporated into a program, LearnCoil-VMF. The LearnCoil-VMF program detects coiled-coil-like regions in many viral membrane-fusion proteins, including most retrovirus and paramyxovirus membrane-fusion proteins examined. These proteins are quite diverse, with no apparent sequence similarity between most pairs of viruses in different families, or between some viruses within the same family. The LearnCoil-VMF program also helps further characterize the overall core structure of these viral membrane-fusion proteins. The exterior of the HIV-1 gp41 core is made up of three extended amphipathic helices that contain a heptad repeat and wrap around the interior coiled coil in a left-handed direction (Figure 1); the LearnCoil-VMF program identifies these coiled-coil-like helices in HIV-1 as well as in other viruses in the lentivirus genus of retroviruses. Intriguingly, LearnCoil-VMF also identifies two coiled-coil-like regions in many paramyxovirus membrane-fusion proteins. A recent result shows that peptides containing the two predicted coiled-coil-like regions in the paramyxovirus simian parainfluenza virus 5 (SV5) interact with each other to form an α-helical trimeric complex (Joshi et al 1998).
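The iterative procedure summarized above (score every sequence with the current table, randomly select sequences with likelihood above zero, rebuild the table, repeat until convergence) can be illustrated with a small sketch. This is not the LearnCoil implementation: the real method scores pairwise residue correlations with PairCoil, whereas the toy below uses single-residue frequencies per heptad position. The function names, the rule of keeping a sequence with probability equal to its likelihood, the convergence tolerance and the choice of seed sequence are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict

HEPTAD = "abcdefg"

def build_table(examples):
    """Per-position residue frequencies from (sequence, register offset) pairs."""
    counts = defaultdict(Counter)
    for seq, offset in examples:
        for i, aa in enumerate(seq):
            counts[HEPTAD[(i + offset) % 7]][aa] += 1
    return {(pos, aa): n / sum(c.values())
            for pos, c in counts.items() for aa, n in c.items()}

def score(seq, table):
    """Toy likelihood in [0, 1] and the best register offset for one sequence."""
    best = (0.0, 0)
    for offset in range(7):
        vals = [table.get((HEPTAD[(i + offset) % 7], aa), 0.0)
                for i, aa in enumerate(seq)]
        best = max(best, (sum(vals) / len(vals), offset))
    return best

def learn(seed_table, sequences, rng, max_iter=50, tol=1e-3):
    """Score, randomly select, update the table; repeat until it stops changing."""
    table = seed_table
    for _ in range(max_iter):
        scored = [(seq, *score(seq, table)) for seq in sequences]
        # Only sequences with likelihood above zero are eligible; each is kept
        # with probability equal to its likelihood (an assumed selection rule).
        chosen = [(seq, off) for seq, lik, off in scored
                  if lik > 0 and rng.random() < lik]
        if not chosen:
            break
        new_table = build_table(chosen)
        keys = set(table) | set(new_table)
        change = max(abs(table.get(k, 0.0) - new_table.get(k, 0.0)) for k in keys)
        table = new_table
        if change < tol:
            break
    return table

# Toy run: seed with the MoMLV coiled coil (register starts at g, offset 6),
# then iterate over the HIV-1 and SIV N-terminal fragments from Table 1.
seed = build_table([("DLREVEKSISNLEKSLTSLSEVVLQNRRGLDLL", 6)])
targets = ["SGIVQQQNNLLRAIEAQQHLLQLTVWGIKQLQARIL",
           "AGIVQQQQQLLDVVKRQQELLRLTVWGTKNLQTRVS"]
final_table = learn(seed, targets, random.Random(0))
```

The iteration cap is there because a randomized selection step need not settle exactly; how convergence was actually judged is described in the Methods, not here.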
The inability of previous coiled-coil recognition methods to identify the coiled-coil-like structures found in many viral membrane-fusion proteins indicates that, although these proteins contain an apparent heptad repeat, their coiled-coil-like regions have some subtle statistical differences from other known coiled coils. This work thus helps characterize the coiled-coil-like regions found in viral membrane-fusion proteins. Additionally, because coiled-coil recognition methods are primarily limited by the lack of a sufficiently large and diverse database of coiled-coil sequences (Wolf et al 1997), this work suggests improvements for existing methods for coiled-coil recognition.

Results

Coiled-coil-like regions are detected using LearnCoil-VMF in many viral membrane-fusion proteins, including many retrovirus envelope proteins, paramyxovirus fusion proteins, orthomyxovirus hemagglutinins, coronavirus spike proteins, arenavirus glycoproteins, and baculovirus envelope glycoproteins. (See Methods for a description of the LearnCoil-VMF algorithm.) Here, we focus on the membrane-fusion proteins of retroviruses, paramyxoviruses and filoviruses. Protein sequences from any virus family may be submitted at the LearnCoil-VMF webpage.

Retroviruses

LearnCoil-VMF finds coiled-coil-like regions in membrane-fusion proteins from all retrovirus genera. Detailed analysis is given for the lentivirus, mammalian type C, avian type C, type D and BLV-HTLV type retrovirus genera (Murphy et al.). LearnCoil-VMF detects two coiled-coil-like regions in most of the envelope proteins from the lentivirus genus of retroviruses (Table 1, Table 2). Note the overall sequence diversity in Table 1, Table 2; even though some pairs of sequences are quite similar, there is not a single conserved residue in either Table. The subsequences shown are aligned (using the output of LearnCoil-VMF) to the N36/C34 fragment crystallized for HIV-1 gp41 (Chan et al 1997) and SIV gp41 (Malashkevich et al 1998). For the envelope protein sequences shown in Table 1, Table 2, two coiled-coil-like regions are detected by LearnCoil-VMF in HIV-1, SIV, feline immunodeficiency virus (FIV), visna virus, and caprine arthritis encephalitis virus (CAEV), but only one coiled-coil-like region is found in equine infectious anemia virus (EIAV) and bovine immunodeficiency virus (BIV). However, the sequence features in Table 2 that EIAV and BIV share with the other lentiviruses suggest that a second helix exists in EIAV and BIV, although the LearnCoil-VMF method is not able to detect it. In particular, the heptad repeat is maintained in the N-terminal portion of these subsequences, as it is in the other lentivirus sequences given. Furthermore, in both the HIV-1 and SIV structures, there are two tryptophan residues in the C-terminal helix that fit into a conserved pocket found in the interior three-stranded coiled coil (Chan et al 1997, Chan et al 1998, Malashkevich et al 1998); these residues (corresponding to columns 1 and 4 in Table 2) are conserved in EIAV, while in BIV the first tryptophan residue is conserved and the second is replaced by a leucine residue.
Finally, for an alignment of the lentivirus envelope sequences, the PHD secondary structure program predicts the region shown in Table 2 as helical, with 85% of the predictions having a reliability index of at least 8 (the three-state accuracy for predictions with reliability index 8 or higher is 91.1%) (Rost and Sander 1993, Rost and Sander 1994, Rost et al 1994). Thus, computational methods argue that for the lentivirus membrane-fusion proteins, there are two coiled-coil-like regions that form a six-helical bundle structure similar to that found in HIV-1.
Table 1

Lentivirus N-terminal coiled coils

                                                         Likelihoods
Virus    Add dist  Sequence                              L P C M
                   bcdefgabcdefgabcdefgabcdefgabcdefgab
HIV-1    36        SGIVQQQNNLLRAIEAQQHLLQLTVWGIKQLQARIL  9 0 0 1
SIV      34        AGIVQQQQQLLDVVKRQQELLRLTVWGTKNLQTRVS  9 0 0 0
Visna    37        QSLANATAAQQNVLEATYAMVQHVAKGVRILEARVA  9 0 0 0
CAEV     37        QTLANATAAQQDALEATYAMVQHVAKGVRILEARVA  9 0 0 0
FIV      36        ATHQETIEKVTEALKINNLRLVTLEHQVLVIGLKVE  9 3 1 3
EIAV     39        NHTFEVENSTLNGMDLIERQIKILYAMILQTHADVQ  9 0 1 2
BIV      39        ERVVQNVSYIAQTQDQFTHLFRNINNRLNVLHRRVS  5 0 0 0

For the lentiviruses, two helical regions are found; the lentivirus N-terminal helix is shown here. In all Tables, L, P, C and M indicate the highest likelihood for the shown sequence fragment for (respectively) LearnCoil-VMF, PairCoil, COILS and MultiCoil. (0 represents likelihoods less than 0.1; 1 represents likelihoods at least 0.1 and less than 0.2, etc.) For MultiCoil, the likelihoods given are the total coiled-coil likelihoods (i.e. the sum of the dimer and trimer likelihoods). In all Tables, lower-case letters represent positions in the heptad repeat. Add dist is the number of amino acid residues from the proteolytic maturation cleavage site to the beginning of the shown subsequence. The N36 fragment of HIV-1 gp41 (Chan et al 1997) is shown, and the other lentivirus sequences are aligned to it. Abbreviations and GenBank accession numbers (Benson et al.) for both lentivirus Tables are: HIV-1, human immunodeficiency virus 1 (119452); SIV, simian immunodeficiency virus (119496); Visna, visna virus (543528); CAEV, caprine arthritis encephalitis virus (399432); FIV, feline immunodeficiency virus (544245); EIAV, equine infectious anemia virus (119407); BIV, bovine immunodeficiency virus (119399).
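The digit encoding described in this caption can be made explicit with a short sketch. It maps the four likelihood digits of a table row (for example 9, 0, 0, 1 for HIV-1 in Table 1) to likelihood intervals. The function name is hypothetical, and treating digit 9 as covering 0.9 up to 1.0 is an assumption, since the caption only spells out digits 0 and 1.

```python
PROGRAMS = ("LearnCoil-VMF", "PairCoil", "COILS", "MultiCoil")  # order L, P, C, M

def decode_likelihoods(digits):
    """Map a four-digit L/P/C/M string to (low, high) likelihood bounds.

    Digit d stands for a likelihood of at least d/10 and less than (d+1)/10;
    the upper bound of 1.0 for digit 9 is an assumption.
    """
    if len(digits) != 4 or not digits.isdigit():
        raise ValueError("expected exactly four digits, e.g. '9001'")
    return {prog: (int(d) / 10, (int(d) + 1) / 10)
            for prog, d in zip(PROGRAMS, digits)}

# HIV-1 row of Table 1: L=9, P=0, C=0, M=1
hiv1 = decode_likelihoods("9001")
```

For the HIV-1 row this gives a LearnCoil-VMF likelihood of at least 0.9, while PairCoil and COILS both fall below 0.1.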

Table 2

The lentivirus C-terminal helices

                                                       Likelihoods
Virus    Add dist2 Sequence                            L P C M
                   abcdefgabcdefgabcdefgabcdefgabcdef
HIV-1    47        WMEWDREINNYTSLIHSLIEESQNQQEKNEQELL  9 3 8 1
SIV      43        WQEWERKVDFLEANITALLEEAQIQQEKNMYELQ  9 1 9 5
Visna    47        WQQWEEEIEQHEANLSQLLREAALQVHIAQRDAQ  9 0 8 5
CAEV     47        WQQWERELQGYDGNLTMLLRESARQTQLAEEQVR  9 0 7 0
FIV      50        LGEWYNQTKELQQKFYEIIMNIEQNNVQVKKGLQ  9 0 4 0
EIAV     44        WDDWVSKMEDLNQEILTTLHGARNNLAQSMITFN  0 0 0 0
BIV      44        WSDLQDEYDKIEEKILKIRVDWLNSSLSDTQDTF  0 0 0 0

Add dist2 is the distance from the end of the region in Table 1 to the start of this region. The C34 fragment of HIV-1 gp41 (Chan et al 1997) is shown, and the other lentivirus sequences are aligned to it.

Recently, to test these predictions for visna virus, peptides were constructed corresponding to the sequences shown in Table 1, Table 2. These peptides were crystallized and in fact form a six-helical bundle structure that is very similar to HIV-1 and SIV gp41 (V.N. Malashkevich & P.S.K., unpublished results).
For envelope proteins from the retrovirus genera mammalian type C, avian type C, type D and BLV-HTLV type, there is a single predicted coiled-coil region using the LearnCoil-VMF, PairCoil, MultiCoil, and COILS programs (Table 3). A fragment of the envelope protein of MoMLV, a mammalian type C retrovirus, has been studied using X-ray crystallography (Fass et al 1996); the subsequences shown in Table 3 are aligned to the coiled-coil region found in this crystal structure. All four programs identify the coiled-coil regions in the mammalian type C retroviruses and the type D viruses; however, only LearnCoil-VMF identifies coiled coils in the avian C and BLV-HTLV retroviruses.
Table 3

Representative envelope proteins for retrovirus genera mammalian C, avian C, D and BLV-HTLV

                                                      Likelihoods
Virus    Add dist  Sequence                           L P C M
Mam C              gabcdefgabcdefgabcdefgabcdefgabcd
MoMLV    46        DLREVEKSISNLEKSLTSLSEVVLQNRRGLDLL  9 7 9 9
FeLV     46        DIQALEESISALEKSLTSLSEVVLQNRRGLDIL  9 6 5 9
GALV     47        DLRALQDSVSKLEDSLTSLSEVVLQNRRGLDLL  9 6 3 9
ARV      41        DVQALSGTINDLQDQIDSLAEVVLQNRRGLDLL  9 6 9 8
Avian C
RSV      50        QANLTTSLLGDLLDDVTSIRHAVLQNRAAIDFL  9 0 0 0
D
MPMV     41        DVQAISSTIQDLQDQVDSLAEVVLQNRRGLDLL  9 6 6 9
BLV-HTLV
BLV      41        DQQRLITAINQTHYNLLNVASVVAQNRRGLDWL  9 0 0 0
HTLV-1   41        DISQLTQAIVKNHKNLLKIAQYAAQNRRGLDLL  9 0 0 0
HTLV-2   41        DISHLTQAIVKNHQNILRVAQYAAQNRRGLDLL  9 0 0 0
PTLV     41        DIDHLTRAIVKNHDNILRVAQYAAQNRRGLDLL  9 0 0 0

For these sequences, there is only one predicted coiled coil. Add dist is the distance from the cleavage site to the start of the subsequence shown. The subsequences shown correspond to the coiled coil in the Moloney murine leukemia virus crystal structure (Fass et al 1996) and are aligned using subsequence QNRRGLDLL (Delwart et al 1990). Abbreviations and GenBank accession numbers (Benson et al.) are: MoMLV, Moloney murine leukemia virus (119478); FeLV, feline leukemia virus (119418); GALV, gibbon ape leukemia virus (119426); ARV, avian reticuloendotheliosis virus (119396); RSV, Rous sarcoma virus (119487); MPMV, Mason-Pfizer monkey virus (119482); BLV, bovine leukemia virus (119401); HTLV, human T-lymphotropic virus (119464 and 119467); PTLV, primate T-lymphotropic virus (632274).

Interestingly, there is some computational evidence that the mammalian type C, avian type C, and type D retroviruses also contain a second helix. In particular, for an alignment of these sequences, the PHD secondary structure program finds a second helix approximately 25 residues C-terminal to the region shown in Table 3, with length 18. (For MoMLV, this places the start of the helix five residues C-terminal to the fragment solved by X-ray crystallography (Fass et al 1996).) Additionally, in all of these sequences except Rous sarcoma virus, the same region scores as coiled-coil-like when using the MultiCoil program with windows of length 14 (individual likelihoods varying from 0.31 to 0.71); this suggests a short amphipathic helix. This computational evidence is consistent with circular dichroism (CD) experiments on MoMLV that show additional helical content C-terminal of the coiled-coil structure (Fass & Kim, 1995). Additionally, these retrovirus envelope sequences share sequence similarity with Ebola GP2, and recent X-ray crystallography studies reveal a helix in this region (Weissenhorn et al 1998b, Malashkevich et al 1999).

Paramyxoviruses

LearnCoil-VMF detects two coiled-coil-like regions (likelihoods ⩾0.5) in 15 out of the 20 sequences listed in Table 4, Table 5. An additional four viruses have two scoring regions, although the second regions are given likelihoods less than 0.5 by LearnCoil-VMF. Human parainfluenza virus 1 is the only sequence for which only one coiled-coil-like region is found; however, this sequence aligns well with the other paramyxovirus sequences in its genus, and thus it is expected to contain a second coiled-coil-like region as well. As in the lentivirus membrane-fusion proteins, LearnCoil-VMF finds one coiled-coil-like region soon after the cleavage site and fusion peptide, and the other directly preceding the final transmembrane domain. The number of residues between the two coiled-coil-like regions in all the paramyxoviruses is substantial (more than 265 residues in the paramyxoviruses, compared to 50 or fewer residues in the lentiviruses); in fact, there is no apparent sequence similarity between the paramyxovirus and lentivirus sequences. An additional third coiled-coil region is found using all four programs in bovine respiratory syncytial virus and human respiratory syncytial virus; however, this region is before the cleavage site, in the F2 glycoprotein (data not shown). The significance of this region is unknown.
Table 4

Paramyxovirus N-terminal helix

                                                                   Likelihoods
Virus    Add dist  Sequence                                        L P C M
Paramyx            gabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabc
BPIV-3   27        EAKQAKSDIEKLKEAIRDTNKAVQSIQSSVGNLIVAVKSVQDYVNN  9 8 9 8
HPIV-3   27        EAKQARSDIEKLKEAIRDTNKAVQSVQSSIGNLIVAIKSVQDYVNK  9 8 9 8
HPIV-1   30        EAREARKDIALIKDSIIKTHNSVELIQRGIGEQIIALKTLQDFVNN  7 0 1 0
Sendai   30        EAREAKRDIALIKESMTKTHKSIELLQNAVGEQILALKTLQDFVND  9 4 9 6
Morbilli
CDV      27        QSNLNAQAIQSLRTSLEQSNKAIEEIREATQETVIAVQGVQDYVNN  9 6 7 7
Measles  27        QSMLNSQAIDNLRASLETTNQAIEAIRQAGQEMILAVQGVQDYINN  9 0 4 3
PPRV     27        QSLMNSQAIESLKTSLEKSNQAIEEIRLANKETILAVQGVQDYINN  9 6 8 5
PDV      27        QSNLNAQAIQSLRASLEQSNKAIDEVRQASQNIIIAVQGVQDYVNN  9 5 7 8
RPV      27        QSMMNSQAIESLKASLETTNQAIEEIRQAGQEMVLAVQGVQDYINN  9 0 5 7
Rubula
HPIV-2   27        KANANAAAINNLASSIQSTNKAVSDVIDASRTIATAVQAIQDHING  9 0 1 0
HPIV-4a  27        KAQENAKLILTLKKAATETNEAVRDLANSNKIVVKMISAIQNQINT  9 2 6 2
HPIV-4b  27        KAQENAQLILTLKKAAKETNDAVRDLTKSNKIVARMISAIQNQINT  9 5 3 2
SV5      27        KANENAAAILNLKNAIQKTNAAVADVVQATQSLGTAVQAVQDHINS  9 4 2 1
PRV      27        RANKNAEKVEQLSQALGETNAAISDLIDATKNLGFAVQAIQNQINT  9 5 4 5
Mumps    27        QAQTNARAIAAMKNSIQATNRAVFEVKEGTQQLAIAVQAIQDHINT  9 0 0 0
NDV      27        QANQNAANILRLKESITATIEAVHEVTDGLSQLAVAVGKMQQFVND  9 2 0 4
Pneumo
BRSV     20        KVLHLEGEVNKIKNALLSTNKAVVSLSNGVSVLTSKVLDLKNYIDK  9 2 1 1
HRSV     20        KVLHLEGEVNKIKNALLSTNKAVVSLSNGVSVLTSKVLDLKNYINN  9 2 1 1
PVM      20        KTVQLESEIALIRDAVRNTNEAVVSLTNGMSVLAKVVDDLKNFISK  9 0 4 0
TRTV     24        KTIRLEGEVKAIKNALRNTNEAVSTLGNGVRVLATAVNDLKEFISK  9 4 5 2

Add dist is the distance between the cleavage site and the beginning of the region shown. Abbreviations and GenBank accession numbers (Benson et al.) for both Table 4, Table 5: paramyx, genus paramyxovirus; morbilli, genus morbillivirus; rubula, genus rubulavirus; pneumo, genus pneumovirus; BPIV, bovine parainfluenza (1353202); HPIV, human parainfluenza (138273, 138268, 138269, 1255649, and 1255651); Sendai (138276); CDV, canine distemper (138249); measles (138254); PPRV, peste-des-petits-ruminants (1085797); PDV, phocine distemper (138267); RPV, rinderpest (138275); SV5, simian parainfluenza 5 (335117); PRV, porcine rubulavirus (1808667); mumps (138259); NDV, Newcastle disease (465403); BRSV, bovine respiratory syncytial (138248); HRSV, human respiratory syncytial (138250); PVM, pneumonia virus of mice (549309); TRTV, turkey rhinotracheitis (138283).

Table 5

Paramyxovirus C-terminal helix

                                                  Likelihoods
Virus    Add dist2 Sequence                       L P C M
Paramyx            defgabcdefgabcdefgabcdefgabcd
BPIV-3   275       ISMELNKAKLELEESKEWIKKSNQKLDSV  9 3 0 1
HPIV-3   275       ISIELNKAKSDLEESKEWIRRSNQKLDSI  9 3 0 2
HPIV-1   275       ISLNLASATNFLEESKIELMKAKAIISAV  0 0 0 0
Sendai   275       ISLNLADATNFLQDSKAELEKARKILSEV  3 5 7 4
Morbilli
CDV      268       ISLDRLDVGTNLGNALKKLDDAKVLIDSS  9 2 7 0
Measles  268       ISLERLDVGTNLGNAIAKLEDAKELLESS  9 3 6 3
PPRV     268       ISLEKLDVGTNLGNAVTRLENAKELLDAS  9 0 2 0
PDV      268       ISLERLDVGTNLGSALKKLDDAKVLIESS  9 1 6 1
RPV      268       ISLEKLDVGTNLWNAVTKLEKAKDLLDSS  5 0 0 0
Rubula
HPIV-2   275       LSNQINSINKSLKSAEDWIADSNFFANQA  9 0 0 0
HPIV-4a  275       LSTDLNQYNQLLKSAEDHIQRSTDYLNSI  9 0 6 5
HPIV-4b  275       LSTDLNQYNQLLKSAENHIQRSNDYLNSI  9 0 8 3
SV5      275       ISQNLAAVNKSLSDALQHLAQSDTYLSAI  9 0 0 0
PRV      275       ISGNLIAVNNSLSSALNHLATSEILRNEQ  2 0 0 0
Mumps    275       ISTELSKVNASLQNAVKYIKESNHQLQSV  9 6 3 0
NDV      275       ISTELGNVNNSISNALDKLEESNSKLDKV  9 6 9 4
Pneumo
BRSV     287       FDASIAQVNAKINQSLAFIRRSDELLHSV  9 0 0 0
HRSV     287       FDASISQVNEKINQSLAFIRRSDELLHNV  9 0 0 0
PVM      285       FDVAIRDVEHSINQTRTFFKASDQLLDLS  4 0 0 0
TRTV     285       FNVALDQVFESIDRSQDLIDKSNDLLGAD  3 0 0 0

Add dist2 is the distance from the end of the region shown in Table 4 to the beginning of this region.
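As a consistency check, the counts quoted in the paramyxovirus text (15 of 20 sequences with two regions at likelihood ⩾0.5, four with a weaker second region, one with a single region) can be recomputed from the LearnCoil-VMF (L) digits of Tables 4 and 5. The sketch below is illustrative only: the names are hypothetical, and treating a digit of 0 (likelihood below 0.1) as "region not found" is an assumption.

```python
from collections import Counter

# (Table 4 L digit, Table 5 L digit) for each paramyxovirus fusion protein.
L_DIGITS = {
    "BPIV-3": (9, 9), "HPIV-3": (9, 9), "HPIV-1": (7, 0), "Sendai": (9, 3),
    "CDV": (9, 9), "Measles": (9, 9), "PPRV": (9, 9), "PDV": (9, 9), "RPV": (9, 5),
    "HPIV-2": (9, 9), "HPIV-4a": (9, 9), "HPIV-4b": (9, 9), "SV5": (9, 9),
    "PRV": (9, 2), "Mumps": (9, 9), "NDV": (9, 9),
    "BRSV": (9, 9), "HRSV": (9, 9), "PVM": (9, 4), "TRTV": (9, 3),
}

def classify(n_digit, c_digit, threshold=5):
    """Digit d means likelihood in [d/10, (d+1)/10), so d >= 5 means >= 0.5."""
    if n_digit >= threshold and c_digit >= threshold:
        return "both"
    if n_digit == 0 or c_digit == 0:
        return "single"   # assumption: digit 0 is read as "not detected"
    return "weak_second"

classes = {virus: classify(*digits) for virus, digits in L_DIGITS.items()}
tally = Counter(classes.values())
```

With the digits as tabulated, the tally reproduces the 15 / 4 / 1 split described in the text, with HPIV-1 as the single-region case.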

Recently it has been shown that two peptides, each containing one of the heptad repeat regions of simian parainfluenza virus 5 (SV5), interact and form a helical complex that consists of a trimer of heterodimers (Joshi et al 1998). The first peptide (denoted N1) contains the region shown in Table 4, along with seven N-terminal residues and ten C-terminal residues, and the second peptide (denoted C1) contains the region shown in Table 5, along with 14 N-terminal residues. Interestingly, for both the lentiviruses and the paramyxoviruses, the percentage of β-branched residues in a and d positions found in the first coiled-coil-like region is approximately twice the percentage found in the second coiled-coil-like region, with a high percentage of β-branched residues in both the a and d positions in the first coiled-coil-like region.
This is consistent with the observation that β-branched residues in both buried positions favor formation of the trimer oligomeric state of coiled coils (Harbury et al.).
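The kind of tally behind this comparison can be sketched for a single pair of sequences. The example counts β-branched residues (taken here as valine, isoleucine and threonine) at the a and d positions of the HIV-1 N36 and C34 fragments, using the heptad registers printed in Tables 1 and 2. The function name is hypothetical, and one sequence pair is only an illustration; the percentages referred to in the text are aggregated over all lentivirus and paramyxovirus sequences.

```python
BETA_BRANCHED = frozenset("VIT")  # valine, isoleucine, threonine

def core_beta_fraction(seq, register, core="ad"):
    """Fraction of residues at the given heptad positions that are beta-branched."""
    if len(seq) != len(register):
        raise ValueError("sequence and register must have the same length")
    core_residues = [aa for aa, pos in zip(seq, register) if pos in core]
    return sum(aa in BETA_BRANCHED for aa in core_residues) / len(core_residues)

# HIV-1 rows of Table 1 (N36) and Table 2 (C34) with their heptad registers.
N36 = "SGIVQQQNNLLRAIEAQQHLLQLTVWGIKQLQARIL"
N36_REGISTER = "bcdefgabcdefgabcdefgabcdefgabcdefgab"
C34 = "WMEWDREINNYTSLIHSLIEESQNQQEKNEQELL"
C34_REGISTER = "abcdefgabcdefgabcdefgabcdefgabcdef"

n_fraction = core_beta_fraction(N36, N36_REGISTER)
c_fraction = core_beta_fraction(C34, C34_REGISTER)
```

For HIV-1 this gives 5 of 10 core positions in N36 and 2 of 10 in C34, which is in line with the trend described above for this one example.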

Filoviruses

As anticipated earlier (Fass et al 1996), recent structural studies have suggested that the Ebola GP2 glycoprotein is structurally analogous to MoMLV TM (Weissenhorn et al 1998a, Weissenhorn et al 1998b, Malashkevich et al 1999). LearnCoil-VMF finds a coiled-coil-like region in the Ebola GP2 viral membrane-fusion protein, but not in the closely related Marburg virus GP (Table 6). The Ebola and Marburg virus glycoproteins are very similar to the non-filovirus Rous sarcoma virus membrane-fusion protein; remarkably, in the putative coiled-coil region, these filovirus proteins are more similar at a sequence level to the Rous sarcoma virus envelope protein than are any of the retrovirus sequences in Table 1, Table 3. In fact, the sequence similarity between the Rous sarcoma virus envelope protein and Ebola GP2 extends throughout the membrane-fusion domain of these sequences (Gallaher, 1996). Thus, sequence analysis suggests that Ebola GP2 contains a coiled coil, flanked by a short amphipathic helix on its outer core, and this is in agreement with recent structural studies (Weissenhorn et al 1998b, Malashkevich et al 1999). Although LearnCoil-VMF does not find a coiled-coil-like region in the Marburg virus GP, its sequence similarity to Ebola within this region (Table 6) suggests a similar structure.
Table 6

The filoviruses and Rous sarcoma virus

                                            Likelihoods
Virus    Sequence                           L P C M
         gabcdefgabcdefgabcdefgabcdefgabcd
Ebola    LANETTQALQLFLRATTELRTFSILNRKAIDFL  8 0 0 0
Marburg  LANQTAKSLELLLRVTTEERTFSLINRHAIDFL  0 0 0 0
Rous     QANLTTSLLGDLLDDVTSIRHAVLQNRAAIDFL  9 0 0 0
          ** *   *   *   *  *     ** *****

Shown is an alignment between the putative coiled-coil region of the viral membrane-fusion proteins of Ebola, Marburg and the non-filovirus Rous sarcoma virus. Residues that are identical in all three viruses are indicated with asterisks. The regions shown correspond to the coiled coil in MoMLV TM (Table 3). GenBank accession numbers (Benson et al.) are: Ebola virus, 290392; Marburg virus, 2459880; and Rous sarcoma virus, 119487.
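The asterisk row of Table 6 can be recomputed directly from the three aligned sequences. The sketch below (function names are hypothetical) lists the alignment columns in which Ebola, Marburg and Rous sarcoma virus carry the same residue and rebuilds a marker line that lines up under the sequences.

```python
ALIGNMENT = {
    "Ebola":   "LANETTQALQLFLRATTELRTFSILNRKAIDFL",
    "Marburg": "LANQTAKSLELLLRVTTEERTFSLINRHAIDFL",
    "Rous":    "QANLTTSLLGDLLDDVTSIRHAVLQNRAAIDFL",
}

def identical_columns(seqs):
    """1-based positions where every aligned sequence has the same residue."""
    return [i for i, column in enumerate(zip(*seqs), start=1)
            if len(set(column)) == 1]

def marker_line(seqs):
    """Asterisk under each fully conserved column, space elsewhere."""
    return "".join("*" if len(set(column)) == 1 else " " for column in zip(*seqs))

conserved = identical_columns(ALIGNMENT.values())
markers = marker_line(ALIGNMENT.values())
```

This yields 14 identical positions out of 33, the same number of asterisks as in the table, with the longest conserved run at the C-terminal AIDFL.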

Other virus families

Virus families in the iteration test set for which LearnCoil-VMF finds no coiled-coil regions in the putative membrane-fusion proteins include flaviviridae (e.g., yellow fever virus and tick-borne encephalitis virus), rhabdoviridae (e.g., vesicular stomatitis virus), and togaviridae (e.g., eastern equine encephalitis virus). In the case of tick-borne encephalitis virus, the membrane-fusion protein indeed does not contain a coiled coil, at least in the native (i.e., non-fusogenic) state (Rey et al 1995).

PDB sequences

In order to test the discriminative power of LearnCoil-VMF, sequences in the Protein Data Bank (PDB) were evaluated with the final learned tables. The homodimeric GCN4 coiled coil and its mutants, the Fos-Jun heterodimeric coiled coil, the Max homodimeric coiled coil, the trimeric chicken cartilage matrix coiled coil, the pentameric COMP coiled coil, and the antiparallel seryl-tRNA synthetase coiled coil all have likelihoods greater than 0.5 using both LearnCoil-VMF and PairCoil. None of these proteins are included in the databases used by LearnCoil-VMF; their identification suggests similarities between the identified coiled-coil-like regions in viral membrane-fusion proteins and traditional coiled coils. The following lists all non-coiled-coil structures in the PDB (June 1998 release) that are given greater than 0.5 likelihood by LearnCoil-VMF but not PairCoil (PDB accession ID, description, LearnCoil-VMF likelihood, PairCoil likelihood): 4hb1, designed four helical bundle, 0.51, 0.33; 1rmv, ribgrass mosaic virus coat protein, >0.9, 0.16; 1wer, Ras-GTPase-activating domain of human p120GAP, 0.78, 0.40; 1zak, adenylate kinase from maize, 0.61, 0.21. The scoring regions for all these sequences correspond to primarily helical regions in the X-ray crystal structures. Thus, the LearnCoil-VMF program is not overly permissive in identifying coiled-coil-like regions.

Discussion

The LearnCoil-VMF program has detected coiled-coil-like regions in many viral membrane-fusion proteins. These detected regions are likely to correspond to coiled-coil-like structures for several reasons. For membrane-fusion proteins with solved X-ray crystal structures, LearnCoil-VMF detects only those regions that have been established as coiled-coil-like. Additionally, other protein sequences in the PDB with known coiled-coil structures, such as GCN4 (PDB accession ID 2zta), still score as coiled-coil-like using the LearnCoil-VMF program. On the other hand, although some non-coiled-coil sequences from the PDB are detected by the LearnCoil-VMF algorithm, the vast majority are not, indicating specificity in the learned tables. Finally, despite the significant sequence diversity in many viral membrane-fusion proteins within a particular genus, LearnCoil-VMF is usually consistent in recognizing similar regions within them. For example, although the lentivirus sequences in Table 1 vary considerably, the coiled-coil-like regions that the LearnCoil-VMF program identifies are all similarly located within the protein sequences (i.e., following the protein maturation cleavage site). Complementing previous crystallographic studies, the coiled-coil-like regions detected by LearnCoil-VMF give further evidence that the three-stranded coiled coil is a common motif found in many diverse viral membrane-fusion proteins, including those of many retroviruses, paramyxoviruses and filoviruses. Additionally, for many of these proteins, there is computational evidence for the existence of helices that pack against the outside of the coiled coil. For many of the retrovirus membrane-fusion proteins, such as those in the mammalian type C, avian type C, and type D genera, secondary structure prediction indicates a short, amphipathic helix following the N-terminal coiled-coil-like region. 
For the lentivirus membrane-fusion proteins, the two coiled-coil-like regions found by LearnCoil-VMF, in conjunction with the HIV-1 and SIV gp41 crystal structures, suggest amphipathic helices packed against the length of the three-stranded coiled-coil core. Two similarly placed coiled-coil-like regions are also found in the paramyxovirus membrane-fusion proteins, and studies on SV5 indicate that the two coiled-coil-like regions interact to form a trimer of heterodimers (Joshi et al., 1998; see also note added in proof). Other experimental evidence hints at a mechanistic similarity between the paramyxovirus membrane-fusion proteins and the HIV-1 and SIV gp41 structures. In particular, synthetic C-peptides (peptides that overlap the second heptad repeat region of HIV-1 gp41) have been shown to be potent inhibitors of HIV-1 infection Jiang et al 1993, Wild et al 1994, Lu et al 1995, Kilby et al 1998, Chan et al 1998. It is thought that the C-peptides act in a dominant-negative manner by binding to the central coiled coil Lu et al 1995, Chen et al 1995, Chan and Kim 1998. Like HIV-1, many paramyxoviruses are inhibited by peptides that overlap the second heptad repeat region Rapaport et al 1995, Lambert et al 1996, Yao and Compans 1996, Joshi et al 1998 (i.e., peptides overlapping those shown in Table 5).
By analogy to influenza HA2, some of the coiled-coil-like regions found by LearnCoil-VMF may be helical only in the fusogenic state of the protein. In influenza HA2, a region that corresponds to a loop in the X-ray structure of native HA2 (Wilson et al., 1981) “springs” into a helical conformation in its fusogenic state Carr and Kim 1993, Bullough et al 1994. This loop-to-helix region in influenza HA2 is predicted as coiled-coil-like by all four prediction programs.
Furthermore, in the case of HIV-1, although the gp41 helical core is extremely stable, synthetic C-peptides inhibit HIV-1 infection and syncytia formation at very low concentrations, thus giving evidence that the gp41 core structure is not present in the native state of this protein (for a review, see Chan & Kim, 1998).
Coiled coils have also been predicted Carr and Kim 1993, Spring et al 1993, Inoue et al 1992 and experimentally found to be a dominant feature Sutton et al 1998, Hayashi et al 1994, Chapman et al 1994, Fasshauer et al 1997, Lin and Scheller 1997, Hanson et al 1997, Weber et al 1998, Nicholson et al 1998 of the SNARE proteins that mediate membrane-fusion events in neurotransmission and protein trafficking. The general-purpose coiled-coil prediction programs PairCoil, MultiCoil and COILS identify some but not all of the regions that make up the four-stranded parallel coiled coil in the crystal structure of the core of the synaptic-fusion complex (Sutton et al., 1998). Although LearnCoil-VMF is not as effective in identifying these coiled-coil regions, an iterative approach such as LearnCoil may be useful in designing a specialized program for identifying such regions in as yet unidentified proteins involved in other, non-viral membrane-fusion events.

Methods

A collection of 588 putative membrane-fusion proteins from enveloped viruses was gathered using the Entrez protein browser (Benson et al.). For each virus family (Murphy et al.), the following protein sequences were gathered (Fields et al.): arenavirus, GP-C glycoprotein; baculovirus, glycoproteins gp64 and gp67; bunyavirus, G2 glycoprotein; coronavirus, S spike protein; filovirus, GP peplomer glycoprotein; flavivirus, E protein; herpesvirus, gH glycoprotein; orthomyxovirus, hemagglutinin; paramyxovirus, F protein; retrovirus, envelope protein; rhabdovirus, G protein; and togavirus, E1 glycoprotein. Obviously closely related sequences were eliminated by scoring each sequence with the PairCoil program and allowing only one representative into the iteration test set from among those with the same sequence score. This left a total of 266 sequences in the iteration test set.
The primary method used for coiled-coil detection is the LearnCoil program, a general iterative method that extends the two-stranded coiled-coil prediction program PairCoil to improve identification of other types of coiled coils (Berger & Singh, 1997). Previously, the LearnCoil program has been used to identify coiled-coil-like regions in histidine kinase proteins (Singh et al., 1998). Iterative approaches similar to LearnCoil have been applied to sequence alignment and motif recognition Tatusov et al 1994, Lawrence et al 1993, Attwood and Findlay 1993, Gribskov 1992, Dodd and Egan 1990. Most recently, the position-specific iterated BLAST (PSI-BLAST) procedure has been developed to detect weak but biologically relevant sequence similarities during database searches (Altschul et al.). The basic method and its application to viral membrane-fusion proteins are first briefly outlined, and then described below in more detail. Further description of the method and its cross-validation testing can be found elsewhere (Berger & Singh, 1997).
The LearnCoil program iteratively scans the 266 test sequences of putative viral membrane-fusion proteins. In each iteration, the algorithm scores all the test sequences (if a sequence was selected in the previous iteration, its effects are removed before scoring) and converts each score into a likelihood as in the pairwise correlation method PairCoil (Berger et al., 1995). Using these likelihoods, a subset of the sequences is chosen to build a database of potential coiled-coil-like regions. At the end of the iteration, these selected sequences are used to update the parameters of the scoring procedure. The procedure repeats until the number of residues in each subsequent database changes by less than 3%. In the final iteration, regions that have a likelihood of at least 0.5 are selected for the final database. Since the procedure is randomized, the algorithm was run five times on the iteration test sequences. This gave five learned probability tables, which were averaged and then used along with the PairCoil scoring method to build the LearnCoil-VMF program. LearnCoil-VMF was thus designed as a specialized program for identifying coiled-coil-like regions in viral membrane-fusion proteins. As in each iteration of the LearnCoil algorithm, the reported LearnCoil-VMF likelihood for each viral membrane-fusion protein is computed after removing the effects of that sequence from the averaged probability table.
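The outer loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `select` and `update` callables, the dictionary representation of a probability table, and the `max_iter` safety cap are all assumptions introduced here, and the removal of a sequence's own contribution before scoring is left out.

```python
def iterate_until_stable(initial_table, select, update, tol=0.03, max_iter=100):
    """Repeat select/update until the database size changes by less than tol.

    select(table)    -> list of residue strings chosen for the new database
    update(database) -> probability table re-estimated from that database
    Both callables are hypothetical placeholders for the steps in the text.
    """
    table = initial_table
    prev_size = None
    for _ in range(max_iter):  # max_iter is a safety cap assumed for this sketch
        database = select(table)
        size = sum(len(region) for region in database)
        table = update(database)  # parameters are updated at the end of each iteration
        # Stop once the residue count changes by less than 3% between iterations.
        if prev_size and abs(size - prev_size) / prev_size < tol:
            break
        prev_size = size
    return table


def average_tables(tables):
    """Average several learned tables (dicts of probabilities) key by key."""
    keys = set().union(*tables)
    return {k: sum(t.get(k, 0.0) for t in tables) / len(tables) for k in keys}
```

Because selection is randomized, `iterate_until_stable` would be run several times (five in the text) and the resulting tables passed to `average_tables`.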

Scoring

The LearnCoil program uses the scoring method of PairCoil (Berger et al., 1995) as a subroutine. It uses estimates (see below) of the probability of each residue pair occurring in each pair of heptad-repeat positions. To obtain normalized probabilities, each probability for a given pair of heptad-repeat positions a distance i apart is divided by the corresponding distance-i probability for sequences in GenPept. Normalized singles probabilities are computed similarly. A residue propensity in a given heptad-repeat position incorporates correlations between that residue and the residues that follow at distances i = 1, i = 2 and i = 4 (chosen empirically). For normalized probabilities P, the propensity of the kth residue is given by:

$$\frac{1}{3}\,\ln\frac{P(k,k+1)\,P(k,k+2)\,P(k,k+4)}{P(k+1)\,P(k+2)\,P(k+4)}$$

Windows of length 30 are considered, and a window score for a particular heptad-repeat position is the sum of the residue propensities in the window. Finally, the residue score for the kth residue is the maximum window score over all windows containing it in all seven possible heptad-repeat positions, and each sequence score is the maximum score over all its residues. Each score is converted into a likelihood in the interval [0,1] using the methods described by Berger et al 1995, Berger and Singh 1997. Mathematical justification of the scoring subroutine can be found in Berger (1995).
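The propensity, window score, residue score and sequence score defined above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PairCoil code: `pair_p` and `single_p` are hypothetical lookup functions for the normalized probabilities, heptad positions are numbered 0 to 6, pairs that run past the end of the sequence are skipped, and sequences shorter than the window use a single shorter window. The conversion of scores into likelihoods is not shown.

```python
import math

DISTANCES = (1, 2, 4)  # pair distances given in the text
WINDOW = 30            # window length given in the text


def residue_propensity(seq, k, register, pair_p, single_p):
    """(1/3) ln of the pair/singles probability ratio for residue k.

    register is the heptad position (0-6) assigned to residue 0.
    pair_p(res_k, res_j, heptad_pos_of_k, distance) and
    single_p(res_j, heptad_pos_of_j) are hypothetical lookups.
    """
    pos_k = (register + k) % 7
    total = 0.0
    for i in DISTANCES:
        j = k + i
        if j >= len(seq):
            continue  # assumption: pairs past the sequence end contribute nothing
        pos_j = (register + j) % 7
        total += math.log(pair_p(seq[k], seq[j], pos_k, i))
        total -= math.log(single_p(seq[j], pos_j))
    return total / 3.0


def residue_scores(seq, pair_p, single_p, window=WINDOW):
    """Best window score containing each residue, over all seven registers."""
    n = len(seq)
    w = min(window, n)  # assumption: short sequences use one shorter window
    best = [-math.inf] * n
    for register in range(7):
        props = [residue_propensity(seq, k, register, pair_p, single_p)
                 for k in range(n)]
        for start in range(n - w + 1):
            score = sum(props[start:start + w])  # window score
            for k in range(start, start + w):
                best[k] = max(best[k], score)
    return best


def sequence_score(seq, pair_p, single_p, window=WINDOW):
    """Sequence score: the maximum residue score."""
    return max(residue_scores(seq, pair_p, single_p, window))
```

Recomputing each window sum from scratch keeps the sketch close to the definitions; a running sum would be the natural optimization for long sequences.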

Selection

During each iteration, the LearnCoil algorithm builds a new database of potential coiled-coil regions. At the start of each iteration, this new database contains no residues. Once each sequence in the iteration test set is scored, it is selected for the new database with probability equal to its likelihood. That is, a number is drawn uniformly at random from the interval [0,1], and if the number drawn is less than or equal to the likelihood of the sequence, then the sequence is selected for the new database. If a sequence is selected, the regions of the sequence whose residue likelihoods are greater than or equal to either the sequence likelihood or 0.5 are included in the new database. At the ends of scoring regions, window effects are handled by requiring that consecutive residue likelihoods do not drop off by more than 0.1.
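The randomized selection step can be sketched as follows. This is an illustrative sketch, not the authors' code: the function name, the half-open `(start, end)` region format and the injected `rng` object are assumptions, and the 0.1 drop-off rule at region ends is omitted because the text does not specify it in enough detail to reproduce.

```python
import random


def select_regions(residue_likelihoods, sequence_likelihood, rng, cutoff=0.5):
    """Return (start, end) regions chosen for the new database, or [].

    rng is any object with a random() method returning a uniform draw,
    for example random.Random(seed).
    """
    # The sequence is selected only if the draw does not exceed its likelihood.
    if rng.random() > sequence_likelihood:
        return []
    # A residue qualifies if it reaches the sequence likelihood or the cutoff,
    # i.e. the smaller of the two thresholds.
    threshold = min(sequence_likelihood, cutoff)
    regions, start = [], None
    for k, lik in enumerate(residue_likelihoods):
        if lik >= threshold:
            if start is None:
                start = k
        elif start is not None:
            regions.append((start, k))  # end index is exclusive
            start = None
    if start is not None:
        regions.append((start, len(residue_likelihoods)))
    return regions
```

For example, `select_regions(liks, max(liks), random.Random(0))` would treat the highest residue likelihood as the sequence likelihood, which is itself an assumption about how the two are related.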

Estimating probabilities

In the first iteration, probabilities are estimated from a database consisting of dimeric (Berger et al., 1995) and trimeric (Wolf et al.) coiled coils. At the end of each iteration, estimates of the singles and pairwise probabilities are updated using the new database. The new probabilities are a weighted average of the probabilities computed from the original database and the probabilities computed from the database selected in this iteration. The original database is weighted 0.05, and the selected database is weighted 0.95. These updated probabilities affect the scoring of sequences in the next iteration. In each iteration after the first, if a sequence is included in the database built in the previous iteration, it is removed from the database and the probabilities are adjusted before that sequence is scored.
Other computational methods used for coiled-coil detection are PairCoil (Berger et al., 1995), MultiCoil (Wolf et al.) and COILS (Lupas et al., 1991). Additionally, the PHD program Rost and Sander 1993, Rost and Sander 1994, Rost et al 1994 is used for secondary structure predictions. For all coiled-coil prediction methods, a likelihood of greater than or equal to 0.5 is taken as a prediction of a coiled-coil-like structure.
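The weighted update of the probability tables can be sketched as follows. This is a minimal illustration: representing a table as a dictionary keyed by, for example, (residue, heptad position), and treating an entry missing from one table as 0.0, are assumptions made here rather than details given in the text.

```python
def blend_tables(original, selected, w_original=0.05, w_selected=0.95):
    """Weighted average of two probability tables, entry by entry.

    The default weights are those stated in the text: 0.05 for the original
    database and 0.95 for the database selected in the current iteration.
    """
    keys = set(original) | set(selected)
    return {
        # assumption: an entry absent from one table counts as probability 0.0
        k: w_original * original.get(k, 0.0) + w_selected * selected.get(k, 0.0)
        for k in keys
    }
```

With these weights an entry of 0.2 in the original table and 0.4 in the selected table blends to 0.05 × 0.2 + 0.95 × 0.4 = 0.39, so the selected database dominates while the original database keeps every entry from depending on the current selection alone.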
  61 in total

1.  Predicting coiled coils from protein sequences.

Authors:  A Lupas; M Van Dyke; J Stock
Journal:  Science       Date:  1991-05-24       Impact factor: 47.728

2.  Crystal structure of the Ebola virus membrane fusion subunit, GP2, from the envelope glycoprotein ectodomain.

Authors:  W Weissenhorn; A Carfí; K H Lee; J J Skehel; D C Wiley
Journal:  Mol Cell       Date:  1998-11       Impact factor: 17.970

3.  Computational learning reveals coiled coil-like motifs in histidine kinase linker domains.

Authors:  M Singh; B Berger; P S Kim; J M Berger; A G Cochran
Journal:  Proc Natl Acad Sci U S A       Date:  1998-03-17       Impact factor: 11.205

4.  A structural change occurs upon binding of syntaxin to SNAP-25.

Authors:  D Fasshauer; D Bruns; B Shen; R Jahn; A T Brünger
Journal:  J Biol Chem       Date:  1997-02-14       Impact factor: 5.157

Review 5.  Enveloped viruses: a common mode of membrane fusion?.

Authors:  F M Hughson
Journal:  Curr Biol       Date:  1997-09-01       Impact factor: 10.834

6.  Algorithms for protein structural motif recognition.

Authors:  B Berger
Journal:  J Comput Biol       Date:  1995       Impact factor: 1.479

7.  PHD--an automatic mail server for protein secondary structure prediction.

Authors:  B Rost; C Sander; R Schneider
Journal:  Comput Appl Biosci       Date:  1994-02

8.  A spring-loaded mechanism for the conformational change of influenza hemagglutinin.

Authors:  C M Carr; P S Kim
Journal:  Cell       Date:  1993-05-21       Impact factor: 41.582

9.  Combining evolutionary information and neural networks to predict protein secondary structure.

Authors:  B Rost; C Sander
Journal:  Proteins       Date:  1994-05

10.  Dissection of a retrovirus envelope protein reveals structural similarity to influenza hemagglutinin.

Authors:  D Fass; P S Kim
Journal:  Curr Biol       Date:  1995-12-01       Impact factor: 10.834
