Monique Merchant1,2, Carlos P Mata1,2, Yangci Liu1,2, Haoming Zhai1,2, Anna V Protasio3,4, Yorgo Modis1,2. 1. Molecular Immunity Unit, Department of Medicine, University of Cambridge, MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge, CB2 0QH, UK. 2. Cambridge Institute of Therapeutic Immunology and Infectious Disease (CITIID), University of Cambridge School of Clinical Medicine, Cambridge CB2 0AW, UK. 3. Department of Pathology, University of Cambridge, Tennis Court Road, Cambridge CB2 1QP, UK. 4. Christ's College, University of Cambridge, St Andrew's Street, Cambridge, CB2 3BU, UK.
Abstract
Endogenous viral elements (EVEs), accounting for 15% of our genome, serve as a genetic reservoir from which new genes can emerge. Nematode EVEs are particularly diverse and informative of virus evolution. We identify Atlas virus-an intact retrovirus-like EVE in the human hookworm Ancylostoma ceylanicum, with an envelope protein genetically related to GN-GC glycoproteins from the family Phenuiviridae. A cryo-EM structure of Atlas GC reveals a class II viral membrane fusion protein fold not previously seen in retroviruses. Atlas GC has the structural hallmarks of an active fusogen. Atlas GC trimers insert into membranes with endosomal lipid compositions and low pH. When expressed on the plasma membrane, Atlas GC has cell-cell fusion activity. With its preserved biological activities, Atlas GC has the potential to acquire a cellular function. Our work reveals structural plasticity in reverse-transcribing RNA viruses.
Endogenous viral elements (EVEs), accounting for 15% of our genome, serve as a genetic reservoir from which new genes can emerge. Nematode EVEs are particularly diverse and informative of virus evolution. We identify Atlas virus-an intact retrovirus-like EVE in the human hookworm Ancylostoma ceylanicum, with an envelope protein genetically related to GN-GC glycoproteins from the family Phenuiviridae. A cryo-EM structure of Atlas GC reveals a class II viral membrane fusion protein fold not previously seen in retroviruses. Atlas GC has the structural hallmarks of an active fusogen. Atlas GC trimers insert into membranes with endosomal lipid compositions and low pH. When expressed on the plasma membrane, Atlas GC has cell-cell fusion activity. With its preserved biological activities, Atlas GC has the potential to acquire a cellular function. Our work reveals structural plasticity in reverse-transcribing RNA viruses.
Retroviruses and other reverse-transcribing RNA viruses can frequently integrate their genome, reverse-transcribed from RNA into DNA, into the host-cell genome. Viral genomes integrated into germline cells are inherited by future generations as endogenous viral elements (EVEs). Most EVEs are endogenous retroviruses (ERVs) and account for approximately 15% of the human genome, seven times more than protein-coding genes (, ). EVEs and other transposons were initially viewed as parasitic DNA. It is now evident that they serve as a genetic reservoir, from which new genes and regulatory elements can emerge. Sequences of retroviral origin help control gene expression by serving as promoters, enhancers, and other regulatory elements (, ). Genes coopted from EVEs have evolved to fulfill vital cellular functions (). For example, syncytins, which drive cell-cell fusion of trophoblasts during placental development, are encoded by ERV envelope glycoprotein (env) genes (, ). Another recent example is the Gag capsid protein encoded by the Caenorhabditis elegans Cer1 retrotransposon, which encapsidates small noncoding RNAs into nucleocapsids that can transfer RNAs conferring learned pathogen avoidance behavior from infected parents to naïve progeny ().The reduced mutation rate of host versus retrovirus genomes [10−9 versus 10−3 mutations per site per year ()] means that EVEs are windows to ancestral retroviral sequences—evolutionary fossils preserved from the time of integration (). Some EVE genes are expressed in human tissues and retain their biological activities, such as membrane fusion activity in the case of Env proteins (Envs) (, , ). Aberrant expression of Envs is associated with disease (, ). With the biology of EVEs still largely uncharted, it is likely that many cellular functions of EVEs in health and disease remain undiscovered. Studying EVE genes with previously unidentified properties could therefore provide insights on the evolutionary history of reverse-transcribing RNA viruses and identify fundamental principles in host-virus coevolution.Nematode EVEs are particularly diverse and informative of virus evolution. EVEs from the family Belpaoviridae (BEL/Pao) (), related to retroviruses and widespread across metazoa, have revealing genetic features in nematodes (). The presence in C. elegans EVEs of overlapping open reading frames, otherwise unique to complex vertebrate retroviruses, suggests retroviruses originated in early metazoa with a common ancestor resembling belpaoviruses (). Furthermore, nematode endogenous belpaoviruses encode Envs that are genetically unrelated to retrovirus Envs (). Instead of a class I viral membrane fusion protein with a core fold of three bundled α helices (–)—a defining feature of modern retroviruses—belpaovirus Envs have sequence similarity to GC (G2) envelope glycoproteins from phleboviruses and bandaviruses (family Phenuiviridae) (). GC proteins are class II membrane fusion proteins, with a three-domain β strand architecture (, ) also found in alphaviruses (), flaviviruses (, ), and Rubella virus () but structurally unrelated to class I fusion proteins. A series of conformational changes in class II fusion proteins, triggered by endosomal acidification, catalyzes fusion of the viral and endosomal membranes to deliver the viral genome into the cytosol (–). A hydrophobic fusion loop first inserts into the endosomal membrane. The proteins then form trimers and fold back on themselves, pulling the cell membrane (held by the fusion loop) and the viral membrane (held by a transmembrane anchor) together so they fuse ().Class II fusion proteins are not limited to viruses: They also drive cell-cell fusion events of fundamental importance, including syncytial epithelia formation in C. elegans and other nematodes (–), and gamete fusion in protozoa, plants, algae, and invertebrates (–). The identical topology and overall arrangement of the three domains of viral and eukaryotic class II fusion proteins, along with similarities in their membrane fusion mechanisms, makes it all but certain they evolved from a common ancestor (). Although the evolutionary origin of the ancestral class II fusion protein remains unknown, the presence of class II fusion proteins in EVEs raises the provocative prospect that a gene transfer from a virus to a cell led to the advent of sexual reproduction (, ).Here, we identify a novel, intact endogenous belpaovirus in the human hookworm Ancylostoma ceylanicum (a parasitic nematode) with an Env more similar than any other eukaryotic sequence to phlebovirus GC protein sequences. We expressed and purified the GC-homologous fragment from this EVE, henceforth Atlas virus. A cryo–electron microscopy (cryo-EM) structure of Atlas GC reveals a class II viral fusion protein fold similar to phlebovirus GC proteins and not seen in retroviruses, as predicted 20 years ago (, ). We show that Atlas GC has all the hallmarks of an active class II membrane fusion protein. It undergoes a monomer-to-trimer transition and inserts into lipid membranes with a specific lipid composition in response to a low pH trigger. Our work provides biochemical validation for the hypothesis that acquisition of a fusion protein from an infectious virus, as exemplified by Atlas virus, represents a general paradigm of how retrotransposons can become retroviruses () and how ancestral reverse-transcribing viruses may have originated (). The preserved biological activities of Atlas GC, including membrane fusion activity, raise the question of whether these activities, and those of EVE gene products more broadly, have cellular functions or cause disease.
RESULTS
An intact EVE with a phlebovirus-like Env in the hookworm A. ceylanicum
The bioinformatic discovery of nematode EVEs with phlebovirus GC-like Env sequences not seen in retroviruses (, ) requires biochemical validation. To identify phlebovirus-like EVE Envs suitable for biochemical analysis, we performed PSI-BLAST (position-specific iterative basic local alignment search tool) searches for protein sequences similar to biochemically characterized phlebovirus and bandavirus GC proteins. A search with Rift Valley fever virus (RVFV) GC as the query identified the gene Acey_s0020.g108 (UniProt: A0A016UZK2) in the human hookworm A. ceylanicum as containing the most similar sequence outside infectious virus taxa [expected value (E value) of 10−20]. The homologous sequence lies within a 9204-nucleotide (nt) element having all the features of an intact EVE, including 100% identical 271-nt long terminal repeats (LTRs) and a coding sequence encoding a single 2828-residue Gag-Pol-Env polyprotein without any stop codons or introns (Fig. 1A and fig. S1A). We refer to this element as A. ceylanicum Atlas virus. It is one of nine A. ceylanicum EVEs that encode complete Gag-Pol-Env polyproteins (Fig. 1, B and C). These EVEs have the distinguishing genomic features of belpaoviruses from other nematode species, including an atypical Env and an aspartate to asparagine substitution (Y[X]DD → YVDN) in the most conserved reverse transcriptase (RT) motif, the polymerase site (fig. S1B) (, , ). Phylogenetic classification of the Atlas virus based on RT sequences confirms the Belpaoviridae phylogeny of Atlas virus, with Cer13 from C. elegans () as the closest neighbor in the RT tree (Fig. 1D). With a Gag sequence less than 20% identical to its closest homolog (Acey_s0020.g1106), the Atlas EVE is a candidate for classification as a member of the family Belpaoviridae (which contains a single genus, Semotivirus) (fig. S1C) ().
Fig. 1.
Atlas is an endogenous belpaovirus with a phlebovirus-like Env.
(A) Gene architecture of the Atlas EVE, gene Acey_s0020.g108 from the A. ceylanicum human hookworm parasite. Inset: Annotated close-up of the Atlas Env region encoding phlebovirus GN- and GC-like glycoproteins. Red residue numbers refer to the GC sequence alone. a.a., amino acid. TM, transmembrane; DI, domain I; DII, domain II; DIII, domain III. (B) List of A. ceylanicum EVEs encoding complete Gag-Pol-Env polyproteins with phlebovirus-like Envs. LTR sequence identities for two of the EVEs could not be calculated: Acey_s0206.g1994 had degenerate LTRs, and one LTR of Acey_s0025.g1106 was truncated in the genome assembly. bp, base pair. (C) Phylogenetic tree of GC proteins from Phenuiviridae and GC-like (G2-like) sequences from EVEs. Ace, A. ceylanicum; Cel, C. elegans; Nam, Necator Americanus, RVFV, Rift Valley fever virus; DabieV, Dabie bandavirus (SFTSV); HRTV, Heartland virus. (D) Phylogenetic classification of Atlas based on RT sequences. Alu TasV, Ascaris lumbricoides Tas virus; Dsi NinV, Drosophila simulans Ninja virus; Bmo PaoV, Bombyx mori Pao virus; Tru SuzV, Takifugu rubripes Suzu virus; Dme BelV, D. melanogaster Bel virus; Aga MooV, Anopheles gambiae Moose virus; RooV, Roo virus. Trees were drawn with iTOL v6.
Atlas is an endogenous belpaovirus with a phlebovirus-like Env.
(A) Gene architecture of the Atlas EVE, gene Acey_s0020.g108 from the A. ceylanicum human hookworm parasite. Inset: Annotated close-up of the Atlas Env region encoding phlebovirus GN- and GC-like glycoproteins. Red residue numbers refer to the GC sequence alone. a.a., amino acid. TM, transmembrane; DI, domain I; DII, domain II; DIII, domain III. (B) List of A. ceylanicum EVEs encoding complete Gag-Pol-Env polyproteins with phlebovirus-like Envs. LTR sequence identities for two of the EVEs could not be calculated: Acey_s0206.g1994 had degenerate LTRs, and one LTR of Acey_s0025.g1106 was truncated in the genome assembly. bp, base pair. (C) Phylogenetic tree of GC proteins from Phenuiviridae and GC-like (G2-like) sequences from EVEs. Ace, A. ceylanicum; Cel, C. elegans; Nam, Necator Americanus, RVFV, Rift Valley fever virus; DabieV, Dabie bandavirus (SFTSV); HRTV, Heartland virus. (D) Phylogenetic classification of Atlas based on RT sequences. Alu TasV, Ascaris lumbricoides Tas virus; Dsi NinV, Drosophila simulans Ninja virus; Bmo PaoV, Bombyx mori Pao virus; Tru SuzV, Takifugu rubripes Suzu virus; Dme BelV, D. melanogaster Bel virus; Aga MooV, Anopheles gambiae Moose virus; RooV, Roo virus. Trees were drawn with iTOL v6.The phlebovirus GC-like sequence spans the last 498 residues of the Atlas virus polyprotein (residues 2330 to 2828). It contains a single predicted C-terminal transmembrane helix, like phlebovirus GC proteins. Phleboviruses and other Phenuiviridae family members express a glycoprotein precursor that is cleaved by cellular proteases into two envelope glycoproteins, GN and GC (or G1 and G2). GN is the receptor-binding protein required for cellular attachment, and GC is the membrane fusion protein required for cell entry. GN is highly antigenic and more variable in sequence than GC. Our analysis of the Atlas virus Env sequence detected a slight but statistically significant similarity in the 421 residues preceding the GC-like sequence (residues 1909 to 2329) to GN glycoproteins from Phenuiviridae family viruses (E value of ≥10−7). Moreover, the distribution of predicted transmembrane helices and proteolytic cleavage sites in and adjacent to the Atlas GN- and GC-like sequences is the same as in phlebovirus and bandavirus glycoproteins (Fig. 1A). Together, these sequence features suggest that the Atlas virus Env contains tandem phlebovirus-like GN and GC glycoproteins instead of a retrovirus-like glycoprotein. With all the features of a recently active EVE and an apparently intact set of phlebovirus-like glycoproteins, Atlas virus is an excellent candidate for biochemical analysis.
Atlas GC has a class II membrane fusion protein fold not seen in retroviruses
As the molecular structure of phlebovirus GC proteins and how they drive fusion of the viral and cellular membranes are well established from previous studies (, ), we focused our biochemical analyses on the GC-like sequence from the Atlas virus. A recombinant ectodomain fragment of Atlas GC (polyprotein residues 2330 to 2772) was expressed in Drosophila melanogaster D.mel-2 cells as a secreted protein. The purified protein was a soluble, folded homotrimer (fig. S2). The structure of the GC trimer was determined by single-particle cryo-EM image reconstruction at an overall resolution of 3.76 Å (Fig. 2, A and B, table S1, and fig. S3). The map was sufficiently detailed for an atomic model to be built and refined for Atlas GC residues 2330 to 2769 using the crystal structure of RVFV GC () as a starting model (see Materials and Methods; fig. S4). The structure reveals a three-domain class II membrane fusion protein fold (Fig. 2, C and D). An atomic model of the GC trimer independently generated with AlphaFold-Multimer () was similar [root mean square deviation (RMSD) (Cα) = 1.17 Å; Fig. 2E]. All previously described retroviral Env structures have a helical coiled-coil class I fusion protein fold (–). The structure of the Atlas GC ectodomain fragment is similar to phlebovirus and bandavirus GC structures, specifically the trimeric postfusion GC structures from RVFV (), Dabie bandavirus [DABV, formerly SFTS (severe fever with thrombocytopenia syndrome) phlebovirus] () and Heartland virus (HRTV) (Fig. 2, D and F) (). Domain I, a 10-stranded β barrel augmented by a three-stranded sheet, organizes the structure. Two insertions in domain I form the elongated, mostly β-stranded domain II. Domain III has the seven-stranded β-sandwich topology of fibronectin type III (FN3) domains also found in macroglobulin domains (, ), but the hydrophobic core and disulfide bonding pattern of domain III differ from these and other annotated domains from nonviral species. A 16–amino acid portion of the stem region, which links domain III to the C-terminal transmembrane anchor in class II fusion proteins, could be modeled, spanning 5 nm from the end of domain III to within approximately 1 nm of the tip of domain II. The stem forms trimer contacts, adding a β strand to domain II of a different subunit, as seen in RVFV GC (). The overall configuration bears strong similarity to other viral and cellular class II fusion proteins including, in order of decreasing similarity: alphavirus E1 proteins (), EFF-1/AFF-1 cell-cell fusion proteins from C. elegans and other animals (, ), HAP2 (hapless 2, also known as generative cell-specific protein1 or GCS1) gamete fusion proteins from protozoa (–) and plants (, ), and flavivirus E proteins (, , ) (Fig. 2F). Despite these structural similarities, the only proteins or domains of known structure with detectable amino acid sequence similarity to Atlas GC (E value of <1 in PSI-BLAST) are the phlebovirus and bandavirus GC proteins (22 to 24% sequence identity; fig. S5).
Fig. 2.
Cryo-EM structure of a phlebovirus GC-like Env fragment from Atlas virus.
(A) Domain organization of Atlas GC. (B) Cryo-EM image reconstruction of a soluble ectodomain fragment of Atlas GC at 3.76 Å overall resolution. The purified protein was a homotrimer (fig. S2), and threefold (C3) symmetry was imposed. The map is colored by domain as in (A). A representative cryo-EM micrograph is shown in fig. S4A. (C) Overview of the refined atomic model of the Atlas GC trimer. Disulfide bonds (green) and an N-linked glycan (blue) are shown as sticks. (D) A single Atlas GC protomer. (E) Protomer from the AlphaFold-Multimer structure prediction for the GC trimer. The backbone is colored by model confidence [measured with the predicted local distance difference test (pLDDT)]; side chains of disulfide-bonded cysteines are in green. (F) The structures most similar to Atlas GC based on structure comparison with DALI (), GC glycoproteins from RVFV [root mean square deviation (RMSD), 2.6 Å; z score, 29; PDB: 6EGU ()], DABV [formerly SFTS phlebovirus; RMSD, 2.6 Å; z score, 28; PDB: 5G47 ()], and HRTV (RMSD, 2.6 Å; z score, 28; PDB: 5YOW ()]. Other representative class II fusion proteins are shown for comparison: SFV E1 [z score, 19; PDB: 1RER ()], C. elegans EFF-1 [z score, 19; PDB: 4OJD ()], Chlamydomonas reinhardtii HAP2 [z score, 16; PDB: 5MF1 ()], and dengue virus (DENV) E [z score, 14; PDB: 3G7T ()].
Cryo-EM structure of a phlebovirus GC-like Env fragment from Atlas virus.
(A) Domain organization of Atlas GC. (B) Cryo-EM image reconstruction of a soluble ectodomain fragment of Atlas GC at 3.76 Å overall resolution. The purified protein was a homotrimer (fig. S2), and threefold (C3) symmetry was imposed. The map is colored by domain as in (A). A representative cryo-EM micrograph is shown in fig. S4A. (C) Overview of the refined atomic model of the Atlas GC trimer. Disulfide bonds (green) and an N-linked glycan (blue) are shown as sticks. (D) A single Atlas GC protomer. (E) Protomer from the AlphaFold-Multimer structure prediction for the GC trimer. The backbone is colored by model confidence [measured with the predicted local distance difference test (pLDDT)]; side chains of disulfide-bonded cysteines are in green. (F) The structures most similar to Atlas GC based on structure comparison with DALI (), GC glycoproteins from RVFV [root mean square deviation (RMSD), 2.6 Å; z score, 29; PDB: 6EGU ()], DABV [formerly SFTS phlebovirus; RMSD, 2.6 Å; z score, 28; PDB: 5G47 ()], and HRTV (RMSD, 2.6 Å; z score, 28; PDB: 5YOW ()]. Other representative class II fusion proteins are shown for comparison: SFV E1 [z score, 19; PDB: 1RER ()], C. elegans EFF-1 [z score, 19; PDB: 4OJD ()], Chlamydomonas reinhardtii HAP2 [z score, 16; PDB: 5MF1 ()], and dengue virus (DENV) E [z score, 14; PDB: 3G7T ()].Atlas GC has the same structural features that distinguish phlebovirus and bandavirus glycoproteins from other class II fusion proteins: a larger number of disulfide bonds, 10 of which are conserved in phleboviruses and bandaviruses but not in other class II proteins; N-linked glycosylation in domain III; and a more extensive and rigid interface between domains I and II (Fig. 2, D and F, and fig. S5). The most notable differences between Atlas virus and phlebovirus GC structures are differences in the disulfide bonding pattern and in the composition of side chains lining the glycerophospholipid (GPL) headgroup binding pocket conserved in arboviral class II fusion proteins (). We discuss these differences and their potential functional implications below. Atlas GC also has a different glycosylation pattern, with a single predicted N-linked glycosylation site at Asn414 in domain III with a weak corresponding feature in the density map (fig. S6). In contrast, phlebovirus and bandavirus GC proteins contain two N-linked glycans in domain III, at two different sites. One of these, Asn1035 in RVFV, covers the fusion loop in the prefusion conformation of RVFV GC and stabilizes the prefusion dimer by forming contacts across the dimer interface (), as also seen in flavivirus E proteins (, ). This glycosylation site is conserved in DABV and HRTV GC but absent in Atlas GC. Despite these minor differences, the notable overall structural similarity of Atlas GC to phlebovirus and bandavirus GC proteins in the postfusion conformation experimentally validates the evolutionary link between nematode EVEs from the family Belpaoviridae and the fusion proteins of phleboviruses postulated on the basis of previous genomic analyses (, ).
Structure of the putative lipid membrane anchor of Atlas GC
Viral fusion proteins insert a membrane anchor—the fusion loop, in class II proteins—into the host cell membrane to initiate virus-cell membrane fusion. The putative fusion loop of Atlas GC can be identified by analogy to RVFV GC as spanning residues 127 to 140. The local resolution of the cryo-EM density for this region is lower than for the rest of the map, but the large number of structural constraints imposed by the positions of disulfide-bonded cysteines and other residues conserved in phleboviruses GC proteins allowed an atomic model to be built unambiguously (Fig. 3A and fig. S6, B and C). Specifically, the fusion loop is constrained by four disulfide bonds conserved in phleboviruses and bandaviruses, a fifth disulfide specific to Atlas GC (Cys129 to Cys138), a phenylalanine (Phe136) required in phleboviruses and bandaviruses at the apex of the fusion loop for membrane binding and fusion (, , ), and two conserved glycines (Gly128 and Gly134) that provide the torsional flexibility necessary for the fusion loop’s tightly folded conformation (Fig. 3, A to C). The fusion loop in the AlphaFold-Multimer model has the same fold and disulfide connectivity, validating the cryo-EM model (Fig. 3, C and F).
Fig. 3.
Structure of the putative fusion loop and lipid binding pocket of Atlas GC.
(A) Local sequence alignment of the fusion loops (orange boxes) of Atlas GC, RVFV GC, and DENV E. Green dumbbells indicate disulfides. Disulfides shown below the RVFV sequence are conserved in Atlas and RVFV. (B) Residues contributing to binding of the phosphate (Phos), glycerol, and choline (Chol) moieties of glycerophospholipid (GPL) molecules by RVFV GC (). Colors indicate side chain properties: blue, positive charge; pink, negative charge; magenta, hydrophobic; green, sulfhydryl; light blue, positive charge at endosomal pH. (C) Close-up of the fusion loop and GPL binding pockets of Atlas GC, RVFV GC, and DENV E. A single protomer from the postfusion trimer is shown, with key residues shown in stick representation. Label colors indicate side chain properties: blue, positive charge; red, negative charge; orange, hydrophobic. The AlphaFold-Multimer model is colored by model confidence as in Fig. 2. (D) Close-up of the fusion loop and GPL binding pocket of the Atlas GC trimer showing histidine residues in or near the pocket. Prime symbols following residue numbers denote the protomer to which the residue belongs. (E) Surface representation of the same view as in (D), colored by side chain hydrophobicity, except for histidine residues shown in light blue. (F) Close-up of the fusion loop and GPL binding pocket of the Atlas GC AlphaFold-Multimer trimer model, same view as in (D).
Structure of the putative fusion loop and lipid binding pocket of Atlas GC.
(A) Local sequence alignment of the fusion loops (orange boxes) of Atlas GC, RVFV GC, and DENV E. Green dumbbells indicate disulfides. Disulfides shown below the RVFV sequence are conserved in Atlas and RVFV. (B) Residues contributing to binding of the phosphate (Phos), glycerol, and choline (Chol) moieties of glycerophospholipid (GPL) molecules by RVFV GC (). Colors indicate side chain properties: blue, positive charge; pink, negative charge; magenta, hydrophobic; green, sulfhydryl; light blue, positive charge at endosomal pH. (C) Close-up of the fusion loop and GPL binding pockets of Atlas GC, RVFV GC, and DENV E. A single protomer from the postfusion trimer is shown, with key residues shown in stick representation. Label colors indicate side chain properties: blue, positive charge; red, negative charge; orange, hydrophobic. The AlphaFold-Multimer model is colored by model confidence as in Fig. 2. (D) Close-up of the fusion loop and GPL binding pocket of the Atlas GC trimer showing histidine residues in or near the pocket. Prime symbols following residue numbers denote the protomer to which the residue belongs. (E) Surface representation of the same view as in (D), colored by side chain hydrophobicity, except for histidine residues shown in light blue. (F) Close-up of the fusion loop and GPL binding pocket of the Atlas GC AlphaFold-Multimer trimer model, same view as in (D).The structure and chemical properties of the Atlas GC fusion loop resemble phlebovirus and bandavirus fusion loops. Ile130, Phe136, Phe137, and the Atlas-specific disulfide (Cys129 to Cys138) form a hydrophobic surface similar to phlebovirus, bandavirus, and flavivirus fusion proteins (Fig. 3C) (, ). The area of this surface is greater than in most viral class II fusion proteins. By analogy with other viral class II fusion proteins, the location and extent of the hydrophobic surface formed by the Atlas GC fusion loop suggest that it could function as a membrane anchor.In addition to inserting nonpolar side chains into the hydrophobic region of the membrane, viral class II fusion proteins form polar contacts with lipid headgroups via the fusion loop and an adjacent GPL binding pocket (). By selecting for headgroups with complementary electrostatic potential, polar contacts confer a degree of specificity to lipid binding. In phleboviruses, a set of conserved polar residues in the GPL binding pocket bind selectively to zwitterionic GPLs (). The Atlas GC structure reveals a putative GPL binding pocket with both conserved and novel features (Fig. 3). The arginine that forms bidentate hydrogen bonds with the GPL phosphate moiety in phleboviruses () is conserved in Atlas virus (Arg86). The disulfide bond and short-chain hydrophobic residue that bind the GPL glycerol moiety are also conserved (Cys87 to Cys135 and Ala90). However, an aspartate-arginine pair that binds choline and ethanolamine GPL moieties in phleboviruses is replaced in Atlas GC by two histidines (His271 and His85). Moreover, Atlas virus has an extra residue in the fusion loop, Asp133, compared to phlebovirus GC proteins. The Asp133 side chain points into the putative GPL binding pocket and is located near the position of the choline GPL moiety in the superimposed structure of RVFV GC bound to a phosphatidylcholine (PC) ligand (), suggesting that Asp133 could compensate for the lack of a conserved aspartate at position 271 (Fig. 3, A to C). Hence, the putative GPL binding pocket of Atlas GC appears to have the necessary physicochemical attributes to support GPL binding, with Arg86 binding the phosphate moiety, Cys87/Cys135/Ala90 binding the glycerol moiety, and His85/Asp133/His271 coordinating the end of the headgroup. We noted the presence in the cryo-EM reconstruction of a bulge in the density around the GPL binding pocket that is unaccounted for by the atomic model (fig. S6C). In addition, the absorbance at 260 nm of purified Atlas GC was higher than expected despite treatment with nucleases during purification (fig. S2A). These two observations would be consistent with lipid molecules with unsaturated acyl chains copurifying with Atlas GC, but the local resolution of the map was insufficient to ascertain whether the GPL binding pocket contained a ligand.
Atlas GC binds membranes with endosome-like lipid composition at low pH
A key step in viral membrane fusion is binding of the fusion protein to the host cell membrane. We assessed binding of Atlas GC ectodomain to liposomes in density gradient centrifugation and dynamic light scattering (DLS) experiments. Viruses containing class II fusion proteins, like many retroviruses, undergo membrane insertion and fusion in endosomal compartments where the pH is acidic (–). We therefore assayed liposome binding at a range of pH values. In contrast to RVFV GC ectodomain, Atlas GC ectodomain did not bind liposomes containing PC, phosphatidylethanolamine (PE), cholesterol, and sphingomyelin (SM) at neutral or acidic pH (pH 4 to 8; Fig. 4, A and B). At neutral pH (pH 7.8), Atlas GC ectodomain also failed to bind liposomes containing anionic lipids enriched in early or late endosomes: phosphatidylserine (PS) or bis(monoacylglycerol)phosphate (BMP, also known as lysobisphosphatidic acid), respectively. At pH 4, however, Atlas GC ectodomain bound tightly to liposomes containing PS or BMP, with weaker binding observed at pH 4.6 (Fig. 4, A and B, and fig. S7). Atlas GC ectodomain bound only weakly to liposomes containing phosphatidylglycerol (PG) instead of BMP although PG and BMP are regioisomers with identical chemical composition and electrostatic charge (of −1). PG and BMP differ only in the position of the second acyl-glycerol linkage, resulting in a linear configuration for BMP instead of the usual branched configuration for PG. Our liposome binding data show that Atlas GC binds to membranes containing specific GPLs that are enriched in the endosomal pathway in a pH-dependent manner. No other class II fusion protein has been reported to require low pH, PS, or BMP for membrane insertion. However, phleboviruses require only PE or PC and cholesterol for membrane insertion (), and bandaviruses require BMP and low pH for fusion (). Similarly, flaviviruses require BMP, PS, or other anionic lipids and low pH for efficient fusion (, , ).
Fig. 4.
Atlas GC ectodomain binds liposomes in a lipid- and pH-dependent manner.
(A) Liposome coflotation lipid binding assay. A liposome-GC ectodomain mixture in 40% OptiPrep density gradient medium was overlaid with a 30% OptiPrep cushion and centrifuged at 100,000g. Flotation was defined as the amount of GC ectodomain cofloating with liposomes in the top-half (T) fraction divided by the total amount of GC ectodomain in the top- and bottom-half (B) fractions. GC ectodomain was quantified by Coomassie-stained SDS-PAGE. Error bars show the SD of three replicates except wild-type (WT) + PC:PE:SM:CH:PS (two replicates) and WT + PC:PE:SM:BMP (four replicates). See fig. S8 for uncropped gels for all replicates. (B) Binding of GC ectodomain (WT, F136A/F137A or R86A) to liposomes measured by DLS as differences in liposome diameter in the presence and absence of Atlas GC ectodomain. Error bars show the SD of three to seven replicates. Significance was determined by two-way ANOVA of the mean change in liposome diameter, using Sidak’s multiple comparisons test with a 95% confidence interval in GraphPad Prism 8 (see fig. S7). *P < 0.05; **P < 0.01; ns, not significant. See dataset S1 for source data.
Atlas GC ectodomain binds liposomes in a lipid- and pH-dependent manner.
(A) Liposome coflotation lipid binding assay. A liposome-GC ectodomain mixture in 40% OptiPrep density gradient medium was overlaid with a 30% OptiPrep cushion and centrifuged at 100,000g. Flotation was defined as the amount of GC ectodomain cofloating with liposomes in the top-half (T) fraction divided by the total amount of GC ectodomain in the top- and bottom-half (B) fractions. GC ectodomain was quantified by Coomassie-stained SDS-PAGE. Error bars show the SD of three replicates except wild-type (WT) + PC:PE:SM:CH:PS (two replicates) and WT + PC:PE:SM:BMP (four replicates). See fig. S8 for uncropped gels for all replicates. (B) Binding of GC ectodomain (WT, F136A/F137A or R86A) to liposomes measured by DLS as differences in liposome diameter in the presence and absence of Atlas GC ectodomain. Error bars show the SD of three to seven replicates. Significance was determined by two-way ANOVA of the mean change in liposome diameter, using Sidak’s multiple comparisons test with a 95% confidence interval in GraphPad Prism 8 (see fig. S7). *P < 0.05; **P < 0.01; ns, not significant. See dataset S1 for source data.We note that the liposome diameters measured by DLS were smaller at pH 4 than at pH 7.8 and differences in lipid composition correlated with further differences in liposome diameter. Reduction of the pH below 6 reduces the lipid packing density within bilayers containing anionic lipids, which can, in turn, reduce the diameter of lipid vesicles (or alter their shape), an effect ascribed to headgroup protonation leading to reduced electrostatic repulsion (). Differences in the cholesterol content of the bilayer can also contribute to fluctuations in liposome size, as cholesterol affects lipid packing and raft formation. Hence, we only compared the liposome diameters from DLS experiments performed at the same pH and with the same lipid composition, where the only difference was the presence or absence of Atlas GC (Fig. 4B and fig. S7).The optimal pH for membrane binding of Atlas GC (pH 4 to 4.5) is similar to the optimal pH of hemifusion of Uukuniemi virus (), a model phenuivirus (uukuvirus genus), and would be consistent with membrane insertion in late endosomes, as is the case for phenuiviruses (). The side chains of His85 and His271, in the GPL binding pocket of Atlas GC, would be fully protonated at pH 4 to 4.5. The resulting net positive charge of the His85/Asp133/His271 triad (+1/−1/+1), analogous to the Arg/Asp pair that coordinates the end of GPL headgroups in phleboviruses and bandaviruses (Fig. 3B), mirrors the charge of the phosphoserine headgroup of PS (−1/+1/−1). Moreover, Atlas GC contains four additional solvent-exposed histidines (residues 88, 94, 99, and 100) in the vicinity of the GPL binding pocket (Fig. 3, D and E). Protonation of these histidines at low pH may promote further interactions with anionic lipid headgroups. The presence of six histidines in and around the GPL binding pocket provides a possible explanation for the observed pH-dependent insertion of Atlas GC into membranes containing PS and BMP. Consistent with a conserved role for the GPL binding pocket in determining lipid specificity of class II fusion proteins, mutations in alphaviruses at a position equivalent to His271, in the ij loop, determine the extent to which alphaviruses depend on cholesterol for membrane binding (–).
Atlas GC membrane binding does not strictly require cholesterol and occurs via the fusion loop
In addition to GPLs, phenuiviruses and alphaviruses (but not flaviviruses) require cholesterol for efficient membrane binding and subsequent fusion (, , ). Alphaviruses additionally require sphingolipids (such as SM) for efficient fusion (, ). We found that neither cholesterol nor SM were required for Atlas GC to bind liposomes (Fig. 4, A and B). Removal of cholesterol reduced the fraction of GC bound by 50% in the liposome flotation assay (Fig. 4A), although binding was still detected in the DLS assay (Fig. 4B). Hence, although cholesterol and SM are not strictly required for binding, they enhance binding, possibly by increasing membrane fluidity. Notably, the concentration of cholesterol in nematode cell membranes is approximately 20 times lower than in vertebrates (, ). This is insufficient for cholesterol to regulate the structure or fluidity of nematode membranes, in which cholesterol is thought to be instead a precursor for low-abundance metabolites (–). Likewise, Drosophila can grow indefinitely with only trace amounts of exogenous sterols (), suggesting that arthropods, which are obligate vectors of the vast majority of viruses containing class II fusion proteins, rely on lipids other than cholesterol to regulate membrane fluidity.To determine whether Atlas GC binds membranes in a manner analogous to other viral class II fusion proteins—via the fusion loop and GPL binding pocket—we purified Atlas GC ectodomain variants with mutations in the two phenylalanine residues in the fusion loop (F136A/F137A) or in the arginine predicted to bind the GPL phosphate moiety (R86A). The F136A/F137A mutant failed to bind liposomes containing PS and coflotation with liposomes containing BMP at pH 4 and pH 4.6 was reduced to approximately one-third of wild type (Fig. 4C). The R86A mutation reduced binding to liposomes containing PS or BMP at pH 4 to one-third and 75% of wild type, respectively (Fig. 4D). For both variants, preparations contained trimers as the major species, but a small monomeric fraction was also present (fig. S2), suggesting that the mutated lipid binding residues are required for efficient trimer assembly. We conclude that Atlas GC binds to lipid membranes through insertion of hydrophobic fusion loop residues and coordination of lipid headgroups in the GPL binding pocket, as in other viral class II fusion proteins.
Evidence for monomeric and trimeric states of Atlas GC
Class II fusion protein ectodomains can be monomeric or dimeric or form icosahedral shells in their prefusion conformation, but the fusogenic conformational change is always accompanied by reorganization into trimers (). Fusion proteins from classes I and III, including retrovirus fusogens, remain trimeric throughout the fusion reaction, but no class II fusion proteins are known to be trimeric in their prefusion conformation. Having established that Atlas GC can insert into membranes as a trimer with a postfusion-like conformation, we set out to determine whether it could undergo a conformational change as seen in the fusion reaction of class II proteins from infectious viruses. The Atlas GC construct described above was expressed as a trimer with no trace of monomers or dimers (Fig. 5A and fig. S2A). However, we found that a construct with the stem region truncated, GC(DI-III), spanning residues 2330 to 2751, was expressed as a mixture of monomers, trimers, and higher-order oligomers (Fig. 5B and fig. S2B). In contrast to GC trimers, which were stable at different protein concentrations and pH values, GC(DI-III) monomers were unstable over time. As noted above, monomeric fractions were also present in preparations of the fusion loop mutant (F136A/F137A) and GPL binding pocket mutant (R86A) (fig. S2, C and D). Whether these monomeric species are in a prefusion conformation remains to be determined, but the presence of metastable monomers and stable trimers recapitulates a key property of class II fusogens from infectious viruses.
Fig. 5.
Atlas GC oligomerization and disulfide bonding.
(A) SEC–multiangle light scattering (MALS) of Atlas GC ectodomain (residues 2330 to 2772). The protein formed trimers. (B) SEC of Atlas GC(DI-III) (residues 2330 to 2751) expressed in a mixture of oligomeric states, including monomers, trimers, and larger aggregates (void peak). GC(DI-III) monomers were unstable. (C) His258, His333, and His336 form interprotomer or interdomain polar contacts predicted to stabilize the GC trimer specifically at pH < 6, when the histidine side chains are charged. Residues from different protomers are denoted with a prime symbol. (D) View along the threefold axis of the GC trimer with intramolecular disulfides between the cysteines most proximal to the axis, Cys129 and Cys138.
Atlas GC oligomerization and disulfide bonding.
(A) SEC–multiangle light scattering (MALS) of Atlas GC ectodomain (residues 2330 to 2772). The protein formed trimers. (B) SEC of Atlas GC(DI-III) (residues 2330 to 2751) expressed in a mixture of oligomeric states, including monomers, trimers, and larger aggregates (void peak). GC(DI-III) monomers were unstable. (C) His258, His333, and His336 form interprotomer or interdomain polar contacts predicted to stabilize the GC trimer specifically at pH < 6, when the histidine side chains are charged. Residues from different protomers are denoted with a prime symbol. (D) View along the threefold axis of the GC trimer with intramolecular disulfides between the cysteines most proximal to the axis, Cys129 and Cys138.
pH-dependent stabilization of the Atlas GC trimer by protonated histidine residues
The increase in positive charge resulting from histidine protonation is an important part of the pH sensing mechanism of viral class II fusion proteins. Protonation of conserved histidines at the domain I–domain III interface of alphavirus, flavivirus, and phlebovirus glycoproteins promotes the fusogenic conformational change by destabilizing the prefusion conformation and stabilizing the postfusion conformation (, , , –). For example, histidines in domain III of phlebovirus GC proteins form interprotomer salt bridges with negatively charged side chains in the postfusion trimer and mutation of one such histidine in RVFV, His1087, renders the virus uninfectious (). Similarly, in Atlas GC, His258, His333, and His336 form interprotomer or interdomain polar contacts (Fig. 5C). These histidine-dependent contacts would stabilize the trimeric postfusion-like conformation of Atlas GC specifically in acidic endosomal compartments. The parallels of how Atlas GC and class II fusogens from infectious viruses respond at the ultrastructural level to environmental cues support the hypothesis that Atlas GC would have membrane fusion activity in late endosomes, like phleboviruses and many retroviruses.
Fifteen disulfide bonds stabilize Atlas GC in its postfusion-like conformation
With 30 cysteine residues forming 15 disulfide bonds, Atlas GC contains twice the average abundance of cysteines, more than has been found in any other class II protein. Twenty of these cysteines form disulfides that are structurally conserved in phleboviruses and bandaviruses (including RVFV, DABV, and HRTV) but not in other class II proteins (fig. S5). An 11th disulfide, in domain III, is conserved in Atlas virus, DABV and HRTV but not RVFV. However, Atlas GC contains four additional disulfides: one in the fusion loop, two in domain II in the β-hairpin containing the ij loop (one of the cysteines forming these disulfides is conserved in phleboviruses but forms a disulfide with a cysteine in a different β strand in domain II), and one in domain III (Figs. 2C and 3A and fig. S5). The disulfide bonding patterns in the AlphaFold-Multimer and cryo-EM models were identical (Fig. 2, D and E). As discussed above, the Atlas-specific disulfide in the fusion loop (Cys129-Cys138) extends the hydrophobic surface formed by conserved residues in the fusion loop that are required for membrane insertion (Fig. 5D). We note that due to the location of Cys129 and Cys138 close to the threefold symmetry axis of the trimer, the side chains of the two residues can be rearranged by torsional rotation to form intermolecular disulfides across the trimer interface, thereby cross-linking all three protomers in the trimer.
Atlas GC retains membrane fusion activity
Having identified the hallmarks of a fusion protein in Atlas GC, we measured its membrane fusion activity in a cell-cell fusion assay. Chinese hamster ovary (CHO) cells were transfected with plasmids encoding Atlas GC. To promote plasma membrane localization and minimize endoplasmic reticulum retention, we replaced the predicted transmembrane domain and cytosolic tail of GC with the C-terminal transmembrane anchor and cytosolic tail from human leukocyte antigen A2 (HLA-A2), known to localize to the plasma membrane (). Plasmids encoding vesicular stomatitis virus G (VSV G), or no protein, were used as positive and negative controls, respectively. Because Atlas GC trimers require low pH and BMP or PS to efficiently bind membranes, we treated transfected cells with exogenous BMP and then transferred them to pH 4.5 buffered medium to trigger fusion. Similar treatments have been used previously to measure cell-cell fusion activity of flavi- and alphaviruses (). Confocal light microscopy with nuclear and plasma membrane stains showed that cells with three or more nuclei were common in cells expressing Atlas GC following treatment with BMP and pH 4.5, although less abundant than in cells expressing VSV G, with or without treatment (Fig. 6, A and B, and movie S1). To quantify cell-cell fusion, we counted nuclei and multinuclear cells in micrographs. We found that in cells expressing Atlas GC, the fraction of multinuclear cells, defined as cells containing two or more nuclei, was 25 ± 3% following treatment with BMP and pH 4.5 versus 10 ± 0.2% without treatment (95% confidence intervals; Fig. 6C). Transfection of the GC fusion loop mutant F136A/F137A resulted in essentially identical multinucleation fractions as with empty vector, with or without treatment with BMP and pH 4.5 (Fig. 6C), underpinning the importance of the fusion loop in fusion activity. By comparison, in cells expressing VSV G, 73 ± 8% of treated cells were multinuclear. In addition, 8 ± 1% of cells transfected with empty vector were binuclear with or without treatment, but none contained more than two nuclei (fig. S9). Incubation of transfected cells with cyclin-dependent kinase 4 (CDK4), which arrests the cell cycle at G1 phase, had no measurable effect on GC-induced cell multinucleation (fig. S9D), indicating that cytokinesis effects do not significantly contribute to this cell-cell fusion assay readout. We conclude that Atlas GC has approximately one-third of the membrane fusion activity of VSV G under the treatment conditions tested, which is substantial given that VSV G is considered highly fusogenic and widely used as a model fusogen.
Fig. 6.
Cell-cell fusion assay.
(A) Confocal micrographs of CHO cells expressing Atlas GC fused to the transmembrane anchor from HLA-A2 or VSV G. Cells were treated with pH 4.5 buffer containing BMP or pH 7.4 buffer without lipid. Blue, Hoechst 33342 nuclear stain; magenta, CellBrite Red plasma membrane dye. Micrographs were selected to show fusion events—see fig. S9 for representative raw micrographs. Scale bars, 10 μm. (B) Confocal micrograph of a multinucleated CHO cell expressing Atlas GC following treatment with pH 4.5 and BMP. Blue, Hoechst 33342; magenta, CellBrite Red; green, α-HA antibody for Atlas GC detection. Scale bar, 10 μm. A composite Z stack of this cell is shown in movie S1. (C) The fraction of multinuclear cells transfected with plasmids encoding Atlas GC, VSV G, or pHLsec empty vector was calculated by counting mono- and multinucleated cells using the Hoechst and CellBrite stains. Error bars represent SD between measurements from three distinct experiments, with 33 to 568 nuclear clusters counted per replicate. P values were calculated by two-way analysis of variance (ANOVA) (Tukey’s multiple comparisons test, GraphPad Prism 9). (D and E) GC cell surface expression analysis. Cells were biotinylated, cell surface proteins were affinity-purified with NeutrAvidin agarose, and the cell surface GC fraction was quantified by Western blot (D) and LC-MS/MS (E). See dataset S2 for source data.
Cell-cell fusion assay.
(A) Confocal micrographs of CHO cells expressing Atlas GC fused to the transmembrane anchor from HLA-A2 or VSV G. Cells were treated with pH 4.5 buffer containing BMP or pH 7.4 buffer without lipid. Blue, Hoechst 33342 nuclear stain; magenta, CellBrite Red plasma membrane dye. Micrographs were selected to show fusion events—see fig. S9 for representative raw micrographs. Scale bars, 10 μm. (B) Confocal micrograph of a multinucleated CHO cell expressing Atlas GC following treatment with pH 4.5 and BMP. Blue, Hoechst 33342; magenta, CellBrite Red; green, α-HA antibody for Atlas GC detection. Scale bar, 10 μm. A composite Z stack of this cell is shown in movie S1. (C) The fraction of multinuclear cells transfected with plasmids encoding Atlas GC, VSV G, or pHLsec empty vector was calculated by counting mono- and multinucleated cells using the Hoechst and CellBrite stains. Error bars represent SD between measurements from three distinct experiments, with 33 to 568 nuclear clusters counted per replicate. P values were calculated by two-way analysis of variance (ANOVA) (Tukey’s multiple comparisons test, GraphPad Prism 9). (D and E) GC cell surface expression analysis. Cells were biotinylated, cell surface proteins were affinity-purified with NeutrAvidin agarose, and the cell surface GC fraction was quantified by Western blot (D) and LC-MS/MS (E). See dataset S2 for source data.To confirm that GC was present on the surface of cells transfected with a plasmid encoding GC, cell surface proteins were biotinylated, affinity-purified, and analyzed by immunoblot and liquid chromatography with tandem mass spectrometry (LC-MS/MS). GC was readily detected in these fractions by Western blot (Fig. 6D) and LC-MS/MS (Fig. 6E).
DISCUSSION
Here, we identify Atlas virus as an endogenous belpaovirus containing intact gag, pol, and env genes with previously unknown features in the human hookworm A. ceylanicum. The cryo-EM structure of the Atlas Env reveals a class II viral fusion protein fold similar to that of the GC glycoprotein from RVFV. Atlas GC has the hallmarks of an active class II membrane fusion protein: a stable trimeric assembly, a putative fusion loop, membrane insertion triggered by low pH with specificity for late endosomal lipid composition, and membrane fusion activity.Our work supports the model first proposed on the basis of phylogenetic studies that the nematode belpaoviruses acquired their env by horizontal gene transfer from a virus from the family Phenuiviridae or a phlebovirus-like ancestor (). It remains unclear whether RNA encoding the phlebovirus-like glycoprotein integrated into the belpaovirus ancestor as mRNA in a splicing event or by first becoming a substrate for the RT with subsequent genomic integration as double-stranded DNA. The envelope proteins from retroviruses, including ERVs, that have been biochemically characterized were all found to be class I fusion proteins with an α-helical coiled coil as the core fold. Viral class II fusion proteins have so far been found only in nonintegrating RNA viruses. Our discovery of an endogenous belpaovirus with a potentially functional, phlebovirus-like class II fusion protein that is structurally unrelated to retrovirus Envs reveals an unexpected degree of structural and genetic plasticity in reverse-transcribing RNA viruses. More generally, the presence of the class II fusion protein fold in EVEs and across many phyla suggests that this fold is derived from a common ancestor that could have been viral or cellular in origin (–, ).While rare, horizontal gene transfer of atypical fusogens into retroelements is not unique to the belpaoviruses. The Tas element from the nematode Ascaris lumbricoides, a common parasitic worm in humans, has an env gene with weak genetic similarity to herpesvirus gB proteins (), which have a class III fusion protein fold (also found in Rhabdoviridae and Baculoviridae). Together, these findings lead us to hypothesize that acquisition of a fusion protein from an infectious virus represents a general paradigm of how retrotransposons can become retroviruses and how ancestral reverse-transcribing viruses may have originated.The env gene is often the first element to be lost in EVEs, as it is not required for intracellular proliferation, so it is notable that the GN-GC
env module is intact in the Atlas virus. With its identical LTRs and no stop codons or frameshift mutations, the Atlas virus shows all the signs of being intact and recently active. This supports the notion that the envelope may be functional (). The preserved biological activities of Atlas GC suggest these activities could have cellular functions in health and disease, as reported for a small but increasing number of ERV env and gag gene products (, , , ). RNA sequencing (RNA-seq) data for A. ceylanicum () suggest that Atlas and some other complete belpaoviruses are transcribed, with transcript abundance varying across developmental stages (fig. S10). A subset of belpaoviruses has also been reported to be highly transcribed in the parasitic trematode Schistosoma mansoni (blood fluke) (). Further studies are required to determine the full extent to which protein expression from transposable elements—and its dysregulation—contribute to basic cellular functions, embryonic development, and disease outcomes. This work provides a blueprint for such efforts.
MATERIALS AND METHODS
Genomic analyses of A. ceylanicum Atlas virus
A PSI-BLAST search for protein sequences similar to biochemically characterized phlebovirus fusion proteins identified the gene Acey_s0020.g108 (UniProt: A0A016UZK2; GenBank: JARK01001356.1; genomic translation EYC20859.1) in the human hookworm A. ceylanicum as the most similar sequence outside infectious virus taxa, with an E value of 10−20 against the RVFV GC sequence (UniProt: P03518). A second iteration performed using position-specific scoring matrix based on an alignment of sequences identified in the first iteration gave an E value of 10−144. The Acey_s0020.g108 gene is referred to here as the Atlas virus.Phylogenetic analysis of the phlebovirus GC-like protein from the Atlas virus and other GC or GC-like proteins with similar protein sequences was performed as follows. A reference multiple-sequence alignment was initially generated by aligning the Atlas GC protein sequence with the sequences of reference protein RVFV GC and other viral GC proteins identified as having structural similarity in structure comparisons with Dali (), namely, the GC proteins from DABV (formerly SFTSV; GenBank: AGM33042.1) and HRTV (GenBank: AFP33394.1). For each GC protein, the sequence range present in the corresponding Protein Data Bank (PDB) entry (7A4A, 6EGU, 5G47, or 5YOW, respectively) was defined as the GC ectodomain sequence and used to generate the reference alignment with MAFFT in SnapGene v5.1.7. The sequences of the following EVEs with detectable Env protein sequence similarity were then added to the alignment: Necator americanus NECAM_13468 (GenBank: XM_013440329.1), C. elegans Cer13 virus (GenBank: Z81510.2; WormBase, WBTransposon00000728, gene Y75D11A.5), and seven A. ceylanicum EVEs encoding complete Gag-Pol-Env polyproteins with predicted phlebovirus-like Envs (Fig. 1B; GenBank: EYC27361.1, EYC19962.1, EYC20099.1, EYC18998.1, EYC07469.1, EYB91703.1, and EYB80846.1). The tree was calculated with IQ-TREE v1.6.12 and drawn with iTOL v6.The RT phylogenetic tree was calculated on the basis of the protein sequences of Atlas virus, Cer13 virus, the seven A. ceylanicum EVEs listed above, and seven other belpaoviruses (GenBank: AF060859.1, Z29712.1, L09635.1, U23420.1, AY180917.1, D83207.1, and AF537216.1). RT domains (RVT_1; Pfam: PF00078) were identified with PfamScan in five of the sequences (GenBank: Z81510.2, EYC27361.1, EYC19962.1, EYC07469.1, and EYB80846.1). The protein sequences of these five RTs were extracted and aligned with MAFFT and Clustal Omega v1.2.4. The tree was generated with IQ-TREE v1.6.12 and iTOL v6. The Atlas virus LTRs were identified, and sequence identity matrices calculated as described in fig. S1.
Protein expression and purification
Synthetic genes encoding soluble ectodomain fragments of the Env of ERV Y032_0020g108 from A. ceylanicum were subcloned into the pMT/BiP/V5-His vector (Thermo Fisher Scientific) under the metallothionein (MT) promoter and in frame with the immunoglobulin heavy-chain binding protein (BiP) signal sequence and the C-terminal V5 and six histidine tags. The constructs referred to here as Atlas GC and Atlas GC(DI-III) span amino acids 2330 to 2772 and 2330 to 2751 from UniProt (A0A016UZK2), respectively. Atlas GC mutants were generated by Dpn I–based site-directed mutagenesis. D.mel-2 insect cells (Thermo Fisher Scientific) were cotransfected with the expression construct and blasticidin resistance marker pCoBlast (Thermo Fisher Scientific) at a 20:1 molar ration and cultured for 6 weeks in blasticidin (0.5 μg ml−1) to obtain a population of expressor cells. Expression was induced in a shaking cell suspension at 27°C with 0.5 mM CuSO4 at a cell density of 5 × 106 cells ml−1. The cell culture medium was harvested 4 to 5 days after induction, centrifuged to remove cells (2000g) and cell debris (17,000g), filtered with a 0.2-μm filter, concentrated by tangential flow filtration, and buffer-exchanged into 20 mM tris (pH 7.8), 0.3 M NaCl, 5% glycerol, 20 mM imidazole, and 0.5 mM TCEP [tris(2-carboxyethyl)phosphine]. Atlas GC was purified by nickel affinity chromatography with a HisTrap Excel column (Cytiva), followed by anion exchange chromatography with a MonoQ or Resource Q column (Cytiva) using 20 mM tris (pH 8.0), 50 mM NaCl, 5% glycerol, and 0.5 mM TCEP as the binding buffer and binding buffer plus 1 M NaCl as the elution buffer. Peak fractions were concentrated and further purified by size exclusion chromatography (SEC) with a Superdex 200 Increase (10/300) column (Cytiva) in 20 mM tris (pH 7.8 to 8.0), 0.15 M NaCl, 5% glycerol, and 0.5 mM TCEP. The C-terminal V5 and histidine tags were optionally cleaved by incubation with carboxypeptidase A (CPA) for 3 hours at 4°C (1:500 CPA:GC molar ratio).
Liposome binding assay
1-Palmitoyl-2-oleoyl-sn-glycero-3-phosphatidylcholine (PC), 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphatidylethanolamine (PE), egg SM, 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphatidyl-l-serine (PS), 1-palmitoyl-2-oleoyl-sn-glycero-3-phospho-(1′-rac-glycerol) (PG), (S,R) bis(monoacylglycero)phosphate (BMP) (Avanti Polar Lipids), and 1-cholesterol (Sigma-Aldrich) were dissolved in chloroform. Lipid solutions (25 mM) were mixed at various molar ratios and dried under nitrogen gas for more than 4 hours. The lipid film was resuspended in liposome buffer [20 mM tris (pH 7.8), 0.15 M NaCl, 5% glycerol, 0.5 mM TCEP, 2 mM MgCl2, 2 mM CaCl2, and 2 mM KCl] and subjected to five cycles of freeze-thawing in liquid nitrogen, followed by 25 cycles of extrusion through two 0.2-μm polycarbonate filter membranes (Whatman). Purified Atlas GC ectodomain was added in a 1:771 protein:lipid molar ratio and incubated at 37°C for 5 min. The pH was reduced by adding a 2 M stock solution of sodium acetate (pH 4.6) or 4.0 to a final concentration of 0.2 M. Following a 2-hour incubation at 37°C, the pH of the suspension was neutralized with 1 M tris (pH 8). OptiPrep density gradient medium (Sigma-Aldrich) was added to a concentration of 40%, maintaining 0.15 M NaCl throughout. Approximately 0.5 ml of the liposome suspension was placed in a centrifuge tube, overlaid with a 2.5-ml cushion of 30% OptiPrep solution, and centrifuged at 100,000g for 1 hour at 4°C in a TLA100.3 rotor (Beckman Coulter). Top and bottom fractions (approximately 1.5 ml each) were collected from the top meniscus with a micropipette. Atlas GC was quantified by densitometry of the absorbance at 700 nm of bands in Coomassie-stained SDS–polyacrylamide gel electrophoresis (PAGE) gels with an Odyssey scanner (LI-COR). Flotation was defined as the amount of Atlas GC ectodomain in the top fraction divided by the total amount of Atlas GC ectodomain in both fractions.For measurement of liposome diameter by DLS, the liposome suspensions were diluted 10-fold in liposome buffer before the addition of 40% OptiPrep solution. Following centrifugation, liposome diameters were measured in 384-well clear-bottomed optical imaging plates (Corning) with a DynaPro Plate Reader III (Wyatt Technologies). The mean diameter was calculated as the average of three independent measurements, each consisting of 15 2-s acquisitions. Protein-free acidified liposome controls were treated and measured in parallel, with the liposome buffer instead of the Atlas GC solution.
SEC and multiangle scattering analysis
Samples (100 μl) containing Atlas GC ectodomain (1.6 to 2.5 mg ml−1) were analyzed by SEC at 293 K on a Superdex 200 (10/300) column (Cytiva) in 20 mM tris (pH 7.8), 0.15 M NaCl, 5% glycerol, and 0.5 mM TCEP with a flow rate of 0.5 ml min−1. The SEC system was coupled to both multiangle light scattering and quasi-elastic light scattering modules (DAWN-8+, Wyatt Technology). The protein was also detected as it eluted from the column with a differential refractometer (Optilab T-rEX, Wyatt Technology) and an ultraviolet (UV) detector at 280 nm (Agilent 1260 UV, Agilent Technology). Molar masses of peaks in the elution profile were calculated from the light scattering and protein concentration, quantified using the differential refractive index of the peak assuming a specific refractive index increment, dn/dc, of 0.1860, with ASTRA6 (Wyatt Technology).
Cryo-EM sample preparation and data collection
Purified Atlas GC ectodomain trimer (3 μl at a concentration of 0.025 mg ml−1) in 20 mM tris (pH 7.8), 0.15 M NaCl, 5% glycerol, and 0.5 mM TCEP was applied onto glow-discharged R1.2/1.3 400 mesh copper grids (Quantifoil Micro Tools, Germany). The grids were blotted for 4 s and plunge-frozen in liquid ethane with a Vitrobot Mark IV (Thermo Fisher Scientific) at 4°C and 100% humidity. Preliminary sample screening and initial datasets were acquired on a FEI Tecnai F20 microscope operated at 200 kV equipped with a Falcon II direct electron detector (Thermo Fisher Scientific) at −4-μm defocus. High-resolution cryo-EM dataset collection was performed on a Titan Krios microscope (Thermo Fisher Scientific) operated at 300 kV equipped with a 20 eV slit-width GIF Quantum energy–filtered Gatan K2 Summit direct electron detector in counting mode. A total of 3027 movies were recorded at a calibrated magnification of ×130,000, leading to a magnified pixel size of 1.047 Å on the specimen. Each movie comprised 36 frames with an exposure rate of 1.28 e− Å−2 per frame, with a total exposure time of 8 s and an accumulated exposure of 46.18 e− Å−2. Data acquisition was performed with the EPU software for automated data acquisition for single-particle analysis (Thermo Fisher Scientific) with three shots per hole at −1.3- to −3.5-μm defocus.
Image processing
Micrographs from initial datasets allowed us to obtain a consistent model at ~19 Å resolution from 3790 particles selected after two-dimensional (2D) and 3D classification, and consequent autorefinement. All movies from high-resolution datasets were motion-corrected and dose-weighted with MOTIONCOR2 (). Aligned, non–dose-weighted micrographs were then used to estimate the contrast transfer function (CTF) with the program GCTF (). All subsequent image processing steps were performed using RELION 3.0 (). 2D references from initial datasets were used to autopick the micrographs. One round of reference-free 2D classification was performed to produce templates for better reference-dependent autopicking, resulting in a total of 987,570 particles. After a first round of 2D classification, 595,011 particles were selected to perform a second 2D classification, resulting in a final number of 320,041 selected particles. Then, a 3D classification imposing C3 symmetry was performed using the model from the initial datasets filtered at 40 Å resolution as the initial model. The best class, containing 197,145 particles, was selected and subjected to 3D autorefinement imposing C3 symmetry, yielding a map with an overall resolution at 4.11 Å based on the gold standard [Fourier shell correlation (FSC) = 0.143] criterion. After refinement, the CTF refinement (per-particle defocus fitting and beam tilt estimation) and Bayesian polishing routines implemented in RELION 3.0 were performed, yielding a final map with an overall resolution at 3.76 Å. Local resolution was estimated with RELION.
Model building and refinement
The most similar sequence to Atlas GC with a structure available was glycoprotein GC from RVFV. The crystal structure of RVFV GC in the postfusion conformation [PDB: 6EGU ()] was used as template to build a homology model with the sequence of Atlas GC using the SWISS-MODEL server (https://swissmodel.expasy.org). The output model was docked as a rigid body into the density with UCSF Chimera (). Initial docking was performed manually and was followed by real-space fitting with the Fit in Map routine. A preliminary step of real-space refinement was performed on the three-subunit model, with Phenix 1.13 (), with global minimization, atomic displacement parameter (ADP), simulated annealing, and morphing options selected. The model was then rebuilt in Coot () to optimize the fit to the density. Because of low-resolution information in the fusion loop region, the density was converted to .mtz file using CCP-EM software package tools, and blurring of the density allowed us to localize bulky residues and disulfide bonds and thus use them as a guide to build the entire fusion loop. A final step of real-space refinement was performed with Phenix 1.15, with global minimization and ADP options selected. The following restraints were used in the real space refinement steps: secondary structure restraints, noncrystallographic symmetry restraints between the protein subunits, side chain rotamer restraints, and Ramachandran restraints. Key refinement statistics are listed in table S1.
Model validation and analysis
The FSC curve between the final model and full map after postprocessing in RELION, model versus map, is shown in fig. S3A. Cross-validation FSC curves (fig. S3B) were calculated as follows. The atoms in the final atomic model were displaced by 0.5 Å in random directions with Phenix. The shifted coordinates were then refined against one of the half-maps generated in RELION, the “work set.” This test refinement was performed in Phenix using the same procedure as for the refinement of the final model (see above). The other half-map, the “test set,” was not used in refinement for cross-validation. FSC curves of the refined shifted model against the work set, FSCwork, and against the test set, FSCtest, are shown in fig. S3. The FSCwork and FSCtest curves are not substantially different, consistent with the absence of overfitting in our final models. The quality of the atomic models, including basic protein geometry, Ramachandran plots, and clash analysis, was assessed and validated with Coot, Phenix 1.15, and the Worldwide PDB (wwPDB) OneDep System (https://deposit-pdbe.wwpdb.org/deposition).
Cell-cell fusion assay
CHO Lec3.2.8.1 cells were transfected with pHLsec plasmids encoding ectodomain fragments of Atlas GC (residues 2330 to 2795) fused to the C-terminal transmembrane domain from HLA-A2 (residues 288 to 345) and cloned in frame with the vector’s secretion signal and a C-terminal hemagglutinin (HA) tag. Empty pHLsec plasmid and pcDNA encoding VSV G were used as negative and positive controls, respectively. Sixteen to 20 hours after transfection, cells were transferred to phosphate-buffered saline (PBS) supplemented with 2.5 mM BMP [18:1 (S,S) bis(monoacylglycero)phosphate; Avanti Polar Lipids]. To obtain a homogeneous BMP suspension, the mixture was freeze-thawed five times using liquid nitrogen and a water bath, followed by a 3-min incubation in a sonicating water bath. Cells were incubated in the BMP suspension (or PBS for the untreated control) at 37°C for 5 min, shown previously to be sufficient for anionic lipid incorporation into the plasma membrane (). Cells were transferred to pH 4.5 complete medium [Dulbecco’s modified Eagle medium adjusted to pH 4.5 with HCl supplemented with 10% fetal bovine serum (FBS)] or pH 7.4 complete medium for the untreated control and centrifuged at 2500g at 37°C for 2 min. Cells were immediately resuspended in complete media and plated out. Following reattachment, 4 to 6 hours after treatment, cells were washed with PBS, fixed with 4% formaldehyde for 5 to 10 min, and washed three times with PBS. Cell were then stained with Hoechst 33342 (Bio-Rad) and CellBrite Red cytoplasmic membrane dye (Biotium, catalog no. 30023) and imaged on a Nikon iSIM Swept Field inverted confocal microscope with a 60×/1.2–numerical aperture (NA) water objective.To control for the contribution of the cell cycle to Atlas-GC–induced multinucleation, we repeated the cell-cell fusion assay in the presence of the cell cycle inhibitor CDK4. The assay was performed as described above, except for the following modifications: The assay was performed with human embryonic kidney (HEK) 293 T cells; BMP lipid was added 36 to 40 hours after transfection; 10 μM CDK4 (Cayman Chemical, catalog no. 17648) was included in the medium used to resuspend the cells after centrifugation at pH 4.5; after staining with Hoechst 33342 and CellBrite Red, cells were stained overnight at 4°C with anti-CD98 mouse monoclonal immunoglobulin G1 κ (Santa Cruz Biotechnology, sc-376815; 1:100 dilution in 20% FBS), followed by Alexa Fluor 568 secondary antibody (Thermo Fisher Scientific, A11004; RRID:AB_ 2534072; 1:500 dilution in 20% FBS for 1 hour); and cells were imaged on a Zeiss 780 inverted confocal microscope with a 40×/1.3-NA oil objective or a Nikon CSU-W1 Spinning Disk inverted confocal microscope with a 60×/1.2-NA water objective.For most images, cluster analysis of the Hoechst channel was used to count single nuclei and identify polynuclear clusters (fig. S9). For a subset of images with small numbers of nuclei, mono- and polynuclear clusters were counted manually. Nuclei within polynuclear clusters were counted by visual inspection. The plasma membrane stain was used to confirm polynuclear clusters and count the number of multinuclear cells, defined as cells with two or more nuclei, by visual inspection. For some images containing large syncytia, due to poor plasma membrane staining of the syncytia, the Hoechst channel was used to count the total number of nuclei, and the CellBrite and Alexa Fluor 568 channels were used to manually count mononucleated cells, which had clearly distinguishable plasma membranes. The fraction of multinucleated cells (F) was calculated using the formula
Cell surface biotinylation and MS
Proteins on the cell surface were biotinylated and isolated with the Pierce Cell Surface Protein Biotinylation and Isolation Kit (Thermo Fisher Scientific, A44390). HEK293T cells were transfected with 3 μg of pHLsec-GC or enhanced green fluorescent protein–N1 (control) plasmid. One day later, cells were washed with PBS and incubated with sulfo-NHS-SS-biotin (0.25 g/liter) on ice for 10 min. Then, cells were washed with PBS and suspended in 0.5 ml of Thermo Fisher Scientific lysis buffer supplemented with cOmplete EDTA-free Protease Inhibitor Cocktail (Merck). Cells were lysed on ice for 30 min (with 5-s vortexing before and after lysis) and centrifuged at 15,000g for 5 min at 4°C. The lysate supernatant was incubated with 0.25 ml of NeutrAvidin Agarose resin for 30 min at 20°C on an end-over-end rotator. The resin was washed with wash buffer, and bound proteins were eluted with elution buffer from the kit.Samples were prepared for MS with the EasyPep Mini MS Sample Prep Kit (Thermo Fisher Scientific, A40006) following the manufacturer’s instructions. Briefly, proteins were extracted, reduced, alkylated, and digested with trypsin/Lys-C protease. Hydrophilic and hydrophobic contaminants were removed with a peptide cleanup step. LC-MS/MS analysis of peptide samples was performed on an Ultimate 3000 rapid separation liquid chromatography system connected to a Q-Exactive plus mass spectrometer (Thermo Fisher Scientific). The acquired raw files were processed with MaxQuant v1.6.6.0.
Western blotting
The cell surface biotinylated protein samples purified from HEK293T cells as described above were heated to 95°C in SDS sample buffer for 10 min. A total of 15 μl of each sample was run on a polyacrylamide gel. Gels were blotted onto polyvinylidene difluoride membranes (Merck). Blots were blocked in 5% milk in PBS and 0.2% Tween 20 and incubated overnight with anti-HA rabbit monoclonal antibody (Cell Signaling Technology, 3724; RRID:AB_1549585) diluted 1:1000 in blocking solution. Blots were imaged with the 800-nm channel of a LI-COR Odyssey fluorescent scanner after incubation with anti-rabbit DyLight 800–conjugated secondary antibody (Cell Signaling Technology, 5151; RRID:AB_10697505) at 1:5000 dilution for 30 min at room temperature.
RNA-seq analysis
We analyzed data published by Hawdon and colleagues () (fig. S10). ArrayExpress run accession numbers and corresponding life cycle stages were as follows: SRR6359160, L3; SRR6359161 and SRR6359163, L4; SRR6359164 and SRR6359165, adult mixed pooled worms; and SRR6359162 and SRR6359166, adult male pooled worms. Our mapping and counting strategy was modified as recommended for repetitive genomic features (). Reads were mapped against the reference A. ceylanicum genome () with STAR v2.7.5a with parameters: “--outSAMtype BAM SortedByCoordinate --runMode alignReads --outFilterMultimapNmax 1000 --outSAMmultNmax 1 --outFilterMismatchNmax 3 --outMultimapperOrder Random --winAnchorMultimapNmax 1000 --alignEndsType EndToEnd --alignIntronMax 1 --alignMatesGapMax 350.” FeatureCounts v2.0.1 with parameters “-M -F GFF -s 0 -p -t exon -g gene_id” was used to count reads over a modified annotation file based on the original annotation () and downloaded from Wormbase Parasite (https://parasite.wormbase.org). The original annotation file in GFF format was slightly modified to contain a “gene_id” feature in the ninth column and hence facilitate the calculation of aggregate reads by gene (see header of the GFF file, dataset S3). RNA-seq analysis scripts and instructions of how to use them are provided in the Supplementary Materials (dataset S4) and are additionally available on Github (https://github.com/annaprotasio/Merchant_et_al_2020).
Statistics
Error bars represent the SD or SE—as indicated in the respective figure legend—of two to seven replicates conducted across at least two independent experiments. SDS-PAGE gels and DLS data shown are representative of at least two independent experiments. Significance and P values were determined by two-way analysis of variance (ANOVA). For DLS data, Sidak’s multiple comparisons test was used with a 95% confidence interval, in Prism 8 (GraphPad). For cell-cell fusion data, Tukey’s multiple comparisons test was used in Prism 9 (GraphPad). Source data are provided in datasets S1 and S2. No statistical methods were used to predetermine sample size, experiments were not randomized, and the investigators were not blinded to experimental outcomes.
Authors: James P Bernot; Gabriella Rudy; Patti T Erickson; Ramesh Ratnappan; Meseret Haile; Bruce A Rosa; Makedonka Mitreva; Damien M O'Halloran; John M Hawdon Journal: Int J Parasitol Date: 2020-06-25 Impact factor: 3.981
Authors: Johannes Schindelin; Ignacio Arganda-Carreras; Erwin Frise; Verena Kaynig; Mark Longair; Tobias Pietzsch; Stephan Preibisch; Curtis Rueden; Stephan Saalfeld; Benjamin Schmid; Jean-Yves Tinevez; Daniel James White; Volker Hartenstein; Kevin Eliceiri; Pavel Tomancak; Albert Cardona Journal: Nat Methods Date: 2012-06-28 Impact factor: 28.547
Authors: P Guardado-Calvo; K Atkovska; S A Jeffers; N Grau; M Backovic; J Pérez-Vargas; S M de Boer; M A Tortorici; G Pehau-Arnaudet; J Lepault; P England; P J Rottier; B J Bosch; J S Hub; F A Rey Journal: Science Date: 2017-11-03 Impact factor: 47.728
Authors: David Kremer; Tanja Schichel; Moritz Förster; Nevena Tzekova; Corinne Bernard; Paul van der Valk; Jack van Horssen; Hans-Peter Hartung; Hervé Perron; Patrick Küry Journal: Ann Neurol Date: 2013-09-16 Impact factor: 10.422
Authors: Danillo L A Espósito; Jennifer B Nguyen; David C DeWitt; Elizabeth Rhoades; Yorgo Modis Journal: J Gen Virol Date: 2015-03-04 Impact factor: 3.891
Authors: Erich M Schwarz; Yan Hu; Igor Antoshechkin; Melanie M Miller; Paul W Sternberg; Raffi V Aroian Journal: Nat Genet Date: 2015-03-02 Impact factor: 38.330
Authors: David Moi; Shunsuke Nishio; Xiaohui Li; Clari Valansi; Mauricio Langleib; Nicolas G Brukman; Kateryna Flyak; Christophe Dessimoz; Daniele de Sanctis; Kathryn Tunyasuvunakool; John Jumper; Martin Graña; Héctor Romero; Pablo S Aguilar; Luca Jovine; Benjamin Podbilewicz Journal: Nat Commun Date: 2022-07-06 Impact factor: 17.694