Most essential functions in eukaryotic cells are catalyzed by complex molecular machines built of many subunits. To fully understand their biological function in health and disease, it is imperative to study these machines in their entirety. The provision of many essential multiprotein complexes of higher eukaryotes, including humans, can be a considerable challenge, as low abundance and heterogeneity often rule out their extraction from native source material. The baculovirus expression vector system (BEVS), specifically tailored for multiprotein complex production, has proven itself to be uniquely suited for overcoming this bottleneck. Here we highlight recent major achievements in multiprotein complex structure research that were catalyzed by this versatile recombinant complex expression tool.
Recombinant expression has had a profound impact on protein research, and heterologous expression systems are used on a daily basis in most molecular biology laboratories. The major workhorse for protein expression has been Escherichia coli, and today a very large number of expression plasmids and specialized strains, each with its own merits, exist for this prokaryotic host. The overwhelming majority of protein entries in the PDB (http://www.rcsb.org/) originate from heterologous production in E. coli. However, the increasing trend towards the study of more complex biological assemblies, sometimes with 10 or more subunits, has necessitated the development of more powerful eukaryotic expression systems that offer access to multiprotein complexes which, for a variety of reasons (large subunit size, requirement for post-translational modifications and chaperones, among others), cannot be made efficiently in E. coli. The baculovirus expression vector system (BEVS), introduced more than 20 years ago, has recently emerged as a particularly useful tool for producing large eukaryotic protein complexes in the quantity and quality required for detailed structural study.
MultiBac: baculovirus expression vector system for protein complex production
BEVS relies on a recombinant baculovirus carrying the gene(s) of interest to infect insect cell cultures for high-level heterologous production of the single protein or protein complex of choice. BEVS remained a relatively specialized application for some time, but has entered the mainstream in structural biology laboratories in the past several years. Three developments have resulted in the MultiBac system, a BEVS that is particularly easy to use and sufficiently robust to be employed on a routine basis by users who do not possess specialist training [1]: engineering of the baculovirus genome towards improved protein complex production characteristics; the development of new plasmids and streamlined methods for multigene assembly and integration into the baculovirus genome; and the establishment of standard operating procedures (SOPs) for cell culture maintenance, virus generation, infection and protein production. These are critically important considerations for the use of BEVS in structural biology laboratories, where the overriding objective is protein sample preparation for structure analysis, and the investment in optimizing individual expression experiments needs to stay within a reasonably short timeframe. MultiBac has been instrumental for producing a number of multiprotein complexes that could not previously be accessed for structural and functional studies. Recent examples of such multiprotein complexes are provided in the following, including the Head module of the Mediator from Saccharomyces cerevisiae, the yeast anaphase promoting complex APC/C, and the human general transcription factor TFIID core complex, each of which highlights different aspects of BEVS application that were exploited to achieve successful structure determination.
The Mediator Head module from S. cerevisiae
Mediator is a large multiprotein complex that regulates most, if not all, gene transcription by RNA polymerase II (Pol II) [2]. Mediator is structurally and functionally conserved in all eukaryotes [3, 4]. In the yeast S. cerevisiae, where it was first discovered [5], Mediator comprises 21 subunits with a combined molecular mass exceeding 1 MDa [6, 7]. Mediator functions as an interface between DNA-bound protein activators of transcription and Pol II [2, 8, 9]. Of medical relevance, mutations in several human Mediator subunits have been linked to neurological disorders and cancers [10, 11, 12, 13, 14, 15, 16]. Thus, elucidation of the molecular mechanisms of Mediator function is a major research subject in biomedical sciences. Despite its paramount importance in gene regulation, the large size, complexity, and low abundance of Mediator have long made biochemical and structural studies extremely difficult, if not impossible.
Single-particle electron microscopy (EM) provided first impressions of the complexity underlying Mediator architecture. Unlike Pol II, which adopts a compact globular structure, Mediator from yeast is composed of three distinct modules [17, 18], termed Head, Middle, and Tail (Figure 1). As suggested by structural, genetic and biochemical studies [19, 20, 21], the modules consist of 7–9 individual subunits each (Figure 1a). Mediator has multiple activities [5], and each module is thought to perform its own defined set of functions [22]. For example, the subunits from the Head module genetically interact with the C-terminal domain (CTD) of Pol II [23, 24], while several Middle and Tail module subunits have been shown to directly interact with transcriptional activators [9, 21, 25, 26, 27]. A reasonable strategy therefore is to address particular Mediator functions by studying the modules individually, as first steps towards deciphering the overall mechanism of the whole complex. While still a formidable challenge, this approach certainly reduces the otherwise intimidating complexity of studying Mediator structure and function.
Figure 1
S. cerevisiae Mediator Complex. (a) Subunit organization of Mediator. Head module subunits are colored in blue; those of the Middle module are in green and those of the Tail module are in ochre. (b) Genes encoding the Mediator Head module were assembled from multiple DNA progenitors (left) into a single multigene expression construct (right, top) for insertion by Tn7 transposition (Tn7L, Tn7R) into the MultiBac baculovirus to produce the Head module in infected insect cells (right, bottom). (c) X-ray crystal structure of the recombinant Mediator Head module showing Med17 (blue), Med11 (purple), Med22 (green), Med6 (yellow), Med8 (red), Med18 (cyan), and Med20 (orange). The Head consists of three distinct domains: fixed jaw, movable jaw and neck (left). The neck domain is arranged in a novel multihelical bundle (right) [30].
Producing the Mediator Head module in insect cells
The Mediator Head module is composed of 7 subunits (Med17, Med6, Med8, Med11, Med22, Med18, Med20) with a molecular mass of 223 kDa. Originally, a conventional co-infection approach with seven different baculoviruses, each encoding one subunit, was pursued [28]. Highly expressing baculoviruses were carefully isolated and optimized for maximum expression, and then used in combination to co-infect insect cell cultures. This procedure was exceedingly lengthy (each experiment taking two months or more) and logistically complicated, as seven viruses had to be maintained at high titers in parallel. X-ray crystallography, however, requires rapid turnover of expression experiments. Often, a large number of variants of the protein or complex studied must be produced to identify the sample that yields diffraction-quality crystals, and this must be done within a timeframe that is as short as feasible.
The solution to this problem came by expressing all Mediator Head subunits simultaneously from a single multigene baculovirus using the MultiBac system, thereby resolving a fundamental impediment. MultiBac utilizes a procedure termed tandem recombineering (TR), which considerably facilitates the iterative assembly of multigene cassettes from small DNA progenitors [1]. All genes encoding the Mediator Head subunits were thus combined into a single construct by TR (Figure 1b) [29, 30••], significantly reducing experimental complexity. The multigene construct was next integrated into the engineered MultiBac baculovirus genome, which is characterized by reduced proteolysis and delayed host cell lysis, thus improving the quality of the sample produced [1]. The recombinant virus was used to infect insect cell cultures to produce the Mediator Head (Figure 1b).
Crystal structure of recombinant Mediator Head module
The simplified recombinant production propelled structure determination of the Head module by X-ray crystallography. Modification of Med17 and Med18 by eliminating flexible regions was critical to obtain well-ordered single crystals suitable for diffraction data collection (Figure 1c). Incorporation of selenomethionine in the recombinantly produced sample was instrumental for structure elucidation. The crystal structure of the Mediator Head reveals how this essential complex is built from its components, combining stability with flexibility for transcription regulation and providing a platform for other transcription factors [30]. New features in the structure provide first impressions of the architectural design principles of such large multiprotein machines. Notably, a portion of the Head named the ‘neck domain’ confers stability and integrity on the complex by forming a novel multihelical bundle, engaging five of the seven subunits of the Head simultaneously in one compact structural unit (Figure 1c).
The anaphase promoting complex (APC/C)
The APC/C is an unusually large multisubunit RING E3 ubiquitin ligase that regulates cell cycle progression through the proteasome-dependent proteolysis of cell cycle regulatory proteins [31, 32, 33]. The core APC/C, formed from at least 13 different proteins, is activated on association of a regulatory coactivator subunit. The APC/C is highly conserved across eukaryotes, and 12 of the 13 S. cerevisiae APC/C subunits are conserved in Schizosaccharomyces pombe and human. Owing to the presence of two copies of many APC/C proteins, the holo-APC/C comprises 18 to 19 subunits with an overall molecular mass ranging from 1 to 1.2 MDa [34].
Producing recombinant APC/C by BEVS: co-infection with two viruses
Early research on the APC/C was restricted to the use of native systems. Because most APC/C subunits are essential, genetic manipulations were intrinsically difficult, and the low natural abundance of APC/C limited structural and biochemical studies. The recent development of overexpression systems for S. cerevisiae and human APC/C [34••, 35], based on the MultiBac BEVS that reconstitutes all APC/C proteins, now enables a range of structural, biochemical and biophysical investigations. The expression and reconstitution of recombinant S. cerevisiae APC/C [34] used the first generation MultiBac cloning system for insect cell-baculovirus expression [36, 37, 38, 39, 40, 41]. To express S. cerevisiae APC/C, two viruses encoding five and eight subunits were combined for co-expression. The resultant recombinant co-expression system yielded ∼200-fold more intact APC/C (0.5 mg/L insect cell culture) than the endogenous system. The reconstituted S. cerevisiae APC/C was correctly assembled, as judged by its structural correspondence to native APC/C (Figure 2a and b), and its capacity to ubiquitinate mitotic cyclin in the presence of coactivator in a D box and KEN box dependent manner [34]. The ability to recapitulate the endogenous APC/C catalytic and regulatory activity using recombinant reconstituted APC/C provided strong evidence that the molecular composition of S. cerevisiae APC/C had been completely defined, a crucial prerequisite for understanding the complete system.
Figure 2
Structures of the APC/C and MCC. 3-D EM reconstructions of (a) recombinant S. cerevisiae APC/C and (b) endogenous S. cerevisiae APC/C. (c) Localisation of the Cdc16 dimer in Cdc16-assigned density (mesh). (d) Pseudo atomic structure of S. cerevisiae APC/C, adapted from [34]. (e) Endogenous human APC/C [51]. (f) Recombinant human APC/C [35]. (g) Structure of the S. pombe mitotic checkpoint complex [50].
Dissecting recombinant yeast APC/C
Reconstitution of the holo-APC/C and APC/C subcomplexes allowed an accurate determination of the APC/C mass. An APC/C subcomplex of eight subunits was analyzed using native mass spectrometry and measured as 698.8 kDa, in good agreement with the mass predicted for a complex containing all subunits in unit stoichiometry plus an additional copy of Cdc23 [34]. Generation of APC/C subcomplexes, guided by an architectural map of the APC/C [42], and subsequent determination of their EM structures allowed the assignment of the three-dimensional molecular boundaries of individual subunits within the APC/C molecular envelope. As an example, comparing two APC/C subcomplexes differing by Cdc16–Cdc26 indicated difference density that corresponded closely to the crystal structure of the Cdc16–Cdc26 heterotetramer (Figure 2c) [43]. This subunit deletion approach allowed a systematic assignment of the majority of large APC/C subunits. By integrating crystal structures and homology models of individual APC/C subunits with a cryo-EM reconstruction of an APC/C–Cdh1–D-box ternary complex at 10 Å resolution [44], a high-resolution description of the subunit organization and a pseudo-atomic model of the APC/C were determined (Figure 2d) [34••, 44]. The subunit deletion approach employed in [34] accurately defines the molecular boundaries of specific APC/C subunits, enabling more reliable docking of subunit atomic models than is possible from the approximate subunit locations defined by labeling N-termini and C-termini.
BEVS expression of complete human APC/C
More recently, the overexpression and reconstitution of human APC/C in the insect/baculovirus system were reported by two groups [35, 45]. One group generated multigene vectors modified from the MultiBac plasmids pFBDM and pUCDM [36], using USER ligation-independent cloning methods [46, 47], and incorporated all 14 genes (45 kb) into two baculoviruses [35], whereas the second group generated 14 individual viruses, each expressing a single APC/C subunit, which were combined for co-expression in insect cells for biochemical studies [45]. Interestingly, Apc15 and Apc16 were only recently discovered as human APC/C subunits. Efforts to generate fully assembled recombinant human APC/C before the identification of Apc16 were unsuccessful [35]. This indicated that Apc16, located within the TPR subcomplex [48], is necessary for the optimal assembly of recombinant human APC/C in insect cells. Thus, the successful assembly and reconstitution of human APC/C using the insect cell/baculovirus expression system indicated that all human APC/C subunits necessary for a functional and catalytically active APC/C had finally been identified (Figure 2e and f).
X-ray structure of the mitotic checkpoint complex (MCC), a regulator of APC/C
To ensure correct chromosome segregation in mitosis, the spindle assembly checkpoint imposed by the MCC inhibits the APC/C until all chromosomes have achieved correct bipolar attachment to the mitotic spindle [49]. The MCC comprises the APC/C coactivator Cdc20 associated with the checkpoint proteins Mad2, Mad3/BubR1 and Bub3 and blocks D box and KEN box recognition by the APC/C. The crystal structure of S. pombe MCC (without Bub3) expressed using the MultiBac system revealed that Cdc20, Mad2 and Mad3 assemble into a triangular-shaped heterotrimer (Figure 2g) [50]. Mad3 coordinates the overall organization of the complex by forming numerous inter-subunit interactions with Mad2 and Cdc20. A helix-loop-helix motif at the N-terminus of Mad3 binds simultaneously to Mad2 and Cdc20, orienting the KEN box that acts as a pseudo-substrate inhibitor towards its receptor on Cdc20.
Human general transcription factor TFIID
Initiation of transcription by RNA polymerase II (Pol II) requires the controlled step-wise assembly of the preinitiation complex (PIC), comprising a large ensemble of proteins and protein complexes including Pol II and the general transcription factors (GTFs; TFIIA, B, D, E, F, H) [52, 53, 54]. The promoter recognition complex, TFIID, is thought to be the cornerstone of PIC assembly. In addition, TFIID functions as a coactivator, mediating signals from sequence-specific activators to other components of the transcription machinery. In humans, TFIID comprises 14 distinct protein subunits, TBP and the TBP associated factors (TAFs), ranging in size from 250 kDa to less than 20 kDa, with a multitude of functions [52, 53]. TAFs and TBP assemble into a large multisubunit complex with around 20 subunits and an overall molecular weight of ∼1.6 MDa [54].
Our understanding of TFIID and its crucial role in transcription regulation is considerably hampered by a lack of detailed knowledge of the molecular architecture of this essential factor, its assembly in the cell, and its interactions with chromatin and other factors in the context of activated transcription. The overall shape of human and yeast TFIID was shown by EM, revealing an asymmetric tri-lobed structure resembling a molecular clamp [54]. The paucity and heterogeneity of the endogenous material used in these studies have limited structural insight to moderate resolution (∼30–40 Å for human TFIID), prohibiting molecular level interpretation of TFIID architecture notwithstanding considerable effort.
The current consensus regarding the copy number of subunits within TFIID is that a subset of TAFs exists in two copies, while TBP and the remaining TAFs are present in single copy [52]. Owing to evolutionary conservation and the overall similarity in shape of the yeast and human TFIID complexes, this subunit composition is likely to be present in TFIID from most species. A concept emerged in which the TAFs present in duplicate form a putative, 2-fold symmetric scaffold, around which the remaining TAFs and TBP organize as peripheral subunits [52], suggesting that a transition from symmetry to asymmetry may occur in the TFIID assembly pathway. Compelling functional support for this bipartite architectural design came from studies in Drosophila cells [55]. By using RNAi to knock down specific TAFs, a stable and functional core-TFIID complex, composed of TAF4, 5, 6, 9 and 12, was revealed in vivo. Further support stems from cryo-EM studies of TFIID preparations from yeast, in which a quasi-symmetric smaller shape was also found [56]. Together, these results point to the existence of a symmetric core-TFIID module of pivotal importance for the integrity and assembly of holo-TFIID.
Expressing human TFIID core complex: polyproteins adjust stoichiometries
The architecture of the human TFIID core complex, consisting of 10 subunits (two copies each of TAF4, 5, 6, 9 and 12), was recently elucidated at nanometer resolution by an integrated approach combining recombinant overproduction by MultiBac, cryo-EM, data from X-ray crystallography and homology models [57] (Figure 3). The structure revealed a symmetric arrangement of the subunits, validating previous experiments. Initially, the TFIID core complex was produced by using a single baculovirus containing the encoding genes arranged as a multigene construct made up of individual expression cassettes [36], similar to the strategy adopted for the successful study of the Mediator Head module from S. cerevisiae [30]. In the case of core-TFIID, however, it turned out that the expression levels of the individual subunits varied substantially, thus dramatically reducing the overall yield of the recombinant complex. Large amounts of heterologous protein were produced in excess, while the subunit with the lowest expression level relative to the others dictated the final amount of properly assembled complex containing all subunits. A rescue strategy involving tagging of this weakly expressing subunit with an affinity tag did result in the purification of complete core-TFIID; however, EM analysis revealed that smaller subassemblies lacking subunits were contaminating the sample, impeding structure determination.
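The yield argument above can be made concrete with a small calculation. The following is a minimal sketch, assuming that complete complex formation is limited only by the scarcest subunit relative to its copy number; the expression levels are hypothetical illustration values in arbitrary units, not measured data, and the function names are invented for this example.

```python
from typing import Dict

def assembled_complex_yield(expression: Dict[str, float],
                            copies: Dict[str, int]) -> float:
    """Upper bound on complete complexes: the scarcest subunit,
    scaled by its copy number in the complex, sets the limit."""
    return min(expression[name] / copies[name] for name in copies)

def excess_subunits(expression: Dict[str, float],
                    copies: Dict[str, int]) -> Dict[str, float]:
    """Amount of each subunit left over once the limit is reached."""
    limit = assembled_complex_yield(expression, copies)
    return {name: expression[name] - limit * copies[name] for name in copies}

# Core-TFIID stoichiometry from the text: two copies each of five TAFs.
copies = {"TAF4": 2, "TAF5": 2, "TAF6": 2, "TAF9": 2, "TAF12": 2}

# Hypothetical, unbalanced expression levels (arbitrary units).
expression = {"TAF4": 40.0, "TAF5": 100.0, "TAF6": 80.0,
              "TAF9": 120.0, "TAF12": 10.0}

limit = assembled_complex_yield(expression, copies)
excess = excess_subunits(expression, copies)
print(f"complete complexes (a.u.): {limit}")
print(f"excess subunits (a.u.): {excess}")
```

In this toy case the weakest subunit caps the yield while the others accumulate as unassembled surplus, which is the pool from which incomplete subassemblies can form.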
Figure 3
Human general transcription factor TFIID core complex. (a) Generally applicable polyprotein strategy for balancing the production levels of multiprotein complex subunits. Genes of interest (GOI), a protease encoding gene (TEV NIa) and a fluorescent marker (CFP) are present in a single ORF under control of a strong baculoviral promoter (polh) and flanked by a poly A signal (black square) on a MultiBac plasmid (cf. Figure 1b), spaced apart by the specific cleavage sequence of TEV NIa (tcs) [1•, 58]. (b) Polyprotein expression and purification of human core-TFIID. Negative stain EM and 2-D classification (bottom, right) were used to optimize purification until high-quality sample was obtained as demonstrated by SDS-PAGE (bottom, left). (c) TFIID core complex structure at nanometer resolution, determined by hybrid methods [57].
This challenge was resolved by adopting a strategy that certain viruses, such as coronaviruses, use to realize their proteome [41] (Figure 3a). These viruses express long open reading frames (ORFs) that are translated into large polyproteins. These polyproteins are then processed into individual proteins by a highly specific protease that cuts at proteolytic sites between the functional polypeptide entities. Typically, the protease is also part of the polyprotein and liberates itself by proteolytic cleavage. This strategy was implemented in the MultiBac system by creating long single ORFs composed of the TAF genes. The gene encoding the NIa protease from Tobacco etch virus (TEV) was added at the 5′ end, and corresponding recognition and cleavage sequences were inserted [1•, 41]. To follow and quantify polyprotein production during expression, the gene encoding a fluorescent protein was appended to the 3′ end of the ORF (Figure 3a). The TEV NIa protease proved to be sufficiently specific to process the expressed polyproteins properly and to completion. Polyprotein processing thus yielded fully balanced levels of the TAF proteins, which assembled completely into the TFIID core complex. Core-TFIID was purified to homogeneity, enabling high-resolution structure determination (Figure 3b and c). Expression of two further TAFs, TAF8 and TAF10, likewise from a polyprotein construct, and their incorporation into the core-TFIID complex followed by cryo-EM structure determination, allowed a molecular mechanism to be proposed for the TFIID rearrangement that occurs during assembly, when the symmetric core accretes further TAFs and TBP to give rise to the complete, asymmetric holo-complex [57].
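Why a polyprotein balances subunit levels can be illustrated with a toy model. The sketch below assumes the commonly cited TEV consensus site ENLYFQ/G (cleavage between Q and G), which the text does not spell out; the segment sequences are short placeholders rather than real protein sequences, and the function names are hypothetical. It follows the ORF layout described above: protease first, genes of interest next, fluorescent marker last.

```python
from collections import Counter
from typing import List, Tuple

# Assumed TEV NIa recognition sequence; cleavage occurs between Q and G.
TEV_SITE = "ENLYFQG"

def build_polyprotein(segments: List[Tuple[str, str]]) -> str:
    """Join the segment sequences into one chain, separated by TEV sites."""
    return TEV_SITE.join(seq for _, seq in segments)

def cleave(polyprotein: str) -> List[str]:
    """Cut after the Q of every ENLYFQ|G site and return the products."""
    products: List[str] = []
    start = 0
    while True:
        hit = polyprotein.find(TEV_SITE, start)
        if hit == -1:
            products.append(polyprotein[start:])
            return products
        cut = hit + len(TEV_SITE) - 1  # index of the G that follows the cut
        products.append(polyprotein[start:cut])
        start = cut

# Placeholder sequences (not real proteins), ordered as in Figure 3a.
segments = [
    ("TEV_NIa", "MGESLFKG"),
    ("GOI_1", "MSTAAK"),
    ("GOI_2", "MKLVVR"),
    ("CFP", "MVSKGE"),
]

polyprotein = build_polyprotein(segments)
products = cleave(polyprotein)
copies = Counter(products)
for (name, _), product in zip(segments, products):
    print(f"{name}: {product} x{copies[product]}")
```

Each translated chain releases exactly one copy of every segment, so complete processing gives the equimolar production that separate expression cassettes cannot guarantee. Subunits needed in two copies per complex remain balanced relative to one another as long as they share the same chain.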
Conclusions
Methodologies such as protein crystallography require large quantities of highly concentrated homogeneous sample to obtain high-resolution structural information. EM and single particle analysis methods do not require such high concentrations or total amounts of sample, but do still require high sample homogeneity, reasonable particle density on EM micrographs and optimal EM grid preparations. Native mass spectrometry, which allows determination of subunit stoichiometry and provides insights into multimeric complex assembly processes, also depends on highly concentrated, high-quality sample in defined buffer conditions [58]. Highly homogeneous specimens are likewise required for functional studies and for pharmaceutical applications. Recombinant expression enables the subunit composition of multiprotein complexes to be specified exactly, thus increasing homogeneity. Further, engineering and mutating specific subunits offer the potential to test biological hypotheses, including examining the roles and locations of specific subunits within the context of the whole complex, which is a vital prerequisite for understanding biological function.
Most essential processes in cells are carried out by large multiprotein machines, and detailed analysis of the structure and function of many of these depends critically on recombinant expression. Eukaryotic complexes will in many cases require eukaryotic expression systems for their provision in the quality and quantity required. Remarkable advances in eukaryotic expression technologies that can provide these specimens are being made, with new and powerful systems being put at the disposal of the community. We anticipate that new expression tools, including those described here, will prove to be instrumental for the detailed study of many important protein complexes that have remained elusive to date.
References and recommended reading
Papers of particular interest, published within the period of review, have been highlighted as: • of special interest; •• of outstanding interest