Nadide Altincekic, Sophie Marianne Korn, Nusrat Shahin Qureshi, Marie Dujardin, Martí Ninot-Pedrosa, Rupert Abele, Marie Jose Abi Saad, Caterina Alfano, Fabio C L Almeida, Islam Alshamleh, Gisele Cardoso de Amorim, Thomas K Anderson, Cristiane D Anobom, Chelsea Anorma, Jasleen Kaur Bains, Adriaan Bax, Martin Blackledge, Julius Blechar, Anja Böckmann, Louis Brigandat, Anna Bula, Matthias Bütikofer, Aldo R Camacho-Zarco, Teresa Carlomagno, Icaro Putinhon Caruso, Betül Ceylan, Apirat Chaikuad, Feixia Chu, Laura Cole, Marquise G Crosby, Vanessa de Jesus, Karthikeyan Dhamotharan, Isabella C Felli, Jan Ferner, Yanick Fleischmann, Marie-Laure Fogeron, Nikolaos K Fourkiotis, Christin Fuks, Boris Fürtig, Angelo Gallo, Santosh L Gande, Juan Atilio Gerez, Dhiman Ghosh, Francisco Gomes-Neto, Oksana Gorbatyuk, Serafima Guseva, Carolin Hacker, Sabine Häfner, Bing Hao, Bruno Hargittay, K Henzler-Wildman, Jeffrey C Hoch, Katharina F Hohmann, Marie T Hutchison, Kristaps Jaudzems, Katarina Jović, Janina Kaderli, Gints Kalniņš, Iveta Kaņepe, Robert N Kirchdoerfer, John Kirkpatrick, Stefan Knapp, Robin Krishnathas, Felicitas Kutz, Susanne Zur Lage, Roderick Lambertz, Andras Lang, Douglas Laurents, Lauriane Lecoq, Verena Linhard, Frank Löhr, Anas Malki, Luiza Mamigonian Bessa, Rachel W Martin, Tobias Matzel, Damien Maurin, Seth W McNutt, Nathane Cunha Mebus-Antunes, Beat H Meier, Nathalie Meiser, Miguel Mompeán, Elisa Monaca, Roland Montserret, Laura Mariño Perez, Celine Moser, Claudia Muhle-Goll, Thais Cristtina Neves-Martins, Xiamonin Ni, Brenna Norton-Baker, Roberta Pierattelli, Letizia Pontoriero, Yulia Pustovalova, Oliver Ohlenschläger, Julien Orts, Andrea T Da Poian, Dennis J Pyper, Christian Richter, Roland Riek, Chad M Rienstra, Angus Robertson, Anderson S Pinheiro, Raffaele Sabbatella, Nicola Salvi, Krishna Saxena, Linda Schulte, Marco Schiavina, Harald Schwalbe, Mara Silber, Marcius da Silva Almeida, Marc A Sprague-Piercy, Georgios A Spyroulias, Sridhar Sreeramulu, Jan-Niklas Tants, Kaspars Tārs, Felix Torres, Sabrina Töws, Miguel Á Treviño, Sven Trucks, Aikaterini C Tsika, Krisztina Varga, Ying Wang, Marco E Weber, Julia E Weigand, Christoph Wiedemann, Julia Wirmer-Bartoschek, Maria Alexandra Wirtz Martin, Johannes Zehnder, Martin Hengesbach, Andreas Schlundt.
Abstract
The highly infectious disease COVID-19 caused by the Betacoronavirus SARS-CoV-2 poses a severe threat to humanity and demands the redirection of scientific efforts and criteria to organized research projects. The international COVID19-NMR consortium seeks to provide such new approaches by gathering scientific expertise worldwide. In particular, making available viral proteins and RNAs will pave the way to understanding the SARS-CoV-2 molecular components in detail. The research in COVID19-NMR and the resources provided through the consortium are fully disclosed to accelerate access and exploitation. NMR investigations of the viral molecular components are designated to provide the essential basis for further work, including macromolecular interaction studies and high-throughput drug screening. Here, we present the extensive catalog of a holistic SARS-CoV-2 protein preparation approach based on the consortium's collective efforts. We provide protocols for the large-scale production of more than 80% of all SARS-CoV-2 proteins or essential parts of them. Several of the proteins were produced in more than one laboratory, demonstrating the high interoperability between NMR groups worldwide. For the majority of proteins, we can produce isotope-labeled samples of HSQC-grade. Together with several NMR chemical shift assignments made publicly available on covid19-nmr.com, we here provide highly valuable resources for the production of SARS-CoV-2 proteins in isotope-labeled form.
Keywords: COVID-19; NMR spectroscopy; SARS-CoV-2; accessory proteins; cell-free protein synthesis; intrinsically disordered region; nonstructural proteins; structural proteins
Year: 2021 PMID: 34041264 PMCID: PMC8141814 DOI: 10.3389/fmolb.2021.653148
Source DB: PubMed Journal: Front Mol Biosci ISSN: 2296-889X
TABLE 1. SCoV2 protein constructs expressed and purified, given with the genomic position and corresponding PDBs for construct design.
| Protein genome position (nt) | Trivial name construct expressed | Size (aa) | Boundaries | MW (kDa) | Homol. SCoV (%) | Template PDB | SCoV2 PDB |
|---|---|---|---|---|---|---|---|
| | | | | | | | |
| | Full-length | 180 | 1–180 | 19.8 | 83 | | |
| | Globular domain (GD) | 116 | 13–127 | 12.7 | 85 | 2GDT | 7K7P |
| | | | | | | | |
| | C-terminal IDR (CtDR) | 45 | 557–601 | 4.9 | 55 | | |
| | | | | | | | |
| a | Ub-like (Ubl) domain | 111 | 1–111 | 12.4 | 79 | 2IDY | 7KAG |
| a | Ub-like (Ubl) domain + IDR | 206 | 1–206 | 23.2 | 58 | | |
| b | Macrodomain | 170 | 207–376 | 18.3 | 74 | 6VXS | 6VXS |
| c | SUD-N | 140 | 409–548 | 15.5 | 69 | 2W2G | |
| c | SUD-NM | 267 | 409–675 | 29.6 | 74 | 2W2G | |
| c | SUD-M | 125 | 551–675 | 14.2 | 82 | 2W2G | |
| c | SUD-MC | 195 | 551–743 | 21.9 | 79 | 2KQV | |
| c | SUD-C | 64 | 680–743 | 7.4 | 73 | 2KAF | |
| d | Papain-like protease PLpro | 318 | 743–1,060 | 36 | 83 | 6W9C | 6W9C |
| e | NAB | 116 | 1,088–1,203 | 13.4 | 87 | 2K87 | |
| Y | CoV-Y | 308 | 1,638–1,945 | 34 | 89 | | |
| | | | | | | | |
| | Full-length | 306 | 1–306 | 33.7 | 96 | 6Y84 | 6Y84 |
| | | | | | | | |
| | Full-length | 83 | 1–83 | 9.2 | 99 | 6WIQ | 6WIQ |
| | | | | | | | |
| | Full-length | 198 | 1–198 | 21.9 | 97 | 6WIQ | 6WIQ |
| | | | | | | | |
| | Full-length | 113 | 1–113 | 12.4 | 97 | 6W4B | 6W4B |
| | | | | | | | |
| | Full-length | 139 | 1–139 | 14.8 | 97 | 6W4H | 6W4H |
| | | | | | | | |
| | Full-length | 601 | 1–601 | 66.9 | 100 | 6ZSL | 6ZSL |
| | | | | | | | |
| | Full-length | 527 | 1–527 | 59.8 | 95 | 5NFY | |
| | MTase domain | 240 | 288–527 | 27.5 | 95 | | |
| | | | | | | | |
| | Full-length | 346 | 1–346 | 38.8 | 89 | 6W01 | 6W01 |
| | | | | | | | |
| | Full-length | 298 | 1–298 | 33.3 | 93 | 6W4H | 6W4H |
| | | | | | | | |
| | Full-length | 275 | 1–275 | 31.3 | 72 | 6XDC | 6XDC |
| | | | | | | | |
| | Full-length | 75 | 1–75 | 8.4 | 95 | 5X29 | 7K3G |
| | | | | | | | |
| | Full-length | 222 | 1–222 | 25.1 | 91 | | |
| | | | | | | | |
| | Full-length | 61 | 1–61 | 7.3 | 69 | | |
| | | | | | | | |
| | Ectodomain (ED) | 66 | 16–81 | 7.4 | 85 | 1XAK | 6W37 |
| | | | | | | | |
| | Full-length | 43 | 1–43 | 5.2 | 85 | | |
| | | | | | | | |
| ORF8 | Full-length | 121 | 1–121 | 13.8 | 32 | | |
| ΔORF8 | w/o signal peptide | 106 | 16–121 | 12 | 41 | 7JTL | 7JTL |
| | | | | | | | |
| | IDR1-NTD-IDR2 | 248 | 1–248 | 26.5 | 90 | | |
| | NTD-SR | 169 | 44–212 | 18.1 | 92 | | |
| | NTD | 136 | 44–180 | 14.9 | 93 | 6YI3 | 6YI3 |
| | CTD | 118 | 247–364 | 13.3 | 96 | 2JW8 | 7C22 |
| | | | | | | | |
| | Full-length | 97 | 1–97 | 10.8 | 72 | 6Z4U | 6Z4U |
| | | | | | | | |
| | Full-length | 73 | 1–73 | 8 | n.a. | | |
| | | | | | | | |
| | Full-length | 38 | 1–38 | 4.4 | 29 | | |
Genome position in nt corresponding to SCoV2 NCBI reference genome entry NC_045512.2, identical to GenBank entry MN908947.3.
Sequence identities to SCoV are calculated from an alignment with corresponding protein sequences based on the genome sequence of NCBI Reference NC_004718.3.
Representative PDB that was available at the beginning of construct design, either SCoV or SCoV2.
Representative PDB available for SCoV2 (as of December 2020).
Additional point mutations in fl-construct have been expressed.
n.a.: not applicable.
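The Size (aa) and MW (kDa) columns of the construct table can be cross-checked against each other. The sketch below is only an illustration and not part of the consortium's protocols: it assumes an average residue mass of about 110 Da, a common rule of thumb that does not appear in the source, and compares the resulting estimate with a handful of rows copied from the table. The names `AVERAGE_RESIDUE_MASS_DA`, `estimate_mw_kda`, and `relative_deviation` are hypothetical helpers introduced here.

```python
# Assumption (not from the source): an average residue mass of ~110 Da,
# a common rule of thumb for proteins.
AVERAGE_RESIDUE_MASS_DA = 110.0

# (construct label, size in aa, reported MW in kDa), copied from the table.
TABLE1_ROWS = [
    ("Full-length, 180 aa", 180, 19.8),
    ("Globular domain (GD)", 116, 12.7),
    ("Macrodomain", 170, 18.3),
    ("Papain-like protease PLpro", 318, 36.0),
    ("Full-length, 601 aa", 601, 66.9),
]


def estimate_mw_kda(n_residues: int) -> float:
    """Estimate the molecular weight in kDa from the residue count."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


def relative_deviation(n_residues: int, reported_mw_kda: float) -> float:
    """Return (estimate - reported) / reported for one table row."""
    return (estimate_mw_kda(n_residues) - reported_mw_kda) / reported_mw_kda


for label, size_aa, reported in TABLE1_ROWS:
    estimate = estimate_mw_kda(size_aa)
    deviation = relative_deviation(size_aa, reported)
    print(f"{label:28s} reported {reported:5.1f} kDa, "
          f"estimated {estimate:5.1f} kDa ({deviation:+.1%})")
```

For the rows listed, the estimate stays within a few percent of the tabulated value. Short or compositionally unusual constructs can deviate more, so a larger mismatch is a prompt to recheck the row, not proof of an error.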
TABLE 2. Summary of SCoV2 protein production results in COVID19-NMR.
| Construct expressed | Yields (mg/L) | Results | Comments | BMRB |
|---|---|---|---|---|
| SI1 | | | | |
| fl | 5 | NMR assigned | Expression only at >20°C; after 7 days at 25°C partial proteolysis | 50620 |
| GD | >0.5 | HSQC | High expression; mainly insoluble; higher salt increases stability (>250 mM) | |
| SI2 | | | | |
| CtDR | 0.7–1.5 | NMR assigned | Assignment with His-tag shown in ( | 50687 |
| SI3 | | | | |
| Ubl | 0.7 | HSQC | Highly stable over weeks; spectrum overlays with Ubl + IDR | |
| Ubl + IDR | 2–3 | NMR assigned | Highly stable for >2 weeks at 25°C | 50446 |
| Macrodomain | 9 | NMR assigned | Highly stable for >1 week at 25°C and >2 weeks at 4°C | 50387, 50388 |
| SUD-N | 14 | NMR assigned | Highly stable for >10 days at 25°C | 50448 |
| SUD-NM | 17 | HSQC | Stable for >1 week at 25°C | |
| SUD-M | 8.5 | NMR assigned | Significant precipitation during measurement; tendency to dimerize | 50516 |
| SUD-MC | 12 | HSQC | Stable for >1 week at 25°C | |
| SUD-C | 4.7 | NMR assigned | Stable for >10 days at 25°C | 50517 |
| PLpro | 12 | HSQC | Solubility tag essential for expression; tendency to aggregate | |
| NAB | 3.5 | NMR assigned | Highly stable for >1 week at 25°C; stable for >5 weeks at 4°C | 50334 |
| CoV-Y | 12 | HSQC | Low temperature (<25°C) and low concentrations (<0.2 mM) favor stability; gradual degradation at 25°C; lithium bromide in final buffer supports solubility | |
| SI4 | | | | |
| fl | 55 | HSQC | Impaired dimerization induced by artificial N-terminal residues | |
| SI5 | | | | |
| fl | 17 | NMR assigned | Stable for several days at 35°C; stable for >1 month at 4°C | 50337 |
| SI6 | | | | |
| fl | 17 | HSQC | Concentration-dependent aggregation; low concentrations favor stability | |
| SI7 | | | | |
| fl | 4.5 | NMR assigned | Stable dimer for >4 months at 4°C and >2 weeks at 25°C | 50621, 50622, 50513 |
| SI8 | | | | |
| fl | 15 | NMR assigned | Zn²⁺ addition during expression and purification increases protein stability; stable for >1 week at 25°C | 50392 |
| SI9 | | | | |
| fl | 0.5 | HSQC | Low expression; protein unstable; concentration above 20 µM not possible | |
| SI10 | | | | |
| fl | 6 | Pure protein | Not above 50 µM; best storage: with 50% (v/v) glycerol; addition of reducing agents | |
| MTase | 10 | Pure protein | As fl nsp14; high salt (>0.4 M) for increased stability; addition of reducing agents | |
| SI11 | | | | |
| fl | 5 | HSQC | Tendency to aggregate at 25°C | |
| SI12 | | | | |
| fl | 10 | Pure protein | Addition of reducing agents; 5% (v/v) glycerol favorable; highly unstable | |
| SI13 | | | | |
| fl | | Pure protein | Addition of detergent during expression (0.05% Brij-58); stable protein | |
| SI14 | | | | |
| fl | | Pure protein | Addition of detergent during expression (0.05% Brij-58); stable protein | |
| SI15 | | | | |
| fl | | Pure protein | Addition of detergent during expression (0.05% Brij-58); stable protein | |
| SI16 | | | | |
| fl | | HSQC | Soluble expression without detergent; stable protein; no expression with STREP-tag at N-terminus | |
| SI17 | | | | |
| ED | 0.4 | HSQC | Unpurified protein tends to precipitate during refolding; purified protein stable for 4 days at 25°C | |
| SI18 | | | | |
| fl | 0.6 | HSQC | Tendency to oligomerize; solubilizing agents needed | |
| fl | | HSQC | Addition of detergent during expression (0.1% MNG-3); stable protein | |
| SI19 | | | | |
| fl | | HSQC | Tendency to oligomerize | |
| ΔORF8 | 0.5 | Pure protein | | |
| SI20 | | | | |
| IDR1-NTD-IDR2 | 12 | NMR assigned | High salt (>0.4 M) for increased stability | 50618, 50619, 50558, 50557 |
| NTD-SR | 3 | HSQC | | |
| NTD | 3 | HSQC | | 34511 |
| CTD | 2 | NMR assigned | Stable dimer for >4 months at 4°C and >3 weeks at 30°C | 50518 |
| SI21 | | | | |
| fl | | HSQC | Expression without detergent; protein is stable | |
| SI22 | | | | |
| fl | | HSQC | Addition of detergent during expression (0.05% Brij-58); stable in detergent but unstable on lipid reconstitution | |
| SI23 | | | | |
| fl | 2 | HSQC | Tendency to oligomerize; unstable upon tag cleavage | |
Yields from bacterial expression represent the minimal protein amount in mg/L independent of the cultivation medium. Italic values indicate yields from CFPS.
Yields from CFPS represent the minimal protein amount in mg/ml of wheat-germ extract.
COVID19-NMR BMRB depositions yet to be released.
COVID19-NMR BMRB depositions.
FIGURE 1. Genomic organization of proteins and current state of analysis or purification. Boxes represent the domain boundaries as outlined in the text and in Table 1. Their positions correspond to the genomic loci. Colors indicate whether the proteins were purified (yellow), analyzed by NMR using only HSQC (lime), or characterized in detail, including NMR resonance assignments (green).
FIGURE 2. 1H, 15N-correlation spectra of investigated nonstructural proteins. Construct names according to Table 1 are indicated unless fl-proteins are shown. A representative SDS-PAGE lane with final samples is included as inset. Spectra for nsp3 constructs are collectively shown in Figure 3.
FIGURE 3. 1H, 15N-correlation spectra of investigated constructs from nonstructural protein 3. Construct names of subdomains according to Table 1 are indicated unless fl-domains are shown. A representative SDS-PAGE lane with final samples is included as inset. Red boxes indicate protein bands of interest.
FIGURE 4. Rationale of construct design, expression, and IPRS purification of the nsp3e nucleic acid–binding domain (NAB). (A) NMR structural ensemble of the homologous SCoV nsp3e (Serrano et al., 2009). The domain boundaries as displayed are given. (B) Sequence alignment of SCoV and SCoV2 regions representing the nsp3e locus. Arrows indicate the sequence stretch as used for the structure in panel (A). The analogous region was used for the design of the two protein expression constructs shown. (C) Left, SDS-PAGE showing the expression of nsp3e constructs from panel (B) over 4 h at two different temperatures. Middle, SDS-PAGE showing the subsequent steps of IMAC. Right, SDS-PAGE showing steps and fractions obtained before and after TEV/dialysis and reverse IMAC. Boxes highlight the respective sample species of interest for further usage. (D) SEC profile of nsp3e following steps in panel (C) performed with a Superdex 75 16/600 (GE Healthcare) column in the buffer as denoted in Supplementary Table SI3. The arrow indicates the protein peak of interest containing monomeric and homogeneous nsp3e NAB devoid of significant nucleic acid contamination, as revealed by the excellent 280/260 ratio. Right, SDS-PAGE shows 0.5 µL of the final NMR sample used for the spectrum in Figure 3 after concentrating relevant SEC fractions.
FIGURE 5. Rationale of construct design, expression, and purification of different nsp5 constructs. (A) Sequence alignment of SCoV and SCoV2 fl nsp5. (B) X-ray structural overlay of the homologous SCoV (PDB 1P9U, light blue) and SCoV2 nsp5 (PDB 6Y2E, green) in cartoon representation. The catalytic dyad (H41 and C145) is shown in stick representation (magenta). (C) Schematics of nsp5 expression constructs involving purification and solubilization tags (blue), different N-termini and additional aa after cleavage (green), and nsp5 (magenta). Cleavage sites are indicated by an arrow. (D, E) An exemplary purification is shown for wtnsp5. IMAC (D) and SEC (E) chromatograms (upper panels) and the corresponding SDS-PAGE (lower panels). Black bars in the chromatograms indicate pooled fractions. Gel samples are as follows: M: MW standard; pellet/load: pellet/supernatant after cell lysis; FT: IMAC flow-through; imidazole: eluted fractions with linear imidazole gradient; eluate: eluted SEC fractions from input (load). (F) SEC-MALS analysis with ∼0.5 µg of nsp5 without additional aa (wtnsp5, black), with GS (GS-nsp5, blue), and with GHM (GHM-nsp5, red) in NMR buffer on a Superdex 75, 10/300 GL (GE Healthcare) column. Horizontal lines indicate fractions of monodisperse nsp5 used for MW determination. (G) An SDS-PAGE showing all purified nsp5 constructs. The arrow indicates nsp5. (H) Exemplary [15N, 1H]-BEST-TROSY spectra measured at 298 K for the dimeric wtnsp5 (upper spectrum) and monomeric GS-nsp5 (lower spectrum). See Supplementary Table SI4 for technical details regarding this figure.
FIGURE 6. Cell-free protein synthesis of accessory ORFs and structural proteins E and M. (A) Screening for expression and solubility of different ORFs using small-scale reactions. The total cell-free reaction (CFS), the pellet after centrifugation, and the supernatant (SN) captured on magnetic beads coated with Strep-Tactin were analyzed. All tested proteins were synthesized, with the exception of ORF3b. MW, MW standard. (B) Detergent solubilization tests using three different detergents, shown here for the M protein by SDS-PAGE and Western blot. (C) Proteins are purified in a single step using a Strep-Tactin column. For ORF3a (and also for M), a small heat-shock protein of the HSP20 family is copurified, as identified by mass spectrometry (see also * in panel D). (D) SDS-PAGE of the 2H, 13C, 15N-labeled proteins used as NMR samples. Yields were between 0.2 and 1 mg protein per mL wheat-germ extract used. (E) SEC profiles for two ORFs. Left, ORF9b migrates as expected for a dimer. Right, ORF14 shows large assemblies corresponding to approximately 9 protein units and the DDM detergent micelle. (F) 2D [15N, 1H]-BEST-TROSY spectrum of ORF9b, recorded at 900 MHz in 1 h at 298 K, on less than 1 mg of protein. See Supplementary Tables SI13–SI19 and Supplementary Tables SI19, SI20 for technical and experimental details regarding this figure.
FIGURE 7. 1H, 15N-correlation spectra of investigated structural and accessory proteins. Construct names according to Table 1 are indicated unless fl-proteins are shown. A representative SDS-PAGE lane with final samples is included as inset.