
Dataset for suppressors of amyloid-β toxicity and their functions in recombinant protein production in yeast.

Xin Chen¹,², Xiaowei Li¹, Boyang Ji¹,³, Yanyan Wang¹, Olena P Ishchuk¹, Egor Vorontsov⁴, Dina Petranovic¹,², Verena Siewers¹,², Martin K M Engqvist¹.

Abstract

The production of recombinant proteins at high levels often induces stress-related phenotypes through protein misfolding or aggregation. These phenotypes are similar to those of the yeast Alzheimer's disease (AD) model in which amyloid-β peptides (Aβ42) accumulate [1], [2]. We previously identified suppressors of Aβ42 cytotoxicity via a genome-wide synthetic genetic array (SGA) [3], and here we use them as metabolic engineering targets to evaluate their potential for improving recombinant protein production in the yeast Saccharomyces cerevisiae. To investigate the mechanisms linking the genetic modifications to the improved recombinant protein production, we apply systems biology approaches (transcriptomics and proteomics) to the final strain and the intermediate strains. The RNA-seq data are preprocessed by the nf-core/RNAseq pipeline and analyzed using the Platform for Integrative Analysis of Omics (PIANO) package [4]. The quantitative proteome is analyzed on an Orbitrap Fusion Lumos mass spectrometer interfaced with an Easy-nLC1200 liquid chromatography (LC) system. LC-MS data files are processed by Proteome Discoverer version 2.4 with Mascot 2.5.1 as the database search engine. The original data presented in this work can be found in the research paper titled "Suppressors of Amyloid-β Toxicity Improve Recombinant Protein Production in yeast by Reducing Oxidative Stress and Tuning Cellular Metabolism", by Chen et al. [5].

Entities: Chemical

Keywords: Amyloid-β; Gene engineering; Proteome; Transcriptome; Yeast Saccharomyces cerevisiae

Year: 2022 PMID: 35677454 PMCID: PMC9168475 DOI: 10.1016/j.dib.2022.108322

Source DB: PubMed Journal: Data Brief ISSN: 2352-3409

Specifications Table

Values of the Data

The data provide detailed information on the 2615 differentially expressed genes (DEGs) that are common to all engineered strains when compared with the control strain. The data also provide detailed information on the 1171 differentially expressed proteins (DEPs) that are common to all engineered strains when compared with the control strain. This data article will be helpful for researchers in the fields of eukaryotic protein production and metabolic engineering. The data set can be used to look for potential targets for developing synthetic cell factories for the biosynthesis of valuable proteins.

Data Description

In our previous study, we applied yeast genome-wide synthetic genetic array (SGA) technology to our Aβ42 expression model to screen for genetic mutants in which the Aβ42 cytotoxic phenotype was altered [3]. Two yeast mutant libraries were included: a nonessential gene collection (∼4300 deletion strains) and an essential gene collection (∼1200 temperature-sensitive mutant strains) [3]. Based on the SGA screen, 46 resistant mutants with reduced Aβ42 cytotoxicity and 20 sensitive mutants with enhanced Aβ42 cytotoxicity were selected to evaluate their effects on recombinant protein production. Gene deletion was applied to the resistant mutants and gene overexpression to the sensitive mutants. Recombinant α-amylase was used as a model protein [7], with the α-factor leader placed in front of the open reading frame (ORF), which has been shown to increase protein production [8]. Genes whose modification improved the α-amylase production yield were chosen for combinatorial engineering to further enhance protein production. After four rounds of genetic screening, α-amylase production had increased stepwise to 18.7-fold in the final engineered strain A16 compared with the control strain A01 (Fig. 1).

Fig. 1

Summary of combinatorial gene engineering. The best-performing strain from each round of genetic screening is presented. The results are shown as the average values ± SD from four independent biological replicates.

The identified genes (LSM6, SIN3, MDM34, and CDC48) have been reported to be involved in different cellular networks. To gain systems-level insight into how these genes contribute to improved recombinant protein production, we performed transcriptomic and proteomic analyses on the control strain A01 and the engineered strains A02, A05, A08, and A16 (Fig. 2).

Fig. 2

Schematic workflow for transcriptomic and proteomic processes of engineered strains.

Using the transcriptomic data (raw data available at GEO under accession GSE185570), we first analyzed the global expression pattern by principal component analysis (PCA). The results showed that the engineered strains had a different gene expression profile from the control strain A01. Based on the first and second principal components, the strains clustered into three groups: group 1 (strain A01), group 2 (strain A02), and group 3 (strains A05, A08, and A16). To examine the extent of changes in the engineered strains, we then performed pairwise comparisons. Using strain A01 as the control, differentially expressed genes (DEGs) were identified for each engineered strain. A total of 2615 genes were differentially expressed in all engineered strains (p-adj < 0.01, dataset 1). The global changes in the proteome were also quantified (raw data available at ProteomeXchange Dataset PXD030111). The PCA showed a similar separation between the control strain A01 and the engineered strains. The differentially expressed proteins were analyzed, and 1171 proteins were differentially expressed in all engineered strains compared with the control strain A01 (p-adj < 0.05, dataset 2).
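The selection of genes that are differentially expressed in every engineered strain can be illustrated with a short Python sketch. It is not the authors' analysis code: the function name, the gene names, and the adjusted p-values are hypothetical, and only the thresholding and set-intersection logic is shown.

```python
def common_degs(padj_by_strain, alpha=0.01):
    """Return the genes whose adjusted p-value is below alpha in every comparison.

    padj_by_strain maps a strain name to a dict of {gene: adjusted p-value},
    each computed against the control strain.
    """
    significant_sets = []
    for padj in padj_by_strain.values():
        # Genes with a missing (None) adjusted p-value count as not significant.
        significant_sets.append(
            {gene for gene, p in padj.items() if p is not None and p < alpha}
        )
    if not significant_sets:
        return set()
    return set.intersection(*significant_sets)


# Hypothetical adjusted p-values for each engineered strain versus control A01.
example = {
    "A02": {"geneA": 0.001, "geneB": 0.2, "geneC": 0.005},
    "A05": {"geneA": 0.004, "geneB": 0.001, "geneC": 0.009},
    "A08": {"geneA": 0.0001, "geneB": 0.003, "geneC": 0.03},
    "A16": {"geneA": 0.002, "geneB": 0.002, "geneC": 0.001},
}

shared = common_degs(example, alpha=0.01)
print(sorted(shared))
```

The same function applies to the proteomic comparison by passing alpha=0.05.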

Experimental Design, Materials and Methods

Materials and Methods

Yeast strains were cultured at 30 °C in SD-2 × SCAA medium, which is a culture medium optimized for protein production [9]. To collect cells for transcriptome and proteome analysis, a single colony was inoculated into 1 mL of SD-2 × SCAA medium and cultivated overnight at 30 °C with agitation at 200 rpm. Biological triplicates were taken from each strain. The next day, the precultures were diluted to an initial OD600 of 0.05 in 20 mL of SD-2 × SCAA medium and cultivated to early exponential phase (OD600 ≈ 1) for sample collection.
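As a worked example of the dilution step, the preculture volume needed to reach an initial OD600 of 0.05 in 20 mL follows from C1 × V1 = C2 × V2. The Python sketch below is illustrative only: the function name and the preculture OD600 of 5.0 are assumptions, and it presumes that OD600 scales linearly with cell density over this range.

```python
def preculture_volume_ml(preculture_od, target_od=0.05, final_volume_ml=20.0):
    """Volume of preculture (mL) to add so the final culture starts at target_od.

    Uses C1 * V1 = C2 * V2, assuming OD600 is proportional to cell density.
    """
    if preculture_od <= target_od:
        raise ValueError("preculture OD600 must exceed the target OD600")
    return target_od * final_volume_ml / preculture_od


# Hypothetical overnight preculture at OD600 = 5.0.
volume = preculture_volume_ml(5.0)
print(f"Add {volume:.2f} mL of preculture and fill to 20 mL with medium")
```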

Transcriptome Profiling

Cells equivalent to 10 OD600 units were collected in 50 mL Falcon tubes on ice and centrifuged at 5000 rpm for 5 min at 4 °C. The cell pellets were snap-frozen in liquid nitrogen and stored at -80 °C before RNA extraction. RNA was extracted using the RNeasy Mini Kit (Qiagen), and RNA quality was evaluated using the 2100 Bioanalyzer (Agilent Technologies). RNA-seq was performed by the National Genomics Infrastructure (NGI) of SciLifeLab in Stockholm, Sweden (https://ngisweden.scilifelab.se/). The mRNA samples were prepared for sequencing with the Illumina TruSeq Stranded mRNA Library Prep kit (Illumina). The fragments were clustered on cBot and sequenced on a HiSeq 4000 with paired ends (MID Output 2 × 150 bp). The number of read pairs for each sample ranged from 12.0 to 16.0 million. After quality control, the raw reads from each sample were mapped to the S. cerevisiae CEN.PK 113-7D reference genome using TopHat (v 2.0.12), with 84.0–86.7% of the reads successfully mapped. The data were preprocessed by the nf-core/RNAseq pipeline (https://github.com/nf-core/rnaseq, SciLifeLab). Differential gene expression was analyzed using DESeq. The reporter GO terms and reporter TFs were analyzed using the Platform for Integrative Analysis of Omics (PIANO) package [4] in R. The Database for Annotation, Visualization and Integrated Discovery (DAVID, https://david.ncifcrf.gov/) was used to analyze functional enrichment of KEGG pathways and biological processes. Heatmaps were generated using the pheatmap package in R.

Sample Preparation for Proteomic Analysis

At early exponential phase (OD600 ≈ 1), 10 mL of cell culture was collected into pre-chilled Falcon tubes and centrifuged at 4 °C for 5 min (2000 × g). Biological triplicates were taken from each strain. Cells were washed once with ice-cold PBS buffer, snap-frozen in liquid nitrogen and stored at -80 °C. Samples were suspended in 200 µL of lysis buffer containing 2% sodium dodecyl sulfate (SDS) and 50 mM triethylammonium bicarbonate (TEAB), and homogenized using lysis matrix D (1.4 mm ceramic spheres) on a FastPrep-24 instrument (MP Biomedicals, OH, USA) for four repeated 40 s cycles at 6.5 m/s. Lysed samples were centrifuged at 16,200 × g for 10 min and the supernatants were transferred to clean tubes. The pellets were washed with 200 µL of the lysis buffer and centrifuged at 16,200 × g for 10 min, and the supernatants were combined with the corresponding lysates from the previous step. Protein concentrations in the lysates were determined using the Pierce BCA Protein Assay Kit (Thermo Fisher Scientific) and the Benchmark Plus microplate reader (Bio-Rad Laboratories, Hercules, CA, USA) with bovine serum albumin (BSA) solutions as standards. The proteomic reference pool was prepared by taking an equal aliquot from each individual sample.
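Determining protein concentration against BSA standards amounts to fitting a standard curve and inverting it for each sample. The Python sketch below assumes a linear fit, which is a simplification that may not match the curve model used in the original work. The function names, standard concentrations, and absorbance values are all hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_absorbance(absorbance, slope, intercept):
    """Invert the standard curve to estimate concentration from absorbance."""
    return (absorbance - intercept) / slope


# Hypothetical BSA standards (µg/mL) and their measured absorbances.
bsa_ug_per_ml = [0.0, 250.0, 500.0, 1000.0, 2000.0]
absorbances = [0.10, 0.35, 0.60, 1.10, 2.10]

slope, intercept = fit_line(bsa_ug_per_ml, absorbances)

# Hypothetical lysate reading, converted to an estimated concentration.
sample_conc = concentration_from_absorbance(0.85, slope, intercept)
print(f"Estimated protein concentration: {sample_conc:.0f} µg/mL")
```

Dividing a target protein mass by the estimated concentration then gives the lysate volume to take from each sample.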

Quantitative Proteome Analysis

Proteomic analysis was performed as described previously with modifications [10]. In brief, 35 µg of total protein from each of the samples and from the reference pool was used for the modified filter-aided sample preparation (FASP) [11], which included two-stage digestion of each sample with trypsin (Pierce Trypsin Protease, Thermo Fisher Scientific) in digestion buffer containing 0.5% sodium deoxycholate (SDC) and 50 mM TEAB, and subsequent labeling with TMTpro16plex reagents (Thermo Fisher Scientific) according to the manufacturer's instructions. The labeled samples were combined into one TMTpro set and fractionated into 404 primary fractions using a Dionex Ultimate 3000 UPLC system (Thermo Fisher Scientific). Peptide pre-fractionation was performed on an XBridge BEH C18 column (3.5 µm, 3.0 × 150 mm, Waters Corporation) at pH 10, and the primary fractions were concatenated into 20 final fractions for liquid chromatography-MS (LC-MS) analysis. Each fraction was analyzed using an Orbitrap Fusion Lumos mass spectrometer interfaced with an Easy-nLC1200 liquid chromatography system (both Thermo Fisher Scientific). LC-MS data files were processed in Proteome Discoverer version 2.4 (Thermo Fisher Scientific). The reference database for Saccharomyces cerevisiae ATCC 204508 / S288c was downloaded from UniProt (November 2019) and supplemented with common proteomic contaminant sequences. Mascot version 2.5.1 (Matrix Science, London, UK) was used as the database search engine with 5 ppm precursor tolerance and 0.6 Da fragment tolerance; trypsin with 1 allowed missed cleavage as the enzyme rule; oxidation on methionine as a variable modification; methylthiolation on cysteine and TMTpro on lysine and peptide N-termini as fixed modifications; and Percolator for PSM validation with a strict FDR threshold of 1%. Reporter ions were identified within 3 mmu in MS3 spectra. Quantification was based on unique peptides and reporter S/N, with correction for isotopic impurities, an average reporter S/N threshold of 10, an SPS match threshold of 40%, and reporter normalization on total peptide amount. Differential protein expression (log2 fold change) was calculated, the corresponding significance (p-adj) was obtained with the Benjamini–Hochberg method, and both were used as input for the downstream analysis. The reporter GO terms were analyzed using the PIANO R package [4]. The DAVID database (https://david.ncifcrf.gov/) was used to analyze functional enrichment of KEGG pathways and biological processes.
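For readers unfamiliar with the two statistics named above, the following Python sketch shows a generic log2 fold change and a textbook Benjamini–Hochberg adjustment. It is not the implementation used by Proteome Discoverer or by the authors. The function names and all example values are hypothetical.

```python
import math


def log2_fold_change(treated_abundance, control_abundance):
    """log2 ratio of treated to control abundance (both must be positive)."""
    return math.log2(treated_abundance / control_abundance)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four proteins.
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw_p)

# Proteins passing a p-adj < 0.05 cutoff (indices into raw_p).
passing = [i for i, p in enumerate(adj_p) if p < 0.05]
print(adj_p, passing)
```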

CRediT authorship contribution statement

Xin Chen: Conceptualization, Formal analysis, Investigation, Methodology, Resources, Validation, Visualization, Writing – original draft, Writing – review & editing. Xiaowei Li: Investigation, Formal analysis, Writing – review & editing. Boyang Ji: Software, Writing – review & editing. Yanyan Wang: Methodology, Validation. Olena P. Ishchuk: Investigation, Writing – review & editing. Egor Vorontsov: Methodology, Writing – review & editing. Dina Petranovic: Funding acquisition, Project administration, Writing – review & editing. Verena Siewers: Project administration, Supervision, Writing – review & editing. Martin K.M. Engqvist: Project administration, Supervision, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Subject	Biotechnology, Biological science
Specific subject area	Recombinant protein production, cell factories
Type of data	Images and tables
How the data were acquired	RNA-seq was performed at the National Genomics Infrastructure (NGI) of SciLifeLab and preprocessed by the nf-core/RNAseq pipeline. Quantitative proteomics was performed at the Proteomics Core Facility at the University of Gothenburg. The omics data were analyzed using the Platform for Integrative Analysis of Omics (PIANO) package and the Database for Annotation, Visualization and Integrated Discovery (DAVID).
Data format	Raw and analyzed data
Description of data collection	For RNA-seq data, RNA was extracted using the RNeasy Mini Kit from 10 OD₆₀₀ units of exponentially growing cells. The mRNA samples were prepared with the Illumina TruSeq Stranded mRNA Library Prep kit. The fragments were clustered on cBot and sequenced on a HiSeq 4000 with paired ends (2 × 150 bp). The number of read pairs for each sample ranged from 12.0 to 16.0 million. After quality control, the raw reads from each sample were mapped to the S. cerevisiae CEN.PK 113-7D reference genome using TopHat (v 2.0.12), with 84.0–86.7% of the reads successfully mapped. The data were further preprocessed by the nf-core/RNAseq pipeline. For quantitative proteome data, 35 µg of total protein from each of the samples and from the reference pool was used for the modified filter-aided sample preparation. Liquid chromatography-MS (LC-MS) experiments were performed on an Orbitrap Fusion Lumos mass spectrometer interfaced with an Easy-nLC1200 nanoflow LC system. Peptide and protein identification and quantification were performed using Proteome Discoverer version 2.4 with Mascot 2.5.1 as the database search engine.
Data source location	The RNA-seq raw data were collected at the National Genomics Infrastructure (NGI) of SciLifeLab, Stockholm, Sweden. The quantitative proteomics data were collected at the Proteomics Core Facility, University of Gothenburg, Gothenburg, Sweden.
Data accessibility	The RNA-seq raw data can be downloaded from the Gene Expression Omnibus (GEO) website with the dataset identifier GSE185570. The mass spectrometry proteomics raw data have been deposited to the ProteomeXchange Consortium via the PRIDE [6] partner repository with the dataset identifier PXD030111.
Related research article	Xin Chen, Xiaowei Li, Boyang Ji, Yanyan Wang, Olena P. Ishchuk, Egor Vorontsov, Dina Petranovic, Verena Siewers, and Martin K. M. Engqvist, Suppressors of Amyloid-β Toxicity Improve Recombinant Protein Production in yeast by Reducing Oxidative Stress and Tuning Cellular Metabolism [5].

10 in total

1. Universal sample preparation method for proteome analysis.

Authors: Jacek R Wiśniewski; Alexandre Zougman; Nagarjuna Nagaraj; Matthias Mann
Journal: Nat Methods Date: 2009-04-19 Impact factor: 28.547

2. Amyloid-β peptide-induced cytotoxicity and mitochondrial dysfunction in yeast.

Authors: Xin Chen; Dina Petranovic
Journal: FEMS Yeast Res Date: 2015-07-06 Impact factor: 2.796

3. Engineering of vesicle trafficking improves heterologous protein secretion in Saccharomyces cerevisiae.

Authors: Jin Hou; Keith Tyo; Zihe Liu; Dina Petranovic; Jens Nielsen
Journal: Metab Eng Date: 2012-01-17 Impact factor: 9.783

4. Suppressors of amyloid-β toxicity improve recombinant protein production in yeast by reducing oxidative stress and tuning cellular metabolism.

Authors: Xin Chen; Xiaowei Li; Boyang Ji; Yanyan Wang; Olena P Ishchuk; Egor Vorontsov; Dina Petranovic; Verena Siewers; Martin K M Engqvist
Journal: Metab Eng Date: 2022-05-01 Impact factor: 9.783

5. Different expression systems for production of recombinant proteins in Saccharomyces cerevisiae.

Authors: Zihe Liu; Keith E J Tyo; José L Martínez; Dina Petranovic; Jens Nielsen
Journal: Biotechnol Bioeng Date: 2012-01-17 Impact factor: 4.530

6. Interplay of Energetics and ER Stress Exacerbates Alzheimer's Amyloid-β (Aβ) Toxicity in Yeast.

Authors: Xin Chen; Markus M M Bisschops; Nisha R Agarwal; Boyang Ji; Kumaravel P Shanmugavel; Dina Petranovic
Journal: Front Mol Neurosci Date: 2017-07-27 Impact factor: 5.639

7. The PRIDE database and related tools and resources in 2019: improving support for quantification data.

Authors: Yasset Perez-Riverol; Attila Csordas; Jingwen Bai; Manuel Bernal-Llinares; Suresh Hewapathirana; Deepti J Kundu; Avinash Inuganti; Johannes Griss; Gerhard Mayer; Martin Eisenacher; Enrique Pérez; Julian Uszkoreit; Julianus Pfeuffer; Timo Sachsenberg; Sule Yilmaz; Shivani Tiwary; Jürgen Cox; Enrique Audain; Mathias Walzer; Andrew F Jarnuczak; Tobias Ternent; Alvis Brazma; Juan Antonio Vizcaíno
Journal: Nucleic Acids Res Date: 2019-01-08 Impact factor: 16.971

8. Nitrogen limitation reveals large reserves in metabolic and translational capacities of yeast.

Authors: Rosemary Yu; Kate Campbell; Rui Pereira; Johan Björkeroth; Qi Qi; Egor Vorontsov; Carina Sihlbom; Jens Nielsen
Journal: Nat Commun Date: 2020-04-20 Impact factor: 14.919

9. FMN reduces Amyloid-β toxicity in yeast by regulating redox status and cellular metabolism.

Authors: Xin Chen; Boyang Ji; Xinxin Hao; Xiaowei Li; Frederik Eisele; Thomas Nyström; Dina Petranovic
Journal: Nat Commun Date: 2020-02-13 Impact factor: 14.919

10. Enriching the gene set analysis of genome-wide data by incorporating directionality of gene expression and combining statistical hypotheses and methods.

Authors: Leif Väremo; Jens Nielsen; Intawat Nookaew
Journal: Nucleic Acids Res Date: 2013-02-26 Impact factor: 16.971
