Quantitative proteomic dataset of the moss Physcomitrium patens PSEP3 KO and OE mutant lines.

Anna Mamaeva¹, Andrey Knyazev¹, Anna Glushkevich¹, Igor Fesenko¹.

Abstract

Small open reading frames (<100 codons) located on long noncoding RNAs (lncRNAs) can encode functional microproteins. These microproteins have been shown to play important roles in different cellular processes, such as cell proliferation, development and disease response [1], [2], [3], [4], [5], [6]. However, only a few lncRNA-encoded functional microproteins are known in plants. One such microprotein, named PSEP3, was identified in the moss Physcomitrium patens by mass-spectrometry analysis. The 57-aa PSEP3 contains a low-complexity region (LCR) enriched in proline. We have previously shown that PSEP3 is translated in protonemata and gametophores of P. patens, and that its knockout (KO line) or overexpression (OE line) affects protonemata growth [7]. Here, we performed a quantitative proteomic analysis of the PSEP3 knockout and overexpression mutant lines. Seven-day-old protonemata of the wild type (WT line) and both mutant lines (KO and OE) were collected and used for iTRAQ-based proteomic experiments. LC-MS/MS data were processed using PEAKS Studio v.8 software, with protein identification based on a Phytozome protein database. Further analysis of the effects of PSEP3 on plant growth can be found in the paper published in Nucleic Acids Research [8].

Entities: Chemical

Keywords: Long noncoding RNA-encoded peptide; Physcomitrium patens; Proteomics; iTRAQ

Year: 2021 PMID: 34977300 PMCID: PMC8688553 DOI: 10.1016/j.dib.2021.107715

Source DB: PubMed Journal: Data Brief ISSN: 2352-3409

Value of the Data

These data are useful for comparative studies of cellular pathways regulated by lncRNA-encoded microproteins in plants and animals. The list of differentially regulated proteins identified in PSEP3 mutant lines can be compared with microprotein-regulated genes from other organisms to shed light on the mechanisms by which microproteins appear and evolve. These data could also be useful for large-scale studies of side effects of CRISPR-Cas9-derived mutations in plants. The presented data can be re-analyzed with databases containing previously unannotated protein sequences, such as other lncRNA-encoded microproteins, transposons, and alternative open reading frames. Such re-analysis will help to better understand the role of microproteins in cellular processes.

Data Description

Here we present a quantitative proteomics dataset of PSEP3 KO and OE mutant lines. For the proteomic experiments, moss protonemata were grown in liquid BCD-AT medium for 7 days (Fig. 1). We used isobaric tags for relative and absolute quantitation (iTRAQ) to accurately detect quantitative changes at the protein level in the mutant lines. The raw LC-MS/MS data are available in the Mendeley Data repository (10.17632/NNCZRR9Y32.1). Protein identification and quantification were conducted using PEAKS Studio 8.0 software.

Fig. 1

Overview of the experimental workflow.

In total, we identified 2710 protein groups supported by 17748 peptide-spectrum matches (PSMs) in PSEP3 KO mutant plants (Table 1). Of these, 56 protein groups were differentially expressed (FC > 1.2, P < 0.01), and 21 of them were upregulated in comparison with wild-type plants (tables are available on Mendeley Data, 10.17632/NNCZRR9Y32.1). The distributions of peptide length, sequence coverage, protein mass and peptide number in the analysed samples are shown in Fig. 2. According to our previous study [8], overexpression of PSEP3 induces cell death within 48 h. Therefore, we studied the effect of short-term PSEP3 overexpression in mutant lines using the β-estradiol induction system [10]. We analyzed proteomic changes 4 h after induction of PSEP3 with β-estradiol. In total, we observed 2873 protein groups supported by 27862 PSMs in this sample (Table 1). We identified 167 differentially expressed protein groups (DEPs) after induction of PSEP3 overexpression (tables are available on Mendeley Data, 10.17632/NNCZRR9Y32.1).

Table 1

Characteristics of the proteomic datasets.

	PSEP3 KO dataset	PSEP3 OE dataset
Number of MS/MS scans	45815	54029
Peptide-Spectrum Matches	17784	27862
Peptide sequences	11170	16724
Protein groups	2710	2873
Proteins	3461	3511
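
As a quick sanity check on Table 1, the counts can be turned into simple ratios, for example the fraction of MS/MS scans that yielded a PSM. The sketch below only restates the table values; the derived metric names (`identification_rate`, `peptides_per_group`) are illustrative and are not reported in the original dataset.

```python
# Counts copied from Table 1; the derived ratios are illustrative only.
TABLE_1 = {
    "PSEP3 KO": {"ms2_scans": 45815, "psms": 17784,
                 "peptides": 11170, "protein_groups": 2710},
    "PSEP3 OE": {"ms2_scans": 54029, "psms": 27862,
                 "peptides": 16724, "protein_groups": 2873},
}


def derived_metrics(counts):
    """Return simple ratios computed from one column of Table 1."""
    return {
        "identification_rate": counts["psms"] / counts["ms2_scans"],
        "peptides_per_group": counts["peptides"] / counts["protein_groups"],
    }


for dataset, counts in TABLE_1.items():
    m = derived_metrics(counts)
    print(f"{dataset}: {m['identification_rate']:.1%} of scans identified, "
          f"{m['peptides_per_group']:.2f} peptide sequences per protein group")
```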

Fig. 2

Quality control metrics of the proteomic datasets. (A) The length distributions of peptides; (B) Protein sequence coverage distributions; (C) Protein mass distributions and (D) Distributions of the unique peptide numbers.

Principal component analysis (PCA) based on intensities of all identified protein groups is shown in Fig. 3.

Fig. 3

Principal component analysis (PCA) of wild type with PSEP3 KO (A) and PSEP3 OE (B) mutant samples. Wild type is shown in orange and mutants in blue. The PCA included all protein groups and was performed with the Python library scikit-learn (sklearn).

We found that some photosynthetic proteins were downregulated in PSEP3 KO plants but upregulated in PSEP3 OE plants (tables are available on Mendeley Data, 10.17632/NNCZRR9Y32.1). DEPs such as catalase, xyloglucan endo-transglycosylase and metacaspase-4-related protein were downregulated in PSEP3 KO plants, whereas dynamin-related protein 1C was upregulated in PSEP3 OE plants. These proteins are involved in antioxidant defense, cell wall structure, cell death and organelle functioning [11], [12], [13].

Experimental Design, Materials and Methods

Plant material

The wild type of the moss Physcomitrium patens subsp. patens (“Gransden 2004”, Freiburg) and the PSEP3 KO and OE mutant lines, produced earlier using CRISPR/Cas9 technology [7], were used in this study. The moss protonemata were grown in 200 ml of liquid BCD medium supplemented with 5 mM ammonium tartrate (BCD-AT) under a 16 h photoperiod at 25 °C for 7 days [9].

Protein extraction and trypsin digestion

Proteins were extracted by the phenol extraction method [14]. Samples were homogenized in 3 ml of ice-cold extraction buffer (500 mM Tris–HCl, pH 8.0, 50 mM EDTA, 700 mM sucrose, 100 mM KCl, 1 mM PMSF, 1 mM DTT) and incubated on ice for 10 min. Then 3 ml of ice-cold Tris–HCl (pH 8.0)-saturated phenol was added, and the mixture was vortexed and incubated for 10 min with shaking. After centrifugation (10 min, 5500 × g, 4 °C), the phenol phase was collected and re-extracted with 3 ml of extraction buffer. Proteins were precipitated with 8 ml of ice-cold 0.1 M ammonium acetate in methanol overnight at −20 °C. The samples were centrifuged (10 min, 5500 × g, 4 °C), and the pellets were washed three times with ice-cold 0.1 M ammonium acetate in methanol and once with ice-cold acetone, with centrifugation (10 min, 5500 × g, 4 °C) after each wash. The resulting pellets were dried and dissolved in 8 M urea, 2 M thiourea and 10 mM Tris. Proteins were quantified by the Bradford protein assay (Bio-Rad, Hercules, CA, USA).

For digestion, 100 µg of protein was reduced with 5 mM DTT for 30 min at 50 °C and alkylated with 10 mM iodoacetamide for 20 min at room temperature. Proteins were dissolved in 40 mM ammonium bicarbonate and digested with 1 µg of sequencing-grade modified trypsin (Promega, Madison, WI, USA) at 37 °C overnight. The reaction was stopped by adding trifluoroacetic acid (TFA) to a final concentration of 1%. Then 20 µg of each sample was desalted on Empore octadecyl C18 extraction disks (Supelco, USA) and dried in a vacuum concentrator.

iTRAQ labeling (Applied Biosystems, Foster City, CA, USA) was conducted according to the manufacturer's manual. Samples were labeled with the iTRAQ tags as follows: wild type, 113, 114 and 115; PSEP3 KO, 116, 119 and 121; wild type (with estradiol), 113, 114 and 116; PSEP3 OE (with estradiol), 117, 118 and 121. Samples were mixed, vacuum dried and dissolved in 1% TFA. The mixture was desalted on SCX extraction disks (Supelco, USA): columns were washed with 0.1% TFA and eluted with 5% ammonium hydroxide in 80% acetonitrile. Each sample was dried in a vacuum concentrator and dissolved in 3% acetonitrile with 0.1% TFA.
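
The channel assignment described above can be written down as a small data structure, which makes it easy to check that no reporter channel is used twice within one experiment. This is a minimal sketch: the dictionary keys and the `check_design` helper are illustrative names, and the channel set assumes the 8-plex iTRAQ reagents, which is what the tag numbers in the text suggest.

```python
# Reporter-ion channels of the iTRAQ 8-plex reagent set (nominal m/z).
# Assumption: the 8-plex kit was used, as implied by tags 113 to 121.
ITRAQ_8PLEX = {113, 114, 115, 116, 117, 118, 119, 121}

# Channel assignment as described in the text; keys are illustrative names.
LABEL_DESIGN = {
    "KO experiment": {
        "wild type": [113, 114, 115],
        "PSEP3 KO": [116, 119, 121],
    },
    "OE experiment": {
        "wild type (with estradiol)": [113, 114, 116],
        "PSEP3 OE (with estradiol)": [117, 118, 121],
    },
}


def check_design(design):
    """Raise ValueError if a channel is unknown or reused within one experiment."""
    for experiment, groups in design.items():
        used = [ch for channels in groups.values() for ch in channels]
        unknown = set(used) - ITRAQ_8PLEX
        if unknown:
            raise ValueError(f"{experiment}: unknown channels {sorted(unknown)}")
        if len(used) != len(set(used)):
            raise ValueError(f"{experiment}: a channel is assigned twice")
    return True


if check_design(LABEL_DESIGN):
    print("label design is consistent")
```

Each experiment uses six of the eight channels, three per condition, which matches the three biological replicates stated in the specifications table.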

Liquid chromatography and mass spectrometry

LC-MS/MS analysis was conducted as described in our previous studies [15,16]. Peptides were separated on an Acclaim PepMap 100 C18 column (75 µm × 50 cm) (Thermo Fisher Scientific). Reverse-phase chromatography was performed with an Ultimate 3000 Nano LC System (Thermo Fisher Scientific) coupled to a Q Exactive HF benchtop Orbitrap mass spectrometer (Thermo Fisher Scientific) via a nanoelectrospray source (Thermo Fisher Scientific). Peptides in 5 µL of loading buffer (3% (vol/vol) acetonitrile, 0.1% (vol/vol) TFA in Milli-Q deionized water) were loaded on a PepMap 100 C18 trapping column (0.1 × 20 mm) (Thermo Fisher Scientific) at a flow rate of 5 µL/min for 6 min. The NanoLC pump mobile phases were: A, 2% (vol/vol) acetonitrile and 0.1% (vol/vol) formic acid in Milli-Q deionized water; B, 80% (vol/vol) acetonitrile, 0.1% (vol/vol) formic acid and 19.9% (vol/vol) Milli-Q deionized water. Peptides were eluted from the trapping column with a linear gradient of 5–28% B for 90 min, 28–45% B for 20 min and 45–100% B for 7 min at a flow rate of 350 nL/min. After each gradient, the column was washed with 100% buffer B for 5 min and re-equilibrated with buffer A for 10 min. Peptides were analyzed on the mass spectrometer with one full scan (375–1400 m/z, R = 120,000 at 200 m/z) at a target of 3 × 10⁶ ions and a maximum ion fill time of 50 ms, followed by up to 15 data-dependent MS/MS scans with higher-energy collisional dissociation (HCD) (target 1 × 10⁵ ions, maximum ion fill time 100 ms, isolation window 1.2 m/z, normalized collision energy (NCE) 32%), detected in the Orbitrap (R = 30,000; fixed first mass 100 m/z). Other settings: charge exclusion – unassigned, 1, > 6; peptide match – preferred; exclude isotopes – on; dynamic exclusion – 60 s.
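
To make the gradient program easier to follow, the sketch below encodes the three linear segments described above and interpolates the percentage of buffer B at a given time. The names `GRADIENT` and `percent_b` are illustrative, and the cycle time deliberately excludes the 6 min trap-loading step.

```python
# Gradient segments from the text: (duration in min, %B at start, %B at end).
GRADIENT = [
    (90, 5.0, 28.0),
    (20, 28.0, 45.0),
    (7, 45.0, 100.0),
]
WASH_MIN = 5               # 100% B wash after each gradient
REEQUILIBRATION_MIN = 10   # re-equilibration with buffer A


def percent_b(t_min):
    """Return %B at t_min (minutes from gradient start) by linear interpolation."""
    if t_min < 0:
        raise ValueError("time must be non-negative")
    elapsed = 0.0
    for duration, start, end in GRADIENT:
        if t_min <= elapsed + duration:
            return start + (end - start) * (t_min - elapsed) / duration
        elapsed += duration
    raise ValueError("time is past the end of the gradient")


gradient_min = sum(duration for duration, _, _ in GRADIENT)
cycle_min = gradient_min + WASH_MIN + REEQUILIBRATION_MIN
print(gradient_min, cycle_min)  # 117 132
print(percent_b(45))            # 16.5
```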

Protein identification and quantification

Tandem mass spectra were analysed with PEAKS Studio version 8.0 software (Bioinformatics Solutions Inc., Waterloo, Canada) [17]. The custom database was built from the Phytozome P. patens protein database combined with chloroplast and mitochondrial proteins (33,053 records) (Supplementary file 1). The database search was performed with the following parameters: fragment mass tolerance of 0.05 Da; parent ion tolerance of 10 ppm; fixed modification – carbamidomethylation; variable modifications – oxidation (M) and acetylation (Protein N-term). The resulting protein list was filtered at a 1% false discovery rate (FDR). PEAKS Q was used for iTRAQ quantification. Normalization was performed by averaging the abundance of all peptides, with median values used for averaging. Given that iTRAQ quantification typically underestimates real fold changes between two samples, differential protein screening was performed using a fold change ratio ⩾ 1.20 (for upregulated DEPs) or ⩽ 0.83 (for downregulated DEPs) and Significance ⩾ 15. Protein significance analysis was performed in PEAKS 8.0 [17]. A two-tailed t-test was used to calculate the significance of differences in protein ratios (P < 0.05).
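
The screening rule above can be expressed as a small filter. The following is a sketch of the stated thresholds only, not the PEAKS Q implementation; the function name, the row layout and the example values are hypothetical.

```python
UP_THRESHOLD = 1.20     # fold change ratio for upregulated DEPs
DOWN_THRESHOLD = 0.83   # fold change ratio for downregulated DEPs
MIN_SIGNIFICANCE = 15   # Significance cutoff stated in the text


def classify_protein(fold_change, significance):
    """Return 'up', 'down', or None for one protein group."""
    if significance < MIN_SIGNIFICANCE:
        return None
    if fold_change >= UP_THRESHOLD:
        return "up"
    if fold_change <= DOWN_THRESHOLD:
        return "down"
    return None


# Hypothetical rows: (protein group ID, mutant/WT ratio, significance).
example_rows = [
    ("group_A", 1.35, 22.0),
    ("group_B", 0.70, 18.5),
    ("group_C", 1.50, 9.0),    # large change, but below the significance cutoff
    ("group_D", 1.05, 30.0),   # significant, but the change is too small
]

deps = {}
for protein_id, fold_change, significance in example_rows:
    label = classify_protein(fold_change, significance)
    if label is not None:
        deps[protein_id] = label

print(deps)  # {'group_A': 'up', 'group_B': 'down'}
```

The downregulation cutoff of 0.83 is approximately the reciprocal of 1.20, so the two thresholds are symmetric on a ratio scale.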

Statistical analysis

Two-component principal component analysis (PCA) was performed on the standardized iTRAQ intensity values of each protein group in the samples using the scikit-learn Python package [18]. Standardization and decomposition were performed in scikit-learn with default parameters. The columns "Coverage (%)", "Avg. Mass" and "#Peptides" from the PEAKS result tables were used to analyse the quality control metrics of the proteomic datasets. Peptide lengths were calculated as the difference between the Start and Stop positions of each peptide. Visualizations were made in Python using the seaborn 0.11.1 module [19].
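
A two-component PCA on standardized intensities can be sketched with scikit-learn as follows. The intensity matrix here is synthetic placeholder data, and the layout (samples as rows, protein groups as columns) is an assumption; the original analysis scripts are not part of this description.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic placeholder data: 6 samples (3 WT + 3 mutant) x 50 protein groups.
# Assumption: rows are samples and columns are protein groups.
rng = np.random.default_rng(0)
intensities = rng.lognormal(mean=10.0, sigma=1.0, size=(6, 50))
sample_labels = ["WT"] * 3 + ["mutant"] * 3

# Standardize each protein group (column) to zero mean and unit variance.
standardized = StandardScaler().fit_transform(intensities)

# Project the samples onto the first two principal components.
pca = PCA(n_components=2)
scores = pca.fit_transform(standardized)

for label, (pc1, pc2) in zip(sample_labels, scores):
    print(f"{label}: PC1={pc1:.2f}, PC2={pc2:.2f}")
print("explained variance ratio:", pca.explained_variance_ratio_)
```

With real data, `scores` could be passed to a seaborn scatter plot colored by `sample_labels` to produce a figure in the style of Fig. 3.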

Ethics Statement

Not applicable.

CRediT authorship contribution statement

Anna Mamaeva: Investigation, Writing – original draft. Andrey Knyazev: Investigation. Anna Glushkevich: Writing – review & editing, Visualization. Igor Fesenko: Formal analysis, Writing – review & editing, Data curation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships which have, or could be perceived to have, influenced the work reported in this article.

Specifications Table

Subject	Omics: Proteomics
Specific subject area	Plant quantitative proteomics
Type of data	Table
How data were acquired	Raw data were acquired by mass spectrometry using a Q Exactive HF benchtop Orbitrap mass spectrometer (Thermo Fisher Scientific) and an iTRAQ kit; analysis was performed using PEAKS Software 8.0
Data format	Raw and analyzed data
Description of data collection	The protonemata of WT, PSEP3 KO and PSEP3 OE mutant lines were grown in 200 ml liquid BCD medium supplemented with 5 mM ammonium tartrate (BCD-AT) under a 16 h photoperiod at 25 °C [9]. After 7 days of cultivation, protonemata were collected for analysis. The experiment was performed in three biological replicates.
Data source location	Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Moscow, Russia
Data accessibility	Data identification number: 10.17632/NNCZRR9Y32.1 Anna Glushkevich, 2021. Quantitative proteomic dataset of the moss Physcomitrium patens PSEP3 KO and OE mutant lines. Direct URL to data: https://data.mendeley.com/datasets/nnczrr9y32/1
Related research article	I. Fesenko, S.A. Shabalina, A. Mamaeva, A. Knyazev, A. Glushkevich, I. Lyapina, R. Ziganshin, S. Kovalchuk, D. Kharlampieva, V. Lazarev, M. Taliansky, E.V. Koonin, A vast pool of lineage-specific microproteins encoded by long non-coding RNAs in plants, Nucleic Acids Research, 49, (2021) 10328–10346, 10.1093/nar/gkab816

17 in total

1. PEAKS: powerful software for peptide de novo sequencing by tandem mass spectrometry.

Authors: Bin Ma; Kaizhong Zhang; Christopher Hendrie; Chengzhi Liang; Ming Li; Amanda Doherty-Kirby; Gilles Lajoie
Journal: Rapid Commun Mass Spectrom Date: 2003 Impact factor: 2.419

Review 2. Metacaspases.

Authors: L Tsiatsiani; F Van Breusegem; P Gallois; A Zavialov; E Lam; P V Bozhkov
Journal: Cell Death Differ Date: 2011-05-20 Impact factor: 15.828

3. mTORC1 and muscle regeneration are regulated by the LINC00961-encoded SPAR polypeptide.

Authors: Akinobu Matsumoto; Alessandra Pasut; Masaki Matsumoto; Riu Yamashita; Jacqueline Fung; Emanuele Monteleone; Alan Saghatelian; Keiichi I Nakayama; John G Clohessy; Pier Paolo Pandolfi
Journal: Nature Date: 2016-12-26 Impact factor: 49.962

4. A plant-specific dynamin-related protein forms a ring at the chloroplast division site.

Authors: Shin-ya Miyagishima; Keiji Nishida; Toshiyuki Mori; Motomichi Matsuzaki; Tetsuya Higashiyama; Haruko Kuroiwa; Tsuneyoshi Kuroiwa
Journal: Plant Cell Date: 2003-03 Impact factor: 11.277

5. The cancer-associated microprotein CASIMO1 controls cell proliferation and interacts with squalene epoxidase modulating lipid droplet formation.

Authors: Maria Polycarpou-Schwarz; Matthias Groß; Pieter Mestdagh; Johanna Schott; Stefanie E Grund; Catherina Hildenbrand; Joachim Rom; Sebastian Aulmann; Hans-Peter Sinn; Jo Vandesompele; Sven Diederichs
Journal: Oncogene Date: 2018-05-16 Impact factor: 9.867

6. Micropeptide CIP2A-BP encoded by LINC00665 inhibits triple-negative breast cancer progression.

Authors: Binbin Guo; Siqi Wu; Xun Zhu; Liyuan Zhang; Jieqiong Deng; Fang Li; Yirong Wang; Shenghua Zhang; Rui Wu; Jiachun Lu; Yifeng Zhou
Journal: EMBO J Date: 2019-11-22 Impact factor: 11.598

7. A vast pool of lineage-specific microproteins encoded by long non-coding RNAs in plants.

Authors: Igor Fesenko; Svetlana A Shabalina; Anna Mamaeva; Andrey Knyazev; Anna Glushkevich; Irina Lyapina; Rustam Ziganshin; Sergey Kovalchuk; Daria Kharlampieva; Vassili Lazarev; Michael Taliansky; Eugene V Koonin
Journal: Nucleic Acids Res Date: 2021-10-11 Impact factor: 16.971

8. Distinct types of short open reading frames are translated in plant cells.

Authors: Igor Fesenko; Ilya Kirov; Andrey Kniazev; Regina Khazigaleeva; Vassili Lazarev; Daria Kharlampieva; Ekaterina Grafskaia; Viktor Zgoda; Ivan Butenko; Georgy Arapidi; Anna Mamaeva; Vadim Ivanov; Vadim Govorun
Journal: Genome Res Date: 2019-08-06 Impact factor: 9.043

9. The Resistance Responses of Potato Plants to Potato Virus Y Are Associated with an Increased Cellular Methionine Content and an Altered SAM:SAH Methylation Index.

Authors: Nadezhda Spechenkova; Igor A Fesenko; Anna Mamaeva; Tatyana P Suprunova; Natalia O Kalinina; Andrew J Love; Michael Taliansky
Journal: Viruses Date: 2021-05-21 Impact factor: 5.048

10. System for stable β-estradiol-inducible gene expression in the moss Physcomitrella patens.

Authors: Minoru Kubo; Akihiro Imai; Tomoaki Nishiyama; Masaki Ishikawa; Yoshikatsu Sato; Tetsuya Kurata; Yuji Hiwatashi; Ralf Reski; Mitsuyasu Hasebe
Journal: PLoS One Date: 2013-09-27 Impact factor: 3.240