Transcriptomic and proteomic data in developing tomato fruit.

Isma Belouah¹, Camille Bénard¹, Alisandra Denton², Mélisande Blein-Nicolas³, Thierry Balliau³, Emeline Teyssier⁴, Philippe Gallusci⁴, Olivier Bouchez⁵, Björn Usadel², Michel Zivy³, Yves Gibon¹, Sophie Colombié¹.

Abstract

Transcriptomic and proteomic analyses were performed on three replicates of tomato fruit pericarp samples collected at nine developmental stages, each replicate resulting from the pooling of at least 15 fruits. For transcriptome analysis, Illumina-sequenced libraries were mapped on the tomato genome with the aim to obtain absolute quantification of mRNA abundance. To achieve this, spikes were added at the beginning of the RNA extraction procedure. From 34,725 possible transcripts identified in the tomato, 22,877 were quantified in at least one of the nine developmental stages. For the proteome analysis, label-free liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) was used. Peptide ions, and subsequently the proteins from which they were derived, were quantified by integrating the signal intensities obtained from extracted ion currents (XIC) with the MassChroQ software. Absolute concentrations of individual proteins were estimated for 2375 proteins by using a mixed effects model from log10-transformed intensities and normalized to the total protein content. Transcriptomics data are available via GEO repository with accession number GSE128739. The raw MS output files and identification data were deposited on-line using the PROTICdb database (http://moulon.inra.fr/protic/tomato_fruit_development) and MS proteomics data have also been deposited to the ProteomeXchange with the dataset identifier PXD012877. The main added value of these quantitative datasets is their use in a mathematical model to estimate protein turnover in developing tomato fruit.

Keywords: Absolute quantification; Pericarp; Protein turnover; Proteomics; Time-series; Tomato fruit development; Transcriptomics

Year: 2019; PMID: 31909114; PMCID: PMC6938935; DOI: 10.1016/j.dib.2019.105015

Source DB: PubMed; Journal: Data Brief; ISSN: 2352-3409

Data description

Tomato plants (Solanum lycopersicum cv. Moneymaker) were grown under commercial production conditions in a greenhouse in south-west France. Samples were taken from the pericarp of tomato fruits at nine stages of fruit development, on the 5th, 6th and 7th trusses [1] (Fig. 1). Transcriptomic and proteomic analyses were performed on these samples. Hierarchical clustering (Fig. 2) and principal component analyses (Fig. 3) provide an overview of the transcriptome and proteome changes throughout tomato fruit development. Transcriptomic data are available via GEO with accession number GSE128739. For proteomics, raw files and identification data are available online in the PROTICdb database (http://moulon.inra.fr/protic/tomato_fruit_development) and in ProteomeXchange with identifier PXD012877; quantitative protein data are provided in tables. All these data have been used to model protein turnover [2] and to study redox metabolism in the developing tomato fruit [3].

Fig. 1

Experimental setup. (A) The nine sampled stages with the corresponding physiological phases of tomato fruit development (Solanum lycopersicum cv. ‘Moneymaker’). (B) Description of the analyzed tissue, the pericarp, composed of endocarp, mesocarp and exocarp, in tomato fruit at the last stage of development.

Fig. 2

Hierarchical clustering analysis of (A) transcript and (B) protein concentrations from tomato at nine developmental stages. The hierarchical clustering analysis was performed using Pearson's correlation on mean-centered and scaled data, with the plyr, gplots and reshape2 packages in RStudio (R 3.3.2; http://www.rstudio.com/).

Fig. 3

Principal component analysis of (A) transcriptomics and (B) proteomics data (fmol gFW⁻¹). Data were mean-centered and scaled. Developmental stages and replicates are distinguished by colors and shapes. Principal component analysis was performed using the factoextra and gplots packages in RStudio (R 3.3.2; http://www.rstudio.com/).
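The figures were produced in R. As a rough illustration of the preprocessing named in the captions, the Python sketch below mean-centers and scales each profile and derives a Pearson-based distance (1 − r) of the kind a hierarchical clustering could take as input. It is not the authors' script: the function names, the choice of 1 − r as the distance and the example profiles are assumptions made for illustration.

```python
import math

def zscore(values):
    """Mean-center and scale one profile (sample standard deviation, n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    # A constant profile has sd == 0 and cannot be scaled; filter such rows first.
    return [(v - mean) / sd for v in values]

def pearson(x, y):
    """Pearson correlation computed from the two z-scored profiles."""
    zx, zy = zscore(x), zscore(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)

def correlation_distance(profiles):
    """Pairwise distance 1 - r between named profiles (assumed distance choice)."""
    names = list(profiles)
    return {(a, b): 1.0 - pearson(profiles[a], profiles[b])
            for a in names for b in names}

# Hypothetical concentrations of three genes across four stages (invented values).
example = {
    "gene_up":   [1.0, 2.0, 4.0, 8.0],
    "gene_up2":  [10.0, 20.0, 40.0, 80.0],
    "gene_down": [8.0, 4.0, 2.0, 1.0],
}
distances = correlation_distance(example)
print(round(distances[("gene_up", "gene_up2")], 6))   # same shape, different scale
print(round(distances[("gene_up", "gene_down")], 6))  # opposite trend
```

Because the profiles are scaled first, two genes with the same temporal shape but very different absolute concentrations end up at a distance close to zero.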

Experimental design, materials, and methods

Plant material

Tomato plants (Solanum lycopersicum cv. Moneymaker) were cultivated in a greenhouse at Sainte-Livrade (south-west France, 44° 23′ 56″ N, 0° 35′ 25″ E) under commercial practice conditions between June and October 2010. Lateral stems were systematically removed to promote flowering, and trusses were pruned to six fruits to limit fruit size heterogeneity. Based on age and color (OECD color gauge), fruits were harvested at nine stages expressed in days post anthesis (DPA), from green/young to red/ripened fruit (8, 15, 21, 28, 34, 42, 48, 50 and 53 DPA; Fig. 1). Each biological replicate was prepared from 15–50 fruits harvested on different plants but on the same truss, which was numbered according to its order of appearance on the plant, i.e. truss 5, 6 or 7. Gel and placenta were quickly removed, then 1 cm² of the equatorial pericarp zone was cut into small pieces that were immediately shock-frozen in liquid nitrogen. Frozen samples were transported in a dry shipper, then ground into a fine powder in liquid nitrogen using a bead mill and stored at −80 °C. In total, 26 samples were analyzed, with only two biological replicates for the 48 DPA stage.

Transcriptomics

Total RNA extraction

Total RNA was isolated from 100 mg fresh weight aliquots of the frozen powdered samples using Plant RNA Reagent (PureLink kit, Invitrogen™) followed by DNase treatment (DNA-free kit, Invitrogen™) and purification over RNeasy Mini spin columns (RNeasy Plant Mini kit, QIAGEN) following the manufacturer's instructions. Total RNA concentration was determined by spectrophotometry (260 nm), considering that an absorbance of 1 unit equals 40 μg of RNA per ml. RNA quality was determined by estimating the RNA integrity number (RIN) with an RNA 6000 Nano kit (Agilent) and an Agilent 2100 Bioanalyzer. A RIN of ‘10’ stands for non-degraded RNA whereas a RIN of ‘1’ stands for completely degraded RNA. A subsample of at least 5 μg of total RNA from each of the 26 RNA extracts was sent to the Get-Plage GenoTOUL facility (Toulouse, France). To determine the absolute concentration of transcripts after transcriptome sequencing, eight internal standards (AM 1780, Ambion by Life technologies, Array Control RNA spikes, Invitrogen™) at selected concentrations (in mole, 3.97 × 10⁻¹⁴ [spike 1], 4.01 × 10⁻¹⁵ [spike 2], 4.01 × 10⁻¹⁶ [spike 3], 4.02 × 10⁻¹⁷ [spike 4], 4.08 × 10⁻¹⁸ [spike 5], 4.04 × 10⁻¹⁹ [spike 6], 3.82 × 10⁻²⁰ [spike 7], and 3.82 × 10⁻²¹ [spike 8]) were spiked into the plant extracts at the beginning of the RNA purification process.
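As a quick worked example of the two conversions in this paragraph, the sketch below turns an A260 reading into an RNA concentration using the stated factor (1 absorbance unit = 40 μg ml⁻¹) and converts the spiked-in amounts from moles to numbers of molecules. The function names, the dilution-factor argument and the example absorbance are illustrative assumptions, not part of the original protocol.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)

def rna_conc_ug_per_ml(a260, dilution_factor=1.0):
    """RNA concentration from absorbance at 260 nm (1 A260 unit = 40 ug/ml)."""
    return a260 * 40.0 * dilution_factor

def moles_to_molecules(moles):
    """Number of molecules corresponding to an amount in moles."""
    return moles * AVOGADRO

# Spiked-in amounts (mol) as listed in the text, spike 1 to spike 8.
SPIKE_MOLES = [3.97e-14, 4.01e-15, 4.01e-16, 4.02e-17,
               4.08e-18, 4.04e-19, 3.82e-20, 3.82e-21]

# Hypothetical reading: A260 = 0.5 on an undiluted extract.
print(rna_conc_ug_per_ml(0.5), "ug/ml")
for number, amount in enumerate(SPIKE_MOLES, start=1):
    print(f"spike {number}: {moles_to_molecules(amount):.3g} molecules")
```

The eight spikes span about seven orders of magnitude, from roughly 2 × 10¹⁰ molecules for spike 1 down to a few thousand for spike 8.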

Transcript sequencing

RNA-seq libraries were prepared according to Illumina's protocols on a Tecan EVO200 liquid handler using the Illumina TruSeq Stranded mRNA sample prep kit to analyze mRNA. Briefly, mRNA was selected using poly-T beads. The mRNA was then fragmented and converted to double-stranded cDNA, and adaptors were ligated for sequencing. Ten cycles of PCR were applied to amplify the libraries. Before the libraries were quantified by qPCR (Kapa Library Quantification Kit), their quality was evaluated using an Agilent Bioanalyzer. RNA-seq experiments were performed on an Illumina HiSeq2000 or HiSeq2500 sequencer using a paired-end read length of 2 × 100 bp with the Illumina TruSeq SBS sequencing kits v3.

Transcriptome analysis and quantification

Reads were mapped to the Solanum lycopersicum HEINZ assembly v2.40, concatenated with the chloroplast (gi|544163592|ref|NC_007898.3|) and mitochondrial genomes (gi|209887431|gb|FJ374974.1|), and an “artificial chromosome” containing the 8 spike sequences (Supplemental Appendix S1). Genome data were downloaded from S. lycopersicum 2.5 and the corresponding ITAG2.4 gene models were downloaded from https://solgenomics.net/ (34,725 entries). The quality of library sequencing was checked with FastQC [4]. Quality and adapter trimming was performed with Trimmomatic [5] v0.32. Trimmed reads were mapped to their respective genomes with STAR [6] v2.4.2a and the unique counts per locus were quantified with HTSeq [7] v0.6.1. The number of transcripts per million (TPM) was calculated from the unique counts and gene length. The normalized number of fragments per kilobase per million (FPKM) was calculated with Cufflinks v2.2.1. Briefly, FPKM quantification normalizes the fragment count of each gene by the sequencing depth (the summed fragments per sample divided by one million) and then by the gene length. Non-default parameters that were used are presented in Supplemental Appendix S1. FPKM values were then converted to TPM to obtain relative transcript abundances that are comparable among samples. Spikes were quantified like any other transcript. In order to preserve the native dynamics of RNA concentration through tomato fruit development (highest concentration before the expansion phase), a standard curve was calculated for each sample. Each standard curve was determined from the spiked-in concentrations and the corresponding TPM values of the spikes.
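The last two steps, converting FPKM to TPM and building one standard curve per sample from the spikes, can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the text does not state the form of the standard curve, so a straight line in log10-log10 space is assumed here, undetected spikes (TPM of 0) are dropped, and all function names and example values are hypothetical.

```python
import math

def fpkm_to_tpm(fpkm):
    """Rescale the FPKM values of one sample so that they sum to one million."""
    total = sum(fpkm.values())
    return {gene: value / total * 1e6 for gene, value in fpkm.items()}

def fit_standard_curve(spike_tpm, spike_mol):
    """Least-squares fit of log10(mol) = slope * log10(TPM) + intercept.

    Assumption: a log-log linear curve; spikes with TPM == 0 are ignored.
    """
    points = [(math.log10(t), math.log10(m))
              for t, m in zip(spike_tpm, spike_mol) if t > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def tpm_to_mol(tpm, slope, intercept):
    """Predict an amount (mol per extract) from a TPM value with the fitted curve."""
    return 10 ** (slope * math.log10(tpm) + intercept)

# Hypothetical sample: four spikes, the least abundant one undetected.
spike_mol = [3.97e-14, 4.01e-15, 4.01e-16, 4.02e-17]
spike_tpm = [5200.0, 540.0, 51.0, 0.0]  # invented TPM values
slope, intercept = fit_standard_curve(spike_tpm, spike_mol)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"TPM 100 -> {tpm_to_mol(100.0, slope, intercept):.3g} mol")
```

Fitting one curve per sample, rather than a single curve for the whole experiment, is what lets the TPM values (which sum to the same total in every sample) be turned back into amounts that can differ between developmental stages. The predicted amount refers to the extracted aliquot and would still need to be divided by the fresh weight used.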

Proteomics

Total protein extraction

Proteins were extracted by phenol extraction using a modified protocol described by Faurobert et al. [8]. Frozen powder of pericarp tissue (100 mg) was suspended in 10 ml of extraction buffer (0.5 M Tris-HCl pH 7.5, 0.7 M sucrose, 50 mM EDTA, 0.1 M KCl, 10 mM thiourea, 2 mM phenylmethylsulfonyl fluoride, 2% 2-mercaptoethanol). Then an equal volume of water-saturated phenol pH 8 (Ambion) was added and the mixture was incubated with steel beads on a shaker for 30 min at 4 °C. After 30 min centrifugation (12,000 g at 4 °C), the phenol phase was recovered and transferred into a new tube with 10 ml of extraction buffer, followed by shaking without steel beads and centrifugation (30 min, 12,000 g, 4 °C). The phenol phase was recovered and proteins were precipitated by adding the equivalent of five volumes of cold methanol and 0.1 M ammonium acetate, and incubated overnight at −20 °C. After 30 min centrifugation (10,000 g, 4 °C), the protein pellet was gently washed with methanol and then with cold acetone before being dried in a fume hood. Proteins were then solubilized in 6 M urea, 2 M thiourea, 30 mM Tris-HCl pH 8.8, 10 mM dithiothreitol, 0.1% (v/v) zwitterionic acid labile surfactant I (Protea), then quantified using the Plusone 2D Quant kit (GE Healthcare). Proteins were incubated at room temperature for 30 min, then alkylated with 50 mM iodoacetamide for 60 min in the dark at room temperature. Proteins were diluted ten times in 50 mM ammonium bicarbonate buffer to decrease total urea and thiourea concentrations, and then digested overnight at 37 °C with 800 ng trypsin. Trypsin digestion was stopped by acidification with 1% (w/v) trifluoroacetic acid. The resulting peptides were purified by solid phase extraction using a polymeric C18 column (Phenomenex) with a washing solution containing 0.06% (v/v) acetic acid and 3% (v/v) acetonitrile. After elution with 0.06% acetic acid and 40% acetonitrile, peptides were dried under vacuum (Speedvac).

Protein LC-MS/MS analyses

As described in Belouah et al. [2], LC-MS/MS analyses were performed using a NanoLC-Ultra System (nano2DUltra, Eksigent, Les Ulis, France) connected to a Q-Exactive mass spectrometer (Thermo Electron, Waltham, MA, USA). For each sample, about 800 ng of protein digest were loaded onto a Biosphere C18 precolumn (0.1 × 20 mm, 100 Å, 5 μm; Nanoseparation) at 7.5 μl min⁻¹ and desalted with 0.1% formic acid and 2% acetonitrile. After 3 min, the pre-column was connected to a Biosphere C18 nanocolumn (0.075 × 300 mm, 100 Å, 3 μm; Nanoseparation). Electrospray ionization was performed at 1.3 kV with an uncoated capillary probe (10 μm tip inner diameter; New Objective, Woburn, MA, USA). Buffers were 0.1% formic acid in water (A) and 0.1% formic acid and 100% acetonitrile (B). Peptides were separated using a linear gradient from 5 to 35% buffer B for 110 min at 300 nl min⁻¹. One run took 120 min, including the regeneration step at 95% buffer B and the equilibration step at 100% buffer A. Peptide ions were analyzed using Xcalibur 2.1 (Thermo Electron) with the following data-dependent acquisition steps: (1) MS scan (mass-to-charge ratio (m/z) 300 to 1,400, 70,000 resolution, profile mode), (2) MS/MS (17,500 resolution, normalized collision energy of 30, profile mode). Step 2 was repeated for the eight major ions detected in step 1. Dynamic exclusion was set to 30 seconds. Xcalibur raw datafiles were transformed to mzXML open source format using msconvert software in the ProteoWizard 3.0.3706 package [9]. During conversion, MS and MS/MS data were centroided. The raw MS output files were deposited online using the PROTICdb database [10], [11], [12].

Protein identification

Protein identification was performed using the protein sequence database of S. lycopersicum Heinz assembly v2.40 (ITAG2.4) downloaded from https://solgenomics.net/ (34,725 entries). A contaminant database containing the sequences of standard contaminants was also interrogated (58 entries including, e.g., trypsin, keratin, and serum albumin). The decoy database comprised the reverse sequences of tomato proteins. Database search was performed with X!Tandem (version 2015.04.01.1; http://www.thegpm.org/TANDEM/) with the following settings. Carboxyamidomethylation of cysteine residues was set as a static modification. Oxidation of methionine residues, and acetylation or deamination of glutamine and cysteine residues, were set as possible modifications. Precursor mass precision was set to 10 ppm. Fragment mass tolerance was 0.02 Th. Only peptides with an E-value smaller than 0.05 were reported. Identified proteins were filtered and sorted using X!TandemPipeline (version 3.3.4, [13]). The criteria used for protein identification were (1) at least two different peptides identified with an E-value smaller than 0.01, and (2) a protein E-value (product of unique peptide E-values) smaller than 10⁻⁵.
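The two acceptance criteria and the 10 ppm precursor tolerance can be illustrated with a short sketch. It is a simplified stand-in for what X!TandemPipeline does internally, not its actual code: the function names are hypothetical, and it is assumed here that the protein E-value is the product over the unique peptides that pass the 0.01 threshold.

```python
import math

PEPTIDE_EVALUE_MAX = 0.01  # per-peptide threshold used for protein identification
PROTEIN_EVALUE_MAX = 1e-5  # threshold on the product of unique peptide E-values
MIN_PEPTIDES = 2           # at least two different peptides

def protein_is_identified(peptide_evalues):
    """peptide_evalues maps each unique peptide sequence to its best E-value."""
    valid = [e for e in peptide_evalues.values() if e < PEPTIDE_EVALUE_MAX]
    if len(valid) < MIN_PEPTIDES:
        return False
    # Assumption: only the peptides passing the threshold enter the product.
    return math.prod(valid) < PROTEIN_EVALUE_MAX

def ppm_window(mz, ppm=10.0):
    """Lower and upper m/z bounds for a relative tolerance given in ppm."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta

# Hypothetical proteins with invented peptide E-values.
print(protein_is_identified({"PEPTIDEA": 0.001, "PEPTIDEB": 0.002}))
print(protein_is_identified({"PEPTIDEA": 0.005, "PEPTIDEB": 0.005}))
print(protein_is_identified({"PEPTIDEA": 1e-8}))
print(ppm_window(1000.0))
```

The third case shows why both criteria matter: a single peptide with a very low E-value satisfies the product threshold but still fails the two-peptide rule.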

Peptide and protein quantification

Peptide ions were quantified using extracted ion chromatograms (XIC) and the MassChroQ software [14] version 2.2 with the following parameters: “ms2_1” alignment method, tendency_halfwindow of 10, MS1 smoothing halfwindow of 0, MS2 smoothing halfwindow of 15, “quant1” quantification method, XIC extraction based on max, min and max ppm range of 10, anti-spike half of 5, mean filter half edge, minmax_half_edge and maxmin_half_edge respectively set to 2, 4, and 3, detection thresholds on min and max at 30,000 and 50,000, respectively, and peak post-matching mode. Peptide intensities of each sample were normalized using the peptide intensities of a reference sample. For the reference sample, peptide extracts of the 26 samples were pooled and analyzed (identification, quantification) using the same pipeline as for each individual sample. After removing shared and dubious peptide ions (standard deviation of retention time higher than 30 seconds), proteins were quantified with a method named Model [15]. Briefly, peptide ion intensities were log10-transformed and protein abundances were estimated using a mixed effects model. Abundances of proteins are given in Table 1. Absolute quantification was approximated with the “Total Protein Amount” approach [16], which rests on the main hypothesis that the sum of the MS signal corresponds to the total protein content in the cell. The concentration of each protein is then determined from its relative abundance within the total protein content (Equation (1)):

\[ [\mathrm{Protein}_i]_k = \frac{A_{i,k}}{\sum_{j=1}^{n} A_{j,k}} \times \frac{(\text{Total protein content})_k}{MW_i} \times 10^{15} \qquad (1) \]

with [Protein_i]_k the concentration of each protein i (i = 1:2494) in the sample k (k = 1:26) in fmol gFW⁻¹, A_{i,k} the abundance of protein i in sample k, n the total number of proteins (n = 2494), (Total protein content)_k the total amount of proteins in the sample k in g gFW⁻¹ and MW_i the molar weight (in g mol⁻¹) of protein i. The equation was lost from this copy of the text and is reconstructed here from these definitions; the symbol A_{i,k} and the explicit factor 10¹⁵, which converts mol to fmol, follow from the stated units. Total protein quantification is given in Table 2 and the proxy of absolute concentration of proteins is given in Table 3.
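The arithmetic of this “Total Protein Amount” step can be sketched in a few lines. The sketch assumes that protein abundances are on a linear (not log10) scale and that the share of a protein in the summed signal equals its share of the total protein mass. The function name and the example values are hypothetical.

```python
FMOL_PER_MOL = 1e15

def absolute_concentrations(abundance, molar_weight, total_protein_g_per_gfw):
    """Approximate concentration (fmol per g FW) of each protein in one sample.

    abundance: protein -> MS-derived abundance on a linear scale (assumption)
    molar_weight: protein -> molar weight in g/mol
    total_protein_g_per_gfw: measured total protein content of the sample
    """
    total_signal = sum(abundance.values())
    concentrations = {}
    for protein, signal in abundance.items():
        mass_fraction = signal / total_signal           # share of total protein mass
        grams = mass_fraction * total_protein_g_per_gfw  # g of this protein per g FW
        concentrations[protein] = grams / molar_weight[protein] * FMOL_PER_MOL
    return concentrations

# Hypothetical sample with two proteins and 10 mg protein per g FW.
abundance = {"protA": 1.0, "protB": 3.0}
molar_weight = {"protA": 50_000.0, "protB": 25_000.0}
conc = absolute_concentrations(abundance, molar_weight, 0.01)
for protein, value in conc.items():
    print(f"{protein}: {value:.3g} fmol/gFW")
```

By construction, multiplying each concentration by its molar weight and summing over all proteins gives back the measured total protein content, which is a convenient sanity check on real data.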

Specifications Table

Subject	Plant Science
Specific subject area	Plant physiology, transcriptomic and proteomic quantitative data, tomato fruit development
Type of data	Tables and Figures
How data were acquired	Illumina-sequenced libraries for transcriptomics. Label-free LC-MS/MS for proteomics.
Data format	Raw and transformed into quantitative concentrations (fmol gFW⁻¹) for both transcripts and proteins.
Parameters for data collection	Total proteins and transcripts were extracted from the fleshy part of the tomato fruit pericarp at 9 developmental stages, i.e. at 8, 15, 21, 28, 34, 42, 48, 50 and 53 days post-anthesis.
Description of data collection	Tomato plants were grown in a greenhouse under optimal conditions of commercial production. One sample results from the pooling of at least 15 fruits. Replicates 1, 2 and 3 correspond to the 5th, 6th and 7th truss respectively.
Data source location	INRA France.
Data accessibility	Transcriptomics data are available via Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo), an international public repository, with accession number GSE128739. For proteomics data, the raw MS output files and identification data were deposited online in the PROTICdb database (http://moulon.inra.fr/protic/tomato_fruit_development). The proteomics data have also been deposited in ProteomeXchange with the dataset identifier PXD012877 (http://proteomecentral.proteomexchange.org).
Related research article	[2] Isma Belouah, Christine Nazaret, Pierre Pétriacq, Sylvain Prigent, Camille Bénard, Virginie Mengin, Mélisande Blein-Nicolas, Alisandra K. Denton, Thierry Balliau, Ségolène Augé, Olivier Bouchez, Jean-Pierre Mazat, Mark Stitt, Björn Usadel, Michel Zivy, Bertrand Beauvoit, Yves Gibon, Sophie Colombié. Modeling Protein Destiny in Developing Fruit. Plant Physiology 2019. DOI: https://doi.org/10.1104/pp.19.00086

Value of the data

• Paired quantitative transcript-protein data with sufficient time resolution are rather rare, which makes this a valuable dataset for the plant science community.

• The dataset should be of interest to researchers looking for time-series and quantitative data on both transcripts and proteins.

• The dataset offers great potential not only to compute protein turnover rates but also to deduce regulatory mechanisms and identify candidate genes.

1. Hélène Ferry-Dumazet; Gwenn Houel; Pierre Montalent; Luc Moreau; Olivier Langella; Luc Negroni; Delphine Vincent; Céline Lalanne; Antoine de Daruvar; Christophe Plomion; Michel Zivy; Johann Joets. PROTICdb: a web-based application to store, track, query, and compare plant proteome data. Proteomics, 2005-05.

2. Benoît Valot; Olivier Langella; Edlira Nano; Michel Zivy. MassChroQ: a versatile tool for mass spectrometry quantification. Proteomics, 2011-08-04.

3. Olivier Langella; Benoît Valot; Daniel Jacob; Thierry Balliau; Raphaël Flores; Christine Hoogland; Johann Joets; Michel Zivy. Management and dissemination of MS proteomic data with PROTICdb: example of a quantitative comparison between methods of protein extraction. Proteomics, 2013-04-05.

4. Olivier Langella; Benoît Valot; Thierry Balliau; Mélisande Blein-Nicolas; Ludovic Bonhomme; Michel Zivy. X!TandemPipeline: A Tool to Manage Sequence Redundancy for Protein Inference and Phosphosite Identification. J Proteome Res, 2016-12-19.

5. Isma Belouah; Christine Nazaret; Pierre Pétriacq; Sylvain Prigent; Camille Bénard; Virginie Mengin; Mélisande Blein-Nicolas; Alisandra K Denton; Thierry Balliau; Ségolène Augé; Olivier Bouchez; Jean-Pierre Mazat; Mark Stitt; Björn Usadel; Michel Zivy; Bertrand Beauvoit; Yves Gibon; Sophie Colombié. Modeling Protein Destiny in Developing Fruit. Plant Physiol, 2019-04-23.

6. Mireille Faurobert; Esther Pelpoir; Jamila Chaïb. Phenol extraction of proteins for proteomic studies of recalcitrant plant tissues. Methods Mol Biol, 2007.

7. Darren Kessner; Matt Chambers; Robert Burke; David Agus; Parag Mallick. ProteoWizard: open source software for rapid proteomics tools development. Bioinformatics, 2008-07-07.

8. Simon Anders; Paul Theodor Pyl; Wolfgang Huber. HTSeq--a Python framework to work with high-throughput sequencing data. Bioinformatics, 2014-09-25.

9. Guillaume Decros; Bertrand Beauvoit; Sophie Colombié; Cécile Cabasson; Stéphane Bernillon; Stéphanie Arrivault; Manuela Guenther; Isma Belouah; Sylvain Prigent; Pierre Baldet; Yves Gibon; Pierre Pétriacq. Regulation of Pyridine Nucleotide Metabolism During Tomato Fruit Development Through Transcript and Protein Profiling. Front Plant Sci, 2019-10-11.

10. Jacek R Wiśniewski; Marco Y Hein; Jürgen Cox; Matthias Mann. A "proteomic ruler" for protein copy number and concentration estimation without spike-in standards. Mol Cell Proteomics, 2014-09-15.
