UNcleProt (Universal Nuclear Protein database of barley): The first nuclear protein database that distinguishes proteins from different phases of the cell cycle.

Nicolas Blavet¹, Jana Uřinovská², Hana Jeřábková¹, Ivo Chamrád², Jan Vrána¹, René Lenobel², Jana Beinhauer², Marek Šebela², Jaroslav Doležel¹, Beáta Petrovská¹.

Abstract

Proteins are the most abundant component of the cell nucleus, where they perform a plethora of functions, including the assembly of long DNA molecules into condensed chromatin, DNA replication and repair, regulation of gene expression, synthesis of RNA molecules and their modification. Proteins are important components of nuclear bodies and are involved in the maintenance of the nuclear architecture, transport across the nuclear envelope and cell division. Given their importance, the current poor knowledge of plant nuclear proteins and their dynamics during the cell's life and division is striking. Several factors hamper the analysis of the plant nuclear proteome, but the most critical seems to be the contamination of nuclei by cytosolic material during their isolation. With the availability of an efficient protocol for the purification of plant nuclei, based on flow cytometric sorting, contamination by cytoplasmic remnants can be minimized. Moreover, flow cytometry allows the separation of nuclei in different stages of the cell cycle (G1, S, and G2). This strategy has led to the identification of a large number of nuclear proteins from barley (Hordeum vulgare), thus triggering the creation of a dedicated database called UNcleProt (http://barley.gambrinus.ueb.cas.cz/).

Keywords: barley; cell cycle; database; flow-cytometry; localization; mass spectrometry; nuclear proteome; nucleus

Year: 2016 PMID： 27813701 PMCID： PMC5287097 DOI： 10.1080/19491034.2016.1255391

Source DB: PubMed Journal: Nucleus ISSN： 1949-1034 Impact factor: 4.197

Introduction

The nucleus is a key feature of eukaryotic cells and is characterized by the high complexity and dynamic organization of its components. Inside the nucleus, the nuclear genome is organized both in a hierarchical manner and within discrete territories. In addition to DNA, the cell nucleus contains nuclear bodies and specialized sub-nuclear compartments (for a review see refs.). Nuclear compartmentalization reflects the spatial arrangement of the genome and the DNA-related processes that occur in this organelle. Next to DNA and RNA, the most abundant class of molecules present in the nucleus are nuclear proteins. They play a major role in DNA assembly and its packing in the small space of the nuclear volume. The complex network of nuclear proteins performs diverse functions that are essential for maintaining dynamically changing genome organization and regulating gene expression. Proteins also form the main building blocks of nuclear membrane pores, constitute lamina or lamina-like structures that modulate the size and shape of the nucleus, play an important role in chromatin organization and gene expression, and connect the nuclear lamina to the cytoskeleton, which is necessary for both nuclear positioning and migration. Though proteins represent an abundant and indispensable part of the nucleus, knowledge of the plant nuclear proteome remains limited. For that reason, scientists associated with the International Plant Nucleus Consortium have, since its establishment, been focusing on the detailed characterization of specific nuclear proteins and overall nuclear protein composition to obtain more information about the complex nuclear machinery. The first list of identified plant nuclear proteins was published in 2003 in connection with the sequenced genome of Arabidopsis thaliana.
Since then, the nuclear proteomes from different types of tissues have been studied in several species (for a review, see ref.), including Arabidopsis, rice, hot pepper, chickpea, barrel clover, maize, black-stick lily, soybean and, just recently, wheat. The main aim of these studies was the analysis of nuclear proteome changes in response to abiotic and biotic stresses. Petrovská et al. used a new approach to purify the cell nuclei of barley (Hordeum vulgare) and identified 803 proteins in G1-phase nuclei. In a subsequent study, the team analyzed proteins from S- and G2-phase nuclei, which led to a dramatic increase in the total number of identified nuclear proteins of barley (Uřinovská et al., unpublished data). To organize this large amount of data, a database system of barley nuclear proteins became necessary. Here, we report on the creation of a barley (H. vulgare L., cv. Morex) nuclear protein database (UNcleProt, http://barley.gambrinus.ueb.cas.cz/) that provides information about the barley nuclear proteins that have been identified to date. The acronym UNcleProt is based on a truncated anagram “uncle” from the term “nuclear.” Besides containing the anagram, the term UNcleProt has a self-explanatory feature: “U” stands for “Universal” (3 phases of the cell cycle are covered), “Ncle” for “Nuclear” and “Prot” for proteins. UNcleProt is freely available. In addition to survey sequences, a reference barley genome with fully assembled and annotated chromosomes is being completed, and the barley nuclear proteome database will be its perfect complement. Together, the 2 resources will represent one of the most complete data sources on the barley nuclear proteome.

Results

The UNcleProt database includes 6 main pages: Home, Browse, Query, Blast, Download and Remarks, which are accessible via the navigation bar. The Home page provides information on the database with a short summary and 2 pictures: a Venn diagram showing the number of proteins identified thus far in each phase of the cell cycle (Fig. 1), the number of peptides detected, and a graph showing the most abundant gene ontology (GO) terms, available with Uniprot protein annotations, for each cell cycle phase (Fig. 2). A link to the first related article from our laboratory by Petrovská et al. is also provided. The Browse page contains a table with all proteins identified in the nuclei from different phases of the cell cycle (G1, S and G2) and shows their accession numbers and names, provides information on the cell cycle phase during which the proteins occur and finally presents a link to access the respective Protein and Peptide Information page for each selected protein (described below). The Query page (Fig. 3A) allows the user to search the database for accession numbers or keywords, and the result is displayed as a table similar to the one generated by the Browse page (Fig. 3B). The Blast page makes it possible to run a BLAST search using the blastp or blastx programs against each identified protein sequence. The Download page allows the set of proteins identified for each cell cycle phase to be retrieved either separately or all together. Finally, the Remarks page provides technical information on the mass spectrometry results.
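The unique and shared counts shown in a three-set Venn diagram such as Fig. 1 can be derived from per-phase lists of accession numbers. The following is a minimal Python sketch of that bookkeeping, not the code used by UNcleProt; the function name `venn_regions` and the accession strings are hypothetical placeholders.

```python
from itertools import combinations


def venn_regions(phase_sets):
    """Map each combination of phases to the accessions found in exactly
    those phases and in no other phase."""
    phases = sorted(phase_sets)
    regions = {}
    for r in range(1, len(phases) + 1):
        for combo in combinations(phases, r):
            # Accessions present in every phase of this combination.
            inside = set.intersection(*(phase_sets[p] for p in combo))
            # Accessions present in any phase outside the combination.
            others = [phase_sets[p] for p in phases if p not in combo]
            outside = set().union(*others) if others else set()
            regions[combo] = inside - outside
    return regions


# Hypothetical accessions, for illustration only.
example = {
    "G1": {"P1", "P2", "P3"},
    "S": {"P2", "P3", "P4"},
    "G2": {"P3", "P5"},
}
regions = venn_regions(example)
for combo, accessions in regions.items():
    print("+".join(combo), len(accessions))
```

Each protein lands in exactly one region, so the region sizes add up to the total number of distinct proteins.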

Figure 1.

A Venn diagram showing the number of barley nuclear proteins identified in 3 different phases of cell cycle (unique and shared identifications).

Figure 2.

Distribution of the 10 most abundant GO terms in all 3 categories: biological process, molecular function and cellular component. GO terms were extracted from Uniprot annotations. The results are shown separately for each of the cell cycle phases.

Figure 3.

Query search and result pages: (A) a screenshot of the Query page, where searching is possible by either accession number or keyword; (B) a screenshot of the Query page with a table containing database search results.

In addition to the primary page layout, the Protein and Peptide Information page provides detailed information on the identified proteins (Fig. 4). The data are presented in 2 tables: the first summarizes information available from the UniProt database, such as the ID number with a hyperlink, protein name, sequence length, keywords, database cross-references such as GO and protein domain IDs, and then a sequence coverage value based on the identification by mass spectrometry (MS) and the whole amino acid sequence, in which the coverage by sequenced peptides is highlighted in red. Information on the availability of a known 3D structure is accessible through links to the appropriate databases: the Database of Comparative Protein Structure Models (ModBase) and the Database of Protein Disorder and Mobility Annotations (MobiDB, created by BiocomputingUP Lab, University of Padua, Italy). However, because a majority of the barley amino acid sequences have not yet been used for structure modeling, a model calculation in the ModBase usually takes approximately 2 d.
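A sequence coverage value of the kind described above is commonly computed as the fraction of residues that fall inside at least one identified peptide. The sketch below illustrates this in Python under that assumption; it is not taken from the UNcleProt code base, the sequence and peptides are made up, and square brackets stand in for the red highlighting.

```python
def sequence_coverage(protein_seq, peptides):
    """Return (fraction of residues covered, per-residue boolean mask)."""
    covered = [False] * len(protein_seq)
    for pep in peptides:
        # Mark every occurrence of the peptide, including overlapping ones.
        start = protein_seq.find(pep)
        while start != -1:
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein_seq.find(pep, start + 1)
    fraction = sum(covered) / len(protein_seq) if protein_seq else 0.0
    return fraction, covered


def highlight(protein_seq, covered):
    """Wrap covered stretches in square brackets."""
    out = []
    inside = False
    for residue, is_covered in zip(protein_seq, covered):
        if is_covered and not inside:
            out.append("[")
        elif not is_covered and inside:
            out.append("]")
        inside = is_covered
        out.append(residue)
    if inside:
        out.append("]")
    return "".join(out)


# Made-up sequence and peptides, for illustration only.
seq = "MKTAYIAKQRQISFVK"
peps = ["TAYIAK", "QISFVK"]
frac, mask = sequence_coverage(seq, peps)
print(f"{frac:.0%}", highlight(seq, mask))
```

Exact string matching is the simplest choice here; peptides carrying modifications or isobaric substitutions would need a more tolerant mapping.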

Figure 4.

Protein and Peptide Information page. This page presents information related to a particular protein, including the corresponding MS data and amino acid sequences of the identified peptides.

Most of the identified barley nuclear proteins are annotated as uncharacterized or predicted [62% (1505) and 24% (594), respectively] in UniProt. For such proteins, a table row called GenBank might contain the 10 best hits obtained from the GenBank nucleotide database (nt). At the end of the table, a summary of the MS results with the corresponding Mascot score and the number of peptides detected for each cell cycle phase is presented. Importantly, a vast majority of uncharacterized or predicted proteins could be annotated by looking for orthologous sequences using blastp or conserved domain searches, and this information will gradually be added and will appear in new releases of the database. Fewer than 5% of the total number of proteins identified in each cell cycle phase have not yet been provided with at least such an annotation. To validate the subcellular localization of the unknown, predicted or uncharacterized proteins identified by MS, 2 subcellular and one subnuclear localization programs have been used. These predictions are shown in the first table (“Protein Information”). Based on the accuracy of 6 subcellular localization programs tested by Xiong et al., we chose Plant-mPLoc (http://www.csbio.sjtu.edu.cn/bioinf/plant-multi/) and CELLO (http://cello.life.nctu.edu.tw/). Plant-mPLoc employs sequence-based predictions (amino acid composition, functional domains, sequential evolution features, GO terms). CELLO relies on feature search methods employing amino acid composition, dipeptide composition, partitioned amino acid composition and the physicochemical properties of amino acids. The prediction accuracy for all identified barley nuclear proteins was monitored using the subnuclear web server Nuc-PLoc (http://www.csbio.sjtu.edu.cn/bioinf/Nuc-PLoc/).
This tool specifically predicts nuclear localization with an accuracy of nearly 100 %. The second table (“Peptide Information”) contains data on all identified peptides related to a given protein, such as the cell cycle phase of the source nuclei, mass spectrometer used, sequence, measured mass/charge ratio, measured mass and charge, deviation calculated for the experimental and theoretical mass, retention time (min) in the liquid chromatographic separation of the respective digest, precursor ion intensity, peptide score given by Mascot search engine, position in the amino acid sequence, number of missed cleavage (MCV) sites and chemical modifications, i.e., if there is a modified amino acid present compared to a reference sequence.
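Several of the peptide-level fields listed above are related by simple arithmetic: the measured mass follows from the measured mass/charge ratio and the charge, and the deviation compares the experimental mass with the theoretical one. The Python sketch below shows these relations as an illustration only. It assumes positive-ion mode with protonated peptides and expresses the deviation in parts per million; the source does not state which unit the database uses.

```python
PROTON_MASS = 1.007276466  # mass of a proton in Da


def neutral_mass(mz, charge):
    """Neutral peptide mass from m/z, assuming [M + zH]z+ ions."""
    return (mz - PROTON_MASS) * charge


def ppm_error(experimental_mass, theoretical_mass):
    """Mass deviation in parts per million (assumed unit)."""
    return (experimental_mass - theoretical_mass) / theoretical_mass * 1e6


# Made-up values, for illustration only.
mass = neutral_mass(500.5, 2)
deviation = ppm_error(1000.001, 1000.0)
print(round(mass, 4), round(deviation, 2))
```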

Discussion

The UNcleProt database described here is built on the protein identification results that we have achieved since introducing the flow cytometric sorting method for barley nuclear proteomics. This approach results in low contamination by cytosolic compounds, allows the purification of nuclei from different phases of the cell cycle and is applicable even to mitotic chromosomes. Barley is a suitable model for studying the nuclear proteomes of crops because it is a diploid self-pollinating species with a range of genetic and genomic resources available. Moreover, protocols for genetic transformation and a range of both mutant lines and TILLING populations are available to facilitate functional analyses. Barley's large genome (1C = 5,428 Mbp) is reflected by its large nuclei and the high yield of proteins obtained from flow-sorted nuclei. Numerous and notable results have recently been obtained, including a draft genome sequence of barley. Finally, the reference genome sequence to be published in 2016 (N. Stein, personal communication) will provide a perfect match to the barley nuclear protein database. This barley nuclear proteome database (UNcleProt) contains 2,429 proteins identified via both MALDI-MS (matrix assisted laser desorption ionization – mass spectrometry) and ESI-MS (electrospray ionization – mass spectrometry) from G1, S and G2 cell cycle phase nuclei (Fig. 1) (Uřinovská et al., unpublished results). To complete this protein set, a total of 34,675 peptides have been identified and assigned to the corresponding protein sequences. To date, none of the existing nuclear databases such as the yeast nuclear database, vertebrate nuclear database (http://npd.hgu.mrc.ac.uk/user/), rice nuclear database (http://gene64.dna.affrc.go.jp/RPD/main.html), or TAIR (www.arabidopsis.org) incorporate the deposition of nuclear proteins identified in different stages of the cell cycle. 
Both protein level or localization changes and the related regulatory aspects across the cell cycle are of great interest due to their connection with fundamental processes of cell biology (such as cell division and signal transduction) and the pathology of diseases. To the best of our knowledge, cell cycle phase-related alterations in plant nuclear proteomes have not yet been explored in detail. For human cells, differential proteomics studies covering the cell cycle dependence of nuclear proteins are typically targeted at a specific subgroup or process. The dynamic proteomics approach involving fluorescent tagging and microscopy demonstrated cell-cycle dependence in concentration levels for 8 of the analyzed 124 nuclear proteins from a human lung cancer cell library. In a large-scale comparative experiment with G1-, S- and G2-phase nuclei, which provided the largest portion of protein identifications archived in the UNcleProt database, 266 nuclear proteins in total were found enriched at least 2-fold in different cell cycle phases based on a semiquantitative spectral counting approach. For this group of proteins, Fig. 5 summarizes GO terms describing associated biological processes and shows a distribution of the observed cell cycle phase-dependent changes. Many of the available annotations for the identified barley nuclear proteins lack any functional information in current versions of protein sequence databases. Of the whole set, 1/4 are only predicted proteins, and almost 2/3 are still uncharacterized. Nevertheless, numerous uncharacterized/predicted proteins could be annotated by blastp search, conserved domain search or associated with GO terms to tentatively describe the molecular function, biological process and cellular component. Not surprisingly, most of the barley nuclear proteins have DNA- or nucleic acid-binding functions and are involved in DNA-related processes.
A high proportion of multiple cellular localizations was predicted by the programs CELLO and Plant-mPLoc. Nuc-PLoc suggested frequent nucleolar localization among the identified proteins (Fig. 6). These predictions, however, must be considered with caution and will require experimental evidence.
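The 2-fold enrichment criterion mentioned above can be illustrated with a small spectral counting sketch in Python. The source does not describe how counts were normalized or how zero counts were handled, so the pseudocount, the pairwise comparison against every other phase, and all names and numbers below are assumptions made for illustration, not the authors' procedure.

```python
def enriched_phases(counts, fold=2.0, pseudocount=1.0):
    """Return the phases whose pseudocount-adjusted spectral count is at
    least `fold` times that of every other phase."""
    hits = []
    for phase, value in counts.items():
        others = [v for p, v in counts.items() if p != phase]
        ratios = [(value + pseudocount) / (v + pseudocount) for v in others]
        if all(ratio >= fold for ratio in ratios):
            hits.append(phase)
    return hits


# Hypothetical spectral counts per cell cycle phase.
example = {
    "protA": {"G1": 12, "S": 3, "G2": 4},
    "protB": {"G1": 5, "S": 6, "G2": 5},
}
result = {name: enriched_phases(c) for name, c in example.items()}
print(result)
```

The pseudocount avoids division by zero for proteins not detected in a phase; other conventions, such as normalizing by total spectra per run, are equally plausible.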

Figure 5.

Extracted GO terms related to biological functions for proteins enriched in particular phases of the cell cycle. The radius of each circle denotes the total count belonging to the respective term and the pie chart inside shows its distribution across the cell cycle.

Figure 6.

Predicted subcellular localization. The subcellular localization of nuclear proteins identified in 3 phases of the cell cycle as predicted by CELLO, Plant-mPLoc and Nuc-PLoc.

The UNcleProt database will be extended in the future by including tandem mass spectra (MS/MS spectra) for all sequenced peptides with their fragmentation patterns. Such a spectral library is expected to become useful as a reference for protein/peptide identification in other laboratories. The database is intended as an open system and will be continuously updated by adding newly discovered barley nuclear proteins. Links to all relevant publications and research output concerning the barley nuclear proteome will also be added.

Materials and methods

The 6-rowed diploid malting cultivar of barley (Hordeum vulgare L.) cv Morex was used as an experimental plant material. All steps for the separation of barley nuclei and subsequent proteomic analyses were performed according to Petrovská et al. Briefly, approx. 3 cm-long roots of barley were fixed in formaldehyde at 5°C for 10 min and washed twice in the Tris buffer. Root tips were homogenized in 1 ml of LB01-P lysis buffer (15 mM Tris, 2 mM Na2EDTA, 0.5 mM EGTA, 80 mM KCl, 20 mM NaCl, 0.1% v/v Triton X-100, 0.2 mM spermine, 0.5 mM spermidine, 14 mM 2-mercaptoethanol). Crude homogenate was filtered through a 20-µm nylon mesh, stained with 2 µg/ml 4′,6-diamidino-2-phenylindole (DAPI) and subjected to flow cytometric sorting (FACSAria SORP, BD Biosciences, San Jose, Calif., USA). Nuclei at various phases of cell cycle (G1, S, G2) were sorted into tubes containing 1 ml of LB01-P buffer supplemented with 100 mM PMSF and pelleted (300 g, 4°C, 30 min). Crude protein was extracted from the pelleted nuclei (5 million) together with a DNA digestion by DNase I in the treatment buffer. The sample was centrifuged (25,000 g, 15 min) and proteins present in the supernatant were recovered by adding 4 volumes of cold acetone (−20°C) with incubation at −20°C for at least 24 h. The acetone precipitate was collected by centrifugation (25,000 g, 15 min) and finally dissolved in 50 µl of Laemmli's sample buffer containing 2-mercaptoethanol, sonicated for 10 min and heated at 100°C for 10 min. The pellet obtained after the DNase I digestion was dissolved directly in Laemmli's buffer at 100°C. Nuclear proteins were separated by SDS-PAGE in 4% stacking and 10% resolving gels and in-gel digested by a modified thermostable trypsin. The obtained peptides were purified and analyzed by liquid chromatography coupled with tandem mass spectrometry. Raw MS and MS/MS data were processed by the software supplied with the instruments. 
Database searches were performed using the Mascot Server 2.4 search engine (Matrix Science, London, UK) against a custom-made barley protein sequence database (105,041 sequences) downloaded from the UniProt depository and supplemented with sequences of common contaminants and reversed sequences of the barley proteins for the determination of false discovery rate (FDR). The identified protein sequences were queried against Arabidopsis thaliana, Oryza sativa and Zea mays subsets of the UniProtKB/Swiss-Prot database using the BLASTP algorithm (http://blast.ncbi.nlm.nih.gov/Blast.cgi) with the BLOSUM62 matrix and the following required thresholds: E-value of 1E-10; sequence identity of 70%. For all assigned orthologs, protein names were downloaded from the UniProtKB/Swiss-Prot database. Domain searches were performed against the CDD database using the CD search tool in batch mode (http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi). The GO Retriever application provided by the AgBase database was employed to predict GO terms for the respective identified proteins. The UNcleProt database is built on a MySQL v.5.5.46 (http://www.mysql.com) database, and the web interface uses PHP v.5.4.45 (http://www.php.net), which makes the database easy to navigate. UNcleProt is hosted on a Debian 7 server with 16 CPUs and 94 GB (gigabytes) of memory, located in the Institute of Experimental Botany, Centre of the Region Haná for Biotechnological and Agricultural Research, Olomouc. The database size is currently approximately 230 MB (megabytes) and is expected to grow as new experimental data are added, with new database releases appearing accordingly. The input data to be processed and included are usually tab-delimited text files, and additions to the database can be arranged by contacting the administrator (N. Blavet). Data downloading is easy, as described in the Results section. The UNcleProt database is freely available at http://barley.gambrinus.ueb.cas.cz/.
The website uses cookies to maintain information during the sessions and is compatible with all up-to-date major web browsers (tested on Firefox 46, Internet Explorer 11, Safari 5.34, Opera 37, Chrome 51).
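Two of the computational steps described in this section lend themselves to a short illustration: supplementing the search database with reversed sequences to estimate the FDR, and filtering BLASTP hits by the stated thresholds. The Python sketch below is not the authors' pipeline. The `REV_` prefix, the simple decoy/target ratio, and the reading of the thresholds as a maximum E-value and a minimum percent identity are assumptions; Mascot's internal FDR calculation may differ.

```python
def make_decoys(targets, prefix="REV_"):
    """Build reversed-sequence decoys from a dict of accession -> sequence."""
    return {prefix + acc: seq[::-1] for acc, seq in targets.items()}


def estimate_fdr(hits, score_cutoff, prefix="REV_"):
    """Estimate FDR as decoy hits / target hits at or above a score cutoff.

    `hits` is a list of (accession, score) pairs from a search against the
    combined target and decoy database.
    """
    passing = [acc for acc, score in hits if score >= score_cutoff]
    decoys = sum(1 for acc in passing if acc.startswith(prefix))
    targets = len(passing) - decoys
    return decoys / targets if targets else 0.0


def passes_ortholog_thresholds(evalue, identity_pct,
                               max_evalue=1e-10, min_identity=70.0):
    """Apply the E-value and sequence identity thresholds to a BLASTP hit."""
    return evalue <= max_evalue and identity_pct >= min_identity


# Hypothetical sequences and search hits, for illustration only.
targets = {"A1": "MKT", "A2": "ACDE"}
decoys = make_decoys(targets)
hits = [("A1", 80), ("A2", 55), ("REV_A1", 52), ("A3", 40)]
fdr = estimate_fdr(hits, score_cutoff=50)
print(decoys, fdr)
```

Raising the score cutoff until the estimated FDR falls below a chosen level is the usual way such an estimate is put to use.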

Conclusions

UNcleProt is the first dedicated database containing plant nuclear proteins identified in nuclei during different stages of the cell cycle. This dataset will contribute to the understanding of plant nuclear proteins and their functions. It may become an important resource for plant cell biologists and contribute to the effort to understand the nuclear architecture and its relationship to genome function. Among other advantages for researchers, the database will facilitate the functional analyses of yet uncharacterized nuclear proteins.
