
Identification of drought-responsive universal stress proteins in viridiplantae.

Raphael D Isokpehi, Shaneka S Simmons, Hari H P Cohly, Stephen I N Ekunwe, Gregorio B Begonia, Wellington K Ayensu.

Abstract

Genes encoding proteins that contain the universal stress protein (USP) domain are known to provide bacteria, archaea, fungi, protozoa, and plants with the ability to respond to a plethora of environmental stresses. In plants specifically, drought tolerance is a desirable phenotype. However, few focused and organized functional genomic datasets exist on drought-responsive plant USP genes to facilitate their characterization. The overall objective of the investigation was to identify diverse plant universal stress proteins and Expressed Sequence Tags (ESTs) responsive to water-deficit stress. We hypothesize that cross-database mining of functional annotations in protein and gene transcript bioinformatics resources would help identify candidate drought-responsive universal stress proteins and transcripts from multiple plant species. Our bioinformatics approach retrieved, mined and integrated comprehensive functional annotation data on 511 protein and 1561 EST sequences from 161 viridiplantae taxa. A total of 32 drought-responsive ESTs from 7 plant genera (Glycine, Hordeum, Manihot, Medicago, Oryza, Pinus and Triticum) were identified. Two Arabidopsis USP genes, At3g62550 and At3g53990, that encode an ATP-binding motif were up-regulated in a drought microarray dataset. Further, a dataset of 80 simple sequence repeats (SSRs) linked to 20 singletons and 47 transcript assemblies was constructed. Integrating the datasets on SSRs and drought-responsive ESTs identified three drought-responsive ESTs from bread wheat (BE604157), soybean (BM887317) and maritime pine (BX682209). The SSR sequence types were CAG, ATA and AT, respectively. The datasets from cross-database mining provide organized resources for the characterization of USP genes as useful targets for engineering plant varieties tolerant to unfavorable environmental conditions.


Keywords: Pfam; Uniprot; drought; expressed sequence tags; microsatellite; plants; salinity; simple sequence repeats; universal stress protein domain; viridiplantae

Year: 2011 PMID: 21423406 PMCID: PMC3045048 DOI: 10.4137/BBI.S6061

Source DB: PubMed Journal: Bioinform Biol Insights ISSN: 1177-9322

Introduction

Environmental stresses can negatively impact agricultural crop yield and quality.1,2 As an adaptive strategy, plant genomes encode genes that produce proteins that function in stress response and tolerance.3–5 Despite substantial research on response to abiotic and biotic stresses by plants, there are still knowledge gaps regarding the molecular mechanisms that regulate the diverse functions of environmental stress-associated plant genes and proteins.3 The increasing availability of genomic sequences of members of the viridiplantae (green algae and land plants), in combination with high-throughput bioinformatics tools and databases,4,5 provides new opportunities for examining understudied gene families that could be central to stress response in plants. Genes encoding proteins that contain the conserved 140–160-residue Universal Stress Protein (USP) domain (Pfam Accession: PF00582) are known to provide bacteria, archaea, fungi, protozoa, and plants with the ability to respond to a plethora of environmental stresses.6–9 Nutrient starvation, drought, high salinity, extreme temperatures and exposure to toxic chemicals are examples of conditions that induce expression of genes with the USP domain. Proteins containing domain PF00582 are often collectively referred to as universal stress proteins.
In Escherichia coli, the USPs have been grouped into four classes according to their structural analysis and amino acid sequence. Class I comprises UspA, UspC and UspD; Class II comprises UspF and UspG; and Classes III and IV are the two Usp domains of UspE.10 The UspA domain of MJ0577 (also called 1MJH) from Methanocaldococcus jannaschii crystallizes with a bound ATP, while the UspA domain of Haemophilus influenzae lacks both ATP-binding activity and ATP-binding residues.11,12 Structural alignment has shown that the second and third conserved glycines in the polypeptide of the ATP-binding loop G-2X-G-9X-G-(S/T) in 1MJH are replaced by the bulky amino acids glutamine and methionine in UspA.11 The suggested ancestral function of the universal stress protein domain was nucleotide binding and signal transduction.16 Despite the knowledge of bacterial USP proteins, the functional diversity of the USPs in other organisms, including various plant species, needs to be better defined.17,18 Kerk et al13 examined the sequence and structure of 44 Arabidopsis thaliana proteins containing similarity to the USP domain of bacteria and concluded that all Arabidopsis USPA domain-containing sequences have evolved from a 1MJH-like ancestor. Since that publication,13 there have been additional but limited studies aimed at understanding the function of universal stress proteins of A. thaliana.20–24 For example, AT5G54430 (AtPHOS32) and AT4G27320 (AtPHOS34) were shown to be phosphorylated in response to microbial elicitation of Arabidopsis cells.21,23 In addition, AtPHOS32 was shown to be a new substrate of the stress-regulated mitogen-activated protein kinases (MAPKs) AtMPK3 (AT3G45640) and AtMPK6 (AT2G43790). However, the precise functions of these two Arabidopsis USPs, as well as of other members of the gene family, are not yet established.
In rice, another model plant species, OsUSP1, which is mediated by the gaseous plant hormone ethylene, has been identified as potentially functioning in the adaptation of deepwater rice plants to hypoxia.14 Additional plant USP genes have been characterized, including in the legumes Astragalus sinicus15 and Vicia faba16,17 as well as in Gossypium arboreum (cotton).18 Recently, the USP genes of barley were identified and localized, and their expression in anatomical parts and under selected stress conditions was determined.19 Water limitation (drought) is one of the key abiotic stresses that can adversely affect the growth, development and yield of crop and tree plants.20 Drought induces biochemical and physiological responses in plants,21 including reduced photosynthetic carbon and energy metabolism,22 leading to oxidative stress. High salinity is also accompanied by drought.20 Furthermore, wood production from forest trees can be hampered by drought.32,33 The ability to respond to and tolerate drought stress is a desirable phenotype, especially in plants that have to survive in environments with insufficient water. The molecular and cellular mechanisms for response and tolerance have been investigated using a range of powerful high-throughput genomic and proteomic techniques to dissect gene network responses to drought.22 Examples of drought-responsive USP genes have been reported in cotton18 and cowpea.23 The identification of drought-responsive USP genes from multiple plant species will present an array of research tools for genetic manipulation of plants for drought tolerance. Therefore, we sought to develop a bioinformatics screening strategy to identify drought-responsive USP genes and transcripts from comprehensive protein and gene transcript databases.
There continues to be an increase in the number and diversity of bioinformatics resources storing functional annotation of protein-coding sequences, including those containing the USP domain.24 The Pfam database of protein families, represented by alignments and Hidden Markov Models, contains at least 550 protein sequences from the viridiplantae (green algae and land plants) annotated to contain at least one USP domain.25 These sequences have identifiers of the Universal Protein Resource (UniProt), which is the most comprehensive catalog for protein sequence and functional annotation data.26 The UniProt entries have value-added cross-references to external databases that provide diverse annotation, including structural, gene expression, literature and sequence diversity data. In addition, there are specialized plant databases not yet linked to UniProt. For example, the Phytozome resource page (http://www.phytozome.net/Phytozome_resources.php) provides links to resources for general plant genomics; gene expression; gene indices and Expressed Sequence Tags (ESTs); Arabidopsis; grass and cereals; legumes; forest trees; other plant species and plant pathogen genomics. The overall objective of the investigation was to identify diverse plant universal stress proteins and Expressed Sequence Tags (ESTs) responsive to water-deficit stress. We hypothesize that cross-database mining of functional annotations in protein and gene transcript bioinformatics resources would help identify candidate drought-responsive universal stress proteins and transcripts from multiple plant species. Among the EST and cDNA resources listed in Phytozome, we observed that the TIGR Plant Transcript Assemblies database (Plantta)27 had a wide collection of 254 plant species (as of July 2007). ESTs and full-length cDNAs are used for gene discovery in plant species and as evidence of gene expression under particular conditions and in particular anatomical parts.
The identification of ESTs encoding universal stress proteins could facilitate further studies on selection of markers for comparative mapping, plant breeding and forward genetics.28,29 The Plantta resource contains simple sequence repeat (SSR) or microsatellite annotation for some transcripts. Microsatellites are 1–6 bp tandemly repeated DNA sequences that occupy a significant fraction of the nuclear genome of all eukaryotes.30 Microsatellites in protein-coding genes can inactivate or activate genes or truncate proteins.31 In plants, microsatellites derived from EST sequences (EST-SSRs) have been proposed to be better candidates for gene tagging and are preferred over genomic-SSR markers for plant improvement programs owing to their higher interspecific transferability rate.32 Thus, we investigated the presence of SSRs on transcript assemblies and singleton sequences in Plantta. Furthermore, since our primary interest was in drought-responsive genes, we sought to identify USP-annotated Plantta ESTs that contain text relevant to drought in their dbEST33 entries. The keyword search provided an indication of the experimental condition used for generating the cDNA libraries. Finally, we determined the overlap of the EST dataset containing SSR entries with the EST dataset annotated with drought or water stress. The bioinformatics strategy described can be adapted for analyzing a set of viridiplantae protein sequences defined by a Pfam protein domain. Furthermore, plant transcripts from other abiotic and biotic stress conditions can be mined and analyzed. In summary, we identified diverse plant universal stress proteins and transcripts responsive to drought, including those that contain microsatellite markers that may regulate their function.

Methods

Construction of dataset of viridiplantae universal stress proteins

Viridiplantae proteins annotated in the Pfam database25 with Pfam domain PF00582 were downloaded and computationally processed with a suite of UNIX and PERL scripts to retrieve their respective UniProt identifiers. Subsequently, for UniProt entries that were neither obsolete nor deleted, the protein domain architecture, organism source of sequence, protein sequence length and protein molecular weight were extracted from XML-formatted UniProt entries (UniProt release 2010_10, Oct 5, 2010). These selected annotations are typically available for UniProt entries. An overview of the USP dataset construction is illustrated in Figure 1. Analysis of the protein domain architecture annotation provided a prediction of the number of USP domains as well as the additional types of protein domain(s) present.
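The annotation extraction step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the UNIX and PERL scripts used in the study: the embedded XML is a hand-written, simplified stand-in for a UniProt entry (the accession X00001, the organism name and all values are hypothetical), and the function name summarize_entries is our own. The sketch reads the organism, sequence length, molecular mass and Pfam cross-references, and flags entries whose only Pfam family is PF00582.

```python
import xml.etree.ElementTree as ET

# XML namespace used by UniProt entry files.
NS = {"up": "http://uniprot.org/uniprot"}

# Hand-written, simplified stand-in for a UniProt XML file (hypothetical values).
SAMPLE_XML = """<uniprot xmlns="http://uniprot.org/uniprot">
  <entry>
    <accession>X00001</accession>
    <organism><name type="scientific">Example plantus</name></organism>
    <dbReference type="Pfam" id="PF00582"/>
    <dbReference type="Pfam" id="PF00069"/>
    <dbReference type="GO" id="GO:0006950"/>
    <sequence length="160" mass="17500">MAAA</sequence>
  </entry>
</uniprot>"""


def summarize_entries(xml_text):
    """Return one dict per entry with the annotations named in the Methods."""
    root = ET.fromstring(xml_text)
    rows = []
    for entry in root.findall("up:entry", NS):
        seq = entry.find("up:sequence", NS)
        # Pfam families cross-referenced at the entry level.
        pfam = [ref.get("id")
                for ref in entry.findall("up:dbReference", NS)
                if ref.get("type") == "Pfam"]
        rows.append({
            "accession": entry.findtext("up:accession", namespaces=NS),
            "organism": entry.findtext(
                "up:organism/up:name[@type='scientific']", namespaces=NS),
            "length": int(seq.get("length")),
            "mass": int(seq.get("mass")),
            "pfam": pfam,
            "usp_only": pfam == ["PF00582"],
        })
    return rows


rows = summarize_entries(SAMPLE_XML)
```

Note that a Pfam cross-reference lists each family once, so counting tandem copies of the USP domain would need domain coordinates in addition to what this sketch reads.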

Figure 1.

Flowchart for constructing dataset of viridiplantae universal stress proteins.

Orthologous viridiplantae drought-responsive genes encoding universal stress proteins

A UniProt entry for a protein sequence contains value-added cross-references to other databases (http://www.uniprot.org/docs/dbxref). The cross-referenced databases for each viridiplantae USP entry were computationally extracted from the XML-formatted files. A non-redundant list of the databases was assembled and used to construct a presence-absence matrix consisting of rows of UniProt protein identifiers and columns of selected databases. A zero (0) was used to encode absence of a cross-reference to a database and a one (1) to encode its presence. This matrix was then searched for USP entries with cross-references to the Gene Expression Atlas (a subset of ArrayExpress)45 and the Ortholog MAtrix Project (OMA) Browser.34 The matrix was visualized using a Linux version of matrix2png.35 The Gene Expression Atlas (GXA) stores microarray and other gene expression data and was selected because it had annotation for “Experimental Factors”, which included a subsection on “Environmental Stresses” such as drought. Furthermore, the OMA Browser allows for exploration of orthologous relations between protein sequences for 1000 species (Release of May 2010). A combination of the data from GXA and OMA allowed us to identify orthologous plant proteins in which a member has been demonstrated to be responsive to drought. Additional homologous sequences for the identified drought up-regulated USPs were retrieved from PLAZA, a resource for plant comparative genomics,36 and their multiple sequence alignment was generated using ClustalW2 at http://www.ebi.ac.uk/clustalw/.
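The presence-absence encoding described above can be illustrated with a short Python sketch. It is an assumption-laden example rather than the code used in the study: the protein identifiers P_A, P_B and P_C and their cross-reference sets are hypothetical, and the function names are our own.

```python
def presence_absence_matrix(xrefs_by_protein):
    """Build a 0/1 matrix: rows are proteins, columns are databases."""
    # Non-redundant, sorted list of every database seen in any entry.
    databases = sorted({db for dbs in xrefs_by_protein.values() for db in dbs})
    matrix = {acc: [1 if db in dbs else 0 for db in databases]
              for acc, dbs in xrefs_by_protein.items()}
    return databases, matrix


def proteins_with_all(databases, matrix, required):
    """Return proteins whose row has a 1 in every required database column."""
    columns = [databases.index(db) for db in required]
    return [acc for acc, row in matrix.items()
            if all(row[i] == 1 for i in columns)]


# Hypothetical cross-reference sets for three proteins.
xrefs = {
    "P_A": {"EMBL", "ArrayExpress", "OMA"},
    "P_B": {"EMBL", "OMA"},
    "P_C": {"ArrayExpress", "KEGG"},
}
databases, matrix = presence_absence_matrix(xrefs)
hits = proteins_with_all(databases, matrix, ["ArrayExpress", "OMA"])
```

In this toy input only P_A is cross-referenced to both resources, which mirrors the search for entries annotated in both the expression and the orthology database.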

Viridiplantae universal stress protein transcripts derived from drought conditions

The TIGR Plant Transcript Assemblies (Plantta; http://plantta.jcvi.org/)27 consists of a collection of transcripts (assembled ESTs and singletons) for at least 215 plant species. The content of the webpage for each USP transcript in the Plantta resource was also parsed to identify those with microsatellite (SSR) annotation. We sought to identify universal stress protein ESTs from cDNA library sources derived from drought stress. The first step involved retrieving from Plantta the transcripts annotated with the text “universal stress protein”. In the second step, all the EST identifiers in dbEST33 associated with the Plantta transcripts were retrieved, and the entries in GenBank were downloaded and searched for the text “drought”. In another search strategy, the dbEST entries were searched for the text “water” and the retrieved subset was then searched for the text “stress”. The assumption was that the presence of “drought”, or of the combination of “water” and “stress”, was indicative of a cDNA library derived from drought stress conditions. This mining of text in the dbEST entries was done to help identify universal stress protein ESTs as research tools for understanding stress response in a large number of plant species of agricultural, economic, ecological or industrial importance but without complete genome sequences.
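The two keyword searches can be expressed as a single predicate, sketched below in Python. The record identifiers and library descriptions are hypothetical, and case-insensitive substring matching is our assumption; the text does not state how case was handled.

```python
def is_drought_library(entry_text):
    """Apply the keyword rule: 'drought', or both 'water' and 'stress'."""
    text = entry_text.lower()  # assumption: matching ignores case
    if "drought" in text:
        return True
    return "water" in text and "stress" in text


# Hypothetical dbEST-style library descriptions.
records = {
    "EST_1": "cDNA library from drought stressed leaf",
    "EST_2": "root tissue from water stressed plants",
    "EST_3": "seedlings grown in water culture, no treatment",
    "EST_4": "cold stress, 4 degrees C",
}
drought_ests = sorted(acc for acc, text in records.items()
                      if is_drought_library(text))
```

Because the rule is purely textual, an entry that mentions water and an unrelated stress would also be retained, so the resulting list is best treated as a set of candidates for manual review.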

Results

A total of 511 viridiplantae proteins annotated with the universal stress protein domain (PF00582) from 43 unique taxa (NCBI Taxonomy IDs) were downloaded from UniProt on October 24, 2010 (Table 1). The protein count per taxon ranged from 1 to 88. The protein counts for Liliopsida (monocotyledons), dicotyledons, and other viridiplantae including green algae were 235, 203 and 73, respectively. Furthermore, land plants with at least 50 USP records in UniProt from the Pfam dataset were Oryza sativa subsp. japonica, Arabidopsis thaliana, Populus trichocarpa, Oryza sativa subsp. indica and Zea mays. The green algae genera represented in the dataset were Chlamydomonas, Ostreococcus and Micromonas. The sequence length ranged from 29 (A7Y7Q4) to 1223 (A8HRL3), with 251 unique lengths observed (Supplementary File 1 and Fig. 2). Finally, 39 sequences were annotated as fragments.

Table 1.

Dataset of viridiplantae universal stress proteins entries in UniProt.

Scientific name	Common name	NCBI Taxonomy ID	Number of UniProt entries	USP domain only
Oryza sativa subsp. japonica	Rice	39947	88	60
Arabidopsis thaliana	Mouse-ear cress	3702	78	53
Populus trichocarpa	Western balsam poplar	3694	59	45
Oryza sativa subsp. indica	Rice	39946	52	32
Zea mays	Maize	4577	52	45
Ricinus communis	Castor bean	3988	43	32
Picea sitchensis	Sitka spruce	3332	21	21
Physcomitrella patens	Moss	3218	18	15
Vitis vinifera	Grape	29760	18	11
Micromonas pusilla CCMP1545		564608	10	10
Medicago truncatula	Barrel medic	3880	8	8
Micromonas sp. RCC299		296587	8	8
Brassica campestris	Field mustard	3711	7	7
Chlamydomonas reinhardtii		3055	6	4
Ostreococcus lucimarinus (strain CCE9901)		436017	5	5
Ostreococcus tauri		70448	4	4
Vicia faba	Broad bean	3906	4	4
Brachypodium distachyon	Purple false brome	15368	3	1
Gossypium arboreum	Tree cotton	29729	2	2
Oryza sativa	Rice	4530	2	0
Arachis hypogaea	Peanut	3818	1	1
Astragalus sinicus	Chinese milk vetch	47065	1	1
Brachypodium sylvaticum	False brome	29664	1	0
Brassica oleracea var. alboglabra	Chinese kale	3714	1	1
Capsicum chinense	Scotch bonnet	80379	1	1
Cicer arietinum	Chickpea	3827	1	1
Gossypium barbadense	Sea-island cotton	3634	1	1
Hordeum bulbosum	Bulbous barley	4516	1	1
Hordeum vulgare	Barley	4513	1	1
Hordeum vulgare var. distichum	Two-rowed barley	112509	1	1
Marchantia polymorpha	Liverwort	3197	1	0
Mirabilis jalapa	Garden four-o’clock	3538	1	1
Pisum sativum	Garden pea	3888	1	1
Populus trichocarpa × Populus deltoides	Black cottonwood × Eastern cottonwood	3695	1	1
Potamogeton distinctus	Roundleaf pondweed	62344	1	1
Prunus dulcis	Almond	3755	1	1
Solanum lycopersicum	Tomato	4081	1	1
Solanum tuberosum	Potato	4113	1	1
Sonneratia alba	Mangrove Apple	122812	1	1
Sonneratia apetala	Mangrove	122813	1	1
Sonneratia caseolaris	Mangrove Crabapple	122814	1	1
Sonneratia ovata	Mangrove	122816	1	1
Triticum aestivum	Wheat	4565	1	1

Figure 2.

Distribution of sequence length of 511 viridiplantae universal stress proteins.

A total of 17 Pfam protein domains arranged in 17 architectures were associated with the dataset (Table 2 and Fig. 3). Ten of the 17 protein domains each occurred in only one protein; most of these proteins are uncharacterized, as with the sequences from Oryza sativa subsp. indica, Oryza sativa subsp. japonica, Vitis vinifera and Zea mays. Two sequences in this subset had names that indicated possible function: flagellar associated protein from Chlamydomonas reinhardtii and anti-bacterial protein from Solanum tuberosum (potato). As expected, the universal stress protein family (PF00582) domain was present in all the proteins analyzed. The protein kinase domain (PF00069), U-box domain (PF04564) and protein tyrosine kinase (PF07714) were each found in at least 20 proteins (Table 2). The combination of the USP domain and the transmembrane sodium/hydrogen exchanger family (PF00999) domain was observed in 5 proteins: B9S492 (Ricinus communis), A5BEW1 (Vitis vinifera), B9I6U4 (Populus trichocarpa), B9INS2 (Populus trichocarpa) and A9T441 (Physcomitrella patens). A total of 387 protein sequences had only the USP domain. In a subset of 12 sequences having tandem USP domains, 9 sequences were from green algae (Table 3).

Table 2.

Distribution of protein families in viridiplantae universal stress proteins.

Pfam ID*	Pfam name	Count
PF00582	Universal stress protein family	511
PF00069	Protein kinase domain	87
PF04564	U-box domain	34
PF07714	Protein tyrosine kinase	21
PF00999	Sodium/hydrogen exchanger family	5
PF03107	C1 domain	2
PF07649	C1-like domain	2
PF00651	BTB/POZ domain	1
PF01370	NAD dependent epimerase/dehydratase family	1
PF02637	GatB domain	1
PF03061	Thioesterase superfamily	1
PF04147	Nop14-like family	1
PF04185	Phosphoesterase family	1
PF05139	Erythromycin esterase	1
PF05699	hAT family dimerisation domain	1
PF08879	WRC	1
PF08880	QLQ	1

Note: *Description of protein domains available at http://pfam.sanger.ac.uk/.

Figure 3.

Protein domain architectures, examples and counts in dataset of plant universal stress proteins. Architecture images obtained from InterPro (www.ebi.ac.uk/interpro), an integrated database of predictive protein “signatures” for protein annotation and classification. The examples are UniProt identifiers with abbreviations for the plant taxa as follows—ORYSI: Oryza sativa subsp. indica (Rice); BRASY: Brachypodium sylvaticum (False brome); ARATH: Arabidopsis thaliana (Mouse-ear cress); ORYSJ: Oryza sativa subsp. japonica (Rice); VITVI: Vitis vinifera (Grape); CHLRE: Chlamydomonas reinhardtii; SOLTU: Solanum tuberosum (Potato); PHYPA: Physcomitrella patens subsp. patens; MAIZE: Zea mays (Maize).

Table 3.

Viridiplantae universal stress proteins with tandem USP domains.

UniProt Identifier	Organism	Protein Length	Domain Coordinates
A8IXV1	Brassica campestris	220	13–56 79–198
B0YQX1	Gossypium arboreum	169	4–64 78–169
C1N7W4	Micromonas pusilla CCMP1545	343	90–160 187–322
C1MYP7	Micromonas pusilla CCMP1545	581	84–241 518–569
C1N599	Micromonas pusilla CCMP1545	396	48–102 155–249
C1E4R1	Micromonas sp. RCC299	567	295–355 431–567
C1FHK1	Micromonas sp. RCC299	267	12–93 120–256
A2ZLH5	Oryza sativa subsp. indica	320	21–155 167–312
A4RVM8	Ostreococcus lucimarinus (strain CCE9901)	274	67–164 195–250
A4RZS6	Ostreococcus lucimarinus (strain CCE9901)	401	26–215 237–378
Q015R5	Ostreococcus tauri	401	173–212 234–376
Q01BC4	Ostreococcus tauri	215	23–83 85–191

The UniProtKB database cross-references for each viridiplantae USP entry, stored in XML format, were extracted to determine the availability of each database annotation across the dataset of entries. Table 4 shows the databases that were used to annotate at least 100 USPs. The complete list of 45 cross-references is available in Supplementary File 1. Gene Ontology, InterPro, NCBI Taxonomy and Pfam cross-references were found in all 511 UniProt entries. To construct the matrix, 40 of the 45 cross-references were selected: the four references present in all entries were removed, as was RefSeq, which had the same number of entries as the Entrez Gene database. The matrix is available in Supplementary File 1.

Table 4.

Selected UniProt cross-reference resources linked to plant universal stress proteins.

Database	USP UniProt entry count	Description	Web server
GO	511	Gene Ontology	http://www.geneontology.org/
InterPro	511	Integrated resource of protein families, domains and functional sites	http://www.ebi.ac.uk/interpro/
NCBI Taxonomy	511	NCBI Taxonomy Database	http://www.ncbi.nlm.nih.gov/taxonomy
Pfam	511	Pfam protein domain database	http://pfam.sanger.ac.uk/
EMBL	506	EMBL nucleotide sequence database	http://www.ebi.ac.uk/embl/
ProteinModelPortal	407	Protein Model Portal of the PSI-Nature Structural Biology Knowledgebase	http://www.proteinmodelportal.org/
Gene3D	379	Gene3D Structural and Functional Annotation of Protein Families	http://gene3d.biochem.ucl.ac.uk/Gene3D/
PubMed	366	PubMed	http://www.pubmed.gov
DOI	361	Digital Object Identifier	http://www.doi.org/
GeneID	264	Database of genes from NCBI RefSeq genomes	http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene
RefSeq	264	NCBI Reference Sequences	http://www.ncbi.nlm.nih.gov/RefSeq/
EnsemblPlants	236	EnsemblPlants	http://plants.ensembl.org/
KEGG	234	KEGG: Kyoto Encyclopedia of Genes and Genomes	http://www.genome.jp/kegg/
PRINTS	227	Protein Motif fingerprint database; a protein domain database	http://umber.sbs.man.ac.uk/dbbrowser/PRINTS/
UniGene	163	UniGene gene-oriented nucleotide sequence clusters	http://www.ncbi.nlm.nih.gov/sites/entrez?db=UniGene
SMR	128	SWISS-MODEL Repository—a database of annotated 3D protein structure models	http://swissmodel.expasy.org/repository/
HOGENOM	116	The HOGENOM Database of Homologous Genes from Fully Sequenced Organisms	http://pbil.univ-lyon1.fr/databases/hogenom.php
PROSITE	110	PROSITE; a protein domain and family database	http://www.expasy.org/prosite/
SUPFAM	110	Superfamily database of structural and functional annotation	http://supfam.org
ProtClustDB	108	Entrez Protein Clusters	http://www.ncbi.nlm.nih.gov/sites/entrez?db=proteinclusters

Twelve USP sequences had cross-references to both ArrayExpress and the Ortholog Matrix Project (OMA) Browser (Fig. 4). Three Arabidopsis USP genes (Q93W91 [At3g62550], Q9LPF5 [At1g44760] and Q9M328 [At3g53990]) were up-regulated in a drought microarray experiment stored in ArrayExpress and were annotated in the OMA Browser. Box plots of the three genes obtained from ArrayExpress as well as multiple sequence alignments of orthologs are presented in Figure 5. The OMA Browser provides a multiple sequence alignment for the group of orthologs of each protein sequence (Fig. 5). Orthologous sequences were from Oryza sativa, Sorghum bicolor, Populus trichocarpa and Vitis vinifera.

Figure 4.

Visualization of matrix of availability of annotation with 40 external database references for selected plant universal stress proteins in UniProt. Description of column headings is documented in Supplementary File 1.

Notes: Red, presence of database annotation; Green, absence of database annotation.

Figure 5.

Gene expression and protein sequence alignment of Arabidopsis thaliana USPs up-regulated in response to drought. Detailed gene expression data and protein sequence alignments can be obtained from the following web links, respectively, by replacing the placeholder with the UniProt protein identifier.

http://www.ebi.ac.uk/gxa/experiment/E-MEXP-1863/.

http://omabrowser.org/cgi-bin/gateway.pl?f=DisplayGroup&p1=.

Visual inspection of the alignments showed that the G-2X-G-9X-G-(S/T) motif for small phosphoryl/ribosyl-binding residues of Adenosine Triphosphate (ATP)49 was present in Q9M328 and Q93W91 but absent in Q9LPF5. Additional homologous sequences for the drought-responsive proteins, provided by PLAZA,36 and the sequence alignments generated with ClustalW2 can be found in Supplementary File 2. The multiple sequence alignment for 16 homologous sequences, including the drought-responsive, ATP-binding motif-containing At3g62550, is presented in Figure 6. The conserved Aspartate (D) residue in position 12 of At3g62550 is known to be involved in adenine binding in ATP-binding USPs.15,50
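The motif G-2X-G-9X-G-(S/T) was identified here by visual inspection; as a complementary illustration, it can be written as a regular expression. The sketch below is not part of the reported analysis, and both test sequences are synthetic strings constructed only to show the pattern, not real USP sequences.

```python
import re

# G, any 2 residues, G, any 9 residues, G, then S or T.
ATP_LOOP = re.compile(r"G[A-Z]{2}G[A-Z]{9}G[ST]")


def find_atp_loop(sequence):
    """Return (start, end) of the first motif match (0-based), or None."""
    match = ATP_LOOP.search(sequence.upper())
    return match.span() if match else None


# Synthetic sequence carrying the motif after a two-residue prefix.
with_motif = "MK" + "G" + "AA" + "G" + "A" * 9 + "G" + "S" + "LL"

# Same sequence with the second and third glycines replaced by Q and M.
without_motif = "MK" + "G" + "AA" + "Q" + "A" * 9 + "M" + "S" + "LL"

span_present = find_atp_loop(with_motif)
span_absent = find_atp_loop(without_motif)
```

A scan of this kind only reports the presence of the sequence pattern; whether a matching protein binds ATP still requires the structural or experimental evidence discussed in the text.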

Figure 6.

Multiple sequence alignment of drought-responsive Arabidopsis thaliana universal stress protein At3g53990 and homologs. The conserved Aspartate (D) residue in position 12 of At3g62550 (marked with +) is known to be involved in adenine binding in ATP-binding USPs.12,44 The region for small phosphoryl/ribosyl-binding residues of ATP is indicated with a series of #. The first two letters of the sequence name correspond to the plant: AL, Arabidopsis lyrata; AT, Arabidopsis thaliana; BD, Brachypodium distachyon; CP, Carica papaya; GM, Glycine max; MD, Malus domestica; ME, Manihot esculenta; MT, Medicago truncatula; OS, Oryza sativa ssp. japonica; OSAINDICA, Oryza sativa ssp. indica; PT, Populus trichocarpa; RC, Ricinus communis; SB, Sorghum bicolor; VV, Vitis vinifera.

Viridiplantae universal stress protein gene transcripts derived from drought conditions

A total of 1561 ESTs clustered into 360 singletons and 185 Transcript Assemblies from 137 unique viridiplantae members (82 genera) and annotated with the text “universal stress protein” were obtained from the TIGR Plant Transcript Assemblies (Supplementary File 1). Triticum aestivum (bread wheat), Oryza sativa Japonica Group and Glycine max (soybean) each had at least 100 ESTs annotated as encoding universal stress proteins. The 82 plant genera represented in the universal stress protein gene transcript dataset were clustered according to number of species or species combination (Table 5).

Table 5.

Plant genera represented in universal stress protein gene transcripts dataset.

Genus	Number of species
Populus	9
Helianthus	6
Citrus	5
Oryza Picea	4
Fragaria Gossypium Lactuca Medicago Nicotiana Prunus Saccharum Sorghum Triticum	3
Agrostis Apium Arachis Centaurea Euphorbia Hordeum Petunia Phaseolus Pinus Pseudotsuga Rosa Solanum Taraxacum Vitis	2
Aegilops Allium Ananas Antirrhinum Avena Avicennia Brachypodium Brassica Capsicum Catharanthus Ceratopteris Chlamydomonas Cichorium Coffea Curcuma Cyamopsis Cycas Eragrostis Eucalyptus Festuca Ginkgo Glycine Hedyotis Ipomoea Juglans Lolium Lotus Malus Manihot Marchantia Mesembryanthemum Mesostigma Mimulus Panax Panicum Pennisetum Phalaenopsis Physcomitrella Pisum Rhododendron Ricinus Salvia Secale Selaginella Sesamum Syntrichia Thellungiella Theobroma Vaccinium Welwitschia Zamia Zantedeschia Zingiber Zinnia	1

A dataset of 80 simple sequence repeats (SSRs) linked to 20 singletons and 47 transcript assemblies was constructed (Supplementary File 1). A total of 31 types of SSRs (3 mononucleotides; 7 dinucleotides; 16 trinucleotides; 1 tetranucleotide; 3 pentanucleotides; and 1 hexanucleotide) were retrieved (Table 6). The transcript count associated with each SSR was also determined to identify potential unique EST-SSR markers. For example, the dinucleotide TA was unique to singleton DY959747 from Lactuca sativa (lettuce). The suggested primers for the identified EST-SSRs are available from the Plantta website at http://plantta.jcvi.org/.

Table 6.

Simple Sequence Repeats (SSR) linked to universal stress protein gene transcripts.

SSR	Universal stress protein gene transcripts	Nucleotide count	Transcript count
A	CN463769 TA10796_2711 TA10796_2711 TA33493_4530 TA36088_4113 TA36088_4113 TA70577_4565	1	7
T	CI395583 EC939973 TA70577_4565 TA72057_3847	1	4
G	TA70577_4565	1	1
AG	DW142996 TA1316_69721 TA35794_29760 TA4456_3988	2	4
AT	TA1967_153471 TA2761_80863 TA3030_71647	2	3
GA	TA14075_2711 TA3367_309804	2	2
TG	TA49270_4530 TA49271_4530	2	2
CT	TA16418_3330	2	1
GT	TA33493_4530	2	1
TA	DY959747	2	1
CAG	AJ610677 AL825196 CJ661413 CJ668094 TA52848_4565 TA53203_4565 TA53303_4565 TA53312_4565 TA53408_4565	3	9
CGC	CA181583 CI776988 DT694744 TA26627_4558 TA32279_4513 TA33491_4530 TA36096_4547 TA3991_132711	3	8
CCG	TA2412_4568 TA41380_4513 TA41381_4513 TA41598_4513 TA55332_4565 TA55439_4565	3	6
AAG	DY942109 DY953240 TA10188_4232 TA1962_3197	3	4
GAA	DT694744 DY923603 EE657448 TA25103_4558	3	4
ATA	BQ473543 TA48543_3847 TA48544_3847	3	3
GGT	CI602544 TA49270_4530 TA49271_4530	3	3
CGT	TA46082_3847 TA48499_4547	3	2
GCA	CD220323 TA25103_4558	3	2
GGC	TA2176_4120 TA51553_4530	3	2
AGG	TA75441_4530	3	1
ATT	TA36088_4113	3	1
CGA	DT694744	3	1
CGG	TA1497_94328	3	1
GAG	TA15268_29730	3	1
TGT	DR575687	3	1
TTGT	TA3984_73275	4	1
AAAAT	TA762_4615	5	1
CACCC	TA32279_4513	5	1
TTTAA	TA3585_36596	5	1
GCGGCT	TA41381_4513	6	1

The bioinformatics strategy retrieved 32 drought-responsive ESTs from 7 plant genera: Glycine, Hordeum, Manihot, Medicago, Oryza, Pinus and Triticum (Table 7). Furthermore, the strategy revealed differentially expressed ESTs. In domesticated barley, two ESTs, BM369974 and BQ761388, were expressed in the root, while CD662497 was expressed in the lower leaf epidermis. In rice, two ESTs, CK665047 and CA764828, were expressed in drought-stressed leaf and drought-stressed panicle, respectively. Integrating the datasets on SSRs and drought-responsive ESTs identified three drought-responsive ESTs from Triticum aestivum (BE604157), Glycine max (BM887317) and Pinus pinaster (BX682209) (Table 8). The SSR sequence types were (CAG)4, (ATA)4 and (AT)5, respectively.
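The integration step amounts to joining two lookups: drought-annotated ESTs mapped to their Plantta transcripts, and SSR motifs mapped to the transcripts that carry them. A minimal Python sketch follows. The SSR rows are copied from Table 6 and the three EST-to-assembly pairs from Table 8; the fourth pair (EST_X, TA_X) is hypothetical and is included only to show an EST without an SSR. The function name is our own.

```python
# SSR motif -> transcripts carrying it (three rows copied from Table 6).
SSR_TRANSCRIPTS = {
    "CAG": {"AJ610677", "AL825196", "CJ661413", "CJ668094", "TA52848_4565",
            "TA53203_4565", "TA53303_4565", "TA53312_4565", "TA53408_4565"},
    "ATA": {"BQ473543", "TA48543_3847", "TA48544_3847"},
    "AT": {"TA1967_153471", "TA2761_80863", "TA3030_71647"},
}

# Drought-annotated EST -> Plantta transcript assembly (pairs from Table 8).
DROUGHT_EST_TO_TA = {
    "BE604157": "TA53312_4565",
    "BM887317": "TA48544_3847",
    "BX682209": "TA3030_71647",
    "EST_X": "TA_X",  # hypothetical EST whose transcript has no SSR
}


def drought_ests_with_ssr(est_to_ta, ssr_transcripts):
    """Return {EST: [SSR motifs]} for ESTs whose transcript carries an SSR."""
    hits = {}
    for est, transcript in est_to_ta.items():
        motifs = sorted(motif for motif, members in ssr_transcripts.items()
                        if transcript in members)
        if motifs:
            hits[est] = motifs
    return hits


overlap = drought_ests_with_ssr(DROUGHT_EST_TO_TA, SSR_TRANSCRIPTS)
```

With these inputs the overlap contains the three ESTs named in the text, each paired with its SSR motif.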

Table 7.

Drought-annotated plant Expressed Sequence Tags (ESTs)

EST	Plant	Source of EST library
BM886962	Glycine max (soybean)	*
BM887317	Glycine max (soybean)	*
CD662497	Hordeum vulgare subsp. vulgare (domesticated barley)	Lower leaf epidermis
BM369974	Hordeum vulgare subsp. vulgare (domesticated barley)	Root
BQ761388	Hordeum vulgare subsp. vulgare (domesticated barley)	Root
DV442544	Manihot esculenta (cassava)	**
DV442765	Manihot esculenta (cassava)	**
DV443464	Manihot esculenta (cassava)	**
DV444643	Manihot esculenta (cassava)	**
DV446035	Manihot esculenta (cassava)	**
DV446427	Manihot esculenta (cassava)	**
DV447334	Manihot esculenta (cassava)	**
DV454753	Manihot esculenta (cassava)	***
DV455089	Manihot esculenta (cassava)	***
DV455235	Manihot esculenta (cassava)	***
DV455909	Manihot esculenta (cassava)	***
DV456031	Manihot esculenta (cassava)	***
DV456176	Manihot esculenta (cassava)	***
DV456576	Manihot esculenta (cassava)	***
DV456911	Manihot esculenta (cassava)	***
DV457684	Manihot esculenta (cassava)	***
BE248764	Medicago truncatula (barrel medic)	Plantlets
BF631735	Medicago truncatula (barrel medic)	Plantlets
BF634145	Medicago truncatula (barrel medic)	Plantlets
BF634785	Medicago truncatula (barrel medic)	Plantlets
CK665047	Oryza sativa Indica Group	Leaf
CA764828	Oryza sativa Indica Group	Panicles
BX680935	Pinus pinaster	Root
BX682209	Pinus pinaster	Root
BE604157	Triticum aestivum (bread wheat)	Leaf
BE428779	Triticum turgidum subsp. durum (durum wheat)	Root
BE429106	Triticum turgidum subsp. durum (durum wheat)	Root

Notes:

* Leaf, drought stressed, 1 month old plants, greenhouse grown;

** Mature leaf and petiole, young leaf and apical meristem, root, tuber and tuber peel, young leaf and apical meristem midnight;

*** Young leaf and apical meristem, mature leaf and petiole, root, tuber and tuber peel from water stressed plants.
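The genus and EST totals reported for Table 7 can be re-derived from the table rows. The following is a minimal sketch of such a tally, not the pipeline used in the study; the dictionary layout and the function name `ests_per_genus` are illustrative assumptions, while the accessions and species names are transcribed from Table 7.

```python
from collections import Counter

# Accessions grouped by species, transcribed from Table 7.
TABLE7 = {
    "Glycine max": ["BM886962", "BM887317"],
    "Hordeum vulgare": ["CD662497", "BM369974", "BQ761388"],
    "Manihot esculenta": [
        "DV442544", "DV442765", "DV443464", "DV444643", "DV446035",
        "DV446427", "DV447334", "DV454753", "DV455089", "DV455235",
        "DV455909", "DV456031", "DV456176", "DV456576", "DV456911",
        "DV457684",
    ],
    "Medicago truncatula": ["BE248764", "BF631735", "BF634145", "BF634785"],
    "Oryza sativa": ["CK665047", "CA764828"],
    "Pinus pinaster": ["BX680935", "BX682209"],
    "Triticum aestivum": ["BE604157"],
    "Triticum turgidum": ["BE428779", "BE429106"],
}


def ests_per_genus(table):
    """Count ESTs per genus; the genus is the first word of the species name."""
    counts = Counter()
    for species, accessions in table.items():
        genus = species.split()[0]
        counts[genus] += len(accessions)
    return counts


counts = ests_per_genus(TABLE7)
for genus, n in sorted(counts.items()):
    print(f"{genus}\t{n}")
print("genera:", len(counts), "ESTs:", sum(counts.values()))
```

Grouping by the first word of the binomial name merges the two Triticum species into one genus, which is how the table yields 7 genera from 8 species.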

Table 8.

Drought-responsive Expressed Sequence Tags (ESTs) with microsatellites.

EST	Plant	Tissue	Plantta TA	Plantta SSR ID	SSR	Number of repeats	Transcript length (bp)	Start	End
BE604157	Triticum aestivum (bread wheat)	Leaf	TA53312_4565	233072	CAG	4	810	205	216
BM887317	Glycine max (soybean)	Leaf, drought stressed, 1 month old plants, greenhouse grown	TA48544_3847	815126	ATA	4	889	544	555
BX682209	Pinus pinaster (maritime pine)	Root	TA3030_71647	751586	AT	5	414	199	208
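As a consistency check on Table 8, the span between the Start and End coordinates should equal the motif length multiplied by the number of repeats. The sketch below assumes that both coordinates are 1-based and inclusive, which the table does not state explicitly; the class `EstSsr` and its field names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EstSsr:
    est: str      # EST accession
    motif: str    # repeat unit, e.g. "CAG"
    repeats: int  # number of repeat units
    start: int    # assumed 1-based, inclusive
    end: int      # assumed 1-based, inclusive

    @property
    def label(self):
        """Conventional notation, e.g. (CAG)4."""
        return f"({self.motif}){self.repeats}"

    @property
    def span(self):
        """Length in bp covered by the Start and End coordinates."""
        return self.end - self.start + 1

    def is_consistent(self):
        """True if the span equals motif length times repeat count."""
        return self.span == len(self.motif) * self.repeats


# Values transcribed from Table 8.
TABLE8 = [
    EstSsr("BE604157", "CAG", 4, 205, 216),
    EstSsr("BM887317", "ATA", 4, 544, 555),
    EstSsr("BX682209", "AT", 5, 199, 208),
]

for row in TABLE8:
    print(row.est, row.label, f"{row.span} bp", row.is_consistent())
```

Under the inclusive-coordinate assumption, all three rows are internally consistent (12, 12 and 10 bp).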

Discussion

Plants are continuously exposed to abiotic and biotic stresses that require adaptation for survival. The availability of genomic sequences from a variety of viridiplantae has facilitated the dissection of the molecular, cellular and developmental responses to environmental stresses including drought.37 Our investigation demonstrates the benefits of integrating data on universal stress proteins from comprehensive protein and transcript databases. The value-added and prioritized datasets produced present new opportunities to investigate the function of universal stress proteins from diverse plants. In keeping with the focus of the investigation, the protein and gene transcript datasets are discussed in the context of response to drought and salt stress. We have retrieved, mined and integrated comprehensive functional annotation data on 511 universal stress protein and 1561 EST sequences from the viridiplantae. A total of 161 plants with unique NCBI Taxonomy Identifiers were associated with the sequences. Thus, we have provided a catalog of proteins and gene transcripts from model and non-model plant species, including those of importance in agriculture, ecology, industry and alternative energy. A catalog limited to Arabidopsis universal stress proteins has been published.13 The cross-database references available in our investigation offer other researchers a “one-stop shop” for sequence information on viridiplantae universal stress proteins. The bioinformatics strategy extracted functional annotation data from comprehensive public domain protein and gene transcript databases.
The Pfam protein family database36 served as the source of protein sequences; their functional annotation data in the UniProt protein resource26 were extracted and integrated with other specialized databases, including those storing data on gene expression38 and protein sequence evolution.34 We also extracted functional annotation data from the Plantta EST resource, since ESTs are a source of genomic information, especially for plants without complete genome sequencing projects. The bioinformatics approach presented could be useful to researchers interested in other protein families. The particular function of a protein depends on its combination of domains. In general, the presence of the USP domain may enable the function of the other domain to be expressed under stress conditions. The USP domain appears as a single domain in small USP proteins (∼14–15 kDa), as two domains arranged in tandem in larger USP proteins (∼30 kDa), or as one or two USP domains together with other functional domains.9,13 Our analysis extracted and organized the domain combinations present in the 511 plant USPs, thereby providing function-categorized subsets of the dataset. The categories can be investigated for shared function and regulation.
Protein phosphorylation by kinases is a known pathway utilized by plants to respond to osmotic stress.52,53 Five proteins had annotation for the sodium/hydrogen exchanger family domain (PF00999), a domain that transports sodium ions out of the cell or its organelles in exchange for hydrogen ions to prevent toxic accumulation of sodium ions.54,55 The Arabidopsis gene encoding the Na+/H+ exchanger termed salt overly sensitive 1 (SOS1) is an important determinant of salt tolerance.39 The list of uncharacterized proteins with both USP and Na_H_Exchanger domains included protein A9T441 from the moss Physcomitrella patens, which belongs to the oldest clade of land plants40 and is highly tolerant of hypersalinity and severe water limitation.41 The 18 P. patens USPs in the dataset warrant further investigation to understand the evolution of USPs from small land plants to higher plants over 450 million years. The recognition of P. patens as a versatile tool for plant functional genomics could accelerate additional research of benefit to higher plants of importance in agriculture (eg, grapevine), industry (eg, castor plant) and cellulosic biofuels (eg, poplar). Nine of the 12 protein sequences with tandem USP domains were from green algae. There are currently a limited number of reports on the functional characterization of proteins with tandem USP domains.10,42,43 In Escherichia coli, mutants of UspE, which contains tandem USP domains, were unable to form cell-cell interactions and cell aggregates in stationary phase. In Mycobacterium tuberculosis, in which 8 of the 10 USPs have tandem domains, Rv2623 has growth-regulating capability linked to ATP-binding.42 A recent investigation observed a higher degree of sequence identity between tandem domains in prokaryotes compared to eukaryotes.44 The dataset analyzed in that investigation did not include tandem USP domains. A starting point for the characterization of tandem USP domains of plants could be to determine the sequence conservation between the domains.
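The domain-combination categories discussed above (single USP, tandem USP, USP with other domains) can be expressed as a simple rule over a protein's list of Pfam accessions. The following is a minimal sketch under stated assumptions, not the procedure used in the study: PF00582 is the Pfam accession of the Usp family, PF00999 is the Na_H_Exchanger accession named in the text, the domain list for A9T441 is reduced to the two domains the text mentions (their order is assumed), and the other two entries are hypothetical placeholders.

```python
USP = "PF00582"             # Pfam Usp family
NA_H_EXCHANGER = "PF00999"  # Pfam Na_H_Exchanger family (named in the text)


def categorize(domains):
    """Assign a domain-architecture category from a list of Pfam accessions."""
    n_usp = domains.count(USP)
    has_other = any(d != USP for d in domains)
    if n_usp == 0:
        return "no USP domain"
    if has_other:
        return "USP with other domain(s)"
    if n_usp == 1:
        return "single USP"
    # Two or more USP domains and nothing else.
    return "tandem USP"


EXAMPLES = {
    # Only the two domains named in the text; order is an assumption.
    "A9T441": [NA_H_EXCHANGER, USP],
    # Hypothetical placeholders, not real accessions.
    "hypothetical_single": [USP],
    "hypothetical_tandem": [USP, USP],
}

for protein, domains in EXAMPLES.items():
    print(protein, "->", categorize(domains))
```

Checking for non-USP domains before counting USP copies keeps a protein with two USP domains and a kinase domain in the mixed category rather than the tandem one.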
Cross-referencing of specialized databases to a protein sequence entry in UniProtKB provides additional functional annotation that can help accelerate the selection of plant USPs for characterization. UniProtKB provides links to at least 126 specialized resources, including plant bioinformatics databases such as The Arabidopsis Information Resource (TAIR),45 Gramene,46 and EnsemblPlants.47 We have integrated available database cross-references to provide a visual overview of databases across the viridiplantae USPs analyzed. The utility of such a view was demonstrated on a subset of proteins that were annotated with ArrayExpress45 and the Orthologous MAtrix Project (OMA) Browser.34 This view enabled us to easily identify Q9SW11 (U-box domain-containing protein 35; At4g25160, PUB35) as an enzyme based on the presence of the Enzyme Commission (EC) number (Fig. 4: Column 4, Row 10). The U-box domain, which functions in regulated protein ubiquitination and degradation, is a modified RING-finger domain that lacks metal-binding ability.48 Comparative structural and functional assays could reveal the interactions of the USP domain and the enzyme domains present in Q9SW11. Orthologous drought-responsive universal stress proteins could be candidates to engineer desired phenotypes in plants. Our analyses identified three Arabidopsis proteins (Fig. 5) and their orthologs in Oryza sativa, Sorghum bicolor, Populus trichocarpa and Vitis vinifera. Q9M328 and Q93W91 and their homologs could be regulated by ATP based on the presence of an ATP-binding motif (Fig. 6).
Expressed Sequence Tags generated from stress-challenged plant tissues have been used as high-quality transcripts to discover genes, identify candidate stress-responsive genes/transcripts and identify functional markers such as genic microsatellites and single nucleotide polymorphisms.49–51 The effects of SSR type and number of repeats on gene regulation, transcription and protein function are poorly understood in plants when compared to human or animal systems.51 In this article we report automatic extraction of information on simple sequence repeats (SSRs) associated with 1561 ESTs in the Plantta resource.27 Our analysis identified candidate USP gene transcripts in multiple plants (Supplementary File 1 and Table 5) and organized the SSRs into types (Table 6), drought-annotated USP ESTs (Table 7) and USP EST-SSRs from drought-stressed tissues (Table 8). The majority (49 of 80) of the USP EST-SSRs were of the trinucleotide type, which has been reported to be the most abundant in rice, wheat and barley52,53 as well as peanut54 and citrus.55 Altogether, our analyses provide a comprehensive collection of USP ESTs, including those responsive to drought. We have clustered the plant genera based on the number of species to facilitate investigation of EST-SSRs and EST single nucleotide polymorphisms (SNPs) in USP genes for comparative mapping, transferability, genetic diversity and plant improvement.
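Organizing SSRs into types, as in Table 6, amounts to classifying each motif by its length and summing the counts per class. The sketch below illustrates that grouping on a handful of rows copied from Table 6; it is not the extraction code used with the Plantta resource, and the names `ssr_class` and `tally_by_class` are illustrative assumptions.

```python
from collections import Counter

CLASS_PREFIX = {1: "mono", 2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa"}


def ssr_class(motif):
    """Name the SSR class from the motif length, e.g. 'CAG' -> 'trinucleotide'."""
    prefix = CLASS_PREFIX.get(len(motif))
    if prefix is None:
        raise ValueError(f"unsupported motif length: {motif!r}")
    return prefix + "nucleotide"


def tally_by_class(rows):
    """Sum the per-motif counts within each SSR class."""
    totals = Counter()
    for motif, count in rows:
        totals[ssr_class(motif)] += count
    return totals


# (motif, count from the last column): a subset of Table 6 rows, not the full table.
SUBSET = [("GAA", 4), ("ATA", 3), ("TTGT", 1), ("AAAAT", 1), ("GCGGCT", 1)]

totals = tally_by_class(SUBSET)
for name, n in totals.items():
    print(name, n)

# Share of trinucleotide SSRs reported in the text (49 of 80).
print(f"trinucleotide share: {49 / 80:.4f}")
```

Applied to the full Table 6, the same tally would be expected to reproduce the 49 trinucleotide SSRs out of 80 reported in the text.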

Conclusions

The molecular mechanisms by which genes encoding the universal stress protein domain confer on plants the ability to respond and adapt to environmental changes are not well defined. We have computationally retrieved, mined and integrated functional annotations on proteins and gene transcripts that encode the universal stress protein domain. The datasets from cross-database mining provide organized resources for the characterization of USP genes as useful targets for engineering plant varieties tolerant to unfavorable environmental conditions.
