
Identification of drought-responsive universal stress proteins in viridiplantae.

Raphael D Isokpehi, Shaneka S Simmons, Hari H P Cohly, Stephen I N Ekunwe, Gregorio B Begonia, Wellington K Ayensu.

Abstract

Genes encoding proteins that contain the universal stress protein (USP) domain are known to provide bacteria, archaea, fungi, protozoa, and plants with the ability to respond to a plethora of environmental stresses. In plants specifically, drought tolerance is a desirable phenotype. However, few focused and organized functional genomic datasets exist on drought-responsive plant USP genes to facilitate their characterization. The overall objective of the investigation was to identify diverse plant universal stress proteins and Expressed Sequence Tags (ESTs) responsive to water-deficit stress. We hypothesize that cross-database mining of functional annotations in protein and gene transcript bioinformatics resources would help identify candidate drought-responsive universal stress proteins and transcripts from multiple plant species. Our bioinformatics approach retrieved, mined and integrated comprehensive functional annotation data on 511 protein and 1561 EST sequences from 161 viridiplantae taxa. A total of 32 drought-responsive ESTs from 7 plant genera (Glycine, Hordeum, Manihot, Medicago, Oryza, Pinus and Triticum) were identified. Two Arabidopsis USP genes, At3g62550 and At3g53990, that encode an ATP-binding motif were up-regulated in a drought microarray dataset. Further, a dataset of 80 simple sequence repeats (SSRs) linked to 20 singletons and 47 transcript assemblies was constructed. Integrating the datasets on SSRs and drought-responsive ESTs identified three drought-responsive ESTs from bread wheat (BE604157), soybean (BM887317) and maritime pine (BX682209). The SSR sequence types were CAG, ATA and AT, respectively. The datasets from cross-database mining provide organized resources for the characterization of USP genes as useful targets for engineering plant varieties tolerant to unfavorable environmental conditions.

Keywords:  Pfam; Uniprot; drought; expressed sequence tags; microsatellite; plants; salinity; simple sequence repeats; universal stress protein domain; viridiplantae

Year:  2011        PMID: 21423406      PMCID: PMC3045048          DOI: 10.4137/BBI.S6061

Source DB:  PubMed          Journal:  Bioinform Biol Insights        ISSN: 1177-9322


Introduction

Environmental stresses can negatively impact agricultural crop yield and quality.1,2 As an adaptive strategy, plant genomes encode genes that produce proteins that function in stress response and tolerance.3–5 Despite substantial research on the response of plants to abiotic and biotic stresses, there are still knowledge gaps regarding the molecular mechanisms that regulate the diverse functions of environmental stress-associated plant genes and proteins.3 The increasing availability of genomic sequences of members of the viridiplantae (green algae and land plants), in combination with high-throughput bioinformatics tools and databases,4,5 provides new opportunities for examining understudied gene families that could be central to stress response in plants. Genes encoding proteins that contain the conserved 140–160 residue Universal Stress Protein (USP) domain (Pfam Accession: PF00582) are known to provide bacteria, archaea, fungi, protozoa, and plants with the ability to respond to a plethora of environmental stresses.6–9 Nutrient starvation, drought, high salinity, extreme temperatures and exposure to toxic chemicals are examples of conditions that induce expression of genes with the USP domain. Proteins containing domain PF00582 are often collectively referred to as universal stress proteins.
In Escherichia coli, the USPs have been grouped into four classes according to structural analysis and amino acid sequence: Class I (UspA, UspC, UspD); Class II (UspF and UspG); and Class III and Class IV (the two Usp domains of UspE).10 The UspA domain of MJ0577 (also called 1MJH) from Methanocaldococcus jannaschii crystallizes with a bound ATP, while the UspA domain of Haemophilus influenzae lacks both ATP-binding activity and ATP-binding residues.11,12 Structural alignment has shown that the second and third conserved glycines in the ATP-binding loop G-2X-G-9X-G-(S/T) of 1MJH are replaced by the bulky amino acids glutamine and methionine in UspA.11 The suggested ancestral function of the universal stress protein domain was nucleotide binding and signal transduction.16 Despite the knowledge of bacterial USP proteins, the functional diversity of the USPs in other organisms, including various plant species, needs to be better defined.17,18 Kerk et al.13 examined the sequence and structure of 44 Arabidopsis thaliana proteins with similarity to the USP domain of bacteria and concluded that all Arabidopsis USPA domain-containing sequences have evolved from a 1MJH-like ancestor. Since that publication,13 there have been additional but limited studies aimed at understanding the function of universal stress proteins of A. thaliana.20–24 For example, AT5G54430 (AtPHOS32) and AT4G27320 (AtPHOS34) were shown to be phosphorylated in response to microbial elicitation of Arabidopsis cells.21,23 In addition, AtPHOS32 was shown to be a new substrate of the stress-regulated mitogen-activated protein kinases (MAPKs) AtMPK3 (AT3G45640) and AtMPK6 (AT2G43790). However, the precise functions of these two Arabidopsis USPs, as well as of other members of the gene family, are not yet established.
In rice, another model plant species, OsUSP1, which is mediated by the gaseous plant hormone ethylene, has been identified as potentially functioning in the adaptation of deepwater rice plants to hypoxia.14 Additional plant USP genes have been characterized, including in the legumes Astragalus sinicus15 and Vicia faba16,17 as well as in Gossypium arboreum (cotton).18 Recently, the USP genes of barley were identified and localized, and their expression in anatomical parts and under selected stress conditions was determined.19 Water limitation (drought) is one of the key abiotic stresses that can adversely affect the growth, development and yield of crop and tree plants.20 Drought induces biochemical and physiological responses in plants,21 including reduced photosynthetic carbon and energy metabolism,22 leading to oxidative stress. High salinity is also accompanied by drought.20 Furthermore, wood production from forest trees can be hampered by drought.32,33 The ability to respond to and tolerate drought stress is a desirable phenotype, especially in plants that have to survive in environments with insufficient water. The molecular and cellular mechanisms for response and tolerance have been investigated using a range of powerful high-throughput genomic and proteomic techniques to dissect gene network responses to drought.22 Examples of drought-responsive USP genes have been reported in cotton18 and cowpea.23 The identification of drought-responsive USP genes from multiple plant species will present an array of research tools for genetic manipulation of plants for drought tolerance. Therefore, we sought to develop a bioinformatics screening strategy to identify drought-responsive USP genes and transcripts from comprehensive protein and gene transcript databases.
There continues to be an increase in the number and diversity of bioinformatics resources storing functional annotation of protein-coding sequences, including those containing the USP domain.24 The Pfam database of protein families, represented by alignments and Hidden Markov Models, contains at least 550 protein sequences from the viridiplantae (green algae and land plants) annotated to contain at least one USP domain.25 These sequences have identifiers of the Universal Protein Resource (UniProt), which is the most comprehensive catalog for protein sequence and functional annotation data.26 The UniProt entries have value-added cross-references to external databases that provide diverse annotation, including structural, gene expression, literature and sequence diversity data. In addition, there are specialized plant databases not yet linked to UniProt. For example, the Phytozome resource page (http://www.phytozome.net/Phytozome_resources.php) provides links to resources for general plant genomics; gene expression; gene indices and Expressed Sequence Tags (ESTs); Arabidopsis; grass and cereals; legumes; forest trees; other plant species and plant pathogen genomics. The overall objective of the investigation was to identify diverse plant universal stress proteins and Expressed Sequence Tags (ESTs) responsive to water-deficit stress. We hypothesize that cross-database mining of functional annotations in protein and gene transcript bioinformatics resources would help identify candidate drought-responsive universal stress proteins and transcripts from multiple plant species. Among the EST and cDNA resources listed in Phytozome, we observed that the TIGR Plant Transcript Assemblies database (Plantta)27 had a wide collection of 254 plant species (as of July 2007). ESTs and full-length cDNAs are being used for the discovery of genes in plant species and as evidence of gene expression under particular conditions and in particular anatomical parts.
The identification of ESTs encoding universal stress proteins could facilitate further studies on selection of markers for comparative mapping, plant breeding and forward genetics.28,29 The Plantta resource contains simple sequence repeat (SSR) or microsatellite annotation for some transcripts. Microsatellites are 1–6 bp tandemly repeated DNA sequences that occupy a significant fraction of the nuclear genome of all eukaryotes.30 Microsatellites in protein-coding genes can inactivate or activate genes or truncate proteins.31 In plants, microsatellites derived from EST sequences (EST-SSRs) have been proposed to be better candidates for gene tagging and are preferred over genomic-SSR markers for plant improvement programs owing to their higher interspecific transferability rate.32 Thus, we investigated the presence of SSRs on transcript assemblies and singleton sequences in Plantta. Furthermore, since our primary interest was in drought-responsive genes, we sought to identify USP-annotated Plantta ESTs that contain text relevant to drought in their dbEST33 entries. The keyword search provided an indication of the experimental condition under which the cDNA libraries were generated. Finally, we determined the overlap of the EST dataset containing SSR entries with the EST dataset annotated with drought or water stress. The bioinformatics strategy described can be adapted for analyzing a set of viridiplantae protein sequences defined by a Pfam protein domain. Furthermore, plant transcripts from other abiotic and biotic stress conditions can be mined and analyzed. In summary, we identified diverse plant universal stress proteins and transcripts responsive to drought, including those that contain microsatellite markers that may regulate their function.

Methods

Construction of dataset of viridiplantae universal stress proteins

Viridiplantae proteins annotated in the Pfam database25 with Pfam domain PF00582 were downloaded and computationally processed with a suite of UNIX and PERL scripts to retrieve their respective UniProt identifiers. Subsequently, for UniProt entries that were neither obsolete nor deleted, the protein domain architecture, organism source of sequence, protein sequence length and protein molecular weight were extracted from XML-formatted UniProt entries (UniProt release 2010_10, Oct 5, 2010). These selected annotations are typically available for UniProt entries. An overview of the USP dataset construction is illustrated in Figure 1. Analysis of the protein domain architecture annotation provided a prediction of the number of USP domains as well as of any additional types of protein domain present.
Figure 1.

Flowchart for constructing dataset of viridiplantae universal stress proteins.
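
The extraction step described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline (they used UNIX and PERL scripts); the inline XML is a simplified, hypothetical stand-in for a UniProt entry (real entries carry an XML namespace and many more fields), and the accession X00001 and the function name summarize_entries are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical stand-in for one UniProt XML entry.
# Real UniProt XML uses a namespace and many additional elements.
SAMPLE_XML = """
<uniprot>
  <entry>
    <accession>X00001</accession>
    <organism><name type="scientific">Arabidopsis thaliana</name></organism>
    <dbReference type="Pfam" id="PF00582"/>
    <dbReference type="Pfam" id="PF00069"/>
    <sequence length="260" mass="28500">MAAA</sequence>
  </entry>
</uniprot>
"""


def summarize_entries(xml_text):
    """Return one dict per entry with the annotations named in the Methods."""
    root = ET.fromstring(xml_text)
    rows = []
    for entry in root.iter("entry"):
        seq = entry.find("sequence")
        pfam_ids = [ref.get("id") for ref in entry.findall("dbReference[@type='Pfam']")]
        rows.append({
            "accession": entry.findtext("accession"),
            "organism": entry.findtext("organism/name[@type='scientific']"),
            "pfam": pfam_ids,
            # True when PF00582 is the only Pfam domain annotated.
            "usp_only": pfam_ids == ["PF00582"],
            "length": int(seq.get("length")),
            "mass": int(seq.get("mass")),
        })
    return rows


rows = summarize_entries(SAMPLE_XML)
print(rows[0])
```

The "usp_only" flag mirrors the "USP domain only" column of Table 1; how the authors derived that column is not stated, so the rule used here is an assumption.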

Orthologous viridiplantae drought-responsive genes encoding universal stress proteins

A UniProt entry for a protein sequence contains value-added cross-references to other databases (http://www.uniprot.org/docs/dbxref). The cross-referenced databases for each viridiplantae USP entry were computationally extracted from the XML-formatted files. A non-redundant list of the databases was assembled and used to construct a presence-absence matrix consisting of rows of UniProt protein identifiers and columns of selected databases. A zero (0) was used to encode absence of a cross-reference to a database and a one (1) to encode its presence. This matrix was then searched for USP entries with cross-references to the Gene Expression Atlas (a subset of ArrayExpress)45 and the Ortholog MAtrix Project (OMA) Browser.34 The matrix was visualized using a Linux version of matrix2png.35 The Gene Expression Atlas (GXA) stores microarray and other gene expression data and was selected because it had annotation for “Experimental Factors”, which included a subsection on “Environmental Stresses” such as drought. Furthermore, the OMA Browser allows for exploration of orthologous relations between protein sequences for 1000 species (Release of May 2010). A combination of the data from GXA and OMA allowed us to identify orthologous plant proteins in which a member has been demonstrated to be responsive to drought. Additional homologous sequences for the identified drought up-regulated USPs were retrieved from PLAZA, a resource for plant comparative genomics,36 and their multiple sequence alignment was generated using ClustalW2 at http://www.ebi.ac.uk/clustalw/.
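
As an illustration of the presence-absence encoding described above, the following Python sketch builds a 0/1 matrix from per-protein cross-reference sets and then selects the rows that have a 1 in every required column. The protein identifiers (USP_A, USP_B, USP_C), their cross-reference sets and both function names are hypothetical; only the encoding rule (0 for absence, 1 for presence) comes from the text.

```python
def presence_absence_matrix(xrefs_by_protein):
    """Build a 0/1 matrix: rows are proteins, columns are databases."""
    # Non-redundant, sorted list of all databases seen in any entry.
    databases = sorted({db for dbs in xrefs_by_protein.values() for db in dbs})
    matrix = {
        protein: [1 if db in dbs else 0 for db in databases]
        for protein, dbs in xrefs_by_protein.items()
    }
    return databases, matrix


def proteins_with_all(databases, matrix, required):
    """Return proteins whose row has a 1 in every required database column."""
    idx = [databases.index(db) for db in required]
    return sorted(p for p, row in matrix.items() if all(row[i] for i in idx))


# Invented example input, not taken from UniProt.
example = {
    "USP_A": {"ArrayExpress", "OMA", "KEGG"},
    "USP_B": {"ArrayExpress", "OMA"},
    "USP_C": {"EMBL"},
}

databases, matrix = presence_absence_matrix(example)
print(databases)
for protein, row in matrix.items():
    print(protein, row)
print(proteins_with_all(databases, matrix, ["ArrayExpress", "OMA"]))
```

Storing the matrix as plain lists keeps the column order explicit, which is what a tool such as matrix2png would need if the rows were written out as tab-delimited text.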

Viridiplantae universal stress protein transcripts derived from drought conditions

The TIGR Plant Transcript Assemblies (Plantta; http://plantta.jcvi.org/)27 consists of a collection of transcripts (assembled ESTs and singletons) for at least 215 plant species. The content of the web page for each USP transcript in the Plantta resource was also parsed to identify those with microsatellite (SSR) annotation. We sought to identify universal stress protein ESTs from cDNA library sources derived from drought stress. The first step involved retrieving from Plantta the transcripts annotated with the text “universal stress protein”. In the second step, all the EST identifiers in dbEST33 associated with the Plantta transcripts were retrieved, and the entries in GenBank were downloaded and searched for the text “drought”. In another search strategy, the dbEST entries were searched for the text “water” and the retrieved subset was then searched for the text “stress”. The assumption was that the presence of “drought”, or of the combination of “water” and “stress”, was indicative of a cDNA library derived from drought stress conditions. This mining of text in the dbEST entries was done to help identify universal stress protein ESTs as research tools for understanding stress response in a large number of plant species of agricultural, economic, ecological or industrial importance but without complete genome sequences.
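
The two keyword rules described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' scripts: the function name is_drought_library and the example record texts are invented, and case-insensitive substring matching is an assumption, since the text does not say how matching was done.

```python
def is_drought_library(record_text):
    """Flag a dbEST/GenBank record as drought-derived.

    Rule 1: the record contains "drought".
    Rule 2: the record contains both "water" and "stress".
    Matching is case-insensitive and substring-based (an assumption),
    so "stressed" satisfies "stress" and "watered" satisfies "water".
    """
    text = record_text.lower()
    return "drought" in text or ("water" in text and "stress" in text)


# Invented example records, for illustration only.
records = {
    "EST_1": "Leaf, drought stressed, 1 month old plants",
    "EST_2": "Root tissue from water stressed plants",
    "EST_3": "Well-watered control leaf",
}

hits = sorted(est for est, text in records.items() if is_drought_library(text))
print(hits)
```

Because substring matching is permissive, hits from such a screen would still need manual review of the library descriptions.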

Results

A total of 511 viridiplantae proteins annotated with the universal stress protein domain (PF00582) from 43 unique taxa (NCBI Taxonomy IDs) were downloaded from UniProt on October 24, 2010 (Table 1). The protein count per taxon ranged from 1 to 88. The protein counts for Liliopsida (monocotyledons), dicotyledons, and other viridiplantae including green algae were 235, 203 and 73, respectively. Furthermore, land plants with at least 50 USP records in UniProt from the Pfam dataset were Oryza sativa subsp. japonica, Arabidopsis thaliana, Populus trichocarpa, Oryza sativa subsp. indica and Zea mays. The green algae genera represented in the dataset were Chlamydomonas, Ostreococcus and Micromonas. The sequence length ranged from 29 (A7Y7Q4) to 1223 (A8HRL3), with 251 unique lengths observed (Supplementary File 1 and Fig. 2). Finally, 39 sequences were annotated as fragments.
Table 1.

Dataset of viridiplantae universal stress proteins entries in UniProt.

| Scientific name | Common name | NCBI Taxonomy ID | Number of UniProt entries | USP domain only |
|---|---|---|---|---|
| Oryza sativa subsp. japonica | Rice | 39947 | 88 | 60 |
| Arabidopsis thaliana | Mouse-ear cress | 3702 | 78 | 53 |
| Populus trichocarpa | Western balsam poplar | 3694 | 59 | 45 |
| Oryza sativa subsp. indica | Rice | 39946 | 52 | 32 |
| Zea mays | Maize | 4577 | 52 | 45 |
| Ricinus communis | Castor bean | 3988 | 43 | 32 |
| Picea sitchensis | Sitka spruce | 3332 | 21 | 21 |
| Physcomitrella patens | Moss | 3218 | 18 | 15 |
| Vitis vinifera | Grape | 29760 | 18 | 11 |
| Micromonas pusilla CCMP1545 | | 564608 | 10 | 10 |
| Medicago truncatula | Barrel medic | 3880 | 8 | 8 |
| Micromonas sp. RCC299 | | 296587 | 8 | 8 |
| Brassica campestris | Field mustard | 3711 | 7 | 7 |
| Chlamydomonas reinhardtii | | 3055 | 6 | 4 |
| Ostreococcus lucimarinus (strain CCE9901) | | 436017 | 5 | 5 |
| Ostreococcus tauri | | 70448 | 4 | 4 |
| Vicia faba | Broad bean | 3906 | 4 | 4 |
| Brachypodium distachyon | Purple false brome | 15368 | 3 | 1 |
| Gossypium arboreum | Tree cotton | 29729 | 2 | 2 |
| Oryza sativa | Rice | 4530 | 2 | 0 |
| Arachis hypogaea | Peanut | 3818 | 1 | 1 |
| Astragalus sinicus | Chinese milk vetch | 47065 | 1 | 1 |
| Brachypodium sylvaticum | False brome | 29664 | 1 | 0 |
| Brassica oleracea var. alboglabra | Chinese kale | 3714 | 1 | 1 |
| Capsicum chinense | Scotch bonnet | 80379 | 1 | 1 |
| Cicer arietinum | Chickpea | 3827 | 1 | 1 |
| Gossypium barbadense | Sea-island cotton | 3634 | 1 | 1 |
| Hordeum bulbosum | Bulbous barley | 4516 | 1 | 1 |
| Hordeum vulgare | Barley | 4513 | 1 | 1 |
| Hordeum vulgare var. distichum | Two-rowed barley | 112509 | 1 | 1 |
| Marchantia polymorpha | Liverwort | 3197 | 1 | 0 |
| Mirabilis jalapa | Garden four-o’clock | 3538 | 1 | 1 |
| Pisum sativum | Garden pea | 3888 | 1 | 1 |
| Populus trichocarpa × Populus deltoides | Black cottonwood × Eastern cottonwood | 3695 | 1 | 1 |
| Potamogeton distinctus | Roundleaf pondweed | 62344 | 1 | 1 |
| Prunus dulcis | Almond | 3755 | 1 | 1 |
| Solanum lycopersicum | Tomato | 4081 | 1 | 1 |
| Solanum tuberosum | Potato | 4113 | 1 | 1 |
| Sonneratia alba | Mangrove Apple | 122812 | 1 | 1 |
| Sonneratia apetala | Mangrove | 122813 | 1 | 1 |
| Sonneratia caseolaris | Mangrove Crabapple | 122814 | 1 | 1 |
| Sonneratia ovata | Mangrove | 122816 | 1 | 1 |
| Triticum aestivum | Wheat | 4565 | 1 | 1 |

Figure 2.

Distribution of sequence length of 511 viridiplantae universal stress proteins.

A total of 17 Pfam protein domains arranged in 17 architectures were associated with the dataset (Table 2 and Fig. 3). Ten of the 17 protein domains each occurred in only one protein; most of these proteins are uncharacterized, as is the case for sequences from Oryza sativa subsp. indica, Oryza sativa subsp. japonica, Vitis vinifera and Zea mays. Two sequences in this subset had names that indicated a possible function: flagellar associated protein from Chlamydomonas reinhardtii and anti-bacterial protein from Solanum tuberosum (potato). As expected, the universal stress protein family (PF00582) domain was present in all the proteins analyzed. The protein kinase domain (PF00069), U-box domain (PF04564) and protein tyrosine kinase (PF07714) were each found in at least 20 proteins (Table 2). The combination of the USP domain and the transmembrane sodium/hydrogen exchanger family domain (PF00999) was observed in 5 proteins: B9S492 (Ricinus communis), A5BEW1 (Vitis vinifera), B9I6U4 (Populus trichocarpa), B9INS2 (Populus trichocarpa) and A9T441 (Physcomitrella patens). A total of 387 protein sequences had only the USP domain. In a subset of 12 sequences having tandem USP domains, 9 sequences were from green algae (Table 3).
Table 2.

Distribution of protein families in viridiplantae universal stress proteins.

| Pfam ID* | Pfam name | Count |
|---|---|---|
| PF00582 | Universal stress protein family | 511 |
| PF00069 | Protein kinase domain | 87 |
| PF04564 | U-box domain | 34 |
| PF07714 | Protein tyrosine kinase | 21 |
| PF00999 | Sodium/hydrogen exchanger family | 5 |
| PF03107 | C1 domain | 2 |
| PF07649 | C1-like domain | 2 |
| PF00651 | BTB/POZ domain | 1 |
| PF01370 | NAD dependent epimerase/dehydratase family | 1 |
| PF02637 | GatB domain | 1 |
| PF03061 | Thioesterase superfamily | 1 |
| PF04147 | Nop14-like family | 1 |
| PF04185 | Phosphoesterase family | 1 |
| PF05139 | Erythromycin esterase | 1 |
| PF05699 | hAT family dimerisation domain | 1 |
| PF08879 | WRC | 1 |
| PF08880 | QLQ | 1 |

Note: *Description of protein domains available at http://pfam.sanger.ac.uk/.

Figure 3.

Protein domain architectures, examples and counts in dataset of plant universal stress proteins. Architecture images obtained from InterPro (www.ebi.ac.uk/interpro), an integrated database of predictive protein “signatures” for protein annotation and classification. The examples are UniProt identifiers with abbreviations for the plant taxa as follows—ORYSI: Oryza sativa subsp. indica (Rice); BRASY: Brachypodium sylvaticum (False brome); ARATH: Arabidopsis thaliana (Mouse-ear cress); ORYSJ: Oryza sativa subsp. japonica (Rice); VITVI: Vitis vinifera (Grape); CHLRE: Chlamydomonas reinhardtii; SOLTU: Solanum tuberosum (Potato); PHYPA: Physcomitrella patens subsp. patens; MAIZE: Zea mays (Maize).

Table 3.

Viridiplantae universal stress proteins with tandem USP domains.

| UniProt Identifier | Organism | Protein Length | Domain Coordinates |
|---|---|---|---|
| A8IXV1 | Brassica campestris | 220 | 13–56, 79–198 |
| B0YQX1 | Gossypium arboreum | 169 | 4–64, 78–169 |
| C1N7W4 | Micromonas pusilla CCMP1545 | 343 | 90–160, 187–322 |
| C1MYP7 | Micromonas pusilla CCMP1545 | 581 | 84–241, 518–569 |
| C1N599 | Micromonas pusilla CCMP1545 | 396 | 48–102, 155–249 |
| C1E4R1 | Micromonas sp. RCC299 | 567 | 295–355, 431–567 |
| C1FHK1 | Micromonas sp. RCC299 | 267 | 12–93, 120–256 |
| A2ZLH5 | Oryza sativa subsp. indica | 320 | 21–155, 167–312 |
| A4RVM8 | Ostreococcus lucimarinus (strain CCE9901) | 274 | 67–164, 195–250 |
| A4RZS6 | Ostreococcus lucimarinus (strain CCE9901) | 401 | 26–215, 237–378 |
| Q015R5 | Ostreococcus tauri | 401 | 173–212, 234–376 |
| Q01BC4 | Ostreococcus tauri | 215 | 23–83, 85–191 |

The UniProtKB database cross-references for each viridiplantae USP entry stored in the XML format were extracted to determine the availability of each database annotation across the dataset of entries. Table 4 shows databases that were used to annotate at least 100 USPs. The complete list of 45 cross-references is available in Supplementary File 1. Cross-references to Gene Ontology, InterPro, NCBI Taxonomy, and Pfam were found in all 511 UniProt entries. To construct a matrix, 40 of the cross-references were selected by removing those present in all entries as well as RefSeq, which had the same number of entries as the Entrez Gene database. The matrix is available in Supplementary File 1.
Table 4.

Selected UniProt cross-reference resources linked to plant universal stress proteins.

| Database | USP UniProt entry count | Database description | Web server |
|---|---|---|---|
| GO | 511 | Gene Ontology | http://www.geneontology.org/ |
| InterPro | 511 | Integrated resource of protein families, domains and functional sites | http://www.ebi.ac.uk/interpro/ |
| NCBI Taxonomy | 511 | NCBI Taxonomy Database | http://www.ncbi.nlm.nih.gov/taxonomy |
| Pfam | 511 | Pfam protein domain database | http://pfam.sanger.ac.uk/ |
| EMBL | 506 | EMBL nucleotide sequence database | http://www.ebi.ac.uk/embl/ |
| ProteinModelPortal | 407 | Protein Model Portal of the PSI-Nature Structural Biology Knowledgebase | http://www.proteinmodelportal.org/ |
| Gene3D | 379 | Gene3D Structural and Functional Annotation of Protein Families | http://gene3d.biochem.ucl.ac.uk/Gene3D/ |
| PubMed | 366 | PubMed | http://www.pubmed.gov |
| DOI | 361 | Digital Object Identifier | http://www.doi.org/ |
| GeneID | 264 | Database of genes from NCBI RefSeq genomes | http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene |
| RefSeq | 264 | NCBI Reference Sequences | http://www.ncbi.nlm.nih.gov/RefSeq/ |
| EnsemblPlants | 236 | EnsemblPlants | http://plants.ensembl.org/ |
| KEGG | 234 | KEGG: Kyoto Encyclopedia of Genes and Genomes | http://www.genome.jp/kegg/ |
| PRINTS | 227 | Protein Motif fingerprint database; a protein domain database | http://umber.sbs.man.ac.uk/dbbrowser/PRINTS/ |
| UniGene | 163 | UniGene gene-oriented nucleotide sequence clusters | http://www.ncbi.nlm.nih.gov/sites/entrez?db=UniGene |
| SMR | 128 | SWISS-MODEL Repository, a database of annotated 3D protein structure models | http://swissmodel.expasy.org/repository/ |
| HOGENOM | 116 | The HOGENOM Database of Homologous Genes from Fully Sequenced Organisms | http://pbil.univ-lyon1.fr/databases/hogenom.php |
| PROSITE | 110 | PROSITE; a protein domain and family database | http://www.expasy.org/prosite/ |
| SUPFAM | 110 | Superfamily database of structural and functional annotation | http://supfam.org |
| ProtClustDB | 108 | Entrez Protein Clusters | http://www.ncbi.nlm.nih.gov/sites/entrez?db=proteinclusters |

Twelve USP sequences were annotated with both the ArrayExpress and Ortholog Matrix Project (OMA) Browser cross-references (Fig. 4). Three Arabidopsis USP genes (Q93W91 [At3g62550], Q9LPF5 [At1g44760] and Q9M328 [At3g53990]) were up-regulated in a drought microarray experiment stored in ArrayExpress and were annotated in the OMA Browser. Box plots of the three genes obtained from ArrayExpress as well as multiple sequence alignments of orthologs are presented in Figure 5. The OMA Browser provides a multiple sequence alignment for the group of orthologs of each protein sequence (Fig. 5). Orthologous sequences were from Oryza sativa, Sorghum bicolor, Populus trichocarpa and Vitis vinifera.
Figure 4.

Visualization of matrix of availability of annotation with 40 external database references for selected plant universal stress proteins in UniProt. Description of column headings is documented in Supplementary File 1.

Notes: Red, presence of database annotation; Green, absence of database annotation.

Figure 5.

Gene expression and protein sequence alignment of Arabidopsis thaliana USPs up-regulated in response to drought. Detailed gene expression data and protein sequence alignments can be obtained from the following web links, respectively, by appending the UniProt protein identifier.

http://www.ebi.ac.uk/gxa/experiment/E-MEXP-1863/.

http://omabrowser.org/cgi-bin/gateway.pl?f=DisplayGroup&p1=.

Visual inspection of the alignments showed that the G-2X-G-9X-G-(S/T) motif for small phosphoryl/ribosyl-binding residues of Adenosine Triphosphate (ATP)49 was present in Q9M328 and Q93W91 but absent in Q9LPF5. Additional homologous sequences for the drought-responsive proteins provided by PLAZA36 and the ClustalW2-generated sequence alignments can be found in Supplementary File 2. The multiple sequence alignment for 16 homologous sequences, including the drought-responsive, ATP-binding motif-containing At3g62550, is presented in Figure 6. The conserved Aspartate (D) residue in position 12 of At3g62550 is known to be involved in adenine binding in ATP-binding USPs.15,50
Figure 6.

Multiple sequence alignment of drought-responsive Arabidopsis thaliana universal stress protein At3g53990 and homologs. The conserved Aspartate (D) residue in position 12 of At3g62550 (marked with +) is known to be involved in adenine binding in ATP-binding USPs.12,44 The region for small phosphoryl/ribosyl-binding residues of ATP is indicated with a series of #. The first two letters of the sequence name correspond to the plant: AL, Arabidopsis lyrata; AT, Arabidopsis thaliana; BD, Brachypodium distachyon; CP, Carica papaya; GM, Glycine max; MD, Malus domestica; ME, Manihot esculenta; MT, Medicago truncatula; OS, Oryza sativa ssp. Japonica; OSAINDICA, Oryza sativa ssp. Indica; PT, Populus trichocarpa; RC, Ricinus communis; SB, Sorghum bicolor; VV, Vitis vinifera.

Viridiplantae universal stress protein gene transcripts derived from drought conditions

A total of 1561 ESTs clustered into 360 singletons and 185 Transcript Assemblies from 137 unique viridiplantae members (82 genera) and annotated with the text “universal stress protein” were obtained from the TIGR Plant Transcript Assemblies (Supplementary File 1). Triticum aestivum (bread wheat), Oryza sativa Japonica Group and Glycine max (soybean) each had at least 100 ESTs annotated as encoding universal stress proteins. The 82 plant genera represented in the universal stress protein gene transcript dataset were grouped according to the number of species or species combinations (Table 5).
Table 5.

Plant genera represented in universal stress protein gene transcripts dataset.

| Genus | Number of species |
|---|---|
| Populus | 9 |
| Helianthus | 6 |
| Citrus | 5 |
| Oryza, Picea | 4 |
| Fragaria, Gossypium, Lactuca, Medicago, Nicotiana, Prunus, Saccharum, Sorghum, Triticum | 3 |
| Agrostis, Apium, Arachis, Centaurea, Euphorbia, Hordeum, Petunia, Phaseolus, Pinus, Pseudotsuga, Rosa, Solanum, Taraxacum, Vitis | 2 |
| Aegilops, Allium, Ananas, Antirrhinum, Avena, Avicennia, Brachypodium, Brassica, Capsicum, Catharanthus, Ceratopteris, Chlamydomonas, Cichorium, Coffea, Curcuma, Cyamopsis, Cycas, Eragrostis, Eucalyptus, Festuca, Ginkgo, Glycine, Hedyotis, Ipomoea, Juglans, Lolium, Lotus, Malus, Manihot, Marchantia, Mesembryanthemum, Mesostigma, Mimulus, Panax, Panicum, Pennisetum, Phalaenopsis, Physcomitrella, Pisum, Rhododendron, Ricinus, Salvia, Secale, Selaginella, Sesamum, Syntrichia, Thellungiella, Theobroma, Vaccinium, Welwitschia, Zamia, Zantedeschia, Zingiber, Zinnia | 1 |

A dataset of 80 simple sequence repeats (SSRs) linked to 20 singletons and 47 transcript assemblies was constructed (Supplementary File 1). A total of 31 types of SSRs (3 mononucleotides; 7 dinucleotides; 16 trinucleotides; 1 tetranucleotide; 3 pentanucleotides; and 1 hexanucleotide) were retrieved (Table 6). The transcript count associated with each SSR was also determined to identify potential unique EST-SSR markers. For example, the dinucleotide TA was unique to singleton DY959747 from Lactuca sativa (lettuce). The suggested primers for the identified EST-SSRs are available from the Plantta website at http://plantta.jcvi.org/.
Table 6.

Simple Sequence Repeats (SSR) linked to universal stress protein gene transcripts.

| SSR | Universal stress protein gene transcripts | Nucleotide count | Transcript count |
|---|---|---|---|
| A | CN463769, TA10796_2711, TA10796_2711, TA33493_4530, TA36088_4113, TA36088_4113, TA70577_4565 | 1 | 7 |
| T | CI395583, EC939973, TA70577_4565, TA72057_3847 | 1 | 4 |
| G | TA70577_4565 | 1 | 1 |
| AG | DW142996, TA1316_69721, TA35794_29760, TA4456_3988 | 2 | 4 |
| AT | TA1967_153471, TA2761_80863, TA3030_71647 | 2 | 3 |
| GA | TA14075_2711, TA3367_309804 | 2 | 2 |
| TG | TA49270_4530, TA49271_4530 | 2 | 2 |
| CT | TA16418_3330 | 2 | 1 |
| GT | TA33493_4530 | 2 | 1 |
| TA | DY959747 | 2 | 1 |
| CAG | AJ610677, AL825196, CJ661413, CJ668094, TA52848_4565, TA53203_4565, TA53303_4565, TA53312_4565, TA53408_4565 | 3 | 9 |
| CGC | CA181583, CI776988, DT694744, TA26627_4558, TA32279_4513, TA33491_4530, TA36096_4547, TA3991_132711 | 3 | 8 |
| CCG | TA2412_4568, TA41380_4513, TA41381_4513, TA41598_4513, TA55332_4565, TA55439_4565 | 3 | 6 |
| AAG | DY942109, DY953240, TA10188_4232, TA1962_3197 | 3 | 4 |
| GAA | DT694744, DY923603, EE657448, TA25103_4558 | 3 | 4 |
| ATA | BQ473543, TA48543_3847, TA48544_3847 | 3 | 3 |
| GGT | CI602544, TA49270_4530, TA49271_4530 | 3 | 3 |
| CGT | TA46082_3847, TA48499_4547 | 3 | 2 |
| GCA | CD220323, TA25103_4558 | 3 | 2 |
| GGC | TA2176_4120, TA51553_4530 | 3 | 2 |
| AGG | TA75441_4530 | 3 | 1 |
| ATT | TA36088_4113 | 3 | 1 |
| CGA | DT694744 | 3 | 1 |
| CGG | TA1497_94328 | 3 | 1 |
| GAG | TA15268_29730 | 3 | 1 |
| TGT | DR575687 | 3 | 1 |
| TTGT | TA3984_73275 | 4 | 1 |
| AAAAT | TA762_4615 | 5 | 1 |
| CACCC | TA32279_4513 | 5 | 1 |
| TTTAA | TA3585_36596 | 5 | 1 |
| GCGGCT | TA41381_4513 | 6 | 1 |

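
The breakdown of the 31 SSR types by repeat-unit length can be reproduced from the motifs listed in Table 6. The short Python sketch below is illustrative only; the motif list is transcribed from Table 6, and the names MOTIFS, UNIT_NAMES and count_by_unit_length are introduced here for the example.

```python
from collections import Counter

# SSR motifs transcribed from Table 6 (one entry per SSR type).
MOTIFS = [
    "A", "T", "G",
    "AG", "AT", "GA", "TG", "CT", "GT", "TA",
    "CAG", "CGC", "CCG", "AAG", "GAA", "ATA", "GGT", "CGT",
    "GCA", "GGC", "AGG", "ATT", "CGA", "CGG", "GAG", "TGT",
    "TTGT",
    "AAAAT", "CACCC", "TTTAA",
    "GCGGCT",
]

UNIT_NAMES = {
    1: "mononucleotide", 2: "dinucleotide", 3: "trinucleotide",
    4: "tetranucleotide", 5: "pentanucleotide", 6: "hexanucleotide",
}


def count_by_unit_length(motifs):
    """Count SSR types by the length of their repeat unit."""
    return Counter(len(motif) for motif in motifs)


counts = count_by_unit_length(MOTIFS)
for length in sorted(counts):
    print(UNIT_NAMES[length], counts[length])
```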
The bioinformatics strategy retrieved 32 drought-responsive ESTs from 7 plant genera: Glycine, Hordeum, Manihot, Medicago, Oryza, Pinus and Triticum (Table 7). Furthermore, the strategy revealed differentially expressed ESTs. In domesticated barley, two ESTs (BM369974 and BQ761388) were expressed in the root, while CD662497 was expressed in the lower leaf epidermis. In rice, two ESTs (CK665047 and CA764828) were expressed in drought-stressed leaf and drought-stressed panicle, respectively. Integrating the datasets on SSRs and drought-responsive ESTs identified three drought-responsive ESTs from Triticum aestivum (BE604157), Glycine max (BM887317) and Pinus pinaster (BX682209) (Table 8). The SSR sequence types were (CAG)4, (ATA)4 and (AT)5, respectively.
Table 7.

Drought-annotated plant Expressed Sequence Tags (ESTs)

| EST | Plant | Source of EST library |
|---|---|---|
| BM886962 | Glycine max (soybean) | * |
| BM887317 | Glycine max (soybean) | * |
| CD662497 | Hordeum vulgare subsp. vulgare (domesticated barley) | Lower leaf epidermis |
| BM369974 | Hordeum vulgare subsp. vulgare (domesticated barley) | Root |
| BQ761388 | Hordeum vulgare subsp. vulgare (domesticated barley) | Root |
| DV442544 | Manihot esculenta (cassava) | ** |
| DV442765 | Manihot esculenta (cassava) | ** |
| DV443464 | Manihot esculenta (cassava) | ** |
| DV444643 | Manihot esculenta (cassava) | ** |
| DV446035 | Manihot esculenta (cassava) | ** |
| DV446427 | Manihot esculenta (cassava) | ** |
| DV447334 | Manihot esculenta (cassava) | ** |
| DV454753 | Manihot esculenta (cassava) | *** |
| DV455089 | Manihot esculenta (cassava) | *** |
| DV455235 | Manihot esculenta (cassava) | *** |
| DV455909 | Manihot esculenta (cassava) | *** |
| DV456031 | Manihot esculenta (cassava) | *** |
| DV456176 | Manihot esculenta (cassava) | *** |
| DV456576 | Manihot esculenta (cassava) | *** |
| DV456911 | Manihot esculenta (cassava) | *** |
| DV457684 | Manihot esculenta (cassava) | *** |
| BE248764 | Medicago truncatula (barrel medic) | Plantlets |
| BF631735 | Medicago truncatula (barrel medic) | Plantlets |
| BF634145 | Medicago truncatula (barrel medic) | Plantlets |
| BF634785 | Medicago truncatula (barrel medic) | Plantlets |
| CK665047 | Oryza sativa Indica Group | Leaf |
| CA764828 | Oryza sativa Indica Group | Panicles |
| BX680935 | Pinus pinaster | Root |
| BX682209 | Pinus pinaster | Root |
| BE604157 | Triticum aestivum (bread wheat) | Leaf |
| BE428779 | Triticum turgidum subsp. durum (durum wheat) | Root |
| BE429106 | Triticum turgidum subsp. durum (durum wheat) | Root |

Notes:

* Leaf, drought stressed, 1 month old plants, greenhouse grown;

** Mature leaf and petiole, young leaf and apical meristem, root, tuber and tuber peel, young leaf and apical meristem midnight;

*** Young leaf and apical meristem, mature leaf and petiole, root, tuber and tuber peel from water stressed plants.

Table 8.

Drought-responsive Expressed Sequence Tags (ESTs) with microsatellites.

| EST | Plant | Tissue | Plantta TA | Plantta SSR ID | SSR | Number of repeats | Transcript length (bp) | Start | End |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BE604157 | Triticum aestivum (bread wheat) | Leaf | TA53312_4565 | 233072 | CAG | 4 | 810 | 205 | 216 |
| BM887317 | Glycine max (soybean) | Leaf, drought stressed, 1 month old plants, greenhouse grown | TA48544_3847 | 815126 | ATA | 4 | 889 | 544 | 555 |
| BX682209 | Pinus pinaster (maritime pine) | Root | TA3030_71647 | 751586 | AT | 5 | 414 | 199 | 208 |
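
The integration step behind Table 8 can be read as a join between the drought-annotated EST accessions and the EST-SSR records. The following Python fragment is a minimal sketch of that idea, not the authors' pipeline. The function names and the record layout are assumptions made for illustration, the accession "HYPOTHETICAL1" is a placeholder that does not occur in the paper, and the coordinates are assumed to be 1-based and inclusive, which agrees with the three rows of Table 8.

```python
# Subset of the drought-annotated EST accessions listed in Table 7.
drought_ests = {"BM886962", "BM887317", "BE604157", "BX680935", "BX682209"}

# EST-SSR records as (EST, motif, repeats, start, end).
# The first three rows are transcribed from Table 8; the last row is a
# hypothetical placeholder (not from the paper) so the filter has
# something to exclude.
ssr_records = [
    ("BE604157", "CAG", 4, 205, 216),
    ("BM887317", "ATA", 4, 544, 555),
    ("BX682209", "AT", 5, 199, 208),
    ("HYPOTHETICAL1", "GA", 6, 10, 21),
]


def span_is_consistent(motif, repeats, start, end):
    """True when the 1-based inclusive span equals motif length x repeats."""
    return end - start + 1 == len(motif) * repeats


def drought_est_ssrs(ests, records):
    """Keep only SSR records whose EST accession is drought-annotated."""
    return [rec for rec in records if rec[0] in ests]


hits = drought_est_ssrs(drought_ests, ssr_records)
for est, motif, repeats, start, end in hits:
    ok = span_is_consistent(motif, repeats, start, end)
    print(f"{est}\t({motif}){repeats}\t{start}-{end}\tconsistent={ok}")
```

The span check is a cheap sanity test on the extracted coordinates: for each Table 8 row, the distance from Start to End matches the motif length multiplied by the number of repeats.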

Discussion

Plants are continuously exposed to abiotic and biotic stresses that require adaptation for survival. The availability of genomic sequences from a variety of viridiplantae has facilitated the dissection of the molecular, cellular and developmental responses to environmental stresses including drought.37 Our investigation demonstrates the benefits of integrating data on universal stress proteins from comprehensive protein and transcript databases. The value-added and prioritized datasets produced present new opportunities to better investigate the function of universal stress proteins from diverse plants. In keeping with the focus of the investigation, the protein and gene transcript datasets are discussed in the context of response to drought and salt stress. We have retrieved, mined and integrated comprehensive functional annotation data on 511 universal stress protein sequences and 1561 EST sequences from the viridiplantae. A total of 161 plants, each with a unique NCBI Taxonomy Identifier, were associated with the sequences. Thus, we have provided a catalog of proteins and gene transcripts from model and non-model plant species, including those of importance in agriculture, ecology, industry and alternative energy. A catalog limited to Arabidopsis universal stress proteins has been published.13 The cross-database references available in our investigation offer other researchers a “one-stop shop” for sequence information on viridiplantae universal stress proteins. The bioinformatics strategy extracted functional annotation data from comprehensive public domain protein and gene transcript databases.
The Pfam protein family database36 served as the source of protein sequences, whose functional annotation data in the UniProt protein resource26 were extracted and integrated with other specialized databases, including those storing data on gene expression38 and protein sequence evolution.34 We also extracted functional annotation data from the Plantta EST resource, since ESTs are a source of genomic information, especially for plants without complete genome sequencing projects. The bioinformatics approach presented could be useful for researchers interested in other protein families. The particular function of a protein depends on its combination of domains. In general, the presence of the USP domain may enable the function of the other domain to be expressed under stress conditions. The USP domain appears as a single domain in small USP proteins (∼14–15 kDa), as two domains arranged in tandem in larger USP proteins (∼30 kDa), or as one or two USP domains together with other functional domains.9,13 Our analysis extracted and organized the domain combinations present in the 511 plant USPs, thereby providing function-categorized subsets of the dataset. The categories can be investigated for shared function and regulation.
Protein phosphorylation by kinases is a known pathway utilized by plants to respond to osmotic stress.52,53 Five proteins were annotated with the sodium/hydrogen exchanger family domain (PF00999), a domain that transports sodium ions out of the cell or organelles in exchange for hydrogen ions to prevent toxic accumulation of sodium ions.54,55 The Arabidopsis gene encoding the Na+/H+ exchanger termed salt overly sensitive (SOS1) is an important determinant of salt tolerance.39 The list of uncharacterized proteins with both the USP and Na_H_Exchanger domains included protein A9T441 from the moss Physcomitrella patens, a member of the oldest clade of land plants40 that is highly tolerant of hypersalinity and severe water limitation.41 The 18 P. patens USPs in the dataset warrant further investigation to understand the evolution of USPs from small land plants to higher plants over 450 million years. The recognition of P. patens as a versatile tool for plant functional genomics could accelerate additional research of benefit to higher plants of importance in agriculture (eg, grapevine), industry (eg, castor plant) and cellulosic biofuels (eg, poplar). Nine of the 12 protein sequences with tandem USP domains were from green algae. There are currently a limited number of reports on functional characterization of proteins with tandem USP domains.10,42,43 In Escherichia coli, mutants of UspE, which contains tandem USP domains, were unable to form cell-cell interactions and cell aggregates in stationary phase. In Mycobacterium tuberculosis, in which 8 of the 10 USPs have tandem domains, Rv2623 has growth-regulating capability linked to ATP-binding.42 A recent investigation observed a higher degree of sequence identity between tandem domains in prokaryotes than in eukaryotes.44 The dataset analyzed in that investigation did not include tandem USP domains. A starting point for characterization of the tandem USP domains of plants could be to determine the sequence conservation between the domains.
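
As a rough illustration of what "sequence conservation between the domains" could mean in practice, the sketch below computes percent identity between two pre-aligned domain sequences. It is a minimal example under stated assumptions and not a method described in the paper: the two sequences are made-up placeholders rather than real USP domains, the alignment is assumed to have been produced beforehand by any pairwise aligner, and identity is counted over gap-free columns only, which is one of several conventions in use.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical, already-aligned fragments standing in for the first and
# second USP domain of a tandem-domain protein (not real sequences).
domain_1 = "MKV-LVAVDGSE"
domain_2 = "MRVALVAIDGS-"

print(round(percent_identity(domain_1, domain_2), 1))
```

Applied to each tandem-domain protein in turn, a score of this kind would allow the plant and green algal proteins to be ranked by how similar their two domains are.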
Cross-referencing of specialized databases to a protein sequence entry in UniProtKB provides additional functional annotation that can help accelerate selection of plant USPs for characterization. UniProtKB provides links to at least 126 specialized resources, including plant bioinformatics databases such as The Arabidopsis Information Resource (TAIR),45 Gramene,46 and EnsemblPlants.47 We have integrated the available database cross-references to provide a visual overview of databases across the viridiplantae USPs analyzed. The utility of such a view was demonstrated on a subset of proteins that were annotated with ArrayExpress45 and the Orthologous MAtrix (OMA) Browser.34 This view enabled us to easily identify Q9SW11 (U-box domain-containing protein 35; At4g25160, PUB35) as an enzyme based on the presence of an Enzyme Commission (EC) number (Fig. 4: Column 4, Row 10). The U-box domain, which functions in regulated protein ubiquitination and degradation, is a modified RING-finger domain that lacks metal-binding ability.48 Comparative structural and functional assays could reveal the interactions of the USP domain and the enzyme domains present in Q9SW11. Orthologous drought-responsive universal stress proteins could be candidates to engineer desired phenotypes in plants. Our analyses identified three Arabidopsis proteins (Fig. 5) and their orthologs in Oryza sativa, Sorghum bicolor, Populus trichocarpa and Vitis vinifera. Q9M328 and Q93W91 and their homologs could be regulated by ATP, based on the presence of an ATP-binding motif (Fig. 6).
Expressed Sequence Tags generated from stress-challenged plant tissues have been used as high quality transcripts to discover genes, identify candidate stress-responsive genes/transcripts and identify functional markers such as genic microsatellites and single nucleotide polymorphisms.49–51 The effects of SSR type and number of repeats on gene regulation, transcription and protein function are poorly understood in plants compared with human or animal systems.51 In this article we report automatic extraction of information on simple sequence repeats (SSRs) associated with 1561 ESTs in the Plantta resource.27 Our analysis identified candidate USP gene transcripts in multiple plants (Supplementary File 1 and Table 5) and organized the SSRs into types (Table 6), drought-annotated USP ESTs (Table 7) and USP EST-SSRs from drought-stressed tissues (Table 8). The majority (49 of 80) of the USP EST-SSRs were of the trinucleotide type, which has been reported to be the most abundant in rice, wheat and barley52,53 as well as peanut54 and citrus.55 Altogether, our analyses provide a comprehensive collection of USP ESTs, including those responsive to drought. We have clustered the plant genera based on the number of species to facilitate investigation of the EST-SSRs and EST-Single Nucleotide Polymorphisms (SNPs) in USP genes for comparative mapping, transferability, genetic diversity and plant improvement.
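
The SSR annotations used here were extracted from the Plantta resource rather than computed anew, but the notion of SSR type and repeat number can be made concrete with a small regular-expression scan. The sketch below is an illustration only: the thresholds (at least 4 repeats, motifs of 1 to 6 bases), the function names and the example sequence are all assumptions, and Plantta's own detection criteria may differ.

```python
import re


def find_ssrs(seq, min_repeats=4, max_unit=6):
    """Return (motif, repeats, start, end) tuples with 1-based inclusive coordinates.

    The lazy quantifier makes the regex try the shortest repeat unit first,
    so a run such as ATATATAT is reported as (AT)4 rather than (ATAT)2.
    """
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        repeats = len(m.group(0)) // len(motif)
        hits.append((motif, repeats, m.start() + 1, m.end()))
    return hits


def ssr_class(motif):
    """Name the SSR type by motif length (mono- to hexanucleotide)."""
    names = {1: "mono", 2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa"}
    return names[len(motif)] + "nucleotide"


# Made-up sequence containing a (CAG)4 and an (AT)5 run; not a real EST.
example = "TT" + "CAG" * 4 + "TC" + "AT" * 5 + "GC"
for motif, repeats, start, end in find_ssrs(example):
    print(f"({motif}){repeats}", ssr_class(motif), start, end)
```

Counting the classes returned by `ssr_class` over a set of transcripts would give a tally of the same kind as the trinucleotide majority reported above.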

Conclusions

The molecular mechanisms by which genes encoding the universal stress protein domain confer on plants the ability to respond and adapt to environmental changes are not well defined. We have computationally retrieved, mined and integrated functional annotations on proteins and gene transcripts that encode the universal stress protein domain. The datasets from cross-database mining provide organized resources for the characterization of USP genes as useful targets for engineering plant varieties tolerant to unfavorable environmental conditions.