Raphael D Isokpehi, Shaneka S Simmons, Hari H P Cohly, Stephen I N Ekunwe, Gregorio B Begonia, Wellington K Ayensu.
Abstract
Genes encoding proteins that contain the universal stress protein (USP) domain are known to provide bacteria, archaea, fungi, protozoa, and plants with the ability to respond to a plethora of environmental stresses. In plants specifically, drought tolerance is a desirable phenotype. However, few focused and organized functional genomic datasets on drought-responsive plant USP genes exist to facilitate their characterization. The overall objective of the investigation was to identify diverse plant universal stress proteins and Expressed Sequence Tags (ESTs) responsive to water-deficit stress. We hypothesize that cross-database mining of functional annotations in protein and gene transcript bioinformatics resources would help identify candidate drought-responsive universal stress proteins and transcripts from multiple plant species. Our bioinformatics approach retrieved, mined and integrated comprehensive functional annotation data on 511 protein and 1561 EST sequences from 161 viridiplantae taxa. A total of 32 drought-responsive ESTs from 7 plant genera (Glycine, Hordeum, Manihot, Medicago, Oryza, Pinus and Triticum) were identified. Two Arabidopsis USP genes, At3g62550 and At3g53990, that encode an ATP-binding motif were up-regulated in a drought microarray dataset. Further, a dataset of 80 simple sequence repeats (SSRs) linked to 20 singletons and 47 transcript assemblies was constructed. Integrating the datasets on SSRs and drought-responsive ESTs identified three drought-responsive ESTs from bread wheat (BE604157), soybean (BM887317) and maritime pine (BX682209). The SSR sequence types were CAG, ATA and AT, respectively. The datasets from cross-database mining provide organized resources for the characterization of USP genes as useful targets for engineering plant varieties tolerant to unfavorable environmental conditions.
Keywords: Pfam; Uniprot; drought; expressed sequence tags; microsatellite; plants; salinity; simple sequence repeats; universal stress protein domain; viridiplantae
Year: 2011 PMID: 21423406 PMCID: PMC3045048 DOI: 10.4137/BBI.S6061
Source DB: PubMed Journal: Bioinform Biol Insights ISSN: 1177-9322
Figure 1. Flowchart for constructing dataset of viridiplantae universal stress proteins.
Dataset of viridiplantae universal stress protein entries in UniProt.
| Rice | 39947 | 88 | 60 | |
| Mouse-ear cress | 3702 | 78 | 53 | |
| Western balsam poplar | 3694 | 59 | 45 | |
| Rice | 39946 | 52 | 32 | |
| Maize | 4577 | 52 | 45 | |
| Castor bean | 3988 | 43 | 32 | |
| Sitka spruce | 3332 | 21 | 21 | |
| Moss | 3218 | 18 | 15 | |
| Grape | 29760 | 18 | 11 | |
| | 564608 | 10 | 10 | |
| Barrel medic | 3880 | 8 | 8 | |
| | 296587 | 8 | 8 | |
| Field mustard | 3711 | 7 | 7 | |
| | 3055 | 6 | 4 | |
| | 436017 | 5 | 5 | |
| | 70448 | 4 | 4 | |
| Broad bean | 3906 | 4 | 4 | |
| Purple false brome | 15368 | 3 | 1 | |
| Tree cotton | 29729 | 2 | 2 | |
| Rice | 4530 | 2 | 0 | |
| Peanut | 3818 | 1 | 1 | |
| Chinese milk vetch | 47065 | 1 | 1 | |
| False brome | 29664 | 1 | 0 | |
| Chinese kale | 3714 | 1 | 1 | |
| Scotch bonnet | 80379 | 1 | 1 | |
| Chickpea | 3827 | 1 | 1 | |
| Sea-island cotton | 3634 | 1 | 1 | |
| Bulbous barley | 4516 | 1 | 1 | |
| Barley | 4513 | 1 | 1 | |
| Two-rowed barley | 112509 | 1 | 1 | |
| Liverwort | 3197 | 1 | 0 | |
| Garden four-o’clock | 3538 | 1 | 1 | |
| Garden pea | 3888 | 1 | 1 | |
| Black cottonwood x Eastern cottonwood | 3695 | 1 | 1 | |
| Roundleaf pondweed | 62344 | 1 | 1 | |
| Almond | 3755 | 1 | 1 | |
| Tomato | 4081 | 1 | 1 | |
| Potato | 4113 | 1 | 1 | |
| Mangrove Apple | 122812 | 1 | 1 | |
| Mangrove | 122813 | 1 | 1 | |
| Mangrove Crabapple | 122814 | 1 | 1 | |
| Mangrove | 122816 | 1 | 1 | |
| Wheat | 4565 | 1 | 1 |
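The per-taxon counts in this table can be checked against the 511 proteins reported in the abstract. The sketch below is only a consistency check: the counts are transcribed by hand from the third column of the rows above, and reading that column as "number of UniProt entries per taxon" is an assumption, since the column headers are not shown.

```python
# Counts transcribed from the third column of the table above.
# Assumption: this column is the number of UniProt USP entries per taxon.
counts = [88, 78, 59, 52, 52, 43, 21, 18, 18, 10,
          8, 8, 7, 6, 5, 4, 4, 3, 2, 2] + [1] * 23  # 23 taxa with a single entry

total = sum(counts)
print(f"{len(counts)} taxa, {total} entries")
```

Under that reading, the total should equal the dataset size given in the abstract.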
Figure 2. Distribution of sequence length of 511 viridiplantae universal stress proteins.
Distribution of protein families in viridiplantae universal stress proteins.
| PF00582 | Universal stress protein family | 511 |
| PF00069 | Protein kinase domain | 87 |
| PF04564 | U-box domain | 34 |
| PF07714 | Protein tyrosine kinase | 21 |
| PF00999 | Sodium/hydrogen exchanger family | 5 |
| PF03107 | C1 domain | 2 |
| PF07649 | C1-like domain | 2 |
| PF00651 | BTB/POZ domain | 1 |
| PF01370 | NAD dependent epimerase/dehydratase family | 1 |
| PF02637 | GatB domain | 1 |
| PF03061 | Thioesterase superfamily | 1 |
| PF04147 | Nop14-like family | 1 |
| PF04185 | Phosphoesterase family | 1 |
| PF05139 | Erythromycin esterase | 1 |
| PF05699 | hAT family dimerisation domain | 1 |
| PF08879 | WRC | 1 |
| PF08880 | QLQ | 1 |
Note:
Description of protein domains available at http://pfam.sanger.ac.uk/.
Figure 3. Protein domain architectures, examples and counts in dataset of plant universal stress proteins. Architecture images obtained from InterPro (www.ebi.ac.uk/interpro), an integrated database of predictive protein “signatures” for protein annotation and classification. The examples are UniProt identifiers with abbreviations for the plant taxa as follows. ORYSI: Oryza sativa subsp. indica (Rice); BRASY: Brachypodium sylvaticum (False brome); ARATH: Arabidopsis thaliana (Mouse-ear cress); ORYSJ: Oryza sativa subsp. japonica (Rice); VITVI: Vitis vinifera (Grape); CHLRE: Chlamydomonas reinhardtii; SOLTU: Solanum tuberosum (Potato); PHYPA: Physcomitrella patens subsp. patens; MAIZE: Zea mays (Maize).
Viridiplantae universal stress proteins with tandem USP domains.
| A8IXV1 | 220 | 13–56 79–198 | |
| B0YQX1 | 169 | 4–64 78–169 | |
| C1N7W4 | 343 | 90–160 187–322 | |
| C1MYP7 | 581 | 84–241 518–569 | |
| C1N599 | 396 | 48–102 155–249 | |
| C1E4R1 | 567 | 295–355 431–567 | |
| C1FHK1 | 267 | 12–93 120–256 | |
| A2ZLH5 | 320 | 21–155 167–312 | |
| A4RVM8 | 274 | 67–164 195–250 | |
| A4RZS6 | 401 | 26–215 237–378 | |
| Q015R5 | 401 | 173–212 234–376 | |
| Q01BC4 | 215 | 23–83 85–191 |
Selected UniProt cross-reference resources linked to plant universal stress proteins.
| GO | 511 | Gene Ontology | |
| InterPro | 511 | Integrated resource of protein families, domains and functional sites | |
| NCBI Taxonomy | 511 | NCBI Taxonomy Database | |
| Pfam | 511 | Pfam protein domain database | |
| EMBL | 506 | EMBL nucleotide sequence database | |
| ProteinModelPortal | 407 | Protein Model Portal of the PSI-Nature Structural Biology Knowledgebase | |
| Gene3D | 379 | Gene3D Structural and Functional Annotation of Protein Families | |
| PubMed | 366 | PubMed | |
| DOI | 361 | Digital Object Identifier | |
| GeneID | 264 | Database of genes from NCBI RefSeq genomes | |
| RefSeq | 264 | NCBI Reference Sequences | |
| EnsemblPlants | 236 | EnsemblPlants | |
| KEGG | 234 | KEGG: Kyoto Encyclopedia of Genes and Genomes | |
| PRINTS | 227 | Protein Motif fingerprint database; a protein domain database | |
| UniGene | 163 | UniGene gene-oriented nucleotide sequence clusters | |
| SMR | 128 | SWISS-MODEL Repository—a database of annotated 3D protein structure models | |
| HOGENOM | 116 | The HOGENOM Database of Homologous Genes from Fully Sequenced Organisms | |
| PROSITE | 110 | PROSITE; a protein domain and family database | |
| SUPFAM | 110 | Superfamily database of structural and functional annotation | |
| ProtClustDB | 108 | Entrez Protein Clusters |
Figure 4. Visualization of matrix of availability of annotation with 40 external database references for selected plant universal stress proteins in UniProt. Description of column headings is documented in Supplementary File 1.
Notes: Red, presence of database annotation; Green, absence of database annotation.
Figure 5. Gene expression and protein sequence alignment of Arabidopsis thaliana USPs up-regulated in response to drought. Detailed gene expression data and protein sequence alignments can be obtained from the following web links, respectively, by replacing the
http://www.ebi.ac.uk/gxa/experiment/E-MEXP-1863/
http://omabrowser.org/cgi-bin/gateway.pl?f=DisplayGroup&p1=
Figure 6. Multiple sequence alignment of drought-responsive Arabidopsis thaliana universal stress protein At3g53990 and homologs. The conserved aspartate (D) residue in position 12 of At3g62550 (marked with +) is known to be involved in adenine binding in ATP-binding USPs [12,44]. The region for small phosphoryl/ribosyl-binding residues of ATP is indicated with a series of #. The first two letters of the sequence name correspond to the plant: AL, Arabidopsis lyrata; AT, Arabidopsis thaliana; BD, Brachypodium distachyon; CP, Carica papaya; GM, Glycine max; MD, Malus domestica; ME, Manihot esculenta; MT, Medicago truncatula; OS, Oryza sativa ssp. japonica; OSAINDICA, Oryza sativa ssp. indica; PT, Populus trichocarpa; RC, Ricinus communis; SB, Sorghum bicolor; VV, Vitis vinifera.
Plant genera represented in universal stress protein gene transcripts dataset.
| 9 | |
| 6 | |
| 5 | |
| 4 | |
| 3 | |
| 2 | |
| 1 |
Simple Sequence Repeats (SSR) linked to universal stress protein gene transcripts.
| A | CN463769 TA10796_2711 TA10796_2711 TA33493_4530 TA36088_4113 TA36088_4113 TA70577_4565 | 1 | 7 |
| T | CI395583 EC939973 TA70577_4565 TA72057_3847 | 1 | 4 |
| G | TA70577_4565 | 1 | 1 |
| AG | DW142996 TA1316_69721 TA35794_29760 TA4456_3988 | 2 | 4 |
| AT | TA1967_153471 TA2761_80863 TA3030_71647 | 2 | 3 |
| GA | TA14075_2711 TA3367_309804 | 2 | 2 |
| TG | TA49270_4530 TA49271_4530 | 2 | 2 |
| CT | TA16418_3330 | 2 | 1 |
| GT | TA33493_4530 | 2 | 1 |
| TA | DY959747 | 2 | 1 |
| CAG | AJ610677 AL825196 CJ661413 CJ668094 TA52848_4565 TA53203_4565 TA53303_4565 TA53312_4565 TA53408_4565 | 3 | 9 |
| CGC | CA181583 CI776988 DT694744 TA26627_4558 TA32279_4513 TA33491_4530 TA36096_4547 TA3991_132711 | 3 | 8 |
| CCG | TA2412_4568 TA41380_4513 TA41381_4513 TA41598_4513 TA55332_4565 TA55439_4565 | 3 | 6 |
| AAG | DY942109 DY953240 TA10188_4232 TA1962_3197 | 3 | 4 |
| GAA | DT694744 DY923603 EE657448 TA25103_4558 | 3 | 4 |
| ATA | BQ473543 TA48543_3847 TA48544_3847 | 3 | 3 |
| GGT | CI602544 TA49270_4530 TA49271_4530 | 3 | 3 |
| CGT | TA46082_3847 TA48499_4547 | 3 | 2 |
| GCA | CD220323 TA25103_4558 | 3 | 2 |
| GGC | TA2176_4120 TA51553_4530 | 3 | 2 |
| AGG | TA75441_4530 | 3 | 1 |
| ATT | TA36088_4113 | 3 | 1 |
| CGA | DT694744 | 3 | 1 |
| CGG | TA1497_94328 | 3 | 1 |
| GAG | TA15268_29730 | 3 | 1 |
| TGT | DR575687 | 3 | 1 |
| TTGT | TA3984_73275 | 4 | 1 |
| AAAAT | TA762_4615 | 5 | 1 |
| CACCC | TA32279_4513 | 5 | 1 |
| TTTAA | TA3585_36596 | 5 | 1 |
| GCGGCT | TA41381_4513 | 6 | 1 |
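For readers unfamiliar with how SSRs such as those above are typically located, the following is a minimal regex-based sketch of perfect tandem-repeat detection. It is not the tool or the parameters used to build this dataset: the minimum repeat counts in `MIN_REPEATS` are illustrative assumptions, the function names are hypothetical, and the toy sequence is made up for the example.

```python
import re

# Illustrative thresholds only (motif length -> minimum number of repeats).
# These are assumptions, not the settings used for the dataset above.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}

def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(n % k == 0 and motif[:k] * (n // k) == motif for k in range(1, n))

def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (motif, repeats, start, end) tuples, 1-based inclusive coordinates."""
    seq = seq.upper()
    hits = []
    for k, n_min in min_repeats.items():
        # A k-mer followed by at least (n_min - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n_min - 1))
        pos = 0
        while True:
            m = pattern.search(seq, pos)
            if m is None:
                break
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((motif, len(m.group(0)) // k, m.start() + 1, m.end()))
                pos = m.end()
            else:
                # e.g. 'AA' as a dinucleotide: skip it and retry one base later.
                pos = m.start() + 1
    return sorted(hits, key=lambda h: h[2])

# Made-up toy sequence containing a (CAG)4 and an (AT)5 repeat.
toy = "GGCT" + "CAG" * 4 + "TTGACC" + "AT" * 5 + "GCA"
for hit in find_ssrs(toy):
    print(hit)
```

Restricting hits to primitive motifs avoids reporting the same repeat at several motif lengths, for example an AT run reported again as ATAT.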
Drought-annotated plant Expressed Sequence Tags (ESTs)
| BM886962 | ||
| BM887317 | ||
| CD662497 | Lower leaf epidermis | |
| BM369974 | Root | |
| BQ761388 | Root | |
| DV442544 | ||
| DV442765 | ||
| DV443464 | ||
| DV444643 | ||
| DV446035 | ||
| DV446427 | ||
| DV447334 | ||
| DV454753 | ||
| DV455089 | ||
| DV455235 | ||
| DV455909 | ||
| DV456031 | ||
| DV456176 | ||
| DV456576 | ||
| DV456911 | ||
| DV457684 | ||
| BE248764 | Plantlets | |
| BF631735 | Plantlets | |
| BF634145 | Plantlets | |
| BF634785 | Plantlets | |
| CK665047 | Leaf | |
| CA764828 | Panicles | |
| BX680935 | Root | |
| BX682209 | Root | |
| BE604157 | Leaf | |
| BE428779 | Root | |
| BE429106 | Root |
Notes:
Leaf, drought stressed, 1 month old plants, greenhouse grown;
Mature leaf and petiole, young leaf and apical meristem, root, tuber and tuber peel, young leaf and apical meristem midnight;
Young leaf and apical meristem, mature leaf and petiole, root, tuber and tuber peel from water stressed plants.
Drought-responsive Expressed Sequence Tags (ESTs) with microsatellites.
| BE604157 | Leaf | TA53312_4565 | 233072 | CAG | 4 | 810 | 205 | 216 | |
| BM887317 | Leaf, drought stressed, 1 month old plants, greenhouse grown | TA48544_3847 | 815126 | ATA | 4 | 889 | 544 | 555 | |
| BX682209 | Root | TA3030_71647 | 751586 | AT | 5 | 414 | 199 | 208 |
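The SSR coordinates in this table can be cross-checked with simple arithmetic: the span from start to end should equal motif length times repeat count. The sketch below assumes the column after the motif is the repeat count and that the last two columns are 1-based inclusive start and end positions of the SSR; these readings are interpretations, since the column headers are not shown.

```python
# (EST accession, SSR motif, repeat count, SSR start, SSR end), transcribed from the table above.
# Assumption: start and end are 1-based inclusive positions within the transcript.
records = [
    ("BE604157", "CAG", 4, 205, 216),
    ("BM887317", "ATA", 4, 544, 555),
    ("BX682209", "AT", 5, 199, 208),
]

consistent = []
for acc, motif, repeats, start, end in records:
    expected = len(motif) * repeats  # SSR length implied by motif and repeat count
    observed = end - start + 1       # SSR length implied by the coordinates
    consistent.append(expected == observed)
    print(acc, motif, expected, observed, expected == observed)
```

For example, (CAG)4 implies 3 × 4 = 12 nucleotides, which matches positions 205 to 216.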