A database of annotated tentative orthologs from crop abiotic stress transcripts.

Jayashree Balaji, Jonathan H Crouch, Prasad V N S Petite, David A Hoisington.

Abstract

A minimal requirement for initiating a comparative genomics study of plant responses to abiotic stresses is a dataset of orthologous sequences. The availability of a large amount of sequence information, including that derived from stress cDNA libraries, allows for the identification of stress-related genes and of orthologs associated with the stress response. Orthologous sequences serve as tools to explore genes and their relationships across species. For this purpose, ESTs from stress cDNA libraries across 16 crop species, including 6 important cereal crops and 10 dicots, were systematically collated and subjected to bioinformatics analyses such as clustering, grouping into tentative orthologous sets, identification of protein motifs/patterns in the predicted protein sequences, and annotation with stress conditions, tissue/library source and putative function. All data are available to the scientific community at http://intranet.icrisat.org/gt1/tog/homepage.htm. We believe that the availability of annotated plant abiotic stress ortholog sets will be a valuable resource for researchers studying the biology of environmental stresses in plant systems, molecular evolution and genomics.

Year:  2006        PMID: 17597893      PMCID: PMC1891691     

Source DB:  PubMed          Journal:  Bioinformation        ISSN: 0973-2063


Background

Integrated approaches to the study of abiotic stress response in plants are important, especially since drought and salinity stress are primary causes of crop losses worldwide. The study of stress response pathways includes analysis of information from stress-related metabolic and physiological changes, comparative genomics, gene expression studies, and structural and functional data on stress proteins. Plants have stress-specific adaptive responses as well as responses that protect them from more than one environmental stress. Multiple stress perception and signaling pathways exist; some are specific, while others may cross-talk at various steps. [1,2] Identification of stress-related genes is an important aspect of the study of plant response to abiotic stress. A minimal requirement for initiating a comparative genomics study across abiotic stress conditions is a dataset of orthologs. The availability of a large amount of sequence information, especially that derived from cDNA libraries generated in response to abiotic stress, allows a putative list of candidate genes to be generated using the ortholog approach. Orthologs are genes in different species that have evolved from a common ancestral gene by speciation and that generally retain an equivalent or similar function in the course of evolution. A high degree of sequence conservation across species and the availability of partial gene sequence data led to the development of comprehensive orthologous gene resources such as the TOGA (tentative orthologous gene alignments from EST datasets) [3] and COG (clusters of orthologous groups of proteins) databases. [4,5] The TOGA database currently contains 25 plant species, while fewer plant species are represented in the COG database.
We report here the generation and availability of annotated tentative orthologous datasets for 16 economically important crop species that are vulnerable to the abiotic stresses of heat, dehydration, cold and salt, and for which ESTs generated from stress cDNA libraries are available in the public domain. The aims in building the dataset are to provide users with a catalogue of annotated sequences associated with abiotic stress, to distinguish elements common to all conditions from those that differ, to identify categories of functions that are affected under stress conditions, and to provide users with a list of the genes that have the highest representation across tentative orthologous sets.

Methodology

Dataset

Sequences derived from cDNA libraries generated from tissues subjected to heat, dehydration, salt and cold stress in sixteen crop species were used to construct the database. The sequences were downloaded from TIGR [6] and NCBI [7] in 2003 and updated in June 2005.

Bioinformatics analysis

The sequences were assembled crop-wise into contigs and singletons using a parallelized version of CAP3 [8] on a Paracel HPC. To construct tentative ortholog sets, each species-specific dataset consisting of contigs and singletons was searched against every other dataset using Blastn (standalone BLAST version 2.2.6). Sequences that were reciprocal best hits (RBH) of one another formed a tentative ortholog set. An additional constraint was that each set must comprise sequences from at least three crop species. Scripts were written in Visual Basic to search and assemble tentative ortholog sets after the Blast searches. Sequences were searched for microsatellite markers using the tool SSRIT. [9] Sequences in each dataset were translated and searched for protein motifs/patterns against the Prosite database of protein families and domains. All datasets were searched against the species-specific plant repeats database [10], and hits with an e-value < 1e-5 and an alignment covering over 30% of the length of the query sequence were annotated as repeats. Tentative functional descriptions for the remainder of the sequences were retrieved from each of the databases. These annotations were classified under the 28 functional categories described in the MIPS Functional Catalogue (FunCat). [11] Scripts written in Java were used to carry out this classification. Multiple sequence alignments were built using ClustalW (version 1.83).
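
The reciprocal best-hit step with the three-species constraint can be illustrated with a minimal Python sketch. This is not the authors' Visual Basic code. It assumes that Blast hits have already been parsed into (query, subject, bit score) tuples, that sequence ids follow a hypothetical "species|sequence" naming scheme, and that reciprocal pairs are merged into sets by simple single-linkage grouping, a choice the paper does not specify.

```python
from collections import defaultdict


def best_hits(hits):
    """Map (query, subject species) to the highest-scoring subject.

    `hits` holds (query_id, subject_id, bit_score) tuples; ids use the
    hypothetical form "species|sequence". Same-species hits are ignored.
    """
    best = {}
    for query, subject, score in hits:
        q_species = query.split("|")[0]
        s_species = subject.split("|")[0]
        if q_species == s_species:
            continue
        key = (query, s_species)
        if key not in best or score > best[key][1]:
            best[key] = (subject, score)
    return {key: subject for key, (subject, _) in best.items()}


def reciprocal_pairs(hits):
    """Return unordered pairs of sequences that are each other's best hit."""
    best = best_hits(hits)
    pairs = set()
    for (query, _species), subject in best.items():
        if best.get((subject, query.split("|")[0])) == query:
            pairs.add(frozenset((query, subject)))
    return pairs


def ortholog_sets(hits, min_species=3):
    """Group reciprocal pairs (single linkage, an assumption) and keep
    only groups that span at least `min_species` species."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for pair in reciprocal_pairs(hits):
        a, b = tuple(pair)
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for seq in list(parent):
        groups[find(seq)].add(seq)

    return [
        sorted(members)
        for members in groups.values()
        if len({m.split("|")[0] for m in members}) >= min_species
    ]


# Made-up hits: one three-species set and one two-species pair.
example_hits = [
    ("wheat|c1", "rice|c7", 200), ("rice|c7", "wheat|c1", 198),
    ("wheat|c1", "maize|c3", 180), ("maize|c3", "wheat|c1", 181),
    ("rice|c7", "maize|c3", 150), ("maize|c3", "rice|c7", 150),
    ("wheat|c1", "rice|c8", 90),
    ("barley|c2", "wheat|c9", 120), ("wheat|c9", "barley|c2", 119),
]
print(ortholog_sets(example_hits))
```

In this toy input the barley/wheat pair is a valid reciprocal best hit but is dropped because it covers only two species.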

Database and GUI

The data are housed in a relational database on Microsoft SQL Server 2000. The database GUI was developed using Active Server Pages (ASP).
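
As a rough illustration of how such data could be laid out relationally, the sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server. The table names, column names and sample rows are all hypothetical and are not taken from the actual database schema; each cluster is also simplified to carry a single stress label.

```python
import sqlite3

# Hypothetical two-table layout: clusters and their ortholog-set membership.
SCHEMA = """
CREATE TABLE cluster (
    cluster_id TEXT PRIMARY KEY,
    species    TEXT NOT NULL,
    stress     TEXT NOT NULL,   -- heat, cold, dehydration or salt
    annotation TEXT             -- putative function, NULL if unknown
);
CREATE TABLE ortholog_set (
    set_id     INTEGER NOT NULL,
    cluster_id TEXT NOT NULL REFERENCES cluster(cluster_id),
    PRIMARY KEY (set_id, cluster_id)
);
"""


def sets_spanning(conn, stresses):
    """Return ids of ortholog sets with members from every listed stress."""
    wanted = sorted(set(stresses))
    if not wanted:
        return []
    placeholders = ",".join("?" for _ in wanted)
    sql = f"""
        SELECT o.set_id
        FROM ortholog_set AS o
        JOIN cluster AS c ON c.cluster_id = o.cluster_id
        WHERE c.stress IN ({placeholders})
        GROUP BY o.set_id
        HAVING COUNT(DISTINCT c.stress) = ?
        ORDER BY o.set_id
    """
    rows = conn.execute(sql, (*wanted, len(wanted)))
    return [set_id for (set_id,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO cluster VALUES (?, ?, ?, ?)",
    [
        ("wheat|c1", "wheat", "cold", "dehydrin"),
        ("rice|c7", "rice", "salt", "dehydrin"),
        ("maize|c3", "maize", "cold", None),
        ("barley|c2", "barley", "heat", "heat shock protein"),
        ("rice|c9", "rice", "heat", None),
        ("wheat|c4", "wheat", "cold", None),
    ],
)
conn.executemany(
    "INSERT INTO ortholog_set VALUES (?, ?)",
    [
        (1, "wheat|c1"), (1, "rice|c7"), (1, "maize|c3"),
        (2, "barley|c2"), (2, "rice|c9"), (2, "wheat|c4"),
    ],
)
print(sets_spanning(conn, ["cold", "salt"]))
```

Keeping set membership in its own table lets a query group by set and count distinct stress labels, which is one simple way to ask for sets shared across conditions.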

Utility

The database provides a collection of annotated tentative orthologous sequences from sixteen crop species (Table 1) across four abiotic stress conditions (Table 2). The suite of user interfaces (Figure 1) allows the user to browse the database and query for: (a) annotated transcripts that are expressed across stress conditions, (b) transcripts with microsatellites that could be used as conserved functional markers, (c) conserved hypothetical genes that have orthologs in many other species but for which no function has been determined, and (d) ortholog sets with sequence alignments, based on annotation, stress conditions or cluster size. This dataset is a useful resource for researchers studying the biology and genomics of stress response in plants and the molecular evolution of genes involved in the stress response.
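
To make query type (b) more concrete, the following sketch shows one simple way to flag microsatellites in a transcript with a regular expression. It is not the SSRIT algorithm. The function name, the motif length range (2 to 6 nt) and the minimum of five repeats are illustrative assumptions.

```python
import re


def find_ssrs(sequence, min_repeats=5, max_motif=6):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    Motifs of 2..max_motif nucleotides repeated at least `min_repeats`
    times are reported; homopolymer runs are skipped.
    """
    # Lazy motif capture, followed by at least (min_repeats - 1) more copies.
    pattern = re.compile(
        r"([ACGT]{2,%d}?)\1{%d,}" % (max_motif, min_repeats - 1)
    )
    found = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # e.g. "AA" repeated is a homopolymer, not an SSR here
        repeats = len(match.group(0)) // len(motif)
        found.append((match.start(), motif, repeats))
    return found


# Made-up sequence: an (AG)6 repeat flanked by unique bases.
print(find_ssrs("TTGCAGAGAGAGAGAGCCTA"))
```

The lazy quantifier makes the pattern prefer the shortest motif at each position, so an (AG)n run is reported as a dinucleotide repeat rather than as a longer AGAG motif.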
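Table 1

Coverage of monocot and dicot stress-related sequences

| Species | Number of stress libraries | ESTs | Number of clusters (singletons + contigs) | ESTs in orthologous sets | Clusters in orthologous sets |
|---|---|---|---|---|---|
| Wheat | 28 | 20130 | 11037 | 8394 | 2806 |
| Maize | 19 | 21439 | 10194 | 9292 | 3032 |
| Rice | 10 | 13784 | 8128 | 4890 | 1939 |
| Barley | 8 | 12414 | 7315 | 5976 | 2403 |
| Sorghum | 5 | 37590 | 13815 | 16828 | 3321 |
| Pearl millet | 3 | 1945 | 1443 | 824 | 464 |
| Rye | 2 | 1351 | 945 | 938 | 594 |
| Arabidopsis | 37 | 18637 | 10362 | 3675 | 984 |
| Common bean | 11 | 412 | 206 | 259 | 97 |
| Tomato | 6 | 901 | 637 | 419 | 275 |
| Soybean | 4 | 18236 | 10363 | 5103 | 1571 |
| Cowpea | 3 | 38 | 37 | 14 | 14 |
| Groundnut | 2 | 860 | 679 | 356 | 266 |
| Potato | 2 | 17 | 12 | 7 | 4 |
| Chickpea | 1 | 358 | 56 | 55 | 19 |
| Medicago | 1 | 8294 | 5140 | 2444 | 976 |
| Total | 142 | 156406 | 80369 | 59474 | 18765 |
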
Table 2

Number of ortholog sets sharing sequences across stress conditions

| Stress condition | Number of tentative ortholog sets |
|---|---|
| Heat + Cold | 91 |
| Heat + Dehydration | 1171 |
| Heat + Salt | 69 |
| Cold + Dehydration | 6851 |
| Cold + Salt | 348 |
| Dehydration + Salt | 3304 |
| Heat + Cold + Dehydration | 2105 |
| Heat + Dehydration + Salt | 2323 |
| Cold + Dehydration + Salt | 10416 |
| Heat + Cold + Salt | 371 |
| Heat + Cold + Salt + Dehydration | 8390 |

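
A tally of this kind could be produced along the following lines. The sketch assumes each ortholog set is counted once, under the exact combination of stress conditions represented among its members; the paper does not state whether its counts are exclusive in this way. The input mapping and the fixed label order are hypothetical.

```python
from collections import Counter

# Fixed label order (an assumption) so that each combination has one name.
STRESS_ORDER = ("Heat", "Cold", "Dehydration", "Salt")


def tally_combinations(set_conditions):
    """Count ortholog sets per exact combination of two or more stresses.

    `set_conditions` maps a set id to the stress labels of its members.
    """
    counts = Counter()
    for conditions in set_conditions.values():
        present = [s for s in STRESS_ORDER if s in conditions]
        if len(present) >= 2:
            counts[" + ".join(present)] += 1
    return counts


# Made-up example: four sets, one of which covers a single stress.
example_sets = {
    1: ["Heat", "Cold"],
    2: ["Cold", "Heat", "Heat"],
    3: ["Salt"],
    4: ["Salt", "Dehydration", "Cold"],
}
print(dict(tally_combinations(example_sets)))
```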
Figure 1

Screen captures of the database GUI. (A) Home page, (B) plant species covered in the current version of the database, (C - H) query pages

Future development

We routinely update and expand the database and analyses as additional sequence data become available, annotate sequence data with experimental information on candidate genes, and provide users with a reliability score for the ortholog sets constructed, along with an analysis of orthologs developed using alternative algorithms.
References

1.  PCAP: a whole-genome assembly program.

Authors:  Xiaoqiu Huang; Jianmin Wang; Srinivas Aluru; Shiaw-Pyng Yang; LaDeana Hillier
Journal:  Genome Res       Date:  2003-09       Impact factor: 9.043

2.  The COG database: a tool for genome-scale analysis of protein functions and evolution.

Authors:  R L Tatusov; M Y Galperin; D A Natale; E V Koonin
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

3.  Computational and experimental analysis of microsatellites in rice (Oryza sativa L.): frequency, length variation, transposon associations, and genetic marker potential.

Authors:  S Temnykh; G DeClerck; A Lukashova; L Lipovich; S Cartinhour; S McCouch
Journal:  Genome Res       Date:  2001-08       Impact factor: 9.043

4.  Cross-referencing eukaryotic genomes: TIGR Orthologous Gene Alignments (TOGA).

Authors:  Yuandan Lee; Razvan Sultana; Geo Pertea; Jennifer Cho; Svetlana Karamycheva; Jennifer Tsai; Babak Parvizi; Foo Cheung; Valentin Antonescu; Joseph White; Ingeborg Holt; Feng Liang; John Quackenbush
Journal:  Genome Res       Date:  2002-03       Impact factor: 9.043

Review 5.  Molecular genetic perspectives on cross-talk and specificity in abiotic stress signalling in plants.

Authors:  Viswanathan Chinnusamy; Karen Schumaker; Jian-Kang Zhu
Journal:  J Exp Bot       Date:  2003-12-12       Impact factor: 6.992

6.  Monitoring expression profiles of rice genes under cold, drought, and high-salinity stresses and abscisic acid application using cDNA microarray and RNA gel-blot analyses.

Authors:  M Ashiq Rabbani; Kyonoshin Maruyama; Hiroshi Abe; M Ayub Khan; Koji Katsura; Yusuke Ito; Kyoko Yoshiwara; Motoaki Seki; Kazuo Shinozaki; Kazuko Yamaguchi-Shinozaki
Journal:  Plant Physiol       Date:  2003-11-26       Impact factor: 8.340

7.  The COG database: an updated version includes eukaryotes.

Authors:  Roman L Tatusov; Natalie D Fedorova; John D Jackson; Aviva R Jacobs; Boris Kiryutin; Eugene V Koonin; Dmitri M Krylov; Raja Mazumder; Sergei L Mekhedov; Anastasia N Nikolskaya; B Sridhar Rao; Sergei Smirnov; Alexander V Sverdlov; Sona Vasudevan; Yuri I Wolf; Jodie J Yin; Darren A Natale
Journal:  BMC Bioinformatics       Date:  2003-09-11       Impact factor: 3.169
