PlantQTL-GE: a database system for identifying candidate genes in rice and Arabidopsis by gene expression and QTL information.

Huazong Zeng, Lijun Luo, Weixiong Zhang, Jie Zhou, Zuofeng Li, Hongyan Liu, Tiansheng Zhu, Xiangqian Feng, Yang Zhong.

Abstract

We have designed and implemented a web-based database system, called PlantQTL-GE, to facilitate quantitative trait locus (QTL)-based candidate gene identification and gene function analysis. We collected a large number of genes, gene expression data from microarrays and expressed sequence tags (ESTs), and genetic markers for Oryza sativa and Arabidopsis thaliana from multiple sources. The system integrates these diverse data sources behind a uniform web interface. It supports QTL queries specified by QTL marker intervals or genomic loci, and displays known genes, microarray data, ESTs, candidate genes and similar putative genes in the other plant on the rice or Arabidopsis genome. Candidate genes in QTL intervals are further annotated based on matching ESTs, microarray gene expression data and cis-elements in regulatory sequences. The system is freely available at http://www.scbit.org/qtl2gene/new/.

Year:  2006        PMID: 17142239      PMCID: PMC1669735          DOI: 10.1093/nar/gkl814

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Quantitative trait locus (QTL) analysis is an effective method for locating chromosomal regions harboring genetic variants that affect a continuously distributed, polygenic phenotype (1). However, identifying the genes that affect complex traits remains one of the most difficult tasks in genetics (2); the primary challenge is to identify and clone the genes responsible for a QTL. Recently, several computational methods, including comparative genomics, combined cross analysis, and interval-specific and genome-wide haplotype analysis, have been developed for narrowing animal QTLs (3). However, no bioinformatics tool for QTL-based candidate gene identification in plants is currently available. Rice (Oryza sativa) and Arabidopsis thaliana, the model plants for monocotyledonous and dicotyledonous species, respectively, have many quantitative traits of agronomic importance and biological significance. Most of these traits result from interactions among multiple genetic variants, as well as interactions between genetic variants and environmental factors. In recent years, many QTLs in rice and A.thaliana have been identified by QTL mapping; as of the end of 2005, more than 7000 QTLs controlling various complex traits had been located on different chromosome regions in rice. In plant genetics, the candidate gene approach has been used to characterize and clone QTLs (4), complementing map-based cloning for identifying the genes that lie within QTL intervals. With the sequenced and (partially) annotated A.thaliana (5) and rice (6–8) genomes, it is now feasible to retrieve genomic sequences, select candidate genes within a QTL interval, and analyze gene expression data [i.e. expressed sequence tags (ESTs) and microarray gene expression data] to discover novel genes in rice and A.thaliana.
As a result, the candidate gene approach is now commonly used to select candidate genes and relevant gene expression data that may be associated with particular QTLs. With such refined associations among candidate genes, their expression and QTLs, further functional analysis of the genes and the quantitative traits (e.g. linkage analysis) can be carried out effectively. It is therefore important to develop a bioinformatics platform for candidate gene finding that integrates gene expression and QTL information. To this end, we have designed and developed an integrated database system, called PlantQTL-GE (GE stands for Genes and Expressions), for identifying candidate genes and searching for relevant gene expression information in microarray and EST data. This resource helps users focus on the candidate genes located within the QTLs of interest. In practice, a user can obtain a list of candidate genes located in a QTL region of interest, and then isolate, map and characterize these genes in subsequent experiments such as differential display analysis.

DATA SOURCES AND CONTENT

Data sources

Known functional genes, microarray gene expression data, ESTs, genetic markers, QTLs and candidate genes of rice and Arabidopsis were collected from the literature and various public databases, and organized and maintained in an integrated MySQL database (Figure 1). Rice QTLs were classified into seven categories and A.thaliana QTLs into four categories according to the trait ontology (TO) in the GRAMENE database (9). Information on known functional genes related to each trait category, together with their corresponding EST and microarray data, was then extracted from the literature. The microarray data include both up- and down-regulated genes or probes. ESTs were retrieved from the NCBI EST database using trait-related keywords and classified according to the trait ontology. Genetic markers with known genomic positions were retrieved from the marker databases in GRAMENE (release 19) and TAIR (&type=marker). Gene annotations were retrieved from TIGR for rice and from TAIR for A.thaliana. The genomic locus of each sequence was determined by aligning it against the genomic sequences using BLAST with an E-value cutoff of 1e-05. The rice (Nipponbare) genomic sequences used in this study are IRGSP Build 3.0, and the A.thaliana genomic sequences are TIGR v5.
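The locus-assignment step can be sketched in Python as follows. The sketch assumes the alignments are written in BLAST+ tabular format (`-outfmt 6`, whose 12 default columns end with `evalue` and `bitscore`) and that each sequence is placed at its highest-scoring hit passing the 1e-05 cutoff. The best-hit rule and the name `best_loci` are illustrative assumptions, not the authors' documented procedure.

```python
E_CUTOFF = 1e-05  # E-value cutoff quoted in the text


def best_loci(blast_lines, cutoff=E_CUTOFF):
    """Map each query ID to (subject, start, end, strand) of its best hit.

    Expects BLAST+ -outfmt 6 rows: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore (tab-separated).
    Choosing the single highest bit score is an assumption of this sketch.
    """
    best = {}  # qseqid -> (bitscore, locus)
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.rstrip("\n").split("\t")
        qseqid, sseqid = f[0], f[1]
        sstart, send = int(f[8]), int(f[9])
        evalue, bitscore = float(f[10]), float(f[11])
        if evalue > cutoff:
            continue  # hit is not significant enough
        if qseqid not in best or bitscore > best[qseqid][0]:
            # sstart > send means the hit lies on the minus strand.
            strand = "+" if sstart <= send else "-"
            locus = (sseqid, min(sstart, send), max(sstart, send), strand)
            best[qseqid] = (bitscore, locus)
    return {q: locus for q, (_, locus) in best.items()}
```

Any iterable of lines works, so an open file handle on the tabular output can be passed directly; queries with no hit under the cutoff are simply absent from the result.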
Figure 1

A sketch of the system architecture and data flow of PlantQTL-GE, which integrates information of genomic sequences, genes, gene expression data, ESTs, genetic markers and trait ontology.

To help select candidate genes, we used known cis-regulatory elements to annotate them. In particular, we took the 25 conserved cis-elements in the PLACE plant motif database that are associated with the quantitative traits defined in the trait ontology described above; these 25 motifs are listed on our website. As of June 2006, PlantQTL-GE contained 1558 and 1896 known genes, 3633 and 9270 microarray data entries, 883 598 and 162 596 ESTs, 21 523 and 2283 genetic markers, and 58 687 and 31 527 annotated genes for rice and Arabidopsis, respectively.
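PLACE records motifs as consensus sequences that typically use IUPAC ambiguity codes (for example, K for G or T). The sketch below counts occurrences of such a consensus on both strands of a promoter sequence. It is a plain regular-expression scan for illustration, not the Motif-finder program used by the authors, and the function names are hypothetical.

```python
import re

# IUPAC nucleotide codes -> regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement of each code, so ambiguous motifs can be reverse-complemented.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def motif_regex(consensus: str) -> "re.Pattern[str]":
    """Compile a consensus; the lookahead lets overlapping sites all match."""
    body = "".join(IUPAC[base] for base in consensus.upper())
    return re.compile(f"(?=({body}))")


def scan(promoter: str, consensus: str) -> list:
    """Sorted (0-based start, strand) pairs for every site on either strand.

    Start positions are always given on the strand of `promoter` as supplied.
    """
    promoter = promoter.upper()
    hits = [(m.start(), "+") for m in motif_regex(consensus).finditer(promoter)]
    rc = reverse_complement(consensus)
    # A palindromic motif (e.g. the G-box CACGTG) is its own reverse
    # complement, so a second pass would count every site twice.
    if rc != consensus.upper():
        hits += [(m.start(), "-") for m in motif_regex(rc).finditer(promoter)]
    return sorted(hits)
```

`len(scan(promoter, consensus))` gives an occurrence count, and a non-empty result corresponds to a plus (+) annotation for that cis-element.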

QTL-based candidate gene identification

Because EST and microarray gene expression data, as well as cis-elements, have proven useful for identifying candidate genes of interest (10,11), we used them to discover and annotate candidate genes. For the genes lying between the two genetic markers that delineate a QTL of interest, our method executes the following three-step procedure. In Step 1, the 1000 bp sequences upstream of the rice and Arabidopsis candidate genes within the QTL interval are retrieved from the GRAMENE and TAIR databases, respectively. A motif-scanning method named Motif-finder (12) is then applied to locate the cis-elements in these upstream sequences, and the genomic loci and number of occurrences of each cis-element in each gene's promoter region are stored in a cis-element table in the MySQL database. In Step 2, the candidate genes are annotated using the identified cis-elements: a gene is annotated as plus (+) with respect to a cis-element if its promoter contains that element, and as minus (−) otherwise. These annotations are stored in the cis-element fields of a gene annotation table. In Step 3, the genes are similarly annotated with respect to the gene expression data (i.e. EST and microarray gene expression data) relevant to the quantitative trait under consideration: a gene is annotated as plus (+) if its sequence matches the corresponding EST or microarray probe, and as minus (−) otherwise. These annotations are recorded in the gene expression fields of the annotation table.
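Steps 1 to 3 can be sketched as follows. The sketch assumes gene coordinates are 1-based and inclusive, that the promoter window is read on the gene's own strand, and that expression evidence has already been reduced to hypothetical sets of gene IDs that matched an EST or microarray probe; an exact-match test on both strands stands in for Motif-finder. `Gene`, `upstream` and `annotate` are illustrative names.

```python
from dataclasses import dataclass

PROMOTER_BP = 1000  # upstream window length used in Step 1
_COMP = str.maketrans("ACGTN", "TGCAN")


@dataclass
class Gene:
    gene_id: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMP)[::-1]


def upstream(chrom_seq: str, gene: Gene, length: int = PROMOTER_BP) -> str:
    """Step 1: the `length` bp 5' of the gene, in the gene's own orientation."""
    if gene.strand == "+":
        # Bases before the gene start, clipped at the chromosome start.
        lo = max(0, gene.start - 1 - length)
        return chrom_seq[lo:gene.start - 1].upper()
    # Minus strand: the promoter lies just past `end` on the plus strand.
    return reverse_complement(chrom_seq[gene.end:gene.end + length])


def annotate(gene: Gene, promoter: str, motifs: dict, expressed: dict) -> dict:
    """Steps 2 and 3: one '+'/'-' field per cis-element and per expression set.

    `motifs` maps element name -> exact site; `expressed` maps a data-set
    name -> set of gene IDs already matched to an EST or probe (hypothetical).
    """
    promoter = promoter.upper()
    row = {"gene_id": gene.gene_id}
    for name, site in motifs.items():
        site = site.upper()
        # Exact match on both strands stands in for the motif scanner.
        hit = site in promoter or reverse_complement(site) in promoter
        row[name] = "+" if hit else "-"
    for dataset, gene_ids in expressed.items():
        row[dataset] = "+" if gene.gene_id in gene_ids else "-"
    return row
```

Each returned dictionary corresponds to one row of a gene annotation table, with the cis-element fields followed by the gene expression fields.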

Similarity searches of putative genes

We compared the sequences of Arabidopsis genes against the rice genomic sequences using BLAST with a stringent E-value cutoff of 1e-15, and performed the reciprocal analysis by comparing the sequences of rice genes against the A.thaliana genomic sequences at the same stringency. As a result, 2445 putative Arabidopsis genes were mapped onto the rice genome and 4588 putative rice genes were mapped onto the Arabidopsis genome. The results are stored in the MySQL database.
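A sketch of the two BLAST runs, assuming the BLAST+ `blastn` program (the text does not say which BLAST program or version was used), pre-built genome databases and hypothetical file names. `mapped_queries` then collects the genes with at least one hit at or below 1e-15, one way to tally per-direction counts like those reported above.

```python
E_STRICT = 1e-15  # stringent cutoff used in both directions


def blastn_cmd(query_fasta, genome_db, out_tsv, evalue=E_STRICT):
    """Argument list for a BLAST+ blastn run writing tabular (-outfmt 6) output."""
    return ["blastn", "-query", query_fasta, "-db", genome_db,
            "-evalue", str(evalue), "-outfmt", "6", "-out", out_tsv]


# Hypothetical file and database names for the two directions.
RUNS = {
    "arabidopsis_on_rice": blastn_cmd("ath_genes.fa", "rice_genome", "ath_vs_rice.tsv"),
    "rice_on_arabidopsis": blastn_cmd("rice_genes.fa", "ath_genome", "rice_vs_ath.tsv"),
}


def mapped_queries(tsv_lines, cutoff=E_STRICT):
    """IDs of query genes with at least one hit whose E-value is <= cutoff."""
    ids = set()
    for line in tsv_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 11 of -outfmt 6 is the E-value; re-checking it keeps the
        # count correct even if the table was produced with a looser cutoff.
        if float(fields[10]) <= cutoff:
            ids.add(fields[0])
    return ids
```

Each list in `RUNS` can be passed to `subprocess.run(cmd, check=True)`; applying `len(mapped_queries(...))` to each output file then gives the number of genes mapped in that direction.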

SYSTEM IMPLEMENTATION AND QUERY INTERFACE

We have implemented the PlantQTL-GE system, which is available online at http://www.scbit.org/qtl2gene/new/. It has a web-based query interface backed by a MySQL database and Perl scripts. The interface, shown in Figure 2A, allows the user to search for candidate genes and relevant ESTs and microarray gene expression data within a QTL interval in rice or A.thaliana. To retrieve genetic information associated with a QTL of interest, follow these steps:
Figure 2

(A) The interface of PlantQTL-GE; (B) A snapshot of the result of searching for genetic elements related to a QTL between markers RM526 and RM525, which controls dry-weight of grains.

1. Select a plant species. Currently, rice and A.thaliana are available.
2. Select a query option, either the linked genetic markers flanking the targeted QTL interval or chromosomal positions in the selected genome, and then enter the two markers or the start and stop genomic positions of the interval.
3. Select a trait category and one or more trait keywords related to the targeted QTL.
4. Select an output type, which can be any combination of known genes, microarray data, ESTs, candidate genes and similar putative genes in the other species.

The result of a query includes all selected genes, microarray data and/or ESTs within the targeted QTL region. Entities in the output are hyperlinked to pages with further detailed information. Figure 2B shows a snapshot of a case study searching for genetic elements related to a QTL between markers RM526 and RM525, which controls grain dry weight (an abiotic-related trait) (13). The complete output of this example is available as supplementary material on our website.
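The two query options (flanking markers or explicit positions) can be sketched with Python's built-in `sqlite3` module standing in for MySQL. The two-table schema, its column names, the marker and gene names in any example data, and the rule that a gene must lie entirely inside the interval are all assumptions for illustration; the paper does not publish its schema or say how genes straddling a boundary are handled.

```python
import sqlite3

# Hypothetical minimal schema; positions are base pairs on one chromosome.
SCHEMA = """
CREATE TABLE marker (name TEXT PRIMARY KEY, species TEXT, chrom TEXT, pos INTEGER);
CREATE TABLE gene (gene_id TEXT PRIMARY KEY, species TEXT, chrom TEXT,
                   start_pos INTEGER, stop_pos INTEGER);
"""


def genes_in_interval(conn, species, chrom, lo, hi):
    """Positional option: genes lying entirely within [lo, hi], ordered by start."""
    cur = conn.execute(
        "SELECT gene_id FROM gene WHERE species = ? AND chrom = ? "
        "AND start_pos >= ? AND stop_pos <= ? ORDER BY start_pos",
        (species, chrom, lo, hi))
    return [row[0] for row in cur]


def genes_between_markers(conn, species, marker1, marker2):
    """Marker option: resolve both flanking markers, then query the interval."""
    rows = conn.execute(
        "SELECT chrom, pos FROM marker WHERE species = ? AND name IN (?, ?)",
        (species, marker1, marker2)).fetchall()
    if len(rows) != 2 or rows[0][0] != rows[1][0]:
        raise ValueError("both markers must exist on the same chromosome")
    lo, hi = sorted((rows[0][1], rows[1][1]))  # markers may be given in any order
    return genes_in_interval(conn, species, rows[0][0], lo, hi)
```

Sorting the two marker positions lets the user enter the flanking markers in either order; the trait-keyword and output-type filters would add further conditions or joins to the same query.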

SYSTEM UPDATE AND FUTURE WORK

We will update the PlantQTL-GE database bimonthly to incorporate new information on genes (from the Entrez Nucleotide database), microarray data (from public databases and the literature) and ESTs (from the EST database). To do so, we have written and tested a Perl script based on the NCBI eUtils tools. We also plan to extend the PlantQTL-GE system to support more plant species, such as Populus trichocarpa, whose whole genome sequence is now available. In addition, we will enhance the cis-element annotation by including motifs identified by a motif-finding method, such as WordSpy (14), applied to all genes responsive to biotic and abiotic stresses.
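The authors' update script is written in Perl and is not published; the Python sketch below only illustrates the kind of E-utilities request such an update could start from. It builds an ESearch URL for records of one organism modified within a date window, using the documented `db`, `term`, `datetype`, `mindate`, `maxdate` and `retmax` parameters. The database name (for example `nuccore` for Entrez Nucleotide; the EST database was historically `nucest`) and the window are assumptions to be checked against the current E-utilities documentation.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def new_records_url(db: str, organism: str, since: str, until: str,
                    retmax: int = 500) -> str:
    """ESearch URL for `organism` records modified between two YYYY/MM/DD dates."""
    params = {
        "db": db,                          # Entrez database, e.g. "nuccore"
        "term": f'"{organism}"[Organism]',
        "datetype": "mdat",                # filter on modification date
        "mindate": since,
        "maxdate": until,
        "retmax": retmax,                  # maximum number of IDs returned
    }
    return f"{ESEARCH}?{urlencode(params)}"
```

The XML returned by `urllib.request.urlopen(url)` lists matching record IDs under `IdList`, which an EFetch request could then retrieve in FASTA format for re-alignment against the genome.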
REFERENCES

1.  The nature and identification of quantitative trait loci: a community's view.

Authors:  Oduola Abiola; Joe M Angel; Philip Avner; Alexander A Bachmanov; John K Belknap; Beth Bennett; Elizabeth P Blankenhorn; David A Blizard; Valerie Bolivar; Gundrun A Brockmann; Kari J Buck; Jean-Francoise Bureau; William L Casley; Elissa J Chesler; James M Cheverud; Gary A Churchill; Melloni Cook; John C Crabbe; Wim E Crusio; Ariel Darvasi; Gerald de Haan; Peter Dermant; R W Doerge; Rosemary W Elliot; Charles R Farber; Lorraine Flaherty; Jonathan Flint; Howard Gershenfeld; John P Gibson; Jing Gu; Weikuan Gu; Heinz Himmelbauer; Robert Hitzemann; Hui-Chen Hsu; Kent Hunter; Fuad F Iraqi; Ritsert C Jansen; Thomas E Johnson; Byron C Jones; Gerd Kempermann; Frank Lammert; Lu Lu; Kenneth F Manly; Douglas B Matthews; Juan F Medrano; Margarete Mehrabian; Guy Mittlemann; Beverly A Mock; Jeffrey S Mogil; Xavier Montagutelli; Grant Morahan; John D Mountz; Hiroki Nagase; Richard S Nowakowski; Bruce F O'Hara; Alexander V Osadchuk; Beverly Paigen; Abraham A Palmer; Jeremy L Peirce; Daniel Pomp; Michael Rosemann; Glenn D Rosen; Leonard C Schalkwyk; Ze'ev Seltzer; Stephen Settle; Kazuhiro Shimomura; Siming Shou; James M Sikela; Linda D Siracusa; Jimmy L Spearow; Cory Teuscher; David W Threadgill; Linda A Toth; Ayo A Toye; Csaba Vadasz; Gary Van Zant; Edward Wakeland; Robert W Williams; Huang-Ge Zhang; Fei Zou
Journal:  Nat Rev Genet       Date:  2003-11       Impact factor: 53.242

2.  Bioinformatics toolbox for narrowing rodent quantitative trait loci.

Authors:  Keith DiPetrillo; Xiaosong Wang; Ioannis M Stylianou; Beverly Paigen
Journal:  Trends Genet       Date:  2005-10-13       Impact factor: 11.639

3.  Gene discovery in dbEST.

Authors:  M S Boguski; C M Tolstoshev; D E Bassett
Journal:  Science       Date:  1994-09-30       Impact factor: 47.728

4.  Analysis of the genome sequence of the flowering plant Arabidopsis thaliana.

Authors: 
Journal:  Nature       Date:  2000-12-14       Impact factor: 49.962

5.  Cis-regulatory element based targeted gene finding: genome-wide identification of abscisic acid- and abiotic stress-responsive genes in Arabidopsis thaliana.

Authors:  Weixiong Zhang; Jianhua Ruan; Tuan-Hua David Ho; Youngsook You; Taotao Yu; Ralph S Quatrano
Journal:  Bioinformatics       Date:  2005-05-12       Impact factor: 6.937

6.  A draft sequence of the rice genome (Oryza sativa L. ssp. indica).

Authors:  Jun Yu; Songnian Hu; Jun Wang; Gane Ka-Shu Wong; Songgang Li; Bin Liu; Yajun Deng; Li Dai; Yan Zhou; Xiuqing Zhang; Mengliang Cao; Jing Liu; Jiandong Sun; Jiabin Tang; Yanjiong Chen; Xiaobing Huang; Wei Lin; Chen Ye; Wei Tong; Lijuan Cong; Jianing Geng; Yujun Han; Lin Li; Wei Li; Guangqiang Hu; Xiangang Huang; Wenjie Li; Jian Li; Zhanwei Liu; Long Li; Jianping Liu; Qiuhui Qi; Jinsong Liu; Li Li; Tao Li; Xuegang Wang; Hong Lu; Tingting Wu; Miao Zhu; Peixiang Ni; Hua Han; Wei Dong; Xiaoyu Ren; Xiaoli Feng; Peng Cui; Xianran Li; Hao Wang; Xin Xu; Wenxue Zhai; Zhao Xu; Jinsong Zhang; Sijie He; Jianguo Zhang; Jichen Xu; Kunlin Zhang; Xianwu Zheng; Jianhai Dong; Wanyong Zeng; Lin Tao; Jia Ye; Jun Tan; Xide Ren; Xuewei Chen; Jun He; Daofeng Liu; Wei Tian; Chaoguang Tian; Hongai Xia; Qiyu Bao; Gang Li; Hui Gao; Ting Cao; Juan Wang; Wenming Zhao; Ping Li; Wei Chen; Xudong Wang; Yong Zhang; Jianfei Hu; Jing Wang; Song Liu; Jian Yang; Guangyu Zhang; Yuqing Xiong; Zhijie Li; Long Mao; Chengshu Zhou; Zhen Zhu; Runsheng Chen; Bailin Hao; Weimou Zheng; Shouyi Chen; Wei Guo; Guojie Li; Siqi Liu; Ming Tao; Jian Wang; Lihuang Zhu; Longping Yuan; Huanming Yang
Journal:  Science       Date:  2002-04-05       Impact factor: 47.728

7.  A draft sequence of the rice genome (Oryza sativa L. ssp. japonica).

Authors:  Stephen A Goff; Darrell Ricke; Tien-Hung Lan; Gernot Presting; Ronglin Wang; Molly Dunn; Jane Glazebrook; Allen Sessions; Paul Oeller; Hemant Varma; David Hadley; Don Hutchison; Chris Martin; Fumiaki Katagiri; B Markus Lange; Todd Moughamer; Yu Xia; Paul Budworth; Jingping Zhong; Trini Miguel; Uta Paszkowski; Shiping Zhang; Michelle Colbert; Wei-lin Sun; Lili Chen; Bret Cooper; Sylvia Park; Todd Charles Wood; Long Mao; Peter Quail; Rod Wing; Ralph Dean; Yeisoo Yu; Andrey Zharkikh; Richard Shen; Sudhir Sahasrabudhe; Alun Thomas; Rob Cannings; Alexander Gutin; Dmitry Pruss; Julia Reid; Sean Tavtigian; Jeff Mitchell; Glenn Eldredge; Terri Scholl; Rose Mary Miller; Satish Bhatnagar; Nils Adey; Todd Rubano; Nadeem Tusneem; Rosann Robinson; Jane Feldhaus; Teresita Macalma; Arnold Oliphant; Steven Briggs
Journal:  Science       Date:  2002-04-05       Impact factor: 47.728

8.  Searching for genetic determinants in the new millennium.

Authors:  N J Risch
Journal:  Nature       Date:  2000-06-15       Impact factor: 49.962

9.  A steganalysis-based approach to comprehensive identification and characterization of functional regulatory elements.

Authors:  Guandong Wang; Weixiong Zhang
Journal:  Genome Biol       Date:  2006       Impact factor: 13.583

10.  Gramene: development and integration of trait and gene ontologies for rice.

Authors:  Pankaj Jaiswal; Doreen Ware; Junjian Ni; Kuan Chang; Wei Zhao; Steven Schmidt; Xiaokang Pan; Kenneth Clark; Leonid Teytelman; Samuel Cartinhour; Lincoln Stein; Susan McCouch
Journal:  Comp Funct Genomics       Date:  2002
