OryGenesDB: a database for rice reverse genetics.

G Droc, M Ruiz, P Larmande, A Pereira, P Piffanelli, J B Morel, A Dievart, B Courtois, E Guiderdoni, C Périn.

Abstract

Insertional mutant databases containing Flanking Sequence Tags (FSTs) are becoming key resources for plant functional genomics. We have developed OryGenesDB (http://orygenesdb.cirad.fr/), a database dedicated to rice reverse genetics. Insertion mutants of rice genes are catalogued by FST information that can be readily accessed through this database. Our database presently contains 44166 FSTs generated by most of the rice insertional mutagenesis projects. The OryGenesDB genome browser is based on the powerful Generic Genome Browser (GGB) developed in the framework of the Generic Model Organism Database (GMOD) project. The main interface of our web site displays search and analysis interfaces to look for insertions in any candidate gene of interest. Several starting points can be used to exhaustively retrieve the insertion positions and associated genomic information using BLAST, keyword or gene name searches. The toolbox integrated in our database also includes an 'anchoring' option that allows immediate mapping and visualization of up to 50 nucleic acid sequences in the rice Genome Browser of OryGenesDB. As a first step toward plant comparative genomics, we have linked the rice and Arabidopsis whole genomes using all the predicted pairs of orthologs identified by best BLAST mutual hit (BBMH) connectors.

Substances: Plant Proteins

Year: 2006 PMID: 16381969 PMCID: PMC1347375 DOI: 10.1093/nar/gkj012

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Rice (Oryza sativa L.) has emerged as a model plant for cereal genomics, particularly because of its compact genome (389 Mb) (1), the smallest among graminaceous crops, and the availability of vast genetic and molecular resources. Comparative genomics, notably in cereals (2), offers additional clues to the function of candidate sequences by allowing the reciprocal transfer of information between rice and related species. The high-quality, full-length sequence of the 12 rice chromosomes was recently completed by the International Rice Genome Sequencing Project (IRGSP) (1), and independent automated annotations have revealed an unsuspected wealth of predicted genes, only half of which have a clearly identified homolog in the Arabidopsis genome. The rice scientific community, organized around the International Rice Functional Genomics Consortium, now faces the challenge of determining the function of most rice genes within the next decade. Reverse genetics, which provides a link between a candidate gene and a phenotype, is the most straightforward experimental strategy to assign a biochemical, cellular, developmental or adaptive role to these sequences (3). To facilitate the implementation of reverse genetics strategies for gene validation in rice and other cereals, and to integrate most of the available resources in a convenient platform, we developed OryGenesDB. OryGenesDB is a web-accessible, user-friendly navigation tool based on the rice genome sequence, originally created to retrieve in a single step all the publicly available features around a sequence query. These features include the positional information of flanking sequence tags (FSTs) of insertional mutagens, generated by the ongoing sequence cataloguing effort of rice insertional mutagenesis libraries (4).

MATERIALS AND METHODS

Programming and database implementation

OryGenesDB consists of web pages delivered by an Apache HTTP server, which integrates Perl CGI (5) to produce web pages dynamically from user input. The interface is generated by custom Perl code (6) that increasingly incorporates object-oriented coding practices to improve the extensibility and reusability of the individual software components. Bioperl (7) is used for specific tasks, such as parsing the GenBank files or TIGR XML files containing the O.sativa annotation. The interface interacts with a customized database backend using Structured Query Language (SQL) and the open source MySQL database engine. We also integrated into OryGenesDB a genome browser (GBrowse) based on the Generic Genome Browser developed by the Generic Model Organism Database (GMOD) project (8). GBrowse is a web-based application for displaying genomic annotations and other features, and is freely available under an open source licence.

Database content

We developed an automated pipeline to incorporate data fields derived from public genome assembly and annotation projects. The workflow of the pipeline is described below.

Genomic data

The reference annotation layer consists of the 12 rice pseudo-chromosomes released by The Institute for Genomic Research (TIGR) through its Rice Genome Annotation Database and Resource. The rice genome and its annotation were downloaded from the TIGR FTP site. We also superimposed the GenBank annotations from IRGSP (9) on the TIGR annotations. The annotation data were assembled onto the pseudo-chromosomes and each feature was given new coordinates relative to the newly assembled sequence. Dedicated Perl scripts were written to retrieve various attributes of genome features, in particular the gene names, positions in the pseudomolecules and descriptions for all putative proteins. This information was then loaded into our database. We identified 10 679 putative pairs of orthologs between O.sativa and Arabidopsis thaliana (E-value cutoff 0.1) using the Best Blast Mutual Hit (BBMH) strategy. The protein datasets used for ortholog prediction, all.pep for rice (release 3.0) and ATH1.pep for Arabidopsis (release 5.0), were downloaded from TIGR. A reduced dataset of the Arabidopsis annotation (version 5.0) and all the predicted orthologs were added to the rice and Arabidopsis genome navigators in OryGenesDB, so that users can shuttle back and forth between both genomes. From the FTP site of the Knowledge-based Oryza Molecular Biological Encyclopaedia (KOME) we downloaded the distinct tabular text files (10) and mapped 19 552 unique rice full-length cDNAs onto the 12 pseudo-chromosomes using the KOME cDNA BAC mapping information. The TIGR Gene Indices are a collection of species-specific databases that use a highly refined protocol to analyze expressed sequence tags (11). We mapped the Tentative Consensus sequences from rice, maize, wheat, barley and sorghum onto the rice pseudo-chromosomes using BLASTN (12) with a cutoff of 1e−10. A similar procedure was followed to integrate data from the Rice Expression Database (13).
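The Best Blast Mutual Hit strategy can be illustrated with a minimal Python sketch. It is not the pipeline used for OryGenesDB (which relied on Perl scripts); the hit tuples, the identifiers and the bit-score tie-break are assumptions made for illustration, and only the 0.1 E-value cutoff comes from the text. A pair is retained when each protein is the best hit of the other in the two reciprocal searches.

```python
from typing import Dict, Iterable, List, Tuple

# One BLAST hit: (query_id, subject_id, e_value, bit_score).
# This tuple layout is an assumption of the sketch, not a BLAST format.
Hit = Tuple[str, str, float, float]


def best_hits(hits: Iterable[Hit], max_evalue: float = 0.1) -> Dict[str, str]:
    """Map each query to its best subject among hits passing the cutoff.

    Hits are ranked by lowest E-value, then by highest bit score
    (the tie-break rule is an assumption).
    """
    best: Dict[str, Tuple[float, float, str]] = {}
    for query, subject, evalue, bits in hits:
        if evalue > max_evalue:
            continue  # hit is not significant enough
        key = (evalue, -bits, subject)
        if query not in best or key < best[query]:
            best[query] = key
    return {query: key[2] for query, key in best.items()}


def mutual_best_hits(
    a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit], max_evalue: float = 0.1
) -> List[Tuple[str, str]]:
    """Return (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b, max_evalue)
    reverse = best_hits(b_vs_a, max_evalue)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical identifiers, for illustration only.
rice_vs_ath = [
    ("rice_A", "ath_X", 1e-50, 200.0),
    ("rice_A", "ath_Y", 1e-20, 90.0),
    ("rice_B", "ath_X", 1e-30, 120.0),
    ("rice_C", "ath_Z", 0.5, 20.0),   # fails the 0.1 cutoff
]
ath_vs_rice = [
    ("ath_X", "rice_A", 1e-48, 195.0),
    ("ath_X", "rice_B", 1e-28, 110.0),
    ("ath_Y", "rice_A", 1e-18, 85.0),
    ("ath_Z", "rice_C", 0.5, 20.0),   # fails the 0.1 cutoff
]
pairs = mutual_best_hits(rice_vs_ath, ath_vs_rice)
print(pairs)
```

In this toy example only rice_A and ath_X are retained: rice_B also hits ath_X, but ath_X's best hit is rice_A, so the relation is not mutual.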

Flanking sequence tag mapping

OryGenesDB contains data generated by our group, such as T-DNA and Ds FSTs from the genomics initiative Génoplante (14) and the European consortium Cereal Gene Tags (15). In addition, all publicly available FST information from other groups was integrated (Table 1). The flanking sequences were aligned against the rice pseudo-chromosomes using BLASTN with a cutoff of 1e−10. If a flanking sequence had several hits in the rice genome, we kept the most significant hit, i.e. the one with the lowest E-value. The mapped insertions were then assigned to BACs and genes using the TIGR genome annotations. A gene was defined as beginning 800 bp before the ATG and ending at the 3′-untranslated region (3′-UTR).
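The mapping rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the OryGenesDB code: the class names, field names and coordinates are hypothetical, the 1e−10 cutoff is treated as an E-value threshold, the retained hit is taken to be the most significant one (lowest E-value), and a gene is assumed to extend to the end of its 3′-UTR. Only the 800 bp upstream window and the cutoff value come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

PROMOTER_BP = 800    # from the text: a gene begins 800 bp before the ATG
MAX_EVALUE = 1e-10   # from the text: BLASTN cutoff (assumed to be an E-value)


@dataclass
class BlastHit:
    chrom: str
    position: int    # genomic coordinate of the insertion site (assumed 1-based)
    evalue: float


@dataclass
class Gene:
    name: str
    chrom: str
    atg: int         # coordinate of the start codon
    utr3_end: int    # coordinate of the end of the 3'-UTR (assumption)
    strand: str      # "+" or "-"

    def span(self) -> Tuple[int, int]:
        """Return (low, high) covering promoter, gene body and 3'-UTR."""
        if self.strand == "+":
            return self.atg - PROMOTER_BP, self.utr3_end
        # On the reverse strand the promoter lies at higher coordinates.
        return self.utr3_end, self.atg + PROMOTER_BP


def pick_hit(hits: List[BlastHit]) -> Optional[BlastHit]:
    """Keep hits passing the cutoff and return the most significant one."""
    passing = [h for h in hits if h.evalue <= MAX_EVALUE]
    return min(passing, key=lambda h: h.evalue) if passing else None


def genes_hit(hit: BlastHit, genes: List[Gene]) -> List[str]:
    """Names of all genes whose extended span contains the insertion."""
    names = []
    for gene in genes:
        low, high = gene.span()
        if gene.chrom == hit.chrom and low <= hit.position <= high:
            names.append(gene.name)
    return names


# Hypothetical annotation and hits, for illustration only.
genes = [
    Gene("geneA", "Chr1", atg=5000, utr3_end=8000, strand="+"),
    Gene("geneB", "Chr1", atg=20000, utr3_end=17000, strand="-"),
]
fst_hits = [
    BlastHit("Chr1", 4300, 1e-40),   # in the promoter window of geneA
    BlastHit("Chr5", 100, 1e-12),    # weaker secondary hit
    BlastHit("Chr1", 9000, 1e-5),    # fails the cutoff
]
chosen = pick_hit(fst_hits)
assigned = genes_hit(chosen, genes)
print(chosen, assigned)
```

Handling the strand explicitly matters because the 800 bp promoter window lies at lower coordinates for forward-strand genes and at higher coordinates for reverse-strand genes.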

Table 1

Rice insertion resources integrated in OryGenesDB

Institution	Mutagen	Source	No. of flanking sequences	No. of mapped sequences
CIRAD-INRA-IRD-CNRS, Genoplante	T-DNA		7480	7140
CerealGene Tags, European Union	Ac/Ds	(15)	1381	1381
National Institute of Agrobiological Sciences	Tos17		18 024	17 933
Zhejiang University	T-DNA		1017	917
National University of Singapore	Ac/Ds		1469	1434
Postech	T-DNA		11 741	11 741
PMBBRC	Ac/Ds		1072	1040
University of California at Davis	Ac/Ds		1191	1170

RESULTS

GBrowse interface

The main interface of OryGenesDB is based on GBrowse, the Generic Genome Browser (8), developed in the framework of the GMOD consortium. GBrowse is a powerful web-based application for displaying genomic annotations and other features, and is suitable for any genome. A graphical representation of the 12 rice chromosomes was added at the top of the browser. Clicking on a chromosome gives the user quick access to a specific chromosomal region. To select a more specific region of the genome, the user enters its reference in the text field labeled ‘Landmark or Region’ (e.g. chromosome name, gene name, clone name or accession). GBrowse then fetches the region of the genome specified by the user's search criteria. Once a genomic region is displayed, the user can navigate through it with a navigation bar that allows scrolling and zooming at scales ranging from base pairs to megabase pairs. Moreover, landmarks on each track carry additional links that give more information on a specific feature (Figure 1). A link may lead to a page on the browser's website (e.g. IRGSP genes), a page on an external website (e.g. the NCBI link for the T-DNA insertion lines) or a contextual pop-up window.

Figure 1

Insertions within Os01g09550.1. Predicted gene Os01g09550.1, cDNA 001-037-H07 and Arabidopsis ortholog At4g29230.1 overlie flanking sequence data. Red, green and blue represent Tos17, T-DNA and Ac/Ds insertions, respectively. A contextual pop-up opens each time the mouse moves over a feature, and links or feature-specific actions can be activated. The text file containing the mapped sequences can be modified and is directly integrated in OryGenesDB using the GBrowse third-party annotation display tool. A list of clickable annotated landmarks is displayed and the mapping feature can be visualized as a glyph in the annotation layer (QUERY). A tutorial and a more complete description are available online.

ORYGENESDB QUERY INTERFACE

‘Gene Search’

‘Gene Search’ retrieves a set of gene/cDNA information through a keyword search of feature annotations. All the matched FSTs located inside the selected genes or in their promoter regions are displayed. The user may enter one or several terms in the search text box to restrict the search, combined with boolean operators (AND, OR and NOT). Four options refine the query: the ‘Flanking Sequence Tag’ form selects the insertion type (T-DNA, Tos17, Ds, Ac or all); ‘Gene Annotations’ searches keyword terms in the text annotations of one or several databases (TIGR, IRGSP, full-length cDNA or all); ‘Region’ restricts the search to FSTs in a specific gene region (promoter, gene, 3′-UTR or all); and ‘Orientation’ restricts the search to the forward, reverse or both gene orientations. All the results are displayed as a table with details on the FSTs, their positions, associated features and a direct link to the rice genome browser. The output table can be downloaded as an Excel™ formatted file.

‘Domain Search’

In the ‘Domain Search’ interface, the genes are selected according to one or more PFAM domains (16). The type of flanking insertion elements, the gene region of insertion and the gene orientation can be used to restrict the search as in the ‘Gene Search’.

‘BLAST Search’

An FST search in a specific gene can also be done by searching the O.sativa genome with a set of query sequences. The user can paste or type a sequence into the large text window, or upload a file containing nucleotide or protein sequences in FASTA format. By running NCBI-BLAST (12,17) against the DNA or protein database of O.sativa, the user can retrieve a specific sequence and locate nearby FSTs. The result pages display the gene and related FST information. The results are returned as tables that can be downloaded as Excel™ files (Download Result). A link to the output BLAST alignments is also available (View Alignment). The output tables display details on the BLAST matches (hit name, expect value, percentage of similarity and so on). When the selected database is TIGR, the output table contains details on the corresponding FSTs, their positions, a link to the Genome Browser and related external links, and the resulting sequences can be downloaded in FASTA format. The sequence query matches can be downloaded in GFF format to be used as personal annotations in GBrowse.

‘Adding annotations to the Rice Genome’

Personal FASTA sequences can be uploaded and viewed in the context of the rice or Arabidopsis genomes. Web forms allow the user to submit up to 50 nucleotide FASTA sequences, which are searched (by BLASTN) against rice or Arabidopsis BAC/PAC sequences. The BLAST results are converted to GFF format and integrated directly in GBrowse (Figure 1).
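The conversion of BLAST results to GFF can be sketched as follows. The sketch assumes that the BLAST output is in the standard 12-column tabular format (query, subject, % identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, E-value, bit score) and that GFF3-style lines are wanted; the source name, feature type and `Name` attribute are illustrative choices, and the actual OryGenesDB converter may differ.

```python
def blast_tab_to_gff(tab_text: str, source: str = "blastn",
                     feature: str = "match") -> str:
    """Convert 12-column tabular BLAST output into GFF3-style lines.

    Each hit becomes one feature on the subject sequence. A hit whose
    subject start exceeds its subject end lies on the reverse strand.
    """
    lines = ["##gff-version 3"]
    for row in tab_text.splitlines():
        if not row.strip() or row.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = row.split("\t")
        query, subject = fields[0], fields[1]
        sstart, send = int(fields[8]), int(fields[9])
        evalue = fields[10]
        strand = "+" if sstart <= send else "-"
        start, end = min(sstart, send), max(sstart, send)
        lines.append("\t".join([
            subject, source, feature, str(start), str(end),
            evalue, strand, ".", f"Name={query}",
        ]))
    return "\n".join(lines) + "\n"


# Two hypothetical hits: one on the forward strand, one on the reverse.
example = (
    "seq1\tChr1\t99.50\t200\t1\t0\t1\t200\t1000\t1199\t1e-100\t360\n"
    "seq2\tChr3\t98.00\t150\t3\t0\t1\t150\t5150\t5001\t2e-70\t250\n"
)
gff = blast_tab_to_gff(example)
print(gff)
```

GFF requires start to be no greater than end, so reverse-strand hits are reordered and the orientation is carried by the strand column instead.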

‘Locus Search’

This tool was designed to facilitate the batch retrieval of FSTs for a given set of genes, cDNAs, proteins, transcripts or any other feature contained in OryGenesDB. With this tool, the user can retrieve all matching FSTs efficiently and exhaustively in a single query.
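A batch lookup of this kind amounts to a single SQL query over a list of identifiers. The sketch below uses Python's built-in sqlite3 module with an in-memory database as a stand-in for the MySQL backend; the `fst` table, its columns and the sample rows are hypothetical and do not reflect the actual OryGenesDB schema.

```python
import sqlite3
from typing import List, Sequence, Tuple


def find_fsts(conn: sqlite3.Connection,
              loci: Sequence[str]) -> List[Tuple[str, str, str, int]]:
    """Return (locus, fst_id, mutagen, position) rows for all given loci."""
    if not loci:
        return []
    # One placeholder per locus keeps the query parameterized.
    placeholders = ",".join("?" for _ in loci)
    query = (
        "SELECT locus, fst_id, mutagen, position FROM fst "
        f"WHERE locus IN ({placeholders}) ORDER BY locus, position"
    )
    return conn.execute(query, list(loci)).fetchall()


# Hypothetical schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fst (fst_id TEXT, locus TEXT, mutagen TEXT, position INTEGER)"
)
conn.executemany(
    "INSERT INTO fst VALUES (?, ?, ?, ?)",
    [
        ("FST002", "geneA", "Tos17", 6100),
        ("FST001", "geneA", "T-DNA", 4300),
        ("FST003", "geneB", "Ac/Ds", 18250),
        ("FST004", "geneC", "T-DNA", 900),
    ],
)
rows = find_fsts(conn, ["geneA", "geneB"])
for row in rows:
    print(row)
```

Passing the whole list in one parameterized `IN (...)` clause retrieves every matching insertion in a single round trip, which is the behaviour the batch tool is described as providing.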

DISCUSSION

OryGenesDB is a genome database developed for reverse genetics in rice. It is a repository for insertion data produced by our laboratory (14) and by other transposon, T-DNA and retrotransposon insertion line projects [for a review see (4)]. Most of the rice insertional projects were first developed as institutional databases containing FSTs and a few basic tools to query them. Only a few of them, namely TIGR, RiceGE, Flagdb++ and Gramene, integrate a Genome Browser and contain FSTs from several laboratories. OryGenesDB is to date the most populated database, with >44 166 FSTs, and is, with RiceGE, one of only two specifically dedicated to rice reverse genetics. OryGenesDB comes with a dedicated toolbox specifically designed for reverse genetics applications. The set of user-friendly interfaces developed for OryGenesDB, complementary to the GGB tools, provides the user with several powerful ways to search for insertions in candidate genes. This makes it possible to search, as exhaustively as possible, for all insertions and their positions for a given list of candidates. As output, an Excel™ formatted file containing all insertions with their positions, the matching features and a hyperlink to the GGB navigator can be downloaded, making them immediately available for further analyses. Raw sequences can also be mapped directly to the rice pseudomolecules using the ‘adding annotation’ tool: a multi-FASTA file is uploaded and a BLASTN search is performed against the whole rice genome. The tool generates GFF from the BLAST output, and the matching sequences can be displayed using the GGB third-party annotation tool. Several further developments of OryGenesDB are underway. We will first continue to add newly sequenced insertions made publicly available by our laboratory or by other international projects. An ongoing project aims to generate a whole-proteome ortholog prediction between A.thaliana and O.sativa by phylogenomics and to integrate both genome ortholog predictions in OryGenesDB. This approach will later be extended to other plants of agronomic interest. As a first step in that direction, we added 10 679 pairs of orthologs predicted by BBMH between rice and A.thaliana and linked our rice navigator with an A.thaliana navigator in OryGenesDB through these putative orthologs, showing the potential of OryGenesDB for plant comparative genomics. OryGenesDB is now the fastest growing database among those dedicated to rice reverse genetics and will greatly help to take up the challenge of determining the function of most of the rice genes in the next decade.

14 in total

1. Comparative genomics in the grass family: molecular characterization of grass genome structure and evolution. (Review)

Authors: Catherine Feuillet; Beat Keller
Journal: Ann Bot Date: 2002-01 Impact factor: 4.357

2. The generic genome browser: a building block for a model organism system database.

Authors: Lincoln D Stein; Christopher Mungall; ShengQiang Shu; Michael Caudy; Marco Mangone; Allen Day; Elizabeth Nickerson; Jason E Stajich; Todd W Harris; Adrian Arva; Suzanna Lewis
Journal: Genome Res Date: 2002-10 Impact factor: 9.043

3. The Bioperl toolkit: Perl modules for the life sciences.

Authors: Jason E Stajich; David Block; Kris Boulez; Steven E Brenner; Stephen A Chervitz; Chris Dagdigian; Georg Fuellen; James G R Gilbert; Ian Korf; Hilmar Lapp; Heikki Lehväslaiho; Chad Matsalla; Chris J Mungall; Brian I Osborne; Matthew R Pocock; Peter Schattner; Martin Senger; Lincoln D Stein; Elia Stupka; Mark D Wilkinson; Ewan Birney
Journal: Genome Res Date: 2002-10 Impact factor: 9.043

4. High throughput T-DNA insertion mutagenesis in rice: a first step towards in silico reverse genetics.

Authors: Christophe Sallaud; Céline Gay; Pierre Larmande; Martine Bès; Pietro Piffanelli; Benoit Piégu; Gaétan Droc; Farid Regad; Emmanuelle Bourgeois; Donaldo Meynard; Christophe Périn; Xavier Sabau; Alain Ghesquière; Jean Christophe Glaszmann; Michel Delseny; Emmanuel Guiderdoni
Journal: Plant J Date: 2004-08 Impact factor: 6.417

5. Basic local alignment search tool.

Authors: S F Altschul; W Gish; W Miller; E W Myers; D J Lipman
Journal: J Mol Biol Date: 1990-10-05 Impact factor: 5.469

6. EU-OSTID: a collection of transposon insertional mutants for functional genomics in rice.

Authors: L J G van Enckevort; Gaëtan Droc; Pietro Piffanelli; Raffaella Greco; Cyril Gagneur; Christele Weber; Víctor M González; Pere Cabot; Fabio Fornara; Stefano Berri; Berta Miro; Ping Lan; Marta Rafel; Teresa Capell; Pere Puigdomènech; Pieter B F Ouwerkerk; Annemarie H Meijer; Enrico Pe'; Lucia Colombo; Paul Christou; Emmanuel Guiderdoni; Andy Pereira
Journal: Plant Mol Biol Date: 2005-09 Impact factor: 4.076

7. Collection, mapping, and annotation of over 28,000 cDNA clones from japonica rice.

Authors: Shoshi Kikuchi; Kouji Satoh; Toshifumi Nagata; Nobuyuki Kawagashira; Koji Doi; Naoki Kishimoto; Junshi Yazaki; Masahiro Ishikawa; Hitomi Yamada; Hisako Ooka; Isamu Hotta; Keiichi Kojima; Takahiro Namiki; Eisuke Ohneda; Wataru Yahagi; Kohji Suzuki; Chao Jie Li; Kenji Ohtsuki; Toru Shishiki; Yasuhiro Otomo; Kazuo Murakami; Yoshiharu Iida; Sumio Sugano; Tatsuto Fujimura; Yutaka Suzuki; Yuki Tsunoda; Takashi Kurosaki; Takeko Kodama; Hiromi Masuda; Michie Kobayashi; Quihong Xie; Min Lu; Ryuya Narikawa; Akio Sugiyama; Kouichi Mizuno; Satoko Yokomizo; Junko Niikura; Rieko Ikeda; Junya Ishibiki; Midori Kawamata; Akemi Yoshimura; Junichirou Miura; Takahiro Kusumegi; Mitsuru Oka; Risa Ryu; Mariko Ueda; Kenichi Matsubara; Jun Kawai; Piero Carninci; Jun Adachi; Katsunori Aizawa; Takahiro Arakawa; Shiro Fukuda; Ayako Hara; Wataru Hashizume; Norihito Hayatsu; Koichi Imotani; Yoshiyuki Ishii; Masayoshi Itoh; Ikuko Kagawa; Shinji Kondo; Hideaki Konno; Ai Miyazaki; Naoki Osato; Yoshimi Ota; Rintaro Saito; Daisuke Sasaki; Kenjiro Sato; Kazuhiro Shibata; Akira Shinagawa; Toshiyuki Shiraki; Masayasu Yoshino; Yoshihide Hayashizaki; Ayako Yasunishi
Journal: Science Date: 2003-07-18 Impact factor: 47.728

8. Rice mutant resources for gene discovery.

Authors: Hirohiko Hirochika; Emmanuel Guiderdoni; Gynheung An; Yue-Ie Hsing; Moo Young Eun; Chang-Deok Han; Narayana Upadhyaya; Srinivasan Ramachandran; Qifa Zhang; Andy Pereira; Venkatesan Sundaresan; Hei Leung
Journal: Plant Mol Biol Date: 2004-02 Impact factor: 4.076

9. The TIGR rice genome annotation resource: annotating the rice genome and creating resources for plant biologists.

Authors: Qiaoping Yuan; Shu Ouyang; Jia Liu; Bernard Suh; Foo Cheung; Razvan Sultana; Dan Lee; John Quackenbush; C Robin Buell
Journal: Nucleic Acids Res Date: 2003-01-01 Impact factor: 16.971

10. The Pfam protein families database.

Authors: Alex Bateman; Lachlan Coin; Richard Durbin; Robert D Finn; Volker Hollich; Sam Griffiths-Jones; Ajay Khanna; Mhairi Marshall; Simon Moxon; Erik L L Sonnhammer; David J Studholme; Corin Yeats; Sean R Eddy
Journal: Nucleic Acids Res Date: 2004-01-01 Impact factor: 16.971
