Reena Narsai, James Devenish, Ian Castleden, Kabir Narsai, Lin Xu, Huixia Shou, James Whelan.
Abstract
Omics research in Oryza sativa (rice) relies on the use of multiple databases to obtain the different types of information needed to define gene function. We present Rice DB, an Oryza information portal that is a functional genomics database, linking gene loci to comprehensive annotations, expression data and the subcellular location of encoded proteins. Rice DB has been designed to integrate direct comparison of rice with Arabidopsis (Arabidopsis thaliana), based on orthology or 'expressology', thus combining available information from two pre-eminent plant models. To establish Rice DB, gene identifiers (more than 40 types) and annotations from a variety of sources were compiled, functional information based on large-scale and individual studies was manually collated, hundreds of microarrays were analysed to generate expression annotations, and the occurrences of potential functional regulatory motifs in promoter regions were calculated. A range of computational subcellular localization predictions were also run for all putative proteins encoded in the rice genome, and experimentally confirmed protein localizations have been collated, curated and linked to functional studies in rice. A single search box accepts entries ranging from gene identifiers (for rice and/or Arabidopsis), motif sequences and subcellular locations to keywords, and supports Boolean searches (such as AND/OR). To demonstrate the utility of Rice DB, several examples are presented, including a rice mitochondrial proteome that draws on a variety of sources for subcellular location data within Rice DB. Parallel comparisons of subcellular location, functional annotations and transcript expression between rice and Arabidopsis using Rice DB (http://ricedb.plantenergy.uwa.edu.au) reveal examples of conservation between the two species.
Keywords: Arabidopsis; Arabidopsis thaliana; Oryza sativa; protein; rice; subcellular location; transcript expression
Year: 2013 PMID: 24147765 PMCID: PMC4253041 DOI: 10.1111/tpj.12357
Source DB: PubMed Journal: Plant J ISSN: 0960-7412 Impact factor: 6.417
Figure 1. The user-friendly interface of Rice DB. (a) The major data types presented in Rice DB, showing the links between them. The colour coding of each data type is maintained in the headings and side bars. (b) Screenshot of the front page: links to information about Rice DB (the ‘About’, ‘Data’ and ‘Tutorial’ sections) are shown on the left, next to a large search box that accepts various entries, including Boolean queries (AND/OR). The right column shows examples of recent queries. The output after a search for arginase is shown here as a summary. Although ‘arginase’ returns one gene, multiple rows are shown when multiple genes are entered. (c) The summary output when ‘arginase’ was searched. Details of each data type are shown, with the coloured side bars representing the data type.
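Figure 1 notes that the search box accepts Boolean queries (AND/OR). The sketch below shows one way such a query could be evaluated against keyword annotations. It is not Rice DB's implementation: the function name `boolean_search`, the placeholder gene IDs, case-insensitive substring matching and the rule that AND binds more tightly than OR are all illustrative assumptions.

```python
# Minimal sketch of AND/OR keyword search over gene annotations.
# Hypothetical assumptions (not Rice DB's implementation): annotations are plain
# strings keyed by gene ID, matching is case-insensitive substring search,
# operators are written in upper case, AND binds more tightly than OR, and
# parentheses are not supported.


def boolean_search(query: str, annotations: dict[str, str]) -> set[str]:
    """Return the IDs of genes whose annotation satisfies an AND/OR query."""
    hits: set[str] = set()
    # Each OR-separated clause is a group of AND-ed terms.
    for clause in query.split(" OR "):
        terms = [t.strip().lower() for t in clause.split(" AND ") if t.strip()]
        if not terms:
            continue
        for gene, text in annotations.items():
            lowered = text.lower()
            if all(term in lowered for term in terms):
                hits.add(gene)
    return hits


# Hypothetical annotations; the IDs are placeholders, not real loci.
annotations = {
    "GENE_A": "arginase, mitochondrial",
    "GENE_B": "alternative oxidase, mitochondrial",
    "GENE_C": "chlorophyll a/b binding protein, chloroplast",
}

print(sorted(boolean_search("mitochondrial AND oxidase", annotations)))  # ['GENE_B']
print(sorted(boolean_search("arginase OR chloroplast", annotations)))    # ['GENE_A', 'GENE_C']
```

Splitting on OR first and then on AND gives the usual precedence without a full parser; a real portal would also need quoting, parentheses and identifier-specific matching.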
Table 1. Outline of data presented in Rice DB
| Data in Rice DB | Data subtype (brief description) | References |
|---|---|---|
| Identifiers: all identifiers were collated and matched to MSU identifiers; gene symbols were specifically curated and added into Rice DB | MSU RGAP identifiers | ( |
| | RAP-DB identifiers and descriptions; Gramene GenesDB; symbol names | ( |
| Annotations: annotations were compiled to enable keyword searches across any of the sources | MSU putative function annotation | ( |
| | RAP gene description; Genebin (Funcats at ANU) | ( |
| | GO slim: ontology, domain (C/F/P); Coil predictions; FingerPRINTScan; Gene3D; InterPro; PROSITE; Panther; Pfam; SMART; SuperFamily; TMHMM | |
| Transcript expression: data from multiple sources (see Experimental procedures). Microarray data were normalized and expression annotations were generated for Rice DB. The occurrence of all possible hexamers and | RNA tissue data (no. tissues expressed in) | Analysed and compiled for Rice DB |
| | Expressed in (‘expression annotation’) | Analysed, compiled and annotated for Rice DB |
| | Stress expression (‘expression annotation’) | Analysed, compiled and annotated for Rice DB |
| | Experimentally confirmed DNA binding motifs: matched in 1 kb upstream regions, occurrences calculated | AGRIS, Athamap, matched in rice promoters ( |
| | Motifs/CAREs known to be functional | |
| | miRNAs: known miRNA targets and sequences presented in miRNA | |
| Protein subcellular location: peptide sequences for all the encoded genes in the rice genome were run through each computational predictor, and outputs are presented in Rice DB. In addition, published literature was searched and compiled showing experimentally determined localization. For those with experimentally determined localization, the phenotype is indicated if one was determined upon genetic alteration | Predicted locations (output from each predictor is presented for Rice DB): Ambiguous targeting predictor; ChloroP; MitoProt 2; PredictNLS; Predotar; ProteinProwler; PTS1; SignalP; TargetP; WoLFPSort; YLoc | ( |
| | Experimentally confirmed locations: compiled for Rice DB | |
| | Experimentally confirmed phenotypes: compiled for Rice DB | |
| | Arabidopsis: predicted and experimental locations shown from SUBA ( | |
| Orthology: orthology data were matched and compiled for Rice DB from Inparanoid, Gramene and Expressologs | Inparanoid orthology: clusters of orthologous genes with scores are shown ( | |
Manual amendment or addition to this category for Rice DB. Details of all these sources, including web links, are also shown on the Data page in Rice DB.
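The table lists experimentally confirmed DNA binding motifs as "matched in 1 kb upstream regions, occurrences calculated". A minimal sketch of that kind of count follows. It assumes the upstream sequences have already been extracted 5'→3' so that they end next to the gene, that matches are exact and may overlap, and that both strands are scanned; these choices, the function names and the placeholder sequences are hypothetical rather than the published Rice DB pipeline.

```python
# Sketch: count motif occurrences in 1 kb upstream (promoter) regions.
# Hypothetical assumptions: upstream sequences are already extracted 5'->3' and
# end immediately upstream of the gene, contain only A/C/G/T, matches are exact,
# overlapping matches count, and both strands are scanned.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def count_overlapping(seq: str, motif: str) -> int:
    """Count exact, possibly overlapping, occurrences of motif in seq."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def motif_occurrences(upstream: dict[str, str], motif: str,
                      window: int = 1000) -> dict[str, int]:
    """Per-gene motif count on both strands within `window` bp upstream."""
    motif = motif.upper()
    rc = reverse_complement(motif)
    counts: dict[str, int] = {}
    for gene, seq in upstream.items():
        region = seq.upper()[-window:]  # the `window` bp closest to the gene
        n = count_overlapping(region, motif)
        if rc != motif:  # a palindromic motif would otherwise be counted twice
            n += count_overlapping(region, rc)
        counts[gene] = n
    return counts


# Placeholder sequences; ACGTG is used only as an example motif.
upstream = {"GENE_A": "TTACGTGTT", "GENE_B": "CACGTGAAA"}
print(motif_occurrences(upstream, "ACGTG"))  # {'GENE_A': 1, 'GENE_B': 2}
```

Scanning the forward sequence for the reverse complement is equivalent to scanning the reverse strand for the motif. Degenerate (IUPAC) motifs would need a regular-expression or lookup-table match instead.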
Figure 2. An example of the output after three rice genes/proteins were searched in Rice DB. Clicking the down arrow in the orthology column displays parallel information for the orthologous gene(s) in Arabidopsis within the Rice DB output table. Examples demonstrating the usefulness of showing Arabidopsis gene descriptions, expression annotations and subcellular locations in parallel are shown in the pink, yellow and blue boxes, respectively. Examples of the pop-up windows are also shown for the expression and protein subcellular location data.
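Figure 2 describes expanding the orthology column to see Arabidopsis information alongside each rice gene. The sketch below shows the underlying join in its simplest form, plus the related step of deriving a rice gene set from an Arabidopsis set by orthology. The dictionaries, function names and placeholder IDs are hypothetical; Rice DB's actual orthology tables (Inparanoid, Gramene, Expressologs) carry scores and clusters that are not modelled here.

```python
# Sketch of an orthology-based parallel view and set transfer.
# Hypothetical assumptions: `orthologs` maps rice gene IDs to lists of
# Arabidopsis orthologue IDs, and `at_locations` maps Arabidopsis IDs to
# experimentally determined locations. All IDs below are placeholders.


def parallel_view(rice_genes: list[str], orthologs: dict[str, list[str]],
                  at_locations: dict[str, set[str]]) -> list[tuple[str, str, list[str]]]:
    """One row per rice gene/Arabidopsis orthologue pair, with AT locations."""
    rows = []
    for rice in rice_genes:
        for at in orthologs.get(rice, []):
            rows.append((rice, at, sorted(at_locations.get(at, set()))))
    return rows


def transfer_by_orthology(orthologs: dict[str, list[str]], at_set: set[str]) -> set[str]:
    """Rice genes with at least one orthologue in an Arabidopsis gene set."""
    return {rice for rice, ats in orthologs.items() if any(a in at_set for a in ats)}


orthologs = {"RICE_1": ["AT_1"], "RICE_2": ["AT_2", "AT_3"], "RICE_3": []}
at_locations = {"AT_1": {"mitochondrion"}, "AT_3": {"plastid", "mitochondrion"}}
for row in parallel_view(["RICE_1", "RICE_2"], orthologs, at_locations):
    print(row)
at_mito = {a for a, locs in at_locations.items() if "mitochondrion" in locs}
print(sorted(transfer_by_orthology(orthologs, at_mito)))  # ['RICE_1', 'RICE_2']
```

A rice gene with several orthologues yields several rows, which mirrors the one-to-many expansion that the orthology column can produce.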
Figure 3. Microarray analysis workflow using Rice DB. (a) The summary output after a list of differentially expressed Affymetrix probe sets is entered into Rice DB. (b) The output after clicking the expand spreadsheet icon in the ‘Annotations’ column, or after typing ‘Show annotations for…’. The columns show the annotations from the various sources listed in Table 1. (c) The output after clicking the expand spreadsheet icon in the ‘Expressed_in’ column, or after typing ‘Show expression profiles for…’. The normalized expression intensities across the 41 different developmental tissues are shown after log-transformation and viewing with a custom heat map in MS Excel. (di) Output showing the genes containing experimentally confirmed motifs after the expand icon was clicked in the ‘Exp_shown_motifs’ column. (dii) The output after ‘View hexamers for…’ was entered for a shortlist of genes. This shows the occurrence (as counts and percentages) of each of the 4096 possible hexamers in the input gene list, as well as its occurrence in the genome.
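Panel (dii) compares hexamer occurrence in a gene shortlist with the whole genome; the figure of 4096 is simply 4^6, every possible 6-mer over A/C/G/T. A minimal sketch of that comparison follows. It assumes a hexamer "occurs" in a gene when it appears at least once in that gene's promoter sequence (single strand); this definition, the function name and the placeholder data are assumptions, not Rice DB's documented method.

```python
# Sketch: occurrence of all 4**6 = 4096 hexamers in a gene list vs the genome.
# Hypothetical assumptions: `upstream` maps gene IDs to promoter sequences, and a
# hexamer "occurs" in a gene if it appears at least once on the given strand.
from itertools import product

HEXAMERS = ["".join(p) for p in product("ACGT", repeat=6)]


def hexamer_occurrence(upstream: dict[str, str],
                       genes: list[str]) -> dict[str, tuple[int, float]]:
    """Map each hexamer to (number of genes containing it, % of genes)."""
    present = []
    for gene in genes:
        seq = upstream[gene].upper()
        present.append({seq[i:i + 6] for i in range(len(seq) - 5)})
    n = len(present)
    result: dict[str, tuple[int, float]] = {}
    for hexamer in HEXAMERS:
        count = sum(1 for kmers in present if hexamer in kmers)
        result[hexamer] = (count, 100.0 * count / n if n else 0.0)
    return result


# Placeholder data: compare a shortlist against all genes (the "genome").
upstream = {"G1": "AAAAAAC", "G2": "CCCCCC", "G3": "AAAAAA"}
shortlist = hexamer_occurrence(upstream, ["G1", "G3"])
genome = hexamer_occurrence(upstream, list(upstream))
print(len(HEXAMERS))          # 4096
print(shortlist["AAAAAA"])    # (2, 100.0)
print(genome["AAAAAA"][0], round(genome["AAAAAA"][1], 1))  # 2 66.7
```

Precomputing each gene's set of hexamers keeps the membership test per hexamer cheap; a hexamer whose percentage in the shortlist clearly exceeds its genome percentage is a candidate over-represented element.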
Figure 4. Subcellular location of rice proteins. (a) Seven genes are shown, representing combinations of the three ways in which Rice DB can give insight into subcellular location: computational prediction (‘Predicted in rice’); experimentally determined subcellular location of orthologous proteins in Arabidopsis (‘Exp. shown in Arabidopsis’); and experimentally determined subcellular location of rice proteins (‘Exp. shown in rice’). (b) Overlap between the proteins identified as mitochondrial by these three approaches: (i) orthologues to the AT mito. set; (ii) the Rice mito. set; and (iii) computational prediction by four or more predictors. All three total sets, as well as the exclusive set of 839 proteins identified by orthology alone, were significantly enriched in the ‘Energy’ Genebins category (P < 0.01, indicated by ∧), and the number of such proteins in each set is given in brackets. Gene expression patterns were examined for the genome (defined as all genes on the Affymetrix Rice genome microarray) and for each of the three gene sets. For each set, the percentage of genes expressed in none of the microarrays, in 1 to 36 of the tissues/stages, and in >90% of all tissues/developmental stages (i.e. at least 37 of the 41 different developmental tissues/stages) is indicated. *Gene sets enriched in these proportions compared with the genome. (c) The orthologue summary for the rice proteins identified as mitochondrial by orthology (orthologues to the AT mito. set). (d) The pop-up search box in which specific predictors can be selected. Where possible, the outputs of the predictors are presented as percentages; the expanded output is shown. (e) The pop-up window listing the data types that can be searched in Rice DB when ‘choose data type’ is selected on the homepage. The example output is shown for experimentally determined locations. References for the predictors are given on the ‘Data’ pages in Rice DB.
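Figure 4(b) relies on two simple rules: binning genes by how many of the 41 tissues/stages they are expressed in (none, 1 to 36, or >90%, which for 41 samples means 37 or more because 0.9 × 41 = 36.9), and calling a protein mitochondrial when four or more predictors agree. The sketch below expresses both rules. The input dictionaries, function names and placeholder data are hypothetical, and how each predictor's output is thresholded into a "mitochondrial" call is not specified here.

```python
# Sketch of two set-building rules described for Figure 4(b).
# Hypothetical assumptions: `expressed` gives the number of the 41 developmental
# tissues/stages in which a gene is called expressed, and `calls` maps each gene
# to the set of predictors that place its protein in mitochondria.
from collections import Counter

N_TISSUES = 41


def expression_breadth(n_expressed: int) -> str:
    """Bin a gene as expressed in none, 1-36, or >90% (37+) of the 41 tissues."""
    if n_expressed == 0:
        return "none"
    if n_expressed > 0.9 * N_TISSUES:  # 0.9 * 41 = 36.9, so 37 or more
        return ">90%"
    return "1-36"


def breadth_percentages(genes: list[str], expressed: dict[str, int]) -> dict[str, float]:
    """Percentage of a gene set falling into each expression-breadth bin."""
    bins = Counter(expression_breadth(expressed[g]) for g in genes)
    return {b: 100.0 * bins[b] / len(genes) for b in ("none", "1-36", ">90%")}


def consensus_mito(calls: dict[str, set[str]], min_predictors: int = 4) -> set[str]:
    """Genes predicted mitochondrial by at least `min_predictors` predictors."""
    return {g for g, preds in calls.items() if len(preds) >= min_predictors}


# Placeholder data.
expressed = {"G1": 0, "G2": 20, "G3": 37, "G4": 41}
print(breadth_percentages(list(expressed), expressed))
# {'none': 25.0, '1-36': 25.0, '>90%': 50.0}
calls = {"G1": {"TargetP", "MitoProt 2", "Predotar", "WoLFPSort"}, "G2": {"TargetP"}}
print(consensus_mito(calls))  # {'G1'}
```

Comparing `breadth_percentages` for a gene set with the same figures for the whole genome gives the kind of contrast marked with an asterisk in the figure; the significance test behind that marking is not reproduced here.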
Figure 5. Inter-connections within Rice DB: Oryza information portal. (a) Rice DB creates a network for rice that connects identifiers, annotations, transcript data and protein data, and links these with information for orthologous genes in Arabidopsis. Data subtypes are shown below each heading. Following these connections can give insight into gene function, including for rice genes with little or no existing functional information. (b) Tutorial examples, as shown below the search box in Rice DB, which can be used as templates for using the functions of Rice DB. Only one example is shown per data type; the full list appears below the search box in Rice DB.