Kei Iida, Shuji Kawaguchi, Norio Kobayashi, Yuko Yoshida, Manabu Ishii, Erimi Harada, Kousuke Hanada, Akihiro Matsui, Masanori Okamoto, Junko Ishida, Maho Tanaka, Taeko Morosawa, Motoaki Seki, Tetsuro Toyoda.
Abstract
Recent advances in technologies for observing high-resolution genomic activities, such as whole-genome tiling arrays and high-throughput sequencers, provide detailed information for understanding genome functions. However, the functions of 50% of known Arabidopsis thaliana genes remain unknown or are annotated only on the basis of static analyses such as protein motifs or similarities. In this paper, we describe dynamic structure-based dynamic expression (DSDE) analysis, which sequentially predicts both structural and functional features of transcripts. We show that DSDE analysis inferred gene functions 12% more precisely than static structure-based dynamic expression (SSDE) analysis or conventional co-expression analysis based on previously determined gene structures of A. thaliana. This result suggests that structural information more precise than the fixed, conventionally annotated structures is crucial for co-expression analysis in systems biology of transcriptional regulation and dynamics. Our DSDE method, ARabidopsis Tiling-Array-based Detection of Exons version 2 and over-representation analysis (ARTADE2-ORA), precisely predicts each gene structure by combining two statistical analyses: a probe-wise co-expression analysis of multiple transcriptome measurements and a Markov model analysis of genome sequences. ARTADE2-ORA successfully identified the true functions of about 90% of functionally annotated genes, inferred the functions of 98% of functionally unknown genes and predicted 1,489 new gene structures and functions. We developed a database, ARTADE2DB, that integrates not only the information predicted by ARTADE2-ORA but also annotations and other functional information, such as phenotypes and literature citations, and that is expected to contribute to the study of the functional genomics of A. thaliana. URL: http://artade.org.
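The abstract says ARTADE2-ORA combines probe-wise co-expression with "a Markov model analysis of genome sequences" but does not describe that model. As a generic illustration only, not ARTADE2's actual model, the sketch below trains two first-order Markov chains on nucleotide sequences and scores a candidate sequence by its log-likelihood ratio. The model order, the pseudocount, the function names `train_markov` and `log_odds`, and the toy training sequences are all assumptions.

```python
from itertools import product
from math import log

BASES = "ACGT"


def train_markov(seqs, pseudocount=1.0):
    """Estimate first-order transition probabilities P(next | prev) with pseudocounts."""
    counts = {(a, b): pseudocount for a, b in product(BASES, repeat=2)}
    for s in seqs:
        s = s.upper()
        for a, b in zip(s, s[1:]):
            if a in BASES and b in BASES:  # skip ambiguous bases such as N
                counts[(a, b)] += 1
    probs = {}
    for a in BASES:
        row_total = sum(counts[(a, b)] for b in BASES)
        for b in BASES:
            probs[(a, b)] = counts[(a, b)] / row_total
    return probs


def log_odds(seq, model_pos, model_neg):
    """Sum of log(P_pos / P_neg) over transitions; > 0 favours the positive model."""
    seq = seq.upper()
    score = 0.0
    for a, b in zip(seq, seq[1:]):
        if a in BASES and b in BASES:
            score += log(model_pos[(a, b)] / model_neg[(a, b)])
    return score


# Hypothetical toy training data standing in for exon-like and background sequence
exon_like = train_markov(["ATGATGATGATG"] * 3)
background = train_markov(["CCCCCCCC"] * 3)
print(log_odds("ATGATGATG", exon_like, background) > 0)  # True
```

Real gene-structure predictors usually embed such chains in a hidden Markov model with exon, intron and intergenic states; this block shows only the scoring idea.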
Year: 2011 PMID: 21227933 PMCID: PMC3037080 DOI: 10.1093/pcp/pcq202
Source DB: PubMed Journal: Plant Cell Physiol ISSN: 0032-0781 Impact factor: 4.927
Types of analyses for studying gene functions computationally
| Types of analysis | Structure | Expression | Analysis | Reliability of the coding sequence on the gene | Reliability of dynamic expression analysis | Ability to find novel genes | Tools/Databases |
|---|---|---|---|---|---|---|---|
| SSSA | Static | Static | Homology/motif search based on reference gene structures | ⊙ | Not applicable | − | BLAST (1), Pfam (2) |
| SSDE | Static | Dynamic | Co-expression analysis based on reference gene structures | ⊙ | ◯ | − | ATTED-II (3), CressExpress (4) |
| DSDE | Dynamic | Dynamic | Simultaneous elucidation of gene structures and dynamism of expression | ◯ | ⊙ | + | ARTADE2-ORA |
SSSA, static structure-based static analysis; SSDE, static structure-based dynamic expression; DSDE, dynamic structure-based dynamic expression.
Static structures are pre-defined gene structures such as annotated genes, whereas dynamic structures are gene models constructed from the transcriptome under study.
Static expression indicates gene expression analyses in which conditional changes are ignored. An example is a cDNA collection study for correcting gene structures. Dynamic expression indicates gene expression changes observed under multiple conditions.
⊙, very good; ◯, good.
+, positive; −, negative.
References: 1, Altschul et al. (1997); 2, Gunasekaran et al. (2010); 3, Obayashi et al. (2009); 4, Srinivasasainagendra et al. (2008).
Fig. 1. Success rates of functional annotation terms of genes with ORA. In the graph, only GO terms were considered. Genes are categorized into four groups: genes with annotation (category A), genes annotated on the basis of similarities (category B), unknown genes (category C) and pseudogenes/transposable elements (data not shown). Summaries of gene categories A, B and C are also shown. Gray, yellow and green bars represent the results of annotated genes (SSDE), annotated genes with corresponding ARTADE2 gene models (filtered SSDE) and ARTADE2 gene models (DSDE), respectively. A similar graph drawn with the ORA results considering all of GO, PO and other annotation terms can be found in Supplementary Fig. S1.
Fig. 2. (A) Success rate vs. number of predicted GO terms. The number of predicted GO terms is expressed relative to the number of GO terms appearing in the annotation. The P-value thresholds for ORA range from 1e-2 (upper right) to 1e-50 (lower left). At the most relaxed threshold, ORA on DSDE gene models predicted about twice as many GO terms as are described in the annotation, with a success rate of >90%. At the same threshold, the success rate is about 85% for filtered SSDE and 78% for SSDE. In this study, 1,142 GO/PO/annotation terms were tested with ORA. Filled symbols show the results at a P-value threshold of P < 8.76e-06, which corresponds to P < 1e-2 after correction for multiple testing. (B) Precision-recall graph of DSDE, filtered SSDE and SSDE. Although all results were similar, DSDE and filtered SSDE performed slightly better. More detailed results used for the graphs are given in Supplementary Table S1.
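The threshold quoted in the Fig. 2 caption is consistent with a Bonferroni correction over the 1,142 tested terms (1e-2 / 1,142 ≈ 8.76e-06), although the caption does not name the correction method. The sketch below checks that arithmetic and shows the usual set-based definitions of precision and recall for one gene model's predicted versus annotated terms. How the authors counted true and false positives for panel B is not stated here, so `bonferroni_threshold`, `precision_recall` and the toy term labels are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test P-value cutoff that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def precision_recall(predicted: set, annotated: set) -> tuple:
    """Precision = TP / |predicted|, recall = TP / |annotated| (0.0 for an empty set)."""
    tp = len(predicted & annotated)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(annotated) if annotated else 0.0
    return precision, recall


threshold = bonferroni_threshold(1e-2, 1142)
print(f"{threshold:.2e}")  # 8.76e-06

# Hypothetical term sets for a single gene model
print(precision_recall({"t1", "t2", "t3"}, {"t2", "t3", "t4", "t5"}))  # (0.6666666666666666, 0.5)
```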
Fig. 3. Schema of information stored in ARTADE2DB. We have two sets of gene models: a set of ARTADE2 gene models and a set of annotated gene models. The two sets are connected on the basis of their overlap. Each gene model in either set contains information about the expression profile, the correlation plot, a list of co-expressed genes and the ORA result. Annotated gene models and GO or PO terms act as gateways to information stored in SciNetS.
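The caption states that the two gene-model sets are "connected on the basis of their overlap" without giving the exact rule. A minimal sketch, assuming two models are linked when they lie on the same chromosome and their genomic intervals share at least one base (strand is ignored); `GeneModel`, `link_models` and the toy coordinates are hypothetical and do not reflect ARTADE2DB's schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneModel:
    model_id: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def overlaps(a: GeneModel, b: GeneModel) -> bool:
    """True if both models are on the same chromosome and share at least one base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def link_models(predicted, annotated):
    """Map each predicted model ID to the IDs of the annotated models it overlaps."""
    by_chrom = defaultdict(list)  # bucket annotated models per chromosome
    for m in annotated:
        by_chrom[m.chrom].append(m)
    return {
        p.model_id: [a.model_id for a in by_chrom[p.chrom] if overlaps(p, a)]
        for p in predicted
    }


# Hypothetical toy coordinates
predicted = [GeneModel("pred1", "Chr1", 100, 500), GeneModel("pred2", "Chr2", 10, 20)]
annotated = [GeneModel("ann1", "Chr1", 450, 900), GeneModel("ann2", "Chr1", 600, 700)]
print(link_models(predicted, annotated))  # {'pred1': ['ann1'], 'pred2': []}
```

The per-chromosome scan is quadratic within a chromosome; for genome-scale sets, sorting by start coordinate or using an interval tree would be the usual choice.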
Fig. 4. Example of search results for the query term ‘drought’. The database search engine locates gene models containing the query term. Terms appearing in the descriptions of entries that are semantically linked to the gene models can be used as query words.
Fig. 5. An example of a dynamic gene structure and supporting correlation plot. In the upper figure, exon–intron structures of a dynamic gene model (OMAT1P119680) and the overlapping TAIR gene model (AT1G76080.1) are drawn. Boxes show exons and lines show introns. The light blue region on the TAIR gene model shows the CDS region. Arrows indicate the directions of the gene models. The lower figure is a correlation plot of the locus. The upper and lower figures share the x-axis. In the correlation plot, positive probe-wise correlation values are shown in red and negative values in blue. We found high positive correlation values within the first and second exons of the ARTADE2 gene model. In addition, the correlation between probes located on the first and second exons is also high. Similar figures are available on the gene information page of ARTADE2DB.
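The correlation plot reflects probe-wise co-expression across multiple transcriptome measurements. A minimal sketch, assuming plain Pearson correlation between every pair of probes across arrays; the caption does not state the correlation measure or any normalization (such as log transformation), and `pearson`, `probe_correlation_matrix` and the toy intensities are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length intensity vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    sxx = sum(d * d for d in dx)
    syy = sum(d * d for d in dy)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a flat probe is treated as uncorrelated
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(sxx * syy)


def probe_correlation_matrix(intensities):
    """intensities[i][j] is the signal of probe i on array (condition) j."""
    p = len(intensities)
    return [[pearson(intensities[i], intensities[j]) for j in range(p)] for i in range(p)]


# Hypothetical toy data: probes 0 and 1 rise together, probe 2 falls
toy = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
for row in probe_correlation_matrix(toy):
    print([round(v, 2) for v in row])
```

In a plot like Fig. 5, a block of high positive values among probes in the same region would support treating those probes as one co-expressed exon.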
Fig. 6. An example of ORA results (OMAT1P119680). GO and PO terms are classified according to the rules of RIKEN SciNetS. Several databases built on SciNetS share this data format, which may help data integration in the future. In addition to ORA P-values, PosMed P-values are shown. The table is available on the gene information page of ARTADE2DB. Other examples of ARTADE2DB contents can be found in Supplementary Fig. S2.
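ORA P-values are commonly computed with a one-sided hypergeometric test: given N background genes, K of which carry a term, the P-value is the probability of seeing at least k term-carrying genes in a list of n co-expressed genes. The captions do not specify the exact test ARTADE2-ORA uses or how its gene list and background are chosen, so this is a sketch of the textbook formulation only; `ora_pvalue` and the counts are hypothetical.

```python
from math import comb


def ora_pvalue(N: int, K: int, n: int, k: int) -> float:
    """Upper-tail hypergeometric probability P(X >= k).

    N: genes in the background
    K: background genes annotated with the term
    n: genes in the co-expressed list
    k: genes in the list annotated with the term
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("invalid counts")
    total = comb(N, n)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total


# Hypothetical counts: 5 of 20 co-expressed genes carry a term held by 50 of 1,000 genes
print(ora_pvalue(N=1000, K=50, n=20, k=5))
```

The resulting P-value would then be compared against a cutoff corrected for the number of terms tested, as described for Fig. 2.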