Jonathan Vincent, Zhanwu Dai, Catherine Ravel, Frédéric Choulet, Said Mouzeyar, M Fouad Bouzidi, Marie Agier, Pierre Martre.
Abstract
The functional annotation of genes based on sequence homology with genes from model species genomes is time-consuming because it is necessary to mine several unrelated databases. The aim of the present work was to develop a functional annotation database for common wheat Triticum aestivum (L.). The database, named dbWFA, is based on the reference NCBI UniGene set, an expressed gene catalogue built by expressed sequence tag clustering, and on full-length coding sequences retrieved from the TriFLDB database. Information from good-quality heterogeneous sources, including annotations for model plant species Arabidopsis thaliana (L.) Heynh. and Oryza sativa L., was gathered and linked to T. aestivum sequences through BLAST-based homology searches. Even though the complexity of the transcriptome cannot yet be fully appreciated, we developed a tool to easily and promptly obtain information from multiple functional annotation systems (Gene Ontology, MapMan bin codes, MIPS Functional Categories, PlantCyc pathway reactions and TAIR gene families). The use of dbWFA is illustrated here with several query examples. We were able to assign a putative function to 45% of the UniGenes and 81% of the full-length coding sequences from TriFLDB. Moreover, comparison of the annotation of the whole T. aestivum UniGene set along with curated annotations of the two model species assessed the accuracy of the annotation provided by dbWFA. To further illustrate the use of dbWFA, genes specifically expressed during the early cell division or late storage polymer accumulation phases of T. aestivum grain development were identified using a clustering analysis and then annotated using dbWFA. The annotation of these two sets of genes was consistent with previous analyses of T. aestivum grain transcriptomes and proteomes. Database URL: urgi.versailles.inra.fr/dbWFA/
Year: 2013 PMID: 23660284 PMCID: PMC3649639 DOI: 10.1093/database/bat014
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. Simplified diagram of the data integration process.
Figure 2. Screen capture of the web interface of the dbWFA database. (A) Page for querying PMN pathways. Similar pages can be used to query the MIPS Functional Categories, TAIR gene families, GO and MapMan bins. A list of GO terms can be queried simultaneously. (B) Page for querying UniGene or full-length cDNA sequence annotations. (C) Result page for annotated UniGenes.
Table 1. Number of T. aestivum transcripts from the NCBI UniGene set (build #60) and full-length cDNA (FL cDNA) sequences retrieved from the TriFLDB database, annotated with a putative function (coverage >50%, identity >45%) in at least one annotation system

| Functional annotation system | A. thaliana: NCBI UniGene | A. thaliana: FL cDNA | O. sativa: NCBI UniGene | O. sativa: FL cDNA | Total^a: NCBI UniGene | Total^a: FL cDNA |
|---|---|---|---|---|---|---|
| MIPS functional classification | 12 943 | 10 864 | | | 12 943 | 10 864 |
| PlantCyc pathway reactions | 2193 | 2106 | 2093 | 2208 | 3067 | 2911 |
| GOs | 13 142 | 8014 | 10 444 | 10 850 | 16 079 | 12 279 |
| TAIR | 4498 | 3797 | | | 4498 | 3797 |
| MapMan bins | 19 248 | 14 032 | 13 202 | 10 897 | 20 033 | 14 224 |

| Curated pathways or functions | Number of annotated transcripts |
|---|---|
| Hormone-responsive genes | 467 |
| Ubiquitin-proteasome system | 876 |
| Transcription factors | 2891 |

^a Number of transcripts and full-length cDNA sequences annotated with a putative function in at least one model species.
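The thresholds stated in the caption (coverage >50%, identity >45%) can be illustrated with a minimal sketch of a filter over alignment records. The `Hit` record, its field names and the function names are hypothetical and are not taken from the dbWFA schema; the first three sample hits reuse values from the phytoene synthase query example, and the last one is invented to show a rejected hit.

```python
from dataclasses import dataclass

# Thresholds taken from the table caption (both in percent, strict inequalities).
MIN_COVERAGE = 50.0
MIN_IDENTITY = 45.0


@dataclass(frozen=True)
class Hit:
    """One alignment between a wheat transcript and a model-species gene (hypothetical layout)."""
    query_id: str
    subject_id: str
    coverage: float  # percent of the query covered by the alignment
    identity: float  # percent identity over the aligned region


def passes_thresholds(hit, min_cov=MIN_COVERAGE, min_id=MIN_IDENTITY):
    """Return True when the hit exceeds both thresholds."""
    return hit.coverage > min_cov and hit.identity > min_id


def annotated_queries(hits):
    """Return the set of query identifiers with at least one retained hit."""
    return {h.query_id for h in hits if passes_thresholds(h)}


hits = [
    Hit("Ta.41960", "LOC_OS06G51290", 59.7, 81.4),
    Hit("Ta.41960", "AT5G17230", 58.0, 79.6),
    Hit("Ta.66029", "AT5G17230", 59.7, 47.08),
    Hit("Ta.hypothetical", "AT0G00000", 42.0, 90.0),  # invented: coverage too low
]

retained = annotated_queries(hits)
```

Counting the retained query identifiers per annotation system would give figures of the kind reported in the table, under the assumption that each hit carries the annotation of its matching model-species gene.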
Figure 3. Radar plot (log scale) of the MapMan bin annotations for A. thaliana, O. sativa and T. aestivum UniGene (build #60) and full-length coding sequences. Data are percent of the total number of MapMan bin annotations (Table 1). Similar results were obtained with builds #55, #58 and #59 (data not shown). Some bins have been merged to make the figure clearer.
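The quantity plotted in this figure, the percentage of all MapMan bin annotations falling in each bin, shown on a log scale, can be sketched as follows. Merging bins by their top-level code is an assumption made for illustration only (the caption does not say which bins were merged), and the short list of bin codes is a toy input built from codes that appear in the query examples, not real counts.

```python
import math
from collections import Counter


def top_level_bin(code):
    """Return the first component of a dotted MapMan bin code, e.g. '4.1.10' -> '4'."""
    return code.split(".")[0]


def percent_by_bin(codes):
    """Percentage of annotations per top-level bin (assumed merging rule)."""
    counts = Counter(top_level_bin(c) for c in codes)
    total = sum(counts.values())
    return {b: 100.0 * n / total for b, n in counts.items()}


# Toy input: one bin code per annotation.
codes = ["4.1", "4.1.10", "4.1.10", "4.1.11", "4.1.11", "16.1.4.1"]

percents = percent_by_bin(codes)
# Log-transformed values, as would be placed on a log-scale radar axis.
log_percents = {b: math.log10(p) for b, p in percents.items()}
```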
Figure 4. Functional annotation of genes specifically expressed during either the early cell division or late storage polymer accumulation (SPA) phases of T. aestivum grain development. (A) Heat map of expression for early- and late-development-specific genes. (B) Normalized expression of the early- and late-development-specific gene clusters. Transcripts with normalized expression <7 were not considered to be expressed (i.e. not different from the background noise). Data are medians ± 1 SD. (C) MIPS Functional Categories of genes from both UniGene clusters.
Find all T. aestivum transcripts likely to have a phytoene synthase activity
| UniGene Id number | UniGene representing sequence | UniGene description | Matching sequence Id number | Matching sequence description | Coverage (%) | Identity (%) |
|---|---|---|---|---|---|---|
| Ta.41960 | Ta_S16057905 | | LOC_OS06G51290 | Phytoene synthase, chloroplast precursor, putative, expressed | 59.7 | 81.4 |
| | | | AT5G17230 | Phytoene synthase | 58.0 | 79.6 |
| Ta.66029 | Ta_S26027774 | FGAS000498 | LOC_OS06G51290 | phytoene synthase, chloroplast precursor, putative, expressed | 55.3 | 48.9 |
| | | | AT5G17230 | Phytoene synthase | 59.7 | 47.08 |
Find as much information as possible about a list of transcripts
| UniGene Id number | UniGene match | GO | TAIR | MIPS | PlantCyc | MapMan |
|---|---|---|---|---|---|---|
| Ta.41960 | Phytoene synthase | GO:0009507 | | 01.06.06.13 | 2.5.1.32 | 16.1.4.1 |
| | | GO:0016117 | | 70.26.03 | 2.5.1.32 | |
| | | GO:0016767 | | | | |
| | | GO:0046905 | | | | |
Find all the transcripts putatively involved in the glycolytic pathway for a transcriptome analysis in MapMan
| Bin code | Name | Identifier | Description | Type |
|---|---|---|---|---|
| 4.1 | Glycolysis.cytosolic branch | Ta_S16058223 | Similar to UTP–glucose-1-phosphate uridylyltransferase, putative, expressed; coverage: 99.5745%, identity: 92.75% | T |
| 4.1.10 | Glycolysis.cytosolic branch.non-phosphorylating glyceraldehyde 3-phosphate dehydrogenase (NPGAP-DH) | Ta_S13048872 | Similar to aldehyde dehydrogenase; coverage: 100%, identity: 87.1% | T |
| 4.1.10 | Glycolysis.cytosolic branch.non-phosphorylating glyceraldehyde 3-phosphate dehydrogenase (NPGAP-DH) | Ta_S13048873 | Similar to aldehyde dehydrogenase; coverage: 100%, identity: 79.23% | T |
| 4.1.11 | Glycolysis.cytosolic branch.aldolase | Ta_S15902802 | Similar to aldolase superfamily protein; coverage: 50.1873%, identity: 85.07% | T |
| 4.1.11 | Glycolysis.cytosolic branch.aldolase | Ta_S17888674 | Similar to aldolase superfamily protein; coverage: 88.5475%, identity: 48.91% | T |
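For use in a transcriptome analysis, rows such as those above would typically be saved as a delimited text file. The sketch below serializes them as tab-separated text with the five columns of the table; the tab delimiter, the header spelling and the dictionary keys are assumptions for illustration, and the exact file layout expected by MapMan should be checked against its documentation.

```python
import csv
import io

# Column headers as they appear in the table above.
COLUMNS = ["Bin code", "Name", "Identifier", "Description", "Type"]


def to_mapping_text(rows):
    """Serialize annotation rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for row in rows:
        writer.writerow([
            row["bin"],
            row["name"],
            row["identifier"],
            row["description"],
            row["type"],
        ])
    return buffer.getvalue()


# Two rows copied from the table (alignment details omitted for brevity).
rows = [
    {
        "bin": "4.1.11",
        "name": "Glycolysis.cytosolic branch.aldolase",
        "identifier": "Ta_S15902802",
        "description": "Similar to aldolase superfamily protein",
        "type": "T",
    },
    {
        "bin": "4.1.11",
        "name": "Glycolysis.cytosolic branch.aldolase",
        "identifier": "Ta_S17888674",
        "description": "Similar to aldolase superfamily protein",
        "type": "T",
    },
]

mapping_text = to_mapping_text(rows)
```

Writing `mapping_text` to a file gives one line per transcript, so several transcripts can share the same bin code, as in the aldolase rows.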