Nathalie Pavy, James J Johnson, John A Crow, Charles Paule, Timothy Kunau, John MacKay, Ernest F Retzel.
Abstract
ForestTreeDB is intended as a resource that centralizes large-scale expressed sequence tag (EST) sequencing results from several tree species (http://foresttree.org/ftdb). It currently encompasses 344,878 quality sequences from 68 libraries, from diverse organs of conifer and hybrid poplar trees. It utilizes the Nimbus data model to provide a hosting system for multiple projects, and uses object-relational mapping APIs in Java and Perl for data access within an Oracle database designed to be scalable, maintainable and extendable. Transcriptome builds, or unigene sets, occupy the focal point of the system. Several of the five current species-specific unigene sets were used to design microarrays and SNP resources. The ForestTreeDB web application provides the means for multiple combination database queries. It presents the user with a list of discrete queries to retrieve and download large EST datasets or sequences from precompiled unigene assemblies. Functional annotation assignment is not trivial in conifers, which are distantly related to angiosperm model plants. Optimal annotations are achieved through database queries that integrate results from several procedures based on open-source tools. ForestTreeDB aims to facilitate sequence mining of coherent annotations in multiple species to support comparative genomic approaches. We plan to continuously enrich ForestTreeDB with other resources through collaborations with other genomic projects.
Year: 2006 PMID: 17130142 PMCID: PMC1716727 DOI: 10.1093/nar/gkl882
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Tissue sampling. Number of clones representing the different organs from pine, spruce or poplar used to prepare the cDNA libraries. The full description of the 68 cDNA libraries is available in the database in the Summaries section, Library Descriptions.
Number of unigenes found in the three major conifer UnigeneSets, and number of unigenes sharing 80% identity over at least 100 nt in pairwise comparisons
| UnigeneSet | Spruce_Arborea_Release8 | Pine_NSF | Pine_UGA |
|---|---|---|---|
| Total number of unigenes | 16 602 | 20 483 | 122 079 |
| Pine_NSF | 8874 (53.45%) | | |
| Pine_UGA | 12 470 (75.1%) | 13 571 (66.25%) | |
Only one of the two spruce unigene sets is presented here because they largely overlap.
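The percentages in the pairwise comparison table appear to be the shared count divided by the total of the UnigeneSet named in the column header (for example, 8874 of the 16 602 spruce unigenes). The following minimal sketch reproduces that arithmetic and shows the stated similarity criterion as a filter. The function names and the inclusive reading of "80% identity over at least 100 nt" are assumptions made for illustration, not part of ForestTreeDB.

```python
# Totals and shared counts are copied from the table above.
TOTALS = {
    "Spruce_Arborea_Release8": 16602,
    "Pine_NSF": 20483,
    "Pine_UGA": 122079,
}
SHARED = {
    # (row set, column set): number of unigenes meeting the criterion
    ("Pine_NSF", "Spruce_Arborea_Release8"): 8874,
    ("Pine_UGA", "Spruce_Arborea_Release8"): 12470,
    ("Pine_UGA", "Pine_NSF"): 13571,
}


def shared_percentage(row_set, column_set):
    """Shared unigenes as a percentage of the column set's total (assumed denominator)."""
    return 100.0 * SHARED[(row_set, column_set)] / TOTALS[column_set]


def meets_criterion(percent_identity, alignment_length,
                    min_identity=80.0, min_length=100):
    """Hypothetical filter: at least 80% identity over at least 100 nt (inclusive bounds assumed)."""
    return percent_identity >= min_identity and alignment_length >= min_length


if __name__ == "__main__":
    for (row_set, column_set) in SHARED:
        print(row_set, "vs", column_set,
              round(shared_percentage(row_set, column_set), 2))
```

Rounding the three results to the precision used in the table gives the published values, which supports the assumed choice of denominator.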
Compilation of some statistics about sequence annotation extracted from ForestTreeDB with the query ‘GO Accn-BLAST’ in combination with a condition on the UnigeneSetID
| UnigeneSet | Species | Number of unigenes | Unigenes associated with GO category molecular function | Unigenes associated with GO category biological process | Unigenes associated with GO category cellular component | Unigenes with a match in Uniref100 (blastx hit) | Unigenes with a match in PFAM (HMM hit) | Unigenes with a match in SUPERFAMILY (HMM hit) | Unigenes with a match in TIGRFAM (HMM hit) | Unigenes with a match in SMART motifs (HMM hit) |
|---|---|---|---|---|---|---|---|---|---|---|
| Spruce | Spruce | 16 602 | 1429 (8.6%) | 1550 (9.3%) | 771 (4.6%) | 10 205 (61.5%) | 7230 (43.5%) | 5783 (34.8%) | 12 016 (72.4%) | 10 730 (64.6%) |
| NSF_pine | Pine | 20 483 | 1880 (9.2%) | 2055 (10%) | 1022 (5%) | 6896 (33.7%) | 8687 (42.4%) | 4397 (21.5%) | 15 547 (75.9%) | 4924 (24%) |
| UGA_pine | Pine | 122 079 | 7548 (6.2%) | 8291 (6.8%) | 4203 (3.4%) | 82 261 (67.4%) | 108 825 (89.1%) | 63 963 (52.4%) | 106 548 (87.3%) | 87 173 (71.4%) |
| Arborea poplar | Poplar | 5911 | 536 (9.1%) | 585 (9.9%) | 317 (5.4%) | 4243 (71.8%) | 4393 (74.3%) | 2637 (44.6%) | 5025 (85%) | 3165 (53.5%) |
Matches were filtered using an E-value threshold of 1E−10. Queries were completed with the following GO accessions: GO:0005575 (cellular component), GO:0003674 (molecular function) and GO:0008150 (biological process). The numbers of unigenes annotated following a BLAST search or an HMM search against several protein databases are indicated.
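As an illustration of how the counts and percentages in the annotation table could be derived, the sketch below keeps only hits at or below the E-value threshold, counts each unigene once, and expresses the count as a percentage of the set size. The hit tuples, the function names and the inclusive comparison are hypothetical; the source states only the threshold value.

```python
E_VALUE_THRESHOLD = 1e-10  # threshold stated in the table note


def annotated_unigenes(hits, threshold=E_VALUE_THRESHOLD):
    """Return the IDs of unigenes with at least one hit at or below the threshold.

    `hits` is an iterable of (unigene_id, e_value) pairs (hypothetical layout).
    Whether the boundary value itself is kept is an assumption.
    """
    return {unigene_id for unigene_id, e_value in hits if e_value <= threshold}


def percent_of_set(count, total):
    """Express a unigene count as a percentage of the UnigeneSet size."""
    return 100.0 * count / total


if __name__ == "__main__":
    # Invented example hits; contig1 has two hits but is counted once.
    example_hits = [
        ("contig1", 1e-50),
        ("contig1", 1e-12),
        ("contig2", 1e-5),
        ("contig3", 1e-10),
    ]
    print(sorted(annotated_unigenes(example_hits)))
    # Spruce row of the table: 10 205 Uniref100 matches among 16 602 unigenes.
    print(round(percent_of_set(10205, 16602), 1))
```

Counting distinct unigene IDs rather than hits matters here, because a single unigene usually matches several database entries.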
Figure 2. Database environment. EST and genomic resources access the same core targets and annotation support, and semantic web services [semantic BioMoby] provide the access mechanism.
Figure 3. ForestTreeDB screenshot showing a query combining a GO search and a specific UnigeneSet search. The search covered unigenes belonging to UnigeneSetID 6 (pine unigenes derived from the UGA assembly) and associated with the GO term ‘DNA binding’ with P-value < 1E−10. The query returned 562 unigenes.
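The combined query in Figure 3 amounts to two conditions applied together: membership in a UnigeneSet and a GO term association below a significance cutoff. The sketch below expresses that idea against an in-memory SQLite database. The table names, column names and demo rows are entirely hypothetical and do not reflect the actual ForestTreeDB Oracle schema or the Nimbus data model.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with a hypothetical two-table layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE unigene (unigene_id TEXT PRIMARY KEY, unigene_set_id INTEGER);
        CREATE TABLE go_annotation (unigene_id TEXT, go_term TEXT, p_value REAL);
    """)
    conn.executemany(
        "INSERT INTO unigene VALUES (?, ?)",
        [("u1", 6), ("u2", 6), ("u3", 5)],
    )
    conn.executemany(
        "INSERT INTO go_annotation VALUES (?, ?, ?)",
        [
            ("u1", "DNA binding", 1e-30),
            ("u1", "DNA binding", 1e-20),  # second supporting hit, same unigene
            ("u2", "DNA binding", 1e-4),   # fails the cutoff
            ("u3", "DNA binding", 1e-40),  # passes the cutoff, wrong set
        ],
    )
    return conn


def unigenes_for_go_term(conn, unigene_set_id, go_term, max_p_value=1e-10):
    """Return distinct unigene IDs in one set that carry the GO term below the cutoff."""
    rows = conn.execute(
        """
        SELECT DISTINCT u.unigene_id
        FROM unigene AS u
        JOIN go_annotation AS g ON g.unigene_id = u.unigene_id
        WHERE u.unigene_set_id = ? AND g.go_term = ? AND g.p_value < ?
        ORDER BY u.unigene_id
        """,
        (unigene_set_id, go_term, max_p_value),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(unigenes_for_go_term(build_demo_db(), 6, "DNA binding"))
```

The strict `<` comparison follows the "P-value < 1E−10" wording of the caption; parameter binding keeps the two conditions independent, so either can be changed without touching the query text.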
Figure 4. ForestTreeDB screenshot displaying the annotation assigned to one of the unigenes retrieved with the query from Figure 3. For each annotation method, all the matches are displayed. Here, only the top of the screenshot is shown for each query (to limit the size of the figure). (a) Contig information, including identifier, sequence and links to retrieve the sequence. (b) Blastx matches, including the hit's accession, hit location and similarity parameters. (c) Hits found by HMMER, including accession and description of the hit, match location and similarity parameter. (d) List of Gene Ontology terms inferred for this contig.