Barbara Lazzari, Andrea Caprera, Cristian Cosentino, Alessandra Stella, Luciano Milanesi, Angelo Viotti.
Abstract
BACKGROUND: The ESTuber database (http://www.itb.cnr.it/estuber) includes 3,271 Tuber borchii expressed sequence tags (ESTs). The dataset consists of 2,389 sequences from a cDNA library prepared in-house from truffle vegetative hyphae, and 882 sequences downloaded from GenBank that represent four libraries from white truffle mycelia and ascocarps at different developmental stages. An automated pipeline was prepared to process the EST sequences using public software integrated with Perl scripts developed in-house. Data were collected in a MySQL database, which can be queried via a PHP-based web interface.
Year: 2007 PMID: 17430557 PMCID: PMC1885842 DOI: 10.1186/1471-2105-8-S1-S13
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. The ESTuber db pipeline. Data flow among the main programs included in the ESTuber db pipeline. Accessory programs were excluded from the chart.
Statistics on the ESTuber db status. Sequences are considered annotated when their best BLAST hit has an E-value ≤ 1e-10.
| Total number of sequences | 3,271 |
| Average base count | 530.43 |
| Number of singletons | 2,071 |
| Number of contigs | 356 |
| Number of putative unigenes | 2,427 |
| Sequences annotated | 60.19 |
| Sequences annotated | 48.88 |
| Sequences annotated | 8.6 |
| Repeats-containing sequences (%) | 7.64 |
| Sequences containing PROSITE matches (%) | 16.84 |
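The annotation criterion and the unigene count in the table above can be illustrated with a minimal sketch. It assumes each sequence carries the E-value of its best BLAST hit, or `None` when there was no hit; the function names are hypothetical and are not taken from the ESTuber pipeline, which was written in Perl.

```python
# Cutoff stated in the table caption: best BLAST hit E-value <= 1e-10.
ANNOTATION_EVALUE_CUTOFF = 1e-10


def is_annotated(best_hit_evalue):
    """Return True when the best-hit E-value meets the cutoff.

    None stands for "no BLAST hit" (an assumption of this sketch).
    """
    return best_hit_evalue is not None and best_hit_evalue <= ANNOTATION_EVALUE_CUTOFF


def annotated_percentage(best_hit_evalues):
    """Percentage of sequences counted as annotated."""
    if not best_hit_evalues:
        return 0.0
    annotated = sum(1 for e in best_hit_evalues if is_annotated(e))
    return 100.0 * annotated / len(best_hit_evalues)


def putative_unigenes(singletons, contigs):
    """Putative unigenes are the singletons plus the assembled contigs."""
    return singletons + contigs
```

With the figures reported in the table, `putative_unigenes(2071, 356)` gives 2,427, which matches the stated number of putative unigenes.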
Figure 2. The ESTuber database. Main tables of the ESTuber db.
Figure 3. The ESTuber db contig page. A detailed page for each contig is included in the database, where a graphical display, the contig consensus sequence and the contig alignment are given, together with any other supplementary information.
Figure 4. Discovery rates for the ascocarp and mycelium datasets. Organ-unique sequences are plotted against the number of ESTs. The discovery rate is the slope of the line connecting two consecutive points. The straight line (y=x) corresponds to a discovery rate of 1.
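The discovery rate described in this caption can be sketched as the slope between consecutive sampling points. This is an illustrative sketch only: the function name is hypothetical and the sample points are invented for the example, not data from the paper.

```python
def discovery_rates(points):
    """Slopes between consecutive (ests_sampled, unique_sequences) points.

    `points` must be sorted by the number of ESTs sampled. A slope of 1
    means every newly sampled EST was a new unique sequence (the y = x line).
    """
    rates = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x1 == x0:
            raise ValueError("consecutive points must differ in EST count")
        rates.append((y1 - y0) / (x1 - x0))
    return rates


# Invented example points: (ESTs sampled, unique sequences found so far).
example_points = [(0, 0), (100, 100), (200, 180), (400, 300)]
example_rates = discovery_rates(example_points)
```

In this invented example the rate falls from 1.0 to 0.8 and then to 0.6, the pattern expected as a library is sampled more deeply and redundant sequences become more frequent.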
Ontologies distribution in GO categories.
| Total number of sequences | 683 | 2,588 | 35,924 | 10,302 | 53,102 | 28,089 | 3,041 |
| Unigenes percentage | 62.22 | 84.89 | 91.6 | 91.67 | 94.78 | 56.28 | 99.44 |
| GO annotation percentage | 55.49 | 46.14 | 61.09 | 46.89 | 44.54 | 44.15 | 78.72 |
| molecular function | 39.97 | 42.19 | 54.2 | 43.74 | 40.77 | 40.14 | 67.91 |
| binding | 20.94 | 21.14 | 25.41 | 19.78 | 18.84 | 19.5 | 43.47 |
| catalytic activity | 17.72 | 20.83 | 32.1 | 30.35 | 20.39 | 21.98 | 30.71 |
| structural molecule activity | 6.59 | 6.45 | 5.11 | 2.3 | 5.57 | 7.77 | 16.38 |
| transporter activity | 3.51 | 3.71 | 6.43 | 6.57 | 5.76 | 3.9 | 6.91 |
| molecular function unknown | 3.37 | 0.97 | 0.88 | 0.62 | 0.72 | 0.54 | 1.84 |
| signal transducer activity | 0.15 | 0.39 | 1.05 | 0.49 | 0.6 | 0.61 | 0.72 |
| translation regulator activity | 1.02 | 1.43 | 1.2 | 1.48 | 1.82 | 0.95 | 1.51 |
| transcription regulator activity | 0 | 0.46 | 1.79 | 0.79 | 2.39 | 1.08 | 2.7 |
| enzyme regulator activity | 0.15 | 0.7 | 0.39 | 0.19 | 0.34 | 0.2 | 1.71 |
| antioxidant activity | 0.73 | 0.35 | 0.62 | 0.14 | 0.55 | 0.48 | 1.25 |
| biological process | 42.02 | 35.55 | 43.59 | 34.49 | 32.45 | 34.18 | 68.33 |
| physiological process | 41.43 | 32.92 | 40.93 | 32.93 | 28.91 | 32.4 | 67.41 |
| cellular process | 37.92 | 30.87 | 35.44 | 30.36 | 25.61 | 28.93 | 66 |
| regulation of biological process | 0.73 | 2.09 | 3.7 | 1.96 | 4.04 | 2.21 | 6.68 |
| cellular component | 15.96 | 15.92 | 15.56 | 8.83 | 11.89 | 12.84 | 55.87 |
| organelle | 13.76 | 11.67 | 11.81 | 6.88 | 8.97 | 10.5 | 50.51 |
| protein complex | 13.91 | 11.01 | 8.79 | 4.46 | 7.11 | 7.73 | 28.74 |
Ontologies distribution in the three main GO categories and in their subclasses of the Tb ascocarps and mycelia subsets and of the five fungi EST datasets considered for comparison purposes. Data on EST collection redundancies and on the GO annotation percentages are given at the top of the table. Matches with GO entries are expressed as percentage of matching sequences with respect to the total number of sequences of the considered dataset. Only GO categories where at least one of the considered organisms had a percentage of matches > 1 were included in the table.
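The two rules in this legend (matches expressed as a percentage of each dataset's total, and categories kept only when at least one dataset exceeds 1%) can be sketched as follows. The function names and the "hypothetical rare category" row are assumptions made for illustration; the other two rows are copied from the table.

```python
def match_percentage(matching_sequences, total_sequences):
    """Matches as a percentage of all sequences in the dataset."""
    return 100.0 * matching_sequences / total_sequences


def filter_categories(table, threshold=1.0):
    """Keep GO categories where at least one dataset exceeds the threshold.

    `table` maps a category name to its per-dataset percentages.
    """
    return {
        category: values
        for category, values in table.items()
        if any(p > threshold for p in values)
    }


go_table = {
    # Rows copied from the table above (seven datasets each).
    "antioxidant activity": [0.73, 0.35, 0.62, 0.14, 0.55, 0.48, 1.25],
    "signal transducer activity": [0.15, 0.39, 1.05, 0.49, 0.6, 0.61, 0.72],
    # Invented row that never exceeds 1% and is therefore dropped.
    "hypothetical rare category": [0.1, 0.2, 0.3, 0.1, 0.0, 0.4, 0.9],
}
kept = filter_categories(go_table)
```

Both real rows are retained because a single dataset exceeds 1% in each (1.25 and 1.05 respectively), while the invented row is excluded.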
Figure 5. Pie representation of the ontology occurrences. Distribution of the matching ontologies in the second-level GO categories for the different organisms considered. Only GO categories where at least one of the organisms had a percentage of matches > 1 were included in the pies.