Martijn J T N Timmermans, Muriel E de Boer, Benjamin Nota, Tjalf E de Boer, Janine Mariën, Rene M Klein-Lankhorst, Nico M van Straalen, Dick Roelofs.
Abstract
BACKGROUND: Environmental quality assessment is traditionally based on responses of reproduction and survival of indicator organisms. For soil assessment the springtail Folsomia candida (Collembola) is an accepted standard test organism. We argue that environmental quality assessment using gene expression profiles of indicator organisms exposed to test substrates is more sensitive, more toxicant specific and significantly faster than current risk assessment methods. To apply this species as a genomic model for soil quality testing we conducted an EST sequencing project and developed an online database.

DESCRIPTION: Collembase is a web-accessible database comprising springtail (F. candida) genomic data. Presently, the database contains information on 8686 ESTs that are assembled into 5952 unique gene objects. Of those gene objects, approximately 40% showed homology to other protein sequences available in GenBank (blastx analysis; non-redundant (nr) database; expect-value < 10^-5). Software was applied to infer protein sequences. The putative peptides, which had an average length of 115 amino acids (ranging between 23 and 440), were annotated with Gene Ontology (GO) terms. In total 1025 peptides (approximately 17% of the gene objects) were assigned at least one GO term (expect-value < 10^-25). Within Collembase, searches can be conducted based on BLAST and GO annotation or cluster name, or by using a BLAST server. The system furthermore enables easy sequence retrieval for functional genomic and quantitative PCR experiments. Sequences are submitted to GenBank (accession numbers: EV473060 - EV481745).
Year: 2007 PMID: 17900339 PMCID: PMC2234260 DOI: 10.1186/1471-2164-8-341
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
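The counts quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the GenBank accession range is contiguous and that every accession has the two-letter, six-digit form shown in the abstract. The helper name `accession_span` is hypothetical.

```python
def accession_span(first: str, last: str) -> int:
    """Number of accessions in an inclusive range such as EV473060 - EV481745.

    Assumes both accessions share the same two-letter prefix and that the
    range is contiguous (an assumption, not something the abstract states).
    """
    if first[:2] != last[:2]:
        raise ValueError("accessions must share the same prefix")
    return int(last[2:]) - int(first[2:]) + 1


# Figures quoted in the abstract.
N_ESTS = 8686
N_GENE_OBJECTS = 5952
N_GO_ANNOTATED = 1025

span = accession_span("EV473060", "EV481745")
go_percent = 100 * N_GO_ANNOTATED / N_GENE_OBJECTS

print(f"accessions in range: {span} (ESTs reported: {N_ESTS})")
print(f"gene objects with a GO term: {go_percent:.1f}%")
```

Under these assumptions the accession range covers the same number of entries as the ESTs reported, and 1025 of 5952 gene objects rounds to the 17% given.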
Figure 1. Relative abundance of six cDNAs before (upper) and after (lower) normalization, as measured using quantitative PCR. Act: β-actin; 28S: 28S rDNA; De: RNA helicase Dead1; RXR: RXR-USP; Ub: Ultrabithorax; Kr: Kruppel.
Remaining sequences after the Trace2dbest process
| Library | # Clones sequenced | # Passed (%) |
| Normalized | 8064 | 7329 (91) |
| Cadmium enriched | 960 | 705 (73) |
| Phenanthrene enriched | 960 | 808 (84) |
| Total | 9984 | 8842 (89) |
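The percentages in the last column follow directly from the two count columns. As a minimal sketch (the dictionary layout and the function name are illustrative and not part of the Trace2dbest pipeline):

```python
# Clones sequenced and sequences passing Trace2dbest, per library (from the table above).
LIBRARIES = {
    "Normalized": (8064, 7329),
    "Cadmium enriched": (960, 705),
    "Phenanthrene enriched": (960, 808),
}


def pass_rate(sequenced: int, passed: int) -> int:
    """Percentage of sequenced clones that passed, rounded to a whole percent."""
    return round(100 * passed / sequenced)


rates = {name: pass_rate(*counts) for name, counts in LIBRARIES.items()}
total_sequenced = sum(sequenced for sequenced, _ in LIBRARIES.values())
total_passed = sum(passed for _, passed in LIBRARIES.values())
rates["Total"] = pass_rate(total_sequenced, total_passed)

for name, rate in rates.items():
    print(f"{name}: {rate}%")
```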
Contigs per cluster, as generated by CLOBB and Phrap
| | # Clusters | # Contigs/cluster | Total number of contigs |
| Clusters | 1 | 11 | 11 |
| | 1 | 10 | 10 |
| | 1 | 5 | 5 |
| | 7 | 3 | 21 |
| | 83 | 2 | 166 |
| | 1313 | 1 | 1313* |
| Singletons | 4686 | 1 | 4686 |
| Total | 6092 | | 6212 |
* In 75 instances Phrap did not assemble the contigs; in those cases the pseudo-contig files generated by PartiGene were used.
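The totals row can be reproduced from the per-row counts. The sketch below is only a consistency check. The tuple layout is an illustrative choice, and the final comparison with the 5952 gene objects in the abstract is an inference from the numbers: it assumes that the 140 clusters regarded as contamination account for the difference.

```python
# (number of clusters, contigs per cluster) for the multi-sequence clusters in the table above.
CLUSTER_ROWS = [(1, 11), (1, 10), (1, 5), (7, 3), (83, 2), (1313, 1)]
SINGLETONS = 4686  # each singleton counts as one cluster holding one contig

n_clusters = sum(n for n, _ in CLUSTER_ROWS) + SINGLETONS
n_contigs = sum(n * per_cluster for n, per_cluster in CLUSTER_ROWS) + SINGLETONS

# Assumption: removing the 140 clusters regarded as contamination
# gives the number of gene objects quoted in the abstract.
CONTAMINATION_CLUSTERS = 140
n_gene_objects = n_clusters - CONTAMINATION_CLUSTERS

print(n_clusters, n_contigs, n_gene_objects)
```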
Figure 2. Venn diagram showing the cluster overlap between the three libraries for the total dataset. Cad: cadmium enriched library; Phe: phenanthrene enriched library; Nor: normalized library.
A) Clusters that contain sequences from all three libraries and B) the most abundantly sequenced transcripts for each of the three F. candida cDNA libraries. n = the number of sequences in a cluster that originate from the library specified. E-values are from BLAST analyses against the nr database.
| Library | Cluster (n) | Overview of related sequences (blastx) | Species | GenBank Accession | blastx e-value |
| A. | | | | | |
| All three | Fcc00101 (5) | Hypothetical protein | | CAA90252 | 1e-32 |
| | | BCS1-like | | AAH19781 | 3e-29 |
| | Fcc02080 (3) | No Significant Hit | | - | - |
| | Fcc00256 (6) | No Significant Hit | | - | - |
| | Fcc00343 (22) | Hypothetical protein | | XP_001397474 | 1e-19 |
| | | Haloacid dehalogenase-like hydrolase | | XP_001260321 | 2e-19 |
| | | Hypothetical protein | | NP_001017717 | 8e-06 |
| | Fcc01457 (8) | Cytochrome c oxidase s.u.II | | AAS66294 | 7e-93 |
| | Fcc03109 (3) | No Significant Hit | | - | - |
| | Fcc00170 (27) | Alpha-aminoadipyl-cysteinyl-valine synthetase | | BAA08846 | 4e-58 |
| B. | | | | | |
| Normalized | Fcc00179 (31) | No Significant Hit | | - | - |
| | Fcc00087 (25) | No Significant Hit | | - | - |
| | Fcc00164 (16) | No Significant Hit | | - | - |
| | Fcc00632 (14) | No Significant Hit | | - | - |
| | Fcc00225 (12) | GA19585-PA | | EAL32218 | 2e-06 |
| Phenanthrene | Fcc00058 (98) | Dipeptidyl peptidase | | XP_001607433 | 9e-35 |
| | | Cytochrome P450 | | AAF97937 | 1e-12 |
| | Fcc00015 (91) | Cytochrome P450 | | AAN05727 | 9e-15 |
| | Fcc00021 (35) | Monooxygenase, DBH-like 1 | | AAH91331 | 1e-21 |
| | Fcc00217 (25) | Monooxygenase, DBH-like 1 | | NP_989955 | 2e-08 |
| | Fcc04217 (23) | Cytochrome P450 | | XP_392000 | 3e-12 |
| Cadmium | Fcc01017 (16) | Hypothetical protein | | XP_757859 | 5e-13 |
| | | Endo-1,3 1,4-beta-D-glucanase precursor | | XP_480878 | 2e-07 |
| | Fcc00170 (15) | Alpha-aminoadipyl-cysteinyl-valine synthetase | | BAA08846 | 4e-58 |
| | Fcc01428 (16) | 16S ribosomal RNA gene | | AY555551 | 1e-66* |
| | Fcc01142 (12) | No Significant Hit | | - | - |
| | Fcc00018 (9) | No Significant Hit | | - | - |
* e-value from blastn
Percentages of contigs showing sequence similarity (e-value < 10^-5) with sequences stored in GenBank (nr, est databases and nr database restricted to the Insecta) and proteins of Caenorhabditis elegans, Drosophila melanogaster and Mus musculus (April 2007)
| Database | BLAST | Significant hits for the total dataset (%) | Significant hits excl. 140 clusters* (%) |
| nr | blastx | 42 | 41 |
| nr | blastn | 9 | 7 |
| est | tblastx | 40 | 39 |
| nr – Insecta** | blastx | 36 | 35 |
| Caenorhabditis elegans proteins | blastx | 25 | 24 |
| Drosophila melanogaster proteins | blastx | 32 | 30 |
| Mus musculus proteins | blastx | 31 | 29 |
* In total 140 clusters showed high similarity to yeast and human DNA sequences stored in GenBank and were therefore regarded as contamination.
** Blast analysis performed August 2007
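The percentages above depend on counting a contig as a hit when at least one alignment falls below the e-value cutoff. A minimal sketch of that counting step follows, assuming the BLAST results are available in the standard 12-column tabular format (query ID in the first column, e-value in the eleventh). The source does not say how the authors tallied hits, so this is one plausible approach rather than their method. The function names are hypothetical. In the sample rows, only the cluster names, accessions and e-values come from the tables in this document; all other columns and the `no_hit_example` query are made-up placeholders.

```python
E_VALUE_CUTOFF = 1e-5  # threshold quoted for the similarity searches


def queries_with_hit(tabular_lines, cutoff=E_VALUE_CUTOFF):
    """Return the set of query IDs with at least one alignment below the cutoff."""
    significant = set()
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query_id, e_value = fields[0], float(fields[10])
        if e_value < cutoff:
            significant.add(query_id)
    return significant


def percent_with_hit(tabular_lines, n_queries, cutoff=E_VALUE_CUTOFF):
    """Percentage of all queries that have at least one significant hit."""
    return 100.0 * len(queries_with_hit(tabular_lines, cutoff)) / n_queries


def make_row(query_id, subject_id, e_value):
    """Build a 12-column row; every column except the IDs and e-value is a placeholder."""
    return "\t".join([query_id, subject_id, "50.0", "100", "50", "0",
                      "1", "300", "1", "100", e_value, "100"])


SAMPLE = [
    make_row("Fcc01457", "AAS66294", "7e-93"),
    make_row("Fcc00225", "EAL32218", "2e-06"),
    make_row("no_hit_example", "PLACEHOLDER", "0.5"),  # invented non-significant row
]

# Four queries were searched in this toy example; one returned no alignment at all.
print(sorted(queries_with_hit(SAMPLE)))
print(percent_with_hit(SAMPLE, n_queries=4))
```

Passing `cutoff=1e-25` applies the stricter threshold quoted for the GO annotation step; in the toy sample only `Fcc01457` would then remain.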
GO slim terms for F. candida genes based on a BLAST search (e-value < 10^-25) against the GO annotated UniProt database as generated by Annot8r_blast2GO
| GO slim term | GO ID | Count |
| Electron transport | GO:0006118 | 53 |
| Response to stimulus | GO:0050896 | 18 |
| Amino acid and derivative metabolism | GO:0006519 | 31 |
| Behavior | GO:0007610 | 1 |
| Physiological process | GO:0007582 | 500 |
| Transport | GO:0006810 | 140 |
| Regulation of biological process | GO:0050789 | 2 |
| Cell communication | GO:0007154 | 38 |
| Nucleobase, nucleoside, nucleotide and nucleic acid metabolism | GO:0006139 | 158 |
| Cell motility | GO:0006928 | 3 |
| Development | GO:0007275 | 30 |
| Cellular process | GO:0009987 | 6 |
| Biological process unknown | GO:0000004 | 3 |
| Motor activity | GO:0003774 | 8 |
| Transcription regulator activity | GO:0030528 | 8 |
| Antioxidant activity | GO:0016209 | 2 |
| Signal transducer activity | GO:0004871 | 16 |
| Enzyme regulator activity | GO:0030234 | 15 |
| Catalytic activity | GO:0003824 | 571 |
| Binding | GO:0005488 | 543 |
| Nucleic acid binding | GO:0003676 | 128 |
| Molecular function unknown | GO:0005554 | 31 |
| Structural molecule activity | GO:0005198 | 82 |
| Transporter activity | GO:0005215 | 65 |
| Extracellular region | GO:0005576 | 16 |
| Intracellular | GO:0005622 | 444 |
| Unlocalized protein complex | GO:0005941 | 3 |
| Cellular component unknown | GO:0008372 | 2 |
| Cell | GO:0005623 | 200 |