Elodie Fleury, Arnaud Huvet, Christophe Lelong, Julien de Lorgeril, Viviane Boulo, Yannick Gueguen, Evelyne Bachère, Arnaud Tanguy, Dario Moraga, Caroline Fabioux, Penelope Lindeque, Jenny Shaw, Richard Reinhardt, Patrick Prunet, Grace Davey, Sylvie Lapègue, Christopher Sauvage, Charlotte Corporeau, Jeanne Moal, Frederick Gavory, Patrick Wincker, François Moreews, Christophe Klopp, Michel Mathieu, Pierre Boudry, Pascal Favrel.
Abstract
BACKGROUND: Although bivalves are among the most-studied marine organisms because of their ecological role and economic importance, very little information is available on the genome sequences of oyster species. This report documents three large-scale cDNA sequencing projects for the Pacific oyster Crassostrea gigas, initiated to provide a large number of expressed sequence tags (ESTs) that were subsequently compiled in a publicly accessible database. This resource allowed the identification of a large number of transcripts and provides valuable information for ongoing investigations of tissue-specific and stimulus-dependent gene expression patterns. These data are crucial for constructing comprehensive DNA microarrays, for identifying single nucleotide polymorphisms and microsatellites in coding regions, and for identifying genes when the entire genome sequence of C. gigas becomes available.
DESCRIPTION: In the present paper, we report the production of 40,845 high-quality ESTs that identify 29,745 unique transcribed sequences consisting of 7,940 contigs and 21,805 singletons. All of these new sequences, together with existing public sequence data, have been compiled into a publicly available website (http://public-contigbrowser.sigenae.org:9090/Crassostrea_gigas/index.html). Approximately 43% of the unique ESTs had significant matches against the SwissProt database and 27% were annotated using Gene Ontology terms. In addition, we identified a total of 208 in silico microsatellites from the ESTs, with 173 having sufficient flanking sequence for primer design. We also identified a total of 7,530 putative in silico single-nucleotide polymorphisms using existing and newly generated EST resources for the Pacific oyster.
Year: 2009 PMID: 19640306 PMCID: PMC2907693 DOI: 10.1186/1471-2164-10-341
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Summary statistics of the Pacific oyster cDNA libraries.
| Description | Tissue | Vector used | No. of sequences retrieved | No. of valid sequences | Average insert length (bp) |
|---|---|---|---|---|---|
| cDNA gonad * | gonad | PAL32CV | 12162 | 8809 | 511 |
| cDNA embryos and larvae and central nervous system (GENOSCOPE) | embryos, larvae, CNS | pAL17.3 | 13191 | 12730 | 618 |
| cDNA hemocytes (GENOSCOPE) | hemocyte | pBluescript II SK+ | 14472 | 13773 | 415 |
| cDNA digestive gland subtracted library (AQUAFIRST) | digestive gland | PCR2.1 | 1536 | 1362 | 428 |
| cDNA mantle-edge subtracted library (AQUAFIRST) | mantle-edge | PCR2.1 | 1536 | 1343 | 405 |
| cDNA hemocyte subtracted library (AQUAFIRST) | hemocyte | PCR2.1 | 1152 | 125 | 291 |
| cDNA gonad subtracted library (AQUAFIRST) | gonad | PCR2.1 | 768 | 559 | 382 |
| cDNA muscle subtracted library (AQUAFIRST) | muscle | PCR2.1 | 1536 | 1117 | 312 |
| cDNA gills subtracted library (AQUAFIRST) | gills | PCR2.1 | 1536 | 1027 | 359 |
| Total | | | 47889 | 40845 | Mean: 413 |
* Library partially published in Tanguy et al. [25] with 1894 sequenced clones.
Figure 1. Processing chain of the GigasDatabase. The data resources of the GigasDatabase include cleaning processes, batch statistics, assembly of sequences into contigs, annotation of the contigs, visualization of the contigs, and summary statistics for each library.
Summary statistics of the ESTs generated from the Pacific oyster Crassostrea gigas available in the GigasDatabase.
| Feature | Value |
|---|---|
| Number of high quality ESTs (new ESTs + public) | 56327 (40845 + 15482) |
| Average length of high quality ESTs (bp) | 798 |
| Number of contigs | 7940 |
| Number of ESTs in contigs | 34522 |
| Number of singletons | 21805 |
| Number of unique sequences | 29745 |
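The figures in this table are linked by simple arithmetic: each EST is either assembled into a contig or remains a singleton, and the unique sequences are the contigs plus the singletons. A minimal sketch of that consistency check, using only the counts from the table (the variable names are illustrative, not taken from the GigasDatabase):

```python
# Counts copied from the summary table above.
new_ests, public_ests = 40845, 15482
contigs, singletons = 7940, 21805
ests_in_contigs = 34522

total_ests = new_ests + public_ests
unique_sequences = contigs + singletons

# Every high-quality EST is either in a contig or left as a singleton.
assert total_ests == ests_in_contigs + singletons

print(total_ests)        # 56327
print(unique_sequences)  # 29745
# Mean number of ESTs per contig, derived from the table values.
print(round(ests_in_contigs / contigs, 2))  # 4.35
```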
BlastX searches and contig analysis for the complete collection of oyster contigs, based on GigasDatabase EST clustering.
| Feature | Value |
|---|---|
| Number of unique sequences | 29745 |
| Number of unique sequences with BlastX hits | 12790 |
| Percentage of unique sequences with BlastX hits | 43% |
| Number of contigs containing: | |
| 2 ESTs | 4208 |
| 3 ESTs | 1588 |
| 4 ESTs | 794 |
| 5 ESTs | 397 |
| > 6 ESTs | 873 |
Figure 2. Graphical view of a contig in "ContigView", available in GigasDatabase. The ContigView screen gives a graphical overview of the contig structure. Each sequence is represented as a line, and colors indicate the type of sequence. The first level corresponds to the sequence fragment overview. The second level is the detailed view of the individual sequences belonging to the contig. The red frame marks the section that is displayed on the third level. The third level is the base-pair view of the DNA contigs.
Figure 3. Filter page available with BioMart in GigasDatabase. Filter criteria are grouped in blocks. The block name is in the upper left corner, and each block contains a list of elements that can be used for selection, corresponding to one table in the database structure. The filter criteria can be based upon contigs, EST and mRNA, clones, protein hits, nucleotide hits, genomic hits, expression, keywords, gene ontology, repeats, and SNP.
Figure 4. Output page available with BioMart in GigasDatabase. Once the filters have been selected, it is possible to select elements for output. For example, best SwissProt (SP) description, best SP hit score, or best SP hit E-value can be exported in several output formats (HTML, Text, MS Excel).
Gene ontology annotation using the KEGG Automatic Annotation Server for the unique Crassostrea gigas sequences from the GigasDatabase.
| Categories | Number ESTs | % |
|---|---|---|
| Carbohydrate Metabolism | 590 | 7.6 |
| Energy Metabolism | 302 | 3.9 |
| Lipid Metabolism | 484 | 6.3 |
| Nucleotide Metabolism | 153 | 2.0 |
| Amino Acid Metabolism | 619 | 8.0 |
| Metabolism of Other Amino Acids | 171 | 2.2 |
| Glycan Biosynthesis and Metabolism | 273 | 3.5 |
| Biosynthesis of Polyketides and Nonribosomal Peptides | 4 | 0.1 |
| Metabolism of Cofactors and Vitamins | 173 | 2.2 |
| Biosynthesis of Secondary Metabolites | 117 | 1.5 |
| Xenobiotics Biodegradation and Metabolism | 355 | 4.6 |
| Transcription | 76 | 1.0 |
| Translation | 335 | 4.3 |
| Folding, Sorting and Degradation | 370 | 4.8 |
| Replication and Repair | 173 | 2.2 |
| Membrane Transport | 82 | 1.1 |
| Signal Transduction | 830 | 10.7 |
| Signaling Molecules and Interaction | 184 | 2.4 |
| Cell Motility | 188 | 2.4 |
| Cell Growth and Death | 324 | 4.2 |
| Cell Communication | 544 | 7.0 |
| Endocrine System | 611 | 7.9 |
| Immune System | 381 | 4.9 |
| Nervous System | 160 | 2.1 |
| Sensory System | 90 | 1.2 |
| Development | 137 | 1.8 |
| Behavior | 3 | 0.0 |
"Number ESTs" indicates the number of ESTs assigned to the corresponding Gene Ontology category, and "%" the corresponding percentage. Percentages are calculated relative to the total number of unique sequences (7733) having an assigned gene ontology term.
Selection of some candidate Crassostrea gigas ESTs similar to genes potentially involved in some physiological regulatory networks.
| Accession No | Best hit description | |
|---|---|---|
| | activin/myostatin like | |
| | inhibin bA like | |
| | smad | |
| | follistatin 1 | |
| | thrombospondin | |
| | cysteine rich bmp regulator 2 | |
| | tolloid-like protein | |
| | c1q-like adipose specific protein | |
| | leptin receptor overlapping transcript-like 1 | |
| | ovary-specific c1q-like factor | |
| | phosphatidylinositol 3-kinase p110 beta | |
| | acetyl-coenzyme a carboxylase alpha | |
| | adiponectin receptor 1 | |
| | camp-dependent protein kinase | |
| | carnitine o-acyltransferase | |
| | neuropeptide y | |
| | sterol regulatory element binding factor 1 | |
| | caspase | |
| | cactus | |
| | myeloid differentiation primary response gene | |
| | toll | |
| | kappa-b | |
| | big defensin | |
| | gigasin 2 protein | |
| | lbp bpi | |
| | lipopolysaccharide binding protein | |
| | lps-induced tn factor | |
In silico microsatellite (Msat) mining of the GigasDatabase.
| Repeat motif | Number of Msat | Percentage |
|---|---|---|
| Total | 208 | 100 |
| Dinucleotide | 158 | 76.0 |
| Trinucleotide | 22 | 10.6 |
| Tetranucleotide | 18 | 8.7 |
| Pentanucleotide | 10 | 4.8 |
Percentage represents the percentage of each kind of repeated motif.
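In silico microsatellite mining of this kind can be illustrated with a simple regular-expression scan for perfect tandem repeats of di- to pentanucleotide motifs. The sketch below is not the tool used for the GigasDatabase: the function name, the minimum repeat counts in `MIN_REPEATS`, and the `flank` parameter (a stand-in for "sufficient flanking sequence for primer design") are all hypothetical choices made for illustration.

```python
import re

# Hypothetical minimum number of tandem copies per motif length
# (assumed thresholds, not taken from the source).
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4}

def find_microsatellites(seq, min_repeats=MIN_REPEATS, flank=0):
    """Return (start, motif, copies) for perfect tandem repeats in seq.

    flank is the minimum number of bases required on both sides of the
    repeat, a rough proxy for having room to design primers.
    """
    seq = seq.upper()
    hits = []
    for size, min_rep in sorted(min_repeats.items()):
        # Capture one motif of the given size, then require at least
        # (min_rep - 1) further identical copies right after it.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit,
            # e.g. "AA" (homopolymer) or "AGAG" (really "AG").
            if any(motif == motif[:k] * (size // k)
                   for k in range(1, size) if size % k == 0):
                continue
            start, end = match.span()
            if start >= flank and len(seq) - end >= flank:
                hits.append((start, motif, (end - start) // size))
    return hits

example = "GATTACA" + "AG" * 8 + "CCTGACGT"
print(find_microsatellites(example))  # [(7, 'AG', 8)]
```

Raising `flank` discards repeats that sit too close to either end of a sequence, which mirrors the distinction above between all detected microsatellites and those usable for primer design.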
Putative Single Nucleotide Polymorphism (SNP) identification from the GigasDatabase.
| SNP Type | Number | Percentage |
|---|---|---|
| Non Synonymous | 1344 | 17.85 |
| Synonymous | 5097 | 67.69 |
| Indel | 1089 | 14.46 |
| Total | 7530 | 100 |
| A/G | 1860 | 36.49 |
| T/C | 1494 | 29.31 |
| A/C | 551 | 10.81 |
| A/T | 284 | 5.57 |
| T/G | 381 | 7.47 |
| G/C | 518 | 10.16 |
| Trinucleotides | 9 | 0.19 |
| Total | 7530 | 100 |
Indels are insertions and deletions detected in the sequences, ranging from 1 to 13 bases.
Two distinct subcategories of SNPs have been identified:
- Transitions (ts), e.g. A⇔G; T⇔C, which are mutations between two purine bases (A/G) or between two pyrimidine bases (C/T)
- Transversions (tv), e.g. A⇔C; A⇔T; T⇔G; G⇔C, which are mutations between a purine and a pyrimidine base
Trinucleotides are polymorphic sites that show three alleles at the same locus.
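The transition/transversion distinction defined above can be applied directly to the biallelic substitution counts in the SNP table. The following is a minimal sketch; the function and variable names are illustrative, and the ratio it prints is derived here from the table values rather than reported in the source.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES

def classify_substitution(a, b):
    """Return 'transition' or 'transversion' for two different bases."""
    pair = {a.upper(), b.upper()}
    if len(pair) != 2 or not pair <= BASES:
        raise ValueError("expected two different bases from A, C, G, T")
    # Both purines or both pyrimidines: transition. Otherwise: transversion.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"

# Biallelic substitution counts copied from the SNP table above.
counts = {"A/G": 1860, "T/C": 1494, "A/C": 551,
          "A/T": 284, "T/G": 381, "G/C": 518}

totals = {"transition": 0, "transversion": 0}
for pair, n in counts.items():
    a, b = pair.split("/")
    totals[classify_substitution(a, b)] += n

print(totals)  # {'transition': 3354, 'transversion': 1734}
print(round(totals["transition"] / totals["transversion"], 2))  # 1.93
```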
Putative SNP distribution in contigs with various numbers of ESTs.
| Contig size | Number of contigs | Putative SNP sites |
|---|---|---|
| with > 50 sequences | 24 | 432 |
| with 11–50 sequences | 433 | 721 |
| with 6–10 sequences | 1352 | 1663 |
| with 5 sequences | 647 | 235 |
| with 4 sequences | 1753 | 1044 |
| with 3 sequences | 1145 | 1358 |
| with 2 sequences | 2586 | 2077 |
| Total | 7940 | 7530 |