Noé Fernández-Pozo, Javier Canales, Darío Guerrero-Fernández, David P Villalobos, Sara M Díaz-Moreno, Rocío Bautista, Arantxa Flores-Monterroso, M Ángeles Guevara, Pedro Perdiguero, Carmen Collada, M Teresa Cervera, Alvaro Soto, Ricardo Ordás, Francisco R Cantón, Concepción Avila, Francisco M Cánovas, M Gonzalo Claros.
Abstract
BACKGROUND: Pinus pinaster is an economically and ecologically important species that is becoming a woody gymnosperm model. Its enormous genome size makes whole-genome sequencing approaches hard to apply. Therefore, the expressed portion of the genome has to be characterised, and the results and annotations have to be stored in dedicated databases.

DESCRIPTION: EuroPineDB is the largest sequence collection available for a single pine species, Pinus pinaster (maritime pine), since it comprises 951 641 raw sequence reads obtained from non-normalised cDNA libraries and high-throughput sequencing from adult (xylem, phloem, roots, stem, needles, cones, strobili) and embryonic (germinated embryos, buds, callus) maritime pine tissues. Using open-source tools, sequences were optimally pre-processed, assembled, and extensively annotated (GO, EC and KEGG terms, descriptions, SNPs, SSRs, ORFs and InterPro codes). As a result, a 10.5× P. pinaster genome was covered and assembled in 55 322 UniGenes. A total of 32 919 (59.5%) of P. pinaster UniGenes were annotated with at least one description, revealing at least 18 466 different genes. The complete database, which is designed to be scalable, maintainable, and expandable, is freely available at: http://www.scbi.uma.es/pindb/. Its contents can be retrieved by gene libraries, pine species, annotations, UniGenes and microarrays (i.e., the sequences are distributed in two-colour microarrays; this is the only conifer database that provides this information), and the database will be periodically updated. Small assemblies can be viewed using a dedicated visualisation tool that connects them with SNPs. Any sequence or annotation set shown on-screen can be downloaded. Retrieval mechanisms for sequences and gene annotations are provided.
Year: 2011 PMID: 21762488 PMCID: PMC3152544 DOI: 10.1186/1471-2164-12-366
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Table 1. Gene libraries providing sequences for EuroPineDB
| Gene library | Tissue | Species | Experimental conditions |
|---|---|---|---|
| Pp-454 | Roots, stem, embryos, callus, cones, male and female strobili, buds, xylem, phloem | | ESTs from several different tissues |
| LG0BCA | Buds | | ESTs, adult buds |
| GEMINI^a | Xylem | | ESTs from normal, compression, opposite, early and late wood |
| SSH Xylem | Xylem | | SSH, compression vs. opposite, and juvenile vs. mature |
| UPM | Roots, stem, needles | | SSH, drought stress |
| ARG | Roots | | SSH, ammonium excess vs. ammonium deficiency |
| SSH Lac-Pine | Roots | | SSH, inoculated with |
| SSH Mic | Roots | | SSH, mycorrhizal vs. not mycorrhizal |
| CK16^b | Cotyledons | | SSH, adventitious shoot induction |
| SSH Embryos | Embryos | | SSH, lack of N vs. normal N |
| Pin | Cotyledons | | ESTs from photosynthetic tissues |
| EMBL v. 102 | - | | Miscellaneous |

^a The GEMINI gene library was described in [8].
^b The CK16 gene library was described in [4].
Figure 1. A, Size distribution of pre-processed 454 and Sanger reads used for EuroPineDB. As expected, Sanger reads were longer than 454 reads. B, Contig size distribution within EuroPineDB.
Table 2. Statistics for the gene libraries shown in Table 1
| Gene library | Raw reads | Curated reads | Mean length | Singletons | Contigs | UniGenes | Discarded nt (%) by QV | Discarded nt (%) by vector | Discarded nt (%) by artefacts |
|---|---|---|---|---|---|---|---|---|---|
| Pp-454 | 913 786 | 844 737 | 227 | 471 | 54 960 | 55 431 (59.5%) | 52.5% | NA | 3.03% |
| LG0BCA | 8766 | 8766 | 608 | 3834 | 1363 | 5197 (68.2%) | NA | NA | 0.24% |
| GEMINI | 13 057 | 7916 | 458 | 3066 | 1124 | 4190 (49.9%) | 9.4% | 10.4% | 2.9% |
| SSH Xylem | 992 | 790 | 474 | 385 | 142 | 527 (49.5%) | 5.35% | 31.8% | 2.5% |
| UPM | 2806 | 1115 | 465 | 258 | 157 | 415 (31.8%) | 3.2% | 15.9% | 21.04% |
| ARG | 218 | 148 | 394 | 127 | 7 | 134 (47.8%) | 22.5% | 5.1% | 5.3% |
| SSH Lac-Pine | 351 | 231 | 350 | 210 | 8 | 218 (34.4%) | 18.5% | 4.7% | 2.64% |
| SSH Mic | 294 | 194 | 314 | 149 | 13 | 162 (38.3%) | 15.3% | 13.4% | 5.75% |
| CK16 | 358 | 282 | 575 | 221 | 24 | 245 (65.3%) | NA | 0.05% | 6.6% |
| SSH Embryos | 96 | 57 | 437 | 34 | 6 | 40 (57.5%) | 1.7% | 20.6% | 8.8% |
| Pin | 863 | 617 | 532 | 335 | 86 | 421 (68.9%) | 10.2% | 9% | 2.9% |
| EMBL v. 102 | 13 206 | 12 673 | 502 | 3704 | 1963 | 5667 (NA) | NA | 0.1% | 0.58% |
| TOTAL | 954 793 | 880 295 | | | | | | | |
| | 951 641 | 877 523 | 597 | 684 | 54 648 | 55 332 (59.5%) | | | |
| | 2770 | 2466 | 730 | 476 | 203 | 679 (65.9%) | | | |
| | 382 | 306 | 574 | 239 | 27 | 266 (63.2%) | | | |
QV, quality value. NA, not applicable.
Mean lengths are calculated from gene library reads, except in the last three rows (one per species), where they are calculated from contigs.
Artefacts include poly-A, poly-T, adaptors, contaminant sequences, and chimeric inserts.
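The per-library counts in the statistics table can be cross-checked with a few lines of code. The sketch below is illustrative only and not part of EuroPineDB: it copies the raw, curated, singleton, contig and UniGene counts from the table, and the names `ROWS`, `unigenes_consistent` and `retained_fraction` are hypothetical. It assumes that the UniGene count of a library is the sum of its singletons and contigs, which is the relationship the table rows suggest.

```python
# Counts copied from the statistics table:
# (gene library, raw reads, curated reads, singletons, contigs, UniGenes)
ROWS = [
    ("Pp-454", 913786, 844737, 471, 54960, 55431),
    ("LG0BCA", 8766, 8766, 3834, 1363, 5197),
    ("GEMINI", 13057, 7916, 3066, 1124, 4190),
    ("SSH Xylem", 992, 790, 385, 142, 527),
    ("UPM", 2806, 1115, 258, 157, 415),
    ("ARG", 218, 148, 127, 7, 134),
    ("SSH Lac-Pine", 351, 231, 210, 8, 218),
    ("SSH Mic", 294, 194, 149, 13, 162),
    ("CK16", 358, 282, 221, 24, 245),
    ("SSH Embryos", 96, 57, 34, 6, 40),
    ("Pin", 863, 617, 335, 86, 421),
    ("EMBL v. 102", 13206, 12673, 3704, 1963, 5667),
]


def unigenes_consistent(singletons: int, contigs: int, unigenes: int) -> bool:
    """Assumed relationship: UniGenes = singletons + contigs."""
    return singletons + contigs == unigenes


def retained_fraction(raw: int, curated: int) -> float:
    """Fraction of raw reads that survive pre-processing."""
    return curated / raw


# Sum of the raw-read column, to compare with the TOTAL row.
total_raw = sum(raw for _, raw, _, _, _, _ in ROWS)

if __name__ == "__main__":
    for name, raw, curated, singletons, contigs, unigenes in ROWS:
        ok = unigenes_consistent(singletons, contigs, unigenes)
        kept = retained_fraction(raw, curated)
        print(f"{name:13s} UniGenes consistent: {ok}  reads retained: {kept:.1%}")
    print("Sum of raw reads:", total_raw)
```

Running the script prints one line per gene library, which makes it easy to spot libraries where pre-processing discarded a large share of the reads.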
Figure 2. An example of a microarray page in the EuroPineDB web interface. The upper part contains general information about the microarray as well as some statistical representation of the GO term distribution. The lower part is a representation of all sequences printed in a selected block. The colour codes are defined at the bottom of the Web page (not shown) and in the text.
Figure 3. Navigating through EuroPineDB. Arrowheads indicate the direction of navigation. Green boxes correspond to available views from all pages (thus, no incoming arrowhead is specified). Violet text indicates the option of downloading sequences in FASTA format.
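Because sequences can be downloaded in FASTA format, a size distribution like the one summarised in Figure 1 can be recomputed locally. The following is a minimal sketch under stated assumptions, not the authors' pipeline: it uses only the Python standard library, the function names `parse_fasta` and `length_histogram` and the 100-nt bin width are hypothetical choices, and the example records are made-up toy sequences rather than EuroPineDB data.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from the lines of a FASTA file."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def length_histogram(records: Iterable[Tuple[str, str]],
                     bin_size: int = 100) -> Dict[int, int]:
    """Count sequences per length bin; keys are the lower bound of each bin."""
    counts = Counter()
    for _, seq in records:
        counts[(len(seq) // bin_size) * bin_size] += 1
    return dict(sorted(counts.items()))


# Toy input (hypothetical sequences, for illustration only).
EXAMPLE_LINES = [
    ">read_1", "ACGT" * 30,
    ">read_2", "AC", "GT",
    ">read_3", "A" * 250,
]

if __name__ == "__main__":
    histogram = length_histogram(parse_fasta(EXAMPLE_LINES))
    for lower, count in histogram.items():
        print(f"{lower}-{lower + 99} nt: {count}")
```

For a real download, the list of lines would be replaced by an open file handle, for example `with open(path) as handle: length_histogram(parse_fasta(handle))`, where `path` points to the downloaded FASTA file.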