José B Pereira-Leal, Isabel A Abreu, Cláudia S Alabaça, Maria Helena Almeida, Paulo Almeida, Tânia Almeida, Maria Isabel Amorim, Susana Araújo, Herlânder Azevedo, Aleix Badia, Dora Batista, Andreas Bohn, Tiago Capote, Isabel Carrasquinho, Inês Chaves, Ana Cristina Coelho, Maria Manuela Ribeiro Costa, Rita Costa, Alfredo Cravador, Conceição Egas, Carlos Faro, Ana M Fortes, Ana S Fortunato, Maria João Gaspar, Sónia Gonçalves, José Graça, Marília Horta, Vera Inácio, José M Leitão, Teresa Lino-Neto, Liliana Marum, José Matos, Diogo Mendonça, Andreia Miguel, Célia M Miguel, Leonor Morais-Cecílio, Isabel Neves, Filomena Nóbrega, Maria Margarida Oliveira, Rute Oliveira, Maria Salomé Pais, Jorge A Paiva, Octávio S Paulo, Miguel Pinheiro, João A P Raimundo, José C Ramalho, Ana I Ribeiro, Teresa Ribeiro, Margarida Rocheta, Ana Isabel Rodrigues, José C Rodrigues, Nelson J M Saibo, Tatiana E Santo, Ana Margarida Santos, Paula Sá-Pereira, Mónica Sebastiana, Fernanda Simões, Rómulo S Sobral, Rui Tavares, Rita Teixeira, Carolina Varela, Maria Manuela Veloso, Cândido P P Ricardo.
Abstract
BACKGROUND: Cork oak (Quercus suber) is one of the few trees able to produce cork, a material widely used for wine bottle stoppers, flooring and insulation, among many other uses. The molecular mechanisms of cork formation are still poorly understood, largely because the species has a long life cycle and little molecular or genomic information is available for it. Cork oak forests are of great ecological importance and represent a major economic and social resource in Southern Europe and Northern Africa. However, global warming is threatening cork oak forests by imposing thermal, hydric and many types of novel biotic stresses. Despite the economic and social value of Q. suber, few genomic resources useful for biotechnological applications and improved forest management have been developed.
Year: 2014 PMID: 24885229 PMCID: PMC4070548 DOI: 10.1186/1471-2164-15-371
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Tissues and conditions used to produce the RNA libraries
| cDNA library | Library description |
|---|---|
| L-1 | Phloem (adult trees) |
| L-2 | Xylem (adult trees) |
| L-3 | Abiotic stress: control (leaves) |
| L-4 | Abiotic stress: cold (leaves) |
| L-5 | Abiotic stress: heat (leaves) |
| L-6 | Seed germination |
| L-7 | Female flowers |
| L-8 | Male flowers |
| L-9 | Embryos from fruits at 4 developmental stages |
| L-10 | Whole fruits at 7 developmental stages |
| L-11 | Biotic stress: roots (germinated acorns) infected by |
| L-12 | Biotic stress: roots (thin white roots from 18-month-old plants) infected by |
| L-13 | Mycorrhizal symbiosis (roots) |
| L-14 | Annual stems from cork-producing Quercus suber × cerris hybrid trees |
| L-15 | Annual stems from non-cork-producing Quercus suber × cerris hybrid trees |
| L-16 | Bud sprouting (bud phases 1 and 2) |
| L-17 | Bud sprouting (bud phases 3 and 4) |
| L-18 | Abiotic stress: drought, salt and oxidative stresses (roots and shoots) |
| L-19 | Leaves (from 8 locations for polymorphism detection) |
| L-20 | High quality cork |
| L-21 | Low quality cork |
All libraries were normalized.
Sequencing statistics
| Library | Raw reads (#) | Raw reads (<l>) | Processed reads (#) | Processed reads (<l>) | Individual assemblies (# total) | Individual assemblies (contigs) | Individual assemblies (singlets) |
|---|---|---|---|---|---|---|---|
| L-1 | 392152 | 200.2 | 216861 | 232.3 | 30220 | 26693 | 3527 |
| L-2 | 315360 | 203.0 | 208162 | 237.6 | 23962 | 21499 | 2463 |
| L-3 | 182571 | 193.6 | 118708 | 209.1 | 16399 | 15272 | 1127 |
| L-4 | 215084 | 195.7 | 147735 | 210.8 | 19573 | 18060 | 1513 |
| L-5 | 153898 | 185.2 | 97870 | 203.0 | 14372 | 13255 | 1117 |
| L-6 | 371060 | 286.7 | 279793 | 304.5 | 32700 | 27735 | 4965 |
| L-7 | 346435 | 235.1 | 216309 | 253.7 | 30694 | 28179 | 2515 |
| L-8 | 393501 | 248.9 | 285776 | 264.2 | 33550 | 29758 | 3792 |
| L-9 | 524852 | 295.0 | 433762 | 307.9 | 48799 | 37357 | 11442 |
| L-10 | 570370 | 308.3 | 449849 | 321.8 | 50522 | 39471 | 11051 |
| L-11 | 220568 | 273.4 | 149645 | 294.3 | 18215 | 17186 | 1029 |
| L-12 | 104517 | 281.2 | 73958 | 298.3 | 8442 | 8188 | 254 |
| L-13 | 743576 | 248.8 | 411035 | 263.7 | 42318 | 38830 | 3488 |
| L-14 | 413925 | 271.2 | 323372 | 278.6 | 38794 | 34102 | 4692 |
| L-15 | 401170 | 261.0 | 321153 | 269.2 | 38359 | 33447 | 4912 |
| L-16 | 320673 | 259.2 | 190983 | 277.7 | 21694 | 19607 | 2087 |
| L-17 | 350843 | 262.0 | 203567 | 282.3 | 23857 | 21989 | 1868 |
| L-18 | 774553 | 254.5 | 506642 | 268.6 | 46983 | 41086 | 5897 |
| L-19 | 650604 | 272.3 | 333283 | 288.9 | 37926 | 29543 | 8383 |
Processed Reads represents the number of nuclear sequences after the pre-processing (Figure 1). # stands for number,
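The per-library counts in the table above lend themselves to a quick consistency check. The following is a minimal sketch, not part of the original pipeline: it copies the raw and processed read counts from the table, computes the fraction of reads retained after pre-processing, and sums the raw reads so the result can be compared with the total read count listed in the assembly-metrics comparison table. The names `READS` and `retention` are illustrative assumptions.

```python
# Raw and processed read counts per library, copied from the
# sequencing statistics table: library -> (raw reads, processed reads).
READS = {
    "L-1": (392152, 216861),
    "L-2": (315360, 208162),
    "L-3": (182571, 118708),
    "L-4": (215084, 147735),
    "L-5": (153898, 97870),
    "L-6": (371060, 279793),
    "L-7": (346435, 216309),
    "L-8": (393501, 285776),
    "L-9": (524852, 433762),
    "L-10": (570370, 449849),
    "L-11": (220568, 149645),
    "L-12": (104517, 73958),
    "L-13": (743576, 411035),
    "L-14": (413925, 323372),
    "L-15": (401170, 321153),
    "L-16": (320673, 190983),
    "L-17": (350843, 203567),
    "L-18": (774553, 506642),
    "L-19": (650604, 333283),
}


def retention(raw, processed):
    """Fraction of raw reads that remain after pre-processing."""
    return processed / raw


# Totals across all libraries listed in the table.
total_raw = sum(raw for raw, _ in READS.values())
total_processed = sum(processed for _, processed in READS.values())

for library, (raw, processed) in READS.items():
    print(f"{library}: {retention(raw, processed):.1%} of raw reads retained")

print("Total raw reads:", total_raw)
print("Total processed reads:", total_processed)
```

The raw-read total computed this way is 7,445,712, which matches the total reads reported for the 454 data set in the comparison of assembly metrics.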
Figure 1. Schematic representation of the bioinformatics pipeline, indicating the software used at each step.
Figure 2. Assembly and predicted peptide statistics. (A) Unigene length distribution after multi-library assembly. There are 12 additional unigenes longer than 4600 bases, not shown on the plot, with the longest one being 9189 bases. (B) Unigene coverage (reads per unigene). (C) Serial clustering of predicted proteins based on the cork oak unigenes, and of the predicted proteins from the genomes of two model plant species.
Assembly metrics of this project compared with those of two large oak transcriptome sequencing projects
| Metric | This project | Oak project 2 | Oak project 3 |
|---|---|---|---|
| Sequencing platform | 454 | 454 + Sanger | 454 + Illumina |
| Libraries | 21 | 14 (454) + 20 (Sanger) | 16 (454) + 8 (Illumina) |
| Total reads | 7,445,712 | 1,578,192 (454) + 145,827 (Sanger) | 821,534 (454) + 255,237,702 (Illumina) |
| Contigs & single reads | 159,298 | 222,671 | 65,712 |
| mean length | 148.5 | 235.8 | 1003 |
Figure 3. Gene Ontology classification of nuclear unigenes. Classification was performed using CateGOrizer, counting single occurrences and the Generic GO Slim [25]. Percentages are shown down to 3% only, and the functional classes are ordered by frequency.
Unigene naming criteria are as follows
| Method | Alignment length | Identity | Assignment |
|---|---|---|---|
| Bi-directional best hit (BDBH) | | | Ortholog |
| BLASTp | > 85% | > 35% | High confidence |
| BLASTp | > 70% | > 25% | Homolog |
| BLASTp | < 70% | > 30% | Conserved domain |
| BLASTp | < 70% | < 30% | Low confidence |
If a gene is the bi-directional best hit (BDBH) of X in A. thaliana (or P. trichocarpa), we term it an ortholog of X; if it is similar to X in A. thaliana (or P. trichocarpa) by BLASTp and aligns over more than 85% of its length with more than 35% identity, we term it a High confidence X in Q. suber, and so on for the remaining classes.
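The naming criteria can be read as a small decision cascade. The sketch below is one possible reading of the table, not the authors' implementation: the function name `assign_name_class` and its parameters are hypothetical, rules are tested from most to least stringent, and any combination the table does not cover (for example an alignment length of exactly 70%, or more than 70% with identity below 25%) is returned as "Unassigned", which is an assumption made here.

```python
def assign_name_class(is_bdbh, aln_pct=None, identity_pct=None):
    """Return the naming class for one unigene-to-reference comparison.

    is_bdbh      -- True when the pair is a bi-directional best hit.
    aln_pct      -- percentage of the sequence length covered by the
                    BLASTp alignment.
    identity_pct -- percentage identity of that alignment.
    """
    # A bi-directional best hit takes precedence over BLASTp similarity.
    if is_bdbh:
        return "Ortholog"
    # Without alignment statistics no BLASTp-based class can be assigned.
    if aln_pct is None or identity_pct is None:
        return "Unassigned"
    # Thresholds are taken from the table, checked from strictest down.
    if aln_pct > 85 and identity_pct > 35:
        return "High confidence"
    if aln_pct > 70 and identity_pct > 25:
        return "Homolog"
    if aln_pct < 70 and identity_pct > 30:
        return "Conserved domain"
    if aln_pct < 70 and identity_pct < 30:
        return "Low confidence"
    # Assumption: combinations not listed in the table stay unassigned.
    return "Unassigned"


print(assign_name_class(True))           # bi-directional best hit
print(assign_name_class(False, 90, 40))  # long, well-conserved alignment
print(assign_name_class(False, 50, 40))  # short but conserved alignment
```

Because the rules are checked in order, a hit that satisfies both the High confidence and the Homolog thresholds is reported under the stricter class.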
Figure 4. Distribution of annotation classes in the cork oak translated unigenes.
Figure 5. Unique InterPro domains assigned to the unigenes and two other transcriptomes for and , as well as for species with completely sequenced genomes and .
Figure 6. Number of unique BLAST hits of the cork oak predicted peptides in other plant genomes.
Figure 7. Overlap between the cork oak unigenes (brown) and the unigenes of the red oak, English oak and Chinese chestnut. Numbers represent homologues defined at an e-value cutoff of < 10⁻⁵, and in parentheses at < 10⁻².
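As an illustration of how the two cutoffs in Figure 7 change the homologue counts, the sketch below counts distinct query unigenes with at least one hit under each e-value threshold. The hit list is entirely hypothetical, as are the function name `count_homologs` and the identifiers inside the list; only the two cutoffs come from the caption.

```python
def count_homologs(hits, cutoff):
    """Count distinct query unigenes with at least one hit below the cutoff."""
    return len({query for query, _subject, evalue in hits if evalue < cutoff})


# Hypothetical BLAST hits: (cork oak unigene, hit in another species, e-value).
hits = [
    ("unigene_1", "red_oak_A", 1e-30),
    ("unigene_1", "red_oak_B", 1e-8),
    ("unigene_2", "red_oak_C", 1e-3),
    ("unigene_3", "red_oak_D", 0.5),
]

strict = count_homologs(hits, 1e-5)   # cutoff used for the main numbers
relaxed = count_homologs(hits, 1e-2)  # cutoff used for the numbers in parentheses
print(f"{strict} ({relaxed})")
```

Counting distinct queries rather than hits means a unigene with several matches in the same species is counted once, which is an assumption about how the figure's numbers were derived.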
Figure 8. CorkOakdb.org. Screenshot of the top part of the gene view.