Nicolas Blavet, Delphine Charif, Christine Oger-Desfeux, Gabriel A B Marais, Alex Widmer.
Abstract
BACKGROUND: The genus Silene is widely used as a model system for addressing ecological and evolutionary questions in plants, but advances in using the genus as a model system are impeded by the lack of available resources for studying its genome. Massively parallel sequencing of cDNA has recently developed into an efficient method for characterizing the transcriptomes of non-model organisms, generating massive amounts of data that enable the study of multiple species in a comparative framework. The sequences generated provide an excellent resource for identifying expressed genes, characterizing functional variation and developing molecular markers, thereby laying the foundations for future studies on gene sequence and gene expression divergence. Here, we report the results of a comparative transcriptome sequencing study of eight individuals representing four Silene and one Dianthus species as outgroup. All sequences and annotations have been deposited in a newly developed and publicly available database called SiESTa, the Silene EST annotation database.
Year: 2011 PMID: 21791039 PMCID: PMC3157477 DOI: 10.1186/1471-2164-12-376
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
SiESTa sequence content
| Library | SlM | SlF | SlFf | SdM | SdF | SmM | SvH | Ds | supSL | supSD |
|---|---|---|---|---|---|---|---|---|---|---|
| # ESTs | 119 | 136 | 110 | 113 | 115 | 127 | 123 | 198 | 347 | 228 |
| Nucleotides (Mbp) | 28 | 32 | 25 | 27 | 27 | 29 | 29 | 46 | 85 | 54 |
| # Unigenes | 61 | 40 | 49 | 71 | 69 | 51 | 32 | 30 | 129 | 129 |
| % Contigs | 17 | 34 | 17 | 15 | 17 | 28 | 36 | 43 | 24 | 18 |
| % ESTs in contigs | 57 | 81 | 63 | 47 | 50 | 71 | 83 | 91 | 72 | 54 |
| Avg. EST length (bp) | 235 | 232 | 225 | 235 | 233 | 230 | 234 | 232 | 230 | 233 |
| Avg. contig length (bp) | 403 | 430 | 413 | 395 | 385 | 392 | 401 | 463 | 422 | 396 |
(#) Units in thousands of sequences.
Figure 1. Relative frequencies of the most represented Biological Process GO sub-classes across libraries. Figure 1 shows the ten most frequent biological process GO terms at level 3 in the five species Silene latifolia, S. dioica, S. marizii, S. vulgaris and Dianthus superbus.
Expression differences among all eight libraries for the ten most frequently represented GO Slim terms
| GO Slim term | SlM | SlF | SlFf | SdM | SdF | SmM | SvH | Ds |
|---|---|---|---|---|---|---|---|---|
| Response to stress | 2.3% | 8.1% | 2.3% | 1.5% | 2.8% | 6.6% | 8.2% | 4.2% |
| Cellular component organization | 2.6% | 2.9% | 2.0% | 1.2% | 2.0% | 3.7% | 3.5% | 7.5% |
| Translation | 1.3% | 7.3% | 1.8% | 1.0% | 4.7% | 2.2% | 3.4% | 2.5% |
| Photosynthesis | 2.2% | 6.2% | 2.3% | 0.8% | 2.7% | 4.6% | 3.8% | 0.4% |
| Kinase activity | 2.8% | 2.7% | 2.5% | 0.8% | 1.1% | 3.2% | 3.2% | 4.6% |
| Cell communication | 2.3% | 2.2% | 2.7% | 1.2% | 1.7% | 2.1% | 1.9% | 2.2% |
| Signal transduction | 2.2% | 2.2% | 2.6% | 1.1% | 1.6% | 2.1% | 1.9% | 2.1% |
| Response to abiotic stimulus | 1.0% | 3.5% | 1.2% | 0.5% | 1.2% | 2.5% | 3.1% | 2.2% |
| Transcription | 1.2% | 1.8% | 1.1% | 0.8% | 1.8% | 1.3% | 1.4% | 1.4% |
| Response to biotic stimulus | 0.4% | 2.1% | 0.4% | 0.3% | 0.5% | 1.2% | 1.4% | 0.2% |
(Biological processes and Molecular functions). For each library, the expression percentage is calculated as the number of reads included in contigs matching a term divided by the total number of reads included in all contigs. Terms are sorted by the total expression common to all libraries in descending order.
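As a minimal sketch of the calculation described above (not the authors' code), the Python snippet below sums the reads of every contig annotated with a term and divides by the total number of reads included in all contigs. The function name, the dictionary inputs and the toy counts are hypothetical. It also assumes that a contig annotated with several terms contributes its reads to each of them, so the percentages need not sum to 100.

```python
from collections import defaultdict


def expression_percentages(contig_reads, contig_terms):
    """Return {GO Slim term: percentage of contig reads matching that term}.

    contig_reads: dict mapping contig id -> number of reads in the contig.
    contig_terms: dict mapping contig id -> set of GO Slim terms (hypothetical input).
    """
    total_reads = sum(contig_reads.values())
    if total_reads == 0:
        return {}
    reads_per_term = defaultdict(int)
    for contig_id, n_reads in contig_reads.items():
        # Assumption: a contig with several terms counts towards each term.
        for term in contig_terms.get(contig_id, ()):
            reads_per_term[term] += n_reads
    return {term: 100.0 * n / total_reads for term, n in reads_per_term.items()}


# Toy library with made-up counts; contig c3 has no annotation.
contig_reads = {"c1": 60, "c2": 30, "c3": 10}
contig_terms = {
    "c1": {"Translation"},
    "c2": {"Photosynthesis", "Response to stress"},
}
percentages = expression_percentages(contig_reads, contig_terms)
```

Sorting the resulting dictionary by the summed percentage across libraries would reproduce the row order used in the table.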
Prot4EST ORF prediction results
| Library | Similarity | ESTScan | Longest ORF | Average | Total |
|---|---|---|---|---|---|
| supSL | 39% | 31% | 31% | 208 | 129251 |
| supSD | 29% | 36% | 35% | 193 | 129154 |
| SmM | 36% | 28% | 35% | 205 | 50798 |
| SvH | 60% | 24% | 16% | 245 | 32131 |
| Ds | 59% | 23% | 18% | 259 | 29668 |
ORF prediction based on similarity with BLAST results, ESTScan prediction and longest reading frame.
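The third method, the longest reading frame, can be illustrated with a short Python sketch. This is not Prot4EST's implementation: it simply translates all six frames with the standard genetic code and keeps the longest stretch without a stop codon. The function names are hypothetical, and the similarity-based and ESTScan steps are not reproduced because they rely on external tools.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def longest_orf(seq):
    """Return (peptide, strand, frame) of the longest stop-free stretch.

    Assumption: no start codon is required, since ESTs are often fragments.
    """
    seq = seq.upper()
    strands = {"+": seq, "-": seq.translate(COMPLEMENT)[::-1]}
    best = ("", "+", 0)
    for strand, s in strands.items():
        for frame in range(3):
            for segment in translate(s[frame:]).split("*"):
                if len(segment) > len(best[0]):
                    best = (segment, strand, frame)
    return best


example = longest_orf("ATGAAACCCGGGTTTTAG")
```

For this toy sequence the reverse strand yields the longest stop-free stretch, which shows why both strands have to be scanned when the orientation of an EST is unknown.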
BLASTX hits of contigs and singletons in the eight individual libraries with different proteomes
Contigs (one %hit / % unique column pair per database, Uniprot among them)
| Library | %hit | % unique | %hit | % unique | %hit | % unique | %hit | % unique |
|---|---|---|---|---|---|---|---|---|
| SlM | 31% | 21% | 31% | 20% | 33% | 23% | 41% | 33% |
| SlF | 76% | 49% | 76% | 46% | 78% | 53% | 78% | 66% |
| SlFf | 35% | 24% | 35% | 24% | 36% | 26% | 56% | 47% |
| SdM | 18% | 13% | 19% | 13% | 22% | 14% | 31% | 22% |
| SdF | 31% | 22% | 31% | 21% | 32% | 24% | 49% | 43% |
| SmM | 55% | 35% | 54% | 33% | 56% | 39% | 57% | 48% |
| SvH | 74% | 48% | 74% | 45% | 76% | 52% | 76% | 64% |
| Ds | 73% | 47% | 73% | 43% | 74% | 51% | 75% | 63% |

Singletons (same column layout as for contigs)
| Library | %hit | % unique | %hit | % unique | %hit | % unique | %hit | % unique |
|---|---|---|---|---|---|---|---|---|
| SlM | 14% | 9% | 16% | 9% | 21% | 10% | 32% | 19% |
| SlF | 42% | 27% | 44% | 26% | 46% | 31% | 47% | 38% |
| SlFf | 17% | 11% | 18% | 11% | 22% | 12% | 45% | 30% |
| SdM | 10% | 6% | 13% | 6% | 21% | 7% | 32% | 14% |
| SdF | 16% | 10% | 18% | 10% | 22% | 11% | 39% | 25% |
| SmM | 26% | 17% | 28% | 17% | 30% | 20% | 32% | 25% |
| SvH | 44% | 30% | 47% | 29% | 50% | 33% | 55% | 41% |
| Ds | 46% | 33% | 46% | 32% | 48% | 36% | 49% | 42% |
Table 4 shows the number of hits for both contigs and singletons. Non-redundant accessions are recorded in the '% unique' column. A cut-off E-value of 1E-4 was used for each database.
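The two percentages can be derived from tabular BLAST output roughly as follows. This is a hedged sketch rather than the authors' pipeline. It assumes the standard 12-column tabular format (query id in column 1, subject accession in column 2, E-value in column 11). It also assumes that both percentages use the number of query sequences as denominator and that "% unique" counts the distinct accessions among the best hits. The function name and the toy rows are hypothetical.

```python
def summarize_hits(tabular_lines, n_queries, max_evalue=1e-4):
    """Return (%hit, %unique) for one library against one database."""
    best = {}  # query id -> (E-value, subject accession) of its best passing hit
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue > max_evalue:
            continue  # hit does not pass the E-value cut-off
        if query not in best or evalue < best[query][0]:
            best[query] = (evalue, subject)
    n_hit = len(best)
    n_unique = len({subject for _, subject in best.values()})
    return 100.0 * n_hit / n_queries, 100.0 * n_unique / n_queries


# Toy input: four query sequences, three of which produced BLASTX rows.
rows = [
    "\t".join(["c1", "P1", "90.0", "100", "10", "0", "1", "300", "1", "100", "1e-20", "200"]),
    "\t".join(["c2", "P1", "80.0", "100", "20", "0", "1", "300", "1", "100", "1e-10", "150"]),
    "\t".join(["c3", "P2", "40.0", "50", "30", "0", "1", "150", "1", "50", "1e-2", "30"]),
]
pct_hit, pct_unique = summarize_hits(rows, n_queries=4)
```

In the toy input, c1 and c2 both match the same accession, so the share of unique accessions is lower than the share of sequences with a hit, mirroring the pattern seen in the table.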
Silene contigs with hits that are exclusive to the A. thaliana, V. vinifera, and P. trichocarpa proteomes
| Library | # hits in At, Vv, Pt | # hits At | # hits Vv | # hits Pt |
|---|---|---|---|---|
| SlM | 2886 | 74 | 78 | 220 |
| SlF | 9887 | 49 | 102 | 152 |
| SlFf | 2732 | 69 | 104 | 105 |
| SdM | 1758 | 32 | 77 | 391 |
| SdF | 3339 | 99 | 70 | 212 |
| SmM | 7418 | 66 | 107 | 187 |
| SvH | 8270 | 44 | 71 | 130 |
| Ds | 8885 | 64 | 93 | 120 |
| Mean | 5646 | 62 | 87 | 189 |
The second column gives the number of contigs with hits in all three species. The following columns give the numbers of contigs with hits exclusively to A. thaliana (At; 3rd column), i.e. sequences without significant matches in either V. vinifera or P. trichocarpa, and likewise exclusively to V. vinifera (Vv; 4th column) and P. trichocarpa (Pt; 5th column).
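The column definitions above amount to simple set operations on the contig identifiers that have a significant hit in each proteome. The sketch below illustrates this with hypothetical identifiers and a hypothetical function name. It is not taken from the paper.

```python
def partition_contig_hits(at_hits, vv_hits, pt_hits):
    """Split contig ids into shared and proteome-exclusive groups."""
    at, vv, pt = set(at_hits), set(vv_hits), set(pt_hits)
    return {
        "all_three": at & vv & pt,  # hits in At, Vv and Pt
        "at_only": at - vv - pt,    # hits in A. thaliana only
        "vv_only": vv - at - pt,    # hits in V. vinifera only
        "pt_only": pt - at - vv,    # hits in P. trichocarpa only
    }


# Toy contig ids; contigs hitting exactly two proteomes fall in no group.
groups = partition_contig_hits(
    at_hits={"c1", "c2", "c3"},
    vv_hits={"c1", "c4"},
    pt_hits={"c1", "c3", "c5", "c6"},
)
counts = {name: len(ids) for name, ids in groups.items()}
```

Contigs with hits in exactly two of the three proteomes (c3 in the toy data) are not counted in any column, which is consistent with the table reporting only the shared and the exclusive categories.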
Figure 2. Identification of potential Caryophyllaceae-specific genes. The first step identifies sequences without known homologues in reference species; the second and the third steps select sequences that are found in at least two SiESTa libraries. Sequences that partially match repeated elements are removed. In our final step we compared the remaining sequences with the Silene EST library of Moccia et al. [2] to identify potential Caryophyllaceae-specific genes.
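The filtering steps in the caption can be sketched as a chain of simple tests. The record fields, the function name and the toy data below are hypothetical, and the sketch only reports which surviving candidates also occur in an external EST set. How the authors interpreted that final comparison is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    seq_id: str
    has_reference_homologue: bool  # significant hit in a reference species
    libraries: frozenset           # SiESTa libraries containing the sequence
    matches_repeat: bool           # partial match to a repeated element


def filter_candidates(candidates, min_libraries=2):
    """Keep sequences with no reference homologue, found in at least
    min_libraries libraries, and not matching repeated elements."""
    kept = []
    for c in candidates:
        if c.has_reference_homologue:
            continue
        if len(c.libraries) < min_libraries:
            continue
        if c.matches_repeat:
            continue
        kept.append(c.seq_id)
    return kept


# Toy records; all values are made up for illustration.
candidates = [
    Candidate("s1", False, frozenset({"SlM", "SdF"}), False),
    Candidate("s2", True, frozenset({"SlM", "SvH"}), False),
    Candidate("s3", False, frozenset({"Ds"}), False),
    Candidate("s4", False, frozenset({"SlF", "SmM", "Ds"}), True),
    Candidate("s5", False, frozenset({"SvH", "Ds"}), False),
]
kept = filter_candidates(candidates)

# Final comparison against an external EST set (hypothetical ids).
external_est_matches = {"s5"}
also_in_external = [s for s in kept if s in external_est_matches]
```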
Library SNP content
| Library | SlM | SlF | SlFf | SdM | SdF | SmM | SvH | Ds |
|---|---|---|---|---|---|---|---|---|
| Contigs* | 2909 | 5486 | 2287 | 2993 | 2912 | 4982 | 5028 | 6094 |
| Contigs with SNPs | 1221 | 1517 | 976 | 1333 | 1078 | 1709 | 1619 | 2513 |
| SNPs | 6648 | 6308 | 5576 | 6307 | 4653 | 7361 | 7381 | 12282 |
| Substitutions | 4847 | 5165 | 3873 | 5356 | 3516 | 5681 | 6402 | 9927 |
| % Transitions/transversions | 52/48 | 61/39 | 47/53 | 63/37 | 56/44 | 57/43 | 60/40 | 61/39 |
| % heterozygous positions | 0.39 | 0.19 | 0.43 | 0.38 | 0.28 | 0.26 | 0.27 | 0.32 |
* Only contigs assembled from at least 4 reads were considered. The total length of these contigs was used to calculate the percentage of heterozygous positions. All SNPs that are not due to substitutions are indels.
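The footnote implies two small calculations, sketched below in Python. The indel count follows directly from the table, because every SNP that is not a substitution is an indel. The percentage of heterozygous positions is assumed here to be the SNP count divided by the total length of the contigs considered. That total length is not given in the table, so the value used below is a made-up placeholder, and the function name is hypothetical.

```python
def snp_summary(n_snps, n_substitutions, total_contig_length_bp):
    """Derive the indel count and the percentage of heterozygous positions."""
    return {
        # All SNPs that are not substitutions are indels.
        "indels": n_snps - n_substitutions,
        # Assumption: one heterozygous position per SNP.
        "pct_heterozygous": 100.0 * n_snps / total_contig_length_bp,
    }


# SlM counts from the table; the contig length is a hypothetical placeholder.
slm = snp_summary(
    n_snps=6648,
    n_substitutions=4847,
    total_contig_length_bp=2_000_000,
)
```

With the SlM counts this gives 6648 - 4847 = 1801 indels; the percentage depends entirely on the placeholder length and should not be compared with the tabulated 0.39.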