Hicham Benzekri, Paula Armesto, Xavier Cousin, Mireia Rovira, Diego Crespo, Manuel Alejandro Merlo, David Mazurais, Rocío Bautista, Darío Guerrero-Fernández, Noe Fernandez-Pozo, Marian Ponce, Carlos Infante, Jose Luis Zambonino, Sabine Nidelet, Marta Gut, Laureana Rebordinos, Josep V Planas, Marie-Laure Bégout, M Gonzalo Claros, Manuel Manchado.
Abstract
BACKGROUND: Senegalese sole (Solea senegalensis) and common sole (S. solea) are two economically and evolutionarily important flatfish species in both fisheries and aquaculture. Although some genomic resources and tools were recently described for these species, further sequencing efforts are required to establish a complete transcriptome and to identify new molecular markers. Moreover, a comparative analysis of the transcriptomes will be useful for understanding flatfish evolution.
Year: 2014 PMID: 25366320 PMCID: PMC4232633 DOI: 10.1186/1471-2164-15-952
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Pre-processing summary of raw reads
| NGS platform | Reference to Figure | | Illumina | Illumina | 454 |
|---|---|---|---|---|---|
| Species | | | | | |
| Total input reads | #1 | | 1,800,249,230 | 2,101,324,072 | 5,663,225 |
| Mean length | | | | | |
| Rejected (total) | #2 | N | 237,941,945 | 345,251,849 | 1,562,661 |
| | | % | 13.5 | 17.1 | 26.8 |
| By contamination | | N | 144,247,943 | 226,627,909 | 156,921 |
| | | % | 8.2 | 11.2 | 3.0 |
| Useful reads | #3 | N | 1,561,416,814 | 1,746,258,741 | 3,774,412 |
| | | % | 86.7 | 83.1 | 67.6 |
| Paired reads | | N | 1,503,882,050 | 1,676,160,406 | - |
| | | % | 83.3 | 79.5 | - |
| Single reads | | N | 57,534,764 | 70,098,335 | 3,774,412 |
| | | % | 3.2 | 3.3 | 67.6 |
| Mean length | | | | | |
Figure 1. Schematic representation of the pre-processing, assembly and reconciliation approach used to obtain the final transcriptome. A, strategy for a set of long reads only. B, strategy combining long and short reads. The numbers preceded by “#” serve as references for the description in Table 1.
Figure 2. Representation of transcript abundance with respect to transcript length in the (dark) and (grey) transcriptomes.
Overview of assembled transcriptomes in and
| | Transcripts | % | Transcripts | % | Transcripts | % |
|---|---|---|---|---|---|---|
| Transcripts | 252,416 | - | 697,125 | - | 523,637 | - |
| Artifacts (2) | nc | - | 7,095 | 1.02 | 10,086 | 1.92 |
| Valid transcripts | 252,416 | 100.00 | 701,767 | 100.00 | 531,463 | 100.00 |
| >500 bp | 37,593 | 14.90 | 156,083 | 22.24 | 165,860 | 31.22 |
| >200 bp | 168,914 | 66.92 | 385,411 | 54.92 | 338,967 | 63.89 |
| Longest transcript | 6,050 | - | 40,163 | - | 30,526 | - |
| Transcripts with ortholog (1) | 81,348 | 32.23 | 147,536 | 21.02 | 121,696 | 22.90 |
| Different orthologous IDs | 41,792 | 51.37 | 45,063 | 30.54 | 38,402 | 31.56 |
| Complete ORFs | 6,742 | 8.31 | 39,727 | 26.93 | 52,051 | 42.77 |
| Different, complete ORFs | 4,376 | 5.38 | 18,738 | 12.70 | 22,683 | 18.64 |
| C-terminus | 14,757 | 18.14 | 27,080 | 18.35 | 19,579 | 16.09 |
| N-terminus | 11,298 | 13.88 | 27,638 | 18.73 | 25,131 | 20.65 |
| Internal | 47,529 | 58.43 | 53,091 | 35.99 | 24,935 | 20.49 |
| Putative ncRNAs | 539 | 0.21 | 1,252 | 0.18 | 1,075 | 0.20 |
| Transcripts without ortholog (1) | 171,067 | 67.56 | 545,491 | 77.73 | 408,692 | 76.90 |
| Putative new transcripts | 22,612 | 13.21 | 39,812 | 7.30 | 34,194 | 8.37 |
| Non-redundant new transcripts | nc | – | 14,451 | 2.65 | 15,603 | 3.55 |
| Unknown | 147,916 | 86.48 | 505,679 | 92.70 | 374,498 | 91.63 |
| | nc | – | | 8.48 | | 10.16 |
The values were calculated using FullLengtherNext software. The minimum number of transcripts that can be considered as a reference transcriptome is shown in bold.
(1) Percentages for subclassifications of this category were calculated using this line as the 100% reference.
(2) Artifacts refer mainly to misassemblies and chimeric contigs.
nc: not calculated.
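The percentage convention described in footnote 1 can be illustrated with a minimal sketch. It uses counts from the first data column of the table; the helper name `pct` and the variable names are illustrative and not part of FullLengtherNext.

```python
def pct(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


# Counts taken from the first data column of the table above.
valid_transcripts = 252_416
with_ortholog = 81_348
different_ortholog_ids = 41_792

# Top-level category: expressed relative to all valid transcripts.
share_with_ortholog = pct(with_ortholog, valid_transcripts)

# Subclassification: the parent line ("Transcripts with ortholog")
# is taken as the 100% reference, as stated in footnote 1.
share_different_ids = pct(different_ortholog_ids, with_ortholog)

print(share_with_ortholog, share_different_ids)
```

The two printed values correspond to the 32.23 and 51.37 entries reported in the table.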
Figure 3. Screen captures of the SoleaDB interface. A, illustration of the “Assemblies” tab, which contains information about all transcriptome versions and subversions. B, capture of the part of the screen corresponding to the “Assembly info” tab, where general information about the transcriptome as well as downloadable files and other useful tools can be found.
Figure 4. Distribution of the level of similarity between both sole reference transcriptomes for those transcripts with (dark bars) or without (grey bars) a zebrafish ortholog.
Figure 5. Venn diagrams showing coincidences by species among sole Blast-based orthologs and transcripts with a zebrafish RefSeq/ENSEMBL ortholog. The diagrams compare the 11,743 Blast-based orthologs with the unique zebrafish RefSeq identifiers in SoleaDB for S. senegalensis (39,851) and S. solea (34,949), and with the unique zebrafish ENSEMBL identifiers in SoleaDB for S. senegalensis (39,270) and S. solea (34,389). Within the Venn diagrams, the numbers refer to the number of transcripts in SoleaDB for S. senegalensis (Sse) and S. solea (Sso), and the number of transcripts in SoleaDB with a zebrafish RefSeq identifier (R) or with a zebrafish ENSEMBL identifier (E).
Figure 6. Phylogenetic tree of Crybb and Crybb-like proteins in vertebrates. A neighbor-joining tree was built from the alignment of vertebrate Crybb and Crybb-like sequences. Species are indicated as Sse (Solea senegalensis), Sso (Solea solea), Dre (Danio rerio), Tni (Tetraodon nigroviridis), Oni (Oreochromis niloticus), Ola (Oryzias latipes), Cse (Cynoglossus semilaevis), Xla (Xenopus laevis) and Gga (Gallus gallus; see Additional file 7 for accession numbers). Solea sequences are indicated according to the transcript name assigned in SoleaDB. Clusters are indicated as arcs of a circle. The tree was rooted using Xenopus laevis Cryga. Numbers adjacent to nodes indicate percentage bootstrap support; only values larger than 70% (over 1,000 replicates) are shown.
SSR summary statistics for whole and reference transcriptomes
| Type of SSR | | |
|---|---|---|
| Whole transcriptome | 266,434 | 316,388 |
| Di-nucleotide | 107,828 | 126,260 |
| Tri-nucleotide | 96,076 | 114,198 |
| Tetra-nucleotide | 39,102 | 44,118 |
| Others | 23,428 | 31,812 |
| Reference transcriptome | 49,955 | 67,610 |
| Di-nucleotide | 16,405 | 22,371 |
| Tri-nucleotide | 22,394 | 29,764 |
| Tetra-nucleotide | 6,935 | 8,829 |
| Others | 4,221 | 6,646 |
| Blast-based orthologs | 12,418 | 18,486 |
| Species-specific SSR (1) | 1,273 | 4,803 |
| Conserved SSR | 11,145 | 13,683 |
| Same repeat motif (2) | 6,596 | 6,772 |
| Different repeat motif | 4,549 | 6,911 |
Total number of SSRs and frequency according to their repeat motif are indicated.
(1) SSRs present in one species but not in the orthologs of the other species.
(2) Exactly the same SSR repeat motif was found in both orthologs; in a few cases, the SSR occurs once in one ortholog and twice in the other.
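The hierarchy of the SSR table (each total broken down into the rows listed beneath it) can be checked with a short sketch. The counts are copied from the table; the function name `parts_match_total` and the dictionary layout are illustrative choices, not part of the original analysis.

```python
def parts_match_total(total, parts):
    """Check, column by column, that the sub-rows add up to the total row."""
    return tuple(sum(column) for column in zip(*parts)) == total


# Each entry: (total row, [sub-rows]); tuples hold the two table columns.
checks = {
    "whole transcriptome": (
        (266_434, 316_388),
        [(107_828, 126_260), (96_076, 114_198), (39_102, 44_118), (23_428, 31_812)],
    ),
    "reference transcriptome": (
        (49_955, 67_610),
        [(16_405, 22_371), (22_394, 29_764), (6_935, 8_829), (4_221, 6_646)],
    ),
    # Blast-based orthologs = species-specific SSRs + conserved SSRs.
    "blast-based orthologs": (
        (12_418, 18_486),
        [(1_273, 4_803), (11_145, 13_683)],
    ),
    # Conserved SSRs = same repeat motif + different repeat motif.
    "conserved SSR": (
        (11_145, 13_683),
        [(6_596, 6_772), (4_549, 6_911)],
    ),
}

results = {name: parts_match_total(total, parts) for name, (total, parts) in checks.items()}
for name, ok in results.items():
    print(f"{name}: {'consistent' if ok else 'mismatch'}")
```

All four groups sum to their reported totals in both columns, which confirms how the rows nest.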
Figure 7. Schematic representation of the probe selection strategy for the construction of the Senegalese sole oligonucleotide microarray. The number of transcripts remaining after the described filtering is indicated.
Validation of microarray data using qPCR
| SoleaDB code | Gene | Gene name | Microarray FC | Microarray p-value | qPCR FC | qPCR p-value |
|---|---|---|---|---|---|---|
| Unigene18736 | Angiotensin I converting enzyme 2 | | 4.5 | <0.001 | 4.9 | <0.05 |
| Unigene49603 | Angiotensinogen | | 3.5 | <0.01 | 4.7 | <0.05 |
| Unigene39473 | Na-K-Cl cotransporter2 | | 2.5 | <0.01 | 3.13 | <0.01 |
| Unigene252320 | Transferrin | | 15.6 | <0.001 | 10.5 | <0.01 |
| Unigene214993 | Ferritin | | 2.1 | <0.01 | 2.3 | <0.05 |
| Unigene39196 | Heat shock protein 90-alpha | | 2.7 | <0.01 | 2.3 | <0.01 |
| Unigene54412 | Trypsinogen1a | | 17.6 | <0.001 | 12.0 | <0.001 |
| Unigene31826 | Trypsinogen2 | | 4.7 | <0.001 | 7.8 | <0.05 |
| Unigene53434 | Chymotrypsinogen2 | | 7.2 | <0.001 | 6.3 | <0.05 |
| Unigene52166 | Elastase1 | | 8.7 | <0.001 | 7.8 | <0.05 |
| Unigene53593 | Elastase4 | | 7.1 | <0.001 | 4.6 | <0.05 |
| Unigene54920 | Complement component C3 | | 3.8 | <0.05 | 34.0 | <0.05 |
| Unigene53521 | Lysozyme g | | 2.5 | <0.05 | 3.6 | <0.05 |
| Unigene219622 | Thyroid stimulating hormone, beta | | 2.5 | <0.05 | 4.6 | <0.001 |
| Unigene52404 | Transaldolase | | 2.1 | <0.05 | 2.5 | <0.05 |
Fold-changes (FC) and p-values obtained for target genes by microarray and qPCR are indicated. Moreover, the transcript code in the SoleaDB for S. senegalensis v3 transcriptome is also shown. For qPCR, data were normalized to those of gapdh2 and referred to the calibrator group (36 ppt 3 DPH).
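The normalization described above (target gene relative to gapdh2, then relative to a calibrator group) can be sketched as follows. This is a minimal illustration that assumes the widely used 2^-ΔΔCt calculation with 100% amplification efficiency; the source does not state which formula was applied, and the function name and all Ct values below are hypothetical.

```python
def fold_change_ddct(
    ct_target_sample: float,
    ct_ref_sample: float,
    ct_target_calibrator: float,
    ct_ref_calibrator: float,
) -> float:
    """Relative expression by the 2^-ddCt method (assumed, not from the source)."""
    # Step 1: normalize the target gene to the reference gene (e.g. gapdh2).
    delta_sample = ct_target_sample - ct_ref_sample
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator
    # Step 2: refer the sample to the calibrator group.
    delta_delta = delta_sample - delta_calibrator
    # Step 3: convert cycle differences to a fold change (assumes 100% efficiency).
    return 2 ** (-delta_delta)


# Hypothetical Ct values, chosen only to make the arithmetic easy to follow.
fc = fold_change_ddct(
    ct_target_sample=22.0,
    ct_ref_sample=18.0,
    ct_target_calibrator=24.0,
    ct_ref_calibrator=18.0,
)
print(fc)  # 4.0
```

With these made-up inputs the target gene amplifies two cycles earlier than in the calibrator after normalization, which corresponds to a four-fold increase.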