| Literature DB >> 19014685 |
Ben F Koop, Kristian R von Schalburg, Jong Leong, Neil Walker, Ryan Lieph, Glenn A Cooper, Adrienne Robb, Marianne Beetz-Sargent, Robert A Holt, Richard Moore, Sonal Brahmbhatt, Jamie Rosner, Caird E Rexroad, Colin R McGowan, William S Davidson.
Abstract
BACKGROUND: Salmonids are of interest because of their relatively recent genome duplication and their extensive use in wild fisheries and aquaculture. A comprehensive gene list and a comparison of genes across several of these species provide valuable genomic information for one of the most widely studied groups of fish.
Year: 2008 PMID: 19014685 PMCID: PMC2628678 DOI: 10.1186/1471-2164-9-545
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
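Salmonid cDNA libraries, sequencing and assembly summary statistics for data provided in this study.
| Library | Clones (a) | Insert size, kb (b) | ESTs (c) | Contigs (d) | Singletons (e) | Largest contig (f) | Average contig size (g) | % unique (h) |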
| Thymus (evd) | 31488 | 1.5 | 59264 | 23768 | 8685 | 66 | 2.3 | 16 |
| Thyroid (eve) | 30720 | 1.9 | 58700 | 28045 | 12378 | 37 | 2.1 | 15 |
| Head kidney (evf) | 31104 | 1.5 | 59541 | 28316 | 10832 | 30 | 2.1 | 16 |
| Pyloric caecum (pla, plb, plc, plna, plnb, pha, phc) | 9584 | 0.9 | 13543 | 5691 | 2766 | 35 | 2.3 | 17 |
| Brain, kidney, spleen (rgb2) | 60288 | 1.6 | 97171 | 42562 | 26504 | 58 | 2.1 | 15 |
| Brain, kidney, spleen (sjb) | 5835 | 1.8 | 10085 | 6656 | 3541 | 8 | 1.5 | 18 |
| Brain, kidney, spleen (rgd) | 3840 | 2.1 | 5935 | 3941 | 2970 | 31 | 1.5 | 82 |
| Brain, kidney, spleen (evc) | 3744 | 1.8 | 5729 | 3841 | 2487 | 10 | 1.5 | 80 |
| Brain, kidney, spleen (rge) | 7296 | 2.0 | 10813 | 6123 | 3924 | 173 | 1.8 | 98 |
| Brain, kidney, spleen (evi) | 5376 | 1.4 | 10051 | 5424 | 1247 | 9 | 1.9 | 100 |
| Eye, kidney, spleen (evb) | 4800 | 1.6 | 8630 | 5537 | 3359 | 12 | 1.6 | 93 |
| Brain, kidney, spleen (evl) | 5760 | 1.5 | 10975 | 5926 | 1309 | 6 | 1.9 | 100 |
| Brain, kidney, spleen (bkhp) | 2304 | 0.9 | 3624 | 2420 | 1346 | 6 | 1.5 | 100 |
(a) Number of clones from which at least one sequence (5' or 3') was obtained.
(b) Average size (kb) of the cloned cDNA inserts, estimated from more than 30 clone digests.
(c) Number of 5' and 3' EST sequences obtained.
(d) Number of EST contigs in the first-stage assembly, including singletons.
(e) Number of contigs containing a single sequence.
(f) Size of the contig containing the largest number of sequences.
(g) Average size of all contigs, including singletons.
(h) Percentage of the putative transcripts that are unique to the species.
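The summary columns above can be illustrated with a short sketch. It assumes, hypothetically, that a first-stage assembly is available as a mapping from each EST read to the contig it joined, and it reads "size" in notes f and g as the number of sequences per contig. That is one plausible interpretation, not necessarily how the published values were computed.

```python
from collections import Counter


def contig_stats(assignments: dict[str, str]) -> dict[str, float]:
    """Summarize a clustering of EST reads into contigs.

    `assignments` maps each EST read ID to the contig it was assembled into
    (a hypothetical input format). A singleton is a contig with one read.
    """
    sizes = Counter(assignments.values())      # contig ID -> number of reads
    n_contigs = len(sizes)                     # all contigs, singletons included
    n_singletons = sum(1 for n in sizes.values() if n == 1)
    largest = max(sizes.values())              # reads in the biggest contig
    average = len(assignments) / n_contigs     # mean reads per contig
    return {
        "contigs": n_contigs,
        "singletons": n_singletons,
        "largest": largest,
        "average": round(average, 2),
    }


# Hypothetical toy assembly: 8 reads clustered into 4 contigs.
reads = {
    "r1": "c1", "r2": "c1", "r3": "c1",
    "r4": "c2", "r5": "c2", "r6": "c2",
    "r7": "c3",
    "r8": "c4",
}
print(contig_stats(reads))
# {'contigs': 4, 'singletons': 2, 'largest': 3, 'average': 2.0}
```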
Summary of salmonid ESTs and contig assemblies.
| | Atlantic salmon | Rainbow trout | Chinook salmon | Sockeye salmon | Brook trout | Lake whitefish | Grayling | Northern pike | Rainbow smelt |
| # EST sequences (a) | 436629 | 246704 | 14535 | 12056 | 10051 | 10842 | 10975 | 3624 | 36785 |
| Assembly stage 1 (b) | | | | | | | | | |
| # contigs, 2+ sequences (c) | 70845 | 42423 | 2890 | 2480 | 4178 | 4464 | 4616 | 1074 | 9044 |
| # singletons (d) | 47139 | 26935 | 6295 | 4118 | 1247 | 2510 | 1314 | 1346 | 7019 |
| # transcripts (e) | 117984 | 69358 | 9185 | 6598 | 5425 | 6974 | 5930 | 2420 | 16063 |
| Assembly stage 2 (f) | | | | | | | | | |
| # transcripts (g) | 81398 | 51199 | 8517 | 6200 | 4946 | 6446 | 5408 | 2380 | 12159 |
| # hits (h) | 29844 | 19266 | 3684 | 3561 | 1838 | 2314 | 1780 | 198 | 6139 |
| % with hits (i) | 37 | 38 | 43 | 57 | 37 | 36 | 33 | 8 | 50 |
(a) Number of EST sequences for each species, including those in GenBank.
(b) Assembly stage 1 refers to PHRAP assembly with parameters repeat_frequency 0.99 and minscore 100.
(c) Number of contigs with two or more sequences.
(d) Number of contigs with one sequence.
(e) Total number of transcripts, including singletons.
(f) Assembly stage 2 refers to PHRAP assembly with parameters repeat_frequency 0.96 and minscore 300.
(g) Number of transcripts resulting from a re-assembly of all stage 1 transcripts with PHRAP parameters repeat_frequency 0.96 and minscore 300.
(h) Number of transcripts with a BLASTX hit (E-value < 1e-10) to the SwissProt/CDD databases.
(i) Percentage of stage 2 assembled transcripts that have a BLASTX hit.
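The table above can be checked for internal consistency in a few lines: in stage 1, contigs with two or more sequences plus singletons should equal the transcript total (notes c to e), and note i is the share of stage 2 transcripts with a BLASTX hit. The values below are transcribed from the table in its column order; the check is arithmetic only and does not reproduce the PHRAP or BLASTX runs.

```python
# Rows transcribed from the assembly summary table, in its column order.
contigs_2plus = [70845, 42423, 2890, 2480, 4178, 4464, 4616, 1074, 9044]
singletons = [47139, 26935, 6295, 4118, 1247, 2510, 1314, 1346, 7019]
transcripts_1 = [117984, 69358, 9185, 6598, 5425, 6974, 5930, 2420, 16063]
transcripts_2 = [81398, 51199, 8517, 6200, 4946, 6446, 5408, 2380, 12159]
hits = [29844, 19266, 3684, 3561, 1838, 2314, 1780, 198, 6139]
pct_reported = [37, 38, 43, 57, 37, 36, 33, 8, 50]

# Stage 1: every transcript is either a multi-sequence contig or a singleton.
stage1_ok = all(
    c + s == t for c, s, t in zip(contigs_2plus, singletons, transcripts_1)
)

# Note i: percentage of stage 2 transcripts with a BLASTX hit, rounded.
pct_computed = [round(100 * h / t) for h, t in zip(hits, transcripts_2)]

print(stage1_ok, pct_computed == pct_reported)  # True True
```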
Figure 1. Number of aligned contigs (y-axis), out of 81,398 total contigs, plotted against the percent similarity of the alignments (x-axis).
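A plot like Figure 1 amounts to binning each alignment's percent similarity and counting contigs per bin. A minimal sketch, assuming 1-percentage-point bins and a list of hypothetical identity values; the bin width used for the figure is not stated.

```python
from collections import Counter


def similarity_histogram(identities: list[float], bin_width: float = 1.0) -> dict[float, int]:
    """Count aligned contigs per percent-identity bin (lower bin edge -> count)."""
    counts = Counter((pid // bin_width) * bin_width for pid in identities)
    return dict(sorted(counts.items()))


# Hypothetical percent identities for six aligned contigs.
print(similarity_histogram([92.4, 93.1, 93.8, 95.0, 98.7, 99.2]))
# {92.0: 1, 93.0: 2, 95.0: 1, 98.0: 1, 99.0: 1}
```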
Cross-species comparisons of contig transcripts.
| Species | # contigs | # missing (a) | # missing (b) | # missing (c) | % sim (d) | Ave len, bp (e) | % sim (f) | Ave len, bp (e) |
| Atlantic salmon (SJ) (g) | 5781 | 479 | 1210 | 354 | 98.4 | 705 | 93.4 | 493 |
| Atlantic salmon (all) (h) | 81398 | - | 36351 | - | - | - | 93.3 | 504 |
| Rainbow trout | 50256 | 13626 | - | - | 93.8 | 495 | - | - |
| Chinook salmon | 8517 | 797 | 1224 | 426 | 94.2 | 510 | 95.5 | 510 |
| Sockeye salmon | 6200 | 577 | 770 | 298 | 94.6 | 571 | 95.7 | 569 |
| Brook trout | 5424 | 285 | 627 | 174 | 94.4 | 580 | 93.9 | 522 |
| Lake whitefish | 6446 | 804 | 1420 | 608 | 92.5 | 425 | 92.2 | 399 |
| Grayling | 5408 | 657 | 1136 | 506 | 91.7 | 435 | 91.3 | 400 |
| Northern pike | 2380 | 1894 | 2001 | 1846 | 89.6 | 241 | 89.4 | 251 |
| Rainbow smelt | 12159 | 7462 | 7812 | 6920 | 86.2 | 431 | 86.1 | 419 |
(a) Number of contigs not found in the Atlantic salmon database.
(b) Number of contigs not found in the rainbow trout database.
(c) Number of contigs found in neither the Atlantic salmon nor the rainbow trout database.
(d) Percent identity to the top BLASTN hit in the Atlantic salmon database, for alignments over 200 bp with E-value < 1e-25. For Atlantic salmon (SJ), the comparison is to the McConnell strain.
(e) Average length of the BLASTN hit.
(f) Percent identity to the top BLASTN hit in the rainbow trout database, for alignments over 200 bp with E-value < 1e-25.
(g) Only Atlantic salmon ESTs from the Saint John River strain.
(h) All Atlantic salmon ESTs other than those in note (g).
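The columns of the cross-species table can be sketched as simple filters and set operations. The `BlastHit` record, the function names and the toy data are hypothetical; "over 200 bp" is taken to mean an alignment longer than 200 bp, and a contig counts as missing from a database when it has no hit there.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class BlastHit:
    """Top BLASTN hit for one query contig (hypothetical record layout)."""
    query: str
    pct_identity: float
    align_len: int
    evalue: float


def identity_summary(hits, min_len=200, max_evalue=1e-25):
    """Mean percent identity and mean alignment length over qualifying hits.

    A hit qualifies if its alignment is longer than `min_len` bp and its
    E-value is below `max_evalue` (an assumed reading of notes d and f).
    """
    kept = [h for h in hits if h.align_len > min_len and h.evalue < max_evalue]
    return (round(mean(h.pct_identity for h in kept), 1),
            round(mean(h.align_len for h in kept)))


def missing_counts(contigs, found_in_salmon, found_in_trout):
    """Notes a, b, c: contigs absent from salmon, from trout, and from both."""
    not_salmon = contigs - found_in_salmon
    not_trout = contigs - found_in_trout
    return len(not_salmon), len(not_trout), len(not_salmon & not_trout)


# Hypothetical toy data for one query species.
hits = [
    BlastHit("c1", 94.0, 512, 1e-120),
    BlastHit("c2", 92.0, 388, 1e-80),
    BlastHit("c3", 99.0, 150, 1e-40),  # alignment too short: excluded
]
print(identity_summary(hits))  # (93.0, 450)
print(missing_counts({"c1", "c2", "c3", "c4"}, {"c1", "c2"}, {"c1", "c3"}))  # (2, 2, 1)
```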
Figure 2. Screenshot of the Atlantic salmon contig viewer. The top panel shows the alignment of the 100/99 (first-stage) clusters, along with the number of individual EST reads in each. The second panel shows the 5 largest ORFs and their reading frames, the BLASTX hits and their reading frames, and the Phred quality score at each aligned position; it also indicates whether TargetIdentifier has classified the clone as full-length and marks the predicted position of the START codon (green triangle). Selectable colored bars provide links to alignments. The third panel gives details of the database hits, with links to alignments and database entries.
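The ORF track described for the viewer can be approximated by scanning all six reading frames. The sketch below is an assumption about how such a track might be built, not the viewer's actual code: an ORF is taken to run from an ATG to the first in-frame stop codon, and ORFs are ranked by length.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_orfs(seq, min_codons=30):
    """Yield (length_nt, frame, start) for ATG-to-stop ORFs in six frames.

    Frames +1..+3 are read on `seq`, frames -1..-3 on its reverse complement;
    `start` is the 0-based offset on the strand that was scanned, and
    `length_nt` includes the stop codon.
    """
    seq = seq.upper()
    strands = {1: seq, -1: seq.translate(COMPLEMENT)[::-1]}
    for sign, strand in strands.items():
        for offset in range(3):
            start = None
            for i in range(offset, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                      # open a new ORF
                elif start is not None and codon in STOPS:
                    length = i + 3 - start         # close it at the stop codon
                    if length // 3 >= min_codons:
                        yield length, sign * (offset + 1), start
                    start = None


def largest_orfs(seq, n=5, min_codons=30):
    """Return the n longest ORFs, longest first."""
    return sorted(find_orfs(seq, min_codons), reverse=True)[:n]


# Toy sequence: ATG, five GCT codons, then a TAA stop (21 nt).
print(largest_orfs("ATG" + "GCT" * 5 + "TAA", min_codons=1))  # [(21, 1, 0)]
```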
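Gene sets used in phylogenetic analysis.*
| Tree | # contigs | Alignment length (bp) | Support within Salmoninae | Support among subfamilies | Consistent with duplication | E-value | Description | |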
| 1 | 25 | 302 | Om/Sf | - | no | 0.0E+00 | Polyadenylate-binding protein 1 | |
| 2 | 11 | 287 | - | - | yes | 1.0E-103 | AP-3 complex subunit sigma-1 | |
| 3 | 18 | 455 | Om/Sf | - | yes | 4.0E-31 | DNA-binding protein inhibitor ID-1 | |
| 4 | 16 | 283 | - | - | yes | 2.0E-46 | Dynein light chain 1, cytoplasmic | |
| 5 | 11 | 438 | Om/Sf | C/T | yes | 0.0E+00 | Guanine nucleotide-binding protein G(i) | |
| 6 | 26 | 271 | - | - | - | 1.0E-138 | SPARC precursor | |
| 7 | 12 | 301 | - | - | no | - | Unknown | |
| 8 | 17 | 307 | Om/Sf | - | yes | 4.0E-80 | Calmodulin | |
| 9 | 12 | 341 | Ss/Sf | - | no | 4.0E-80 | RNA-binding protein 8A | |
| 10 | 16 | 370 | Om/Sf | - | no | 4.0E-95 | 60S ribosomal protein L9 | |
| 11 | 11 | 305 | Om/Sf | - | yes | 3.0E-95 | Proteasome subunit beta type-6 precursor | |
| 12 | 8 | 411 | - | - | - | 1.0E-82 | Sorting nexin-3 | |
| 13 | 7 | 448 | Ss/Sf | - | yes | 8.0E-97 | Chloride intracellular channel protein 2 | |
| 14 | 13 | 500 | Om/Sf | - | no | 6.0E-73 | Nicotinamide riboside kinase 2 | |
| 15 | 14 | 379 | Om/Sf | S/C | no | 2.0E-47 | Stathmin | |
| 16 | 7 | 638 | Ss/Sf | - | no | 1.0E-154 | Ribosome production factor 1 | |
| 17 | 11 | 713 | - | - | yes | 6.0E-23 | Uncharacterized protein C8orf4 homolog | |
| 18 | 9 | 438 | - | - | - | 1.0E-125 | U3 small nucleolar ribonucleoprotein | |
| 19 | 10 | 314 | - | - | yes | 1.0E-35 | CCAAT/enhancer-binding protein beta | |
| 20 | 18 | 313 | - | - | yes | 1.0E-68 | Proteasome activator complex subunit 1 | |
| 21 | 10 | 505 | - | C/T | yes | 1.0E-135 | DCN1-like protein 1 | |
| 22 | 10 | 620 | - | - | yes | 1.0E-91 | Complement 1 Q subcomponent-binding | |
| 23 | 12 | 442 | - | S/C | - | 1.0E-149 | ADP/ATP translocase 2 | |
| 24 | 8 | 517 | - | - | no | 2.0E-78 | Transmembrane and coiled-coil domain | |
| 25 | 14 | 471 | Om/Sf | - | yes | 1.0E-101 | 60S ribosomal protein L10a | |
| 26 | 13 | 409 | - | - | no | 2.0E-67 | Receptor expression-enhancing protein 5 | |
| 27 | 14 | 308 | Om/Sf | - | yes | 0.0E+00 | Rab GDP dissociation inhibitor beta | |
| 28 | 19 | 311 | Ss/Sf | S/C | yes | 8.0E-64 | Peroxiredoxin-5, mitochondrial prec. | |
| 29 | 6 | 428 | - | S/C | no | 1.0E-115 | Calcium-dependent serine proteinase | |
| 30 | 10 | 355 | - | S/C | yes | - | Unknown | |
| 31 | 15 | 486 | - | - | yes | 8.0E-78 | Transmembrane protein 50A | |
| 32 | 11 | 332 | - | - | yes | 2.0E-90 | Ras-related protein Rap-1b precursor | |
| 33 | 22 | 291 | Om/Sf | - | yes | 3.0E-71 | Cellular nucleic acid-binding protein | |
| 34 | 10 | 268 | - | - | yes | 0.0E+00 | Dolichyl-diphosphooligosaccharide | |
| 35 | 8 | 367 | Om/Sf | S/C | no | 1.0E-117 | Hypoxanthine-guanine phosphoribosyltran. | |
| 36 | 9 | 609 | Om/Sf | - | no | 2.0E-48 | FK506-binding protein 1A | |
| 37 | 7 | 379 | - | - | yes | 1.0E-161 | Ubiquitin carboxyl-terminal hydrolase | |
| 38 | 17 | 599 | Ss/Sf | C/T | yes | 1.0E-120 | Palmitoyl-protein thioesterase 1 prec. | |
| 39 | 12 | 408 | - | S/C | no | 7.0E-44 | Translation machinery-associated protein | |
| 40 | 15 | 389 | Ss/Sf | - | yes | 4.0E-76 | Proteasome activator complex subunit 2 | |
| 41 | 8 | 413 | - | - | no | 1.0E-100 | Synapse-associated protein 1 | |
| 42 | 20 | 333 | - | S/C | yes | 8.0E-66 | Reticulon-4 | |
| 43 | 17 | 336 | - | S/T | no | 5.0E-97 | Signal peptidase complex catalytic sub. | |
| 44 | 11 | 483 | - | S/C | yes | 1.0E-112 | Prohibitin-2 | |
| 45 | 7 | 419 | Om/Sf | - | yes | 1.0E-136 | F-actin-capping protein subunit alpha-2 | |
| 46 | 14 | 465 | - | - | no | 7.0E-16 | Hypoxia-inducible factor 1 alpha inhibitor | |
| 47 | 9 | 610 | Om/Sf | - | yes | 9.0E-96 | Survival of motor neuron-related-splicing | |
| 48 | 19 | 604 | - | C/T | yes | 4.0E-61 | Gamma-aminobutyric acid receptor- | |
| 49 | 17 | 413 | - | C/T | yes | 1.0E-161 | COP9 signalosome complex sub. 6 | |
| 50 | 13 | 350 | - | S/C | yes | 1.0E-141 | Coatomer subunit epsilon | |
| 51 | 16 | 289 | - | - | yes | 7.0E-55 | DNA-binding protein inhibitor ID-2 | |
| 52 | 12 | 484 | Om/Sf | S/C | yes | 0.0E+00 | Protein disulfide-isomerase A3 prec. | |
| 53 | 19 | 356 | - | - | yes | 4.0E-81 | Density-regulated protein | |
| 54 | 9 | 439 | Om/Sf | S/C | yes | 4.0E-99 | ADP-ribosylation factor 6 | |
| 55 | 8 | 300 | | - | yes | 1.0E-124 | Neuronal membrane glycoprotein M6-b | |
| 56 | 10 | 273 | - | - | yes | 0.0E+00 | WD repeat-containing protein 1 | |
| 57 | 11 | 304 | - | S/T | yes | 1.0E-115 | Annexin A4 | |
| 58 | 12 | 562 | - | - | no | 3.0E-63 | Transcription initiation factor TFIID sub. | |
| 59 | 7 | 553 | - | - | yes | 2.0E-18 | Mitochondrial import receptor subunit | |
| 60 | 8 | 561 | Ss/Sf | S/C | no | 1.0E-158 | Mortality factor 4-like protein 1 | |
| 61 | 11 | 378 | - | - | no | 1.0E-95 | Retinol dehydrogenase 3 | |
| 62 | 16 | 509 | - | - | yes | 2.0E-87 | Actin-related protein 2/3 complex sub. 4 | |
| 63 | 7 | 619 | - | - | no | - | Unknown | |
| 64 | 12 | 310 | - | - | no | 4.0E-52 | Small nuclear ribonucleoprotein Sm D2 | |
| 65 | 10 | 398 | - | - | yes | 8.0E-23 | Myristoylated alanine-rich C-kinase sub. | |
| 66 | 10 | 277 | - | S/T | yes | - | Unknown | |
| 67 | 7 | 388 | - | - | yes | 3.0E-80 | Nuclear factor erythroid 2-related factor 2 | |
| 68 | 8 | 311 | - | - | no | 1.0E-124 | rRNA 2'-O-methyltransferase fibrillarin | |
| 69 | 14 | 300 | Om/Sf | - | yes | 1.0E-157 | Malate dehydrogenase, mito. prec. | |
| 70 | 9 | 589 | - | - | yes | 1.0E-138 | Glucosamine-6-phosphate isomerase | |
| 71 | 13 | 577 | Ss/Sf | S/C | yes | 1.0E-127 | Proteasome subunit alpha type-3 | |
| 72 | 14 | 210 | - | C/T | yes | 0.0E+00 | 26S proteasome non-ATPase reg. sub. | |
| 73 | 12 | 348 | | C/T | no | - | Unknown | |
| 74 | 18 | 409 | | - | yes | 7.0E-77 | Reticulon-1 | |
| 75 | 19 | 462 | | - | yes | 5.0E-39 | Nuclear protein Hcc-1 | |
| 76 | 8 | 621 | | - | yes | 1.0E-145 | DNA-directed RNA polymerase II subunit | |
| 77 | 16 | 482 | | - | yes | 1.0E-106 | 60S ribosomal protein L6 | |
| 78 | 11 | 552 | | C/T | yes | 3.0E-95 | Selenoprotein T1a precursor | |
* Listed for each gene set are its identifier (tree number), the number of contigs used in the data set, the length of the nucleotide alignment (no gaps), and a tentative identification based on BLASTX hits to the SwissProt database (accession number, E-value and description). The tree support for the various arrangements is also listed: for example, Om/Sf supports an Oncorhynchus mykiss/Salvelinus fontinalis grouping, and S/C supports a Salmoninae/Coregoninae grouping (likewise, Ss/Sf denotes Salmo salar/Salvelinus fontinalis, C/T Coregoninae/Thymallinae and S/T Salmoninae/Thymallinae). In addition, the table indicates whether each tree is consistent (yes/no) with an ancestral Salmonidae gene duplication. "-" indicates that the data provide no clear evidence for any particular tree. All EST accession numbers used to make the contig consensus sequences, all alignments and the 70% consensus trees are available [see Additional file 1] or online at the GRASP website [24].
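Branch-level summaries of the gene sets are, in essence, tallies over the per-gene columns of the table above. As a small illustration, the sketch below counts two columns for gene sets 1 to 10, transcribed by hand; extending it to all 78 rows and to a "contradicting" category would need an explicit rule for which arrangements conflict with each branch, which is an assumption left open here.

```python
from collections import Counter

# (tree, Om/Sf-type grouping, consistent with ancestral duplication),
# transcribed from gene sets 1-10 in the table above.
rows = [
    (1, "Om/Sf", "no"), (2, "-", "yes"), (3, "Om/Sf", "yes"),
    (4, "-", "yes"), (5, "Om/Sf", "yes"), (6, "-", "-"),
    (7, "-", "no"), (8, "Om/Sf", "yes"), (9, "Ss/Sf", "no"),
    (10, "Om/Sf", "no"),
]

grouping = Counter(support for _, support, _ in rows)
duplication = Counter(dup for _, _, dup in rows)
print(dict(grouping))     # {'Om/Sf': 5, '-': 4, 'Ss/Sf': 1}
print(dict(duplication))  # {'no': 4, 'yes': 5, '-': 1}
```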
Figure 3. Summary of 78 gene-set consensus (70%) trees depicting the relationships among the major groups of Salmonidae. Each branch shows the number of consensus trees supporting the branch, the number of trees providing no information and the number of trees contradicting the branch. The diamond at the base of the Salmonidae cladogram marks the position where most gene duplications were identified. The individual gene trees pertaining to each branch position are indicated in Table 4.
Cross-species hybridization results for the salmonid 32K cDNA microarray.*
| Salmonid species | % positive elements | % CV |
| Atlantic salmon (n = 4) | 48.6% | 12.0% |
| Rainbow trout (n = 4) | 58.1% | 9.8% |
| Coho salmon (n = 4) | 52.3% | 23.2% |
| Brook trout (n = 4) | 35.0% | 2.4% |
| Whitefish (n = 4) | 47.7% | 7.8% |
* Percentage of elements on the cDNA array with median signal intensity above the threshold (background signal + 2 SD). %CV is the percent coefficient of variation, and n is the number of biological replicates.
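The positive-call rule and %CV in the footnote can be written out directly. A minimal sketch with hypothetical intensities; it assumes the background is summarized by its mean and sample standard deviation and that each element's signal is the median of its replicate spots, details the footnote does not specify. %CV would then be computed over per-replicate values, such as the % positive obtained from each of the n arrays.

```python
from statistics import mean, median, stdev


def percent_positive(element_spots, background):
    """Percent of array elements whose median intensity exceeds background + 2 SD.

    `element_spots` holds the replicate spot intensities for each element;
    `background` holds background intensities (hypothetical layout).
    """
    threshold = mean(background) + 2 * stdev(background)
    positive = sum(1 for spots in element_spots if median(spots) > threshold)
    return 100 * positive / len(element_spots)


def percent_cv(values):
    """Coefficient of variation (sample SD / mean), as a percent."""
    return 100 * stdev(values) / mean(values)


# Hypothetical intensities: threshold = 100 + 2 * 8.16, about 116.3.
background = [100, 110, 90, 100]
elements = [[200, 210], [100, 120], [130, 90, 150], [50, 60]]
print(percent_positive(elements, background))          # 50.0
print(round(percent_cv([48.0, 52.0, 50.0, 50.0]), 2))  # 3.27
```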