Madeleine Carruthers, Andrey A Yurchenko, Julian J Augley, Colin E Adams, Pawel Herzyk, Kathryn R Elmer.
Abstract
BACKGROUND: Salmonid fishes exhibit high levels of phenotypic and ecological variation and are thus ideal model systems for studying the evolutionary processes of adaptive divergence and speciation. Salmonids are also of major interest in fisheries, aquaculture, and conservation research. A better understanding of the genetic mechanisms underlying traits in these species would significantly advance research in these fields. Here we generate high-quality de novo transcriptomes for four salmonid species: Atlantic salmon (Salmo salar), brown trout (Salmo trutta), Arctic charr (Salvelinus alpinus), and European whitefish (Coregonus lavaretus). Apart from Atlantic salmon, none of these species has a publicly available reference genome, and few if any genomic studies have been carried out on them to date.
Keywords: Annotation; BLAST; BUSCO; Gene Ontology (GO) analysis; OrthoFinder; RNA-seq; Salmonids; Transcriptome
Year: 2018 PMID: 29310597 PMCID: PMC5759245 DOI: 10.1186/s12864-017-4379-x
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Phylogenetic relationships of salmonids and the closest teleost outgroup, Esox lucius. Phylogenetic positions and estimated whole-genome duplication (WGD) timing follow [11]. Highlighted branches mark the species for which assemblies were generated in the current study: yellow = Atlantic salmon, green = brown trout, blue = Arctic charr, red = European whitefish.
Fig. 2 Schematic of the de novo transcriptome reconstruction and analysis pipeline used to generate the protein-coding transcriptome assemblies for Atlantic salmon, brown trout, Arctic charr and European whitefish.
Summary of sequencing data used to generate the de novo transcriptome assemblies for each species based on paired-end (2 × 75 bp) Illumina sequencing
| Feature | Atlantic salmon | Brown trout | Arctic charr | European whitefish |
|---|---|---|---|---|
| Total number of paired-end reads (millions, approx.) | 192 | 190 | 180 | 210 |
| Average number of paired-end reads per sample (millions, approx.) | 23 | 24 | 23 | 26 |
Assembly statistics for the Atlantic salmon, brown trout, Arctic charr and European whitefish de novo transcriptome assemblies
| Feature | Atlantic salmon | Brown trout | Arctic charr | European whitefish |
|---|---|---|---|---|
| Number of base pairs in cleaned reads | 64,909,254,125 | 67,282,460,986 | 65,841,176,651 | 73,342,359,278 |
| Number of paired-end reads | 191,977,874 | 190,239,319 | 180,232,708 | 209,578,198 |
| Number of base pairs in initial assembly | 182,476,550 | 179,378,175 | 156,753,048 | 162,053,186 |
| Number of transcripts in initial assembly | 235,515 | 242,899 | 200,760 | 209,920 |
| Number of base pairs in final assembly | 73,403,213 | 69,587,826 | 64,848,138 | 63,007,687 |
| Number of transcripts in final assembly | 36,505 | 35,736 | 33,126 | 33,697 |
| Average transcript length (bp) | 2011 | 1947 | 1957 | 1902 |
| Minimum transcript length (bp) | 297 | 297 | 297 | 298 |
| Maximum transcript length (bp) | 17,114 | 15,967 | 15,742 | 15,887 |
| N50 (bp) | 2464 | 2393 | 2411 | 2325 |
| N90 (bp) | 1115 | 1080 | 1087 | 1062 |
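The N50 and N90 rows are not defined in the table. As the terms are conventionally used, Nx is the length L such that transcripts of length L or longer together contain at least x% of all assembled bases. The following is a minimal sketch of that computation from a list of transcript lengths; the function name `nx` and the toy lengths are hypothetical and do not come from these assemblies.

```python
def nx(lengths: list[int], x: float) -> int:
    """Return the Nx statistic of a set of sequence lengths.

    Nx is the length L such that sequences of length >= L together
    contain at least x% of all bases (x=50 gives N50, x=90 gives N90).
    """
    if not lengths:
        raise ValueError("no sequences given")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare running / total >= x / 100 without float division.
        if running * 100 >= total * x:
            return length
    raise RuntimeError("unreachable for 0 < x <= 100")


if __name__ == "__main__":
    toy = [100, 200, 300, 400, 500]  # hypothetical lengths, 1500 bp in total
    print(f"N50 = {nx(toy, 50)} bp, N90 = {nx(toy, 90)} bp")
```

Because reaching 90% of the bases needs at least as many sequences as reaching 50%, N90 can never exceed N50, which is consistent with every column of the table.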
Fig. 3 Cumulative number of genes with an alignment to the NCBI Atlantic salmon protein database (GCF_000233375.1) at a given coverage: Atlantic salmon (yellow), brown trout (green), Arctic charr (blue) and European whitefish (red).
Summary of the complete, duplicated, fragmented and missing orthologs inferred from Benchmarking Universal Single-Copy Orthologs (BUSCO) search against the 4584 single-copy orthologs for Actinopterygii
| BUSCO statistic | Atlantic salmon | Brown trout | Arctic charr | European whitefish | PhyloFish Brown trout | PhyloFish European whitefish | NCBI Atlantic salmon RefSeq Proteins |
|---|---|---|---|---|---|---|---|
| Complete BUSCOs | 3641 (79%) | 3596 (78%) | 3589 (78%) | 3512 (76%) | 1181 (26%) | 1189 (26%) | 4476 (97%) |
| Complete and single-copy BUSCOs | 1900 (42%) | 1897 (41%) | 1988 (44%) | 1938 (42%) | 974 (21%) | 995 (22%) | 1398 (30%) |
| Complete and duplicated BUSCOs | 1741 (37%) | 1699 (37%) | 1601 (34%) | 1574 (34%) | 207 (5%) | 194 (4%) | 3078 (67%) |
| Fragmented BUSCOs | 439 (10%) | 424 (10%) | 431 (10%) | 452 (11%) | 155 (3%) | 136 (3%) | 80 (1.7%) |
| Missing BUSCOs | 504 (10%) | 564 (12%) | 564 (12%) | 620 (13%) | 3248 (71%) | 3259 (71%) | 28 (0.6%) |
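The BUSCO rows obey two identities: complete (C) is the sum of single-copy (S) and duplicated (D), and C, fragmented (F) and missing (M) together account for all 4584 Actinopterygii orthologs searched. The following is a minimal sketch that encodes those identities as a consistency check, using the brown trout column above; the class and method names are hypothetical and not part of the BUSCO tool.

```python
from dataclasses import dataclass

# Number of Actinopterygii single-copy orthologs searched (from the table caption).
TOTAL_ACTINOPTERYGII = 4584


@dataclass(frozen=True)
class BuscoSummary:
    single: int      # complete and single-copy (S)
    duplicated: int  # complete and duplicated (D)
    fragmented: int  # fragmented (F)
    missing: int     # missing (M)

    @property
    def complete(self) -> int:
        # A BUSCO is complete whether found once or more than once: C = S + D.
        return self.single + self.duplicated

    def check(self, total: int = TOTAL_ACTINOPTERYGII) -> None:
        # Every searched ortholog must be complete, fragmented or missing.
        accounted = self.complete + self.fragmented + self.missing
        if accounted != total:
            raise ValueError(f"C + F + M = {accounted}, expected {total}")


if __name__ == "__main__":
    brown_trout = BuscoSummary(single=1897, duplicated=1699, fragmented=424, missing=564)
    brown_trout.check()  # passes: 3596 + 424 + 564 == 4584
    print(f"complete = {brown_trout.complete}")
```

Running the same check on each column is a quick way to catch transcription errors in reported BUSCO counts.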
Fig. 4 Venn diagram showing the number of orthologous protein groups shared among the four salmonid transcriptome assemblies. Orthologous proteins were identified with OrthoFinder.
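Each region of a four-way Venn diagram counts the orthogroups present in exactly that combination of species. The following is a minimal sketch of that counting, assuming orthogroup membership has already been reduced to one set of orthogroup IDs per species; the function name and the orthogroup IDs are hypothetical, not OrthoFinder output from this study.

```python
from itertools import combinations


def venn_regions(sets: dict[str, set[str]]) -> dict[tuple[str, ...], int]:
    """Count the items in each exclusive region of a Venn diagram.

    Each key is a tuple of set names; an item is counted once, under the
    exact combination of sets that contain it.
    """
    names = sorted(sets)
    # Pre-create every region (2**n - 1 of them) so empty regions report 0.
    regions = {combo: 0 for r in range(1, len(names) + 1)
               for combo in combinations(names, r)}
    for item in set().union(*sets.values()):
        # names is sorted, so this tuple matches a key built by combinations().
        members = tuple(n for n in names if item in sets[n])
        regions[members] += 1
    return regions


if __name__ == "__main__":
    # Hypothetical orthogroup IDs, for illustration only.
    orthogroups = {
        "Atlantic salmon": {"OG1", "OG2", "OG3"},
        "brown trout": {"OG1", "OG2"},
        "Arctic charr": {"OG1", "OG4"},
        "European whitefish": {"OG1"},
    }
    for combo, count in venn_regions(orthogroups).items():
        if count:
            print(" & ".join(combo), count)
```

With four species there are 15 regions; the region containing all four names holds the orthogroups shared by every assembly.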
Alignment statistics of the new de novo transcriptomes mapping to the Atlantic salmon reference genome ICSASG_v2
| Assembly | Number of transcripts in assembly | Total number of transcripts mapped | % Mapped transcripts |
|---|---|---|---|
| Atlantic salmon | 36,505 | 36,305 | 99.5 |
| Brown trout | 35,736 | 35,186 | 98.5 |
| Arctic charr | 33,126 | 32,745 | 98.9 |
| European whitefish | 33,697 | 33,262 | 98.7 |
Comparison of full-length transcript reconstruction between the four current assemblies and three previously published transcriptomes for Arctic charr [34], brown trout [35] and European whitefish [35]. The table shows the number (percent) of transcripts from each assembly that aligned to the NCBI Atlantic salmon protein database (GCF_000233375.1), binned by percent coverage.
| Coverage of NCBI Atlantic salmon RefSeq proteins (%) | Atlantic salmon | Brown trout | Arctic charr | European whitefish | Magnanou et al. Arctic charr | PhyloFish Brown trout | PhyloFish European whitefish |
|---|---|---|---|---|---|---|---|
| 100 | 13,546 (37%) | 12,688 (36%) | 12,127 (37%) | 11,099 (33%) | 4411 (12%) | 19,624 (26%) | 5073 (7%) |
| 90–99 | 3072 (8%) | 3232 (9%) | 3220 (10%) | 3659 (11%) | 962 (3%) | 4574 (6%) | 1307 (2%) |
| 80–89 | 2279 (6%) | 2336 (7%) | 2102 (6%) | 2326 (7%) | 777 (2%) | 2306 (3%) | 582 (1%) |
| 70–79 | 2472 (7%) | 2439 (7%) | 2207 (7%) | 2306 (7%) | 933 (3%) | 2185 (3%) | 496 (1%) |
| 60–69 | 3026 (8%) | 2862 (8%) | 2587 (8%) | 2664 (8%) | 1142 (3%) | 2450 (3%) | 508 (1%) |
| 50–59 | 3458 (9%) | 3461 (10%) | 2966 (9%) | 3172 (9%) | 1514 (4%) | 2883 (4%) | 586 (1%) |
| 40–49 | 3936 (11%) | 3944 (11%) | 3565 (11%) | 3874 (11%) | 2097 (6%) | 3644 (5%) | 738 (1%) |
| 30–39 | 4716 (13%) | 4774 (13%) | 4352 (13%) | 4597 (14%) | 2587 (7%) | 4348 (6%) | 900 (1%) |
| 20–29 | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 3211 (9%) | 4868 (6%) | 1017 (1%) |
| 10–19 | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 3229 (9%) | 4752 (6%) | 908 (1%) |
| 0–9 | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 1709 (5%) | 2932 (4%) | 567 (1%) |
| No hit | 0 (0%) | 0 (0%) | 0 (0%) | 0 (0%) | 12,118 (35%) | 20,782 (28%) | 62,019 (83%) |
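The table groups transcripts into coverage bins of ten percentage points, plus a separate row for full-length (100%) matches and one for transcripts with no hit. The following is a minimal sketch of that binning, assuming a per-transcript percent coverage has already been computed upstream and that transcripts without a hit are recorded as `None`. The exact coverage definition used by the authors is not given here, and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional


def coverage_bin(coverage: Optional[float]) -> str:
    """Map a percent coverage to a row label; None means no hit."""
    if coverage is None:
        return "No hit"
    if not 0 <= coverage <= 100:
        raise ValueError(f"coverage out of range: {coverage}")
    if coverage == 100:
        return "100"
    # Half-open decile bins [low, low + 10), so 99.5 falls in "90-99".
    low = int(coverage // 10) * 10
    return f"{low}-{low + 9}"


def tabulate(coverages: Iterable[Optional[float]]) -> dict[str, tuple[int, int]]:
    """Return {row label: (count, percent of all transcripts, rounded)}."""
    values = list(coverages)
    counts = Counter(coverage_bin(c) for c in values)
    return {label: (n, round(100 * n / len(values))) for label, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical per-transcript coverages, for illustration only.
    print(tabulate([100, 99.5, 42.0, None]))
```

Treating the bins as half-open keeps every non-100% value in exactly one row, so the row counts for an assembly sum to its total number of transcripts.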
Fig. 5 Venn diagrams showing the number of overlapping sequences between the current and previously published transcriptome assemblies for (a) Arctic charr (current vs. ref. [34]), (b) brown trout (current vs. ref. [35]) and (c) European whitefish (current vs. ref. [35]).
Number (and %) of transcripts with significant BLAST alignments to the databases listed
| Database | Atlantic salmon | Brown trout | Arctic charr | European whitefish |
|---|---|---|---|---|
| NCBI Atlantic salmon proteins | 36,505 (100%) | 35,736 (100%) | 33,126 (100%) | 33,697 (100%) |
| SwissProt | 34,843 (95%) | 34,027 (95%) | 31,607 (95%) | 32,193 (96%) |
Fig. 6 Proportions of Gene Ontology annotations for transcripts of Atlantic salmon (yellow), brown trout (green), Arctic charr (blue) and European whitefish (red): (a) molecular function, (b) biological process and (c) cellular component.
Number and percent of putative paralogous transcripts present in each species’ assembly, as identified by OrthoFinder algorithms
| Assembly | Number of transcripts in assembly | Number of putative paralogous transcripts | % putative paralogous transcripts |
|---|---|---|---|
| Atlantic salmon | 36,505 | 13,474 | 37 |
| Brown trout | 35,736 | 12,746 | 36 |
| Arctic charr | 33,126 | 11,381 | 34 |
| European whitefish | 33,697 | 11,518 | 34 |
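The caption does not state exactly how a transcript was classed as a putative paralog. One simple reading, assumed here rather than taken from the paper, is that a transcript counts as a putative paralog when its orthogroup contains more than one transcript from the same species. The following is a minimal sketch under that assumption; the function names and the orthogroup and transcript IDs are hypothetical. The `percent` helper reproduces the whole-number percentages in the table from the reported counts.

```python
def putative_paralogs(orthogroups: dict[str, dict[str, list[str]]], species: str) -> set[str]:
    """Transcripts of `species` sharing an orthogroup with another transcript
    of the same species. This membership rule is an assumption for
    illustration, not the study's documented criterion."""
    paralogs: set[str] = set()
    for members in orthogroups.values():
        own = members.get(species, [])
        if len(own) > 1:
            paralogs.update(own)
    return paralogs


def percent(part: int, whole: int) -> int:
    """Share of `whole` as a whole-number percentage, as in the table."""
    return round(100 * part / whole)


if __name__ == "__main__":
    # Hypothetical orthogroups: {orthogroup ID: {species: [transcript IDs]}}.
    groups = {
        "OG1": {"salmon": ["s1", "s2"], "trout": ["t1"]},
        "OG2": {"salmon": ["s3"], "trout": ["t2", "t3", "t4"]},
    }
    print(sorted(putative_paralogs(groups, "salmon")))
    # Reported Atlantic salmon counts: 13,474 of 36,505 transcripts.
    print(percent(13474, 36505))
```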