Apollo Marco Lizano¹, Irina Smolina¹, Marvin Choquet¹,², Martina Kopp¹, Galice Hoarau¹
Abstract
Copepods of the zooplankton genus Calanus play a key role in the marine ecosystems of the northern seas. Although they are among the most studied organisms on Earth because of their ecological importance, genomic resources for Calanus spp. remain scarce, mostly because of their large genome sizes (6 to 12 Gbp). As an alternative to whole-genome sequencing, we sequenced and de novo assembled the transcriptomes of five Calanus species: Calanus glacialis, C. hyperboreus, C. marshallae, C. pacificus, and C. helgolandicus. Functional assignment of protein families based on Clusters of Orthologous Groups (COG) and Gene Ontology (GO) annotations showed similar patterns of protein function across species. Maximum-likelihood (ML) phylogenetic analyses of 191 protein-coding genes mined from RNA-seq data fully resolved the evolutionary relationships among the seven Calanus species investigated (five sequenced for this study and two with published datasets). Gene and site concordance factors showed that 109 of the 191 protein-coding genes support a separation into three groups: the C. finmarchicus group (C. finmarchicus, C. glacialis, and C. marshallae), the C. helgolandicus group (C. helgolandicus, C. sinicus, and C. pacificus), and the monophyletic C. hyperboreus group. The tree topology obtained in the ML analyses was similar to a previously proposed phylogeny based on morphological criteria and resolved certain ambiguities left by past studies of the evolutionary relationships among Calanus species.
Keywords: Calanus; RNA‐seq; concordance factor; de novo transcriptome; phylotranscriptomics
Year: 2022; PMID: 35228861; PMCID: PMC8861592; DOI: 10.1002/ece3.8606
Source DB: PubMed; Journal: Ecol Evol; ISSN: 2045-7758; Impact factor: 2.912
Sampling information for Calanus species from the North Atlantic, Arctic, and North Pacific Oceans used in this study
| Species | Individual ID | Date of collection | Sampling site | Latitude | Longitude | Sampling depth (m) | Developmental stage | Collector or study |
|---|---|---|---|---|---|---|---|---|
| Calanus glacialis | Cgla_007 | 06/2019 | Skjerstadfjord | 67°14′N | 14°44′E | 300–500 | CV | M. Krogstad |
| | Cgla_010 | | | | | | | |
| | Cgla_011 | | | | | | | |
| C. hyperboreus | Chype_012 | 09/2018 | West Greenland Sea | 74°34′N | 11°18′W | 0–350 | Adult female | E. Friis Møller |
| | Chype_021 | | | | | | | |
| | Chype_030 | | | | | | | |
| C. marshallae | Cmar_005 | 03/2018 | Main basin of Puget Sound | 47°40′N | 122°28′W | 0–140 | CV | A. Bucklin & B. Frost |
| | Cmar_007 | | | | | | | |
| | Cmar_008 | | | | | | | |
| C. pacificus | Cpac_006 | 03/2018 | Main basin of Puget Sound | 47°40′N | 122°28′W | 0–140 | CV | A. Bucklin & B. Frost |
| | Cpac_007 | | | | | | | |
| | Cpac_008 | | | | | | | |
| C. helgolandicus | Chelg_003 | 04/2019 | Stonehaven, north-east Scotland | 56°57′N | 02°07′W | 0–48 | CV | L. Noble |
| | Chelg_007 | | | | | | | |
| | Chelg_008 | | | | | | | |
| C. finmarchicus | Cfin_SRR1153468 | 07/2011 | Mount Desert Rock, Gulf of Maine | 44°2′N | 68°3′W | Not specified | CV | Lenz et al. ( |
| | Cfin_SRR1141107 | 05/2012 | NTNU/SINTEF Sealab facility, Trondheim, Norway | Not specified | Not specified | 70 | | Tarrant et al. ( |
| | Cfin_SRR1141110 | | | Not specified | Not specified | | | |
| C. sinicus | Csin_DRR144876 | 10/2015 | Off the coast of Japan along the Kuroshio Current | 34°00′N | 138°00′E | 0–100 | Adult female | Ohnishi et al. ( |
| | Csin_DRR144878 | | | | | | | |
| | Csin_SRP032493 | 05/2013 | Yellow Sea | 38°45′N | 121°45′E | Not specified | Adult, sex unspecified | Yang et al. ( |
| Acartia tonsa | Atonsa_Nilsson | 09/2016 | Øresund, Denmark | 56°N | 12°E | Culture | Adult | Nilsson et al. (2018) |
| Eurytemora affinis | Eurytemora_affinis | NA | Bred at WHOI for 1 year | NA | NA | Culture | Adult female | Almada & Tarrant (2016) |
Three individuals were used for each Calanus species. For C. finmarchicus and C. sinicus, previously published data were used; for these, the individual ID contains the accession number of the sequences downloaded from the NCBI SRA database.
FIGURE 1 Sampling locations for the seven Calanus species analyzed in this study
Transcriptome assembly statistics for Calanus spp. and for the two outgroup species Acartia tonsa and Eurytemora affinis (downloaded from the NCBI SRA database): total number of assembled bases, total number of genes, total number of transcripts, GC content (%), alignment rate (%), number of retained transcripts, number of peptides (ORF ≥ 100 aa), and BUSCO results
| Species | Individual ID | Total no. of assembled bases | Total no. of genes | Total no. of transcripts | GC (%) | Alignment (%) | No. of retained transcripts | No. of peptides (ORF ≥ 100 aa) | BUSCO |
|---|---|---|---|---|---|---|---|---|---|
| Calanus glacialis | Cgla_007 | 80,786,787 | 107,689 | 191,809 | 42.61 | 98.10 | 58,057 | 55,050 | C: 93.7% [S: 53.0%, D: 40.7%], F: 1.2%, M: 5.1% |
| | Cgla_010 | 52,916,092 | 107,265 | 191,130 | 42.63 | 95.64 | 82,924 | 69,473 | C: 93.8% [S: 51.8%, D: 42.0%], F: 0.8%, M: 5.4% |
| | Cgla_011 | 83,871,692 | 117,560 | 208,862 | 42.63 | 95.62 | 41,072 | 72,199 | C: 93.2% [S: 50.1%, D: 43.1%], F: 1.4%, M: 5.4% |
| C. hyperboreus | Chype_012 | 64,862,613 | 89,686 | 154,261 | 43.86 | 99.13 | 57,073 | 43,074 | C: 92.0% [S: 48.7%, D: 43.3%], F: 1.8%, M: 6.2% |
| | Chype_021 | 44,640,167 | 57,792 | 98,478 | 44.80 | 96.87 | 55,106 | 67,106 | C: 89.6% [S: 53.0%, D: 36.6%], F: 2.5%, M: 7.9% |
| | Chype_030 | 69,174,829 | 97,334 | 165,321 | 43.13 | 96.05 | 79,215 | 64,262 | C: 92.4% [S: 51.3%, D: 41.1%], F: 1.6%, M: 6.0% |
| C. marshallae | Cmar_005 | 43,950,718 | 58,868 | 86,851 | 45.35 | 96.61 | 30,505 | 34,572 | C: 88.8% [S: 59.5%, D: 29.3%], F: 1.8%, M: 9.4% |
| | Cmar_007 | 53,405,794 | 70,397 | 108,982 | 44.96 | 97.07 | 42,388 | 46,829 | C: 89.8% [S: 56.4%, D: 33.4%], F: 2.2%, M: 8.0% |
| | Cmar_008 | 30,405,070 | 39,378 | 55,916 | 45.89 | 95.86 | 48,111 | 53,770 | C: 86.7% [S: 58.7%, D: 28.0%], F: 2.1%, M: 11.2% |
| C. pacificus | Cpac_006 | 57,260,966 | 79,107 | 133,448 | 45.60 | 96.53 | 57,755 | 60,389 | C: 90.7% [S: 38.2%, D: 52.5%], F: 2.0%, M: 7.3% |
| | Cpac_007 | 65,331,309 | 85,103 | 153,092 | 45.30 | 96.68 | 48,704 | 56,214 | C: 93.5% [S: 40.4%, D: 53.1%], F: 1.8%, M: 4.7% |
| | Cpac_008 | 62,163,761 | 87,403 | 144,545 | 45.31 | 96.34 | 76,121 | 72,916 | C: 91.2% [S: 38.4%, D: 52.8%], F: 2.5%, M: 6.3% |
| C. helgolandicus | Chelg_003 | 82,334,519 | 110,120 | 199,181 | 44.70 | 97.17 | 62,642 | 67,106 | C: 93.6% [S: 39.8%, D: 53.8%], F: 1.2%, M: 5.2% |
| | Chelg_007 | 61,489,417 | 83,960 | 137,554 | 45.23 | 96.31 | 54,695 | 60,542 | C: 90.6% [S: 41.6%, D: 49.0%], F: 2.2%, M: 7.2% |
| | Chelg_008 | 45,658,464 | 61,171 | 96,333 | 45.86 | 96.34 | 44,779 | 49,783 | C: 89.3% [S: 45.3%, D: 44.0%], F: 2.9%, M: 7.8% |
| C. finmarchicus | Cfin_SRR1141107 | 29,971,015 | 43,252 | 75,504 | 46.28 | 88.89 | 53,751 | 51,970 | C: 81.3% [S: 42.2%, D: 39.1%], F: 8.1%, M: 10.6% |
| | Cfin_SRR1141110 | 39,691,175 | 53,703 | 113,701 | 45.39 | 88.73 | 25,298 | 33,135 | C: 86.7% [S: 27.8%, D: 58.9%], F: 4.7%, M: 8.6% |
| | Cfin_SRR1153468 | 62,399,753 | 90,151 | 229,051 | 44.22 | 96.20 | 34,899 | 43,427 | C: 90.2% [S: 54.5%, D: 35.7%], F: 4.3%, M: 5.5% |
| C. sinicus | Csin_SRP032493 | 61,756,777 | 102,986 | 235,405 | 46.44 | 94.98 | 67,651 | 61,031 | C: 92.1% [S: 20.2%, D: 71.9%], F: 2.1%, M: 5.8% |
| | Csin_DRR144876 | 63,116,211 | 113,366 | 282,710 | 46.23 | 94.98 | 77,949 | 63,851 | C: 91.8% [S: 11.5%, D: 80.3%], F: 2.5%, M: 5.7% |
| | Csin_DRR144878 | 58,403,986 | 106,792 | 263,104 | 46.16 | 94.67 | 69,934 | 58,166 | C: 90.4% [S: 10.4%, D: 80.0%], F: 2.6%, M: 7.0% |
| Acartia tonsa | Atonsa_Nilsson | 118,203,047 | 48,149 | 114,717 | 37.9 | 98.79 | 31,986 | 16,174 | C: 56.8% [S: 56.2%, D: 0.6%], F: 2.8%, M: 40.4% |
| Eurytemora affinis | Eaffinis_Almada | 181,412,865 | 90,855 | 170,681 | 38.61 | 95.23 | 57,397 | 24,056 | C: 71.1% [S: 69.4%, D: 1.7%], F: 2.7%, M: 26.2% |
BUSCO assessment was based on the Arthropoda lineage dataset (odb_10, containing 1103 orthologs). C = complete, S = complete and single-copy, D = complete and duplicated, F = fragmented, M = missing; values are percentages of the orthologs searched.
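In the BUSCO notation used in the table, the complete score is the sum of its single-copy and duplicated parts (C = S + D), and the complete, fragmented, and missing scores add up to 100% (C + F + M = 100), up to rounding. The following is a minimal Python sketch that parses a summary string in the format printed in the table and checks both identities. The function names `parse_busco` and `is_consistent` and the 0.15-percentage-point tolerance are illustrative assumptions, not part of BUSCO itself.

```python
import re
from typing import Dict

# Matches strings such as "C: 93.7% [S: 53.0%, D: 40.7%], F: 1.2%, M: 5.1%"
BUSCO_RE = re.compile(
    r"C:\s*(?P<C>[\d.]+)%\s*\[S:\s*(?P<S>[\d.]+)%,\s*D:\s*(?P<D>[\d.]+)%\],\s*"
    r"F:\s*(?P<F>[\d.]+)%,\s*M:\s*(?P<M>[\d.]+)%"
)


def parse_busco(summary: str) -> Dict[str, float]:
    """Parse a BUSCO summary string into a dict of percentages keyed C, S, D, F, M."""
    match = BUSCO_RE.search(summary)
    if match is None:
        raise ValueError(f"not a BUSCO summary: {summary!r}")
    return {key: float(value) for key, value in match.groupdict().items()}


def is_consistent(scores: Dict[str, float], tol: float = 0.15) -> bool:
    """Check C = S + D and C + F + M = 100 within a rounding tolerance (assumed value)."""
    complete_ok = abs(scores["C"] - (scores["S"] + scores["D"])) <= tol
    total_ok = abs(scores["C"] + scores["F"] + scores["M"] - 100.0) <= tol
    return complete_ok and total_ok


if __name__ == "__main__":
    row = "C: 93.7% [S: 53.0%, D: 40.7%], F: 1.2%, M: 5.1%"  # Cgla_007 in the table
    scores = parse_busco(row)
    print(scores["D"], is_consistent(scores))  # 40.7 True
```

A tolerance is used instead of exact equality because each percentage is rounded to one decimal place and floating-point sums are not exact.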
FIGURE 2 (a) Functional annotation of protein-coding sequences based on the Clusters of Orthologous Groups (COG) database. (b) Gene Ontology (GO) annotation of biological processes for the seven Calanus species and one outgroup taxon, Acartia tonsa
FIGURE 3 Inferred number of gene duplication events along the Calanus species tree. Numbers on each branch give the duplication events inferred on that branch and retained in all descendant species. Bar plots show the number of gene duplication events for each species. Black arrows indicate the number of proteins per species used for the inference
FIGURE 4 (a) Maximum-likelihood (ML) phylogenetic tree of seven Calanus species and two outgroup taxa, Acartia tonsa and Eurytemora affinis, based on 191 single-copy orthologs derived from transcriptomes. Bootstrap support is maximal (100%) at most nodes, except for the node grouping C. helgolandicus and C. pacificus (92%). (b) Corresponding ML tree for the Calanus spp. dataset including the two outgroup taxa. Numbers on each branch are the ML support value, the gene concordance factor (gCF), and the site concordance factor (sCF). The inset shows a scatterplot of gCF against sCF values
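The gene concordance factor (gCF) reported on the branches of the ML tree is, for a given branch of the species tree, the percentage of decisive gene trees that contain that branch. The following minimal Python sketch illustrates only that definition. It assumes that every gene tree includes all taxa (so every gene tree is decisive for every branch) and that each tree has already been reduced to its set of bipartitions. The function names `normalize` and `gene_concordance_factor` and the toy taxa A to F are hypothetical. This is not the authors' pipeline; tools such as IQ-TREE 2 compute gCF and sCF directly from tree and alignment files.

```python
from typing import FrozenSet, Iterable, List, Set

# A bipartition (split) is represented by the taxa on one side of a branch.
# To make splits comparable, we always store the side that does NOT contain
# a fixed reference taxon, so both orientations of a branch map to one key.
Split = FrozenSet[str]


def normalize(side: Iterable[str], taxa: FrozenSet[str], ref: str) -> Split:
    """Return the side of the split that excludes the reference taxon."""
    side_set = frozenset(side)
    return taxa - side_set if ref in side_set else side_set


def gene_concordance_factor(branch: Split, gene_trees: List[Set[Split]]) -> float:
    """Percentage of gene trees whose split set contains `branch`.

    Simplifying assumption: every gene tree has all taxa, so every tree is
    decisive for every branch of the species tree.
    """
    if not gene_trees:
        raise ValueError("need at least one gene tree")
    concordant = sum(1 for splits in gene_trees if branch in splits)
    return 100.0 * concordant / len(gene_trees)


if __name__ == "__main__":
    # Toy data made up for illustration: six taxa and three gene trees,
    # each given as a list of splits written from either side.
    taxa = frozenset("ABCDEF")
    ref = "A"
    raw_gene_trees = [
        [{"B", "C"}, {"D", "E", "F"}, {"E", "F"}],
        [{"A", "B", "C"}, {"D", "E"}],
        [{"A", "D"}, {"B", "C"}],
    ]
    gene_trees = [{normalize(s, taxa, ref) for s in tree} for tree in raw_gene_trees]
    # Species-tree branch separating {A, B, C} from {D, E, F}.
    branch = normalize({"A", "B", "C"}, taxa, ref)
    print(round(gene_concordance_factor(branch, gene_trees), 1))  # 66.7
```

Two of the three toy gene trees contain the branch, giving a gCF of about 66.7%. The site concordance factor (sCF) is defined analogously but at the level of individual alignment sites, so the two values can differ for the same branch.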