Tue Sparholt Jørgensen, Bent Petersen, H Cecilie B Petersen, Patrick Denis Browne, Stefan Prost, Jonathon H Stillman, Lars Hestbjerg Hansen, Benni Winding Hansen.
Abstract
Members of the crustacean subclass Copepoda are likely the most abundant metazoans worldwide. Pelagic marine species are critical in converting planktonic microalgae to animal biomass, supporting oceanic food webs. Despite their abundance and ecological importance, only six copepod genomes are publicly available, owing to a number of factors including large genome size, repetitiveness, GC-content, and small animal size. Here, we report the seventh representative copepod genome and the first genome and the first transcriptome from the calanoid copepod species Acartia tonsa Dana, which is among the most numerous mesozooplankton in boreal coastal and estuarine waters. The ecology, physiology, and behavior of A. tonsa have been studied extensively. The genetic resources contributed in this work will allow researchers to link experimental results to molecular mechanisms. From PCR-free whole genome sequence and mRNA Illumina data, we assemble the largest copepod genome to date. We estimate that A. tonsa has a total genome size of 2.5 Gb including repetitive elements we could not resolve. The nonrepetitive fraction of the genome assembly is estimated to be 566 Mb. Our DNA sequencing-based analyses suggest there is a 14-fold difference in genome size between the six members of Copepoda with available genomic information. This finding complements nucleus staining genome size estimations, where a 100-fold difference has been reported across 70 species. We briefly analyze the repeat structure in the existing copepod whole genome sequence data sets. The information presented here highlights the evolution of genome size in Copepoda and expands the scope for evolutionary inferences in Copepoda by providing several levels of genetic information from a key planktonic crustacean species.
Keywords: calanoid copepod genome; comparative genomics; genome assembly; genome size evolution; invertebrate genomics; repetitive DNA
Year: 2019 PMID: 30918947 PMCID: PMC6526698 DOI: 10.1093/gbe/evz067
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Fig. 1. Acartia tonsa genome assembly. (A) Female specimen of the DFU-ATI strain of A. tonsa used in this study. Photo by Minh Vu Thi Thuy. (B) Length and GC-content of each scaffold in the Aton1.0 assembly. Black dots are scaffolds connected using mRNA information and gray dots are all other scaffolds. In total, 351,850 scaffolds are included in Aton1.0. The scaffolds are tightly distributed around 32% GC with lengths ranging from 1 to 174 kb. Most scaffolds of about 10 kb or longer have been scaffolded using mRNA information (black dots). (C) Workflow for producing the Aton1.0 assembly from the DFU-ATI strain of A. tonsa. (D) Overview of reported genome sizes for the subclass Copepoda. The area of individual plot points is proportional to the axis value. Black dots represent information from the Animal Genome Size Database based on nucleus staining (Gregory, TR, 2018, http://www.genomesize.com) and the five open circles represent the genome sizes estimated from WGS data in this study. Within Copepoda, a 100-fold difference in genome size from the smallest Cyclopoid (pink bar, 0.14 Gb) to the largest Calanoid (blue bar, 14 Gb) can be seen. Within the order Calanoida, the genome sizes vary >10-fold between the smallest Diaptomidae (0.95 Gb) and the largest Calanidae (12 Gb). Harpactidae species are marked with a green bar and Caligidae species with a white bar.
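Panel B plots one point per scaffold, using its length and GC-content. The following is a minimal sketch of how such per-scaffold values could be derived from a FASTA file. It is not the authors' pipeline: the function names `scaffold_stats` and `_summarize` are hypothetical, the input is a toy sequence rather than Aton1.0 data, and excluding `N` characters from the GC denominator is an assumption.

```python
from io import StringIO

def _summarize(name, seq):
    # Assumption: N (gap) characters are excluded from the GC denominator
    # so that scaffolding gaps do not dilute the GC percentage.
    acgt = sum(seq.count(base) for base in "ACGT")
    gc = seq.count("G") + seq.count("C")
    gc_percent = 100.0 * gc / acgt if acgt else 0.0
    return name, len(seq), gc_percent

def scaffold_stats(fasta_handle):
    """Yield (name, length, gc_percent) for each record in a FASTA stream."""
    name, chunks = None, []
    for line in fasta_handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield _summarize(name, "".join(chunks))
            # Keep only the first word of the header as the scaffold name.
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line.upper())
    if name is not None:
        yield _summarize(name, "".join(chunks))

# Toy input for illustration only, not real Aton1.0 sequence.
toy = StringIO(">scaf1\nATGCATNN\nAT\n>scaf2\nGGCC\n")
stats = list(scaffold_stats(toy))
for name, length, gc in stats:
    print(f"{name}\t{length}\t{gc:.1f}")
```

The length here counts gap characters while the GC percentage does not; a different convention would shift both axes slightly for gap-rich scaffolds.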
Overview of Existing Genomic Resources for Copepoda
| Species | Acartia tonsa | | | | | Tigriopus californicus |
|---|---|---|---|---|---|---|
| Species info | Common estuarine and coastal species | Common estuarine and coastal species | Common estuarine and coastal species | Sea louse, aquaculture pest | Salmon louse, aquaculture pest | Common Pacific intertidal species |
| Assembly size (Mb) | 986 | 351 | 82 | 398 | 665 | 178 |
| Assembly GC-content (%) | 31 | 32 | 40 | 32 | 31 | 42 |
| Sequencing effort (Gb) | 116.3 | 39.3 | 5.1 | 37.8 | 29.8 | 42.4 |
| Contig N50 (kb) | 3.2 | 68 | 39 | 1.6 | 17 | 15 |
| Number of scaffolds | 351,850 | 6,171 | 4,626 | 288,616 | 83,165 | 2,385 |
| Reported sequencing depth | 50× | 75× | 50× | 95× | 45× | NA |
| Scaffolding | Paired end, mRNA | Paired end, Matepair | Paired end, Matepair, mRNA | Paired end | Paired end | Paired end, Matepair |
| Animal tissue | Pool of adult animals from culture | Pool of environmental eggs | Pool of environmental animals | Adult female animal | Adult female animal | Pools of animals from culture |
| Last updated | December 20, 2017 | December 13, 2017 | February 17, 2017 | May 8, 2015 | May 8, 2015 | March 19, 2015 |
| Assembly accession | OETC01 | AZAI02 | FTRT01 | LBBV01 | LBBX01 | TCALIF_genome_v1.0 |
| Reference | This study | The i5k Initiative | | Unpublished, Leong et al. | Unpublished, Leong et al. | The i5k Initiative |
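The table reports contig N50 as a contiguity measure. As a hedged illustration of the standard definition (the length L such that sequences of length L or longer contain at least half of all assembled bases), a small sketch follows. The function name `n50` and the toy lengths are illustrative only and are not taken from any of the assemblies listed.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no lengths given")

# Toy contig lengths (arbitrary units), for illustration only.
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_lengths))
```

Because N50 depends only on the length distribution, an assembly with many short contigs, such as one with several hundred thousand scaffolds, will tend to show a low N50 even when its total size is large.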
Fig. 2. Placement and characterization of the Aton1.0 assembly. (A) Placement of Aton1.0 within the genus Acartia. The Aton1.0 COI gene groups within the most well-studied North Atlantic clade of Acartia tonsa, which is in line with the origin of the culture. (B) Placement of Aton1.0 within the subclass Copepoda. Phylogenetic tree based on Bayesian analysis of a combined gene data set. Nodal support is displayed as Bayesian posterior probability at each branch. The colored bars represent the orders Calanoida (blue), Harpacticoida (outline), Cyclopoida (cyan), and Siphonostomatoida (green). The branch separating the calanoid copepods from the other orders closely resembles recent phylogenetic analyses based primarily on different genes (Eyun 2017; Khodami et al. 2017). (C) BUSCO core gene content of the genome assemblies of copepods and Drosophila melanogaster. Between 2.4% and 18% of BUSCO genes are missing from assemblies (outline), between 2.4% and 21% are fragmented (light gray), and between 1.2% and 3.2% exist in duplicate (dark gray). From 59% to 94% of BUSCO genes are complete and single copy in the assemblies. For all metrics, the A. tonsa genome assembly performs worst, which is likely a result of the large genome size. The benchmarking species D. melanogaster has 99% complete single copy core genes. The mRNA transcriptome from all life stages of A. tonsa has 91% complete genes and an additional 8% fragmented genes, indicating that the resource is very powerful for identifying whole genes. (D) Total genome sizes for copepod WGS data sets and D. melanogaster. The unassembled genome fraction is depicted in light gray, the assembled repetitive genome fraction in dark gray, the scaffolding gaps in red, and the nonrepetitive assembled fraction in black. The Aton1.0 assembly represents a genome that is estimated to be three to 20 times larger than the other copepods for which WGS resources are available through NCBI. The fraction of assembled nonrepetitive DNA is 22.7% (A. tonsa) to 53.8% (Tigriopus californicus) of the predicted total genome size, and only varies 7-fold from 75 Mb (Oithona nana) to 563 Mb (A. tonsa).
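The figures quoted for panel D can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the caption; the variable names are illustrative, and the "implied total" is a back-calculation from the rounded caption values rather than a figure reported by the authors.

```python
# Values as stated in the Fig. 2D caption.
nonrep_min_mb = 75.0          # Oithona nana, nonrepetitive assembled DNA
nonrep_max_mb = 563.0         # A. tonsa, nonrepetitive assembled DNA
nonrep_fraction_aton = 0.227  # 22.7% of the predicted total genome size

# Fold range of the nonrepetitive fraction across the data sets.
fold_range = nonrep_max_mb / nonrep_min_mb

# Total A. tonsa genome size implied by the nonrepetitive fraction.
implied_total_gb = nonrep_max_mb / nonrep_fraction_aton / 1000.0

print(f"fold range: {fold_range:.1f}")
print(f"implied total genome size: {implied_total_gb:.2f} Gb")
```

Under these inputs the ratio comes out at roughly 7.5, in line with the "7-fold" statement, and the implied total is close to the 2.5 Gb estimate given in the abstract.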
Overview of Aton1.0 Mitochondrial Resources. The identified genes are shown in black, and expected mitochondrial genes which were not identified are shown in red.
| Aton1.0 mitochondrial genes and tRNAs | Used for phylogeny |
|---|---|
| ATP6 | |
| ATP8 | |
| COI | x |
| COI2 | x |
| COI3 | x |
| CYTB | x |
| ND1 | x |
| ND2 | |
| ND3 | x |
| ND4 | x |
| ND4L | |
| ND5 | x |
| ND6 | |
| rRNA lsu | |
| rRNA ssu | |
| trnA | |
| trnC | |
| trnD | |
| trnE | |
| trnF | |
| trnG | |
| trnH | |
| trnI | |
| trnK | |
| trnL1 | |
| trnL2 | |
| trnM | |
| trnN | |
| trnP | |
| trnQ | |
| trnR | |
| trnS | |
| trnS | |
| trnT | |
| trnV | |
| trnW | |
| trnY | |
Fig. 3. Classification of repeats in copepod WGS assemblies using RepeatModeler and RepeatMasker. Although >70% of identified repeats can be classified in the model species Drosophila, only between 5% and 20% of identified repeats from copepod genomes were classified. The unassembled genome fractions described in figure 2 and the large amount of unclassified repeats in copepods together illustrate how limited current knowledge of this important animal group is.