Ilaria Zarrella, Koen Herten, Gregory E Maes, Shuaishuai Tai, Ming Yang, Eve Seuntjens, Elena A Ritschard, Michael Zach, Ruth Styfhals, Remo Sanges, Oleg Simakov, Giovanna Ponte, Graziano Fiorito.
Abstract
The common octopus, Octopus vulgaris, is an active marine predator known for the richness and plasticity of its behavioral repertoire and for its remarkable learning and memory capabilities. Octopus and other coleoid cephalopods (cuttlefish and squid) possess the largest nervous systems among invertebrates, both in cell count and in body-to-brain size. O. vulgaris has been at the center of a long tradition of research into diverse aspects of its biology. To leverage research in this iconic species, we generated 270 Gb of genomic sequencing data, complementing those available for the only other sequenced congeneric octopus, Octopus bimaculoides. We show that the two genomes are similar in size but display different levels of heterozygosity and repeats. Our data give a first quantitative glimpse into the rate of coding and non-coding regions and support the view that hundreds of novel genes may have arisen independently despite the close phylogenetic distance. We furthermore describe a reference-guided assembly and an open genomic resource (CephRes-gdatabase), opening new avenues in the study of genomic novelties in cephalopods and their biology.
Year: 2019 PMID: 30931949 PMCID: PMC6472339 DOI: 10.1038/s41597-019-0017-6
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 8.501
Main statistics from O. vulgaris sequencing data.
| Library ID | Insert Size (bp) | Read Length (bp) | Data (Gb) | Sequence Depth (X) |
|---|---|---|---|---|
| SZAXPI006102-158 | 170 | 100 | 82.15 | 29.34 |
| SZAXPI006612-13 | 250 | 150 | 52.25 | 18.66 |
| SZAXPI005989-166 | 500 | 100 | 62.05 | 22.16 |
| SZAXPI005988-169 | 800 | 100 | 53.59 | 19.14 |
| Total | - | - | 250.04 | 89.30 |
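The depth column in the sequencing table can be reproduced by dividing each library's data volume by a genome size. A minimal sketch follows, assuming a genome size of 2.8 Gb (the k-mer estimate of 2,798,419,727 bp reported in this record, rounded). That rounding is an inference from the numbers, not something the record states, and the function and variable names are illustrative.

```python
# Assumption: depth was computed against a genome size of 2.8 Gb
# (the 17-mer estimate of 2,798,419,727 bp, rounded).
GENOME_SIZE_GB = 2.8

# Data volume per library in Gb, copied from the sequencing table.
LIBRARY_DATA_GB = {
    "SZAXPI006102-158": 82.15,
    "SZAXPI006612-13": 52.25,
    "SZAXPI005989-166": 62.05,
    "SZAXPI005988-169": 53.59,
}


def sequencing_depth(data_gb, genome_size_gb=GENOME_SIZE_GB):
    """Return the fold coverage: sequenced bases divided by genome size."""
    return data_gb / genome_size_gb


if __name__ == "__main__":
    for library, data_gb in LIBRARY_DATA_GB.items():
        print(f"{library}: {sequencing_depth(data_gb):.2f}X")
    total_gb = sum(LIBRARY_DATA_GB.values())
    print(f"Total: {total_gb:.2f} Gb, {sequencing_depth(total_gb):.2f}X")
```

Under this assumption the computed values match the table to two decimal places, which suggests the depths were derived from the rounded estimate rather than from the assembled genome sizes.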
k-mer = 17 raw read statistics for Octopus vulgaris genome data.
| K-mer_num | Peak_depth | Genome Size | Used Bases | Used Reads |
|---|---|---|---|---|
| 212,679,899,304 | 76 | 2,798,419,727 | 249,873,643,000 | 2,324,608,981 |
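The values in the k-mer table appear consistent with two standard k-mer relations: a read of length L yields L - k + 1 k-mers, and genome size is approximated by the total k-mer count divided by the peak k-mer depth. The sketch below checks both against the reported numbers. It is an illustration of those relations, not the authors' pipeline, and the function names are hypothetical.

```python
K = 17

# Values copied from the k-mer statistics table.
KMER_NUM = 212_679_899_304
PEAK_DEPTH = 76
USED_BASES = 249_873_643_000
USED_READS = 2_324_608_981


def kmers_from_reads(used_bases, used_reads, k):
    """Total k-mers in a read set: each read of length L yields L - k + 1,
    so the sum over reads is bases - reads * (k - 1).
    Assumes every read is at least k bases long."""
    return used_bases - used_reads * (k - 1)


def genome_size_from_kmers(kmer_num, peak_depth):
    """Estimate genome size as total k-mers over peak depth (integer bp)."""
    return kmer_num // peak_depth


if __name__ == "__main__":
    print(kmers_from_reads(USED_BASES, USED_READS, K))
    print(genome_size_from_kmers(KMER_NUM, PEAK_DEPTH))
```

Both relations reproduce the tabulated K-mer_num and Genome Size values exactly, which supports reading the table this way.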
Assembly statistics for Octopus vulgaris.
| Assembly | # scaffolds | Genome size | N50 / L50 | N75 / L75 | Ns/100 kbp | Complete BUSCOs | Fragmented BUSCOs |
|---|---|---|---|---|---|---|---|
| ABySS k41 scaffolds | 26,350,077 | 3.30 Gb | 1,488 bp / 199,442 | 767 bp / 503,977 | 979.41 | 112 | 50 |
| ABySS k81 scaffolds | 8,918,381 | 3.31 Gb | 2,627 bp / 195,104 | 980 bp / 496,991 | 706.92 | 275 | 286 |
| Redundans k81 | 1,157,969 | 2.10 Gb | 3,958 bp / 149,577 | 2,126 bp / 330,514 | 3,961.18 | 390 | 319 |
| Chromosomer k81 | 77,683 | 1.78 Gb | 263,097 bp / 1,607 | 56,379 bp / 5,018 | 19,504.19 | 505 | 88 |
| | 151,674 | 2.34 Gb | 485,615 bp / 1,300 | 215,581 bp / 3,077 | 15,346.35 | 773 | 28 |
Statistics were generated with QUAST and a default threshold of 500 bp. See text for details.
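For readers unfamiliar with the N50/L50 and N75/L75 columns, the following is a minimal sketch of how such values are commonly defined: after sorting sequences from longest to shortest, Nx is the length of the sequence at which the running total first reaches x% of the assembly, and Lx is the number of sequences needed to get there. The 500 bp cutoff mirrors the threshold mentioned above. The function name and the toy lengths are hypothetical, and this is not QUAST's own code.

```python
def nx_lx(lengths, fraction=0.5, min_length=500):
    """Return (Nx, Lx) for a list of sequence lengths.

    Sequences shorter than min_length are ignored. Nx is the length of the
    sequence at which the cumulative sum (longest first) reaches `fraction`
    of the total; Lx is how many sequences were needed.
    """
    kept = sorted((n for n in lengths if n >= min_length), reverse=True)
    if not kept:
        raise ValueError("no sequences at or above the length threshold")
    target = fraction * sum(kept)
    running = 0
    for count, length in enumerate(kept, start=1):
        running += length
        if running >= target:
            return length, count


if __name__ == "__main__":
    # Hypothetical scaffold lengths in bp; the 100 bp entry is filtered out.
    toy_lengths = [100, 500, 600, 700, 800, 900, 1500]
    print("N50/L50:", nx_lx(toy_lengths, 0.5))
    print("N75/L75:", nx_lx(toy_lengths, 0.75))
```

A higher N50 paired with a lower L50, as in the Chromosomer row compared with the ABySS rows, indicates that fewer and longer scaffolds account for half of the assembly.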
Fig. 1 Sequencing depth and genome repetitiveness estimation from 17-mer counts in the raw read data. (a) 17-mer depth analysis using raw data, showing elevated levels of heterozygosity. (b) Cumulative proportion of 17-mers as a function of their depth, showing that at least half of the genome occurs at depth 10 or more.
Fig. 2 Proportions of the most abundant repetitive element classes in Octopus vulgaris compared to Octopus bimaculoides, based on the ab initio reconstruction of repetitive elements using the DNAPipeTE pipeline. (a) Repeat proportions in the Octopus vulgaris genome. (b) Repeat proportions in the Octopus bimaculoides genome. In both genomes, SINE elements are the most abundant repeat class. While the total number of repeats is similar in both genomes, differences in the proportions can be attributed to individual expansions of repeat elements that occurred independently in both lineages.
Fig. 3 Comparison of coding and non-coding region conservation between Octopus bimaculoides and Octopus vulgaris. (a) Alignment coverage in the coding genomic regions. (b) Alignment coverage in the non-coding, non-repetitive genomic regions. Coverage is the proportion of nucleotides in the O. bimaculoides assembly that are covered by mapped O. vulgaris reads, for coding and for non-coding, non-repetitive regions of at least 100 bp. The main peak at 1 (100% coverage) indicates regions that are completely present in the O. vulgaris genome at very low sequence divergence, whereas the secondary peak at 0 indicates regions of the O. bimaculoides genome that have no match in the O. vulgaris read data (see text for analysis).
Fig. 4 Comparison of whole genome alignments using MEGABLAST among the available octopod genomes. Only the longest scoring alignment between any given pair of scaffolds or contigs was considered. Red: percentage nucleotide identity between Callistoctopus minor and Octopus bimaculoides. Blue: percentage nucleotide identity between Octopus vulgaris and O. bimaculoides.
| Design Type(s) | species comparison design • sequence analysis objective • sequence assembly objective |
| Measurement Type(s) | whole genome sequencing assay |
| Technology Type(s) | DNA sequencing |
| Factor Type(s) | |
| Sample Characteristic(s) | Octopus vulgaris • testis • ocean biome |