Rute R da Fonseca, Alvarina Couto, Andre M Machado, Brona Brejova, Carolin B Albertin, Filipe Silva, Paul Gardner, Tobias Baril, Alex Hayward, Alexandre Campos, Ângela M Ribeiro, Inigo Barrio-Hernandez, Henk-Jan Hoving, Ricardo Tafur-Jimenez, Chong Chu, Barbara Frazão, Bent Petersen, Fernando Peñaloza, Francesco Musacchia, Graham C Alexander, Hugo Osório, Inger Winkelmann, Oleg Simakov, Simon Rasmussen, M Ziaur Rahman, Davide Pisani, Jakob Vinther, Erich Jarvis, Guojie Zhang, Jan M Strugnell, L Filipe C Castro, Olivier Fedrigo, Mateus Patricio, Qiye Li, Sara Rocha, Agostinho Antunes, Yufeng Wu, Bin Ma, Remo Sanges, Tomas Vinar, Blagoy Blagoev, Thomas Sicheritz-Ponten, Rasmus Nielsen, M Thomas P Gilbert.
Abstract
BACKGROUND: The giant squid (Architeuthis dux; Steenstrup, 1857) is an enigmatic giant mollusc with a circumglobal distribution in the deep ocean, except in the high Arctic and Antarctic waters. The elusiveness of the species makes it difficult to study. Thus, having an assembled genome for this deep-sea-dwelling species will help to resolve several pending evolutionary questions.
Keywords: cephalopod; genome assembly; invertebrate
Year: 2020; PMID: 31942620; PMCID: PMC6962438; DOI: 10.1093/gigascience/giz152
Source DB: PubMed; Journal: GigaScience; ISSN: 2047-217X; Impact factor: 6.524
Statistics of the giant squid genome assembly (Meraculous + Dovetail) and corresponding gene prediction and functional annotation
| Global statistics | Genome | Gene models with evidence |
|---|---|---|
| Genome assembly* | ||
| Input assembly | Meraculous | |
| Contig N50 length (Mb) | 0.005 | |
| Longest contig (Mb) | 0.120 | |
| Scaffold N50 length (Mb) | 4.852 | |
| Longest scaffold (Mb) | 32.889 | |
| Total length (Gb) | 2.693 | |
| BUSCO statistics (%, Euk/Met) | | |
| Complete BUSCOs | 86.1/88.5 | 81.6/78.3 |
| Complete and single-copy | 85.1/87.6 | 79.9/77.7 |
| Complete and duplicated | 1.0/0.9 | 1.7/0.6 |
| Partial | 4.3/3.6 | 9.6/5.7 |
| Missing | 9.6/7.9 | 8.8/16.0 |
| Total BUSCOs found | 90.4/92.1 | 91.2/84.0 |
| Genome annotation/gene prediction | ||
| Protein-coding gene number | 33,406 | |
| Transcript evidence | 30,472 | |
| Mean protein length (aa) | 339 | |
| Longest protein (aa) | 17,047 | |
| Mean CDS length (bp) | 1,015 | |
| Longest CDS (bp) | 51,138 | |
| Mean exon length (bp) | 199 | |
| Mean exons per gene | 5 | |
| Functional annotation (number of hits) | ||
| Swissprot | 15,749 | |
| Uniref90 | 29,553 | |
| Gene Ontology terms | 4,712 | |
| Conserved Domains Database | 15,280 | |
Transcript evidence was confirmed by blastp hits (e-value < 10E−6) against the transcriptomes of 3 other squid species (see the “Transcriptome sequencing” section).
*The presented statistics refer to contigs/scaffolds with length ≥500 bp.
Euk: database of Eukaryota orthologous genes, containing a total of 303 BUSCO groups.
Met: database of Metazoa orthologous genes, containing a total of 978 BUSCO groups.
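The contig and scaffold N50 values in the table summarise assembly contiguity: N50 is the largest length L such that sequences of length ≥ L together account for at least half of the assembled bases. The following is a minimal Python sketch of that standard definition. It assumes the ≥500 bp cutoff from the table footnote is applied before N50 is computed. The function name `n50` and the example lengths are hypothetical and are not taken from the authors' pipeline.

```python
def n50(lengths, min_length=500):
    """Return the N50 (bp) of a collection of sequence lengths.

    Sequences shorter than `min_length` are discarded first, mirroring the
    table's >=500 bp cutoff (an assumption about how the cutoff is applied).
    N50 is the largest length L such that sequences of length >= L together
    cover at least half of the retained bases.
    """
    kept = sorted((n for n in lengths if n >= min_length), reverse=True)
    if not kept:
        raise ValueError("no sequences at or above min_length")
    half = sum(kept) / 2
    running = 0
    for n in kept:
        running += n  # cumulative length, longest sequences first
        if running >= half:
            return n
    return kept[-1]  # not reached: the full sum always covers half


# Ten 300 bp fragments plus two longer contigs (illustrative lengths only).
contigs = [300] * 10 + [600, 700]
print(n50(contigs))                # 700: the short fragments are filtered out
print(n50(contigs, min_length=0))  # 300: without the cutoff they dominate
```

The example shows why the length cutoff matters. Many short fragments can pull N50 down sharply, so N50 values are only comparable between assemblies when they are computed on the same length threshold.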
Figure 1: Comparison of repeat content among the available assembled cephalopod genomes (repeat data for O. minor and O. bimaculoides from [52] and for E. scolopes from [53]). The tree indicates evolutionary relationships among the 2 available octopod cephalopods and the 2 available decapod cephalopods. Pie charts are scaled according to genome size (O. bimaculoides: 2.7 Gb, O. minor: 5.09 Gb, E. scolopes: 5.1 Gb, A. dux: 2.7 Gb), with repeat types indicated by colour.
Figure 2: (A) Stacked bar chart illustrating the proportions (expressed as percentage of the total genome) of repeats found in genic (≤2 kb from an annotated gene) and intergenic (>2 kb from an annotated gene) regions of the giant squid genome. Transposable element (TE) classes include DIRS (Dictyostelium intermediate repeat sequence 1-like elements). (B) TE accumulation history in the giant squid genome, based on a Kimura distance-based analysis of TE copy divergence, with Kimura substitution level (CpG adjusted) on the x-axis and the percentage of the genome represented by each repeat type on the y-axis. Repeat type is indicated by bar colour.
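Panel A rests on a distance rule: a repeat counts as genic when it lies within 2 kb of an annotated gene. Panel B rests on the divergence of each TE copy from its repeat consensus, measured with the Kimura two-parameter distance d = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q). Here P and Q are the proportions of compared sites that differ by transitions and transversions, respectively. The Python sketch below covers both calculations under several assumptions. Coordinates are taken as 0-based, half-open (BED-style) and on a single scaffold. The alignment is already computed. The plain Kimura formula is used, so the CpG adjustment applied in the figure is not reproduced. All function names are hypothetical, and this is not the authors' pipeline.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def distance_to_nearest_gene(repeat, genes):
    """Gap in bp between a repeat and the closest gene on the same scaffold.

    Intervals are (start, end) in 0-based, half-open coordinates; an
    overlapping gene gives distance 0. Returns math.inf if there are no genes.
    """
    r_start, r_end = repeat
    return min(
        (max(0, g_start - r_end, r_start - g_end) for g_start, g_end in genes),
        default=math.inf,
    )


def classify_repeat(repeat, genes, flank=2_000):
    """'genic' if within `flank` bp (<=2 kb) of a gene, else 'intergenic'."""
    if distance_to_nearest_gene(repeat, genes) <= flank:
        return "genic"
    return "intergenic"


def kimura_2p(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned sequences.

    Only columns where both bases are A/C/G/T are compared; gaps and
    ambiguity codes are skipped. No CpG adjustment is applied here.
    """
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        return math.inf  # saturated: the formula has no finite value
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


print(classify_repeat((5000, 5100), [(0, 3000)]))  # genic (gap of exactly 2,000 bp)
print(kimura_2p("ACGTACGTAC", "GCGTACGTAC"))       # one A->G transition in 10 sites
```

A repeat separated from a gene by exactly 2,000 bp is counted as genic, which matches the caption's ≤2 kb boundary. A landscape like panel B would then bin copies by divergence and sum their aligned lengths as a percentage of genome size.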
Figure 3: Schematic representation of the Hox gene clusters. Different scaffolds are separated by 2 slashes. (A) Simplified classification of Hox cluster genomic organization. Type A indicates the absence of a “typical” Hox cluster configuration, i.e., the genes are scattered across the genome rather than placed close together; Type S indicates a Hox cluster that is split by a chromosomal breakpoint; Type D clusters contain all the genes in the same location but span a larger region than organized clusters, and may include non-Hox genes and repeats between them; Type O indicates a very compact cluster spanning a short region that contains only Hox genes. Non-coding RNAs and microRNAs may also be present. (B) Simplified scheme of the chromosomal organization in various invertebrates. Scaffold length is shown underneath. Unlike in other coleoids, in Architeuthis dux all Hox genes were found on the same scaffold (scaffold 25). However, the distances between the genes were larger than expected for invertebrate organisms, and non-homeobox genes were also present within the cluster. Hox2 remains undetected in coleoids. Assemblies and Hox cluster details for E. scolopes, O. bimaculoides, L. gigantea, C. teleta, and D. melanogaster can be found in [11, 53, 59, 70]. The asterisk indicates a gene that was reported on a different scaffold, adjacent to non-Hox genes (the length corresponds to the size of the gene). (C) Complete representation of the Hox cluster found in A. dux, including the non-Hox genes. PO: predicted open reading frame; TATDN2: putative deoxyribonuclease TATDN2; ZMYM1: zinc finger MYM-type protein 1; POGK: pogo transposable element with KRAB; zinc finger: zinc finger protein; MYB-like: putative Myb-like DNA-binding domain protein; MAPRE1: microtubule-associated protein RP/EB family member 1; MGC12965: similar to cytochrome c, somatic.