Wade R Roberts, Kala M Downey, Elizabeth C Ruck, Jesse C Traller, Andrew J Alverson.
Abstract
The diatom, Cyclotella cryptica, is a well-established model species for physiological studies and biotechnology applications of diatoms. To further facilitate its use as a model diatom, we report an improved reference genome assembly and annotation for C. cryptica strain CCMP332. We used a combination of long- and short-read sequencing to assemble a high-quality and contaminant-free genome. The genome is 171 Mb in size and consists of 662 scaffolds with a scaffold N50 of 494 kb. This represents a 176-fold decrease in scaffold number and 41-fold increase in scaffold N50 compared to the previous assembly. The genome contains 21,250 predicted genes, 75% of which were assigned putative functions. Repetitive DNA comprises 59% of the genome, and an improved classification of repetitive elements indicated that a historically steady accumulation of transposable elements has contributed to the relatively large size of the C. cryptica genome. The high-quality C. cryptica genome will serve as a valuable reference for ecological, genetic, and biotechnology studies of diatoms.
Keywords: algal biofuels; horizontal gene transfer; lipids; nanopore; transposable elements
Year: 2020 PMID: 32709619 PMCID: PMC7466962 DOI: 10.1534/g3.120.401408
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Table 1. Genome characteristics for P. tricornutum, F. cylindrus, C. nana, and C. cryptica
| | P. TRICORNUTUM | F. CYLINDRUS | C. NANA | C. CRYPTICA VERSION 1.0 | C. CRYPTICA VERSION 2.0 |
|---|---|---|---|---|---|
| GENOME SIZE, MB | 27.4 | 61.1 | 32.4 | 161.8 | 171.1 |
| NUMBER OF SCAFFOLDS | 33 | 271 | 27 | 116,815 | 662 |
| N50 LENGTH, KB | 945 | 1,295.6 | 1,992 | 12 | 494 |
| MEDIAN SCAFFOLD LENGTH, KB | 703.2 | 17.2 | 965.0 | 0.2 | 139.0 |
| GC CONTENT, % | 49 | 39 | 47 | 43 | 43 |
| REPETITIVE ELEMENTS, % | 12 | Not available | 2 | 54 | 59 |
| COMPLETE EUKARYOTIC BUSCO COUNT (%) | 200 (78.5%) | 199 (78.0%) | 191 (74.9%) | 183 (71.8%) | 191 (74.9%) |
| GENBANK ACCESSION NUMBER | GCA_000150955.2 | GCA_001750085.1 | GCA_000149405.2 | None | GCA_013187285.1 |
| PLASTID GENOME SIZE, BP | 117,369 | 123,275 | 128,814 | 129,320 | 129,328 |
| MITOCHONDRIAL GENOME SIZE, BP | 77,356 | 58,295 | 43,827 | 58,021 | 46,485 |
| REFERENCE | | | | | This study |
BUSCO was run in genome mode against the eukaryota_odb10 dataset.
Available from
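The fold changes quoted in the abstract can be checked against the last two columns of the genome characteristics table, which hold the previous and the updated C. cryptica assemblies. The snippet below is only an arithmetic sketch; the variable names are illustrative and not taken from the study.

```python
# Values copied from the two C. cryptica columns of the genome
# characteristics table (previous assembly vs. updated assembly).
previous = {"scaffolds": 116_815, "n50_kb": 12}
updated = {"scaffolds": 662, "n50_kb": 494}

# Fold decrease in scaffold number and fold increase in scaffold N50.
scaffold_fold_decrease = previous["scaffolds"] / updated["scaffolds"]
n50_fold_increase = updated["n50_kb"] / previous["n50_kb"]

print(f"{scaffold_fold_decrease:.0f}-fold fewer scaffolds")
print(f"{n50_fold_increase:.0f}-fold larger scaffold N50")
```

Rounded to whole numbers, the two ratios match the 176-fold and 41-fold figures reported in the abstract.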
Figure 1. Improved genome assembly for Cyclotella cryptica. (A) Cumulative scaffold length and N50 comparison in the version 1.0 and version 2.0 assemblies. Summary statistics for each assembly are given in Table 1. (B) BUSCO analysis of selected diatom genomes using the set of 255 conserved eukaryotic single-copy orthologs. Bars show the proportions of genes found in each assembly as a percentage of the total gene set.
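For readers unfamiliar with the N50 statistic compared in panel A, the following is a minimal sketch of the standard definition: the length of the scaffold at which the cumulative length of scaffolds, sorted from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the toy lengths are hypothetical and are not data from either assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths."""
    ordered = sorted(lengths, reverse=True)  # longest scaffold first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length  # cumulative scaffold length
        if running >= half_total:
            return length
    raise ValueError("no scaffold lengths given")


# Hypothetical toy assembly, not taken from the paper.
toy_lengths = [500, 400, 300, 200, 100]
print(n50(toy_lengths))
```

For the toy list the total is 1,500, so the cumulative length first reaches 750 at the second scaffold and the function returns 400.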
Figure 2. The updated assembly of Cyclotella cryptica is highly contiguous and contaminant-free. Blobplots showing the taxon-annotated GC content and coverage of (A) the version 1.0 assembly, and (B) the version 2.0 genome assembly after contaminant filtering. Legend format: “superkingdom (number of scaffolds; length of scaffolds; scaffold N50 length)”.
Table 2. Summary of the Cyclotella cryptica genome annotations
| | VERSION 1.0 | VERSION 2.0 |
|---|---|---|
| TOTAL GENE MODELS | 21,121 | 21,250 |
| TOTAL GENE LENGTH, MB (%) | 31.07 (19.2%) | 44.35 (25.9%) |
| GENE DENSITY (GENES PER MB) | 131 | 124 |
| MEAN GENE SIZE, BP | 1,471 | 2,087 |
| TOTAL CODING LENGTH, MB (%) | 27.96 (17.3%) | 41.84 (24.3%) |
| EXONS PER GENE | 2.18 | 4.30 |
| MEAN EXON LENGTH, BP | 608 | 722 |
| MEAN INTRON LENGTH, BP | 125 | 152 |
| TOTAL TRANSCRIPT ISOFORMS | 23,235 | 31,409 |
| AVERAGE TRANSCRIPT ISOFORMS PER GENE | 1.10 | 1.48 |
| PROTEINS WITH PFAM DOMAIN (%) | 10,384 (44.7%) | 14,518 (46.2%) |
| PROTEINS WITH INTERPROSCAN HIT (%) | 14,565 (62.7%) | 19,690 (62.7%) |
| PROTEINS WITH SWISSPROT HIT (%) | 6,219 (26.8%) | 13,054 (41.6%) |
| PROTEINS WITH UNIPROT HIT (%) | 16,495 (71.0%) | 23,530 (74.9%) |
| GENE MODELS WITH AED < 0.5 (%) | Not determined | 20,506 (96.5%) |
| COMPLETE EUKARYOTIC BUSCO COUNT (%) | 184 (72.2%) | 192 (75.3%) |
BUSCO was run in protein mode against the eukaryota_odb10 dataset.
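Several rows of the annotation summary are ratios of other rows, so they can be recomputed as a consistency check. The sketch below does this for the version 2.0 column, using the 171.1 Mb genome size from the genome characteristics table. One assumption is inferred from the numbers rather than stated in the source: the protein percentages appear to use the total number of transcript isoforms as the denominator.

```python
# Version 2.0 values copied from the tables in this record.
genome_size_mb = 171.1
gene_models = 21_250
total_gene_length_mb = 44.35
transcript_isoforms = 31_409
proteins_with_pfam = 14_518

gene_density = gene_models / genome_size_mb                    # genes per Mb
mean_gene_size_bp = total_gene_length_mb * 1e6 / gene_models   # bp per gene
isoforms_per_gene = transcript_isoforms / gene_models
# Assumption: percentages are relative to transcript isoforms, not genes.
pfam_percent = 100 * proteins_with_pfam / transcript_isoforms

print(f"gene density: {gene_density:.0f} genes per Mb")
print(f"mean gene size: {mean_gene_size_bp:,.0f} bp")
print(f"isoforms per gene: {isoforms_per_gene:.2f}")
print(f"proteins with Pfam domain: {pfam_percent:.1f}%")
```

Under that assumption the recomputed values agree with the tabulated 124 genes per Mb, 2,087 bp, 1.48 isoforms per gene, and 46.2%.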
Figure 3. Repeat content of the Cyclotella cryptica genome. (A) Repeat content in the version 1.0 and version 2.0 assemblies. Bars show the proportions of the genome assemblies masked and annotated by RepeatMasker. (B) Age distribution of transposable elements in the C. cryptica version 2.0 genome. The total amount of DNA in each TE class was split into bins of 1% Kimura divergence, shown on the x-axis (see Methods). Abbreviations: DNA, DNA transposon; LINE, long interspersed nuclear element; LTR, long terminal repeat retrotransposon; RC, rolling circle transposon (Helitron); SINE, short interspersed nuclear element.
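The binning described for panel B can be illustrated with a short sketch. It assumes that each annotated repeat copy is available as a (TE class, Kimura divergence in percent, length in bp) record; the function name `bin_by_divergence` and the example records are hypothetical, and this is not the authors' pipeline.

```python
from collections import defaultdict


def bin_by_divergence(records, bin_width=1.0):
    """Sum base pairs per TE class within fixed-width divergence bins.

    Returns {te_class: {bin_index: total_bp}}, where bin_index * bin_width
    is the lower edge of the bin in percent Kimura divergence.
    """
    landscape = defaultdict(lambda: defaultdict(int))
    for te_class, divergence, length_bp in records:
        bin_index = int(divergence // bin_width)
        landscape[te_class][bin_index] += length_bp
    return landscape


# Hypothetical repeat copies, not data from the paper.
records = [
    ("LTR", 0.4, 1200),
    ("LTR", 0.9, 800),
    ("LTR", 5.2, 300),
    ("DNA", 5.7, 450),
    ("LINE", 12.0, 150),
]

landscape = bin_by_divergence(records)
for te_class, bins in landscape.items():
    for bin_index in sorted(bins):
        print(te_class, f"{bin_index}-{bin_index + 1}%", bins[bin_index])
```

Low-divergence bins correspond to recently inserted copies and high-divergence bins to older ones, which is why a histogram of this kind can be read as an age distribution.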