Thomas Hackl, Roman Martin, Karina Barenhoff, Sarah Duponchel, Dominik Heider, Matthias G Fischer.
Abstract
The heterotrophic stramenopile Cafeteria roenbergensis is a globally distributed marine bacterivorous protist. This unicellular flagellate is host to the giant DNA virus CroV and the virophage mavirus. We sequenced the genomes of four cultured C. roenbergensis strains and generated 23.53 Gb of Illumina MiSeq data (99-282× coverage per strain) and 5.09 Gb of PacBio RSII data (13-45× coverage). Using the Canu assembler and customized curation procedures, we obtained high-quality draft genome assemblies with a total length of 34-36 Mbp per strain and contig N50 lengths of 148 kbp to 464 kbp. The C. roenbergensis genome has a GC content of ~70%, a repeat content of ~28%, and is predicted to contain approximately 7857-8483 protein-coding genes based on a combination of de novo, homology-based and transcriptome-supported annotation. These first high-quality genome assemblies of a bicosoecid fill an important gap in sequenced stramenopile representatives and enable a more detailed evolutionary analysis of heterotrophic protists.
Year: 2020 PMID: 31964893 PMCID: PMC6972860 DOI: 10.1038/s41597-020-0363-4
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Fig. 1 Sampling locations and phylogenetic relationship of Cafeteria roenbergensis strains. (a) Map representing the sampling sites of the four C. roenbergensis strains around the Americas. (b) Maximum likelihood tree reconstructed from a concatenated alignment of 123 shared single-copy core genes for the four C. roenbergensis strains and their outgroup Halocafeteria seosinensis. Numbers next to internal nodes indicate bootstrap support based on 100 iterations. The branch to the outgroup represented by a dashed line has been shortened for visualization.
Strain and sample information.
| Species | Strain | Location | Coordinates | Year | Biosample | Roscoff ID |
|---|---|---|---|---|---|---|
| C. roenbergensis | E4-10P | North Pacific, 5 km west of Yaquina Bay, Oregon | 44.62N 124.06W | 1989 | SAMN12216681 | RCC:4624 |
| C. roenbergensis | E4-10M1 | North Pacific, 5 km west of Yaquina Bay, Oregon | 44.62N 124.06W | 1989 | SAMN12216695 | RCC:4625 |
| C. roenbergensis | BVI | The British Virgin Islands | 18.42N 64.61W | 2012 | SAMN12216698 | |
| C. roenbergensis | Cflag | North Atlantic Ocean, Woods Hole, MA | 41.52N 70.67W | 1986 | SAMN12216699 | |
| C. roenbergensis | RCC970-E3 | South Pacific Ocean, 2200 km off the coast of Chile | 30.78S 95.43W | 2004 | SAMN12216700 | RCC:4623 |
Sequencing information and library statistics.
| Strain | Instrument | Library layout | # Libraries | Library size (Gbp) | Coverage (×) | SRA study accession | SRA run accession |
|---|---|---|---|---|---|---|---|
| E4-10P | Illumina MiSeq | paired | 2 | 6.85 | 171 | SRP215872 | SRR9724619 |
| E4-10P | PacBio RS II | single | 2 | 0.52 | 13 | SRP215872 | SRR9724618 |
| E4-10M1 | Illumina MiSeq | paired | 2 | 4.45 | 111 | SRP215872 | SRR9724621 |
| E4-10M1 | PacBio RS II | single | 2 | 1.3 | 32 | SRP215872 | SRR9724620 |
| BVI | Illumina MiSeq | paired | 1 | 4.31 | 108 | SRP215872 | SRR9724615 |
| BVI | PacBio RS II | single | 3 | 1.8 | 45 | SRP215872 | SRR9724614 |
| Cflag | Illumina MiSeq | paired | 1 | 3.94 | 99 | SRP215872 | SRR9724617 |
| Cflag | PacBio RS II | single | 2 | 0.95 | 24 | SRP215872 | SRR9724616 |
| RCC970-E3 | Illumina MiSeq | paired | 1 | 3.98 | 100 | SRP215872 | SRR9724623 |
| RCC970-E3 | PacBio RS II | single | 2 | 0.52 | 13 | SRP215872 | SRR9724622 |
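
The coverage column above appears to match library size divided by a genome size of about 40 Mbp, the k-mer-based haploid estimate given in the Fig. 2 caption. The following is a minimal sketch of that arithmetic, not the authors' procedure. The 40 Mbp denominator is an assumption inferred from the numbers, and small differences from the table are expected because the listed library sizes are rounded.

```python
# Assumption: coverage is computed against the ~40 Mbp haploid genome size
# estimate from the k-mer analysis (Fig. 2), not against the assembly length.
GENOME_SIZE_MBP = 40.0


def fold_coverage(library_size_gbp, genome_size_mbp=GENOME_SIZE_MBP):
    """Return fold coverage for a library of the given size in Gbp."""
    return library_size_gbp * 1000.0 / genome_size_mbp


# (library size in Gbp, coverage reported in the table), copied from the table.
libraries = {
    ("E4-10P", "Illumina MiSeq"): (6.85, 171),
    ("E4-10P", "PacBio RS II"): (0.52, 13),
    ("E4-10M1", "Illumina MiSeq"): (4.45, 111),
    ("E4-10M1", "PacBio RS II"): (1.3, 32),
    ("BVI", "Illumina MiSeq"): (4.31, 108),
    ("BVI", "PacBio RS II"): (1.8, 45),
    ("Cflag", "Illumina MiSeq"): (3.94, 99),
    ("Cflag", "PacBio RS II"): (0.95, 24),
    ("RCC970-E3", "Illumina MiSeq"): (3.98, 100),
    ("RCC970-E3", "PacBio RS II"): (0.52, 13),
}

for (strain, instrument), (size_gbp, reported) in libraries.items():
    computed = fold_coverage(size_gbp)
    print(f"{strain:10s} {instrument:15s} computed {computed:7.2f}x, reported {reported}x")
```

Summing the MiSeq library sizes in the same table gives 23.53 Gbp and the PacBio sizes give 5.09 Gbp, which agrees with the totals stated in the abstract.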
Fig. 2 K-mer frequency distribution and estimated genome size of Cafeteria roenbergensis strain E4-10P. Frequency distribution of 19-mers in the quality-trimmed MiSeq read set of CrE4-10P. The major peak at ~120× coverage corresponds to the majority of homozygous k-mers of the diploid (2n) genome; the smaller peak at half that coverage comprises haplotype-specific (1n) k-mers. Small peaks at 3n and 4n represent regions of higher copy number. Low-coverage k-mers derive from sequencing errors and bacterial contamination. Cumulatively, the k-mer distribution suggests an approximate haploid genome size of 40 Mbp.
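
The reasoning behind a k-mer-based genome size estimate can be sketched as follows: discard low-depth k-mers attributed to errors and contamination, sum the remaining k-mer occurrences, and divide by the depth of the homozygous (2n) peak. This is a generic illustration on made-up numbers, not the tool or parameters used for the figure. The function name, the error cutoff, and the toy histogram are all hypothetical.

```python
def estimate_haploid_genome_size(histogram, error_cutoff):
    """Estimate haploid genome size from a k-mer depth histogram.

    histogram maps k-mer depth -> number of distinct k-mers at that depth.
    Depths below error_cutoff are treated as errors or contamination.
    The most populated remaining depth is taken as the homozygous (2n) peak,
    which only holds when that peak is the tallest one, as in Fig. 2.
    """
    kept = {depth: n for depth, n in histogram.items() if depth >= error_cutoff}
    if not kept:
        raise ValueError("no k-mers left after applying the error cutoff")
    peak_depth = max(kept, key=kept.get)
    total_kmer_occurrences = sum(depth * n for depth, n in kept.items())
    return total_kmer_occurrences / peak_depth


# Toy histogram (hypothetical values): an error tail at depth 1-2,
# a heterozygous peak at 60, the main peak at 120, a repeat peak at 240.
toy_histogram = {1: 5000, 2: 800, 60: 100, 120: 900, 240: 10}
print(estimate_haploid_genome_size(toy_histogram, error_cutoff=10))
```

Dividing by the 2n peak depth yields a haploid size because each haploid copy is sequenced at half that depth, so the total k-mer count equals the haploid size times the 2n peak depth.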
Assembly and annotation statistics.
| Assembly | # Contigs | Total size (bp) | Contig N50 (bp) | % GC | % Repeats | % BUSCOs | # Proteins | GenBank accession |
|---|---|---|---|---|---|---|---|---|
| CrE4-10P | 218 | 35,335,825 | 402,892 | 70.5 | 27.8 | 83.8 | 8364 | VLTO01000000 |
| CrBVI | 170 | 36,327,047 | 460,467 | 70.1 | 27.7 | 83.2 | 8483 | VLTN01000000 |
| CrCflag | 270 | 34,521,237 | 231,394 | 70.5 | 27.9 | 82.8 | 8018 | VLTM01000000 |
| CrRCC970-E3 | 396 | 33,988,271 | 148,311 | 70.5 | 27.9 | 81.8 | 7857 | VLTL01000000 |
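
For readers unfamiliar with the contig N50 column, the statistic is the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly size. A minimal sketch on hypothetical contig lengths (the real contig lengths are not listed here):

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(contig_lengths) / 2
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly: total 30 bp, half is 15 bp,
# which is reached inside the second-longest contig.
print(n50([2, 10, 4, 8, 6]))
```

A higher N50 at a similar total size, as for CrBVI compared with CrRCC970-E3 in the table, indicates a more contiguous assembly.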
Fig. 3 Completeness assessment of Cafeteria genome assemblies based on single-copy orthologs. (a) Abundance of 303 single-copy core gene markers (BUSCOs) in different categories and assemblies. (b) Distribution of BUSCOs missing in at least one assembly (black tiles).
Primers used for the validation of the intron-exon structure in two genes of C. roenbergensis strain RCC970-E3.
| Primer name | Forward (5′ to 3′) | Reverse (5′ to 3′) |
|---|---|---|
| TBP-PCR#1 | CCGCGATGCTTCTGCCTCCA | CGCGCAGTCGAGATTCACAGT |
| TBP-PCR#2 | GCCATCACCAAGCACGGGATCA | CGCGCAGTCGAGATTCACAGT |
| 60sRP-PCR#3 | CGCAACCAGACCAAGTTCCACG | GTACGCCAGAGCATGCGGGA |
| 60sRP-PCR#4 | CGCACTGAGGAGGTGAACGTC | GCGGGTTGGTGTTCCGCTTC |
| Measurement(s) | DNA • mitochondrial_DNA • sequence_assembly • sequence feature annotation |
| Technology Type(s) | DNA sequencing • genome assembly • sequence annotation |
| Sample Characteristic - Organism | Cafeteria roenbergensis |
| Sample Characteristic - Environment | marine water body |
| Sample Characteristic - Location | Northwest Atlantic Ocean • Caribbean Sea • Southeast Pacific Ocean • North East Pacific Ocean |