Jon Bråte, Janina Fuss, Shruti Mehrota, Kjetill S Jakobsen, Dag Klaveness.
Abstract
Hydrurus foetidus is a freshwater chrysophyte alga that thrives in cold rivers in polar and high alpine regions. It has several morphological traits reminiscent of single-celled eukaryotes, but can also form macroscopic thalli. Despite its ability to produce polyunsaturated fatty acids, its life under cold conditions and its variable morphology, very little is known about its genome and transcriptome. Here, we present an extensive set of next-generation sequencing data, including genomic short reads from Illumina sequencing and long reads from Nanopore sequencing, as well as full-length cDNAs from PacBio IsoSeq sequencing and a small RNA dataset (smaller than 200 bp) sequenced with Illumina. The genome sequences were combined to produce an assembly consisting of 5069 contigs, with a total assembly size of 171 Mb and 77% BUSCO completeness. The new data generated here may contribute to a better understanding of the evolution and ecological roles of chrysophyte algae, and may help resolve branching patterns at a larger phylogenetic scale.
Keywords: Chrysophyceae; Hydrurus foetidus; Nanopore; PacBio; genome; golden algae; transcriptome
Year: 2019 PMID: 31632652 PMCID: PMC6784874 DOI: 10.12688/f1000research.16734.3
Source DB: PubMed Journal: F1000Res ISSN: 2046-1402
Overview of datasets produced in this study.
| Dataset | Description | Accession |
|---|---|---|
| Hfoetidus_ACAGTG_L001_R1_001.fastq.gz | Genomic DNA sequenced with Illumina HiSeq 2500. | ERR2882522 |
| Hfoetidus_ACAGTG_L002_R1_001.fastq.gz | Genomic DNA sequenced with Illumina HiSeq 2500. | ERR3188711 |
| Hydrurus_nanopore_fastq_files.tar.gz | Basecalled Oxford Nanopore reads | ERR2887871 |
| IsoSeq_1-2kb_polished_low_qv_consensus_isoforms.fastq.gz | mRNA sequenced with PacBio SMRT RSII. | ERR2882521 |
| IsoSeq_1-2kb_polished_high_qv_consensus_isoforms.fastq.gz | mRNA sequenced with PacBio SMRT RSII. | ERR2869477 |
| IsoSeq_2-3kb_polished_low_qv_consensus_isoforms.fastq.gz | mRNA sequenced with PacBio SMRT RSII. | ERR2869481 |
| IsoSeq_2-3kb_polished_high_qv_consensus_isoforms.fastq.gz | mRNA sequenced with PacBio SMRT RSII. | ERR2869478 |
| IsoSeq_3-6kb_polished_low_qv_consensus_isoforms.fastq.gz | mRNA sequenced with PacBio SMRT RSII. | ERR2869484 |
| IsoSeq_3-6kb_polished_high_qv_consensus_isoforms.fastq.gz | mRNA sequenced with PacBio SMRT RSII. | ERR2869483 |
| 1-Hfo-miRNA_S6_R1_001.fastq.gz | Small RNA sequenced with Illumina NextSeq 500. | ERR2869485 |
| pilon_round3.fasta.gz | Draft genome assembly | GCA_900617105.1 |
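
The run accessions in the table above can be turned into file-listing queries. The following is a minimal sketch, assuming the ENA Portal API `filereport` endpoint and the field names `fastq_ftp` and `submitted_ftp`; neither the endpoint nor the function name `filereport_url` comes from the source, and the sketch only builds URLs without contacting the server.

```python
from urllib.parse import urlencode

# Run accessions copied from the dataset table above.
RUN_ACCESSIONS = [
    "ERR2882522", "ERR3188711", "ERR2887871", "ERR2882521", "ERR2869477",
    "ERR2869481", "ERR2869478", "ERR2869484", "ERR2869483", "ERR2869485",
]

# Assumption: ENA Portal API file-report endpoint (not named in the source).
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession: str) -> str:
    """Build a file-report query URL for a single run accession."""
    query = urlencode({
        "accession": accession,
        "result": "read_run",
        # Assumed field names; adjust if the API reports them as unknown.
        "fields": "run_accession,fastq_ftp,submitted_ftp",
        "format": "tsv",
    })
    return f"{ENA_FILEREPORT}?{query}"


# One query URL per run accession listed in the table.
urls = [filereport_url(acc) for acc in RUN_ACCESSIONS]
```

Each URL in `urls` can be fetched with any HTTP client. The draft assembly (GCA_900617105.1) is an assembly accession rather than a run, so it is left out of this list.
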
Summary of the read numbers in the different file types of the IsoSeq data set, by size fraction.
| Library | < 1kb | 1-2kb | 2-3kb | 3-4kb | 4-5kb | 5-6kb | > 6kb |
|---|---|---|---|---|---|---|---|
| 1-2kb_high | 7310 | 31953 | 17 | 0 | 0 | 0 | 0 |
| 1-2kb_low | 903 | 5908 | 170 | 110 | 89 | 39 | 80 |
| 2-3kb_high | 596 | 2703 | 37399 | 217 | 443 | 147 | 0 |
| 2-3kb_low | 78 | 586 | 7749 | 116 | 301 | 134 | 215 |
| 3-6kb_high | 8 | 13 | 552 | 28621 | 4603 | 20 | 388 |
| 3-6kb_low | 0 | 6 | 268 | 8830 | 1535 | 74 | 418 |
The suffixes “_high” and “_low” refer to the high- and low-quality sequences produced by the IsoSeq sequencing. The data files have the following accession numbers: 1-2kb_high: ERR2869477; 1-2kb_low: ERR2882521; 2-3kb_high: ERR2869478; 2-3kb_low: ERR2869481; 3-6kb_high: ERR2869483; 3-6kb_low: ERR2869484.
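
To see how well each library matches its nominal size selection, the counts in the table can be summed per library and compared with the reads falling inside the nominal range. This is a small sketch of that arithmetic. The names `READ_COUNTS`, `NOMINAL_BINS` and `summarize` are illustrative, and treating the 3-6kb libraries as covering the 3-4kb, 4-5kb and 5-6kb bins is an assumption of this sketch, not a statement from the source.

```python
# Size bins, in the column order of the table above.
SIZE_BINS = ["<1kb", "1-2kb", "2-3kb", "3-4kb", "4-5kb", "5-6kb", ">6kb"]

# Read counts copied from the table, one list per library.
READ_COUNTS = {
    "1-2kb_high": [7310, 31953, 17, 0, 0, 0, 0],
    "1-2kb_low": [903, 5908, 170, 110, 89, 39, 80],
    "2-3kb_high": [596, 2703, 37399, 217, 443, 147, 0],
    "2-3kb_low": [78, 586, 7749, 116, 301, 134, 215],
    "3-6kb_high": [8, 13, 552, 28621, 4603, 20, 388],
    "3-6kb_low": [0, 6, 268, 8830, 1535, 74, 418],
}

# Assumption: which table bins count as "in range" for each size selection.
NOMINAL_BINS = {
    "1-2kb": {"1-2kb"},
    "2-3kb": {"2-3kb"},
    "3-6kb": {"3-4kb", "4-5kb", "5-6kb"},
}


def summarize(library: str) -> tuple[int, float]:
    """Return (total reads, fraction of reads inside the nominal size range)."""
    counts = READ_COUNTS[library]
    total = sum(counts)
    # Strip the "_high" / "_low" suffix to get the size selection.
    nominal = NOMINAL_BINS[library.rsplit("_", 1)[0]]
    in_range = sum(c for b, c in zip(SIZE_BINS, counts) if b in nominal)
    return total, in_range / total


for name in READ_COUNTS:
    total, fraction = summarize(name)
    print(f"{name}: {total} reads, {fraction:.1%} in nominal range")
```
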
Statistics of the Hydrurus foetidus draft genome assembly.
| Number of contigs | Largest contig | Contig N50 | Assembly size | Estimated genome size (a) | Complete and fragmented BUSCOs (b) | GC (%) |
|---|---|---|---|---|---|---|
| 5069 | 5118963 bp | 43856 bp | 171 Mb | 299.9 Mb | 77.2% | 45.4 |
(a) The genome size estimation was based on k-mer frequencies in the Illumina data.
(b) BUSCO was run against the Eukaryota dataset.
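
Two of the figures in the table lend themselves to a short worked example: the contig N50 and the share of the estimated genome covered by the assembly. The sketch below uses the standard definition of N50 (the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, reaches half the assembly size). The source does not state which tool computed its N50, so the function `n50` is an illustration of the definition and not the authors' method.

```python
def n50(lengths: list[int]) -> int:
    """Contig length at which the cumulative sum, taken from the longest
    contig downwards, first reaches at least half of the total length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contig lengths given")


# Values copied from the assembly statistics table above.
ASSEMBLY_SIZE_MB = 171.0
ESTIMATED_GENOME_SIZE_MB = 299.9

# Share of the k-mer-based genome size estimate present in the assembly.
assembled_fraction = ASSEMBLY_SIZE_MB / ESTIMATED_GENOME_SIZE_MB
print(f"Assembly covers {assembled_fraction:.1%} of the estimated genome size")
```

In practice the list passed to `n50` would hold the lengths of all 5069 contigs read from the assembly FASTA file. With the table values, the assembly corresponds to roughly 57% of the estimated genome size.
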