Christian Morabito¹, Riccardo Aiese Cigliano², Eric Maréchal¹, Fabrice Rébeillé¹, Alberto Amato¹
Abstract
The complete genome of the thraustochytrid Aurantiochytrium limacinum strain CCAP_4062/1 was sequenced with both Illumina NovaSeq 6000 and the third-generation PacBio RSII sequencing technology in order to obtain a trustworthy assembly and annotation. Reads from both platforms were combined at multiple stages to obtain a reliable assembly, which was then compared with the A. limacinum ATCC® MYA1381™ reference genome. The final assembly was annotated with the help of RNA-seq data from strain CCAP_4062/1. A. limacinum strain CCAP_4062/1 is an industrial strain used to produce very-long-chain polyunsaturated fatty acids such as docosahexaenoic acid (DHA), an essential fatty acid that humans and other vertebrates synthesise only at a very low rate. Thraustochytrids in general, and Aurantiochytrium in particular, are also used to produce carotenoids and squalene. Besides their biotechnological interest, thraustochytrids play a crucial role in both inshore and oceanic ecosystems. These genome sequences will foster biotechnological as well as ecological studies.
Keywords: Biotechnology; Genome; Next generation sequencing; Structural annotation; Third generation sequencing; Thraustochytrid
Year: 2020 PMID: 32490088 PMCID: PMC7262427 DOI: 10.1016/j.dib.2020.105729
Source DB: PubMed Journal: Data Brief ISSN: 2352-3409
Description of the genomics datasets used in this study.
| | PacBio Sequel RSII | Illumina NovaSeq 6000 |
|---|---|---|
| Sequenced Bases | 12,160,726,429 bp | 9,045,844,656 bp |
| Number of Reads | 770,207 | 59,906,256 |
| Sequencing Layout | Single-End Long Reads | Paired-End 2 × 150 bp |
| Max Read Length | 120,000 bp | 150 bp |
| Read N50 | 37,932 bp | 150 bp |
| Estimated Genome Coverage | 202× | 150× |
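The coverage row is simple arithmetic: average depth equals the number of sequenced bases divided by the genome size. The minimal Python sketch below assumes the 62,086,374 bp assembly size from the assembly statistics table as the genome size; the source does not say which genome-size estimate the table used. Under that assumption it gives roughly 196× for PacBio and 146× for Illumina, whereas the listed 202× and 150× correspond to a genome size of about 60 Mbp. The names `SEQUENCED_BASES`, `ASSEMBLY_SIZE_BP` and `fold_coverage` are illustrative.

```python
# Sequenced bases per platform, copied from the dataset table.
SEQUENCED_BASES = {
    "PacBio": 12_160_726_429,
    "Illumina": 9_045_844_656,
}

# Assumption: use the assembled size (assembly statistics table) as genome size.
ASSEMBLY_SIZE_BP = 62_086_374


def fold_coverage(total_bases: int, genome_size_bp: int) -> float:
    """Average depth of coverage: total sequenced bases / genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return total_bases / genome_size_bp


if __name__ == "__main__":
    for platform, bases in SEQUENCED_BASES.items():
        print(f"{platform}: {fold_coverage(bases, ASSEMBLY_SIZE_BP):.1f}x")
```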
Fig. 1. Schematic representation of the genome assembly pipeline. Raw PacBio reads were corrected with the Illumina reads in three iterations of LoRDEC. The corrected PacBio reads were then assembled with wtdbg2 to produce a first raw draft assembly. This draft was polished with the Illumina reads through five iterations of Pilon, followed by one run of REAPR to remove misassemblies. The polished assembly was then used together with the Illumina reads to perform a new assembly with SPAdes. The resulting assembly was polished with 10 iterations of Pilon, and gaps were finally closed with LR_GapCloser.
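The caption describes the pipeline only in terms of tools and iteration counts, without command lines or parameters. The Python sketch below therefore encodes just that structure as data and unrolls it into the ordered sequence of tool runs. The `Stage` class and `expand()` helper are hypothetical illustrations; they do not invoke LoRDEC, wtdbg2, Pilon, REAPR, SPAdes or LR_GapCloser.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    tool: str            # program named in the Fig. 1 caption
    purpose: str         # what the stage does, paraphrased from the caption
    iterations: int = 1  # number of consecutive runs of this tool


# Stage order and iteration counts as described in the Fig. 1 caption.
PIPELINE = [
    Stage("LoRDEC", "correct PacBio reads with Illumina reads", 3),
    Stage("wtdbg2", "draft assembly from corrected PacBio reads"),
    Stage("Pilon", "polish draft with Illumina reads", 5),
    Stage("REAPR", "remove misassemblies"),
    Stage("SPAdes", "assembly from polished assembly + Illumina reads"),
    Stage("Pilon", "polish SPAdes assembly", 10),
    Stage("LR_GapCloser", "close gaps"),
]


def expand(pipeline):
    """Unroll iterative stages into the flat, ordered list of tool runs."""
    runs = []
    for stage in pipeline:
        for i in range(1, stage.iterations + 1):
            runs.append(f"{stage.tool} ({i}/{stage.iterations}): {stage.purpose}")
    return runs


if __name__ == "__main__":
    for step, run in enumerate(expand(PIPELINE), start=1):
        print(step, run)
```

Unrolled this way, the pipeline amounts to 22 tool runs, 15 of which are Pilon polishing rounds.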
Fig. 2. Dot plot obtained by aligning the Aurli1 reference genome assembly [13] of A. limacinum ATCC® MYA1381™ (x-axis) against the Aurantiochytrium limacinum strain CCAP_4062/1 scaffolds.
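A dot plot places positions of one sequence on the x-axis and positions of the other on the y-axis, marking a point wherever the two match, so long collinear regions appear as diagonal lines. The toy sketch below uses exact shared k-mers on the forward strand only. It illustrates what such a plot encodes and is not the method used for Fig. 2, which the caption describes only as an alignment of assemblies; genome-scale dot plots are normally built from a whole-genome aligner such as MUMmer. The function name and the default `k` are assumptions.

```python
def kmer_dotplot(x_seq: str, y_seq: str, k: int = 11) -> list[tuple[int, int]]:
    """Return (x, y) start positions of identical k-mers shared by two sequences.

    Forward strand only: reverse-complement matches (inversions) are not
    reported. Plotting the points gives a dot plot in which collinear regions
    appear as diagonal runs of dots.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Index every k-mer of the x-axis sequence by its start positions.
    index: dict[str, list[int]] = {}
    for i in range(len(x_seq) - k + 1):
        index.setdefault(x_seq[i:i + k], []).append(i)
    # Scan the y-axis sequence and emit a point for every shared k-mer.
    points = []
    for j in range(len(y_seq) - k + 1):
        for i in index.get(y_seq[j:j + k], []):
            points.append((i, j))
    return points


if __name__ == "__main__":
    print(kmer_dotplot("AAACGT", "CGT", k=3))  # [(3, 0)]
```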
Genome assembly statistics.
| | Aurantiochytrium limacinum strain CCAP_4062/1 |
|---|---|
| Number of Contigs | 478 |
| Genome Size | 62,086,374 bp |
| Number of Contigs Larger than 50 kbp | 210 |
| N50 | 358,008 bp |
| L50 | 51 |
| Largest Contig | 2,029,424 bp |
| GC Content | 45.66% |
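N50, L50, the number of contigs above 50 kbp and GC content follow standard definitions. The minimal sketch below assumes the contig lengths and sequences have already been read from the assembly FASTA (parsing not shown); the helper names are hypothetical. GC conventions differ between tools (some include ambiguous bases such as N in the denominator), and the table does not state which convention was used.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig lengths in bp.

    Contigs are sorted longest first; N50 is the length of the contig at which
    the running total first reaches half of the assembly size, and L50 is the
    number of contigs needed to get there.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise AssertionError("unreachable: the full sum always reaches half the total")


def count_longer_than(lengths, threshold_bp=50_000):
    """Number of contigs strictly longer than threshold_bp (50 kbp by default)."""
    return sum(1 for length in lengths if length > threshold_bp)


def gc_percent(sequence):
    """GC content as a percentage of unambiguous A/C/G/T bases (N etc. ignored)."""
    seq = sequence.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


if __name__ == "__main__":
    toy_lengths = [100, 80, 50, 30, 20]  # hypothetical contig lengths
    print(n50_l50(toy_lengths))  # (80, 2)
    print(gc_percent("ACGTNN"))  # 50.0
```

Applied to all 478 contig lengths, `n50_l50` should reproduce the tabulated N50 and L50 if the same definition was used.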
Fig. 3. Results of the BUSCO analysis highlighting the presence of complete and single-copy eukaryotic genes in the assembly. Letters indicate the BUSCO categories shown in the figure and numbers indicate the number of genes in each category; ‘n’ indicates the total number of genes across all BUSCO categories.
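BUSCO reports its categories in a compact one-line notation: C (complete) is the sum of S (complete, single-copy) and D (complete, duplicated), F is fragmented, M is missing and n is the number of BUSCO groups searched. The sketch below formats counts in that notation; the function name is illustrative, and the counts in the usage line are hypothetical, not the values shown in Fig. 3.

```python
def busco_summary(single: int, duplicated: int, fragmented: int, missing: int) -> str:
    """Format BUSCO counts as C:%[S:%,D:%],F:%,M:%,n:N (one decimal place).

    C (complete) = S (single-copy) + D (duplicated); n is the total number of
    BUSCO groups searched, i.e. the sum of S, D, F and M.
    """
    counts = (single, duplicated, fragmented, missing)
    if any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative")
    n = sum(counts)
    if n == 0:
        raise ValueError("no BUSCO groups")

    def pct(count: int) -> str:
        return f"{100.0 * count / n:.1f}%"

    complete = single + duplicated
    return (f"C:{pct(complete)}[S:{pct(single)},D:{pct(duplicated)}],"
            f"F:{pct(fragmented)},M:{pct(missing)},n:{n}")


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(busco_summary(90, 4, 3, 3))  # C:94.0%[S:90.0%,D:4.0%],F:3.0%,M:3.0%,n:100
```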
Fig. 4. Histograms showing the distributions of alignment percentage and identity percentage between the predicted proteins of Aurantiochytrium limacinum ATCC® MYA1381™ (reference) and Aurantiochytrium limacinum strain CCAP_4062/1 (present study).
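Fig. 4 summarises two per-protein statistics. The source does not name the aligner or define either percentage, so the sketch below assumes one common pair of definitions: identity as identical aligned columns over all alignment columns (gaps included), and alignment percentage as the fraction of the query protein covered by the alignment. The binning helper shows how such values could be grouped for a histogram; all names are hypothetical.

```python
from collections import Counter


def identity_and_coverage(aligned_query: str, aligned_subject: str, query_length: int):
    """Percent identity and query coverage from one gapped pairwise alignment.

    Assumed definitions (tools differ):
      identity = identical non-gap columns / all alignment columns
      coverage = query residues in the alignment / full query length
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned strings must have equal length")
    if not aligned_query or query_length <= 0:
        raise ValueError("empty alignment or invalid query length")
    columns = len(aligned_query)
    identical = sum(1 for a, b in zip(aligned_query, aligned_subject)
                    if a == b and a != "-")
    query_residues = sum(1 for a in aligned_query if a != "-")
    return 100.0 * identical / columns, 100.0 * query_residues / query_length


def histogram(values, bin_width=10):
    """Count percentages per bin of width bin_width; 100% falls in the last bin."""
    if bin_width <= 0 or 100 % bin_width:
        raise ValueError("bin_width must be a positive divisor of 100")
    bins = Counter()
    for value in values:
        start = min(int(value // bin_width) * bin_width, 100 - bin_width)
        bins[start] += 1
    return dict(sorted(bins.items()))


if __name__ == "__main__":
    print(identity_and_coverage("MKV-LA", "MKVQLS", 8))
    print(histogram([100.0, 95.0, 42.0, 45.5]))  # {40: 2, 90: 2}
```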
| Specifications |
|---|
| Applied Microbiology and Biotechnology |
| Marine eukaryotic microbiology |
| DNA sequencing data |
| The data were acquired by next-generation sequencing using Illumina NovaSeq 6000 and by third-generation sequencing using the PacBio RSII platform |
| Raw reads were deposited in GenBank. |
| DNA was extracted from six-day-old cultures. |
| Whole-genome sequencing, genome assembly, and annotation |
| Institution: LPCV-IRIG |
| Repository name: NCBI BioProjects |