De novo transcriptome assembly data of the marine bioluminescent dinoflagellate Pyrocystis lunula.

Damian Menghini, Sylvain Aubry.

Abstract

Pyrocystis lunula is a unicellular bioluminescent dinoflagellate. While the mechanisms and genes underlying bioluminescence and luciferase synthesis are understood in many bioluminescent clades, they remain unknown in dinoflagellates. We merged long and short reads to provide a de novo assembly of the P. lunula transcriptome. A total of 975 million filtered paired-end reads were obtained and assembled into 155,716 contigs corresponding to putative transcripts, which were functionally annotated. This dataset will be valuable for improving our understanding of protist biology and is accessible via NCBI BioProject (PRJNA727555).
© 2021 The Author(s).

Keywords:  Bioluminescence; Dinoflagellates; Hybrid assembly; Pyrocystis lunula; RNAseq; Transcriptome

Year:  2021        PMID: 34277902      PMCID: PMC8267542          DOI: 10.1016/j.dib.2021.107254

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Specifications Table

Value of the Data

We present here the de novo assembly of the transcriptome of the bioluminescent dinoflagellate (a eukaryotic protist) Pyrocystis lunula. Dinoflagellates often have very large genomes that remain almost out of reach of current sequencing techniques; a comprehensive transcriptome is therefore highly valuable for further research. RNA-seq was performed by combining long and short reads in order to improve the quality of the assembly. These data will provide more information on the specialized metabolism of dinoflagellates and on the genetic basis and regulation of bioluminescence, and will give more insight into dinoflagellate evolution.

Data Description

We present here a de novo transcriptome sequencing and assembly of the unicellular dinoflagellate P. lunula (Fig. 1A). Many marine protists like P. lunula are responsible for the “sea blooming” observed in various places worldwide [1]. P. lunula is a model species used for deciphering circadian rhythms, bioluminescence and photosynthesis, but despite this long history as a model organism, knowledge of its genomic features remains limited. Sequencing the transcriptome of this organism might help shed light on a few peculiarities of dinoflagellates more generally, particularly the extent to which transcriptional regulation is actually involved in gene expression in these organisms [2]. A total output of 975 Gb of reads was generated from short (Illumina; SRA accession numbers SRX10783586-SRX10783590) and long (ONT; SRA accession number SRX10783591) reads. In the absence of a reference genome, reads were filtered and used for the de novo transcriptome assembly: 57 % of the total reads were eventually used for assembling the transcript contigs. The resulting transcriptome was 232 Mb in size, with a GC content of 62 % and an N50 contig length of 1780 bp (Table 1, available at https://doi.org/10.6084/m9.figshare.14554824.v2). We then evaluated the assembled transcriptome with Benchmarking Universal Single-Copy Orthologs (BUSCO, [3]), which showed that 96 % of the BUSCO genes were recovered as complete when using the alveolate dataset as a reference. Transcripts were then annotated and classified according to their gene ontology terms using InterProScan and BLAST against the Arabidopsis thaliana (TAIR10) proteome (Supplementary Data 1).
Fig. 1

A. Pyrocystis lunula is a unicellular bioluminescent dinoflagellate. Microscope picture showing chlorophyll (red) and luciferase (blue) fluorescence. B. Contig size distribution of the assembled transcriptome. C. BUSCO assessment of the contigs using the BUSCO alveolate database.

Table 1

Summary statistics of de novo transcriptome assembly for Pyrocystis lunula using the combined data of 7 samples.

Transcriptome feature    Value
No. of contigs           155,716
Largest contig (bp)      49,470
Total length (bp)        232,137,320
N50 (bp)                 1780
N75 (bp)                 1179
L50                      42,944
L75                      82,752
GC (%)                   61.58
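
The N50, N75, L50 and L75 values in Table 1 were reported by QUAST. As a reminder of what these statistics mean, the following minimal Python sketch computes them from a list of contig lengths using the usual definition: contigs are sorted from longest to shortest, and Nx is the length of the contig at which the running total first reaches x % of the assembly, while Lx is the number of contigs needed to get there. The function name `nx_lx` and the toy lengths are hypothetical and are not taken from the dataset.

```python
def nx_lx(lengths, fraction):
    """Return (Nx, Lx) for a list of contig lengths.

    Nx: length of the contig at which the cumulative length of contigs,
        sorted longest first, reaches `fraction` of the total length.
    Lx: number of contigs needed to reach that cumulative length.
    """
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("no contigs given")


# Hypothetical toy assembly (total length 300), not the P. lunula contigs.
toy_lengths = [100, 80, 60, 40, 20]
n50, l50 = nx_lx(toy_lengths, 0.50)
n75, l75 = nx_lx(toy_lengths, 0.75)
print(n50, l50, n75, l75)
```

Applied to the lengths of the 155,716 contigs, the same calculation would be expected to give values comparable to Table 1, although QUAST may apply its own minimum contig length filter.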

Experimental Design, Materials and Methods

Sampling and RNA extraction

Two-month-old Pyrocystis lunula cultures were grown in 12 h day/12 h night cycles and kept without mixing at 21 °C and 140 µmol/m2/s. Cultures were pelleted and shock-frozen in liquid nitrogen, and total RNA was isolated by TRIzol extraction followed by a DNase digestion step according to the manufacturer's protocol (Qiagen, Germany). The integrity of the RNA was measured on a 4200 TapeStation using the RNA ScreenTape assay (Agilent Technologies, USA).

Library preparation and sequencing

RNA samples with an RNA integrity number above 8.0 were used for library preparation. A total of 6 cDNA libraries were prepared from 300 ng of total RNA input with the TruSeq RNA Sample Prep Kit v2 (Illumina, USA) according to the manufacturer's protocol. Libraries were pooled and sequenced on an Illumina NovaSeq 6000 instrument using 100 bp paired-end reads. Sequencing was performed by the Functional Genomic Centre of the University of Zürich. In parallel, mRNA from one shaken culture was extracted using the NEBNext Poly(A) mRNA Magnetic Isolation Module (New England Biolabs) and prepared with the direct cDNA kit (Oxford Nanopore Technologies). ONT sequencing was performed on a MinION device following the manufacturer's instructions.

Transcriptome assembly

Illumina paired-end (PE) reads were first quality-checked using FastQC (v0.11.9), MultiQC (v1.9) and FastQ Screen (v0.14.1). They were then adapter-trimmed and quality-trimmed (4 bp sliding windows from the 5' and 3' ends, windows with low quality ( […] Non-rRNA Illumina read pairs and pass-filtered ONT reads (quality score of Q7 and above) were assembled into transcripts using rnaSPAdes (v3.14.0) with a k-mer size of 33 and the strand-specific mode "--ss rf". The assembled transcriptome was analyzed using BUSCO (v5, [3]), QUAST (v5.0.2, [4]) and EMBOSS (v6.6.0, [5]).
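
The Q7 threshold applied to the ONT reads can be illustrated with a short Python sketch. It assumes, as is common for nanopore data, that the quality of a read is the Phred score of its mean per-base error probability rather than the arithmetic mean of its per-base Phred scores, and that qualities are encoded as Phred+33 FASTQ characters. The function names `mean_phred` and `passes_q7` are hypothetical; the exact pass/fail computation used by the basecaller in this study is not described in the text.

```python
import math


def mean_phred(quality_string, offset=33):
    """Phred score of the mean per-base error probability of one read.

    `quality_string` is the FASTQ quality line (assumed Phred+33 encoding).
    """
    if not quality_string:
        raise ValueError("empty quality string")
    # Convert each character to an error probability: p = 10^(-Q/10).
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality_string]
    return -10 * math.log10(sum(probs) / len(probs))


def passes_q7(quality_string, min_q=7.0):
    """True if the read-level quality reaches the threshold (Q7 by default)."""
    return mean_phred(quality_string) >= min_q


# '+' encodes Q10, '#' encodes Q2 and 'I' encodes Q40 in Phred+33.
print(passes_q7("+++++"))  # uniform Q10 read
print(passes_q7("#####"))  # uniform Q2 read
print(round(mean_phred("I#"), 2))  # one Q40 base and one Q2 base
```

Averaging error probabilities means that a few very poor bases pull the read quality down sharply: the two-base example above has an arithmetic mean of Q21 but a probability-based quality of only about Q5.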

Functional annotation and gene ontology

The longest ORF per transcript contig was identified using TransDecoder (v5.5.0). Predicted protein sequences were compared against the TAIR10 protein sequences using BLASTP (v2.10.1+). They were also compared to the InterPro database using InterProScan (v5.32-71.0) to obtain gene ontology (GO) and pathway annotation.
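
To make the idea of a "longest ORF per transcript" concrete, here is a deliberately naive Python sketch that scans all six reading frames of a nucleotide sequence for the longest stretch running from an ATG to the next in-frame stop codon. This is not TransDecoder's algorithm, which also handles partial ORFs, minimum length cut-offs and coding-likelihood scoring; the function names, the use of the standard stop codons and the example sequence are all assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumed)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq):
    """Longest ATG-to-stop ORF (stop codon included) over six frames.

    Returns an empty string if no complete ORF is found.
    """
    best = ""
    seq = seq.upper()
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None  # index of the first ATG of the current open frame
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    candidate = strand[start:i + 3]
                    if len(candidate) > len(best):
                        best = candidate
                    start = None
    return best


# Hypothetical toy transcript, not a P. lunula contig.
print(longest_orf("CCATGAAATTTTAGGG"))
```

Because both strands are scanned, the same ORF is returned whether the transcript is supplied in sense or antisense orientation.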

CRediT Author Statement

Damian Menghini: Conceptualization, culture and extraction; Sylvain Aubry: Conceptualization, Data curation, paper writing, reviewing, editing.

Funding Information

This work was supported by the Swiss National Science Foundation (#31003A_172977).

Ethical Statement

Not applicable.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships which have or could be perceived to have influenced the work reported in this article.

Subject: Algal biology
Specific subject area: Transcriptomics
Type of data: Assembly (fasta file), Table, Figure
How data were acquired: Illumina NovaSeq 6000, ONT MinION
Data format: Raw and analyzed
Parameters for data collection: Pyrocystis lunula cultures were harvested at dawn in the dark, with or without a 2 h shaking treatment.
Description of data collection: Total RNA was extracted from 6 cultures of P. lunula using TRIzol and subsequent DNase treatment. cDNA was prepared using the TruSeq sample prep kit and Illumina sequencing was performed using 100 bp paired-end reads. In parallel, mRNA was purified using the NEBNext Poly(A) mRNA isolation module, followed by the Oxford Nanopore direct cDNA sequencing kit, and sequenced on a MinION device.
Data source location: The P. lunula strain was obtained from the University of Montreal and grown in a culture cabinet at the University of Zürich, Switzerland.
Data accessibility: Raw data were deposited in the NCBI SRA database under the BioProject accession number PRJNA727555, accessible under this link: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA727555. The assembled transcript contigs and annotation are accessible on Figshare under this link: https://doi.org/10.6084/m9.figshare.14554824.v2
  5 in total

1.  An "omic" approach to Pyrocystis lunula: New insights related with this bioluminescent dinoflagellate.

Authors:  Carlos Fajardo; Francisco Amil-Ruiz; Carlos Fuentes-Almagro; Marcos De Donato; Gonzalo Martinez-Rodriguez; Almudena Escobar-Niño; Rafael Carrasco; Juan Miguel Mancera; Francisco Javier Fernandez-Acero
Journal:  J Proteomics       Date:  2019-08-26

2.  The Lingulodinium circadian system lacks rhythmic changes in transcript abundance.

Authors:  Sougata Roy; Mathieu Beauchemin; Steve Dagenais-Bellefeuille; Louis Letourneau; Mario Cappadocia; David Morse
Journal:  BMC Biol       Date:  2014-12-20

3.  BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs.

Authors:  Felipe A Simão; Robert M Waterhouse; Panagiotis Ioannidis; Evgenia V Kriventseva; Evgeny M Zdobnov
Journal:  Bioinformatics       Date:  2015-06-09

4.  QUAST: quality assessment tool for genome assemblies.

Authors:  Alexey Gurevich; Vladislav Saveliev; Nikolay Vyahhi; Glenn Tesler
Journal:  Bioinformatics       Date:  2013-02-19

5.  EMBOSS: the European Molecular Biology Open Software Suite.

Authors:  P Rice; I Longden; A Bleasby
Journal:  Trends Genet       Date:  2000-06
