
De novo whole genome sequencing data of two mangrove-isolated microalgae from Terengganu coastal waters.

Kit Yinn Teh^1,2, C L Wan Afifudeen^1,2, Ahmad Aziz^3, Li Lian Wong^1,2,4, Saw Hong Loh^3,1, Thye San Cha^3,1,2.

Abstract

Interest in harvesting the potential benefits of microalgae makes it necessary to investigate the many ecological niches occupied by a single species. This dataset comprises de novo whole genome assemblies of two mangrove-isolated microalgae from the division Chlorophyta: Chlorella vulgaris UMT-M1 and Messastrum gracile SE-MC4, both from Universiti Malaysia Terengganu, Malaysia. Libraries were run as 2 × 150 base paired-end reads, and sequencing was conducted on the Illumina Novaseq 2500 platform. Sequencing yielded raw reads amounting to ∼11 Gb in total bases for each species, which were then assembled de novo. Genome assembly resulted in genome sizes of 50.15 Mbp and 60.83 Mbp for UMT-M1 and SE-MC4, respectively. All filtered and assembled genomic sequence data have been submitted to the National Center for Biotechnology Information (NCBI) and can be found at DDBJ/ENA/GenBank under the accessions VJNP00000000 (UMT-M1) and VIYE00000000 (SE-MC4).


Keywords: Chlorophyta; IDBA-UD; Next generation sequencing; Oleaginous microalgae; Salinity

Year: 2019 PMID: 31720332 PMCID: PMC6838400 DOI: 10.1016/j.dib.2019.104680

Source DB: PubMed Journal: Data Brief ISSN: 2352-3409


Data

The response of microalgae to environmental stimuli is species-specific and may even vary from strain to strain [1,2]. Moreover, mangrove-dwelling microalgae are regularly exposed to high and low tides, making them unique assemblages in a marginal ecosystem niche, possibly with unique responses. Regulating and controlling the outcome of those responses remains one of the most difficult problems in phycology research. Both UMT-M1 and SE-MC4 used in this research are oleaginous native species isolated from mangrove areas in Terengganu, Malaysia. UMT-M1 has been studied intensively in our previous research for oil and fatty acid production under various culture conditions, such as nitrogen starvation [3] and phytohormone treatments [4,5,6], as well as for strain improvement through genetic modification [7,8]. SE-MC4, on the other hand, is a non-model species that has been observed in our laboratory to accumulate total oil exceeding 50% of dry weight. Exploring novel genomes of non-model microalgae is imperative to enrich the genome data available for further biodiesel development. Efforts to improve microalgae feedstock at the molecular level are often curtailed by the limited number of available microalgae genomes [9]. Moreover, the available C. vulgaris genome comes only from a freshwater isolate [10]. In that light, the de novo WGS of C. vulgaris UMT-M1 featured in this report represents a mangrove-dwelling microalga that is able to adapt and survive across a wide range of salinity. In addition, exploring potentially high-oil-producing non-model species such as M. gracile SE-MC4 adds genetic variety to the presently available genetic databank [11].

In UMT-M1, sequencing generated 73,495,318 raw reads, amounting to 11,097,793,018 bases (11.09 Gb) in total (Table 1). Overall, 89.58% of bases achieved a Phred score above Q30, with a GC content of 62.29%. High-quality raw reads from Table 1 were then filtered, normalized and assembled de novo using the IDBA-UD assembler [12], which internally builds scaffolds from contigs. Scaffolds shorter than 200 bases were removed. The assembly produced 2547 scaffolds totalling 50,153,796 bases (50.15 Mbp). The scaffold N50 and N90 were 56,390 and 14,886 bases, respectively (Table 2).
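As a reminder of what the N50 and N90 values measure, the following is a minimal sketch using the common definition: sort scaffolds from longest to shortest and report the length of the scaffold at which the running total first reaches 50% (or 90%) of the assembly length. The function name `nx` and the toy lengths are illustrative assumptions, not the authors' code or data.

```python
def nx(lengths, percent):
    """Return the Nx value of an assembly.

    Scaffolds are sorted longest first; the result is the length of the
    scaffold at which the running total first reaches `percent` percent
    of the summed length of all scaffolds.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return length
    return ordered[-1]


# Hypothetical toy scaffold lengths (not taken from the dataset).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
n50 = nx(toy_lengths, 50)
n90 = nx(toy_lengths, 90)
print(n50, n90)
```

Because the N50 is weighted by length, it is not the mean scaffold length; an assembly made up of many short scaffolds can have a low N50 even when its total length is large.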

Table 1

Statistics of paired-end sequence library for C. vulgaris UMT-M1 and M. gracile SE-MC4.

Species	Total reads	Total bases	GC content (%)	Nt* > Q30 (%)
C. vulgaris UMT-M1	73,495,318	11,097,793,018 (11.09 Gb)	62.29	89.58
M. gracile SE-MC4	72,742,158	10,984,065,858 (10.98 Gb)	68.27	90.52

*Nt = nucleotides.
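The Q30 column can be read as the share of base calls whose Phred score corresponds to an error probability of at most 1 in 1000. The sketch below shows how such a share could be computed from FASTQ quality strings. It assumes Phred+33 encoding and treats scores of 30 or more as passing; both are assumptions, since the source does not state the encoding or whether the threshold is inclusive. The function names and toy strings are hypothetical.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def error_probability(q):
    """Convert a Phred score Q to an error probability, P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def fraction_at_or_above(quality_strings, threshold=30, offset=33):
    """Fraction of all base calls with a Phred score >= threshold."""
    total = 0
    passing = 0
    for qs in quality_strings:
        for q in phred_scores(qs, offset):
            total += 1
            if q >= threshold:
                passing += 1
    return passing / total if total else 0.0


# Hypothetical toy quality strings: 'I' = Q40, '?' = Q30, '5' = Q20, '#' = Q2.
toy_qualities = ["II?5", "I?##"]
observed = fraction_at_or_above(toy_qualities)
print(observed, error_probability(30))
```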

Table 2

De novo sequence statistics for C. vulgaris UMT-M1 and M. gracile SE-MC4.

Species	Number of scaffolds	Total length (base)	Max length (base)	Min length (base)	N50	N90
C. vulgaris UMT-M1	2547	50,153,796	386,660	201	56,390	14,886
M. gracile SE-MC4	32,473	60,830,643	52,109	201	2915	802
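Two quantities can be derived directly from Tables 1 and 2: the mean read length (total bases divided by total reads) and a rough raw sequencing depth (total raw bases divided by assembly length). The sketch below is only an illustration. It uses the assembly length as a stand-in for genome size and the raw rather than filtered base counts, so the depth it reports is an upper-bound estimate and not a figure reported by the authors.

```python
# Totals copied from Table 1 (reads, bases) and Table 2 (assembly length).
RUNS = {
    "C. vulgaris UMT-M1": {
        "reads": 73_495_318,
        "bases": 11_097_793_018,
        "assembly": 50_153_796,
    },
    "M. gracile SE-MC4": {
        "reads": 72_742_158,
        "bases": 10_984_065_858,
        "assembly": 60_830_643,
    },
}


def summarise(run):
    """Return (mean read length, raw depth) for one sequencing run."""
    mean_read_length = run["bases"] / run["reads"]
    raw_depth = run["bases"] / run["assembly"]
    return mean_read_length, raw_depth


results = {name: summarise(run) for name, run in RUNS.items()}
for name, (read_len, depth) in results.items():
    print(f"{name}: mean read length {read_len:.1f} bases, raw depth {depth:.0f}x")
```

With the tabulated totals, the mean read length works out to 151 bases for both libraries, and the raw depth to roughly 221× for UMT-M1 and 181× for SE-MC4.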

In SE-MC4, sequencing generated 72,742,158 raw reads amounting to 10,984,065,858 bases (10.98 Gb), with a GC content of 68.27% and 90.52% of bases achieving a Phred score above Q30 (Table 1). De novo assembly of SE-MC4 yielded 32,473 scaffolds with a total length of 60,830,643 bp (60.83 Mbp), a maximum scaffold length of 52,109 bp and a minimum of 201 bp. The scaffold N50 is 2915 bp and the N90 is 802 bp (Table 2).

Experimental design, materials, and methods

Sample preparation

Inoculum stock was obtained from the microalgae culture collection at Universiti Malaysia Terengganu. Stock cultures were maintained under axenic conditions in modified Guillard's F2 medium [3] prepared with artificial seawater (30 ppt). Cells were harvested at mid-stationary phase from 50 mL of culture by centrifugation at 7000 rpm for 5 min. DNA was extracted from the fresh pellet using the Wizard® Genomic DNA Purification Kit (Promega, USA), following the manufacturer's protocol. Prior to sequencing, DNA quality was evaluated from absorbance ratios (260/280 and 260/230), the gel electrophoresis pattern and double-stranded DNA concentration measurements.

De novo WGS sequencing

Library preparation and sequencing were conducted by Theragen Bio Itex, South Korea. Libraries were prepared using the TruSeq Nano DNA Library Prep Kit (Illumina, USA): DNA fragments were size-selected and ligated to adaptors to give an insert size of 350 bp [13]. Clusters were generated on flow cells from the constructed libraries using cBot equipment (Illumina, USA), and sequencing was then performed on the Illumina Novaseq 2500 platform with 2 × 150 base paired-end reads. After sequencing, adapter sequences were trimmed from the raw reads with cutadapt v1.10 [14] and quality filtering was performed to remove contaminants. Reads that scored above Q30 were selected for assembly. De novo assembly of the high-quality reads into scaffolds was then carried out using the IDBA-UD assembler [12]. Scaffolds shorter than 200 bp were removed manually.
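The final length filter could be reproduced with a few lines of code rather than by hand. The sketch below is one possible way to do it, assuming the scaffolds are stored in FASTA format; it is not the authors' procedure. The helper names (`read_fasta`, `filter_by_length`) and the toy records are hypothetical.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def filter_by_length(records, min_length=200):
    """Keep records of at least min_length bases (drops scaffolds < 200 bp)."""
    for name, sequence in records:
        if len(sequence) >= min_length:
            yield name, sequence


# Hypothetical toy input: one 250-base scaffold split over two lines
# and one 150-base scaffold that should be dropped.
toy_fasta = io.StringIO(
    ">scaffold_1\n" + "A" * 120 + "\n" + "C" * 130 + "\n"
    ">scaffold_2\n" + "G" * 150 + "\n"
)
kept = list(filter_by_length(read_fasta(toy_fasta)))
kept_names = [name for name, _ in kept]
print(kept_names)
```

In practice the same generator can be fed an open file handle instead of the in-memory `io.StringIO` used here.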

Deposition of genome data

Raw sequence data and the assembled genomes were deposited through the NCBI submission portal. The step-by-step submission guidelines in the NCBI author guide (https://www.ncbi.nlm.nih.gov/genbank/genomesubmit/) were followed. The project accessions are listed in Table 3.

Table 3

Sequence accession numbers and directory links.

Species	Directory/Data	Accession number	Links
C. vulgaris UMT-M1	BioProject	PRJNA550188	https://www.ncbi.nlm.nih.gov/bioproject/PRJNA550188
	BioSample	SAMN12111214	https://www.ncbi.nlm.nih.gov/biosample/SAMN12111214
	Raw sequence (SRA)	SRR9478717	https://www.ncbi.nlm.nih.gov/sra/SRX6245806/
	Assembled genome	VJNP00000000	https://www.ncbi.nlm.nih.gov/nuccore/VJNP00000000
M. gracile SE-MC4	BioProject	PRJNA550185	https://www.ncbi.nlm.nih.gov/bioproject/PRJNA550185
	BioSample	SAMN12111213	https://www.ncbi.nlm.nih.gov/biosample/SAMN12111213
	Raw sequence (SRA)	SRR9587833	https://www.ncbi.nlm.nih.gov/sra/SRX6353668
	Assembled genome	VIYE00000000	https://www.ncbi.nlm.nih.gov/nuccore/VIYE00000000


Specifications Table

Subject	Molecular Biology
Specific subject area	Whole genome sequencing (WGS)
Type of data	WGS data of: (i) C. vulgaris UMT-M1; (ii) M. gracile SE-MC4
How data were acquired	Paired-end sequencing on Illumina Novaseq 2500 platform followed by de novo assembly using IDBA-UD
Data format	Raw and filtered de novo genome sequences: FASTQ
Parameters for data collection	DNA extracted from axenic cultures
Description of data collection	DNA from fresh microalgae cells was extracted. DNA purity and concentration were measured before sequencing. Data were assembled de novo using IDBA-UD assembler.
Data source location	Institution: Institute of Marine Biotechnology, Universiti Malaysia Terengganu; City/Town/Region: Kuala Terengganu, Terengganu; Country: Malaysia; Latitude and longitude (and GPS coordinates) for collected samples/data: (i) UMT-M1: 5° 24′ 11.39″ N, 103° 05′ 9.60″ E (Mengabang Telipot, Universiti Malaysia Terengganu); (ii) SE-MC4: 5° 31′ 59.2″ N, 102° 56′ 52.2″ E (Setiu Wetland, Terengganu)
Data accessibility	Genomes of both species can be found at DDBJ/ENA/GenBank under the accession numbers: (i) C. vulgaris UMT-M1: VJNP00000000; (ii) M. gracile SE-MC4: VIYE00000000

Value of the Data

• First complete chromosomal genome sequencing of two native microalgae isolated from a mangrove area in a tropical region.

• Further enriches the currently limited WGS data collections of important microalgae species, aids strain improvement and supports the interests of various biotechnology industries.

• Benefits future work on comparative genome analysis and microalgae adaptation responses.

9 in total

1. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth.

Authors: Yu Peng; Henry C M Leung; S M Yiu; Francis Y L Chin
Journal: Bioinformatics Date: 2012-04-11 Impact factor: 6.937

2. Gibberellin Promotes Cell Growth and Induces Changes in Fatty Acid Biosynthesis and Upregulates Fatty Acid Biosynthetic Genes in Chlorella vulgaris UMT-M1.

Authors: Malinna Jusoh; Saw Hong Loh; Ahmad Aziz; Thye San Cha
Journal: Appl Biochem Biotechnol Date: 2018-12-10 Impact factor: 2.926

Review 3. The potentials and challenges of algae based biofuels: a review of the techno-economic, life cycle, and resource assessment modeling.

Authors: Jason C Quinn; Ryan Davis
Journal: Bioresour Technol Date: 2014-10-24 Impact factor: 9.642

Review 4. Can Omics Approaches Improve Microalgal Biofuels under Abiotic Stress?

Authors: El-Sayed Salama; Sanjay P Govindwar; Rahul V Khandare; Hyun-Seog Roh; Byong-Hun Jeon; Xiangkai Li
Journal: Trends Plant Sci Date: 2019-05-10 Impact factor: 18.313

Review 5. Library construction for next-generation sequencing: overviews and challenges.

Authors: Steven R Head; H Kiyomi Komori; Sarah A LaMere; Thomas Whisenant; Filip Van Nieuwerburgh; Daniel R Salomon; Phillip Ordoukhanian
Journal: Biotechniques Date: 2014-02-01 Impact factor: 1.993

6. Indole-3-acetic acid (IAA) induced changes in oil content, fatty acid profiles and expression of four fatty acid biosynthetic genes in Chlorella vulgaris at early stationary growth phase.

Authors: Malinna Jusoh; Saw Hong Loh; Tse Seng Chuah; Ahmad Aziz; Thye San Cha
Journal: Phytochemistry Date: 2015-01-09 Impact factor: 4.072

7. Differential regulation of fatty acid biosynthesis in two Chlorella species in response to nitrate treatments and the potential of binary blending microalgae oils for biodiesel application.

Authors: Thye San Cha; Jian Woon Chen; Eng Giap Goh; Ahmad Aziz; Saw Hong Loh
Journal: Bioresour Technol Date: 2011-09-16 Impact factor: 9.642

8. Examination of triacylglycerol biosynthetic pathways via de novo transcriptomic and proteomic analyses in an unsequenced microalga.

Authors: Michael T Guarnieri; Ambarish Nag; Sharon L Smolinski; Al Darzins; Michael Seibert; Philip T Pienkos
Journal: PLoS One Date: 2011-10-17 Impact factor: 3.240

9. Genome Sequence of the Oleaginous Green Alga, Chlorella vulgaris UTEX 395.

Authors: Michael T Guarnieri; Jennifer Levering; Calvin A Henard; Jeffrey L Boore; Michael J Betenbaugh; Karsten Zengler; Eric P Knoshaug
Journal: Front Bioeng Biotechnol Date: 2018-04-05
