Literature DB >> 24415958

Nannochloropsis genomes reveal evolution of microalgal oleaginous traits.

Dongmei Wang¹, Kang Ning¹, Jing Li¹, Jianqiang Hu¹, Danxiang Han², Hui Wang³, Xiaowei Zeng¹, Xiaoyan Jing¹, Qian Zhou¹, Xiaoquan Su¹, Xingzhi Chang¹, Anhui Wang¹, Wei Wang¹, Jing Jia¹, Li Wei¹, Yi Xin¹, Yinghe Qiao¹, Ranran Huang¹, Jie Chen¹, Bo Han¹, Kangsup Yoon⁴, Russell T Hill³, Yonathan Zohar⁵, Feng Chen³, Qiang Hu⁴, Jian Xu¹.

Abstract

Oleaginous micron class="Species">algae are promising feedstock for biofuels, yet the genetic diversity, origin and evolution of oleaginous traits remain largely unknown. Here we present a detailed phylogenomic analysis of five oleaginous Nannochloropsis species (a total of six strains) and one time-series transcriptome dataset for triacylglycerol (TAG) synthesis on one representative strain. Despite small genome sizes, high coding potential and relative paucity of mobile elements, the genomes feature small cores of ca. 2,700 protein-coding genes and a large pan-genome of >38,000 genes. The six genomes share key oleaginous traits, such as the enrichment of selected lipid biosynthesis genes and certain glycoside hydrolase genes that potentially shift carbon flux from chrysolaminaran to TAG synthesis. The eleven type II diacylglycerol acyltransferase genes (DGAT-2) in every strain, each expressed during TAG synthesis, likely originated from three ancient genomes, including the secondary endosymbiosis host and the engulfed green and red algae. Horizontal gene transfers were inferred in most lipid synthesis nodes with expanded gene doses and many glycoside hydrolase genes. Thus multiple genome pooling and horizontal genetic exchange, together with selective inheritance of lipid synthesis genes and species-specific gene loss, have led to the enormous genetic apparatus for oleaginousness and the wide genomic divergence among present-day Nannochloropsis. These findings have important implications in the screening and genetic engineering of microalgae for biofuels.

Entities: Chemical Disease Gene Mutation Species

Mesh：

Substances：
Triglycerides

Year: 2014 PMID： 24415958 PMCID： PMC3886936 DOI： 10.1371/journal.pgen.1004094

Source DB: PubMed Journal: PLoS Genet ISSN： 1553-7390 Impact factor: 5.917

Introduction

Micron class="Species">algae represent a promising source of biomass feedstock for fuels and chemicals because many species possess the ability to grow rapidly and synthesize large amounts of storage neutral lipids in a form of triacylglycerol (TAG) from sunlight and carbon dioxide. They can be cultivated on non-arable land with non-potable water and waste streams (e.g., flue gases and wastewaters) and thus pose little competition to food crops while providing environmental benefits [1]. However, understanding of the divergence and evolution of oleaginous traits and the underlying evolutionary forces and molecular mechanisms in microalgae remains elusive [2]. n class="Species">Nannochloropsis is a genus of unicellular photosynthetic microalgae in the class Eustigmatophyceae, ranging in size from 2–5 µm and widely distributed in marine, fresh and brackish waters. They are of interest as a potential feedstock for fuels and high-value products because they tolerate broad enivronmental and culture conditions while growing rapidly and producing large amounts of TAG and eicosapentaenoic acid, a high-value polyunsaturated fatty acid [3]. A homologous recombination–based gene transformation system was recently established in Nannochloropsis [4], making trait improvement in this organism possible for overproduction of biomass or desirable products. Here we present a comparative analysis of six genomes of oleaginous n class="Species">Nannochloropsis spp. that includes two N. oceanica strains (IMET1 and CCMP531) and one strain from each of four other recognized species: N. salina (CCMP537), N. gaditana (CCMP526, which was previously reported [5]), N. oculata (CCMP525) and N. granulata (CCMP529) ( ; ; ; ). Moreover, for N. oceanica IMET1, the diversity of transcripts was mapped to support gene prediction by sequencing cDNA libraries using 454-based long reads. Furthermore, transcript dynamics were measured via a two-condition (control condition and nitrogen starved condition), three time-point temporal series of transcriptomes during TAG accumulation using Illumina-based short-reads (). Integration of phenotypic, genomic and transcriptomic data across a Nannochloropsis phylogeny provided new insights into the molecular mechanisms driving the diversity and evolution of these oleaginous microalgae.

Figure 1

Structural features of the six Nannochloropsis genomes.

(A) Whole-genome based phylogeny of Nannochloropsis. A maximum likelihood (consensus) tree was generated using the PhyML program (JTT model) with 1,000 replicates based on the 1,085 six-way single-copy orthologous gene sets identified from the six Nannochloropsis genomes (). Percentages of replicate trees in the bootstrap test are shown next to the branches. (B) Genome divergence in Nannochloropsis. For each pair of genomes that consists of IMET1 and another Nannochloropsis strain, the percentages of strain-specific proteins versus their discrepancies in the full-length 18S rDNA sequence were plotted. Ten S. cerevisiae strains, eight E. coli strains, nine Synechococcus strains and two each of Ostreococcus and Micromonas were also included. For prokaryotic organisms including E. coli and Synechococcus, the percentages of strain-specific proteins were plotted against the discrepancies in 16S rDNA sequences. (C) The number of genes from the Nannochloropsis genomes and the Nannochloropsis core, with successive inclusion of each additional strain. (D) Functional categorization of Nannochloropsis core proteins. GO Slim terms corresponding to each GO term are presented.

Structural features of the six Nannochloropsis genomes.

(A) Whole-genome based phylogeny of n class="Species">Nannochloropsis. A maximum likelihood (consensus) tree was generated using the PhyML program (JTT model) with 1,000 replicates based on the 1,085 six-way single-copy orthologous gene sets identified from the six Nannochloropsis genomes (). Percentages of replicate trees in the bootstrap test are shown next to the branches. (B) Genome divergence in Nannochloropsis. For each pair of genomes that consists of IMET1 and another Nannochloropsis strain, the percentages of strain-specific proteins versus their discrepancies in the full-length 18S rDNA sequence were plotted. Ten S. cerevisiae strains, eight E. coli strains, nine Synechococcus strains and two each of Ostreococcus and Micromonas were also included. For prokaryotic organisms including E. coli and Synechococcus, the percentages of strain-specific proteins were plotted against the discrepancies in 16S rDNA sequences. (C) The number of genes from the Nannochloropsis genomes and the Nannochloropsis core, with successive inclusion of each additional strain. (D) Functional categorization of Nannochloropsis core proteins. GO Slim terms corresponding to each GO term are presented.

Results

(I) General features of Nannochloropsis genomes

The genome sizes of the six oleaginous n class="Species">Nannochloropsis species and strains range from 25.38 to 32.07 Mb ( ; ). For strain IMET1, the nuclear, chloroplast and mitochondria genomes are 31.36 Mb, 117.5 Kb and 38 Kb, respectively, totaling 31.5 Mb. Pulse-field gel electrophoresis on total IMET1 DNA confirmed the genome size and indicated the presence of 22 chromosomes (, S3B). For IMET1, 9,754, 126 and 35 protein-coding genes were predicted in the nuclear, chloroplast and mitochondrial genomes, respectively ( ). Among the nuclear genes, 93.4% (9,111) were covered by mRNA-Seq data (defined as >80% of the transcribed region mapped by at least 10 reads; ).

Table 1

Genomic features of the Nannochloropsis genomes.

Microalgae	N. oceanica IMET1	N. oceanica CCMP531	N. granulate CCMP529	N. oculata CCMP525	N. salina CCMP537	N. gaditana CCMP526*
Whole genome
Assembly size (Mb)	30.1	35.5	30.1	34.5	26.9	29.0
G+C content (%)	53.7	53.8	54.6	54.2	54.2	54.2
Predicted gene number	9,915	7,268	8,060	7,254	6,562	8,738
Genes having blast hits in NCBI NR database	6,853	3,584	4,014	3,557	3,789	5,467
Genes with functional annotation	5,121	2,528	2,798	3,321	1,378	3,704
Predicted tRNA number	80	79	85	82	85	85
Predicted LTR number	26	24	6	25	4	33
LTR length (Kbp)	24.3	9.4	2.3	25.6	3.4	41.8
Mitochondrial genome
Size (bases)	38,057	38,057	38,791	38,444	41, 907	38,981
G+C content (%)	31.9	31.9	32.0	31.8	31.4	31.4
Predicted gene number	35	35	35	35	36	35
Predicted tRNA number	28	28	28	28	28	26
Chloroplast genome
Size (bases)	117,548	117,634	117,672	117,463	114,883	114,937
G+C content (%)	33.6	33.6	33.4	33.4	33.1	33.0
Predicted gene number	126	126	126	126	123	122
Predicted tRNA number	34	34	34	34	33	28

, data cited from the genome paper of N. gaditana CCMP526 [5].

, data cited from the genome paper of pan class="Species">N. gaditana CCMP526 [5]. These n class="Species">Nannochloropsis genomes are all relatively compact (; [5], [6]), much smaller than that of the model green microalga Chlamydomonas reinhardtii (121 Mb; [7]). The IMET1 genome features a higher coding potential (52.1%) than the diatom Thalassiosira pseudonana (32.7%; [8]), which has a similar genome size. Mobile elements can be prevalent in algae [e.g. T. pseudonana harbors 238 long terminal repeats (LTRs) totaling 1.56 Mb], but they are rather limited in IMET1, as only 26 LTRs (24.3 Kb in total), along with several DNA transposons (864 bp in total), are present in the genome without transposases (). The relative paucity of mobile elements appears to be one shared feature of the six Nannochloropsis strains ( )

(II) Divergence of Nannochloropsis genomes

Genomic diversity and divergence defining microalgal genera, species or strains are largely unknown [9]. A whole-genome phylogeny of pan class="Species">Nannochloropsis ( ) was constructed from 1,085 single-copy-orthologous groups identified from the six genomes, which is consistent with the 18S-based phylogeny (). Among the five n>n class="Species">Nannochloropsis species, N. granulata and N. oculata have a recent common ancestor and are clustered with the two N. oceanica strains. Among the 1,085 single-copy orthologous groups, 628 (61.7%) exhibited congruent phylogenies with the whole-genome phylogeny. The mean Ka/Ks of 0.08 calculated from these candidate phylogenetic markers in the nuclear genomes was higher than in the chloroplast genomes (0.031) and in the mitochondrial genomes (0.064). Among these candidate markers, 25 genes exhibited sequence variations large enough to differentiate each of the species and strains (density of inter-species SNP at 20–40% and intra-species over 1%), but allowed for the design of consensus flanking PCR primers (). Those with the highest resolution included cytochrome P450, btaA, plastid ribosomal protein S1 and transaldolase etc., which represent novel phylogenetic markers that are more sensitive than 18S or ITS sequences (0.16% and 0.52% in intra-species SNP density, respectively) in strain-typing of Nannochloropsis. Between any two genomes among the six n class="Species">Nannochloropsis strains, 35% of protein-coding genes (ranging from 2.6% between the two N. oceanica strains to 66.4% between IMET1 and N. salina CCMP537) were not found in the other genome on average, despite >98% similarity in full-length 18S rDNA. This places their inter-species genome divergence higher than the green algae studied and their intra-species divergence comparable to E. coli and yeast ( ). Therefore, the Nannochloropsis pan-genome, as defined by the six strains, consists of at least 38,000 protein-coding genes, along with a relatively small pool of Nannochloropsis core genes (e.g., 2,734 genes in IMET1) that are shared by the six strains ( , ). Most (93.2%) of these core genes have blast hits in NCBI non-redundant (NR) database, of which 94% were functionally annotated. The core genes mostly encode proteins involved in DNA, RNA, and protein synthesis and modification, transporters, signal transduction and central metabolic pathways ( ; for functional classification based on molecular function and cellular component, see ). The accessory genes, referring to those missing in at least one strain, mainly encode (i) central metabolism such as carbohydrate, lipid, energy, and nucleotide and amino acid metabolism (which are overlapped with the core genes), (ii) secondary metabolism and N-glycan biosynthesis (which are complementary to the core genes, and (iii) unknown functions (). There were 164–1,513 genes that were strain-specific among the six genomes. In contrast to the 2,734 pan class="Species">Nannochloropsis core genes, of which 96.7% were supported by our mRNA-Seq reads, 11.0% (18) of the 164 n>n class="Species">IMET1-specific proteins lacked such supports, suggesting the possible presence of pseudogenes or false positives in gene prediction. Among the IMET1-specific genes with mRNA support, 94.5% were putative novel genes without any known homologs (Blast hits) in the NCBI NR database. It is possible that some of them might have horizontally transferred from unsequenced species. Among strain-specific genes with functional annotations, most were involved in responses to freezing in N. oculata CCMP525, N. granulata CCMP529 and N. oceanica CCMP531. In N. gaditana CCMP526, transporters were prevalent, while in N. salina CCMP537, no significant enrichment was found in any processes (). Correlation analysis revealed that the core and accessory genes exhibited different sequence and transcriptional features under the experimental conditions tested (). The accessory genes tend to be under lower purifying pressure while lower transcriptional levels (; ), supporting a link between sequence evolution and transcriptional activity [10], [11], [12]. To probe the link between the accessory genes and divergence of the genomes, protein-coding genes in the six n class="Species">Nannochloropsis were classified into different groups based on the number of strains in which they were present (thus those present in all the six strains were part of the Nannochloropsis core). The most prominent group included the genes shared by four of the strains, in which the majority (97.3%) were found in the phylogenetically closely related species, i.e., N. oceanica (two strains), N. granulata and N. oculata. The absence of these genes in the other two species explained the small number of Nannochloropsis core genes (). These genes might have been present in the common ancestors of heterokonts and later lost in N. salina and N. gaditana, as >60% of them were found in other heterokonts (e.g., diatoms, Ectocarpus and other non-photosynthetic heterokonts such as Phytophthora). The functions supported by these genes were similar to those of core genes, with oxidation-reduction, transmembrane transport and protein-related metabolism being dominant. This does not support the presence of functional bias in the gene loss events ().

(III) Features of paralogous protein groups in the Nannochloropsis genomes

To seek the cause of the structural divergence among the n class="Species">Nannochloropsis genomes, we clustered all encoded proteins based on their sequence similarity. Among sequenced plant and algal genomes, large paralogous groups are common, e.g., 217 F-box family protein genes in Arabidopsis [13] and 51 Class III guanylyl and adenylyl cyclase genes in Chlamydomonas [7]). However, Nannochloropsis spp. appear to have adopted a strategy in which paralogous groups are less biased in size, i.e., they formulate a large number of relatively small paralogous groups ( , ). There are 4,263, 4,325, and 7,171 paralogous groups in Thalassiosira, Chlamydomonas, and N. oceanica IMET1, respectively. The top 15 largest paralogous groups in each Nannochloropsis genome range in size from two to seven genes, with a median value of three to four ( ). For example, the largest paralogous group in IMET1 consists of 11 genes (mainly in metabolic process), which is in a sharp contrast with T. pseudonana (46 genes; protein modification process; [8]), Cyanidioschyzon merlae (23 genes; DNA metabolic process; [14]) and C. reinhardtii (150 genes; protein modification process; [7]). As genes from different origins might exhibit relatively low sequence conservation and thus fail to formulate a paralogous group, the reduced sizes of paralogous groups in the Nannochloropsis genomes might result from the integration of multiple genome resources, which is consistent with the proposal that heterokonts originated from multiple secondary endosymbiosis [15]. This observation also suggests that strain-specific gene sequence duplication was relatively rare in Nannochloropsis. On the other hand, it is also possible that Nannochloropsis spp. have adapted to their environment via a strategy of frugality in proteome structure, with paralogous protein-coding genes either emerging less frequently or many of them being lost.

Figure 2

Structure of paralogous groups in each Nannochloropsis strain.

Each circle represents a paralogous group. The area of the circle is proportional to the size of the paralogous group. The top 15 largest paralogous groups in each of the six Nannochloropsis genomes and the other three model microalgae (including T. pseudonana, C. merolae and C. reinhardtii) are shown. The largest paralogous groups in the six Nannochloropsis genomes and the other three model microalgae, including T. pseudonana, C. merolae and C. reinhardtii. The color of the circle represents the functions (as defined by the associated GO Slim terms in biological process) encoded by the paralogous group. The paralogous groups in each Nannochloropsis strain are relatively small in size.

Structure of paralogous groups in each Nannochloropsis strain.

Each circle represents a paralogous group. The area of the circle is proportional to the size of the paralogous group. The top 15 largest paralogous groups in each of the six pan class="Species">Nannochloropsis genomes and the other three model micron>n class="Species">algae (including T. pseudonana, C. merolae and C. reinhardtii) are shown. The largest paralogous groups in the six Nannochloropsis genomes and the other three model microalgae, including T. pseudonana, C. merolae and C. reinhardtii. The color of the circle represents the functions (as defined by the associated GO Slim terms in biological process) encoded by the paralogous group. The paralogous groups in each Nannochloropsis strain are relatively small in size. Among the 8,992 homologous groups from the six pan class="Species">Nannochloropsis genomes (by OrthoMCL [16]; based on amino acid sequence similarity), 1,731 included the genes from all six strains. However, 2,312, 1,515, 1,551 and 1,653 groups included genes from two, three, four and five of the strains, respectively, and thus were “mosaic” groups as they included genes from only a subset, but not all, of the strains. Furthermore, 230 groups were specific to one of the six strains, with 4 to 151 such groups in each strain. The large number of mosaic paralogous groups (7,031 or 78.2% in total) could explain the large size of the n>n class="Species">Nannochloropsis pan-genome, although the numbers of genes and gene groups could be over- or under-estimated due to the presence of alternative splice forms or artifacts of genome assembly.

(IV) Gene dose expansion in each Nannochloropsis genome at selected steps of lipid synthesis pathways

Despite their high structural diversity, each of the six n class="Species">Nannochloropsis genomes exhibits functional features that underlie their oleaginous phenotypes. There is significantly higher gene enrichment for cellular lipid metabolism in each genome than in C. reinhardtii ( , ). In all or most of the Nannochloropsis strains, the subcategories of lipid metabolism are enriched, including glycerolipid metabolism, phospholipid metabolism, lipopolysaccharide metabolism and lipid modification. Metabolic pathways enriched in Nannochloropsis also include organic acid metabolism, precursor generation and sulfur compound metabolism. Genes related to stress response, including responses to DNA damage stimulus, DNA repair and cold stress response, were also enriched in several Nannochloropsis strains. However, the number of genes involved in phosphorus metabolism and cellular macromolecule metabolism was significantly lower in each Nannochloropsis strain than in C. reinhardtii ( ). Thus, the enrichment of gene doses in lipid metabolism pathways and stress response-related pathways appears to be a shared feature of Nannochloropsis genomes and likely underlies their advantageous oleaginous and environmental tolerance traits.

Figure 3

Functional conservation and variation of the Nannochloropsis genomes.

Functional conservation and variation of the Nannochloropsis genomes.

For each genome, the numbers of genes assigned to each GO term and its subcategory terms are shown. The color scheme, defined by the scale bar on the top, represents the degree of relative enrichment or depletion for each functional category as compared to pan class="Species">C. reinhardtii. The p values of enrichment or depletion were calculated using a binomial test corrected by FDR for multiple compan>risons. n>n class="Species">IMET1, N. oceanica IMET1; 531, N. oceanica CCMP531; 529, N. granulata CCMP529; 525, N. oculata CCMP525; 526, N. gaditana CCMP526; 537, N. salina CCMP537; cre, C. reinhardtii. In the n class="Chemical">lipid biosynthesis pathway (the de novo biosynthesis of fatty acids and TAG), a prominent expansion in gene copy number in particular reaction nodes was observed as a shared feature among the six Nannochloropsis strains, despite a genome size only one-fourth of C. reinhardtii. Such enriched genes include those encoding ketoacyl-ACP synthase (KAS, four to five in each Nannochloropsis strain vs. three in C. reinhardtii), acyl-ACP thioesterase (acyl-ACP TE, five vs. one), long-chain fatty acyl-CoA synthetase (LC-FACS, 11–12 vs. seven), phosphatidic acid phosphatase (PAP, five vs. one), and the last two acyltransferases: lysophosphatidyl acyltransferase (LPAT, seven to eight vs. one) and diacylglycerol acyltransferase (DGAT) ( ). Multiple copies of KAS proteins were found in each Nannochloropsis strain for the assembly of type II fatty acid synthases. In addition, six bacterial type I fatty acid synthase genes, each with several conserved functional domains, were identified (compared to only one in C. reinhardtii); phylogenetic analysis revealed that these genes are closely related to polyketide synthases (), yet they might be involved in fatty acid synthesis [17].

Figure 4

Enrichment of lipid biosynthesis genes in each of the six Nannochloropsis strains.

(A) The gene dose expansion in N. oceanica IMET1 as compared to C. reinhardtii. In the schema, enzymes in each reaction node in the Nannochloropsis and C. reinhardtii lipid biosynthesis pathways are represented as yellow and blue circles, respectively. Length of radius stands for gene dose. Putative HGT genes in each node in IMET1 are shown in purple. Chlamydomonas genes were not investigated for HGT events here. (B) The expansion in gene dose was conserved among the Nannochloropsis genomes. Each colored cell in the heatmap represents the gene copy numbers in each of the Nannochloropsis strains and in C. reinhardtii. The scale of the color bar ranges from 1 (the lowest copy number among the genomes) to 13 (the highest copy number).

Enrichment of lipid biosynthesis genes in each of the six Nannochloropsis strains.

(A) The gene dose expansion in pan class="Species">N. oceanica IMET1 as compared to n>n class="Species">C. reinhardtii. In the schema, enzymes in each reaction node in the Nannochloropsis and C. reinhardtii lipid biosynthesis pathways are represented as yellow and blue circles, respectively. Length of radius stands for gene dose. Putative HGT genes in each node in IMET1 are shown in purple. Chlamydomonas genes were not investigated for HGT events here. (B) The expansion in gene dose was conserved among the Nannochloropsis genomes. Each colored cell in the heatmap represents the gene copy numbers in each of the Nannochloropsis strains and in C. reinhardtii. The scale of the color bar ranges from 1 (the lowest copy number among the genomes) to 13 (the highest copy number). Notably, such expansion in gene dose was not ubiquitous along the TAG pathway. For many of the nodes, the gene doses are comparable to those in pan class="Species">C. reinhardtii ( ). These nodes include the n>n class="Chemical">acetyl-CoA carboxylase (ACCase), MCAT, KAR, HAD in fatty acid biosynthesis, GPAT in TAG assembly, and other membrane lipid biosynthesis-related enzymes (such as the MGD and DGD in galactolipid synthesis, SQD in sulfolipid synthesis, BtaA and BtaB in betaine lipid synthesis and EPT in phosphatidylethanolamine synthesis). The expansion of gene dose for the selective steps highlights their crucial roles in channeling carbon flux into TAG synthesis and might be considered a “genomic signature” of oleaginousness.

(V) Evolutionary origin of gene dose expansion in lipid biosynthesis pathway genes

To probe the evolutionary forces expanding the TAG biosynthesis gene repertoire in pan class="Species">Nannochloropsis, we carried out a phylogenomic analysis to investigate the horizontal gene transfer (HGT) events in n>n class="Species">N. oceanica IMET1 genome (; [18]). We identified 99 HGT candidates (; ), accounting for 1.0% of nuclear genes. Among them, the most abundant functions encoded (in terms of GO Slim terms in biological process) included biosynthetic process, small molecule metabolism, cellular nitrogen compound metabolism and lipid metabolism (). HGT appeared to have played an important role in the evolution of oleaginousness loci in these organisms. Totally nine HGT candidates (15.3% of total pan class="Chemical">lipid biosynthesis genes, much higher than average percentage of HGT presence in nuclear genome) were inferred in most of the nodes with increased gene doses, such as KAS, enoyl-ACP reductase (ENR), acyl-ACP TE, LC-FACS and n>n class="Chemical">PAP ( , ). PAP catalyzes the Mg2+-dependent dephosphorylation of phosphatidic acid (PA) to yield diacylglycerol (DAG) and Pi. Both PA (via CDP-DAG) and DAG can enter phospholipid synthesis, and DAG is the direct precursor of TAG. Thus, PAP may control the direction of carbon flux and affect overall cellular lipid synthesis [19]. Five genes encoding PAP enzymes were found in each Nannochloropsis strain: three were conserved in eukaryotes, while the other two were clustered with the bacteria, indicating a bacterial HGT origin (either one HGT followed by gene duplication or multiple horizontal transfers; ). The two horizontally transferred PAP genes exhibited higher transcriptional levels than the eukaryotic ones. The presence of multiple prokaryotic PAP genes suggests complex mechanisms to regulate the substrate preference for the synthesis of various classes and species of lipids. Among the ENR genes in each Nannochloropsis strain, two likely originated by HGT from bacteria into the common ancestor of the six Nannochloropsis strains (, ; suggested by the absence of other heterokonts in the bacterial ENR clade), which were then inherited by each of the Nannochloropsis strains. The most prominent example of gene dose expansion is pan class="Gene">DGAT, which catalyzes the last step of TAG synthesis from n>n class="Chemical">DAG and acyl-CoA [20] and includes DGAT-1 and DGAT-2 [21]. There are 12–13 DGAT in each Nannochloropsis strain (one to two DGAT-1 and 11 DGAT-2), representing the highest dose among known genomes ( ). In contrast, only six and four DGAT are present in C. reinhardtii and the diatom T. thalassiosira, respectively, and even fewer in some other green algae and heterokonts ( ). In IMET1, all the DGAT-1 and DGAT-2 were transcriptionally active [FPKM (Fragments Per Kilobase of exon per Million mapped reads) >1.0]. Phylogenetic analysis of DGAT from selected bacteria, fungi, algae and higher plants revealed extraordinary evolutionary diversities of all 74 DGAT in the six Nannochloropsis strains (). Several observations were apparent. (i) The partition of DGAT-1 and DGAT-2 might have occurred early, likely before the primary endosymbiosis event or even earlier. (ii) The copy number of DGAT-1 was lower (1–2) and less variable than that of DGAT-2, which is consistent in a wide range of organisms from bacteria to land plants. (iii) A similar degree of DGAT-2 dose expansion was observed in all six Nannochloropsis strains ( ). Moreover, for each of the 11 DGAT-2 identified in each strain, the orthologs in the other five strains were all identified and clustered into a phylogenetic group (); the sequence identity between orthologous gene pairs was >98% between the two N. oceanica strains, >80% among N. oceanica, N. oculata and N. granulate, and >65% between N. oceanica IMET1 and the outmost N. gaditana. These results suggested the stable inheritance of DGAT-2 genes in Nannochloropsis evolution. In contrast, DGAT-1 might have experienced species-specific gene loss. For example, no counterparts of DGAT-1B in IMET1 were found in N. salina and N. gaditana despite a high degree of conservation of this gene in the other four strains. (iv) The 11 DGAT-2 genes in IMET1 exhibited relatively low intra-genome pairwise identity (averaging 18%), and each was grouped into a separate paralogous group with its orthologs from the other five Nannochloropsis strains, indicative of distinct and divergent phylogenetic origins of DGAT-2 in Nannochloropsis. Two of the DGAT-2 genes in IMET1 (DGAT-2F and DGAT-2D) exhibited a relatively high protein sequence similarity (identity at 51%), suggesting that the two genes might be derived from a gene duplication event in the Nannochloropsis lineage (). However this individual case of suspected gene duplication cannot account for the expanded dose of DGAT-2 genes in IMET1.

Figure 5

Divergent phylogenetic origins of Nannochloropsis DGAT-2 genes.

(A) Gene doses of DGAT-1 and DGAT-2 in Nannochloropsis and several model organisms. The numbers of DGAT-1 and DGAT-2 genes in each genome are indicated by different colors. N. oceanica IMET1 is marked by red star. (B) Schema illustrating the divergent phylogenetic origins of Nannochloropsis DGAT-2 from the red/green algae-related endosymbionts and potential secondary host. The protein names listed in each individual schematic cell indicate the DGAT-2 genes that were likely to be encoded in the genome. Abbreviations: N, nucleus; C, chloroplast; M, mitochondria.

Divergent phylogenetic origins of Nannochloropsis DGAT-2 genes.

(A) Gene doses of n class="Gene">DGAT-1 and DGAT-2 in Nannochloropsis and several model organisms. The numbers of DGAT-1 and DGAT-2 genes in each genome are indicated by different colors. N. oceanica IMET1 is marked by red star. (B) Schema illustrating the divergent phylogenetic origins of Nannochloropsis DGAT-2 from the red/green algae-related endosymbionts and potential secondary host. The protein names listed in each individual schematic cell indicate the DGAT-2 genes that were likely to be encoded in the genome. Abbreviations: N, nucleus; C, chloroplast; M, mitochondria. To infer the origin of these genes, a comprehensive phylogenetic analysis was carried out among all species with genomes and ESTs available in several public databases (; [22]). pan class="Gene">DGAT-2C showed a phylogeny with strong affiliation with the red n>n class="Species">algae C. merolae and formed a sister group with those from other chromalveolates (). The most plausible explanation for such strong links between Nannochloropsis and red algal DGAT is a red-algae derivation of DGAT-2C through endosymbiotic gene transfer (EGT) in the secondary endosymbiosis event (which was proposed as the evolutionary mechanism through which the common ancestor of chromalveolates acquired chloroplasts from a red algae-related endosymbiont [23]). On the other hand, four DGAT-2 genes, including DGAT-2A, DGAT-2B, DGAT-2I and DGAT-2G, are clustered with their counterparts from green algae as well as other chromalveolates (), suggesting a green algal origin of these DGAT-2 genes. This is consistent with the hypothesis of a green algae related endosymbiont residing in the common ancestor of chromalveolates [15]. Three (DGAT-2A, DGAT-2I and DGAT-2C) of the above five red-lineage (red algae derived) and green-lineage (green algae derived) DGAT-2 genes were predicted to harbor chloroplast targeting signals, supporting their ancestral derivation from the endosymbionts. The higher dose of green- than red-lineage DGAT-2 in each of the Nannochloropsis strains suggests a more significant contribution of the green lineage to the oleaginous traits of modern Nannochloropsis. Furthermore, phylogenetic trees of the other six DGAT-2 genes did not exhibit unambiguous relationships with those from red or green algae and are thus referred to as “unresolved.” It is possible that several of these genes originated from the secondary host [24], as four (DGAT-2D, DGAT-2E, DGAT-2F and DGAT-2H) of the six genes were predicted to be located in the endoplasmic reticulum (ER) or cytosol. Thus, the observed sequence divergence of the 74 DGAT genes (eight type I and 66 type II) in the six Nannochloropsis genomes mainly resulted from their diverse origins from the red- or green-algae–related endosymbionts (through EGT) and the secondary host ( ). Phylogenetic evidence also supports a green endosymbiont origin for one gene encoding MCAT (s00247.g6828) in pan class="Chemical">fatty acid biosynthesis (). Thus, the diverse evolutionary origin of the n>n class="Species">Nannochloropsis DGAT-2s and one of the other lipid synthesis genes has underlain their massive genetic pools and likely contributed to the extraordinary capacity for TAG synthesis in present-day strains. In addition to their diverse origins, differentiation in selective pressure appeared to underlie the sequence divergence of pan class="Gene">DGAT and other members of n>n class="Chemical">lipid-related gene families. DGAT genes in Nannochloropsis were generally under strong purifying selective pressure (Ka/Ks typically under 0.1). However, higher Ka/Ks ratios were observed in the red-lineage DGAT-2C of 0.11 (). No significant difference in the ratio was found between the green-lineage and secondary-host–originated DGAT-2. Furthermore, DGAT-2C with the highest Ka/Ks ratio was among the DGAT-2 genes with the lowest transcriptional level under normal growing conditions, while DGAT-2J with the lowest ratio was one of the most transcribed DGAT-2 (second only to DGAT-2A). These findings add further support to the negative correlation between transcriptional level and selective pressure in the evolution of Nannochloropsis genes.

(VI) Enrichment of glycoside hydrolase genes and their evolutionary origin

Dramatic enrichment of pan class="Chemical">glycoside hydrolase (GH) genes accompanied by a reduced pool of n>n class="Chemical">glycoside synthase genes (as compared to C. reinhardtii) was also observed in each of the Nannochloropsis genomes. C. reinhardtii harbors seven starch synthase genes for starch production and two 1,3-β-glucan synthase genes; however, each Nannochloropsis encodes just one 1,3-β-glucan synthase gene (which might convert glucose into the polysaccharide chrysolaminarin or laminarin), and no starch synthase genes were found. Conversely, 48–49 GH genes were found in each strain (Dataset S4), with a gene dose per Mb of genome 6–7 fold higher than that of C. reinhardtii (27 GH genes). These Nannochloropsis genes were from 13 different GH families, dominated by GH2 (13 members), GH9, GH3 and GH1 families with over four members. Surprisingly, there were only three genes in the GH16 family, which specifically hydrolyzes the glycosidic bond of 1,3-β-glucan, while GH16 was the dominant group in C. reinhardtii, with five members. In IMET1, 91.7% of the 48 GH genes were transcriptionally active at each of the time points under both N-replete and N-depleted culture conditions (FPKM>1.0; ). Among them, 16 exhibited significant variations at the transcriptional level under N-depleted conditions (10 with increased transcription), including two members of the GH2_C family and one GH17 gene with a significant increase in transcription (fold-change >1.5) from 3 h and 6 h after the onset of N-depletion and one GH9 gene down-regulated under the same conditions. The monosaccharides released from GH that catalyzed hydrolysis of polysaccharides may be used in glycolysis to produce acetyl-CoA and ATP for fatty acid synthesis. Among the 48 GH genes in each strain that were conserved among the six genomes, 16 were inherited from the common ancestor of pan class="Species">heterokonts, as their homologs were found in the diatoms Phytophthora and Ectocarpus. Another five GH genes were likely to have originated from bacteria via HGT. Among these, three GH8 genes inferred to be horizontally acquired from cellulose-digesting Clostridium-like bacterium were absent in other sequenced unicellular n>n class="Species">algae. The remaining 27 GH genes were phylogenetically closest to homologs in animals, insects or multicellular fungi, such as the nine putative cellulase genes that were most similar to those in the nematode Pristionchus, indicating HGT events with donors being Nannochloropsis-like organisms [25]. In addition, N. granulata and N. salina each possessed one strain-specific GH gene that might have been introduced after their speciation.

Discussion

Micropan class="Species">algae, which are primarily unicellular, aquatic and photosynthetic eukaryotes, are phylogenetically diverse. They are responsible for over 45% of our planet's annual net primary biomass [26]. The Nannochloropsis genomes studied here, one of the first such datasets for microalgae, reveal the nature and degree of genome divergence and dynamics at the strain, species and genus level. They could serve as an initial framework for genome-wide association studies, while the genome-derived nuclear gene markers should be useful for highly sensitive typing of strains. The genomes of the six oleaginous pan class="Species">Nannochloropsis strains presented here are of relatively small size and high coding potential and many fewer mobile elements compared to many previously sequenced micron>n class="Species">algae [9]. The large size of the Nannochloropsis pan-genome can be partially traced to the large number of mosaic paralogous groups, which further suggests a significant degree of species-specific gene loss during Nannochloropsis evolution. On the other hand, the small core genome size and the large number of mosaic homologous gene clusters among the Nannochloropsis spp. suggest that, as one moves down the tree of life for stramenopiles, the number of shared genes reduces quickly and is replaced by lineage-specific gene gains and losses. The core genes generally exhibit lower Ka/Ks ratio but higher transcriptional levels than non-core genes, suggesting their roles in shaping the evolution of microalgal genes. Our findings, together with observations in yeasts [10], [27], revealed a link that is conserved in unicellular eukaryotes in terms of gene function, selective pressure, transcriptional level and gene essentiality. Despite the high sequence diversity of protein-coding genes, the six n class="Species">Nannochloropsis genomes shared a genus-level oleaginousness signature that included enrichment of selective lipid biosynthesis genes and certain glycoside hydrolases that potentially shift carbon flux from storage carbohydrate to TAG synthesis. It is quite remarkable that these gene expansions have occurred despite their significant genome shrinkage relative to other microalgae such as C. reinhardtii. Different mechanisms have underlain the emergence of the signature. Multiple-genome pooling was particularly evident for the 11 DGAT-2 in each strain, which were all transcriptionally expressed during TAG synthesis and apparently originated from at least three ancient genomes: (i) the engulfed green algae, (ii) the engulfed red algae and (iii) the host cell in the secondary endosymbiosis. Chromalveolates include both photosynthetic (e.g. diatoms and Eustigmatophyceae that include n class="Species">Nannochloropsis) and non-photosynthetic taxa (e.g., Phytophthora). The chromalveolate hypothesis suggests that the common ancestor of Chromalveolates originated via an eukaryotic host (i.e., the secondary host) engulfing a red alga (as the secondary plastid) in an ancient secondary endosymbiosis event [23]. The presence of a large number of “green genes” in the diatom nuclear genomes has been interpreted as evidence of a cryptic prasinophyte-like secondary endosymbiosis before the red algae intake [15]. Though confounded by potential sampling bias against red algae and artifacts in phylogenetic analysis [28], this hypothesis was supported by the 172 membrane transporter genes showing potential origins from green or red algae in a relatively strict phylogenomic analysis [22]. Moreover, genomes of the cryptophyte alga Guillardia theta and the chlorarachniophyte alga Bigelowiella natans also contain hundreds of genes with a phylogenetic affiliation to red or green algae [24]. Our search of DGAT-2 in publicly available red algae genomes (and ESTs) returned one DGAT-2 each from Cyanidioschyzon merolae, Galdieria sulphuraria [18] and Porphyridium purpureum [29]. The paucity of DGAT-2 in red algal genomes and the distinct features of these genes in the six Nannochloropsis genomes (the greatly expanded copy number, large pair-wise sequence divergence, rare gene duplication events, and absence of mobile elements or evidence for HGT in each of the DGAT-2 loci) suggested multiple-genome pooling as the cause for the massive DGAT pool in Nannochloropsis spp. These findings also provided additional support for the existence of a green algae–associated secondary endosymbiosis in the evolutionary history of chromalveolates. Furthermore, among the six n class="Species">Nannochloropsis strains, the inheritance of each DGAT-2 was highly conserved in that no strain-specific duplications or losses were found for any DGAT-2 in each of the six strains, and the genes have been under strong negative selection. In contrast, diatoms such as T. pseudonana (believed to have also experienced the multiple secondary endosymbiosis [15]) encode many fewer DGAT-2; only four DGAT-2 were identified, and all were predicted to be from the green algae–related endosymbiont and the secondary host, with none from red lineage. The absence of Thalassiosira genes in certain gene-phylogeny clusters (e.g., the red-lineage DGAT-2C) in the diatom, in contrast to the presence of these genes in Nannochloropsis and many other heterokonts, suggests the loss of DGAT-2 in diatoms. Thus, such strict inheritance and stable maintenance of the large reservoir of DGAT-2 from multiple lineages seem to be Nannochloropsis-specific. It also suggests the essentiality of each DGAT-2 and its possible functional complementarity in the cell. In addition, HGT primarily from bacteria were found in the majority of the gene dose-expanded pan class="Chemical">lipid synthesis nodes and in many n>n class="Chemical">glycoside hydrolases. In the red alga Galdieria sulphuraria, 5% of protein-coding genes were acquired from bacteria and archaea via HGT, which forged its adaptation to a thermophilic and metal-rich environment [18]. The HGT events in Nannochloropsis likely reflected an organismal adaptation to a niche that favored oleaginousness and glycoside hydrolysis. Therefore, the multiple-genome pooling and horizontal genetic exchange from bacteria, together with the selective inheritance of pan class="Chemical">lipid synthesis genes and species-specific gene loss, might have underlain the enormous genetic appan>ratus for oleaginousness and led to the structural divergence and functional conservation observed among present-day n>n class="Species">Nannochloropsis. In many organisms, other mechanisms such as gene and genome duplications may play an important role in supplying new genetic materials for organismal adaptation [30] and have been frequently proposed as drivers of the emergence of particular traits in bacteria [31], [32], fungi [33], [34], plants [35] and animals [36]. Thus, the extraordinary origin and evolution of oleaginous traits in Nannochloropsis have important implications in the selection and genetic engineering of such traits in these and other microalgae of economic interest.

Methods

Data files

All genomic data for this study, including the assembled genomes and mRNA-Seq data, were deposited at NCBI. The BioProject accessions for assembled genomes were: PRJNA202418 for pan class="Species">N. oceanica IMET1, PRJNA65107 for n>n class="Species">N. oculata CCMP525, PRJNA65111 for N. granulata CCMP529, PRJNA65113 for N. oceanica CCMP531 and PRJNA62503 for N. salina CCMP537. The mRNA-Seq data were deposited at SRA under SRP032930.

Sequencing of the Nannochloropsis genomes and the transcriptomes of N. oceanica IMET1

Five new n class="Species">Nannochloropsis genomes were sequenced in this work ( ; ). For Nannochloropsis oceanica IMET1, both shotgun sequencing data and paired-end data with different pair distances from 454 Titanium and Illumina GAIIx were collected. Newbler (Roche) was used for initial assembly. Gap-filling and scaffold-building were performed with Illumina data, followed by manual manipulation and sorting of contigs. Genes were predicted by combining the ab initio predictions with predictions based on mRNA-Seq read alignments (387K aligned cDNA reads from a Roche 454 Sequencer) by AUGUSTUS (v2.5). For each of the other four Nannochloropsis strains (Table S1), paired GAIIx reads were assembled using Velvet with specified insert sizes. The previously published genome sequence of N. gaditana CCMP526 [5] was downloaded from http://Nannochloropsis.genomeprojectsolutions-databases.com/. Gene models of each of the six genomes were predicted using two different ab initio gene predictors (AUGUSTUS and GeneID) followed by a combination of gene models using EVidenceModeler (EVM) with a 1∶1 weight ratio. For all strains, predicted protein-coding genes were annotated via searching against three databases: the NCBI NR and KEGG databases by BlastP, and the Gene Ontology database by InterProScan. GO terms were mapped to the GO slim hierarchy proposed by the GO consortium by a customized script (http://www.bioenergychina.org/fg/d.wang_scripts/). For collecting the transcriptomics datasets underlying TAG production, pan class="Species">N. oceanica IMET1 was cultivated in f/2 liquid medium [37] with 4 mM n>n class="Chemical">NO3 − under continuous light at 50 µmol photons m−2 s−1. Mid-logarithmic phase algal cells were inoculated in nitrogen-replete and nitrogen-depleted conditions, respectively. Total RNA were collected at 3, 6 and 24 h after each inoculation and pooled together for full-length cDNA sequencing in 454 Titanium. The data produced were subsequently used for gene prediction. Furthermore, total RNA from each of the aforementioned control (nitrogen-replete) and nitrogen-starvation conditions along the time points of 3, 6 and 24 h after the onset of nitrogen depletion (six samples under each condition) were loaded for mRNA-Seq in Illumina GAIIx.

Phylogeny analysis

n class="Species">Nannochloropsis core genes were identified as the intersections of the five “IMET1 pairwise cores,” which were obtained by searching IMET1 proteins via BlastP and tBlastN against the proteome and the genome, respectively, of each of the other five strains with an e-value cutoff of 1e-5 and a protein sequence identity cutoff of 80%. Paralogous groups among these six strains were identified by a Markov Clustering algorithm (OrthoMCL [16], v. 4) with an inflation index of 1.5. PAML (v. 4.4c) codon substitution models and likelihood ratio tests (codeml) were used to estimate the selective pressure. An identical method was applied in the establishment of paralogous groups among other model microalgae. HGT candidates were inferred following the method in the genomic analysis of n class="Species">Galdieria sulphuraria (; [18]). Phylogenetic trees for each of the putative HGT genes in NEWICK format were deposited in . The phylogenetic tree for each HGT candidate was manually checked and only accepted when a clear pattern of HGT was observed in both Neighbor Joining (NJ) and Maximum Likelihood (ML) trees. To deduce the evolutionary origins of lipid biosynthesis-related genes, we first implemented the strategy described in Chan et al. [22] to build a comprehensive database and to construct the homologous groups for each lipid synthesis gene, except that we collected more recently published genomes and EST datasets updated in public databases, including genomes of the red algae G. sulphuraria [18], Chondrus crispus [38] and Porphyridium purpureum [29]. In the following phylogenetic analysis, phylogenies for the homologous group of each lipid synthesis gene were constructed in MEGA5 by both NJ and ML methods. A gene was inferred to be potentially derived from a green or red algae related secondary endosymbiont when the phylogeny was supported by both NJ and ML trees. For a comprehensive and detailed description of the methods, please refer to . Candidate nuclear-genome phylogenetic markers in pan class="Species">Nannochloropsis. (XLS) Click here for additional data file. Genes from Figure 3 annotated with Gene Ontology terms in biological processes. Genes from each pan class="Species">Nannochloropsis strain and n>n class="Species">C. reinhardtii are listed respectively in an accompanying Excel worksheet. (XLS) Click here for additional data file. Putative horizontally transferred genes in the pan class="Species">N. oceanica IMET1 genome. Each line represents a HGT candidate. Both NJ and ML trees in NEWICK format for each HGT candidate are included in the dataset. Species names are shown at the end of each branch. Characters in parentheses represent the classification of each species: B, bacteria; B_cyano, cyanobacteria; E, eukaryotes; E_M, metazoan; E_P, plant; E_p_a, eukaryotic n>n class="Species">algae; E_F, fungi. HGT candidates related to the lipid biosynthesis pathway (as noted in and ) are highlighted. (XLS) Click here for additional data file. pan class="Chemical">Glycoside hydrolase genes encoded in the pan class="Species">N. oceanica IMET1 genome. (XLS) Click here for additional data file. Total pan class="Chemical">lipid content of the six Nannochloropsis strains under normal growth conditions. Information for N. gaditana CCMP526 was cited from Radakovits et al. [5]. (PDF) Click here for additional data file. Phylogenetic relationship of n class="Species">Nannochloropsis and other microalgal lineages. (A) Phylogeny based on 18S ribosomal DNA sequences. Higher plants were used as an outgroup. Nannochloropsis strains included in this phylogenomic analysis were underlined. (B) Phylogeny based on orthologs derived from whole-genome comparisons among Nannochloropsis oceanica IMET1, Thalassiosira pseudonana, Cyanidioschyzon merolae, Chlamydomonas reinhardtii and Arabidopsis thaliana. (PDF) Click here for additional data file. Pulsed-field gel electrophoresis (PFGE) analysis of pan class="Species">Nannochloropsis oceanica IMET1 chromosomal DNA. Asterisks (*) indicate those bands that likely represent two or more chromosomes. Three DNA size standards that include Saccharomyces cerevisiae (240–2,200 Kb, Marker A), Hansenula wingei (1–3.1 Mb, Marker B), Schizosaccharomyces pombe (3.5–5.7 Mb, Marker C) were used to estimate chromosome sizes. (A) PFGE profile of n>n class="Species">IMET1 chromosomes. The band profile as analyzed by ImageJ is shown on the right. (B) PFGE profile of chromosomes larger than 2 Mb. (PDF) Click here for additional data file. Functional categories of pan class="Species">Nannochloropsis core genes. (A) GO Slim categories in “molecular function”. (B) GO Slim categories in “cellular component”. (PDF) Click here for additional data file. Metabolic conservation and divergence among the six n class="Species">Nannochloropsis strains. Metabolic pathways were represented in IPATH scheme tools. Red edges indicate KEGG IDs identified in both Nannochloropsis core genes and accessory genes. Blue edges indicate those exclusively identified in core genes. Green edges indicate those exclusively identified in accessory genes. (PDF) Click here for additional data file. Functional categories of strain-specific genes in pan class="Species">N. oceanica IMET1. The GO categories (in biological process) of strain-specific genes in each strain were presented. (PDF) Click here for additional data file. Ka/Ks of core and accessory gene sets and the link to their transcriptional level in pan class="Species">N. oceanica. For each of the five-strain orthologous gene sets (excluding the outmost n>n class="Species">N. gaditana) from the Nannochloropsis core and accessory genes, PAML model M0 was used to estimate a single ω (Ka/Ks, ratio of non-synonymous to synonymous nucleotide divergence) as a measure of the selective pressure that the gene sets were under. (A) Different Ka/Ks of core and accessory proteins. (B) Selective pressure on the functional categories associated with Nannochloropsis core and accessory genes. GO slim terms in biological process with at least three genes associated are shown. Yellow box-plot indicates smallest observation (sample minimum), lower quartile (Q1), upper quartile (Q3), largest observation (sample maximum) and outlier(s) of ω values in a functional category. (C) Negative correlation between the Ka/Ks and the transcriptional level of a gene. (D) Different transcriptional levels of core and accessory genes. (PDF) Click here for additional data file. Selective pressure on protein-coding genes in pan class="Species">Nannochloropsis spp. For each of the 1,085 six-way single-copy orthologous gene sets, pan class="Chemical">PAML model M0 was used to estimate a single ω (Ka/Ks, ratio of non-synonymous to synonymous nucleotide divergence) that is fixed across the reconstructed whole-genome phylogeny. The associated GO slim terms that have at least five genes are shown for (A) Biological Process, (B) Molecular Function and (C) Cellular Component. The red dots represent the median ω in a functional category. The yellow box-plot shows the smallest observation (sample minimum), lower quartile (Q1), upper quartile (Q3), largest observation (sample maximum) and outlier(s) of ω values in a functional category. (PDF) Click here for additional data file. Conservation of n class="Species">IMET1 genes in the other five Nannochloropsis strains. (A) A histogram of the number of IMET1 genes conserved in a series of strain sets. The diagonal in the two-strain sets represents genes that were exclusively conserved in N. oceanica species; the diagonal in the four-strain sets represents genes that were exclusively conserved in N. oceanica (IMET1 and CCMP531), N. granulata and N. oculata. (B) Functional categories in the GO slim hierarchy of genes that were exclusively conserved in N. oceanica, N. granulata and N. oculata. Categories with percentages less than 1% are pooled and presented as “Others.” (PDF) Click here for additional data file. Phylogeny of type I n class="Chemical">fatty acid synthase. Multiple sequence alignments among six putative PKS proteins from IMET1 (indicated by a red square), as well as type I PKS proteins and type I fatty acid synthases (FAS) from bacteria, as indicated by the species name, were generated with ClustalW. The phylogenetic tree was constructed using the neighbor-joining method in MEGA5 with a bootstrap test (based on 100 replicates). (PDF) Click here for additional data file. Functional categories of horizontally transferred genes in pan class="Species">N. oceanica IMET1. (A) The GO Slim categories (in biological process) of the HGT genes in n>n class="Species">IMET1. (B) The GO Slim categories (in molecular function) of the HGT genes in IMET1. (PDF) Click here for additional data file. Phylogenies by the NJ method for n class="Chemical">lipid synthesis genes that were inferred to have originated from HGT. MUSCLE was used to perform multiple alignments for each of the lipid synthesis genes (indicated by a red star) and their orthologous genes generated by Inparanoid Program. Neighbor-joining trees were constructed in MEGA5 with the bootstrapping method (based on 100 replicates), using the passion correction model to calculate evolutionary distances. Branches of bacterial sequences (with the exception of cyanobacteria) are highlighted in red. (A) Phylogeny of KAS gene (ID: s00303.g8736) gene; (B) phylogeny of KAR gene (s00043.g2007); (C) phylogeny of ENR genes (s00007.g154, s00295.g8627); (D) phylogeny of the acyl-ACP TE gene (s00355.g10346); (E) phylogeny of LCFACS genes (s00262.g7492, scaffold00341.g9817); (F) phylogeny of PAP genes (s00058.g2171, s00127.g4338). (PDF) Click here for additional data file. Phylogenies by the ML method for n class="Chemical">lipid synthesis genes that were inferred to have originated from HGT. MUSCLE was used to perform multiple alignments for each of the lipid synthesis genes (indicated by a red star) and their orthologous genes generated by Inparanoid Program. ML trees were constructed in PhyML program by the best protein evolution model selected by ProtTest, using 100 bootstrapping replicates. Branches of bacterial sequences (with the exception of cyanobacteria) are highlighted in red. (A) Phylogeny of KAS gene (ID: s00303.g8736) gene; (B) phylogeny of KAR gene (s00043.g2007); (C) phylogeny of ENR genes (s00007.g154, s00295.g8627); (D) phylogeny of the acyl-ACP TE gene (s00355.g10346); (E) phylogeny of LCFACS genes (s00262.g7492, scaffold00341.g9817); (F) phylogeny of PAP genes (s00058.g2171, s00127.g4338). (PDF) Click here for additional data file. Phylogeny of n class="Gene">DGAT genes in the six Nannochloropsis genomes. DGAT genes from all the six Nannochloropsis strains, as well as lineages representing bacteria, fungi, microalgae and plants, were aligned in ClustalW. A phylogenetic tree was constructed using the neighbor-joining method in MEGA5 with a bootstrap test (based on 100 replicates). Orthologs (from the six Nannochloropsis strains) of each DGAT in IMET1 are indicated with the same color. The Ka/Ks value of each orthologous group of DGAT-2 in Nannochloropsis is shown next to the Group ID. (PDF) Click here for additional data file. Phylogenies by the NJ method for n class="Chemical">lipid synthesis genes that were inferred to have originated from secondary endosymbionts. DGAT-2 from N. oceanica IMET1 are indicated by red stars. (A) Phylogeny of DGAT-2C; (B) phylogeny of DGAT-2B; (C) phylogeny of DGAT-2A; (D) phylogeny of DGAT-2I; (E) phylogeny of DGAT-2G; (F) phylogeny of MCAT. (PDF) Click here for additional data file. Phylogenies by the ML method for n class="Chemical">lipid synthesis genes that were inferred to have originated from secondary endosymbionts. DGAT-2 from N. oceanica IMET1 are indicated by red stars. (A) Phylogeny of DGAT-2C; (B) phylogeny of DGAT-2B; (C) phylogeny of DGAT-2A; (D) phylogeny of DGAT-2I; (E) phylogeny of DGAT-2G; (F) phylogeny of MCAT. (PDF) Click here for additional data file. Genomic and transcriptomic datasets for the six n class="Species">Nannochloropsis species and strains. (A) Genomic DNA sequencing data for N. oceanica IMET1. (B) Genomic DNA sequencing data for the other six Nannochloropsis strains. (C) cDNA sequencing data for N. oceanica IMET1. The cDNA sequencing was performed on 454 Titanium. (D) mRNA-Seq data for N. oceanica IMET1. SG, shotgun; PE, pair-end; MP, mate-pair. *: Illumina GAIIx–based transcriptome sequencing was performed on total mRNA samples isolated from microalgal cells under both the control growth conditions (indicated as “C”) and the N-depleted conditions (indicated as “N”) at three different time points (3, 6 and 24 h) for each condition. (DOC) Click here for additional data file. Comparing the genomic features of n class="Species">N. oceanica IMET1 and five unicellular eukaryotic microalgae. Parameters of the genomes other than IMET1 were retrieved from the literature. (DOC) Click here for additional data file. Detailed descriptions of materials and methods. (DOC) Click here for additional data file. The selective pressure that drives genome diversity in pan class="Species">Nannochloropsis spp. (DOC) Click here for additional data file.

38 in total

1. Essential genes are more evolutionarily conserved than are nonessential genes in bacteria.

Authors: I King Jordan; Igor B Rogozin; Yuri I Wolf; Eugene V Koonin
Journal: Genome Res Date: 2002-06 Impact factor: 9.043

2. Gene loss, protein sequence divergence, gene dispensability, expression level, and interactivity are correlated in eukaryotic evolution.

Authors: Dmitri M Krylov; Yuri I Wolf; Igor B Rogozin; Eugene V Koonin
Journal: Genome Res Date: 2003-10 Impact factor: 9.043

Review 3. The evolution of modern eukaryotic phytoplankton.

Authors: Paul G Falkowski; Miriam E Katz; Andrew H Knoll; Antonietta Quigg; John A Raven; Oscar Schofield; F J R Taylor
Journal: Science Date: 2004-07-16 Impact factor: 47.728

4. Genome sequence of the ultrasmall unicellular red alga Cyanidioschyzon merolae 10D.

Authors: Motomichi Matsuzaki; Osami Misumi; Tadasu Shin-I; Shinichiro Maruyama; Manabu Takahara; Shin-Ya Miyagishima; Toshiyuki Mori; Keiji Nishida; Fumi Yagisawa; Keishin Nishida; Yamato Yoshida; Yoshiki Nishimura; Shunsuke Nakao; Tamaki Kobayashi; Yu Momoyama; Tetsuya Higashiyama; Ayumi Minoda; Masako Sano; Hisayo Nomoto; Kazuko Oishi; Hiroko Hayashi; Fumiko Ohta; Satoko Nishizaka; Shinobu Haga; Sachiko Miura; Tomomi Morishita; Yukihiro Kabeya; Kimihiro Terasawa; Yutaka Suzuki; Yasuyuki Ishii; Shuichi Asakawa; Hiroyoshi Takano; Niji Ohta; Haruko Kuroiwa; Kan Tanaka; Nobuyoshi Shimizu; Sumio Sugano; Naoki Sato; Hisayoshi Nozaki; Naotake Ogasawara; Yuji Kohara; Tsuneyoshi Kuroiwa
Journal: Nature Date: 2004-04-08 Impact factor: 49.962

5. The genome of the diatom Thalassiosira pseudonana: ecology, evolution, and metabolism.

Authors: E Virginia Armbrust; John A Berges; Chris Bowler; Beverley R Green; Diego Martinez; Nicholas H Putnam; Shiguo Zhou; Andrew E Allen; Kirk E Apt; Michael Bechner; Mark A Brzezinski; Balbir K Chaal; Anthony Chiovitti; Aubrey K Davis; Mark S Demarest; J Chris Detter; Tijana Glavina; David Goodstein; Masood Z Hadi; Uffe Hellsten; Mark Hildebrand; Bethany D Jenkins; Jerzy Jurka; Vladimir V Kapitonov; Nils Kröger; Winnie W Y Lau; Todd W Lane; Frank W Larimer; J Casey Lippmeier; Susan Lucas; Mónica Medina; Anton Montsant; Miroslav Obornik; Micaela Schnitzler Parker; Brian Palenik; Gregory J Pazour; Paul M Richardson; Tatiana A Rynearson; Mak A Saito; David C Schwartz; Kimberlee Thamatrakoln; Klaus Valentin; Assaf Vardi; Frances P Wilkerson; Daniel S Rokhsar
Journal: Science Date: 2004-10-01 Impact factor: 47.728

6. Evolutionary implications of bacterial polyketide synthases.

Authors: Holger Jenke-Kodama; Axel Sandmann; Rolf Müller; Elke Dittmann
Journal: Mol Biol Evol Date: 2005-06-15 Impact factor: 16.240

7. Functional genomic analysis of the rates of protein evolution.

Authors: Dennis P Wall; Aaron E Hirsh; Hunter B Fraser; Jochen Kumm; Guri Giaever; Michael B Eisen; Marcus W Feldman
Journal: Proc Natl Acad Sci U S A Date: 2005-03-30 Impact factor: 11.205

8. OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups.

Authors: Feng Chen; Aaron J Mackey; Christian J Stoeckert; David S Roos
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971

9. Correlations between gene expression and gene conservation in fission yeast.

Authors: Juan Mata; Jürg Bahler
Journal: Genome Res Date: 2003-11-12 Impact factor: 9.043

10. Genome of the red alga Porphyridium purpureum.

Authors: Debashish Bhattacharya; Dana C Price; Cheong Xin Chan; Huan Qiu; Nicholas Rose; Steven Ball; Andreas P M Weber; Maria Cecilia Arias; Bernard Henrissat; Pedro M Coutinho; Anagha Krishnan; Simone Zäuner; Shannon Morath; Frédérique Hilliou; Andrea Egizi; Marie-Mathilde Perrineau; Hwan Su Yoon
Journal: Nat Commun Date: 2013 Impact factor: 14.919

52 in total

1. Diel oscillation in the optical activity of carotenoids in the absorption spectrum of Nannochloropsis.

Authors: Gabriela C Possa; Hugo Santana; Bruno S A F Brasil; Luiz F Roncaratti
Journal: J Mol Model Date: 2017-02-28 Impact factor: 1.810

2. Chimeric origins of ochrophytes and haptophytes revealed through an ancient plastid proteome.

Authors: Richard G Dorrell; Gillian Gile; Giselle McCallum; Raphaël Méheust; Eric P Bapteste; Christen M Klinger; Loraine Brillet-Guéguen; Katalina D Freeman; Daniel J Richter; Chris Bowler
Journal: Elife Date: 2017-05-12 Impact factor: 8.140

3. Genomic Foundation of Starch-to-Lipid Switch in Oleaginous Chlorella spp.

Authors: Jianhua Fan; Kang Ning; Xiaowei Zeng; Yuanchan Luo; Dongmei Wang; Jianqiang Hu; Jing Li; Hui Xu; Jianke Huang; Minxi Wan; Weiliang Wang; Daojing Zhang; Guomin Shen; Conglin Run; Junjie Liao; Lei Fang; Shi Huang; Xiaoyan Jing; Xiaoquan Su; Anhui Wang; Lili Bai; Zanmin Hu; Jian Xu; Yuanguang Li
Journal: Plant Physiol Date: 2015-10-20 Impact factor: 8.340

4. Light Remodels Lipid Biosynthesis in Nannochloropsis gaditana by Modulating Carbon Partitioning between Organelles.

Authors: Alessandro Alboresi; Giorgio Perin; Nicola Vitulo; Gianfranco Diretto; Maryse Block; Juliette Jouhet; Andrea Meneghesso; Giorgio Valle; Giovanni Giuliano; Eric Maréchal; Tomas Morosinotto
Journal: Plant Physiol Date: 2016-06-20 Impact factor: 8.340

Review 5. Advanced genetic tools enable synthetic biology in the oleaginous microalgae Nannochloropsis sp.

Authors: Eric Poliner; Eva M Farré; Christoph Benning
Journal: Plant Cell Rep Date: 2018-03-06 Impact factor: 4.570

6. The Microalga Nannochloropsis during Transition from Quiescence to Autotrophy in Response to Nitrogen Availability.

Authors: Agnieszka Zienkiewicz; Krzysztof Zienkiewicz; Eric Poliner; Jane A Pulman; Zhi-Yan Du; Giovanni Stefano; Chia-Hong Tsai; Patrick Horn; Ivo Feussner; Eva M Farre; Kevin L Childs; Federica Brandizzi; Christoph Benning
Journal: Plant Physiol Date: 2019-11-18 Impact factor: 8.340

7. Ultrastructure and composition of the Nannochloropsis gaditana cell wall.

Authors: Matthew J Scholz; Taylor L Weiss; Robert E Jinkerson; Jia Jing; Robyn Roth; Ursula Goodenough; Matthew C Posewitz; Henri G Gerken
Journal: Eukaryot Cell Date: 2014-09-19

8. Polar Lipids Analysis of Cultured Phytoplankton Reveals Significant Inter-taxa Changes, Low Influence of Growth Stage, and Usefulness in Chemotaxonomy.

Authors: José Pedro Cañavate; Isabel Armada; Ismael Hachero-Cruzado
Journal: Microb Ecol Date: 2016-11-11 Impact factor: 4.552

9. Betaine Lipid Is Crucial for Adapting to Low Temperature and Phosphate Deficiency in Nannochloropsis.

Authors: Hiroki Murakami; Takashi Nobusawa; Koichi Hori; Mie Shimojima; Hiroyuki Ohta
Journal: Plant Physiol Date: 2018-03-19 Impact factor: 8.340

10. Choreography of Transcriptomes and Lipidomes of Nannochloropsis Reveals the Mechanisms of Oil Synthesis in Microalgae.

Authors: Jing Li; Danxiang Han; Dongmei Wang; Kang Ning; Jing Jia; Li Wei; Xiaoyan Jing; Shi Huang; Jie Chen; Yantao Li; Qiang Hu; Jian Xu
Journal: Plant Cell Date: 2014-04-01 Impact factor: 11.277