Genome evolution of a nonparasitic secondary heterotroph, the diatom Nitzschia putrida.

Ryoma Kamikawa1, Takako Mochizuki2, Mika Sakamoto2, Yasuhiro Tanizawa2, Takuro Nakayama3, Ryo Onuma4, Ugo Cenci5, Daniel Moog6,7, Samuel Speak8, Krisztina Sarkozi8, Andrew Toseland8, Cock van Oosterhout8, Kaori Oyama9, Misako Kato9, Keitaro Kume10, Motoki Kayama11, Tomonori Azuma11, Ken-Ichiro Ishii11, Hideaki Miyashita11, Bernard Henrissat12,13,14, Vincent Lombard12,13, Joe Win15, Sophien Kamoun15, Yuichiro Kashiyama16, Shigeki Mayama17, Shin-Ya Miyagishima4, Goro Tanifuji18, Thomas Mock8, Yasukazu Nakamura2.   

Abstract

Secondary loss of photosynthesis is observed across almost all plastid-bearing branches of the eukaryotic tree of life. However, genome-based insights into the transition from a phototroph into a secondary heterotroph have so far only been revealed for parasitic species. Free-living organisms can yield unique insights into the evolutionary consequence of the loss of photosynthesis, as the parasitic lifestyle requires specific adaptations to host environments. Here, we report on the diploid genome of the free-living diatom Nitzschia putrida (35 Mbp), a nonphotosynthetic osmotroph whose photosynthetic relatives contribute ca. 40% of net oceanic primary production. Comparative analyses with photosynthetic diatoms and heterotrophic algae with parasitic lifestyle revealed that a combination of gene loss, the accumulation of genes involved in organic carbon degradation, a unique secretome, and the rapid divergence of conserved gene families involved in cell wall and extracellular metabolism appear to have facilitated the lifestyle of a free-living secondary heterotroph.

Year:  2022        PMID: 35486731      PMCID: PMC9054022          DOI: 10.1126/sciadv.abi5075

Source DB:  PubMed          Journal:  Sci Adv        ISSN: 2375-2548            Impact factor:   14.957


INTRODUCTION

The loss of photosynthesis in photoautotrophs is successful if compensated by a competitive advantage arising from the availability of an extracellular energy source. Hence, many secondary heterotrophs evolve as parasites (–), relying on sufficient resources provided by their hosts. Well-studied examples are the Apicomplexa [e.g., ()], which have lost photosynthesis secondarily. However, the loss of photosynthesis can also lead to free-living secondary heterotrophs, which are as common as parasites (, –). Despite their significance, our knowledge about the evolution of free-living secondary heterotrophs is very limited, and we therefore lack insights into evolutionary processes required for them to thrive without photosynthesis and independently of a resource-providing host. Given that a parasitic lifestyle accelerates the rate of evolution (cf. Red Queen hypothesis) () and of loss of conserved orthologous genes [e.g., ()], the genome analysis of a nonparasitic secondary heterotroph can provide insights uncompromised by parasite-specific adaptations. Hence, the diatom Nitzschia putrida, isolated from mangrove estuaries, is the ideal model to test these hypotheses because it is an example of a free-living secondary heterotroph (, ) within the diverse group of largely photoautotrophic diatoms (, ). As several genomes of the latter have recently become available including close phylogenetic relatives (–), a genome-based comparative metabolic reconstruction of N. putrida promises to reveal fresh insights into what is required to thrive as a free-living secondary heterotroph. Thus, here, we have analyzed the draft genome sequence of N. putrida, which provides insight into evolutionary processes underpinning lifestyle shifts from photoautotrophy to free-living heterotrophy in the context of a coastal surface ocean ecosystem.

RESULTS

Genome assembly

K-mer–based GenomeScope analysis () with 150–base pair (bp)–long Illumina short reads suggested the genome of N. putrida (Fig. 1A) to be diploid (fig. S1A). To provide a high-quality genome with long-range contiguity, PacBio sequencing (RSII platform) was performed, resulting in ≥40-fold coverage. Because of the confirmed diploid nature of the N. putrida genome, we have applied the Falcon assembler and Falcon_unzip version 0.5 () to provide a first draft genome of this species. On the basis of this assembly, we estimated a genome size of 35 Mbp, including 87 scaffolds with an N50 of 860.9 kbp. The longest scaffold was 3.8 Mbp. The heterozygous regions of the genome (alternate contigs) estimated by the Falcon assembler resulted in 12 Mbp, with an N50 of 121 kbp (table S1). The Falcon assembly was error-corrected and polished by approximately 150-fold coverage of Illumina short reads, which were subsequently used for generating the final assembly with Pilon 1.2.2 () including manual curation.
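The N50 statistic quoted above can be illustrated with a minimal sketch. The function name `n50` and the example lengths are hypothetical; they are not the N. putrida scaffolds, and this is not code from the Falcon assembler.

```python
def n50(lengths):
    """Return the N50 of a set of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths in kbp (illustrative only).
example_scaffolds = [3800, 2100, 1500, 900, 860, 400, 250, 190]
print(n50(example_scaffolds))
```

In this toy set the lengths sum to 10,000 kbp, and the two longest scaffolds already cover more than half of that, so the N50 is the length of the second scaffold.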
Fig. 1.

The heterotrophic diatom N. putrida and its plastid proteome.

(A) The frustule view of N. putrida. Bar, 10 μm. (B) Estimated plastid proteome size in three diatoms. Light and dark gray bars show low- and high-confidence plastid-targeted proteins identified by ASAFind (), respectively. Data for the two photosynthetic diatoms P. tricornutum and T. pseudonana are derived from the previous study (). (C) Unique and shared plastid-targeted orthogroups. Highlighted in red is the orthogroup exclusively shared by the two photosynthetic diatoms. (D) Predicted metabolic map of the nonphotosynthetic plastid. Representative pathways found in photosynthetic diatom species are shown. Green and light gray arrows show the presence and absence of the responsible protein sequences for the reactions in the genome, respectively. Amino acids are highlighted in red. Abbreviations are described in the Supplementary Materials.

According to the k-mer–assessed diploid nature of the N. putrida genome, the read coverage of the homozygous regions is approximately twofold higher than the read coverage for the heterozygous regions, suggesting the presence of diverged alleles as previously identified in the genome of the photoautotroph diatom Fragilariopsis cylindrus (fig. S1, A and B). Thus, most of the diverged allelic variants can be found in the heterozygous regions characterized by the presence of alternate contigs, whereas the regions with no corresponding alternate contigs are homozygous (fig. S1B). On the basis of the analysis with Braker2 version 2.0.3 (), the Nitzschia genome comprises 15,003 and 5767 inferred protein-coding loci on the primary and alternate contigs, respectively (table S1). Almost 40% of loci in the genome of N. putrida appear to be characterized by diverged alleles. A BUSCOv3 analysis () revealed the genome to be complete at a level of 90.1% based on the haploid set of genes.

The loss of photosynthesis

The haploid set of genes was used to reconstruct the nuclear-encoded plastid proteome of N. putrida and therefore to reveal the extent of gene loss including key genes of photosynthesis. A comparative analysis of the N. putrida plastome () with its photosynthetic counterparts revealed that more than 50% of nuclear encoded plastid proteins have been lost (Fig. 1B). More than 500 orthogroups (OrthoFinder) () of nuclear-encoded plastid proteins, which are usually shared between photosynthetic diatoms (), are missing in the predicted plastid proteome of N. putrida (Fig. 1C). The missing part of the plastid proteome included genes encoding for proteins of light-harvesting antenna including fucoxanthin-chlorophyll a/c protein (fcp), photosystem II and I (e.g., psbA, psbC, psbO, psaA, psaB, and psaD), the cytochrome b6/f complex (e.g., petA), and carbon fixation (e.g., rbcS and rbcL) in addition to genes of the Calvin cycle [e.g., phosphoribulokinase (prk)]. Furthermore, a substantial number of key genes were missing for the biosynthesis of chlorophyll, carotenoids, and plastoquinones (fig. S1C). Despite the loss of some of these key photosynthesis genes, there is still a substantial number of genes left encoding common plastid metabolic pathways as known from photosynthetic diatoms, including the generation of adenosine 5′-triphosphate (ATP) by adenosine triphosphatase (ATPase) subunits, which are encoded both in the nuclear and plastid genomes (). Almost all genes encoding for plastid enzymes to synthesize essential amino acids are still encoded in the nuclear genome of N. putrida. Furthermore, all genes of the heme pathway have been found, and N. putrida appears to be able to synthesize riboflavin. The presence of plastid-targeted transporters () enables the transport of phosphoenolpyruvate, 3-phosphoglycerate, and dihydroxyacetone-phosphate across the plastid membranes. 
In addition, our genome-based reconstruction of plastid metabolism identified the biosynthesis pathway for lipids and the ornithine cycle in N. putrida (Fig. 1D and fig. S2). The latter has been reported neither in previous transcriptome-based studies with this species () nor in any other secondary heterotroph [e.g., (, )]. As N. putrida has an osmotrophic lifestyle, it relies on dissolved organic matter and nutrients. Thus, the biosynthesis of a variety of metabolic compounds supports the osmotrophic lifestyle of an organism that lacks the ability to prey on or parasitize other organisms.

Communication between organelles and light-dependent gene expression

The lack of CO2 fixation in plastids of N. putrida, which reduces the amount of amino acids, lipids, and other metabolites to be synthesized, appears to be partially compensated for by the remodeling of metabolic interactions with mitochondria and peroxisomes and by retaining active recycling of nitrogen (Figs. 1 and 2 and figs. S3 and S4). It appears that the nonphotosynthetic plastid of N. putrida still exchanges glutamine and ornithine, both of which are important intermediates of the ornithine cycle. All genes for the ornithine-urea cycle have been retained in the N. putrida genome. The ornithine-urea cycle is indispensable for nitrogen recycling in photosynthetic diatoms (, ), and even after the loss of photosynthesis, nitrogen recycling appears to be essential in N. putrida (Fig. 2A) due to its osmotrophic lifestyle. Usually, the ornithine-urea cycle is tightly linked with the tricarboxylic acid cycle and/or photorespiration in photosynthetic diatoms (, ). However, N. putrida is not likely to perform photorespiration (Fig. 2A). The metabolic exchange with the peroxisome through glycolate has likely ceased, as phosphoglycolate phosphatase and peroxisomal glycolate oxidase are missing. Thus, photorespiration is unlikely to take place in nonphotosynthetic plastids of N. putrida due to the lack of ribulose 1,5-bisphosphate carboxylase/oxygenase and other key enzymes of the Calvin cycle (Fig. 1). Nevertheless, peroxisomes still appear to play a role in N. putrida for the production of malate or glyoxylate, which feed into respiratory pathways of the mitochondria to support ATP and NADPH (reduced form of nicotinamide adenine dinucleotide phosphate) production (Fig. 2A).
Fig. 2.

Loss of genes for the plastid-peroxisome metabolic flow and photoreceptors.

(A) Metabolic interactions between a mitochondrion and a nonphotosynthetic plastid and between a mitochondrion and a peroxisome. Black, orange, and blue arrows show presence of responsible protein sequences for the reactions in a plastid, a mitochondrion, and a peroxisome, respectively, while light gray arrows show absence of responsible protein sequences. Dashed arrows show possible interorganellar metabolic flows. Abbreviations are described in the Supplementary Materials. (B) Photoreceptor and cell-cycle genes in the N. putrida genome. The other genes are shown in fig. S5. Light green and light gray boxes show the presence and absence of corresponding genes, respectively. (C) Growth of the heterotrophic diatom under the different light conditions. Closed boxes show growth in the continuous dark condition, while open boxes show growth in the light-dark condition. Shaded in gray are the dark periods in the light-dark cultivation conditions. (D) Left: Heatmap showing the reproducible expression patterns of genes (Pearson’s correlation coefficient < 0.9). k-means clustering was calculated for each gene based on reads per kilobase of transcript per million mapped reads (RPKM) + 1 values, which were transformed to log2 and centered by median values. Yellow and blue indicate up-regulation and down-regulation of the gene, respectively. Right: The line graphs showing expression pattern of genes in each cluster. The colored line indicates the average value of the expression patterns. LogFC, log fold change.

Light in photosynthetic organisms not only plays a substantial role for photosynthesis generating ATP and NADPH but also regulates cell division, diel cycles, and different signaling processes unlike in many heterotrophic organisms (–). As a consequence, we identified the remaining photoreceptors and cell-cycle regulators such as cyclins and cyclin-dependent kinases (). Although they were still encoded in the genome of N. putrida and expressed (Fig. 2B and fig. S5, A and B), we were unable to identify a diel cycle in cell division (Fig. 2C). This suggests that these cell-cycle regulators potentially have neo/subfunctionalized and therefore have a different regulatory role in N. putrida unrelated to the diel cycle. The loss of the transcription factor bHLH-1a (RITMO1), which has been identified as a master regulator of diel periodicity (), corroborates our finding that N. putrida has lost the ability to perform diel cycles. In addition, most of the other photoreceptors known from photosynthetic diatoms have also been lost (Fig. 2B), such as the blue light–sensing aureochromes 1a/b, both of which are transcription factors responsible for photoacclimation (). Despite the lack of light-dependent cell-cycle regulation, a few remaining photoreceptors were identified including bHLH1b_PAS, aureochrome 1c, and cryptochrome-DASH/CPF2 (Fig. 2B) (, ). Basic ZIP [basic leucine zipper proteins (bZIP)] transcription factors having potentially light-sensitive Per-Arnt-Sim (PAS) domains (bZIP-PAS) () were also identified in the N. putrida genome, such as homologs to bZIP6 and bZIP7 of Phaeodactylum tricornutum (). The latter homolog has been duplicated and diversified in N. putrida (fig. S5C). The presence of bZIP-PAS proteins in a heterotrophic eukaryote is not unprecedented, as some oomycetes, nonphotosynthetic parasites, have been reported to also encode them in their genomes [e.g., ()]. Although their role in regulating gene expression remains to be investigated in N. putrida, light still appears to influence the expression of some genes in this heterotrophic species. Comparative transcriptome analyses every 4 hours during a shift from a light phase to darkness (Fig. 2D) revealed eight clusters characterized by different expression patterns. Furthermore, there was no cluster explicitly representing the light-dependent gene expression patterns as seen in photosynthetic algae [e.g., (, )]. However, one of the clusters contained genes only expressed in the mid-light phase: cluster 7 containing 90 genes (0.6% total). Forty-four of them were genes with known functional domains based on a KOG (EuKaryotic Orthologous Groups) analysis, and 21 of them were encoding proteins for substrate import and carbon metabolism (fig. S5D). However, the photoreceptor homologs above (bHLH1b_PAS, aureochrome 1c, and cryptochrome-DASH/CPF2) were not part of this cluster, and there was no explicit trend in their gene expression patterns with respect to changes between light and dark conditions.
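The expression transformation described in the Fig. 2D caption (RPKM + 1, log2, median centering) can be sketched as follows. This is an illustrative sketch only: it assumes that centering is done per gene across time points, which the caption does not state explicitly, and the function names and input values are hypothetical. The `pearson` helper shows how the profiles of two replicates could be compared; how the authors applied their correlation cutoff is not reproduced here.

```python
import math
from statistics import median


def log2_median_center(rpkm_values):
    """Transform one gene's RPKM time series: log2(RPKM + 1), then
    subtract the median of the transformed values (assumed per gene)."""
    logged = [math.log2(v + 1) for v in rpkm_values]
    med = median(logged)
    return [x - med for x in logged]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("profiles must have equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical RPKM values for one gene at five time points.
profile = log2_median_center([0, 1, 3, 7, 15])
print(profile)
```

Adding 1 before taking the logarithm keeps genes with zero reads at a time point defined, and median centering puts genes with very different absolute expression levels on a comparable scale before clustering.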

The genetic toolkit for the evolution of nonparasitic secondary heterotrophy

Despite the loss of many nuclear genes and their families, the genome size of N. putrida is not significantly different from that of photosynthetic relatives such as F. cylindrus and P. tricornutum and the more distantly related diatom Thalassiosira pseudonana (table S1). This is distinct from evolutionary trends observed in parasitic eukaryotes that have lost photosynthesis, as they have smaller genomes encoding smaller gene families compared to their photosynthetic relatives (fig. S6). When we compared KOGs of paralogous proteins, there was no significant difference in the number of unique KOG IDs between these four diatom species (fig. S7, A and B). However, when we compared the number of paralogous proteins assigned to each KOG ID, there were several KOG categories for which N. putrida had a higher number of paralogous proteins compared to the other diatom species: nucleotide transport (F), transcription (K), signal transduction (T), intracellular trafficking, secretion, vesicular transport (U), and cytoskeleton (Z) (fig. S7C). Even after normalization by total gene numbers, nucleotide transport (F), signal transduction (T), and cytoskeleton (Z) genes were more abundant in the N. putrida genome (fig. S8A). This observation was corroborated by N. putrida–specific enrichment of Pfam domains such as adenylate/guanylate cyclase and cyclic nucleotide esterase, leucine-rich repeat (LRR), and glycosyl/galactosyl transferase domains (fig. S8B). A microbial heterotroph acquires nutrients either by phagotrophy, the preferred nutrition of many parasites, or by osmotrophy. The latter requires uptake of dissolved organic compounds by osmosis as realized by bacteria and fungi, for instance (, ). As N. putrida grows well under axenic conditions (, ), it is likely an osmotroph, dependent on the uptake of dissolved organic compounds across the silicified cell wall and the plasma membrane. As realized by osmotrophic fungi, N. putrida may even be able to degrade higher–molecular weight compounds extracellularly to be subsequently taken up as individual molecules by specific transporters or even osmosis (, ). Thus, it is likely that cell wall, membrane, and secreted proteins were diversified in N. putrida compared to photosynthetic diatoms to facilitate osmotrophy. To address this hypothesis, we analyzed the enrichment of paralog proteins and differences in nutrient transporters involved in the uptake of dissolved organic compounds such as solute carriers. A comparison to photosynthetic diatoms and parasitic nonphotosynthetic algal species [Prototheca and Helicosporidium (green algae), and the apicomplexans Plasmodium and Toxoplasma] has revealed that N. putrida has a unique composition of genes encoding transporters, which differs from that of both photosynthetic algae and parasitic nonphotosynthetic algal species (Fig. 3, A to C). For instance, the number of genes encoding silicon transporters (SITs), solute symporters, and the resistance-nodulation–cell division superfamily was more than twice as high in N. putrida as in photosynthetic diatom species (Fig. 3D and fig. S9A). However, in contrast to the difference between N. putrida and photosynthetic diatom species, there is no enrichment of particular transporters in parasitic algal species when compared to their photosynthetic relatives (fig. S9, B and C).
Fig. 3.

Diversity of transporters and carbohydrate active enzymes in N. putrida.

(A) Distribution of the number of transporters in each transporter family of diatoms. Differences in the distributions among species were tested by the Wilcoxon signed-rank test corrected with the Benjamini-Hochberg procedure (P < 0.05), but there is no significant difference. Outliers were omitted in the boxplot. Nonphotosynthetic species are highlighted in gray. (B) Distribution of the number of transporters in each transporter family of Alveolata. Details are described in (A). (C) Distribution of the number of transporters in each transporter family of green algae (Trebouxiophyceae). Details are described in (A). (D) The gene number of transporters in the 12 most abundant transporter families of N. putrida. (E) Silicon transporter (SIT) genes tandemly located in the contig 000000F. SIT genes are highlighted in light red with the gene IDs. (F) Glycoside hydrolase (GH) families from the Carbohydrate Active enZyme (CAZy) database focused on diatoms. The diagram shows a heatmap of CAZyme prevalence in each taxon (number of a particular CAZyme family divided by the total number of CAZyme genes in the organism); the white to blue color scheme indicates low to high prevalence, respectively. Dendrograms (left and top) show, respectively, the relative taxa proximity with respect to co-occurrence of CAZyme families and the co-occurrence of CAZyme families with one another within genomes. (G) GH114 genes tandemly located in the contig 000022F. GH114 genes are highlighted in light green with the gene IDs.

Expansion of those gene families may, at least partly, have been achieved by recent tandem duplications (Fig. 3E). To gain insight into when the expansion had occurred, we performed a coalescence analysis, which revealed that SITs in N. putrida began to expand around 3.3 million years (Ma) ago [1.2 to 6.6, 95% confidence interval (CI)], while divergence from another nonphotosynthetic diatom N. alba is estimated to have occurred around 6.67 Ma ago (2.5 to 11.5, 95% CI; fig. S10). The split between F. cylindrus and P. multiseries, which was used to date the tree, was estimated at 9.7 Ma ago (7.6 to 11.6, 95% CI). Thus, the recent expansion of SITs suggests neo/subfunctionalization of the gene family in response to the change in lifestyle. The divergence rate of SIT genes was much higher than that of control genes (e.g., myosin), indicating that SIT diversification might have contributed to the adaptation to the heterotrophic lifestyle. In support of this hypothesis, we detected several sites under positive selection in different members of the SIT family (table S2), which implies that the evolution of those genes may have been driven by diversifying selection. The solute sodium symporters are estimated to have diverged around 7.5 Ma ago (3.8 to 11.1, 95% CI), markedly earlier than the SIT gene family. Although the divergence rate is also higher than that of control genes (fig. S10), we did not find evidence of diversifying selection in this gene family. The differences between these two families of transporters suggest that their expansion might have occurred in a stepwise manner and been driven by different evolutionary forces. Furthermore, although the overall carbohydrate-active enzyme (CAZyme) family composition of Nitzschia was not different from that of photosynthetic diatoms (fig. S11), families encoding β-glycoside hydrolase (GH8), laminarinase (GH16_3), pectinase (GH28), β-glucanase (GH72), α-mannan hydrolyzing enzymes (GH99), and β-1,2-glucan hydrolytic enzymes (GH114) were enriched in N. putrida compared to photosynthetic species (Fig. 3F). Expansion of these families might, at least partly, have been achieved by recent tandem duplications (Fig. 3G), suggesting an important role of these genes for the heterotrophic lifestyle of N. putrida. Notably, more than one-third of proteins assigned to the above six CAZyme families are predicted to be secreted in N. putrida (see below). The CAZyme compositions suggest that N. putrida might be able to degrade extracellular polysaccharides such as β-1,3-glucans (e.g., lichenin, paramylon, callose, and laminarin), starches, β-1,2-glucans, pectin, and α-mannan. As N. putrida has been isolated from disintegrating mangrove leaves in a paddle (, ), this species might play a role in degrading dead leaves and therefore facilitating carbon recycling in mangroves. To gain first insight into how transcription of CAZyme genes is regulated by different carbon sources, we performed comparative transcriptome analyses with starved N. putrida cells in comparison to cells growing on glucose and starch. However, we found that only a limited number of genes encoding CAZymes were differentially expressed (table S3). About half of these genes were up-regulated in response specifically to starch as a carbon source, while only one CAZyme gene was up-regulated in response to glucose (table S3). This observation suggests that most of the CAZymes in N. putrida are not used for the utilization of glucose and that only very few are used for starch utilization. Arguably, providing a very limited set of organic substrates does not reflect the complexity of organic carbon provided by disintegrating leaves in a mangrove ecosystem. Hence, this might be the main reason for the limited transcriptome response observed in our experiments.
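The prevalence measure used for the heatmap in Fig. 3F (number of genes in a particular CAZyme family divided by the total number of CAZyme genes in the organism) can be written as a short sketch. The function name and the counts below are hypothetical and are not taken from the N. putrida annotation.

```python
def cazyme_prevalence(family_counts):
    """Return the per-family prevalence: the count of each CAZyme family
    divided by the total number of CAZyme genes in the organism."""
    total = sum(family_counts.values())
    if total == 0:
        raise ValueError("organism has no CAZyme genes")
    return {family: count / total for family, count in family_counts.items()}


# Hypothetical gene counts per family for one organism (illustrative only).
example_counts = {"GH16_3": 6, "GH28": 3, "GH114": 9, "other": 42}
for family, value in cazyme_prevalence(example_counts).items():
    print(f"{family}: {value:.3f}")
```

Normalizing by the total number of CAZyme genes makes taxa with different overall CAZyme repertoire sizes comparable, so an enriched family stands out by its share rather than by its raw count.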

The predicted secretome of the nonparasitic, free-living secondary heterotroph N. putrida

Given that the secretome plays an important role for substrate degradation and subsequent uptake of low–molecular weight compounds in osmotrophs (), we conducted a comparative analysis to predict secreted proteins of N. putrida in silico by identifying proteins with N-terminal signal peptides and a lack of transmembrane domains. The resulting proteins were clustered using TribeMCL (), and plastid- and lysosome-localized proteins were subsequently removed using ASAFind according to their characteristic targeting motifs () and Pfam domains. The number of putatively secreted proteins is 978, 998, 596, and 718 in N. putrida, F. cylindrus, P. tricornutum, and T. pseudonana, respectively, which corresponds to between 5 and 7% of the total number of genes in their genomes (fig. S12A). Nevertheless, there were significant differences when we compared the diversity of proteins between these four diatom species (Fig. 4, A and B); N. putrida, on average, had a significantly higher number of proteins per tribe than any of the other diatom species (two-sided Wilcoxon signed-rank test; P < 0.01; Fig. 4C). In particular, proteins involved in heterotrophy such as organic matter degradation/modification including CAZymes and peptidases were more abundant in N. putrida than in the photosynthetic diatom genomes (188 in N. putrida, 142 in F. cylindrus, 118 in P. tricornutum, and 101 in T. pseudonana; fig. S12A). This is in contrast to parasitic green algae because their predicted secretomes are smaller than those of their photosynthetic relatives and show no explicit enrichment of secretome proteins per tribe (Fig. 4, D to F).
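The selection criteria described above (an N-terminal signal peptide, no transmembrane domains, and removal of plastid- and lysosome-localized proteins) can be expressed as a simple filter. This is only a sketch: the `ProteinCall` record and its fields are hypothetical stand-ins for the outputs of the prediction tools, the protein IDs are invented, and the TribeMCL clustering step that the authors ran before removing plastid and lysosome proteins is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteinCall:
    """Hypothetical per-protein summary of upstream predictions."""
    protein_id: str
    has_signal_peptide: bool
    n_transmembrane_domains: int
    plastid_or_lysosome_targeted: bool


def predict_secretome(calls):
    """Keep proteins with a signal peptide, no transmembrane domains,
    and no plastid or lysosome targeting."""
    return [
        call.protein_id
        for call in calls
        if call.has_signal_peptide
        and call.n_transmembrane_domains == 0
        and not call.plastid_or_lysosome_targeted
    ]


# Invented examples covering each reason for exclusion.
calls = [
    ProteinCall("prot_A", True, 0, False),   # kept: putatively secreted
    ProteinCall("prot_B", True, 2, False),   # dropped: membrane protein
    ProteinCall("prot_C", True, 0, True),    # dropped: plastid/lysosome
    ProteinCall("prot_D", False, 0, False),  # dropped: no signal peptide
]
print(predict_secretome(calls))
```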
Fig. 4.

Secretome of nonphotosynthetic algae.

(A) The number of secretome tribes of diatoms, including at least four sequences, clustered by TribeMCL (). Different colors represent tribe categories as follows: 1, species-specific tribes; 2 to 4, tribes shared by two to four species, respectively. OTUs, operational taxonomic units. (B) Proportion of each tribe category in diatoms. Details are described in (A). (C) Distribution of the number of protein sequences in each secretome tribe in diatoms. Outliers were omitted in the boxplot. The Wilcoxon signed-rank test corrected with the Benjamini-Hochberg procedure was used for tests of statistical significance. (D) The number of secretome tribes in green algae (trebouxiophytes), including at least four sequences, clustered by TribeMCL (). Different colors represent tribe categories as follows: 1, species-specific tribes; 2 to 5, tribes shared by two to five species, respectively. (E) Proportion of each tribe category in green algae. Details are described in (D). (F) Distribution of the number of protein sequences in each secretome tribe in green algae. Details are described in (C). (G) Expression of the 10 largest tribes in N. putrida during the 25 hours of cultivation. Genes in the tribes could be divided into four clusters. Details are described in Fig. 2D.
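The Benjamini-Hochberg correction named in the caption can be sketched as follows for a list of raw P values, such as those from pairwise Wilcoxon signed-rank tests. The function name and the example P values are hypothetical; this is a generic textbook implementation, not the authors' analysis code.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order.

    Each sorted P value p_(k) is scaled by m / k (m tests, rank k), and
    monotonicity is enforced by taking the running minimum from the
    largest rank downward.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P values from four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.20]
print(benjamini_hochberg(raw))
```

A comparison is then called significant when its adjusted value falls below the chosen threshold, which controls the expected proportion of false discoveries across all comparisons.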

Secretome of nonphotosynthetic algae.

The most common secreted proteins in N. putrida are LRR-containing proteins (fig. S12B), many of which contain additional domains such as tegument and glycoprotein domains, suggesting an increased functional diversity (fig. S13). Only very few LRR-containing proteins were identified in the predicted secretomes of the photosynthetic diatoms, indicating that signal peptide–dependent secretion of abundant and diverse LRR-containing proteins may be an essential requirement in this secondary heterotroph, such as for environmental signaling (). In addition to LRR-containing proteins, the top 10 most enriched proteins in N. putrida were von Willebrand factor type D (VWFD) proteins involved in adhesion or clotting, two types of endopeptidases, trypsin and leishmanolysin (cell-surface peptidase of the human parasite Leishmania), intradiol ring-cleavage dioxygenase protein for degradation of aromatic compounds, methyltransferase, and four proteins with unknown function (fig. S12B). LRR-containing proteins and VWFDs might play important roles in N. putrida for attaching to disintegrating mangrove leaves (, , ). Endopeptidases and aromatic compound degradation may facilitate the utilization of the complex carbon compounds in these leaves. Furthermore, transcriptional dynamics of the predicted secretome over a diel cycle (Fig. 2) revealed the presence of four different clusters. Genes in cluster 1 were transcribed at the beginning of the first light phase and genes in cluster 2 at the end of the dark phase and into the second light phase (Fig. 4G). Genes of cluster 3 were most strongly expressed in the middle and end of the first light phase, whereas genes in cluster 4 were relatively weakly expressed throughout day and night. These results suggest that stimuli including light and/or nutrients play a role in the regulation of these genes, which might either be a relict from the photosynthetic ancestor or a response to diel cycles of organic substances in the aquatic system occupied by N. putrida. There is only weak evidence of lateral transfer of secretome genes in N. putrida, with five genes of potential lateral origin (figs. S14 and S15). Thus, most secretome proteins in N. putrida were likely derived vertically from homologs in a photosynthetic ancestor.
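The tribe categories used in the secretome comparison (tribes with at least four sequences, grouped by how many species contribute members) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `classify_tribes`, the input layout, and the toy protein identifiers are assumptions made for illustration, and the clustering step itself (TribeMCL) is taken as already done.

```python
from collections import Counter


def classify_tribes(tribes, min_size=4):
    """Count tribes per category.

    `tribes` maps a tribe id to a list of (species, protein_id) pairs.
    The category of a tribe is the number of distinct species in it:
    1 means species specific, 2 means shared by two species, and so on.
    """
    categories = Counter()
    for members in tribes.values():
        if len(members) < min_size:
            continue  # only tribes with at least `min_size` sequences are counted
        n_species = len({species for species, _protein in members})
        categories[n_species] += 1
    return dict(categories)


# Hypothetical toy input; the protein ids are placeholders, not real gene models.
toy_tribes = {
    "tribe1": [("N. putrida", "p1"), ("N. putrida", "p2"),
               ("N. putrida", "p3"), ("N. putrida", "p4")],
    "tribe2": [("N. putrida", "p5"), ("P. tricornutum", "q1"),
               ("P. tricornutum", "q2"), ("T. pseudonana", "r1")],
    "tribe3": [("N. putrida", "p6"), ("F. cylindrus", "s1")],
}

print(classify_tribes(toy_tribes))
```

In this toy input, `tribe3` is dropped by the size filter, `tribe1` counts as species specific, and `tribe2` counts as shared by three species.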

DISCUSSION

N. putrida experienced a series of genetic adaptations toward a heterotrophic lifestyle. This diatom species took a step backward in one of the major evolutionary transitions, from photoautotrophs to heterotrophs, potentially relaxing selection on some of the now redundant gene networks and their functions. As expected, more than 50% of nuclear-encoded plastid proteins have been lost in the N. putrida plastid proteome in comparison to its photosynthetic counterparts (). However, the total number of genes (~15,000) fell within the range of photosynthetic microalgae, and we found no evidence of pseudogene formation, genome streamlining [e.g., ()], gene family contraction (cf. birth-and-death hypothesis) (), or reductive genome evolution (Black Queen hypothesis) (). The relatively large genome size is not unexpected given that N. putrida is a free-living osmotroph. This free-living lifestyle in a complex and highly variable coastal marine environment is likely the reason why a substantial number of genes, including some photoreceptors, cell cycle regulators, and common plastid metabolic pathways usually present in photosynthetic diatoms, have been retained. Although some of the latter genes were still expressed, N. putrida appears to lack a diel growth cycle, which suggests that these cell cycle regulators have neo/subfunctionalized. However, as a certain number of genes still appear to be regulated by light, osmotrophy potentially benefits from diel fluctuations of resources such as dissolved organic carbon in aquatic environments (–). For photoautotrophs, it is important to regulate the cell cycle in accordance with diel cycles for optimizing photosynthesis and therefore cell proliferation (–, , ). Without being reliant on light as its primary energy source, the osmotroph N. putrida no longer needs to coordinate its cell cycle with diel cycles. 
Thus, after the loss of photosynthesis, the strict light-dependent regulation of gene expression might have become less important, and gene expression may therefore have become predominantly regulated by other stimuli. Many photoreceptors are missing, but duplicated genes for bZIP transcription factors with PAS domains and for signal transduction and cellular regulatory roles, such as those with adenyl/guanyl cyclase and cyclic nucleotide esterase domains, were enriched in the N. putrida genome. Furthermore, the peroxisome-plastid interaction is no longer required after the loss of photosynthesis, giving rise to loss of carbon fixation in the context of glycolate recycling. In contrast, the ornithine-urea cycle likely remains functional to facilitate nitrogen recycling. Gene family expansions and neo/subfunctionalizations appear to have played a prominent role in the adaptation to its different lifestyle given that many proteins predicted to be secreted have diversified in N. putrida, possibly to facilitate osmotrophy. Together, the marked change of lifestyle associated with the “devolution” did not result in reductive genome evolution as known from nonphotosynthetic plastid-bearing parasites.

METHODS

Cultivation, DNA and RNA extraction, and sequencing

N. putrida NIES-4239 was cultivated in Daigo’s IMK medium (Wako) including 1% Luria-Bertani medium, prepared with artificial seawater made with MARINE ART SF-1 (Osaka Yakken Co.), at 20°C under a 12-hour light and 12-hour dark cycle at 50 μmol photons/m² per second with a plant cultivation light-emitting diode light (BC-BML3, Biomedical Science). DNA was extracted with the Extrap Soil DNA Kit Plus version 2 (Nippon Steel). Total DNA was subjected to library construction with TruSeq DNA PCR-Free (350; Illumina) and to 151-bp paired-end sequencing by HiSeqX, resulting in 660 million paired-end reads, and to PacBio RSII sequencing, with SMRT cell 8Pac v3 and DNA Polymerase Binding Kit P6 v2, at Macrogen, resulting in 1.3-Gb subreads. Total RNA was extracted with TRIzol (Sigma-Aldrich) according to the manufacturer’s instructions and was subjected to library construction with TruSeq RNA Sample Prep Kit v2 (Illumina) and 101-bp paired-end sequencing by HiSeq 2500, resulting in 107.5 million paired-end reads.
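For orientation, the sequencing yields above can be converted into approximate fold coverage. The short Python sketch below is a back-of-the-envelope calculation, not a figure reported by the authors. It assumes the 35-Mbp genome size given in the abstract, and it assumes that "660 million paired-end reads" counts individual 151-bp reads; if the number refers to read pairs, the Illumina figure would double.

```python
def fold_coverage(total_bases, genome_size_bp):
    """Average sequencing depth: sequenced bases divided by genome size."""
    return total_bases / genome_size_bp


GENOME_SIZE_BP = 35e6          # genome size from the abstract (35 Mbp)
PACBIO_BASES = 1.3e9           # 1.3-Gb PacBio subreads
ILLUMINA_BASES = 660e6 * 151   # assumption: 660 million individual 151-bp reads

print(f"PacBio:   ~{fold_coverage(PACBIO_BASES, GENOME_SIZE_BP):.0f}x")
print(f"Illumina: ~{fold_coverage(ILLUMINA_BASES, GENOME_SIZE_BP):.0f}x")
```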

Genome assembly and construction of gene models

PacBio reads were assembled into contigs using Falcon (version 0.7.0) () with a length cutoff of 7000 bp for seed reads and an estimated genome size of 33 Mbp. Genome size estimation was performed on the GenomeScope web server (http://qb.cshl.edu/genomescope/) based on the k-mer frequency distribution of Illumina reads calculated by JellyFish version 2.2.6 with a k-mer size of 21. The resultant primary and associated contigs were then subjected to Falcon_unzip (version 0.5.0) (), generating partially haplotype-phased contigs (primary contigs) and fully phased contigs (haplotigs). The assembly was polished using PacBio reads and the Quiver program, followed by single-nucleotide polymorphism (SNP) and short insertion-deletion (indel) error correction using Pilon (version 1.2.2) with Illumina reads mapped by the Burrows-Wheeler Aligner (version 0.7.15) (). Indel errors in the vicinity of hetero-SNPs were further fixed manually, as they were difficult to correct automatically. Contigs derived from plastid and mitochondrial genomes were identified using BLASTN and separated from contigs derived from the nuclear genome. RNA sequencing reads were trimmed using Trimmomatic (version 0.36) () with the parameters ILLUMINACLIP:TruSeq3-PE.fa:2:30:10, LEADING:20, TRAILING:20, SLIDINGWINDOW:4:15, and MINLEN:75. The trimmed reads were aligned to the assembled contigs using HISAT2 (version 2.0.4) () and provided to the BRAKER2 gene annotation pipeline (version 2.0.3) () as training data for ab initio prediction of protein-coding genes. In addition, PASA (version 2.3.3) () was used to generate transcript-based gene models by integrating de novo transcriptome assembly and genome-guided assembly using Trinity (version 2.5.0) (). The genome-guided assembly used the mapping result from HISAT2 with the --dta option. 
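The idea behind the k-mer-based genome size estimate mentioned above can be sketched in a few lines of Python. This is a deliberately simplified illustration and not what GenomeScope does internally: GenomeScope fits a statistical model to the k-mer histogram and accounts for heterozygosity, whereas this sketch simply divides the total number of non-error k-mers by the depth of the tallest peak. The function names and the `min_depth` error cutoff are assumptions, and the counter does not merge reverse-complement k-mers.

```python
from collections import Counter


def kmer_histogram(reads, k=21):
    """Return a histogram mapping depth -> number of distinct k-mers at that depth."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return Counter(counts.values())


def estimate_genome_size(histogram, min_depth=2):
    """Estimate genome size as (total k-mers) / (depth of the main peak).

    k-mers seen fewer than `min_depth` times are treated as sequencing
    errors and ignored (an assumed, simplistic cutoff).
    """
    solid = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(solid, key=solid.get)
    total_kmers = sum(depth * n for depth, n in solid.items())
    return total_kmers / peak_depth


# Hypothetical toy histogram: an error tail at depth 1 and a peak at depth 20.
toy_histogram = {1: 500, 19: 40, 20: 100, 21: 40}
print(estimate_genome_size(toy_histogram))
```

For a heterozygous diploid genome the histogram typically shows two peaks, which is one reason a model-based tool is preferable to this single-peak shortcut.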
TransDecoder (version 5.0.2) () was used to extract protein-coding regions from the PASA result, with alignment files from BlastP (version 2.7.1) () against UniRef90 with the -evalue 1e-5 option and hmmscan (http://hmmer.org/, version 3.1b2) against the Pfam () database. Gene models that overlapped with the results from BRAKER were removed using BlastP with the -evalue 1e-5 option, and the remaining gene models were merged with the BRAKER gene models to generate the final gene annotation. Transposable elements in the NIES-4239 genome were identified with RepeatMasker (version 4.9.0) using Dfam3.1 and RepBase-20170127 as reference repeat libraries (). The predicted gene set is available in Dryad (https://doi.org/10.5061/dryad.j3tx95xft). The integrity of the gene annotation was assessed with BUSCO (version 3.0.2) () and the Eukaryota odb9 (version 2) dataset. SAM/BAM files were manipulated with SAMtools (version 1.9). Sequences of gene regions were extracted from the GFF file with GffRead (version 0.9.11) (). Organellar genome annotation was performed by comparison with previously sequenced organellar genomes of nonphotosynthetic diatoms (). Gene sets and their arrangements in the plastid and mitochondrial genomes sequenced in this study were found to be identical to those of previously sequenced nonphotosynthetic diatoms (). Assembled genomes were deposited in the DNA Data Bank of Japan (http://getentry.ddbj.nig.ac.jp/) under the accession numbers BLYE01000001 to BLYE01000234 for the nuclear genome, LC600866 for the mitochondrial genome, and LC600867 for the plastid genome.

Functional annotation

The predicted protein-coding genes were annotated using InterProScan, and an RPS-BLAST search was performed against the KEGG orthology database (, ). KO identifiers for Kyoto Encyclopedia of Genes and Genomes (KEGG) metabolic pathways were assigned using the KEGG Automatic Annotation Server (). Transporter proteins were annotated with TransportTP () followed by manual curation. Reference proteome datasets for three photosynthetic diatom species were obtained from the JGI Genome Portal: P. tricornutum CCAP 1055/1 v2.0 (Phatr2_bd_unmapped_GeneModels_FilteredModels1_aa.fasta and Phatr2_chromosomes_geneModels_FilteredModels2_aa.fasta, 10,402 protein sequences in total), T. pseudonana CCMP 1335 (Thaps3_bd_unmapped_GeneModels_FilteredModels1_aa.fasta and Thaps3_chromosomes_geneModels_FilteredModels2_aa.fasta, 11,776 sequences), and F. cylindrus CCMP 1102 (Fracy1_GeneModels_FilteredModels3_aa.fasta, 21,066 sequences). KEGG and KOG annotation of these reference proteomes was performed in the same manner as for NIES-4239. Other details for annotation of CAZymes, cyclins, cyclin-dependent kinases, bZIP transcription factors, photoreceptor proteins, mitochondrial proteins, plastid proteins, and secretome proteins are described in the Supplementary Materials. Evolutionary analyses, comparative transcriptome analyses under the 12-hour light and 12-hour dark condition and with different carbon sources, and biochemical experiments for lipids, fatty acids, and quinones are also described in the Supplementary Materials. Transcriptome data obtained in this study were deposited in the DNA Data Bank of Japan (https://ddbj.nig.ac.jp/resource/bioproject/PRJDB11016 and https://ddbj.nig.ac.jp/resource/bioproject/PRJDB12553).
