Matin Miryeganeh1, Ferdinand Marlétaz2, Daria Gavriouchkina3, Hidetoshi Saze1. 1. Plant Epigenetics Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, 904-0495, Japan. 2. Department of Genetics, Evolution and Environment (GEE), University College London, Darwin Building, Gower Street, London, WC1E 6BT, UK. 3. Molecular Genetics Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, 904-0495, Japan.
Abstract
Mangroves are adapted to harsh environments, such as high ultraviolet (UV) light, low nutrition, and fluctuating salinity in coastal zones. However, little is known about the transcriptomic and epigenomic basis of the resilience of mangroves due to limited available genome resources. We performed a de novo genome assembly and in natura epigenome analyses of the mangrove Bruguiera gymnorhiza, one of the dominant mangrove species. We also performed the first genome-guided transcriptome assembly for mangrove species. The 309 Mb of the genome is predicted to encode 34 403 genes and has a repeat content of 48%. Depending on its growing environment, the natural B. gymnorhiza population showed drastic morphological changes associated with expression changes in thousands of genes. Moreover, high-salinity environments induced genome-wide DNA hypermethylation of transposable elements (TEs) in the B. gymnorhiza. DNA hypermethylation was concurrent with the transcriptional regulation of chromatin modifier genes, suggesting robust epigenome regulation of TEs in the B. gymnorhiza genome under high-salinity environments. The genome and epigenome data in this study provide novel insights into the epigenome regulation of mangroves and a better understanding of the adaptation of plants to fluctuating, harsh natural environments.
Mangroves are adapted to harsh environments, such as high ultraviolet (UV) light, low nutrition, and fluctuating salinity in coastal zones. However, little is known about the transcriptomic and epigenomic basis of the resilience of mangroves due to limited available genome resources. We performed a de novo genome assembly and in natura epigenome analyses of the mangrove Bruguiera gymnorhiza, one of the dominant mangrove species. We also performed the first genome-guided transcriptome assembly for mangrove species. The 309 Mb of the genome is predicted to encode 34 403 genes and has a repeat content of 48%. Depending on its growing environment, the natural B. gymnorhiza population showed drastic morphological changes associated with expression changes in thousands of genes. Moreover, high-salinity environments induced genome-wide DNA hypermethylation of transposable elements (TEs) in the B. gymnorhiza. DNA hypermethylation was concurrent with the transcriptional regulation of chromatin modifier genes, suggesting robust epigenome regulation of TEs in the B. gymnorhiza genome under high-salinity environments. The genome and epigenome data in this study provide novel insights into the epigenome regulation of mangroves and a better understanding of the adaptation of plants to fluctuating, harsh natural environments.
Mangrove trees have evolved special characteristics, such as vivipary and aerial roots, to adapt to harsh intertidal environments and are resilient to many environmental stresses, including salinity, intense ultraviolet (UV) radiation, and anaerobic soils (Tomlinson, 1986). Their high adaptability makes them a valuable model for understanding the molecular mechanisms underlying stress tolerance and adaptation in plants (Dassanayake et al., 2010; Xu et al., 2017). Previous studies have identified various genes and metabolic pathways in stress responses that are implicated in the adaptive evolution of mangroves to the extreme environments (Miyama & Hanagata, 2007; Dassanayake et al., 2009; Chen et al., 2011; Yang et al., 2015; Guo et al., 2017; Feng et al., 2020; Meera & Augustine, 2020). However, mangrove trees often show drastic changes in phenotypes, such as biomass and tree height, in response to a spectrum of stress factors, including high salinity, strong wind, and poor nutrient conditions (Suwa et al., 2009), suggesting the involvement of epigenome regulation in their phenotypic plasticity in natural environments.Among various epigenome modifications, DNA cytosine methylation has been intensively studied in plants, which primarily targets and silences repeats and transposable elements (TEs) in plant genomes (Slotkin & Martienssen, 2007). DNA methylation is regulated by multiple chromatin modifications (Law & Jacobsen, 2010; Matzke & Mosher, 2014). Small interfering RNAs (siRNAs) generated by RNA interference (RNAi) machinery and de novo methylase DOMAIN REARRAENGED METHYLTRANSFERASEs (DRMs) are involved in the RNA‐directed DNA methylation (RdDM) pathway that establishes DNA cytosine methylation at CG and non‐CG (CHG and CHH contexts; H can be A, or C, or T) sites. However, histone H3 Lysine9 methylation (H3K9me) mediated by histone methyltransferases recruits CHROMOMETHYLASEs (CMTs) to introduce non‐CG methylation at TEs. In contrast, active DNA demethylation is mediated by DNA glycosylases that remove TE methylation, especially in the reproductive tissues of plants (Zhang et al., 2018). DNA methylation changes are often induced in response to biotic and abiotic stresses, which are known to modulate gene expression and phenotypic variations (Dowen et al., 2012; Sahu et al., 2013; Soubry, 2015; Lamke & Baurle, 2017), potentially leading to stress resistance and adaptation in plants (Wibowo et al., 2016; Erdmann & Picard, 2020).Despite the potential importance of epigenome regulation in the environmental adaptations of mangroves, there has only been a few epigenome studies on mangroves (Lira‐Medeiros et al., 2010; Wang et al., 2018), mainly due to the limited genome information on diverse mangrove species (Xu et al., 2017; Hu et al., 2020). To investigate the role of epigenome regulation in stress responses and adaptations of these trees, especially in natural environments, we assembled the genome and analyzed the transcriptome and epigenome of natural populations of Bruguiera gymnorhiza (L.) Lamk., which is one of the most widely distributed mangrove trees from the most mangrove‐rich family Rhizophoraceae (Duke & Ge, 2011; Christenhusz & Byng, 2016). Same as all the species belonging to the family Rhizophoracee, B. gymnorhiza is diploid and possesses 36 chromosomes (2n = 36). The flowers are bisexual, self‐compatible, and self/cross‐pollinating (Raju Aluri, 2014). The de novo assembled B. gymnorhiza genome is approximately 309 Mb in size, encoding 34 403 genes and consisting of 48% of repetitive sequences. Intriguingly, the plants growing under high‐salinity conditions showed DNA hypermethylation in TEs in both natural and controlled experiments, suggesting suppression of TE activation under the stress conditions. Our study provides novel insights into the epigenome regulation of stress responses of mangroves that are highly resilient to extreme natural environments.
Materials and Methods
De novo genome assembly
Plant material, and genomic DNA extraction method is given in the Supporting Information Notes S1.
Estimation of the genome size
Flow cytometry
Leaves (20–40 mg) were chopped in 1–2 ml ice‐cold nuclear isolation buffer (45 mM MgCl2, 20 mM MOPS, 30 mM sodium citrate, 0.01% (v/v) Triton X‐100, 5 mM beta mercaptoethanol/1 ml), with a disposable scalpel. The homogenate was mixed by pipetting several times, and the debris was filtered into a sample tube through a 42 μm nylon mesh. Finally, propidium iodide (DNA fluorochrome; Thermo Fisher Scientific, Waltham, MA, USA) (50 μg ml−1) and RNase A (Thermo Fisher Scientific) were added. Stained nuclei were analyzed using a FACSCalibur flow cytometer. Values of nuclear DNA were estimated by comparing the nuclear peak on the linear scale with the peak for Arabidopsis thaliana and Solanum lycopersicum included as internal standards (Fig. S1a).
k‐mer analysis
A total of 72 Gb of reads obtained from the PE Illumina Hi‐seq platforms were subjected to a 27‐mer frequency distribution analysis with jellyfish v.2.2.3 (Marcais & Kingsford, 2011). The analysis parameters were set as ‐t 8 ‐C ‐m 27 ‐s 5G, and the final result was plotted as a frequency curve (Fig. S1b). One peak was observed from the distribution curve, which provided a peak depth of 51 for the estimation of its genome size. Since the total number of k‐mers was 18 636 738 543, the genome size was calculated to be approximately 373 Mb using the formula: genome size = k‐mer number/peak depth. This indicated that the sequenced short Illumina reads provided c. ×51 coverage. The genome size and homogenity was then investigated further by running GenomeScope 2.0 (Ranallo‐Benavidez et al., 2020). The result supports a diploid genome with a proportion of heterozygosity carried by paralogs (i.e. non‐diploid k‐mer pairs) of 0.37 (Fig. S1b).
Whole‐genome sequencing
Genome sequencing was performed using a combination of PacBio RS II and Illumina sequencers.Details about Illumina sequencing is given as Notes S1.
PacBio sequencing
For PacBio sequencing, high‐molecular weight DNA at a concentration of 70 ng µl−1 and a sample volume of 350 µl were sent to the SQC‐OIST for sequencing. Library preparation for PacBio sequencing was performed with high‐quality genomic DNA, and sequencing was performed in four single molecule real‐time (SMRT) cells using P6‐C4 chemistry (20 kb protocol), resulting in 7651 821 reads, with an N50 read length of 8393 bp and a mean read length of 6524 bp. In total, approximately 50 Gb reads were generated on the PacBio platforms, representing c. ×120 coverage of the genome. Altogether, the three libraries provided 84 Gb of trimmed and filtered sequencing data, which translated to c. ×240 coverage of the genome (Tables S1, S2).
Whole‐genome assembly
For de novo assembly of the reads, with deep coverage from each sequencing platform, we initially used three strategies: (1) Illumina reads alone; (2) PacBio reads alone; and (3) combination of Illumina and PacBio reads. Thereafter, we compared the genome statistics for each strategy (Table S3) and chose the combination of Illumina and PacBio for assembly as the best strategy, as outlined later.Assembly of PE and MP Illumina data is given as Notes S1
Assembly of PacBio single‐molecule long reads
Additionally, another assembly was built using only PacBio long reads with the program canu‐1.7.1 (Koren et al., 2018), resulting in 4008 contigs with an N50 length of 1.3 Mb. Genome polishing was performed using the Arrow algorithm of the variantCaller tool within the GenomicConsensus package v.2.3.2 (PacificBiosciences, Menlo Park, CA, USA) using default parameters, to improve site‐specific consensus accuracy.
Final assembly using both Illumina and PacBio platform data
Finally, in an effort to construct the best assembly, all MP and PE Illumina reads were used to scaffold the resulting contigs from PacBio, assembly only by SSPACE/sspace_basic‐2.1.1 (Boetzer et al., 2011), yielding 2282 scaffolds and an N50 size of 1.5 Mb. Finally, the size of the scaffold N50 was extended to 2.3 Mb, and the number of scaffolds decreased to 1075 using SOAPdenovo Gapcloser (Table S3). Using this assembly pipeline, we obtained a final draft genome of 309 Mb. The quality of the genome assemblies was assessed using Quast (Gurevich et al., 2013). The final assembly results showed that contig N50 was significantly improved, from 470 16 kb in the Illumina to 2.3 Mb in the final assembly (Table S3).
Evaluation of the assembled genome
Validation of assembled genome was assessed using two different approaches:(1) Quality trimmed Illumina HiSeq PE reads were aligned to our assembly using BWAmem (Li & Durbin, 2009) and the mapping efficiency was checked using Alfred (Thankachan et al., 2016). Moreover, single nucleotide polymorphisms (SNPs) and small indels were called by Freebayes (Garrison & Marth, 2012). The matching rate was > 99% (Table S4) and a unimodal coverage dataset was observed which corroborate the homogeneity of the assembly. The variant call formatted (VCF) SNP file was then analyzed using BCFtools (https://github.com/samtools/bcftools) to select SNPs in homozygous positions with the minimum quality score of 20 and the coverage cut‐off of 20. This identified only 22 984 SNPs, which means only 0.007% of the positions in the genome were affected and gave us confidence that no further polishing was necessary. (2) We compared the genome assembly against a set of Benchmarking Universal Single‐Copy Orthologs (BUSCO) (Simao et al., 2015) to test the completeness of the genome assembly and gene space using the plant‐specific profile. This approach makes use of the single‐copy genes expected to be present in plants (1440 total BUSCO groups were searched). Consequently, 94.2% (1354) of the gene set was ‘completely’ retrieved (Table S5). To assess the completeness level of gene annotations, we also compared the predicted proteins of the assembled genome against BUSCO in protein mode with the eudicots_odb10 data set. The result showed that 93.5% (2133/2326) of the potential proteins were ‘completely’ retrieved which shows the good quality of the annotation of the genome (Table S5). Only 3.8% missing genes shows that the annotation has captured almost all the coding genes in B. gymnorhiza genome (Table S5).
Genome annotation
RepeatModeler v.2.1 (Flynn et al., 2020) was used with default parameters for de novo repeat identification in the assembled genome (Table S6); the constructed raw repeat library was used as the input for RepeatMasker (v.4.1.0, https://www.repeatmasker.org) to classify the types of repetitive sequences and repeat sequences were masked throughout the genome before further analysis. The LTR‐Finder software (Xu & Wang, 2007) was also used to specifically search for full‐length LTR retrotransposons in the genome. The results were then compared to the RepeatModeler output (Table S7), and as there were no new LTR found by the LTR‐Finder, we considered the repeatmodeler output a good source for our reference in this study.We used the combination of natural and control samples (all 25 libraries) for transcriptome assembly. Reads from 25 PE RNA‐sequencing (RNA‐Seq) were mapped on the repeat masked genome using Star (Dobin et al., 2013), and the mapping results were then assembled into potential transcripts using StringTie v.1.3.6 (https://ccb.jhu.edu/software/stringtie/). The assembly of clean reads produced the total of 131 108 transcripts with an average length of 1400 bp and N50 of 2150 bp (Table S3). The assembled transcripts from each sample were then merged using Taco (Niknafs et al., 2017), and the resulting transcriptome was subjected to prediction of coding DNA sequences (CDSs) using TransDecoder‐ v.5.5.0 (http://transdecoder.sf.net) built in scripts (‘gtf_genome_to_cdna_fasta.pl’ and ‘gtf_to_alignment_gff3.pl’). We then used Portcullis‐v.1.1.2 (Mapleson et al., 2018) and ‘bam2hints’ built in script from Augustus‐v.3.3 (http://bioinf.uni‐greifswald.de/augustus) on mapped RNA bam files and the genome to get intron hits. Additionally, we used an in‐house Python script on the resulting transcriptome from Taco to obtain exon hints (Notes S2). The intron and exon hints were then used to train Augustus. Coding sequences were obtained using Cufflinks ‘gffread’ utility (http://cole‐trapnell‐lab.github.io/cufflinks/file_formats/#the‐gffread‐utility) and were aligned against the National Center for Biotechnology Information (NCBI) nonredundant protein databases using Blast (Wheeler et al., 2007), with an e‐value threshold of 1 × 10−6 to annotate the functions of genes. Finally, the genome was found to contain 34 403 genes, with 10 140 identified as being similar to known genes (Table S8).Gene ontology (GO) annotation was obtained by aligning against the UNIPROT database using David 6.8 (https://david.ncifcrf.gov/home.jsp), where the protein sequences were also searched against the KEGG (Kyoto Encyclopedia of Genes and Genomes) database for KEGG orthology (KO) assignments and pathway annotation. The output data of all GO annotations were thoroughly examined to identify those involved in stress, growth, and epigenetics. The B. gymnorhiza chloroplast genome was assembled using GetOrganelle v.1.7.1 (Jin et al., 2020) and visualized by GeSeq (Tillich et al., 2017). Graphs for GO biological process, cellular component, and molecular function were plotted using Hayai‐Annotation Plants (Ghelfi et al., 2019). Transfer RNAs (tRNAs) were predicted by tRNAscan‐SE 2.0.6 (Chan & Lowe, 2019) with default settings. Ribosomal RNA (rRNA), microRNA (miRNA), and small nucleolar RNA (snoRNA) were predicted by Infernal 1.1.2 (Nawrocki & Eddy, 2013) against the Rfam.cm database. The B. gymnorhiza 14 842 EST dataset (Miyama et al., 2006) (DDBJ accession numbers BP938635 to BP953476) was used for homology search using Blastn with default settings by CLC Genomics Workbench 20.0.4 (Qiagen). The genome data of Kandelia obovata (Hu et al., 2020) and Rhizophora apiculata (Xu et al., 2017) were downloaded from the databases under accession numbers GWHACBH00000000 and PRJEB8423, respectively. A phylogenetic tree was generated by MegaX (Kumar et al., 2018) using CDSs of the plastid genomes of plant species. DNA sequences were aligned using Muscle with default settings, and a phylogenetic tree was constructed using the maximum likelihood method with 1000 bootstrap reiteration, ‐nucleotide, and Kimura 2‐parameter model. Synteny blocks of Mangrove genomes (scaffolds/contigs longer than 5 Mb) were identified using SyMap v.5.0.6 (Soderlund et al., 2006) with default settings (Table S9), and a synteny map was plotted in RIdeogram (Hao et al., 2020). Syntenic regions of the BURP gene cluster between B. gymnorhiza and K. obovata genomes were searched by generating genome files in the GenBank format using the seqret function in Emboss (Rice et al., 2000), and visualized using Easyfig 2.2.2 (Sullivan et al., 2011) with the following settings: Blastn; minimum length = 300; maximum e‐value = 1e−5; minimum identity value = 80.
Plant material for natural and control experiments
Samples for ‘natural’ experiments were obtained from the same mangrove forest where the individual for genome assembly was collected from. In the natural study site along the Okukubi River, Okinawa Island, Japan (26°27′N, 127°56′E), the riverside trees were located along the estuary of the river, and the oceanside trees grew along the coastal area of the Japanese Pacific Ocean. Propagules of the individual B. gymnorhiza tree in the natural environment that was used for the whole genome assembly, were all collected at the same day, and were used for all the individual plants in control experiments. The control experiment was performed in the laboratory, where samples grown in saline water were compared to samples grown in brackish water (hereafter referred to as saline and brackish, respectively) in the laboratory inside a growth chamber. We planted B. gymnorhiza seedlings in 25 cm diameter plastic pots and placed them in stainless steel tanks filled with brackish (1.5% salinity) or saline (4% salinity) water to simulate the ‘riverside’ and ‘oceanside’ natural environments.
Whole‐genome bisulfite sequencing and DNA methylation analysis
Leaves from two biological individual replicates from each condition (two of each oceanside and riverside for natural experiment and two of each saline and brackish for the control experiment; in total, eight samples) were collected, frozen in liquid nitrogen, and stored at −80°C for DNA extraction. Intact genomic DNA extracted from leaves (using the same extraction method as described earlier) was used for whole‐genome bisulfite sequencing (WGBS) with the post‐bisulfite adapter tagging (PBAT) method (Miura et al., 2012) on the Illumina Hi‐Seq 2500 platform (Illumina, San Diego, CA, USA) at OIST‐SQC. After removing low‐quality reads using Trimmomatic‐0.36 (Bolger et al., 2014), the quality‐trimmed reads were aligned to the de novo built genome using Bismark v.0.19.0 (Krueger & Andrews, 2011) with default parameters. Only uniquely mapped reads were retained, and only cytosines covered by at least three reads were retained and the two replicates were then intersected using the bedtools intersect option (BEDTools v.2.27.1, https://github.com/arq5x/bedtools2) for further analysis. After removing duplicate reads, the mapped reads were merged to calculate the counts of each potentially methylated cytosine. The conversion rate (rate at which unmethylated cytosines were converted to uracil by bisulfite) was calculated and cytosines were termed methylated (false discovery rate, FDR < 0.01) using a binomial test employing the conversion rate at the scaffold420 with high homology to chloroplast genomes in other species, followed by Benjamini–Hochberg (Benjamini & Hochberg, 1995) multiple testing correction.Differentially methylated regions (DMRs) in CG, CHG, and CHH contexts were detected using CGmaptools (Guo et al., 2018) (cgmaptools intersect) and (cgmaptools dmr) with the following parameters: ‐c 3 ‐s 200 ‐S 4000. Detected DMRs (FDR < 0.01) were further filtered by ≥ |0.4| change in CG, ≥ |0.2| change in CHG, and ≥ |0.1| change in CHH contexts. For TE analysis, the same TE family within 50 bp was merged, and copies less than 50 bp in length were excluded from the analysis. DNA methylation distribution plots were performed with deepTools (Ramirez et al., 2016). Venn diagrams were drawn using the eulerr web tool (http://eulerr.co). Heatmaps were generated using pheatmap1.0.12 (https://cran.r‐project.org/web/packages/pheatmap/) with default parameters. The Circos plot was plotted using the Clico FS (Cheong et al., 2015). Boxplots and violin plots were plotted using gpubr (https://CRAN.R‐project.org/package=ggpubr).Genome features associated with DMRs were identified by bedtools intersect with ‐f 0.5 parameter in the following order: TE > Gene > Intergenic. DMRs associated with genes and flanking regions were extracted by bedtools intersect the ‐f 0.9 setting in the following order: Genebody > 0–1 kb upstream of Genebody > 0–1 kb downstream of Genebody > 1–3 kb upstream of Genebody > 1–3 kb downstream of Genebody. Genes overlapping > 80% with TE annotation were excluded from the analysis.Details about RNA extraction and RNA‐Seq is given at Notes S1.
Transcription analyses
The quality treamed RNA‐Seq reads were aligned to the draft genome using Star v.2.7 (Dobin et al., 2013) with a 1‐bp mismatch (Table S10), and counts of reads uniquely mapped to annotated genes were obtained using featurecounts (v.1.6.3) (Liao et al., 2014). Briefly, the measurement of gene expression level for each transcript was performed using the fragments per kilo bases or per million bases of transcripts, using the mapped transcripts per million (TPM) (Gongora‐Castillo et al., 2012). Differential expression analysis was performed using DESeq2 in R software (v.3.6.1) (Love et al., 2014). FDR values ≤ 0.05 and |log2 FC| ≥ 1 were considered the threshold of differential expression (Zhao et al., 2018). We used the David Gene Ontology (https://david.ncifcrf.gov/home.jsp) for functional analysis of expressed homologous gene pairs to determine overrepresented GO categories across biological processes, cellular components, and molecular function domains. Enrichment of GO terms were tested using Fisher’s exact test, with P < 0.05 considered as significant. Statistical analyses and visualizations were performed using R. Venn diagrams were drawn using the Eulerr web tool (http://eulerr.co), and heatmaps with clustering were generated using pheatmap1.0.12, with the default parameters (https://cran.r‐project.org/web/packages/pheatmap/). Volcano plots were generated using EnhancedVolcano (https://github.com/kevinblighe/EnhancedVolcano). RNA‐Seq data were converted to bigwig files using deepTools (Ramirez et al., 2016) bamCoverage function with options; bs = 1, ‐normalizeUsing CPM. Data tracks were visualized using the Integrated Genome Browser (IGB v.9.1.6) (Freese et al., 2016). Boxplots and violin plots were plotted using ggpubr as described earlier.
Results
De novo genome assembly of the mangrove B. gymnorhiza
To obtain a reference for the genome, transcriptome, and epigenome analyses of mangrove species, we performed a de novo genome assembly of the genome of B. gymnorhiza (Fig. 1a), one of the dominant mangrove species with a wide distribution range in the Indo‐West Pacific region (Duke & Ge, 2011). Flow cytometry analyses of the nuclei and k‐mer spectrum (k = 27) analysis of the Illumina short‐read sequences indicated that the genome size of B. gymnorhiza was estimated to be c. 365 Mb (Fig. S1a,b). Combining PacBio long reads (total 50 Gb), Illumina paired‐end (PE) reads (56 Gb), and Illumina mate‐pair reads (44 Gb) (Tables S1–S3), we obtained a final assembly containing 1075 scaffolds with a total size of 309 Mb genome (Table S3). The smaller size of the assembled genome compared to the predicted size from k‐mer and flow cytometry analyses could be either because of collapsing similar regions in scaffolds or because of the differences between Illumina and PacBio datasets in handling the repeat regions, as Illumina reads were used for k‐mer analysis. The longest scaffold was 14.1 Mb, the N50 of contigs was 2.3 Mb, and the 31 and 104 largest scaffolds covered 50% and 75% of the genome, respectively (Table S3). Gene model prediction and annotation based on a repeat‐masked genome yielded a total of 34 403 protein‐coding genes, of which 10 138 genes showed homology to genes in other plant species (Tables S8, S11; Figs S1c, S2). In addition, tRNA genes (Table S12 and other small and non‐coding RNA genes, including miRNAs, snoRNAs and rRNA genes were annotated (Table S13). To further assess the completeness of the assembly, we investigated the recovery of the 1440 plant‐conserved gene set of BUSCO, which showed that 94.0% (1354) of the gene set was ‘completely’ retrieved, and 1.7% (24) was ‘partially’ retrieved, while 4.3% (62) was ‘missing’ (Table S5). In addition, 13 085 out of 14 842 previously reported expressed sequence tags (ESTs) for B. gymnorhiza (Miyama et al., 2006) matched to the 5138 out of the 34 403 annotated gene models in this study (Blastn, e‐value < 1e−5, Table S8), indicating the identification of thousands of novel genes, in addition to the reported B. gymnorhiza ESTs. The assembled genome contained 48.66% repetitive sequences, with 25.46% of the total assembly consisting of common TE families and 23.20% of unclassified repeats (Table S14). Among the TEs, LTR‐type retrotransposon families, including Gypsy (16.4%), Copia (15.4%), and Caulimovirus (5.1%) were the most abundant in the genome (Fig. S1d; Tables S14, S15. The B. gymnorhiza genome is larger than the recently reported genome sequences of other mangrove species from the same family Rhizophoraceae, R. apiculata (274 Mb) (Xu et al., 2017) and K. obovata (178 Mb) (Hu et al., 2020). This might be due to the higher accumulation of repetitive elements in the B. gymnorhiza genome. The genome sequence comparisons between B. gymnorhiza and R. apiculata, and B. gymnorhiza and K. obovata identified conserved synteny blocks between the genomes (Fig. 1b; Table S9). However, the syntenic genomic regions in the B. gymnorhiza genome were widely scattered to different scaffolds in the other two species, suggesting extensive rearrangement of mangrove genomes. This is interesting and needs to be investigated further as it might be something specific to the Rhizophoraceae family as they have diverged from their most recent common ancestor only around 40.7 Myr ago (Xu et al., 2017). Since the B. gymnorhiza genome in this study is still at a draft state, a pseudochromosome level assembly in the subsequent study by Hi‐C (Belton et al., 2012) that can span very distant DNA regions (Peona et al., 2021) would provide us a clearer picture of genome divergence among the mangrove species. Nonetheless, a nearly two‐fold genome‐size difference between B. gymnorhiza (c. 365 Mb) and K. ovobata (c. 170 Mb) (Hu et al., 2020) suggests that there might have been extensive genome rearrangements and expansions/contractions after the divergence of the species.
Fig. 1
(a) A cladogram of plant species including Bruguiera gymnorhiza. (b) Synteny blocks between B. gymnorhiza and Kandelia obovata (yellow ribbons), and B. gymnorhiza and Rhizophora apiculata (light blue ribbons) genomes. Scaffolds/contigs longer than 5 Mb were used for the analysis. Scaffold/contig numbers are indicated.
(a) A cladogram of plant species including Bruguiera gymnorhiza. (b) Synteny blocks between B. gymnorhiza and Kandelia obovata (yellow ribbons), and B. gymnorhiza and Rhizophora apiculata (light blue ribbons) genomes. Scaffolds/contigs longer than 5 Mb were used for the analysis. Scaffold/contig numbers are indicated.
Differential transcriptome profiles of B. gymnorhiza in natural and controlled environments
To investigate the genome‐wide gene expression dynamics of B. gymnorhiza in natural environments, we performed RNA‐Seq analysis using B. gymnorhiza samples obtained in natura. The experiment was conducted using leaf samples from natural B. gymnorhiza populations in Okinawa, Japan (Fig. 2a), where trees were growing in a wide range of salinity conditions, from the oceanside with high salinity to the upper riverside with brackish environments (saline and brackish, hereafter). Trees in the oceanside (saline) generally showed a dwarf phenotype with smaller leaves, while trees growing c. 1 km away on the riverside (brackish) were taller, with thicker trunks and larger leaves (Fig. 2b,c), indicating drastic alterations in the morphological phenotypes of B. gymnorhiza depending on their growing environments. To corroborate the in natura transcriptome analysis, we further conducted RNA‐Seq for B. gymnorhiza plants grown in a controlled environment in a laboratory growth chamber, where plants were grown under 1.5% or 4% salinity, to simulate the riverside ‘brackish’ or the oceanside ‘saline’ natural environments. In total, 25 samples (six biological replicates from the oceanside (saline), five from the riverside (brackish) for natural samples, eight biological replicates from saline, six from brackish conditions for control samples, detailed in Table S16) were sequenced and analyzed. Importantly, the whole‐genome transcriptome profiles of both natural and control saline samples, or natural and control brackish samples, showed high similarities (Figs 2d, S3a), suggesting that the salinity difference in the control conditions largely mimicked the natural conditions that resulted in the transcriptome difference. We identified 5467 upregulated (> two‐fold, P‐adjusted < 0.05, Table S17) and 2242 downregulated genes (< 0.5‐fold, P‐adjusted < 0.05, Table S18) in the natural saline environment compared to the brackish environment (Fig. 2e), including previously reported salt stress‐responsive genes of B. gymnorhiza such as Bg70 (Miyama & Hanagata, 2007) and B. gymnorhiza BURP Domain Containing protein3 (BgBDC3) (Banzai et al., 2002b) (Fig. 2f). This suggests that salinity is an environmental factor that causes transcriptome variations in the natural environments. GO analysis of upregulated genes showed that genes involved in ‘response to abscisic acid,’ and ‘response to water deprivation,’ which are known to be related to salt‐stress responses in plants (Hasegawa et al., 2000; van Zelm et al., 2020), were enriched in the upregulated genes in the saline environment (Fig. 2g; Tables S19–S22). On the contrary, many genes involved in chloroplast regulation were downregulated (Table S20), which may correlate with the decline in photosynthesis rate observed in the mangroves under salt and osmotic stresses (Miyama & Tada, 2008). In contrast, upregulated or downregulated genes in the control conditions only partially overlapped with those in the counterpart natural samples (Fig. S3b,c; Tables S23–S27). This may be due to natural or control condition‐specific factors affecting gene expression profiles. Indeed, GO terms, ‘response to chitin,’ ‘response to wounding,’ ‘response to salicylic acid,’ and the KEGG pathway term ‘Plant–pathogen interaction’ were specifically enriched in the upregulated genes in the natural saline condition (Fig. 2g; Tables S19, S21), suggesting that pathogen challenge may also be one of the major environmental factors affecting transcriptome dynamics of B. gymnorhiza trees in their natural environment.
Fig. 2
(a) The location of the study site. Maps of Japan, the Okinawa main island, and the location of Bruguiera gymnorhiza populations: river side (brackish) and oceanside (saline), along the Okukubi River (26°27′N, 127°56′E). (b, c) Representative B. gymnorhiza trees in the river side population (b), and the oceanside population (c). Bars represent 100 cm. (d) A heat map showing the expression levels of B. gymnorhiza genes (n = 25 250) clustered by transcripts per million (TPM) of each gene (rows) and replicates (columns). The number of replicates for RNA‐sequencing (RNA‐Seq) in each condition is shown at the bottom of the heatmap. (e) Volcano plot showing differential gene expression in the natural environment. Values of log2 fold change (saline/brackish) and adjusted P‐value for genes were obtained by RNA‐Seq data using DEseq2 (Supporting Information Tables S17, S18). Vertical dashed lines indicate thresholds of ± 1 (0.5‐ and 2‐fold changes), and the horizonal dashed line indicates a threshold of five (adjusted P = 1e−5). (f) Browser view of representative B. gymnorhiza gene loci showing differential gene expression in both natural and control conditions. Numbers in brackets indicate scales of tracks in count per million (CPM). Three representative RNA‐Seq replicates are shown in each condition. (g) Results of gene ontology enrichment analysis of genes upregulated (top) and downregulated (bottom) under saline condition in the natural environment.
(a) The location of the study site. Maps of Japan, the Okinawa main island, and the location of Bruguiera gymnorhiza populations: river side (brackish) and oceanside (saline), along the Okukubi River (26°27′N, 127°56′E). (b, c) Representative B. gymnorhiza trees in the river side population (b), and the oceanside population (c). Bars represent 100 cm. (d) A heat map showing the expression levels of B. gymnorhiza genes (n = 25 250) clustered by transcripts per million (TPM) of each gene (rows) and replicates (columns). The number of replicates for RNA‐sequencing (RNA‐Seq) in each condition is shown at the bottom of the heatmap. (e) Volcano plot showing differential gene expression in the natural environment. Values of log2 fold change (saline/brackish) and adjusted P‐value for genes were obtained by RNA‐Seq data using DEseq2 (Supporting Information Tables S17, S18). Vertical dashed lines indicate thresholds of ± 1 (0.5‐ and 2‐fold changes), and the horizonal dashed line indicates a threshold of five (adjusted P = 1e−5). (f) Browser view of representative B. gymnorhiza gene loci showing differential gene expression in both natural and control conditions. Numbers in brackets indicate scales of tracks in count per million (CPM). Three representative RNA‐Seq replicates are shown in each condition. (g) Results of gene ontology enrichment analysis of genes upregulated (top) and downregulated (bottom) under saline condition in the natural environment.
DNA hypermethylation in the B. gymnorhiza genome under high salinity environments
To investigate the whole‐genome cytosine methylation of B. gymnorhiza in different natural environments, we performed WGBS, using both natural and control samples. Two biological individual replicates from each condition (two of each oceanside and riverside for natural experiment and two of each saline and brackish for the control experiment; in total, eight samples) were used (Fig. S4a; Tables S28, S29).To investigate the DNA methylation patterns in different genomic regions, we analyzed the methylation level in the gene bodies (from transcription start site, TSS, to transcription termination site, TTS) as well as flanking regions of 1 kb upstream from TSS and 1 kb downstream from TTS (Figs 3c,d, S4b,c). The methylation level was highest in the CG context, followed by CHG and then CHH in each gene region (Fig. S4a), which is consistent with the methylome studies in other species (Al‐Harrasi et al., 2018; Jiang et al., 2019; Zhang et al., 2019). In the natural saline environment, DNA cytosine methylation level was elevated, especially in the CHG and CHH contexts, which mainly occurred in TE sequences (Fig. 3a,b). The non‐CG hypermethylation in TEs was recapitulated in the samples grown under saline condition in control experiments (Fig. S4b,c), demonstrating that salinity might be one of the primary factors inducing the hypermethylation in TEs in the B. gymnorhiza genome. The observed non‐CG hypermethylation was most prominent in TE families with higher copy numbers, including LTR‐Copia and LTR‐Gypsy families (Figs 3c, S1d, S4c). To further study the differential methylation among different groups, we also identified the DMRs. The number of DMRs on each scaffold and the length distribution of DMRs are listed in Tables S30–S35. DMRs were identified by comparing the average methylation levels within a 100 bp window between saline and brackish, and those with statistical significance (FDR < 0.01) were used in the analysis. We detected 25 791 DMRs between oceanside and riverside natural samples and 21 918 DMRs between saline and brackish control samples. Many of the DMRs with hyper‐CHH methylations were identified in the genomes of natural and control saline samples, which partially overlapped (Fig. 3d–f; Tables S30–S35), implying that other natural environmental factors besides salinity could affect the methylation variations among the samples from different environments.
Fig. 3
(a) A Circos plot showing cytosine methylation levels of the 10 longest scaffolds (> 5 Mb) of Bruguiera gymnorhiza. The outermost circle shows the 10 scaffolds. The next six circles represent the DNA methylation levels in brackish (B) and saline (S) conditions for CG (0 to 1), CHG (0 to 1), and CHH (0 to 0.2) contexts (H can be A, or C, or T) in the natural environment. The first and second innermost circles show the density of genes (black) and transposable elements (TEs; orange), respectively. (b) (Left) Metaplots of cytosine methylation levels in CG, CHG, and CHH contexts for genes and TEs in brackish and saline conditions in the natural environment. (Right) Violin plots showing levels in CG, CHG, and CHH contexts for genes and TEs in brackish and saline conditions in the natural environment. Genes and TEs with ≥ 5 Cs were used for the analysis. Genes: mCG (brackish, n = 31 119; saline, n = 31 293); mCHG (brackish, n = 32 522; saline, n = 32 648); mCHH (brackish, n = 33 120; saline, n = 33 155). TEs: mCG (brackish, n = 43 020; saline, n = 44 446); mCHG (brackish, n = 58 549; saline, n = 60 430); mCHH (brackish, n = 122 428; saline, n = 124 160). Dots indicate median values. P‐values by Brunner–Munzel test are indicated. (c) Metaplots of cytosine methylation levels in CG, CHG, and CHH contexts for TE families in brackish and saline conditions in the natural environment. (d) The number of differentially methylated regions (DMRs) in CG, CHG and CHH contexts in the control (left) and natural (right) experiments. (e) Venn diagrams indicating number of DMRs and overlap between DMRs of control and natural samples in CG, CHG and CHH contexts. (f) Heat maps of cytosine methylation levels in DMRs in natural samples and their corresponding regions in control samples, clustered by methylation levels (rows). Two replicates of BS‐Seq data in indicated conditions are shown.
(a) A Circos plot showing cytosine methylation levels of the 10 longest scaffolds (> 5 Mb) of Bruguiera gymnorhiza. The outermost circle shows the 10 scaffolds. The next six circles represent the DNA methylation levels in brackish (B) and saline (S) conditions for CG (0 to 1), CHG (0 to 1), and CHH (0 to 0.2) contexts (H can be A, or C, or T) in the natural environment. The first and second innermost circles show the density of genes (black) and transposable elements (TEs; orange), respectively. (b) (Left) Metaplots of cytosine methylation levels in CG, CHG, and CHH contexts for genes and TEs in brackish and saline conditions in the natural environment. (Right) Violin plots showing levels in CG, CHG, and CHH contexts for genes and TEs in brackish and saline conditions in the natural environment. Genes and TEs with ≥ 5 Cs were used for the analysis. Genes: mCG (brackish, n = 31 119; saline, n = 31 293); mCHG (brackish, n = 32 522; saline, n = 32 648); mCHH (brackish, n = 33 120; saline, n = 33 155). TEs: mCG (brackish, n = 43 020; saline, n = 44 446); mCHG (brackish, n = 58 549; saline, n = 60 430); mCHH (brackish, n = 122 428; saline, n = 124 160). Dots indicate median values. P‐values by Brunner–Munzel test are indicated. (c) Metaplots of cytosine methylation levels in CG, CHG, and CHH contexts for TE families in brackish and saline conditions in the natural environment. (d) The number of differentially methylated regions (DMRs) in CG, CHG and CHH contexts in the control (left) and natural (right) experiments. (e) Venn diagrams indicating number of DMRs and overlap between DMRs of control and natural samples in CG, CHG and CHH contexts. (f) Heat maps of cytosine methylation levels in DMRs in natural samples and their corresponding regions in control samples, clustered by methylation levels (rows). Two replicates of BS‐Seq data in indicated conditions are shown.The hyper‐CHG and ‐CHH DMRs were mainly associated with TE sequences, while the majority of hyper‐ and hypo‐CG DMRs were associated with genic regions in both natural and control samples (Figs 4a,b, S5a,b). We further examined the impact of changes in DNA methylation on the transcription of genes associated with DMRs. Genes associated with gene‐body CG DMRs did not show significant differences between the hyper‐ and hypo‐methylated gene groups (Figs 4c, S5c; Table S36). Similarly, genes associated with hyper‐CHH DMRs in the promoter region (1 kb upstream of genes) or gene‐body did not show significant changes in expression between saline and brackish samples in either natural or control environments, suggesting limited impacts of the observed DNA hypermethylation on the expression of the genes associated with DMRs in B. gymnorhiza leaf tissue.
Fig. 4
(a) Pie charts showing genome features associated with natural‐sample differentially methylated regions (DMRs). (b) Bar graphs representing the number of hypo‐ and hyper‐DMRs in the genome features. (c) Box plots representing expression differences of genes (log2 fold changes) associated with DMRs in the indicated genome features and contexts. Thick horizonal bars in the boxes indicate median values, box limits are upper and lower quartiles, and whiskers represent ×1.5 interquartile ranges. P‐values by Brunner–Munzel test are indicated. ns, not significant (P > 0.05).
(a) Pie charts showing genome features associated with natural‐sample differentially methylated regions (DMRs). (b) Bar graphs representing the number of hypo‐ and hyper‐DMRs in the genome features. (c) Box plots representing expression differences of genes (log2 fold changes) associated with DMRs in the indicated genome features and contexts. Thick horizonal bars in the boxes indicate median values, box limits are upper and lower quartiles, and whiskers represent ×1.5 interquartile ranges. P‐values by Brunner–Munzel test are indicated. ns, not significant (P > 0.05).
Salinity‐dependent regulation of BURP‐domain containing protein gene cluster in the B. gymnorhiza genome
Although not many genes associated with DMRs showed expression changes, transcriptome analyses showed that there were several highly downregulated genes located on scaffold39 under the natural saline condition (Table S18). These genes encode BURP‐domain containing proteins (Table S11), which were forming a cluster spanning the c. 300 kb region from 80 000 to 110 000 on scaffold39 (Fig. 5a). The BURP domain (named from protein families BNM2, USP, RD22, PG1β) proteins (Hattori et al., 1998) are involved in various plant metabolic pathways, development, and stress responses (Wang et al., 2015), and often form gene clusters in plant genomes (Li et al., 2016). Previous studies on B. gymnorhiza identified BURP‐domain containing protein genes BgBDC1‐4 (Banzai et al., 2002b), which showed downregulation under high‐salinity conditions (Fig. 2f) (Banzai et al., 2002a; Miyama & Hanagata, 2007). We searched for proteins homologous to BgBDCs in the B. gymnorhiza genome, identifying at least 14 BgBDC genes, including those highly similar to BgBDC1‐4 genes (Fig. S6a). Among the 14 BgBDC genes, 13 genes were located in the cluster on the scaffold39 (Fig. S6a,b). In addition to BgBDC genes, many TEs and other repeats were accumulated within the cluster (Fig. 5a), which were highly methylated at both CG and non‐CG contexts (Fig. 5a). These TEs were hypermethylated at CHH contexts under saline conditions in both natural and control environments (Fig. 5b), which could be associated with the cluster‐wide gene regulation under salt stress. We further searched the syntenic regions of the cluster in the genome of K. obovata (Hu et al., 2020), and found a syntenic block in contig1 of the K. obovata genome (Fig. 5c). Interestingly, only a single BURP domain protein gene is encoded in the corresponding region of K. obovata, suggesting either duplications and expansions of the BURP gene cluster in B. gymnorhiza, or contractions of the cluster in the K. obovata genome. Accumulation of the repetitive elements in the BURP gene cluster in the B. gymnorhiza genome might have contributed to the formation of the gene cluster, which has often been observed in other gene families in the plant genomes (Galindo‐Gonzalez et al., 2017; Lai & Eulgem, 2018).
Fig. 5
(a) Browser view of the BURP gene cluster in the scaffold39 of the Bruguiera gymnorhiza genome. Tracks: top to bottom; cytosine methylation levels (0 to 1) in Watson and Crick strands in CG (blue), CHG (light blue), and CHH (pink) contexts (H can be A, or C, or T) of indicated conditions, CHH hyper‐differentially methylated regions (DMRs; orange), RNA‐sequencing tracks (count per million (CPM; 0 to 40)), repeats (orange), gene model (black), and heat maps showing differences in expression (log2 fold changes) of genes, including BURP genes (Bruguiera gymnorhiza BURP‐domain containing protein, BgBDCs; red). g80 and g81 in scaffold39 were originally annotated as two separate genes, while from the protein sequences they were likely from one BURP gene‐coding sequences, and only expression of g81 is shown. (b) Boxplots showing cytosine methylation levels of the BURP gene cluster, from 800 201 to 1076 600 of scaffold39, averaged in 100‐bp windows. P‐values by Brunner–Munzel test are indicated. ns, not significant (P > 0.05). Thick horizonal bars in the boxes indicate median values, box limits are upper and lower quartiles, and whiskers represent ×1.5 interquartile ranges. (c) A synteny of the BURP gene cluster between B. gymnorhiza (scaffold39: 790 000–1100 000) and Kandelia obovata (Contig1: 6360 000–6560 000, reverse orientation). BURP genes are shown in red. Blast identity (%) is indicated by ribbons (sense orientation, light blue to blue; reverse orientation, yellow to orange).
(a) Browser view of the BURP gene cluster in the scaffold39 of the Bruguiera gymnorhiza genome. Tracks: top to bottom; cytosine methylation levels (0 to 1) in Watson and Crick strands in CG (blue), CHG (light blue), and CHH (pink) contexts (H can be A, or C, or T) of indicated conditions, CHH hyper‐differentially methylated regions (DMRs; orange), RNA‐sequencing tracks (count per million (CPM; 0 to 40)), repeats (orange), gene model (black), and heat maps showing differences in expression (log2 fold changes) of genes, including BURP genes (Bruguiera gymnorhiza BURP‐domain containing protein, BgBDCs; red). g80 and g81 in scaffold39 were originally annotated as two separate genes, while from the protein sequences they were likely from one BURP gene‐coding sequences, and only expression of g81 is shown. (b) Boxplots showing cytosine methylation levels of the BURP gene cluster, from 800 201 to 1076 600 of scaffold39, averaged in 100‐bp windows. P‐values by Brunner–Munzel test are indicated. ns, not significant (P > 0.05). Thick horizonal bars in the boxes indicate median values, box limits are upper and lower quartiles, and whiskers represent ×1.5 interquartile ranges. (c) A synteny of the BURP gene cluster between B. gymnorhiza (scaffold39: 790 000–1100 000) and Kandelia obovata (Contig1: 6360 000–6560 000, reverse orientation). BURP genes are shown in red. Blast identity (%) is indicated by ribbons (sense orientation, light blue to blue; reverse orientation, yellow to orange).
Differential expression of chromatin modifier genes under salinity stresses in B. gymnorhiza
To understand the basis of the DNA hypermethylation of TEs in the B. gymnorhiza genome induced under saline conditions (Fig. 3), expression profiles of chromatin modifier genes in B. gymnorhiza were further investigated by searching the counterparts of A. thaliana (Arabidopsis Genome, 2000). Most of the homologs involved in DNA methylation pathways, including DNA methylases and demethylases, RNAi machinery, factors involved in RdDM, and histone H3K9 methylases and demethylases, were identified (Table S37), suggesting the conservations of the basic DNA methylation machineries in the B. gymnorhiza genome (Fig. 6a). Interestingly, the saline conditions in both natural and control environments significantly altered the expression of several genes involved in DNA methylation (Fig. 6a,b). In particular, the B. gymnorhiza homologs of the non‐CG methylase genes Bg_CMT2 and Bg_CMT3 were significantly upregulated in the natural saline environment. These genes are required for CHG and CHH methylation of TEs, depending on H3K9 methylation in other plant species (Law & Jacobsen, 2010; Stroud et al., 2013; Zemach et al., 2013). In addition, the de novo methyltransferase homolog Bg_DRM2, which is involved in RdDM in plants (Matzke & Mosher, 2014; Erdmann & Picard, 2020) was significantly upregulated in both natural and control saline conditions (Figs 6a, S7). Conversely, one of the DNA demethylase homologs (Zhang et al., 2018), Bg_DME2, showed rather significant downregulation under saline conditions (Fig. 6a,b). Although the degree of DNA methylation differences induced by the gene expression changes remains unclear, the upregulation of the non‐CG methylase genes and downregulation of the DNA demethylase gene under saline conditions were consistent with the observed DNA hypermethylation of TEs, especially in CHG and CHH sites in saline conditions. Taken together, this indicates that the differences in the expression of chromatin modifier genes may play a role in epigenome regulation in plants under high‐salinity environments.
Fig. 6
(a) Heat map showing differences in expression (log2 fold changes (saline/brackish)) of genes homologous to chromatin modifiers. The dendrogram shows hierarchical gene clustering by expression levels in natural and control conditions (rows). (b) Genome browser view of three representatives of chromatin modifier genes showing differential expression in both natural and control conditions. Numbers in brackets indicate scales of tracks in count per million (CPM). Three representative RNA‐sequencing replicates are shown in each condition. q‐Values (adjusted P‐values) calculated by DEseq2 are indicated.
(a) Heat map showing differences in expression (log2 fold changes (saline/brackish)) of genes homologous to chromatin modifiers. The dendrogram shows hierarchical gene clustering by expression levels in natural and control conditions (rows). (b) Genome browser view of three representatives of chromatin modifier genes showing differential expression in both natural and control conditions. Numbers in brackets indicate scales of tracks in count per million (CPM). Three representative RNA‐sequencing replicates are shown in each condition. q‐Values (adjusted P‐values) calculated by DEseq2 are indicated.
Discussion
TEs can contribute to genome expansion and evolution (Bennetzen & Wang, 2014), and are often activated by environmental stresses in plants (Ito et al., 2011; Galindo‐Gonzalez et al., 2017; Lanciano & Mirouze, 2018). The genome of mangroves is the smallest among tree species (Xu et al., 2017; Hu et al., 2020). Previous comparative genome studies have reported that mangrove genomes show a convergent reduction of TE sequences compared to nonmangrove genomes (Lyu et al., 2018), which can partly be attributed to RdDM pathway that suppresses the accumulation of young retrotransposons, such as Gypsy family, in the genome (Wang et al., 2018). The RdDM pathway primarily directs non‐CG methylation at the LTRs of retrotransposons at the edges of the TE units (Stroud et al., 2013; Zemach et al., 2013). In contrast, CMTs mainly methylate non‐CG sites in the body of TE sequences (Zemach et al., 2013), and the convergence of RdDM and CMT pathways are required for stable silencing of TEs in plant genomes (Law & Jacobsen, 2010). The non‐CG hypermethylation of TEs in the B. gymnorhiza genome under salt stress seems to occur at both the edges and the body of TE sequences (Figs 3, S4), concurrent with the activation of CMT and DRM methylases (Fig. 6), suggesting that both RdDM and CMT‐mediated DNA methylation pathways may become active in B. gymnorhiza under saline conditions. The results suggest the presence of robust TE suppression mechanisms involving DNA methylation in the mangrove genome under salinity stress (Fig. 3). However, the B. gymnorhiza genome is still comparatively large, with a high TE proportion compared to other mangrove species. Among the mangrove family Rhizophoraceae, which diverged from nonmangrove species (at c. 56 Myr) (Guo et al., 2017; Xu et al., 2017), B. gymnorhiza diverged earlier, and its phylogenetic position was thus isolated from other mangroves, despite the monophyly of the Rhizophoreae tribe (Fig. 1) (Schwarzbach & Ricklefs, 2000; Guo et al., 2017). Bruguiera gymnorhiza might regulate methylation of TEs less stringently as compared to other Rhizophoraceae species that diverged more recently; this would explain the differences in TE accumulation and the genome size differences.TE sequences in the genome often enhance rearrangement, duplication, and recombination, which contribute to gene evolution and genome diversification (Bennetzen & Wang, 2014). The presence of many TEs in the BURP gene cluster (Fig. 5) suggested that the TEs might have contributed to cluster formation in B. gymnorhiza, as observed in the R‐gene clusters essential for various pathogen responses in plant genomes (Galindo‐Gonzalez et al., 2017; Lai & Eulgem, 2018). Indeed, we found intron‐less BURP genes (BgBDC7, BgBDC9, BgBDC10, BgBDC11) scattered in the cluster, suggesting previous retroposition or gene transduction events that are often triggered by the presence of nearby LTR retrotransposon sequences (Galindo‐Gonzalez et al., 2017). Thus, there may be a trade‐off between the cost of maintaining large genomes under harsh environments and the accumulations of TEs in the genomes as a source of genetic diversity. Epigenetic control of TEs could also contribute to the differentiation of nearby gene expression and phenotypic plasticity, which may play a role in the adaptive response of mangrove trees to their stressful environment. Indeed, mangrove trees, including B. gymnorhiza, often show drastic morphological changes in responses to the environmental stresses (Fig. 2) (Suwa et al., 2009; Lira‐Medeiros et al., 2010), suggesting phenotypic and physiological adaptations to the harsh growing environments. Intriguingly, our transcriptome analysis revealed a significant downregulation of the RNAi component ARGONAUTE10 (AGO10) (Fig. S7), which is involved in stem cell maintenance via miRNA regulation in A. thaliana, and the mutation of AGO10 causes stem cell termination in the shoot apical meristem (Liu et al., 2009; Zhu et al., 2011). The downregulation of the AGO10 could be a factor leading to the dwarf phenotypes of B. gymnorhiza under saline conditions (Fig. 2). In addition, a recent finding of cell cycle‐coupled regulation of DNA methylation (Borges et al., 2021) suggests that differences in cell cycle activities in B. gymnorhiza plants growing under different environmental conditions might affect DNA methylation levels in the genome. The observed morphological diversification of the mangrove tree (Fig. 2) might be a result of adaptive control of developmental growth in the saline conditions rather than passive responses to environmental stresses, which may allow the allocation of limited resources to other essential cellular processes.Differential DNA methylation pattern has been known to have an impact in regulation of salt responsive genes. For example, studying salt‐treated palm seedlings has shown that salinity stress causes insufficient root growth and lower expression of some key photosynthetic genes, which is suggested to be due to DNA methylation changes (Sperling et al., 2014; Al‐Harrasi et al., 2018). Other studies on Arabidopsis (Boyko et al., 2010), alfalfa (Medicago spp.) (Al‐Lawati et al., 2016), and Mesembryanthemum crystallinum (Dyachenko et al., 2006) have all reported DNA methylation to be associated with salinity tolerance phenotype. Salt‐sensitive rice cultivars that experienced salt stress showed changes in DNA methylation where hypomethylation caused by salinity was correlated with upregulation of DNA demethylase genes (Ferreira et al., 2015). Non‐CG methylation of a salt tolerance gene in Arabidopsis also showed to help with the adaptation to high‐salinity stress (Ashapkin et al., 2020). A study on wheat has shown a similar result, where hyper‐methylation caused by salt stress was correlated with the regulation of salt‐responsive genes (Kumar et al., 2017). Similar to our results, DNA methylation through the activity of methyltransferases such as MET1, CMT3, and DRM2 has been reported extensively in response to stress (Shen et al., 2014; Naydenov et al., 2015). Thus, the observed global changes in DNA methylation in B. gymnorhiza under saline conditions may be a conserved response in various plant species.In the study site of mangrove forest, B. gymnorhiza showed the greatest morphological differences between the oceanside and riverside compared to the other two neighboring mangrove species R. stylosa and K. obovata. Indeed, Bruguiera has several unique characteristics that distinguish it from other mangrove species. For example, seedling dispersal in Bruguiera is accompanied with the fruits, whereas only seedlings are dispersed in other mangroves (Juncosa, 1984; Tomlinson, 1986). Furthermore, B. gymnorhiza is mainly pollinated by birds, while Rhizophora and other close relatives are wind‐ and insect‐pollinated (Kondo et al., 1987). Additionally, B. gymnorhiza has an extraordinarily wide distribution range throughout the Indo‐West Pacific from East Africa to Southeast Asia, Australia, and the western Pacific (Duke & Ge, 2011). Future studies of various mangrove species would allow us to better understand the genome strategies of plant species for adaptation to extreme natural environments.
Conclusions
In this study, we performed a de novo genome assembly and in natura epigenomics of the mangrove tree B. gymnorhiza. The results revealed that B. gymnorhiza under high‐salinity conditions induced DNA hypermethylation in TEs, suggesting a robust epigenome regulation for TEs in B. gymnorhiza under salt stresses. The high‐quality reference genome, transcriptome, and epigenome data in this study builds a promising foundation for future genetic and epigenetic studies in mangroves and may also contribute to a better understanding of the adaptive responses and epigenome regulation in plant species under fluctuating environmental conditions. Our results support the involvement of DNA methylation in some differentiation of gene expression and phenotypic characteristics that may be part of the adaptive response of mangrove trees to their stressful environment. Future studies are needed to understand the mechanism behind this connection.
Author contributions
The study was conceived and coordinated by MM and HS, all experiments were designed and performed by MM. For data and bioinformatic analyses, flow cytometry and k‐mer analyses were performed by MM. Short read genome assembly was performed by MM, PacBio genome assembly was performed by MM and FM, scaffolding and gap closing were performed by MM, annotation of the genome was performed by MM, FM, and was assisted by DG, Transcriptome assembly was performed by MM and FM, Gene expression analysis was performed by MM and was assisted by HS and DG, DNA methylation analyses were performed by MM and HS. Figures and tables were prepared by MM and HS. The manuscript was written by MM and HS.Fig. S1 The Bruguiera gymnorhiza genome and annotations.Fig. S2 The Bruguiera gymnorhiza chloroplast genome with gene annotation.Fig. S3 The expression analysis of Bruguiera gymnorhiza genes in Natural and Control conditions.Fig. S4 DNA methylation analysis of Bruguiera gymnorhiza genome in Natural and Control conditions.Fig. S5 The expression of Bruguiera gymnorhiza genes associated with DMRs in the Control condition.Fig. S6 Protein analysis of BURP genes in the Bruguiera gymnorhiza genome.Click here for additional data file.Notes S1 Supplementary details about Materials and Methods.Click here for additional data file.Notes S2 In‐house Python script to obtain exon hints from transcriptome.Click here for additional data file.Table S1 Raw and Trimmed Illumina DNA sequencing information used in this study.Table S2 PACBIO DNA sequencing information used in this study.Table S3 Genome and transcriptome assembly results.Table S4 Quality control of final assembly by mapping Hi‐Seq data to final assembly.Table S5 BUSCO analysis of the genome and the potential proteins.Table S6 Repeat annotations.Table S7 The intersect between LTR‐finder and repeatmodeler output.Table S8 Top Blast hits for genes based on a repeat‐masked genome against Uniprot plant database.Table S9 Synteny blocks between Bruguiera gymnorhiza and Kandelia obovata, B. gymnorhiza and Rhizophora apiculata, and K. obovata and R. apiculate.Table S10 RNA‐Seq mapping rate by Star_Asterix.Table S11 34 403 Protein‐coding gene models in the Bruguiera gymnorhiza genome.Table S12 tRNA annotation.Table S13 snoRNA_miRNA_rRNA annotations.Table S14 Repeat_summary.Table S15 Blast hits to Bruguiera gymnorriza ESTs.Table S16 Raw Illumina RNA sequencing information used in this study.Table S17 Upregulated genes in oceanside (saline) samples.Table S18 Downregulated genes in oceanside (saline) samples.Table S19 GO enrichment in biological process for upregulated genes in oceanside samples.Table S20 GO enrichment in biological process for downregulated genes in oceanside samples.Table S21 KEGG pathway enrichment for upregulated genes in oceanside samples.Table S22 KEGG pathway enrichment for downregulated genes in oceanside samples.Table S23 Upregulated genes in saline control condition.Table S24 Downregulated genes in saline control samples.Table S25 GO enrichment in biological process for upregulated genes in saline control samples.Table S26 GO enrichment in biological process for downregulated genes in saline control samples.Table S27 KEGG pathway enrichment for downregulated genes in saline control samples.Table S28 Statistics of samples used for BS‐Seq.Table S29 BS‐Seq summary.Table S30 CG_DMRs in natural samples.Table S31 CHG_DMRs in natural samples.Table S32 CHH_DMRs in natural samples.Table S33 CG_DMRs in control samples.Table S34 CHG_DMRs in control samples.Table S35 CHH_DMRs in control samples.Table S36 Gene expression changes and associated DMRs.Table S37 Homologs of chromatin modifiers in the Bruguiera gymnorhiza genome.Please note: Wiley Blackwell are not responsible for the content or functionality of any Supporting Information supplied by the authors. Any queries (other than missing material) should be directed to the New Phytologist Central Office.Click here for additional data file.
Authors: O V Dyachenko; N S Zakharchenko; T V Shevchuk; H J Bohnert; J C Cushman; Ya I Buryanov Journal: Biochemistry (Mosc) Date: 2006-04 Impact factor: 2.487
Authors: Xia Shen; Jennifer De Jonge; Simon K G Forsberg; Mats E Pettersson; Zheya Sheng; Lars Hennig; Örjan Carlborg Journal: PLoS Genet Date: 2014-12-11 Impact factor: 5.917
Authors: Michael Tillich; Pascal Lehwark; Tommaso Pellizzer; Elena S Ulbricht-Jones; Axel Fischer; Ralph Bock; Stephan Greiner Journal: Nucleic Acids Res Date: 2017-07-03 Impact factor: 16.971
Authors: Isabel García-García; Belén Méndez-Cea; David Martín-Gálvez; José Ignacio Seco; Francisco Javier Gallego; Juan Carlos Linares Journal: Front Plant Sci Date: 2022-01-04 Impact factor: 5.753