Literature DB >> 35266513

Chromosome-level genome assembly of a xerophytic plant, Haloxylon ammodendron.

Mingcheng Wang¹, Lei Zhang², Shaofei Tong³, Dechun Jiang⁴, Zhixi Fu⁵.

Abstract

Haloxylon ammodendron is a xerophytic perennial shrub or small tree that has a high ecological value in anti-desertification due to its high tolerance to drought and salt stress. Here, we report a high-quality, chromosome-level genome assembly of H. ammodendron by integrating PacBio's high-fidelity sequencing and Hi-C technology. The assembled genome size was 685.4 Mb, of which 99.6% was assigned to nine pseudochromosomes with a contig N50 value of 23.6 Mb. Evolutionary analysis showed that both the recent substantial amplification of long terminal repeat retrotransposons and tandem gene duplication may have contributed to its genome size expansion and arid adaptation. An ample amount of low-GC genes was closely related to functions that may contribute to the desert adaptation of H. ammodendron. Gene family clustering together with gene expression analysis identified differentially expressed genes that may play important roles in the direct response of H. ammodendron to water-deficit stress. We also identified several genes possibly related to the degraded scaly leaves and well-developed root system of H. ammodendron. The reference-level genome assembly presented here will provide a valuable genomic resource for studying the genome evolution of xerophytic plants, as well as for further genetic breeding studies of H. ammodendron.

Entities: Chemical

Keywords: zzm321990 Haloxylon ammodendronzzm321990 ; PacBio’s high-fidelity sequencing; desert adaptation; genome assembly; xerophyte

Mesh：

Year: 2022 PMID： 35266513 PMCID： PMC8946665 DOI： 10.1093/dnares/dsac006

Source DB: PubMed Journal: DNA Res ISSN： 1340-2838 Impact factor: 4.458

1. Introduction

Desertification is a serious threat to arid and semi-arid regions, which cover more than 40% of the global land area and are home to about 1 billion people., This has become a major global crisis. Desertification has affected 10–20% of these drylands, and the desertification rate is likely to increase rapidly in the face of dramatic climate change and human activities., Deterioration of plant cover is the prime indicator of desertification. However, some plant species have developed their own adaptation strategies to overcome water and nutrient shortages in harsh desert environments. These extremophytes have played important roles in anti-desertification projects worldwide. Therefore, studying genetic mechanisms of plant adaptation to extreme desert environments will help us to develop more drought-resistant crops, which is beneficial for future desertification control. In the last few decades, molecular basis of drought and salt tolerance has been extensively studied in the model plant Arabidopsis thaliana. With the rapid development of sequencing technologies, our understanding of drought and salt stress response in plants has advanced by transcriptome analysis of a large number of xerophytic plants. However, genome sequencing of xerophytic plants, which is crucial for comprehensive understanding of their genetic mechanisms of adaptive evolution, has been limited to very few species, including one gymnosperm (Welwitschia mirabilis), one dicot species (Ammopiptanthus nanus), and three monocots (Cleistogenes songorica, Oropetium thomaeum, and Aloe vera). Haloxylon ammodendron (Amaranthaceae), a xerophytic perennial shrub or small tree with a high tolerance to drought and salt stress, dominates many sandy and saline areas of Asian deserts. It plays an important role in the maintenance of the structure and function of the desert ecosystem via sand fixation, wind control, and microclimate amelioration. In addition, this species has a high economic value because it has traditionally been used as livestock feed and firewood, and it is the host plant of a rare traditional Chinese medicinal plant, ‘desert ginseng’, Cistanche deserticola. A few xeromorphic and halomorphic characteristics have been developed in this species, including highly degraded scaly leaves, C4 photosynthesis in the succulent stem, a deep root system, and high salt content in the main stem and branch. To date, molecular-level studies of H. ammodendron’s response to abiotic stressors have been limited to the transcriptome and a limited number of genes. A comprehensive understanding of the adaptive evolution of H. ammodendron has been hampered by the lack of genome information. In this study, we assembled a chromosome-level genome assembly of H. ammodendron (2n = 2x = 18) with PacBio’s high-fidelity (HiFi) reads and Hi-C reads. Using this high-quality genome, we investigated the evolutionary history and molecular mechanisms underlying the desert adaptation of H. ammodendron. The genome assembly presented here will provide a valuable genomic resource for studying the genome evolution of xerophytic plants. This genome is also beneficial for further genetic breeding studies of this ecologically and economically important species.

2. Materials and methods

2.1. Plant materials and sequencing

All plant materials used in this study were harvested from an adult plant of H. ammodendron growing in Yinchuan, Ningxia Province, northwestern China (Fig. 1A). The plant samples were authenticated by Dr Lei Zhang from North Minzu University, China. For whole-genome DNA sequencing, total genomic DNA was extracted from fresh leaves using the cetyl trimethylammonium bromide method. First, paired-end libraries with an insert size of 350 bp were constructed and sequenced on an Illumina HiSeq 2500 platform following the manufacturer’s instructions (Illumina, San Diego, CA, USA). Second, SMRTbell libraries were constructed using the PacBio 15-kb protocol (Pacific Biosciences, CA, USA), including DNA fragmentation, DNA damage repair and end repair, hairpin adapter ligation, and DNA purification. Long-read HiFi sequencing was then performed using circular consensus sequencing (CCS) mode on a PacBio Sequel II platform. Third, more than 2 g of young leaves were collected for the Hi-C experiment. Hi-C libraries were constructed as described previously, including chromatin extraction and digestion, DNA ligation, and purification. The DNA was sheared to a mean fragment size of ∼350 bp and sequenced on an Illumina HiSeq X Ten platform.

Figure 1

Overview of H. ammodendron and its genome. (A) Photo of an adult plant of H. ammodendron. (B) Heatmap showing Hi–C interactions at a resolution of 500 kb. (C) Landscape of the H. ammodendron genome. The tracks from outer to inner circles indicate the following: (a) GC content; (b) repeat density; (c) density of Gypsy elements; (d) density of Copia elements; (e) gene density; and (f) location of TF genes. The inner circle represents the collinear blocks identified in the H. ammodendron genome. To aid in genome quality assessment and gene prediction, fresh leaves and stems from the same individual of H. ammodendron were collected for RNA sequencing (RNA-seq). Total RNA was extracted from these tissues, and a DNA-free DNA removal kit (Thermo Fisher Scientific, USA) was used to remove residual DNA. The RNA-seq libraries were prepared and sequenced on an Illumina HiSeq 2500 platform. RNA-seq reads were filtered using Trimmomatic v0.36 with default parameters.

2.2. Genome assembly and assessment

The genome size of H. ammodendron was estimated based on a 17-mer frequency distribution of Illumina short reads generated by Jellyfish v2.2.9 with parameters ‘-m 17 -s 3G -C’. Low-frequency K-mers (<4×) were removed and the genome size was calculated based on the formula Genome_size = (Total number of k-mers−Number of low-frequency k-mers)/Homozygous peak depth. Raw HiFi-sequencing data were processed using CCS v4.2.0 (https://ccs.how), with the main parameters of min-passes = 3 and min-rq = 0.99. The high-quality HiFi reads were then assembled into contigs using hifiasm v0.14, with default parameters. Further, ALLHiC was used to anchor these contigs into pseudochromosomes based on Hi-C data. Placement and orientation errors exhibiting obvious discrete chromatin interaction patterns were manually adjusted according to the contact maps plotted by Juicebox v1.8.8. Genome quality and completeness were assessed by a combination of short reads mapping, transcript alignment, benchmarking universal single-copy orthologs (BUSCO) analysis, and LTR assembly index (LAI) scores. First, the Illumina reads used in genome size estimation were mapped to the assembly using BWA v0.7.17, and the read depth distribution was calculated by SAMTOOLS v1.5. Second, RNA-seq reads were assembled into unigenes by Trinity v2.8.4 under the mode without reference, and all unigenes were mapped to the genome using BLAT v36. Third, a BUSCO completeness score was calculated using BUSCO v3.02 in genome mode with the Embryophyta odb10 dataset. Finally, LAI scores were calculated in a sliding window of 3 Mb with a 300 kb step size across the entire genome using LTR_retriever v2.8.

2.3. Genome annotation

The genome of H. ammodendron was annotated using procedures similar to those used in two previous studies., Homology searches of repeats were performed against the assembly by RepeatMasker v4.0.7 based on a comprehensive database, including a de novo repeat library generated by RepeatModeler v1.0.11 and the ‘Viridiplantae’ repeat library from the Repbase database v22.11. We also performed de novo detection of intact long terminal repeat retrotransposons (LTR-RTs) by LTR_Finder v1.06 and LTRharvest v1.5.10, with the main parameters of min LTR length = 100 bp, max LTR length = 7,000bp, and min LTR similarity = 90%. Finally, LTR_retriever was used to integrate the LTR predictions, remove false positives, and estimate the insertion times of intact LTR-RTs. We then predicted protein-coding genes based on the repeat-masked genome. First, a homology-based prediction was conducted by aligning the protein sequences of Suaeda aralocaspica v1.0,Beta vulgaris EL10_1.0,Chenopodium quinoa v1.0,Spinacia oleracea Spov3,Amaranthus hypochondriacus v2.1, and A.thaliana TAIR10 to the H. ammodendron genome with TBLASTN v2.2.31+; GeneWise v2.4.1 was used to predict gene structures based on the alignments. Second, Program to Assemble Spliced Alignment (PASA) v2.3.3 was used to detect likely protein-coding regions based on a comprehensive transcriptome database comprising, de novo and genome-guided RNA-seq assembly. Third, a de novo prediction was performed using Augustus v3.2.3 with parameters trained with high-quality gene models selected from the PASA predictions with an exon number ≥3 and CDS length ≥1,500bp. Finally, all gene predictions were integrated into a final gene set using EvidenceModeler (EVM) v1.1.1. The final gene set was searched against PlantTFDB v5.0 to identify transcription factor (TF) genes. Functional annotation was performed by aligning the protein-coding genes against the SwissProt and TrEMBL databases using DIAMOND v0.9.22. Protein motifs and domains were annotated using InterProScan v5.31 by searching multiple publicly available databases. Gene ontology (GO) IDs for each gene were retrieved with Blast2GO v2.5. Kyoto Encyclopaedia of Genes and Genomes (KEGG) pathway mapping was achieved using the KEGG Automatic Annotation Server (https://www.genome.jp/kaas-bin/kaas_main).

2.4. Comparative genomic analysis

The protein sequences of H. ammodendron and seven other sequenced plant species, namely six Amaranthaceae species (S.aralocaspica, B.vulgaris, Beta patula,C.quinoa, S.oleracea, and A.hypochondriacus) and one outgroup species (A.thaliana), were used for the phylogenetic analysis. Single-copy orthogroups among the eight species were distinguished using OrthoFinder v2.3.11 with the DIAMOND aligner and the Markov cluster algorithm. For each single-copy orthogroup, protein sequences were aligned by MAFFT-LINSI v7.313. Conserved sites were extracted from the concatenated alignments of all single-copy genes using Gblocks v0.91b. A maximum likelihood phylogeny was constructed based on the conserved sites using RAxML v8.0.17 under the PROTGAMMAILGX model with 100 bootstrap replicates. Divergence times among the eight species were estimated using MCMCTREE implemented in PAML v4.9e with the options ‘independent rates’ and ‘F84’ models. The calibration points for the A.thaliana–A.hypochondriacus divergence (111–131 million years ago, Mya) and S.oleracea–C.quinoa divergence (18–57 Mya) were obtained from the TimeTree database (http://www.timetree.org). Gene family expansions and contractions were further determined using CAFE v3.1. To investigate gene families that are possibly involved in desert adaptation in H. ammodendron and xerophytes, gene family clustering was performed using OrthoFinder based on the protein sequences of H. ammodendron, S.oleracea, A.thaliana, C.songorica, and O.thomaeum. Gene families specific to xerophytes (H. ammodendron, C. songorica, and O. thomaeum) and specific to H. ammodendron were extracted using an in-house python script.

2.5. Whole-genome and tandem gene duplication analysis

The protein sequences of H. ammodendron and S.aralocaspica were self-aligned and aligned with each other using DIAMOND. Syntenic blocks within and between genomes were identified using MCScanX v1.1, with default parameters. For each syntenic gene pair, the non-synonymous substitution rate (Ka) and synonymous substitution rate (Ks) were calculated using the ‘add_ka_and_ks_to_collinearity.pl’ script from MCScanX. The prevalence of whole genome duplication (WGD) events was examined by checking the Ks distributions of paralogs and orthologs within and between the two species. A Ka/Ks ratio >1 indicates a positive selection. Tandem gene duplications were identified as homologous gene pairs that (i) are located within 10 consecutive genes on the same pseudochromosome; (ii) are located within 100 kb of each other; and (iii) have an identity ≥50% and a coverage ≥70% obtained by BLASTP.

2.6. Gene expression analysis

Previously published RNA-seq data of 5% PEG-6000 (Osmotin mimics drought stress) treated H. ammodendron seedlings and controls were downloaded from the NCBI database under the BioProject PRJNA312463. Both the drought-treated seedlings and the controls had three biological replicates. RNA-seq reads were filtered by Trimmomatic and mapped to the assembly using HISAT2 v2.1.0. Fragments per kilobase per million (FPKM) was calculated to evaluate the gene expression level using Cufflinks v2.2.1. Genes that exhibited >2-fold change in FPKM values between drought-treated seedlings and controls were identified as up- or down-regulated genes.

3. Results and discussion

3.1. High-quality genome assembly of H. ammodendron

Before assembly, a total of 49.4 Gb Illumina data were generated and subjected to genome size estimation (Supplementary Table S1). In total, 44,091,563,459 k-mers (k = 17) were identified, and a major peak was observed around a k-depth of 61 (Supplementary Fig. S1). After removing low-frequency k-mers, the genome size of H. ammodendron was estimated to be 708.7 Mb, with a high degree of heterozygosity (1.04%). To generate a high-accuracy and high-continuity genome assembly, we obtained a total of 22.6 Gb (31.9× genome coverage) HiFi long reads, with a maximum and average read length of 64.7 and 15.8 kb, respectively (Supplementary Table S1 and Fig. S2). De novo assembly of these HiFi reads resulted in a preliminary assembly comprising 132 contigs for a total length of 685.3 Mb (96.7% of the estimated genome size). Contigs longer than 1 Mb totalled 675.0 Mb (98.5% of the total assembly length), indicating the high continuity of the assembly. Coverage depth distribution of Illumina reads revealed that there were very few duplicate haplotypes in the assembly (Supplementary Fig. S3). To reconstruct a chromosome-level assembly, we generated 84.7 Gb (119.6× genome coverage) Hi-C clean reads (Supplementary Table S1) and anchored the contigs into pseudochromosomes using the Hi-C scaffolding approach. Ultimately, 682.3 Mb sequences (99.6% of the entire assembly) were assigned to nine pseudochromosomes (Fig. 1B and C), corresponding to the haploid chromosome number of H. ammodendron. The lengths of pseudochromosomes ranged from 65.5 to 88.7 Mb (Supplementary Table S2), and all pseudochromosomes showed a well-organized diagonal pattern for intra-chromosomal interactions (Supplementary Fig. S4). The final genome assembly contained very few gaps, with an average gap number of 7.2 per pseudochromosome. The contig and scaffold N50 values of this assembly reached 23.6 and 76.2 Mb, respectively (Table 1 and Supplementary Table S3).

Table 1

Global statistics of H. ammodendron genome assembly and annotation

Assembly
Estimated genome size (by k-mer analysis) (Mb)	708.69
Length of genome assembly (Mb)	685.35
Number of pseudochromosomes	9
Total length of pseudochromosomes (Mb)	682.32
Scaffold N50 (Mb)	76.23
Contig N50 (Mb)	23.61
Longest contig (Mb)	45.81
Gap size (bp)	6,500
Annotation
GC content (%)	35.38
Percentage of repeat sequences (%)	46.48
Number of protein-coding genes	41,647
Average gene length (bp)	3,997.3
Average coding sequence length (bp)	1,075.3
Average exon length (bp)	277.4
Average intron length (bp)	783.6
Functionally annotated genes	39,032

Global statistics of H. ammodendron genome assembly and annotation To validate the quality of the genome assembly, we first mapped the Illumina reads back to the assembly and obtained an overall mapping rate of 98.6% and a 10-fold minimum genome coverage of 90.4% (Supplementary Table S4). We also assembled the RNA-seq reads into 306,064 unigenes, 82.7% of which had a length coverage of >90% within a single scaffold (Supplementary Table S5). Furthermore, we observed a BUSCO completeness score of 97.8% and an average LAI score of 19.8 for the entire genome (Supplementary Table S6 and Fig. S5). This evidence together indicated that the H. ammodendron assembly attained a high level of quality and completeness.

3.2. Annotation of the H. ammodendron genome

We predicted 318.6 Mb of repetitive sequences representing 46.5% of the H. ammodendron genome (Supplementary Table S7). The putative repetitive elements were more likely to be enriched in the central regions of chromosomes (Supplementary Fig. S6). The percentage of repetitive sequences in H. ammodendron was comparable to that of three other Amaranthaceae species, B.vulgaris (42.3%), A.hypochondriacus (48.1%), and S.aralocaspica (38.4%), but was much lower than that of S.oleracea (74.4%) and C.quinoa (64.0%) (Supplementary Table S8). Transposable elements (TEs) were the most abundant repeat class, spanning 182.7 Mb or 26.7% of the genome, with LTR-RTs being dominant (132.6 Mb; 19.4% of the genome) (Supplementary Table S7). Nearly 70% of TEs had a divergence rate of <20%, indicating a recent burst of TE elements in the H. ammodendron genome (Supplementary Fig. S7). In addition, we identified 125.1 Mb (18.3% of the genome) of unclassified repetitive sequences, which might be H. ammodendron-specific. After masking all repetitive sequences, we identified a total of 41,647 protein-coding genes, 97.2% (40,467) of which were located on the nine pseudochromosomes, with an overall gene density of 59.3 gene per Mb (Supplementary Table S2). A BUSCO completeness score of 94.6% was obtained at the gene model level, indicating the high completeness of gene annotation (Supplementary Table S4). Compared to other Amaranthaceae species, H. ammodendron had the shortest average transcript length, which was mainly caused by intron length reduction (Supplementary Table S9). Of the protein-coding genes, 39,032 (93.7%) were functionally annotated in at least one publicly available database, and 21,946 (52.7%) genes were assigned to GO terms (Supplementary Table S10). Among the protein-coding genes, 1,340 TF genes were predicted and classified into 56 distinct gene families (Supplementary Table S11).

3.3. Evolutionary history of H. ammodendron

We identified 467 strictly single-copy orthogroups among H. ammodendron and seven other plant species (Supplementary Table S12). A well-supported species tree was recovered based on the multiple sequence alignments of these single-copy genes (Fig. 2A). The results showed that H. ammodendron clustered together with another desert plant S. aralocaspica with 100% bootstrap support (Supplementary Fig. S8). This topology remained robust when we excluded the allotetraploid species C. quinoa from the phylogenetic analysis (Supplementary Fig. S8). The divergence time between H. ammodendron and S. aralocaspica was estimated to be around 42 Mya (Fig. 2A). However, the genome size of H. ammodendron (709 Mb) was significantly larger than that of S. aralocaspica (467 Mb). To gain further insights into the evolutionary history underlying genome size differences between these two species, we examined the prevalence of whole-genome/tandem duplication events and LTR amplifications, which were considered to be the major factors that contributing to genome size expansion. We identified a total of 31,220 collinear blocks within and between the genomes of H. ammodendron and S. aralocaspica. The Ks distribution of orthologs between these two species showed a major peak around 0.47, which is younger than the peak identified from the analysis of paralogs within H. ammodendron (∼0.69), indicating that no independent WGD event had occurred in the H. ammodendron genome after its split from S. aralocaspica (Fig. 2B). However, we found that the H. ammodendron genome displayed a high level of tandem duplication (2,343 arrays containing 7,245 genes), contrasting strongly with the S. aralocaspica genome (1,481 arrays containing 3,582 genes) (Supplementary Table S12). The Ks distribution of these duplicated genes suggests a recent burst of tandem duplication events in the H. ammodendron genome (Supplementary Fig. S9). Furthermore, we identified 6,357 intact LTR-RTs with a total length of 48.7 Mb (7.1% of the genome) in the H. ammodendron genome. In contrast, only 1,682 intact LTR-RTs with a total length of 13.0 Mb (2.9% of the genome) were identified in the S. aralocaspica genome. Approximately 52.9% of the intact LTR-RTs inserted into H. ammodendron were younger than 2 million years, with median insertion times of 1.02 and 0.56 Mya for Copia and Gypsy elements, respectively (Fig. 2C). Interestingly, the recent burst of LTR-RTs and gene duplications was coincided with the aridification process of Asian inland in late Cenozoic, which was possibly driven by the rapid uplift of the Tibetan Plateau or global cooling., These results indicate that both recent substantial amplification of LTR-RTs and tandem gene duplication in the H. ammodendron genome may have contributed to its genome size expansion and arid adaptation. In addition, we identified 344 significantly (P < 0.01) expanded gene families (containing 4,762 genes) in the H. ammodendron genome, which were functionally related to ‘defense response’, ‘response to water deprivation’, ‘meristem maintenance’, and ‘root development’ (Supplementary Fig. S10).

Figure 2

Phylogenetic and evolutionary analyses of the H. ammodendron genome. (A) A species tree based on 467 single-copy orthogroups from eight plant species with clusters of orthologous and paralogous gene families. Gene family expansion and contraction are denoted in the numbers next to the plus and minus signs, respectively. Numbers on nodes represent the inferred divergence times with 95% confidence intervals. Circles represent the calibration times obtained from the TimeTree database. (B) Ks distributions for the paralogs and orthologs identified from the whole genome of H. ammodendron and S. aralocaspica. (C) Insertion age distribution of H. ammodendron intact LTRs in comparison to S. aralocaspica.

3.4. Molecular mechanisms underlying desert adaptation of H. ammodendron

A recent study speculated that a low genomic GC content was possibly related to plant adaptation to harsh nutrient- and water-limited conditions. We observed a total GC content of 35.4% for the H. ammodendron genome, which was lower than that of non-exrophytic species in Amaranthaceae, such as S. oleracea (37.9%), B. vulgaris (35.9%), and C. quinoa (36.9%). The overall GC content of protein-coding genes in the H. ammodendron genome was 36.7%, with 1,669 genes exhibiting a GC content of <32.0% (Fig. 3A). These low-GC genes were found to be highly enriched in ‘seed growth’, ‘response to wounding’, ‘cellular response to osmotic stress’, ‘floral meristem determinacy’, ‘regulation of meristem development’, and ‘hyperosmotic salinity response’ (Supplementary Fig. S11). Reduction in the GC content of these genes may have contributed to the desert adaptation of H. ammodendron.

Figure 3

Molecular mechanisms underlying desert adaptation of H. ammodendron. (A) Histogram distribution of GC content of the protein-coding genes in the H. ammodendron genome. (B) Venn diagram of orthologous gene families identified among H. ammodendron, S. oleracea, A. thaliana, C. songorica, and O. thomaeum. (C) Venn diagram of xerophyte and H. ammodendron specific genes and up- and down-regulated genes in 5% PEG-6000 treated H. ammodendron seedlings compared with the controls. (D) Phylogenetic tree of genes involved in metabolism and biosynthesis of BRs in the H. ammodendron genome. Tandemly duplicated genes are highlighted, and positively selected genes are marked with pentagrams. Furthermore, we performed gene family clustering between H. ammodendron, S. oleracea, A. thaliana, and two other xerophytic plants, C.songorica and O.thomaeum. A total of 36,670 genes in the H. ammodendron genome were assigned to 14,911 gene families. Among these gene families, 8,338 are common to all five species, whereas 109 are confined to xerophytic plants (H. ammodendron, C. songorica, and O. thomaeum) (Fig. 3B). The xerophyte-specific gene families were highly enriched in ‘response to heat’, ‘negative regulation of response to salt stress’, ‘positive regulation of abscisic acid biosynthetic process’, ‘stomatal closure’, ‘lateral root morphogenesis’, and ‘negative regulation of leaf development’ (Supplementary Fig. S12). We also identified 2,003 gene families (containing 13,961 genes) that are specific to H. ammodendron (Fig. 3B). These H. ammodendron-specific genes were markedly enriched in ‘response to water deprivation’, ‘embryo development ending in seed dormancy’, ‘response to wounding’, ‘circadian rhythm’, ‘response to brassinosteroid’, and ‘regulation of stomatal closure’ (Supplementary Fig. S13). Gene expression analysis identified 875 up-regulated and 3,616 down-regulated genes in 5% PEG-6000 treated H. ammodendron seedlings compared with the controls. Among these differentially expressed genes, 22 are specific to the three xerophytes, 1,796 are specific to H. ammodendron (Fig. 3C), 257 exhibit a GC content of <32.0%, and 839 are located in the significantly expanded gene families. These genes may play important roles in the direct response of H. ammodendron to water-deficit stress. Desert plants usually develop smaller and narrower leaves than plants living in temperate habitat to limit water loss by transpiration, and obtain nutrients from soil through their well-developed roots for optimal growth. Brassinosteroids (BRs) are plant steroid hormones known mainly for promoting organ growth through their combined effect on cell expansion and division.,Arabidopsis mutant plants with known defects in the biosynthesis or perception of BRs develop small leaves. We identified 71 genes that were involved in metabolism (GO: 0016131) or biosynthesis (GO: 0016132) of BRs in the H. ammodendron genome (Fig. 3D and Supplementary Table S13). Among these genes, 31 were tandemly duplicated, and six were under positive selection (Fig. 3D). We also identified eight genes that were involved in the biosynthesis of strigolactones (SLs; GO: 1901601; Supplementary Table S13), which are a novel class of plant hormones controlling shoot branching in seed plants., In rice, treatment with SL biosynthesis inhibitor stopped elongation of seminal roots in wild-type plants under stress conditions. Based on Interproscan annotations, we identified 1 SHORT-ROOT (SHR), 1 SHORT-ROOT-like (SHR-like), 1 SCARECROW (SCR), and 19 SCARECROW-like (SCR-like) genes in the H. ammodendron genome (Supplementary Table S13), which may play important roles in the root development of H. ammodendron. Both the SHR and SCR Arabidopsis mutant plants showed dramatically shorter roots than wild-type.,SHR-like and SCR-like genes are also found to be closely linked to root formation in several plants., Among the genes identified above, nine BR metabolism/biosynthesis genes, one SHR gene, and two SCR-like genes were differentially expressed after drought treatment (Supplementary Table S13). These genes may contribute to the degraded scaly leaves and well-developed root system of H. ammodendron, which further promotes its adaptation to drought stress. These candidate genes provide essential information for future functional studies of the organ development of H. ammodendron.

4. Conclusion

In this study, we presented a chromosome-level genome assembly of H. ammodendron with high-level quality and completeness. We predicted 41,647 high-confidence protein-coding genes in the genome assembly. No clear evidence of a recent WGD event was found in the H. ammodendron genome. However, both the recent substantial amplification of LTR-RTs and tandem gene duplication may have contributed to its genome size expansion and arid adaptation. An ample amount of low-GC genes was closely related to functions that may contribute to the desert adaptation of H. ammodendron. Gene family clustering together with gene expression analysis identified differentially expressed genes that may play important roles in the direct response of H. ammodendron to water-deficit stress. We also identified several genes possibly related to the degraded scaly leaves and well-developed root system of H. ammodendron. The present high-quality reference genome sequence will accelerate desert adaptation studies of xerophytic plants as well as further genetic breeding studies of H. ammodendron.

Supplementary data

Supplementary data are available at DNARES online.

Funding

This work was supported by the National Natural Science Foundation of China (32070410, 32000158, and 31900322), the Sichuan Science and Technology Program (2020YFH0005), the Youth Innovation Promotion Association of CAS (2021370), the doctoral scientific research foundation of the North Minzu University (2021KYQD10), the Project of Sustainable Development Research Center of Resources and Environment of Western Sichuan, Sichuan Normal University (2020CXZX03), and start-up funds provided by Chengdu University (2081921039).

Conflict of interest

None declared.

Data availability

All sequence data have been deposited at the NCBI under the BioProject PRJNA786049. RNA-seq data are available under the Sequence Read Archive (SRA) accession numbers SRR17137407 and SRR17137408. Illumina short-read data, HiFi long-read data, and Hi-C data are available under the SRA accession numbers SRR17127859, SRR17129371, and SRR17138685. The genome assembly, annotations, and predicted peptides are available on FigShare at the link: https://doi.org/10.6084/m9.figshare.17128424.v1. Click here for additional data file.

73 in total

1. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis.

Authors: J Castresana
Journal: Mol Biol Evol Date: 2000-04 Impact factor: 16.240

2. PAML 4: phylogenetic analysis by maximum likelihood.

Authors: Ziheng Yang
Journal: Mol Biol Evol Date: 2007-05-04 Impact factor: 16.240

Review 3. The evolutionary significance of ancient genome duplications.

Authors: Yves Van de Peer; Steven Maere; Axel Meyer
Journal: Nat Rev Genet Date: 2009-08-04 Impact factor: 53.242

Review 4. Plants grow on brassinosteroids.

Authors: Gustavo E Gudesblat; Eugenia Russinova
Journal: Curr Opin Plant Biol Date: 2011-07-27 Impact factor: 7.834

5. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana.

Authors:
Journal: Nature Date: 2000-12-14 Impact factor: 49.962

6. Single-molecule sequencing of the desiccation-tolerant grass Oropetium thomaeum.

Authors: Robert VanBuren; Doug Bryant; Patrick P Edger; Haibao Tang; Diane Burgess; Dinakar Challabathula; Kristi Spittle; Richard Hall; Jenny Gu; Eric Lyons; Michael Freeling; Dorothea Bartels; Boudewijn Ten Hallers; Alex Hastie; Todd P Michael; Todd C Mockler
Journal: Nature Date: 2015-11-11 Impact factor: 49.962

7. Assessing genome assembly quality using the LTR Assembly Index (LAI).

Authors: Shujun Ou; Jinfeng Chen; Ning Jiang
Journal: Nucleic Acids Res Date: 2018-11-30 Impact factor: 16.971

8. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm.

Authors: Haoyu Cheng; Gregory T Concepcion; Xiaowen Feng; Haowen Zhang; Heng Li
Journal: Nat Methods Date: 2021-02-01 Impact factor: 28.547

9. PlantTFDB 4.0: toward a central hub for transcription factors and regulatory interactions in plants.

Authors: Jinpu Jin; Feng Tian; De-Chang Yang; Yu-Qi Meng; Lei Kong; Jingchu Luo; Ge Gao
Journal: Nucleic Acids Res Date: 2016-10-24 Impact factor: 16.971

10. The genome of Cleistogenes songorica provides a blueprint for functional dissection of dimorphic flower differentiation and drought adaptability.

Authors: Jiyu Zhang; Fan Wu; Qi Yan; Ulrik P John; Mingshu Cao; Pan Xu; Zhengshe Zhang; Tiantian Ma; Xifang Zong; Jie Li; Ruijuan Liu; Yufei Zhang; Yufeng Zhao; Gisele Kanzana; Yanyan Lv; Zhibiao Nan; German Spangenberg; Yanrong Wang
Journal: Plant Biotechnol J Date: 2020-10-28 Impact factor: 9.803