Chromosomal-level genome assembly of the orchid tree Bauhinia variegata (Leguminosae; Cercidoideae) supports the allotetraploid origin hypothesis of Bauhinia.

Yan Zhong¹, Yong Chen², Danjing Zheng², Jingyi Pang¹, Ying Liu¹, Shukai Luo², Shiyuan Meng², Lei Qian², Dan Wei³, Seping Dai², Renchao Zhou¹.

Abstract

Cercidoideae, one of the six subfamilies of Leguminosae, contains one genus, Cercis, with a chromosome number of 2n = 14, whereas all other genera have 2n = 28. An allotetraploid origin hypothesis for the common ancestor of non-Cercis genera in this subfamily has been proposed; however, no chromosome-level genomes from Cercidoideae have been available to test it. Here, we assembled a chromosome-level genome of Bauhinia variegata to test this hypothesis. The assembled genome is 326.4 Mb with a scaffold N50 of 22.1 Mb and contains 37,996 protein-coding genes. The Ks distribution between gene pairs in the syntenic regions indicates two whole-genome duplications (WGDs): one is B. variegata-specific, and the other is shared among core eudicots. Although Ks between gene pairs generated by the recent WGD in Bauhinia is greater than that between Bauhinia and Cercis, the WGD was not detected in Cercis, which can be explained by an accelerated evolutionary rate in Bauhinia after divergence from Cercis. Ks distribution and phylogenetic analysis for gene pairs generated by the recent WGD in Bauhinia and their corresponding orthologs in Cercis support the allopolyploidy origin hypothesis of Bauhinia. The genome of B. variegata also provides a genomic resource for dissecting the genetic basis of its ornamental traits.

Keywords: Bauhinia variegata; allopolyploidization; genome assembly; rapid evolution; whole-genome duplication

Year: 2022 PMID: 35438173 PMCID: PMC9052405 DOI: 10.1093/dnares/dsac012

Source DB: PubMed Journal: DNA Res ISSN: 1340-2838 Impact factor: 4.477

1. Introduction

Leguminosae is an economically and agronomically important family, with six subfamilies (Papilionoideae, Caesalpinioideae, Detarioideae, Cercidoideae, Dialioideae and Duparquetioideae), ca. 770 genera and 20,000 species. Some legumes are major sources of plant protein and micronutrients, and have been used as high-quality food and fodder. Many legumes have high horticultural value and are cultivated throughout the world. Given their economic and agronomic significance, genome sequencing has been conducted for quite a few legumes, mainly from Papilionoideae, including Glycine max, Cajanus cajan, Arachis duranensis, Pisum sativum, Lotus japonicus and Medicago truncatula. In contrast, draft genome sequences are available for only two species (Mimosa pudica and Chamaecrista fasciculata) of Caesalpinioideae and one species (Cercis canadensis) of Cercidoideae.
Whole-genome duplication (WGD) plays important roles in plant genome evolution and diversification. A previous study showed that a WGD occurred in the common ancestor of all papilionoids (i.e. Papilionoideae) and that several independent WGDs occurred near the base of Caesalpinioideae, Detarioideae and Cercidoideae. Cercidoideae is the earliest-diverging subfamily among the six subfamilies of Leguminosae. In Cercidoideae, Cercis is the only genus that has a chromosome number of n = 7, identical to the ancestral chromosome number inferred for legumes, and all other genera have a chromosome number of n = 14 (CCDB; http://ccdb.tau.ac.il/). Genomic analysis of Cercis and other species of this subfamily suggested the lack of a recent WGD in Cercis, and an allotetraploid origin for the common ancestor of the rest of the subfamily was proposed. However, no chromosome-level genome assembly from Cercidoideae has been available to test this hypothesis.
Bauhinia, the largest genus of the subfamily Cercidoideae, consists of ∼380 species distributed in the pantropical regions, with many species exhibiting high ornamental value and being widely cultivated in tropical regions. Bauhinia variegata, also called the orchid tree, possesses diverse petal colours varying from white to deep purple and is especially attractive in horticulture. Here, we assembled the chromosomal-level genome of B. variegata using PacBio and Illumina sequencing, and Hi-C scaffolding technologies. Genome evaluation and annotation, phylogenomic analysis, gene family evolution and intra- and inter-genome synteny analysis were performed. We aimed to test the hypothesis of the allotetraploid origin of Bauhinia with the high-quality genome.

2. Materials and methods

2.1. Sampling and sequencing

Samples of an individual of B. variegata used for the whole-genome and transcriptome sequencing were obtained from Sun Yat-sen University campus, Guangzhou, China. Genomic DNA was extracted from the leaves. RNAs were isolated from four fresh tissues, i.e. flower, fruit, leaf and root. A DNA library with an insert size of 30 kb was constructed and then sequenced on the PacBio Sequel II System and 175.4 Gb reads were generated. To perform the genome survey, a short genome fragment library with an insert size of 350 bp was constructed and then sequenced on an Illumina NovaSeq platform, and 48.8 Gb paired-end reads of 150 bp were generated. Transcriptome sequencing was also conducted on the same Illumina NovaSeq platform and about 6 Gb sequence data were generated for each tissue. For High-throughput Chromatin Conformation Capture (Hi-C), fresh leaves were cut into small pieces and infiltrated in 2% formaldehyde. Glycine was added to stop crosslinking. The tissue was ground to powder and nuclei isolation buffer was then added to obtain a nuclei suspension. Nuclei were digested with HindIII restriction endonuclease. DNA fragments of 150–300 bp were purified, and PCR amplification was performed after adapters were ligated to the Hi-C products. The PCR products were purified, and the Hi-C libraries were quantified by quantitative PCR for Illumina HiSeq X Ten sequencing. Finally, a total of 31.3 Gb paired-end reads of 150 bp were generated.

2.2. Genome size estimation

Genome survey analysis was performed using clean Illumina reads filtered by fastp 0.20.1 and FastUniq with default parameters. K-mers were counted and a k-mer count histogram was produced with Jellyfish v.2.3.0 from the 48.8 Gb of Illumina reads, using a k-mer length of 17. Genome size was estimated from the k-mer frequency distribution by GenomeScope 1.0 (http://qb.cshl.edu/genomescope/).
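The arithmetic behind a k-mer-based genome size estimate can be sketched as follows. This is a simplified illustration, not the GenomeScope model (which fits a statistical mixture to the histogram): it assumes the tallest non-error peak is the homozygous peak, and divides the total number of counted k-mers by the depth at that peak. The function name, the `min_depth` cutoff and the toy histogram are hypothetical.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Rough genome size estimate from a k-mer count histogram.

    histogram: dict mapping k-mer depth -> number of distinct k-mers seen
    at that depth (the two columns of a Jellyfish-style histogram).
    min_depth: hypothetical cutoff below which k-mers are treated as errors.
    """
    # Discard low-depth k-mers, which mostly come from sequencing errors.
    usable = {d: n for d, n in histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers above the error cutoff")
    # Assumption: the tallest remaining peak is the homozygous peak.
    peak_depth = max(usable, key=usable.get)
    # Total k-mer occurrences divided by per-position depth ~ genome size.
    total_kmers = sum(d * n for d, n in usable.items())
    return total_kmers / peak_depth, peak_depth


# Toy histogram: an error tail at depth 1-2 and a main peak at depth 50.
toy_histogram = {1: 5000, 2: 800, 48: 100, 49: 300, 50: 600, 51: 300, 52: 100}
size, peak = estimate_genome_size(toy_histogram)
print(f"peak depth = {peak}, estimated size = {size:.0f} bp")
```

For a highly heterozygous genome the tallest peak may be the half-depth heterozygous peak, in which case this simple estimate would be roughly doubled; model-based tools handle that case explicitly.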

2.3. Genome assembly

The PacBio reads were corrected, trimmed and assembled into contigs using Canu v2.0, with the parameters correctedErrorRate = 0.035 and minReadLength = 2,000. The primary assembly was polished by referring to the PacBio reads and Illumina reads with NextPolish 1.2.0 with default parameters. Finally, haplotigs and contig overlaps in the polished assembly were purged based on read depth using Purge_Dups (https://github.com/dfguan/purge_dups). Hi-C unique reads were used to scaffold the PacBio assembly contigs using 3D-DNA pipeline. Hi-C datasets were first processed by Juicer. Abnormal contact patterns in initially assembled contigs were corrected, partitioned, orientated and ordered, and finally anchored onto 14 pseudo-chromosomes using 3D-DNA. We further manually adjusted the Hi-C scaffolding based on the chromatin contact matrix in Juicebox.

2.4. Genome quality evaluation

The quality of the B. variegata genome was further evaluated based on the eudicots_odb10 database (2326 BUSCOs) and the fabales_odb10 database (5366 BUSCOs) using the Benchmarking Universal Single-Copy Orthologs (BUSCO) programme with default parameters. The same evaluation was also performed for the genomes of C. canadensis, C. fasciculata, G. max and M. truncatula.

2.5. Genome annotation

Known repeat sequences were identified by RepeatMasker v 4.1.1 (http://www.repeatmasker.org) with the Repbase library. A de novo repeat library was constructed using RepeatModeler v 2.0.1. RNA-seq data from four tissues were mapped to the genome by HISAT2, merged by SAMtools, and then transcripts were extracted by StringTie v 2.1.3 and coding regions in the transcripts were predicted by TransDecoder (https://github.com/TransDecoder/TransDecoder). The training result of RepeatModeler and the coding sequence from TransDecoder v 5.5.0 were supplied to EDTA to identify repetitive sequences. We predicted protein-coding genes using a combination of homologous-sequence search, ab initio gene prediction, and transcriptome-data comparison in an automatic genome annotation tool GETA v2.4.5 (https://github.com/chenlianfu/geta). Illumina RNA-seq reads from different tissues were used to assemble transcripts and predict genes using HISAT2 and TransDecoder. Protein sequences from Swiss-Prot plant database (https://www.uniprot.org/) and four legumes (Arachis hypogaea, G. max, M. truncatula and Vigna unguiculata) (Table 1) were combined for homology-based prediction with GeneWise (https://www.ebi.ac.uk/~birney/wise2/). Ab initio prediction was performed in Augustus v3.3.3, trained with intron and exon information generated above. These prediction results were integrated and then searched against the Pfam database for screening to get the final gene prediction result. Functional annotation of genes was also performed by using InterProScan, eggnog-mapper (http://eggnog-mapper.embl.de/), PANNZER2 and Mercator4 v3.0. The functional annotation results were then integrated by an in-house script.

Table 1

Sources of genomic and transcriptomic data of other species included in the study

Species	Sequence type	Source
Glycine max	Genomic	Phytozome
Medicago truncatula	Genomic	Phytozome
Lotus japonicus	Genomic	Phytozome
Vigna unguiculata	Genomic	Phytozome
Cercis canadensis	Genomic	GigaDB
Chamaecrista fasciculata	Genomic	GigaDB
Acacia pycnantha	Transcriptomic	http://www.onekp.com
Copaifera officinalis	Transcriptomic	http://www.onekp.com
Gleditsia triacanthos	Transcriptomic	http://www.onekp.com
Quillaja saponaria	Transcriptomic	http://www.onekp.com
Xanthocercis zambesiaca	Transcriptomic	http://www.onekp.com

The density of genes, repeats, genes located in syntenic regions (see below) and GC content in 14 pseudo-chromosomes were calculated in a 100-kb sliding window with BEDTools v2.30.0 and were plotted with Circos v 0.69-8.
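The windowed density calculation can be illustrated with a minimal sketch. It assumes non-overlapping 100-kb windows, 0-based feature start coordinates, and that each feature is assigned to the window containing its start; the study used BEDTools for this step, and the function names and toy coordinates below are hypothetical.

```python
from collections import Counter


def window_counts(chrom_length, feature_starts, window=100_000):
    """Count features per fixed-size window along one chromosome.

    feature_starts: 0-based start coordinates of genes or repeats.
    A feature is counted in the window that contains its start position.
    """
    n_windows = (chrom_length + window - 1) // window  # ceiling division
    counts = Counter(start // window for start in feature_starts)
    return [counts.get(i, 0) for i in range(n_windows)]


def gc_content(seq):
    """Fraction of G and C among unambiguous bases (N is ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


# Toy chromosome of 250 kb with four gene starts.
gene_density = window_counts(250_000, [10, 99_999, 100_000, 240_000])
toy_gc = gc_content("GGCCAATTNN")
print(gene_density, toy_gc)
```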

2.6. Phylogenomic analysis

The longest protein or transcript data from nine legume species (G. max, M. truncatula, L. japonicus and Xanthocercis zambesiaca from Papilionoideae; Acacia pycnantha, C. fasciculata and Gleditsia triacanthos from Caesalpinioideae; C. canadensis from Cercidoideae and Copaifera officinalis from Detarioideae) and one outgroup (Quillaja saponaria) were downloaded (Table 1). All-against-all comparison was performed in OrthoFinder2 with default parameters based on protein sequences of the 11 species (these 10 species plus B. variegata). For each ortholog, the protein sequences were aligned using PRANK, and then converted into nucleotide sequence alignments using the pal2nal.pl script. All the sequence alignments were then concatenated into a supermatrix and used for phylogenomic analyses. ModelTest-NG with the Bayesian information criterion was employed for DNA substitution model selection, and RAxML-NG v 0.9.0 was used to construct a phylogenetic tree with 1,000 bootstrap replicates. The divergence time in the ML tree was estimated by the mcmctree programme in the PAML package with two soft calibration points from TimeTree (http://www.timetree.org): the split between Q. saponaria and G. max, and the split between A. pycnantha and G. max.

2.7. Gene family expansion and contraction analysis

The orthogroup information identified above and the phylogenetic tree constructed above were used to infer gene family expansion and contraction in CAFE5. Gene families with >100 gene copies were filtered by the script clade_and_size_filter.py. Root frequency distribution was designated as the Poisson distribution, and the Gamma model was set with five gamma rate categories. Gene families with an accelerated rate of expansion and contraction were determined with a threshold conditional P-value (P < 0.05). The numbers of expanded and contracted gene families were labelled in the phylogenetic tree. KEGG pathway enrichment analysis was conducted using KOBAS 3.0 with the parameters of top cluster = 5 and edge weight = 0.35, and statistical significance was tested by Fisher’s exact test in combination with the False Discovery Rate correction.

2.8. Identification of WGD

All-versus-all alignment of the protein sequences of B. variegata was constructed using the Blastp algorithm. To detect the signature of WGD, the programme MCScanX with default parameters was used to define syntenic blocks. For each gene pair in the syntenic blocks, Ks value was calculated using KaKs_Calculator 2.0 with the YN model and the distribution of Ks values of all gene pairs was plotted using R package ggplot2. Intragenomic synteny was plotted with Circos v 0.69-8. Meanwhile, inter-genomic syntenic blocks between B. variegata and C. canadensis were searched, and the Ks values between syntenic gene pairs were calculated as stated above. To show the genomic synteny between the two species, syntenic regions between the 14 chromosomes of B. variegata and 11 longest contigs of C. canadensis were identified and plotted with MCScan pipeline.

2.9. Testing the allopolyploidy origin hypothesis

To test the allopolyploidy origin hypothesis of Bauhinia, gene pairs with Ks values within 25% above or below the Ks peak value for the B. variegata-specific WGD were extracted, and the two members of each extracted gene pair were randomly assigned to two groups (B1 and B2). Orthologs were identified with OrthoFinder2 separately for each of the two groups together with two closely related species (C. canadensis and C. fasciculata). Single-copy orthologs shared by the two groups were used for further analyses. Amino acid sequences of each single-copy ortholog (homeologous B1 and B2 for B. variegata and their corresponding ortholog in C. canadensis) were aligned with MAFFT v 6.8, and then converted into nucleotide sequences using ParaAT. Ks values between B1 and B2, B1 and C. canadensis, and B2 and C. canadensis for each gene were calculated using the same method mentioned above. The Ks distribution was plotted with the R package ggplot2. For phylogenetic analysis among B1, B2 and C. canadensis, one maximum likelihood tree was constructed with RAxML-NG based on the coding region sequences of each single-copy ortholog, with C. fasciculata as an outgroup. The number of trees with each topology was counted.
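The pair-selection step can be sketched as below. This is one reading of the procedure, not the authors' script: pairs are kept when Ks lies within ±25% of the peak, and the two members of each kept pair are assigned to B1 and B2 by a coin flip. The function name, the seed and the gene identifiers are hypothetical; the peak value 0.22 is the young-WGD peak reported in the Results.

```python
import random


def select_wgd_pairs(pairs, ks_peak, tolerance=0.25, seed=0):
    """Keep gene pairs near the WGD Ks peak and split them into B1/B2.

    pairs: iterable of (gene_a, gene_b, ks) tuples from syntenic blocks.
    Returns two parallel lists: b1[i] and b2[i] are homeologs of pair i.
    """
    low = ks_peak * (1 - tolerance)
    high = ks_peak * (1 + tolerance)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    b1, b2 = [], []
    for gene_a, gene_b, ks in pairs:
        if low <= ks <= high:
            # Randomly decide which member of the pair goes to which group.
            if rng.random() < 0.5:
                gene_a, gene_b = gene_b, gene_a
            b1.append(gene_a)
            b2.append(gene_b)
    return b1, b2


toy_pairs = [
    ("g1", "g2", 0.20),  # inside the 0.165-0.275 window
    ("g3", "g4", 0.40),  # too diverged, dropped
    ("g5", "g6", 0.17),  # inside the window
    ("g7", "g8", 0.10),  # too similar, dropped
]
b1, b2 = select_wgd_pairs(toy_pairs, ks_peak=0.22)
print(list(zip(b1, b2)))
```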

3. Results and discussion

3.1. Genome assembly and assembly quality assessment

We generated 175.4 Gb PacBio and 48.8 Gb Illumina reads from an individual of B. variegata and used them to assemble its genome. Genome survey of Illumina reads indicated that B. variegata has a genome size of 327.00 Mb (Fig. 1A). We obtained a genome assembly of 411 contigs with a total size of 326.4 Mb (Table 2), representing 99.8% of the estimated genome size. 92.2% (300.8 Mb) of sequences were anchored to the 14 pseudochromosomes based on the Hi-C data. The scaffold N50 and contig N50 are 22.09 Mb and 4.55 Mb, respectively. The overall GC content of the B. variegata genome is 35.0% (Table 2). This is the first chromosomal-level genome assembly for the subfamily Cercidoideae. Bauhinia variegata has the second smallest genome size among legumes with available genome size data (http://data.kew.org/cvalues/), only larger than Leucaena macrophylla (303 Mb).

Figure 1

Table 2

Statistics of the genome assembly for Bauhinia variegata

Assembly features
Genome size (bp)	326,375,084
GC content	34.95%
Scaffolds number	411
Scaffold N50 (bp)	22,089,475
Scaffold L50	7
Contig N50 (bp)	4,549,988
Contig L50	21
Annotation features
Number of predicted gene models	37,996
Mean of exon number per gene	5.4
Mean of exon length (bp)	297.5
Mean of intron length (bp)	382.5
Repeat content (% of the genome assembly)	27.22%
Functional annotation
Total number of annotated genes	35,659
Number of genes annotated by InterProScan	35,189
Number of genes annotated by Eggnog	34,601
Number of genes annotated by Pannzer2	29,589
Number of genes annotated by Mercator4	26,311

N50: sequence length of the shortest contig/scaffold at 50% of the total genome length.

L50: the smallest number of contigs/scaffolds whose length sum makes up half of genome size.
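The two definitions above translate directly into a short calculation. The sketch below is illustrative only; the function name and the toy lengths are hypothetical, and the "total genome length" is taken to be the sum of the supplied sequence lengths.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig or scaffold lengths.

    N50: length of the shortest sequence in the smallest set of longest
         sequences that together cover at least half of the total length.
    L50: the number of sequences in that set.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise ValueError("lengths must not be empty")


# Toy assembly: total 290, half 145; 80 + 70 = 150 reaches the halfway mark.
n50, l50 = n50_l50([80, 70, 50, 40, 30, 20])
print(n50, l50)
```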

Figure 1. Genome size estimation and genome assembly assessment. (A) Genome survey of Bauhinia variegata with GenomeScope. (B) BUSCO assessment of the genome assemblies of five legumes with eudicots_odb10 dataset. (C) BUSCO assessment of the genome assemblies of five legumes with fabales_odb10 dataset.

The BUSCO analysis recovered 2,297 (98.7%) universal single-copy genes of the eudicots_odb10 dataset (2,326 genes) and 5,043 (94.0%) of the fabales_odb10 dataset (5,366 genes) in B. variegata (Fig. 1), indicating high completeness of the genome assembly. Comparative analysis among the five legumes showed that B. variegata had the second highest proportion of duplicated complete BUSCOs (24.2% in eudicots_odb10 and 36.2% in fabales_odb10), only lower than soybean (58.2% and 62.5%, respectively), which has experienced two WGDs after the origin of legumes. The high proportion of duplicated BUSCOs in B. variegata implies that there might be WGD(s) in this species (see below).

3.2. Genome annotation

Transposable elements made up 27.2% of the B. variegata genome (Table 2; Fig. 2c), including 8.6% LTR (4.2% Gypsy, 2.6% Copia and 1.9% others) and 12.0% TIR. Tandem repeats made up 0.64% of the genome. We identified 37,996 protein-coding genes in B. variegata based on de novo prediction, transcript evidence and homology with other known plant proteins (Table 2; Fig. 2b); 93.9% of the predicted genes were functionally annotated by at least one of the four annotation tools (Table 2). The mean exon and intron sizes are 297.5 bp and 382.5 bp, respectively (Table 2).

Figure 2

Intra-genomic synteny analysis and other genomic features of Bauhinia variegata. Tracks from outside to inside show 14 pseudo-chromosomes (a), gene density (b), transposable elements (TE) density (c), GC content (d), the density of genes located in syntenic regions (e) and intragenomic synteny (f).

3.3. Phylogenetic analyses and gene family evolution

We constructed a maximum likelihood tree for 10 legumes (G. max, M. truncatula, L. japonicus and X. zambesiaca from Papilionoideae; A. pycnantha, C. fasciculata and G. triacanthos from Caesalpinioideae; B. variegata and C. canadensis from Cercidoideae and C. officinalis from Detarioideae) based on 129 single-copy genes, with Q. saponaria as an outgroup. The tree topology is consistent with previous studies, and confirms that Bauhinia is close to Cercis (Fig. 3A). Interestingly, B. variegata has a much longer (> 3-fold) branch length than C. canadensis after their divergence.

Figure 3

Phylogenomic analysis and gene family analysis. (A) Phylogenetic tree of 10 legumes and an outgroup based on concatenated sequences of 129 single-copy genes, with the numbers of expanded (left) and contracted (right) gene families shown on each branch. (B) Venn diagram showing the shared and unique gene families among five legumes. (C) KEGG pathway enrichment analysis for significantly expanded gene families in Bauhinia variegata. Each row represents an enriched pathway, and the length of the bar represents the enrichment ratio, which is calculated as ‘input gene number’/‘background gene number’. Different clusters are shown in different colours for the bar.

Protein sequences of the 11 species were clustered into 54,370 orthogroups, of which 25,927 had two or more members. As shown in the Venn diagram (Fig. 3B), a total of 9,119 orthogroups were shared among five legumes (B. variegata, C. canadensis, C. officinalis, C. fasciculata and G. max), and B. variegata contains 732 unique orthogroups. The estimated divergence time between B. variegata and C. canadensis was 35.9 million years ago (Ma). Gene family expansion and contraction analysis identified 369 significantly expanded and 82 significantly contracted (P < 0.05) gene families among 4,523 expanded and 345 contracted gene families of B. variegata, respectively (Fig. 3A). Compared with other legumes, B. variegata has the second highest number of expanded gene families, only lower than G. max. KEGG pathway enrichment analysis indicated that significantly expanded gene families were enriched in pathways of stilbenoid, diarylheptanoid and gingerol biosynthesis, flavonoid biosynthesis, cyanoamino acid metabolism, monoterpenoid biosynthesis, AGE-RAGE signalling pathway in diabetic complications, tropane, piperidine and pyridine alkaloid biosynthesis, etc. (Fig. 3C), which may contribute to its biotic and abiotic resistance, and various petal colours.

3.4. Testing the allotetraploidy origin hypothesis of Bauhinia

Compared with Cercis, which has a chromosome number of 2n = 14, B. variegata has a chromosome number of 2n = 28 (CCDB; http://ccdb.tau.ac.il/). This implies that B. variegata should have undergone a WGD after divergence from Cercis. To verify this, we searched for intra-genomic syntenic blocks in the B. variegata genome and identified 479 blocks that contain 15,791 gene pairs, with the longest block containing 969 gene pairs. On average, each syntenic block contains 33 homeologous gene pairs. Collectively, these 479 syntenic blocks include 21,371 genes, indicating that 56.3% of the predicted genes of B. variegata exhibit synteny-based signals. The Ks (the number of substitutions per synonymous site) distribution between gene pairs in the syntenic blocks suggests two WGDs: a young WGD at Ks = 0.22 and an old duplication at Ks = 1.74 (Fig. 4A), with the latter consistent with the γ triplication event shared in core eudicots. Ks values for gene pairs on syntenic blocks between B. variegata and C. canadensis exhibit two peaks at 0.14 and 0.16 (Fig. 4B), much lower than the Ks (0.22) between homeologous gene pairs produced by the young WGD, which would suggest that the WGD occurred before the divergence between Bauhinia and Cercis if the evolutionary rates of both genera were the same. However, most syntenic regions between B. variegata and C. canadensis correspond to a ratio of 2:1 (Fig. 4C), suggesting that this WGD was specific to B. variegata. Therefore, the greater Ks value between gene pairs produced by the young WGD might be due to an accelerated evolutionary rate in Bauhinia after it diverged from Cercis, as is also shown by its much longer branch length than that of Cercis on the phylogenetic tree (Fig. 3A). There are two plausible scenarios (Fig. 5) for this, and both involve an accelerated evolutionary rate in Bauhinia: one is autopolyploidy in the ancestor of Bauhinia and the other is allopolyploidy between a progenitor of Cercis and another diverged diploid species (already extinct). The latter scenario has been proposed before. Our analyses support the latter scenario, as reasoned below.

Figure 4

Figure 5

Alternative models for the origin of Bauhinia. (A) Autopolyploidy occurred in the ancestor of Bauhinia after divergence from Cercis. (B) Hybridization between the ancestor of Cercis and an extinct, diverged diploid species and genome doubling produced the allopolyploid ancestor of Bauhinia.

Figure 4. Identification of whole genome duplication (WGD) in Bauhinia variegata. (A) The histogram of synonymous substitution rate (Ks) between gene pairs on syntenic blocks in the genome of B. variegata. (B) The frequency density distribution of synonymous substitution rate (Ks) between B. variegata and Cercis canadensis. Shown are Ks distribution of gene pairs on syntenic blocks between the two species, and that between each of the WGD-generated duplicated genes in B. variegata and its corresponding ortholog in C. canadensis. (C) Synteny analysis between B. variegata and C. canadensis. Only 11 longest contigs of C. canadensis are shown here.

First, the Ks distribution between each gene pair of B. variegata produced by the young WGD and its corresponding ortholog in C. canadensis revealed two peaks at Ks = 0.14 and Ks = 0.16 (Fig. 4B), which suggests that the homeolog pairs might not originate from the same Bauhinia lineage. The two peaks are also consistent with those obtained from gene pairs on syntenic blocks between B. variegata and C. canadensis, suggesting that genes of this type in B. variegata (showing a 1:1 ratio with Cercis) are remnants of duplicated genes following homeolog loss after the WGD. Second, phylogenetic analysis of 3,032 genes showed that, for the majority of genes, one homeolog of Bauhinia was sister to the ortholog of Cercis rather than to the other homeolog of Bauhinia (73.9%, 75.7% and 74.7% of genes when bootstrap support values of > 60, > 70 and > 80 are required, respectively). This is inconsistent with the model of autopolyploidy in the ancestor of Bauhinia, in which the two homeologs of Bauhinia are expected to be sister to each other.
Therefore, our genomic data support the allopolyploidy hypothesis proposed before. Surprisingly, C. canadensis has a larger genome size (367 Mb) than B. variegata, although it lacks the young WGD. We propose that genome downsizing due to genetic diploidization following the WGD in B. variegata can account for this.
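The topology-counting logic behind the phylogenetic test can be sketched as follows. This is an illustrative sketch rather than the script used in the study: it assumes each gene tree is a fully resolved four-taxon Newick string with hypothetical tip labels B1, B2 (the two Bauhinia homeologs), C (Cercis) and O (the outgroup), and it reports which two ingroup tips are sister once the tree is rooted on O.

```python
import re
from collections import Counter

NAME = r"[A-Za-z_][A-Za-z0-9_]*"


def sister_pair(newick, outgroup="O"):
    """Return the two ingroup tips that are sister in a 4-taxon tree."""
    topo = re.sub(r":[0-9.eE+-]+", "", newick)  # drop branch lengths
    topo = re.sub(r"\)[0-9.]+", ")", topo)      # drop support values
    taxa = set(re.findall(NAME, topo))
    cherry = re.search(rf"\(({NAME}),({NAME})\)", topo)
    if len(taxa) != 4 or cherry is None:
        raise ValueError("expected a resolved tree with exactly four tips")
    pair = set(cherry.groups())
    # A cherry that contains the outgroup implies the other two tips pair up.
    if outgroup in pair:
        pair = taxa - pair
    return "+".join(sorted(pair))


def count_topologies(trees, outgroup="O"):
    """Count how often each ingroup sister pair is recovered."""
    return Counter(sister_pair(tree, outgroup) for tree in trees)


toy_trees = [
    "((B1:0.1,C:0.08)90:0.02,B2:0.15,O:0.5);",  # B1 sister to Cercis
    "(B1,C,(B2,O));",                           # same topology, written differently
    "((B1,B2),C,O);",                           # homeologs sister to each other
    "(((B2,C),B1),O);",                         # B2 sister to Cercis
]
counts = count_topologies(toy_trees)
print(counts)
```

Under autopolyploidy the `B1+B2` topology is expected to dominate, whereas under allopolyploidy involving a Cercis-like progenitor one homeolog is expected to group with `C`. Filtering by bootstrap support, as done in the study, would require reading the support values instead of discarding them.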

4. Conclusions

We provide the first high-quality chromosome-level genome for the subfamily Cercidoideae (Leguminosae). Based on the genome sequence, we identified two WGDs in B. variegata, a young WGD specific to B. variegata and an old one corresponding to the γ triplication shared in core eudicots. Interestingly, this young WGD is not shared with Cercis, although Ks analysis alone would suggest so. The reason for this conflict is most likely an accelerated evolutionary rate in Bauhinia after it diverged from Cercis, which is also supported by the much longer branch length of B. variegata than that of C. canadensis after their divergence. The divergence and phylogenetic analyses for each gene pair of B. variegata produced by the young WGD and its corresponding ortholog in C. canadensis support the allopolyploidy origin hypothesis for Bauhinia. Consistent with the WGD, B. variegata possesses a large number of expanded gene families among legumes. The genome of B. variegata provides a valuable genomic resource for dissecting the genetic basis of its ornamental traits and addressing other evolutionary and genetic questions in Cercidoideae and legumes in general.

Funding

This work was financially supported by the Natural Science Foundation of Guangdong (2021A1515010997) and Forestry Science and Technology Innovation Project of Guangdong (2018KJCX043).

Conflict of interest

None declared.

Data availability

The high-quality genome assembly and annotation of Bauhinia variegata have been deposited in NCBI under the accession number: JAKRYI000000000 (BioProject accession: PRJNA801801). The repeats, gene annotation and the orthogroups among 11 species obtained from OrthoFinder2 are available at https://doi.org/10.6084/m9.figshare.19298582.v1.
