Weixiao Lei, Zefu Wang, Man Cao, Hui Zhu, Min Wang, Yi Zou, Yunchun Han, Dandan Wang, Zeyu Zheng, Ying Li, Bingbing Liu, Dafu Ru.
Abstract
Sophora japonica is a medium-sized deciduous tree belonging to the Leguminosae family and known for its high ecological, economic and medicinal value. Here, we report a draft genome of S. japonica, ∼511.49 Mb long (contig N50 of 17.34 Mb), assembled from Illumina, Nanopore and Hi-C data. Using the Hi-C data, we reliably assembled 110 contigs into 14 chromosomes, representing 91.62% of the total genome, with an improved scaffold N50 of 31.32 Mb. Further investigation identified 271.76 Mb (53.13%) of repetitive sequences and 31,000 protein-coding genes, of which 30,721 (99.1%) were functionally annotated. Phylogenetic analysis indicates that S. japonica diverged from Arabidopsis thaliana and Glycine max ∼107.53 and ∼61.24 million years ago, respectively. We detected evidence of both a species-specific and a common-legume whole-genome duplication event in S. japonica. We further found that multiple TF families (e.g. BBX and PAL) have expanded in S. japonica, which might have contributed to its enhanced tolerance to abiotic stress. In addition, S. japonica harbours more genes involved in the lignin and cellulose biosynthesis pathways than the other two species. Finally, population genomic analyses revealed no obvious differentiation among geographical groups and showed that the effective population size has declined continuously since 2 Ma. Our genomic data provide a powerful comparative framework for studying adaptation, evolution and active-ingredient biosynthesis in S. japonica. More importantly, this high-quality S. japonica genome will help elucidate the biosynthesis of its main bioactive components and improve its production and/or processing.
Keywords: Sophora japonica; Hi-C; WGD; genome; nanopore
Year: 2022 PMID: 35466378 PMCID: PMC9154292 DOI: 10.1093/dnares/dsac009
Source DB: PubMed Journal: DNA Res ISSN: 1340-2838 Impact factor: 4.477
Figure 1. (A) Photo of an S. japonica tree; (B) photo of an S. japonica leaf; (C) flower buds (Huaimi) of S. japonica; (D) flowers (Huaihua) of S. japonica; (E) seeds of S. japonica.
Summary of the genome assembly and annotation tables
| Genome assembly | Value |
|---|---|
| Estimated genome size | 535.77 Mb |
| N50 length (contig) | 17.34 Mb |
| Longest contig | 32.92 Mb |
| Number of contigs | 110 |
| Total length of contigs | 511.49 Mb |
| N50 length (scaffold) | 31.32 Mb |
| Longest scaffold | 60.53 Mb |
| Number of scaffolds | 99 |
| Total length of scaffolds | 511.49 Mb |
| Average GC content (%) | 33.77 |
| BUSCO score of assembly (%) | C: 98.1 (S: 88.7, D: 9.4), F: 1.0, M: 0.9 |

| Repeat annotation | Percent (%) | Total length (bp) |
|---|---|---|
| DNA | 5.59 | 27,990,056 |
| LINE | 1.17 | 5,880,495 |
| LTR | 37.98 | 190,194,740 |
| SINE | 0.15 | 762,931 |
| Satellite | 0.18 | 904,219 |
| Simple repeat | 9.37 | 46,924,249 |
| Small RNA | 0.09 | 438,768 |
| Unknown | 20.31 | 101,699,257 |
| Total | 53.13 | 271,762,340 |

| Gene annotation | Value |
|---|---|
| Predicted genes | 31,000 |
| Average gene length (bp) | 4,864.39 |
| Average CDS length (bp) | 1,304.15 |
| Average exons per gene | 5.63 |
| Average exon length (bp) | 231.49 |
| Average intron length (bp) | 620.11 |
| BUSCO score of annotation (%) | C: 97.4 (S: 87.4, D: 10.0), F: 1.0, M: 1.6 |
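
For readers unfamiliar with the N50 statistic reported for contigs and scaffolds, the following is a minimal sketch of how it is conventionally computed: sort the sequence lengths from longest to shortest and return the length at which the running total first reaches half of the assembly. The function name `n50` and the toy lengths are illustrative assumptions, not the authors' pipeline or data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy contig lengths (arbitrary units), not data from the paper.
toy_contigs = [40, 30, 20, 10]
print(n50(toy_contigs))  # 30: 40 + 30 = 70 first reaches half of 100
```

Under this definition, joining contigs into longer scaffolds can only raise N50, which is consistent with the table's contig N50 of 17.34 Mb and scaffold N50 of 31.32 Mb.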
Figure 2. (A) Circos plot showing genomic features of S. japonica. Concentric circles, outer to inner, show GC density, gene density, repetitive sequence density and collinearity, respectively. (B) Heat map of chromatin contact matrices generated by aligning the Hi-C dataset to the S. japonica genome. (C) Phylogenetic relationships and divergence times of commelinid plants obtained using the maximum likelihood (ML) method with A. thaliana as a distant outgroup. Divergence times were estimated using the ‘mcmctree’ program incorporated in the PAML package.
Figure 3. Results of comparative genomic analyses. (A) Ks distribution of syntenic blocks. (B) Dot plot of syntenic blocks identified by MCScanX in the S. japonica genome. (C) Synteny blocks (involving ≥5 collinear genes) identified by MCScanX between S. japonica, G. max and M. truncatula.
Figure 4. (A) Map showing the current distribution of S. japonica. (B) Principal component analysis (PCA) plot showing scores of the first two principal components. (C) Demographic history of S. japonica, showing PSMC estimates of the species’ effective population size (Ne). The time scale on the x-axis was calculated assuming a mutation rate per generation (μ) of 3.65 × 10⁻⁹ and a generation time (g) of 7 years. The time of the Naynayxungla glaciation is highlighted by grey vertical bars. (D) Neighbour-joining tree obtained with Senna tora as the outgroup. Red and blue indicate S. japonica and S. tora, respectively.
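
As a rough illustration of how the PSMC axes in panel (C) depend on μ and g, the sketch below applies the scaling conventionally used for PSMC output: N0 = θ0 / (4μs), time in years = 2·N0·t·g, and Ne = N0·λ. The function name `scale_psmc`, the bin size s = 100 bp and the θ0, t and λ values are assumptions chosen for illustration; only μ = 3.65 × 10⁻⁹ and g = 7 years come from the caption.

```python
def scale_psmc(t_k, lambda_k, theta0, mu=3.65e-9, gen_time=7.0, bin_size=100):
    """Convert one scaled PSMC interval to (years before present, Ne).

    t_k, lambda_k : scaled time and relative population size for an interval
    theta0        : scaled mutation rate per bin inferred by PSMC
    mu            : mutation rate per site per generation (from the caption)
    gen_time      : generation time in years (from the caption)
    bin_size      : bases per input bin (assumed here to be 100)
    """
    n0 = theta0 / (4.0 * mu * bin_size)    # reference effective population size
    years = 2.0 * n0 * t_k * gen_time      # scaled time -> years
    ne = n0 * lambda_k                     # relative size -> effective size
    return years, ne


# Hypothetical inputs, chosen so that N0 comes out near 10,000.
years, ne = scale_psmc(t_k=1.0, lambda_k=2.0, theta0=0.0146)
print(f"{years:.0f} years ago, Ne = {ne:.0f}")
```

Because both axes scale inversely with μ and the time axis scales linearly with g, the dating of the decline in Ne is sensitive to these two assumed values.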