| Literature DB >> 35184175 |
Jing Wang1,2, Jianguang Li1,2, Zaiyuan Li3, Bo Liu3, Lili Zhang4, Dongliang Guo1,2, Shilian Huang1,2, Wanqiang Qian3, Li Guo5,6.
Abstract
Longan (Dimocarpus longan) is a subtropical fruit tree best known for its nutritious fruit, which has been regarded as a precious tonic and traditional medicine since ancient times. A high-quality chromosome-scale genome assembly is valuable for functional genomic studies and the genetic improvement of longan. Here, we report a chromosome-level reference genome sequence for the longan cultivar JDB, with a 455.5 Mb assembly anchored to fifteen chromosomes, representing a significant improvement in contiguity (contig N50 = 12.1 Mb, scaffold N50 = 29.5 Mb) over a previous draft assembly. A total of 40 420 protein-coding genes were predicted in the D. longan genome. Synteny analysis suggests that longan shares the gamma whole-genome triplication common to core eudicots but has undergone no other whole-genome duplications. Comparative genomics showed that the D. longan genome experienced significant expansions of gene families related to phenylpropanoid biosynthesis and UDP-glucosyltransferases. Deep resequencing of longan cultivars identified biogeography as a major contributor to genetic diversity and revealed clear population admixture and introgression among cultivars of different geographic origins, suggesting a likely migration trajectory for longan that is broadly confirmed by existing historical records. Finally, genome-wide association studies (GWAS) of longan cultivars identified quantitative trait loci (QTL) for six fruit quality traits and revealed a shared QTL, containing three genes, for total soluble solids and seed weight. The chromosome-level reference genome assembly, annotation, and population genetic resources for D. longan will facilitate molecular studies and the breeding of desirable longan cultivars.
Keywords: Dimocarpus longan; GWAS; gene flow; phenylpropanoid; population genomics; reference genome
Year: 2022 PMID: 35184175 PMCID: PMC9071379 DOI: 10.1093/hr/uhac021
Source DB: PubMed Journal: Hortic Res ISSN: 2052-7276 Impact factor: 7.291
Figure 1. Chromosome-level genome assembly of longan (Dimocarpus longan). (A–D) Photos of the tree (A), flower (B), fruit cluster (C), and fruit section (D) of the longan cultivar JDB. (E) K-mer frequency distribution for the JDB genome based on Illumina paired-end reads. (F) Overview of the D. longan genome. Tracks a to i: chromosomes, GC content, density of Gypsy LTRs, density of Copia LTRs, density of protein-coding genes, SNP density, indel density, distribution of secondary metabolic gene clusters (predicted using plantiSMASH), and syntenic blocks (colored ribbons). Density statistics were calculated in 150 kb genomic windows.
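A k-mer spectrum like panel (E) is commonly used for a rough genome-size estimate: the total number of k-mers divided by the sequencing depth at the main peak. The sketch below illustrates only that general idea under stated assumptions; it is not the authors' pipeline. It counts forward-strand k-mers only (dedicated k-mer counters usually merge each k-mer with its reverse complement), treats low-multiplicity k-mers as sequencing errors, and the function names are hypothetical.

```python
from collections import Counter

def kmer_histogram(reads, k):
    """Map multiplicity -> number of distinct k-mers seen that many times.

    Simplification (assumption): only forward-strand k-mers are counted.
    """
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return Counter(counts.values())

def estimate_genome_size(hist, min_depth=2):
    """Rough genome size = total k-mer count / depth of the main peak.

    Multiplicities below min_depth are assumed to be sequencing errors
    and are excluded from both the peak search and the total.
    """
    usable = {depth: n for depth, n in hist.items() if depth >= min_depth}
    peak_depth = max(usable, key=usable.get)
    total_kmers = sum(depth * n for depth, n in usable.items())
    return total_kmers / peak_depth

# Toy example: three error-free copies of a 12 bp "genome" plus one erroneous read.
reads = ["AACCGGTTAGCT"] * 3 + ["CAT"]
hist = kmer_histogram(reads, k=3)
print(dict(hist))                  # {3: 10, 1: 1}
print(estimate_genome_size(hist))  # 10.0 (the 12 - 3 + 1 k-mer positions)
```

On a real genome the difference between genome length and the number of k-mer positions is negligible, and heterozygosity and repeats add extra peaks that model-based tools account for.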
Statistics for Dimocarpus longan JDB genome assembly and annotations
| Statistic | JDB (this assembly) | Previous draft assembly |
|---|---|---|
| Total number of contigs | 250 | 51 392 |
| Assembly size, contigs (Mb) | 455.5 | 471.9 |
| Contig N50 (Mb) | 12.1 | 0.026 |
| Contig N90 (Mb) | 1.8 | 0.006 |
| Largest contig (Mb) | 31.1 | 0.17 |
| Total number of scaffolds | 90 | 17 367 |
| Assembly size, scaffolds (Mb) | 455.5 | 495.3 |
| Scaffold N50 (Mb) | 29.6 | 0.57 |
| Scaffold N90 (Mb) | 22.3 | 0.12 |
| Largest scaffold (Mb) | 46.6 | 6.9 |
| Number of genes | 40 420 | 31 007 |
| Repeat content (%) | 41.7 | 52.9 |
| Number of ncRNAs | 2555 | NA |
| BUSCO (%) | 98.1 | 94 |
| GC content (%) | 43.9 | 33.7 |
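The N50 and N90 rows in the table summarize contiguity: NX is the largest length L such that contigs (or scaffolds) of length at least L together cover at least X% of the assembly. A minimal sketch of that definition follows; the helper name `nx` and the toy lengths are hypothetical and are not the longan data.

```python
def nx(lengths, x=50):
    """Return the NX statistic (N50 by default) of a list of sequence lengths.

    NX is the largest length L such that sequences of length >= L together
    cover at least X% of the total assembly length.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer-friendly comparison: running / total >= x / 100.
        if running * 100 >= x * total:
            return length

contigs = [2, 3, 4, 5, 6, 8, 10]  # toy lengths, total = 38
print(nx(contigs))      # 6  (10 + 8 + 6 = 24 >= 19)
print(nx(contigs, 90))  # 3  (cumulative 36 >= 34.2)
```

Because N50 depends only on the largest pieces, merging contigs into scaffolds raises it sharply, which is why the scaffold N50 exceeds the contig N50 in both assemblies.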
Figure 2. Phylogenomics and comparative genomics of D. longan. (A) Summary of gene family clustering for D. longan and 13 related species. Single-copy orthologs: single-copy genes in an ortholog group. Multiple-copy orthologs: multiple genes in an ortholog group. Unique orthologs: species-specific genes. Other orthologs: the remaining clustered genes. Unclustered genes: genes not assigned to any cluster. (B) Comparison of orthogroups (gene families) among six angiosperm species: D. longan (longan), A. thaliana (Arabidopsis), C. sinensis (citrus), S. tuberosum (potato), P. trichocarpa (poplar), and O. sativa (rice). (C) Phylogenetic relationships and divergence time estimates (with confidence intervals). Numbers of gene family expansions and contractions are shown in red and blue, respectively. (D) Bubble plot of the most significantly enriched KEGG terms among expanded gene families in D. longan. The x-axis shows the log10-transformed P-value, bubble size is scaled to the number of genes, and the color scale represents the odds ratio of observed versus expected (genomic background) numbers of genes annotated with each KEGG term. (E) Phylogenetic tree of UGTs (UDP-glucosyltransferases) in three angiosperms, including D. longan.
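Panel (D) compares observed against expected (genomic background) counts for each KEGG term. The caption does not name the statistical test, so the sketch below assumes a common choice: a 2×2 odds ratio plus a one-sided hypergeometric P-value (the upper tail of Fisher's exact test). The function name and the toy counts are hypothetical.

```python
from math import comb

def kegg_enrichment(k, n, K, N):
    """Odds ratio and one-sided hypergeometric P-value for one KEGG term.

    k: genes in expanded families annotated with the term
    n: all genes in expanded families
    K: genes in the genomic background annotated with the term
    N: all genes in the genomic background (the n genes are a subset of N)
    """
    # 2x2 contingency table: expanded vs. rest of genome, annotated vs. not.
    a, b = k, n - k
    c, d = K - k, (N - n) - (K - k)
    # A zero cell in the denominator is reported as an infinite odds ratio.
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    # P(X >= k) for X ~ Hypergeometric(N, K, n).
    upper = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return odds_ratio, upper / comb(N, n)

odds, p = kegg_enrichment(k=4, n=10, K=20, N=100)
print(round(odds, 2))  # 3.08
```

On the plot's x-axis the P-value would then be shown as `math.log10(p)` (or its negative); in practice it would also be corrected for the number of KEGG terms tested.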
Figure 3. Population structure and admixture analysis of D. longan. (A) Neighbor-joining phylogenetic tree of all D. longan individuals constructed from SNPs. The artificially bred individual is marked with a red dot. Colors represent geographic groups. (B) Biogeographical ancestry (admixture) analysis of D. longan accessions, with four ancestral clusters shown in different colors in the heatmap; each column represents one longan sample. (C) Distribution of FST values (a measure of genetic differentiation) between longan populations from Thailand (Thai), Fujian (FJ), and Guangdong (GD). (D) Maximum-likelihood tree and migration events among seven groups of D. longan. Migration events are colored according to their weight.
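Panel (C) reports FST between pairs of populations. The caption does not say which estimator was used, so the sketch below assumes Hudson's estimator, combined across SNPs as a ratio of averages; the function name and inputs are hypothetical. Applying it separately to each genomic window would yield a distribution of values like the one plotted.

```python
def hudson_fst(p1, p2, n1, n2):
    """Hudson's F_ST across SNPs, combined as a ratio of averages.

    p1, p2: allele frequencies of the same SNPs in populations 1 and 2
    n1, n2: number of sampled chromosomes (haplotypes) in each population
    """
    if len(p1) != len(p2):
        raise ValueError("frequency lists must cover the same SNPs")
    numerator = denominator = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference, corrected for finite sample size.
        numerator += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        # Probability that two alleles, one drawn from each population, differ.
        denominator += a * (1 - b) + b * (1 - a)
    return numerator / denominator

print(hudson_fst([1.0], [0.0], 10, 10))  # 1.0: a fixed difference
```

Identical frequencies give a slightly negative value because of the sample-size correction, which is expected for this estimator rather than a bug; windows where every SNP is monomorphic in both populations make the denominator zero and should be skipped.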
Figure 4. GWAS mapping of seed weight and total soluble solids in longan fruit. (A) Top: Manhattan plot of GWAS results for seed weight (SW), based on a randomly down-sampled SNP call set. Middle: Manhattan plot of GWAS results for total soluble solids (TSS). Dotted lines represent the Bonferroni significance threshold. Red vertical lines mark the region where the signals for the two traits overlap, shown enlarged below. Bottom: close-up of the highlighted region, with three genes in the vicinity of the GWAS peak. (B) Diagram of the three genes relative to the reference sequence, and the haplotypes observed in samples collected from China. Black solid arrows indicate synonymous SNPs; red solid arrows indicate non-synonymous SNPs specific to varieties with high TSS and low SW. (C) Correlation between SW and TSS. (D) SNPs within the three genes in the accessions with the most extreme TSS and SW phenotypes at each end of the high-type and low-type distributions (orange: mutated, heterozygous to homozygous; blue: non-mutated) (see Supplementary Table 11 for all samples).
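The dotted lines in panel (A) follow the standard Bonferroni rule: with m SNPs tested at family-wise error rate α, a SNP is significant when P ≤ α/m, drawn at height −log10(α/m) on the Manhattan plot. The caption gives neither α nor the number of SNPs in the down-sampled set, so the values below (α = 0.05, one million SNPs) and the function names are hypothetical.

```python
import math

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test P-value cutoff that keeps the family-wise error rate <= alpha."""
    return alpha / n_tests

def manhattan_line(n_tests, alpha=0.05):
    """Height of the Bonferroni line on a Manhattan plot's -log10(P) axis."""
    return -math.log10(bonferroni_threshold(n_tests, alpha))

def significant_snps(pvalues, alpha=0.05):
    """Indices of SNPs whose P-value passes the Bonferroni cutoff."""
    cutoff = bonferroni_threshold(len(pvalues), alpha)
    return [i for i, p in enumerate(pvalues) if p <= cutoff]

print(round(manhattan_line(1_000_000), 2))        # 7.3
print(significant_snps([1e-9, 0.01, 0.5, 1e-3]))  # [0, 1, 3] (cutoff 0.0125)
```

Because the cutoff scales with the number of tests, down-sampling the SNP set lowers the line; the correction is also conservative when nearby SNPs are in linkage disequilibrium.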