Yun Li, Hairong Wei, Jun Yang, Kang Du, Jiang Li, Ying Zhang, Tong Qiu, Zhao Liu, Yongyu Ren, Lianjun Song, Xiangyang Kang.
Abstract
We report the acquisition of a high-quality haploid chromosome-scale genome assembly for the first time in a tree species, Eucommia ulmoides, which is known for its rubber biosynthesis and medicinal applications. The assembly was obtained by applying PacBio and Hi-C technologies to a haploid that we specifically generated. Compared to the initial genome release, this one has significantly improved assembly quality. The scaffold N50 (53.15 Mb) increased 28-fold, and the repetitive sequence content (520 Mb) increased by 158.24 Mb, whereas the number of gaps decreased from 104,772 to 128. A total of 92.87% of the 26,001 predicted protein-coding genes identified with multiple strategies were anchored to the 17 chromosomes. A new whole-genome duplication event was superimposed on the earlier γ paleohexaploidization event, and the expansion of long terminal repeats contributed greatly to the evolution of the genome. The more primitive rubber biosynthesis of this species, as opposed to that in Hevea brasiliensis, relies on the methylerythritol-phosphate (MEP) pathway rather than the mevalonate pathway to synthesize isoprenyl diphosphate, as the MEP pathway operates predominantly in trans-polyisoprene-containing leaves and central peels. Chlorogenic acid biosynthesis pathway enzymes were preferentially expressed in leaves rather than in bark. This assembly with higher sequence contiguity can foster not only studies on genome structure and evolution, gene mapping, epigenetic analysis and functional genomics but also efforts to improve E. ulmoides for industrial and medical uses through genetic engineering.
Year: 2020 PMID: 33328448 PMCID: PMC7603500 DOI: 10.1038/s41438-020-00406-w
Source DB: PubMed Journal: Hortic Res ISSN: 2052-7276 Impact factor: 6.793
Fig. 1 Karyotypic analysis of haploid plants generated through parthenogenesis under high-temperature treatments (48 °C for 6 h and 54 °C for 4 h) in E. ulmoides.
a Somatic chromosome number of the haploids (2n = x = 17). b Ploidy levels obtained from 3-week-old first leaf samples from haploid plants by flow cytometric analysis. c Ploidy levels obtained from 3-week-old first leaf samples from a mixture of haploid and diploid plants by flow cytometric analysis. d A haploid plant (left) and diploid plant (right) of E. ulmoides.
Statistics for the Eucommia genome and gene annotation.
| Estimate of genome size | 1.02 Gb |
| Total assembly size | 947.84 Mb |
| Number of contigs | 564 |
| N50 of contigs | 13.16 Mb |
| Longest contig | 34.99 Mb |
| Sequence anchored to the Hi-C map | 947.86 Mb |
| Number of scaffolds after Hi-C assembly | 501 |
| N50 of scaffolds after Hi-C assembly | 53.15 Mb |
| Longest scaffold after Hi-C assembly | 79.92 Mb |
| GC content | 35.17% |
| Number of genes | 26,001 |
| Percentage of gene length in genome | 16.84% |
| Mean gene length | 6138.21 bp |
| Mean coding sequence length | 1108.8 bp |
| Mean exon number per gene | 4.85 |
| Mean exon length | 228.56 bp |
| Mean intron length | 1305.93 bp |
| rRNAs | 2099 |
| tRNAs | 825 |
| miRNAs | 1032 |
| snRNAs | 875 |
| Repeat content | 62.50% |
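The contig and scaffold N50 values in the table follow the usual definition: the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the total assembly size. The following is a minimal sketch of that calculation, not the authors' pipeline; the function name `n50` and the toy lengths are hypothetical and chosen only for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted from longest to shortest; the N50 is the length
    of the sequence at which the cumulative sum first reaches half of
    the total length.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (arbitrary units), not data from the paper.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))  # total is 300; 80 + 70 reaches 150, so this prints 70
```

Applied to the real contig or scaffold lengths of an assembly (for example, read from a FASTA index), the same function would be expected to reproduce values such as those reported above.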
Fig. 2 Overview of the E. ulmoides genome.
a Lengths of pseudochromosomes (0.5 Mb window size (WS)); b gene density (0.3 Mb WS); c repeat density (1 Mb WS); d GC content (1 Mb WS). The colored lines in the center show links between syntenic blocks of at least five genes.
Fig. 3 Evolution and synteny of the E. ulmoides genome.
a The insertion times for intact LTR-RTs in the E. ulmoides genome. The insertion times were calculated with the formula T = K/(2r), where T is the insertion time, K is the divergence between the two LTRs, and r is the substitution rate (synonymous mutations/site/Mya). A substitution rate of 8.25 × 10⁻⁹ per site per year was used to calculate the insertion times. b Venn diagram of shared orthologs among the five species. Each number represents a gene family number. c Phylogenetic tree of 12 species based on orthologs of single-gene families. The number at the root (28,004) represents the number of gene families related to the common ancestor. The value above each branch denotes the number of gene families gained/lost during each round of genome duplication after diversification from the common ancestor. The red number below each branch denotes the speculated divergence time of each node. Bootstrap values for all nodes are above 50%. d Density distributions of Ks for paralogous genes. The peak values are shown in insets for E. ulmoides and C. canephora. e Density distributions of Ks for paralogous genes. The peak values are shown in insets for E. ulmoides and S. lycopersicum. f Schematic representation of syntenic genes among E. ulmoides, V. vinifera and C. canephora. Gray lines in the background indicate the collinear blocks of at least twenty genes within the E. ulmoides genome and other plant genomes, while the red lines highlight the syntenic gene pairs.
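The insertion-time formula in panel a can be illustrated with a short calculation. This is a sketch under stated assumptions: the rate of 8.25 × 10⁻⁹ per site per year is taken from the caption, while the function name `ltr_insertion_time` and the example divergence value are hypothetical and do not come from the paper.

```python
# Substitution rate quoted in the caption, in substitutions per site per year.
SUBSTITUTION_RATE = 8.25e-9


def ltr_insertion_time(divergence, rate=SUBSTITUTION_RATE):
    """Estimate the insertion time in years as T = K / (2r).

    divergence: K, the sequence divergence between the two LTRs of one element.
    rate: r, the substitution rate per site per year.
    """
    if divergence < 0:
        raise ValueError("divergence must be non-negative")
    if rate <= 0:
        raise ValueError("rate must be positive")
    return divergence / (2 * rate)


# Hypothetical divergence between the two LTRs of a single element.
example_k = 0.0165
years = ltr_insertion_time(example_k)
print(f"{years / 1e6:.2f} million years")  # about 1 million years for this K
```

The factor of 2 reflects that both LTRs of an element accumulate substitutions independently after insertion, so the observed divergence grows at twice the per-lineage rate.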
Fig. 4 The E. ulmoides rubber biosynthesis pathway and expression profiles of genes involved in the pathway.
The expression level is presented as log2-transformed fragments mapped per kilobase of transcript length per million total mapped reads (log2-FPKM). ACAT acetyl-coenzyme A (CoA) C-acetyltransferase; HMGS hydroxymethylglutaryl-CoA synthase; HMGR hydroxymethylglutaryl-CoA reductase; MVK mevalonate kinase; PMK 5-phosphomevalonate kinase; MPD mevalonate pyrophosphate decarboxylase; DXS 1-deoxy-d-xylulose 5-phosphate synthase; DXR 1-deoxy-d-xylulose 5-phosphate reductoisomerase; MCT 2-C-methyl-d-erythritol 4-phosphate cytidylyltransferase; CMK 4-(cytidine 5′-diphospho)-2-C-methyl-d-erythritol kinase; MDS 2-C-methyl-d-erythritol 2,4-cyclodiphosphate synthase; HDS 4-hydroxy-3-methylbut-2-enyl diphosphate synthase; HDR 4-hydroxy-3-methylbut-2-enyl diphosphate reductase; IDI isopentenyl diphosphate isomerase; GPS geranyl diphosphate synthase; FPS farnesyl diphosphate synthase; GGPS geranylgeranyl diphosphate synthase; SRPP small rubber particle protein; Acetyl-CoA acetyl coenzyme A; Acetoacetyl-CoA 3-acetoacetyl-CoA; HMG-CoA 3-hydroxy-3-methylglutaryl-CoA; MVA mevalonate; MVA-5P mevalonate-5-phosphate; MVA-5PP mevalonate-5-diphosphate; GA-3-P glyceraldehyde 3-phosphate; DXP 1-deoxy-d-xylulose 5-phosphate; MEP 2-C-methyl-d-erythritol 4-phosphate; CME 4-(cytidine 5′-diphospho)-2-C-methyl-d-erythritol; PCME 2-phospho-4-(cytidine 5′-diphospho)-2-C-methyl-d-erythritol; CMEC 2-C-methyl-d-erythritol 2,4-cyclodiphosphate; HMED 4-hydroxy-3-methylbut-2-enyl diphosphate; IPP isopentenyl diphosphate; DMAPP dimethylallyl diphosphate; GPP geranyl diphosphate; FPP farnesyl diphosphate; GGPP geranylgeranyl diphosphate. LF leaf; CP central peel; PE peel edge; XM xylem; SD seed.
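The log2-FPKM scale used for the expression profiles can be sketched as follows. This is an illustrative calculation, not the authors' quantification pipeline: the function names and example counts are hypothetical, and the pseudocount added before the log transform is an assumption, since the caption does not say how zero values were handled.

```python
import math


def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 combines the per-kilobase (1e3) and per-million (1e6) scaling.
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def log2_fpkm(value, pseudocount=1.0):
    """log2-transform an FPKM value.

    The pseudocount is an assumption made here so that an FPKM of zero
    stays defined; the source does not state which offset, if any, was used.
    """
    return math.log2(value + pseudocount)


# Hypothetical gene: 500 fragments on a 2,000 bp transcript in a
# library of 25 million mapped fragments.
value = fpkm(500, 2000, 25_000_000)
print(f"FPKM = {value:.2f}, log2-FPKM = {log2_fpkm(value):.3f}")
```

Because FPKM normalizes for both transcript length and library size, values on this scale can be compared across the tissues shown in the heat map.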
Fig. 5 The chlorogenic acid biosynthesis pathway and expression profiles of genes involved in the pathway.
The expression level is presented as log2-transformed fragments mapped per kilobase of transcript length per million total mapped reads (log2-FPKM). L1-L3 leaf; B1-B3 bark. The enzymes involved are as follows: PAL phenylalanine ammonia-lyase; C4H cinnamate 4-hydroxylase; 4CL 4-coumaroyl-CoA ligase; HCT hydroxycinnamoyl-CoA:shikimate hydroxycinnamoyl transferase; C3′H p-coumaroyl ester 3′-hydroxylase; HQT hydroxycinnamoyl-CoA:quinate hydroxycinnamoyl transferase.