Chi Song, Fangfang Fu, Lulu Yang, Yan Niu, Zhaoyang Tian, Xiangxiang He, Xiaoming Yang, Jie Chen, Wei Sun, Tao Wan, Han Zhang, Yicheng Yang, Tian Xiao, Komivi Dossa, Xiangxiao Meng, Fuliang Cao, Yves Van de Peer, Guibin Wang, Shilin Chen.
Abstract
Taxol, a natural product derived from Taxus, is one of the most effective natural anticancer drugs, and its biosynthetic pathway is the basis of heterologous bio-production. Here, we report a high-quality genome assembly and annotation of Taxus yunnanensis based on 10.7 Gb of sequence assembled into 12 chromosomes, with contig N50 and scaffold N50 of 2.89 Mb and 966.80 Mb, respectively. Phylogenomic analyses show that T. yunnanensis is most closely related to Sequoiadendron giganteum among the sampled taxa, with an estimated divergence time of 133.4-213.0 MYA. As with most gymnosperms, and unlike most angiosperms, there is no evidence of a recent whole-genome duplication in T. yunnanensis. Repetitive sequences, especially long terminal repeat retrotransposons, are prevalent in the T. yunnanensis genome and contribute to its large size. We further integrated genomic and transcriptomic data to unveil clusters of genes involved in Taxol synthesis, located on chromosome 12, while gene families encoding hydroxylases in the Taxol pathway exhibited significant expansion. Our study contributes to the further elucidation of gymnosperm relationships and the Taxol biosynthetic pathway.
Year: 2021 PMID: 34671091 PMCID: PMC8528922 DOI: 10.1038/s42003-021-02697-8
Source DB: PubMed Journal: Commun Biol ISSN: 2399-3642
Assembly and annotation statistics of the draft genome of T. yunnanensis.
| Assembly feature | Value |
|---|---|
| Total length of scaffolds (bp) | 10,738,316,084 |
| Longest scaffold (bp) | 1,071,627,631 |
| N50 of scaffold (bp) | 966,801,426 |
| Total length of contigs (bp) | 10,737,203,084 |
| Longest contig (bp) | 22,834,067 |
| N50 of contig (bp) | 2,892,145 |
| GC ratio (%) | 36.91 |
| Total number of contigs | 11,280 |
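The N50 rows above follow the usual assembly-statistics convention: sort the sequences from longest to shortest and report the length of the sequence at which the running total first reaches half of the total assembly length. The following is a minimal sketch of that calculation, not the authors' pipeline; the function name `n50` and the toy lengths are illustrative assumptions, not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp).

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths (made up for illustration only).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 70: 80 + 70 = 150 reaches half of the 300 total
```

Applied to the real contig and scaffold length lists of an assembly, the same function would yield values comparable to the N50 rows in the table.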
Fig. 1 Distribution of T. yunnanensis genomic features.
a Circular representation of the 12 pseudochromosomes, b gene density (5 Mb window), c percentage of repeats (5 Mb window), d GC content (5 Mb window), and intragenomic syntenic regions, where each line denotes a syntenic region covering at least five paralogues.
Fig. 2 Genome evolution of T. yunnanensis.
a Inferred phylogenetic tree with 588 single-copy gene families in 14 plant species. Gene family expansions are indicated in green, and gene family contractions are indicated in red. Blue bars at nodes represent divergence times estimated by Maximum Likelihood (PAML). b Shared and unique gene families in four species. c Synonymous substitutions per synonymous site (Ks) distributions of orthologous (and paralogous) genes between T. yunnanensis and G. montanum, G. biloba, P. abies, S. giganteum and A. trichopoda.
Fig. 3 Genes involved in the Taxol biosynthetic pathway.
a Transcriptomic analysis of genes involved in the Taxol biosynthetic pathway. FPKM was calculated to evaluate the expression level of each gene. T1βOH and T9αOH represent the enzymes responsible for C-1 and C-9 oxidation, which are currently unverified and are presumed to belong to the CYP450 gene family. b Phylogenetic tree of the CYP725A gene sub-family in T. yunnanensis, S. giganteum, P. menziesii and G. biloba. Genes from the four plants are labeled in different colors: blue, T. yunnanensis; pink, P. menziesii; red, S. giganteum; green, G. biloba. c Arrangement and chromosomal positions of three Taxol gene clusters on chromosome 12 (chr12). d Heat maps of gene expression of CYP725A genes located on chromosome 12. The average expression profiles of three replicates of different tissues of T. yunnanensis were used to make the heat map. The color scale represents log2-transformed FPKM (expected number of fragments per kilobase of transcript sequence per million base pairs sequenced) values. The color gradient indicates expression level, with white indicating low transcript abundance and red indicating high transcript abundance.
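For readers unfamiliar with the expression values behind the heat map, the sketch below shows how a log2-transformed FPKM value could be derived for one gene. It assumes the commonly used FPKM definition (fragments per kilobase of transcript per million mapped fragments), a pseudocount of 1 to avoid taking the log of zero, and averaging of replicates before the log transform. None of these details are stated in the caption, and all function names are hypothetical.

```python
import math


def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """FPKM under the common definition (an assumption here):
    fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bp per kilobase) * 1e6 (fragments per million)
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def log2_fpkm(value, pseudocount=1.0):
    """log2 transform; the pseudocount of 1 is an assumption, not from the source."""
    return math.log2(value + pseudocount)


def mean_log2_fpkm(replicate_fpkms, pseudocount=1.0):
    """Average replicate FPKM values, then log2-transform the mean.
    Averaging before the transform is an assumed order of operations."""
    if not replicate_fpkms:
        raise ValueError("need at least one replicate")
    mean = sum(replicate_fpkms) / len(replicate_fpkms)
    return log2_fpkm(mean, pseudocount)


# Made-up numbers for illustration only.
value = fpkm(fragments=500, transcript_length_bp=2000,
             total_mapped_fragments=20_000_000)
print(value)                            # 12.5
print(mean_log2_fpkm([1.0, 3.0, 5.0]))  # 2.0, i.e. log2(3 + 1)
```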