Junyang Yue1,2,3, Ran Wang1, Xiaojing Ma1, Jiayi Liu1, Xiaohui Lu4, Sambhaji Balaso Thakar3,5, Ning An2, Jia Liu6, Enhua Xia3, Yongsheng Liu1,3,7.
Abstract
Crocus sativus, which contains remarkable amounts of crocin, picrocrocin and safranal, is the source of saffron, a spice of tremendous medicinal, economic and cultural importance. Here, we present a high-quality full-length transcriptome of the sterile triploid C. sativus, generated using PacBio SMRT sequencing technology. This yields 31,755 high-confidence predictions of protein-coding genes, 50.1% of which form paralogous gene pairs. Analysis of the distribution of Ks values suggests that the current genome of C. sativus is probably the product of at least two rounds of whole-genome duplication (WGD) that occurred ~28 and ~114 million years ago (Mya), respectively. We provide evidence that the recent β WGD event had a major impact on the family expansion of secondary metabolite genes, possibly leading to enhanced accumulation of three distinct compounds: crocin, picrocrocin and safranal. Phylogenetic analysis reveals that the founding member (CCD2) of the CCD enzymes necessary for apocarotenoid biosynthesis in C. sativus might have evolved from the CCD1 family via the β WGD event. Gene expression profiling shows that CCD2 is expressed at an extremely high level in the stigma. These findings may shed light on further genomic refinement of the characteristic biosynthesis pathways and promote germplasm utilization for the improvement of saffron quality.
Keywords: Apocarotenoid biosynthesis; Comparative transcriptomics; Crocus sativus; Saffron quality; Single molecular real-time (SMRT) sequencing
Year: 2020 PMID: 32280432 PMCID: PMC7132054 DOI: 10.1016/j.csbj.2020.03.022
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
Supplementary Fig. S1. Length distribution of the obtained non-redundant sequences.
Fig. 1. Summary of sequence quality and annotation for the full-length transcriptome of C. sativus. (a) Quality assessment with the BUSCO tool, showing the proportions classified as Complete and single-copy (S, blue), Complete and duplicated (D, green), Fragmented (F, yellow) and Missing (M, red). (b) Venn diagram of the numbers of protein-coding genes annotated in the nr, UniProt, GO and Pfam databases. (c) Counts of simple sequence repeats (SSRs) in six main classes. The frequency of the top ten motifs (if any) in each SSR class is shown. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 2. Comparative analysis of genome evolution and gene families in C. sativus. (a) Venn diagram showing the shared and specific gene families among C. sativus and 11 representative plant species. Values outside parentheses are numbers of gene families; values in parentheses are the numbers of genes within those families. (b) The genes specific to C. sativus, assigned to biological process and molecular function categories according to their GO annotation. The pie chart next to each histogram bar shows the proportion of a given GO term in the specific genes relative to the proteome of C. sativus. (c) Expansion and contraction of gene families among the 12 plant species. The phylogenetic tree was constructed from 257 high-quality 1:1 single-copy orthologous genes, using A. trichopoda as the outgroup. The numerical values on each branch of the tree are the numbers of gene families undergoing gain (red) or loss (green) events. The number of gene families predicted in the most recent common ancestor (MRCA) was 19,106. The numerical values in the boxes denote the estimated divergence time of each node (Myr). (d) Whole-genome duplication events detected in C. sativus as well as in A. officinalis and A. comosus. The time of occurrence was estimated from the peak Ks value. In panels (a) and (c), species names are abbreviated with three-letter acronyms. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
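The dating step mentioned in panel (d) can be illustrated with the commonly used relation T = Ks / (2r), where r is the synonymous substitution rate per site per year. The following is a minimal sketch only: the function name `ks_to_mya` and the rate constant are assumptions chosen for illustration, and neither the rate nor the peak Ks values used by the authors are stated in this record.

```python
def ks_to_mya(ks_peak: float, subs_per_site_per_year: float) -> float:
    """Convert a peak Ks value into an approximate age in million years.

    Uses T = Ks / (2 * r); the factor of 2 reflects that both
    paralogous copies accumulate substitutions independently.
    """
    if subs_per_site_per_year <= 0:
        raise ValueError("substitution rate must be positive")
    years = ks_peak / (2.0 * subs_per_site_per_year)
    return years / 1e6


# ASSUMPTION: an illustrative rate, not the value used in the paper.
ILLUSTRATIVE_RATE = 6.5e-9

if __name__ == "__main__":
    # A hypothetical Ks peak, converted to an age in Mya.
    print(round(ks_to_mya(0.13, ILLUSTRATIVE_RATE), 2))
```

Because the estimate scales inversely with the assumed rate, any age obtained this way is only as reliable as the rate chosen for the lineage.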
Fig. 3. Evolutionary relationships and expression patterns of the key genes involved in apocarotenoid biosynthesis. (a) Biosynthetic pathway producing distinct apocarotenoids through the cleavage of zeaxanthin. CCD, UGT, ALDH and β-GS denote genes encoding carotenoid cleavage dioxygenase, UDP-glucosyl transferase, aldehyde dehydrogenase and β-glucosidase, respectively. The histogram next to each enzyme shows the distribution of the corresponding gene members identified in C. sativus, A. officinalis, Z. mays, D. carota and S. lycopersicum. Only the CCD family had relatively more members in C. sativus than in the other species. The table in the left box lists the zeaxanthin cleavage sites accessible to the CCD enzymes in the five representative species. Species names are abbreviated with three-letter acronyms. (b) Neighbor-joining (NJ) phylogenetic tree of 38 CCD proteins from the five representative plant species. Four subfamilies were grouped according to substrate preference and cleavage specificity. In C. sativus (Csa, red solid dots), 13 CCD members clustered into CCD1 and CCD4. The putative member that cleaves zeaxanthin at the 7,8/7′,8′ double bonds was identified by similarity search against CsCCD2 and proposed to be Cs3t109488 (blue pentagram). The numeric value within each red solid dot corresponds to the serial number given in panel (c). (c) Expression patterns of the 13 CCD gene members (rows) from C. sativus, based on SGS RNA-Seq reads from five different tissues (columns). The heatmap was drawn after log2 transformation of the gene expression data. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
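The log2 transformation mentioned for the heatmap in panel (c) might look like the sketch below. It is an illustration under stated assumptions: the pseudocount of 1 (to keep zero expression values defined) and the function name `log2_transform` are not taken from the source, which does not say how zeros were handled or which expression unit was used.

```python
import math
from typing import List


def log2_transform(matrix: List[List[float]], pseudocount: float = 1.0) -> List[List[float]]:
    """Return log2(value + pseudocount) for every cell of a genes-by-tissues matrix.

    ASSUMPTION: the pseudocount is a common convention, not necessarily
    what was applied to the data behind the published heatmap.
    """
    if pseudocount <= 0:
        raise ValueError("pseudocount must be positive")
    transformed = []
    for row in matrix:
        if any(value < 0 for value in row):
            raise ValueError("expression values must be non-negative")
        transformed.append([math.log2(value + pseudocount) for value in row])
    return transformed


if __name__ == "__main__":
    # Hypothetical expression values: rows are genes, columns are tissues.
    example = [[0.0, 1.0, 3.0], [7.0, 15.0, 1023.0]]
    for row in log2_transform(example):
        print(row)
```

Compressing the scale this way keeps a very highly expressed gene from saturating the color range and hiding differences among the remaining genes.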