Mi Jin Jeon1, Neha Samir Roy2, Beom-Soon Choi3, Ji Yeon Oh1, Yong-In Kim4, Hye Yoon Park5, Taeyoung Um2, Nam-Soo Kim3, Soonok Kim1, Ik-Young Choi2,3,6.
Abstract
The annual herb Euphorbia maculata L. produces anti-inflammatory and other biologically active substances, such as triterpenoids, tannins, and polyphenols, and is used in traditional Chinese medicine. Among these bioactive compounds, terpenoids (also called isoprenoids) are the major secondary metabolites in E. maculata. Using PacBio SMRT sequencing technology, we carried out full-length cDNA sequencing to characterize the transcripts of reference genes for terpenoid biosynthesis and to determine the copy numbers of their isoforms. The Illumina short-read sequencing platform was also used to identify differentially expressed genes (DEGs) in the secondary metabolite pathways of leaves, roots, and stems. PacBio generated 62 Gb of polymerase reads (about 1.15 million reads), yielding 81,433 high-quality reads. From these high-quality reads, we reconstructed a reference genome of 20,722 genes, of which 20,246 (97.8%) had no paralogs. About 33% of the identified genes had two or more isoforms. DEG analysis showed that expression levels differed among gene paralogs in the leaf, stem, and root. Complete sets of paralogs and isoforms were identified in the mevalonic acid (MVA), methylerythritol phosphate (MEP), and terpenoid biosynthesis pathways of E. maculata L. This nucleotide information will be useful for identifying orthologous genes in other terpenoid-producing medicinal plants.
Keywords: Euphorbia maculata L.; MEP pathway; MVA pathway; PacBio SMRT sequencing; medicinal plant; terpenoids; transcriptomes
Year: 2022 PMID: 35889464 PMCID: PMC9316252 DOI: 10.3390/molecules27144591
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.927
Figure 1. Schematic representation of the full-length cDNA analysis in E. maculata.
Summary of PacBio RNA-seq data from the two size-fractionated RNA libraries (under 4 kb and over 4 kb) of E. maculata.
| Analysis metric | Under 4 kb | Over 4 kb |
|---|---|---|
| Total polymerase read length (bp) | 31,143,923,142 | 31,036,246,900 |
| Total polymerase reads | 548,527 | 601,659 |
| Average polymerase read length (bp) | 56,777 | 51,584 |
| Total subreads | 18,525,814 | 8,597,836 |
| Subread N50 (bp) | 2504 | 3893 |
| Average subread length (bp) | 1630 | 3739 |
| Total CCS reads | 467,479 | 465,085 |
| Total CCS read length (bp) | 1,155,280,061 | 1,879,756,017 |
| Average CCS read length (bp) | 2471 | 4040 |
| Number of polished high-quality isoforms | 47,860 | 33,573 |
| Number of polished low-quality isoforms | 405 | 993 |
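The table above reports N50 values and average read lengths. As an illustration only, the sketch below shows how these two summary statistics are conventionally computed from a list of read lengths; the toy lengths are made up and are not data from this study. The last line checks that total polymerase read length divided by the number of polymerase reads reproduces the table's average for the under-4 kb library.

```python
from typing import Sequence


def average_length(lengths: Sequence[int]) -> float:
    """Mean read length: total bases divided by the number of reads."""
    if not lengths:
        raise ValueError("no reads")
    return sum(lengths) / len(lengths)


def n50(lengths: Sequence[int]) -> int:
    """Length L such that reads of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("no reads")
    total = sum(lengths)
    running = 0
    # Walk from the longest read down, accumulating bases until half is reached.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    toy_reads = [500, 1000, 1500, 2000, 5000]  # hypothetical read lengths (bp)
    print(average_length(toy_reads))  # 2000.0
    print(n50(toy_reads))  # 5000
    # Under-4 kb library: total polymerase bases / polymerase reads.
    print(round(31_143_923_142 / 548_527))  # 56777
```

Note that N50 weights reads by their length, so a few long reads can push it well above the mean, as in the toy data.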
IsoSeq results and statistics of isoforms in the transcriptomes of E. maculata.
| Iso-Seq result | Number of reads | Length (bp) |
|---|---|---|
| High-quality consensus sequences | 76,631 | 216,086,311 |
| Reconstructed coding contigs | 19,902 | 60,494,776 |
| Unassigned sequences | 3344 | 10,608,597 |
| Fake genome | 20,722 | 71,103,373 |

| Read length statistic | Length (bp) |
|---|---|
| Minimum read length | 100 |
| Maximum read length | 13,544 |
| Average read length | 3059 |

| Number of isoforms | Number of transcripts | Percentage (%) |
|---|---|---|
| 1 | 13,492 | 66.9 |
| 2 | 3946 | 19.6 |
| 3 | 1269 | 6.3 |
| 4 | 630 | 3.1 |
| 5 | 381 | 1.9 |
| 6 | 185 | 0.9 |
| 7 | 116 | 0.6 |
| 8–25 | 153 | 0.8 |
| Total | 20,172 | 100 |
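As a quick consistency check, the percentages in the isoform table, and the abstract's statement that about 33% of genes had two or more isoforms, can be recomputed from the counts. This is a small sketch; the counts are copied from the table, while the function names are illustrative.

```python
from typing import Dict

# Transcripts binned by number of isoforms, copied from the isoform table.
ISOFORM_COUNTS: Dict[str, int] = {
    "1": 13_492,
    "2": 3_946,
    "3": 1_269,
    "4": 630,
    "5": 381,
    "6": 185,
    "7": 116,
    "8-25": 153,
}


def percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Share of each bin as a percentage of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


def share_multi_isoform(counts: Dict[str, int]) -> float:
    """Percentage of transcripts (genes) with two or more isoforms."""
    total = sum(counts.values())
    return round(100 * (total - counts["1"]) / total, 1)


if __name__ == "__main__":
    print(sum(ISOFORM_COUNTS.values()))  # 20172
    print(percentages(ISOFORM_COUNTS)["1"])  # 66.9
    print(share_multi_isoform(ISOFORM_COUNTS))  # 33.1
```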
Figure 2. Length distribution of the transcripts after de novo assembly.
Distribution of the number of paralogs in the E. maculata transcriptome.
| Number of Paralogs | Number of Transcripts |
|---|---|
| 1 | 20,246 |
| 2 | 84 |
| 3 | 14 |
| 4 | 18 |
| 5–20 | 27 |
Figure 3. Paralogs and isoforms. (A) DOXP had three paralogs: DOXP.para1, DOXP.para2, and DOXP.para3. DOXP.para1 had three isoforms with different translation termination sites. DOXP.para3 had two isoforms arising from alternative splicing and from differences in translation initiation and termination sites. (B) PB84.1 is a tRNA ligase gene. It had no paralogs but had 10 isoforms, which differed in alternative splicing and in translation initiation and termination sites.
Figure 4. GO analysis of the E. maculata transcripts.
Mapping statistics of the Illumina sequence reads and numbers of differentially expressed genes.
| Mapping information | Leaf | Root | Stem |
|---|---|---|---|
| No. of total reads | 25,971,888 | 29,095,594 | 26,009,774 |
| No. of mapped paired-end reads | 18,411,506 | 17,458,816 | 16,843,542 |
| Mapped paired-end reads (%) | 70.9 | 60.0 | 64.8 |
| No. of genes with expression = 0 | 2987 | 3642 | 2714 |
| No. of genes with expression > 0 | 17,735 | 17,260 | 18,008 |

| Differential expression | Leaf vs. Root | Root vs. Stem | Leaf vs. Stem |
|---|---|---|---|
| Up | 447 | 1049 | 87 |
| Down | 1660 | 177 | 266 |
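The mapping percentages in the table are simply mapped paired-end reads divided by total reads. The sketch below recomputes them from the table's counts; the dictionary layout and function name are illustrative choices, not part of the original analysis.

```python
from typing import Dict, Tuple

# (total reads, mapped paired-end reads) per organ, copied from the mapping table.
MAPPING: Dict[str, Tuple[int, int]] = {
    "Leaf": (25_971_888, 18_411_506),
    "Root": (29_095_594, 17_458_816),
    "Stem": (26_009_774, 16_843_542),
}


def mapped_percent(total_reads: int, mapped_reads: int) -> float:
    """Percentage of reads that mapped, rounded to one decimal place."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return round(100 * mapped_reads / total_reads, 1)


if __name__ == "__main__":
    rates = {organ: mapped_percent(*counts) for organ, counts in MAPPING.items()}
    print(rates)  # {'Leaf': 70.9, 'Root': 60.0, 'Stem': 64.8}
```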
Figure 5. Venn diagram showing the numbers of unigenes expressed in the three organs (leaf, root, and stem).
Figure 6. GO analysis of unigenes with organ-specific expression.
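The Venn diagram and the organ-specific GO analysis rest on set operations over per-organ lists of expressed genes. The sketch below assumes that a gene counts as expressed in an organ when its value is greater than 0 (the cutoff used in the expressed-gene rows of the mapping table); the gene IDs and values are invented for illustration, and the paper's exact criteria may differ.

```python
from typing import Dict, Set

# Hypothetical expression values per organ; gene IDs are made up.
EXPRESSION: Dict[str, Dict[str, float]] = {
    "leaf": {"g1": 5.2, "g2": 0.0, "g3": 1.1, "g4": 0.0},
    "root": {"g1": 3.0, "g2": 7.5, "g3": 0.0, "g4": 0.0},
    "stem": {"g1": 2.4, "g2": 0.0, "g3": 0.8, "g4": 4.0},
}


def expressed(organ_values: Dict[str, float]) -> Set[str]:
    """Genes counted as expressed in one organ (assumed cutoff: value > 0)."""
    return {gene for gene, value in organ_values.items() if value > 0}


def organ_specific(expression: Dict[str, Dict[str, float]]) -> Dict[str, Set[str]]:
    """Genes expressed in exactly one organ: its set minus the union of the others."""
    sets = {organ: expressed(values) for organ, values in expression.items()}
    result: Dict[str, Set[str]] = {}
    for organ, genes in sets.items():
        others = set().union(*(s for o, s in sets.items() if o != organ))
        result[organ] = genes - others
    return result


def shared_by_all(expression: Dict[str, Dict[str, float]]) -> Set[str]:
    """Genes expressed in every organ (the centre of the Venn diagram)."""
    return set.intersection(*(expressed(v) for v in expression.values()))


if __name__ == "__main__":
    print(organ_specific(EXPRESSION))  # {'leaf': set(), 'root': {'g2'}, 'stem': {'g4'}}
    print(shared_by_all(EXPRESSION))  # {'g1'}
```

Each Venn region is one such set difference or intersection; the organ-specific sets are what a downstream GO enrichment would take as input.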
Enzymes involved in the biosynthesis of terpenoids, isopentenyl diphosphate, and dimethylallyl diphosphate.
| Enzyme | Abbreviation | No. of paralogs | Range of isoforms |
|---|---|---|---|
| Acetate-mevalonate (MVA) pathway | | | |
| Acetoacetyl-CoA thiolase | AAC thiolase | 1 | 1 |
| 3-Hydroxy-3-methylglutaryl-CoA synthase | HMG-CoA synthase | 3 | 1 |
| 3-Hydroxy-3-methylglutaryl-CoA reductase | HMG-CoA reductase | 5 | 1–3 |
| Mevalonate kinase | MVA kinase | 1 | 1 |
| Mevalonate phosphate kinase | MVAP kinase | 2 | 1–2 |
| Mevalonate diphosphate decarboxylase | MVAPP decarboxylase | 2 | 1–2 |
| Non-mevalonate (MEP) pathway | | | |
| 1-Deoxy-D-xylulose-5-phosphate synthase | DOXP synthase | 2 | 1–3 |
| 1-Deoxy-D-xylulose-5-phosphate reductoisomerase | DOXP reductoisomerase | 3 | 1–3 |
| Cytidine diphosphate 2-C-methyl-D-erythritol synthase | CDP-ME synthase | 2 | 1 |
| Cytidine diphosphate 2-C-methyl-D-erythritol kinase | CDP-ME kinase | 1 | 1 |
| 2-C-Methyl-D-erythritol 2,4-cyclodiphosphate synthase | MECP synthase | 4 | 1 |
| 1-Hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate synthase | HMBPP synthase | 2 | 2 |
| IPP/DMAPP synthase | IspH | 2 | 1 |
| Terpenoid synthesis | | | |
| Isopentenyl-diphosphate delta-isomerase | IDI | 2 | 1–2 |
| Geranyl diphosphate synthase | GPP synthase | 2 | 1 |
| Farnesyl diphosphate synthase | FPP synthase | 1 | 2 |
| Geranylgeranyl diphosphate synthase | GGPP synthase | 2 | 1 |
| Monoterpene synthase | Monoterpene synthase | 2 | 1 |
| Sesquiterpene synthase | Sesquiterpene synthase | 1 | 1 |
| Diterpene synthase | ent-Kaurene synthase | 1 | 1 |
| Squalene synthase | Squalene synthase | 2 | 1 |
| Triterpene synthase | Triterpene synthase | 3 | 1 |
Figure 7. Biochemical pathways of (a) the MVA and MEP pathways and (b) terpenoid biosynthesis. The numbers in parentheses are the numbers of genes in the E. maculata transcriptomes. The numbers in the heat maps are FPKM-normalized values.
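The heat maps report FPKM (fragments per kilobase of transcript per million mapped fragments) values. The standard definition divides a gene's fragment count by its length in kilobases and by the library's mapped fragments in millions; the sketch below applies that definition to toy numbers. Which counts and transcript lengths the authors used is not stated here, so the inputs are assumptions.

```python
def fpkm(fragments: int, gene_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and library size must be positive")
    per_kilobase = fragments / (gene_length_bp / 1_000)
    return per_kilobase / (total_mapped_fragments / 1_000_000)


if __name__ == "__main__":
    # Hypothetical gene: 500 fragments on a 2 kb transcript, 10 million mapped fragments.
    print(fpkm(500, 2_000, 10_000_000))  # 25.0
    # Doubling the transcript length halves the value for the same fragment count.
    print(fpkm(500, 4_000, 10_000_000))  # 12.5
```

Normalizing by both length and library depth is what lets values from the leaf, root, and stem libraries, which differ in read counts, sit side by side in one heat map.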