Roberto A Barrero, Brett Chapman, Yanfang Yang, Paula Moolhuijzen, Gabriel Keeble-Gagnère, Nan Zhang, Qi Tang, Matthew I Bellgard, Deyou Qiu.
Abstract
BACKGROUND: Euphorbia fischeriana is an important medicinal plant found in Northeast China. The plant roots contain many medicinal compounds, including 12-deoxyphorbol-13-acetate, commonly known as prostratin, a phorbol ester from the tigliane diterpene series. Prostratin is a protein kinase C activator and is effective in the treatment of Human Immunodeficiency Virus (HIV) by acting as a latent HIV activator. Latent HIV is currently the biggest limitation for viral eradication. The aim of this study was to sequence, assemble and annotate the E. fischeriana transcriptome to better understand the potential biochemical pathways leading to the synthesis of prostratin and other related diterpene compounds.
Year: 2011 PMID: 22151917 PMCID: PMC3273484 DOI: 10.1186/1471-2164-12-600
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Summary of Euphorbia fischeriana transcriptome assembly
| Assembly statistics | |
|---|---|
| Total number of mate-pair reads (before trimming) | 17,502,188 |
| Total number of read base pairs (bp) | 1,312,664,100 |
| Average read length (before trimming; bp) | 75 |
| Total number of read mate-pairs (after trimming) | 17,073,322 |
| Total number of read singletons (after trimming) | 209,321 |
| Average read length (after trimming; bp) | 68 |
| Total number of ESTs | 1,884 |
| Total number of EST base pairs (bp) | 1,275,624 |
| Average EST length (bp) | 677 |
| Total number of transcripts assembled (pre-isoform filtering) | 31,454 |
| Total number of transcripts assembled (post-isoform filtering) | 18,180 |
| Average length of all transcripts (bp) | 1,122 |
| Transcripts with matches against nr (E-value ≤ 1e-05) | 15,191 (83.6%) |
| Average length of transcripts (bp) | 1,066 |
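The derived rows of the table follow from the raw counts by simple division. The short Python sketch below recomputes three of them as a consistency check; the variable and function names are illustrative and not taken from the study.

```python
# Values copied from the assembly statistics table above.
reads_before_trimming = 17_502_188
read_bases = 1_312_664_100
est_count = 1_884
est_bases = 1_275_624
transcripts_post_filter = 18_180
transcripts_with_nr_match = 15_191


def mean_length(total_bases: int, count: int) -> float:
    """Average sequence length in bp."""
    return total_bases / count


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage."""
    return 100.0 * part / whole


avg_read_len = mean_length(read_bases, reads_before_trimming)
avg_est_len = mean_length(est_bases, est_count)
matched_pct = percent(transcripts_with_nr_match, transcripts_post_filter)

# Prints the average read length, average EST length and matched percentage.
print(f"{avg_read_len:.0f} bp, {avg_est_len:.0f} bp, {matched_pct:.1f}%")
```

The results agree with the tabulated 75 bp, 677 bp and 83.6%, which also indicates that the 83.6% figure is relative to the 18,180 post-filtering transcripts.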
Statistics of functional annotation of transcripts
| Annotated Proteins | # transcripts | % Total transcripts |
|---|---|---|
| Similar to known proteins | 8,834 | 48.57% |
| Conserved hypothetical proteins | 6,356 | 34.95% |
| Hypothetical proteins (ORF ≥ 80 aa) | 819 | 4.50% |
| Subtotal | 16,009 | 88.06% |
| Putative long ncRNAs | 2,158 | 11.87% |
| tRNA genes | 5 (12)a,b | 0.06% |
| Pseudo tRNA genes | 2 | 0.01% |
| rRNA genes | 6 | 0.03% |
| Subtotal | 2,171 (20)a | 11.94% |
| Total number of assembled transcripts | 18,180 (18,187)a | 100% |
a Total number of tRNAs, including four tRNAs identified in a transcriptome assembly using a shorter k-mer size of 17 and a minimum transcript length of 100 bp.
b Transcripts EFI_002280 and EFI_003197 encode three and two tRNA genes, respectively.
Figure 1. The effect of query sequence length on the distribution of significant matches against the NCBI non-redundant (nr) peptide database. The number of transcripts with matches (cut-off E-value of 1e-05) in the NCBI peptide database (nr) is greatest among the longer assembled sequences.
Figure 2. Statistics of the homology search of transcripts against the nr peptide database. A) E-value distribution of the top BLASTx hits with a cut-off E-value of 1e-05. B) Similarity distribution of the top BLASTx hits with a cut-off E-value of 1e-05. C) Species distribution of the top BLASTx hits, shown as a percentage of the total homologous sequences with an E-value less than or equal to 1e-05.
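To make the cut-off concrete, the sketch below keeps the best hit per query among hits whose E-value does not exceed 1e-05. It assumes BLAST tabular output (12 tab-separated columns with the E-value in the eleventh); the source does not state which output format was used, and the query and subject names in the example are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, List

EVALUE_CUTOFF = 1e-05  # cut-off stated in the figure captions
EVALUE_COLUMN = 10     # zero-based index of the E-value in tabular output (assumed format)


def top_hits(rows: Iterable[List[str]], cutoff: float = EVALUE_CUTOFF) -> Dict[str, List[str]]:
    """Return, for each query, the hit with the lowest E-value that is <= cutoff."""
    best: Dict[str, List[str]] = {}
    for row in rows:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query = row[0]
        evalue = float(row[EVALUE_COLUMN])
        if evalue > cutoff:
            continue  # not significant at the chosen cut-off
        if query not in best or evalue < float(best[query][EVALUE_COLUMN]):
            best[query] = row
    return best


# Hypothetical hits, for illustration only (not data from the study).
example = "\n".join([
    "tx1\tsubjA\t85.0\t200\t30\t0\t1\t600\t1\t200\t1e-50\t180",
    "tx1\tsubjB\t60.0\t150\t60\t0\t1\t450\t1\t150\t1e-08\t60",
    "tx2\tsubjC\t40.0\t50\t30\t0\t1\t150\t1\t50\t0.002\t30",
    "tx3\tsubjD\t70.0\t90\t27\t0\t1\t270\t1\t90\t3e-06\t52",
])
hits = top_hits(csv.reader(io.StringIO(example), delimiter="\t"))
for query, row in sorted(hits.items()):
    print(query, row[1], row[EVALUE_COLUMN])
```

In this example `tx2` is dropped because its only hit has an E-value above the cut-off, and `tx1` keeps the stronger of its two hits.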
Figure 3. Frequencies and mean expression levels of transcripts matching GO terms. The percentage of transcripts matching GO terms is shown for each category as grey bars, and the normalized mean expression levels of transcripts matching each of these GO terms are shown as black diamonds.
Figure 4. Frequencies and mean expression levels of transcripts matching KEGG pathways. A) The percentage of transcripts matching KEGG pathways within each high-level category is shown as grey bars, while the normalized mean expression levels of transcripts matching each KEGG pathway are indicated as black diamonds. B) The number of transcripts for each secondary metabolite pathway is shown.
Figure 5. Euphorbiaceae comparative transcriptome analysis. The BLAST program tBLASTx was used in conjunction with OrthoMCL using a threshold E-value of 1e-20 to identify orthologous genes between E. fischeriana and related species. Sequence datasets from related species were made non-redundant using CD-HIT-EST [33]. The number of orthologous or putative species-unique gene clusters is shown for all comparisons.
Figure 6. Mean expression levels within the Terpenoid Backbone, Diterpenoid and Zeatin Biosynthesis pathways. A) Normalized mean expression levels for enzymes within the Terpenoid Backbone Biosynthesis (TBB) pathways, namely the plastidic 2-C-methyl-D-erythritol 4-phosphate (MEP) pathway, together with the Diterpenoid Biosynthesis (DB) and Zeatin Biosynthesis (ZB) pathways. B) Normalized mean expression levels for enzymes within the TBB pathways, namely the cytosolic mevalonic acid (MVA) pathway, together with the DB and ZB pathways. The number of E. fischeriana transcripts, from distinct gene clusters, matching each enzyme is shown in brackets in panels A and B. Abbreviations: AACT, acetoacetyl-coenzyme A (CoA) thiolase; CMS, 2-C-methyl-erythritol 4-phosphate cytidyl transferase; DXR, 1-deoxy-D-xylulose 5-phosphate reductoisomerase; DXS, 1-deoxy-D-xylulose-5-phosphate synthase; FPPS, farnesyl diphosphate synthase; GGPPS, geranylgeranyl diphosphate synthase; GPPS, geranyl diphosphate synthase; NA, Not Available; HMGR, 3-hydroxy-3-methylglutaryl coenzyme A (HMG-CoA) reductase; IPI, isopentenyl diphosphate isomerase; MK, mevalonate kinase; MPK, mevalonate-5-phosphate kinase; CMK, 4-(cytidine 5'-diphospho)-2-C-methyl-D-erythritol kinase; MDD, mevalonate diphosphate decarboxylase; IDS, isopentenyl diphosphate/dimethylallyl diphosphate synthase; MCS, 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase; HDS, 1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate synthase; HMGS, HMG-CoA synthase; HMG-CoA, 3S-hydroxy-3-methylglutaryl coenzyme A; DXP, 1-deoxy-D-xylulose 5-phosphate; MVA, 3R-Mevalonic acid; M5P, Mevalonate-5-phosphate; MPP, Mevalonate diphosphate; MEP, 2-C-methyl-D-erythritol 4-phosphate; CDP-ME, 4-(cytidine 5'-diphospho)-2C-methyl-D-erythritol; CDP-MEP, 4-(cytidine 5'-diphospho)-2C-methyl-D-erythritol 2-phosphate; cMEPP, 2C-methyl-D-erythritol 2,4-cyclodiphosphate; DMAPP, Dimethylallyl diphosphate; HMBPP, 1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate; IPP, isopentenyl diphosphate; G3P, Glyceraldehyde 3-phosphate; GPP, geranyl diphosphate; GGPP, geranylgeranyl diphosphate; FPP, farnesyl diphosphate; ent-KSA, ent-Kaurene synthase A; ent-KSB, ent-Kaurene synthase B; ent-Kox, ent-Kaurene oxidase; CS, Casbene synthase; tRNA-DMAT, tRNA Dimethylallyltransferase; cis-ZOG, cis-Zeatin O-beta-D-glucosyltransferase; ent-CPP, ent-Copalyl diphosphate; ent-K, ent-Kaurene; UDP, Uridine 5'-diphosphate.
Real time PCR primers used for expression validations of selected enzymes.
| Accession | Enzyme | 5'-Forward Primer Sequence-3' | 5'-Reverse Primer Sequence-3' |
|---|---|---|---|
| EFI_002990 | AACT-1 | ACTATGCTTGCAGCCCAAAG | ATTTCCCATGCCAACATCAT |
| EFI_012483 | AACT-2 | ACAATGCTTGCTGCACAGAC | TCTCCACAAACTCCCATTCC |
| EFI_015339 | CS-1 | GGAGAGCTATTTTTGGGCAGT | CGACTTGAGCAAATGAGTCGT |
| EFI_018002 | CS-2 | GCAATTGATCCATCAGCAAG | AAGCAAAACAACTCTGGCAAT |
| EFI_007143 | DXS-1 | CGCACTAAATTTTGGGTTGC | CAAATCCCTTGGAATTGGTG |
| EFI_010574 | DXS-2 | GCTGCAAAAAGCATCACAAA | GGAGCTGGCATTGCTTTTAC |
| EFI_003135 | DXS-3 | TTTGCAACAAGTGGCATCTC | ATAGCCAAAGCCTCCACAAA |
| EFI_010535 | GGPPS-1 | CAAAAAGCTTCGCAATTCCT | GATTTTTGCGGGTTCTCTGA |
| EFI_010585 | GGPPS-2 | ACTTGCAGCCGTTTGTTTCT | ATCAGCAACGAGGGAAAATG |
| EFI_008533 | GGPPS-3 | ATTGTTAGCGGGTGCTGAAG | CAGCTCTTCCGCCATTTCTA |
| EFI_016937 | GGPPS | TCAATTCGCTGTTCTGCTTC | CCCTTAGAAAGGGCGGAGTA |
| EFI_001905 | IDS/HDR | CCACAGACGACTCTGCTTCA | GGTGTGCTCATTCCCATTTT |
| EFI_000087 | HDS | GTTTGGGCGATACAATCAGG | ATCCACCTCTTCACCCTCCT |
| EFI_000846 | HMGR-1 | CTCCACCGCAAAACCTCTTA | AACGACATGGAGAGGAGTGG |
| EFI_011656 | HMGR-2 | CAGTGCTGTGAAATGCCTGT | AGCTCTTGTCATGCCATCCT |
| EFI_001179 | MDD | GAGACATGGGTGAGGATGGT | CCTCCCCATTAAGCCACATA |
| EFI_014705 | Actin-1 | GGGAACGAGTCCCTGGTAGT | CTGCGTTGGTGGTCTTACCT |
| EFI_006680 | Actin-2 | AAATATGGCCGACAGTGAGG | ATACCTCGCTTGGACTGAGC |
| EFI_000139 | α-tubulin | GGCAACTTTTCCATCCTGAG | TCCAAGACCAGAACCAGTCC |
| EFI_014153 | β-tubulin-1 | AAGCAGGTCAATGTGGGAAC | CATTGTCCCTGGTTCAAGGT |
| EFI_014578 | β-tubulin-2 | GCTTGCAAGGTTTCCAGGTA | TTTTCCACAAGCTGATGCAC |
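A quick way to sanity-check primers such as these is to compute their length, GC content and a rough melting temperature. The sketch below does this for two rows of the table using the Wallace rule (2 °C per A/T, 4 °C per G/C); this rule is a generic approximation for short oligonucleotides and is an assumption here, not the method the authors used to design or evaluate the primers.

```python
from typing import Dict, Tuple


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a primer sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature (°C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    gc = sum(base in "GC" for base in seq)
    at = sum(base in "AT" for base in seq)
    return 2 * at + 4 * gc


# Forward and reverse primers copied from the table above.
primers: Dict[str, Tuple[str, str]] = {
    "AACT-1": ("ACTATGCTTGCAGCCCAAAG", "ATTTCCCATGCCAACATCAT"),
    "HDS": ("GTTTGGGCGATACAATCAGG", "ATCCACCTCTTCACCCTCCT"),
}

for enzyme, (forward, reverse) in primers.items():
    for label, seq in (("forward", forward), ("reverse", reverse)):
        print(f"{enzyme} {label}: {len(seq)} nt, "
              f"GC {gc_percent(seq):.0f}%, Tm ~{wallace_tm(seq)} °C")
```

For AACT-1, for instance, this gives 50% GC and about 60 °C for the forward primer and 40% GC and about 56 °C for the reverse primer.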