Miao He, Ying Wang, Wenping Hua, Yuan Zhang, Zhezhi Wang.
Abstract
BACKGROUND: Hypericum perforatum L. (St. John's wort) is a medicinal plant with antidepressant, anti-inflammatory, antiviral, anti-cancer, and antibacterial properties. Its major active metabolites are hypericins, hyperforins, and melatonin. However, little genetic information is available for this species, especially concerning the biosynthetic pathways of its active ingredients. METHODOLOGY/PRINCIPAL
Year: 2012 PMID: 22860059 PMCID: PMC3408400 DOI: 10.1371/journal.pone.0042081
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Summary statistics of the sequence assembly generated from Hypericum perforatum.
| Total number of reads | 24,429,306 |
| Total nucleotides (nt) | 2,198,637,540 |
| GC percentage | 50.45% |
| Q20 percentage | 94.62% |
| Total number of contigs | 192,465 |
| Average sequence size of contigs (bp) | 204 |
| Total number of scaffolds | 115,587 |
| Average sequence size of scaffolds (bp) | 298 |
| Total number of unigenes | 59,184 |
| Total nucleotides (nt) in unigenes | 24,986,432 |
| Average sequence size of unigenes (bp) | 422 |
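The averages in this table follow directly from the totals. As a minimal sketch (the variable names are illustrative and not taken from the paper), the mean read length and the mean unigene length can be recomputed from the reported counts:

```python
# Values copied from the assembly summary table above.
total_reads = 24_429_306
total_read_nt = 2_198_637_540
total_unigenes = 59_184
total_unigene_nt = 24_986_432

# Mean length = total nucleotides / number of sequences.
mean_read_length = total_read_nt / total_reads
mean_unigene_length = total_unigene_nt / total_unigenes

print(f"Mean read length: {mean_read_length:.1f} nt")
print(f"Mean unigene length: {mean_unigene_length:.0f} bp")
```

The unigene figure rounds to the 422 bp listed in the table.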
Figure 1. Distributions of lengths (A) and gaps (B) for unigenes from Hypericum perforatum.
Summary statistics of functional annotation for Hypericum perforatum unigenes in public protein databases.
| Public protein database | Number of unigene hits | Percentage (%) |
| | 40,551 | 68.52 |
| | 26,657 | 45.04 |
| COG | 11,209 | 18.94 |
| | 20,548 | 34.72 |
| | 40,813 | 68.86 |
Figure 2. COG and GO classifications of unigenes derived via Solexa sequencing in Hypericum perforatum.
(A) COG functional classification of the transcriptome. A total of 11,209 unigenes showing significant homology to the COG database at NCBI (E-value ≤1.0e−5) were classified among 24 COG categories. (B) H. perforatum unigenes with GO annotations based on Arabidopsis protein hits from NR. Right y-axis, percentage of genes; left y-axis, number of genes.
Figure 3. Unigenes from Hypericum perforatum related to secondary metabolic pathways.
Figure 4. Putative biosynthesis pathways for hypericin (A), hyperforin (B), and melatonin (C) in Hypericum perforatum.
Dashed box within (B) occurs in animals. Hyp-1, Hypericum perforatum phenolic oxidative coupling protein; MEP pathway, non-mevalonate pathway; MAT, dimethylallyltranstransferase; AS I, anthranilate synthase I; AS II, anthranilate synthase II; PAT, phosphoribosylanthranilate transferase; PAI, phosphoribosylanthranilate isomerase; IGPS, indole-3-glycerol phosphate synthase; TSA, tryptophan synthase alpha chain; TSB, tryptophan synthase beta chain; TDC, tryptophan decarboxylase; TPH, tryptophan hydroxylase; ORCA3, octadecanoid-derivative responsive Catharanthus AP2-domain protein 3; DXS, 1-deoxy-D-xylulose 5-phosphate synthase; DXR, 1-deoxy-D-xylulose 5-phosphate reductoisomerase; CMS, 4-diphosphocytidyl-2-C-methyl-D-erythritol synthase; CMK, 4-(cytidine 5′-diphospho)-2-C-methyl-D-erythritol kinase; MCS, 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase; HDS, (E)-4-hydroxy-3-methylbut-2-enyl-diphosphate synthase; IDS, isopentenyl-diphosphate:NAD(P)+ oxidoreductase; IPI, isopentenylpyrophosphate isomerase; GPPS, geranylgeranyl pyrophosphate synthase; DMAPP, dimethylallyl diphosphate; IPP, isopentenyl diphosphate; GPP, geranyl diphosphate; MEP, 2-C-methyl-D-erythritol-4-phosphate; CDP-ME, 4-(cytidine-5′-diphospho)-2-C-methyl-D-erythritol; CDP-MEP, 2-phospho-4-(cytidine-5′-diphospho)-2-C-methyl-D-erythritol; Me-cPP, 2-C-methyl-D-erythritol-2,4-cyclodiphosphate; HMBPP, 1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate.
Putative unigenes related to the biosynthesis of hyperforin, hypericin, and melatonin.
| Pathway | Enzyme | NU | MNCG |
| Type III PKSs | MAT | 91 | 50 |
| | Hyp-1 | 12 | 6 |
| MEP pathway | DXS | 13 | 10 |
| | DXR | 2 | 2 |
| | CMS | 2 | 2 |
| | CMK | 2 | 1 |
| | MCS | 2 | 2 |
| | HDS | 4 | 1 |
| | IDS | 2 | 2 |
| | IPI | 2 | 2 |
| | GPPS | 62 | 49 |
| Chorismate pathway | AS | 12 | 7 |
| | PAT | 12 | 7 |
| | PAI | 2 | 2 |
| | IGPS | 2 | 1 |
| | TSA | 2 | 1 |
| | TSB | 8 | 5 |
| Tryptophan metabolism | TDC | 11 | 11 |
| | TPH | 17 | 15 |
| Total | | 260 | 126 |
NU, number of unigenes; MNCG, maximum number of coding genes. MAT, dimethylallyltranstransferase; Hyp-1, Hypericum perforatum phenolic oxidative coupling protein; DXS, 1-deoxy-D-xylulose 5-phosphate synthase; DXR, 1-deoxy-D-xylulose 5-phosphate reductoisomerase; CMS, 4-diphosphocytidyl-2-C-methyl-D-erythritol synthase; CMK, 4-(cytidine 5′-diphospho)-2-C-methyl-D-erythritol kinase; MCS, 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase; HDS, (E)-4-hydroxy-3-methylbut-2-enyl-diphosphate synthase; IDS, isopentenyl-diphosphate:NAD(P)+ oxidoreductase; IPI, isopentenylpyrophosphate isomerase; GPPS, geranylgeranyl pyrophosphate synthase; AS, anthranilate synthase; PAT, phosphoribosylanthranilate transferase; PAI, phosphoribosylanthranilate isomerase; IGPS, indole-3-glycerol phosphate synthase; TSA, tryptophan synthase alpha chain; TSB, tryptophan synthase beta chain; TDC, tryptophan decarboxylase; TPH, tryptophan hydroxylase.
Putative transcription factors encoding unigenes in Hypericum perforatum.
| TF Family | NU | NATHB | NTGAD | Percentage(%) |
| NAC | 212 | 43 | 96 | 44.79 |
| C2H2 | 189 | 89 | 211 | 42.18 |
| AP2-EREBP | 144 | 53 | 138 | 38.40 |
| C3H | 132 | 52 | 165 | 31.52 |
| Homeobox | 120 | 55 | 102 | 53.92 |
| bHLH | 106 | 56 | 161 | 34.78 |
| MYB | 95 | 51 | 208 | 24.52 |
| MADS | 86 | 27 | 111 | 24.32 |
| WRKY | 77 | 34 | 72 | 47.22 |
| bZIP | 52 | 31 | 73 | 42.47 |
| GRAS | 52 | 15 | 33 | 45.45 |
| G2-like | 43 | 18 | 40 | 45.00 |
| Trihelix | 39 | 19 | 29 | 65.52 |
| ARF | 33 | 11 | 24 | 45.83 |
| Other | 392 | 190 | 452 | 42.04 |
NU, number of unigenes; NATHB, number of Arabidopsis TF genes hit by Blast; NTGAD, number of TF genes in AGRIS database; percentage = NATHB/NTGAD.
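The percentage column can be reproduced from the footnote's definition (NATHB/NTGAD). The following is a small sketch using four rows copied from the table; the function name `coverage_percent` is illustrative and does not come from the paper:

```python
# (NATHB, NTGAD) pairs copied from the transcription factor table above.
tf_rows = {
    "NAC": (43, 96),
    "Homeobox": (55, 102),
    "MYB": (51, 208),
    "Trihelix": (19, 29),
}


def coverage_percent(nathb: int, ntgad: int) -> float:
    """Share of AGRIS TF genes in a family hit by Blast, in percent."""
    return round(100 * nathb / ntgad, 2)


for family, (nathb, ntgad) in tf_rows.items():
    print(f"{family}: {coverage_percent(nathb, ntgad):.2f}%")
```

Note that the percentage describes how many Arabidopsis TF genes in each family were hit, not how many unigenes (NU) were found.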
Figure 5. Expression patterns of some novel transcripts related to secondary metabolism in Hypericum perforatum.
Error bars indicate standard deviation.
Number of SSRs in Hypericum perforatum.
| Repeat motif | Repeat numbers | | | | | | | | Number of SSRs | Percent (%) |
| | 3 | 4 | 5 | 6 | 7 | 8 | 9 | >9 | | |
| Mono- | – | – | – | – | – | – | – | 1317 | 1317 | 17.23 |
| Di- | – | – | 1101 | 405 | 245 | 216 | 200 | 852 | 3019 | 39.50 |
| Tri- | – | – | 1117 | 407 | 198 | 83 | 50 | 135 | 1990 | 26.04 |
| Tetra- | – | 181 | 49 | 13 | 5 | 1 | 0 | 0 | 249 | 3.26 |
| Penta- | 377 | 76 | 23 | 3 | 1 | 2 | 0 | 0 | 482 | 6.31 |
| Hexa- | 453 | 89 | 31 | 13 | 0 | 0 | 0 | 0 | 586 | 7.67 |
| Total | 830 | 346 | 2321 | 841 | 449 | 302 | 250 | 987 | 7643 | |
| Percent (%) | 10.86 | 4.53 | 30.37 | 11.00 | 5.87 | 3.95 | 3.27 | 12.91 | | |
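The per-motif totals and percentages in this table can be checked with a few lines of arithmetic. This is a minimal sketch that assumes each dash in the table stands for a count of zero; the dictionary layout and names are illustrative:

```python
# Counts per repeat number (3, 4, 5, 6, 7, 8, 9, >9), copied from the table.
# Assumption: a dash in the table is treated as 0.
ssr_counts = {
    "Mono-": [0, 0, 0, 0, 0, 0, 0, 1317],
    "Di-": [0, 0, 1101, 405, 245, 216, 200, 852],
    "Tri-": [0, 0, 1117, 407, 198, 83, 50, 135],
    "Tetra-": [0, 181, 49, 13, 5, 1, 0, 0],
    "Penta-": [377, 76, 23, 3, 1, 2, 0, 0],
    "Hexa-": [453, 89, 31, 13, 0, 0, 0, 0],
}

# Row totals give the "Number of SSRs" column.
row_totals = {motif: sum(counts) for motif, counts in ssr_counts.items()}
grand_total = sum(row_totals.values())

# Each motif class as a share of all SSRs, in percent.
percent = {
    motif: round(100 * total / grand_total, 2)
    for motif, total in row_totals.items()
}

for motif, total in row_totals.items():
    print(f"{motif:7s} {total:5d} {percent[motif]:6.2f}%")
print(f"Total   {grand_total:5d}")
```

The row totals and shares computed this way agree with the "Number of SSRs" and "Percent (%)" columns.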
Frequency of di- and trinucleotide EST-SSR repeat motifs in Hypericum perforatum.
| Repeat motif | Repeat numbers | | | | | | Total | Percent (%) |
| | 5 | 6 | 7 | 8 | 9 | >9 | | |
| AC/GT | 147 | 42 | 11 | 13 | 12 | 17 | 242 | 3.17 |
| AG/CT | 815 | 338 | 223 | 199 | 186 | 834 | 2595 | 33.95 |
| AT/AT | 127 | 22 | 11 | 4 | 2 | – | 166 | 2.17 |
| CG/GC | 12 | 3 | – | – | – | 1 | 16 | 0.2 |
| AAC/GTT | 83 | 32 | 17 | 8 | 1 | 7 | 148 | 1.94 |
| AAG/CTT | 289 | 122 | 60 | 34 | 22 | 98 | 625 | 8.18 |
| AAT/ATT | 48 | 17 | 12 | 1 | 2 | 1 | 81 | 1.06 |
| ACC/GGT | 80 | 34 | 14 | 2 | – | – | 130 | 1.7 |
| ACG/CGT | 54 | 17 | 3 | 1 | 1 | 1 | 77 | 1.01 |
| ACT/AGT | 17 | 4 | – | 1 | 1 | 1 | 24 | 0.31 |
| AGC/CTG | 76 | 25 | 10 | 2 | 1 | 1 | 115 | 1.5 |
| AGG/CCT | 320 | 123 | 61 | 25 | 13 | 2 | 554 | 7.25 |
| ATC/ATG | 83 | 26 | 18 | 9 | 9 | 14 | 159 | 2.08 |
| CCG/CGG | 67 | 7 | 3 | – | – | – | 77 | 1.01 |
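Labels such as AG/CT or AGC/CTG pair a motif with its equivalent on the opposite strand. One common way to group motifs like this is to reduce each one to a canonical form: the alphabetically smallest string among all rotations of the motif and of its reverse complement. The sketch below illustrates that idea; it is an assumption about how such groups can be formed, not a description of the software used in the paper, and the function names are hypothetical.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(motif: str) -> str:
    """Return the reverse complement of a DNA motif."""
    return motif.translate(COMPLEMENT)[::-1]


def rotations(motif: str) -> list[str]:
    """Return all cyclic rotations of a motif (AGC -> AGC, GCA, CAG)."""
    return [motif[i:] + motif[:i] for i in range(len(motif))]


def canonical_motif(motif: str) -> str:
    """Smallest string among rotations of the motif and its reverse complement."""
    motif = motif.upper()
    candidates = rotations(motif) + rotations(reverse_complement(motif))
    return min(candidates)


# Both members of each pair in the table reduce to the same canonical form.
for left, right in [("AG", "CT"), ("AGC", "CTG"), ("ATC", "ATG"), ("CCG", "CGG")]:
    print(left, right, canonical_motif(left), canonical_motif(right))
```

Counting SSRs by canonical form rather than by raw motif avoids splitting one repeat class across several rows.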