Dongyan Zhao1, John P Hamilton1, Gina M Pham1, Emily Crisovan1, Krystle Wiegert-Rininger1, Brieanne Vaillancourt1, Dean DellaPenna2, C Robin Buell1.
Abstract
Camptotheca acuminata is one of a limited number of species that produce camptothecin, a pentacyclic quinoline alkaloid with anti-cancer activity due to its ability to inhibit DNA topoisomerase. While transcriptome studies have been performed previously with various camptothecin-producing species, no genome sequence for a camptothecin-producing species is available to date. We generated a high-quality de novo genome assembly for C. acuminata representing 403 174 860 bp on 1394 scaffolds with an N50 scaffold size of 1752 kbp. Quality assessments of the assembly revealed robust representation of the genome sequence including genic regions. Using a novel genome annotation method, we annotated 31 825 genes encoding 40 332 gene models. Based on sequence identity and orthology with validated genes from Catharanthus roseus as well as Pfam searches, we identified candidate orthologs for genes potentially involved in camptothecin biosynthesis. Extensive gene duplication including tandem duplication was widespread in the C. acuminata genome, with 2571 genes belonging to 997 tandem duplicated gene clusters. To our knowledge, this is the first genome sequence for a camptothecin-producing species, and access to the C. acuminata genome will permit not only discovery of genes encoding the camptothecin biosynthetic pathway but also reagents that can be used for heterologous expression of camptothecin and camptothecin analogs with novel pharmaceutical applications.
Keywords: Camptotheca acuminata; camptothecin; genome annotation; genome assembly; tandem duplications
Year: 2017 PMID: 28922823 PMCID: PMC5737489 DOI: 10.1093/gigascience/gix065
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1: Camptotheca acuminata Decne, the Chinese Happy Tree, is a member of the Nyssaceae family that produces the anticancer compound camptothecin.
Figure 2: Genome aspects of Camptotheca acuminata. (A) Structure of camptothecin. (B) Key amino acid mutations (red rectangles) in DNA topoisomerase I in camptothecin-producing and non-producing species and their phylogenetic relationship.
Input libraries and sequences for de novo assembly of the Camptotheca acuminata genome
| BioProject ID | BioSample ID | Fragment size (bp) | No. of cleaned read pairs | Use |
|---|---|---|---|---|
| Paired end | | | | |
| PRJNA361128 | SAMN06220985 | 180 | 96 955 546 | ALLPATHS-LG assembly |
| PRJNA361128 | SAMN06220986 | 268 | 89 381 055 | ALLPATHS-LG assembly |
| PRJNA361128 | SAMN06220987 | 352 | 61 207 691 | GapCloser |
| PRJNA361128 | SAMN06220988 | 429 | 50 688 562 | GapCloser |
| PRJNA361128 | SAMN06220989 | 585 | 21 856 610 | GapCloser |
| PRJNA361128 | SAMN06220990 | 609 | 22 217 954 | GapCloser |
| Mate pair | | | | |
| PRJNA361128 | SAMN06220991 | 8111 | 9 923 643 | ALLPATHS-LG assembly |
| PRJNA361128 | SAMN06220992 | 7911 | 7 652 519 | ALLPATHS-LG assembly |
| PRJNA361128 | SAMN06220993 | 1377 | 12 800 554 | ALLPATHS-LG assembly |
| PRJNA361128 | SAMN06220994 | 3179 | 13 138 503 | ALLPATHS-LG assembly |
| PRJNA361128 | SAMN06220995 | 8879 | 13 599 241 | ALLPATHS-LG assembly |
All libraries were sequenced in paired-end mode, generating 150 nt reads.
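As a rough check on how deeply the genome was sampled, the read-pair counts in the library table can be turned into a nominal fold coverage. The sketch below is illustrative only and rests on two assumptions that the source does not state: every cleaned read is treated as a full 150 nt (trimming would lower the real figure, so this is an upper bound), and the total scaffold length of 403 174 860 bp stands in for the true genome size. The function name `nominal_depth` is hypothetical.

```python
# Cleaned read pairs per library, copied from the library table.
PAIRED_END_PAIRS = [96_955_546, 89_381_055, 61_207_691,
                    50_688_562, 21_856_610, 22_217_954]
MATE_PAIR_PAIRS = [9_923_643, 7_652_519, 12_800_554,
                   13_138_503, 13_599_241]

READ_LENGTH_NT = 150            # read length stated for all libraries
ASSEMBLY_SIZE_BP = 403_174_860  # assumption: assembly size used as genome size


def nominal_depth(read_pairs, read_length, genome_size):
    """Upper-bound fold coverage: two reads per pair, each assumed full length."""
    total_bases = 2 * read_length * sum(read_pairs)
    return total_bases / genome_size


depth = nominal_depth(PAIRED_END_PAIRS + MATE_PAIR_PAIRS,
                      READ_LENGTH_NT, ASSEMBLY_SIZE_BP)
print(f"Nominal depth across all libraries: {depth:.1f}x")
```

Passing only `PAIRED_END_PAIRS` or only `MATE_PAIR_PAIRS` gives the per-library-type contribution under the same assumptions.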
Metrics of the final assembly of the Camptotheca acuminata genome
| Metric | Value |
|---|---|
| Total scaffold length (bp) | 403 174 860 |
| Total no. of scaffolds | 1394 |
| Maximum scaffold length (bp) | 8 423 530 |
| Minimum scaffold length (bp) | 1002 |
| N50 scaffold size (bp) | 1 751 747 |
| N50 contig size (bp) | 107 594 |
| No. of Ns | 3 772 191 (0.9%) |
| No. of gaps | 3825 |
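For readers unfamiliar with the N50 statistic reported in the table, the following minimal sketch shows the usual definition: sort sequence lengths from longest to shortest and report the length at which the running total first reaches half of all bases. The scaffold lengths in `toy_lengths` are invented for illustration and are not the C. acuminata scaffolds; the last two lines recompute the percentage of Ns from the two values in the table.

```python
def n50(lengths):
    """Return the length L such that sequences of length >= L
    together contain at least half of the total bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical scaffold lengths, for illustration only.
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
toy_n50 = n50(toy_lengths)

# Share of the assembly made up of Ns, from the table values.
n_percent = 100 * 3_772_191 / 403_174_860
print(f"toy N50 = {toy_n50}, Ns = {n_percent:.1f}%")
```

The same function applies to contigs or scaffolds; only the input list of lengths changes.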
Figure 3: Venn diagram showing orthologous and paralogous groups between Amborella trichopoda, Arabidopsis thaliana, Camptotheca acuminata, and Catharanthus roseus.
Figure 4: Key portions of the proposed camptothecin biosynthetic pathway and an example of physical clustering of candidate genes in Camptotheca acuminata. (A) The methylerythritol phosphate (MEP) pathway (green), iridoid pathway (blue), and condensation of secologanic acid with tryptamine via strictosidinic acid synthase (STRAS) to form strictosidinic acid prior to downstream dehydration, reduction, and oxidation steps yielding camptothecin. 7-DLGT: 7-deoxyloganetic acid glucosyltransferase; 7-DLH: 7-deoxyloganic acid hydroxylase; 7-DLS: 7-deoxyloganetic acid synthase; CMK: 4-diphosphocytidyl-2-C-methyl-D-erythritol kinase; CMS: 4-diphosphocytidyl-methylerythritol 2-phosphate synthase; CYC1: iridoid cyclase 1; DXR: 1-deoxy-D-xylulose-5-phosphate reductoisomerase; DXS: 1-deoxy-D-xylulose 5-phosphate synthase 2; G8H: geraniol 8-hydroxylase; GES: plastid geraniol synthase; GOR: 8-hydroxygeraniol oxidoreductase; GPPS: geranyl pyrophosphate synthase; HDR: 1-hydroxy-2-methyl-butenyl 4-diphosphate reductase; HDS: GCPE protein; IPI: plastid isopentenyl pyrophosphate, dimethylallyl pyrophosphate isomerase; MCS: 2C-methyl-D-erythritol 2,4-cyclodiphosphate synthase; SLAS: secologanic acid synthase; TDC: tryptophan decarboxylase. (B) Physical clustering of homologs of genes involved in the methylerythritol phosphate, iridoid, and alkaloid biosynthetic pathways of Catharanthus roseus on scaffold 151 of C. acuminata. Gene IDs are below the arrows. 7DLH: 7-deoxyloganic acid 7-hydroxylase; GOR: 8-hydroxygeraniol oxidoreductase; IPP2: isopentenyl diphosphate isomerase II; NMT: 16-hydroxy-2,3-dihydro-3-hydroxytabersonine N-methyltransferase.
Identification of candidate camptothecin biosynthetic pathway genes in the Camptotheca acuminata genome as revealed by sequence identity and coverage with characterized genes from the 2-C-methyl-D-erythritol 4-phosphate/1-deoxy-D-xylulose 5-phosphate and iridoid biosynthetic pathways from Catharanthus roseus
| Description | Abbreviation | Protein | Camptotheca gene ID | % coverage | % identity |
|---|---|---|---|---|---|
| MEP | | | | | |
| 1-deoxy-D-xylulose 5-phosphate synthase 2 | DXS | ABI35993.1 | Cac_g024944.t1 | 98 | 77.60 |
| 1-deoxy-D-xylulose-5-phosphate reductoisomerase | DXR | AAF65154.1 | Cac_g016318.t1 | 100 | 88.82 |
| 4-diphosphocytidyl-methylerythritol 2-phosphate synthase | CMS | ACI16377.1 | Cac_g018722.t1 | 88 | 77.82 |
| 4-diphosphocytidyl-2-C-methyl-D-erythritol kinase | CMK | ABI35992.1 | Cac_g021688.t1 | 99 | 76.17 |
| 2C-methyl-D-erythritol 2,4-cyclodiphosphate synthase | MCS | AAF65155.1 | Cac_g008169.t1 | 100 | 73.77 |
| GCPE protein | HDS | AAO24774.1 | Cac_g022763.t1 | 100 | 88.65 |
| 1-hydroxy-2-methyl-butenyl 4-diphosphate reductase | HDR | ABI30631.1 | Cac_g014659.t1 | 100 | 83.77 |
| Plastid isopentenyl pyrophosphate: dimethylallyl pyrophosphate isomerase | IPI | ABW98669.1 | Cac_g008847.t1 | 76 | 91.06 |
| Geranyl pyrophosphate synthase | GPPS | ACC77966.1 | Cac_g026508.t1 | 51 | 76.50 |
| Iridoid | | | | | |
| Geraniol 8-hydroxylase | G8H | CAC80883.1 | Cac_g017987.t1 | 95 | 76.71 |
| 8-hydroxygeraniol oxidoreductase | GOR | AHK60836.1 | Cac_g027560.t1 | 100 | 71.69 |
| Iridoid synthase | ISY | AFW98981.1 | Cac_g006027.t1 | 100 | 65.65 |
| Iridoid oxidase | IO | AHK60833.1 | Cac_g032709.t1 | 97 | 78.44 |
| UDP-glucose iridoid glucosyltransferase | 7DLGT | BAO01109.1 | Cac_g008744.t1 | 100 | 77.11 |
| 7-deoxyloganic acid 7-hydroxylase | 7DLH | AGX93062.1 | Cac_g012663.t1 | 96 | 69.58 |
| Loganic acid methyltransferase | LAMT | ABW38009.1 | Cac_g005179.t1 | 95 | 53.91 |
| Secologanin synthase | SLS | AAA33106.1 | Cac_g012666.t1 | 99 | 64.94 |
Only the top hit from the BLAST search is presented.
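Reducing a set of BLAST hits to one top hit per query can be sketched as below. This is not the authors' pipeline: the source says only that the top hit is shown, so ranking by bit score is an assumption, and the `Hit` record, the `top_hits` function, and every identifier and value in `example_hits` are hypothetical placeholders rather than results from the study.

```python
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    query: str           # e.g. a characterized reference protein
    subject: str         # e.g. a predicted gene model in the target genome
    pct_identity: float
    pct_coverage: float
    bit_score: float


def top_hits(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """Keep the highest-scoring hit per query (assumption: rank by bit score)."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit.query)
        if current is None or hit.bit_score > current.bit_score:
            best[hit.query] = hit
    return best


# Invented hits, for illustration only.
example_hits = [
    Hit("query_1", "candidate_a", 77.0, 98.0, 900.0),
    Hit("query_1", "candidate_b", 70.0, 95.0, 750.0),
    Hit("query_2", "candidate_c", 60.0, 80.0, 300.0),
    Hit("query_2", "candidate_d", 65.0, 99.0, 420.0),
]

for query, hit in top_hits(example_hits).items():
    print(query, hit.subject, hit.pct_coverage, hit.pct_identity)
```

Because only one hit per query survives, additional paralogs in the target genome (for example, members of tandem duplicated clusters) would not appear in a table built this way.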