Genevieve M Hoopes, John P Hamilton, Jeongwoon Kim, Dongyan Zhao, Krystle Wiegert-Rininger, Emily Crisovan, C Robin Buell.
Abstract
Calotropis gigantea produces specialized secondary metabolites known as cardenolides, which have anticancer and antimalarial properties. Although transcriptomic studies have been conducted in other cardenolide-producing species, no nuclear genome assembly for an Asterid cardenolide-producing species has been reported to date. A high-quality de novo assembly was generated for C. gigantea, representing 157,284,427 bp with an N50 scaffold size of 805,959 bp, for which quality assessments indicated a near-complete representation of the genic space. Transcriptome data in the form of RNA-sequencing libraries from a developmental tissue series were generated to aid the annotation and the construction of a gene expression atlas. Using an ab initio and evidence-driven gene annotation pipeline, 18,197 high-confidence genes were annotated. Homologous and syntenic relationships between C. gigantea and other species within the Apocynaceae family confirmed previously identified evolutionary relationships, and suggested the emergence or loss of the specialized cardenolide metabolites after the divergence of the Apocynaceae subfamilies. The C. gigantea genome assembly, annotation, and RNA-sequencing data provide a novel resource to study the cardenolide biosynthesis pathway, especially for understanding the evolutionary origin of cardenolides and the engineering of cardenolide production in heterologous organisms for existing and novel pharmaceutical applications.
Keywords: Apocynaceae family; Calotropis gigantea; Genome Report; cardenolide; genome assembly; pharmaceutical
Year: 2018 PMID: 29237703 PMCID: PMC5919723 DOI: 10.1534/g3.117.300331
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. C. gigantea and cardenolide metabolites. (A) C. gigantea plant. (B) Plant tissues used for RNA-sequencing libraries: 1, closed bud; 2, sepals and petals; 3, peduncle and pedicel; 4, young leaf; and 5, gynostegium. (C) Mature leaf tissue used for RNA-sequencing libraries. (D) Cardenolide metabolite with the carbons numbered. Rx and Ry indicate ambiguity of the attached groups, which vary depending on the specific cardenolide.
Figure 2. Heterozygosity of the C. gigantea genome and assembly. K-mer frequency plots generated in Jellyfish2 (Marçais and Kingsford 2011) using ALLPATHS-LG (Gnerre ) error-corrected genomic fragment reads.
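As an illustration of what a k-mer frequency plot summarizes, the following minimal Python sketch counts k-mers in a set of reads and tabulates how many distinct k-mers occur at each multiplicity. It is not Jellyfish and makes no claim about the authors' settings: the function name `kmer_histogram`, the toy read, and the choice to ignore reverse complements are all assumptions made for illustration.

```python
from collections import Counter


def kmer_histogram(reads, k):
    """Map multiplicity -> number of distinct k-mers seen that many times.

    Assumption: k-mers are counted on the forward strand only; dedicated
    counters typically offer canonical (strand-independent) counting.
    """
    counts = Counter()
    for read in reads:
        # Slide a window of width k along the read.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    # Collapse per-k-mer counts into a frequency-of-frequencies table.
    return Counter(counts.values())


# Hypothetical toy read, not data from the study.
toy_histogram = kmer_histogram(["ACGTACGT"], k=4)
for multiplicity in sorted(toy_histogram):
    print(multiplicity, toy_histogram[multiplicity])
```

Plotting multiplicity against the number of distinct k-mers gives the kind of curve shown in a k-mer frequency plot; in a heterozygous genome, a second peak at roughly half the main coverage is commonly interpreted as k-mers unique to one haplotype.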
Genome assembly metrics
| Category | Metric | ALLPATHS-LG Assembly | PLATANUS Assembly | ALLPATHS-LG + GapCloser |
|---|---|---|---|---|
| Scaffold | Total length (bp) | 157,408,176 | 146,937,509 | 157,284,427 |
| | Number | 1,538 | 16,684 | 1,536 |
| | N50 length (bp) | 806,518 | 187,271 | 805,959 |
| | Longest scaffold (bp) | 7,037,412 | 2,341,668 | 7,038,285 |
| | Gap size (bp) | 18,606,682 | 26,830,789 | 8,276,177 |
| Contig | Total length (bp) | 138,806,556 | 120,116,579 | 149,009,524 |
| | Number | 14,076 | 37,240 | 7,472 |
| | N50 length (bp) | 25,949 | 4,905 | 48,580 |
| | Longest contig (bp) | 417,030 | 70,238 | 788,128 |
| BUSCO | Total complete | 87.80% | 77.20% | 89.80% |
| | Single copy complete | 86.20% | 75.90% | 88.00% |
| | Duplicated complete | 1.60% | 1.30% | 1.80% |
| | Fragmented | 3.50% | 7.90% | 2.20% |
| | Missing | 8.70% | 14.90% | 8.00% |
| | Total number | 1,440 | 1,440 | 1,440 |
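For readers unfamiliar with the N50 metric reported above, here is a minimal Python sketch of the standard definition: the sequence length at which the cumulative total, taken from the longest sequence downward, first reaches half of the assembly length. The function name `n50` and the toy lengths are hypothetical; only the gap size and scaffold total in the last lines come from the ALLPATHS-LG + GapCloser column of the table.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half the total.
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical toy scaffold lengths, not data from the study.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))

# Share of the final scaffold assembly made up of gaps, using the
# gap size and total scaffold length reported in the table.
gap_fraction = 8_276_177 / 157_284_427
print(f"{gap_fraction:.2%}")
```

The same function applies to contigs or scaffolds; only the input list of lengths changes.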
Figure 3. Homologous and syntenic relationships among the Apocynaceae family. (A) Venn diagram showing the number of orthologous and paralogous groups shared among the species. The number of singletons per species is also provided. (B) Rooted cladogram for the species generated in the OrthoFinder (Emms and Kelly 2015) output.
Figure 4. Identification of putative C. gigantea genes involved in the cardenolide biosynthetic pathway. (A) Simplified schematic of the cardenolide biosynthetic pathway. (B and C) Neighbor-joining gene trees for 3β-hydroxysteroid dehydrogenase (3βHSD) and progesterone 5β-reductase (P5βR), respectively. Taxa for each tree are the C. gigantea candidates and functionally characterized proteins, for which the taxa labels include the GenBank identifier and species name; taxa are shaded according to their distance from the functionally characterized genes, with darker shades indicating smaller distances. Values on nodes indicate bootstrap support from 1000 bootstrap replicates. (D) Heat map of log2-transformed gene expression values (FPKM: fragments per kb exon model per million mapped reads) of candidate C. gigantea cardenolide biosynthesis genes. Cladograms were generated by hierarchical clustering of the genes and samples. Blue and red colored genes are 3βHSD and P5βR candidates, respectively.
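The FPKM unit used in the heat map can be made concrete with a short Python sketch of the standard definition (fragments per kilobase of exon model per million mapped fragments). The function names, the example numbers, and in particular the pseudocount of 1 added before the log2 transform are assumptions for illustration; the caption does not state how zero values were handled.

```python
import math


def fpkm(fragments, exon_length_bp, total_mapped_fragments):
    """Fragments per kilobase of exon model per million mapped fragments."""
    # 1e9 = 1e3 (bp per kb) * 1e6 (fragments per million).
    return fragments * 1e9 / (exon_length_bp * total_mapped_fragments)


def log2_fpkm(value, pseudocount=1.0):
    """log2-transform an FPKM value.

    Assumption: a pseudocount avoids log2(0); the source does not say
    which pseudocount, if any, was used for the heat map.
    """
    return math.log2(value + pseudocount)


# Hypothetical gene: 500 fragments, 2,000 bp of exon, 10 million mapped fragments.
example = fpkm(500, 2_000, 10_000_000)
print(example, log2_fpkm(example))
```

Normalizing by both transcript length and library size is what makes values comparable across genes and across the tissue libraries clustered in the heat map.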