Johannes A Hofberger, Aldana M Ramirez, Erik van den Bergh, Xinguang Zhu, Harro J Bouwmeester, Robert C Schuurink, M Eric Schranz.
Abstract
An important component of plant evolution is the plethora of pathways producing more than 200,000 biochemically diverse specialized metabolites with pharmacological, nutritional and ecological significance. To unravel the dynamics underlying metabolic diversification, it is critical to determine lineage-specific gene family expansion in a phylogenomics framework. However, robust functional annotation is often available only for core enzymes catalyzing committed reaction steps within a few model systems. In a genome informatics approach, we extracted information from early-draft gene-space assemblies and non-redundant transcriptomes to identify protein families involved in isoprenoid biosynthesis. Isoprenoids comprise terpenoids with various roles in plant-environment interaction, such as pollinator attraction or pathogen defense. Combining lines of evidence provided by synteny, sequence homology and hidden Markov modelling, we screened 17 genomes including 12 major crops and found evidence for 1,904 proteins associated with terpenoid biosynthesis. Our terpenoid gene set contains evidence for 840 core terpene synthases and 338 triterpene-specific synthases. We further identified 190 prenyltransferases, 39 isopentenyl-diphosphate isomerases, as well as 278 and 219 proteins involved in the mevalonate and methylerythritol pathways, respectively. Assessing the impact of gene and genome duplication on lineage-specific terpenoid pathway expansion, we illustrated key events underlying terpenoid metabolic diversification within 250 million years of flowering plant radiation. By quantifying angiosperm-wide versatility and phylogenetic relationships of pleiotropic gene families in terpenoid modular pathways, our analysis offers significant insight into the evolutionary dynamics underlying diversification of plant secondary metabolism. Furthermore, our data provide a blueprint for future efforts to identify and more rapidly clone terpenoid biosynthetic genes from any plant species.
Year: 2015 PMID: 26046541 PMCID: PMC4457800 DOI: 10.1371/journal.pone.0128808
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
The published terpenoid biosynthetic module in Arabidopsis.
| Gene ID | Annotation | Tandem duplicate | Bowers pair | Reference |
|---|---|---|---|---|
| | | | | |
| AT3G02780 | | - | A12N076 | Campbell et al., 1998 |
| AT5G16440 | | - | A12N076 | Campbell et al., 1998 |
| | | | | |
| AT1G31910 | | Yes | - | Benveniste et al., 2002 |
| AT1G76490 | | Yes | C2N120 | Caelles et al., 1989 |
| AT2G17370 | | - | C2N120 | Caelles et al., 1989 |
| AT2G38700 | | - | A11N067 | Cordier et al., 1999 |
| AT3G54250 | | - | A11N067 | Benveniste et al., 2002 |
| AT4G11820 | | - | - | Montamat et al., 1995 |
| AT5G27450 | | - | - | Riou et al., 1994 |
| AT5G47720 | | Yes | - | Ahumada et al., 2008 |
| AT5G48230 | | Yes | - | Ahumada et al., 2008 |
| | | | | |
| AT1G63970 | | - | - | Hsieh and Goodman, 2006 |
| AT2G02500 | | - | - | Rohdich et al., 2000 |
| AT2G26930 | | - | - | Hsieh et al., 2008 |
| AT4G15560 | | - | A15N013 | Lange et al., 2003 |
| AT4G34350 | | - | - | Hsieh and Goodman, 2005 |
| AT5G11380 | | - | - | Lange et al., 2003 |
| AT5G60600 | | - | - | Rodríguez-Concepción et al., 2002 |
| AT5G62790 | | - | - | Schwender et al., 1999 |
| | | | | |
| AT1G49530 | | Yes | - | Zhu et al., 1997a |
| AT2G18620 | | Yes | A10N118 | Wang and Dixon, 2009 |
| AT2G18640 | | Yes | - | Okada et al., 2000 |
| AT2G23800 | | Yes | A10N309 | Zhu et al., 1997b |
| AT2G34630 | | - | - | Bouvier et al., 2000 |
| AT3G14510 | | Yes | - | Finkelstein et al., 2002 |
| AT3G14530 | | Yes | - | Wang and Dixon, 2009 |
| AT3G14550 | | Yes | - | Okada et al., 2000 |
| AT3G20160 | | - | - | Zhu et al., 1997a |
| AT3G29430 | | Yes | - | Finkelstein et al., 2002 |
| AT3G32040 | | Yes | - | Finkelstein et al., 2002 |
| AT4G17190 | | Yes | A21N001 | Cunillera et al., 2000 |
| AT4G36810 | | - | A10N118 | Okada et al., 2000 |
| AT4G38460 | | - | - | Oh et al., 2002 |
| AT5G47770 | | Yes | A21N001 | Delourme et al., 1994 |
| | | | | |
| AT1G31950 | | Yes | - | Lange et al., 2003 |
| AT1G33750 | | - | - | Lange et al., 2003 |
| AT1G48800 | | Yes | - | Lange et al., 2003 |
| AT1G61120 | | Yes | - | Herde et al., 2008 |
| AT1G61680 | | Yes | - | Chen et al., 2003 |
| AT1G66020 | | Yes | - | Lange et al., 2003 |
| AT1G70080 | | Yes | - | Lange et al., 2003 |
| AT1G79460 | | Yes | - | Yamaguchi et al., 1998 |
| AT2G23230 | | Yes | - | Dal Bosco et al., 2003 |
| AT2G24210 | | - | - | Bohlmann et al., 2000 |
| AT3G14490 | | Yes | - | Dal Bosco et al., 2003 |
| AT3G14520 | | Yes | - | Lange et al., 2003 |
| AT3G14540 | | Yes | - | Lange et al., 2003 |
| AT3G25810 | | Yes | - | Chen et al., 2003 |
| AT3G25820 | | Yes | - | Chen et al., 2004 |
| AT3G25830 | | Yes | - | Chen et al., 2004 |
| AT3G29110 | | Yes | - | Lange et al., 2003 |
| AT3G29190 | | Yes | - | Lange et al., 2003 |
| AT3G29410 | | Yes | - | Dal Bosco et al., 2003 |
| AT3G32030 | | Yes | - | Lange et al., 2003 |
| AT4G02780 | | - | - | Mann et al., 2010 |
| AT4G13280 | | Yes | - | Ro et al., 2006 |
| AT4G13300 | | Yes | - | Ro et al., 2006 |
| AT4G15870 | | Yes | - | Aubourg et al., 1997 |
| AT4G16730 | | Yes | - | Huang et al., 2010 |
| AT4G16740 | | Yes | - | Fäldt et al., 2003 |
| AT4G20200 | | Yes | - | Lange et al., 2003 |
| AT4G20210 | | Yes | A21N124 | Tholl and Lee, 2011 |
| AT4G20230 | | Yes | - | Dal Bosco et al., 2003 |
| AT5G23960 | | - | - | Chen et al., 2003 |
| AT5G44630 | | - | A21N124 | Tholl et al., 2005 |
| AT5G48110 | | Yes | - | Dal Bosco et al., 2003 |
Gene abbreviations are adapted from the Arabidopsis Information Resource.
A TAIR10, www.arabidopsis.org, last accessed on December 13th, 2014.
B Ohnolog pair according to Bowers et al., 2003 [8].
C For a comprehensive review, see Tholl and Lee, 2011 and Phillips et al., 2008.
D Association of AtDXS3 with the MEP pathway is the subject of scientific debate (see Phillips et al., 2008).
The extended terpenoid phenotypic module in Arabidopsis, including triterpene-specific (C30) synthases.
| Gene ID | Annotation | Description | Tandem duplicate | Bowers pair | Reference |
|---|---|---|---|---|---|
| | | | | | |
| AT4G15560 | | Desoxy-xylulosephosphatesynthase 2 | Yes | A15N013 | Lange et al., 2003 |
| | | | | | |
| AT1G62730 | | N/A; Squalene/phytoene synthase | No | - | Wang et al., 2008 |
| AT1G66960 | | Lupeol synthase 5 | Yes | - | Herrera et al., 1998 |
| AT1G78480 | - | N/A; Prenyltransferase/squalene oxidase | Yes | - | Hanada et al., 2010 |
| AT1G78500 | | Pentacyclic triterpene synthase 6 | Yes | - | Husselstein-Muller et al., 2001 |
| AT1G78950 | | Lupeol synthase 4 | Yes | - | Benveniste et al., 2002 |
| AT1G78955 | | Camelliol synthase 1 | Yes | - | Kushiro et al., 1998 |
| AT1G78960 | | Lupeol synthase 2 | Yes | - | Herrera et al., 1998 |
| AT1G78970 | | Lupeol synthase 1 | Yes | - | Herrera et al., 1998 |
| AT3G29255 | - | N/A; Squalene cyclase (InterPro:IPR018333) | Yes | - | this manuscript |
| AT2G07050 | | Cycloartenol synthase 1 | - | - | Lange et al., 2003 |
| AT3G45130 | | Lanosterol synthase 1 | - | - | Benveniste et al., 2002 |
| AT4G15340 | | Pentacyclic triterpene synthase 1 | Yes | - | Husselstein-Muller et al., 2001 |
| AT4G15370 | | Pentacyclic triterpene synthase 2 | Yes | - | Husselstein-Muller et al., 2001 |
| AT5G36150 | | Pentacyclic triterpene synthase 3 | - | - | Husselstein-Muller et al., 2001 |
| AT5G42600 | | Marneral synthase 1 | - | - | Benveniste et al., 2002 |
| AT5G48010 | | Thalianol synthase 1 | Yes | - | Benveniste et al., 2002 |
| | | | | | |
| AT1G48820 | | N/A; tandem duplicate of | Yes | - | Lange et al., 2003 |
| AT2G37140 | - | N/A; best BLAST hit is | - | - | Lange et al., 2003 |
Three-letter gene abbreviations are adapted from the Arabidopsis Information Resource.
A TAIR10, www.arabidopsis.org, last accessed on December 13th, 2014.
B Ohnolog pair according to Bowers et al., 2003.
C For a comprehensive review, see Tholl and Lee, 2011.
Tandem duplicate fractions among the terpenoid specialized biosynthetic module in 13 genomes.
| Species | Genome-wide | core-TPS genes | MEP-pathway | MVA-pathway | IPP-isomerases | Prenyl-transferases | Triterpene synthases | Average |
|---|---|---|---|---|---|---|---|---|
| | | 94% * | - | 33% | - | 73% * | 68% * | |
| | | 51% * | - | 7% | - | 42% | 40% | |
| | | 62% * | 18% | 7% | - | 53% * | 20% | |
| | | 52% * | - | - | - | - | 54% * | |
| | | 37% | 50% | 56% | 100% | 25% | 37% | |
| | | 73% * | 18% | 19% | - | 60% | 75% * | |
| | | 42% * | 12% * | - | 50% | 7% * | 33% * | |
| | | 91% * | 21% | 35% | - | - | 96% * | |
| | | 80% * | - | 13% | 50% | 55% | 55% | |
| | | 51% | - | 21% * | - | - | 25% | |
| | | 68% * | 20% | - | - | 14% | 76% * | |
| | | 42% | 31% | 40% | 33% | 10% | 58% | |
| | | 57% * | 18% | - | - | - | 67% | |
Minus indicates absence of tandem duplicates. Asterisks indicate significant enrichment compared to the genome-wide tandem duplicate fraction, based on Fisher's exact test on count data (p-value threshold: 0.01). For absolute gene numbers and p-values, see S5 Table.
A C. sativa, L. sativa, N. benthamiana and C. gynandra are excluded from this analysis for technical reasons (see Materials & Methods section).
B Averages based on numbers of tandem and singleton genes, not on percentage values since gene counts in subsets are not equal.
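The enrichment test described in the table notes can be illustrated with a minimal sketch. It assumes a one-sided Fisher's exact test in which a gene family is treated as a draw from the whole genome; the source does not state the sidedness or the exact contingency layout, and the function name and all gene counts below are hypothetical (the real counts are in S5 Table).

```python
from math import comb


def enrichment_p_value(family_tandem, family_total, genome_tandem, genome_total):
    """One-sided Fisher's exact test for tandem-duplicate enrichment.

    Returns the probability of observing at least `family_tandem` tandem
    duplicates among `family_total` genes drawn from a genome that holds
    `genome_tandem` tandem duplicates in `genome_total` genes
    (hypergeometric upper tail, margins fixed).
    """
    non_tandem = genome_total - genome_tandem
    upper = min(family_total, genome_tandem)
    # Count the favourable draws with exact integer arithmetic.
    favourable = sum(
        comb(genome_tandem, k) * comb(non_tandem, family_total - k)
        for k in range(family_tandem, upper + 1)
    )
    return favourable / comb(genome_total, family_total)


# Hypothetical counts, for illustration only.
P_THRESHOLD = 0.01  # threshold stated in the table note
p_enriched = enrichment_p_value(30, 32, 4000, 27000)
p_background = enrichment_p_value(1, 10, 4000, 27000)
print(p_enriched < P_THRESHOLD, p_background < P_THRESHOLD)
```

Working on counts rather than percentages matters here: a family of 2 genes with 1 tandem duplicate and a family of 200 genes with 100 both show 50%, but only the larger one can reach significance.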
Ohnolog duplicate fractions among the terpenoid specialized biosynthetic module in 13 genomes.
| Species | Genome-wide | CoGe-link | core-TPS genes | MEP-pathway | MVA-pathway | IPP-isomerases | Prenyl-transferases | Triterpene synthases | Average |
|---|---|---|---|---|---|---|---|---|---|
| | 22% | | 6% | 13% | 22% | 100% * | 33% * | 5% | |
| | 53% | | 26% | 54% * | 73% * | 100% * | 74% * | 33% | |
| | 48% | | 26% | 27% | 60% * | 100% * | 47% | 45% | |
| | 7% | | - | - | - | - | 17% * | - | |
| | 6% | | 3% | - | 22% * | - | - | - | |
| | 18% | | 10% | - | 50% * | - | - | - | |
| | 62% | | 48% | 56% | 85% * | 100% * | 53% | 27% | |
| | 22% | | 2% | - | 15% | - | 40% * | - | |
| | 19% | | 14% | - | 33% * | - | - | - | |
| | 11% | | 4% | - | 17% * | - | - | - | |
| | 23% | | - | - | 27% * | - | 29% * | 20% | |
| | 27% | | 8% | 23% | 55% * | 66% * | 60% * | 16% | |
| | 7% | | - | 1% | 17% * | - | 33% * | - | |
Minus indicates absence of ohnolog duplicates. Asterisks indicate above-average fraction of ohnolog duplicates compared to the genome-wide background. For absolute values, see S5 Table.
A C. sativa, L. sativa, N. benthamiana and C. gynandra are excluded from this analysis due to technical restrictions.
B Averages based on numbers of tandem and singleton genes, not on percentage values since gene count in subsets is not equal.
C Link to the CoGe platform for comparative genomics for online-regeneration of the analysis for ohnolog identification.
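Note B distinguishes an average computed from gene counts from an average of the percentage values. A minimal sketch of the difference follows; the function names and the two example families are hypothetical and are not taken from S5 Table.

```python
def pooled_fraction(counts):
    """Average based on gene counts: total duplicates / total genes."""
    duplicates = sum(dup for dup, _ in counts)
    genes = sum(total for _, total in counts)
    return duplicates / genes


def mean_of_fractions(counts):
    """Unweighted mean of per-family fractions (the approach note B avoids)."""
    return sum(dup / total for dup, total in counts) / len(counts)


# Hypothetical (duplicates, total genes) pairs for two gene families.
families = [(30, 32), (1, 8)]
print(f"pooled: {pooled_fraction(families):.1%}")           # 77.5%
print(f"mean of fractions: {mean_of_fractions(families):.1%}")  # 53.1%
```

The pooled value weights each family by its size, so a small family with an extreme percentage does not dominate the average.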
Overview of protein domain annotation for the extended set of Arabidopsis terpenoid biosynthetic genes.
| Database | Predicted domains | Predicted domains specific for functional module | Genes with predicted domains | Genes with module-specific domains |
|---|---|---|---|---|
| Interpro | 64 | 49 | 85 | 48 |
| Panther | 20 | 18 | 85 | 59 |
| Pfam | 25 | 17 | 85 | 43 |
| Superfamily | 16 | 10 | 84 | 11 |
| Gene3D | 16 | 9 | 83 | 10 |
A 85 target genes in the extended set of Arabidopsis terpenoid biosynthetic genes.
Overview of gene and genome duplication responsible for DXS-like cluster extension; shown are all target genes for four genomes.
| Species | Gene Identifier | Clade | Origin of Duplication | Duplicate Group | Similarity | Identity |
|---|---|---|---|---|---|---|
| | AT3G21500 | 1 | At-α WGD (A15N013) | 1 | 78.8% | 72.5% |
| | AT4G15560 | 1 | At-α WGD (A15N013) | 1 | | |
| | AT5G11380 | 3 | GTD (AT4G15560) | 1 | 68.6% | 53.3% |
| | Bra001832 | 1 | Br-α WGT | 2 | 77.9%-81.8% | 73.4%-77.3% |
| | Bra012779 | 1 | Br-α WGT | 2 | 92.0% | 93.7% |
| | Bra033495 | 1 | Br-α WGT | 2 | | |
| | Bra008967 | 3 | GTD (Bra033495) | 2 | 67.0% | 52.6% |
| | Th2v17645 | 3 | Tandem (Th2v17646) | 3 | 4.3% | 6.5% |
| | Th2v17646 | 3 | Tandem (Th2v17645) | 3 | | |
| | Th2v18234 | 1 | Th-α WGT | 4 | 92.4%-93.3% | 88.8%-89.5% |
| | Th2v26234 | 1 | Th-α WGT | 4 | 87.6% | 83.9% |
| | Th2v25487 | 1 | Th-α WGT | 4 | | |
| | Glyma07g38260 | 1 | | 5 | 94.4% | 91.7% |
| | Glyma17g02480 | 1 | | 5 | | |
| | Glyma15g10610 | 1 | | 5 | 51.7% | 48.9% |
| | Glyma13g28470 | 1 | | 5 | | |
| | Glyma04g07400 | 3 | | 6 | 97.0% | 94.3% |
| | Glyma06g07491 | 3 | | 6 | | |
| | Glyma17g07400 | 2 | | 7 | 45.4% | 44.8% |
| | Glyma13g01280 | 2 | | 7 | | |
| | Glyma18g28830 | 2 | | 8 | 96.8% | 94.1% |
| | Glyma08g37670 | 2 | | 8 | | |
| | Glyma08g37680 | 2 | Tandem (Glyma08g37670) | 8 | 97.6% | 96.1% |
| | Glyma09g33320 | 2 | Segmental (Glyma08g37670) | 8 | 92.2% | 86.5% |
Analysis restricted to the genomes with the most accurate identification of ohnologs, owing to technical limitations.
Based on encoded protein sequence.
Origin of GTD duplicate based on the lowest blastp e-value for alignment to other family members.
Embedded in the most fractionated subgenome; similarity and identity scores are shown relative to ohnologs in both other subgenomes.
Note the significant length difference between the two genes in this array; low similarity and identity scores indicate an annotation error dividing one ORF into two neighboring genes. Both values are excluded from the calculation of the average.
Gene scored as a segmental duplicate due to the high synteny score of the harbouring region, while the other members of the duplicate group are sufficient to cover the syntenic depth of this genome (i.e. no WGT evident).
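The rule for assigning the origin of a GTD duplicate (lowest blastp e-value against other family members) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the function name and the hit format are hypothetical, and the e-values are invented for the example; only the gene identifiers come from the table above.

```python
def infer_gtd_origin(query, blastp_hits):
    """Return the family member with the lowest blastp e-value to `query`.

    `blastp_hits` is an iterable of (query_id, subject_id, evalue) tuples.
    Self-hits are ignored; returns None if no other family member was hit.
    """
    candidates = [
        (evalue, subject)
        for hit_query, subject, evalue in blastp_hits
        if hit_query == query and subject != query
    ]
    if not candidates:
        return None
    # Tuples sort by e-value first, then by identifier to break ties.
    return min(candidates)[1]


# Invented e-values for illustration; identifiers are from the table.
hits = [
    ("AT5G11380", "AT5G11380", 0.0),     # self-hit, ignored
    ("AT5G11380", "AT4G15560", 1e-150),
    ("AT5G11380", "AT3G21500", 1e-120),
]
print(infer_gtd_origin("AT5G11380", hits))  # AT4G15560
```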