Literature DB >> 25534030

Comparative transcriptome analyses reveal core parasitism genes and suggest gene duplication and repurposing as sources of structural novelty.

Zhenzhen Yang1, Eric K Wafula2, Loren A Honaas1, Huiting Zhang3, Malay Das4, Monica Fernandez-Aparicio5, Kan Huang6, Pradeepa C G Bandaranayake7, Biao Wu7, Joshua P Der2, Christopher R Clarke4, Paula E Ralph8, Lena Landherr8, Naomi S Altman9, Michael P Timko6, John I Yoder7, James H Westwood4, Claude W dePamphilis10.   

Abstract

The origin of novel traits is recognized as an important process underlying many major evolutionary radiations. We studied the genetic basis for the evolution of haustoria, the novel feeding organs of parasitic flowering plants, using comparative transcriptome sequencing in three species of Orobanchaceae. Around 180 genes are upregulated during haustorial development following host attachment in at least two species, and these are enriched in proteases, cell wall modifying enzymes, and extracellular secretion proteins. Additionally, about 100 shared genes are upregulated in response to haustorium inducing factors prior to host attachment. Collectively, we refer to these newly identified genes as putative "parasitism genes." Most of these parasitism genes are derived from gene duplications in a common ancestor of Orobanchaceae and Mimulus guttatus, a related nonparasitic plant. Additionally, the signature of relaxed purifying selection and/or adaptive evolution at specific sites was detected in many haustorial genes, and may play an important role in parasite evolution. Comparative analysis of gene expression patterns in parasitic and nonparasitic angiosperms suggests that parasitism genes are derived primarily from root and floral tissues, but with some genes co-opted from other tissues. Gene duplication, often taking place in a nonparasitic ancestor of Orobanchaceae, followed by regulatory neofunctionalization, was an important process in the origin of parasitic haustoria.
© The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

Entities:  

Keywords:  Orobanchaceae; gene duplication; novel traits; parasitism; protease; transcriptome

Mesh:

Year:  2014        PMID: 25534030      PMCID: PMC4327159          DOI: 10.1093/molbev/msu343

Source DB:  PubMed          Journal:  Mol Biol Evol        ISSN: 0737-4038            Impact factor:   16.240


Introduction

Throughout evolutionary history, organisms have evolved a variety of sophisticated novel traits for survival and reproduction. For instance, insects have evolved wings for flying, and plants have evolved different patterns of floral shapes and colors to maximize the attraction of insects and other animals for pollination. The origin of such novel traits has been of longstanding interest to evolutionary biologists, and a wide range of approaches has been used to gain insights into the origin of specific traits. For example, examination of the sensory functions of cilia, the secretory structure of sponges, an early diverging group of multicellular animals, provided insights into the origin of the sensory system of metazoans (Ludeman et al. 2014). Phylogenetic histories of genes known to be involved in eye development and phototransduction revealed that a greater variety of eye types as found in pancrustacean arthropods, appeared to be associated with a higher rate of gene duplication (Rivera et al. 2010). Mutation and gene duplication have played an important role in generating complex pathways for refined eye development (Gehring 2011; Nei 2013), which resulted in many different eye types including the camera eye, the compound eye, and the mirror eye (Salvini-Plawen and Mayer 1961). The complete genome analysis of the basal angiosperm Amborella trichopoda, the sister species to all other extant flowering plants, revealed that a whole-genome duplication (WGD) led to the creation of many novel genes and functions associated with floral development and evolution, ultimately contributing to the diversification of flowering plants (). As seen in the above case studies, gene duplication is frequently associated with the evolution of novel functions (Stephens 1951; Nei 1969; Ohno 1970; Kaessmann 2010; Liberles et al. 2010). The most extensively documented proposal for the evolution of novel gene function is the classic gene duplication model proposed by Ohno (1970) and extended by Force, Lynch, and many others (Force et al. 1999; Lynch and Conery 2000; Tirosh and Barkai 2007; Liberles et al. 2010). Following gene duplication, one copy may retain its original function, whereas the other copy diverges, and can have a variety of different fates, including pseudogenization (Lynch and Conery 2000), hypofunctionalization (Duarte et al. 2006), subfunctionalization, or neofunctionalization. Subfunctionalization is due to complementary loss of some of the functional attributes that are initially shared by the new paralogs following duplication, whereas neofunctionalization can occur when one of the paralogs evolves a new expression pattern or sequence attribute and acquires a new function (Force et al. 1999; Lynch and Conery 2000; Tirosh and Barkai 2007; Liberles et al. 2010). Subneofunctionalization was proposed to describe processes that involved both (He and Zhang 2005). Polyploidy duplicates all of the genes in the genome at once (Otto and Whitton 2000), providing ample opportunities for the function of paralogous gene copies to diverge (Crow and Wagner 2006; Nei 2013), especially in plants, where both angiosperms (Jiao et al. 2011; ) and seed plants (Jiao et al. 2011) have been hypothesized to be ancestrally polyploid, and a large number of more recent polyploidy events have been detected (Schlueter et al. 2004; Cui et al. 2006; Soltis et al. 2009; Jiao et al. 2012; Vanneste et al. 2013). Thus, novel gene creation through single gene duplications, large-scale genome duplication, and neofunctionalization may all play significant roles in the origin of a novel function. For plants, the evolution of parasitism is one of the most extraordinary examples of evolution of novel traits, as parasitic plants have evolved the ability to form a connection that allows it to feed off plants of other species, allowing some parasites to completely abandon photosynthesis, one of the hallmarks of life for most plants. Parasitism is enabled by specialized feeding structures known as haustoria (Kuijt 1969; Heide-Jørgensen 2013a), which have evolved independently at least 11 times in angiosperm evolution (Barkman et al. 2007; Westwood et al. 2010). Most haustorial parasitic plants invade host roots, whereas some are able to form haustorial connections with stems, and rarely, leaves. Parasitic plants can be classified as hemiparasites (if they retain some photosynthetic capability, and thus are at least partly autotrophic) or holoparasites (if they are entirely heterotrophic). Furthermore, parasitic plants can be classified by their degree of host dependence. Facultative parasites must retain photosynthetic abilities and are opportunistic parasites that are able to complete their life cycle without attaching to a host (Kuijt 1969; Westwood et al. 2010; Naumann et al. 2013), whereas obligate parasites must form a host attachment in order to complete their life cycle. The Orobanchaceae species that are the focus of this work are unusual in that this is the only family of plants that spans the entire spectrum from facultative to obligate parasitism and from hemiparasites to holoparasites (Westwood et al. 2010), even including a basal nonparasitic lineage, and thus this family is an ideal group for investigating the evolution of parasitism. Haustoria of parasitic plants transport water, minerals (including nitrogen), and often organic nutrients including host photosynthate. Bulk flow of water and dissolved nutrients from the vascular system of host plants is achieved by parasitic plants by maintaining a lower water potential relative to the host (Smith and Stewart 1990). Holoparasites acquire almost all of their fixed carbon from their hosts through phloem connections (Dorr and Kollmann 1995); sugar accumulated in haustoria ranges from 6- to 8-fold higher than in their hosts (Aber et al. 1983). The ability of the haustorium to efficiently transfer host resources results in substantial yield loss to several economically important crop plants. For example, in sub-Saharan Africa, witchweed (Striga spp.) infests over 50 million hectares of arable farmland cultivated with corn and legumes, causing annual yield loss estimated to exceed $10 billion USD (Scholes and Press 2008). Furthermore, infestations of staple crops can result in substantial or near complete yield loss, exacerbating problems of low food security. Identifying genes with key roles in parasitism may reveal novel strategies to control weedy agricultural pest species (Aly et al. 2009; Alakonya et al. 2012; Bandaranayake et al. 2012; Westwood et al. 2012; Bandaranayake and Yoder 2013b; Ranjan et al. 2014). Despite decades of research, only a few genes have been previously characterized with specific roles in the parasitic process in Orobanchaceae. One quinone oxidoreductase gene (TvQR1) is necessary for haustorium initiation through redox bioactivation of haustorial inducing factors (HIFs) in Triphysaria, a facultative root hemiparasite (Bandaranayake et al. 2010; Ngo et al. 2013). Additionally, TvPirin is upregulated by HIFs (or by contact with host roots) and putatively functions as a positive regulator of other genes needed for haustorial development (Bandaranayake et al. 2012). Finally, mannose 6-phosphate reductase (M6PR) in Orobanche, a root holoparasite, was also shown to be involved in parasite metabolism. Silencing of the parasite M6PR gene by RNAi from the host resulted in decreased mannitol concentration in the haustorium tubercle and increased tubercle mortality, thus clarifying the role of mannitol in parasitism (Aly et al. 2009). Although not in the family of Orobanchaceae, experimental characterization also supports the role of two parasitism genes in Cuscuta: One encoding a cysteine protease in the stem parasite, Cuscuta reflexa (Bleischwitz et al. 2010), and a SHOOT MERISTEMLESS-Like (STM) gene in C. pentagona (Alakonya et al. 2012). STM encodes a KNOTTED-like homeobox transcription factor (TF) with a known role in promoting cytokinin biosynthesis in the shoot apical meristem. Silencing of the STM gene in Cuscuta by the production of small RNA by host plants resulted in reduced haustorial development and increased growth of infected host plants (Alakonya et al. 2012). Next generation transcriptome sequencing technologies provide great opportunities to use advanced, yet inexpensive, genome-scale tools to gain a better understanding of the molecular biology of parasitic plants (Westwood et al. 2010, 2012; Ranjan et al. 2014). A major focus of the Parasitic Plant Genome Project (PPGP; http://ppgp.huck.psu.edu/, last accessed December 18, 2014) is the identification of genes related to haustorial initiation and development through a comparative transcriptomics approach of multiple stages of parasite growth and development in Orobanchaceae, the youngest (estimated at 32 My) and one of the largest and most diverse extant lineages of parasitic plants (Naumann et al. 2013). We focus on three species of parasitic plants in Orobanchaceae that together represent the complete range of parasitic dependence: Triphysaria versicolor (a facultative hemiparasite), Striga hermonthica (an obligate hemiparasite), and Phelipanche aegyptiaca (syn. Orobanche aegyptiaca; an obligate holoparasite) (Westwood et al. 2010, 2012). Also included in this study was a single autotrophic lineage, the genus Linden bergia, which is sister to the rest of the Orobanchaceae (Young et al. 1999). Overall, this project yielded nearly 3 billion sequence reads from more than 30 tissue- and stage-specific as well as normalized whole-plant libraries. Several published studies have already utilized the data generated by this project for focused analyses on a range of topics, including gene targets for host-induced gene silencing (Bandaranayake and Yoder 2013b) and herbicide control (Westwood et al. 2012); host-dependent parasite gene expression at the host–parasite interface (Honaas 2013); genes involved in strigolactone (Liu et al. 2014), karrakin (Nelson et al. 2011; Nelson 2013), and ABA pathways (Lechat et al. 2012) related to parasite seed germination; horizontal gene transfer in the parasites (Zhang, Cowled, et al. 2013; Zhang et al. 2014); evolutionary conservation and loss of genes involved in photosynthetic (Wickett et al. 2011) and symbiotic processes (Delaux et al. 2014); and housekeeping genes for quantitative polymerase chain reaction (qPCR) studies (Fernandez-Aparicio et al. 2013) and molecular markers for systematic research (Eizenberg et al. 2012). However, to date, no comprehensive analysis has been published of the entire data set to identify genes that are likely essential to the parasitic lifestyle and the origin of parasitic processes. The origin of parasitism in plants has been proposed to follow two general mechanisms. The first considers the striking morphological similarity between some parasitic plant haustoria, root nodules, and crown galls; it was thus proposed that parasites may have evolved through endophytic association or horizontal gene transfer of genes from bacteria or other microorganisms that could confer parasitic ability (Atsatt 1973). The second mechanism, termed the endogenous model (Bandaranayake and Yoder 2013a), was that parasitic functions may have evolved through neofunctionalization from plant genes encoding nonparasitic functions. These mechanisms are not necessarily mutually exclusive, and both may have been important to the evolution of parasitism. In this study, we have focused on seeking evidence relevant to the endogenous model for the origin of parasitism. We utilized differential expression (DE) analysis and expression clustering to identify upregulated genes associated with haustorium initiation, development, and physiology. Through identification of a “core set of parasitism genes” shared by multiple species of parasites, our results also shed light on the evolutionary mechanism(s) that led to the origin of the haustorium in the Orobanchaceae. As the haustorium is a novel structure at the core of the parasitic process, comparative analysis of the genes and gene expression patterns of both parasitic and nonparasitic plants enabled us to propose its genetic origins.

Results

Assembly Statistics

For each of the three parasitic plants in this study (T. versicolor, S. hermonthica, and P. aegyptiaca), we generated 11–14 stage-specific libraries (Westwood et al. 2012), plus additional whole-plant normalized libraries using RNA from all developmental stages in each species (fig. 1). Additionally, a whole-plant normalized library of Lindenbergia philippensis was sequenced to represent the nonparasitic sister lineage of the parasites. A grand total of 2,995,494,710 Illumina reads and 3,153,353 Roche 454 GS-FLX reads were generated. Hybrid assemblies combining all sequencing data for each species resulted in unigene numbers ranging from 117,470 in Striga to 131,173 in Triphysaria (table 1). Average unigene length varied between 581 bp (Triphysaria) and 745 bp (Striga), whereas average N50 lengths ranged from 789 to 1,183 bp with the N50 unigene counts ranging from 21,356 to 24,729. To evaluate the completeness of our transcriptome sequence data sets, we examined the frequency of capture of three known conserved sets of plant genes in the transcriptome assemblies, namely the universally conserved orthologs (UCOs) (Kozik et al. 2008; Der et al. 2011; Williams et al. 2014), conserved single copy genes from COSII (Fulton et al. 2002; Wu et al. 2006; Williams et al. 2014), and the set of conserved single copy genes in PlantTribes2 (Wall et al. 2008) (http://fgp.bio.psu.edu/tribedb/10_genomes/, last accessed December 18, 2014). The UCO list was obtained from the Compositae genome project (http://compgenomics.ucdavis.edu/compositae_reference.php, last accessed December 18, 2014) and the COSII gene list was obtained from SolGenomics (http://solgenomics.net/documents/markers/cosii.xls, last accessed December 18, 2014). The single copy gene list containing 970 single copy orthogroups from PlantTribes2.0 (http://fgp.bio.psu.edu/tribedb/10_genomes/, last accessed December 18, 2014) was identified as single copy in the seven angiosperm genomes included in the classification: Arabidopsis thaliana Columbia (version 7), Carica papaya (version 1), Populus trichocarpa (version 1), Medicago truncatula (version 1), Oryza sativa (version 5), Sorghum bicolor (version 1), and Vitis vinifera (version 1). The Ar. thaliana proteins from each of the three conserved single copy gene lists were used as the query in tBLASTn search of each transcriptome assembly. A gene was considered detected if it returned a hit with an E value smaller than 1e-10 and at least 30 amino acids long. Results from this analysis shown in table 2 indicate that gene coverage ranged from at least 90% in Phelipanche (PlantTribes single copy analysis) to 100% (UCO analysis) in Triphysaria combined assemblies. These results suggested that our assemblies have excellent gene coverage and are very likely to capture the large majority of the expressed genes in a transcriptome. Additionally, to validate the accuracy of the de novo transcriptome assemblies, we used reverse transcription (RT)-PCR to amplify a total of 33 contigs spanning a range of assembly sizes in the three parasitic species. The estimated sizes from the amplified cDNAs agree well with the expected sizes from the de novo transcriptome assemblies (R2 for the three species of Triphysaria, Striga, and Phelipanche range from 0.973 to 0.999), suggesting a high degree of accuracy in the de novo assemblies (supplementary fig. S1, Supplementary Material online). Seven of the Triphysaria sequences were selected for validation sequencing and matched precisely the predicted length of the contig and differed from the reference assembly by at most a few single nucleotide polymorphisms, as would be expected for allelic variants in an outcrossing species. To conclude, we have produced high-quality large scale transcriptome assemblies that serve as a valuable resource for comparative studies of parasitic plant gene content and expression.
F

An illustration of stages of each parasitic plant used in the PPGP (Westwood et al. 2012) in this study. Drawings are based on original photographs, as shown in Nickrent et al. (1979), Musselman and Hepper (1986), Zhang (1988), and Rumsey and Jury (1991). Additional sequences from the parasite–host interface (Honaas et al. 2013) were also used to study haustorial-specific gene expression (stage 4). H, host; P, parasite; V, vasculature; R, root; S, shoot; N/A, not applicable.

Table 1.

Transcriptome Assembly Statistics in the Postprocessed Combined Assembly for Each Study Species.

SpeciesAssembly IDNumber of ContigsAssembly Size (Mb)Number of N50 ContigsN50 Contig Length (bp)Mean Contig Length (bp)
Triphysaria versicolorTrVeBC3131,17376.2024,729789580.91
Striga hermonthicaStHeBC3117,47087.5321,3561,183745.17
Phelipanche aegyptiacaPhAeBC5129,45083.8021,5521,010643.48
Table 2.

Transcriptome Gene Capture Statistics in Three Parasitic Species.

Gene SetTotalTrVeBC3TrVeBC3 Proportion (%)StHeBC3StHeBC3 Proportion (%)PhAeBC5PhAeBC5 Proportion (%)
COSII single copy22021698.1821497.2720191.36
PlantTribes2.0 single copy97094997.8495298.1486989.59
UCO357357100.0035699.7235499.16
An illustration of stages of each parasitic plant used in the PPGP (Westwood et al. 2012) in this study. Drawings are based on original photographs, as shown in Nickrent et al. (1979), Musselman and Hepper (1986), Zhang (1988), and Rumsey and Jury (1991). Additional sequences from the parasite–host interface (Honaas et al. 2013) were also used to study haustorial-specific gene expression (stage 4). H, host; P, parasite; V, vasculature; R, root; S, shoot; N/A, not applicable. Transcriptome Assembly Statistics in the Postprocessed Combined Assembly for Each Study Species. Transcriptome Gene Capture Statistics in Three Parasitic Species.

Validation Genes with Known Roles in Orobanchaceae Parasitism

We validated the expression data from RNA sequencing (RNA-seq) using expression profiles of two genes that are known to play a role in parasitism: TvQR1 (Bandaranayake et al. 2010; Ngo et al. 2013) and TvPirin (Matvienko et al. 2001; Bandaranayake et al. 2012; Ngo et al. 2013). Both genes are upregulated in Triphysaria roots exposed to the HIF (quinone 2,6 dimethoxy-1,4-benzoquinone; DMBQ) (stage 2) compared with roots without exposure (stage 1). All BLASTn alignments of the TvQR1 gene with an E value cutoff of e-10 or smaller were used to construct the putative full-length transcript representing TvQR1. The expression level of TvQR1 in each stage was calculated as the combined expression of all the unigene hits within TrVeBC3_12199 trinity component that contributed to the construction of the full-length reference (TrVeBC3_12199.1–TrVeBC3_12199.9). For profiling gene expression we also used three libraries made from host–parasite interfaces of haustoria from the three parasitic plants (Honaas 2013). The interface tissues from the three parasitic plants were targeted by a laser-capture microdissection approach (Honaas 2013; Honaas et al. 2013). Reads from stage 4 interface libraries were mapped onto the combined assemblies to quantify the gene expression in the interface. The RNA-Seq data for the gene TvQR1 showed high and specific expression for root tissue (stage 1 and stage 2) but low expression levels in other tissues (haustoria, seed, and above ground tissue). When root tissue was treated with DMBQ (stage 2), the expression of TvQR1 increased relative to roots without any treatment (stage 1) (fig. 2), which is consistent with results obtained in previous studies (Bandaranayake et al. 2010). These results confirm the expected expression for TvQR1 (fig. 2).
F

Gene expression profiles from RNA-seq data for two previously characterized parasitism genes in Triphysaria (QR1, left, and Pirin, right). The two genes were shown to be upregulated in stage 2 relative to stage 1 by RT-PCR, which was confirmed by RNA-Seq data. The organ/stage of the parasite sampled is shown along the x axis. Labels on the x axis refer to stage (see fig. 1) and “u” means that this facultative parasite was growing “unattached” to any host (i.e., 6.1u means unattached stage 6.1). The y axis gives expression values as FPKM on a log2 scale.

Gene expression profiles from RNA-seq data for two previously characterized parasitism genes in Triphysaria (QR1, left, and Pirin, right). The two genes were shown to be upregulated in stage 2 relative to stage 1 by RT-PCR, which was confirmed by RNA-Seq data. The organ/stage of the parasite sampled is shown along the x axis. Labels on the x axis refer to stage (see fig. 1) and “u” means that this facultative parasite was growing “unattached” to any host (i.e., 6.1u means unattached stage 6.1). The y axis gives expression values as FPKM on a log2 scale. The only significant hit for gene TvPirin in the assembly was contig TrVeBC3_1063.1, which included the full-length coding sequences and 5′- and 3′-untranslated regions. As expected (Matvienko et al. 2001; Bandaranayake et al. 2012), this gene showed the highest expression in stage 1 and stage 2 (root tissue), with the expression in stage 2 (root treated with DMBQ) higher than stage 1 (untreated root) (fig. 2). Postattachment root tissue (6.3) also showed relatively high expression levels for TvPirin, suggesting that this gene is highly expressed in roots. Given the consistency in expression validation as well as assembly validation (supplementary fig. S1, Supplementary Material online), our RNA-Seq assemblies should be able to provide good estimates of gene expression in the species within this study.

Differential Gene Expression and Clustering to Identify Haustorial Genes

A DE analysis for the common stages present in all three parasitic plants (stages 0, 1, 2, 3, 4, 6.1, and 6.2) was performed to identify DE patterns for any pairwise comparison among the seven stages. Next, we conducted two clustering analyses using K-means clustering and self-organizing maps (SOM clustering) to identify clusters of coexpressed genes with high expression in postattachment haustorial stages (3 and/or 4) for each parasitic plant. Clusters of coexpressed genes that exhibited significantly higher gene expression in postattachment haustorial stages were extracted from the K-means cluster analysis for each species. We refer to these upregulated genes in postattachment haustorial stages as “haustorial genes.” A boxplot and expression heat map for each cluster of haustorial genes in each species were used to visualize the specific expression patterns for each of the parasitic plants (fig. 3A and B). DE analyses and clustering approaches were performed for expression of both unigenes and a more inclusive putative transcript definition, the “component-orthogroup” (supplementary data S1 and S2, Supplementary Material online). The latter is defined as a representative sequence for a Trinity component containing all associated unigenes (e.g., splice forms, alleles, subassemblies), so long as they are assigned to the same orthogroup (i.e., clusters of homologous genes representing narrowly defined gene lineages) in the gene family classification used by the (). The unigenes and component-orthogroups identified by SOM clustering are shown in the supplementary section, Supplementary Material online (supplementary data S3, Supplementary Material online).
F

Gene expression clustering boxplots (A) and heatmaps (B) of upregulated genes in postattachment haustorial stages 3 and 4 (haustorial genes) in parasitic Orobanchaceae. (A) One cluster of highest expression in post attachment haustorial stages with K-means clustering in each species (511 in Triphysaria, 958 in Striga, and 126 in Phelipanche). Expression in each stage is represented by a boxplot. The upper whisker of the boxplot indicates the highest expression value for features within each cluster; the lower whisker, the lowest expression value; and the middle line, the median expression. The upper and lower edges of the box represent the 75th and 25th percentile, respectively. Expression of genes in the postattachment haustorial stages is highlighted in green. A description of the focal stages is shown on the lower right. (B) Gene expression heat map of component-orthogroups with upregulated expression in postattachment haustorial (stage 3 and/or 4) stages identified by K-means and hierarchical clustering in Triphysaria, Striga, and Phelipanche. The color-intensity in the heat map represents expression value (log2FPKM).

Gene expression clustering boxplots (A) and heatmaps (B) of upregulated genes in postattachment haustorial stages 3 and 4 (haustorial genes) in parasitic Orobanchaceae. (A) One cluster of highest expression in post attachment haustorial stages with K-means clustering in each species (511 in Triphysaria, 958 in Striga, and 126 in Phelipanche). Expression in each stage is represented by a boxplot. The upper whisker of the boxplot indicates the highest expression value for features within each cluster; the lower whisker, the lowest expression value; and the middle line, the median expression. The upper and lower edges of the box represent the 75th and 25th percentile, respectively. Expression of genes in the postattachment haustorial stages is highlighted in green. A description of the focal stages is shown on the lower right. (B) Gene expression heat map of component-orthogroups with upregulated expression in postattachment haustorial (stage 3 and/or 4) stages identified by K-means and hierarchical clustering in Triphysaria, Striga, and Phelipanche. The color-intensity in the heat map represents expression value (log2FPKM). Genes that are differentially upregulated in developmentally similar stages of haustorial development, and are evolutionarily conserved across species, are likely to play an important role in parasitism. We examined three species in Orobanchaceae that exhibit varying levels of host dependence and photosynthetic ability. These species serve as divergent biological replicates; shared gene sequences that show conserved upregulation in parasitic structures are likely to be important to the parasitic process. We used orthogroups to associate homologous (and putatively orthologous) genes across the Orobanchaceae species in this study (see Materials and Methods, supplementary data S4 and S5, Supplementary Material online). The final list of candidate haustorial genes was defined as the union of orthogroups represented by upregulated unigenes or component-orthogroups from each species. We also identified unique orthogroups and orthogroups present in only two of the three species. Both K-means clustering and SOM clustering were used to identify genes with high and specific expression after attachment to a host (supplementary data S3 and S6, Supplementary Material online). As a result, we identified 185 orthogroups that contained genes (874 unigenes and 488 component-orthogroups) highly expressed in at least two of the species (fig. 4). Forty orthogroups were identified that show their highest level of expression during haustorial development in all three parasitic plants. It is important to note that most of the two-way shared haustorial genes are likely to represent a true set of haustorial genes, because at least 70% of these two-way shared orthogroups also contain genes with increased expression in haustorial stages in the third species (supplementary data S7, Supplementary Material online).
F

Venn diagram illustrating the number of orthogroups with upregulated expression in stage 3 and/or stage 4 (by K-means and SOM) in Triphysaria, Striga, and Phelipanche.

Venn diagram illustrating the number of orthogroups with upregulated expression in stage 3 and/or stage 4 (by K-means and SOM) in Triphysaria, Striga, and Phelipanche.

Shared Haustorial Genes Are Enriched for Proteolysis and Extracellular Region Localization

To determine whether the haustorial genes are enriched for specific molecular processes or biochemical pathways, we took the highly expressed unigenes belonging to an orthogroup shared by at least two species and aligned them with BLASTx to the TAIR database (Lamesch et al. 2012). Best hits in Arabidopsis (E value ≤ e-10) were used as the input for a DAVID enrichment analysis (Huang et al. 2009) for enriched Pfam domains, Gene Ontology (GO) molecular functions (MFs), biological processes (BPs), and cellular components (CCs) (supplementary data S8, Supplementary Material online). All three parasites share enrichment for the Pfam term serine carboxypeptidase. In addition, the haustorial genes in Triphysaria and Striga are significantly enriched for eukaryotic aspartyl protease (AP), peroxidase, and leucine-rich repeat N-terminal domains (table 3 and supplementary data S3 and S8, Supplementary Material online). There was a similar high level of enrichment for eukaryotic AP and leucine-rich repeat N-terminal domains in Phelipanche, but this was not significant after correcting the P value for multiple testing. The smaller number of haustorial genes in Phelipanche limited the power to detect significantly enriched terms. Finally, pectate lyase (PL) was significantly enriched among shared genes in Triphysaria.
Table 3.

Enriched Pfam Domains in the Shared Set of Haustorial Unigenes Identified by Either K-Means or SOM Clustering in Triphysaria, Striga, and Phelipanche.

Pfam TermTriphysaria
Striga
Phelipanche
Fold ChangeBonferroni-Adjusted P ValueFold ChangeBonferroni-Adjusted P ValueFold ChangeBonferroni-Adjusted P Value
Serine carboxypeptidase24.05.08E-07*24.04.26E-08*39.13.58E-05*
Eukaryotic AP39.62.64E-06*40.79.94E-08*41.4NS
Peroxidase22.47.56E-11*14.03.80E-05*NANA
Leucine rich repeat N-terminal domain_26.11.67E-02*7.41.07E-04*6.7NS
FAD_binding_416.13.85E-02*23.16.99E-06*NANA
PL41.22.02E-06*15.9NSNANA

Note.—Significance levels for category enrichment relative to background are given as Bonferroni-adjusted P values. NA means that enrichment information for the particular term is not identified by the test; NS, nonsignificant.

Enriched Pfam Domains in the Shared Set of Haustorial Unigenes Identified by Either K-Means or SOM Clustering in Triphysaria, Striga, and Phelipanche. Note.—Significance levels for category enrichment relative to background are given as Bonferroni-adjusted P values. NA means that enrichment information for the particular term is not identified by the test; NS, nonsignificant. The largest GO BP category in terms of the number of genes (supplementary data S8, Supplementary Material online) is “proteolysis,” represented by genes encoding AP or serine-type peptidase and subtilase, followed by oxidation–reduction processes, such as peroxidases, and protein phosphorylation, such as kinases. There are also two genes involved in transport activity; one is an oligopeptide transporter and the other is a glutamate-receptor protein. Moreover, six genes involved in cell wall modification were identified, including three genes encoding PL or PL-like proteins, one encoding pectin methylesterase inhibitor (PMEI), one encoding cellulase, and one encoding carbohydrate-binding X8 protein. GO MF terms such as “serine-type peptidase activity” and “aspartic-type endopeptidase activity” were significantly enriched (table 4). The KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway terms “phenylalanine metabolism” and “methane metabolism” and “phenylpropanoid biosynthesis” were also significantly enriched in the haustorial upregulated gene set. In addition, other enriched GO terms, such as “cellular response to hydrogen peroxide” and “cellular response to reactive oxygen species (ROS),” support the suggestion by Torres et al. (2006) that ROS may be an important signaling intermediate during the parasitic plant–host plant interaction.
Table 4.

Enriched GO Cellular Component (GO-CC), Biological Process (GO-BP) and Molecular Function (GO-MF), KEGG Pathway, and Tissue Expression Terms among Shared Set of Haustorial Unigenes Identified by Either K-means or SOM Clustering in Triphysaria, Striga, and Phelipanche.

TermTriphysaria
Striga
Phelipanche
Fold ChangeBonferroni-Adjusted P ValueFold ChangeBonferroni-Adjusted P ValueFold ChangeBonferroni-Adjusted P Value
GO-CC: External encapsulating structure5.33.65E-12*5.11.65E-12*3.6NS
GO-CC: Cell wall5.21.69E-11*5.07.09E-12*3.7NS
GO-CC: Extracellular region3.54.53E-11*3.33.97E-10*2.8NS
GO-BP: Proteolysis3.41.92E-06*4.47.51E-14*4.93.88E-05*
GO-BP: Cellular response to hydrogen peroxide18.01.87E-09*10.24.18E-03*NANA
GO-BP: Cellular response to ROS16.17.26E-09*9.18.65E-03*NANA
GO-MF: Serine-type peptidase activity13.34.35E-13*15.89.14E-21*21.93.01E-10*
GO-MF: Aspartic-type endopeptidase activity15.02.59E-06*18.37.31E-11*15.7NS
GO-MF: Electron carrier activity3.66.71E-04*2.84.98E-02*NANA
KEGG: Phenylalanine metabolism14.28.68E-10*10.67.95E-06*NANA
KEGG: Methane metabolism14.09.99E-10*10.58.77E-06*NANA
KEGG: Phenylpropanoid biosynthesis10.91.62E-08*8.26.07E-05*NANA
Tissue-specificity: Specifically expressed in root cap cells49.77.06E-0147.27.68E-01NANA
(NS)(NS)
Tissue-specificity: Expressed in flowers, but not in leaves33.28.40E-01NANANANA
(NS)

Note.—Significance levels for category enrichment relative to background are given as Bonferroni-adjusted P values. NA means enrichment information for the particular term was not identified by the test; NS, nonsignificant.

*P < 0.05.

Enriched GO Cellular Component (GO-CC), Biological Process (GO-BP) and Molecular Function (GO-MF), KEGG Pathway, and Tissue Expression Terms among Shared Set of Haustorial Unigenes Identified by Either K-means or SOM Clustering in Triphysaria, Striga, and Phelipanche. Note.—Significance levels for category enrichment relative to background are given as Bonferroni-adjusted P values. NA means enrichment information for the particular term was not identified by the test; NS, nonsignificant. *P < 0.05. Significant enrichment of genes with the GO CC category “external encapsulating structure,” “cell wall,” and “extracellular regions” were found in both Triphysaria and Striga (table 4). A concordant pattern of enrichment was observed for these categories in Phelipanche, though, like the domain analysis, the conservative test, corrected for multiple comparisons, did not find these enrichments significant in this species. Future research with experimental evidence is needed to determine whether the extracellular predictions for these candidate haustorial genes hold true or not, and whether the proteins they encode affect the parasite–host interactions.

Two Examples Illustrating Haustorial Gene Expression Evolution

To further understand the evolutionary origin of some of these haustorial-related genes, we examined patterns of gene expression in orthologs in three nonparasitic model plant or crop species: Thale cress (Ar. thaliana), barrel medic (M. truncatula), and tomato (Solanum lycopersicum). To illustrate this approach, we present a detailed analysis of the PL and peroxidase gene families, which were identified by the presence of enriched Pfam domains in Triphysaria (PL) or in both Triphysaria and Striga (peroxidase). Following an early, possibly angiosperm-wide duplication in the PL gene family (fig. 5A), a subsequent gene duplication gave rise to two paralogous gene lineages shared by Mimulus and members of Orobanchaceae. One paralogous gene in parasitic Orobanchaceae and their orthologs in Arabidopsis and tomato show principal expression levels in floral tissues (supplementary data S21, Supplementary Material online; Goda et al. 2004; Sun and van Nocker 2010); the other paralog exhibits abundant expression in the haustorium of parasitic Orobanchaceae (fig. 5A). The more recent gene duplication in a common ancestor of Mimulus and Orobanchaceae gave rise to two Striga genes, one having peak expression in haustorial tissue and the other in flower (fig. 5A). Conservation of principal gene expression in floral tissues of nonparasites and the maintenance of a floral-expressed ortholog in Striga strongly suggests that the haustorial expression of this gene was co-opted from an ancestral gene acting in flowers, and was recruited to haustoria following gene duplication through regulatory neofunctionalization. Alternatively, because the ancestral gene may have been expressed in both floral tissue and root, a two-step process in which the ancestral gene first subfunctionalized and then shifted to haustoria would be an example of subneofunctionalization (He and Zhang 2005).
F

Gene family phylogeny and gene expression profile of two orthogroups showing: (A) Shift of gene expression from flower to haustorium following gene duplication, and (B) shift of gene expression from root to haustorium without gene duplication. Gene duplication events relevant to the origin of parasite genes are shown on the tree with blue rectangular bars. Sequence names are color-coded to represent different lineages: Basal angiosperms (blue), monocots (yellow), basal eudicots (purple), rosids (red), and asterids (green). Parasite genes are highlighted with yellow background and green foreground. The expression of parasite genes and nonparasite genes in Arabidopsis, Medicago, and tomato are shown using heat maps. Green and red lines are connecting genes from the phylogeny to the heatmap for nonparasitic genes (except in Arabidopsis orthogroup 1131 where genes are labeled as PLLs) and parasitic genes. The tissue with the highest expression was labeled in red. The color intensity in heatmaps refers to expression measurements with an RNA-Seq approach in parasites and tomato, and with microarrays in Arabidopsis and Medicago. Int or int means “interface” tissue of haustoria (∼stage 4) (Honaas 2013), and hau means “haustoria.”.

Gene family phylogeny and gene expression profile of two orthogroups showing: (A) Shift of gene expression from flower to haustorium following gene duplication, and (B) shift of gene expression from root to haustorium without gene duplication. Gene duplication events relevant to the origin of parasite genes are shown on the tree with blue rectangular bars. Sequence names are color-coded to represent different lineages: Basal angiosperms (blue), monocots (yellow), basal eudicots (purple), rosids (red), and asterids (green). Parasite genes are highlighted with yellow background and green foreground. The expression of parasite genes and nonparasite genes in Arabidopsis, Medicago, and tomato are shown using heat maps. Green and red lines are connecting genes from the phylogeny to the heatmap for nonparasitic genes (except in Arabidopsis orthogroup 1131 where genes are labeled as PLLs) and parasitic genes. The tissue with the highest expression was labeled in red. The color intensity in heatmaps refers to expression measurements with an RNA-Seq approach in parasites and tomato, and with microarrays in Arabidopsis and Medicago. Int or int means “interface” tissue of haustoria (∼stage 4) (Honaas 2013), and hau means “haustoria.”. In contrast to the PL gene, a peroxidase gene family shows a shift of gene expression from roots in all related nonparasitic model species to haustorial tissue of parasitic plants without gene duplication (fig. 5B). The peroxidase gene highly expressed in roots of Arabidopsis was characterized to be involved in the production of ROS (Kim et al. 2010), stress response (Llorente et al. 2002), and pathogenic responses (Ascencio-Ibanez et al. 2008).

Identifying Haustorial Initiation Genes and “Parasitism Genes”

Genes that are upregulated in response to the haustorial initiation factor (HIF) DMBQ might also play an important role in parasitism. For example, TvQR1, which is upregulated following DMBQ exposure, encodes a quinone reductase that acts early in the HIF signaling pathway (Bandaranayake et al. 2010). Stage 1 in the tiny-seeded Striga and Phelipanche (Westwood et al. 2012) consists of whole germinated seedlings, whereas in the much larger-seeded species (Triphysaria), stage 1 comprised excised radicles of germinated seedlings. Upon treatment of stage 1 seedlings or roots with DMBQ or a host root, the plants progress to stage 2 (haustorial initiation). DESeq was used to identify “haustorial initiation genes” (HIGs), defined here as unigenes and component-orthogroups that have significantly higher gene expression in stage 2 compared with stage 1 (supplementary data S9, Supplementary Material online). Triphysaria HIGs were enriched for GO MF terms such as “magnesium ion binding,” “calcium ion binding,” “calcium-transporting ATPase activity,” “calcium ion transmembrane transporter activity,” and “cation-transporting ATPase activity” (supplementary data S10, Supplementary Material online). The co-occurrence of putative functions of calcium ion transport and ion binding with ATPase activity suggest a possible involvement of a Ca2+ATPase (Brini and Carafoli 2011) in the regulation of the haustorium initiation pathway. In Striga, however, a distinct set of enriched GO MF terms was detected, including “nucleotide binding,” “ATPase activity,” and “ATP-dependent helicase activity.” This may suggest a different picture associated with HIF exposure between facultative and obligate parasitic plants. By combining the list of shared upregulated genes in haustoria (fig. 4 and supplementary data S5, Supplementary Material online) with HIGs (fig. 6 and supplementary data S11, Supplementary Material online), we define a joint set of genes in these parasites that we call parasitism genes. Parasitism genes are defined as having enhanced expression in parasitic structures, and likely play a role in parasite biology, as opposed to parasite-specific sequences which are defined only by their joint presence in parasitic and absence from nonparasitic plants (see below). In total, we identify 1,809 parasitism genes in these three parasitic species that are assigned to 298 orthogroups that were shared by at least two species. As most of the parasitism genes shared by two of the parasitic plants also show an upregulated pattern in the third species (supplementary data S7, Supplementary Material online), this set of genes upregulated in parasitic processes constitutes a shared set of parasitism genes in Orobanchaceae. There are almost 300 gene families shared by at least two species that have genes upregulated in parasitic processes.
F

HIGs: Orthogroups containing genes upregulated in root or seedlings following HIF exposure (stage 2) compared with germinating seedlings (stage 1) in Triphysaria, Striga, and Phelipanche. The numbers in parenthesis are the corresponding set of orthogroups when including genes that are highly expressed in stage 1 (relative to any other stages of stage 0, 2, 3, 4, 6.1, and 6.2—by K-means clustering) of Phelipanche.

HIGs: Orthogroups containing genes upregulated in root or seedlings following HIF exposure (stage 2) compared with germinating seedlings (stage 1) in Triphysaria, Striga, and Phelipanche. The numbers in parenthesis are the corresponding set of orthogroups when including genes that are highly expressed in stage 1 (relative to any other stages of stage 0, 2, 3, 4, 6.1, and 6.2—by K-means clustering) of Phelipanche. The number of orthogroups containing HIGs varied widely in the three parasites (2,285 in Striga, 17 in Phelipanche, and 249 in Triphysaria). Although the low number of HIGs in Phelipanche could be a result of lower power to detect DE (fewer replicated libraries and a smaller total volume of data; supplementary data S12, Supplementary Material online), we also observed that a large number of genes are highly upregulated in Phelipanche stage 1, suggesting that Phelipanche may automatically begin haustorial initiation without HIF exposure, which was reported by a previous study where haustorial initiation, as an exception, does not require the application of HIF (Joel and Losner-Goshen 1994a). To examine this possibility, we expanded the list of HIGs by including genes highly expressed in stage 1 (a cluster of upregulated expression in stage 1 relative to any other stages 0, 2, 3, 4, 6.1, and 6.2) of Phelipanche and performed another Venn diagram analysis. In this expanded set, there are eight additional orthogroups including HIGs that were shared by all three parasitic plants as well as an additional 65 orthogroups shared by two species (fig. 6—number in parenthesis). An examination of the eight orthogroups revealed genes coding for the following functions: Cytochrome P450, heat shock protein 70, ribosomal protein, peptidase C48, oleosin, ATPase, pyruvate kinase, and integrase. This is consistent with the possibility that Phelipanche starts haustorial initiation at an earlier stage (stage 1) than other two hemiparasites. Alternatively, because haustorium development in response to HIFs in Phelipanche is not as evident as in Striga or Triphysaria (Joel and Losner-Goshen 1994a), there may actually be fewer HIGs in Phelipanche.

A Majority of Parasitism Genes Evolved through Gene Duplication

We next explored the possibility that these putative parasitism genes evolved through gene duplication. We utilized an automated tree-building pipeline (Jiao et al. 2011; ) to construct gene families for 114 orthogroups containing parasitism genes with a manageable number of genes to score for gene duplications (orthogroup number ranging from 1,000 to 9,999). Each orthogroup contained homologs from 22 other plant species used in the construction of the classification (), plus the genes from Orobanchaceae identified here. Manual inspection of the gene tree phylogenies was performed to find parasite genes that may have been missing in one or more “whole plant” combination assemblies. We also manually examined each alignment (and resulting tree) for frame shift and translation errors that could result in extremely long (greater than 10× others) branches. These errors were corrected when possible, or the sequence was eliminated from the matrix to avoid spurious topologies. Together, these gene family phylogenies give us a broad view of how parasitism genes evolved. Of the 114 orthogroups (supplementary data S13, Supplementary Material online) containing parasitism genes, gene duplications were detected in 58 trees at ≥50% bootstrap support and 38 trees at ≥80% bootstrap support value. By mapping the duplication events observed in parasitic plants onto phylogenetic species trees, and examining bootstrap support values for key supporting nodes, we determined when the putative parasite paralogs were duplicated (supplementary data S14, Supplementary Material online). A detailed scheme illustrating various duplication events for when parasitism genes were duplicated is shown in supplementary data S15, Supplementary Material online. The greatest proportion of duplicated gene families supported a gene duplication event that occurred in a common ancestor of Mimulus and Orobanchaceae (but not seen in Solanaceae, other asterids, or rosids) (table 5).
Table 5.

Phylogenetic Placement of Gene Duplications Observed in Gene Families with Shared Parasitism Genes.

Duplicated LineagesOrthogroups with Duplication
BS ≥ 80%BS ≥ 50%
Parasite-wide9 (23.68)13 (22.41)
Orobanchaceae-wide4 (10.53)9 (15.52)
Orobanchaceae+Mimulus21 (55.26)28 (48.28)
EuAsterid1-wide1 (2.63)1 (1.72)
Asterid-wide1 (2.63)1 (1.72)
Core-eudicot-wide5 (13.165)8 (13.79)
Eudicot-wide3 (7.89)8 (13.79)
Total38 (100)58 (100)

Note.—Values in parentheses are given in percentage.

Phylogenetic Placement of Gene Duplications Observed in Gene Families with Shared Parasitism Genes. Note.—Values in parentheses are given in percentage. As with the parasite genes in general, most of the duplicated parasitism genes detected in this analysis were annotated with terms related to peptidase activity (such as AP, serine carboxypeptidase) and cell wall modification processes (PL, PMEI, carbohydrate-binding X8 protein, and glycosyl hydrolase). In addition, three TFs (homeodomain-like TF, ethylene responsive TF, and lateral organ boundaries (LOB) domain-containing protein), genes with transporter activity (cationic amino acid transporter, major facilitator family protein, NOD26-like intrinsic protein, and an oligopeptide transporter), a peroxidase, and a leucine-rich repeat-containing protein were also derived from scorable gene duplications.

Regulatory Neofunctionalization and Origin of the Haustorium from Root and Flower

Tissue Expression Clustering

To obtain a global view of transcriptional profiles throughout parasite growth and development in each species, expression values for each unigene were clustered by tissue and stage expression levels using complete linkage and correlation distances using the pvclust routine in R (Racine 2012) (supplementary data S16, Supplementary Material online). The expression clustering in each of the three parasitic plants (fig. 7) shows that overall gene expression from vegetative and reproductive above-ground tissues is quite different from the below-ground structures. In both Striga and Triphysaria, above-ground stage 6.1 (vegetative structures) clustered with above-ground stage 6.2 (reproductive structures; floral buds), and were separated from the remaining tissues with 100% bootstrap support. In contrast, pre-emergent shoots occurring underground (stage 5.1) in Phelipanche clustered with above-ground shoots (stage 6.1), and the two shoot transcriptomes clustered with floral buds (stage 6.2). It is notable that Phelipanche and Striga both produce pre-emergent shoots (stage 5.2), but the overall expression patterns of this stage in the two species is somewhat different. The fact that Striga pre-emergent shoots (stage 5.1) do not cluster with emergent shoots (stage 6.1), but are more similar to other below-ground stages (haustorium—stage 4 and pre-attachment roots—stage 5.2), may be due to the fact that photosynthetic activity in Striga shoots only becomes active after emergence, whereas Phelipanche is not capable of photosynthetic activity at all.
F

Overall similarity of transcriptional profiles of all stages in three parasites. Numerical values represent supports as estimated by the approximately unbiased (on left, in red) and bootstrap (on right, in blue) as described in the Materials and Methods. Clustering was performed with complete linkage and correlation distance. Haustoria tissues (from stages 3 and 4) are labeled in green, whereas root tissues (from stages 1, 2, and 5.2) are labeled in orange.

Overall similarity of transcriptional profiles of all stages in three parasites. Numerical values represent supports as estimated by the approximately unbiased (on left, in red) and bootstrap (on right, in blue) as described in the Materials and Methods. Clustering was performed with complete linkage and correlation distance. Haustoria tissues (from stages 3 and 4) are labeled in green, whereas root tissues (from stages 1, 2, and 5.2) are labeled in orange. Cluster analysis (fig. 7) also shows that in all three species haustorial gene expression is overall most similar to root expression. In Triphysaria and Phelipanche, expression patterns from both stage 3 and stage 4 haustorial tissues cluster with roots. (Roots from Triphysaria were taken from germinated seedlings [stages 1 and 2], whereas roots for Phelipanche were from the late postattachment stage, but prior to shoot emergence from the soil [stage 5.2].) In Striga, a late stage 4 haustorial tissue is most similar to roots prior to the above-ground emergence of shoots, a scenario similar to Phelipanche.

Expression of Orthologs of Parasite Genes in Nonparasitic Models

The pattern we see of haustorial expression being most similar to root is consistent with the longstanding hypothesis that the haustorium was derived from a modified root (Kuijt 1969; Musselman and Dickison 1975; Joel 2013). However, it is also possible that individual genes functioning in the haustorium have been recruited from genes normally expressed in other plant organs. To investigate this scenario, we compared gene expression of candidate parasitism genes with extensive gene expression data from multiple tissues and organs of Ar. thaliana, M. truncatula, and tomato to examine the evolution of expression patterns across large evolutionary distances. When we trace the haustorial gene expression back to orthologous genes in nonparasitic plants, we found significantly higher organ-specific expression in root and floral tissue, than in leaf, seed, or hypocotyl (fig. 8 and supplementary data S22, Supplementary Material online).
F

Haustorial genes in parasitic species were recruited from root, flower, and other tissues. Values on the y axis show the number of orthogroups containing haustorial genes as identified from expression analysis of nonparasitic model species Arabidopsis, Medicago, and tomato. Tissues on the x axis represent the principally expressed tissue for orthologs of upregulated haustorial genes in Arabidopsis (light black), Medicago (gray), and tomato (black). Floral tissues are highlighted with stars on top of the bars.

Haustorial genes in parasitic species were recruited from root, flower, and other tissues. Values on the y axis show the number of orthogroups containing haustorial genes as identified from expression analysis of nonparasitic model species Arabidopsis, Medicago, and tomato. Tissues on the x axis represent the principally expressed tissue for orthologs of upregulated haustorial genes in Arabidopsis (light black), Medicago (gray), and tomato (black). Floral tissues are highlighted with stars on top of the bars. In all three nonparasitic model plant species, the orthologs of haustorial genes are expressed most highly in root and floral (or fruit) tissues, suggesting that these were the major sources of genes recruited to the haustorium. For both Medicago and tomato, root is the most frequent source, which is consistent with the similarity between root and haustorial expression in the parasites (fig. 7). In Arabidopsis, orthologs of haustorial genes are also commonly upregulated in roots, but even more are upregulated in pollen. It is possible that the slightly different picture obtained from the three nonparasitic plants might be due to differences in tissue sampling in the nonparasitic model species. For example, floral tissue sampling is less extensive in tomato and Medicago than in Arabidopsis.

Parasitism Genes Show Signatures of Adaptive Evolution or Relaxed Constraint in Parasite Lineages

To examine whether altered selection patterns play a role in the evolution of parasitism, we utilized the branch model in PAML for hypothesis testing. We labeled the parasite genes as the foreground and the remaining genes as the background, and then identified genes that show accelerated evolution (dN/dS ratio) in parasitic lineages compared with the background. We focused on parasitism genes identified in our analyses. The branch model implemented in PAML was used to identify orthogroups that show a significantly higher dN/dS ratio in parasitic lineages compared with nonparasitic lineages. Twenty-seven orthogroups were found to have greater dN/dS in parasitic lineages compared with nonparasitic lineages, whose GO BPs include proteolysis, cell wall modification, oxidation–reduction process, transport, protein glycosylation, cytokinin metabolic process, and ubiquitin-dependent protein catabolic process (supplementary table S1, Supplementary Material online). To examine whether there are sites that have evolved under positive selection, the branch-site model was performed on orthogroups that show an elevated dN/dS ratio in foreground parasite lineages relative to background nonparasitic lineages. Nine orthogroups were found to contain sites under positive selection. They include two orthogroups encoding AP and one orthogroup each encoding serine carboxypeptidase, expansin, glycosyl transferase, pectin methyltransferase inhibitor, PAR1 protein, and C2H2, and C2H2 zinc finger family protein (supplementary data S17, Supplementary Material online), respectively. We then compared the dN/dS ratio of haustoria-specific genes with nonhaustorial-specific genes. The dN/dS ratio was calculated by selecting orthologous pairs across three different parasitic species by best BLAST hit based on an E-value cutoff of e-10 (PaSh: between P. aegyptiaca and S. hermonthica, PaTv: between P. aegyptiaca and T. versicolor TvSh: between T. versicolor and S. hermonthica). dN and dS were calculated separately by codeml in PAML (supplementary data S18, Supplementary Material online). The distribution for genome-wide dN/dS values was represented with a symmetric violin plot from each pairwise species comparison. This overall distribution was made by calculating the dN/dS ratio for the orthologous pairs from all unigenes from each species pair. The distributions of all haustorial gene pairs (in red) and a randomly chosen equally sized set of nonhaustorial gene pairs (in blue) were represented by a dotplot and a density plot. In all three cases, the peaks of the density plots for the haustorial genes are above the peak for nonhaustorial genes. The haustorial genes exhibit significantly greater dN/dS ratio compared with the nonhaustorial genes for all three pairwise species comparisons (Wilcoxon rank, P value < 0.01) (fig. 9).
F

Haustorial genes show evidence of adaptive selection or relaxed selective constraint. Symmetric violin plots show the genome-wide distributions of dN/dS values for comparisons of Phelipanche aegyptiaca (Pa), Striga hermonthica (Sh) and Triphysaria versicolor (Tv). Red dots represent haustorial genes shared by all three species, whereas blue dots represent randomly selected nonhaustorial genes from the relevant species. The density plots colored in red and blue represent the frequency distributions of the individual dN/dS values as seen in the dot distributions of haustorial and nonhaustorial genes.

Haustorial genes show evidence of adaptive selection or relaxed selective constraint. Symmetric violin plots show the genome-wide distributions of dN/dS values for comparisons of Phelipanche aegyptiaca (Pa), Striga hermonthica (Sh) and Triphysaria versicolor (Tv). Red dots represent haustorial genes shared by all three species, whereas blue dots represent randomly selected nonhaustorial genes from the relevant species. The density plots colored in red and blue represent the frequency distributions of the individual dN/dS values as seen in the dot distributions of haustorial and nonhaustorial genes.

The Majority of the Parasite-Specific Sequences Have Unknown Functions

The transcriptome assemblies allowed us to identify parasite-specific sequences (Yoshida et al. 2010), some of which may be associated with parasitic functions. This was done by building a secondary OrthoMCL orthogroup classification of all of the genes and transcripts that were not assigned to the initial orthogroup classification of 22 plant genomes. In addition to the singleton genes from the 22 genomes, we used in the secondary classification all of the unassigned transcripts from the three parasitic species and from Lindenbergia, Lactuca, and Helianthus. This strategy, which includes multiple nonparasitic lineages closely related to the parasites, allows a highly sensitive means of distinguishing sequences found only in the parasitic Orobanchaceae. We identified 84 novel orthogroups that contain sequences from all three parasitic Orobanchaceae species, but lack sequences from any nonparasitic plant (supplementary table S2, Supplementary Material online). 178, 180, and 139 unigenes were found in these orthogroups from Triphysaria, Striga, and Phelipanche, respectively. The large majority of these sequences had no significant BLAST alignments to any of the nonparasitic species, whereas a few of the predicted peptide sequences (6, 18, and 13 from Triphysaria, Striga, and Phelipanche, respectively) had hits to genes of unknown function in the annotation databases. Most of the significant hits in the annotation databases corresponded to sequences that were transposon and retrotransposon related, such as GAG-preintegrase domain protein and retrotransposon gag protein (supplementary table S2, Supplementary Material online). A sequence with homology to a homing intron endonuclease with two LAGLIDADG motifs (Belfort and Roberts 1997) was also among the sequences with annotations. One orthogroup contained sequences with distant homology to genes annotated as mitochondrial aconitase with a putative role in mitochondrial oxidative electron transport (Yan et al. 1997). We also obtained the stage-specific expression profiles for each of parasite-specific genes. Six and four of the parasite-specific unigenes in Striga and Triphysaria were also among the list of significantly upregulated haustorial genes (supplementary table S2, Supplementary Material online). In addition, three parasite-specific orthogroups contained genes from all three parasite species that exhibited a pattern of increased expression in the haustorial postattachment penetration stages (supplementary table S2, Supplementary Material online).

Discussion

The PPGP has used large-scale transcriptome sequencing to interrogate multiple stages of parasite growth and development of three related parasitic plants spanning a wide range of parasite ability, enabling an integrated analysis of genes upregulated in parasitic processes in Orobanchaceae. The large comparative framework has made it possible to identify, for the first time, a set of genes we believe are essential or core to parasitism. These candidate parasitism genes will be a valuable resource for future functional studies as we strive to understand the genetic changes that led to the parasitic lifestyle, as well as those that resulted from the transition to heterotrophy. Among the core parasitism genes are specific members of gene families encoding cell wall modifying enzymes (cellulase, PLs, glycosyl hydrolases, and pectin methylesterase), and peroxidase enzymes, proteins that are known to be involved in the parasite invasion process (Singh A and Singh M 1993; Antonova and TerBorg 1996; Losner-Goshen et al. 1998; Pérez-de-Luque 2013). We also identified a variety of genes (encoding proteases, transporters, regulatory proteins [TFs and receptor protein kinases], and others, including many genes of unknown function) that are coexpressed in parasitic stages and may be important in haustorial development and function. Because homologs of these haustorially expressed genes encode proteins also functioning in nonparasitic plants, this supports the endogenous mechanism for the origin of parasitism (Bandaranayake and Yoder 2013a). For this collection of genes, the evolution of regulatory sequences resulting in novel expression in the haustorium was likely essential to the evolution of parasitic functions. By expanding our analyses to orthologous genes in nonparasitic model plants, we have gained insights into the evolution of parasitism and the source of genes that shifted expression to parasite tissues, and presumably function there. Although some genes have gained haustorial expression in the absence of detectable gene duplication, a majority of the parasitism genes originated following a gene duplication event.

Cell Wall Degradation Enzymes and the Haustorium

Enzymatic degradation of the host cell walls has been suggested to be important in the penetration of the parasite across the host root cortex as it attempts to reach and connect with host vascular tissues (Kuijt 1977). Baird and Riopel (1984) observed that the intrusive cells at the tip of the penetration peg of the haustorium contained a densely staining cytoplasm, indicative of high levels of cell wall hydrolytic activity. Additionally, early histological studies observed that during penetration of the host endodermis by the haustoria of P. aegyptiaca, there was the dissolution of the middle lamella between host cell walls (Joel and Losner-Goshen 1994b) and degradation of the cutin of the Casparian strips (Joel et al. 1998). Cell wall degrading enzymes such as cellulase, polygalacturonase, and xylanase were found in the tubercles of P. aegyptiaca (Singh A and Singh M 1993), suggesting that they play a role in the penetration process necessary for establishing haustorial connections with the host vasculature (Pérez-de-Luque 2013). In the enriched set of upregulated prehaustorial and haustorial genes in dodder (C. pentagona), many genes encoding cell wall modifying enzymes were found including PLs, pectin methylesterase, cellulases, and expansins (Ranjan et al. 2014). In our study, four glycosyl hydrolase and five PL genes were upregulated in haustorial tissues in at least two species of Orobanchaceae. GO enrichment analysis of the CC terms identified cell wall and extracellular localization annotation terms as being significantly enriched among the upregulated haustorial genes, suggesting that proteins encoded by these genes tend to be secreted where they could impact cell wall integrity of the parasite or the host. Consistent with this idea was the evidence of disintegration of the middle lamella in S. gesnerioides attacking cowpea (Reiss and Bailey 1998). Glycosyl hydrolases were shown to have a role in hydrolysis and degradation of structural or storage polysaccharides, including cellulose and hemicellulose (Henrissat et al. 1995). Pectic enzymes have long been recognized as important proteins in cell wall loosening or disassembly. Almost all cell wall penetration processes, including pollen tube growth and bacterial or fungal pathogenic invasion, involve the modification of pectins that are integral for cell wall stability. For instance, studies reported the role of pectic enzymes in degrading the plant cell wall during the invasion process by bacterial or fungal pathogens (Delorenzo et al. 1991; Volpi et al. 2011). Similarly, a recent study identified a PL to be required for root infection by rhizobia during nodulation (Xie et al. 2012). Additional evidence also supports a direct role of PLs in loosening of the cell wall in fruit ripening (Marin-Rodriguez et al. 2002). Thus, it is likely that PLs play a role in loosening the host cell wall for invasion by parasitic plants. Immunological detection of pectin methyl esterase (PME) at the penetration site and demethylated pectins at the cell walls adjacent to the intrusive cells of Orobanche (Losner-Goshen et al. 1998) implies a role for pectin degradation in the haustorial penetration process. Three orthogroups (orthogroup 2875, 6176, and 19181) encoding putative PMEIs were identified to contain genes that were specific to haustorial tissue in at least two species of Orobanchaceae. It would be interesting to determine whether PMEs and PMEIs have distinct roles in parasite–host interactions. A previous analysis revealed another gene involved in cell wall modification, a beta-expansin, which was highly expressed in the haustorial interface between T. versicolor and its grass family host (Honaas et al. 2013).

Proteases, Transporters, and the Haustorium

Genes involved in proteolysis, largely proteases, and proteinases, account for a great proportion of transcripts upregulated in the haustorium and shared by at least two of the parasite species that we investigated. The upregulated haustorial genes identified in this study include four genes encoding subtilisin-like serine protease similar to those required for virulence in bacterial pathogens (Kennan et al. 2010). In addition, a subtilisin-like protein from soybean was reported to activate defense-related genes (Pearce et al. 2010). In nonparasitic plants, serine proteases often play a role in various processes including protein degradation/processing, hypersensitive response, and signal transduction (Antao and Malcata 2005), but what roles they take on in Orobanchaceae parasites is not yet clear. In addition to serine protease, the eukaryotic APs are also enriched among upregulated haustorial genes. APs, a large gene family with members present in all living organisms, play central roles in protein degradation, processing, and maturation (Chen et al. 2009). Plant APs are expressed in various organs including seed, root, grain, leaf, and flower (Chen et al. 2009). Also they play a role in seed germination, where they degrade seed-storage proteins to provide amino acids to growing plants (Higgins 1984). Other studies identified APs of blood-feeding malaria parasites to play a role in degrading hemoglobin proteins to amino acids for nutrition (Brinkworth et al. 2001). In addition, a marked expansion within the AP gene family was found in the xylem-feeding hemiparasites, Triphysaria and S. hermonthica (Dorr 1997; Neumann et al. 1999), but not in the phloem-feeding obligate parasite, Phelipanche (Aly et al. 2011), which has a lower rate of nutrient uptake from the xylem stream, suggesting that these proteins may play a pivotal role in nutrient mobilization only in the hemiparasites. Our analyses indicate that four APs show peak expression in haustorial tissue, whereas their orthologs in Arabidopsis and tomato (or paralogs in the parasitic plants) have peak expression in root and flower (supplementary fig. S2, Supplementary Material online). This provides another line of evidence for gene recruitment to haustorial function, and suggests that this has occurred through regulatory neofunctionalization. A recent study reported that a rice AP plays an indispensible role in pollen tube germination and growth, and that loss of function results in reduced male fertility (Huang et al. 2013). In addition, a secretome analysis (Kall et al. 2007) of predicted peptides for upregulated haustorial genes revealed that haustorial genes are more likely than nonhaustorial genes to be extracellularly localized or contain a signal peptide structure (supplementary data S19, Supplementary Material online). The fact that upregulated haustorial genes are enriched for proteases with signal peptides suggests that the evolution of parasitism may be associated with an expansion of the suite of secreted proteases to aid parasite attack and/or feeding Analysis of selective constraints showed that proteases with expression specific to haustoria in the parasites show a greater dN/dS as compared with their orthologous genes in nonparasitic plants. The greater dN/dS ratio for these upregulated haustorial proteases suggests either a relaxation of purifying selection or adaptive evolution of these protease-encoding genes associated with the evolution of parasitism. The fact that particular sites were indicated as evolving adaptively, especially in the functional domains, provides support for the latter hypothesis (supplementary data S17, Supplementary Material online). We also found five genes encoding transporters upregulated in the haustorium that are shared by at least two species: One ABC transporter, two oligopeptide transporters, one zinc transporter, and one glutamate transporter. Upregulated genes encoding transporters including sugar transporter, amino acid transporter, and ammonium transporter were also identified to be enriched in haustorial tissue of the parasite Cuscuta (Ranjan et al. 2014). Interestingly, S. hermonthica infection has been shown to increase amino acid levels in xylem sap of its Sorghum host, with glutamate being the predominant form of translocated nitrogen (Pageau et al. 2003). The assimilation of host 15N-labeled nitrate into the parasite (Pageau et al. 2003) provided evidence for the potential role of a glutamate transporter in nitrogen translocation between the host and parasite.

Gene Duplication and Regulatory Neofunctionalization—Origin of Parasitism

The identification of haustorium genes allowed us to gain new insights into the evolution of parasitism. We have used phylogenetic analysis to show that a majority of the genes with a putative role in parasite functions arose by gene duplication. Most of the duplications occurred in a nonparasitic common ancestor of the parasitic Orobanchaceae species and Mimulus (a nonparasitic plant in the related nonparasitic family Phrymaceae) (Schaferhoff et al. 2010; Refulio-Rodriguez and Olmstead 2014). This suggests that either multiple independent gene duplications, or a WGD event occurring before the divergence of Mimulus and Orobanchaceae (Wickett et al. 2011), may have resulted in the diversification of genes important to haustorial development, and ultimately contributed to the rise of parasitic plants. In contrast, relatively few of the parasitism genes arose through duplications occurring in a more recent common ancestor of just the parasites or Orobanchaceae. The fact that the gene duplications that produced parasitism genes occurred in a nonparasitic ancestor, well before the origin of the parasitic Orobanchaceae about 32 Ma (Naumann et al. 2013), is consistent with the idea that gene duplication does not immediately give rise to novel functions, as described recently by the WGD Radiation Lag-Time Model (Schranz et al. 2012). We mapped expression data from multiple stages of parasite development onto parasite genes and found that under most circumstances, the two copies derived from a gene duplication event show different expression profiles. In addition, we interrogated the expression profiles of orthologous genes from tomato, Medicago, and Arabidopsis. By comparing the parasite duplicate’s expression with orthologous gene expression in these related nonparasitic plants, we gained insight into how parasitism genes evolved, both in gene sequence and in gene expression. The fact that these parasitism genes shift their expression from root or flower in related nonparasitic plants to haustoria of parasitic plants, through gene duplication or otherwise (fig. 7 and supplementary fig. S2, Supplementary Material online), supports inferences of neofunctionalization in the evolution of parasitism (Conant and Wolfe 2008; Innan and Kondrashov 2010). While investigating the evolution of parasitism genes, we also found that a number of them play a role in symbiotic nodulation in nonparasites such as genes encoding ethylene responsive factor (ERF) TF (Vernie et al. 2008), oligopeptide transporter (Nogales et al. 2009), peroxidase, and pectinesterase inhibitor (Young et al. 2011; Zouari et al. 2014), suggesting possible parallels between the evolution of parasitism and that of mutualism, both of which involve invasion of host tissues. Similar evidence that P. aegyptiaca induced upregulation of genes involved in nodulation in Lotus japonicus supports this idea (Hiraoka et al. 2009).

Origin of the Haustorium Involves Co-Option of Root and/or Flower Genes

The results of this study shed light on the origin of haustoria in Orobanchaceae, where molecular phylogenetic studies have identified the nonparasitic Lindenbergia as sister to the parasitic Orobanchaceae and are thus consistent with a single origin of haustorial parasitism in Orobanchaceae (dePamphilis et al. 1997; Olmstead et al. 2001; Schneeweiss et al. 2004; Bennett and Mathews 2006; Angiosperm Phylogeny Group 2009; McNeal et al. 2013). Two lines of evidence—global gene expression data in the parasites, and expression specificity in related nonparasitic plants—suggest that gene expression patterns in haustorial tissues are most similar to those of root. The second largest number of haustorial genes shows floral-specific expression patterns in nonparasitic models. These observations suggest a possible mechanism of parasitism through neofunctionalization, where genes with a role in root and floral biology in nonparasite species were co-opted to haustorial function in parasite species. The root is a likely source for processes useful to subterranean haustorial structures. Both haustoria and roots operate underground, are physically adjacent, and haustoria are derived from apex of the primary root, sometimes from lateral root extensions (Heide-Jørgensen 2013b). Additionally, both haustoria and root are highly specialized organs for nutrient uptake and transfer. The recruitment of many haustorial genes from those normally expressed in floral tissue such as pollen is more surprising, but the idea that haustorial growth is similar to the intrusive growth of pollen tubes was explored recently (Thorogood and Hiscock 2010; Pérez-de-Luque 2013). The authors propose that neighboring host cells recognize the parasite as alien without reacting against the “invasion,” similar to the way that plants recognize intrusive growth of pollen tubes (Thorogood and Hiscock 2010; Pérez-de-Luque 2013). Specifically we found that genes, like PLs that are used in polarized pollen tube growth to rapidly invade stylar tissue (Krichevsky et al. 2007), are expressed during the invasion of host tissue by the growing parasitic haustorium. One possible explanation is that the penetration peg of the haustorium, which grows rapidly into the host tissue, may have co-opted genes from the polarized, invasive growth found in the pollen tubes of flowers (Sampedro and Cosgrove 2005; Krichevsky et al. 2007; Honaas et al. 2013). It is also likely that some pectic enzymes involved in loosening the pollen tube cell walls so that they can elongate into the female reproductive tissues (Taniguchi et al. 1995) are recruited in the penetration and growth of the haustoria toward the host vascular tissue.

Parasite-Specific Genes—Mobile Elements

Of the 84 parasite-specific orthogroups detected in our analysis, most contained sequences with no known function. However, almost all of the remaining sequences with an annotated Pfam domain have significant BLAST alignments with genes encoding proteins involved in the transfer of mobile elements, including retrotransposon gag protein, GAG-preintegrase domain, and a LAGLIDADG homing endonuclease (supplementary table S2, Supplementary Material online). Retrotransposon gag proteins are associated with the transposition of retrotransposons to telomere-associated structures in Drosophila (Rashkova et al. 2002), whereas GAG-preintegrase domain proteins are associated with chromosomal rearrangements by retrovirus insertion activity (Houzet et al. 2003). Interestingly, some LAGLIDADG endonucleases (Belfort and Roberts 1997) encoded by self-splicing group I introns are implicated in the highly mobile transfer and insertion of copies of the intron to specific target sequences. Intron homing by the LAGLIDADG endonuclease activity is implicated in the widespread horizontal gene transfer of the self-splicing group I intron in plant mitochondrial cox1 genes (Vaughn et al. 1995; Cho et al. 1998; Barkman et al. 2007; Sanchez-Puerta et al. 2008, 2011). Thus, both transposable elements and homing introns by the retrotransposon gag proteins, GAG-preintegrase domain proteins, and LAGLIDADG homing endonuclease are involved in horizontal gene transfer (Daniels et al. 1990; Rodelsperger and Sommer 2011), supporting the possibility of a mechanistic link between parasitism in plants and at least some horizontal gene transfers (Barkman et al. 2007; Xi et al. 2013; Zhang, Fernandez-Aparicio, et al. 2013).

Conclusion

In this article, we have shown that parasitic plants have evolutionarily recruited many genes for haustorial development and host penetration from genes that were involved in other processes in related nonparasitic plants, primarily root or flower development. These candidate parasitism genes are being functionally characterized to determine whether they are essential to parasite function and survival. The observation that genes with similar GO classifications (cell wall modification process and transporters) are also upregulated in the haustoria of Cuscuta (Ranjan et al. 2014) increases the likelihood that these genes do play important roles in haustorial function. In Orobanchaceae, genes recruited from root or pollen tube development show evidence of potentially adaptive changes in elevated dN/dS ratios and sites with excess nonsynonymous changes in parasitic lineages. The study of parasitic haustoria in Orobanchaceae indicates that two modes of regulatory neofunctionalization—either following gene duplication or in unduplicated orthogroup lineages—have provided the mechanism through which a novel structure has evolved.

Materials and Methods

Tissues, Libraries, and Sequence Data

Multiple stages of parasite development from the species T. versicolor, S. hermonthica, and P. aegyptiaca within Orobanchaceae were interrogated by transcriptome sequencing. Detailed descriptions of the stages ranging from seed and seedling, though haustorial development to above ground tissues such as leaf, stem, and flowering, are illustrated by Westwood et al. (2012) and in figure 1. Tissues from each stage were subjected to RNA extraction and library preparation, followed by subsequent Illumina paired-end sequencing (Honaas et al. 2013). Methods for Illumina and 454 paired-end mRNA-Seq library construction and sequencing are as described in Wickett et al. (2011). Additional Illumina transcriptome sequences were also obtained from a single normalized library for each species using pooled RNAs of all stages. Finally, a normalized whole plant library was also constructed and Illumina sequenced, as above, for L. phillipensis, representing the nonparasitic sister group to the parasitic Orobanchaceae (dePamphilis et al. 1997; Young et al. 1999; Olmstead et al. 2001; Schneeweiss et al. 2004; Angiosperm Phylogeny Group 2009; McNeal et al. 2013).

Assembly, Cleaning, and Annotation (Including Gene Family Classification)

Duplicate reads in the Illumina sequence data were removed with CLC Assembly Cell version 3.2.0 (http://www.clcbio.com/products/clc-assembly-cell/, last accessed December 18, 2014). Adapters and bases with a quality score lower than Q20 were trimmed from the ends of the reads, and these reads were retained only if at least half of the sequence had quality ≥Q20. Raw Roche 454 sequence files in Standard Flowgram Format were converted to FASTA and associated quality files along with clipping of sequence adapters and low-quality bases using sff_extract version 0.2.10 (http://bioinf.comav.upv.es/sff_extract/, last accessed December 18, 2014). De novo assemblies of Illumina reads from each species were performed using Trinity release 2011-10-29 (Grabherr et al. 2011) and de novo hybrid assemblies of combined 454 and Illumina reads were performed using CLC Assembly Cell version 3.2.0 with default parameters. Assembled transcripts from both assemblies were combined by assigning hybrid CLC transcripts to Trinity components that yielded the best bitscore with BLASTN (E value = 1e-10). The resulting combined assemblies were filtered by removing contigs without coding regions (Iseli et al. 1999) as well as redundant transcripts (Edgar 2010). The assemblies for parasite species (Triphysaria, Striga, and Phelipanche) were then cleaned to remove contaminant sequences using a three-step process: 1) The transcripts were screened against the NCBI nonredundant protein database using BLASTX (E value = 1e-5) to remove nonplant transcripts, 2) the transcripts were then screened with BLASTN (E value = 1e-10) against a collection of publicly available genomes and expressed sequence tag data sets from the experimental host plants (to identify and remove host transcripts), and 3) after performing this screen on each of the parasite species, BLASTN (E value = 1e-10) of host candidate sequences was screened against the Orobanchaceae species (Lindenbergia, Triphysaria, Striga, and Phelipanche) databases (not including the parasite species being cleaned) to retain the transcripts that were much better matches to other Orobanchaceae family members than to the host plant. Open reading frames (ORFs) and protein sequences were predicted from the reconstructed transcript assemblies with ESTScan version 2.0 (Iseli et al. 1999). We experimented with different reference sequences to guide the ESTScan predictions, and other protein prediction programs such as GeneWise (Birney et al. 2004). Finally we chose ESTscan with an Arabidopsis reference to obtain the best balance of length and protein number in the resulting protein set. The predicted protein sequences were used for BLASTP (E value = 1e-5) searches against Swissprot, TAIR10, and trEMBL databases to assign putative functional annotations in the form of human readable descriptions using the automated assignment of human readable descriptions (AHRD) pipeline (https://github.com/groupschoof/AHRD, last accessed December 18, 2014). AHRD uses similarity searches and lexical analysis for automatic AHRD to protein sequences. These translated transcripts were also annotated with Pfam domains using InterProScan version 4.8 (McDowall and Hunter 2011), and identified domains were directly translated into GO terms.

Defining a Component-Orthogroup from Each De Novo Assembly

Very large transcriptome sequence data sets, including those produced by this project, result in complex de novo assemblies including many splice variants and distinct alleles (Grabherr et al. 2011). Due to the assembly complexity, we combined expression information for unigenes that were assigned to the same Trinity component and mapped to the same orthogroup (Wickett et al. 2011). We call these unigenes from the same component and orthogroup a “component-orthogroup.”

Read Mapping and Expression Normalization

High-quality nonredundant Illumina reads from individual stage-specific samples were independently mapped on each parasite’s postprocessed transcripts using CLC Genomic Workbench version 6.0.4 (parameters: mismatch cost = 2, insertion cost = 3, deletion cost = 3, length fraction = 0.5, similarity = 0.8, min insert size = 100, and max insert size = 300). Transcript abundance was then estimated using the CLC Genomic Workbench RNA-Seq program with unique reads counted for their matching transcripts, and nonspecifically mapped reads allocated on a proportional basis relative to the number of uniquely mapped reads. The numbers of reads mapped per library were normalized by the fragments per kilobase per million mapped reads (FPKM) (Mortazavi et al. 2008) method that corrects for biases in the total transcript size and normalizes for the total number of read sequences obtained in each sample library. The read counts and FPKM values of transcripts for each Trinity component classified as an orthogroup were summed up to obtain each component-orthogroup’s expression in each library.

PV-Clustering of Global Transcriptional Profiles

To get an overall picture of the global gene expression profile, stages were clustered by the pvclust (Suzuki and Shimodaira 2006) command in R using the expression of all unigenes in each stage. Pvclust not only clusters stages but also infers confidence support with approximately unbiased multiscale resampling and bootstrap resampling (bs) values, obtained by resampling genes from the total population of unigenes.

Identification of Differentially Expressed Genes for Candidate Parasite Gene Assignment

We first used a DE analysis to limit the number of genes for candidate parasitism gene identification. DE analysis was performed within each species using DESeq package in R (Anders and Huber 2010), which utilizes the negative binomial distribution to model the read count data for variance estimation. Only the stages that were shared among the three parasite species were used in this analysis (stage 0, 1, 2, 3, 4, 6.1, and 6.2). DE analysis using pairwise comparison was undertaken to identify genes that were differentially expressed in at least two stages.

K-Means Clustering to Identify Clusters with High Expression in Haustoria

DE analysis resulted in a list of genes with varying expression patterns across the seven shared stages. K-means clustering was used to identify clusters of coexpressed features with high expression in haustoria tissue. The expression of the DE unigenes measured by log2FPKM in stages 0, 1, 2, 3, 4, 6.1, and 6.2 constituted an expression matrix as the input for clustering analysis. To determine the optimum number of clusters needed to identify a single cluster with high expression in haustorium tissue, an R script was developed to generate a series of pdf files to represent each cluster’s expression profile for a total number of specified clusters ranging from 2 to 30. Each cluster contained a set of coexpressed genes whose expression in each stage was reflected in a boxplot with the median expression of the coexpressed genes connected by a line. The criteria used to find the appropriate number of clusters were the smallest number of clusters that enabled the visualization of a haustorial-specific (stage 3 and or stage 4) cluster. This means that for a smaller cluster number, we cannot identify the haustoria-specific pattern; a large cluster number may split the haustoria-specific cluster into several clusters, but is unnecessary in terms of gene identification.

Hierarchical Clustering to Identify Putative Parasite Features within Each Species

After the appropriate number of clusters was chosen, hierarchical clustering was used to identify genes with expression patterns of interest; that cluster was extracted with the cutree function in R. Each cluster’s expression profile was determined by plotting a heat map using the heatmap.2 function from the gplots package in R.

Self Organizing Maps (SOM Clustering) to Identify Patterns of Haustorial-Specific Expression

SOM clustering was used to maximize the identification of genes that show a high and specific expression pattern. The analysis was performed on the web server with the unsupervised learning SOM clustering of GenePattern developed by the Broad Institute (Reich et al. 2006). SOM clustering clearly reveals the overall pattern from the genome-wide gene expression data by reducing dimensionality of the original data. SOM clustering of GenePattern involved three steps: Data preprocessing, SOM clustering, and SOMClusterViewer. The expression matrix (log2FPKM) of the differentially expressed features was used as the input file for preprocessing, in which a row normalization and gene filtering were performed. The threshold and filter were performed with default parameters (floor: -3; ceiling: 18; min fold change: 1.5; min delta: 5). SOM clustering was performed by manually selecting the cluster-range. The default was chosen at the beginning and changed until the pattern with haustorial high and specific expression was revealed. The SOMClusterViewer displayed the expression profile of each cluster identified by SOM clustering. Finally for all three data sets from the three species, the cluster range was chosen at 6–8, which identified one or more clusters with high and specific expression in haustorial tissue.

Enrichment Analysis—Parasite Genes Versus Whole Plant Background

The identification of a list of parasite genes allowed us to ask what biological functions are enriched compared with the whole plant background within each species. For each parasite gene, we obtained its BLASTx best hit in Arabidopsis and used these TAIR hits to identify GO assignments (Ashburner et al. 2000) and perform enrichment analysis. Arabidopsis was selected for this analysis because of its relatively complete GO-term annotation. Enrichment analysis was performed with DAVID using Bonferoni adjusted P values for multiple tests (Huang et al. 2009) by comparing GO assignments for foreground (putative orthologs of parasite genes in Arabidopsis) versus background (all genes from the Arabidopsis genome). Enriched components with statistical significance, with annotations including Pfam domains, GO MF, GO BP, GO CC, and KEGG pathway, were identified for a set of shared parasite genes from each species.

Phylogenies and Ka/Ks Constraint Analysis for Parasite Genes

Transcripts from the Orobanchaceae were assigned into orthogroups defined by 586,228 protein-coding genes of 22 representatives of sequenced land plant genomes () using OrthoMCL. The selected taxa include nine rosids (Ar. thaliana, Thellungiella parvula, Ca. papaya, Theobroma cacao, Po. trichocarpa, Fragaria vesca, Glycine max, M. truncatula, and V. vinifera), three asterids (So. lycopersicum, Sol. tuberosum, and Mimulus guttatus), two basal eudicots (Nelumbo nucifera and Aquilegia coerulea), five monocots (O. sativa, Brachypodium distachyon, Sor. bicolor, Musa acuminata, and Phoenix dactylifera), one basal angiosperm (A. trichopoda), one lycophyte (Selaginella moellendorffii), and one moss (Physcomitrella patens). Of the plants with sequenced genomes, Mimulus, an emerging asterid model plant of family Phyrmaceae, is the most closely related to Orobanchanchaceae (Schaferhoff et al. 2010; Refulio-Rodriguez and Olmstead 2014). Candidate orthogroups for unigenes from transcriptome assemblies of Lindenbergia, Triphysaria, Striga, Phelipanche, and two Asteraceae species, Lactuca sativa and Helianthus annuus, were identified by retaining BLASTP (McGinnis and Madden 2004) hits with E value ≤1e-5 for predicted peptide searches against the orthogroup-classified proteomes from those 22 sequenced plant genomes. HMM (Eddy 2011) searches of the translated transcripts were then performed on constructed candidate HMM orthogroup classification profiles, and orthogroups yielding the best bitscore were assigned to the transcripts. Once unigenes were found that had high and DE in haustoria, phylogenies were estimated for their corresponding orthogroups. Amino acid alignments of sequences within these orthogroups (including any translated transcripts that were assigned to the orthogroup as described above) were generated with MAFFT (Katoh and Standley 2013) and the corresponding DNA sequences were forced onto the amino acid alignments using a custom perl script. DNA alignments were then trimmed with trimAL (Capella-Gutierrez et al. 2009) to remove sites with less than 10% of the taxa. Orthogroup alignments were required to contain transcripts with alignment coverage of at least 50%. Otherwise, the failing transcripts were removed from the orthogroup amino acids and DNA alignments, and the alignments were regenerated. This process was iterated until all of the sequences covered at least 50% of the alignment. Finally, maximum likelihood (ML) phylogenetic trees of DNA alignments for orthogroups containing parasite sequence(s) were generated using RAxML (Stamatakis 2006) with the GTRGAMMA model. To evaluate the reliability of the branches on the tree, 100 pseudosamples for the alignment were generated to estimate branch support using the bootstrap method (Felsenstein 1985).

Scoring Gene Duplications

Gene duplication events were scored by referring to each rooted gene tree. Genes from Physcomitrella and/or Selaginella were used as outgroups to root each tree; when these outgroups were not present in the orthogroup, Amborella and/or or monocots were used. Because our interest in this article was focused on when parasitism genes evolved, we limited our analysis to gene trees that contain parasitism genes. Possible topologies showing gene duplications giving rise to parasitism genes are illustrated in supplementary data S15, Supplementary Material online. Gene duplications were scored if a parasitism gene from one or more of the three parasitic species (Triphysaria, Striga, and Phelipanche) was present in each duplicated clade, and if bootstrap values for key nodes met defined criteria. In addition, a sequence from at least one taxon had to be present in each duplicated clade. To illustrate, a Mimulus+Orobanchaceae-wide duplication (including nonparasitic Lindenbergia and three parasites—Triphysaria, Striga, and Phelipanche), and shown as (((M1O1)bootstrap1, (M2O2)bootstrap2)bootstrap3), is required to meet the following criteria: 1) Each clade defined by the nodes M1O1 and M2O2 contains at least one gene from the parasite taxa; 2) at least one taxon of Mimulus or Orobanchaceae has to be present in both clades defined by M1O1 or M2O2; and 3) bootstrap2 and at least one of either bootstrap1 or bootstrap2 must be greater than or equal to the bootstrap stringency cutoffs of 50% or 80%. An example of a scored gene tree with a supported gene duplication is given in supplementary figure S3, Supplementary Material online.

Selective Constraint Analysis

To perform the constraint analysis to infer adaptive or purifying selection, PAML (Yang 1997, 2007) software based on ML was utilized for hypothesis testing. The branch model in PAML was used to test whether the foreground branch of interest has significantly different dN/dS ratio (omega, ω) compared with the background ω. The codeml tool in PAML was used to perform such analyses. To estimate significance of one particular hypothesis, a likelihood ratio test was used. The one-ratio model and branch model in codeml of PAML were used to test whether the branch model fits the model significantly better than the one-ratio model. When the one-ratio model is correct, the distribution of the likelihood-ratio test statistic follows a chi-square distribution with the degrees of freedom being equal to the number of additional parameters in the branch model test. The test statistics was calculated by taking twice the difference between log likelihood-values from the two tests. These models were fitted to examine which model was more appropriate for the data. The branch-site model was further used when the branch-model identified significant differences between the foreground and background lineages. The branch-site model was also used to identify sites under positive selection for the indicated foreground lineages. To perform the branch-site model, the codeml file was set to model = 2, NSsites=2. The null model was set to fix ω at 1, whereas the alternative model was set to estimate ω. Sites identified by PAML with a probability greater than 0.95 by Bayes Empirical Bayes analysis were examined further through looking at the peptide alignment.

Expression of Haustorial Orthologs in Nonparasitic Species

We utilized the existing expression profile data for growth and developmental stages in Ar. thaliana, M. truncatula, and Sol. lycopersicum, which we refer to as Arabidopsis, Medicago, and tomato in this analysis, respectively. The expression profiles for the Arabidopsis and Medicago genes were extracted from the PLEXdb database (Dash et al. 2012) that contains microarray data (Arabidopsis, AT40: Expression Atlas of Arabidopsis Development [AtGenExpress]; Medicago, ME1: The M. truncatula Gene Expression Atlas), whereas the data for tomato were from the digital expression (RNA-Seq) experiment (D007: Transcriptase analysis of various tissues in wild species Sol. pimpinellifolium, LA1589) in the Tomato Functional Genomics Database (Fei et al. 2011). First, parasite genes with upregulated haustorial expression were identified, and their putative orthologs were obtained as the best blast hits within the same orthogroup in Arabidopsis, Medicago, and tomato among the sequences. To find the expression information of the genes in Arabidopsis and Medicago used in the gene family analysis, we used the microarray expression information of probes for these genes by BLASTn. For tomato, expression for each gene was retrieved directly from the RNA-Seq database. To make the expression easily comparable across species, the expression values for similar tissues were averaged (supplementary data S20, Supplementary Material online). In the Arabidopsis gene expression atlas, all vegetative_leaf and rosette_leaf were combined as “leaf,” the sepal, petal, stamen, and carpel were combined as “flower,” different root tissues (root7, root_MS1, and root_GM) were combined as “root.” In the Medicago gene expression atlas, expression data for tissue responding to nod factors (Nod 4d, Nod 10d, and Nod 14d) were combined as “nod,” root and root-0d as “root,” and seed 10d, seed 12d, seed 16d, seed 20d, seed 36d as “seed.” In the tomato RNA-Seq data, newly developed leaves and mature green leaflets were combined and labeled as “leaf,” flower buds 10 days before anthesis or younger and flowers at anthesis (0 DPA) as “flower,” and fruit at 10 DPA, 20 DPA, 33 DPA as “fruit.” We scored the expression for upregulated haustorial orthogroups based on the principally expressed tissue. To do this, we divided the expression of a gene in a given tissue by its summed expression across all tissues to obtain a normalized expression in each tissue. Then, we identified genes as “principally expressed” in a tissue if its normalized expression was ≥2-fold higher in that tissue than in any other. If similarly high expression was found in two tissues, both tissues were scored. Genes with broad expression in more than three tissues were not scored. In addition to this binary scoring of the principally expressed tissue for haustorial genes, we also scored each tissue quantitatively using the average of each tissue’s expression across all haustorial gene orthologs. We excluded genes whose highest expressions across all tissues were in the lower 25th percentile. We then averaged expression of each tissue across all genes and ranked each tissue based on the expression. All orthogroups were subject to this step and finally each tissue type that supported a possible haustorial origin was scored by the number of upregulated haustorial orthogroups and the average expression across all orthogroups.

Supplementary Material

Supplementary figures S1–S3, tables S1 and S2, and data S1–S23 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/) and the PPGP website through the link provided (http://ppgp.huck.psu.edu/5_Our_own_link_Supplementary_Data_Final_submission_2014.zip, last accessed December 18, 2014).
  121 in total

Review 1.  Preservation of duplicate genes by complementary, degenerative mutations.

Authors:  A Force; M Lynch; F B Pickett; A Amores; Y L Yan; J Postlethwait
Journal:  Genetics       Date:  1999-04       Impact factor: 4.562

2.  RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models.

Authors:  Alexandros Stamatakis
Journal:  Bioinformatics       Date:  2006-08-23       Impact factor: 6.937

3.  A new retroelement constituted by a natural alternatively spliced RNA of murine replication-competent retroviruses.

Authors:  Laurent Houzet; Jean Luc Battini; Eric Bernard; Valerie Thibert; Marylène Mougel
Journal:  EMBO J       Date:  2003-09-15       Impact factor: 11.598

4.  Phylogeny of Lamiidae.

Authors:  Nancy F Refulio-Rodriguez; Richard G Olmstead
Journal:  Am J Bot       Date:  2014-02-08       Impact factor: 3.844

5.  Striga hermonthica MAX2 restores branching but not the Very Low Fluence Response in the Arabidopsis thaliana max2 mutant.

Authors:  Qing Liu; Yanxia Zhang; Radoslava Matusova; Tatsiana Charnikhova; Maryam Amini; Muhammad Jamil; Monica Fernandez-Aparicio; Kan Huang; Michael P Timko; James H Westwood; Carolien Ruyter-Spira; Sander van der Krol; Harro J Bouwmeester
Journal:  New Phytol       Date:  2014-01-31       Impact factor: 10.151

Review 6.  The plasma membrane Ca²+ ATPase and the plasma membrane sodium calcium exchanger cooperate in the regulation of cell calcium.

Authors:  Marisa Brini; Ernesto Carafoli
Journal:  Cold Spring Harb Perspect Biol       Date:  2011-02-01       Impact factor: 10.005

7.  A novel cold-inducible gene from Arabidopsis, RCI3, encodes a peroxidase that constitutes a component for stress tolerance.

Authors:  Francisco Llorente; Rosa María López-Cobollo; Rafael Catalá; José Miguel Martínez-Zapater; Julio Salinas
Journal:  Plant J       Date:  2002-10       Impact factor: 6.417

8.  Global analysis of Arabidopsis gene expression uncovers a complex array of changes impacting pathogen response and cell cycle during geminivirus infection.

Authors:  José Trinidad Ascencio-Ibáñez; Rosangela Sozzani; Tae-Jin Lee; Tzu-Ming Chu; Russell D Wolfinger; Rino Cella; Linda Hanley-Bowdoin
Journal:  Plant Physiol       Date:  2008-07-23       Impact factor: 8.340

9.  Analysis of promoter activity of members of the PECTATE LYASE-LIKE (PLL) gene family in cell separation in Arabidopsis.

Authors:  Lingxia Sun; Steven van Nocker
Journal:  BMC Plant Biol       Date:  2010-07-22       Impact factor: 4.215

10.  trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses.

Authors:  Salvador Capella-Gutiérrez; José M Silla-Martínez; Toni Gabaldón
Journal:  Bioinformatics       Date:  2009-06-08       Impact factor: 6.937

View more
  54 in total

1.  Horizontal gene transfer is more frequent with increased heterotrophy and contributes to parasite adaptation.

Authors:  Zhenzhen Yang; Yeting Zhang; Eric K Wafula; Loren A Honaas; Paula E Ralph; Sam Jones; Christopher R Clarke; Siming Liu; Chun Su; Huiting Zhang; Naomi S Altman; Stephan C Schuster; Michael P Timko; John I Yoder; James H Westwood; Claude W dePamphilis
Journal:  Proc Natl Acad Sci U S A       Date:  2016-10-24       Impact factor: 11.205

2.  Mechanistic model of evolutionary rate variation en route to a nonphotosynthetic lifestyle in plants.

Authors:  Susann Wicke; Kai F Müller; Claude W dePamphilis; Dietmar Quandt; Sidonie Bellot; Gerald M Schneeweiss
Journal:  Proc Natl Acad Sci U S A       Date:  2016-07-22       Impact factor: 11.205

Review 3.  How interactions with plant chemicals shape insect genomes.

Authors:  Andrew D Gloss; Patrick Abbot; Noah K Whiteman
Journal:  Curr Opin Insect Sci       Date:  2019-09-24       Impact factor: 5.186

Review 4.  Plant science's next top models.

Authors:  Igor Cesarino; Raffaele Dello Ioio; Gwendolyn K Kirschner; Michael S Ogden; Kelsey L Picard; Madlen I Rast-Somssich; Marc Somssich
Journal:  Ann Bot       Date:  2020-06-19       Impact factor: 4.357

5.  SMAX1-dependent seed germination bypasses GA signalling in Arabidopsis and Striga.

Authors:  Michael Bunsick; Shigeo Toh; Cynthia Wong; Zhenhua Xu; George Ly; Christopher S P McErlean; Gianni Pescetto; Kawther Elfituri Nemrish; Priscilla Sung; Jack Daiyang Li; Julie D Scholes; Shelley Lumba
Journal:  Nat Plants       Date:  2020-05-25       Impact factor: 15.793

6.  Cryptic host-specific diversity among western hemisphere broomrapes (Orobanche s.l., Orobanchaceae).

Authors:  Adam C Schneider; Alison E L Colwell; Gerald M Schneeweiss; Bruce G Baldwin
Journal:  Ann Bot       Date:  2016-08-18       Impact factor: 4.357

7.  Marker Development for Phylogenomics: The Case of Orobanchaceae, a Plant Family with Contrasting Nutritional Modes.

Authors:  Xi Li; Baohai Hao; Da Pan; Gerald M Schneeweiss
Journal:  Front Plant Sci       Date:  2017-11-21       Impact factor: 5.753

8.  Genomics of sorghum local adaptation to a parasitic plant.

Authors:  Emily S Bellis; Elizabeth A Kelly; Claire M Lorts; Huirong Gao; Victoria L DeLeo; Germinal Rouhan; Andrew Budden; Govinal B Bhaskara; Zhenbin Hu; Robert Muscarella; Michael P Timko; Baloua Nebie; Steven M Runo; N Doane Chilcoat; Thomas E Juenger; Geoffrey P Morris; Claude W dePamphilis; Jesse R Lasky
Journal:  Proc Natl Acad Sci U S A       Date:  2020-02-11       Impact factor: 11.205

9.  Local Auxin Biosynthesis Mediated by a YUCCA Flavin Monooxygenase Regulates Haustorium Development in the Parasitic Plant Phtheirospermum japonicum.

Authors:  Juliane K Ishida; Takanori Wakatake; Satoko Yoshida; Yumiko Takebayashi; Hiroyuki Kasahara; Eric Wafula; Claude W dePamphilis; Shigetou Namba; Ken Shirasu
Journal:  Plant Cell       Date:  2016-07-06       Impact factor: 11.277

Review 10.  Using biotechnological approaches to develop crop resistance to root parasitic weeds.

Authors:  Radi Aly; Maor Matzrafi; Vinay Kumar Bari
Journal:  Planta       Date:  2021-04-12       Impact factor: 4.116

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.