
Key changes in gene expression identified for different stages of C4 evolution in Alloteropsis semialata.

Luke T Dunning, Jose J Moreno-Villena, Marjorie R Lundgren, Jacqueline Dionora, Paolo Salazar, Claire Adams, Florence Nyirenda, Jill K Olofsson, Anthony Mapaura, Isla M Grundy, Canisius J Kayombo, Lucy A Dunning, Fabrice Kentatchime, Menaka Ariyarathne, Deepthi Yakandawala, Guillaume Besnard, W Paul Quick, Andrea Bräutigam, Colin P Osborne, Pascal-Antoine Christin.

Abstract

C4 photosynthesis is a complex trait that boosts productivity in tropical conditions. Compared with C3 species, the C4 state seems to require numerous novelties, but species comparisons can be confounded by long divergence times. Here, we exploit the photosynthetic diversity that exists within a single species, the grass Alloteropsis semialata, to detect changes in gene expression associated with different photosynthetic phenotypes. Phylogenetically informed comparative transcriptomics show that intermediates with a weak C4 cycle are separated from the C3 phenotype by increases in the expression of 58 genes (0.22% of genes expressed in the leaves), including those encoding just three core C4 enzymes: aspartate aminotransferase, phosphoenolpyruvate carboxykinase, and phosphoenolpyruvate carboxylase. The subsequent transition to full C4 physiology was accompanied by increases in another 15 genes (0.06%), including only the core C4 enzyme pyruvate orthophosphate dikinase. These changes probably created a rudimentary C4 physiology, and isolated populations subsequently improved this emerging C4 physiology, resulting in a patchwork of expression for some C4 accessory genes. Our work shows how C4 assembly in A. semialata happened in incremental steps, each requiring few alterations over the previous step. These create short bridges across adaptive landscapes that probably facilitated the recurrent origins of C4 photosynthesis through a gradual process of evolution.
© The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Experimental Biology.


Keywords:  Adaptation; C4 photosynthesis; complex trait; intermediates; phylogenetics; transcriptomics

Year:  2019        PMID: 30949663      PMCID: PMC6598098          DOI: 10.1093/jxb/erz149

Source DB:  PubMed          Journal:  J Exp Bot        ISSN: 0022-0957            Impact factor:   6.992


Introduction

The origins of traits composed of multiple anatomical and/or biochemical components have always intrigued evolutionary biologists (Darwin, 1859; Meléndez-Hevia ; Lenski ). If such traits gain their function only through the co-ordinated action of multiple components, their evolution via natural selection must cross a valley in the adaptive landscape. Despite this obstacle, complex traits have evolved repeatedly in diverse groups of organisms. This apparent paradox is solved for most traits by the existence of intermediate stages, which act as evolutionary enablers, creating bridges over the valleys of the adaptive landscape (Jacob, 1977; Dawkins, 1986; Weinreich ; Blount ; Vopalensky ; Werner ). The accessibility of new traits probably depends on the length and complexity of such bridges, which are generally unknown. Quantifying the evolutionary gap between phenotypic states is therefore crucial to contextualize the likelihood of a novel trait evolving. An excellent system to study the evolutionary trajectories of an adaptive trait is C4 photosynthesis. This metabolic pathway increases CO2 concentration at the active site of assimilation via the Calvin–Benson cycle (Hatch, 1987; Sage, 2004; Christin and Osborne, 2014). This avoids the energetically costly process of photorespiration, effectively increasing photosynthetic efficiency in warm and arid conditions (Sage , 2018). This CO2-concentrating mechanism relies on a set of specific leaf anatomical properties and the co-ordinated action of up to 10 enzymes carrying the C4 reactions (hereafter ‘core C4 enzymes’) and numerous associated proteins (Supplementary Table S1 at JXB online; Hatch, 1987; Bräutigam ; Sage ; Külahoglu ; Lundgren ; Yin and Struik 2018). Despite its apparent complexity, C4 photosynthesis is a textbook example of convergent evolution, having independently evolved >60 times within flowering plants (Sage ). 
The origins of C4 photosynthesis were probably facilitated by the presence of anatomical enablers in some groups (Christin ; Sage ), but the processes leading to a functioning C4 biochemical pathway within these anatomical structures are less well understood. All C4 enzymes studied so far exist in C3 plants, but are involved in different pathways (Aubry ). There is a bias in the recruitment of genes into the C4 system, with genes ancestrally abundant in the leaves of C3 plants preferentially co-opted for C4 (Christin ; John ; Emms ; Moreno-Villena ). Changes to their expression patterns and/or kinetic properties of the encoded enzyme then followed (Bläsing ; Hibberd and Covshoff, 2010; Huang ; Moreno-Villena ), with cell-specific expression realized in some cases through the recruitment of pre-existing regulatory mechanisms (Brown ; Kajala ; Cao ; Reyna-Llorens and Hibberd, 2017; Borba ; Reyna-Llorens ). The evolutionary transition between C3 and C4 phenotypes involves intermediate stages that only have some of the anatomical and biochemical modifications typical of C4 plants (Monson and Moore, 1989; Sage , 2018). In particular, some C3+C4 plants perform a weak C4 cycle that is responsible for only part of their carbon assimilation (these correspond to ‘type II C3–C4 intermediates’; Ku ; Monson ; Schlüter and Weber, 2016). This weak C4 cycle might have emerged through the up-regulation of C4-related enzymes to balance nitrogen among cellular compartments in the multiple lineages of plants that use a photorespiratory pump (Sage , 2012; Mallmann ; Bräutigam and Gowik, 2016). Metabolic models suggest that any increase in flux of CO2 fixed through the C4 cycle in intermediate plants directly translates into biomass gain, selecting for gradual increases in C4 gene expression (Heckmann ; Mallmann ). 
The current model of C4 evolution therefore assumes gradual, yet abundant, changes in plant transcriptomes and genomes during the transition from C3 ancestors to physiologically C4 descendants. Indeed, comparisons of C3 and C4 species have typically identified thousands of differentially expressed genes encoding C4 enzymes, regulators, and accessory metabolite transporters (Bräutigam , 2014; Gowik ; Külahoglu ; Li ; Lauterbach ). These large numbers might partially result from the comparison of species typically separated by millions of years of divergence (Christin ), which leaves ample time for the accumulation of secondary changes linked to the C4 trait beyond the minimal requirements, as well as variation in other unrelated traits (Heyduk ). Even within a single species where photosynthetic transitions can be induced, the number of differentially expressed genes identified in transcriptome comparisons can be extremely high (Chen ). Previous efforts have, however, typically targeted very few individuals per C4 lineage, such that the initial bout of co-option that generated a C4 cycle cannot be distinguished from subsequent adaptation via natural selection and diversification caused by genetic drift (Christin and Osborne, 2014; Reeves ; Heyduk ).
In this study, the transcriptomes of mature leaves are compared among plant populations using a phylogenetic approach. The work aims to quantify the phenotypic differences in gene expression between the C3 phenotype and plants using a weak C4 cycle (C3+C4 state), independently from those responsible for the transition to the full C4 type, and finally from those involved in the adaptation of an existing C4 phenotype. The time elapsed between transitions, and therefore the number of changes unrelated to C4 emergence, is reduced by focusing on a single species containing a diversity of photosynthetic types, the grass Alloteropsis semialata. Congeners of A. semialata are C4, but previous comparative transcriptomics and leaf anatomy have shown that C4 biochemistry emerged multiple times in the genus, from a common ancestor with some C4-like characters (Fig. 1; Dunning ). Capitalizing on the physiological diversity existing within A. semialata, leaf transcriptomes from multiple individuals originating from diverse populations of each photosynthetic type in this species are analysed, together with closely related C3 and C4 species, to detect the changes in gene expression linked to (i) the phenotypic difference between C3 plants and C3+C4 intermediates; (ii) the shift to fixing carbon exclusively via the C4 pathway in solely C4 plants; and (iii) the subsequent adaptation of the C4 cycle in geographically isolated C4 populations. This deconstruction of the genetic origins of a complex biochemical pathway sheds new light on the number of genetic changes needed to move to another part of the adaptive landscape during different stages of a stepwise physiological transition.
Fig. 1.

Phylogenetic tree inferred from multiple nuclear markers and sampling locations. (A) This phylogeny was inferred under maximum likelihood using transcriptome-wide markers. The scale indicates the number of nucleotide substitutions per site, and bootstrap support values are indicated near nodes. AANG=A. angusta. For A. semialata, population names indicate the country of origin; AUS=Australia, BUR=Burkina Faso, CMR=Cameroon, MAD=Madagascar, PHI=Philippines, RSA=South Africa, TAN=Tanzania, SRI=Sri Lanka, TPE=Chinese Taipei, ZAM=Zambia, ZIM=Zimbabwe. Populations sampled with biological replicates and used for differential expression analysis are indicated by the large circles and bold population names. Nuclear clades from Olofsson are indicated. Branch colors indicate the ancestral photosynthetic types, based on the transcriptomes and leaf anatomy detailed investigations of Dunning . The hashed green at the base of A. semialata indicates uncertainty between C3 and C3+C4 states. (B) Distribution of A. semialata photosynthetic types and sampling locations, with color codes as in (A). Shadings indicate the approximate ranges of the three photosynthetic types of A. semialata, based on Lundgren .


Materials and methods

Species sampling and growth conditions

Three biological replicates from 10 separate populations/species were used for differential gene expression analyses. Seven of these were geographically distinct Alloteropsis semialata populations including: two C3 populations from South Africa (RSA6) and Zimbabwe (ZIM1502) that represent extremes of the C3 geographic range (Fig. 1B; Lundgren ), two geographically distant C3+C4 populations from Tanzania (TAN1602) and Zambia (ZAM1503) that are hypothesized to operate a weak C4 cycle (Lundgren ), and three C4 populations from Cameroon (CMR1601), Tanzania (TAN4), and the Philippines (PHI1601) that sample the two C4 genetic subgroups (Olofsson ; Supplementary Fig. S1). The C4 populations of A. semialata have decreased CO2 compensation points, increased carboxylation efficiencies, and shifts in carbon isotopes compared with the C3 populations that confirm their photosynthetic type (Lundgren ). The C4 leaves are characterized by increased vein density, phosphoenolpyruvate carboxylase (PEPC) protein abundance, and transcript abundance of genes encoding some C4 enzymes compared with the C3 types (Lundgren , 2019; Dunning ). The C3+C4 A. semialata also show elevated leaf levels of PEPC protein and genes for some C4 enzymes, and increased concentration of chloroplasts in bundle sheaths in comparison with the C3 populations, but no increase in vein density (Lundgren ; Dunning ). However, while slightly shifted compared with their C3 conspecifics, their carbon isotope ratios are not in the C4 range, which is common in plants performing a weak C4 cycle, responsible for only part of their CO2 uptake (i.e. ‘type II intermediates’; Monson ; von Caemmerer, 1992; Sage ; Lundgren ). This results in a reduced CO2 compensation point and oxygen inhibition (Lundgren ), as observed in other species acquiring part of their carbon via a weak C4 cycle (Ku ).
In addition to the seven A. semialata populations, we included one population of each of the C4 congeners A. angusta (AANG1 from Uganda) and A. cimicina (from Madagascar) to enable comparison of convergent C4-related changes in gene expression (Supplementary Fig. S1). Finally, an Entolasia marginata population from Australia was included as a C3 outgroup. Three distinct genotypes for eight of the 10 populations described above were retrieved from a recent data set (Dunning ) or sequenced here. For the two other populations, sufficient biological replicates were not available. For A. angusta, we sequenced three clones of a single wild-collected plant that were established >1 year before the study, while for E. marginata we sequenced two different genotypes and a clone of one of these genotypes, similarly established before the study (see Supplementary Table S2 for detailed sample collection information).
To evaluate the diversity of gene expression across the spectrum of photosynthetic types and the genetic variation within each photosynthetic type, we supplemented the above data with a single biological replicate from a further 15 geographically distinct populations (12 from previously published data; Dunning , 2019; Fig. 1A). The three newly sequenced individuals are two C4 A. semialata from Sri Lanka (SRI1702, lat: 6.81 long: 80.92) and Zambia (ZAM1726, lat: –14.21 long: 28.60), and a C3 individual from Zimbabwe (ZIM1503, lat: –18.78 long: 32.74). In total, we had 45 RNA sequencing (RNA-Seq) libraries from 25 populations/species, with three biological replicates sampled from 10 populations and a single biological replicate sampled from the remaining 15 populations (Fig. 1A).
All plants were collected from the field as seeds or live cuttings, and subsequently grown under controlled conditions at the University of Sheffield as previously described (Dunning ). In brief, plants were potted in John Innes No. 2 compost (John Innes Manufacturers Association, Reading, UK) and maintained under wet, nutrient-rich conditions in controlled-environment chambers (Conviron BDR16; Manitoba, Canada) set to 60% relative humidity, 500 μmol m–2 s–1 light intensity, 14 h photoperiod, and day/night temperatures of 25/20 °C. After a minimum of 30 d in these growth conditions, young fully expanded leaves were sampled for transcriptome analyses.

RNA extraction, sequencing, and transcriptome assembly

RNA extraction, library preparation, and sequencing were performed as previously described (Dunning ). In brief, total RNA was extracted from the distal half of fully expanded fresh leaves, sampled in the middle of the light period, using the RNeasy Plant Mini Kit (Qiagen, Hilden, Germany) with an on-column DNA digestion step (RNase-Free DNase Set; Qiagen). Total RNA was used to generate 34 indexed RNA-Seq libraries using the TruSeq RNA Library Preparation Kit v2 (Illumina, San Diego, CA, USA). Each library was subsequently sequenced on 1/24 of a single Illumina HiSeq 2500 flow cell (with other samples from the same or unrelated projects), which ran for 108 cycles in rapid mode at the Sheffield Diagnostic Genetics Service. The raw RNA-Seq data were cleaned using the Agalma pipeline v.0.5.0 to remove low quality reads (Q<30), and sequences corresponding to rRNA or containing adaptor contamination (Dunn ). De novo transcriptomes were assembled using Trinity (version trinityrnaseq_r20140413p1; Grabherr ). All raw data and transcriptome assemblies have been submitted to the NCBI repository (Bioproject PRJNA401220). Coding sequences (CDS) longer than 500 bp were predicted for each population using OrfPredictor (Min ), which uses homology to a user-supplied reference protein database or ab initio predictions if no suitable match is found. The protein database used comprised the complete coding sequences of eight model species: Arabidopsis thaliana, Brachypodium distachyon, Glycine max, Oryza sativa, Populus trichocarpa, Setaria italica, Sorghum bicolor, and Zea mays.

Phylogenetic reconstruction using core orthologs

Single-copy orthologs were extracted from the newly and previously published transcriptome assemblies (Dunning ) to infer phylogenetic relationships among individuals. Homologous sequences to 581 single-copy plant core orthologs previously determined in the Inparanoid ortholog database (Sonnhammer and Östlund, 2015) were identified using a Hidden Markov Model-based search tool (HaMSTR v.13.2.3; Ebersberger ). Sequences of the single-copy plant core orthologs were subsequently aligned using a previously described stringent alignment and filtering pipeline (Dunning ). In brief, the CDS were translationally aligned and filtered using T-COFFEE v. 11.00.8cbe486 (Notredame ) before trimming with gblocks v.0.91 (Castresana, 2000). Sequences shorter than 100 bp after trimming, and ortholog alignments with a mean nucleotide identity <95% were discarded, retaining 504 markers. A maximum likelihood tree was inferred using IQ-TREE v.1.6.3 (Nguyen ), which determined the most appropriate nucleotide substitution model prior to inferring a phylogeny with 1000 ultrafast bootstrap replicates.
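The length and identity thresholds described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function names, the treatment of `-`, `?`, and `N` as missing data, and the definition of mean identity as the average over all sequence pairs (computed only on columns where both sequences have a base) are assumptions made for illustration.

```python
from itertools import combinations

# Assumption: these characters count as missing data in a trimmed alignment.
MISSING = set("-?N")


def ungapped_length(seq):
    """Number of informative bases remaining in an aligned sequence."""
    return sum(1 for base in seq.upper() if base not in MISSING)


def pairwise_identity(seq_a, seq_b):
    """Fraction of identical bases over columns where both sequences have data."""
    matches = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in MISSING or b in MISSING:
            continue
        compared += 1
        matches += a == b
    return matches / compared if compared else None


def filter_alignment(alignment, min_length=100, min_mean_identity=0.95):
    """Drop short sequences, then reject the marker if mean identity is too low.

    alignment: dict mapping sample name -> aligned nucleotide sequence.
    Returns the filtered dict, or None if the marker is discarded.
    """
    kept = {name: seq for name, seq in alignment.items()
            if ungapped_length(seq) >= min_length}
    identities = [pairwise_identity(kept[a], kept[b])
                  for a, b in combinations(sorted(kept), 2)]
    identities = [value for value in identities if value is not None]
    if not identities:
        return None  # fewer than two comparable sequences remain
    mean_identity = sum(identities) / len(identities)
    return kept if mean_identity >= min_mean_identity else None
```

Calling `filter_alignment` on each trimmed ortholog alignment and keeping the non-`None` results would give the set of markers to concatenate under these assumptions.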

Differential expression analyses

For differential expression analysis, we used the 45 144 cDNA sequences from the A. semialata reference genome (Dunning ; accession number QPGU00000000) as a reference. Cleaned reads were mapped to the reference using Bowtie2 v.2.3.4.1 (Langmead and Salzberg, 2012), recording all alignments. Counts for each transcript were then calculated using eXpress v.1.5.1 (Roberts and Pachter, 2013) with default parameters, and are reported in reads per kilobase of transcript per million mapped reads (RPKM). A multivariate analysis was used to assess similarities and differences in overall transcriptome expression profiles between samples. Clusters of expression profiles based on the biological coefficient of variation (BCV) were identified with multidimensional scaling (MDS) in edgeR v.3.4.2 (Robinson ). Differential expression analysis in edgeR was restricted to the 10 populations with three biological replicates. For each pair of populations, differentially expressed genes were identified as those with an associated false discovery rate (FDR) below 0.05. The overlap between pairwise comparisons was used to identify changes associated with specific branches of the phylogenetic tree inferred from core orthologs. Changes were assigned to a branch if significant results were detected for all pairwise tests involving one member of the descending clade and one population outside the clade, and the direction of expression change was consistent. This summary of pairwise tests was done separately for each C3+C4/C4 clade (A. cimicina, A. angusta, and A. semialata) with all C3 populations so that convergent gene expression shifts could be detected. Overall, by grouping the differential expression results based on the phylogenetic clades, we are able to identify changes in gene expression that coincide with specific physiological transitions, as well as those that precede or follow these transitions.
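The branch-assignment rule described above can be sketched in Python as follows. This is an illustrative reading of the rule, not the authors' code: the data layout (a dictionary of pairwise result tables keyed by population pair, each mapping a gene to a log fold change and an FDR) and all function names are hypothetical.

```python
from itertools import product


def oriented_table(pairwise, inside, outside):
    """Return the result table for a pair plus a sign that orients the log
    fold change as 'inside relative to outside'."""
    if (inside, outside) in pairwise:
        return pairwise[(inside, outside)], 1
    return pairwise[(outside, inside)], -1


def genes_on_branch(pairwise, clade, outgroup, fdr_cutoff=0.05):
    """Genes significant, with a consistent direction, in every test that
    pairs one population inside the clade with one population outside it.

    pairwise: dict (pop_a, pop_b) -> {gene: (log_fc, fdr)}, where log_fc > 0
              means higher expression in pop_a (assumed convention).
    Returns (up_in_clade, down_in_clade) as two sets of gene names.
    """
    up = down = None
    for inside, outside in product(clade, outgroup):
        table, sign = oriented_table(pairwise, inside, outside)
        sig_up = {gene for gene, (log_fc, fdr) in table.items()
                  if fdr < fdr_cutoff and sign * log_fc > 0}
        sig_down = {gene for gene, (log_fc, fdr) in table.items()
                    if fdr < fdr_cutoff and sign * log_fc < 0}
        # Intersect across tests: a single non-significant or reversed
        # comparison removes the gene from the branch.
        up = sig_up if up is None else up & sig_up
        down = sig_down if down is None else down & sig_down
    return up or set(), down or set()
```

Requiring the intersection over all inside/outside pairs is deliberately conservative, since one discordant population is enough to exclude a gene from the branch.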

Results

Transcriptome sequencing

Over 190 million 108 bp paired-end reads were used in this study, including >167 million for the 10 populations sampled in triplicate (Supplementary Table S3). For these 30 samples used in differential expression analyses, the data comprised 36.13 Gb, with a mean of 1.20 Gb per library (SD=0.54 Gb; Supplementary Table S3). Over 95% of reads were retained after cleaning, and a de novo transcriptome was assembled for each of the populations using all available reads.

Phylogenetic relationships based on concatenated ortholog alignments

A phylogenetic tree was inferred from a concatenated alignment of 504 ‘core orthologs’ extracted from the predicted coding sequences from 25 transcriptome assemblies (12 assembled here), for a total of 573 762 bp after cleaning. Each population was represented by at least 126 048 bp (mean=468 507 bp; SD=94 782 bp). The concatenated alignment had 21.1% gaps and 6.3% of sites were parsimony informative. The phylogeny was inferred using the GTR+F+R4 substitution model, which was the best-fit model according to the Bayesian information criterion (BIC). The phylogenetic relationships were congruent with previous genome-wide nuclear trees (Olofsson ; Dunning ), and confirmed that all the sampled C4 populations of A. semialata form a monophyletic group, which is sister to the C3+C4 populations (Fig. 1). These two are in turn sister to the C3 populations, so that previously inferred nuclear clades I (C3), II (C3+C4), III and IV (both C4) are retrieved, with the polyploid populations (RSA3 and RSA4) branching in between and the Cameroonian population at their base (Olofsson ; Fig. 1). Alloteropsis angusta and A. cimicina branched successively outside of A. semialata (Fig. 1), again mirroring previous results (Lundgren ; Olofsson ; Dunning ).

Transcriptome-wide patterns

A mean of 57.4% (SD=12.05%) of cleaned reads from the 45 RNA-Seq libraries mapped back to the 45 144 cDNA sequences extracted from the reference A. semialata genome (only A. semialata samples n=34, mean=64.1%, SD=4.3%). In total, 59.8% (n=26 975) of gene sequences had expression levels of >1 read per million of mapped reads in at least three samples and were retained for differential expression analysis. Based on their expression profiles, samples group strongly by species (Fig. 2A). When focusing on A. semialata, the main phylogenetic groups are recovered, which match the photosynthetic types (Figs 1, 2B). There is no apparent effect of the source study, with previous and new transcriptomes of the same species grouping together (Fig. 2). Differential expression analysis was performed for each pair of the 10 populations that had three biological replicates. The 45 pairwise tests performed returned an average of 4880 (SD=2125) significantly (FDR <0.05) differentially expressed genes (Fig. 3; Supplementary Table S4). The number of differentially expressed genes is highest between the most distantly related populations and lowest among close relatives (Fig. 3). Complete expression results are available in Supplementary Tables S4 and S5.
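The expression filter and the number of pairwise tests reported above can be reproduced with a short Python sketch. The filter below is a simplified stand-in for the corresponding edgeR step: it assumes that library size is the column sum of the count table, and the function name and data layout are hypothetical.

```python
from math import comb


def cpm_filter(counts, min_cpm=1.0, min_samples=3):
    """Keep genes with more than `min_cpm` reads per million mapped reads
    in at least `min_samples` samples.

    counts: dict gene -> list of read counts, one per sample (same order).
    Assumes every sample has at least one mapped read.
    """
    n_samples = len(next(iter(counts.values())))
    library_sizes = [sum(row[i] for row in counts.values())
                     for i in range(n_samples)]
    kept = []
    for gene, row in counts.items():
        cpm = [1e6 * count / size for count, size in zip(row, library_sizes)]
        if sum(value > min_cpm for value in cpm) >= min_samples:
            kept.append(gene)
    return kept


# Figures taken from the text: 10 populations compared in pairs, and
# 26 975 of 45 144 reference cDNA sequences retained after filtering.
n_pairwise_tests = comb(10, 2)
percent_retained = 100 * 26975 / 45144
```

Under these assumptions `n_pairwise_tests` equals the 45 tests reported, and `percent_retained` rounds to the 59.8% given in the text.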
Fig. 2.

Expression profile similarity across all samples. Expression profiles are clustered in multidimensional scaling (MDS) plots using (A) all samples and (B) only A. semialata samples. Species and photosynthetic types are indicated and population names are as in Fig. 1.

Fig. 3.

Number of differentially expressed genes among pairs of populations. The heatmap shows the number of significantly differentially expressed genes detected for each pair of populations. The phylogenetic relationships among populations are indicated on the side, using an ultrametric version of the tree presented in Fig. 1.


Differences between the C3 and C3+C4 states of A. semialata

As expected, the long divergence time between the C3 outgroup (Entolasia marginata) and A. semialata results in a large number of significant expression changes (branch A in Fig. 4). A total of 825 genes are down-regulated along this branch (3.1% of those expressed in leaves), including two genes encoding PEPC (ppc-1P2 and ppc-2P1; ASEM_AUS1_43423 and ASEM_AUS1_37421; Supplementary Table S6), which drop to barely detectable levels in all A. semialata accessions, and are therefore unlikely to be linked to photosynthetic diversification. A total of 1500 genes (5.6%) are up-regulated in A. semialata compared with the C3 outgroup (branch A in Fig. 4; Supplementary Table S6). This includes genes encoding the C4-related enzymes malate dehydrogenase (NAD-MDH; nadmdh-2P4; ASEM_AUS1_14800), AMP kinase (AK; ak-3P3; ASEM_AUS1_08191 and ASEM_AUS1_08195), glyceraldehyde 3-phosphate dehydrogenase (GAPDH; gapdh-1P2; ASEM_AUS1_06811), and phosphoenolpyruvate carboxylase kinase (PEPC-K; pepck-1P3 and pepck-3P6; ASEM_AUS1_38337 and ASEM_AUS1_12272), although their expression levels remain fairly low in all A. semialata regardless of the photosynthetic type (mean=42 RPKM; SD=37; Supplementary Table S5). One gene encoding an enzyme linked to the photorespiratory pathway is also up-regulated (hpr-2P3; ASEM_AUS1_28984), although levels again remain fairly low within A. semialata (mean=19 RPKM; SD=13; Supplementary Table S5). The rest of the numerous genes varying in expression between the whole of A. semialata and the outgroup do not have known links to the C4 pathway. A total of 60 genes (0.22%) are differentially expressed along the branch leading to the C3 populations of A. semialata (branch B in Fig. 4). None of these 60 genes encodes a protein known to function as part of the C4 pathway (Supplementary Table S6).
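The percentages quoted in this paragraph can be checked with a few lines of Python. The sketch assumes that the denominator is the 26 975 genes retained for differential expression analysis; the function and constant names are hypothetical.

```python
# Assumption: "genes expressed in leaves" refers to the 26 975 genes that
# passed the expression filter.
EXPRESSED_GENES = 26975


def percent_of_expressed(n_genes, total=EXPRESSED_GENES):
    """Share of the expressed genes, in percent."""
    return 100 * n_genes / total


# Gene counts reported for branches A (down, up) and B of Fig. 4.
branch_counts = {"A_down": 825, "A_up": 1500, "B_all": 60}
branch_percent = {label: percent_of_expressed(n)
                  for label, n in branch_counts.items()}
```

With this denominator the three values round to 3.1%, 5.6%, and 0.22%, matching the figures in the text.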
Fig. 4.

Phylogenetic patterns of changes in gene expression. The maximum-likelihood phylogeny from Fig. 1 is shown unrooted after pruning the populations not used for expression analyses. For each branch, the number of differentially expressed genes is indicated, with numbers next to arrows indicating those that are consistently up- or down-regulated as one moves along the tree from the outgroup Entolasia marginata. Each population has three biological replicates, and colors indicate the photosynthetic type (blue=C3; green=C3+C4; red=C4). The scale indicates number of nucleotide substitutions per site, with truncated branches highlighted by two bars. The two grayed out C4 congeners were excluded from these analyses, and results that involve them can be found in Supplementary Fig. S3.

Within A. semialata, a C4 cycle, weak or strong, characterizes the monophyletic group of C3+C4 and C4 populations, but not its C3 sister group. Along the branch leading to C3+C4 and C4 accessions, we detect 67 significantly differentially expressed genes (branch E in Fig. 4; Table 1). Of those, 58 (0.22% of all expressed genes) are consistently up-regulated in the C3+C4 and C4 populations compared with the C3 samples, including three genes that encode key C4 enzymes: aspartate aminotransferase (ASP-AT; aspat-3P4; ASEM_AUS1_08268), phosphoenolpyruvate carboxykinase (PCK; pck-1P1; ASEM_C4_17510), and PEPC (ppc-1P3; ASEM_C4_19029; Supplementary Table S6). These three genes reach very high levels in the leaves of all C3+C4 and C4 individuals (mean=1766 RPKM; SD=585; Fig. 5; Supplementary Table S5), including the C4 congener A. angusta (mean=5002 RPKM; SD=2607; Supplementary Table S5). The other genes whose expression changes significantly along the same branch mostly remain at low to moderate levels in all A. semialata, but a number of them are also significant in A. angusta, and two of them in A. cimicina (Table 1; Supplementary Table S6). 
The significant genes include one for Nudix hydrolase, which was previously identified in a comparison of rice and C4 grasses (Ding ). The remaining genes have not, however, been related to C4 photosynthesis in previous screens of grasses (Ding ; Huang ). A gene for a callose synthase is down-regulated in the C3+C4/C4 group as well as in A. angusta (Table 1), which might be linked to plasmodesmatal widening to facilitate intercellular fluxes, as suggested for other genes linked to callose synthesis (Bräutigam ; Huang and Brutnell, 2016). Some of the other differentially expressed genes encode proteins that have been previously suggested as being involved in metabolic/structural differences between photosynthetic types (e.g. acyl transferase and pyruvate dehydrogenase; Huang and Brutnell, 2016) or that might be linked to plasmodesmata (e.g. phosphatidylglycerol/phosphatidylinositol transfer protein), although the functional links with photosynthetic diversification remain to be tested.
Table 1.

List of genes with SwissProt annotations differentially expressed in key comparisons within Alloteropsis semialata from C3 to C3+C4, and C3+C4 to C4

Columns: Gene; SwissProt protein description; Arabidopsis ortholog; Mean RPKM (C3, C3+C4, C4)
Genes up-regulated in C3+C4 and C4 A. semialata (branch E in Fig. 4)
ASEM_AUS1_17510aPhosphoenolpyruvate carboxykinase (PCK)AT4G37870211683017
ASEM_AUS1_08268aAspartate aminotransferase (ASP-AT)AT5G1152015818431196
ASEM_AUS1_19029aPhosphoenolpyruvate carboxylase (PEPC)AT2G42600958281118
ASEM_AUS1_30031aFruit bromelainAT1G0626011260497
ASEM_AUS1_08709Iron–sulfur cluster assembly protein 1AT4G2222067394473
ASEM_AUS1_11198Bifunctional TENA2 proteinAT3G16990104380
ASEM_AUS1_1991450S ribosomal protein L17AT5G6465017858
ASEM_AUS1_02887aCysteine proteinase 1AT2G3223004454
ASEM_AUS1_16281aProbable carboxylesterase 15AT5G0657011650
ASEM_AUS1_11666Putative protease Do-like 14AT5G2766016339
ASEM_AUS1_18766aNudix hydrolase 16AT3G1260042438
ASEM_AUS1_21431aDNA-binding protein MNB1BAT4G3557009430
ASEM_AUS1_24040a,bPutative phosphatidylglycerol/phosphatidylinositol transfer proteinAT3G1178043224
ASEM_AUS1_08934Putative F-box proteinAT4G3887001823
ASEM_AUS1_44075Indole-3-acetaldehyde oxidaseAT5G2096002822
ASEM_AUS1_24692Dihydrolipoyllysine-residue acetyltransferase component 1 of pyruvate dehydrogenase complexAT3G5220001320
ASEM_AUS1_38810UDP-glycosyltransferaseAT1G0568003517
ASEM_AUS1_24427Putative F-box proteinAT1G6577001916
ASEM_AUS1_43609aFlavin-containing monooxygenase FMO GS-OX-like 9AT5G078000713
ASEM_AUS1_40960Cysteine-rich receptor-like protein kinase 26 AT4G2324011813
ASEM_AUS1_16960aValine-tRNA ligaseAT1G1461002612
ASEM_AUS1_27461bAspartic proteinase nepenthesin-2AT2G032000212
ASEM_AUS1_15840Tyrosine-tRNA ligaseAT2G338400410
ASEM_AUS1_22664Probable nucleolar protein 5-1AT5G271200198
ASEM_AUS1_39034Putative protease Do-like 14AT5G276600117
ASEM_AUS1_21913Protein NEN1 AT5G07710056
ASEM_AUS1_01903Disease resistance protein RPMAT3G07040072
Genes down-regulated in C3+C4 and C4A. semialata (branch E in Fig. 4)
ASEM_AUS1_2173460S ribosomal protein L23aAT3G55280206072
ASEM_AUS1_01414a,bAcyl transferase 4AT3G621601501817
ASEM_AUS1_31537Pumilio homolog 23AT1G7232049129
ASEM_AUS1_0006140S ribosomal protein SAAT3G047704277
ASEM_AUS1_22162Tubulin alpha-3 chainAT4G149603263
ASEM_AUS1_22449aCallose synthase 3AT5G130003021
ASEM_AUS1_04268a40S ribosomal protein S21AT5G277002000
ASEM_AUS1_06562a,bPTI1-like tyrosine-protein kinase 3AT3G59350511
Genes up-regulated in C4A. semialata (branch I in Fig.4)
ASEM_AUS1_39556a,bPyruvate, phosphate dikinase 1 (PPDK)AT4G15530601331149
ASEM_AUS1_24184aPhosphatidylglycerol/phosphatidylinositol transfer proteinAT3G1178001104
ASEM_AUS1_29700Protein SRG1AT1G170202186
ASEM_AUS1_16577aLactoylglutathione lyaseAT1G118400046
ASEM_AUS1_06220 S-Norcoclaurine synthase 1AT1G170201139
ASEM_AUS1_24241DnaJ homolog subfamily A member 1AT3G142001133
ASEM_AUS1_44200aAquaporin TIP1-1AT2G368300017
ASEM_AUS1_13652Transcription factor TGAL4AT1G08320007
ASEM_AUS1_00246Nicotinamide adenine dinucleotide transporter 2AT1G25380002
Genes down-regulated in C4A. semialata (branch I in Fig.4)
ASEM_AUS1_43847a,bShort-chain dehydrogenase TIC 32AT4G2342018110

SwissProt protein description and Arabidopsis ortholog information are based on top-hit blast matches. Mean RPKM is derived from the seven A. semialata populations used for differential expression analysis (full summary of results can be found in Supplementary Table S6).

(a) Significant change in the same direction in A. angusta.

(b) Significant change in the same direction in A. cimicina.

Fig. 5.

Expression levels across accessions. Expression levels in reads per kilobase of transcript per million mapped reads are shown for four example genes. The SD for populations with biological replicates is indicated. Colors indicate the photosynthetic types; blue=C3; green=C3+C4; red=C4.
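The unit used throughout (reads per kilobase of transcript per million mapped reads, RPKM) can be illustrated with a minimal sketch. The function name and the example numbers below are hypothetical and are not taken from the study; the sketch only shows how the normalization by transcript length and library size works.

```python
def rpkm(read_count: int, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("transcript length and library size must be positive")
    transcript_kb = transcript_length_bp / 1_000        # length in kilobases
    mapped_millions = total_mapped_reads / 1_000_000    # library size in millions
    return read_count / (transcript_kb * mapped_millions)


# Hypothetical gene: 3,000 reads mapped to a 1,500 bp transcript in a
# library of 20 million mapped reads.
example_rpkm = rpkm(3_000, 1_500, 20_000_000)
print(f"{example_rpkm:.1f} RPKM")  # prints "100.0 RPKM"
```

Because the count is divided by both transcript length and library size, values are comparable across genes of different lengths and across libraries of different depths.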

Changes during the transition from C3+C4 to C4 in A. semialata

Within A. semialata, a strong C4 cycle characterizes a monophyletic group of populations (Fig. 1A), but only 16 genes (0.06% of all expressed genes) were significantly differentially expressed along the branch separating this group from the other populations (branch I in Fig. 4). Of these, 15 were consistently up-regulated in the C4 populations, including one gene encoding the core C4 enzyme pyruvate orthophosphate dikinase (PPDK; ppdk-1P2; ASEM_AUS1_39556), which reaches very high levels in all C4 populations (mean=4479 RPKM; SD=2293; Table 1; Fig. 5; Supplementary Table S6), including the congeners A. cimicina (mean=1766 RPKM; SD=585; Supplementary Table S5) and A. angusta (mean=1367 RPKM; SD=1100; Supplementary Table S5). The other genes up-regulated in the C4 accessions, which include transcription factors and some transporters, reach moderate levels in the C4 accessions, although some are also significantly up-regulated in A. angusta (Table 1). Significant changes in the abundance of the genes for the phosphatidylglycerol/phosphatidylinositol transfer protein might be linked to modifications of plasmodesmata to facilitate metabolite exchanges (Grison ), while aquaporins might be involved in membrane diffusion of CO2 (Kaldenhoff ). However, whether these genes played a direct role in the photosynthetic diversification of A. semialata remains speculative.

Adaptation of C4 photosynthesis in independent lineages

The three C4 populations included in the differential expression analyses come from geographically distant locations and diverged more than half a million years ago (Lundgren ; Olofsson ), explaining the large number of differentially expressed genes among them (Fig. 3). Interestingly, these include enzymes linked to the C4 cycle, with genes encoding PEPC (ppc-1P3; ASEM_AUS1_12633), NAD-MDH (nadmdh-1P8; ASEM_AUS1_25602), PEPC-K (pepck-1P3; ASEM_C4_38337), NADP-MDH (nadpmdh-3P4; ASEM_AUS1_33376), and a sodium bile acid symporter (SBAS; sbas-4P4; ASEM_AUS1_12098) all up-regulated in the C4 plants from the Philippines (PHI1601; Supplementary Table S6). A comparison of expression levels in the other transcriptomes (including the 15 populations not used for the differential expression) indicates that the gene sbas-4P4 has qualitatively higher expression in all C4 individuals from clade IV of A. semialata (mean=898 RPKM; SD=483), but not in the other C4 individuals (mean=27 RPKM; SD=19) or the other A. semialata populations as a whole (mean=20 RPKM; SD=13; Fig. 5; Supplementary Table S5). This gene is orthologous to a group of Arabidopsis paralogs including BASS6 (At4g22840), which has the ability to transport glycolate, and appears to be involved in a process decreasing photorespiration (South ). The Arabidopsis paralog previously related to C4 photosynthesis transports pyruvate (BASS2; Furumoto ), but its precise function might differ between the Alloteropsis and Arabidopsis orthologs. In addition, a gene encoding the photorespiratory enzyme peroxisomal (S)-2-hydroxy-acid oxidase (GLO; glo-1P1; ASEM_AUS1_30871) is down-regulated in only one of the three C4 populations (CMR1601; Supplementary Table S6).

There is considerable variation in the expression of individual genes encoding some other C4 enzymes, with some more abundant in the C4 than C3+C4 A. semialata populations on average, yet relatively low in some C4 individuals. These genes include alanine aminotransferase (ALA-AT; alaat-1P5; ASEM_AUS1_25403; C4 mean=1105 RPKM; SD=812; C3+C4 mean=134 RPKM; SD=59; significantly differentially expressed in 13 of the 15 required pair-wise tests), which has low expression in C4 individuals from Tanzania (TAN4-08; RPKM=135) and Cameroon (CMR1601-07; RPKM=154). Similarly, one of the genes encoding the NADP-malic enzyme (nadpme-1P4; NADP-ME, ASEM_AUS1_06611; significantly differentially expressed in seven of the 15 required pair-wise tests) is on average more abundant in the C4 and C3+C4 (mean=300 RPKM; SD=235) than C3 (mean=75 RPKM; SD=32) A. semialata populations, but low within some C4 individuals (e.g. TAN4-01 RPKM=82; TAN4-08 RPKM=54; ZAM1503-08 RPKM=50; Fig. 5). This gene is also significantly up-regulated in A. cimicina and A. angusta (Supplementary Table S5). One of the genes for PEPC kinase (pepck-1P3) reaches high levels in several C4 accessions of A. semialata (Supplementary Table S5). Similarly, some genes for the small subunit of Rubisco reach very low levels in some C4 accessions. For instance, the gene AUS1_20231 is at low levels in most C4 A. semialata, yet remains very high in others, while the paralog AUS1_26631 reaches extremely low levels, specifically in the Asian group of C4 A. semialata (Supplementary Table S5). A third paralog (AUS1_26630) remains high in all accessions, so that the total abundance of genes for Rubisco is not markedly decreased, which is congruent with the high Rubisco protein abundance in the leaf of the C4 A. semialata (Ueno and Sentoku, 2006).

The number of genes significantly differentially expressed in the C4 A. cimicina and A. angusta lineages is much higher, since only one population represents each of these species (Supplementary Fig. S3). As previously reported (Dunning ), a high number of genes encoding core C4 enzymes, regulatory proteins, and transporters are up-regulated in A. cimicina (Supplementary Table S7), and to a lesser extent in A. angusta (Supplementary Table S8), while some photorespiration and Rubisco genes are down-regulated in both species.

Besides the differentially expressed genes, a number of C4-related genes are abundant in all samples independent of their photosynthetic type. This is especially the case of genes encoding β-carbonic anhydrase (βca-2P3; ASEM_AUS1_16750; mean=1682 RPKM, SD=1027, minimum=290) and malate dehydrogenases [nadpmdh-1P1 (ASEM_AUS1_23802; mean=443 RPKM, SD=501, minimum=117), nadpmdh-3P4 (ASEM_AUS1_33376; mean=447 RPKM, SD=184, minimum=166), and nadmdh-3P5 (ASEM_AUS1_22160; mean=157 RPKM, SD=69, minimum=41)]. Transcripts for these genes were also abundant in the leaves of distantly related C3 grasses, and their up-regulation very probably pre-dates the diversification of the group (Moreno-Villena ).
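The phrase "significantly differentially expressed in 13 of the 15 required pair-wise tests" used above suggests that a change is assigned to a branch only when every pairwise test between populations on either side of that branch is significant in the same direction. The following is a minimal sketch of that logic under this assumption. It is not the authors' pipeline: the function names, population labels, the 5-versus-3 grouping, and the fold-change values are all hypothetical.

```python
from itertools import product


def count_supporting_tests(results, side_a, side_b):
    """Count pairwise tests across a branch that agree in direction.

    `results` maps (population_a, population_b) to a tuple
    (is_significant, log2_fold_change). Returns (supporting, required).
    """
    pairs = list(product(side_a, side_b))
    up = down = 0
    for pair in pairs:
        is_significant, log2_fold_change = results[pair]
        if is_significant and log2_fold_change > 0:
            up += 1
        elif is_significant and log2_fold_change < 0:
            down += 1
    # Only tests pointing in the majority direction count as support.
    return max(up, down), len(pairs)


def assigned_to_branch(results, side_a, side_b):
    """A gene is assigned to the branch only if all required tests agree."""
    supporting, required = count_supporting_tests(results, side_a, side_b)
    return supporting == required


# Hypothetical example: 5 populations on one side, 3 on the other (15 tests).
side_a = ["A1", "A2", "A3", "A4", "A5"]
side_b = ["B1", "B2", "B3"]
results = {pair: (True, 2.0) for pair in product(side_a, side_b)}
results[("A4", "B1")] = (False, 0.3)  # two comparisons fail to reach significance
results[("A5", "B2")] = (False, 0.1)

supporting, required = count_supporting_tests(results, side_a, side_b)
print(supporting, required, assigned_to_branch(results, side_a, side_b))
# prints "13 15 False"
```

Under such a strict rule, a gene like the hypothetical one above is on average more abundant on one side of the branch but is still excluded from the list of changes assigned to it, which matches how ALA-AT and NADP-ME are described in the text.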

Discussion

Sampling the natural diversity to limit false positives

RNA-Seq is routinely used to identify genes differentially expressed between individuals with distinct phenotypes, leading to lists of candidate genes underpinning these differences (e.g. Shen ; Dunning ; Fracasso ). When comparing distinct species, the risk of false positives is very high, as all changes in gene expression unrelated to the studied phenotypic transitions are detected. Here, 77.1% of genes expressed in the leaves are significantly differentially expressed in at least one pairwise comparison between our 10 populations (49.8% within A. semialata), which all belong to a relatively small group of closely related grasses. A powerful strategy to reduce false positives is to consider multiple independent origins of the trait of interest, and retain only those genes differentially expressed in all lineages (Ding ; Rao ). Such a filter would, however, exclude non-convergent changes in gene expression. The alternative approach adopted here was to carry out multi-individual comparisons to infer changes along specific branches of the phylogenetic tree. The problem of false positives remains, as unrelated changes that happen to coincide with the studied transitions would also be detected. However, working within a species complex decreases the number of false positives, as shorter divergence times are likely to result in fewer unrelated changes in gene expression. Because most changes cluster on terminal branches (Fig. 4), probably representing neutral changes that do not persist over evolutionary time, the inference of changes on short internal branches is less likely to be affected by drift. Indeed, a comparison of a C3 A. semialata with the C4 sister species A. angusta would identify >5000 (18% of genes expressed in the leaves) differentially expressed genes (Fig. 3). This number drops by ~50% when comparing individual C3 and C4 populations within A. semialata, but still includes all changes that occurred before, during, and after the C3 to C4 transition. After incorporating multiple populations of each type, only 67 genes (0.25% of genes expressed in the leaves) are identified that differ in expression between the C3 and C3+C4 phenotypes, and 16 (0.06% of genes expressed in the leaves) between the C3+C4 and C4 states. Changes in some of these genes might not be directly linked to the diversification of photosynthetic types, but several were convergently modified in A. angusta and/or A. cimicina (Table 1). These genes represent the best candidates for a role in the emergence and subsequent strengthening of a C4 cycle in the group.
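The proportions quoted above (67 genes as 0.25% and 16 genes as 0.06% of leaf-expressed genes) can be cross-checked with simple arithmetic. The sketch below back-calculates the range of totals compatible with each rounded percentage. The total number of expressed genes is not stated in this section, so the resulting range is an inference from the rounded figures, and the function name is hypothetical.

```python
def implied_total_range(count: int, percent: float, decimals: int = 2):
    """Range of totals compatible with `count` being `percent`% after rounding.

    `percent` is assumed to be rounded to `decimals` decimal places.
    """
    half_step = 0.5 * 10 ** (-decimals)
    low = count / ((percent + half_step) / 100)
    high = count / ((percent - half_step) / 100)
    return low, high


ranges = [
    implied_total_range(67, 0.25),  # C3 versus C3+C4
    implied_total_range(16, 0.06),  # C3+C4 versus C4
]

# The two statements are mutually consistent if their ranges overlap.
overlap_low = max(low for low, _ in ranges)
overlap_high = min(high for _, high in ranges)
print(f"{overlap_low:.0f} to {overlap_high:.0f} expressed genes")
```

Both percentages are compatible with a shared total of roughly 26,000 to 27,000 leaf-expressed genes, so the two counts refer to the same denominator within rounding.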

Emergence and reinforcement of the C4 cycle in Alloteropsis semialata

The phylogenetic relationships and genus-wide comparisons of transcriptomes and leaf anatomical traits indicate that the last common ancestor of all A. semialata might have possessed a weak C4 cycle based on the up-regulation of some enzymes (Fig. 1; Dunning ). A large number of genes are differentially expressed between all A. semialata and the C3 outgroup, which is not surprising given the evolutionary distance of at least 15 million years (Christin ). However, these include relatively few genes encoding C4 enzymes (Supplementary Table S6). We conclude that the transcriptome of the C3 A. semialata differs from that of other C3 grasses by relatively few C4-related genes. The C3 group might represent a reversal from a C3+C4 state to a phenotype with expression levels similar to the C3 outgroup. In such a scenario, C4-related changes that happened in the last common ancestor of A. semialata and were reversed in the C3 group would be assigned to the branch leading to the C3+C4 and C4 groups. Because they focus on the phenotypic gaps in gene expression between the C3 state and those using a weak or strong C4 cycle, our transcriptome comparisons are therefore not heavily influenced by potential evolutionary reversals or reticulate evolution. In total, 67 genes are differentially expressed in the group encompassing C3+C4 and C4 phenotypes, and these include only three genes encoding core C4 enzymes that are up-regulated in all C3+C4 and C4 individuals (genes for ASP-AT, PCK, and PEPC; Table 1; Supplementary Table S5). These three enzymes form an aspartate shuttle based on the PCK decarboxylase (Fig. 6), which theoretically cannot sustain a full C4 pathway on its own without creating an energetic imbalance among cell types (Wang ). However, it might create a weak CO2-concentrating mechanism in C3+C4 plants that can function without dramatic energetic consequences due to its co-existence with a C3 type of photosynthesis. While the functional significance of the other changes detected along the same branch is not always known, several might be linked to the control of plasmodesmata and thereby intercellular exchanges (Table 1). Other small adjustments of the cellular metabolism might remain undetected, but none of the other major C4 enzymes or transporters is significantly up-regulated during the emergence of a weak C4 cycle (Table 1). The apparently few changes in transcription required to operate a weak C4 cycle in the C3+C4 intermediates may be facilitated by C4-like anatomical properties and an abundance of genes for some key enzymes in the ancestor, as observed in other C3 grasses (Christin , b; Emms ; Dunning ; Moreno-Villena ), and recent evidence suggests that some anatomical traits themselves might emerge via very few genetic changes (Wang ). While it is only responsible for part of the plant’s CO2 uptake, the weak C4 cycle of C3+C4 plants reduces photorespiration (Ku ; Lundgren ), which confers a selective advantage analogous to that of a complete C4 cycle in tropical conditions (Sage ; Christin and Osborne, 2014; Lundgren and Christin, 2017), and allows the evolution of a stronger C4 cycle under natural selection for faster biomass accumulation (Heckmann ; Mallmann ; Bräutigam and Gowik, 2016).
Fig. 6.

Putative C4 pathway in Alloteropsis semialata. A C4 cycle is suggested for A. semialata based on the transcript abundance of C4-related genes, and the literature (Frean ; Ueno and Sentoku, 2006). Pathway components are colored per the differential expression analysis, with those in black being putatively sufficiently abundant in C3 ancestors, parts of the pathway in green those up-regulated during the transition to C3+C4, and parts in red those up-regulated during the transition from C3+C4 to C4. ALA-AT=alanine aminotransferase, ASP-AT=aspartate aminotransferase, CA=carbonic anhydrase, NADP-MDH=NADP malate dehydrogenase, NAD(P)-ME=NAD(P) malic enzyme, PCK=phosphoenolpyruvate carboxykinase, PEPC=phosphoenolpyruvate carboxylase, PEPP=phosphoenolpyruvate phosphatase, PPDK=pyruvate orthophosphate dikinase, PCR=photosynthetic carbon reduction (Calvin–Benson cycle).

The transition from a weak to a strong C4 cycle in A. semialata changes carbon isotope signatures (the method most often used to identify photosynthetic types) from non-C4 values to values diagnostic of C4 plants (von Caemmerer, 1992; Lundgren ). This shift indicates a strengthened connection between the C3 and C4 cycles and a decreased leakiness, so that less atmospheric CO2 is directly fixed by the Calvin–Benson cycle (Monson ; von Caemmerer, 1992). Within A. semialata, this might have been mediated by the reduced distance between veins in the C4 A. semialata (Lundgren , 2019; Dunning ) and/or biochemical alterations. The up-regulation of relatively few genes (0.06%) coincided with the phenotypic transition, and only one of these encoded an enzyme with a known C4 function, namely PPDK. This enzyme is responsible for the regeneration of PEP, the substrate of PEPC (Fig. 6). An increased PPDK activity is also observed between species of Flaveria performing a weak and a strong C4 cycle, and it has been suggested that this provides PEPC with PEP at higher rates, thereby increasing the efficiency of the C4 pathway (Monson and Moore, 1989; Sage ).

Based on the literature and our transcriptome data, the C4 cycle of A. semialata relies on a minimum of seven enzymes (Fig. 6; Frean ; Ueno and Sentoku, 2006). Genes for some of these enzymes (NAD-MDH and AK) increased in the common ancestor of the whole group, potentially as part of an ancestral weak C4 cycle (Fig. 1; Dunning ). Within A. semialata, further increases in transcript abundance are observed in the C3+C4 versus C3 or C4 versus C3+C4 comparisons (Table 1) for genes encoding PEPC and three other enzymes (i.e. ASP-AT, PCK, and PPDK; Fig. 5). The expression of genes encoding carbonic anhydrase and other NAD(P)-MDHs in the C3 ancestor of the group might have been sufficient to sustain a functioning C4 cycle (Supplementary Table S5; Moreno-Villena ). Genes for the last of these enzymes (NADP-ME) are abundant in some C4 individuals (Fig. 5; Supplementary Table S5), and might be expressed only in specific conditions, as suggested previously (Frean ). C4 populations of A. semialata are also characterized by a set of specific anatomical modifications and changes in the cellular localization of some enzymes (Ueno and Sentoku, 2006; Lundgren , 2019; Dunning ). Gene expression changes responsible for these modifications would not necessarily be captured by our transcriptome analyses of full mature leaves, and the evolution of the C4 phenotype almost certainly involves more genetic changes than those detected here.

While protein abundance is not a direct function of gene expression, the two are correlated (Schwanhäusser ; Csárdi ; Koussounadis ). In the case of A. semialata, the three C4 enzymes with genes differentially expressed in the C3+C4/C4 transcriptomes (PEPC, ASP-AT, and PCK) are also those with large differences in activities between the C3 and C4 A. semialata in a previous study (Ueno and Sentoku, 2006). Transcriptome comparisons offer a first assessment of the changes underlying adaptive transitions, allowing subsequent investigations of responsible regulatory elements, post-transcriptional processes, changes of the protein kinetics, and verification of gene functions via genetic manipulation (e.g. Wang ; Borba ).

Overall, our comparative transcriptomics show that, once the required enablers are present, the transition between C3 and C3+C4 with some C4 activity, and between C3+C4 and a rudimentary C4 metabolism might have required fewer changes in gene expression in A. semialata than previously suggested based on other comparisons (Bräutigam , 2014; Gowik ; Külahoglu ; Li ). These changes were spread between the C3/C3+C4 and C3+C4/C4 transitions, supporting a stepwise model of evolution (Mallmann ), where evolutionarily stable adaptive peaks can be reached with few mutations.

Adaptation continued after the emergence of a rudimentary C4 pathway

The CO2 pump generated by the C4 cycle of A. semialata is less efficient than that of other C4 species (Niklaus and Kelly, 2019), as illustrated by the incomplete segregation of enzymes between different cell types (Ueno and Sentoku, 2006) and slightly elevated CO2 compensation points lying at the upper limit of those observed in C4 species (Lundgren ). Therefore, A. semialata may be considered to exhibit an incipient C4 cycle, which has not been optimized through protracted evolutionary periods, as suggested in the most recent models (Bräutigam and Gowik, 2016). The analyses conducted here, which compared all C4 individuals with the C3+C4 or C3 conspecifics, can detect the changes that happened in the early C4 members of the group, before the diversification of the C4 genotypes. However, transcriptome comparisons across C4 individuals of A. semialata show evidence of additional alterations of the leaf biochemistry subsequent to the initial emergence of a C4 cycle, with the abundance of some C4-related enzymes varying across C4 populations (e.g. NAD-MDH) and photorespiratory proteins down-regulated in only some of the C4 populations (Supplementary Tables S5, S6). These changes are likely to represent the adaptation of the C4 cycle after its initial emergence (Heyduk ; Niklaus and Kelly, 2019), previously illustrated for A. semialata by variation in the identity of genes responsible for an abundance of the key C4 enzyme PEPC across C4 genotypes (Dunning ) and leaf anatomy (Lundgren ), and recently reported for Gynandropsis gynandra (Reeves ). The C4 pathway proposed for A. semialata, based on the up-regulation of four core C4 enzymes in addition to those present in C3 ancestors (Fig. 6), might serve as an intermediate stage toward more complex and more efficient C4 cycles. The congeneric C4 A. cimicina and A. angusta have transcriptomes more typical of other C4 species, with very high levels of numerous C4-related enzymes, including a number of regulatory proteins and metabolite transporters (Supplementary Table S5), as would be predicted from other study systems, and an abundance of amino acid transitions adapting the proteins for the new catalytic context (Bräutigam , 2014; Gowik ; Mallmann ; Christin ; Dunning ). These two species might have undergone more adaptive changes, due to an earlier C4 origin or faster evolutionary rate. As illustrated by the additional C4-related genes up-regulated in the C4 plants from the Philippines, the rudimentary C4 trait of A. semialata is likely to undergo similar secondary adaptations over evolutionary time.

Conclusions

In this study, the transcriptomes of individuals from the grass A. semialata are analysed in a phylogenetic context to show that the changes in gene expression required for a physiological innovation can be spread over time. The relatively few changes required for the initial emergence of a metabolic pathway contrast with the numerous modifications involved in the adaptation of this new pathway. Indeed, the emergence of a weak C4 cycle in our study system was accompanied by the up-regulation of three enzymes with a known C4 function and 55 other proteins. The evolution of a stronger C4 cycle then involved the up-regulation of one other C4 enzyme and 14 other proteins. However, adaptation of C4 photosynthesis, illustrated here by population-specific expression of C4-specific enzymes, continues when the plants are already in a C4 state. The evolutionary modifications required to generate a rudimentary C4 pathway can therefore be modest in species possessing C4 enablers, but even a suboptimal C4 pathway is important because it changes the environmental responses of the species. This creates an opportunity for natural selection to act on the standing variation, new mutations, and, in some cases, laterally acquired genes, to assemble a trait of increasing complexity, allowing the colonization and gradual dominance in a larger spectrum of ecological conditions.

Data deposition

All raw DNA sequencing data (Illumina reads) and transcriptome assemblies generated as part of this study have been deposited with NCBI under Bioproject PRJNA401220.

Supplementary data

Supplementary data are available at JXB online. Table S1. List of enzymes considered as core C4 enzymes. Table S2. Information for populations sampled in triplicate. Table S3. RNA-Seq data and mapping statistics for 10 populations with triplicates. Table S4. Pairwise differential expression test results for all genes. Table S5. Leaf abundance, annotation, and summary of significance for all genes. Table S6. Summary of differentially expressed genes referred to in Fig. 1. Table S7. Summary of differentially expressed genes referred to in Supplementary Fig. S1A. Table S8. Summary of differentially expressed genes referred to in Supplementary Fig. S1B. Fig. S1. Phylogenetic patterns of changes in gene expression in (A) Alloteropsis angusta, and (B) Alloteropsis cimicina.

Author contributions

LTD, JJMV, AB, CPO, and PAC designed the research; LTD, MRL, JD, PS, CA, FN, JKO, AM, IMG, CJK, LAD, FK, MA, DY, GB, WPQ, CPO, and PAC identified and collected plant material; LTD and JJMV generated and analysed the transcriptome data, with the help of AB and PAC; LTD, JJMV, and PAC wrote the paper with the help of all co-authors.
  83 in total

1.  Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis.

Authors:  J Castresana
Journal:  Mol Biol Evol       Date:  2000-04       Impact factor: 16.240

2.  T-Coffee: A novel method for fast and accurate multiple sequence alignment.

Authors:  C Notredame; D G Higgins; J Heringa
Journal:  J Mol Biol       Date:  2000-09-08       Impact factor: 5.469

3.  The evolutionary origin of complex features.

Authors:  Richard E Lenski; Charles Ofria; Robert T Pennock; Christoph Adami
Journal:  Nature       Date:  2003-05-08       Impact factor: 49.962

4.  Darwinian evolution can follow only very few mutational paths to fitter proteins.

Authors:  Daniel M Weinreich; Nigel F Delaney; Mark A Depristo; Daniel L Hartl
Journal:  Science       Date:  2006-04-07       Impact factor: 47.728

5.  Evolution of C4 phosphoenolpyruvate carboxylase in Flaveria, a conserved serine residue in the carboxyl-terminal part of the enzyme is a major determinant for C4-specific characteristics.

Authors:  O E Bläsing; P Westhoff; P Svensson
Journal:  J Biol Chem       Date:  2000-09-08       Impact factor: 5.157

6.  Photosynthetic and photorespiratory characteristics of flaveria species.

Authors:  M S Ku; J Wu; Z Dai; R A Scott; C Chu; G E Edwards
Journal:  Plant Physiol       Date:  1991-06       Impact factor: 8.340

7.  Photosynthetic Characteristics of C(3)-C(4) Intermediate Flaveria Species : I. Leaf Anatomy, Photosynthetic Responses to O(2) and CO(2), and Activities of Key Enzymes in the C(3) and C(4) Pathways.

Authors:  M S Ku; R K Monson; R O Littlejohn; H Nakamoto; D B Fisher; G E Edwards
Journal:  Plant Physiol       Date:  1983-04       Impact factor: 8.340

8.  Comparison of leaf structure and photosynthetic characteristics of C3 and C4 Alloteropsis semialata subspecies.

Authors:  O Ueno; N Sentoku
Journal:  Plant Cell Environ       Date:  2006-02       Impact factor: 7.228

9.  OrfPredictor: predicting protein-coding regions in EST-derived sequences.

Authors:  Xiang Jia Min; Gregory Butler; Reginald Storms; Adrian Tsang
Journal:  Nucleic Acids Res       Date:  2005-07-01       Impact factor: 16.971

10.  HaMStR: profile hidden markov model based search for orthologs in ESTs.

Authors:  Ingo Ebersberger; Sascha Strauss; Arndt von Haeseler
Journal:  BMC Evol Biol       Date:  2009-07-08       Impact factor: 3.260
