Positively selected amino acid replacements within the RuBisCO enzyme of oak trees are associated with ecological adaptations.

Carmen Hermida-Carrera¹, Mario A Fares²,³, Ángel Fernández⁴, Eustaquio Gil-Pelegrín⁴, Maxim V Kapralov⁵, Arnau Mir⁶, Arántzazu Molins¹, José Javier Peguero-Pina⁴, Jairo Rocha⁶, Domingo Sancho-Knapik⁴, Jeroni Galmés¹.

Abstract

Phylogenetic analysis by maximum likelihood (PAML) has become the standard approach to study positive selection at the molecular level, but other methods may provide complementary ways to identify amino acid replacements associated with particular conditions. Here, we compare results of the decision tree (DT) model method with those of PAML, using the key photosynthetic enzyme RuBisCO as a model system to study molecular adaptation to particular ecological conditions in oaks (Quercus). We sequenced the chloroplast rbcL gene encoding RuBisCO large subunit in 158 Quercus species, covering about a third of the global genus diversity. It has been hypothesized that RuBisCO has evolved differentially depending on the environmental conditions and leaf traits governing internal gas diffusion patterns. Here, we show, using PAML, that amino acid replacements at the residue positions 95, 145, 251, 262 and 328 of the RuBisCO large subunit have been the subject of positive selection along particular Quercus lineages associated with the leaf traits and climate characteristics. In parallel, the DT model identified amino acid replacements at sites 95, 219, 262 and 328 as being associated with the leaf traits and climate characteristics, exhibiting partial overlap with the results obtained using PAML.

Year: 2017 PMID: 28859145 PMCID: PMC5578625 DOI: 10.1371/journal.pone.0183970

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

RuBisCO is one of the best-studied enzymes and is often used as a model protein in evolutionary studies. During photosynthesis, RuBisCO binds CO2 to the Calvin cycle intermediate ribulose-1,5-bisphosphate (RuBP), thereby acting as the essential entry point for carbon into the biosphere. Due to its imperfect ability to distinguish between CO2 and O2, RuBisCO also catalyzes the oxygenation of RuBP, giving rise to the energy-dissipating process of photorespiration. Compared to other catalysts, RuBisCO is a sluggish enzyme, with a catalytic turnover rate (kcatc) of about 3 s−1 in terrestrial plants [1]. Alongside these catalytic imperfections and its large molecular weight, RuBisCO also represents a significant nitrogen investment, typically accounting for 25–30% of total leaf nitrogen in C3 plants [2]. The photosynthetic process adapts to abiotic stress, such as high temperature or water deficit [3, 4, 5], by optimizing leaf conductance (stomatal and mesophyll) governing CO2 diffusion [6] and by adjustments in the activity and concentration of RuBisCO and other rate-limiting enzymes [7, 8, 5]. Temperature and CO2 concentration at RuBisCO active sites are the main driving forces of RuBisCO evolution and adaptation [9, 10, 11, 12, 13, 14, 15]. Computational analysis of carbon uptake at the leaf [16] and canopy level [17] also suggests that optimization of RuBisCO kinetics in modern C3 plants depends on the temperature regime and CO2 concentration. Therefore, plants from dry environments and plants with high leaf mass per area have the lowest CO2 diffusion, and tend to have higher RuBisCO affinity for CO2 [12, 18]. By contrast, plants possessing the C4 carbon concentration mechanism have faster, but less CO2-specific RuBisCO [19, 20, 21, 22, 23, 24]. High temperatures decrease the ratio of CO2/O2 dissolved in the leaf liquid media, and directly decrease the affinity of RuBisCO for CO2 [25].
Accordingly, adaptation to higher temperatures can be achieved by a greater specificity of RuBisCO for CO2 (Sc/o), thereby reducing the loss of carbon due to photorespiration. Selection pressure on RuBisCO with increased Sc/o in hot environments has been demonstrated in some thermophilic red algae [26] and in terrestrial plants [12]. Because of the trade-off between RuBisCO affinity for CO2 and maximum carboxylase activity (kcatc), the selection for increased affinity for CO2 would inevitably take place at the expense of decreased kcatc [13, 14]. Such fine-tuning of RuBisCO kinetic traits is attributed to environmentally driven changes at the molecular level, most likely amino acid replacements within the catalytic large subunit. In higher plants and green algae, the structure of RuBisCO consists of eight chloroplast-encoded large (L, 50–55 kDa) and eight nucleus-encoded small (S, 12–18 kDa) subunits assembled into a hexadecamer [27]. Large subunits possess the active site and therefore primarily determine RuBisCO kinetic traits [28], although recent studies demonstrate that S-subunits can also influence catalysis [29, 30, 31]. Directed mutagenesis and a variety of recombinant RuBisCOs from plastome-transformed plants have allowed the identification of molecular changes in the L-subunit that translate into changes in RuBisCO catalysis, and have shown how these changes affect photosynthesis and plant growth [32, 33, 34, 35, 36, 37]. Recent studies have demonstrated the relationship between amino acid polymorphism in the L-subunit of RuBisCO and catalytic efficiency in natural vegetation and crops, by comparing both distant phylogenetic lineages [18, 38, 39] and closely related species [15, 40] of land plants.
Studies comparing the rates of non-synonymous and synonymous substitutions along phylogenies have demonstrated that positive Darwinian selection is acting on RuBisCO within most lineages of plants, but is restricted to a relatively small number of residues [41, 42, 43, 44, 45, 46, 47, 48, 49]. Results derived from analyses of RuBisCO molecular adaptation complement trends in RuBisCO kinetics and confirm the predominant role of some environmental and physiological factors driving RuBisCO evolution. For example, signatures of positive selection are associated with changes in intracellular concentrations of CO2 driven by carbon-concentrating mechanisms, both in algae and terrestrial C4 plants [43, 48, 49, 50]. Mapping positively selected residues within the protein structure helps to locate catalytically important regions of RuBisCO, and suggests candidate amino acid replacements which could be implemented to optimize RuBisCO performance in crops [42, 48, 49]. However, the effect of an amino acid replacement on protein properties could vary in the presence of other mutations, either individually or together, because of the molecular sign epistasis among mutations [51]. These epistatic interactions impose strong selective constraints on amino acid replacements and also may explain the failure of most attempts to improve RuBisCO catalysis by single point mutations [34, 52]. In agreement with this prediction, positive selection analysis must also account for co-adaptive amino acid replacements through the identification of coevolutionary signatures to find how key residue changes affect RuBisCO structure and function. Coevolutionary studies have been applied to various proteins [53, 54, 55], but only recently to RuBisCO [47, 56]. It has been shown that coevolution of residues is common in RuBisCO of land plants and there is an overlap between coevolving and positively selected residues [56]. 
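To make the distinction between synonymous and non-synonymous substitutions concrete, the following minimal sketch counts codon differences of each kind between two aligned coding sequences. It is only a crude illustration: the function name and the example sequences are hypothetical, and it does not reproduce the codon-substitution likelihood models used in phylogeny-based analyses, which normalize by the numbers of synonymous and non-synonymous sites and account for multiple substitutions. Gaps or ambiguous bases are not handled and would raise a KeyError.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG codon ordering.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def classify_differences(seq_a, seq_b):
    """Count codons that differ synonymously vs non-synonymously
    between two aligned coding sequences (hypothetical helper)."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be aligned, equal in length, and a multiple of 3")
    syn = nonsyn = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue  # identical codons carry no substitution
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1      # same amino acid: synonymous change
        else:
            nonsyn += 1   # different amino acid: non-synonymous change
    return syn, nonsyn

# Hypothetical toy sequences (three codons each), not real rbcL data.
counts = classify_differences("ATGTCACTT", "ATGTCGATT")
```

An excess of non-synonymous over synonymous change at a site, once properly normalized, is the kind of signal that likelihood-based methods test for formally.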
Evolutionary analyses are needed to identify adaptive changes in the RuBisCO sequence, but the drivers of such evolution must also be investigated. In this paper, we used a predictive model called decision tree (DT), which is able to statistically associate a combination of environmental variables to variation in the amino acid residues. A DT can be used for both classification (classification tree) and regression (regression tree) tasks. We used this model for classification tasks, which are frequently employed in applied fields such as engineering and medicine [57, 58, 59, 60, 61]. A DT implicitly performs feature selection and requires relatively little effort for data preparation. The analysis is straightforward, and the results are shown graphically and can be easily interpreted. The objective of the study was to investigate molecular adaptation of Quercus RuBisCO to particular ecological conditions and to test if leaf morphological traits are associated with adaptive amino acid substitutions. To achieve this, we compared two different methodologies: the DT model and phylogenetic analysis by maximum likelihood. We selected oak (Quercus) species as a model group for this study because this genus contains a large number of species (ca. 500) inhabiting a wide range of environments. Both evergreen and deciduous oak species have contrasting leaf morphology [62], and therefore variable diffusive limitations to CO2 transfer from the atmosphere to the site of carboxylation [63]. Finally, oaks are often ecosystem-defining species in most broad-leaved forests worldwide, making them an ecologically important group.
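As a rough illustration of how a classification tree can associate environmental variables with residue variation, the sketch below selects the single categorical predictor that best separates the amino acid observed at one site. The species records, the predictor names and the use of Gini impurity as the splitting criterion are all assumptions made for illustration; they are not the study's data and not necessarily its criterion.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_impurity(rows, predictor, target):
    """Weighted Gini impurity after splitting rows on one categorical predictor."""
    groups = {}
    for row in rows:
        groups.setdefault(row[predictor], []).append(row[target])
    n = len(rows)
    return sum(len(g) / n * gini(g) for g in groups.values())

def best_predictor(rows, predictors, target):
    """Return the predictor whose split leaves the lowest weighted impurity."""
    return min(predictors, key=lambda p: split_impurity(rows, p, target))

# Hypothetical records: residue observed at one site plus two predictors.
ROWS = [
    {"climate": "arid", "habit": "evergreen", "residue": "S"},
    {"climate": "arid", "habit": "deciduous", "residue": "S"},
    {"climate": "temperate", "habit": "evergreen", "residue": "N"},
    {"climate": "temperate", "habit": "deciduous", "residue": "N"},
    {"climate": "tropical", "habit": "evergreen", "residue": "N"},
    {"climate": "arid", "habit": "evergreen", "residue": "S"},
]

# The predictor chosen for the root node of the tree.
root = best_predictor(ROWS, ["climate", "habit"], "residue")
```

A full tree would repeat this selection recursively within each resulting group; choosing the most informative predictor at each node is what makes feature selection implicit in the method.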

Materials and methods

Taxon selection and sampling

A total of 174 species in Fagales were selected for the study (S1 Table). These species belong to the Fagaceae (n = 170) and Nothofagaceae (n = 4). Within Fagaceae, the majority of the species belong to Quercus (n = 158; ca. 30% of the total number of Quercus species). Each species was classified according to its geographic distribution, prevalent climate and leaf habit (S1 Table). The geographic distribution area of each species was assigned according to Govaerts et al. (1998) [64] and information found in publicly available databases [65, 66, 67, 68]. The prevalent climate was obtained by overlapping the species geographical distribution in our study and the Köppen-Geiger world map of climate classification [69]. To simplify the analysis, fifteen Köppen-Geiger climate types were grouped into six: 1) tropical (including climates Af, Am and Aw according to Köppen-Geiger classification); 2) arid steppe (BSh and BSk); 3) temperate with dry winter and hot or warm summer (Cwa and Cwb); 4) temperate with dry summer and hot or warm summer (Csa and Csb); 5) temperate or cold without dry season and hot or warm summer (Cfa, Cfb, Dfa and Dfb) and 6) cold with dry summer and hot or warm summer (Dsa and Dsb) (S2 Table). Regarding the leaf habit, species were classified as evergreens (those species retaining their leaves during the whole year), deciduous (when losing all leaves during the unfavourable season) and semi-evergreen (those species that lose some leaves during the unfavourable season, depending on its length and severity). Leaves from all species were sampled from living collections of Jardín Botánico de Iturrarán (Parque Natural de Pagoeta, Aya, Guipúzcoa, Spain), with the exceptions of Q. palmeri, Q. baloot and Q. vaccinifolia, which were collected from The Cheviton Barton collection (Bevon, UK).
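The grouping of the fifteen Köppen-Geiger types into six classes can be written as a simple lookup. The sketch below follows the groups listed above; the shortened group labels and the function name are hypothetical, and codes are matched case-insensitively.

```python
# Six climate groups as listed in the text, keyed by a shortened label.
CLIMATE_GROUPS = {
    "tropical": ["Af", "Am", "Aw"],
    "arid steppe": ["BSh", "BSk"],
    "temperate, dry winter": ["Cwa", "Cwb"],
    "temperate, dry summer": ["Csa", "Csb"],
    "temperate or cold, no dry season": ["Cfa", "Cfb", "Dfa", "Dfb"],
    "cold, dry summer": ["Dsa", "Dsb"],
}

# Invert to a code -> group lookup; keys are lower-cased for matching.
CODE_TO_GROUP = {
    code.lower(): group
    for group, codes in CLIMATE_GROUPS.items()
    for code in codes
}

def climate_group(code):
    """Return the group for a Köppen-Geiger code, or None if the code
    is not one of the fifteen types considered."""
    return CODE_TO_GROUP.get(code.strip().lower())
```

For example, `climate_group("Csa")` returns the dry-summer temperate group, while a code outside the fifteen types (such as `"ET"`) returns `None`.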
For each species, leaf density was calculated from leaf thickness and leaf mass per area (LMA) measurements performed on fully expanded leaves that developed in the external part of the tree canopy (i.e., exposed to full solar irradiation). The leaf thickness of each species was measured on two discs (disc area = 0.33 cm²) per leaf from five fully hydrated leaves, collected from three to five different individuals. The leaf thickness was measured using a digital contact sensor GTH10L coupled to an amplifier GT-75AP (GT Series, Keyence Corporation, Japan) [70]. Afterwards, LMA of each disc was calculated as the ratio between the dry weight and the area. The dry weight was obtained after drying the leaf discs in a ventilated oven at 60°C until constant weight (typically after 2 days).
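The calculation can be sketched as follows. LMA as dry weight over area is stated above; taking leaf density as LMA divided by leaf thickness is an assumption here, as are the units, the function names and the example values.

```python
DISC_AREA_CM2 = 0.33  # disc area stated in the text

def lma_g_per_m2(dry_mass_mg, area_cm2=DISC_AREA_CM2):
    """Leaf mass per area (g m^-2): dry mass divided by disc area."""
    return (dry_mass_mg / 1000.0) / (area_cm2 / 10_000.0)

def leaf_density_g_per_cm3(dry_mass_mg, thickness_um, area_cm2=DISC_AREA_CM2):
    """Leaf density (g cm^-3), assumed to be LMA divided by leaf thickness."""
    lma_g_per_cm2 = (dry_mass_mg / 1000.0) / area_cm2
    thickness_cm = thickness_um / 10_000.0
    return lma_g_per_cm2 / thickness_cm

# Hypothetical disc: 3.3 mg dry mass and 250 µm thickness.
lma = lma_g_per_m2(3.3)
density = leaf_density_g_per_cm3(3.3, 250.0)
```

In practice the per-disc values would be averaged across the discs, leaves and individuals sampled for each species.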

DNA sequencing

Total genomic DNA was extracted from leaf material using the DNeasy Plant Mini Kit (Qiagen Ltd., Crawley, UK) according to the manufacturer’s protocol. We sequenced chloroplast genes rbcL and matK [71, 72]. To obtain the full rbcL sequence (1428 nucleotides), the gene was amplified using primers esp2F (5´-AATTCATGAGTTGTAGGGAGGGACTT-3´) and 1494R (5´-GATTGGGCCGAGTTTAATTTAC-3´). The matK gene was amplified in 43 species using the primers X390_F (5´-CGATCTATTCATTCAATATTTC-3´) and Xmatk9_R (5´-CAATCATTCGTGATTGGCCAG-3´). For 42 species, we obtained the nuclear microsatellite loci (SSRs) from [73] (QmC00716, QmC01095, QmC01990, QmC02241) and from [74] (ssrQpZAG15, ssrQpZAG46, ssrQpZAG110, ssrQrZAG-7, ssrQrZAG-20). All PCR reactions were performed using the BioMix Red reagent mix (Bioline Ltd., London, UK). The PCR program for the amplification of the rbcL gene comprised an initial denaturation at 95°C for 2 min, followed by 36 cycles of 93°C for 30 s, 53°C for 30 s and 72°C for 3.5 min, and a final extension at 72°C for 30 min. The PCR program for the amplification of the matK gene comprised an initial denaturation at 95°C for 2 min, followed by 35 cycles of 30 s at 94°C (denaturing), 45 s at the annealing temperature of 56°C, 2 min at 72°C (extension), and a final extension phase of 7 min at 72°C. The microsatellites were amplified using the following PCR conditions: 95°C for 2 min, followed by 35 cycles of 95°C for 30 s, 50°C for 30 s and 72°C for 2 min, and a final extension at 72°C for 5 min. The rbcL and matK PCR products were separated on 2% agarose gels buffered with 1X TAE and purified using Roche High Pure PCR Product Purification Kit (Roche Diagnostics Corporation, Indiana, USA). Chloroplast gene sequencing was performed using an ABI 3130 Genetic analyzer with the ABI BigDye™ Terminator Cycle Sequencing Ready Reaction Kit (Applied Biosystems, Foster City, California, USA).
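For comparison, the three thermocycling programs described above can be written down as data. The sketch below uses the temperatures and durations given in the text; the dictionary layout and the helper function are hypothetical, and the computed totals count programmed hold times only, ignoring ramping between steps.

```python
# Each program: initial denaturation, cycle count, per-cycle steps, final extension.
# Tuples are (temperature in °C, duration in seconds), as stated in the text.
PCR_PROGRAMS = {
    "rbcL": {"initial": (95, 120), "cycles": 36,
             "steps": [(93, 30), (53, 30), (72, 210)], "final": (72, 1800)},
    "matK": {"initial": (95, 120), "cycles": 35,
             "steps": [(94, 30), (56, 45), (72, 120)], "final": (72, 420)},
    "SSR":  {"initial": (95, 120), "cycles": 35,
             "steps": [(95, 30), (50, 30), (72, 120)], "final": (72, 300)},
}

def hold_time_minutes(program):
    """Total programmed hold time in minutes (ramp times are not included)."""
    per_cycle = sum(seconds for _, seconds in program["steps"])
    total = program["initial"][1] + program["cycles"] * per_cycle + program["final"][1]
    return total / 60.0

run_times = {name: hold_time_minutes(p) for name, p in PCR_PROGRAMS.items()}
```

Laid out this way, the rbcL program stands out for its longer extension step and final extension relative to the matK and microsatellite programs.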
For microsatellites, we used the ABI 3130 XL Genetic analyzer and fragment analysis was performed using the GeneMapper software v4.1 (Applied Biosystems). The DNA sequences from the chloroplast markers were aligned using Clustal X [75] and manually adjusted with Bioedit v.7.2.5 [76]. All variable sites were checked against the original sequence chromatograms, and doubtful regions were sequenced again. All newly generated sequences were submitted to GenBank (S3 Table).

Phylogenetic analyses

We inferred the phylogenetic relationships from the nucleotide data using Bayesian inference (BI). We constructed a phylogeny using rbcL sequences from the 158 Quercus species (denoted Quercus large dataset) (Fig 1). The tree topology was not fully resolved for this group when using only one gene. Because we require a robust phylogeny to detect adaptive evolution by maximum likelihood, we chose a subset of species to construct a multilocus tree with better-resolved topology. The tree was constructed with a concatenated alignment of 45 rbcL, 43 matK and 42 SSRs for Quercus species (denoted Quercus small dataset) (Fig 2). Tree topologies using rbcL were congruent with those based on the use of multiple genes, with both leading to similar lists of amino acid sites detected to have evolved under positive selection. Finally, we constructed the phylogeny for all 174 species containing the Fagaceae and Nothofagaceae species (denoted Fagales henceforward) (Fig 3).
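The idea of a concatenated alignment can be sketched as follows for sequence loci. This is a minimal illustration under stated assumptions: the function name, the toy sequences and the use of "?" to pad taxa missing from a locus are hypothetical, and the encoding of the SSR data is not described in the text and is not modelled here.

```python
def concatenate(alignments, missing="?"):
    """Concatenate per-locus alignments (dicts of taxon -> aligned sequence).

    Taxa absent from a locus are padded with `missing` so that every row
    of the resulting supermatrix has the same length.
    """
    taxa = sorted(set().union(*(aln.keys() for aln in alignments)))
    lengths = []
    for aln in alignments:
        seq_lengths = {len(seq) for seq in aln.values()}
        if len(seq_lengths) != 1:
            raise ValueError("sequences within a locus must be aligned to equal length")
        lengths.append(seq_lengths.pop())
    return {
        taxon: "".join(aln.get(taxon, missing * n) for aln, n in zip(alignments, lengths))
        for taxon in taxa
    }

# Hypothetical toy loci; real alignments would be far longer.
rbcl = {"Q_ilex": "ATGTCA", "Q_robur": "ATGTCG", "Q_suber": "ATGTCA"}
matk = {"Q_ilex": "TTAC", "Q_robur": "TTGC"}
supermatrix = concatenate([rbcl, matk])
```

Padding rather than dropping taxa is one way to accommodate loci sampled for slightly different species sets, as with the 45, 43 and 42 species noted above.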