Carmen Hermida-Carrera, Mario A Fares, Ángel Fernández, Eustaquio Gil-Pelegrín, Maxim V Kapralov, Arnau Mir, Arántzazu Molins, José Javier Peguero-Pina, Jairo Rocha, Domingo Sancho-Knapik, Jeroni Galmés.
Abstract
Phylogenetic analysis by maximum likelihood (PAML) has become the standard approach to study positive selection at the molecular level, but other methods may provide complementary ways to identify amino acid replacements associated with particular conditions. Here, we compare the results of the decision tree (DT) model with those of PAML, using the key photosynthetic enzyme RuBisCO as a model system to study molecular adaptation to particular ecological conditions in oaks (Quercus). We sequenced the chloroplast rbcL gene encoding the RuBisCO large subunit in 158 Quercus species, covering about a third of the global genus diversity. It has been hypothesized that RuBisCO has evolved differentially depending on the environmental conditions and leaf traits governing internal gas diffusion patterns. Here, we show, using PAML, that amino acid replacements at residue positions 95, 145, 251, 262 and 328 of the RuBisCO large subunit have been subject to positive selection along particular Quercus lineages associated with leaf traits and climate characteristics. In parallel, the DT model identified amino acid replacements at sites 95, 219, 262 and 328 as being associated with leaf traits and climate characteristics, a partial overlap with the results obtained using PAML.
Year: 2017 PMID: 28859145 PMCID: PMC5578625 DOI: 10.1371/journal.pone.0183970
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Quercus large dataset Bayesian phylogram based on 158 rbcL sequences.
Numbers above branches correspond to Bayesian posterior probabilities. The figure was edited using FigTree Version 1.4.0 [77].
Fig 2. Quercus small dataset Bayesian phylogeny based on 45 sequences of rbcL, 43 matK and 42 microsatellites.
Numbers above branches correspond to Bayesian posterior probabilities. The figure was edited using FigTree Version 1.4.0 [77].
Fig 3. Fagales Bayesian phylogram based on 174 rbcL sequences.
Numbers above branches correspond to Bayesian posterior probabilities. The figure was edited using FigTree Version 1.4.0 [77].
Table 1. RuBisCO L-subunit sites subject to positive selection.
| N (a) | M0: dN/dS (b) | M2a: proportion (c) | M2a: dN/dS (d) | M2a: selected sites (e) | M2a: LRT P-value | M8: proportion (c) | M8: dN/dS (d) | M8: selected sites (e) | M8: LRT P-value |
|---|---|---|---|---|---|---|---|---|---|
| 47 | 0.17 | 0.011 | 14.71 | 95**, 219**, 262*, 328** | 0.000 | 0.011 | 14.71 | 95**, 219**, 262**, 328** | 0.000 |
| 158 | 0.18 | 0.013 | 13.77 | 95**, 219**, 328** | 0.000 | 0.017 | 11.32 | 95**, 219**, 251*, 328**, 475* | 0.000 |
| 174 | 0.16 | 0.009 | 17.05 | 95**, 219**, 262**, 328** | 0.000 | 0.009 | 17.54 | 95**, 145*, 219**, 262**, 328** | 0.000 |
Likelihood ratio tests (LRTs) were calculated between nested models of codon evolution M1a-M2a and M8-M8a.
a Number of species.
b dN/dS ratio averaged across all branches and codons.
c Proportion of codons in a class under positive selection.
d dN/dS ratio in a class under positive selection.
e Sites marked with * and ** are under positive selection with posterior probability higher than 0.95 and 0.99, respectively.
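The likelihood ratio tests mentioned in the table note can be illustrated with a minimal sketch. The log-likelihood values below are hypothetical placeholders (the source does not report them), and the degrees of freedom (2 for M1a vs M2a, 1 for M8a vs M8) follow common PAML practice rather than anything stated in this record. The function name `lrt_p_value` is likewise an assumption made for illustration.

```python
import math

def lrt_p_value(lnl_null, lnl_alt, df):
    """Return (statistic, p-value) for a likelihood ratio test.

    statistic = 2 * (lnL_alt - lnL_null), compared to a chi-square
    distribution. Only df = 1 and df = 2 are supported because their
    survival functions have closed forms in the standard library.
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    if df == 1:
        # Chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
        p = math.erfc(math.sqrt(stat / 2.0))
    elif df == 2:
        # Chi-square with 2 df: P(X > x) = exp(-x / 2)
        p = math.exp(-stat / 2.0)
    else:
        raise ValueError("this sketch only handles df = 1 or df = 2")
    return stat, p

# Hypothetical log-likelihoods, NOT taken from the paper.
stat_m2a, p_m2a = lrt_p_value(lnl_null=-1000.0, lnl_alt=-990.0, df=2)
print(f"M1a vs M2a: 2*dlnL = {stat_m2a:.2f}, p = {p_m2a:.2e}")
```

A P-value reported as 0.000 in the table would correspond to a large statistic of this kind, rounded to three decimals.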
Table 2. Results of branch-site model A in the Fagales dataset (174 species).
| Parameters (a) | N (b) | Proportion (c) | dN/dS (d) | Selected sites (e) | LRT P-value |
|---|---|---|---|---|---|
| High leaf density (>900 kg m−3) | 48 | 0.002 | 94.3 | 328** | 1 |
| Low leaf density (<600 kg m−3) | 67 | 0.000 | 999.0 | - | 1 |
| Evergreen | 75 | 0.00023 | 999.0 | 95***, 262*** | 0.000 |
| Semi-evergreen | 17 | 0.000 | 4.47 | - | 1 |
| Deciduous | 82 | 0.005 | 39.7 | 251**, 262***, 328*** | 0.000 |
| Climate 1 | 15 | 0.028 | 999.0 | - | 0.000 |
| Climate 3 | 41 | 0.000 | 999.0 | - | 1 |
| Climate 4 | 35 | 0.000 | 999.0 | 262** | 1 |
| Climate 5 | 69 | 0.004 | 22.03 | 145*, 262***, 328*** | 0.000 |
Likelihood ratio tests (LRTs) were performed to compare the null model A1 (which assumes the same selective pressure along all branches of a phylogeny) with model A (which aims to detect positive selection along particular lineages, the foreground branches).
a According to information in S1 Table.
b Number of species labelled as foreground branches.
c Proportion of codons in a class under positive selection.
d dN/dS ratio in a class under positive selection.
e Sites marked with *, ** and *** are under positive selection with posterior probability higher than 0.90, 0.95 and 0.99, respectively.
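As a small illustration of footnote e, the star notation in the "Selected sites" column can be decoded into the posterior probability threshold each site exceeds. This is only a reading aid for the table: the helper name `parse_selected_sites` is hypothetical, and the thresholds are those stated for this table (0.90, 0.95, 0.99), which differ from the two-level scheme used in the site-model table.

```python
import re

# Posterior probability thresholds per number of stars (footnote e of
# the branch-site table).
THRESHOLDS = {1: 0.90, 2: 0.95, 3: 0.99}

def parse_selected_sites(cell):
    """Map each site in a cell such as '145*, 262***' to its threshold.

    A cell containing only '-' (no selected sites) yields an empty dict.
    """
    sites = {}
    for site, stars in re.findall(r"(\d+)(\*+)", cell):
        sites[int(site)] = THRESHOLDS[len(stars)]
    return sites

# Example using the 'Climate 5' row of the table.
print(parse_selected_sites("145*, 262***, 328***"))
```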
Table 3. RuBisCO L-subunit amino acid replacements in Fagales (174 species) identified as under positive selection by the Bayes Empirical Bayes (BEB) analysis implemented in the PAML package [83, 90] along branches leading to species with particular leaf or habitat traits.
| Residue | Amino acid changes | Location of residue | Interactions |
|---|---|---|---|
| Branches leading to evergreen species | |||
| 95 | Asn | Loop between βC-βD strand | RuBisCO activase |
| Branches leading to deciduous species | |||
| 251 | Ile | Helix α3 | |
| 262 | Ala | Loop 3 | S-subunit |
| 328 | Ala | Loop 6 | Active site |
| Branches leading to species inhabiting climate 5 | |||
| 145 | Ser | Helix αD | Hydrophobic core between N-terminal and C-terminal domains |
| 262 | Ala | Loop 3 | S-subunit |
| 328 | Ala | Loop 6 | Active site |
Fig 4. Coevolving sites in the RuBisCO L-subunit of the Fagales dataset.
Location of amino acids implicated in co-evolutionary dependencies.
Table 4. Variable sites resolved with the DT model for the Quercus large dataset (158 species).
| Variable site | xerror | Relative importance (%): geographic area | Relative importance (%): climate | Relative importance (%): leaf habit | Relative importance (%): leaf density |
|---|---|---|---|---|---|
| 95 | 0.89 | 25 | 20 | 35 | 20 |
| 219 | 0.39 | 66 | 26 | n.a. | 8 |
| 262 | 0.44 | 41 | 14 | 27 | 18 |
| 328 | 0.59 | 28 | 32 | 27 | 13 |
The xerror of the best DT found for each variable site is shown, together with the relative importance (%) of the external variables (geographic area, climate, leaf habit and leaf density) calculated for each resolved site. The lower the xerror, the stronger the relationship between the external variables and the variable site. The external variable with the highest relative importance is the one that best explains the variability at the site. n.a. denotes an external variable that was not selected.
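The reading rules in the note above (lowest xerror, highest relative importance) can be applied mechanically to the table values. The sketch below simply transcribes the table into a dictionary. The names `DT_RESULTS` and `top_variable` are illustrative assumptions, and `None` stands in for the n.a. entry.

```python
# Values transcribed from the DT table: xerror and relative importance (%).
DT_RESULTS = {
    95:  {"xerror": 0.89, "importance": {"Geographic area": 25, "Climate": 20,
                                         "Leaf habit": 35, "Leaf density": 20}},
    219: {"xerror": 0.39, "importance": {"Geographic area": 66, "Climate": 26,
                                         "Leaf habit": None, "Leaf density": 8}},
    262: {"xerror": 0.44, "importance": {"Geographic area": 41, "Climate": 14,
                                         "Leaf habit": 27, "Leaf density": 18}},
    328: {"xerror": 0.59, "importance": {"Geographic area": 28, "Climate": 32,
                                         "Leaf habit": 27, "Leaf density": 13}},
}

def top_variable(site):
    """Return the external variable with the highest relative importance."""
    scores = {k: v for k, v in DT_RESULTS[site]["importance"].items()
              if v is not None}  # skip variables marked n.a.
    return max(scores, key=scores.get)

# Sites ordered from the strongest to the weakest relationship (low xerror first).
for site in sorted(DT_RESULTS, key=lambda s: DT_RESULTS[s]["xerror"]):
    print(site, DT_RESULTS[site]["xerror"], top_variable(site))
```

Read this way, geographic area is the leading variable for sites 219 and 262, leaf habit for site 95, and climate for site 328.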
Fig 5. Decision trees (DT) resolved for each RuBisCO L-subunit variable site in the Quercus large dataset (158 species) (see S1 Table for details on the external variables: geographic distribution, climate, leaf habit and leaf density).
Numbers above each tree correspond to the RuBisCO L-subunit variable site according to the spinach sequence (AJ400848.1). The first level presents the proportion of amino acids at each variable site (in brackets). The external variable that allows the best separation of species is shown over the line. The second level presents the distribution of amino acids (in brackets) after the first split. Subsequent divisions are performed until the lowest xerror for the entire DT is obtained (symbolized as squares). Taking the RuBisCO L-subunit variable site 95 as an example, the first level shows the separation of the 158 species into those that present N (121) and those that present S (37). Over the line, leaf habit is indicated as the external variable that gives the best split among the four external variables, with evergreen and semi-evergreen species having an N/S proportion of 79/8. Deciduous species, in contrast, present an N/S proportion of 42/29. The latter group is further split, using geographic area as the best external variable, into a group of species from North and Central America with an N/S proportion of 29/9 and a group of species from Eurasia and Asia with a proportion of 13/20. The relative importance of each external variable is shown in Table 4.
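The site 95 example can be checked numerically. The sketch below assumes a Gini impurity criterion for scoring splits; the caption does not state which criterion the DT software used, so this is an illustrative choice rather than the authors' procedure. The counts are the N/S counts quoted in the caption, and the function names are hypothetical.

```python
def gini(counts):
    """Gini impurity of a node given its class counts, e.g. (N, S)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def gini_gain(parent, children):
    """Impurity reduction obtained by splitting `parent` into `children`."""
    n = sum(parent)
    weighted = sum(sum(child) / n * gini(child) for child in children)
    return gini(parent) - weighted

# (N, S) counts at site 95, as quoted in the Fig 5 caption.
root = (121, 37)              # all 158 species
evergreen_like = (79, 8)      # evergreen and semi-evergreen species
deciduous = (42, 29)          # deciduous species
americas = (29, 9)            # deciduous, North and Central America
eurasia_asia = (13, 20)       # deciduous, Eurasia and Asia

# The child counts must add up to their parent node.
assert tuple(a + b for a, b in zip(evergreen_like, deciduous)) == root
assert tuple(a + b for a, b in zip(americas, eurasia_asia)) == deciduous

first_split = gini_gain(root, [evergreen_like, deciduous])
second_split = gini_gain(deciduous, [americas, eurasia_asia])
print(f"leaf habit split gain: {first_split:.4f}")
print(f"geographic area split gain: {second_split:.4f}")
```

Both splits reduce impurity under this criterion, consistent with the caption's description of leaf habit and then geographic area as the separating variables for site 95.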