Literature DB >> 26500445

The best of both worlds: Phylogenetic eigenvector regression and mapping.

José Alexandre Felizola Diniz1, Fabricio Villalobos1, Luis Mauricio Bini1.   

Abstract

Eigenfunction analyses have been widely used to model patterns of autocorrelation in time, space and phylogeny. In a phylogenetic context, Diniz-Filho et al. (1998) proposed what they called Phylogenetic Eigenvector Regression (PVR), in which pairwise phylogenetic distances among species are submitted to a Principal Coordinate Analysis, and eigenvectors are then used as explanatory variables in regression, correlation or ANOVAs. More recently, a new approach called Phylogenetic Eigenvector Mapping (PEM) was proposed, with the main advantage of explicitly incorporating a model-based warping in phylogenetic distance in which an Ornstein-Uhlenbeck (O-U) process is fitted to data before eigenvector extraction. Here we compared PVR and PEM in respect to estimated phylogenetic signal, correlated evolution under alternative evolutionary models and phylogenetic imputation, using simulated data. Despite similarity between the two approaches, PEM has a slightly higher prediction ability and is more general than the original PVR. Even so, in a conceptual sense, PEM may provide a technique in the best of both worlds, combining the flexibility of data-driven and empirical eigenfunction analyses and the sounding insights provided by evolutionary models well known in comparative analyses.

Entities:  

Keywords:  evolutionary models; phylogenetic comparative methods; phylogenetic imputation; phylogenetic signal

Year:  2015        PMID: 26500445      PMCID: PMC4612606          DOI: 10.1590/S1415-475738320140391

Source DB:  PubMed          Journal:  Genet Mol Biol        ISSN: 1415-4757            Impact factor:   1.771


Eigenfunction analyses have been widely used to model patterns of autocorrelation in time, space and phylogeny (Peres-Neto, 2006; Dray ; Kuhn ; Safi and Pettorelli, 2010; Peres-Neto and Legendre, 2010; Peres-Neto ). In general, these analyses start with a Principal Coordinate Analysis (PCoA; see Legendre and Legendre, 2012) of pairwise distance or connectivity matrices between observations (e.g., species in the case of phylogenetic analysis). After, selected eigenvectors from PCoA are used to detect the magnitude of (temporal, spatial or phylogenetic) patterns in data, both in univariate and multivariate domains. In a phylogenetic context, Diniz-Filho proposed what they called Phylogenetic Eigenvector Regression (PVR), in which eigenvectors are used as explanatory variables in a multiple regression to model trait evolution. The coefficient of determination (R 2) of PVR was interpreted as the amount of phylogenetic signal (see Diniz-Filho ,b,c; Münkemüller ). The method was later expanded to estimate a phylogenetically corrected correlation and variance partitioning modeling (Martins ; Desdevises ). However, PVR was criticized by Rohlf (2001), who showed that if not all phylogenetic eigenvectors are used to model the trait, then there would be a missing part of the phylogeny in the model and, as a consequence, the estimated phylogenetic signal for a trait evolving under Brownian motion would be underestimated. Also, correlation between traits, after accounting for phylogenetic relationships given by the eigenvectors, would be biased and possess and inflated Type I error rates (as shown by Martins ). More recently, Freckleton et al. (2011; see also Martins ; Laurin, 2010) also criticized the statistical performance of PVR and favored the Phylogenetic Generalized Least-Squares (PGLS) because this last one is formally based on evolutionary models (e.g. Brownian motion, Ornstein-Uhlenbeck (O-U)), whereas PVR is a purely statistical, data-driven approach. Diniz-Filho recently expanded the PVR method by relating the coefficients of determination (R 2)of successive PVR models (i.e., with the consecutive addition of phylogenetic eigenvectors as explanatory variables of a trait) to the cumulative eigenvalues of the eigenvectors used in the models. This approach to explore phylogenetically structured patterns of trait variation was called Phylogenetic Signal-Representation (PSR) curve. They showed that different models of trait evolution generate different patterns of relationship between the coefficients of determination (R 2) and the cumulative eigenvalues (i.e., different PSR curves). For instance, under a Brownian motion model of trait evolution, the PSR curve is linear, so the R 2estimated will depend on which eigenvectors are used [supporting Rohlf’s issue (Rohlf, (2001) that missing even a few eigenvectors with very small eigenvalues will cause an inflation in Type I error]. However, if evolution is not Brownian and the PSR is not linear, some of the eigenvectors can describe trait evolution and will be useful for modeling purposes. Guénard recently proposed a new phylogenetic eigenfunction analysis called Phylogenetic Eigenvector Mapping (PEM, akin to Moran’s Eigenvector Mapping – MEM – proposed for spatial analyses – see Dray ; Griffith and Peres-Neto, 2006; Peres-Neto and Legendre, 2010). They proposed PEM particularly to model and predict species traits from phylogenetic relatedness, recently called “phylogenetic imputation” (see also Penone ; Swenson, 2014). The main advance of PEM in respect to the original PVR is the explicit incorporation of a model-based approach in which an O-U process is fitted to data and used to warp the edge lengths of the phylogeny accordingly, before extracting eigenvectors using a PCoA. Our purpose here is to empirically compare the original PVR and the new PEM approach in respect to estimated phylogenetic signal and correlated evolution under alternative evolutionary models. We generated random phylogenies as coalescent (ultrametric) trees with the same numbers of tips (species) as those in Guénard : ranging from 50 to 400 (50, 100, 200, and 400). Subsequently, we simulated the evolution of two independent traits (Y 1 and Y 2) on these trees according to an O-U processes with four restraining forces: α = 0 (pure diffusion or Brownian Motion), α = 0.5 (weak selection), α = 1 (medium selection), and α = 10 (strong selection), creating more complex curvilinear evolutionary models (see Diniz-Filho ), and analogous to the simulations performed in Guénard . We used 100 simulations of each combination of number of species and α values. The phylogenies generated in the first step of our analyses were back transformed to distance matrices, which in turn were used to calculate the phylogenetic eigenvectors using either PVR (as revised by Diniz-Filho ) or PEM (as proposed by Guénard ). We compared PVR and PEM for different sample sizes and O-U models based on four criteria: similarity of eigenvectors (estimated by Procrustes analysis), estimates of phylogenetic signal (R 2 of models), correlation between model residuals (used to estimate correlated evolution), and phylogenetic imputation ability. Our first comparison between PVR and PEM consisted in evaluating the similarity between the eigenvectors (containing the scores of the n tips) generated by these methods. Thus, for each simulation, we used a Procrustes analysis (Legendre and Legendre, 2012) to measure the match between the configurations (“species ordinations” along the phylogenetic axes) generated by PVR and PEM considering increasingly number of eigenvectors. The m 2values (the badness-of-fit statistic that measures the level of congruence between two ordination configurations) were transformed to Procrustes correlation (r) by calculating the square root of their complements (Oksanen ). A high value of r, say ≥ 0.75, would indicate that the configurations generated by PVR and PEM, for a given eigenvector dimensionality, are strongly concordant. Second, we modeled traits Y 1 and Y 2, separately, as a function of the eigenvectors generated by PVR and PEM. For each method and before modeling, eigenvector selection was done with a forward stepwise procedure (Blanchet ). This step is necessary to circumvent the problem of a perfect fit when all eigenvectors are used (see Rohlf, 2001). The coefficients of determination of the regression of a trait (Y 1 or Y 2) on the selected eigenvectors, derived from PVR and PEM, were interpreted as the amounts of phylogenetic signal given by each method. We then compared these amounts (R 2 PVR and R 2 PEM) across the 100 simulations obtained under the different evolutionary models (i.e., O-U with different restraining forces). Third, we used the residuals derived from the PVR models or from the PEM models (see above) to estimate the partial correlation between the two traits after accounting for phylogenetic relatedness among species (see Martins ). A partial correlation estimated by each of these methods should then give the input correlation between the traits, defined as “the correlation of the bivariate normal distribution from which the evolutionary changes in the two traits were drawn” or a “measure of the nonhistorical correlation between the two characters, corrected for phylogenetic interdependences” (Martins, 1996). For each method, the correlation between the phylogenetically corrected traits or specific components (i.e., residuals) should not be significantly different from zero. Thus, type I error rates of correlation coefficients were estimated by the ratio between the number of coefficients that differed significantly from zero and the number of simulations (Martins, 1996; Martins ). Our last comparison between PVR and PEM was based on phylogenetic modeling and the prediction of unknown trait values for species, as proposed in Guénard . That is, we compared the ability of eigenvectors derived from PVR and PEM to predict unknown trait values for one or several species (‘target species’), based on their relative phylogenetic positions, in phylogenies for which trait values were already estimated for a reduced set of species (‘model species’) (Guénard ). Simply put, such prediction is based on a regression model built using the loadings of the ‘model species’ from the selected eigenvectors derived from PVR or PEM to estimate the trait values of the ‘target species’. We followed this procedure to predict trait values for each species at a time as if it were missing from the original set of species (for each combination of species numbers and restraining forces in our simulations). We removed one species (‘target species’) at a time from the original set of species and then calculated the scores of the remaining species (‘model species’) to use them in the regression model to estimate the trait value of the missing species. We then evaluated the predictive power of PVR and PEM and calculated the prediction coefficient proposed by Guénard , which can attain values of 1 when all predictions perfectly match the observations, or below 1 indicating imperfect predictions, and values close to 0 (positive or negative) when predictions are no better than expected by chance (see also Penone for a recent discussion on “imputation” methods). Phylogenies and simulations of trait evolution were, respectively, done using the functions rcoal and rTraitCont from the ape package (Analyses of Phylogenetics and Evolution; see Paradis, 2012). Eigenvectors from PVR and PEM were respectively extracted using the packages PVR (Santos ) and MPSEM (Modelling Phylogenetic Signals using Eigenvector Maps; Guénard and Legendre, 2013), available in the R environment for statistical computing (R Development Core Team, 2012). Our results show that, in general, PVR and PEM are very similar according to the four criteria we devised for comparison. The Procrustes correlations between the two sets of eigenvectors tend to decrease when more eigenvectors (i.e., with smallest eigenvalues) are used in the comparison, even under a Brownian motion model, where the parameter a of PEM (analogous to α) is set to zero (stochastic fluctuations revealing that PEM probably cause the deviations in the last eigenvectors) (Figure 1). Correlations between the first two eigenvectors derived from PVR and PEM are as high as 0.95, and decrease to no less than 0.5 when all eigenvectors are used in the Procrustes analysis. This is expected as PEM warps the edge lengths to take into account patterns of evolutionary deviations (even slightly) from Brownian motion. Thus, it will tend to give more weight to edges close to the tips, producing the differences between the eigenvectors of PVR and PEM with the smallest eigenvalues (see Diniz-Filho and Nabout, 2009; Diniz-Filho ). There is a small, albeit consistent, tendency that when PEM is fitting traits evolving under O-U processes with higher restraining forces, the correlation between the two sets of eigenvectors is slightly lower (Figure 1).
Figure 1

Procrustes correlation between eigenvectors from PEM and PVR, for a successive set of eigenvectors extracted from phylogenies of (A) 50 species, (B) 100 species, (C) 200 species, and (D) 400 species and traits evolving under alternative O-U models (open circles: α = 0; open squares: α = 0.5; open triangle point-up: α = 1; open triangle point-down: α = 10).

The average amounts of phylogenetic signal and Type I errors estimated by PVR (R 2 PVR) and PEM (R 2 PEM) were highly similar, independently of the sample size and evolutionary models (Table 1). Nevertheless, both methods were unable to provide a consistent type I error of 5% (average for PVR and PEM was around 9%, respectively, see Table 1) (see Rohlf, 2001; Martins ; Freckleton ), and these increase with sample sizes. On the other hand, the correlation across tips between the specific components of the simulated traits (residuals from PVR and PEM or the expected values of the traits that are independent of phylogenetic structure) was also high, but mainly when the restraining forces were low or equal to zero.
Table 1

Comparison of PEM and PVR for phylogenies with different sample sizes (n), simulating trait evolution with distinct restraining forces of an O-U process (α, in which α = 0 indicates Brownian motion). The comparison includes the phylogenetic signal estimated by the two methods (R 2), the correlation between model residuals (r), the prediction coefficient used in phylogenetic imputation and the Type I errors of correlated evolution.

n α R 2 PVR R 2 PEM r Prediction coefficientType I error


PVRPEMPVRPEM
5000.9820.9820.870.5160.5250.060.07
0.50.9770.9760.7590.4750.4810.110.07
10.9690.9690.860.1770.2950.040.1
100.8890.8890.530.1160.1960.090.08
10000.9830.9830.9330.5910.5860.070.09
0.50.9810.980.9080.2930.390.030.05
10.9790.9790.8940.290.4360.080.11
100.9550.9420.540.029−0.1030.040.06
20000.9770.9770.9540.830.8530.060.07
0.50.9740.9740.9420.7530.670.110.09
10.9720.9720.8960.6810.770.10.1
100.9550.9540.853−0.057−0.0230.080.08
40000.9750.9750.9480.9270.9070.190.19
0.50.9710.9710.9360.9260.9330.190.14
10.9650.9650.9460.8770.9150.180.17
100.9330.930.935−0.039−0.2150.150.15
Prediction coefficients were also highly congruent between PVR and PEM, with both coefficients having values of less than 1 in all cases and varying in the same manner with the different number of species (sample size) and restraining forces (evolutionary models) used in our simulations (Table 1). PEM values are slightly larger than those from PVR, and prediction coefficients from both PVR and PEM tend to increase with sample size within a single restraining force, and to decrease altogether with higher restraining forces. At low or no restraining forces (α = 0, α = 0.5), prediction coefficients of both methods were similarly high, indicating the higher phylogenetic signal produced by those evolutionary models. We understand that, despite similarity between the two approaches, PEM has a slightly higher prediction ability, especially when there is strong phylogenetic signal (low α values - see Table 1). Also, it is more general than the original PVR because it allows incorporating explicit evolutionary models. Thus, it may solve, perhaps with further improvements in the process of eigenvector selection, some of the problems raised by Freckleton in respect to poorer (in comparison with PGLS) statistical performance of PVR. Our results show that PEM, however, does not provide entirely accurate Type I errors under Brownian motion and so does not perform better than PGLS (according to the previous analyses from the literature; e.g., Freckelton et al. [2011]). This also reinforce the issues on phylogenetic eigenvectors theoretically pointed out by Rohlf (2001; see also Diniz-Filho ,b,c for the same argument in the context of PSR curve). However, notice that recent papers still support the use of phylogenetic eigenvector methods, such as PEM or PVR, in the context of “phylogenetic imputation” (see Guénard ; Swenson, 2014; Penone ) Despite the similarities between PEM and PVR, from a conceptual point of view we understand that PEM may provide an alternative to the original PVR method, being more effective in taking into account phylogenetic signal in trait evolution. This is because PEM may be viewed as technique in the best of both worlds, combining the flexibility of data-driven and empirical eigenfunction analyses (see Griffith and Peres-Neto, 2006) and the sounding insights provided by evolutionary models well known in comparative analyses.
  12 in total

1.  Comparative methods for the analysis of continuous variables: geometric interpretations.

Authors:  F J Rohlf
Journal:  Evolution       Date:  2001-11-11       Impact factor: 3.694

2.  Adaptive constraints and the phylogenetic comparative method: a computer simulation test.

Authors:  Emília P Martins; José Alexandre F Diniz-Filho; Elizabeth A Housworth
Journal:  Evolution       Date:  2002-01       Impact factor: 3.694

3.  Quantifying phylogenetically structured environmental variation.

Authors:  Yves Desdevises; Pierre Legendre; Lamia Azouzi; Serge Morand
Journal:  Evolution       Date:  2003-11       Impact factor: 3.694

4.  Exploring patterns of interspecific variation in quantitative traits using sequential phylogenetic eigenvector regressions.

Authors:  José Alexandre Felizola Diniz Filho; Thiago F Rangel; Thiago Santos; Luis Mauricio Bini
Journal:  Evolution       Date:  2011-11-27       Impact factor: 3.694

5.  Spatial modeling in ecology: the flexibility of eigenfunction spatial analyses.

Authors:  Daniel A Griffith; Pedro R Peres-Neto
Journal:  Ecology       Date:  2006-10       Impact factor: 5.499

6.  Forward selection of explanatory variables.

Authors:  F Guillaume Blanchet; Pierre Legendre; Daniel Borcard
Journal:  Ecology       Date:  2008-09       Impact factor: 5.499

7.  Comparative methods as a statistical fix: the dangers of ignoring an evolutionary model.

Authors:  Rob P Freckleton; Natalie Cooper; Walter Jetz
Journal:  Am Nat       Date:  2011-07       Impact factor: 3.926

8.  Assessment of the relative merits of a few methods to detect evolutionary trends.

Authors:  Michel Laurin
Journal:  Syst Biol       Date:  2010-10-11       Impact factor: 15.683

9.  A comparison of metrics for estimating phylogenetic signal under alternative evolutionary models.

Authors:  José Alexandre F Diniz-Filho; Thiago Santos; Thiago Fernando Rangel; Luis Mauricio Bini
Journal:  Genet Mol Biol       Date:  2012-08-02       Impact factor: 1.771

10.  Modeling body size evolution in Felidae under alternative phylogenetic hypotheses.

Authors:  José Alexandre Felizola Diniz-Filho; João Carlos Nabout
Journal:  Genet Mol Biol       Date:  2009-01-10       Impact factor: 1.771

View more
  1 in total

1.  The Evolution of Ecological Diversity in Acidobacteria.

Authors:  Johannes Sikorski; Vanessa Baumgartner; Klaus Birkhofer; Runa S Boeddinghaus; Boyke Bunk; Markus Fischer; Bärbel U Fösel; Michael W Friedrich; Markus Göker; Norbert Hölzel; Sixing Huang; Katharina J Huber; Ellen Kandeler; Valentin H Klaus; Till Kleinebecker; Sven Marhan; Christian von Mering; Yvonne Oelmann; Daniel Prati; Kathleen M Regan; Tim Richter-Heitmann; João F Matias Rodrigues; Barbara Schmitt; Ingo Schöning; Marion Schrumpf; Elisabeth Schurig; Emily F Solly; Volkmar Wolters; Jörg Overmann
Journal:  Front Microbiol       Date:  2022-02-02       Impact factor: 5.640

  1 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.