José Alexandre Felizola Diniz-Filho, João Carlos Nabout, Mariana Pires de Campos Telles, Thannya Nascimento Soares, Thiago Fernando L V B Rangel.
Abstract
Most evolutionary processes occur in a spatial context, and several spatial analysis techniques have been employed in an exploratory context. However, autocorrelation can also perturb significance tests when data are analyzed with standard correlation and regression techniques, for example when modeling genetic data as a function of explanatory variables. In this case, more complex models that incorporate the effects of autocorrelation must be used. Here we review those models and compare their relative performances in a simple simulation, in which spatial patterns in allele frequencies were generated by a balance between random variation within populations and spatially structured gene flow. Notwithstanding the somewhat idiosyncratic behavior of the techniques evaluated, it is clear that spatial autocorrelation affects Type I errors and that standard linear regression does not provide minimum variance estimators. We stress that, because of their flexibility, principal coordinates of neighbor matrices (PCNM) and related eigenvector mapping techniques seem to be the best approaches to spatial regression. In general, we hope that our review of commonly used spatial regression techniques in biology and ecology may help population geneticists deal with more complex regression problems across geographic space and thus provide better explanations of population structure.
Keywords: autocorrelation; geographical genetics; isolation-by-distance; landscape genetics; spatial regression
Year: 2009 PMID: 21637669 PMCID: PMC3036944 DOI: 10.1590/S1415-47572009000200001
Source DB: PubMed Journal: Genet Mol Biol ISSN: 1415-4757 Impact factor: 1.771
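The abstract singles out PCNM as the most flexible approach. The idea behind it can be sketched as follows: truncate the geographic distance matrix at a threshold, run a principal coordinate analysis on it, and keep the eigenvectors with positive eigenvalues as spatial predictors. This is a minimal illustration of the commonly described construction, not the authors' code; the function name `pcnm_vectors`, the 4 × threshold truncation value and the numerical tolerance are assumptions.

```python
import numpy as np

def pcnm_vectors(coords, threshold):
    """Sketch of PCNM eigenvectors for a set of sampling sites (illustrative).

    coords    -- (n, 2) sequence of site coordinates
    threshold -- truncation distance (often the longest edge of a minimum
                 spanning tree; here simply passed in by the caller)
    Returns (eigenvalues, eigenvectors) for positive eigenvalues only,
    with eigenvectors stored as columns.
    """
    xy = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances between all sites
    diff = xy[:, None, :] - xy[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    # Truncation: pairs farther apart than the threshold get 4 * threshold
    d_trunc = np.where(d > threshold, 4.0 * threshold, d)
    # Principal coordinate analysis: Gower-center the matrix -0.5 * D^2
    a = -0.5 * d_trunc ** 2
    n = a.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    g = centering @ a @ centering
    # Symmetric eigendecomposition, reordered by decreasing eigenvalue
    vals, vecs = np.linalg.eigh(g)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    # Keep eigenvectors with clearly positive eigenvalues, as is usual for PCNM
    keep = vals > 1e-8 * vals.max()
    return vals[keep], vecs[:, keep]
```

The retained columns would then enter an ordinary least-squares model alongside the explanatory variables, usually after selecting a subset; how many eigenvectors were retained in this study is not stated here.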
Figure 1 Allele frequencies of the 30 populations in the Cerrado biome, and a spatial correlogram highlighting the high positive autocorrelation in the first distance classes.
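The correlogram in Figure 1 summarizes autocorrelation by distance class. The caption does not name the coefficient, so the sketch below assumes Moran's I with binary weights (1 for site pairs whose distance falls in a class, 0 otherwise); the function name and the class edges are illustrative. Values above the null expectation of -1/(n-1) indicate positive autocorrelation.

```python
import numpy as np

def morans_i_correlogram(coords, values, class_edges):
    """Moran's I for each distance class (lo, hi] defined by class_edges."""
    xy = np.asarray(coords, dtype=float)
    z = np.asarray(values, dtype=float)
    z = z - z.mean()                        # deviations from the mean
    n = len(z)
    diff = xy[:, None, :] - xy[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))   # pairwise Euclidean distances
    denom = float(z @ z)                    # sum of squared deviations
    correlogram = []
    for lo, hi in zip(class_edges[:-1], class_edges[1:]):
        # Binary weights: 1 for site pairs whose distance falls in (lo, hi]
        w = ((d > lo) & (d <= hi)).astype(float)
        np.fill_diagonal(w, 0.0)
        s0 = w.sum()
        if s0 == 0:
            correlogram.append(float("nan"))  # empty distance class
            continue
        correlogram.append(float((n / s0) * (z @ w @ z) / denom))
    return correlogram
```

For example, ten equally spaced sites whose values rise steadily along a transect give a strongly positive I in the shortest distance class (7/9 with unit-width classes) and a negative I in the longest one.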
Table 1 A comparison of spatial regression methods based on the analysis of null expectation, obtained by regressing allele frequencies evolving under a pure isolation-by-distance process against three explanatory variables (factors). N. models is the proportion of the 20 simulations with at least one significant (p < 0.05) regression slope, whereas N. coeffs is the proportion of the 60 coefficients that are significant. Dist (H0) is the average Euclidean distance between the regression coefficient vector β and the null expectation (all slopes equal zero).
| Method | N. models | N. coeffs | Dist (H0) |
|---|---|---|---|
| OLS | 0.65 | 0.38 | 0.140 |
| TSA | 0.75 | 0.33 | 0.170 |
| PCNM | 0.50 | 0.18 | 0.154 |
| LagRES | 0.40 | 0.13 | 0.076 |
| LagPRED | 0.40 | 0.20 | 0.113 |
| SAR | 0.70 | 0.35 | 0.118 |
| CAR | 0.70 | 0.38 | 0.138 |
| MA | 0.70 | 0.35 | 0.116 |
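The three summary columns can be computed from per-simulation regression output roughly as follows, assuming each run yields one slope and one p-value per explanatory variable (20 runs × 3 variables = 60 coefficients). The function name and input layout are hypothetical, and the caption does not say whether raw or standardized slopes enter Dist (H0).

```python
import math

def summarize_null_runs(slopes, pvalues, alpha=0.05):
    """Summaries in the style of N. models, N. coeffs and Dist (H0).

    slopes, pvalues -- one list per simulation run, with one entry per
                       explanatory variable (hypothetical input layout)
    """
    n_runs = len(pvalues)
    n_coefs = sum(len(run) for run in pvalues)
    # Proportion of runs with at least one slope significant at alpha
    n_models = sum(any(p < alpha for p in run) for run in pvalues) / n_runs
    # Proportion of all individual slopes significant at alpha
    n_coeffs = sum(p < alpha for run in pvalues for p in run) / n_coefs
    # Mean Euclidean distance from each slope vector to the null vector (all zeros)
    dist_h0 = sum(math.sqrt(sum(b * b for b in run)) for run in slopes) / len(slopes)
    return n_models, n_coeffs, dist_h0
```

With two made-up runs in which only one of six p-values falls below 0.05, the function returns 0.5 for the N. models proportion and 1/6 for the N. coeffs proportion.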
Figure 2 Distribution of spatial regression methods in the 2D solution of non-metric multidimensional scaling (NMDS) based on their standardized slopes. The methods were: Ordinary Least Squares (OLS); Principal Coordinates of Neighbor Matrices (PCNM); Lagged Response (LagRES); Lagged Predictor (LagPRED); Simultaneous Autoregression (SAR); Conditional Autoregression (CAR); Moving Average (MA); Trend Surface Analysis (TSA).
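Figure 2 places the methods according to their standardized slopes. For OLS, slopes in standard-deviation units (b_j · sd(x_j) / sd(y)) and a between-method dissimilarity matrix of the kind an NMDS routine takes as input could be computed as sketched below. The helper names are hypothetical, the Euclidean dissimilarity is an assumption (the caption does not say which one was used), and the spatial methods (SAR, CAR and so on) would supply their own coefficient estimates rather than the OLS fit shown here.

```python
import numpy as np

def standardized_ols_slopes(X, y):
    """OLS slopes rescaled to standard-deviation units: b_j * sd(x_j) / sd(y)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Add an intercept column and solve the least-squares problem
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    slopes = coef[1:]  # drop the intercept
    return slopes * X.std(axis=0, ddof=1) / y.std(ddof=1)

def slope_distance_matrix(slope_vectors):
    """Pairwise Euclidean distances between methods' standardized slope vectors."""
    s = np.asarray(slope_vectors, dtype=float)
    diff = s[:, None, :] - s[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))
```

Each row passed to `slope_distance_matrix` would hold one method's slope vector; the resulting square matrix is what a non-metric MDS would then project onto two dimensions.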
Table 2 The same analyses as in Table 1, but regressing allele frequencies evolving under a pure isolation-by-distance process against only two of the three explanatory variables (human occupation removed).
| Method | N. models | N. coeffs | Dist (H0) |
|---|---|---|---|
| OLS | 0.55 | 0.40 | 0.069 |
| TSA | 0.35 | 0.25 | 0.073 |
| PCNM | 0.10 | 0.07 | 0.103 |
| LagRES | 0.15 | 0.07 | 0.033 |
| LagPRED | 0.20 | 0.12 | 0.055 |
| SAR | 0.40 | 0.30 | 0.060 |
| CAR | 0.50 | 0.45 | 0.070 |
| MA | 0.40 | 0.30 | 0.058 |