Abstract
Environmental microbiology is undergoing a dramatic revolution due to the increasing accumulation of biological information and contextual environmental parameters. This will not only enable better identification of diversity patterns, but will also shed more light on the associated environmental conditions, spatial locations, and seasonal fluctuations that could explain such patterns. Complex ecological questions may now be addressed using multivariate statistical analyses, a vast range of techniques that remain underexploited. Here, well-established exploratory and hypothesis-driven approaches are reviewed, so as to foster their addition to the microbial ecologist's toolbox. Because such tools aim to reduce data set complexity and to identify major patterns and putative causal factors, they will certainly find many applications in microbial ecology.
Year: 2007 PMID: 17892477 PMCID: PMC2121141 DOI: 10.1111/j.1574-6941.2007.00375.x
Source DB: PubMed Journal: FEMS Microbiol Ecol ISSN: 0168-6496 Impact factor: 4.194
Table 1. Usage (%) of multivariate methods in different fields
| Exploratory analysis | Hypothesis-driven analysis | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Keywords | Cluster | PCA | MDS | PCoA | CCA | RDA | Mantel | CVA | Total number | ||
| Bacter* | 48.5 | 38 | 4.5 | 0.4 | 3.2 | 1.8 | 1.3 | 0.4 | 0.9 | 1.1 | 1141 |
| Microb* | 45.8 | 40.2 | 3.9 | 1.1 | 2.2 | 2.2 | 1.1 | 1.7 | 0.6 | 1.1 | 179 |
| Plant* | 40.3 | 28.5 | 4.6 | 1.7 | 15.5 | 3.7 | 1.9 | 2.3 | 0.6 | 0.9 | 3335 |
| Fung* | 54 | 27.2 | 2.8 | 1.1 | 8.5 | 2.8 | 0.9 | 1.1 | 0.2 | 1.4 | 563 |
| Fish* | 30.1 | 33.7 | 9.8 | 0.3 | 13.5 | 2.7 | 3.6 | 2.9 | 2.3 | 1.2 | 1464 |
| Bird* | 41 | 20.5 | 5.4 | 0.7 | 21.2 | 3.5 | 2.1 | 4.2 | 0.5 | 0.9 | 429 |
| Insect* | 54.3 | 13.7 | 6.1 | 0.8 | 11.5 | 4.4 | 3.5 | 3 | 1.1 | 1.7 | 637 |
A literature search of article titles and abstracts only was performed on December 13, 2006, with the Thomson ISI research tool, using the following parameters: document type, all document types; language, all languages; databases, SCI-EXPANDED, SSCI, A&HCI; timespan, 1900–2006.
Asterisks were placed at the end of each keyword to accommodate variant word endings. Each keyword was additionally combined with each of the following technical designations: cluster, cluster analysis; PCA, principal component analysis; MDS, multidimensional scaling; PCoA, principal coordinate analysis; CCA, canonical correspondence analysis; RDA, redundancy analysis; Mantel, Mantel test; and CVA, canonical variate analysis.
Total number refers to the total number of publications identified by each keyword and all its combinations. The correspondence analysis ordination of the raw numbers is depicted in Fig. 1.
Fig. 1. Correspondence analysis of method usage in various scientific fields. In this symmetrical scaling of CA scores, the first two axes explained 47.3% and 35.8% of the total inertia of Table 1, respectively. The gray areas were drawn to facilitate interpretation. Complete row names (scientific fields; full circles) and column names (methods; white triangles) are given in Table 1. Methods (triangles) located close to each other correspond to methods often used in the same fields. The closer a scientific field point lies to a method point, the higher the probability that the method is used in that field.
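Fig. 1 rests on correspondence analysis (CA) of a contingency table. As a minimal sketch of how the share of total inertia per axis can be obtained, the Python code below (assuming NumPy; the software actually used for Fig. 1 is not stated here) computes CA from the singular value decomposition of the matrix of standardized residuals and returns row and column scores in symmetric scaling. The function name `correspondence_analysis` and the toy table are hypothetical illustrations, not the data of Table 1.

```python
import numpy as np

def correspondence_analysis(counts):
    """Hypothetical helper: CA of a contingency table via SVD.

    Returns row scores and column scores in symmetric scaling, the
    eigenvalues (inertia carried by each axis), and the proportion of
    total inertia explained by each axis.
    """
    N = np.asarray(counts, dtype=float)
    P = N / N.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    expected = np.outer(r, c)
    # Standardized residuals: (P - r c^T) / sqrt(r c^T)
    S = (P - expected) / np.sqrt(expected)
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    eigenvalues = sv ** 2                # inertia per axis
    total_inertia = eigenvalues.sum()    # equals chi-square statistic / grand total
    explained = eigenvalues / total_inertia
    # Symmetric scaling: standard coordinates multiplied by sqrt(singular values)
    row_scores = (U / np.sqrt(r)[:, None]) * np.sqrt(sv)
    col_scores = (Vt.T / np.sqrt(c)[:, None]) * np.sqrt(sv)
    return row_scores, col_scores, eigenvalues, explained

# Hypothetical toy table: 4 "fields" (rows) by 3 "methods" (columns)
toy = [[10, 2, 1], [3, 8, 2], [1, 3, 9], [5, 5, 5]]
rows, cols, eig, explained = correspondence_analysis(toy)
print(np.round(explained[:2] * 100, 1))  # % of total inertia on axes 1 and 2
```

Applied to a fields-by-methods table of raw counts, `explained[:2]` corresponds to the kind of axis percentages quoted in the caption; the sketch makes no claim to reproduce the exact values of Fig. 1.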
Table 2. Interpretation of ordination diagrams
Linear methods (PCA, RDA)

| Samples (PCA, RDA) | Species (PCA, RDA) | ENV (RDA) | NENV (RDA) | Scaling 1: focus on sample (row) distances | Scaling 2: focus on species (column) correlations |
|---|---|---|---|---|---|
| ✓ | | | | Euclidean distances among samples | – |
| | ✓ | | | – | Linear correlations among species |
| | | ✓ | | Marginal effects of ENV on ordination scores | Correlations among ENV |
| | | | ✓ | Euclidean distances between sample classes | – |
| ✓ | ✓ | | | Abundance values in species data | Abundance values in species data |
| ✓ | | ✓ | | – | Values of ENV in the samples |
| ✓ | | | ✓ | Membership of samples in the classes | Membership of samples in the classes |
| | ✓ | ✓ | | Linear correlations between species and ENV | Linear correlations between species and ENV |
| | ✓ | | ✓ | Mean species abundance within classes of nominal ENV | Mean species abundance within classes of nominal ENV |
| | | ✓ | ✓ | – | Average of ENV within classes |

Unimodal methods (CA, CCA)

| Samples (CA, CCA) | Species (CA, CCA) | ENV (CCA) | NENV (CCA) | Scaling 1: focus on sample (row) distances, Hill's scaling | Scaling 2: focus on species (column) distances |
|---|---|---|---|---|---|
| ✓ | | | | Turnover distances among samples | χ² distances between samples |
| | ✓ | | | – | χ² distances among species distributions |
| | | ✓ | | Marginal effects of ENV | Correlations among ENV |
| | | | ✓ | Turnover distances between sample classes | χ² distances between sample classes |
| ✓ | ✓ | | | Relative abundances of the species table | Relative abundances of the species table |
| ✓ | | ✓ | | – | Values of ENV in the samples |
| ✓ | | | ✓ | Membership of samples in the classes | Membership of samples in the classes |
| | ✓ | ✓ | | Weighted averages (the species optima with respect to particular ENV) | Weighted averages (the species optima with respect to particular ENV) |
| | ✓ | | ✓ | Relative total abundances in the sample classes | Relative total abundances in the sample classes |
| | | ✓ | ✓ | – | ENV averages within sample classes |
The interpretation of ordination diagrams depends on the focus of the study, because sample and species scores are rescaled as a function of the scaling choice. The table gives approximate relationships between and among the elements represented in biplots and triplots: species (dots or arrows), samples (dots), environmental variables (ENV; arrows), and nominal (qualitative) environmental variables (NENV; dots). Ticks (✓) mark the elements being compared. A dash (“–”) indicates that the comparison is not meaningful, because the ordination scores are not scaled appropriately for it. Adapted from ter Braak (1994); Leps & Smilauer (1999); ter Braak & Smilauer (2002).
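The effect of the scaling choice in Table 2 can be made concrete for PCA. The sketch below assumes NumPy and one common convention (scaling 1: sample scores U·S with unit-length species vectors V; scaling 2: sample scores U·√(n−1) with species scores V·S/√(n−1)); software packages differ in constant factors, so the hypothetical function `pca_biplot_scores` is illustrative only. When all axes are kept (and approximately in a two-dimensional plot), scaling 1 preserves Euclidean distances among samples, whereas in scaling 2 inner products of species vectors equal species covariances, so angles between species arrows reflect their correlations.

```python
import numpy as np

def pca_biplot_scores(Y, scaling=1):
    """Hypothetical helper: PCA sample and species scores for a
    samples-by-species matrix Y under scaling 1 or scaling 2."""
    Y = np.asarray(Y, dtype=float)
    Yc = Y - Y.mean(axis=0)              # centre each species (column)
    U, sv, Vt = np.linalg.svd(Yc, full_matrices=False)
    n = Y.shape[0]
    if scaling == 1:
        # Focus on samples: distances among sample scores = distances among samples
        samples = U * sv
        species = Vt.T
    elif scaling == 2:
        # Focus on species: species vectors carry the covariance structure
        samples = U * np.sqrt(n - 1)
        species = Vt.T * sv / np.sqrt(n - 1)
    else:
        raise ValueError("scaling must be 1 or 2")
    # In both scalings, samples @ species.T reconstructs the centred data
    return samples, species

rng = np.random.default_rng(0)
Y = rng.poisson(5, size=(8, 4)).astype(float)   # hypothetical abundance table
F1, V1 = pca_biplot_scores(Y, scaling=1)
F2, V2 = pca_biplot_scores(Y, scaling=2)
```

A biplot would plot only the first two columns of each score matrix, so the properties above then hold approximately.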
Fig. 2. Ordination diagrams in two dimensions. (a) In a PCA biplot, samples are represented by dots and species by arrows. The arrows point in the direction of maximal variation in the species abundances, and their lengths are proportional to their maximal rate of change. Long arrows correspond to species contributing more to the variation in the data set. The perpendicular projection of a sample dot onto a species arrow approximates the abundance of that species in the sample. (b) In a CA joint plot focusing on species distances, both samples and species are depicted as dots. Species dots lie at the center of gravity (weighted average) of the samples in which they mostly occur. The shorter the distance between a sample point and a species point, the more likely the species is to occur, or to be relatively abundant, in that sample (see Table 2 for more details about diagram interpretation).
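Panel (b) states that, when a CA joint plot focuses on species distances, each species point sits at the center of gravity of the samples. A minimal NumPy sketch (hypothetical function names; species in principal coordinates and samples in standard coordinates, which is one standard way to obtain this scaling) checks that each species score equals the abundance-weighted average of the sample scores.

```python
import numpy as np

def ca_species_focused(counts):
    """Hypothetical helper: CA with species (columns) in principal coordinates
    and samples (rows) in standard coordinates (species-distance focus)."""
    N = np.asarray(counts, dtype=float)
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    k = int(np.sum(sv > 1e-12))                   # keep non-trivial axes only
    sample_std = U[:, :k] / np.sqrt(r)[:, None]   # standard coordinates
    species_pr = (Vt.T[:, :k] / np.sqrt(c)[:, None]) * sv[:k]  # principal coordinates
    return sample_std, species_pr

def weighted_average_of_samples(counts, sample_scores):
    """Center of gravity of the sample scores for each species,
    weighted by the species' abundance in each sample."""
    N = np.asarray(counts, dtype=float)
    return (N.T @ sample_scores) / N.sum(axis=0)[:, None]

# Hypothetical samples-by-species counts
toy = np.array([[10, 2, 1, 0], [3, 8, 2, 1], [1, 3, 9, 4],
                [5, 5, 5, 2], [0, 1, 6, 7]], dtype=float)
samples, species = ca_species_focused(toy)
centroids = weighted_average_of_samples(toy, samples)
```

The equality of `species` and `centroids` is the CA transition formula; swapping the roles (samples in principal coordinates) would instead place samples at the centroids of the species.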
Fig. 3. Partitioning biological variation into the effects of two factors. The large rectangle represents the total variation in the biological data table, which is partitioned between two sets of explanatory variables (a, b). Fraction 4 is the unexplained part of the biological variation. Fractions 1 and 3 are obtained by partial constrained ordination or partial regression, and can be tested for significance. For instance, fraction 1 corresponds to the amount of biological variation that can be explained exclusively by the effects of (a) once the effects of (b) are taken into account (i.e., when b is used as a covariable). Fraction 2 [i.e., variation that can be attributed indifferently to (a) or (b), reflecting the covariation of (a) and (b)] is obtained by subtracting fractions 1 and 3 from the total explained variation, and cannot be tested for statistical significance.
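The arithmetic behind Fig. 3 can be sketched with ordinary least squares, the regression step that underlies RDA. Assuming NumPy and unadjusted R² (adjusted R² is often preferred in practice), fraction 1 is R² of the biological table on (a) and (b) together minus R² on (b) alone, fraction 3 is the analogue for (b), fraction 2 is the remaining explained variation, and fraction 4 is 1 minus the total explained variation. The function names and simulated data are hypothetical, and no significance test is included.

```python
import numpy as np

def r_squared(Y, X):
    """Proportion of the total variance of the multivariate table Y explained
    by least-squares regression on X (centring acts as an intercept)."""
    Yc = Y - Y.mean(axis=0)
    Xc = X - X.mean(axis=0)
    coef, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    fitted = Xc @ coef
    return (fitted ** 2).sum() / (Yc ** 2).sum()

def partition_variation(Y, A, B):
    """Hypothetical helper returning fractions 1-4 of Fig. 3 as proportions."""
    r_ab = r_squared(Y, np.hstack([A, B]))   # total explained variation
    r_a = r_squared(Y, A)
    r_b = r_squared(Y, B)
    frac1 = r_ab - r_b                       # pure (a): b used as covariable
    frac3 = r_ab - r_a                       # pure (b): a used as covariable
    frac2 = r_ab - frac1 - frac3             # shared fraction, by subtraction
    frac4 = 1.0 - r_ab                       # unexplained variation
    return frac1, frac2, frac3, frac4

rng = np.random.default_rng(1)
A = rng.normal(size=(30, 2))                 # simulated explanatory set (a)
B = rng.normal(size=(30, 2))                 # simulated explanatory set (b)
Y = A @ rng.normal(size=(2, 5)) + B @ rng.normal(size=(2, 5)) + rng.normal(size=(30, 5))
fractions = partition_variation(Y, A, B)
```

Because fraction 2 is obtained by subtraction rather than from a fitted model, there is no test statistic for it, and it can even come out slightly negative.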
Fig. 4. Relationships between numerical methods. Exploratory tools such as PCA, CA, PCoA, NMDS, or cluster analysis can be applied to a sample-by-species table to extract the main patterns of variation, to identify groups or clusters of samples, or to identify specific species interactions. Sample scores on the main axes of variation can be related to variation in environmental variables using indirect gradient analyses. When a constrained analysis is desired (i.e., direct gradient analysis), RDA, db-RDA, CCA, or linear discriminant analysis can be used as extensions of the unconstrained methods. Mantel tests are appropriate to test the significance of the correlation between two distance matrices (e.g., one based on species data and the other on environmental variables). Raw data may be transformed, normalised or standardised as appropriate before analysis.
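The Mantel test mentioned in Fig. 4 can be sketched as a permutation test on two distance matrices. This minimal NumPy version (hypothetical function name) uses the Pearson correlation between the upper triangles as the statistic and permutes the rows and columns of one matrix jointly; the one-sided alternative and the default of 999 permutations are illustrative choices, not prescriptions.

```python
import numpy as np

def mantel_test(D1, D2, permutations=999, seed=None):
    """Hypothetical helper: one-sided Mantel test (positive association)
    between two square, symmetric distance matrices."""
    D1 = np.asarray(D1, dtype=float)
    D2 = np.asarray(D2, dtype=float)
    n = D1.shape[0]
    iu = np.triu_indices(n, k=1)          # each pair of objects once
    x = D1[iu]
    r_obs = np.corrcoef(x, D2[iu])[0, 1]
    rng = np.random.default_rng(seed)
    count = 1                             # the observed statistic counts once
    for _ in range(permutations):
        p = rng.permutation(n)
        # Permute rows and columns of D2 together to keep it a distance matrix
        r_perm = np.corrcoef(x, D2[np.ix_(p, p)][iu])[0, 1]
        if r_perm >= r_obs:
            count += 1
    p_value = count / (permutations + 1)
    return r_obs, p_value

def euclidean_distances(X):
    """Pairwise Euclidean distances between the rows of X."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

rng = np.random.default_rng(2)
env = rng.normal(size=(15, 2))                            # simulated environment
spp = np.tile(env, 3) + 0.1 * rng.normal(size=(15, 6))    # simulated community data
r, p = mantel_test(euclidean_distances(spp), euclidean_distances(env),
                   permutations=199, seed=0)
```

In a microbial ecology setting, `D1` might instead hold Bray–Curtis dissimilarities between community profiles and `D2` Euclidean distances between standardized environmental variables.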
Fig. 5. Combination of ordination and cluster analysis. NMDS or PCoA can be applied to a distance matrix to represent the major axes of variation among objects in a two-dimensional space. Superimposing the results of a cluster analysis of the same distance matrix (primary connections) onto the ordination diagram can help identify structure in the data set as discontinuities (clusters) within a continuous space (ordination). Adapted from Legendre & Legendre (1998).
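Fig. 5 overlays the primary connections of a cluster analysis on an ordination of the same distance matrix. A minimal NumPy sketch follows, under two stated assumptions: classical PCoA is used rather than NMDS, and the clustering is single linkage, whose primary connections are the edges of the minimum spanning tree (computed here with Prim's algorithm). The function names and the toy points are hypothetical.

```python
import numpy as np

def pcoa(D, k=2):
    """Principal coordinate analysis (classical scaling) of distance matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    G = J @ (-0.5 * D ** 2) @ J                  # Gower-centred matrix
    vals, vecs = np.linalg.eigh(G)
    order = np.argsort(vals)[::-1]               # largest eigenvalues first
    vals, vecs = vals[order], vecs[:, order]
    pos = np.clip(vals[:k], 0, None)             # ignore negative eigenvalues
    return vecs[:, :k] * np.sqrt(pos)

def primary_connections(D):
    """Minimum spanning tree edges (i, j), i.e. single-linkage primary connections."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = D[0].copy()                  # shortest distance from each object to the tree
    parent = np.zeros(n, dtype=int)     # tree object achieving that distance
    edges = []
    for _ in range(n - 1):
        j = int(np.argmin(np.where(in_tree, np.inf, best)))
        edges.append((int(parent[j]), j))
        in_tree[j] = True
        closer = D[j] < best
        best = np.where(closer, D[j], best)
        parent = np.where(closer, j, parent)
    return edges

# Hypothetical objects forming two well-separated groups
pts = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
coords = pcoa(D, k=2)
edges = primary_connections(D)
```

Each edge can be drawn as a line segment between the corresponding PCoA points; long edges bridging otherwise tight groups mark the discontinuities that Fig. 5 uses to reveal clusters.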