Kenneth D Whitney, Theodore Garland.
Abstract
Mechanisms underlying the dramatic patterns of genome size variation across the tree of life remain mysterious. Effective population size (N(e)) has been proposed as a major driver of genome size: selection is expected to efficiently weed out deleterious mutations increasing genome size in lineages with large (but not small) N(e). Strong support for this model was claimed from a comparative analysis of N(e)u and genome size for ≈30 phylogenetically diverse species ranging from bacteria to vertebrates, but analyses at that scale have so far failed to account for phylogenetic nonindependence of species. In our reanalysis, accounting for phylogenetic history substantially altered the perceived strength of the relationship between N(e)u and genomic attributes: there were no statistically significant associations between N(e)u and gene number, intron size, intron number, the half-life of gene duplicates, transposon number, transposons as a fraction of the genome, or overall genome size. We conclude that current datasets do not support the hypothesis of a mechanistic connection between N(e) and these genomic attributes, and we suggest that further progress requires larger datasets, phylogenetic comparative methods, more robust estimators of genetic drift, and a multivariate approach that accounts for correlations between putative explanatory variables.
Year: 2010 PMID: 20865118 PMCID: PMC2928810 DOI: 10.1371/journal.pgen.1001080
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Figure 1. Ignoring phylogenetic history can lead to incorrect conclusions about the nature of evolutionary associations between traits.
In this hypothetical example, eight species have been measured for two traits, x and y, as indicated by pairs of values at the tips of the phylogenetic tree (A). Ordinary least-squares linear regression (OLS) indicates a statistically significant positive relationship (B; r² = 0.62, P = 0.02), potentially leading to an inference of a positive evolutionary association between x and y. However, inspection of the scatterplot (B) in relation to the phylogenetic relationships of the species (A) indicates that the association between x and y is negative for the four species within each of the two major lineages. Regression through the origin with phylogenetically independent contrasts (computed using [34] and setting all branches to length 1.0), which is equivalent to phylogenetic generalized least squares (PGLS) analysis, accounts for the nonindependence of species and indicates no overall evolutionary relationship between the traits (C, standardized contrasts, r² = 0.01, P = 0.82; basal contrast indicated in red). The apparent pattern across species was driven by positively correlated trait change only at the basal split of the phylogeny; throughout the rest of the phylogeny, the traits mostly changed in opposite directions (A; basal contrast in red). Notes: In A, the estimated nodal values for both traits are shown in parentheses. These are intermediate steps in the independent contrasts algorithm and are not to be taken as optimal estimates of the states at internal nodes; rather, they are a type of “local parsimony” estimate (except the estimate at the basal node, which is equivalent to the estimate under squared-change parsimony). Contrasts are taken between sister nodes on a phylogeny, not along each branch segment [15], [16], [18].
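The contrast calculation described in this caption can be sketched in a few lines of Python. The tip values below are made up for illustration (the values plotted in Figure 1 are not given in the text), and the nested-tuple tree encoding and the function names `contrasts`, `pearson`, and `corr_through_origin` are hypothetical choices, not the software cited as [34]. The sketch assumes a fully bifurcating tree with every branch set to length 1.0, as in the caption.

```python
import math

# Hypothetical tip data: two clades of four species each (NOT the Figure 1 values).
x = {"a1": 1, "a2": 2, "a3": 3, "a4": 4, "b1": 6, "b2": 7, "b3": 8, "b4": 9}
y = {"a1": 4, "a2": 3, "a3": 2, "a4": 1, "b1": 9, "b2": 8, "b3": 7, "b4": 6}

# Balanced tree as nested pairs; tips are strings.
tree = ((("a1", "a2"), ("a3", "a4")), (("b1", "b2"), ("b3", "b4")))


def contrasts(node, values, branch=1.0):
    """Return (nodal value, extra branch length, list of standardized contrasts)."""
    if isinstance(node, str):
        return float(values[node]), 0.0, []
    left, right = node
    x1, e1, c1 = contrasts(left, values, branch)
    x2, e2, c2 = contrasts(right, values, branch)
    v1, v2 = branch + e1, branch + e2  # branches lengthened for nodal uncertainty
    contrast = (x1 - x2) / math.sqrt(v1 + v2)  # contrast between sister nodes
    nodal = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)  # weighted mean of daughters
    return nodal, v1 * v2 / (v1 + v2), c1 + c2 + [contrast]


def pearson(u, v):
    """Ordinary correlation across tips (species treated as independent)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))
    return num / den


def corr_through_origin(u, v):
    """Correlation for a regression forced through the origin (no centering)."""
    num = sum(a * b for a, b in zip(u, v))
    return num / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))


tips = sorted(x)
r_ols = pearson([x[t] for t in tips], [y[t] for t in tips])
cx = contrasts(tree, x)[2]  # the basal contrast is the last element
cy = contrasts(tree, y)[2]
r_pic = corr_through_origin(cx, cy)

print(f"tips: r = {r_ols:.2f}; contrasts: r = {r_pic:.2f}")
print("contrast products:", [round(a * b, 2) for a, b in zip(cx, cy)])
```

With these made-up values, only the basal contrast has a positive product; the other six are negative, so the correlation computed from contrasts is much weaker than the one computed across tips.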
Figure 2. Phylogeny for the species in the Lynch & Conery dataset [7], with a reconstruction of genome sizes.
(See Materials and Methods).
Relationships between N(e)u and genomic attributes in nonphylogenetic (OLS) and phylogenetic (PGLS, RegOU) models.
| Model | Dependent variable | ln Max Likelihood | n | b | r² | P |
|---|---|---|---|---|---|---|
| OLS | Genome Size (Mb) | −25.53 | 29 | −1.17 | 0.64† | |
| OLS | Gene Number | −7.81 | 28 | −0.54 | 0.56 | |
| OLS | Half-life of Gene Duplicates | 25.87 | 9 | −0.03 | 0.52 | |
| OLS | Intron Size | −9.60 | 15 | −0.68 | 0.40 | |
| OLS | Intron Number | −23.40 | 15 | −1.06 | 0.21 | 0.084 |
| OLS | Transposons (number) | −35.49 | 18 | −2.27 | 0.35 | |
| OLS | Transposons (fraction of genome) | −12.06 | 18 | −0.56 | 0.31 | |
| PGLS | Genome Size (Mb) | −23.51 | 29 | −0.33 | 0.08 | 0.137 |
| PGLS | Gene Number | −4.09 | 28 | −0.15 | 0.07 | 0.187 |
| PGLS | Half-life of Gene Duplicates | 23.62 | 9 | −0.01 | 0.13 | 0.335 |
| PGLS | Intron Size | −9.33 | 15 | −0.36 | 0.13 | 0.187 |
| PGLS | Intron Number | −23.84 | 15 | −0.75 | 0.09 | 0.291 |
| PGLS | Transposons (number) | −33.83 | 18 | −0.29 | 0.01 | 0.707 |
| PGLS | Transposons (fraction of genome) | −11.52 | 18 | −0.07 | 0.01 | 0.740 |
| RegOU | Genome Size (Mb) | −22.59* | 29 | −0.20 | 0.04 | 0.328 |
| RegOU | Gene Number | −3.86* | 28 | −0.12 | 0.04 | 0.282 |
Log10-transformed dependent variables were regressed on log10(N(e)u). Phylogenetic models used arbitrary branch lengths of 1.0 (see Materials and Methods). Note that r² values are not comparable across OLS, PGLS, and RegOU models. Asterisks indicate RegOU models with significantly better fit than OLS models, based on ln likelihood ratio tests (see Results); n = number of species; b = regression slope; significant P-values are in bold.
†: Lynch & Conery [7] reported r² = 0.659; the discrepancy apparently arises because their analysis used 30 species, only 29 of which were reported in their online supplement.
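The asterisks in the table can be checked with a short calculation. The sketch below takes the ln maximum likelihoods for the two RegOU rows and their OLS counterparts from the table and forms the usual likelihood ratio statistic, twice the difference in ln likelihoods. It assumes that the RegOU model estimates one more parameter than OLS, so that the statistic is compared with a chi-square distribution with 1 degree of freedom; that assumption and the function name `likelihood_ratio_test` are not stated in this excerpt.

```python
import math

# ln maximum likelihoods copied from the table above.
ln_lik = {
    "Genome Size (Mb)": {"OLS": -25.53, "RegOU": -22.59},
    "Gene Number": {"OLS": -7.81, "RegOU": -3.86},
}

CRITICAL_CHI2_DF1 = 3.841  # 5% critical value of chi-square with 1 df (assumed df)


def likelihood_ratio_test(ln_simple, ln_complex):
    """Return (test statistic, P-value), assuming a chi-square with 1 df."""
    stat = 2.0 * (ln_complex - ln_simple)
    # Survival function of chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value


results = {
    trait: likelihood_ratio_test(v["OLS"], v["RegOU"]) for trait, v in ln_lik.items()
}
for trait, (stat, p_value) in results.items():
    verdict = "better fit" if stat > CRITICAL_CHI2_DF1 else "no improvement"
    print(f"{trait}: LR = {stat:.2f}, P = {p_value:.3f} ({verdict})")
```

Under that 1-df assumption both statistics exceed 3.841, which is consistent with the asterisks on the two RegOU rows.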
Figure 3. Relationship between N(e)u and genome size across 22 eukaryotic and 7 prokaryotic species from the dataset of Lynch & Conery [7].
(A) Ordinary least squares regression (OLS); r² = 0.64, P<0.0001. (B) Standardized phylogenetically independent contrasts (equivalent to PGLS) using branch lengths of 1.0; r² = 0.08, P = 0.138. Values have been “positivized” on the x-axis [35].
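As a rough illustration of what positivizing involves: the direction of subtraction in each contrast is arbitrary, so any contrast with a negative x value can have the sign of both its x and y values flipped. The sketch below uses made-up contrast values and hypothetical function names, and checks that the slope of a regression through the origin is unchanged by the flip.

```python
# Hypothetical standardized contrasts for two traits (illustrative values only).
cx = [-0.71, 0.50, -1.15, 0.82, -2.67]
cy = [0.71, -0.20, 1.15, 0.35, -2.67]


def positivize(cx, cy):
    """Flip the sign of both contrasts wherever the x contrast is negative."""
    flipped = [(-a, -b) if a < 0 else (a, b) for a, b in zip(cx, cy)]
    return [a for a, _ in flipped], [b for _, b in flipped]


def slope_through_origin(cx, cy):
    """Least-squares slope for a regression with no intercept."""
    return sum(a * b for a, b in zip(cx, cy)) / sum(a * a for a in cx)


px, py = positivize(cx, cy)
print("positivized x:", px)
print("positivized y:", py)
print("slopes:", slope_through_origin(cx, cy), slope_through_origin(px, py))
```

Each product and each squared x value is unchanged when both signs flip, so the plot changes appearance but the fitted slope does not.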