Jochen B W Wolf, Axel Künstner, Kiwoong Nam, Mattias Jakobsson, Hans Ellegren.
Abstract
Selection modulates gene sequence evolution in different ways by constraining potential changes of amino acid sequences (purifying selection) or by favoring new and adaptive genetic variants (positive selection). The number of nonsynonymous differences in a pair of protein-coding sequences can be used to quantify the mode and strength of selection. To control for regional variation in substitution rates, the proportionate number of nonsynonymous differences (d(N)) is divided by the proportionate number of synonymous differences (d(S)). The resulting ratio (d(N)/d(S)) is a widely used indicator for functional divergence to identify particular genes that underwent positive selection. With the ever-growing amount of genome data, summary statistics like mean d(N)/d(S) allow gathering information on the mode of evolution for entire species. Both applications hinge on the assumption that d(S) and mean d(S) (approximately branch length) are neutral and adequately control for variation in substitution rates across genes and across organisms, respectively. We here explore the validity of this assumption using empirical data based on whole-genome protein sequence alignments between human and 15 other vertebrate species and several simulation approaches. We find that d(N)/d(S) does not appropriately reflect the action of selection as it is strongly influenced by its denominator (d(S)). Particularly for closely related taxa, such as human and chimpanzee, d(N)/d(S) can be misleading and is not an unadulterated indicator of selection. Instead, we suggest that inconsistencies in the behavior of d(N)/d(S) are to be expected and highlight the idea that this behavior may be inherent to taking the ratio of two randomly distributed variables that are nonlinearly correlated. New null hypotheses will be needed to adequately handle these nonlinear dynamics.
Keywords: adaptive evolution; dN/dS ratio; melanocortin-1-receptor; negative selection; neutral theory; positive selection; protein evolution; selection models
Year: 2009 PMID: 20333200 PMCID: PMC2817425 DOI: 10.1093/gbe/evp030
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Compilation of Parameters Derived from Pairwise Comparisons between the Human Genome and the Genomes of 15 Other Species
| Common Species Name | Binomial Nomenclature | Number of Orthologues with Human | Number of Genes with ω > 1 | Number of Genes with ω ≤ 1 | Proportion of Genes with ω > 1 | Branch Length in PAML | Branch Length after Miller et al. (2007) | Mean dS | Mean dN | ψ | Spearman's ρ (dN, dS) | Spearman's ρ (dN, ω) | Spearman's ρ (dS, ω) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Chimp | | 17,226 | 1,422 | 15,804 | 0.083 | 0.028 | 0.0136 | 0.020 | 0.006 | 0.306 | 0.287 | 0.847 | −0.178 |
| Macaque | | 16,196 | 334 | 15,862 | 0.021 | 0.136 | 0.0640 | 0.106 | 0.028 | 0.260 | 0.461 | 0.875 | 0.034 |
| Mouse lemur | | 13,921 | 132 | 13,789 | 0.009 | 0.363 | 0.2237 | 0.327 | 0.059 | 0.181 | 0.358 | 0.881 | −0.071 |
| Bush baby | | 12,936 | 98 | 12,838 | 0.008 | 0.421 | 0.2565 | 0.361 | 0.068 | 0.188 | 0.379 | 0.900 | −0.008 |
| Dog | | 13,145 | 11 | 13,134 | 0.001 | 0.490 | 0.3350 | 0.468 | 0.072 | 0.154 | 0.449 | 0.873 | 0.015 |
| Elephant | | 11,946 | 74 | 11,872 | 0.006 | 0.479 | 0.3381 | 0.427 | 0.075 | 0.176 | 0.352 | 0.891 | −0.054 |
| Rabbit | | 11,592 | 43 | 11,549 | 0.004 | 0.506 | 0.3504 | 0.487 | 0.072 | 0.148 | 0.353 | 0.883 | −0.073 |
| Cow | | 14,148 | 12 | 14,136 | 0.001 | 0.523 | 0.3423 | 0.506 | 0.075 | 0.149 | 0.391 | 0.880 | −0.036 |
| Mouse | | 15,093 | 5 | 15,088 | 0.000 | 0.705 | 0.4532 | 0.670 | 0.091 | 0.137 | 0.397 | 0.923 | 0.060 |
| Rat | | 13,904 | 3 | 13,901 | 0.000 | 0.734 | 0.4613 | 0.690 | 0.097 | 0.141 | 0.412 | 0.918 | 0.066 |
| Opossum | | 12,283 | 2 | 12,281 | 0.000 | 1.224 | 0.7114 | 1.256 | 0.134 | 0.107 | 0.446 | 0.842 | −0.048 |
| Platypus | | 8,527 | 0 | 8,527 | 0.000 | 1.465 | 0.9674 | 1.615 | 0.149 | 0.092 | 0.419 | 0.828 | −0.107 |
| Chicken | | 8,485 | 0 | 8,485 | 0.000 | 1.637 | 1.0869 | 1.772 | 0.157 | 0.089 | 0.481 | 0.873 | 0.043 |
| Xenopus | | 3,575 | 2 | 3,573 | 0.001 | 2.208 | 1.5278 | 2.485 | 0.178 | 0.072 | 0.440 | 0.870 | −0.004 |
| Zebra fish | | 936 | 0 | 936 | 0.000 | 2.623 | 1.8287 | 3.041 | 0.201 | 0.066 | 0.256 | 0.914 | −0.104 |
Branch length for mouse lemur could not directly be obtained from the study by Miller et al. (2007). We could, however, make use of the strong correlation between branch length values obtained by the CODEML package and those derived from Miller et al. (2007; R² = 0.96, P < 0.001; dS = −0.03 + 1.46 × Miller branch length) to predict the branch length of mouse lemur.
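As a quick consistency check on the table above, the following sketch recomputes, for three rows, the share of genes with ω > 1 and the ratio of mean dN to mean dS. It assumes that the two "Mean" columns hold mean dS and mean dN in that order and that ψ is the ratio of those two means; both are inferences from the printed values rather than statements from the source. `ROWS` and `check_row` are hypothetical names.

```python
# Values copied from three rows of the table (human vs. each species).
# Assumption: the two "Mean" columns are mean dS and mean dN, in that order.
ROWS = {
    "Chimp": dict(orthologues=17226, omega_gt1=1422,
                  mean_ds=0.020, mean_dn=0.006, psi=0.306),
    "Macaque": dict(orthologues=16196, omega_gt1=334,
                    mean_ds=0.106, mean_dn=0.028, psi=0.260),
    "Mouse lemur": dict(orthologues=13921, omega_gt1=132,
                        mean_ds=0.327, mean_dn=0.059, psi=0.181),
}


def check_row(row):
    """Return (share of genes with omega > 1, mean dN / mean dS)."""
    share = row["omega_gt1"] / row["orthologues"]
    ratio_of_means = row["mean_dn"] / row["mean_ds"]
    return share, ratio_of_means


for name, row in ROWS.items():
    share, ratio = check_row(row)
    print(f"{name}: share={share:.3f}, "
          f"mean dN / mean dS={ratio:.3f}, tabulated psi={row['psi']:.3f}")
```

Small differences between the recomputed ratio and the tabulated ψ are expected, because the means in the table are rounded to three decimals.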
Figure. Relationship of ψ and branch length based on estimates from Miller et al. (2007). (A) Pairwise alignments of human and 15 other species where all possible orthologues between two species are included (compare table 1). (B) Pairwise alignments of human and 15 other species restricted to core sets of genes that are common to all species pairs under consideration. “Red”: 11-way core set of 4,181 orthologous genes retrieved from all possible pairwise comparisons from human–chimpanzee to human–opossum. “Black”: 15-way core set of 105 genes common to all possible pairwise comparisons from human–chimpanzee to human–zebra fish. The fitted lines are based on log-log regression models. “Number code”: 1: chimp; 2: macaque; 3: mouse lemur; 4: bush baby; 5: dog; 6: elephant; 7: cow; 8: rabbit; 9: mouse; 10: rat; 11: opossum; 12: platypus; 13: chicken; 14: xenopus; and 15: zebra fish. (C) Relationship of ψ and branch length based on a multiple alignment of the 11-way core set including a total of 3,866 genes. Individual data points represent estimated values of ψ for both terminal and internal branches after ancestral reconstruction. Numbers encode branch identity (see the tree in supplementary fig. S7, Supplementary Material online). The branches with the highest ψ (7, 8, and 9) are the terminal branches of human, chimpanzee, and rhesus macaque, respectively.
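A log-log regression of the kind mentioned for the fitted lines can be sketched with ordinary least squares on log-transformed values. The example below uses the ψ and Miller et al. (2007) branch-length columns of the table for the 15 pairwise comparisons; it is an illustration under that assumption and is not claimed to reproduce the authors' fitted coefficients. `DATA` and `loglog_fit` are hypothetical names.

```python
import math

# (branch length after Miller et al. 2007, psi) for the 15 pairwise
# comparisons with human, in table order (chimp ... zebra fish).
DATA = [
    (0.0136, 0.306), (0.0640, 0.260), (0.2237, 0.181), (0.2565, 0.188),
    (0.3350, 0.154), (0.3381, 0.176), (0.3504, 0.148), (0.3423, 0.149),
    (0.4532, 0.137), (0.4613, 0.141), (0.7114, 0.107), (0.9674, 0.092),
    (1.0869, 0.089), (1.5278, 0.072), (1.8287, 0.066),
]


def loglog_fit(points):
    """Least-squares fit of log(y) = intercept + slope * log(x)."""
    xs = [math.log(x) for x, _ in points]
    ys = [math.log(y) for _, y in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = loglog_fit(DATA)
print(f"log(psi) = {intercept:.3f} + {slope:.3f} * log(branch length)")
```

A negative slope corresponds to ψ declining as branch length increases, which is the pattern the table values show from chimp to zebra fish.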
Figure. Relationship between ω and dS estimated for the gene MC1R. (A) Estimates based on pairwise comparisons between chicken and 22 passerine bird species. Number code 1: Lepidothrix serena (DQ388331); 2: Lepidothrix coronata (DQ388330); 3: Malurus leucopterus (AY614610); 4: Phylloscopus chloronotus (AY308751); 5: Phylloscopus humei (AY308750); 6: Phylloscopus tytleri (AY308753); 7: Phylloscopus fuscatus (AY308754); 8: Phylloscopus pulcher (AY308752); 9: Phylloscopus collybita (AY308747); 10: Seicercus burkii (AY308757); 11: Seicercus xanthoschistus (AY308756); 12: Phylloscopus trochiloides (AY308749); 13: Coereba flaveola (AF362601); 14: Tangara cucullata (AF362606); 15: Vermivora peregrina (AY308755); 16: Passerina cyanea (EU191783); 17: Passerina caerulea (EU191787); 18: Passerina amoena (EU191785); 19: Cyanocompsa cyanoides (EU191789); 20: Passerina rositae (EU191788); 21: Corvus corone (EU348729); and 22: Perisoreus infaustus (DQ643387). (B) Branch-specific estimates from a phylogenetic reconstruction of the bird species in (A). Numbers encode branch identity (see the tree in supplementary fig. S8, Supplementary Material online).
Candidate Functions That Describe Possible Relationships between dN and dS and the Resulting Relationship between ω and dS
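Because ω = dN/dS, any assumed functional form dN = f(dS) determines how ω depends on dS. As a sketch, with hypothetical constants a, b > 0 and forms chosen only for illustration (not necessarily those the authors list):

```latex
\[
\begin{aligned}
\text{proportional:}\quad & d_N = b\,d_S
  &&\Rightarrow\quad \omega = \frac{d_N}{d_S} = b \\
\text{linear with intercept:}\quad & d_N = a + b\,d_S
  &&\Rightarrow\quad \omega = \frac{a}{d_S} + b \\
\text{power law:}\quad & d_N = a\,d_S^{\,b}
  &&\Rightarrow\quad \omega = a\,d_S^{\,b-1}
\end{aligned}
\]
```

Only the proportional case leaves ω independent of dS. With a positive intercept, or with an exponent b < 1, ω falls as dS grows even when nothing about selection changes.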
Figure. Relationship between measures of protein evolution. Left: dN versus dS, Middle: ω versus dN, and Right: ω versus dS. The relationships are depicted as heatmaps and summarized by regression splines selected by BIC model selection (orange line). The number of genes found in each pixel is symbolized by the different colors. The first three panel sets (A–C) show actual genome data; the last two panels (D and E) are based on simulations mimicking the human–chimpanzee comparison and should be evaluated in comparison with (A). (A) Human–chimpanzee comparison, (B) human–bush baby comparison, (C) human–mouse comparison, (D) uncorrelated draws from two multivariate gamma distributions with shape and rate parameters estimated from human–chimpanzee dN and dS values, and (E) simulated dN and dS values based on a Poisson process of accumulating mutations with varying substitution rates (gamma distributed) and a similar degree of correlation between dN and dS as in the empirical data (ρ = 0.4; see Supplementary Material online). Note that the axis scales differ owing to the large data ranges.
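The idea behind panel (D) can be sketched with a small simulation: draw dN and dS independently from gamma distributions, form ω = dN/dS, and compute rank correlations. The shape and scale values below are hypothetical, chosen only so that the means are near 0.006 and 0.02; they are not the parameters the authors estimated. `ranks`, `spearman`, and `simulate` are illustrative helper names.

```python
import random


def ranks(values):
    """Rank positions 0..n-1 (ties are not expected for continuous draws)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def spearman(xs, ys):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mean = (len(xs) - 1) / 2  # mean of the ranks 0..n-1
    sxy = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    sxx = sum((a - mean) ** 2 for a in rx)
    syy = sum((b - mean) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=5000, seed=1):
    """Independent gamma draws for dN and dS, plus their ratio omega."""
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale); scale = 1 / rate.
    # Hypothetical parameters: mean dN = 0.006, mean dS = 0.02.
    dn = [rng.gammavariate(1.0, 0.006) for _ in range(n)]
    ds = [rng.gammavariate(2.0, 0.010) for _ in range(n)]
    omega = [a / b for a, b in zip(dn, ds)]
    return dn, ds, omega


dn, ds, omega = simulate()
print("rho(dN, dS)    =", round(spearman(dn, ds), 3))
print("rho(omega, dN) =", round(spearman(omega, dn), 3))
print("rho(omega, dS) =", round(spearman(omega, ds), 3))
```

Even though dN and dS are drawn independently here, ω correlates positively with dN and negatively with dS, simply because dS sits in the denominator.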