François Serra, Leonardo Arbiza, Joaquín Dopazo, Hernán Dopazo.
Abstract
Classically, the functional consequences of natural selection on genomes have been analyzed as the compound effects of individual genes. The current paradigm for large-scale analysis of adaptation is based on detecting significant deviations of the evolutionary rates of individual genes from neutral expectations. This approach, which assumes independence among genes, has not been able to identify biological functions significantly enriched in positively selected genes in individual species. Alternatively, pooling related species has enhanced the search for signatures of selection. However, pooling signatures does not allow testing for adaptive differences between species. Here we introduce Gene-Set Selection Analysis (GSSA), a new genome-wide approach to test for evidence of natural selection on functional modules. GSSA is able to detect lineage-specific changes in evolutionary rate in a notable number of functional modules. For example, in nine mammal and Drosophila genomes GSSA identifies hundreds of functional modules significantly associated with high and low rates of evolution. Many of the detected functional modules with high evolutionary rates have previously been identified as biological functions under positive selection. Notably, GSSA identifies conserved functional modules with many positively selected genes, which raises the question of whether these genes are selected exclusively to adapt genomes to environmental change. Our results agree with previous studies suggesting that adaptation requires positive selection, but that not every mutation under positive selection contributes to the adaptive dynamics of species evolution.
Year: 2011 PMID: 21390268 PMCID: PMC3048381 DOI: 10.1371/journal.pcbi.1001093
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Figure 1. Mammal and Drosophila phylogenies.
Numbers on internal and external nodes represent the median numbers of nonsynonymous and synonymous substitutions per codon (dN/dS) estimated from all the coding sequences compared in the mammal (A) and Drosophila (B) genomes. Branch lengths and rates were multiplied by 100. Ancestral parameters were estimated for primates (P), rodents (R), D. yakuba and D. erecta (Aye), D. simulans and D. sechellia (Ass), and D. melanogaster, D. simulans and D. sechellia (Amss). C. familiaris and D. ananassae were used as outgroup species in the respective trees.
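The statistic ω used throughout is the ratio dN/dS. As an illustration of summarizing per-gene rates on a branch, here is a minimal sketch, assuming per-gene dN and dS estimates for one branch and the ×100 rescaling mentioned above; the function name and all values are hypothetical. It also shows that the median of per-gene ratios (0.2 here) need not equal the ratio of the medians (0.16), so the choice of summary matters.

```python
from statistics import median


def branch_summary(dn, ds, scale=100):
    """Median dN and dS across genes (times `scale`) and median per-gene omega.

    dn, ds: per-gene rate estimates for one branch, in the same gene order.
    Genes with dS == 0 are skipped when computing omega = dN/dS.
    """
    omegas = [n / s for n, s in zip(dn, ds) if s > 0]
    return median(dn) * scale, median(ds) * scale, median(omegas)


# Hypothetical per-gene estimates for one branch (illustrative only).
dn = [0.002, 0.010, 0.004, 0.030, 0.001]
ds = [0.020, 0.050, 0.010, 0.060, 0.025]
med_dn, med_ds, med_omega = branch_summary(dn, ds)
print(med_dn, med_ds, med_omega)
print(median(dn) / median(ds))  # ratio of medians, generally != median of ratios
```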
Figure 2. Summary of the steps of GSSA.
GSSA can be roughly described as a series of five steps (S1 to S5). S1: rank the genes of a genome according to an evolutionary variable; S2: assign functional classes to all the listed genes; S3: apply a fixed number of partitions to the ranked list; S4: perform a Fisher's exact test (FET) for each partition; S5: adjust p-values for the false discovery rate (FDR). See text for a full description. Colored boxes (red, orange, cyan and blue) represent functional modules whose genes are significantly accumulated (at 0.1% and 5% FDR) at the corresponding extreme of the list (top or bottom), and which therefore have significantly high (SH) or low (SL) values of the evolutionary variable (ω), respectively. White represents a non-significant association (NS). The examples show five GO categories with significant and non-significant distributions of the ω statistic; the total number of genes annotated to each GO term is shown in parentheses. For GO1, the function appears to be uncorrelated with the ordering of the genes: in this example (GO:0007517), partition 16 in human (not shown in the picture) gave the lowest p-value (p = 0.011), but it was not significant after FDR correction (FDR = 0.065). The upper (A) and lower (B) parts of the ranked list (S3) represent the two sides of the specified partition. The remaining GO categories (GO2 to GO5) show dark dots associated with values at the top (significantly high ω values, SHω; GO2 and GO3) or at the bottom (significantly low ω values, SLω; GO4 and GO5) of the list. In these examples, the FETs found the most significant p-values at partitions 8, 14, 22 and 27 for GO:0007186, GO:0009566, GO:0050658 and GO:0022618 in the chimpanzee, human, mouse and rat genomes, respectively.
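The five steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the number of partitions (30) is a hypothetical default, the FET is two-sided with the direction (SH or SL) read from which side of the cut holds the larger share of module genes, and S5 is approximated by applying the Benjamini-Hochberg procedure to each module's best p-value. All gene and module names are made up.

```python
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def prob(x):  # P(top-left cell == x) with all margins fixed
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # small relative tolerance so tables tied with the observed one are kept
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def scan_module(ranked_genes, module, n_partitions=30):
    """S3-S4 for one module: a FET at each cut of the ranked list.

    ranked_genes: gene IDs sorted from highest to lowest value (S1).
    module: set of gene IDs annotated to one GO term or pathway (S2).
    Returns (partition, p_value, direction) for the most significant cut.
    """
    n = len(ranked_genes)
    best = None
    for k in range(1, n_partitions):
        cut = round(k * n / n_partitions)
        if cut == 0 or cut == n:
            continue  # a cut must leave genes on both sides
        a = sum(g in module for g in ranked_genes[:cut])  # module, upper side
        c = sum(g in module for g in ranked_genes[cut:])  # module, lower side
        b, d = cut - a, (n - cut) - c                     # all other genes
        p = fisher_two_sided(a, b, c, d)
        direction = "SH" if a / cut > c / (n - cut) else "SL"
        if best is None or p < best[1]:
            best = (k, p, direction)
    return best


genes = [f"g{i}" for i in range(60)]  # hypothetical ranking, high -> low
modules = {
    "fast": {f"g{i}" for i in range(12)},           # clustered at the top
    "spread": {f"g{i}" for i in range(0, 60, 5)},   # evenly spaced
}
scans = {name: scan_module(genes, mod) for name, mod in modules.items()}
qvals = benjamini_hochberg([s[1] for s in scans.values()])  # S5 across modules
print(scans["fast"], qvals)
```

Taking each module's minimum p-value over partitions before FDR correction mirrors the caption's reporting of the most significant partition; a stricter analysis would also account for scanning many cut points.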
Numbers and percentages of functional modules with significant results after GSSA.
| Genomes | Variable | SH KEGG, n (%) | SH GO, n (%) | SL KEGG, n (%) | SL GO, n (%) |
|---|---|---|---|---|---|
| Mammals | dS | 15 (1.9) | 187 (3.3) | 12 (2.1) | 364 (6.5) |
| | dN | 145 (18.2) | 708 (12.6) | 230 (28.9) | 1,839 (32.9) |
| | ω | 123 (15.5) | 649 (11.6) | 206 (25.9) | 1,675 (30.0) |
| | Δω | 64 (8.0) | 421 (7.5) | 107 (13.4) | 818 (14.7) |
| Drosophila | dS | 18 (3.1) | 104 (1.5) | 26 (4.5) | 1,263 (18.9) |
| | dN | 31 (5.3) | 276 (4.1) | 26 (4.5) | 2,097 (31.5) |
| | ω | 15 (2.6) | 213 (4.1) | 24 (4.1) | 1,321 (19.8) |
| | Δω | 2 (0.3) | 143 (2.1) | 7 (1.2) | 184 (2.8) |
GO/KEGG terms tested: 1,394/199 in mammals and 1,331/116 in Drosophila. Percentages are given in parentheses.
* SH and SL: statistically significant high and low rates after GSSA (5% FDR).
Figure 3. GSSA of evolutionary variables.
The figure shows a selection of GO terms and KEGG pathways with significant and non-significant deviations after GSSA of evolutionary rates in mammal (A) and Drosophila (B) species. Colored boxes represent functional modules whose genes are significantly accumulated at the corresponding extremes of the ranked list, as explained in Figure 2. The number inside each box is the percentage of the functional module's genes (total given in parentheses) that contribute to its significance; values are reported for the first significant partition after FET and FDR correction. Topologies represent the phylogenetic relationships of the species.
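One plausible reading of the percentage inside each box is the share of a module's genes that lie on the significant side of the reported cut. A minimal sketch under that assumption (above the cut for SH, below it for SL); the function name, ranking and module are hypothetical:

```python
def contributing_percent(ranked_genes, module, cut, direction):
    """Share (%) of a module's genes on the side of `cut` that drives its
    SH or SL result.

    ASSUMPTION: for SH the contributing genes are those ranked above the cut
    (high values of the variable); for SL, those ranked below it.
    """
    present = [g for g in ranked_genes if g in module]  # module size in the list
    side = ranked_genes[:cut] if direction == "SH" else ranked_genes[cut:]
    hits = sum(g in module for g in side)
    return 100 * hits / len(present)


genes = [f"g{i}" for i in range(10)]  # hypothetical ranking, high -> low
module = {"g0", "g1", "g2", "g7"}     # hypothetical functional module
print(contributing_percent(genes, module, cut=3, direction="SH"))  # 75.0
```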
Functional enrichment results using gene-by-gene and gene-set approaches.
| Biological process | Functional category enriched by PSGs (Reference #) | GSSA results | |||||||
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | SHω | SLω | |
| Olfaction/Sensory perception of smell | H | Pr | Pr | H | |||||
| Chemosensory perception | H | Pr | H | ||||||
| G-protein-mediated signaling | H | H | Pr | H | |||||
| Proteolysis | Ds | M | |||||||
| Immune response | Pr | H, C | Ro | C | |||||
| Inflammatory response | Ro | H | |||||||
| Defense response | Ro | H | |||||||
| Response to wounding | Ro | H | |||||||
| T-cell-mediated immunity | Pr | M | |||||||
| Natural killer-cell-mediated immunity | Pr | R | |||||||
| B-cell- and antibody-mediated immunity | Pr | M | |||||||
| Response to pest, pathogen, or parasite | H | C | |||||||
| Stress response | C | Ro | M | ||||||
| Cell surface receptor-mediated signal transduction | H | Pr | C | Dmel | |||||
| Cell adhesion | H | R | H | ||||||
| Signal transduction/intracellular signaling cascade | H, C | Pr | Ds | H | |||||
| Ion transport | H | H | Ds | H | |||||
| Potassium ion transport | Pr | H | |||||||
| Protein transport | H | Ds | H | ||||||
| Protein metabolism & modification | H, C | C | Ds | H | |||||
| Nervous system development | Ds | H | |||||||
| Organ development | Ds | H | |||||||
| Post-embryonic development | Ds | M | |||||||
| Cell proliferation and differentiation | C | Ds | H | ||||||
| Inhibition of apoptosis | Pr | H | |||||||
| Transcription | H, C | C | Ds | H | |||||
The table lists selected biological functions reported as enriched in positively selected genes (PSGs) in references 1 to 7, together with the corresponding significant results of GSSA of ω values. References 1 to 7 correspond to citations 6, 7, CSAC, 4, 5, 9 and 8 in the manuscript, respectively. Abbreviations: SHω: statistically significant high ω values; SLω: statistically significant low ω values; H: H. sapiens; C: P. troglodytes; Pr: primates; M: M. musculus; R: R. norvegicus; Ro: rodents; Dmel: D. melanogaster; Dsim: D. simulans; Dsec: D. sechellia; Dyak: D. yakuba; Dere: D. erecta; Ds: Drosophila species.
* p < 0.05; ** p < 0.001. CSAC: Chimpanzee Sequencing and Analysis Consortium, Nature. 2005 vol. 437 (7055) pp. 69–87.
Figure 4. Positive selection and evolution of functional modules.
(A) Circles and triangles represent the median dN and dS values of KEGG pathways and GO terms (levels 6–7), respectively, in mammal and Drosophila species. Functional modules with SHω and SLω results after GSSA are shown in red and blue, respectively; modules without statistically significant differences are shown in gray. Yellow dots depict the median dS and dN values for H. sapiens (1), P. troglodytes (2), M. musculus (3), R. norvegicus (4), D. simulans (5), D. sechellia (6), D. melanogaster (7), D. yakuba (8) and D. erecta (9). (B) Here, circles and triangles represent the subset of modules in (A) that contain at least one PSG. Note that these modules span a wide range of dS and dN values and include functional categories with both significant (red/blue) and non-significant (gray) GSSA results (ω ratio).
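Panel A plots one point per functional module, and panel B keeps only the modules with at least one PSG. A minimal sketch of how such points could be computed, assuming per-gene dN and dS estimates for a single species and a given set of PSGs; the function name and toy values are hypothetical, not data from the paper:

```python
from statistics import median


def module_points(modules, dn, ds, psgs=frozenset(), require_psg=False):
    """Median (dS, dN) of each functional module for one species.

    modules: dict mapping module ID -> set of gene IDs
    dn, ds: dicts mapping gene ID -> rate estimate
    psgs: gene IDs flagged as positively selected
    require_psg: keep only modules with at least one PSG (as in panel B)
    """
    points = {}
    for mod, genes in modules.items():
        present = [g for g in genes if g in dn and g in ds]
        if not present:
            continue  # no rate estimates for this module
        if require_psg and not any(g in psgs for g in present):
            continue
        points[mod] = (median(ds[g] for g in present),
                       median(dn[g] for g in present))
    return points


# Hypothetical toy data.
modules = {"A": {"g1", "g2", "g3"}, "B": {"g4", "g5"}}
dn = {"g1": 0.1, "g2": 0.2, "g3": 0.3, "g4": 0.05, "g5": 0.07}
ds = {"g1": 1.0, "g2": 2.0, "g3": 3.0, "g4": 0.5, "g5": 0.7}
print(module_points(modules, dn, ds))                                   # panel A
print(module_points(modules, dn, ds, psgs={"g2"}, require_psg=True))   # panel B
```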