Jordi Planas, Josep M Serrat.
Abstract
Assessing the contribution of promoters and coding sequences to gene evolution is an important step toward discovering the major genetic determinants of human evolution. Many specific examples have revealed the evolutionary importance of cis-regulatory regions. However, the relative contribution of regulatory and coding regions to the evolutionary process and whether systemic factors differentially influence their evolution remain unclear. To address these questions, we carried out an analysis at the genome scale to identify signatures of positive selection in human proximal promoters. Next, we examined whether genes with positively selected promoters (Prom+ genes) show systemic differences with respect to a set of genes with positively selected protein-coding regions (Cod+ genes). We found that the number of genes in each set was not significantly different (8.1% and 8.5%, respectively). Furthermore, a functional analysis showed that, in both cases, positive selection affects almost all biological processes and only a few genes of each group are located in enriched categories, indicating that promoters and coding regions are not evolutionarily specialized with respect to gene function. On the other hand, we show that the topology of the human protein network has a different influence on the molecular evolution of proximal promoters and coding regions. Notably, Prom+ genes have an unexpectedly high centrality when compared with a reference distribution (P=0.008, for Eigenvalue centrality). Moreover, the frequency of Prom+ genes increases from the periphery to the center of the protein network (P=0.02, for the logistic regression coefficient). This means that gene centrality does not constrain the evolution of proximal promoters, unlike the case with coding regions, and further indicates that the evolution of proximal promoters is more efficient in the center of the protein network than in the periphery. 
These results show that proximal promoters have had a systemic contribution to human evolution by increasing the participation of central genes in the evolutionary process.
Year: 2010 PMID: 20628608 PMCID: PMC2900212 DOI: 10.1371/journal.pone.0011476
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Functional analysis of the Prom+ genes, Cod+ genes and genes showing signatures of positive selection either in the proximal promoter or the coding region.
| Positive genes |
| PANTHER Ontology terms | n | PANTHER Ontology terms | n | PANTHER Ontology terms | n |
|
| Protein metabolism | 83 | Muscle contraction | 8 | Phosphate metabolism | 12 |
| Other metabolism | 22 | Sensory perception | 11 | Protein metabolism | 140 |
| Phosphate metabolism | 5 | Carbohydrate metabolism | 34 |
|
| Cell communication | 13 | m-RNA transcription | 55 |
| m-RNA transcription | 23 | Signal transduction | 119 |
| Signal transduction | 54 | Cell communication | 39 |
Categories with significantly more (enrichment) or fewer (impoverishment) genes than expected (P<0.05, Hypergeometric test).
Prom+ genes, n = 477.
Cod+ genes, n = 406.
Genes showing signatures of positive selection either in the proximal promoter or the coding region (Positive genes), n = 871.
*P<0.01.
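The enrichment and impoverishment calls in this table rest on a hypergeometric test. The following is a minimal sketch of such a test, not the authors' code. Only the set size of 477 comes from the footnotes above; the background size, the category size and the observed count are hypothetical values chosen for illustration, and the function names are invented here.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k): k category genes among n genes drawn without
    replacement from a background of N genes, K of them in the category."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def enrichment_p(k, N, K, n):
    """Upper tail P(X >= k): chance of at least k category genes."""
    return sum(hypergeom_pmf(i, N, K, n) for i in range(k, min(K, n) + 1))

def impoverishment_p(k, N, K, n):
    """Lower tail P(X <= k): chance of at most k category genes."""
    lowest = max(0, n - (N - K))  # smallest count that is possible at all
    return sum(hypergeom_pmf(i, N, K, n) for i in range(lowest, k + 1))

# Hypothetical inputs: background size, category size and observed count
# are assumptions; only n_selected = 477 is taken from the footnotes.
N_background = 5000
K_category = 600
n_selected = 477
k_observed = 80

expected = n_selected * K_category / N_background
p_enriched = enrichment_p(k_observed, N_background, K_category, n_selected)
p_impoverished = impoverishment_p(k_observed, N_background, K_category, n_selected)
print(f"expected {expected:.1f}, observed {k_observed}")
print(f"P(enrichment) = {p_enriched:.3g}, P(impoverishment) = {p_impoverished:.3g}")
```

A category would be reported as enriched or impoverished when the corresponding tail probability falls below the chosen threshold (0.05 in the table footnote).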
Functional classification of the genes showing signatures of positive selection both in the proximal promoter and the coding region.
| Biological process | Gene symbol |
| Cell proliferation and differentiation | ANP32B, DSTYK, PIK3R2, NCAN, GEMIN4, DKK2, CCDC134 |
| Development | NEIL3, CHORDC1, DKK2, CCDC65, NCAN, UBP1 |
| Nucleoside, nucleotide and nucleic acid metabolism | NEIL3, EME1, TRUB1, SFRS14, UBP1 |
| Signal transduction | DSTYK, PIK3R2, DKK2, CCDC134 |
| Protein metabolism | ANP32B, CHORDC1, PSMC3, DSTYK |
| Cell cycle | EME1, ANP32B, SMEK2, PSMC3 |
*PANTHER Ontology terms containing more than two genes with signals of positive selection both in the promoter and the coding region.
Figure 1. Distributions of the centrality parameters of the IntAct network proteins.
A. Distribution of the degree. The degree of a node, also known as connectivity, is the number of its interacting partners. It is a local measure of centrality, which means that it is not affected by the topology of other regions in the network. B. Distribution of the betweenness. Betweenness centrality is a parameter that is roughly defined as the number of shortest paths between pairs of nodes in the network that pass through a given node and is interpreted as a measure of the importance of a node for the flow of information through the network. As the betweenness distribution contains zeros, we have added 1 unit to the betweenness values to be able to plot the distribution in log scale. C. Distribution of the ASPL. The ASPL value of a node is the average shortest path length between the node and all other nodes in the network and can be interpreted as a measure of geometrical centrality. Notice that the more central a node, the smaller its ASPL. D. Distribution of the Eigenvalue centrality (EVC). The EVC of a node is its associated score in the eigenvector of the largest eigenvalue of the adjacency matrix; in a protein network with unweighted edges, nodes with high EVC are those that are connected to many nodes, which are, in turn, connected to many other nodes and so on. Prom: genes with positively selected proximal promoters (n = 188); Cod: genes with positively selected coding regions (n = 152); Prom Ref.: the reference set for Prom (n = 2219); Cod Ref.: the reference set for Cod (n = 1811). The open circle shows the mean of the distributions. *P<0.05, **P<0.01 and ***P<0.001.
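To make three of the centrality definitions in this legend concrete, here is a minimal sketch on a hypothetical five-protein network (the edge list is invented and is not IntAct data). It computes degree, ASPL by breadth-first search, and EVC by power iteration. Betweenness is omitted for brevity, and the power iteration uses a shifted matrix as a convenience, which is an assumption of this sketch and not something the source describes.

```python
from collections import deque

# Hypothetical undirected, unweighted interaction network (not IntAct data).
EDGES = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]

def build_adjacency(edges):
    """Map each node to the set of its interacting partners."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def degree(adj):
    """Degree: number of interacting partners of each node."""
    return {node: len(partners) for node, partners in adj.items()}

def aspl(adj):
    """Average shortest path length from each node to all other nodes.
    Uses breadth-first search and assumes a connected network."""
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        result[source] = sum(dist.values()) / (len(adj) - 1)
    return result

def evc(adj, iterations=200):
    """Leading eigenvector of the adjacency matrix by power iteration.
    Iterating with (A + I) keeps the same eigenvectors but avoids
    oscillation on bipartite networks. Scores are scaled so the maximum is 1."""
    score = {node: 1.0 for node in adj}
    for _ in range(iterations):
        new = {u: score[u] + sum(score[v] for v in adj[u]) for u in adj}
        top = max(new.values())
        score = {u: x / top for u, x in new.items()}
    return score

adj = build_adjacency(EDGES)
deg, path_len, eig = degree(adj), aspl(adj), evc(adj)
for node in sorted(adj):
    print(node, deg[node], round(path_len[node], 2), round(eig[node], 3))
```

In this toy network the hub A has the highest degree, the smallest ASPL and the highest EVC, while the leaf E sits at the other extreme on all three measures, which matches the legend's remark that more central nodes have smaller ASPL.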
Figure 2. Association between the frequency of positively selected genes and centrality.
A. Logistic regression between the frequency of Prom+ genes and Eigenvalue centrality (EVC). B. Logistic regression between the frequency of Cod+ genes and EVC. In both panels, the X axis values correspond to the upper interval quantile of EVC in log coordinates.
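A logistic regression of this kind models the probability that a gene is positively selected as a function of its log-scaled centrality. The sketch below fits a one-predictor model by Newton-Raphson on invented data: the predictor values stand in for the log EVC of each gene's bin and the 0/1 labels for positive selection. Both the data and the fitting routine are assumptions of this example, and the sketch only estimates the coefficients, not the P value reported for them.

```python
import math

def fit_logistic(x, y, iterations=25):
    """Fit P(y = 1) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # entries of X^T W X (negative Hessian)
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (X^T W X) * delta = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: log10(EVC) bin value per gene and whether the gene
# is positively selected (1) or not (0). Not taken from the source.
log_evc = [-4, -4, -4, -4, -3, -3, -3, -3, -2, -2, -2, -2, -1, -1, -1, -1]
selected = [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(log_evc, selected)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
```

A positive slope corresponds to a frequency of positively selected genes that rises toward the center of the network, which is the pattern the abstract reports for proximal promoters.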
Correlation between centrality and level of expression.
| | DATA1 | | | | | | DATA2 | | | | | |
| | IntAct proteins | | | | | | IntAct proteins | | | | | |
| | | p-value | | p-value | | p-value | | p-value | | p-value | | p-value |
| | 0.052 | | 0.041 | 0.5 | 0.035 | 0.6 | 0.048 | | 0.041 | 0.5 | 0.040 | 0.5 |
| | 0.058 | | 0.068 | 0.2 | 0.043 | 0.5 | 0.056 | | 0.073 | 0.2 | 0.051 | 0.4 |
| | −0.071 | | −0.14 | | −0.058 | 0.3 | −0.071 | | −0.15 | | −0.034 | 0.6 |
| | 0.071 | | 0.14 | | 0.062 | 0.3 | 0.071 | | 0.14 | | 0.042 | 0.5 |
IntAct proteins, n = 6099.
Prom+ genes, n = 164.
Cod+ genes, n = 142.
Using, for each gene, the highest expression value found across the tissues included in the E-GEOD-803 experiment.
Using, for each gene, the average of the expression values equal to or greater than the median across the tissues included in the E-GEOD-803 experiment.
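The two per-gene expression summaries described in these footnotes can be illustrated with a short sketch. The function names and the expression values are hypothetical, and the actual processing of the E-GEOD-803 data may have differed in details such as the handling of missing values.

```python
from statistics import mean, median

def highest_expression(values):
    """Highest expression value of a gene across the tissues."""
    return max(values)

def upper_half_mean(values):
    """Average of the expression values that are equal to or greater
    than the gene's median across the tissues."""
    m = median(values)
    return mean([v for v in values if v >= m])

# Hypothetical expression values of one gene in four tissues.
tissue_values = [10, 20, 30, 40]
print(highest_expression(tissue_values))  # largest of the four values
print(upper_half_mean(tissue_values))     # mean of the values at or above the median
```

Both summaries emphasize the tissues in which a gene is most active, so a gene expressed strongly in a few tissues is not averaged down by tissues where it is silent.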
Functional classification of the positively selected central genes.
| Biological process | Gene symbol |
| Cell proliferation and differentiation | NCK1, TRAF1, NFKBIA, NKX2-1, NEK6, CCDC85B, SSR1, SMNDC1 |
| Signal transduction | NCK1, MAP3K8, RANBP1, TRAF1, SSR1, NEK6, DAG1, NFKBIA |
| Protein metabolism | MAP3K8, RANBP1, PSMC4, SSR1, NEK6, PFDN1, CANX, LRPPRC |
| Transcription and m-RNA processing | LRPPRC, SMNDC1, NUDT21, NKX2-1, DDX5, CCDC85B, NFKBIA |
| Nucleoside, nucleotide and nucleic acid metabolism | CTPS, NFKBIA, DDX5, SMNDC1, NKX2-1, NUDT21 |
| Cell cycle | MAP3K8, PSMC4, PFDN1, NEK6, SSR1 |
| Intracellular protein traffic | RANBP1, NFKBIA, SSR1, CANX |
| Development | NCK1, VIM, NKX2-1 |
Positively selected central genes: the set of twenty genes containing the top ten central Prom genes and the top ten central Cod genes.
*PANTHER Ontology terms containing more than two genes with signals of positive selection.