Fernando Cruz, Julien Roux, Marc Robinson-Rechavi.
Abstract
BACKGROUND: The expansion of amino acid repeats is determined by a high mutation rate and can be increased or limited by selection. It has been suggested that recent expansions could be associated with the potential of adaptation to new environments. In this work, we quantify the strength of this association, as well as the contribution of potential confounding factors.
Year: 2009 PMID: 20021652 PMCID: PMC2806350 DOI: 10.1186/1471-2164-10-619
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Counts of AARs in Positively versus non-Positively Selected Genes in Mammals
| | RCGs | non-RCGs | Pure AARs | Impure AARs |
|---|---|---|---|---|
| PSGs | 19 | 381 | 26 | 8 |
| non-PSGs | 1207 | 14922 | 2021 | 2448 |
Counts of repeat-containing genes (RCGs), repeat-free genes (non-RCGs), and pure and impure amino acid repeats (AARs) in positively selected genes (PSGs) and non-positively selected genes (non-PSGs). These counts were used to perform two separate Fisher's exact tests.
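The two tests mentioned in the caption can be reproduced from the counts alone. The sketch below is an illustration, not the authors' code: it assumes the first data row holds the PSG counts and the second the non-PSG counts (the order given in the caption), that both tests are two-sided, and that one test compares RCGs with non-RCGs and the other pure with impure AARs. The function names are hypothetical.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def log_prob(x):
        # Log-probability that the top-left cell equals x, margins fixed.
        return (log_comb(row1, x) + log_comb(row2, col1 - x)
                - log_comb(total, col1))

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    observed = log_prob(a)
    p = sum(exp(log_prob(x)) for x in range(lowest, highest + 1)
            if log_prob(x) <= observed + 1e-7)  # small tolerance for ties
    return min(1.0, p)


def odds_ratio(a, b, c, d):
    """Sample odds ratio of the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


# Assumed layout: rows = (PSGs, non-PSGs).
genes = (19, 381, 1207, 14922)     # columns = (RCGs, non-RCGs)
repeats = (26, 8, 2021, 2448)      # columns = (pure AARs, impure AARs)

for name, table in (("RCG vs non-RCG", genes), ("pure vs impure", repeats)):
    print(name, "odds ratio:", round(odds_ratio(*table), 3),
          "p-value:", fisher_exact_two_sided(*table))
```

An odds ratio below 1 in the first comparison would indicate that repeat-containing genes are rarer among PSGs than among non-PSGs; an odds ratio above 1 in the second would indicate that the AARs found in PSGs are more often pure.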
Physicochemical Properties of the AARs in Positively Selected versus Non-Positively Selected Mammalian Genes
| | Acidic | Basic | Polar | Hydrophobic |
|---|---|---|---|---|
| PSGs | 10 (0.95) | 0 (-1.08) | 7 (-2.51) | 17 (3.23) |
| non-PSGs | 970 (-0.083) | 154 (0.094) | 2314 (0.22) | 1031 (-0.28) |
Counts of amino acid categories, using the Lehninger classification, for each AAR in PSGs and non-PSGs. Values in parentheses are the residuals for each cell obtained in a Pearson's χ² test.
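The residuals in parentheses follow from the counts: for each cell, the Pearson residual is (observed - expected) / sqrt(expected), where the expected count is row total × column total / grand total. A minimal sketch, assuming the first data row is the PSG class and the second the non-PSG class; the function name is hypothetical:

```python
from math import sqrt


def pearson_residuals(table):
    """Return (residuals, chi2) for a contingency table given as rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    residuals, chi2 = [], 0.0
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            residual = (observed - expected) / sqrt(expected)
            res_row.append(residual)
            chi2 += residual ** 2  # chi-square is the sum of squared residuals
        residuals.append(res_row)
    return residuals, chi2


# Columns: acidic, basic, polar, hydrophobic.
counts = [
    [10, 0, 7, 17],            # AARs in PSGs (assumed row order)
    [970, 154, 2314, 1031],    # AARs in non-PSGs
]

residuals, chi2 = pearson_residuals(counts)
for label, row in zip(("PSGs", "non-PSGs"), residuals):
    print(label, [round(r, 3) for r in row])
```

Rounded, these values agree with the residuals reported in the table, with the largest positive deviation in the hydrophobic column of the PSG row.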
ANOVA of Linear Model to Explain the Average Purity of the AARs in Positively Selected and Non-Positively Selected Genes
| Factor | Df | Sum Sq | Mean Sq | F value | Pr(>F) | Var. (%)⁶ |
|---|---|---|---|---|---|---|
| Residuals | 3616 | 20.5105 | 0.0057 | | | 97.351 |
| GC context¹ | 1 | 0.3351 | 0.3351 | 59.078 | | |
| Species² | 5 | 0.1154 | 0.0231 | 4.0684 | | |
| ω³ | 1 | 0.0872 | 0.0872 | 15.3805 | | |
| LRT⁴ | 1 | 0.0183 | 0.0183 | 3.2305 | 0.072362 | 0.087 |
| P. length (aa)⁵ | 1 | 0.002 | 0.002 | 0.3525 | 0.552754 | 0.009 |
| Total | 3625 | 21.0685 | 0.4714 | | | |

¹ GC content excluding the stretch containing AARs; ² species containing the AAR(s); ³ omega (dN/dS) of the most significant evolutionary model; ⁴ significant test for positive selection at any branch of the tree; ⁵ protein length in amino acids; ⁶ proportion of variance explained.
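The derived columns of this ANOVA table can be checked from the degrees of freedom and sums of squares. The sketch below assumes that the proportion of variance explained is each term's sum of squares divided by the total sum of squares, and that each F value is the term's mean square divided by the residual mean square; these are the standard definitions, but the source does not state them explicitly. Variable names are illustrative.

```python
# (degrees of freedom, sum of squares) per term, copied from the table above.
terms = {
    "GC context": (1, 0.3351),
    "Species": (5, 0.1154),
    "omega": (1, 0.0872),
    "LRT": (1, 0.0183),
    "P. length (aa)": (1, 0.0020),
}
residual_df, residual_ss = 3616, 20.5105

total_ss = residual_ss + sum(ss for _, ss in terms.values())
residual_ms = residual_ss / residual_df  # residual mean square

summary = {}
for name, (df, ss) in terms.items():
    mean_sq = ss / df
    summary[name] = {
        "F": mean_sq / residual_ms,        # F statistic of the term
        "var_pct": 100.0 * ss / total_ss,  # share of total variance
    }
residual_var_pct = 100.0 * residual_ss / total_ss

for name, row in summary.items():
    print(f"{name:15s} F = {row['F']:8.3f}  Var. (%) = {row['var_pct']:.3f}")
print(f"{'Residuals':15s} Var. (%) = {residual_var_pct:.3f}")
```

Small differences from the published F values are expected, because the sums of squares in the table are rounded to four decimals.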
ANOVA of Linear Model to Explain the Number of AARs in Positively Selected and Non-Positively Selected Mammalian Genes
| Factor | Df | Sum Sq | Mean Sq | F value | Pr(>F) | Var. (%)⁶ |
|---|---|---|---|---|---|---|
| Residuals | 82096 | 6806.8 | 0.1 | | | 96.879 |
| P. length (aa)¹ | 1 | 168.1 | 168.1 | 2027.45 | | |
| GC context² | 1 | 48.9 | 48.9 | 590.12 | | |
| LRT³ | 1 | 1.4 | 1.4 | 16.3141 | | |
| Species⁴ | 5 | 0.8 | 0.2 | 1.9078 | 0.0894 | 0.011 |
| ω⁵ | 1 | 0.048 | 0.04798 | 0.5787 | 0.4468 | 0.001 |
| Total | 82105 | 7026.01 | 218.748 | | | |

¹ Protein length in amino acids; ² GC content excluding the stretch containing AARs; ³ significant test for positive selection at any branch of the tree; ⁴ species containing the AAR(s); ⁵ dN/dS of the most significant evolutionary model; ⁶ proportion of variance explained.
Figure 1. Influence of GC content at 3rd codon positions (GC3). GC3 is the GC content at 3rd codon positions in the sequence context of the repeats. (A) Positive correlation, with least-squares regression line, between GC3 and purity in orthologous mammalian exons; (B) average GC3 of impure and pure AARs in orthologous mammalian exons (p < 2.16·10⁻¹⁶; Welch's t-test); (C) positive correlation, with least-squares regression line, between GC3 and purity in mammalian genomes; (D) average GC3 of impure and pure AARs in mammalian genomes (p < 2.16·10⁻¹⁶; Welch's t-test).
ANOVA of a Linear Model to Explain the Expression Level of Human Genes in the Brain
| Factor | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| P. length (aa)¹ | 1 | 2.5 | 2.5 | 0.6648 | 0.4151 |
| GC context² | 1 | 0.1 | 0.1 | 0.0178 | 0.894 |
| N° AARs³ | 1 | 0.1 | 0.1 | 0.0226 | 0.8805 |
| AARs +30 nt⁴ | 1 | 1 | 1 | 0.2669 | 0.6055 |
| AARs +60 nt⁵ | 1 | 1.3 | 1.3 | 0.3386 | 0.5608 |
| AARs +90 nt⁶ | 1 | 5.5 | 5.5 | 1.4469 | 0.2293 |
| Non-synonymous rate⁷ | 1 | 10.1 | 10.1 | 2.6413 | 0.1045 |
| Average purity⁸ | 1 | 0.4 | 0.4 | 0.114 | 0.7357 |
| Residuals | 893 | 3416.8 | 3.8 | | |

¹ Protein length in amino acids; ² GC content excluding the stretch containing AARs; ³ number of AARs; ⁴⁻⁶ number of AARs within a window of +30 nt (4), +60 nt (5) and +90 nt (6) from the translation start; ⁷ non-synonymous substitution rate; ⁸ average purity of the AARs.
Enrichment of Molecular Functions of Genes containing AARs
| GO.ID | Term¹ | Corrected p-value² |
|---|---|---|
| GO:0050825 | **ice binding** | < 1E-26 |
| GO:0003677 | **DNA binding** | 4.01E-15 |
| GO:0003700 | **transcription factor activity** | 1.26E-13 |
| GO:0043565 | **sequence-specific DNA binding** | 5.79E-13 |
| GO:0005199 | structural constituent of cell wall | 1.00E-08 |
| GO:0004879 | **ligand-dependent nuclear receptor activity** | 3.15E-07 |
| GO:0003682 | **chromatin binding** | 2.54E-06 |
| GO:0003723 | RNA binding | 7.63E-05 |
| GO:0008270 | zinc ion binding | 0.000303826 |
| GO:0004969 | histamine receptor activity | 0.0008013 |
| GO:0045735 | nutrient reservoir activity | 0.0008013 |
| GO:0003702 | RNA polymerase II transcription factor activity | 0.001116964 |
| GO:0003676 | nucleic acid binding | 0.001580342 |
| GO:0003705 | RNA polymerase II transcription factor activity, enhancer binding | 0.009862154 |
| GO:0003735 | structural constituent of ribosome | 0.02671 |
| GO:0005249 | voltage-gated potassium channel activity | 0.049858667 |
| GO:0004386 | helicase activity | 0.065105625 |
| GO:0016563 | transcription activator activity | 0.13355 |
| GO:0003714 | transcription corepressor activity | 0.13355 |
| GO:0005179 | hormone activity | 0.199622105 |

¹ Terms in bold are also overrepresented among the genes hosting AARs with the highest average purity; ² FDR < 20%.
Percentage of Explained Variance of the Number of Amino Acid Repeats
| Factor | Pr(>F) | Var. (%) |
|---|---|---|
| | <2.20E-16 | 5.869336006 |
| P. length | <2.20E-16 | 2.718369933 |
| | <2.20E-16 | 1.965991088 |
| | <2.20E-16 | 1.544242393 |
| GC context | <2.20E-16 | 0.754548334 |
| | <2.20E-16 | 0.597911216 |
| | <2.20E-16 | 0.575348528 |
| | <2.20E-16 | 0.554521432 |
| | <2.20E-16 | 0.553219739 |
| | <2.20E-16 | 0.547145169 |
| | <2.20E-16 | 0.488135064 |
| | 2.33E-12 | 0.348853859 |
| | 3.01E-09 | 0.249491255 |
| | 1.70E-07 | 0.193952332 |
| | 1.25E-06 | 0.166616768 |
| | 3.29E-06 | 0.153165936 |
| | 3.63E-06 | 0.151864242 |
| | 6.19E-06 | 0.144921877 |
| | 0.0004664 | 0.086779567 |
| | 0.0054142 | 0.054671127 |
| ω | 0.0134962 | 0.043389783 |
| RNA polymerase II transcription factor activity | 0.0240022 | 0.03601352 |
| helicase activity | 0.1667501 | 0.013450833 |
| zinc ion binding | 0.198911 | 0.011715242 |
| transcription activator activity | 0.4614908 | 0.003905081 |

GO terms in italics remain significant after Bonferroni correction. Functions in bold are enriched in pure AARs.