Jan Freudenberg, Peter K Gregersen, Yun Freudenberg-Hua.
Abstract
To measure the strength of natural selection acting on single nucleotide variants (SNVs) in a set of human genes, we calculate the ratio between nonsynonymous SNVs (nsSNVs) per nonsynonymous site and synonymous SNVs (sSNVs) per synonymous site. We transform this ratio with a correction factor f that accounts for the bias of synonymous sites towards transitions in the genetic code and for the different mutation rates of transitions and transversions. This method approximates the relative density of nsSNVs (rdnsv) in comparison with the neutral expectation, as inferred from the density of sSNVs. Using SNVs from a diploid genome and 200 exomes, we apply our method to immune system genes (ISGs), nervous system genes (NSGs), randomly sampled genes (RSGs), and gene ontology annotated genes. The estimate of rdnsv in an individual exome is around 20% for NSGs and 30–40% for ISGs and RSGs. The smaller rdnsv of NSGs indicates overall stronger purifying selection. To quantify the relative shift of nsSNVs towards rare variants, we next fit a linear regression model to the estimates of rdnsv over different SNV allele frequency bins. The resulting regression models show a negative slope for NSGs, ISGs and RSGs, supporting an influence of purifying selection on the frequency spectrum of segregating nsSNVs. The y-intercept of the model predicts rdnsv for an allele frequency close to 0. Given the normalization applied to the observed nsSNV to sSNV ratio, this parameter can be interpreted as the proportion of nonsynonymous sites where mutations are tolerated to segregate with an allele frequency notably greater than 0 in the population. NSGs show a smaller y-intercept, indicating more nonsynonymous sites under strong negative selection. This predicts more monogenically inherited or de novo mutation diseases that affect the nervous system.
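The two computational steps in the abstract, the normalized nsSNV/sSNV ratio and the linear fit of rdnsv over allele frequency bins, can be sketched in Python as follows. This is a minimal sketch, not the authors' code: the names `rdnsv` and `fit_line` are hypothetical, site counts are taken as given inputs (the abstract does not say how nonsynonymous and synonymous sites are counted), and the correction factor `f` is assumed to act as a plain multiplier, since the abstract only says the ratio is "transformed" with f.

```python
from typing import Sequence, Tuple


def rdnsv(ns_snvs: int, ns_sites: float, s_snvs: int, s_sites: float, f: float = 1.0) -> float:
    """Relative density of nonsynonymous SNVs.

    (nsSNVs per nonsynonymous site) / (sSNVs per synonymous site), scaled by f.
    ASSUMPTION: f is applied multiplicatively; its exact form is not given here.
    """
    if ns_sites <= 0 or s_sites <= 0 or s_snvs <= 0:
        raise ValueError("site counts and the sSNV count must be positive")
    return f * (ns_snvs / ns_sites) / (s_snvs / s_sites)


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a + b * x; returns (intercept a, slope b).

    x holds the allele-frequency bin positions (the scale is the caller's choice),
    y holds the rdnsv estimate for each bin.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope
```

For example, with hypothetical counts of 30 nsSNVs on 300 nonsynonymous sites and 20 sSNVs on 100 synonymous sites, `rdnsv(30, 300, 20, 100)` gives 0.5 before correction (f = 1). A negative slope from `fit_line` corresponds to the shift of nsSNVs towards rare variants described above, and the intercept to the value predicted at the lowest end of the frequency scale.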
Year: 2012 PMID: 22701602 PMCID: PMC3368947 DOI: 10.1371/journal.pone.0038087
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Relative density of nsSNVs (rdnsv) in different gene sets as estimated with different SNV datasets.
Nervous system genes (NSG, light grey) show a smaller rdnsv than immune system genes (ISG, medium grey) or randomly sampled genes (RSG, dark grey) in a European diploid genome sequence (A, C) and in a pooled set of 200 European exome sequences (B, D). The greater rdnsv in the pooled 200 exomes than in the individual genome indicates an enrichment of nsSNVs among rare SNVs.
Relative density of nonsynonymous variants (rdnsv).
| | RSG | expression ISG | expression NSG | keyword ISG | keyword NSG |
|---|---|---|---|---|---|
| Genes (n) | 1496 | 1496 | 1496 | 1040 | 1523 |
| A) Overall rdnsv | | | | | |
| Diploid genome | 0.38 [0.31–0.39] | 0.31 | 0.20 | 0.41 | 0.21 |
| 200 exomes | 0.42 [0.39–0.45] | 0.39 | 0.33 | 0.47 | 0.31 |
| B) Regression over allele frequency bins (200 exomes) | | | | | |
| rdnsv0 | 0.58 [0.53–0.65] | 0.58 | 0.45 | 0.58 | 0.43 |
| rdnsv1 | 0.25 [0.21–0.29] | 0.21 | 0.17 | 0.37 | 0.15 |
Candidate genes for the nervous system (NSG) and the immune system (ISG) are defined by tissue-specific expression or by keyword search and compared with a set of randomly sampled genes (RSG). A) Overall rdnsv estimates for a diploid genome and 200 exome sequences, which reflect the density of nonsynonymous variants in a mixture of SNVs ranging from rare to common in population frequency. B) SNVs from the 200-exome dataset are additionally stratified by derived allele frequency, and a regression model is fitted to the values of rdnsv. The predicted value for an allele frequency of 0 is referred to as rdnsv0, and the predicted value for an allele frequency of 1 as rdnsv1. The interval in brackets shows the 2.5% and 97.5% quantiles from 10,000 random draws of genes.
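The bracketed intervals come from repeatedly drawing random gene sets. A hedged sketch of that resampling idea: draw many gene sets of fixed size, compute pooled rdnsv for each, and report the 2.5% and 97.5% quantiles. The helper names `draw_interval` and `pooled_rdnsv`, the per-gene record layout, sampling without replacement, the nearest-rank quantile rule, and f as a multiplier are all assumptions; the caption does not specify them.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Hypothetical per-gene record: (nsSNVs, nonsynonymous sites, sSNVs, synonymous sites)
Gene = Tuple[int, float, int, float]


def pooled_rdnsv(genes: List[Gene], f: float = 1.0) -> float:
    """rdnsv of a gene set, pooling SNV and site counts over all genes (f assumed multiplicative)."""
    ns = sum(g[0] for g in genes)
    ns_sites = sum(g[1] for g in genes)
    s = sum(g[2] for g in genes)
    s_sites = sum(g[3] for g in genes)
    return f * (ns / ns_sites) / (s / s_sites)


def draw_interval(
    pool: Sequence[T],
    set_size: int,
    statistic: Callable[[List[T]], float],
    n_draws: int = 10_000,
    seed: int = 0,
) -> Tuple[float, float]:
    """2.5% and 97.5% quantiles of `statistic` over random gene sets drawn from `pool`."""
    rng = random.Random(seed)  # seeded so the interval is reproducible
    genes = list(pool)
    # Each draw samples `set_size` distinct genes (without replacement: an assumption).
    values = sorted(statistic(rng.sample(genes, set_size)) for _ in range(n_draws))

    def quantile(q: float) -> float:
        # Nearest-rank style: the sorted value at position round(q * (n - 1)).
        return values[round(q * (len(values) - 1))]

    return quantile(0.025), quantile(0.975)
```

To mirror a column of the table, `set_size` would be the size of the gene set being compared and `pool` the genes available for random sampling; both choices are left to the caller here.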
Figure 2. Distribution of rdnsv estimates over 200 individual exomes.
A) Expression-based candidate genes and B) keyword-based candidate genes. The value of rdnsv is estimated separately for each of the 200 exomes and is consistently smaller for NSGs (light grey) than for ISGs (medium grey). In addition, expression-based ISGs show smaller rdnsv estimates than keyword-based ISGs. No difference is seen between expression-based NSGs and keyword-based NSGs.
Figure 3. Estimates of rdnsv over different allele frequency bins.
The estimates of rdnsv decrease with SNV allele frequency in all gene categories. The slope of the fitted regression model can be interpreted as a measure of the influence of purifying selection on segregating nsSNVs. The y-intercept (rdnsv0) can be interpreted as the proportion of nonsynonymous sites where mutations are tolerated to segregate with an allele frequency notably greater than 0. A) Expression-based NSGs (circles), ISGs (triangles) and RSGs (crosses). The fitted models are rdnsv(NSG) = 0.45−0.061×; rdnsv(ISG) = 0.58−0.079× and rdnsv(RSG) = 0.58−0.071×. B) Keyword-based NSGs (blue) and ISGs (red). The fitted models are rdnsv(NSG) = 0.43−0.061×; rdnsv(ISG) = 0.58−0.045×.
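The fitted models above lost the operand after "×" in extraction, so the predictor scale is not stated. As a purely arithmetic cross-check, each model can be solved for the predictor value at which it reaches the value the rdnsv table reports for an allele frequency of 1 (0.25, 0.21, 0.17, 0.37 and 0.15 for RSG, expression ISG, expression NSG, keyword ISG and keyword NSG). The names `MODELS` and `implied_x` are hypothetical; the numbers are copied from this caption and that table.

```python
import math

# (intercept, slope) from the Figure 3 caption, plus the rdnsv value the
# table reports for an allele frequency of 1, for the same gene set.
MODELS = {
    "expression NSG": (0.45, -0.061, 0.17),
    "expression ISG": (0.58, -0.079, 0.21),
    "RSG": (0.58, -0.071, 0.25),
    "keyword NSG": (0.43, -0.061, 0.15),
    "keyword ISG": (0.58, -0.045, 0.37),
}


def implied_x(intercept: float, slope: float, target: float) -> float:
    """Solve intercept + slope * x = target for x."""
    return (target - intercept) / slope


if __name__ == "__main__":
    for name, (a, b, r1) in MODELS.items():
        x = implied_x(a, b, r1)
        print(f"{name:15s} x = {x:.2f}  (x - ln 100 = {x - math.log(100):+.2f})")
```

All five implied values fall between about 4.59 and 4.69, close to ln(100) ≈ 4.61. That is compatible with a natural-log frequency scale, but it is only an inference from coefficients rounded to two or three digits, not something the caption states.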
Estimates of rdnsv in the 200 exomes for gene sets defined by gene ontology (GO) annotations.
| Gene Ontology (GO) category | genes | nsSNVs | sSNVs | mean rdnsv | sd rdnsv | rdnsv0 | rdnsv1 |
|---|---|---|---|---|---|---|---|
| A) 10 categories with the smallest mean rdnsv | | | | | | | |
| GO:0007169:transmembrane_receptor_tyrosine_kinase_signaling | 313 | 365 | 636 | 0.121 | 0.011 | 0.385 | 0.072 |
| GO:0007167:enzyme_linked_receptor_protein_signaling_pathway | 394 | 466 | 771 | 0.128 | 0.01 | 0.397 | 0.095 |
| GO:0006935:chemotaxis | 337 | 405 | 669 | 0.136 | 0.01 | 0.401 | 0.094 |
| GO:0048812:neuron_projection_morphogenesis | 326 | 445 | 703 | 0.139 | 0.011 | 0.396 | 0.11 |
| GO:0007409:axonogenesis | 303 | 420 | 664 | 0.141 | 0.012 | 0.397 | 0.11 |
| GO:0031175:neuron_projection_development | 368 | 510 | 795 | 0.143 | 0.011 | 0.4 | 0.119 |
| GO:0048667:cell_morphogenesis_involved_in_neuron_differentiation | 322 | 446 | 694 | 0.146 | 0.012 | 0.397 | 0.12 |
| GO:0043005:neuron_projection | 411 | 587 | 831 | 0.15 | 0.01 | 0.484 | 0.1 |
| GO:0045202:synapse | 326 | 477 | 633 | 0.151 | 0.013 | 0.544 | 0.089 |
| GO:0048666:neuron_development | 436 | 578 | 890 | 0.153 | 0.011 | 0.399 | 0.126 |
| B) 10 categories with the greatest mean rdnsv | | | | | | | |
| GO:0005815:microtubule_organizing_center | 277 | 542 | 484 | 0.344 | 0.025 | 0.5 | 0.399 |
| GO:0005576:extracellular_region | 1322 | 2281 | 1948 | 0.345 | 0.012 | 0.644 | 0.341 |
| GO:0006952:defense_response | 525 | 764 | 691 | 0.351 | 0.022 | 0.556 | 0.345 |
| GO:0006955:immune_response | 500 | 696 | 625 | 0.357 | 0.023 | 0.533 | 0.393 |
| GO:0004871:signal_transducer_activity | 1170 | 2350 | 1994 | 0.4 | 0.016 | 0.586 | 0.369 |
| GO:0004872:receptor_activity | 1233 | 2710 | 2218 | 0.406 | 0.015 | 0.615 | 0.368 |
| GO:0038023:signaling_receptor_activity | 913 | 2046 | 1583 | 0.456 | 0.019 | 0.624 | 0.43 |
| GO:0004888:transmembrane_signaling_receptor_activity | 844 | 1991 | 1489 | 0.466 | 0.02 | 0.654 | 0.437 |
| GO:0004930:G-protein_coupled_receptor_activity | 604 | 1469 | 890 | 0.603 | 0.029 | 0.741 | 0.573 |
| GO:0004984:olfactory_receptor_activity | 310 | 1024 | 430 | 0.896 | 0.048 | 1.011 | 0.993 |
The 10 GO categories with the smallest (A) and the greatest (B) mean values of rdnsv are shown. The full list of all GO categories with at least 1000 coding SNVs is given in Table S3. For each category, the table lists the number of annotated genes, the numbers of nonsynonymous and synonymous SNVs, the mean and standard deviation of individual rdnsv estimates across the 200 exomes, and the values of rdnsv0 and rdnsv1.
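The "mean rdnsv" and "sd rdnsv" columns summarize one rdnsv estimate per exome. A minimal sketch of that summary, assuming that the site counts are fixed for a gene set and only the per-exome SNV counts vary, that f acts as a multiplier, and that the standard deviation uses the sample (n − 1) denominator. None of these details are stated in the caption, and `per_exome_rdnsv` and `summarize` are hypothetical names.

```python
import statistics
from typing import List, Sequence, Tuple


def per_exome_rdnsv(
    counts: Sequence[Tuple[int, int]], ns_sites: float, s_sites: float, f: float = 1.0
) -> List[float]:
    """One rdnsv value per exome from its (nsSNV, sSNV) counts in a single gene set."""
    return [f * (ns / ns_sites) / (s / s_sites) for ns, s in counts]


def summarize(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation (n - 1 denominator: an assumption)."""
    return statistics.mean(values), statistics.stdev(values)
```

With 200 exomes, `counts` would hold 200 pairs, and `summarize` would return the two numbers reported per GO category.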