Daniel R Schrider, Andrew D Kern.
Abstract
Identifying the complete set of functional elements within the human genome would be a windfall for multiple areas of biological research, including medicine, molecular biology, and evolution. Complete knowledge of function would aid in the prioritization of loci when searching for the genetic bases of disease or adaptive phenotypes. Because mutations that disrupt function are disfavored by natural selection, purifying selection leaves a detectable signature within functional elements; accordingly, this signal has been exploited for over a decade through genomic comparisons of distantly related species. However, the functional complement of the genome changes extensively across time and between lineages; therefore, evidence of the current action of purifying selection in humans is essential. Because the removal of deleterious mutations by natural selection also reduces within-species genetic diversity within functional loci, dense population genetic data have the potential to reveal genomic elements that are currently functional. Here, we assess the potential of this approach by examining an ultradeep sample of human mitochondrial genomes (n = 16,411). We show that the high density of polymorphism in this data set precisely delineates regions experiencing purifying selection. Furthermore, we show that the number of segregating alleles at a site is strongly correlated with its divergence across species after accounting for known mutational biases in human mitochondrial DNA (ρ = 0.51; P < 2.2 × 10⁻¹⁶). These two measures track one another at a remarkably fine scale across many loci, a correlation that is purely the result of natural selection. Our results demonstrate that genetic variation has the potential to reveal with surprising precision which regions in the genome are currently performing important functions and are likely to have deleterious fitness effects when mutated.
As more complete human genomes are sequenced, similar power to reveal purifying selection may be achievable in the human nuclear genome.
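The per-site quantity used in the abstract, the number of segregating alleles at a site, can be illustrated with a minimal Python sketch. It assumes the genomes are already aligned to a common coordinate system and that only A, C, G, and T count as alleles; the function name `alleles_per_site` and the toy alignment are hypothetical and are not taken from the study.

```python
def alleles_per_site(sequences, valid="ACGT"):
    """Return the number of distinct alleles observed at each alignment column.

    Assumption: `sequences` are equal-length aligned strings; characters
    outside `valid` (gaps, N, ambiguity codes) are ignored.
    """
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must be aligned to the same length")
    allowed = set(valid)
    counts = []
    for column in zip(*sequences):
        observed = {base.upper() for base in column} & allowed
        counts.append(len(observed))
    return counts


# Toy alignment (made-up data for illustration only).
toy = ["ACGTA", "ACGTC", "ATGTG", "ACGTT"]
print(alleles_per_site(toy))  # [1, 2, 1, 1, 4]
```

A site with a count of 1 is monomorphic in the sample; counts of 2 to 4 indicate polymorphism with that many segregating alleles.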
Keywords: mitochondria; natural selection; population genetics
Year: 2014 PMID: 24916662 PMCID: PMC4122919 DOI: 10.1093/gbe/evu116
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Figure. The correlation between polymorphism and divergence in the human mitochondrial genome. The average number of alleles per base pair in 10-bp windows is shown on the x-axis, and divergence, measured as the negated phyloP score, is shown on the y-axis.
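The comparison described in this caption can be sketched in Python as follows. This is only an illustration under stated assumptions: the windows are taken to be consecutive and non-overlapping (the caption does not say), the helper names `window_means`, `average_ranks`, and `spearman_rho` are hypothetical, and Spearman's ρ is computed as the Pearson correlation of tie-averaged ranks rather than with whatever software the authors used.

```python
def window_means(values, size=10):
    """Average values in consecutive, non-overlapping windows of `size` sites.

    Assumption: the final window may be shorter than `size`.
    """
    means = []
    for start in range(0, len(values), size):
        chunk = values[start:start + size]
        means.append(sum(chunk) / len(chunk))
    return means


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one sample is constant")
    return cov / (var_x * var_y) ** 0.5


# Made-up per-site values for illustration only.
alleles = [1, 1, 2, 1, 3, 2, 1, 1, 4, 2, 1, 1]
neg_phylop = [-5.0, -4.2, -1.0, -3.9, 0.4, -0.8, -4.5, -4.8, 1.2, -0.5, -3.0, -4.1]
print(spearman_rho(window_means(alleles, 4), window_means(neg_phylop, 4)))
```

Averaging in windows before ranking smooths per-site noise; applying `spearman_rho` directly to the per-site lists gives the unsmoothed version of the same comparison.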
Gene-Specific Correlations Between SNP Density and Negative phyloP Score
| Gene Description | Gene Start (hg19) | Gene End (hg19) | Spearman's ρ | P value | Spearman's ρ | P value |
|---|---|---|---|---|---|---|
| tRNA phenylalanine | 579 | 649 | −0.484 | 1.90 × 10⁻⁵ | −0.468948 | 3.71 × 10⁻⁵ |
| 12S ribosomal RNA | 650 | 1603 | −0.429 | <2.20 × 10⁻¹⁶ | −0.4169046 | <2.20 × 10⁻¹⁶ |
| tRNA valine | 1604 | 1672 | −0.336 | 0.004783 | −0.326022 | 0.006261 |
| 16S ribosomal RNA | 1673 | 3230 | −0.443 | <2.20 × 10⁻¹⁶ | −0.4435798 | <2.20 × 10⁻¹⁶ |
| tRNA leucine 1 | 3231 | 3305 | −0.301 | 0.008593 | −0.29837 | 0.00932 |
| NADH dehydrogenase subunit 1 | 3308 | 4263 | −0.542 | <2.20 × 10⁻¹⁶ | −0.5149936 | <2.20 × 10⁻¹⁶ |
| tRNA isoleucine | 4264 | 4332 | −0.269 | 0.02551 | −0.2523745 | 0.03643 |
| tRNA glutamine | 4330 | 4401 | −0.463 | 4.19 × 10⁻⁵ | −0.476385 | 2.34 × 10⁻⁵ |
| tRNA methionine | 4403 | 4470 | −0.291 | 0.01611 | −0.2880727 | 0.01721 |
| NADH dehydrogenase subunit 2 | 4471 | 5512 | −0.516 | <2.20 × 10⁻¹⁶ | −0.461188 | <2.20 × 10⁻¹⁶ |
| tRNA tryptophan | 5513 | 5580 | −0.509 | 9.42 × 10⁻⁶ | −0.4626812 | 7.11 × 10⁻⁵ |
| tRNA alanine | 5588 | 5656 | −0.429 | 0.0002381 | −0.4412621 | 0.0001475 |
| tRNA asparagine | 5658 | 5730 | −0.378 | 0.0009899 | −0.3765626 | 0.001025 |
| tRNA cysteine | 5762 | 5827 | −0.596 | 1.26 × 10⁻⁷ | −0.5709837 | 5.55 × 10⁻⁷ |
| tRNA tyrosine | 5827 | 5892 | −0.352 | 0.00371 | −0.3486535 | 0.004118 |
| Cytochrome c oxidase subunit I | 5905 | 7446 | −0.633 | <2.20 × 10⁻¹⁶ | −0.6186776 | <2.20 × 10⁻¹⁶ |
| tRNA serine 1 | 7447 | 7515 | −0.627 | 8.31 × 10⁻⁹ | −0.6107229 | 2.51 × 10⁻⁸ |
| tRNA aspartic acid | 7519 | 7586 | −0.365 | 0.002186 | −0.320253 | 0.007758 |
| Cytochrome c oxidase subunit II | 7587 | 8270 | −0.573 | <2.20 × 10⁻¹⁶ | −0.5291891 | <2.20 × 10⁻¹⁶ |
| tRNA lysine | 8296 | 8365 | −0.331 | 0.005158 | −0.2809919 | 0.01846 |
| ATP synthase F0 subunit 8 | 8367 | 8573 | −0.266 | 0.0001042 | −0.2637231 | 0.0001233 |
| ATP synthase F0 subunit 6 | 8528 | 9208 | −0.389 | <2.20 × 10⁻¹⁶ | −0.3823366 | <2.20 × 10⁻¹⁶ |
| Cytochrome c oxidase subunit III | 9208 | 9991 | −0.538 | <2.20 × 10⁻¹⁶ | −0.5314281 | <2.20 × 10⁻¹⁶ |
| tRNA glycine | 9992 | 10059 | −0.396 | 0.0008191 | −0.3404812 | 0.004497 |
| NADH dehydrogenase subunit 3 | 10060 | 10405 | −0.510 | <2.20 × 10⁻¹⁶ | −0.5000739 | <2.20 × 10⁻¹⁶ |
| tRNA arginine | 10406 | 10470 | −0.470 | 7.66 × 10⁻⁵ | −0.4415206 | 0.0002317 |
| NADH dehydrogenase subunit 4L | 10471 | 10767 | −0.491 | <2.20 × 10⁻¹⁶ | −0.5133884 | <2.20 × 10⁻¹⁶ |
| NADH dehydrogenase subunit 4 | 10761 | 12138 | −0.561 | <2.20 × 10⁻¹⁶ | −0.5419698 | <2.20 × 10⁻¹⁶ |
| tRNA histidine | 12139 | 12207 | −0.277 | 0.02112 | −0.1786413 | 0.1419 |
| tRNA serine 2 | 12208 | 12266 | −0.470 | 0.0001712 | −0.489146 | 8.45 × 10⁻⁵ |
| tRNA leucine 2 | 12267 | 12337 | −0.374 | 0.001311 | −0.3543069 | 0.002434 |
| NADH dehydrogenase subunit 5 | 12338 | 14149 | −0.532 | <2.20 × 10⁻¹⁶ | −0.5141183 | <2.20 × 10⁻¹⁶ |
| NADH dehydrogenase subunit 6 | 14150 | 14674 | −0.500 | <2.20 × 10⁻¹⁶ | −0.4776221 | <2.20 × 10⁻¹⁶ |
| tRNA glutamic acid | 14675 | 14743 | −0.347 | 0.003468 | −0.3476731 | 0.003421 |
| Cytochrome b | 14748 | 15888 | −0.551 | <2.20 × 10⁻¹⁶ | −0.5025513 | <2.20 × 10⁻¹⁶ |
| tRNA threonine | 15889 | 15954 | −0.560 | 1.00 × 10⁻⁶ | −0.4354723 | 0.0002578 |
| tRNA proline | 15957 | 16024 | −0.103 | 0.4044 | −0.1870294 | 0.1267 |
Figure. The correlation between polymorphism and divergence at nonsynonymous and synonymous sites. The number of alleles observed at each site is shown on the x-axis, and divergence (negative phyloP score) is shown on the y-axis at (A) second codon position (nonsynonymous) sites and (B) 4-fold degenerate (synonymous) sites. We added noise to the number of alleles to reveal the density of sites along the y-axis.
Figure. The probability of polymorphism versus the probability of unconstrained evolution across vertebrates. (A) SNP density (blue) and one minus the probability of conservation across vertebrates (red) according to phastCons (Siepel et al. 2005), shown in 5-bp sliding genomic windows across the phenylalanine tRNA gene. (B) The same plot for the tryptophan tRNA gene.
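The two tracks plotted in this figure can be approximated with a short Python sketch. It assumes that SNP density is the fraction of polymorphic sites in each window and that the window advances one site at a time; neither detail is stated in the caption. The function names are hypothetical, and the per-site conservation probabilities would come from phastCons output rather than being computed here.

```python
def sliding_window_means(values, size=5):
    """Mean of each overlapping window of `size` sites, stepping one site at a time."""
    if size < 1 or size > len(values):
        raise ValueError("window size must be between 1 and len(values)")
    total = sum(values[:size])
    means = [total / size]
    for i in range(size, len(values)):
        # Slide the window: add the entering site, drop the leaving site.
        total += values[i] - values[i - size]
        means.append(total / size)
    return means


def snp_density_track(is_polymorphic, size=5):
    """Fraction of polymorphic sites in each sliding window (assumed definition)."""
    return sliding_window_means([1 if p else 0 for p in is_polymorphic], size)


def unconstrained_track(conservation_prob, size=5):
    """Sliding-window mean of one minus the per-site conservation probability."""
    return sliding_window_means([1.0 - p for p in conservation_prob], size)


# Made-up per-site inputs for illustration only.
polymorphic = [True, False, True, True, False, False, False, True]
conservation = [0.1, 0.9, 0.2, 0.1, 0.95, 0.99, 0.97, 0.3]
print(snp_density_track(polymorphic))
print(unconstrained_track(conservation))
```

Plotting both lists against window position gives two curves on a common 0 to 1 scale, which is what makes them directly comparable in the figure.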
Figure. Comparison of conserved elements called from phylogenetic data and those called from population genetic data. This image from the UCSC Genome Browser shows positions 1–5,000 of the human mitochondrial genome. Conserved elements called from the polymorphism-based HMM (mitoPopCons) appear in blue, whereas phastCons elements obtained from a comparison of mammalian genomes appear in red. phastCons conservation probabilities are shown at the bottom in green. Gene locations are shown at the top.