Erli Pang, Xiaomei Wu, Kui Lin.
Abstract
Protein evolution plays an important role in the evolution of each genome. Because proteins are functional molecules, their parts and sites are generally under different selective constraints, particularly from purifying selection. Most previous studies of protein evolution considered individual proteins in their entirety or compared protein-coding sequences with non-coding sequences. Less attention has been paid to how the different parts within each protein of a given genome evolve. To address this, we used the PfamA annotation of all human proteins to split each protein sequence into two parts: domains and unassigned regions. Single nucleotide polymorphisms (SNPs) in protein-coding sequences from the 1000 Genomes Project were then mapped to two classes: SNPs occurring within protein domains and SNPs occurring within unassigned regions. With this classification, we found that the density of synonymous SNPs is significantly greater within domains than within unassigned regions, whereas the density of non-synonymous SNPs shows the opposite pattern. We also found signatures of purifying selection on both domains and unassigned regions, and the strength of selection on domains is significantly greater than that on unassigned regions. In addition, among all human protein sequences, there are 117 PfamA domains in which no SNPs are found. Our results highlight an important aspect of protein domains and may contribute to our understanding of protein evolution.
Keywords: Human genome; Natural selection; Protein domain; Protein-coding sequence; SNPs
Year: 2016 PMID: 26833483 PMCID: PMC4875946 DOI: 10.1007/s00438-016-1170-7
Source DB: PubMed Journal: Mol Genet Genomics ISSN: 1617-4623 Impact factor: 3.291
Summary of polymorphisms and divergence
| Category | Rare (MAF < 0.5 %) | Low (0.5 % ≤ MAF ≤ 5 %) | Common (MAF > 5 %) |
|---|---|---|---|
| Polymorphism (19,909 genes) | | | |
| Non-synonymous SNPs | | | |
| Domains | 101,551 | 13,172 | 6,965 |
| Unassigned regions | 135,585 | 22,134 | 12,078 |
| Synonymous SNPs | | | |
| Domains | 68,916 | 15,092 | 10,063 |
| Unassigned regions | 77,722 | 18,160 | 11,388 |
| Divergence (fixed; 15,649 genes; totals, not split by MAF) | | | |
| Non-synonymous changes | | | |
| Domains | 10,153 | | |
| Unassigned regions | 21,810 | | |
| Synonymous changes | | | |
| Domains | 18,988 | | |
| Unassigned regions | 23,626 | | |
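The MAF classes in the table header can be written as a small binning function, and the polymorphism counts can be compared across regions. The sketch below is illustrative only: the names `maf_class`, `COUNTS`, and `nonsyn_to_syn_ratio` are hypothetical, MAF is assumed to be given as a fraction rather than a percentage, and the ratio is a raw count ratio, not the length-normalised SNP density reported in the abstract.

```python
def maf_class(maf: float) -> str:
    """Bin a minor allele frequency (fraction, 0 to 0.5) into the table's classes."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("MAF must lie between 0 and 0.5")
    if maf < 0.005:      # MAF < 0.5 %
        return "rare"
    if maf <= 0.05:      # 0.5 % <= MAF <= 5 %
        return "low"
    return "common"      # MAF > 5 %


# Polymorphism counts copied from the table (19,909 genes).
COUNTS = {
    ("domains", "nonsyn"): {"rare": 101_551, "low": 13_172, "common": 6_965},
    ("unassigned", "nonsyn"): {"rare": 135_585, "low": 22_134, "common": 12_078},
    ("domains", "syn"): {"rare": 68_916, "low": 15_092, "common": 10_063},
    ("unassigned", "syn"): {"rare": 77_722, "low": 18_160, "common": 11_388},
}


def nonsyn_to_syn_ratio(region: str, cls: str) -> float:
    """Ratio of non-synonymous to synonymous SNP counts for one region and MAF class."""
    return COUNTS[(region, "nonsyn")][cls] / COUNTS[(region, "syn")][cls]


if __name__ == "__main__":
    for cls in ("rare", "low", "common"):
        print(cls,
              round(nonsyn_to_syn_ratio("domains", cls), 2),
              round(nonsyn_to_syn_ratio("unassigned", cls), 2))
```

In every MAF class the count ratio is lower in domains than in unassigned regions, which is in line with the stronger constraint on domains described in the abstract.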
Direction of selection for domains and unassigned regions
| Type of regions in 15,649 genes | Non-synonymous SNPs | Synonymous SNPs |
|---|---|---|
| Domain regions | | |
| Fixed divergence | 10,153 | 18,988 |
| Polymorphisms | 104,956 | 81,593 |
| Direction of selection | −0.21 | |
| Unassigned regions | | |
| Fixed divergence | 21,810 | 23,626 |
| Polymorphisms | 153,022 | 96,960 |
| Direction of selection | −0.13 | |
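Direction of selection: $D_n/(D_n + D_s) - P_n/(P_n + P_s)$, where $D_n$ and $D_s$ are the numbers of fixed non-synonymous and synonymous changes, and $P_n$ and $P_s$ are the numbers of non-synonymous and synonymous polymorphisms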
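As a check on the formula, the direction of selection can be recomputed from the divergence and polymorphism counts in the table above. This is a minimal sketch, not the authors' code; the function name `direction_of_selection` is hypothetical.

```python
def direction_of_selection(dn: int, ds: int, pn: int, ps: int) -> float:
    """Return Dn/(Dn + Ds) - Pn/(Pn + Ps).

    dn, ds: fixed non-synonymous and synonymous changes (divergence)
    pn, ps: non-synonymous and synonymous polymorphisms
    """
    return dn / (dn + ds) - pn / (pn + ps)


# Counts copied from the table (15,649 genes).
domains = direction_of_selection(dn=10_153, ds=18_988, pn=104_956, ps=81_593)
unassigned = direction_of_selection(dn=21_810, ds=23_626, pn=153_022, ps=96_960)

print(round(domains, 2))     # -0.21
print(round(unassigned, 2))  # -0.13
```

Both values are negative and match the tabulated −0.21 and −0.13; the more negative value for domains corresponds to the stronger purifying selection reported for them.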
Fig. 1 Distribution of fitness of non-synonymous SNPs in domains and unassigned regions. Error bars denote SE around estimated proportions
Fig. 2 Density of SNPs in domains and unassigned regions. a Density of non-synonymous and synonymous SNPs (Fisher’s exact test: ρ = 0.90, p < 2.2 × 10⁻¹⁶ and ρ = 1.14, p < 2.2 × 10⁻¹⁶, respectively). b Density of non-synonymous SNPs by MAF class (Fisher’s exact test: ρ = 0.94, p < 2.2 × 10⁻¹⁶; ρ = 0.75, p < 2.2 × 10⁻¹⁶; and ρ = 0.73, p < 2.2 × 10⁻¹⁶, respectively). c Density of synonymous SNPs by MAF class (Fisher’s exact test: ρ = 1.14, p < 2.2 × 10⁻¹⁶; ρ = 1.07, p < 3.27 × 10⁻¹⁰; and ρ = 1.14, p < 2.2 × 10⁻¹⁶, respectively)
Fig. 3 Distribution of non-synonymous and synonymous substitution rates of fixed mutations in domains and unassigned regions
Annotation of domains without any variation
| Pfam accession | Average length | Frequency of occurrences | Category ID^a, category name^b |
|---|---|---|---|
| PF00220 | 9 | 2 | GO:0005185, neurohypophyseal hormone activity |
| PF00416 | 98.5 | 2 | GO:0003723, RNA binding |
| PF00714 | 138 | 1 | GO:0005133, interferon-gamma receptor binding |
| PF00833 | 122 | 2 | GO:0003735, structural constituent of ribosome |
| PF01192 | 53 | 3 | GO:0003899, DNA-directed RNA polymerase activity |
| PF01200 | 69 | 1 | GO:0003735, structural constituent of ribosome |
| PF01472 | 76.7 | 3 | GO:0003723, RNA binding |
| PF01648 | 113 | 3 | GO:0000287, magnesium ion binding |
| PF01918 | 65.5 | 4 | GO:0003676, nucleic acid binding |
| PF02045 | 57 | 2 | GO:0003700, sequence-specific DNA binding transcription factor activity |
| PF02229 | 56 | 6 | GO:0003677, DNA binding |
| PF02935 | 60.7 | 3 | GO:0004129, cytochrome-c oxidase activity |
| PF02938 | 97 | 1 | GO:0005524, ATP binding |
| PF03002 | 18 | 4 | GO:0005179, hormone activity |
| PF04272 | 52 | 1 | GO:0042030, ATPase inhibitor activity |
| PF04376 | 79 | 5 | GO:0004057, arginyltransferase activity |
| PF05366 | 31 | 3 | GO:0030234, enzyme regulator activity |
| PF05495 | 74 | 4 | GO:0008270, zinc ion binding |
| PF09282 | 26.3 | 3 | GO:0005515, protein binding |
| PF10576 | 17 | 3 | GO:0051539, 4 iron, 4 sulfur cluster binding |
| PF11411 | 36 | 3 | GO:0003910, DNA ligase (ATP) activity |
| PF11547 | 53 | 3 | GO:0043130, ubiquitin binding |
| PF11803 | 46 | 6 | GO:0048040, UDP-glucuronate decarboxylase activity |
| PF12125 | 40 | 9 | GO:0046983, protein dimerization activity |
| PF13014 | 38 | 1 | GO:0003723, RNA binding |
^a ID of the Gene Ontology “molecular function” term (from Pfam 27.0)
^b Name of the Gene Ontology “molecular function” term
Spearman’s ρ and p value for the correlation between the number of SNPs in each MAF class and protein length
| SNP categories | Spearman’s ρ, p (rare) | Spearman’s ρ, p (low) | Spearman’s ρ, p (common) |
|---|---|---|---|
| Non-synonymous SNPs | 0.83, <2.2 × 10⁻¹⁶ | 0.65, <2.2 × 10⁻¹⁶ | 0.43, <2.2 × 10⁻¹⁶ |
| Synonymous SNPs | 0.82, <2.2 × 10⁻¹⁶ | 0.70, <2.2 × 10⁻¹⁶ | 0.53, <2.2 × 10⁻¹⁶ |
Fig. 4 Distribution of domain lengths