Hafiz Ishfaq Ahmad1, Muhammad Jamil Ahmad1, Muhammad Muzammal Adeel2, Akhtar Rasool Asif3, Xiaoyong Du1,2.
Abstract
The rapid evolution of reproductive proteins may be driven by positive Darwinian selection. The bone morphogenetic protein (BMP) family is the largest within the transforming growth factor (TGF) superfamily. Little is known about the molecular evolution of the bone morphogenetic proteins that have a potential role in mammalian reproduction. In this study, we investigated mammalian bone morphogenetic proteins using maximum likelihood models of codon substitution to identify positive Darwinian selection in various species. The proportion of positively selected sites was tested under different likelihood models for individual codons, and M8 was found to be the best model. The percentage of positively selected sites under M8 is 2.20% with ω = 1.089 for BMP2, 1.6% with ω = 1.61 for BMP4, 0.53% with ω = 1.56 for BMP15, and 0.78% with ω = 1.93 for GDF9. The percentage of sites estimated to be under selection in M8 provides strong statistical support that the divergence of bone morphogenetic proteins is driven by Darwinian selection. Model M8 was found to be significant for all proteins, with ω > 1. To further test positive selection on particular amino acids, the evolutionary conservation of amino acids was measured based on the phylogenetic linkage among sequences. To explore the impact of somatic substitution mutations in the selected regions on human cancer, we identified one pathogenic mutation in human BMP4 and one in BMP15, possibly causing prostate cancer, and six neutral mutations in BMPs. The comprehensive map of selection results allows researchers to take systematic approaches to detecting the evolutionary footprints of selection on specific genes in specific species.
Keywords: BMPs; evolution; maximum likelihood; nucleotide substitutions; selection
Year: 2018 PMID: 29719616 PMCID: PMC5915083 DOI: 10.18632/oncotarget.24240
Source DB: PubMed Journal: Oncotarget ISSN: 1949-2553
Model parameter estimates, dN/dS ratios, log likelihood values and test statistics for PAML site models of positive selection in mammalian bone morphogenetic proteins
| Gene | n | Lc | S | dN/dS | Model | Parameter estimates | 2Δl (M2 vs. M1) | 2Δl (M8 vs. M7) | Positively selected sites |
|---|---|---|---|---|---|---|---|---|---|
| BMP2 | 39 | 398 | 3.5 | 0.08 | M1 | P1 = 0.92674, P2 = 0.07326; ω1 = 0.04427, ω2 = 1.00000 | 0 | 8.14* | 37, 38, 120, 126, 162, 183, 190, 239 |
| | | | | | M2 | P1 = 0.92674, P2 = 0.05049, P3 = 0.02277; ω1 = 0.04427, ω2 = 1.00000, ω3 = 1.00000 | | | |
| | | | | | M7 | p = 0.16499, q = 1.54264 | | | |
| | | | | | M8 | P0 = 0.97794, p = 0.21124, q = 2.75784 | | | |
| GDF9 | 33 | 457 | 8.1 | 0.31 | M1 | P1 = 0.59348, P2 = 0.40652; ω1 = 0.12471, ω2 = 1.00000 | 0.48 | 2.59 | 30, 186 |
| | | | | | M2 | P1 = 0.59353, P2 = 0.33608, P3 = 0.07040; ω1 = 0.12473, ω2 = 1.00000, ω3 = 1.00000 | | | |
| | | | | | M7 | p = 0.48463, q = 0.91798 | | | |
| | | | | | M8 | P0 = 0.99219, p = 0.49979, q = 0.97580 | | | |
| BMP4 | 29 | 570 | 4.3 | 0.09 | M1 | P1 = 0.93285, P2 = 0.06715; ω = 0.05872, ω1 = 1.00000 | 3.41 | 26.91** | |
| | | | | | M2 | P1 = 0.93307, P2 = 0.06381, P3 = 0.00312 | | | |
| | | | | | M7 | p = 0.23270, q = 1.77764 | | | |
| | | | | | M8 | P0 = 0.98531, p = 0.31720, q = 3.28217 | | | |
| BMP15 | 86 | 434 | 13.5 | 0.41 | M1 | P1 = 0.54292, P2 = 0.45708; ω1 = 0.17179, ω1 = 1.00000 | 8.4* | 17.69** | 31, 37, 89, 335 |
| | | | | | M2 | P1 = 0.53569, P2 = 0.44200, P3 = 0.02231 | | | |
| | | | | | M7 | p = 0.60414, q = 0.76298 | | | |
| | | | | | M8 | P0 = 0.94697, p = 0.66983, q = 0.97138 | | | |
| | | | | | | P1 = 0.21259, ω1 = 1.00000 | | | |
The data have n sequences, each of Lc codons after alignment gaps are removed. S is the tree length, measured as the number of nucleotide substitutions per codon. The table reports the proportion of sites under positive selection (p1) or under selective constraint (p0), and the parameters p and q of the beta distribution. Parameters indicating positive selection are in bold. Sites potentially under positive selection identified under model M8 are listed according to the human sequence numbering. Positively selected sites with posterior probability > 0.9 are underlined, 0.8–0.9 in bold, and 0.5–0.7 in plain text. The test statistic 2Δl is compared to a χ2 distribution with 2 degrees of freedom, with critical values 5.99, 9.21, and 13.82 at the 5%, 1%, and 0.1% significance levels, respectively. **: significant at 1% level; *: significant at 5% level.
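As a rough illustration of how the test statistic described in this footnote can be checked, the sketch below compares 2Δl with a χ2 distribution with 2 degrees of freedom, whose upper-tail probability has the closed form exp(−x/2). It also derives the percentage of sites in the extra M8 class as 1 − P0, which assumes the usual M8 parameterization in which P0 is the proportion of sites drawn from the beta distribution. The function names are illustrative only and are not part of PAML.

```python
import math

def lrt_p_value_df2(stat):
    """Upper-tail probability of a chi-square variable with 2 df at `stat`."""
    # With 2 degrees of freedom the survival function is exactly exp(-x / 2).
    return math.exp(-max(stat, 0.0) / 2.0)

def significance_mark(stat):
    """Return '**' (1% level), '*' (5% level) or '' for a 2*delta-l statistic."""
    p = lrt_p_value_df2(stat)
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""

def m8_selected_percent(p0):
    """Percentage of sites in the extra M8 class (assumed here to be 1 - P0)."""
    return 100.0 * (1.0 - p0)

# 2*delta-l (M8 vs. M7) and M8 P0 estimates copied from the table above.
m8_vs_m7 = {"BMP2": 8.14, "GDF9": 2.59, "BMP4": 26.91, "BMP15": 17.69}
m8_p0 = {"BMP2": 0.97794, "GDF9": 0.99219, "BMP4": 0.98531, "BMP15": 0.94697}

for gene, stat in m8_vs_m7.items():
    print(
        gene,
        f"2dl = {stat}{significance_mark(stat)}",
        f"p = {lrt_p_value_df2(stat):.2g}",
        f"extra-class sites = {m8_selected_percent(m8_p0[gene]):.2f}%",
    )
```

Using p-values rather than hard-coded critical values avoids rounding issues near the tabulated cutoffs (5.99 and 9.21), but the two approaches should agree for the statistics reported here.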
Positively selected sites under different PAML site models using Bayes empirical Bayes (BEB) analysis
| Gene | Model | Codon | Amino acid | Posterior probability | Post mean ± SE for ω |
|---|---|---|---|---|---|
| BMP-2 | M8: selection, beta + ω | 38 | S | 0.695 | 1.187 ± 0.532 |
| | | 41 | P | 0.632 | 1.114 ± 0.554 |
| | | 43 | S | 0.713 | 1.230 ± 0.472 |
| | | 118 | L | 0.597 | 1.079 ± 0.555 |
| | | 164 | N | 0.611 | 1.087 ± 0.569 |
| | | 236 | K | 0.607 | 1.115 ± 0.518 |
| GDF-9 | M8: selection, beta + ω | 186 | S | 0.585 | 1.225 ± 0.335 |
| | | 253 | L | 0.696 | 1.300 ± 0.309 |
| | | 290 | G | 0.832 | 1.395 ± 0.238 |
| | | 302 | V | 0.938* | 1.463 ± 0.148 |
| BMP-4 | M8: selection, beta + ω | 99 | I | 0.823 | 1.368 ± 0.311 |
| | | 100 | H | 0.827 | 1.370 ± 0.317 |
| | | 102 | T | 0.998** | 1.512 ± 0.123 |
| | | 173 | R | 0.506 | 1.075 ± 0.449 |
| | | 188 | A | 0.867 | 1.401 ± 0.309 |
| | | 190 | V | 0.986* | 1.503 ± 0.143 |
| | | 214 | T | 0.536 | 1.071 ± 0.488 |
| | | 264 | N | 0.515 | 1.073 ± 0.461 |
| BMP-15 | M8: selection, beta + ω | 22 | R | 0.590 | 1.239 ± 0.368 |
| | | 28 | G | 0.753 | 1.361 ± 0.332 |
| | | 80 | S | 0.544 | 1.198 ± 0.392 |
| | | 104 | V | 0.846 | 1.426 ± 0.285 |
| | | 127 | L | 0.514 | 1.393 ± 0.236 |
| | | 145 | R | 0.764 | 1.369 ± 0.322 |
| | | 160 | P | 0.615 | 1.255 ± 0.376 |
| | | 168 | E | 0.703 | 1.315 ± 0.291 |
| | | 169 | G | 0.759 | 1.365 ± 0.329 |
| | | 220 | L | 0.556 | 1.212 ± 0.373 |
| | | 273 | S | 0.547 | 1.198 ± 0.397 |
| | | 323 | T | 0.717 | 1.334 ± 0.339 |
Bayes empirical Bayes (BEB) results for model M8 (selection, beta + ω). Asterisks indicate posterior probability P > 95% (*) and P > 99% (**). For each codon position, the amino acid number is given, followed by the amino acid abbreviation.
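The thresholds in this footnote can be applied mechanically. The following is a minimal sketch that flags sites by BEB posterior probability, using only the BMP-4 rows of the table as input. The helper names `beb_mark` and `flagged_sites` are hypothetical and do not correspond to any PAML output format.

```python
# (codon, amino acid, BEB posterior probability) for BMP-4 under M8,
# copied from the table above.
bmp4_sites = [
    (99, "I", 0.823), (100, "H", 0.827), (102, "T", 0.998), (173, "R", 0.506),
    (188, "A", 0.867), (190, "V", 0.986), (214, "T", 0.536), (264, "N", 0.515),
]

def beb_mark(posterior):
    """Return '**' for P > 0.99, '*' for P > 0.95, otherwise ''."""
    if posterior > 0.99:
        return "**"
    if posterior > 0.95:
        return "*"
    return ""

def flagged_sites(sites, cutoff=0.95):
    """List sites whose posterior probability exceeds `cutoff`, e.g. 'T102**'."""
    return [
        f"{aa}{codon}{beb_mark(pp)}"
        for codon, aa, pp in sites
        if pp > cutoff
    ]

print(flagged_sites(bmp4_sites))  # ['T102**', 'V190*']
```

Lowering `cutoff` (for example to 0.8) gives a more permissive list of candidate sites at the cost of weaker support for each one.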
Figure 1. Amino acid residues identified as likely to be under positive selection by Bayes empirical Bayes (BEB) analysis
The amino acid sites with ω > 1 under the M8 model. The posterior probability of each site was calculated by BEB. Sites show positive selection at different posterior probabilities.
Figure 2. Location of positively selected amino acid sites identified in the BMP2, BMP4, BMP15 and GDF9 genes
Three-dimensional structure prediction of the BMPs and GDF9 was carried out using an ab initio modeling approach. Primary sequences of human BMP2 (ACV32596.1), BMP4 (AAH20546.1), BMP15 (AAI17265.1) and GDF9 (AAH96229.1) were submitted to I-TASSER to predict suitable structures. All predicted models were validated with the MolProbity server. To test the steric hindrance of amino acid residues, Ramachandran values were calculated using the Ramachandran Plot2.0 tool. UCSF Chimera was used for visualization and geometry optimization of the predicted proteins. All the residues identified as under selection fall in the domain containing the ligand binding site. One cluster of sites falls in the region identified as the ligand binding site, and another cluster lies in a region immediately following the signal sequence.
Pathogenicity of amino acid substitution mutations, predicted using Functional Analysis through Hidden Markov Models (FATHMM)
| Gene | Codon | SNP ID | Tissue distribution | FATHMM prediction (Functional Analysis through Hidden Markov Models) |
|---|---|---|---|---|
| BMP-2 | 38 | COSM1029277 | Endometrium(1)/Prostate(1) p.S38S | Neutral (score 0.15) |
| | 41 | | | |
| | 43 | | | |
| | 118 | | | |
| | 164 | | | |
| | 236 | | | |
| GDF-9 | 186 | | | |
| | 253 | | | |
| | 290 | | | |
| | 302 | | | |
| BMP-4 | 99 | | | |
| | 100 | XXX | Lung p.H100Y | Pathogenic (score 0.98) |
| | 102 | | | |
| | 173 | | | |
| | 188 | | | |
| | 190 | | | |
| | 214 | XXX | Hematopoietic and lymphoid tissue(1)/Prostate(1) p.T214T | Pathogenic (score 0.84) |
| | 264 | | | |
| BMP-15 | 22 | | | |
| | 28 | | | |
| | 80 | | | |
| | 104 | COSM4649428 | Large intestine(1) p.V104M | Neutral (score 0.03) |
| | 104 | COSM6187212 | Lung(1) p.V104A | Neutral (score 0.03) |
| | 127 | | | |
| | 145 | | | |
| | 160 | | | |
| | 168 | COSM385794 | Lung(1) p.E168K | Neutral (score 0.10) |
| | 169 | COSM3562207 | Skin(1) p.G169R | Neutral (score 0.01) |
| | 220 | | | |
| | 273 | | | |
| | 323 | COSM309487 | Lung(2) p.T323T | Neutral (score 0.01) |
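To summarize the predictions listed in this table, the sketch below tallies the FATHMM calls per gene. The rows are transcribed from the table as printed, and the tuple layout and the `tally` helper are illustrative assumptions rather than any FATHMM or COSMIC data format.

```python
from collections import Counter

# (gene, codon, mutation, FATHMM prediction, score), copied from the table above.
rows = [
    ("BMP-2", 38, "p.S38S", "Neutral", 0.15),
    ("BMP-4", 100, "p.H100Y", "Pathogenic", 0.98),
    ("BMP-4", 214, "p.T214T", "Pathogenic", 0.84),
    ("BMP-15", 104, "p.V104M", "Neutral", 0.03),
    ("BMP-15", 104, "p.V104A", "Neutral", 0.03),
    ("BMP-15", 168, "p.E168K", "Neutral", 0.10),
    ("BMP-15", 169, "p.G169R", "Neutral", 0.01),
    ("BMP-15", 323, "p.T323T", "Neutral", 0.01),
]

def tally(rows):
    """Count predictions per (gene, prediction) pair."""
    return Counter((gene, prediction) for gene, _, _, prediction, _ in rows)

counts = tally(rows)
for (gene, prediction), n in sorted(counts.items()):
    print(gene, prediction, n)
```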