Miguel Arenas.
Abstract
MOTIVATION: The evolutionary processes of mutation and recombination, upon which selection operates, are fundamental to understanding the observed molecular diversity. Unlike in nucleotide sequences, the estimation of the recombination rate in protein sequences has been little explored and has not been implemented in evolutionary frameworks, even though protein sequencing methods are widely used.
Year: 2021 PMID: 34450622 PMCID: PMC8696103 DOI: 10.1093/bioinformatics/btab617
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Accuracy of ProteinEvolverABC in the estimation of recombination and substitution rates under different evolutionary scenarios and based on ABC with 50 000 simulations. For each studied combination of ρ and θ (evolutionary scenario based on 100 simulations), the figure shows the estimates of ρ (above) and θ (below). The black bars indicate the true value. Light and dark grey bars correspond to the mode of the estimated posterior distributions (using the rejection and multiple linear regression approaches, respectively, both based on 50 000 simulations), and error bars indicate the 95% confidence interval.
Fig. 2. Accuracy of ProteinEvolverABC in the estimation of recombination and substitution rates present in coding data. For each studied combination of ρ and θ (evolutionary scenario based on 100 simulations), the figure shows the estimates of ρ (above) and θ (below). The black bars indicate the true value (recombination and substitution rates present in coding data). Light and dark grey bars correspond to the mode of the estimated posterior distributions (using the rejection and multiple linear regression approaches, respectively, both based on 50 000 simulations), and error bars indicate the 95% confidence interval.
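The rejection approach mentioned in the figure captions can be illustrated with a minimal toy sketch. This is not the ProteinEvolverABC implementation: the simulator (draws from a normal distribution), the single summary statistic (the sample mean), the uniform prior and all function names below are assumptions chosen only to show the general idea of keeping the simulations whose summary statistics fall closest to the observed ones.

```python
import random
import statistics


def simulate(theta, n, rng):
    """Toy simulator (assumption): n draws from Normal(theta, 1).

    It stands in for a real sequence simulator.
    """
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def summary(data):
    """Toy summary statistic (assumption): the sample mean."""
    return statistics.fmean(data)


def abc_rejection(observed, prior_low, prior_high, n_sims, keep_fraction, seed=0):
    """Keep the prior draws whose simulated summary is closest to the observed one."""
    rng = random.Random(seed)
    s_obs = summary(observed)
    draws = []
    for _ in range(n_sims):
        theta = rng.uniform(prior_low, prior_high)           # draw from the prior
        s_sim = summary(simulate(theta, len(observed), rng))  # simulate and summarize
        draws.append((abs(s_sim - s_obs), theta))             # distance to observed data
    draws.sort()
    n_keep = max(1, int(n_sims * keep_fraction))
    return [theta for _, theta in draws[:n_keep]]             # approximate posterior sample


# Hypothetical "observed" data generated with a known true value of 3.0.
observed = simulate(3.0, 50, random.Random(42))
posterior = abc_rejection(observed, 0.0, 10.0, n_sims=5000, keep_fraction=0.02)
```

The retained values form an approximate posterior sample. A regression approach, as named in the captions, would additionally adjust the retained values using the relationship between parameters and summary statistics; that step is not shown here.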
Recombination and substitution rates estimated with ProteinEvolverABC for the studied protein families
| Description of the protein family | PFAM code, number of sequences, sequence length | Mode | Mean | Median | 97.5% HPDI |
|---|---|---|---|---|---|
| Coronavirus small envelope protein E | PF02723, 27, 82 | (ρ) 48.68 | (ρ) 53.95 | (ρ) 48.79 | (ρ) 47.64–119.00 |
| | | (θ) | (θ) | (θ) | (θ) |
| Coronavirus 2′-O-methyltransferase | PF06460, 5, 299 | (ρ) | (ρ) | (ρ) | (ρ) |
| | | (θ) | (θ) | (θ) | (θ) |
| Coronavirus non-structural protein NS12.7 | PF04753, 9, 109 | (ρ) | (ρ) | (ρ) | (ρ) |
| | | (θ) | (θ) | (θ) | (θ) |
| Coronavirus replicase NSP7 | PF08716, 5, 83 | (ρ) | (ρ) | (ρ) | (ρ) |
| | | (θ) | (θ) | (θ) | (θ) |
| Coronavirus replicase NSP8 | PF08717, 5, 198 | (ρ) | (ρ) | (ρ) | (ρ) |
| | | (θ) | (θ) | (θ) | (θ) |
| Coronavirus RNA synthesis protein NSP10 | PF09401, 6, 122 | (ρ) | (ρ) | (ρ) | (ρ) |
| | | (θ) | (θ) | (θ) | (θ) |
| Betacoronavirus viroporin | PF11289, 5, 273 | (ρ) | (ρ) | (ρ) | (ρ) |
| | | (θ) | (θ) | (θ) | (θ) |
| Aspartate protease | PF09668, 10, 124 | (ρ) | (ρ) | (ρ) | (ρ) |
| | | (θ) | (θ) | (θ) | (θ) |
Note: The first column provides information about the protein family and the second column includes the PFAM code, number of sequences and sequence length in amino acids, respectively. For each parameter (recombination rate ρ or substitution rate θ), the table presents the mode, mean, median and 97.5% HPDI (highest posterior density interval) of the estimated posterior distribution. The full posterior distributions are shown in Supplementary Figures S12–S19, Supplementary Material.
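The summaries reported in the table (mode, mean, median and HPDI) can be computed from a sample of a posterior distribution. The sketch below is one common way to do so and is not taken from ProteinEvolverABC: the function names, the histogram-based mode with 20 bins, and the default interval mass are all assumptions made for illustration.

```python
import math
import statistics


def hpdi(samples, mass=0.95):
    """Shortest interval containing at least `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of samples the interval must cover
    # Slide a window of k consecutive sorted samples and keep the narrowest one.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]


def histogram_mode(samples, bins=20):
    """Approximate the mode as the centre of the fullest histogram bin."""
    lo, hi = min(samples), max(samples)
    if lo == hi:
        return lo
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in samples:
        idx = min(int((x - lo) / width), bins - 1)  # the maximum goes in the last bin
        counts[idx] += 1
    fullest = counts.index(max(counts))
    return lo + (fullest + 0.5) * width


def summarize(samples, mass=0.95):
    """Return the four summaries used in the table for one parameter."""
    return {
        "mode": histogram_mode(samples),
        "mean": statistics.fmean(samples),
        "median": statistics.median(samples),
        "hpdi": hpdi(samples, mass),
    }
```

For example, `summarize(posterior_rho, mass=0.975)` would return the four summaries for a hypothetical list `posterior_rho` of posterior draws of ρ. A kernel density estimate is a frequent alternative to the histogram for locating the mode.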