Ben Murrell, Joel O Wertheim, Sasha Moola, Thomas Weighill, Konrad Scheffler, Sergei L Kosakovsky Pond.
Abstract
The imprint of natural selection on protein coding genes is often difficult to identify because selection is frequently transient or episodic, i.e. it affects only a subset of lineages. Existing computational techniques, which are designed to identify sites subject to pervasive selection, may fail to recognize sites where selection is episodic: a large proportion of positively selected sites. We present a mixed effects model of evolution (MEME) that is capable of identifying instances of both episodic and pervasive positive selection at the level of an individual site. Using empirical and simulated data, we demonstrate the superior performance of MEME over older models under a broad range of scenarios. We find that episodic selection is widespread and conclude that the number of sites experiencing positive selection may have been vastly underestimated.
Year: 2012 PMID: 22807683 PMCID: PMC3395634 DOI: 10.1371/journal.pgen.1002764
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Figure 1. The standard random effects approach and samples.
A) The standard random effects approach, in which the rates vary randomly over sites but are constant over branches. Different values of ω are shown in different colors. B) Samples from our new random effects approach [20], used by MEME, in which the rate on each branch is drawn independently of the rate on any other branch. All possible assignments of rates to branches are considered.
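The idea in panel B can be illustrated with a minimal sketch. It assumes, purely for illustration, two rate classes per branch (labeled "-" and "+") and a hypothetical parameter `q_plus` for the probability that a branch falls in the "+" class; the function names are not from the paper. The sketch enumerates every assignment of classes to branches and weights each by its probability, which is the marginalization the caption describes. It does not compute phylogenetic likelihoods.

```python
from itertools import product


def enumerate_assignments(n_branches, q_plus):
    """Yield (assignment, probability) for every way of assigning a rate
    class to each branch, with classes drawn independently per branch.

    q_plus is a hypothetical name for the probability of the "+" class.
    """
    for assignment in product(("-", "+"), repeat=n_branches):
        k = assignment.count("+")
        prob = q_plus ** k * (1.0 - q_plus) ** (n_branches - k)
        yield assignment, prob


def prob_any_positive(n_branches, q_plus):
    """Probability that at least one branch is in the "+" class,
    obtained by summing over the enumerated assignments."""
    return sum(
        prob
        for assignment, prob in enumerate_assignments(n_branches, q_plus)
        if "+" in assignment
    )
```

For a tree with `n_branches` branches there are `2 ** n_branches` assignments, so brute-force enumeration like this is only practical for very small trees; it is meant to show what is being summed over, not how an implementation would do it efficiently.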
Comparative performance of FEL and MEME on simulated data where ω varies along phylogenetic lineages.
| | | Japanese encephalitis virus | | | Vertebrate rhodopsin | | | Camelid VHH | | |
| ω (negative selection) | Proportion of branches under positive selection | ω | ω | ω | ω | ω | ω | ω | ω | ω |
| 0 | 0.1 | 0.00 | 0.01 | 0.03 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 | 0.04 |
| 0 | 0.25 | 0.01 | 0.06 | 0.12 | 0.01 | 0.04 | 0.15 | 0.00 | 0.14 | 0.56 |
| 0 | 0.5 | 0.06 | 0.19 | 0.34 | 0.09 | 0.34 | 0.54 | 0.23 | 0.85 | 0.96 |
| 0.2 | 0.1 | 0.00 | 0.01 | 0.02 | 0.00 | 0.01 | 0.02 | 0.00 | 0.01 | 0.04 |
| 0.2 | 0.25 | 0.02 | 0.07 | 0.14 | 0.03 | 0.09 | 0.17 | 0.01 | 0.27 | 0.62 |
| 0.2 | 0.5 | 0.05 | 0.18 | 0.36 | 0.13 | 0.36 | 0.55 | 0.30 | 0.84 | 0.90 |
| 0.4 | 0.1 | 0.00 | 0.01 | 0.03 | 0.01 | 0.02 | 0.03 | 0.01 | 0.04 | 0.10 |
| 0.4 | 0.25 | 0.02 | 0.09 | 0.15 | 0.04 | 0.09 | 0.21 | 0.03 | 0.33 | 0.63 |
| 0.4 | 0.5 | 0.07 | 0.17 | 0.33 | 0.17 | 0.39 | 0.51 | 0.40 | 0.82 | 0.96 |
Power to detect sites under selection is reported for FEL and MEME (in boldface) for each unique combination of the negative selection, positive selection, and proportion of branches under positive selection parameters.
Comparative performance of MEME and FEL on 16 empirical alignments (see Results and Text S1 for an extended discussion of each individual case).
| Data set | N | S | Mean Div. | Classes of sites detected | | | | Mean | | Sites where MEME>FEL |
| | | | | M+F0 | M+F+ | M+F− | M−F+ | M+F0− | M+F+ | |
| Abalone sperm lysin | 25 | 134 | 0.43 | 17 | 9 | 0 | 1 (0.04/0.05) | 0.17 | 0.35 | 19 |
| Camelid VHH | 212 | 96 | 0.27 | 22 | 6 | 2 | 0 (n/a) | 0.11 | 0.50 | 26 |
| Diatom SIT | 97 | 300 | 0.54 | 12 | 0 | 36 | 0 (n/a) | 0.05 | n/a | 82 |
| Drosophila | 23 | 254 | 0.26 | 9 | 1 | 0 | 0 (n/a) | 0.09 | 0.19 | 7 |
| Echinoderm H3 | 37 | 111 | 0.33 | 0 | 0 | 1 | 0 (n/a) | 0.02 | n/a | 3 |
| Flavivirus NS5 | 18 | 342 | 0.48 | 3 | 0 | 1 | 0 (n/a) | 0.16 | n/a | 7 |
| Hepatitis D virus Ag | 33 | 196 | 0.29 | 13 | 7 | 0 | 1 (0.05/0.07) | 0.08 | 0.37 | 10 |
| HIV-1 | 476 | 335 | 0.08 | 12 | 10 | 7 | 0 (n/a) | 0.04 | 0.69 | 27 |
| HIV-1 | 29 | 192 | 0.08 | 5 | 2 | 0 | 7 (0.04/0.06) | 0.11 | 0.59 | 3 |
| IAV H3N2 HA | 349 | 329 | 0.04 | 7 | 11 | 2 | 3 (0.04/0.06) | 0.04 | 0.73 | 8 |
| JEV | 23 | 500 | 0.13 | 2 | 1 | 1 | 0 (n/a) | 0.11 | 1.00 | 3 |
| Mammalian | 17 | 144 | 0.38 | 10 | 2 | 0 | 0 (n/a) | 0.20 | 0.31 | 11 |
| Primate | 21 | 510 | 0.36 | 3 | 0 | 1 | 0 (n/a) | 0.18 | n/a | 4 |
| Salmonella | 42 | 353 | 0.04 | 1 | 0 | 0 | 0 (n/a) | 0.02 | n/a | 0 |
| Vertebrate rhodopsin | 38 | 330 | 0.34 | 13 | 1 | 5 | 0 (n/a) | 0.11 | 0.74 | 39 |
| West Nile virus NS3 | 19 | 619 | 0.13 | 1 | 1 | 0 | 0 (n/a) | 0.04 | 1.00 | 2 |
| Total/Mean | | | | 130 | 51 | 56 | 12 | 0.10 | 0.59 | |
N (S) reports the number of sequences (codons) in the alignment. M+ (M−) refers to sites found by MEME to be positively (negatively) selected. F+ (F−) denotes sites found by FEL to be positively (negatively) selected. F0 refers to sites that are classified as neutrally evolving by FEL. Values in parentheses in the M−F+ column show the mean p-values for FEL and MEME on this set of sites, respectively. Values in the rightmost column count the number of sites where MEME fits significantly better than FEL, based on a likelihood ratio test (LRT) with 2 degrees of freedom. Abbreviations: IAV = Influenza A virus, JEV = Japanese encephalitis virus.
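The 2-degrees-of-freedom LRT mentioned in the note can be sketched as follows. This is a minimal illustration, assuming the simpler model is nested in the richer one and that the test statistic is compared against a chi-square distribution with 2 degrees of freedom, whose survival function has the closed form exp(-x/2). The function name and the log-likelihood arguments are hypothetical; the significance threshold used in the table is not given in this excerpt and is left to the caller.

```python
import math


def lrt_p_value_df2(lnl_null, lnl_alt):
    """P-value of a likelihood ratio test against chi-square with 2 df.

    lnl_null: maximized log-likelihood of the simpler (nested) model.
    lnl_alt:  maximized log-likelihood of the richer model.
    """
    # LRT statistic; clamp at zero to absorb small numerical noise
    # when the richer model fits no better than the simpler one.
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    # Survival function of chi-square with 2 df is exp(-x / 2).
    return math.exp(-stat / 2.0)
```

A site would be counted in the rightmost column when this p-value falls below the chosen threshold.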
Figure 2. Individual sites of the vertebrate rhodopsin alignment used to illustrate similarities and differences between FEL and MEME.
Branches that have experienced substitutions, based on most likely joint maximum likelihood ancestral reconstructions at a given site, are labeled as count of synonymous substitutions:count of non-synonymous substitutions. The thickness of each branch is proportional to the minimal number of single nucleotide substitutions mapped to the branch. Branches are colored according to the magnitude of the empirical Bayes factor (EBF) for the event of positive selection: red – evidence for positive selection, teal – evidence for neutral evolution or negative selection, black – no information. See Methods for more detail. All three sites were identified as experiencing positive diversifying selection by MEME. FEL reported site 54 as positively selected, site 273 as neutral, and site 210 as negatively selected.
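The exact EBF definition is in the paper's Methods, which this excerpt does not include. As a rough guide, a Bayes factor for an event is generally the ratio of its posterior odds to its prior odds, and the sketch below assumes that general form. The function name and arguments are hypothetical, and no color thresholds are implied.

```python
def empirical_bayes_factor(prior, posterior):
    """Ratio of posterior odds to prior odds for an event.

    prior and posterior are probabilities strictly between 0 and 1,
    e.g. the prior and posterior probability that a branch is in the
    positively selected class at a site (an illustrative assumption).
    """
    for name, value in (("prior", prior), ("posterior", posterior)):
        if not 0.0 < value < 1.0:
            raise ValueError(f"{name} must be strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = posterior / (1.0 - posterior)
    return posterior_odds / prior_odds
```

Values above 1 indicate that the data raised the odds of the event relative to the prior, and values below 1 indicate that they lowered it.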