François Blanquart, Thomas Bataillon.
Abstract
The fitness landscape defines the relationship between genotypes and fitness in a given environment and underlies fundamental quantities such as the distribution of selection coefficients and the magnitude and type of epistasis. A better understanding of variation in landscape structure across species and environments is thus necessary to understand and predict how populations will adapt. An increasing number of experiments investigate the properties of fitness landscapes by identifying mutations, constructing genotypes with combinations of these mutations, and measuring the fitness of these genotypes. Yet these empirical landscapes represent a very small sample of the vast space of all possible genotypes, and this sample is often biased by the protocol used to identify mutations. Here we develop a rigorous statistical framework based on Approximate Bayesian Computation to address these concerns and use this flexible framework to fit a broad class of phenotypic fitness models (including Fisher's model) to 26 empirical landscapes representing nine diverse biological systems. Despite uncertainty owing to the small size of most published empirical landscapes, the inferred landscapes have similar structure in similar biological systems. Surprisingly, goodness-of-fit tests reveal that this class of phenotypic models, which has been successful so far in interpreting experimental data, is plausible in only three of nine biological systems. More precisely, although Fisher's model was able to explain several statistical properties of the landscapes, including the mean and SD of selection and epistasis coefficients, it was often unable to explain the full structure of fitness landscapes.
Keywords: Fisher’s geometric model; adaptation; antibiotic resistance; epistasis; fitness landscape; mutational network
Year: 2016 PMID: 27052568 PMCID: PMC4896198 DOI: 10.1534/genetics.115.182691
Source DB: PubMed Journal: Genetics ISSN: 0016-6731 Impact factor: 4.562
Summary of data sets
| Name | Species | Environment | Mutation type | Mutation and genotype number | Fitness measure | Measurement error | Note | References |
|---|---|---|---|---|---|---|---|---|
| A1, A2 | Aspergillus | Minimal medium | Random (phenotypic markers) | Two data sets of five mutations, 2^5 genotypes | Rate of increase in colony radius per unit time, relative to the ancestor | We calculated a single standard error using the two replicate measurements | The two sets of five mutations are not independent | |
| B1–B10 | Yeast | Standard medium (on plates) | Random (gene deletions) | 1711 + 3885 mutations, 5.4 million genotypes | Increase in colony size per unit time relative to the ancestor | Standard error reported for each fitness measure | Analysis done on 10 independent subsets of 20 mutations. 1 + 20 + 100 genotypes corresponding to single and double mutants | |
| C1, C2 | Drosophila | Laboratory environment | Random (phenotypic markers) | Five mutations, 2^5 genotypes | Productivity (product of fecundity and survival) (C1) and mating success (C2) | We roughly estimated standard error for the productivity measure ( | Two genotypes with 0 mating success removed | |
| D | Single-stranded DNA bacteriophage ID11 | | Independently selected (experimental evolution) | Nine mutations, nine single mutants, 18 double mutants | Log2 increase in phage population per hour | Standard error reported for each fitness measure | — | |
| E1, E2 | Vesicular stomatitis virus | Baby hamster kidney (BHK21) cells (host) | Independently selected (E1, found in natural isolates) and random (E2) | Six mutations, six single mutants, 15 double mutants (E1); 28 mutations, 76 double mutants (E2) | Growth rate relative to ancestor | Standard error reported for each fitness measure | — | |
| F | | New, low-glucose environment | Coselected in experimental evolution | Five mutations, 2^5 genotypes | Growth rate relative to ancestor | 95% CIs reported | — | |
| G | | Methanol environment | Coselected in experimental evolution | Four mutations, 2^4 genotypes | Growth rate relative to ancestor | Standard errors not reported, but we estimated them using standard errors reported for another set of genotypes ( | Order of fixation of mutations not known; mutations were assumed to fix from the largest-effect mutation to the smallest-effect mutation | ( |
| H1, H2 | | Cefotaxime (β-lactam antibiotic) | Independently selected (found in natural isolates); mutations chosen because together they increase resistance to cefotaxime 100,000-fold | Five mutations, 2^5 genotypes | Cefotaxime resistance measured as MIC | For H1, we calculated single errors using the three replicate measurements; for H2, standard errors were reported | H2 is the same data set as H1, with MIC remeasured on the same genotypes; resistance to piperacillin + clavulanic acid was also measured but not used here | |
| H3, H4 | | Cefotaxime (β-lactam antibiotic) | Independently selected (found in natural isolates) | Four mutations, 2^4 genotypes; two independent data sets | Cefotaxime resistance measured as IC99.99 (highly correlated with MIC) | Standard errors not reported, but we used the average standard error of H2 | One data set with the four mutations of smallest effect (H3), one with the four mutations of largest effect (H4) among 48 mutations | |
| I1, I2, I3 | | Pyrimethamine (antimalarial drug) | Independently selected (found in clinical isolates) | Four mutations, 2^4 genotypes; I2 includes the same four mutations as I1 plus two additional mutations affecting another locus | Pyrimethamine resistance measured as IC50 in μg/ml (I1) and M (mol/L) (I2); growth rate of the transformed strain at concentration 1 μmol/L (I3) | Standard errors for I1 and I2 reported | In I2 ( | |
Figure 1. A diversity of genotypic landscapes can be generated by Fisher’s fitness landscape model. Each row shows an example of Fisher’s landscape with two phenotypes (n = 2), with three mutations depicted as arrows in the phenotypic space (left) and the empirical landscape resulting from these mutations in combination (i.e., eight genotypes) (right). Blue edges denote mutations that are beneficial in their background, while red edges denote deleterious mutations. (Top row) A sharp landscape with Q = 0.5 and where the three mutations are random mutations. (Center row) Fisher’s classic landscape with Q = 2 and three coselected mutations. (Bottom row) Q = 4 and three independently selected mutations. Fitness of the ancestral strain is set to 1 without loss of generality.
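The construction described in the Figure 1 caption can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes that fitness decays as exp(-alpha * d^Q) with the distance d to an optimum at the origin, that mutation effects add up in phenotype space, and that random mutations are Gaussian with SD `sigma`. The names `fitness`, `genotype_landscape`, `alpha`, and the ancestor position are all hypothetical choices made for the example.

```python
import itertools
import math
import random

def fitness(z, Q, alpha=1.0):
    """Fitness of phenotype z, assuming exp(-alpha * ||z||^Q) with the optimum at the origin."""
    dist = math.sqrt(sum(x * x for x in z))
    return math.exp(-alpha * dist ** Q)

def genotype_landscape(ancestor, mutations, Q, alpha=1.0):
    """Relative fitness of every combination of the given mutations.

    Mutation effects are summed in phenotype space, and fitness is scaled
    so that the ancestor has fitness 1.
    """
    w0 = fitness(ancestor, Q, alpha)
    landscape = {}
    for genotype in itertools.product((0, 1), repeat=len(mutations)):
        z = list(ancestor)
        for present, effect in zip(genotype, mutations):
            if present:
                z = [a + b for a, b in zip(z, effect)]
        landscape[genotype] = fitness(z, Q, alpha) / w0
    return landscape

# Illustrative values only: two phenotypes, three random mutations, Q = 2.
rng = random.Random(1)
n, sigma = 2, 0.3
ancestor = [1.0, 0.0]
mutations = [[rng.gauss(0.0, sigma) for _ in range(n)] for _ in range(3)]
landscape = genotype_landscape(ancestor, mutations, Q=2.0)
```

With three mutations the dictionary holds 2^3 = 8 genotypes, keyed by presence/absence tuples such as `(1, 0, 1)`. Changing `Q` alters how quickly fitness falls off with distance, which is what distinguishes the rows of the figure.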
Figure 2. Accuracy of inference for different methods and different data sets. The median of the posterior distribution for the rejection algorithm is shown as a function of the true parameter for each of the 500 cross-validation data sets (gray points) when the set of genotypes is composed of all combinations of four independently selected mutations, chosen as the four largest-effect mutations among a set of 48 mutations, as in landscape H4 (Schenk ). Perfect inference corresponds to all points on the y = x line. For clarity, we represent this cloud of points with a local nonlinear fit (gray line). The equivalent linear fit for the neural-network algorithm is shown as a gray dashed line. The plain and dashed blue lines similarly show the local linear fit for the rejection and neural-network algorithms for the data set composed of 20 random mutations and single and double mutants only (as in landscapes B1–B10). The neural-network algorithm generally improves inference compared to the rejection algorithm. The data set composed of all combinations of four selected mutations performs better than the one composed of 20 random mutations and single and double mutants.
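For readers unfamiliar with the rejection algorithm mentioned in this caption, the sketch below shows the general idea on a toy problem. It is not the landscape simulator used in the study: the one-parameter model, the two summary statistics, the prior range, and the acceptance fraction are all assumptions made for illustration, and `abc_rejection`, `sample_prior`, and `simulate` are hypothetical names.

```python
import math
import random
import statistics

def abc_rejection(observed, sample_prior, simulate, n_sims, accept_frac, rng):
    """Keep the prior draws whose simulated summaries lie closest to the observed ones."""
    draws = []
    for _ in range(n_sims):
        theta = sample_prior(rng)
        summaries = simulate(theta, rng)
        draws.append((math.dist(summaries, observed), theta))
    draws.sort(key=lambda pair: pair[0])
    n_keep = max(1, int(accept_frac * n_sims))
    return [theta for _, theta in draws[:n_keep]]

# Toy stand-in for a landscape simulator (hypothetical): the parameter is the
# SD of mutational effects; the summaries are the mean and SD of 30 effects.
def sample_prior(rng):
    return rng.uniform(0.01, 1.0)

def simulate(sigma, rng):
    effects = [rng.gauss(0.0, sigma) for _ in range(30)]
    return (statistics.fmean(effects), statistics.stdev(effects))

rng = random.Random(42)
true_sigma = 0.3
observed = simulate(true_sigma, rng)          # pseudodata with a known parameter
posterior = abc_rejection(observed, sample_prior, simulate, 5000, 0.02, rng)
posterior_median = statistics.median(posterior)
```

Repeating the last four lines for many values of `true_sigma` and plotting `posterior_median` against `true_sigma` gives a cross-validation cloud of the kind the figure summarizes. Regression or neural-network adjustments would be applied to the accepted draws and are not shown here.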
Expected prediction error under various experimental designs
| rej | reg | nn | rej | reg | nn | rej | reg | nn | rej | reg | nn | |||
| 5 mutations, 2^5 genotypes | R | 0.85 | 0.68 | 0.64 | 0.57 | 0.39 | 0.37 | 0.44 | 0.35 | 0.32 | 0.67 | 0.49 | 0.43 | |
| IS | - | 0.91 | 0.79 | 0.7 | 0.34 | 0.2 | 0.18 | 0.33 | 0.19 | 0.35 | 0.15 | |||
| CS | 0.83 | 0.73 | 0.63 | 0.34 | 0.19 | 0.17 | 0.53 | 0.37 | 0.33 | 0.39 | 0.24 | 0.17 | ||
| 4 mutations, 2^4 genotypes | R | - | 0.93 | 0.8 | 0.78 | 0.79 | 0.58 | 0.54 | 0.42 | 0.36 | 0.35 | 0.64 | 0.52 | 0.45 |
| IS | 0.87 | 0.76 | 0.67 | 0.41 | 0.22 | 0.18 | 0.43 | 0.25 | 0.23 | 0.48 | 0.27 | 0.2 | ||
| CS | 0.9 | 0.76 | 0.69 | 0.37 | 0.2 | 0.17 | 0.49 | 0.33 | 0.3 | 0.42 | 0.33 | 0.24 | ||
| 8 mutations, 8 single and 20 double mutants | RS | - | 0.8 | 0.58 | 0.68 | 0.48 | 0.5 | 0.37 | 0.3 | 0.29 | 0.55 | 0.44 | 0.4 | |
| IS | - | 0.75 | 0.69 | 0.63 | 0.44 | 0.29 | 0.22 | 0.41 | 0.29 | 0.28 | 0.47 | 0.33 | 0.29 | |
| CS | - | 0.77 | 0.67 | 0.62 | 0.4 | 0.21 | 0.18 | 0.43 | 0.26 | 0.23 | 0.27 | 0.23 | 0.2 | |
| 20 mutations, up to 121 genotypes | R | 0.72 | 0.62 | 0.48 | 0.25 | 0.21 | 0.35 | 0.23 | 0.62 | 0.45 | 0.39 | |||
| 9 mutations, 9 single mutants, 18 double mutants | IS | 0.74 | 0.68 | 0.63 | 0.37 | 0.22 | 0.38 | 0.27 | 0.24 | 0.45 | 0.37 | 0.32 | ||
| 6 mutations, 6 single mutants, 15 double mutants | IS | 0.81 | 0.8 | 0.76 | 0.45 | 0.25 | 0.18 | 0.39 | 0.33 | 0.3 | 0.67 | 0.59 | 0.54 | |
| 5 mutations, 2^5 genotypes | IS, high fitness combination | 0.86 | 0.74 | 0.61 | 0.24 | 0.09 | 0.42 | 0.25 | 0.24 | 0.1 | ||||
| 4 mutations, 2^4 genotypes | IS, small fitness effect mutants | 0.8 | 0.84 | 0.78 | 0.69 | 0.52 | 0.43 | 0.5 | 0.41 | 0.38 | 0.69 | 0.48 | 0.36 | |
| 4 mutations, 2^4 genotypes | IS, large fitness effect mutants | 0.94 | 0.72 | 0.26 | 0.1 | 0.25 | 0.15 | 0.27 | 0.1 | |||||
Prediction error for the four parameters of Fisher’s model, for several experimental designs (based on single and double mutants, or on complete sets of mutations and all associated genotypes) and selection procedures (R: random; IS: independently selected; CS: coselected mutations), when the six summary statistics were used in the ABC algorithm. For each parameter, the three lowest prediction errors are in bold, highlighting the protocol and inference algorithms that perform best.
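The text here does not spell out how the prediction error is computed. A common choice in ABC cross-validation, assumed in the sketch below, is the mean squared error of the point estimates scaled by the variance of the true parameter values, so that 0 means perfect inference and values near 1 mean the estimates are no better than guessing the prior mean. The function name and the example numbers are hypothetical.

```python
import statistics

def prediction_error(true_values, estimates):
    """Mean squared error of the estimates, scaled by the variance of the true values.

    Assumed definition; the population variance is used so that always
    guessing the mean of the true values gives exactly 1.
    """
    sq_err = sum((e - t) ** 2 for t, e in zip(true_values, estimates))
    return sq_err / (len(true_values) * statistics.pvariance(true_values))

true_values = [1.0, 2.0, 3.0, 4.0]                                    # made-up parameters
perfect = prediction_error(true_values, true_values)                  # estimates equal the truth
uninformative = prediction_error(true_values, [2.5, 2.5, 2.5, 2.5])   # always guess the mean
```

Under this definition, the entries in the table above would be read as the fraction of prior variance left unexplained by each design and algorithm.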
Figure 3. Posterior distribution of parameters for all experimental landscapes. (From top to bottom) A1 and A2 (Aspergillus) and C1 and C2 (Drosophila); the yeast deletion data set (B1–B10); viruses evolving on their host (D, circle; E1 and E2, squares) and bacteria in a novel medium (F and G); adaptation to an environment containing pyrimethamine (I3). The black point shows the median of the prior, and the dashed line delineates the 50% highest-density region. The points show the medians of the posteriors, and the shaded areas show the 50% highest posterior density regions for the data sets.
Posterior distribution of parameters and posterior predictive checks, neural-network algorithm
| Reference | Name | σ | | | | | |
|---|---|---|---|---|---|---|---|
| — | Prior | 4 (1; 19) | 1.39 (0.05; 7.39) | 0.14 (0.01; 0.74) | 2.25 (0.59; 3.91) | | |
| | A1 | 5.24 (0.59; 20.96) | 0.14 (-0.07; 1.88) | 0.15 (0.08; 0.37) | 1.60 (0.42; 3.52) | | |
| | A2 | 6.72 (1.63; 23.09) | 0.34 (-0.09; 3.12) | 0.12 (0.05; 0.31) | 1.69 (0.66; 3.51) | | |
| | B1 | 6.00 (1.54; 19.08) | 1.01 (0.26; 3.88) | 0.09 (0.04; 0.29) | 1.91 (0.91; 3.75) | 0 | |
| | B2 | 3.44 (0.31; 12.82) | 0.28 (0.12; 1.02) | 0.09 (0.05; 0.20) | 2.96 (1.64; 4.19) | 0.02 | |
| | B3 | 3.33 (0.13; 13.78) | 0.40 (0.13; 1.78) | 0.10 (0.06; 0.23) | 2.33 (1.19; 4.36) | 0.01 | |
| | B4 | 8.28 (1.64; 24.06) | 1.21 (0.04; 5.10) | 0.14 (0.03; 0.48) | 1.57 (0.77; 3.13) | | |
| | B5 | 4.16 (0.89; 14.84) | 0.43 (0.07; 2.37) | 0.08 (0.05; 0.23) | 2.17 (1.02; 4.37) | 0.02 | |
| | B6 | 3.01 (-0.72; 13.87) | 0.34 (-0.05; 1.96) | 0.10 (0.05; 0.29) | 2.23 (1.16; 4.68) | 0.01 | |
| | B7 | 4.47 (-0.25; 15.71) | 0.64 (0.12; 2.23) | 0.11 (0.05; 0.33) | 2.07 (0.97; 4.12) | 0.02 | |
| | B8 | 1.63 (-1.91; 12.29) | 1.32 (0.32; 4.92) | 0.11 (0.00; 0.48) | 2.24 (1.10; 4.17) | 0.02 | 0.01 |
| | B9 | 4.12 (0.48; 15.58) | 0.39 (0.02; 2.38) | 0.09 (0.04; 0.26) | 2.12 (1.07; 4.26) | 0 | |
| | B10 | 3.46 (0.63; 15.18) | 0.32 (0.07; 1.58) | 0.07 (0.04; 0.20) | 2.34 (1.08; 4.35) | 0.01 | |
| | C1 | 4.92 (2.12; 13.24) | 1.02 (0.58; 3.20) | 0.30 (0.16; 0.66) | 2.98 (1.71; 4.06) | 0 | |
| | C2 | 2.09 (0.21; 7.03) | 1.10 (0.82; 2.38) | 0.57 (0.39; 1.03) | 2.58 (1.01; 3.57) | 0 | |
| | D | 7.00 (2.95; 15.21) | 0.46 (0.36; 0.82) | 0.21 (0.15; 0.39) | 2.08 (0.83; 3.82) | | |
| | E1 | 6.28 (1.64; 19.82) | 0.19 (0.06; 0.86) | 0.15 (0.07; 0.41) | 1.65 (0.23; 3.79) | 0.01 | |
| | E2 | 5.28 (2.11; 12.45) | 0.20 (0.09; 0.55) | 0.14 (0.10; 0.25) | 2.26 (1.34; 3.42) | 0.03 | |
| | F | 6.62 (1.63; 22.28) | 0.42 (0.21; 0.98) | 0.08 (0.05; 0.19) | 1.89 (0.81; 3.70) | 0.03 | |
| | G | 3.65 (0.86; 15.86) | 1.09 (0.73; 2.48) | 0.07 (0.03; 0.21) | 2.67 (1.30; 4.05) | | |
| | H1 | 14.39 (7.25; 29.54) | 12.97 (12.16; 15.73) | 0.89 (0.64; 1.46) | 1.40 (0.14; 2.48) | 0.01 | 0 |
| | H2 | 13.18 (5.76; 28.86) | 12.02 (10.87; 14.83) | 0.46 (0.18; 1.08) | 1.83 (0.81; 2.80) | 0.01 | 0 |
| | H3 | 4.81 (1.89; 15.30) | 3.17 (1.08; 8.91) | 0.30 (0.13; 0.79) | 2.94 (1.53; 3.91) | | |
| | H4 | 8.89 (5.63; 17.44) | 6.24 (5.27; 7.94) | 0.75 (0.51; 1.13) | 1.40 (0.62; 2.15) | 0 | 0 |
| | I1 | 8.24 (3.79; 19.68) | 9.20 (7.78; 14.61) | 0.57 (0.26; 1.24) | 2.22 (0.55; 3.51) | 0.02 | 0 |
| | I2 | 5.16 (2.50; 13.08) | 7.76 (7.41; 8.95) | 0.23 (0.15; 0.37) | 3.84 (3.17; 4.49) | 0 | 0 |
| | I3 | 1.28 (-0.58; 5.47) | 2.33 (2.19; 2.71) | 0.47 (0.32; 0.79) | 3.70 (3.11; 4.24) | 0.03 | |
Median of the posterior distribution of each parameter, with the 2.5–97.5% quantile interval (equivalent to the 95% highest posterior density) in parentheses, for the rejection algorithm. The prior is shown for comparison (first row). The P-value for the test of adequacy with Fisher’s model is indicated.
Figure 4. Posterior predictive checks on two example data sets. One data set is compatible with Fisher’s model (top row; Aspergillus data set A1), and one rejects Fisher’s model (bottom row; data set F). (Left) The median posterior fitness against the “true” fitness of pseudodata generated under Fisher’s model for the cross-validation, showing that when the pseudodata have been generated using Fisher as the true model, the posterior fitnesses are close to the true fitness values. (Center) Posterior predicted log-fitness as a function of the true experimental log-fitness. The points are the median posterior, and the lines show the 2.5–97.5% interval. The color code indicates the number of mutations of each genotype, the ancestor in red being set to log-fitness = 0. The median posterior fitnesses are very well correlated with the true fitnesses when the landscape is compatible with Fisher’s model but less so when Fisher’s model is rejected. (Right) The median distance of pseudodata to the accepted simulations when the pseudodata are simulated under Fisher’s model and the posterior parameters. This distribution together with the observed median distance for the experimental data (dashed line) is used to calculate the P-value corresponding to the null hypothesis: “the underlying fitness landscape is Fisher’s model.”
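The P-value described in the right-hand panels can be illustrated as an empirical tail probability. The sketch below assumes the P-value is the share of pseudodata median distances that are at least as large as the observed median distance; the caption does not give the exact rule, and the function name and all numbers are made up for the example.

```python
def predictive_p_value(null_distances, observed_distance):
    """Share of pseudodata distances at least as large as the observed one."""
    n_extreme = sum(d >= observed_distance for d in null_distances)
    return n_extreme / len(null_distances)

# Made-up median distances for ten pseudodata sets simulated under the model.
null_distances = [0.8, 1.1, 0.9, 1.4, 1.0, 1.2, 0.7, 1.3, 1.05, 0.95]

p_typical = predictive_p_value(null_distances, 1.0)   # observed distance well inside the null range
p_extreme = predictive_p_value(null_distances, 2.0)   # observed distance larger than any pseudodata
```

A small value indicates that the experimental landscape sits farther from the accepted simulations than data generated by the model itself typically do, which is the sense in which a data set "rejects" the model.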
Figure 5. Empirical landscapes compared with simulated landscapes. For each data set, the data (left) are shown side by side with the simulated genotypic landscape closest to the data in terms of Euclidean distance (center) and a typical simulated landscape (right), defined as the landscape, among all simulated landscapes retained by the ABC framework, whose distance to the data was closest to the median distance. The coefficient of determination R² is also shown. Blue edges are beneficial mutations; red edges are deleterious mutations. Fitness values that are particularly unexpected under Fisher’s model are marked with a triangle.
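The selection of the "closest" and "typical" simulated landscapes can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: each landscape is represented as a plain list of (log-)fitness values in a fixed genotype order, R² is taken to be 1 minus the ratio of residual to total sum of squares, and the function names and numbers are hypothetical.

```python
import math
import statistics

def closest_and_typical(data, simulated):
    """Return the simulated landscape nearest to the data and the one whose
    distance to the data is nearest to the median distance."""
    dists = [math.dist(data, sim) for sim in simulated]
    closest = simulated[dists.index(min(dists))]
    median = statistics.median(dists)
    typical_idx = min(range(len(dists)), key=lambda i: abs(dists[i] - median))
    return closest, simulated[typical_idx]

def r_squared(data, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean = statistics.fmean(data)
    ss_res = sum((d - p) ** 2 for d, p in zip(data, predicted))
    ss_tot = sum((d - mean) ** 2 for d in data)
    return 1.0 - ss_res / ss_tot

# Made-up fitness vectors for a four-genotype landscape.
data = [0.0, 0.1, 0.2, 0.25]
simulated = [
    [0.0, 0.1, 0.2, 0.3],
    [0.0, 0.3, 0.1, 0.5],
    [0.0, 0.5, 0.5, 0.5],
]
closest, typical = closest_and_typical(data, simulated)
fit_of_closest = r_squared(data, closest)
```

Showing both landscapes is informative because the closest simulation is a best case, whereas the typical one reflects how well the retained simulations match the data on average.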