Venelin Mitov¹,², Tanja Stadler¹,².
Abstract
Pathogen traits, such as the virulence of an infection, can vary significantly between patients. A major challenge is to measure the extent to which genetic differences between infecting strains explain the observed variation of the trait. This is quantified by the trait's broad-sense heritability, H2. A recent discrepancy between estimates of the heritability of HIV virulence has opened a debate on the estimators' accuracy. Here, we show that the discrepancy originates from model limitations and important life-cycle differences between sexually reproducing organisms and transmittable pathogens. In particular, current quantitative genetics methods, such as donor-recipient regression of surveyed serodiscordant couples and the phylogenetic mixed model (PMM), are prone to underestimate H2, because they neglect, or do not fit to, the loss of resemblance between transmission partners caused by within-host evolution. In a phylogenetic analysis of 8,483 HIV patients from the United Kingdom, we show that the phenotypic correlation between transmission partners decays with the amount of within-host evolution of the virus. We reproduce this pattern in toy-model simulations and show that a phylogenetic Ornstein-Uhlenbeck mixed model (POUMM) outperforms the PMM in capturing this correlation pattern and in quantifying H2. In particular, we show that the POUMM outperforms the PMM even in simulations without selection, because it captures this correlation pattern, a point that has not been appreciated until now. By cross-validating the POUMM estimates with ANOVA on closest phylogenetic pairs, we obtain H2 ≈ 0.2, meaning that ∼20% of the variation in HIV virulence is explained by the virus genome, for both European and African data.
Keywords: ANOVA; HIV; Ornstein–Uhlenbeck; donor–recipient regression; phylogenetic mixed model; set-point viral load (spVL)
Year: 2018 PMID: 29329426 PMCID: PMC5850476 DOI: 10.1093/molbev/msx328
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
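The abstract's central observation, that the resemblance between transmission partners fades as their viruses evolve independently, can be illustrated with a minimal Monte Carlo sketch. It assumes a deliberately simple toy model chosen for illustration, not the simulator used in the paper: the genotypic value g drifts as Brownian motion with rate `sigma` for a shared time `t_shared` before transmission, each partner's virus then evolves independently for `d / 2` time units (so the partners end up at phylogenetic distance `d`), and the measured trait is z = g + e with environmental noise of standard deviation `sigma_e`. All function and parameter names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_partners(n_pairs, t_shared, d, sigma=1.0, sigma_e=1.0, seed=1):
    """Simulate measured trait values of n_pairs transmission partners (toy model)."""
    rng = random.Random(seed)
    z1, z2 = [], []
    for _ in range(n_pairs):
        # Genotypic value at the moment of transmission (Brownian motion from the root).
        g = rng.gauss(0.0, sigma * math.sqrt(t_shared))
        # Independent within-host evolution in each partner, d/2 time units each.
        g1 = g + rng.gauss(0.0, sigma * math.sqrt(d / 2))
        g2 = g + rng.gauss(0.0, sigma * math.sqrt(d / 2))
        # Measured trait = genotypic value + environmental deviation.
        z1.append(g1 + rng.gauss(0.0, sigma_e))
        z2.append(g2 + rng.gauss(0.0, sigma_e))
    return z1, z2


def expected_corr(t_shared, d, sigma=1.0, sigma_e=1.0):
    """Correlation implied by the toy model: shared variance over total variance."""
    shared = sigma ** 2 * t_shared
    return shared / (shared + sigma ** 2 * d / 2 + sigma_e ** 2)


for d in (0.0, 2.0, 8.0):
    z1, z2 = simulate_partners(20000, t_shared=4.0, d=d)
    print(f"d={d}: simulated r={pearson(z1, z2):.2f}, expected r={expected_corr(4.0, d):.2f}")
```

Under this toy model the expected correlations for d = 0, 2 and 8 are 0.80, 0.67 and 0.44, and the simulated values should fall close to them: the more independent evolution separates the partners, the less their measured traits resemble each other.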
Fig. 1. A schematic representation of an epidemic. Colored rectangles represent infectious periods of hosts, with different colors corresponding to different host types. Triangles inside hosts represent pathogen quasispecies; a change of color indicates substitution of the dominant strain. Capital letters denote host events: M, diagnosis followed by immediate phenotype measurement, treatment and quarantine of the host; D, host death. The transmission tree connecting the measured hosts is drawn in black. Because of incomplete sampling, there is no one-to-one correspondence between transmission events and branching points on the tree. By convention, the time origin is at the root of the tree and time increases toward the tips. We denote by $t_i$ the time distance from the root to tip $i$. The mean root-tip distance is denoted by $\bar{t}$. For each couple of tips, $i$ and $j$, we denote by $t_{ij}$ the time distance from the root to their most recent common ancestor (mrca) and by $d_{ij}$ their phylogenetic distance. For clarity, we show how $d_{ij}$ can be expressed in terms of $t_{ij}$ and the root-tip times $t_i$ and $t_j$: $d_{ij} = t_i + t_j - 2t_{ij}$. Couples of tips that are each other's closest tip by phylogenetic distance, for example, (2, 3) and (4, 5), are called "phylogenetic pairs" (PPs). In balanced trees, PPs tend to coincide with pairs of tips descending from the same parent node (a.k.a. siblings or "cherries").
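The caption's definitions can be made concrete with a small sketch that computes $d_{ij} = t_i + t_j - 2t_{ij}$ from root-tip and root-mrca times and then extracts phylogenetic pairs as tips that are mutually closest. The example tree, the dict-based input format and the function names are hypothetical; breaking distance ties in favour of the smaller tip label is an arbitrary choice made here.

```python
from itertools import combinations


def phylo_distance(t_i, t_j, t_ij):
    """d_ij = t_i + t_j - 2 * t_ij: from tip i up to the mrca, then down to tip j."""
    return t_i + t_j - 2.0 * t_ij


def phylogenetic_pairs(dist):
    """Pairs (i, j), i < j, in which each tip is the other's closest tip.

    dist: symmetric dict of dicts, dist[i][j] = phylogenetic distance d_ij.
    Ties are broken in favour of the smaller tip label.
    """
    closest = {i: min((d, j) for j, d in row.items())[1] for i, row in dist.items()}
    return sorted((i, j) for i, j in closest.items() if i < j and closest[j] == i)


# Hypothetical tree ((A:1, B:1):0.25, (C:0.5, D:3):0.5), given by root-tip times t_i
# and root-to-mrca times t_ij; tip pairs not listed meet at the root (t_ij = 0).
root_tip = {"A": 1.25, "B": 1.25, "C": 1.0, "D": 3.5}
mrca_time = {("A", "B"): 0.25, ("C", "D"): 0.5}

dist = {tip: {} for tip in root_tip}
for i, j in combinations(sorted(root_tip), 2):
    d = phylo_distance(root_tip[i], root_tip[j], mrca_time.get((i, j), 0.0))
    dist[i][j] = dist[j][i] = d

print(phylogenetic_pairs(dist))  # [('A', 'B')]
```

Here the cherry (C, D) is not a phylogenetic pair: C lies closer to A (d = 2.25) than to its sibling D (d = 3.5), the kind of mismatch between cherries and PPs that unbalanced or incompletely sampled trees produce.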
Fig. 2. A toy model of an epidemic with within-host mutation and SIR dynamics. (A) A pathogen trait represents the sum of a general
Fig. 3. Correlation between lg(spVL)-values in HIV phylogenetic pairs. A sample of 1917 PPs with lg(spVL)-measurements from HIV patients shows a decrease in the correlation (ICC) between pair trait values as a function of the pair phylogenetic distance $d$. The point estimates and 95% CIs in ten strata of equal size (deciles) are depicted as points and error bars positioned at the mean $d$ of each stratum. Black and magenta points with error bars denote the estimated $r_A$ and $r_{Sp}$ in the real data. Dashed horizontal bars denote the 95% CI for $r_A$ evaluated on all phylogenetic pairs. A black and a magenta inclined line denote the least-squares linear regressions of $r_A$ and $r_{Sp}$ on the stratum mean of $d$. Brown and green points with error bars denote the estimated values of $r_A$ obtained after replacing the real trait values on the tree by values simulated under the maximum likelihood (ML) fit of the PMM and the POUMM, respectively (mean and 95% CI estimated from 100 replications). A brown and a green line show the expected correlation between pairs of tips at distance $d$, as modeled under the ML-fit of the PMM and the POUMM (eqs. 2 and 3). Light-brown and light-green regions depict the 95% high posterior density (HPD) intervals inferred from Bayesian fits of the two models (Materials and Methods).
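Expected-correlation curves like the brown and green lines can be computed from standard covariance formulas for a trait z = g + e on a tree, where g follows branching Brownian motion (PMM) or a branching Ornstein-Uhlenbeck process started from a fixed root value (POUMM), and e is i.i.n.d. with standard deviation σ_e. The sketch below is a rendering of those textbook formulas, not a transcription of eqs. 2 and 3, whose parametrization may differ; the names `sigma`, `sigma_e` and `alpha` are assumptions.

```python
import math


def pmm_corr(t_i, t_j, t_ij, sigma, sigma_e):
    """Corr(z_i, z_j) when g is Brownian motion with rate sigma from the root."""
    cov = sigma ** 2 * t_ij  # variance accumulated on the shared root-mrca path
    var_i = sigma ** 2 * t_i + sigma_e ** 2
    var_j = sigma ** 2 * t_j + sigma_e ** 2
    return cov / math.sqrt(var_i * var_j)


def ou_var(t, alpha, sigma):
    """Variance of an OU process after time t, started from a fixed value."""
    return sigma ** 2 / (2 * alpha) * (1 - math.exp(-2 * alpha * t))


def poumm_corr(t_i, t_j, t_ij, alpha, sigma, sigma_e):
    """Corr(z_i, z_j) when g is an OU process with selection strength alpha."""
    d_ij = t_i + t_j - 2 * t_ij
    # Shared variance at the mrca, damped by mean reversion along both descending branches.
    cov = math.exp(-alpha * d_ij) * ou_var(t_ij, alpha, sigma)
    var_i = ou_var(t_i, alpha, sigma) + sigma_e ** 2
    var_j = ou_var(t_j, alpha, sigma) + sigma_e ** 2
    return cov / math.sqrt(var_i * var_j)


# Two tips at the same root-tip distance t_bar, separated by phylogenetic distance d.
t_bar = 10.0
for d in (0.5, 2.0, 5.0):
    t_ij = t_bar - d / 2
    print(d, round(pmm_corr(t_bar, t_bar, t_ij, 1.0, 2.0), 3),
          round(poumm_corr(t_bar, t_bar, t_ij, 0.5, 1.0, 1.0), 3))
```

At equal tip depths the PMM correlation falls linearly in d, whereas far from the root the POUMM correlation behaves like H2·exp(−αd), with H2 the stationary genotypic share of the trait variance; these are the two shapes that the brown and green lines trace.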
Table 1. Tested Estimators of the Broad-Sense Heritability of Pathogen Traits.
| Input Data | Method (Abbreviation) | Assumptions | Estimator |
|---|---|---|---|
| Grouping by identical infecting strain | Adjusted coefficient of determination | The sample of data contains all genotypes present in the population | |
| | One-way analysis of variance (ANOVA) | Independently sampled genotypes; i.i.n.d. trait-values within each group; equal within-group variances (homoscedasticity) | |
| Known donor–recipient couples | Donor–recipient regression (DR) | Independently sampled donor–recipient couples; equal residual variance across the range of donor-values (homoscedasticity); equal donor and population variances | variants: |
| Phylogenetic pairs (PPs) | ANOVA on all/closest PPs (ANOVA-PP, ANOVA-CPP) | ANOVA assumptions (see above) | Defined as in equation (7), but calculated on PPs; variants: |
| | Spearman correlation on all/closest PPs | PPs are independent from one another | Pearson (product-moment) correlation calculated on the ranks of the trait-values; variants: |
| | Linear regression of $r_A$ or $r_{Sp}$ on the stratum mean of $d$ | Equal residual variance across the range of $d$ | The intercept |
| Transmission tree | Phylogenetic mixed model (PMM) | Branching BM evolution; i.i.n.d. environmental deviation | |
| | Phylogenetic Ornstein–Uhlenbeck mixed model (POUMM) | Branching OU evolution; i.i.n.d. environmental deviation | |
Note.—Notation: $\mathrm{var}(\cdot)$, sample variance; $\mathrm{cov}(\cdot,\cdot)$, sample covariance; $N$, number of patients; $K$, number of distinct groups of patients, that is, genotypes or phylogenetic pairs; $z$, measured values; $\hat{g}$, estimated genotypic values: mean values from patients carrying a given genotype; $z_D$, donor values; $z_R$, recipient values; $MS_W$, within-group mean square: $MS_W = \frac{1}{N-K}\sum (z - \bar{z}_k)^2$, where $z$ is an individual's value and $\bar{z}_k$ is the mean value of the group to which the individual belongs; $MS_A$, among-group mean square: $MS_A = \frac{1}{K-1}\sum_k n_k (\bar{z}_k - \bar{z})^2$, where $\bar{z}_k$ is defined as above and $\bar{z}$ is the population mean value; $n$, weighted mean number of patients per group, that is, $n = 2$ for phylogenetic pairs and $n = \frac{1}{K-1}\left(N - \frac{\sum_k n_k^2}{N}\right)$ for groups of variable size; $\alpha$, $\sigma$, $\sigma_e$: PMM/POUMM parameters (described in Materials and Methods).
i.i.n.d., independent and identically normally distributed; $d$, phylogenetic distance between donor–recipient pairs or phylogenetic pairs; the threshold on $d$ is described in the text.
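Two of the estimators in the table above rest on simple sample statistics. The sketch below computes the one-way ANOVA intraclass correlation $r_A = (MS_A - MS_W)/(MS_A + (n-1)MS_W)$ for groups of size $n = 2$ (as for phylogenetic pairs), optionally restricted to pairs below a distance threshold in the spirit of ANOVA-CPP, and the donor–recipient regression slope $\mathrm{cov}(z_D, z_R)/\mathrm{var}(z_D)$. These are the standard textbook forms; equation (7) and the variants listed in the table are not reproduced, and the input format (tuples of two trait values plus their distance), the `max_d` threshold name and the function names are hypothetical.

```python
import random


def anova_icc_pairs(pairs):
    """ANOVA intraclass correlation r_A for K groups of size n = 2.

    pairs: list of (z1, z2) trait values, e.g. lg(spVL) in phylogenetic pairs.
    """
    n, k = 2, len(pairs)
    grand_mean = sum(a + b for a, b in pairs) / (n * k)
    # Among-group mean square: spread of pair means around the grand mean.
    ms_a = sum(n * ((a + b) / 2 - grand_mean) ** 2 for a, b in pairs) / (k - 1)
    # Within-group mean square: (a - m)^2 + (b - m)^2 = (a - b)^2 / 2 for a pair mean m.
    ms_w = sum((a - b) ** 2 / 2 for a, b in pairs) / (n * k - k)
    return (ms_a - ms_w) / (ms_a + (n - 1) * ms_w)


def close_pairs(records, max_d):
    """Keep (z1, z2) from (z1, z2, d) records whose distance d does not exceed max_d."""
    return [(z1, z2) for z1, z2, d in records if d <= max_d]


def dr_slope(donors, recipients):
    """Donor-recipient regression slope cov(z_D, z_R) / var(z_D)."""
    m = len(donors)
    md, mr = sum(donors) / m, sum(recipients) / m
    cov = sum((x - md) * (y - mr) for x, y in zip(donors, recipients)) / (m - 1)
    var = sum((x - md) ** 2 for x in donors) / (m - 1)
    return cov / var


# Synthetic check: pair members share a genotypic value g ~ N(0, 1) and get
# independent noise e ~ N(0, 1), so the true intraclass correlation is 0.5.
rng = random.Random(7)
records = []
for _ in range(5000):
    g = rng.gauss(0, 1)
    records.append((g + rng.gauss(0, 1), g + rng.gauss(0, 1), rng.uniform(0, 0.1)))
print(round(anova_icc_pairs(close_pairs(records, max_d=0.05)), 2))
```

Because the order of the two members in a pair is arbitrary, the ANOVA form is convenient: it is symmetric in the pair members, unlike an ordinary regression of one member on the other.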
Fig. 4. Heritability estimates in toy-model simulations. (A–D) H2-estimates in simulations of "neutral" and "select" within-/between-host dynamics. Each group of box-whiskers summarizes the simulations for a fixed scenario and contact interval; white boxes (background) denote the true heritability and colored boxes (foreground) denote the estimates. Statistical significance is evaluated through t-tests summarized in table 2.
Table 2. Mean Difference from the Toy-Model Simulations Grouped by Scenario.
| Scenario (within-host / between-host) | Neutral / Neutral | Neutral / Select | Select / Neutral | Select / Select |
|---|---|---|---|---|
| | 50 | 41 | 47 | 37 |
| | −0.01 | −0.02 | 0.05 | 0.04 |
| | −0.07 | −0.04 | 0 | −0.01 |
| | −0.25 | −0.2 | −0.07 | −0.06 |
| | 0.05 | 0.05 | 0.08 | 0.06 |
| | −0.05 | −0.06 | 0.01 | −0.04 |
| | −0.05 | −0.06 | 0 | −0.03 |
| | −0.18 | −0.15 | −0.06 | −0.08 |
| | −0.05 | −0.05 | −0.05 | −0.07 |
| | −0.17 | −0.17 | −0.01 | −0.04 |
| | −0.28 | −0.24 | −0.12 | −0.16 |
| | −0.01 | −0.02 | 0.01 | 0.03 |
| | −0.01 | −0.02 | 0.01 | 0.03 |
Note.—Statistical significance is estimated by Student’s t-tests, P values denoted by an asterisk as follows: * P<0.01; **P<0.001. Gray background indicates estimates that are unavailable in practice.
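A minimal sketch of the kind of test summarized in the table above: for one scenario, take the differences between H2-estimates and the true H2 across simulation replicates, and compute their mean and a one-sample Student t statistic against zero (equivalent to a paired t-test of estimates versus true values). Whether exactly this pairing was used is an assumption, and the function name is hypothetical. The P value follows from a t distribution with n − 1 degrees of freedom, for example via `scipy.stats.ttest_1samp(diffs, 0.0)`.

```python
import math
import statistics


def bias_t_test(estimates, true_values):
    """Mean difference (estimate - truth) and its one-sample t statistic against zero."""
    diffs = [e - t for e, t in zip(estimates, true_values)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff, mean_diff / std_err, n - 1  # (mean, t statistic, degrees of freedom)


# Three hypothetical replicates of one scenario with true H2 = 0.2.
mean_diff, t_stat, df = bias_t_test([0.30, 0.20, 0.25], [0.20, 0.20, 0.20])
print(round(mean_diff, 3), round(t_stat, 3), df)
```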
Fig. 5. Phylogenetic pairs in lg(spVL) data from the United Kingdom. (A) A scatter plot of the phylogenetic distances between pairs of tips against their absolute phenotypic differences: gray, PPs; magenta, CPPs. A black line shows the linear regression of the absolute phenotypic difference on $d$ (the regression slope was significantly positive at the 0.01 level). (B) A box plot of the trait distribution along the transmission tree. Each box-whisker represents the lg(spVL)-distribution of patients grouped by their distance from the root of the tree, measured in substitutions per site. Wider boxes indicate larger groups. Magenta segments denote lg(spVL)-values in CPPs. (C) A box plot of the lg(spVL)-distribution in all patients (black), PPs (gray), and CPPs (magenta).
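The black line in panel (A) is an ordinary least-squares fit of the absolute trait difference within a pair on the pair's phylogenetic distance; a positive slope means that pair members drift apart phenotypically as their viruses diverge. A self-contained sketch with a hypothetical function name and toy numbers rather than the UK data; it computes only the fit, not the significance test.

```python
def ols_fit(x, y):
    """Ordinary least-squares intercept and slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


# Toy pairs (z1, z2, d) with larger trait differences at larger distances.
pairs = [(4.1, 4.0, 0.01), (3.9, 4.2, 0.02), (4.5, 3.8, 0.04), (3.2, 4.4, 0.06)]
d = [p[2] for p in pairs]
abs_diff = [abs(z1 - z2) for z1, z2, _ in pairs]
intercept, slope = ols_fit(d, abs_diff)
print(slope > 0)  # True for these toy values
```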
Table 3. Estimates of lg(spVL)-Heritability in HIV Data from the United Kingdom.
| Method | N | H2 estimate | 95% CI | 95% HPD |
|---|---|---|---|---|
| Linear regression of | 10 points | 0.17 | [0.09, 0.24] | – |
| Linear regression of | 10 points | 0.18 | [0.11, 0.25] | – |
| ANOVA-CPP | 224 | 0.17 | [−0.02, 0.31] | – |
| ANOVA-CPP | 232 | 0.16 | [0.01, 0.30] | – |
| ANOVA-CPP | 384 | 0.16 | [0.06, 0.25] | – |
| ANOVA-PP | 3,834 | 0.11 | [0.08, 0.14] | – |
| Spearman-CPP | 224 | 0.23 | [0.05, 0.42] | – |
| Spearman-CPP | 232 | 0.22 | [0.03, 0.4] | – |
| Spearman-CPP | 384 | 0.2 | [0.06, 0.34] | – |
| Spearman-PP | 3,834 | 0.11 | [0.06, 0.15] | – |
| POUMM | 8,483 | 0.21 | – | [0.14, 0.29] |
| POUMM | 8,483 | 0.2 | – | [0.13, 0.29] |
| PMM | 8,483 | 0.08 | – | [0.05, 0.12] |
| PMM | 8,483 | 0.06 | – | [0.02, 0.1] |
| PMM, ReML | 8,483 | 0.06 | [0.03, 0.09] | – |
Note.—Also shown are the results from a previous analysis of the same data set (Hodcroft et al. 2014). "–": the analysis was not done in the cited study. Gray background: estimates considered unreliable because of (a) negative bias caused by measurement delays or (b) negative bias caused by violation of the BM assumption. Uncertainty in the estimates is expressed as 95% confidence intervals (CIs) or, in the case of Bayesian inference, as 95% high posterior density intervals (HPDs).
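The two "Linear regression" rows of the table above extrapolate the pair correlation to zero phylogenetic distance. A sketch of that procedure as the Fig. 3 caption describes it: sort phylogenetic pairs by d, split them into ten equally sized strata, estimate the ANOVA correlation $r_A$ in each stratum, regress $r_A$ on the stratum mean of d, and read off the intercept. It uses the textbook ANOVA formula for pairs; the function names and the toy data (with an assumed linear decay of the correlation) are hypothetical, and no confidence interval is computed.

```python
import math
import random


def anova_icc_pairs(pairs):
    """One-way ANOVA intraclass correlation for groups of size 2."""
    k = len(pairs)
    grand = sum(a + b for a, b in pairs) / (2 * k)
    ms_a = sum(2 * ((a + b) / 2 - grand) ** 2 for a, b in pairs) / (k - 1)
    ms_w = sum((a - b) ** 2 / 2 for a, b in pairs) / k
    return (ms_a - ms_w) / (ms_a + ms_w)


def ols_fit(x, y):
    """Least-squares intercept and slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


def stratified_intercept(records, n_strata=10):
    """records: (z1, z2, d) per phylogenetic pair; needs >= 2 pairs per stratum.

    Returns (intercept, slope) of r_A regressed on the mean d of equally sized strata.
    """
    records = sorted(records, key=lambda r: r[2])
    size = len(records)
    mean_d, r_a = [], []
    for s in range(n_strata):
        stratum = records[s * size // n_strata:(s + 1) * size // n_strata]
        mean_d.append(sum(r[2] for r in stratum) / len(stratum))
        r_a.append(anova_icc_pairs([(z1, z2) for z1, z2, _ in stratum]))
    return ols_fit(mean_d, r_a)


# Toy data: pair correlation 0.6 at d = 0, decaying linearly with d (assumed, not UK data).
rng = random.Random(3)
records = []
for _ in range(10000):
    d = rng.uniform(0.0, 0.1)
    rho = 0.6 * (1 - 3 * d)
    g = rng.gauss(0, 1)
    z1 = math.sqrt(rho) * g + math.sqrt(1 - rho) * rng.gauss(0, 1)
    z2 = math.sqrt(rho) * g + math.sqrt(1 - rho) * rng.gauss(0, 1)
    records.append((z1, z2, d))
intercept, slope = stratified_intercept(records)
print(round(intercept, 2), round(slope, 1))
```

With these toy settings the intercept should land near 0.6 and the slope near −1.8; the extrapolation removes the downward pull that distant pairs exert on a correlation computed over all pairs at once.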
Fig. 6. A comparison between H2-estimates from the UK HIV cohort and previous estimates on African, Swiss, and Dutch data. (A) Estimates with minimized measurement delay (dark cadet-blue) and POUMM estimates (green); (B) estimates biased downward by longer measurement delays (light blue) or by a violated BM assumption (brown). Uncertainty is shown either as segments indicating the estimated 95% CI or, where 95% CIs are missing, as P values. References to the corresponding publications are given as superscript numbers as follows: 1: Tang et al. (2004); 2: Hecht et al. (2010); 3: Hollingsworth et al. (2010); 4: van der Kuyl et al. (2010); 5: Lingappa et al. (2013); 6: Yue et al. (2013); 7: Alizon et al. (2010); 8: Shirreff et al. (2013); 9: Hodcroft et al. (2014); 10: Blanquart et al. (2017); 11: Bertels et al. (2018); 12: this work. For clarity, estimates from previous studies that are not directly comparable (e.g., previous results from Swiss MSM/strict data sets; Alizon et al. 2010) are not shown.