Glenn Marion, Andrea Doeschl-Wilson, Christopher Pooley, Stephen Bishop.
Abstract
BACKGROUND: The spread of infectious diseases in populations is controlled by the susceptibility (propensity to acquire infection), infectivity (propensity to transmit infection), and recoverability (propensity to recover/die) of individuals. Estimating genetic risk factors for these three underlying host epidemiological traits can help reduce disease spread through genetic control strategies. Previous studies have identified important 'disease resistance single nucleotide polymorphisms (SNPs)', but how these affect the underlying traits is an unresolved question. Recent advances in computational statistics now make it possible to estimate the effects of SNPs on host traits from epidemic data (e.g. infection and/or recovery times of individuals or diagnostic test results). However, little is known about how to effectively design disease transmission experiments or field studies to maximise the precision with which these effects can be estimated.
Year: 2022 PMID: 36064318 PMCID: PMC9442948 DOI: 10.1186/s12711-022-00747-1
Source DB: PubMed Journal: Genet Sel Evol ISSN: 0999-193X Impact factor: 5.100
Fig. 1 Schematic diagram of a disease transmission experiment. a The experiment consists of several contact groups in which some individuals are initially infected “seeders” and some are initially susceptible “contacts”. Each symbol represents an individual, and the annotations AA, AB and BB refer to the genotype of that individual at a given bi-allelic SNP under investigation. b As the experiment progresses, some susceptible individuals become infected and some infected individuals recover. c If the experiment continues until the epidemics die out, only susceptible and recovered individuals are observed in the final state (for practical reasons, experiments are often terminated before this point). Note that the spatial separation of seeders (left) and contacts (right) in this diagram is for illustrative purposes only (random mixing between individuals is assumed)
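The progression described in Fig. 1 can be illustrated with a small stochastic simulation of one contact group. This is a minimal sketch under simplifying assumptions that are not taken from the paper: transmission is frequency-dependent, infection durations are exponentially distributed (rather than gamma distributed), and no SNP, group or fixed effects are included. The function name and its arguments (`n_seeders`, `n_contacts`, `beta`, `gamma`, `t_end`) are hypothetical.

```python
import random


def simulate_contact_group(n_seeders, n_contacts, beta, gamma, t_end, seed=0):
    """Gillespie simulation of a homogeneous SIR epidemic in one contact group.

    Assumptions (illustrative only): infections occur at total rate
    beta * S * I / N, each infected individual recovers at rate gamma,
    and all individuals are identical (no genotype effects).
    Returns a list of (time, S, I, R) tuples, one per event.
    """
    rng = random.Random(seed)
    n_total = n_seeders + n_contacts
    s, i, r = n_contacts, n_seeders, 0
    t = 0.0
    history = [(t, s, i, r)]
    while i > 0:
        infection_rate = beta * s * i / n_total
        recovery_rate = gamma * i
        total_rate = infection_rate + recovery_rate
        # Waiting time to the next event is exponentially distributed.
        t += rng.expovariate(total_rate)
        if t > t_end:
            break  # experiment terminated before the epidemic dies out
        if rng.random() < infection_rate / total_rate:
            s, i = s - 1, i + 1  # a contact becomes infected
        else:
            i, r = i - 1, r + 1  # an infected individual recovers
        history.append((t, s, i, r))
    return history
```

Calling `simulate_contact_group(3, 17, beta=0.5, gamma=0.1, t_end=float("inf"))` runs the epidemic until it dies out, so the final state contains only susceptible and recovered individuals, as in panel c. A finite `t_end` mimics an experiment that is terminated early.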
List of key parameters and quantities
| Type | Parameter | Description |
|---|---|---|
| Experimental design | | Number of contact groups |
| | | Number of seeders (initially infected individuals) in each contact group |
| | | Number of contacts (initially susceptible individuals) in each contact group |
| | | Total number of individuals per group |
| | | Total number of individuals |
| | | Proportion of homozygotes (i.e. |
| | | Average proportion of homozygotes across groups |
| | | Homozygote balance (i.e. the proportion of |
| | | Average homozygote balance across groups |
| Population-wide epidemiological parameters | | Population average transmission rate |
| | | Population average recovery rate |
| | | Shape parameter that characterises the dispersion in infection durations of different individuals |
| Individual-based epidemiological traits for individual | | Force of infection (probability per unit time to become infected) |
| | | Mean of gamma distributed recovery time |
| | | Fractional deviation in susceptibility, infectivity and recoverability |
| SNP | | SNP-based contribution to |
| | | SNP effects, i.e. half the change in |
| | | Scaled dominance factors (1 = |
| Fixed effects | | Vectors of fixed effects for the three traits |
| | | Design matrix for fixed effects |
| Residuals | | Residual contributions to |
| | | Covariance matrix of residual contributions |
| Group effects | | Group effects (accounts for differences in transmission rates in different contact groups) |
| | | Standard deviation in group effects |
| Bayesian model | | Set of all model parameters |
| | | Set of all events (infection and recovery / death times) which may be unknown, i.e. latent variables in the model |
| Other parameters used in the analyses | | Total number of infections during experiment |
| | | Fraction of contacts that become infected |
| | | Proportion of the total number of infections accounted for by seeders, i.e. |
| | | Fisher information matrix |
| | | Average over contact groups |
| | | Average over entire infected population (including seeders as well as those individuals infected during epidemics) |
Data/model scenarios
| Data | Design | Residual | Group effect | Fixed effect | Information source |
|---|---|---|---|---|---|
| Inf. + Rec. | Single group (no dominance estimate) | ✕ | ✕ | ✕ | Figure |
| Inf. + Rec. | Pure (no dominance estimate) | ✕ | ✕ | ✕ | Figure |
| Inf. + Rec. | Mixed (no dominance estimate) | ✕ | ✕ | ✕ | Figure |
| Inf. + Rec. | Pure/mixed (no dominance estimate) | ✓ | ✓ | ✓ | Figure |
| Inf. + Rec. | Pure (dominance estimate) | ✕ | ✕ | ✕ | Additional file |
| Inf. + Rec. | Mixed (dominance estimate) | ✕ | ✕ | ✕ | Additional file |
| Inf. + Rec. | Pure/mixed (no dominance estimate) | ✓ | ✕ | ✕ | Additional file |
| Inf. + Rec. | Pure/mixed (no dominance estimate) | ✕ | ✓ | ✕ | Additional file |
| Inf. + Rec. | Pure/mixed (no dominance estimate) | ✕ | ✕ | ✓ | Additional file |
| Rec. (no Inf.) | Pure/mixed (no dominance estimate) | ✕ | ✕ | ✕ | Additional file |
| Periodic DS checks | Pure/mixed (no dominance estimate) | ✕ | ✕ | ✕ | Additional file |
| Inf. + Rec. | Pure/mixed (no dominance estimate) | ✓ | ✓ | ✓ | Additional file |
| Inf. + Rec. | Pure/mixed (dominance estimate) | ✓ | ✓ | ✓ | Additional file |
| Inf. + Rec. | HWE | ✕ | ✕ | ✕ | Additional file |
This table summarises all the data/model scenarios used in this paper. The columns are as follows: Data (“Inf. + Rec.” means that infection and recovery times of all individuals are assumed to be known exactly, “Rec.” means only recovery times are known, and “Periodic DS checks” means the disease status of individuals is periodically checked); Design (this includes the five optimal designs illustrated in Fig. 2 as well as “HWE”, in which individuals are randomly allocated genotypes assuming Hardy–Weinberg equilibrium); Residual (a tick (✓) indicates that the model incorporates the residuals in Eq. (2)); Group effect (a tick (✓) indicates that the model incorporates the random group effect in Eq. (1)); Fixed effect (a tick (✓) indicates that the model incorporates a fixed effect in Eq. (2)); and Information source (the figure in the main text or the Additional file that relates to the corresponding scenario)
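The “HWE” design allocates genotypes at random assuming Hardy–Weinberg equilibrium. As a hedged illustration, the sketch below draws genotypes with the standard Hardy–Weinberg probabilities p² (AA), 2pq (AB) and q² (BB), where p is the frequency of allele A and q = 1 − p. The function name `sample_hwe_genotypes` and its arguments are hypothetical, and the allele frequency used in the paper is not stated in this section.

```python
import random


def sample_hwe_genotypes(n_individuals, freq_a, seed=0):
    """Randomly allocate genotypes under Hardy-Weinberg equilibrium.

    freq_a is the frequency p of allele A (an assumed input); genotype
    probabilities are p^2 (AA), 2pq (AB) and q^2 (BB) with q = 1 - p.
    """
    if not 0.0 <= freq_a <= 1.0:
        raise ValueError("freq_a must lie between 0 and 1")
    rng = random.Random(seed)
    p, q = freq_a, 1.0 - freq_a
    weights = [p * p, 2.0 * p * q, q * q]
    return rng.choices(["AA", "AB", "BB"], weights=weights, k=n_individuals)
```

For example, `sample_hwe_genotypes(1000, 0.5)` yields roughly 25% AA, 50% AB and 25% BB individuals, in contrast to the optimised designs, which concentrate individuals in the homozygous classes.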
Fig. 2 Optimal experimental designs. This figure shows the optimal composition of the seeder and contact populations for different experimental designs: a Single contact group design: ~ 15% of individuals are seeders, where seeders have genotype BB (or AA) and contacts predominantly have genotype AA (or BB), with ~ 10% BB, to allow for estimation of the susceptibility SNP effect. Estimation of dominance was found to be challenging using only a single contact group (not shown). b Multiple groups “pure” design: ~ 47% of individuals are seeders. Seeders and contacts consist of different combinations of AA and BB across groups (and AB when dominance is investigated). c Multiple groups “mixed” design: a small number of individuals are seeders (typically two or three, sufficient to initiate epidemics). When dominance is not investigated, there is an 83%/17% split in AA/BB individuals in the contact population in group 1 and vice versa in group 2. When dominance is investigated, there is an 80%/10%/10% split in AA/AB/BB individuals in the contact population in group 1, and these proportions are permuted to define the two other groups. Optimisation of these designs was (for the most part) based on maximising the precision with which the infectivity SNP effect can be estimated (since this was generally the most difficult trait to estimate). However, in cases where maximal precision for the infectivity SNP effect corresponds to minimal precision for the susceptibility SNP effect, values are chosen to give equal precision to the two (e.g. ~ 10% BB in a, as discussed in the paper). The percentages above are, to a large extent, independent of (see Additional file 8) or other factors in the model/data (see Additional file 11). For reference, the optimal homozygote balance (i.e. proportion of AA minus BB individuals) and homozygosity (i.e. proportion of AA plus BB individuals) are shown for each design (the ‘’ symbols indicate that these are optimal values to be aimed for, accounting for the fact that the number of individuals is discrete). The same basic designs can be replicated multiple times within an experiment. Note that the results equally apply to the estimation of non-genetic factors, e.g. vaccination effects (AA replaced with “Vac.” and BB replaced with “Unvac.” and dominance not applicable). The spatial separation between seeders and contacts in this diagram is for illustrative purposes only
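The two summary quantities in this caption, homozygote balance (proportion of AA minus BB individuals) and homozygosity (proportion of AA plus BB individuals), are simple to compute from a list of genotypes. The sketch below is illustrative only: the function name `homozygote_summary` is hypothetical, and it assumes both proportions are taken over all individuals in the list supplied (for example, the contacts of one group).

```python
from collections import Counter


def homozygote_summary(genotypes):
    """Return (homozygote balance, homozygosity) for a list of genotypes.

    balance      = (number of AA - number of BB) / number of individuals
    homozygosity = (number of AA + number of BB) / number of individuals
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("genotypes must not be empty")
    counts = Counter(genotypes)
    balance = (counts["AA"] - counts["BB"]) / n
    homozygosity = (counts["AA"] + counts["BB"]) / n
    return balance, homozygosity


# Twelve contacts split 10 AA / 2 BB, i.e. roughly the 83%/17% split
# described for group 1 of the "mixed" design without dominance.
balance, homozygosity = homozygote_summary(["AA"] * 10 + ["BB"] * 2)
```

Because group sizes are discrete, a target such as 83%/17% can only be met approximately. With 12 contacts, the closest allocation is 10 AA and 2 BB, giving a balance of 8/12 and a homozygosity of 1.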
Fig. 3 Single contact group design. Precision estimates for the single contact group design (no dominance estimate) in Fig. 2a. The left, middle and right columns show graphs for standard deviations (SDs) in the posterior distributions for SNP effects for susceptibility, infectivity and recoverability under different scenarios. a The fraction of seeder individuals is varied (arbitrarily fixing , ). b The composition of SNP genotypes in the seeder population is changed by varying (fixing and ). Here the left-hand edge of the graph corresponds to the case when all seeders are BB and the right-hand edge to the case when they are all AA (points in between represent a mixture of the two). c The composition of SNP genotypes in the contact population is changed by varying (fixing and ). Note that a low SD implies high precision. Dashed lines represent analytical results and crosses refer to posterior estimates from simulated data (see Additional file 1). refers to the total number of individuals
Fig. 4 The “pure” design. Precision estimates for the pure design (no dominance estimate) consisting of four contact groups per replicate with homogeneous seeder/contact SNP genotypes illustrated in Fig. 2b. The left, middle and right columns show graphs for standard deviations (SDs) in the posterior distributions for the SNP effects for susceptibility, infectivity and recoverability under different scenarios. a The fraction of seeder individuals in each contact group is varied. b The number of experimental replicates, each consisting of four contact groups, is varied (with smaller group sizes keeping the total number of individuals approximately fixed). Dashed lines represent analytical results and crosses refer to posterior estimates from simulated data (see Additional file 1). refers to the total number of individuals
Fig. 5 The “mixed” design. Precision estimates for the mixed design (no dominance estimate) consisting of two contact groups per replicate with homogeneous seeder SNP genotypes and heterogeneous contact SNP genotypes illustrated in Fig. 2c. The left, middle and right columns show graphs for standard deviations (SDs) in the posterior distributions for SNP effects for susceptibility, infectivity and recoverability under different scenarios. a The composition of SNP genotypes in the contact population in group 1 is changed by varying χcont,1 while using the opposite value in group 2 and . b The fraction of seeders is varied (fixing ). Dashed lines represent analytical results and crosses refer to posterior estimates from simulated data (see Additional file 1). refers to the total number of individuals.
Parameter precision estimates
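| Design | SD in susceptibility SNP effect | SD in infectivity SNP effect | SD in recoverability SNP effect | SD in susceptibility dominance | SD in infectivity dominance | SD in recoverability dominance |
|---|---|---|---|---|---|---|
| Single group (no dominance estimate) | | | | ∞ | ∞ | ∞ |
| Pure design (no dominance estimate) | | | | ∞ | ∞ | ∞ |
| Pure design (dominance estimate) | | | | | | |
| Mixed design (no dominance estimate) | | | | ∞ | ∞ | ∞ |
| Mixed design (dominance estimate) | | | | | | |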
This table provides analytically derived estimates for parameter precisions (as measured by the posterior standard deviations (SDs) in the SNP effects and dominance parameters for susceptibility, infectivity and recoverability) for the optimum designs outlined in Fig. 2
Fig. 6 Partitioning contributions to standard deviations for estimates of SNP effects. Residuals, group effects and a fixed effect are sequentially added to the basic SNP-only model (infection and recovery times assumed known). The corresponding increase in the SDs in the posterior distributions for SNP effects is investigated for a the susceptibility, b the infectivity, and c the recoverability. For comparison, four different scenarios are investigated: a pure design (no dominance estimate) with respectively (i.e. a single replicate of the basic design) and (i.e. three replicates of the basic design) and a mixed design (no dominance estimate) with (i.e. two replicates) and (i.e. six replicates). In each case, ~ 1000 individuals were partitioned equally among the contact groups. The residuals were chosen to have the covariance matrix , , , and , the group effects had a SD of , and the fixed effect (assumed to represent sex, allocated at random) had a size . Results were found to be largely insensitive to these essentially arbitrary choices
Fig. 7 Precision Calculator tool. SIRE-PC (susceptibility infectivity recoverability estimation precision calculator) is an easy-to-use online software tool that evaluates the analytical expressions provided in the "Results" section to aid experimental design