M Alamil, J Hughes, K Berthier, C Desbiez, G Thébaud, S Soubeyrand.
Abstract
Pathogen sequence data have been exploited to infer who infected whom, by using empirical and model-based approaches. Most of these approaches exploit one pathogen sequence per infected host (e.g. individual, household, field). However, modern sequencing techniques can reveal the polymorphic nature of within-host populations of pathogens. Thus, these techniques provide a subsample of the pathogen variants that were present in the host at the sampling time. Such data are expected to give more insight into epidemiological links than a single sequence per host. In general, a mechanistic viewpoint on transmission and micro-evolution has been followed to infer epidemiological links from these data. Here, we investigate an alternative approach grounded in statistical learning. The idea consists of learning the structure of epidemiological links with a pseudo-evolutionary model applied to training data obtained from contact tracing, for example, and using this initial stage to infer links for the whole dataset. Such an approach has the potential to be particularly valuable when there is a risk of erroneous mechanistic assumptions; it is sufficiently parsimonious to allow the handling of big datasets in the future, and it is versatile enough to be applied to very different contexts from animal, human and plant epidemiology. This article is part of the theme issue 'Modelling infectious disease outbreaks in humans, animals and plants: approaches and important themes'. This issue is linked with the subsequent theme issue 'Modelling infectious disease outbreaks in humans, animals and plants: epidemic forecasting and control'.
Keywords: contact information; infectious disease; pathogen spread; training data; transmission trees; within-host pathogen diversity
Year: 2019 PMID: 31056055 PMCID: PMC6553606 DOI: 10.1098/rstb.2018.0258
Source DB: PubMed Journal: Philos Trans R Soc Lond B Biol Sci ISSN: 0962-8436 Impact factor: 6.237
Figure 1. Transmissions inferred in the naive and vaccinated chains with two different pairs of training hosts for calibrating the penalization. Panel (a) corresponds to the naive chain using pair 106–112 as training hosts (i.e. the last group of the chain); (b) naive chain, pair 111–108; (c) vaccinated chain, pair 400–413 (i.e. the last group of the chain); (d) vaccinated chain, pair 401–416. Training hosts are written in bold. The thickness of each arrow is proportional to the intensity of the corresponding inferred link. (Online version in colour.)
Figure 2. Estimated intensities of links for all recipients (a; vertical line: median intensity) and for each recipient in the training set of hosts (b–f; vertical lines: intensity for the source identified with contact tracing). This figure was obtained from the combined analysis of 31 sequence fragments and with cross-validation. Analogous figures obtained without cross-validation and with half of the fragments are given in electronic supplementary material, figures S6–S8. The second half of fragments led to approximately the same results for training hosts. Note in addition that using only one fragment for inferring transmissions led to particularly stochastic outputs. (Online version in colour.)
Figure 3. Most likely epidemiological links cumulating to 20% probability for each recipient (i.e. for each recipient, potential donors were ranked by link intensity, and the highest-ranked donors whose link intensities summed to 0.2 were displayed on the graph). (Online version in colour.)
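The selection rule described in the Figure 3 caption can be sketched as follows. This is only an illustrative reading of the caption, not the authors' code: the function name `top_links`, the dictionary input format, and the assumption that each recipient's link intensities are non-negative and sum to at most 1 are all hypothetical.

```python
def top_links(intensities, threshold=0.2):
    """Select the highest-ranked donors for one recipient.

    `intensities` maps a donor label to its inferred link intensity
    (hypothetical input format, assumed non-negative). Donors are ranked
    by decreasing intensity and kept until the cumulative intensity
    reaches `threshold` (0.2 in the Figure 3 caption).
    """
    # Rank potential donors from strongest to weakest link.
    ranked = sorted(intensities.items(), key=lambda kv: kv[1], reverse=True)

    selected = []
    cumulative = 0.0
    for donor, weight in ranked:
        selected.append((donor, weight))
        cumulative += weight
        # Stop as soon as the selected links cover the threshold.
        if cumulative >= threshold:
            break
    return selected


# Made-up intensities for a single recipient, for illustration only.
example = {"host_A": 0.12, "host_B": 0.10, "host_C": 0.05}
links = top_links(example)
```

With these made-up values, the first donor alone does not reach 0.2, so the two strongest donors are retained. A recipient with one dominant donor would keep a single link, which is why the number of arrows per recipient can vary in such a graph.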
Figure 4. Links inferred between salsify patches based on sampled sets of potyvirus sequences (a; links from the same source have the same colour) and distribution of link distances (b; the vertical red line gives the mean distance). (Online version in colour.)