Bisakha Ray, Elodie Ghedin, Rumi Chunara.
Abstract
Network inference problems are common in biomedical subfields such as genomics, metagenomics, neuroscience, and epidemiology. Networks are useful for representing a wide range of complex interactions, from those between molecular biomarkers, neurons, and microbial communities to those found in human or animal populations. Recent technological advances have produced a growing amount of healthcare data in multiple modalities, making network inference problems more prevalent. Multi-domain data can now be used to improve the robustness and reliability of networks recovered from unimodal data. For infectious diseases in particular, a body of work has focused on combining multiple pieces of linked information. Combining or analyzing disparate modalities in concert has yielded greater insight into disease transmission than could be obtained from any single modality in isolation. This has been particularly helpful in understanding incidence and transmission at early stages of infections that have pandemic potential. Novel pieces of linked information in the form of spatial, temporal, and other covariates, including high-throughput sequence data, clinical visits, social network information, pharmaceutical prescriptions, and clinical symptoms (reported as free-text data), also encourage further investigation of these methods. The purpose of this review is to provide an in-depth analysis of multimodal infectious disease transmission network inference methods, with a specific focus on Bayesian inference. We focus on analytical Bayesian inference-based methods because they enable recovering multiple parameters simultaneously: not just the disease transmission network, but also parameters of epidemic dynamics. We examine the assumptions, key inference parameters, and limitations of these methods, and ultimately provide insights for improving future network inference methods in multiple applications.
Keywords: Bayesian inference; Infectious disease; Multimodal data; Network inference; Transmission
Year: 2016 PMID: 27612975 PMCID: PMC7106161 DOI: 10.1016/j.jbi.2016.09.004
Source DB: PubMed Journal: J Biomed Inform ISSN: 1532-0464 Impact factor: 6.317
Fig. 1. Examples of multimodal network inference methods in different applications. Different modalities of data have been integrated in several applications for inferring specific networks. Most network inference methods focus on recovering network topology.
Fig. 2. Modeling transmission of infectious diseases, an area in which use of multiple modalities of data has been developed. (a) Several key questions can be answered, such as who infected whom or how the infection spread through the population or region. (b) Possible inputs to the model include pathogen genomic sequences, spatial and temporal information, point-of-care diagnostic information, and mobile health information. The data are brought together in multimodal network inference frameworks. (c) Some possible outputs are the transmission tree, latency period, epidemic reproduction number, phylogenetic tree, and proportion of infected hosts sampled.
Fig. 3. Study design and inclusion-exclusion criteria. This is a decision tree showing our searches and selection criteria for both PubMed and Google Scholar. We focused only on genomic epidemiology methods utilizing Bayesian inference for infectious disease transmission.
Summary of network inference methods used to date in infectious disease modeling.
| Literature source | Location | Time span | Pathogen | Sample size | Assumptions | Inferred Parameters |
|---|---|---|---|---|---|---|
| Cottam et al. (2008) | Durham area | 2001 | FMD | 22 | Farm infectiousness is not quantified; different animals may have different levels of infectiousness | Transmission tree; infection dates; most likely period of infectiousness; spatial dependence; probability of transitions, transversions, deletions |
| Ypma et al. (2012) | The Netherlands | 2003 | Avian influenza A (H7N7) | 185 | Mutations happen before or shortly after infection; the mutation rate is constant | Transmission tree; rate of decline of infectiousness; kernel parameters for scale and shape of spatial kernel; expected number of transitions; expected number of transversions; probability of deletion |
| Morelli et al. (2012) | Durham County; Surrey and Berkshire, UK | 2001; 2007 | FMD | 12 premises; 8 premises | Prior centered on and sensitive to lesion age | Transmission tree; infection times; latency duration; duration from infectiousness to detection |
| Ypma et al. (2013) | Durham County, England | 2001 | FMD | 12 premises | Within-host diversity different from genetic diversity | Phylogenetic tree; transmission tree; epidemiological parameters; mutational parameters; infection times |
| Teunis et al. (2013) | Netherlands | December 2002–December 2007 | Norovirus | 160 | Likelihood proportional to product of conditional probability density and entry from transition probability matrix | Transmission tree; reproductive number |
| Didelot et al. (2014) | British Columbia | 2004–2011 | Tuberculosis | 40 | All cases comprising an outbreak have been sampled | Transmission tree; rate of infectivity; rate of removal; effective population size; duration of replication cycle |
| Mollentze et al. (2014) | KwaZulu-Natal province, South Africa | 1 March 2010–8 June 2011 | Rabies virus | 195 | Observation date is shortly after infection date | Transmission tree; population size |
| Jombart et al. (2014) | Singapore | 2003 | SARS | 15 | Densely sampled outbreak; distribution of generation time known; time from infection to sample collection known | Transmission tree; superspreaders; mutation rates; separate introductions of the pathogen; unobserved cases; effective reproduction number |
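
As a rough illustration of how methods of this kind combine modalities, the following Python sketch scores each candidate infector of a case by multiplying a temporal, a spatial, and a genetic likelihood term, then normalizes the scores into posterior ancestry probabilities under a flat prior. It is not the model of any study in the table: the gamma generation-interval density, the exponential spatial kernel, the Poisson substitution model, all parameter values, and the three toy cases are assumptions chosen for illustration, and each case's infector is treated independently rather than sampled jointly by MCMC.

```python
import math


def gamma_pdf(x, shape, scale):
    """Gamma density, used here as an assumed generation-interval distribution."""
    if x <= 0:
        return 0.0
    return x ** (shape - 1) * math.exp(-x / scale) / (math.gamma(shape) * scale ** shape)


def poisson_pmf(k, mean):
    """Poisson probability of observing k substitutions."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def hamming(a, b):
    """Number of differing sites between two aligned sequences."""
    return sum(c1 != c2 for c1, c2 in zip(a, b))


def pair_likelihood(infector, infectee, gen_shape=2.0, gen_scale=2.5,
                    spatial_scale=5.0, subs_per_day=0.5):
    """Likelihood that `infector` infected `infectee` (all parameters hypothetical)."""
    dt = infectee["t"] - infector["t"]
    if dt <= 0:  # an infector must be observed before its infectee
        return 0.0
    temporal = gamma_pdf(dt, gen_shape, gen_scale)
    spatial = math.exp(-math.dist(infector["xy"], infectee["xy"]) / spatial_scale)
    genetic = poisson_pmf(hamming(infector["seq"], infectee["seq"]), subs_per_day * dt)
    return temporal * spatial * genetic


def ancestry_posteriors(cases):
    """For each case, posterior probability of each candidate infector (flat prior)."""
    posteriors = {}
    for j, infectee in cases.items():
        weights = {i: pair_likelihood(c, infectee) for i, c in cases.items() if i != j}
        total = sum(weights.values())
        if total == 0:  # no admissible infector: treat as an index case
            posteriors[j] = {}
        else:
            posteriors[j] = {i: w / total for i, w in weights.items() if w > 0}
    return posteriors


# Toy outbreak: onset day, coordinates (arbitrary units), aligned sequence.
cases = {
    "A": {"t": 0.0, "xy": (0.0, 0.0), "seq": "AAAAAAAA"},
    "B": {"t": 3.0, "xy": (1.0, 0.0), "seq": "AAAAAAAT"},
    "C": {"t": 6.0, "xy": (1.0, 1.0), "seq": "AAAAAATT"},
}

posteriors = ancestry_posteriors(cases)

# Expected number of secondary cases per case: a crude per-case reproduction number.
offspring = {i: sum(p.get(i, 0.0) for p in posteriors.values()) for i in cases}

for case_id, probs in posteriors.items():
    print(case_id, probs or "index case", "expected offspring:", round(offspring[case_id], 3))
```

Summing the posterior probabilities over infectees, as `offspring` does, gives each case's expected number of secondary infections. A full Bayesian treatment would also place priors on the kernel and mutation parameters and estimate them jointly with the tree.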
Fig. 4. Different spatial and genomic resolutions utilized to study disease spread. (a) Regions of interest considered for different studies. Influenza studies considered world-wide spread, SARS was studied in Singapore, the tuberculosis (TB) dataset was from British Columbia, norovirus was studied in a university hospital in the Netherlands, and foot-and-mouth disease (FMD) in 12 farms in Durham. (b) Different genomic sequencing platforms utilized in studies. For the TB study, whole genome sequencing was performed on the Illumina HiSeq platform with the M. tuberculosis CDC1551 reference sequence, and reads were aligned using the Burrows-Wheeler Aligner algorithm. SARS DNA sequences were obtained from GenBank and aligned using MUSCLE. For avian influenza, RNA consensus sequences of the hemagglutinin, neuraminidase, and polymerase PB2 genes were sequenced. For H1N1 influenza, isolates were typed for hemagglutinin (HA) and neuraminidase (NA) genes.
Summary of gaps in existing inference techniques and suggestions for future research.
| | Presently available data and methods | Suggestions for future research |
|---|---|---|
| Genomic | Pathogen genomic sequences are largely consensus in nature | Use deep sequencing for within-host identification of minor variants |
| Spatial | Individual to individual; farm to farm; country to country | Use community-level resolution such as household to household, zipcode to zipcode, or neighborhood-based geographical locations, which are reasonable for targeting of public-health interventions |
| Methods | Fitted to disease; small sample size; biased towards the most severe cases | Perform power analysis to identify sample size for inference. Reduce selection bias in data by generating it from the community, which captures a wide range of infections. Incorporate supplementary information such as social networking, point-of-care data, and electronic medical record (EMR) data. Social networking data can capture social and family contact structures, which can augment information about how transmission spreads. Point-of-care data can be utilized where access to clinics is not available or feasible. EMR data includes information such as family, social, and medication history |
| Data Generation | Clinical | Community-generated data from the wide range of cases in the community who do not necessarily report to a clinic or show symptoms; crowdsourced data, which includes multitudes of factors such as social network structures and mobility data; participatory self-reported data |
| Parameters | Transmission tree; rate of infectiousness; rate of recovery; proportion of infected hosts sampled; genetic outliers; superspreaders | Community parameters capturing location- or neighborhood-based infectiousness and transmissibility, essential for proactive intervention such as quarantine and vaccination; incorporate population stochastics such as mobility and transportation; foreign exports |