J Walker Gussler1,2, David S Campo3, Zoya Dimitrova1, Pavel Skums2, Yury Khudyakov1.
Abstract
BACKGROUND: Identifying the primary case in an outbreak is crucial for interrupting and preventing the transmission of infectious diseases. Primary cases may have a higher risk of participating in near-future transmission events than the other patients in the outbreak, so directing more transmission-prevention resources towards them is a priority. Although genetic characterization of intra-host viral populations can aid the identification of transmission clusters, determining the directionality of transmission during outbreaks is not trivial, owing to the complexity of viral evolution. Here, we present a new computational framework, PYCIVO: primary case inference in viral outbreaks. This framework expands upon our earlier work on QUENTIN, which builds a probabilistic disease transmission tree by simulating the evolution of intra-host hepatitis C virus (HCV) variants between cases involved in direct transmission during an outbreak. PYCIVO improves upon QUENTIN by adding a custom heterogeneity index and by identifying the scenario in which the primary case may not have been sampled.
Year: 2022 PMID: 35135469 PMCID: PMC8822801 DOI: 10.1186/s12859-022-04585-2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Example of a bridge reconstruction using maximum parsimony. If the red variant A were to mutate into the green variant B, the gray intermediate variants would represent the most parsimonious path between the two variants
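The idea of a most parsimonious bridge can be illustrated with a minimal Python sketch. It assumes aligned sequences of equal length and substitutions only; the function names `hamming` and `bridge` and the example sequences are hypothetical, and this is not necessarily how PYCIVO reconstructs intermediates.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def bridge(a: str, b: str) -> list[str]:
    """Return one shortest chain of single-substitution variants from a to b.

    Differing sites are substituted left to right. Any other order gives an
    equally short path, so this is only one of several parsimonious bridges.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    path = [a]
    current = list(a)
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            current[i] = y
            path.append("".join(current))
    return path


# Hypothetical variants that differ at two sites.
variant_a = "ACGTAC"
variant_b = "ATGTCC"
print(bridge(variant_a, variant_b))
```

The number of steps in the returned path equals the Hamming distance between the two variants, which is what makes the path most parsimonious under a substitution-only model.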
Fig. 2 Visualization of the evolutionary simulation. Dots represent unique variants. The simulation is a random walk that starts from the green potential infector variants; the gray intermediate variants and the red potential infectee variants are initially unexplored. As the simulation proceeds, dots turn blue as they are explored. The simulation is complete when all potential infectee variants have been reached, that is, when all the red dots have turned blue. The top three frames show sample A evolving into sample B. The bottom three frames show sample B evolving into sample A. Because the simulation from A to B finishes faster, it is easier for A to evolve into B, so we would estimate that A transmitted the virus to B rather than vice versa
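The directional comparison described in this caption can be mimicked with a toy random walk. The sketch below is only an illustration under strong assumptions: the variant graph, the sample names, and the functions `steps_to_reach_all` and `mean_steps` are hypothetical, and a uniform random walk on a small graph stands in for the actual evolutionary simulation.

```python
import random


def steps_to_reach_all(graph, sources, targets, rng, max_steps=100_000):
    """Count random-walk steps from a source variant until every target is visited.

    graph maps each variant to the list of variants one mutation away.
    """
    current = rng.choice(sorted(sources))
    unexplored = set(targets) - {current}
    steps = 0
    while unexplored and steps < max_steps:
        current = rng.choice(graph[current])  # move to a random neighbour
        unexplored.discard(current)           # the visited dot "turns blue"
        steps += 1
    return steps


def mean_steps(graph, sources, targets, runs=500, seed=0):
    """Average the step count over repeated walks with a fixed seed."""
    rng = random.Random(seed)
    total = sum(steps_to_reach_all(graph, sources, targets, rng) for _ in range(runs))
    return total / runs


# Hypothetical variant graph: sample A at one end of a chain, sample B at
# the other, joined by one intermediate variant m1.
graph = {
    "a1": ["a2"],
    "a2": ["a1", "m1"],
    "m1": ["a2", "b1", "b2"],
    "b1": ["m1", "b2"],
    "b2": ["m1", "b1"],
}
sample_a = {"a1", "a2"}
sample_b = {"b1", "b2"}

a_to_b = mean_steps(graph, sample_a, sample_b)
b_to_a = mean_steps(graph, sample_b, sample_a)
print(f"A -> B: {a_to_b:.1f} steps, B -> A: {b_to_a:.1f} steps")
```

Under the caption's reasoning, the direction with the smaller average step count would be taken as the more plausible direction of transmission.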
Fig. 3 Distribution of PYCIVO distance measurements in our dataset. Most sample pairs fall either well below or well above the empirically derived T value of 2000. Evolutionary distances used in PYCIVO are plotted against the minimum Hamming distances between samples. The threshold for transmission is indicated by vertical and horizontal lines. Only 827 of the 5460 sample pairs in our study were linked by the PYCIVO distance metric; 817 of these were also linked by minimum Hamming distance
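As a rough illustration of the minimum Hamming distance between two samples, the following Python sketch compares every variant in one sample with every variant in the other. The sample contents and the function names are hypothetical, and the sequences are assumed to be aligned and of equal length.

```python
from itertools import combinations, product
from math import comb


def hamming(a: str, b: str) -> int:
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def min_hamming(sample_a, sample_b) -> int:
    """Smallest Hamming distance over all cross-sample variant pairs."""
    return min(hamming(x, y) for x, y in product(sample_a, sample_b))


# Hypothetical intra-host variant sets for three cases.
samples = {
    "case1": ["AAAA", "AAAT"],
    "case2": ["AATT", "TTTT"],
    "case3": ["CCCC"],
}

# One distance per unordered pair of cases.
pair_distances = {
    (p, q): min_hamming(samples[p], samples[q])
    for p, q in combinations(sorted(samples), 2)
}
print(pair_distances)

# Number of unordered pairs among n samples.
print(comb(len(samples), 2), comb(105, 2))
```

The per-outbreak table lists 105 cases in total, and `comb(105, 2)` equals 5460, which matches the count given in the caption.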
Feature performance
| Feature | Performance |
|---|---|
| K-mer entropy | 9/11 |
| Average nucleotide entropy | 9/11 |
| Mean Hamming distance | 9/11 |
| Max Hamming distance | 9/11 |
| Nucleotide diversity | 8/11 |
| Haplotype entropy | 8/11 |
| Mutation frequency | 8/11 |
| 1-step component entropy | 8/11 |
| Epistasis coefficient | 6/11 |
| Frequency entropy | 3/11 |
| Hill numbers | 3/11 |
| Mean consensus | 2/11 |
| Simpson index | 2/11 |
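Two of the listed features can be sketched with standard Shannon-entropy definitions. The definitions below (base-2 logarithm, frequency-weighted per-site entropy averaged over sites) are assumptions and may differ from the exact formulas used in PYCIVO; the variant counts are hypothetical.

```python
from collections import Counter
from math import log2


def shannon(probs) -> float:
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)


def haplotype_entropy(counts: dict[str, int]) -> float:
    """Entropy of the variant (haplotype) frequency distribution."""
    total = sum(counts.values())
    return shannon(c / total for c in counts.values())


def average_nucleotide_entropy(counts: dict[str, int]) -> float:
    """Mean per-site entropy, weighting each variant by its read count."""
    total = sum(counts.values())
    length = len(next(iter(counts)))
    site_entropies = []
    for i in range(length):
        site = Counter()
        for seq, c in counts.items():
            site[seq[i]] += c
        site_entropies.append(shannon(n / total for n in site.values()))
    return sum(site_entropies) / length


# Hypothetical intra-host population: variant -> read count.
population = {"ACGT": 6, "ACGA": 3, "TCGT": 1}
print(haplotype_entropy(population))
print(average_nucleotide_entropy(population))
```

Both functions take the same variant-count mapping, so further heterogeneity features could be added alongside them in the same style.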
Results per outbreak
| Id | n of cases | n of features | Primary present | Primary absent |
|---|---|---|---|---|
| 1 | 33 | 0 | NPC | NPC |
| 2 | 19 | 7 | HC | NPC |
| 3 | 15 | 8 | HC | NPC |
| 4 | 9 | 0 | NPC | NPC |
| 5 | 7 | 8 | LC | NPC |
| 6 | 6 | 7 | HC | NPC |
| 7 | 4 | 8 | LC | NPC |
| 8 | 4 | 8 | LC | NPC |
| 9 | 3 | 8 | LC | LC |
| 10 | 3 | 7 | LC | LC |
| 11 | 2 | 8 | LC | N/A |
The last two columns show the predicted label.
Results per label
| Label | Primary present | Primary absent |
|---|---|---|
| HC | 3/11 | 0/10 |
| LC | 6/11 | 2/10 |
| NPC | 2/11 | 8/10 |
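The per-label counts follow directly from the per-outbreak table. A short Python tally makes the arithmetic explicit; the labels are copied from the table above, the variable names are hypothetical, and the outbreak marked N/A is excluded from the primary-absent totals, which gives the denominator of 10.

```python
from collections import Counter

# (outbreak id, label with primary present, label with primary absent)
rows = [
    (1, "NPC", "NPC"),
    (2, "HC", "NPC"),
    (3, "HC", "NPC"),
    (4, "NPC", "NPC"),
    (5, "LC", "NPC"),
    (6, "HC", "NPC"),
    (7, "LC", "NPC"),
    (8, "LC", "NPC"),
    (9, "LC", "LC"),
    (10, "LC", "LC"),
    (11, "LC", "N/A"),
]

present = Counter(label for _, label, _ in rows)
absent = Counter(label for _, _, label in rows if label != "N/A")

for label in ("HC", "LC", "NPC"):
    print(
        f"{label}: {present[label]}/{sum(present.values())} present, "
        f"{absent[label]}/{sum(absent.values())} absent"
    )
```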