Atkinson G Longmire1,2, Seth Sims1,3,2, Inna Rytsareva1, David S Campo4, Pavel Skums1,3, Zoya Dimitrova1, Sumathi Ramachandran1, Magdalena Medrzycki1, Hong Thai1, Lilia Ganova-Raeva1, Yulin Lin1, Lili T Punkova1, Amanda Sue1, Massimo Mirabito5,2, Silver Wang5,2, Robin Tracy5,2, Victor Bolet6, Thom Sukalac5, Chris Lynberg7, Yury Khudyakov1.
Abstract
BACKGROUND: Hepatitis C is a major public health problem in the United States and worldwide. Outbreaks of hepatitis C virus (HCV) infections associated with unsafe injection practices, drug diversion, and other exposures to blood are difficult to detect and investigate. Effective HCV outbreak investigation requires comprehensive surveillance and robust case investigation. We previously developed and validated a methodology for the rapid and cost-effective identification of HCV transmission clusters. Global Hepatitis Outbreak and Surveillance Technology (GHOST) is a cloud-based system enabling users, regardless of computational expertise, to analyze and visualize transmission clusters in an independent, accurate and reproducible way.
Keywords: Cloud; HCV; HVR1; Liver cancer; Outbreak detection; Public health; Surveillance; Threshold; Transmission; Virtual diagnostics
Year: 2017 PMID: 29244005 PMCID: PMC5731493 DOI: 10.1186/s12864-017-4268-3
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1. Screen captures of the two main types of tasks within the GHOST web interface. Left shows the Quality Control task. Right shows the Analysis task
Fig. 2. Flowchart depicting processing steps for a Quality Control task
Fig. 3. The technological layout of the GHOST system. From left to right: user uploading Illumina demultiplexed sequence data; front-end public exposure, authentication, and message forwarding; middle-tier control, messaging, and data management within the CDC; backend computation and control management within the AWS environment
Fig. 4. Read count and quality statistics for samples used in runtime testing. Left-side Y-axis represents the number of reads. Right-side Y-axis represents the average PHRED quality score
Fig. 5. Runtime measurements for Quality Control tasks spanning subsampling levels of 10,000–100,000 read pairs. Y-axis in minutes
Fig. 6. Total processing time for all samples of the specified subsampling level to complete. Y-axis in minutes
Fig. 7. The all-pairs minimum distance calculation portion of the Analysis task time with varying numbers of input samples. Best-fit line calculated using quadratic least squares regression
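The all-pairs step named in the Fig. 7 caption can be illustrated with a minimal sketch. The caption states only that a minimum distance is computed for all pairs of samples; the use of a normalized Hamming distance on pre-aligned sequences, the function names, the toy sequences, and the `threshold` value below are all assumptions made for illustration, not GHOST's actual implementation. Because n samples yield n(n-1)/2 pairs, the work grows quadratically with the number of samples, which is consistent with the quadratic fit in the figure.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def min_distance(sample_a, sample_b) -> float:
    """Smallest normalized Hamming distance over all cross-sample sequence pairs."""
    return min(hamming(a, b) / len(a) for a in sample_a for b in sample_b)


def all_pairs_links(samples: dict, threshold: float):
    """Return (name_a, name_b, distance) for every sample pair at or below threshold.

    `threshold` is a hypothetical parameter; the value used below is illustrative.
    """
    links = []
    for (name_a, seqs_a), (name_b, seqs_b) in combinations(samples.items(), 2):
        d = min_distance(seqs_a, seqs_b)
        if d <= threshold:
            links.append((name_a, name_b, d))
    return links


# Toy data: S1 and S2 share one identical sequence; S3 is distant from both.
samples = {
    "S1": ["ACGTACGT", "ACGTACGA"],
    "S2": ["ACGTACGA", "ACGTTCGA"],
    "S3": ["TTTTGGGG"],
}
links = all_pairs_links(samples, threshold=0.05)
```

With three samples the loop visits three pairs; with 100 samples it would visit 4,950, each requiring a full cross-comparison of the two samples' sequences.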
Serum sources used to construct MiSeq libraries
| Name | Source | Subtype |
|---|---|---|
| T-1 | Single | 1a |
| T-2 | Single | 1b |
| T-3 | Single | 2k |
| T-4 | Single | 3a |
| T-5 | Single | 6f |
| T-6 | Combination | 2b/4a |
| T-7 | Combination | 3a/4a |
| T-8 | none | none |
Fig. 8. Dark blue balls represent unrelated samples. Light blue balls represent samples in a cluster. Lines represent relatedness. Left shows GHOST linkage results for 5 of the 6 libraries constructed during the state health department GHOST Training in November 2015. Right shows Library 5 GHOST linkage results, which showed the absence of T-4 due to loss of pellet during library preparation
Summary table of data used in the study
| Collection | Classification | Number of samples | Origin |
|---|---|---|---|
| G1–8 | Unrelated | 8 | CDC Archive |
| T1–8 | Unrelated | 8 | Artificial |
| Unrelated Collection | Unrelated | 16 | CDC Archive |
| Transmission Collection | Related | 8 | Outbreak |
| Time Series Collection | Related | 8 | CDC Archive |
| Spike Collection | Related | 8 | Artificial |
Fig. 9. GHOST output for ten-fold subsampling (N = 20,000) of 16 samples with no epidemiological evidence of intra-group transmission. Fifteen cliques are present, as one sample did not have sufficient product for sequencing, and subsequent sequence subsets did not pass GHOST preprocessing filters
GHOST accuracy at subsampling levels N = 10^4 to N = 10^1. The final column shows linkage percent when the filter requirement for the minimum frequency of a unique sequence to create linkage is reduced from the default of 2 to 1
| Dataset | Links expected | N = 10^4 | N = 10^3 | N = 10^2 | N = 10^1 | Minimum frequency 1 |
|---|---|---|---|---|---|---|
| Transmission | 30 | 100.00% | 100.00% | 100.00% | 26.67% | 100.00% |
| Time Series | 40 | 100.00% | 100.00% | 100.00% | 35.00% | 100.00% |
| Spike | 40 | 100.00% | 75.00% | 27.50% | 0.00% | 57.50% |
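The interaction between subsampling depth and the minimum-frequency filter described in the table caption can be sketched as follows. This is an illustrative sketch only: the function names, the toy reads, and the random-sampling approach are assumptions, and the source does not describe how GHOST implements either step. The idea is that at very small N few unique sequences occur twice, so a minimum frequency of 2 discards most of them, while a minimum frequency of 1 keeps every observed sequence.

```python
import random
from collections import Counter


def subsample(reads, n, seed=None):
    """Draw n reads without replacement (all reads if fewer than n exist)."""
    rng = random.Random(seed)
    if n >= len(reads):
        return list(reads)
    return rng.sample(reads, n)


def frequent_unique(reads, min_freq=2):
    """Keep unique sequences observed at least min_freq times."""
    counts = Counter(reads)
    return {seq: c for seq, c in counts.items() if c >= min_freq}


def linkage_percent(found, expected):
    """Percentage of expected links that were detected."""
    return 100.0 * found / expected


# Toy reads (hypothetical): one dominant variant, one minor, one singleton.
reads = ["AAA"] * 6 + ["CCC"] * 3 + ["GGG"]

default_filter = frequent_unique(reads, min_freq=2)   # singleton "GGG" is dropped
relaxed_filter = frequent_unique(reads, min_freq=1)   # every observed sequence kept
small_draw = subsample(reads, 5, seed=0)

# Arithmetic check against the table: 8 of 30 expected links is 26.67%.
pct = round(linkage_percent(8, 30), 2)
```

The count of 8 detected links is derived here from the reported 26.67% of 30 expected links; it is not stated directly in the table.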