
Identification of Infectious Agents in High-Throughput Sequencing Data Sets Is Easily Achievable Using Free, Cloud-Based Bioinformatics Platforms.

Joseph G Chappell¹, Timothy Byaruhanga²,³, Theocharis Tsoleridis², Jonathan K Ball²,⁴, C Patrick McClure¹,⁴


Keywords: bioinformatics; high-throughput sequencing; virology


Year: 2019; PMID: 31757881; PMCID: PMC6879272; DOI: 10.1128/JCM.01386-19

Source DB: PubMed; Journal: J Clin Microbiol; ISSN: 0095-1137; Impact factor: 5.948


LETTER

It was with great interest that we read the recent publication by Brinkmann et al. (1) comparing methodologies for diagnosing viral infections from high-throughput sequencing (HTS) data sets. The authors demonstrated that a plethora of workflows and pipelines are available to analyze HTS data sets and that the choice of technique can lead to different results, even with a uniform proficiency testing data set. Processing HTS data sets is computationally intensive, may require significant investment, and often demands a comprehensive technical background to fully analyze the results. Currently, these requirements can limit the use of HTS, preventing clinicians and researchers with minimal funding or bioinformatics expertise from exploring and exploiting this powerful technology. However, several online tools, such as IDseq (2, 3) and Genome Detective (4), have recently been made available for research involving pathogen discovery and identification. Because these tools are cloud based, users do not need high-specification computers for data processing, and automated identification of microbial sequences reduces the need for a significant background in bioinformatics. HTS data sets, with identifying information removed, are simply uploaded, and annotated sequence matches to potential pathogens are returned within hours in a format that can be readily interpreted by users with relevant clinical or academic skills. Although IDseq automatically discards human genomic reads, submitting data sets that contain patient sequences to third-party platforms, even when anonymized, requires ethical consideration and permission. We evaluated IDseq and Genome Detective against the simulated in silico data set provided by Brinkmann et al. (1).
IDseq analysis took 92 min from the start of sample upload to the presentation of mapped reads, roughly half the time reported for the fastest participant (participant 1) in the study by Brinkmann et al. (1). Of the 6,339,908 reads in the data set, 1,362,725 (21.5%) passed host filtering; of these, 996,855 (73.2%) were assigned to bacteria in the nucleotide database (70.3% in the nonredundant protein database). Genome Detective identified and removed 6,290,069 reads (99%) as nonviral, completing the analysis in only 16 min. Both platforms detected all four viruses in the data set (Table 1). Detection of Torque teno virus, human herpesvirus 1, and measles virus was less sensitive than in many of the other participant workflows. However, both IDseq and Genome Detective identified the highly divergent avian orthobornavirus (55% similarity to a reference sequence), whereas 9 of the 13 workflows in the study by Brinkmann et al. (1) did not.
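The percentages quoted for read filtering follow directly from the read counts. The short Python sketch below reproduces that arithmetic using only the counts given in the letter; the helper name `pct` and the constant names are hypothetical, and the fastest participant's run time is assumed to be the 3 h lower bound of the time range in Table 1.

```python
"""Recompute the read-filtering percentages and runtime ratio quoted in the letter."""

TOTAL_READS = 6_339_908          # reads in the simulated proficiency data set
IDSEQ_POST_HOST = 1_362_725      # reads passing IDseq host filtering
IDSEQ_BACTERIAL_NT = 996_855     # of those, reads assigned to bacteria (nucleotide database)
GD_NONVIRAL = 6_290_069          # reads removed by Genome Detective as nonviral

IDSEQ_RUNTIME_MIN = 92           # IDseq, upload start to mapped reads
# Assumption: fastest participant = 3 h, the lower bound of the Table 1 time range.
FASTEST_PARTICIPANT_MIN = 3 * 60


def pct(part: int, whole: int, digits: int = 1) -> float:
    """Return part/whole as a percentage rounded to `digits` decimal places."""
    return round(100 * part / whole, digits)


if __name__ == "__main__":
    print(f"Passed IDseq host filtering: {pct(IDSEQ_POST_HOST, TOTAL_READS)}%")
    print(f"Bacterial (nucleotide) among those: {pct(IDSEQ_BACTERIAL_NT, IDSEQ_POST_HOST)}%")
    print(f"Removed by Genome Detective as nonviral: {pct(GD_NONVIRAL, TOTAL_READS, 0)}%")
    # Reads left for viral classification after Genome Detective's filtering step.
    print(f"Reads retained by Genome Detective: {TOTAL_READS - GD_NONVIRAL:,}")
    # Ratio of IDseq runtime to the fastest participant's runtime.
    print(f"IDseq / fastest participant: {IDSEQ_RUNTIME_MIN / FASTEST_PARTICIPANT_MIN:.2f}")
```

The 92 min IDseq run is about 0.51 of the assumed 3 h, consistent with the "roughly half" in the text.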

TABLE 1

Comparison of viral reads identified by IDseq and Genome Detective and the COMPARE virus proficiency tests

Method and database	Torque teno virus sensitivity (%)	Human herpesvirus 1 sensitivity (%)	Measles virus sensitivity (%)	Avian bornavirus sensitivity (%)	Time (h)
Proficiency test, participants 1 to 13 (median [range])	100 (0–102)	99 (10–400)	100 (0–140)	0 (0–100)	15.5 (3–216)
IDseq, nucleotide database	59	56	69	0	1.5
IDseq, nonredundant protein database	59	55	70	53	1.5
Genome Detective	100	84	82	41	0.25

A sensitivity of >100% indicated false-positive results.
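The detection claims can be checked against Table 1 with a small script. This is a minimal sketch under a stated assumption: the letter does not define sensitivity, so here it is taken to be the reads a workflow assigned to a virus divided by that virus's reads in the simulated data set, times 100, which is consistent with the footnote that values above 100% indicate false positives. All function names are hypothetical; the sensitivity values are copied from Table 1.

```python
"""Summarize Table 1: which viruses each platform detected.

Assumption (not stated in the letter): sensitivity (%) = 100 * reads assigned to a
virus / reads of that virus in the simulated data set, so >100% implies false positives.
"""

VIRUSES = ("Torque teno virus", "Human herpesvirus 1", "Measles virus", "Avian bornavirus")

# Sensitivity (%) values copied from Table 1, in the order of VIRUSES.
TABLE_1 = {
    "IDseq (nucleotide)": (59, 56, 69, 0),
    "IDseq (nonredundant protein)": (59, 55, 70, 53),
    "Genome Detective": (100, 84, 82, 41),
}


def sensitivity(assigned_reads: int, true_reads: int) -> float:
    """Percentage of a virus's simulated reads that a workflow assigned to it."""
    return 100 * assigned_reads / true_reads


def detected(values: tuple[int, ...]) -> dict[str, bool]:
    """Treat a virus as detected when its sensitivity is above zero."""
    return {virus: value > 0 for virus, value in zip(VIRUSES, values)}


def has_false_positives(values: tuple[int, ...]) -> bool:
    """Flag any value above 100%, which the table footnote reads as false positives."""
    return any(value > 100 for value in values)


def idseq_combined() -> dict[str, bool]:
    """Count a virus as detected by IDseq if either database reports it."""
    nt = detected(TABLE_1["IDseq (nucleotide)"])
    nr = detected(TABLE_1["IDseq (nonredundant protein)"])
    return {virus: nt[virus] or nr[virus] for virus in VIRUSES}


if __name__ == "__main__":
    print("IDseq (either database):", idseq_combined())
    print("Genome Detective:", detected(TABLE_1["Genome Detective"]))
```

Under this reading, the IDseq nucleotide search alone reports 0% for avian bornavirus, so IDseq's detection of all four viruses relies on the nonredundant protein search. Applying `has_false_positives` to the proficiency-test range maxima (102, 400, 140, 100) flags the participant workflows that over-assigned reads.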

Our results show that both platforms can accurately identify viral genomes in HTS data sets, even for users with little or no prior knowledge of bioinformatic approaches. IDseq can also detect bacterial genomes in addition to viral genomes. Although not as sensitive as some of the other methodologies tested, IDseq and Genome Detective identified all of the infectious agents included in the proficiency data set in a fraction of the time reported for the other pipelines, and required very little local computational power. IDseq, Genome Detective, and similar free, cloud-based online tools will significantly lower the barrier to exploiting HTS, removing the need for the hardware and background required by traditional bioinformatics approaches.

REFERENCES

Proficiency Testing of Virus Diagnostics Based on Bioinformatics Analysis of Simulated In Silico High-Throughput Sequencing Data Sets.

Authors: Annika Brinkmann; Andreas Andrusch; Ariane Belka; Claudia Wylezich; Dirk Höper; Anne Pohlmann; Thomas Nordahl Petersen; Pierrick Lucas; Yannick Blanchard; Anna Papa; Angeliki Melidou; Bas B Oude Munnink; Jelle Matthijnssens; Ward Deboutte; Richard J Ellis; Florian Hansmann; Wolfgang Baumgärtner; Erhard van der Vries; Albert Osterhaus; Cesare Camma; Iolanda Mangone; Alessio Lorusso; Maurilia Marcacci; Alexandra Nunes; Miguel Pinto; Vítor Borges; Annelies Kroneman; Dennis Schmitz; Victor Max Corman; Christian Drosten; Terry C Jones; Rene S Hendriksen; Frank M Aarestrup; Marion Koopmans; Martin Beer; Andreas Nitsche
Journal: J Clin Microbiol; Date: 2019-07-26; Impact factor: 5.948

Metagenomic next-generation sequencing of samples from pediatric febrile illness in Tororo, Uganda.

Authors: Akshaya Ramesh; Sara Nakielny; Jennifer Hsu; Mary Kyohere; Oswald Byaruhanga; Charles de Bourcy; Rebecca Egger; Boris Dimitrov; Yun-Fang Juan; Jonathan Sheu; James Wang; Katrina Kalantar; Charles Langelier; Theodore Ruel; Arthur Mpimbaza; Michael R Wilson; Philip J Rosenthal; Joseph L DeRisi
Journal: PLoS One; Date: 2019-06-20; Impact factor: 3.240

Genome Detective: an automated system for virus identification from high-throughput sequencing data.

Authors: Michael Vilsker; Yumna Moosa; Sam Nooij; Vagner Fonseca; Yoika Ghysens; Korneel Dumon; Raf Pauwels; Luiz Carlos Alcantara; Ewout Vanden Eynden; Anne-Mieke Vandamme; Koen Deforche; Tulio de Oliveira
Journal: Bioinformatics; Date: 2019-03-01; Impact factor: 6.937
