Viral Host Range database, an online tool for recording, analyzing and disseminating virus-host interactions.

Quentin Lamy-Besnier, Bryan Brancotte, Hervé Ménager, Laurent Debarbieux.

Abstract

MOTIVATION: Viruses are ubiquitous in the living world, and their ability to infect more than one host defines their host range. However, information about which virus infects which host, and about which host is infected by which virus, is not readily available.
RESULTS: We developed a web-based tool called the Viral Host Range database to record, analyze and disseminate experimental host range data for viruses infecting archaea, bacteria and eukaryotes.
AVAILABILITY: The ViralHostRangeDB application is available from https://viralhostrangedb.pasteur.cloud. Its source code is freely available from the GitLab hub of Institut Pasteur (https://gitlab.pasteur.fr/hub/viralhostrangedb).
© The Author(s) 2021. Published by Oxford University Press.

Year:  2021        PMID: 33594411      PMCID: PMC8428608          DOI: 10.1093/bioinformatics/btab070

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

Viral genomic data are expanding, and their in silico analysis poses many challenges, including how to predict the likely host of a given virus (de Jonge ; Dzunkova ; Kieft ; Li ; Santiago-Rodriguez and Hollister, 2019). The gold standard for host identification remains experimental evidence, which can take a long time and considerable effort to obtain. Four years passed between the prediction of Bacteroidetes as the putative host for crAssphage (the most abundant human gut bacteriophage) and the first experimental evidence that the strain Bacteroides intestinalis APC919/174 serves as a host for ϕcrAss001 (Dutilh ; Shkoporov ). The GenBank (Sayers ) database might be expected to provide information about the host of a virus, but these records mostly identify the host only to genus or species level, which is insufficient. For instance, the host indicated for bacteriophage T4 is the bacterium Escherichia coli, with no identification of a strain, which is as imprecise as indicating that human cells are the host for HIV-1. For a non-expert, such information suggests that any E. coli strain can be infected by bacteriophage T4, or that any human cell can be infected by HIV-1. Another public resource that could be used is the International Committee on Taxonomy of Viruses (ICTV) (Lefkowitz ). However, the host is not indicated in the data available from the ICTV website (talk.ictvonline.org). Finally, it is possible to search microbial collections (ATCC, www.atcc.org; DSMZ, www.dsmz.de) for the host associated with a deposited virus, but, unfortunately, these resources contain data for only limited numbers of published virus–host pairs.
Over and above the identification of a single host for virus propagation, virus–host range is another characteristic that is not readily available from public data sources. In particular, for viruses infecting multicellular organisms, including humans, the determination of host range is limited by the ability to grow cell lines. By contrast, for unicellular organisms, the number of hosts to be tested is very large, but unfortunately data are rarely published in an exploitable format.
Interestingly, bacteriophage host range data are as old as the first article naming these viruses, published in 1917 by d’Herelle, in which bacteriophages infecting a Shiga strain were reported to be unable to infect Flexner or Hiss strains (d'Herelle, 1917). For decades, viral host range tests were routinely performed for the typing of bacteria (Sabat ; Sechter ). Nowadays, host ranges are being determined for an increasing number of bacteriophages to identify candidates for phage therapy. This treatment for bacterial infections was originally proposed in 1917, and is used regularly in some countries (Georgia, Poland) (d'Herelle, 1917; Kutateladze, 2015). Its use is now expanding worldwide to treat infections caused by antibiotic-resistant pathogens (Corbellino et al., 2019; Dedrick ; Jennes ; Schooley ). Consequently, semi-automated systems for high-throughput host range tests have been developed (http://www.aphage.com/the-science/). However, only the small number of positive results from these tests is ultimately used; the bulk of the information obtained is discarded and is thus unavailable.
Another major challenge is the integration of host range data into a single search and analysis tool. A viral host range is, by definition, variable: it should be regenerated dynamically as new data are acquired.

2 Materials and methods

2.1 Data availability

The ViralHostRangeDB application is available from https://viralhostrangedb.pasteur.cloud. Its source code is freely available from the Gitlab instance of Institut Pasteur (https://gitlab.pasteur.fr/hub/viralhostrangedb), under the terms of the MIT license, together with detailed documentation (https://hub.pages.pasteur.fr/viralhostrangedb/) including instructions for use, deployment and administration purposes. A demonstration server can be run directly from a docker image (https://hub.docker.com/r/viralhostrangedb/demo), providing a way of testing all features of the application, including the privileges and (in)visibility of private data sources.

2.2 Architecture

The ViralHostRangeDB web application is built on the Django web framework and a PostgreSQL database. Data are exposed, on the server side, through the Django REST framework. This environment provides efficient and safe data storage as well as tight access control. The application, its database and its routine processes (backup, email notifications, virus/host identifier analysis, etc.) are hosted on a Kubernetes cluster (https://kubernetes.io/), providing high availability, scalability and failover. The overall software quality of the application is ensured by unit tests covering 99% of the code base.

2.3 Importing data

Any authenticated user can contribute datasets via the top menu. Datasets can be uploaded as Excel files as detailed in the online documentation (https://hub.pages.pasteur.fr/viralhostrangedb/compatible_file.html). Excel data files are imported with the Pandas and xlrd Python packages (McKinney, 2017). During the mapping of the responses of a file onto the global scheme, the thresholds suggested to users are calculated with the NumPy (Oliphant, 2006) and Scikit-learn (Pedregosa ) packages. The NCBI identifiers describing the host and virus strains are validated with Entrez web services (Sayers ) which are queried with the BioPython (Cock ) package.
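The mapping step can be illustrated with a minimal sketch. The text does not describe how the application derives its suggested thresholds with NumPy and Scikit-learn, so the heuristic below (cutting the sorted values at their widest gaps) is an assumption chosen for illustration, written in plain Python; the names `suggest_thresholds` and `map_response` are hypothetical. The three-level target scheme (0, 1, 2) is the one described in the Results.

```python
from bisect import bisect_right


def suggest_thresholds(values, n_levels=3):
    """Suggest cut points by splitting the sorted distinct values at the widest gaps.

    Hypothetical heuristic for illustration only; it is not the
    application's own NumPy/Scikit-learn procedure.
    """
    distinct = sorted(set(values))
    n_cuts = min(n_levels - 1, len(distinct) - 1)
    if n_cuts <= 0:
        return []
    # Rank the gaps between neighbouring values, widest first.
    gaps = sorted(
        range(len(distinct) - 1),
        key=lambda i: distinct[i + 1] - distinct[i],
        reverse=True,
    )
    # Place each cut at the midpoint of one of the widest gaps.
    return sorted((distinct[i] + distinct[i + 1]) / 2 for i in gaps[:n_cuts])


def map_response(value, thresholds):
    """Map a raw value to 0 (no infection), 1 (intermediate) or 2 (infection)."""
    level = bisect_right(thresholds, value)
    # With a single threshold, only 'no infection' and 'infection' exist.
    if len(thresholds) == 1:
        return 2 if level == 1 else 0
    return level


# Made-up raw scores from one contributor's file (illustrative values only).
raw = [0, 0, 0.1, 1.0, 1.2, 3.8, 4.0, 4.1]
thresholds = suggest_thresholds(raw)
mapped = [map_response(v, thresholds) for v in raw]
print(thresholds, mapped)
```

In this sketch the thresholds are only suggestions: a contributor would be expected to review and adjust them before the mapping is applied.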

2.4 Privacy

Access to uploaded datasets can be finely controlled, by restricting it to the uploader only, sharing it with a specific set of other users, or making it public. It is also possible to set editing permissions on a dataset for each user. Private data sources can be accessed only by explicitly authorized users, regardless of whether the user is a curator or a privileged administrator. To secure editing operations on the datasets, all modifications are logged and stored in histories, to allow rollback.
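The access rules above can be summarized in a short sketch. It is not the application's Django code: the `DataSource` class, its fields and its methods are hypothetical names, and the one-step history is a simplification of the logged modifications mentioned in the text.

```python
from dataclasses import dataclass, field


@dataclass
class DataSource:
    """Hypothetical stand-in for an uploaded dataset and its access settings."""

    owner: str
    public: bool = False
    viewers: set = field(default_factory=set)    # users explicitly allowed to read
    editors: set = field(default_factory=set)    # users explicitly allowed to edit
    responses: dict = field(default_factory=dict)
    history: list = field(default_factory=list)  # (key, previous value) pairs

    def can_view(self, user, is_admin=False):
        # is_admin is deliberately ignored: privileged roles get no
        # implicit access to private data sources.
        return (
            self.public
            or user == self.owner
            or user in self.viewers
            or user in self.editors
        )

    def can_edit(self, user):
        return user == self.owner or user in self.editors

    def set_response(self, user, key, value):
        if not self.can_edit(user):
            raise PermissionError(f"{user} may not edit this data source")
        # Log the previous value (None if absent) so the change can be undone.
        self.history.append((key, self.responses.get(key)))
        self.responses[key] = value

    def rollback(self):
        """Undo the most recent modification."""
        key, previous = self.history.pop()
        if previous is None:
            self.responses.pop(key, None)
        else:
            self.responses[key] = previous


source = DataSource(owner="alice")
source.viewers.add("bob")
source.set_response("alice", ("phage_1", "strain_A"), 2)
print(source.can_view("bob"), source.can_edit("bob"), source.can_view("root", is_admin=True))
```

How curators act on public datasets is not modelled here, since the text does not detail it.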

2.5 Search tool

The web interface allows the interrogation of datasets. A ‘search module’, accessible either through a quick search box or through a specific advanced search page, can be used to discover datasets through full text and specific filters (e.g. host or virus names, contributor, publication, etc.). The exploration module, accessible from the top menu or from the search results, provides the main functionality of the application: the ability to compare the responses of any number of hosts to any number of viruses, across all the datasets accessible.
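As a rough illustration of what such a comparison involves, the sketch below merges the responses of several datasets per virus–host pair and flags the pairs on which datasets disagree. The data layout, the function names and the example values are all assumptions made for this sketch; they do not come from the application or from real experiments.

```python
from collections import defaultdict


def compare_datasets(datasets):
    """Collect the responses per (virus, host) pair across datasets.

    `datasets` maps a dataset name to {(virus, host): response}, with
    responses already mapped to 0, 1 or 2.
    """
    merged = defaultdict(dict)
    for name, responses in datasets.items():
        for pair, response in responses.items():
            merged[pair][name] = response
    return dict(merged)


def find_discrepancies(merged):
    """Return the pairs for which the datasets report different responses."""
    return {
        pair: by_dataset
        for pair, by_dataset in merged.items()
        if len(set(by_dataset.values())) > 1
    }


# Made-up datasets with placeholder virus and host names.
datasets = {
    "lab_A": {("phage_1", "strain_A"): 2, ("phage_1", "strain_B"): 2},
    "lab_B": {("phage_1", "strain_A"): 0, ("phage_2", "strain_B"): 2},
}
merged = compare_datasets(datasets)
conflicts = find_discrepancies(merged)
print(conflicts)
```

Keeping the dataset name alongside each response is what makes it possible to point back to the source data when a conflict is found.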

3 Results

We circumvented the challenges associated with virus–host range analysis, by designing the Viral Host Range database (VHRdb, https://viralhostrangedb.pasteur.cloud/), which compiles experimental host range data provided by contributors. This open web-based resource can be used to explore and analyze publicly accessible data with a powerful search engine that scans data and metadata (virus or host names, contributor name, location, GenBank accession number, etc.). Not only can users find a virus, but they can also immediately identify the set of hosts on which it has been tested, across all the available data. Filters, analysis and display settings can facilitate rapid visualization of the most relevant information, such as the highest host range score or the most susceptible host (Fig. 1). Importantly, when discrepancies between datasets are detected, they are highlighted and direct access is provided to the source data, for further investigations.
Fig. 1.

Diagram presenting the main functionalities of the Viral Host Range database. The top panel (Search) introduces the search tool and links to subsequent information. The bottom panel (Contribute) presents the main steps that contributors must complete to record new data. The middle panel (Explore) shows an example of results obtained from dataset comparison, using the datasets selected from the search results displayed in the top panel and the newly contributed data displayed in the bottom panel (red arrows). The main tools and options to select, rank and display data are also indicated.

We designed a user-guided process for uploading data compatible with the VHRdb mapping tool, to facilitate comparisons of datasets. This mapping tool is the cornerstone of VHRdb, translating the contributor’s original (numerical) data into a unified ranking system. The mapping tool was designed to allow each contributor to classify the results of virus–host interaction tests into a maximum of three responses: ‘0’ for ‘no infection’, ‘2’ for ‘infection’ and ‘1’ for ‘intermediate’, corresponding to any interaction that is different from ‘0’ and ‘2’. Contributors can then readily compare their results with publicly available datasets (curated by administrators to ensure that the database remains homogeneous). If kept private, data are neither accessible to, nor curated by, administrators. Analysis across a restricted number of datasets is also possible, to focus on specificities associated with one or several viruses or hosts. Another issue affecting the accurate assessment of a virus–host range is the lack of precise characterization of the tested hosts. In particular, most of the clinical isolates used to determine the host range of bacteriophages for phage therapy applications are not sequenced. In addition, viruses themselves evolve over time and adapt their host range to the available hosts (Rothenburg and Brennan, 2020).
The VHRdb therefore handles GenBank accession numbers for both viruses and hosts, as a solution to provide unique identifiers. In addition to the identification of suitable hosts for viruses and the cross-analysis of experimental tests, we anticipate that the VHRdb will become a resource for the development of machine learning approaches, which require large amounts of data, to improve the prediction of the host of a virus, or even the receptor that it uses (Leite ; Young ). It could also be used more directly by clinicians, who will increasingly have access to the genome sequences of pathogens. If the strain infecting a patient is closely related to a tested strain present in the VHRdb, candidate bacteriophages are immediately identified, shortening the time required to develop an appropriate treatment. The VHRdb will also provide opportunities to address fundamental questions in virology, from ecological dynamics to the molecular mechanisms underlying virus–host interactions. The VHRdb is a unique, publicly accessible resource for the community of microbial virologists, for the rapid identification, characterization and dissemination of data for virus–host interactions of broad interest to the educational, scientific and medical communities and to private sector entities developing applications. At the time of publication, the VHRdb holds 15 753 interactions obtained from 739 viruses infecting 1 664 archaeal, bacterial or protist hosts, including the entire Felix d’Herelle collection of bacteriophages.
  21 in total

1.  Experience of the Eliava Institute in bacteriophage therapy.

Authors:  Mzia Kutateladze
Journal:  Virol Sin       Date:  2015-02       Impact factor: 4.327

Review 2.  Overview of molecular typing methods for outbreak detection and epidemiological surveillance.

Authors:  A J Sabat; A Budimir; D Nashev; R Sá-Leão; J m van Dijl; F Laurent; H Grundmann; A W Friedrich
Journal:  Euro Surveill       Date:  2013-01-24

3.  Biopython: freely available Python tools for computational molecular biology and bioinformatics.

Authors:  Peter J A Cock; Tiago Antao; Jeffrey T Chang; Brad A Chapman; Cymon J Cox; Andrew Dalke; Iddo Friedberg; Thomas Hamelryck; Frank Kauff; Bartek Wilczynski; Michiel J L de Hoon
Journal:  Bioinformatics       Date:  2009-03-20       Impact factor: 6.937

4.  Eradication of a Multidrug-Resistant, Carbapenemase-Producing Klebsiella pneumoniae Isolate Following Oral and Intra-rectal Therapy With a Custom Made, Lytic Bacteriophage Preparation.

Authors:  Mario Corbellino; Nicolas Kieffer; Mzia Kutateladze; Nana Balarjishvili; Lika Leshkasheli; Lia Askilashvili; George Tsertsvadze; Sara Giordana Rimoldi; Deia Nizharadze; Naomi Hoyle; Lia Nadareishvili; Spinello Antinori; Cristina Pagani; Daniele Giuseppe Scorza; Ai Ling Loredana Romanò; Sandro Ardizzone; Piergiorgio Danelli; Maria Rita Gismondo; Massimo Galli; Patrice Nordmann; Laurent Poirel
Journal:  Clin Infect Dis       Date:  2020-04-15       Impact factor: 9.079

5.  Defining the human gut host-phage network through single-cell viral tagging.

Authors:  Mária Džunková; Soo Jen Low; Joshua N Daly; Li Deng; Christian Rinke; Philip Hugenholtz
Journal:  Nat Microbiol       Date:  2019-08-05       Impact factor: 17.745

6.  Virus taxonomy: the database of the International Committee on Taxonomy of Viruses (ICTV).

Authors:  Elliot J Lefkowitz; Donald M Dempsey; Robert Curtis Hendrickson; Richard J Orton; Stuart G Siddell; Donald B Smith
Journal:  Nucleic Acids Res       Date:  2018-01-04       Impact factor: 16.971

7.  Use of bacteriophages in the treatment of colistin-only-sensitive Pseudomonas aeruginosa septicaemia in a patient with acute kidney injury-a case report.

Authors:  Serge Jennes; Maia Merabishvili; Patrick Soentjens; Kim Win Pang; Thomas Rose; Elkana Keersebilck; Olivier Soete; Pierre-Michel François; Simona Teodorescu; Gunther Verween; Gilbert Verbeken; Daniel De Vos; Jean-Paul Pirnay
Journal:  Crit Care       Date:  2017-06-04       Impact factor: 9.097

8.  Predicting host taxonomic information from viral genomes: A comparison of feature representations.

Authors:  Francesca Young; Simon Rogers; David L Robertson
Journal:  PLoS Comput Biol       Date:  2020-05-26       Impact factor: 4.475

9.  Adsorption Sequencing as a Rapid Method to Link Environmental Bacteriophages to Hosts.

Authors:  Patrick A de Jonge; F A Bastiaan von Meijenfeldt; Ana Rita Costa; Franklin L Nobrega; Stan J J Brouns; Bas E Dutilh
Journal:  iScience       Date:  2020-08-06

10.  GenBank.

Authors:  Eric W Sayers; Mark Cavanaugh; Karen Clark; James Ostell; Kim D Pruitt; Ilene Karsch-Mizrachi
Journal:  Nucleic Acids Res       Date:  2019-01-08       Impact factor: 16.971

