
PhyloGeoTool: interactively exploring large phylogenies in an epidemiological context.

Pieter Libin, Ewout Vanden Eynden, Francesca Incardona, Ann Nowé, Antonia Bezenchek, Anders Sönnerborg, Anne-Mieke Vandamme, Kristof Theys, Guy Baele.

Abstract

MOTIVATION: Clinicians, health officials and researchers are interested in the epidemic spread of pathogens in both space and time to support the optimization of intervention measures and public health policies. Large sequence databases of virus sequences provide an interesting opportunity to study this spread through phylogenetic analysis. To infer knowledge from large phylogenetic trees, potentially encompassing tens of thousands of virus strains, an efficient method for data exploration is required. The clades that are visited during this exploration should be annotated with strain characteristics (e.g. transmission risk group, tropism, drug resistance profile) and their geographic context.
RESULTS: PhyloGeoTool implements a visual method to explore large phylogenetic trees and to depict characteristics of strains and clades, including their geographic context, in an interactive way. PhyloGeoTool also makes it possible to position new virus strains relative to the existing phylogenetic tree, allowing users to gain insight into the placement of such new strains without the need to perform a de novo reconstruction of the phylogeny.
AVAILABILITY AND IMPLEMENTATION: https://github.com/rega-cev/phylogeotool (Freely available: open source software project). CONTACT: phylogeotool@kuleuven.be. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2017. Published by Oxford University Press.

Year:  2017        PMID: 28961923      PMCID: PMC5860094          DOI: 10.1093/bioinformatics/btx535

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

Expanding and intensifying sequencing efforts for the management of infectious diseases, along with the generation of large-scale databases of clinical and demographic information, provide unprecedented opportunities for the surveillance of epidemics and outbreaks of viral pathogens. Mapping the origin and dynamics of epidemics in space and time is becoming feasible as geo-tagged and time-stamped sequence data are now part of routine clinical care. Tracking the geographical spread of distinct virus clades and their relationship to specified characteristics (e.g. transmission risk group, tropism, drug resistance profile) can help to improve our understanding of such outbreaks. Computational and methodological advances now make it possible to infer phylogenies of tens of thousands of sequences (Liu et al., 2011), and applications have been developed to visualize such large phylogenetic trees (de Vienne, 2016; Huson and Scornavacca, 2012). However, efficient means to visually navigate through these large phylogenies and the annotated information (e.g. virus and patient data) are still lacking. Further, fast and accurate placement of novel virus sequences onto an existing phylogenetic tree can provide valuable insights for outbreak detection, by relating evolutionary dynamics to epidemiological and clinical characteristics.

2 Features

We present PhyloGeoTool, an application to interactively navigate large phylogenies and to explore associated clinical and epidemiological data. PhyloGeoTool implements an algorithm that automatically partitions a phylogeny into an optimal number of clusters, thereby recursively partitioning each identified cluster (see Section 3). A graphical user interface provides a concise visualization of the initial tree of clusters. Subsequent levels of the phylogeny are visualized upon the selection of a specific cluster (Fig. 1), with an option to show their respective positions within the entire phylogeny (not shown). At each partitioning level of the phylogenetic tree, an overview of sequence attributes is provided. A map shows the geographic distribution of sampling and a bar chart shows the distribution of the attribute that was selected by the user. In addition, hovering over a particular cluster activates a bar chart which presents attribute information for that cluster in relation to the rest of the clusters (Fig. 1).
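The comparison behind the bar chart can be illustrated with a minimal Python sketch. It is not taken from the PhyloGeoTool code base: the function name `attribute_percentages`, the taxon labels and the toy subtype and cluster assignments are all hypothetical, chosen only to show how the distribution of one attribute in a selected cluster relates to the distribution over all taxa at the same level of the tree.

```python
from collections import Counter


def attribute_percentages(values):
    """Return the percentage of taxa carrying each attribute value."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: 100.0 * n / total for value, n in counts.items()}


# Hypothetical toy data: taxon -> attribute value, and taxon -> cluster id.
subtype = {"s1": "B", "s2": "B", "s3": "C", "s4": "B", "s5": "G", "s6": "C"}
cluster = {"s1": 0, "s2": 0, "s3": 0, "s4": 1, "s5": 1, "s6": 1}

# Distribution over every taxon at this level of the tree.
level = attribute_percentages(subtype.values())

# Distribution restricted to the taxa of the selected cluster (here cluster 0).
selected = attribute_percentages(
    subtype[taxon] for taxon in subtype if cluster[taxon] == 0
)

for value in sorted(level):
    print(value, round(level[value], 1), round(selected.get(value, 0.0), 1))
```

Values that are absent from the selected cluster simply get a percentage of zero, so both distributions can be drawn over the same set of categories.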
Fig. 1

The PhyloGeoTool graphical user interface. The upper left panel shows the geographical distribution of the samples present in the selected cluster. The lower left panel shows the distribution for a selected trait of interest; white bars show the distribution for the entire dataset for that level of the tree, whereas the colored bars show the distribution for a specific selected cluster and are annotated by their respective percentage. The right panel shows the clustered phylogenetic tree and allows phylogenetic placement to be performed.

PhyloGeoTool only requires the user to provide a phylogenetic tree and attribute information for each taxon, without the need for an underlying database structure. Given that processing large phylogenies is a time-consuming task, a phylogeny is partitioned prior to the deployment of the web application. This makes the exploration of the phylogenetic tree instantaneous for the user. Further, the inclusion of novel sequence data does not require the re-estimation of the phylogeny or partitioning, as PhyloGeoTool supports the fast and accurate phylogenetic placement of submitted virus sequences in the existing phylogeny using pplacer (Matsen et al., 2010) (section ‘Phylogenetic placement’ in Supplementary Material). PhyloGeoTool is implemented as a web application to offload the installation and computational burden to the hosting server.

3 Materials and methods

We present an algorithm that partitions a binary phylogenetic tree into clusters using a recursive approach. Combining such an approach to identify clusters of sequences with progressive zooming ensures an efficient and interactive visual navigation of the entire phylogenetic tree. To partition a binary tree T into k clusters, the following algorithm was devised. Intuitively, the binary tree is partitioned recursively using the cluster sizes as the clustering criterion. Starting at the root of the tree T, the first cluster consists of its left child and all its descendants (i.e. the ‘left’ part of T), while the second cluster consists of its right child and all its descendants (i.e. the ‘right’ part of T). These clusters are added to a set C that is ordered by descending cluster size (i.e. the number of tree leaves that each cluster covers). The largest cluster is removed from C and its corresponding tree is split at the root, creating two new clusters corresponding to the resulting subtrees. These two new clusters are subsequently added to C. This process is repeated until the maximum number of clusters is reached (i.e. |C| = k).
While this method results in the partitioning of T into k clusters, a value for k that ensures the presence of well-defined clusters still needs to be determined. The subtype diversity ratio (SDR) provides a measure to score a particular clustering of T and is defined as the ratio of the mean intra-cluster pairwise distance to the mean inter-cluster pairwise distance (Rambaut et al., 2001). Low intra-cluster pairwise distances relative to inter-cluster pairwise distances, and hence a low SDR, therefore imply the presence of well-defined clusters (Archer and Robertson, 2007). To determine the optimal value for k, the function SDR(k) is analyzed from k = 2 (i.e. the minimal number of clusters) to k = 50 (i.e. the maximal number of clusters). In this process, two cases are discerned: either the SDR function exhibits a descending trend over the entire domain, or a clear local minimum can be found when analysing the SDR function.
In the first case, k is considered optimal where the loss in SDR is maximal: such a k can be found by considering all SDR scores up to the maximal number of clusters (i.e. k = 50) and selecting the k where the curvature of SDR(k) is maximal (Equation 1). In the second case, the local SDR minimum is selected. We refer to the Supplementary Material (section ‘SDR function analysis’) for more details on the SDR function analysis and some examples that demonstrate the process.
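The partitioning and scoring steps described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the PhyloGeoTool implementation: the `Node` class, the function names (`leaves`, `partition`, `patristic`, `sdr`) and the five-taxon toy tree are hypothetical, pairwise distances are taken to be path lengths along the tree, and the ordered set of clusters is modelled with a heap. Selecting k from the resulting scores (maximal curvature or local minimum) is left out, since it depends on Equation 1 and the Supplementary Material.

```python
from __future__ import annotations

import heapq
from dataclasses import dataclass
from itertools import combinations, count
from typing import Optional


@dataclass
class Node:
    """Binary tree node: leaves carry a name, every node a branch length."""
    name: Optional[str] = None
    length: float = 0.0
    left: Optional[Node] = None
    right: Optional[Node] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def leaves(node: Node) -> list[str]:
    """Names of all leaves at or below the node."""
    if node.is_leaf():
        return [node.name]
    return leaves(node.left) + leaves(node.right)


def partition(root: Node, k: int) -> list[Node]:
    """Split the tree into at most k clusters, always splitting the largest."""
    tie = count()  # tie-breaker so the heap never has to compare Node objects
    heap = []

    def push(node: Node) -> None:
        heapq.heappush(heap, (-len(leaves(node)), next(tie), node))

    push(root.left)
    push(root.right)
    while len(heap) < k:
        _, _, largest = heapq.heappop(heap)
        if largest.is_leaf():  # every cluster is a single leaf: stop early
            push(largest)
            break
        push(largest.left)
        push(largest.right)
    return [node for _, _, node in heap]


def patristic(root: Node) -> dict[frozenset, float]:
    """Leaf-to-leaf path lengths, keyed by frozenset({a, b})."""
    dist: dict[frozenset, float] = {}

    def walk(node: Node) -> dict[str, float]:
        # Returns the distance from this node down to each leaf below it.
        if node.is_leaf():
            return {node.name: 0.0}
        sides = [
            {n: d + child.length for n, d in walk(child).items()}
            for child in (node.left, node.right)
        ]
        for a, da in sides[0].items():
            for b, db in sides[1].items():
                dist[frozenset((a, b))] = da + db
        return {**sides[0], **sides[1]}

    walk(root)
    return dist


def sdr(clusters: list[list[str]], dist: dict[frozenset, float]) -> float:
    """Mean intra-cluster distance divided by mean inter-cluster distance."""
    label = {name: i for i, names in enumerate(clusters) for name in names}
    intra, inter = [], []
    for a, b in combinations(label, 2):
        (intra if label[a] == label[b] else inter).append(dist[frozenset((a, b))])
    if not intra or not inter:
        raise ValueError("SDR needs both intra- and inter-cluster pairs")
    return (sum(intra) / len(intra)) / (sum(inter) / len(inter))


def leaf(name: str) -> Node:
    return Node(name=name, length=1.0)


# Hypothetical toy tree: ((A:1,B:1):4,(C:1,(D:1,E:1):1):4)
root = Node(
    left=Node(length=4.0, left=leaf("A"), right=leaf("B")),
    right=Node(length=4.0, left=leaf("C"),
               right=Node(length=1.0, left=leaf("D"), right=leaf("E"))),
)
dist = patristic(root)
scores = {k: sdr([leaves(c) for c in partition(root, k)], dist) for k in (2, 3)}
print(scores)
```

The recursive helpers are adequate for a toy tree; for phylogenies with tens of thousands of taxa, an iterative traversal with cached subtree sizes would avoid both the recursion limit and the repeated leaf counting.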

4 Application and future perspectives

To illustrate its use, we have evaluated PhyloGeoTool in the context of HIV-1 using data available within the EuResist Integrated Data Base (Fig. 1) (Zazzi et al., 2012). This database contains virus genotypes, clinical responses and epidemiological markers of more than 66,000 patients from 12 different countries. A public version of the web application operating on the EuResist dataset is available at http://phylogeotool.gbiomed.kuleuven.be/euresist/. To demonstrate PhyloGeoTool's potential, we present a case study concerning transmitted HIV-1 drug resistance in Europe using the EuResist PhyloGeoTool instance (details in the ‘Case study’ section of the Supplementary Material). We investigate the prevalence of transmitted drug resistance (RegaDB software; Libin et al., 2013) and its association with geography, HIV-1 subtype (Rega HIV subtyping tool; Alcantara et al., 2009) and particular clades in the phylogenetic tree. As we report in the Supplementary Material, observed trends were in agreement with a recent European study concerning transmitted drug resistance (Hofstra et al., 2015).
In addition, we have evaluated PhyloGeoTool in the context of Dengue virus (DENV). We downloaded a dataset of 8125 envelope gene sequences covering all four DENV serotypes from GenBank, together with their attributes (i.e. serotype, genotype, sample source, country of origin and collection date). A public version of the web application operating on the Dengue dataset is available at http://phylogeotool.gbiomed.kuleuven.be/dengue/. In both evaluations our clustering algorithm was able to extract the expected clusters, which shows that PhyloGeoTool has the potential to act as an important tool to inform public health by providing support to visualize, navigate and study large sequence databases of viral pathogens with annotated data.

References

1.  Human immunodeficiency virus. Phylogeny and the origin of HIV-1.

Authors:  A Rambaut; D L Robertson; O G Pybus; M Peeters; E C Holmes
Journal:  Nature       Date:  2001-04-26       Impact factor: 49.962

2.  Dendroscope 3: an interactive tool for rooted phylogenetic trees and networks.

Authors:  Daniel H Huson; Celine Scornavacca
Journal:  Syst Biol       Date:  2012-07-10       Impact factor: 15.683

Review 3.  Predicting response to antiretroviral treatment by machine learning: the EuResist project.

Authors:  Maurizio Zazzi; Francesca Incardona; Michal Rosen-Zvi; Mattia Prosperi; Thomas Lengauer; Andre Altmann; Anders Sonnerborg; Tamar Lavee; Eugen Schülter; Rolf Kaiser
Journal:  Intervirology       Date:  2012-01-24       Impact factor: 1.763

4.  CTree: comparison of clusters between phylogenetic trees made easy.

Authors:  John Archer; David L Robertson
Journal:  Bioinformatics       Date:  2007-08-23       Impact factor: 6.937

5.  pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree.

Authors:  Frederick A Matsen; Robin B Kodner; E Virginia Armbrust
Journal:  BMC Bioinformatics       Date:  2010-10-30       Impact factor: 3.169

6.  RAxML and FastTree: comparing two methods for large-scale maximum likelihood phylogeny estimation.

Authors:  Kevin Liu; C Randal Linder; Tandy Warnow
Journal:  PLoS One       Date:  2011-11-21       Impact factor: 3.240

7.  Transmission of HIV Drug Resistance and the Predicted Effect on Current First-line Regimens in Europe.

Authors:  L Marije Hofstra; Nicolas Sauvageot; Jan Albert; Ivailo Alexiev; Federico Garcia; Daniel Struck; David A M C Van de Vijver; Birgitta Åsjö; Danail Beshkov; Suzie Coughlan; Diane Descamps; Algirdas Griskevicius; Osamah Hamouda; Andrzej Horban; Marjo Van Kasteren; Tatjana Kolupajeva; Leondios G Kostrikis; Kirsi Liitsola; Marek Linka; Orna Mor; Claus Nielsen; Dan Otelea; Dimitrios Paraskevis; Roger Paredes; Mario Poljak; Elisabeth Puchhammer-Stöckl; Anders Sönnerborg; Danica Staneková; Maja Stanojevic; Kristel Van Laethem; Maurizio Zazzi; Snjezana Zidovec Lepej; Charles A B Boucher; Jean-Claude Schmit; Annemarie M J Wensing; et al.
Journal:  Clin Infect Dis       Date:  2015-11-29       Impact factor: 9.079

8.  Lifemap: Exploring the Entire Tree of Life.

Authors:  Damien M de Vienne
Journal:  PLoS Biol       Date:  2016-12-22       Impact factor: 8.029

9.  A standardized framework for accurate, high-throughput genotyping of recombinant and non-recombinant viral sequences.

Authors:  Luiz Carlos Junior Alcantara; Sharon Cassol; Pieter Libin; Koen Deforche; Oliver G Pybus; Marc Van Ranst; Bernardo Galvão-Castro; Anne-Mieke Vandamme; Tulio de Oliveira
Journal:  Nucleic Acids Res       Date:  2009-05-29       Impact factor: 16.971

10.  RegaDB: community-driven data management and analysis for infectious diseases.

Authors:  Pieter Libin; Gertjan Beheydt; Koen Deforche; Stijn Imbrechts; Fossie Ferreira; Kristel Van Laethem; Kristof Theys; Ana Patricia Carvalho; Joana Cavaco-Silva; Giuseppe Lapadula; Carlo Torti; Matthias Assel; Stefan Wesner; Joke Snoeck; Jean Ruelle; Annelies De Bel; Patrick Lacor; Paul De Munter; Eric Van Wijngaerden; Maurizio Zazzi; Rolf Kaiser; Ahidjo Ayouba; Martine Peeters; Tulio de Oliveira; Luiz C J Alcantara; Zehava Grossman; Peter Sloot; Dan Otelea; Simona Paraschiv; Charles Boucher; Ricardo J Camacho; Anne-Mieke Vandamme
Journal:  Bioinformatics       Date:  2013-05-02       Impact factor: 6.937
