A network perspective on the virus world.

Jaime Iranzo, Mart Krupovic, Eugene V Koonin.

Abstract

Viral evolution is characterized by high rates of horizontal gene transfer and fast sequence divergence. Furthermore, there are no universal genes shared by all viruses. As a result, distant relationships among viruses are better represented by a network than by a tree. Here we discuss 3 network representations of the virus world with decreasing levels of complexity, from a multilayer network that integrates sequence conservation and patterns of gene sharing to a classic genome similarity network. As new tools for network analysis are developed, we expect that novel insights into virus evolution will result from the study of more complex representations of the virus world.

Keywords:  bipartite network; gene sharing network; multilayer network; phylogenomics; viral evolution; viral taxonomy

Year:  2017        PMID: 28451057      PMCID: PMC5398231          DOI: 10.1080/19420889.2017.1296614

Source DB:  PubMed          Journal:  Commun Integr Biol        ISSN: 1942-0889


Since The Origin of Species was published, the idea that all extant and past forms of life can be organized as a Tree of Life (TOL) has become quintessential to evolutionary biology. More than 150 years later, in the wake of the genomic revolution, the TOL remains a valid approximation, as long as horizontal gene transfer (HGT) among prokaryotes and viruses does not completely blur the tree-like pattern that arises from vertical descent with modification. While HGT sets a fundamental limit to the TOL concept, high degrees of sequence divergence impose a technical limitation to the construction of deep phylogenetic trees. Because fast divergence and intense horizontal transfer are 2 main characteristics of viral evolution, the reconstruction of large-scale viral phylogenies poses a major technical and fundamental challenge to the interpretation of the virosphere as a tree. Furthermore, there are no universal genes shared by all or even most groups of viruses, which restricts phylogenetic analyses to discrete assemblages of more closely related viruses, thereby fragmenting and blurring the global understanding of the evolutionary relationships in the viral world.
Instead, the evolutionary relationships among viruses can be more precisely represented as a network of gene sharing. Among many possible network representations, a highly informative description of the virus world is provided by a network with 2 layers (Fig. 1A). The gene layer consists of a sequence similarity network, with nodes representing viral genes and edges connecting pairs of homologous genes with a weight proportional to their sequence similarity. The second layer represents the viral genomes; nodes in the genome layer do not connect with each other, but rather to nodes from the gene layer: each genome node is connected to the genes that belong to that genome.
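The 2-layer structure described above can be sketched as a pair of adjacency maps. This is only an illustrative sketch: the function name `build_two_layer_network`, the gene and genome identifiers, and the similarity weights are all hypothetical and do not come from the original analysis.

```python
from collections import defaultdict

def build_two_layer_network(gene_similarity, genome_content):
    """Return (gene_layer, interlayer) adjacency maps for a 2-layer network.

    gene_similarity: iterable of (gene_a, gene_b, weight) for homologous pairs.
    genome_content:  dict mapping a genome id to the gene ids it encodes.
    """
    # Gene layer: weighted, undirected similarity edges between genes.
    gene_layer = defaultdict(dict)
    for a, b, weight in gene_similarity:
        gene_layer[a][b] = weight
        gene_layer[b][a] = weight

    # Interlayer edges: each genome node links only to its own genes.
    # Genome nodes are never linked to each other directly.
    interlayer = {genome: set(genes) for genome, genes in genome_content.items()}

    # Keep genes without detectable homologs as isolated gene-layer nodes.
    for genes in interlayer.values():
        for gene in genes:
            gene_layer.setdefault(gene, {})
    return dict(gene_layer), interlayer

# Hypothetical toy data (not the data behind Fig. 1).
toy_similarity = [("g1", "g4", 0.9), ("g1", "g7", 0.4),
                  ("g4", "g7", 0.5), ("g2", "g5", 0.8)]
toy_genomes = {"V1": ["g1", "g2", "g3"], "V2": ["g4", "g5", "g6"], "V3": ["g7"]}

gene_layer, interlayer = build_two_layer_network(toy_similarity, toy_genomes)
```

Storing the two kinds of edges separately keeps the sequence similarity information (needed for single-gene phylogenies) apart from the gene content information, which is the distinction the simpler representations later collapse.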
A simpler representation of this 2-layer network can be obtained by aggregating the nodes from the gene layer into groups of homologous genes. Such groups appear in the gene layer as modules, i.e., sets of nodes that are much more densely connected with each other than with the rest of the network. Indeed, some popular methods to identify sets of orthologous or, more generally, homologous genes work by applying a module detection algorithm to a sequence similarity network. Once genes are grouped into families of homologs, a bipartite network is obtained by connecting genome nodes and gene family nodes whenever a gene family is present in a genome (Fig. 1B). Compared with the 2-layer network, the bipartite network lacks the former's precise information on sequence similarity, which could be used to reconstruct single-gene phylogenies, but keeps the essential information on which gene families are shared by which genomes.
A further simplification results from projecting the bipartite network into a genome similarity network (Fig. 1C). There are multiple ways to obtain a genome similarity network from a bipartite gene sharing network. The main step is defining a measure of similarity between genomes, such as the number of shared genes, the fraction of shared genes, or the probability that such a number of shared genes occurs by chance. The network is then readily obtained by connecting pairs of genomes with edges, whose weights are proportional to the similarity between the genomes.
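A minimal sketch of the projection step follows, computing the three kinds of similarity mentioned above for every pair of genomes. Several choices here are assumptions rather than the method of any cited study: the fraction of shared genes is taken to be the Jaccard index, the chance probability is a hypergeometric tail that treats all gene families as equally likely, and the names `hypergeom_tail` and `project_to_genome_network` and the toy data are hypothetical.

```python
from itertools import combinations
from math import comb

def hypergeom_tail(k, a, b, n_families):
    """P(X >= k) when b families are drawn at random out of n_families,
    of which a are present in the other genome (hypergeometric null model)."""
    total = comb(n_families, b)
    hits = sum(comb(a, x) * comb(n_families - a, b - x)
               for x in range(k, min(a, b) + 1))
    return hits / total

def project_to_genome_network(bipartite):
    """Project a genome -> {gene families} map onto weighted genome pairs."""
    n_families = len(set().union(*bipartite.values()))
    edges = {}
    for g1, g2 in combinations(sorted(bipartite), 2):
        fam1, fam2 = bipartite[g1], bipartite[g2]
        shared = len(fam1 & fam2)
        if shared == 0:
            continue  # genomes with no shared family stay unconnected
        edges[(g1, g2)] = {
            "shared": shared,                          # number of shared families
            "jaccard": shared / len(fam1 | fam2),      # fraction of shared families
            "p_value": hypergeom_tail(shared, len(fam1), len(fam2), n_families),
        }
    return edges

# Hypothetical toy bipartite network: 4 genomes, 5 gene families.
toy_bipartite = {
    "V1": {"F1", "F2", "F3"},
    "V2": {"F1", "F2", "F4"},
    "V3": {"F1", "F5"},
    "V4": {"F4", "F5"},
}
genome_edges = project_to_genome_network(toy_bipartite)
```

Any one of the three values could serve as the edge weight. The choice matters, because some properties of the resulting network may depend on the similarity measure used.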
Figure 1.

Three network representations of a toy virus world composed of 4 viral genomes (squares) and 12 genes (black circles) that belong to 5 gene families (white circles). (A) Two-layer network, with the gene layer on top and the genome layer at the bottom. Black edges of different thickness indicate the similarity between sequences in the gene layer. (B) Bipartite network, which results from clustering groups of homologous genes in the gene layer into gene family nodes. (C) Genome similarity network; the thickness of the links is proportional to the number of shared gene families.

Within the network framework, module detection algorithms have become a useful tool to define classes of viruses. Multi-scale approaches based on information theory or repeated application of Newman's modularity allow the study of genome similarity networks at multiple taxonomic levels. Local algorithms, such as OSLOM, can detect overlapping modules (e.g., those resulting from mosaic genomes) and remove nodes whose module assignment is poorly supported statistically (e.g., single members of new taxa that occasionally share widespread genes with otherwise unrelated groups). Module detection in bipartite networks often involves maximization of Barber's modularity, although relevant insights into the patterns of gene sharing and transmission can be obtained simply from the study of basic topological properties of the network. The inference of viral groups from genome similarity networks might not differ much from unsupervised machine learning techniques, but the more realistic representation of the virus world as a bipartite network of gene sharing makes network-based approaches more powerful at dealing with decaying degrees of genomic similarity at long evolutionary distances, as well as with poorly sampled taxa.
Genome similarity networks are more compact and easier to analyze than their bipartite counterparts but have several limitations.
First, they lack information on the kind of genes that make 2 genomes similar, making it difficult to discriminate between cases of shared host-related genes and shared ancestral genes. Moreover, some properties of the final network may depend on the particular measure used to evaluate genome similarity. Finally, the projection of bipartite networks can lead to structural artifacts, such as spurious scale-free degree distributions. Despite these limitations, genome similarity networks have been successfully applied to bacteriophages to reveal the internal structure of this group of viruses and to assign newly discovered phages to established families. More recently, a large collection of viruses with dsDNA genomes has been studied under the framework of bipartite networks. The analysis of the network showed that most dsDNA viruses belong to one of 2 major groups, each of which includes viruses from the 3 domains of life and is characterized by a distinct major capsid protein. Dissection of those groups leads to a hierarchy of subgroups which is consistent, despite some exceptions, with the established taxonomy of viruses. The hierarchical organization of the dsDNA virus world is founded on 3 classes of conserved genes: i) hallmark genes, such as capsid proteins, maturation proteases and packaging ATPases, that characterize and distinguish the 2 major viral groups, ii) connector genes, such as the baseplate proteins of myoviruses, that are shared by multiple subgroups within a group, and iii) signature genes that are highly specific to sets of related viruses (Fig. 2). Notably, most viruses that infect Archaea do not fall into the 2 major groups of dsDNA viruses and form a more fragmented network that is only weakly connected to the rest of the dsDNA virosphere, apparently reflecting the existence of stronger barriers to HGT among distinct groups of archaeal viruses and especially between viruses of archaea and bacteria. 
In general, the pattern of connections is poorly conserved in more than 80% of the gene families of the bipartite virus network, in accord with the major role of HGT in virus evolution.
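To make the bipartite module detection mentioned above more concrete, the sketch below evaluates Barber's bipartite modularity, Q = (1/m) Σ (B_ij − k_i d_j / m) δ(c_i, c_j), for a given assignment of genomes and gene families to modules. It only scores a partition and does not search for the best one; the function name `barber_modularity`, the toy edges, and the module labels are hypothetical.

```python
from collections import Counter, defaultdict

def barber_modularity(edges, genome_module, family_module):
    """Barber's modularity of a bipartite partition.

    edges:         set of (genome, gene_family) pairs, one per presence.
    genome_module: dict mapping genome -> module label.
    family_module: dict mapping gene family -> module label.
    """
    m = len(edges)
    genome_degree = Counter(genome for genome, _ in edges)
    family_degree = Counter(family for _, family in edges)

    # Observed number of edges that fall inside a module.
    inside = sum(1 for genome, family in edges
                 if genome_module[genome] == family_module[family])

    # Expected number under the degree-preserving null model:
    # sum over modules of (total genome degree * total family degree) / m.
    genome_deg_by_module = defaultdict(int)
    family_deg_by_module = defaultdict(int)
    for genome, degree in genome_degree.items():
        genome_deg_by_module[genome_module[genome]] += degree
    for family, degree in family_degree.items():
        family_deg_by_module[family_module[family]] += degree
    expected = sum(genome_deg_by_module[c] * family_deg_by_module[c]
                   for c in genome_deg_by_module) / m

    return (inside - expected) / m

# Hypothetical toy network: 4 genomes, 5 gene families, 10 edges.
toy_edges = {("V1", "F1"), ("V1", "F2"), ("V1", "F3"),
             ("V2", "F1"), ("V2", "F2"), ("V2", "F4"),
             ("V3", "F1"), ("V3", "F5"),
             ("V4", "F4"), ("V4", "F5")}
toy_genome_module = {"V1": "A", "V2": "A", "V3": "B", "V4": "B"}
toy_family_module = {"F1": "A", "F2": "A", "F3": "A", "F4": "B", "F5": "B"}

q_two_modules = barber_modularity(toy_edges, toy_genome_module, toy_family_module)
```

For this toy partition, 8 of the 10 edges fall inside modules against 5.2 expected by chance, so Q is approximately 0.28. Placing every node in a single module gives Q = 0, which is why modularity maximization favors partitions with more within-module edges than the null model predicts.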
Figure 2.

Hierarchical structure of a portion of the bipartite network for tailed bacteriophages (order Caudovirales). On the small scale, sets of related viruses and their associated gene families form densely connected modules. Within a module, genome nodes are represented as colored circles, whereas gene family nodes are denoted by the points where the edges (gray and colored lines) join. Colored edges connect the genomes of a module with the module's signature genes. On the large scale, modules connect with each other through shared connector genes, represented here as small gray circles. The 4 hallmark genes that are shared by most members of the order Caudovirales occupy a central position in the network (small black circles). This portion of the network corresponds to modules 9a, 9d, 12, 13, and 18 from ref. 15. MCP, major capsid protein.

As new tools for the analysis of bipartite networks are developed, it will soon become possible to extend the network analyses to the entire virosphere and compare the findings from this approach with the patterns observed for viral hallmark genes. From a complementary perspective, multilayer networks, such as the 2-layer representation of the dsDNA virosphere described above, have recently become a focus of network science. Although technically challenging, a detailed analysis of the 2-layer network is a promising direction that will integrate sequence similarity and gene sharing in a unified framework. Eventually, additional layers accounting for host range, geographic location and environmental conditions would allow integration of information on genome evolution with ecological data, resulting in a comprehensive picture of virus evolution.
  15 in total

1.  Reticulate representation of evolutionary and functional relationships between phage genomes.

Authors:  Gipsi Lima-Mendez; Jacques Van Helden; Ariane Toussaint; Raphaël Leplae
Journal:  Mol Biol Evol       Date:  2008-01-29       Impact factor: 16.240

2.  When metabolism meets topology: Reconciling metabolite and reaction networks.

Authors:  Raul Montañez; Miguel Angel Medina; Ricard V Solé; Carlos Rodríguez-Caso
Journal:  Bioessays       Date:  2010-03       Impact factor: 4.345

3.  Scalable detection of statistically significant communities and hierarchies, using message passing for modularity.

Authors:  Pan Zhang; Cristopher Moore
Journal:  Proc Natl Acad Sci U S A       Date:  2014-12-08       Impact factor: 11.205

Review 4.  Virus world as an evolutionary network of viruses and capsidless selfish elements.

Authors:  Eugene V Koonin; Valerian V Dolja
Journal:  Microbiol Mol Biol Rev       Date:  2014-06       Impact factor: 11.056

5.  Search for a 'Tree of Life' in the thicket of the phylogenetic forest.

Authors:  Pere Puigbò; Yuri I Wolf; Eugene V Koonin
Journal:  J Biol       Date:  2009-07-13

6.  Bipartite Network Analysis of the Archaeal Virosphere: Evolutionary Connections between Viruses and Capsidless Mobile Elements.

Authors:  Jaime Iranzo; Eugene V Koonin; David Prangishvili; Mart Krupovic
Journal:  J Virol       Date:  2016-11-28       Impact factor: 5.103

7.  OrthoMCL: identification of ortholog groups for eukaryotic genomes.

Authors:  Li Li; Christian J Stoeckert; David S Roos
Journal:  Genome Res       Date:  2003-09       Impact factor: 9.043

8.  Finding statistically significant communities in networks.

Authors:  Andrea Lancichinetti; Filippo Radicchi; José J Ramasco; Santo Fortunato
Journal:  PLoS One       Date:  2011-04-29       Impact factor: 3.240

9.  The Double-Stranded DNA Virosphere as a Modular Hierarchical Network of Gene Sharing.

Authors:  Jaime Iranzo; Mart Krupovic; Eugene V Koonin
Journal:  MBio       Date:  2016-08-02       Impact factor: 7.867

Review 10.  Origins and evolution of viruses of eukaryotes: The ultimate modularity.

Authors:  Eugene V Koonin; Valerian V Dolja; Mart Krupovic
Journal:  Virology       Date:  2015-03-12       Impact factor: 3.616
