Guillaume Bernard, Mark A Ragan, Cheong Xin Chan.
Abstract
Ernst Haeckel based his landmark Tree of Life on the supposed ontogenic recapitulation of phylogeny, i.e. that successive embryonic stages during the development of an organism re-trace the morphological forms of its ancestors over the course of evolution. Much of this idea has since been discredited. Today, phylogenies are often based on families of molecular sequences. The standard approach starts with a multiple sequence alignment, in which the sequences are arranged relative to each other in a way that maximises a measure of similarity position-by-position along their entire length. A tree (or sometimes a network) is then inferred. Rigorous multiple sequence alignment is computationally demanding, and evolutionary processes that shape the genomes of many microbes (bacteria, archaea and some morphologically simple eukaryotes) can add further complications. In particular, recombination, genome rearrangement and lateral genetic transfer undermine the assumptions that underlie multiple sequence alignment, and imply that a tree-like structure may be too simplistic. Here, using the sequences of 143 bacterial and archaeal genomes, we construct a network of phylogenetic relatedness based on the number of shared k-mers (subsequences of fixed length k). Our findings suggest that the network captures not only key aspects of microbial genome evolution as inferred from a tree, but also features that are not treelike. The method is highly scalable, allowing for investigation of genome evolution across a large number of genomes. Instead of using specific regions or sequences from genomes, or indeed Haeckel's idea of ontogeny, we argue that genome phylogenies can be inferred using k-mers from whole-genome sequences. Representing these networks dynamically allows biological questions of interest to be formulated and addressed quickly and in a visually intuitive manner.
Keywords: k-mers; phylogenetic networks; phylogenetic trees; phylogenies
Year: 2016 PMID: 28105314 PMCID: PMC5224691 DOI: 10.12688/f1000research.10225.2
Source DB: PubMed Journal: F1000Res ISSN: 2046-1402
Figure 1. The alignment-free phylogenetic tree topology of the 143 Bacteria and Archaea genomes based on statistic, modified from the tree in Bernard et al. [25]; jackknife support at each internal node is shown.
Each phylum is represented in a distinct colour, and the backbones identified in Beiko et al. [10] are shown on the internal nodes as black filled circles. The association of Coxiella burnetii and Nitrosomonas europaea is marked with an asterisk.
Figure 2. Alignment-free phylogenetic network of the 143 Bacteria and Archaea genomes based on statistic using 25-mers, at t = 2.
Each phylum is represented in a distinct colour, each node represents a genome, and an edge represents qualitative evidence of shared 25-mers between two genomes. The association between Coxiella burnetii and Wigglesworthia glossinidia is marked with an asterisk.
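
The idea of a network in which nodes are genomes and edges reflect shared k-mers can be illustrated with a minimal Python sketch. It is not the statistic or pipeline used in the study: it simply counts distinct k-mers common to two sequences and draws an edge when that count reaches a cut-off. The function names, the toy sequences and the `min_shared` parameter are all assumptions made for illustration, and `min_shared` is not the threshold t of Figure 2. Reverse complements and jackknife support are ignored.

```python
from itertools import combinations


def kmers(seq, k):
    """Return the set of distinct k-mers (substrings of length k) in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_network(genomes, k, min_shared=1):
    """Build a toy network from raw shared k-mer counts.

    genomes: dict mapping a genome name to its sequence string.
    Returns a dict mapping (name_a, name_b) to the number of distinct
    k-mers the two sequences share, keeping only pairs with at least
    min_shared shared k-mers (a hypothetical cut-off for illustration).
    """
    profiles = {name: kmers(seq, k) for name, seq in genomes.items()}
    edges = {}
    for a, b in combinations(sorted(profiles), 2):
        n_shared = len(profiles[a] & profiles[b])
        if n_shared >= min_shared:
            edges[(a, b)] = n_shared
    return edges


# Toy sequences, invented for this example only.
toy = {
    "genome_A": "ACGTACGTGG",
    "genome_B": "TTACGTACGA",
    "genome_C": "GGGGCCCCAA",
}

network = shared_kmer_network(toy, k=4, min_shared=2)
for (a, b), n in network.items():
    print(a, b, n)
```

With these toy inputs, genome_A and genome_B share four distinct 4-mers and are connected, while genome_C shares none with either and remains an isolated node. Storing each genome as a set of k-mers keeps every pairwise comparison independent of any alignment, which is the property that makes this style of analysis scale to many genomes.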