Abstract
In the last two decades, a large number of whole-genome phylogenies have been inferred to reconstruct the Tree of Life (ToL). Underlying data models range from gene or functionality content in species to phylogenetic gene family trees and multiple sequence alignments of concatenated protein sequences. Diversity in data models, together with the use of different tree reconstruction techniques, disruptive biological effects, and the steadily increasing number of genomes, has led to a huge diversity in published phylogenies. Comparing these phylogenies and, moreover, identifying the impact of inference properties (underlying data model, inference technique) on particular reconstructions is almost impossible. In this work, we introduce tree topology profiling as a method to compare already published whole-genome phylogenies. This method requires visual determination of the particular topology in a drawn whole-genome phylogeny for a set of particular bacterial clans. For each clan, neighborhoods to other bacteria are collected into a catalogue of generalized alternative topologies. The particular topology alternatives found for an ordered list of bacterial clans form a topology profile that represents the analyzed phylogeny. To simulate the inhomogeneity of published gene content phylogenies, we generate a set of seven phylogenies using different inference techniques and the SYSTERS-PhyloMatrix data model. After tree topology profiling of a total of 54 selected published and newly inferred phylogenies, we separate artefactual from biologically meaningful phylogenies and associate particular inference results (phylogenies) with inference background (inference techniques as well as data models). Topological relationships of particular bacterial species groups are presented. With this work we introduce tree topology profiling into the scientific field of comparative phylogenomics.
Keywords: SYSTERS protein family set; comparative phylogenomics; gene content phylogeny; inner tree topology; tree topology profiling; whole-genome phylogeny
Year: 2012 PMID: 22915837 PMCID: PMC3422217 DOI: 10.4137/EBO.S9642
Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 1.625
Characterization of published studies analyzed in this paper.
| ID | Study | Figure | Tree size | Aim: variation in data model | Aim: variation in tree inference | Data model | Inference approach | Distance metric | [all] | [ana] | [16S] | Tree type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Tree 1 | Deeds et al | Fig. 4 | 50 | no | ✓ | SCOP | Dollo | | 4 | 2 | ✓ | Domain content ? |
| Tree 2 | Lin and Gerstein | | 11 | ✓ | no | COGs | Kitch | Hamming | 12 | 1 | ✓ | GCT |
| Tree 3 | Deeds et al | Fig. 6 | 50 | no | ✓ | SCOP | NJ | | 4 | 2 | ✓ | Domain content ? |
| Tree 4 | Snel et al | | 13 | no | no | COG-like | NJ | Simpson | 2 | 1 | ✓ | GCT |
| Tree 5 | Grishin et al | | 19 | no | no | Large protein domain families; COG-like; ird | FM | | 1 | 1 | | GCT |
| Tree 6 | Ciccarelli et al | | 191 | no | no | Concatenated alignment of 31 COGs | ML | | | | | MSA-ML |
| Tree 7 | Daubin et al | | 45 | ✓ | ✓ | SuperTree of 730 phylogenetic trees | BIONJ | gcd | 4 | 2 | | SuperTree |
| Tree 8 | Brown et al | | 45 | ✓ | no | 14 concatenated proteins; minus HGT | MP | | 2 | 2 | | MSA-MP |
| Tree 9 | Brown et al | | 45 | ✓ | no | 23 concatenated proteins | MP | | 2 | 2 | | MSA-MP |
| Tree 10 | Ma and Zeng | | 82 | (✓) | ✓ | Enzyme content in metabolic network | NJ | Korbel | 2 (+4) | 2 | (✓) | ECT |
| Tree 11 | Gevers et al | | 106 | no | no | 16S rRNA with functional annotation in the set of paralogs | NJ | | 1 | 1 | ✓ | 16S rRNA |
| Tree 12 | Yang et al | | 174 | no | no | SCOP; separation according to the kingdoms of life | NJ | | 4 | 1 | | Domain content ? |
| Tree 13 | Muller et al | | 630 | no | no | COGs, KOGs and other OGs, particular gene families | ML | | | | | MSA-ML |
| Tree 14 | Moran et al | | 72 | – | – | Widely supported findings from different studies for symbionts | | | | | | Integrative |
| Tree 15 | Wu and Eisen | | 578 | no | no | Concatenated alignment of 31 housekeeping genes | ML | | | | | MSA-ML |
| Tree 16 | Gophna et al | Fig. 5 | 147 | ✓ | no | ORFs, reciprocal best match; prevalence | FM | Weighted gene content | 6 | 2 | | GCT |
| Tree 17 | Dutilh et al | Fig. 4 | 89 | no | ✓ | COGs, presence/absence profiles, weighted characters | NJ | Korbel | 2 | 2 | (✓) | GCT |
| Tree 18 | Korbel et al | | 50 | ✓ | no | COGs; gene order | NJ | Korbel | 3 | 2 | ✓ | Gene order |
| Tree 19 | Ge et al | | 40 | ✓ | no | COGs | NJ | PAM | 1 | 1 | | GCT |
| Tree 20 | Clarke et al | Fig. 5 | 37 | ✓ | no | ORFs; after elimination of discordants | FM | | 4 | 2 | ✓ | GCT |
| Tree 21 | Clarke et al | | 37 | ✓ | no | ORFs; before elimination of discordants | FM | | 4 | 2 | ✓ | GCT |
| Tree 22 | Wolf et al | Fig. 5 | 40 | ✓ | ✓ | Probable orthologs | NJ | Median of the percent identity distribution | 4 | 3 | ✓ | GCT |
| Tree 23 | Sangaralingam et al | | 50 | no | ✓ | COGs | Non-phylogenetic model | Conditioned logdet distances | 3 | 3 | | GCT |
| Tree 24 | Gophna et al | | 147 | ✓ | no | ORFs, reciprocal best match | FM | Weighted gene content | 6 | 2 | | GCT |
| Tree 25 | Korbel et al | | 50 | ✓ | no | COGs; gene content | NJ | Korbel | 3 | 2 | ✓ | GCT |
| Tree 26 | Dutilh et al | | 89 | no | ✓ | COGs, presence/absence profiles | NJ | Korbel | 2 | 2 | (✓) | GCT |
| Tree 27 | Daubin et al | | 45 | ✓ | ✓ | SuperTree of 730 phylogenetic trees | ML | | 4 | 2 | | SuperTree |
| Tree 28 | Henz et al | | 91 | no | ✓ | HSPs; matched distance; GBDP | BIONJ | | 4 | 1 | | GCT |
| Tree 29 | Tekaia et al | | 99 | ✓ | no | ORF products: orthologs | CA, HC | Jaccard | 4 | 4 | | GCT |
| Tree 30 | Wolf et al | Fig. 4 | 40 | ✓ | ✓ | COGs; gene pairs | Dollo | | 4 | 3 | | GCT |
| Tree 31 | Lienau et al | Fig. 6 | 166 | ✓ | no | SLC, conditioned reconstruction | Parsimony | | 1 (+7) | 1 | (✓) | GCT |
| Tree 32 | Hughes et al | | 99 | ✓ | no | SLC, e-value 10⁻⁶, similarity 60/80 | Strict consensus tree of 6 MP trees | | 3 | 2 | | SuperTree |
| Tree 33 | Tekaia et al | | 23 | ✓ | no | ORF products | CA, HC | | 3 | 1 | | GCT |
| Tree 34 | Hughes et al | | 99 | ✓ | no | SLC, e-value 10⁻⁶, similarity 30/50 | Single MP | | 3 | 2 | | GCT ? |
| Tree 35 | Ma and Zeng | | 82 | (✓) | ✓ | Enzyme content in metabolic network | NJ | Jaccard | 2 (+4) | 2 | (✓) | ECT |
| Tree 36 | Tekaia et al | | 99 | ✓ | no | ORF products: homologs; ancestral duplications and weighted conservation | CA, HC | Jaccard | 4 | 4 | | GCT |
| Tree 37 | Tekaia et al | | 99 | ✓ | no | ORF products; minimal profiles | CA, HC | Jaccard | 4 | 4 | | GCT |
| Tree 38 | Sangaralingam et al | | 49 | no | ✓ | COGs | Modified BIONJ | Conditioned logdet distances | 3 | 3 | | GCT |
| Tree 39 | Wolf et al | | 59 | no | no | COGs | FM | Jaccard | 1 | 1 | | GCT |
| Tree 40 | Sangaralingam et al | | 50 | no | ✓ | COGs | Phylogenetic model | Conditioned logdet distances | 3 | 3 | | GCT |
| Tree 41 | Spencer et al | Fig. 4 | 66 | ✓ | no | COGs; birth-death model | Least squares, inverse square weighting | | | | | GCT |
| Tree 42 | Spencer et al | Fig. 5 | 66 | ✓ | no | COGs; blocks model | Least squares, inverse square weighting | | | | | GCT |
| Tree 43 | Spencer et al | | 50 | no | ✓ | COGs | Modified BIONJ | Conditioned logdet distances | 3 | 3 | ✓ | GCT |
| Tree 44 | Gu and Zhang | | 35 | no | no | COGs | NJ | ggd | 1 | 1 | | GCT |
| Tree 45 | Tekaia et al | Fig. 4 | 99 | ✓ | no | ORF products: profiles | CA, HC | Jaccard | 4 | 4 | | GCT |
| Tree 46 | Hong et al | | 42 | no | no | Metabolic pathway content matrix | Complete linkage clustering | | 2 | 1 | ✓ | ECT |
| Tree 47 | Wolf et al | | 40 | ✓ | ✓ | COGs; gene content | Dollo | | 4 | 3 | | GCT |
Notes: Publications are ordered according to the heatmap in Figure 2. Given are the numbers of [all] trees in the publication and of the trees [ana]lyzed in this study, and an indication of whether a [16S] rRNA tree is used or shown by the authors. Further study parameters are extracted: aim, background information, tree size (number of species), and the internal tree ID, which gives orientation in the text of our study. Tree type is determined from the inference background of the phylogeny.
Abbreviations: BIONJ (ref. 68); CA, correspondence analysis; Dice, Dice distance; Dollo, Dollo parsimony algorithm; FM, Fitch-Margoliash algorithm; GBDP, genome blast distance phylogeny; gcd, gamma corrected distance; ggd, general genome distance; HC, hierarchical clustering; -HGT, (without) horizontal gene transfer; HSPs, high-scoring (sequence) segment pairs; ird, inter-protein rate distribution; Jaccard, Jaccard distance; Kitch, Kitch algorithm; Korbel, distance given in Korbel et al; ML, maximum likelihood; MP, maximum parsimony; NJ, Neighbor Joining algorithm; SCOP, Structural Classification of Proteins (ref. 93); COGs, Clusters of Orthologous Groups (ref. 8); SLC, single linkage clustering; ORFs, open reading frames; Simpson, Simpson distance.
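Several of the distance metrics named above (Jaccard, Dice, Simpson, Hamming) have standard set-based definitions on gene content. The following is a minimal sketch of those textbook forms, assuming each genome is represented as a set of gene family identifiers. The family names and the two example genomes are hypothetical, and the analyzed studies may use different normalizations or weightings.

```python
from typing import Set


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |A ∩ B| / |A ∪ B|."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def dice_distance(a: Set[str], b: Set[str]) -> float:
    """1 - 2|A ∩ B| / (|A| + |B|)."""
    total = len(a) + len(b)
    if total == 0:
        return 0.0
    return 1.0 - 2.0 * len(a & b) / total


def simpson_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |A ∩ B| / min(|A|, |B|); normalizes by the smaller gene repertoire."""
    smaller = min(len(a), len(b))
    if smaller == 0:
        return 0.0
    return 1.0 - len(a & b) / smaller


def hamming_distance(a: Set[str], b: Set[str]) -> int:
    """Number of families present in exactly one of the two genomes."""
    return len(a ^ b)


# Hypothetical example: a reduced genome whose families are a subset of a larger one.
small_genome = {"fam1", "fam2"}
large_genome = {"fam1", "fam2", "fam3", "fam4", "fam5", "fam6"}

for metric in (jaccard_distance, dice_distance, simpson_distance, hamming_distance):
    print(metric.__name__, metric(small_genome, large_genome))
```

In this toy case the Simpson form gives a distance of 0 because the smaller repertoire is fully contained in the larger one, whereas the other three metrics grow with the difference in genome size.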
Figure 1. Workflow applied in this study.
Note: See details in the sections Material and Methods and Results.
Generalized topology alternatives found in published phylogenies for nine selected species groups.
| Species group | Topology | Topology alternative | |||
|---|---|---|---|---|---|
| Leptospiraceae | Confirmable | Not within Spirochaetes | – | – | Within Spirochaetes |
| Buchnera | Confirmable | Among parasites | – | – | Within γ-Proteobacteria |
| Rickettsia | Confirmable | Among parasites | – | – | Within α-Proteobacteria |
| Mollicutes | Confirmable | Among parasites | – | – | Near other Firmicutes |
| Spirochaetes | Non-confirmable | Among parasites | Between Chlamydiae and (Proteobacteria or E, A) | Between Firmicutes and (A or Proteobacteria or Cyanobacteria or Actinobacteria) | Between Chlamydiae and Firmicutes |
| Chlorobi | Non-confirmable | Among parasites | Other | – | Near Cyanobacteria |
| Chlamydiae | Non-confirmable | Among parasites | Near Spirochaetes only | – | Other |
| Cyanobacteria | Non-confirmable | Among parasites | Near Actinobacteria | Near Chlorobi | Other |
| Actinobacteria | Non-confirmable | Among parasites | Other | Near or between Cyanobacteria and E, A | Near or between Cyanobacteria and (Firmicutes or Proteobacteria) |
| Score/character status | | −2 | −1 | 1 | 2 |
| Color in heat map | | Red | Orange | Turquoise | Blue |
Notes: Description is given with respective metrics (+2, +1, −1, or −2) and color coding. The term ‘within’ indicates that the species group ‘is monophyletic with’ the respective topology alternative, ie, the species are contained in the subclade. Otherwise, the species group is paraphyletic, ie, is a sister group of the indicated subclade(s).
Abbreviations: E, Eukaryota; A, Archaea (if not ignored in phylogeny).
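The score and color coding above can be read as a simple lookup from a topology profile to heat map cells. The sketch below illustrates that reading. The function and variable names are hypothetical, and treating a score of 0 or a missing group as "light gray" (species not regarded) is an assumption based on the figure notes rather than a rule stated in this table. The example profile is the NJ/Korbel row of the SYSTERS-PhyloMatrix event matrix.

```python
from typing import Dict, List

# Score-to-color coding as given in the topology catalogue.
SCORE_TO_COLOR = {-2: "red", -1: "orange", 1: "turquoise", 2: "blue"}
# Assumption: 0 or absent means the species group was not regarded.
MISSING_COLOR = "light gray"

# Ordered list of species groups (confirmable groups first).
GROUP_ORDER = [
    "Leptospiraceae", "Buchnera", "Rickettsia", "Mollicutes",
    "Spirochaetes", "Chlamydiae", "Actinobacteria", "Cyanobacteria", "Chlorobi",
]


def profile_colors(profile: Dict[str, int]) -> List[str]:
    """Translate a topology profile into heat map colors, one per species group."""
    colors = []
    for group in GROUP_ORDER:
        score = profile.get(group, 0)
        if score == 0:
            colors.append(MISSING_COLOR)
        elif score in SCORE_TO_COLOR:
            colors.append(SCORE_TO_COLOR[score])
        else:
            raise ValueError(f"unexpected score {score} for {group}")
    return colors


# NJ with Korbel distance on the SYSTERS-PhyloMatrix data.
nj_korbel = {
    "Leptospiraceae": 2, "Buchnera": 2, "Rickettsia": 2, "Mollicutes": 2,
    "Spirochaetes": -1, "Chlamydiae": -1, "Actinobacteria": -1,
    "Cyanobacteria": -1, "Chlorobi": -1,
}
print(profile_colors(nj_korbel))
```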
Topology profiles for seven SYSTERS-PhyloMatrix gene content trees across the nine species groups in the topology catalogue: event matrix; score definitions can be found in Table 2 and phylogenies in Figures S1 to S7.
| Applied algorithm | Distance metric | Leptospiraceae | Buchnera | Rickettsia | Mollicutes | Spirochetes | Chlamydiae | Actinobacteria | Cyanobacteria | Chlorobi |
|---|---|---|---|---|---|---|---|---|---|---|
| NJ (Neighbor Joining) | Korbel | 2 | 2 | 2 | 2 | −1 | −1 | −1 | −1 | −1 |
| NJ | Simpson | 2 | 2 | 2 | 2 | 1 | 1 | 1 | −2 | −1 |
| Dollo parsimony | | −2 | −2 | −2 | −2 | −2 | −2 | −1 | 2 | −1 |
| NJ | Hamming | −2 | −2 | −2 | −2 | −2 | −2 | −1 | 1 | 2 |
| Wagner parsimony | | −2 | −2 | −2 | −2 | −2 | −2 | 2 | −1 | −1 |
| NJ | Dice | −2 | −2 | −2 | −2 | −2 | −2 | 1 | −1 | −1 |
| NJ | Jaccard | −2 | −2 | −2 | −2 | −2 | −2 | 2 | −1 | 2 |
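The seven profiles in this event matrix can be compared numerically. The sketch below computes a Manhattan distance between profiles and lists the trees that score +2 on all four confirmable groups. Both the choice of Manhattan distance and the helper names are assumptions made for illustration; this is a simplified reading of the table, not the clustering procedure used for the heat map.

```python
from typing import Dict, List

# Column order of the event matrix; the first four groups are the confirmable ones.
GROUPS = [
    "Leptospiraceae", "Buchnera", "Rickettsia", "Mollicutes",
    "Spirochetes", "Chlamydiae", "Actinobacteria", "Cyanobacteria", "Chlorobi",
]
N_CONFIRMABLE = 4

# Scores copied from the event matrix (algorithm/distance metric -> profile).
PROFILES: Dict[str, List[int]] = {
    "NJ/Korbel":        [2, 2, 2, 2, -1, -1, -1, -1, -1],
    "NJ/Simpson":       [2, 2, 2, 2, 1, 1, 1, -2, -1],
    "Dollo parsimony":  [-2, -2, -2, -2, -2, -2, -1, 2, -1],
    "NJ/Hamming":       [-2, -2, -2, -2, -2, -2, -1, 1, 2],
    "Wagner parsimony": [-2, -2, -2, -2, -2, -2, 2, -1, -1],
    "NJ/Dice":          [-2, -2, -2, -2, -2, -2, 1, -1, -1],
    "NJ/Jaccard":       [-2, -2, -2, -2, -2, -2, 2, -1, 2],
}


def manhattan(p: List[int], q: List[int]) -> int:
    """Sum of absolute score differences across all species groups."""
    return sum(abs(a - b) for a, b in zip(p, q))


def confirmable_supported(profile: List[int]) -> bool:
    """True if every confirmable group has the expected topology (score +2)."""
    return all(score == 2 for score in profile[:N_CONFIRMABLE])


supported = [name for name, p in PROFILES.items() if confirmable_supported(p)]
print("Trees with all confirmable groups at +2:", supported)
print("Korbel vs Simpson:", manhattan(PROFILES["NJ/Korbel"], PROFILES["NJ/Simpson"]))
print("Korbel vs Dollo:", manhattan(PROFILES["NJ/Korbel"], PROFILES["Dollo parsimony"]))
```

Under this reading, only the NJ trees built with the Korbel and Simpson distances recover the expected placement of all four confirmable groups, while the remaining five place those groups among the parasites.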
Figure 2. Topology profiles across nine species groups derived from 54 phylogenies (seven SYSTERS-PhyloMatrix gene content trees and 47 whole-genome phylogenies, Table 1).
Notes: Annotations in the table: citation and figure index in the respective publication, tree ID in this study, data model, and approach in the author’s opinion. Heatmap color definitions for up to four topology alternatives of respective taxa are given in Table 2 (light gray: species not regarded in the respective publication). Division 1 separates confirmable topologies from non-confirmable ones. Division 2 occurs several times; only one division is shown here, which mainly separates the parasitic subclade from the rest (see Methods section). Particular topology states (in event matrices) are given in Table 3 (SYSTERS-PhyloMatrix trees) and in the supplemental Table S2.
Figure 3. Subcluster from Figure 2 of well supported topology alternatives.
Notes: Five subgroups (1 to 5, colored bars) result from the clustering according to the dendrogram. Heatmap color definitions for up to four topology alternatives of respective taxa are given in Table 2 (light gray: species not regarded in respective publication). Particular topology states (in event matrices) are given in Table 3 and supplemental data Table S2.
Set of 25 species that are known for gain and loss of gene families. Identification follows the NCBI Taxonomy and the UniProt HAMAP systematics; information about gene loss analyses reported elsewhere is also provided.
| Species | Grouping to the taxonomic ranks of genus, family, class, or phylum | NCBI TaxID | UniProt code | Feature analyzed in literature |
|---|---|---|---|---|
| Bifidobacterium longum | Actinobacteria | 216816 | BIFLO | |
| Corynebacterium efficiens | Actinobacteria | 152794 | COREF | |
| Corynebacterium glutamicum | Actinobacteria | 1718 | CORGL | |
| Mycobacterium leprae | Actinobacteria | 1769 | MYCLE | Gene loss |
| Mycobacterium tuberculosis | Actinobacteria | 1773 | MYCTU | |
| Streptomyces coelicolor | Actinobacteria | 1902 | STRCO | |
| Buchnera aphidicola (Acyrthosiphon pisum) | Buchnera | 118099 | BUCAI | Gene loss |
| Buchnera aphidicola (Schizaphis graminum) | Buchnera | 98794 | BUCAP | |
| Chlamydia muridarum | Chlamydia | 83560 | CHLMU | |
| Chlamydia trachomatis | Chlamydia | 813 | CHLTR | Gene loss |
| Chlamydophila pneumoniae | Chlamydia | 83558 | CHLPN | Gene loss |
| Chlorobium tepidum | Chlorobia | 1097 | CHLTE | |
| Nostoc sp. PCC 7120 | Cyanobacteria | 103690 | ANASP | |
| Synechococcus elongatus | Cyanobacteria | 32046 | SYNEL | |
| Synechocystis sp. PCC 6803 | Cyanobacteria | 1148 | SYNY3 | |
| Leptospira interrogans | Leptospiraceae | 173 | LEPIN | |
| Mycoplasma genitalium | Mollicutes | 2097 | MYCGE | Gene loss |
| Mycoplasma penetrans | Mollicutes | 28227 | MYCPE | |
| Mycoplasma pneumoniae | Mollicutes | 2104 | MYCPN | Gene loss |
| Mycoplasma pulmonis | Mollicutes | 2107 | MYCPU | Gene loss |
| Ureaplasma parvum | Mollicutes | 134821 | UREPA | Gene loss |
| Rickettsia conorii | Rickettsia | 781 | RICCN | Gene loss |
| Rickettsia prowazekii | Rickettsia | 782 | RICPR | Gene loss |
| Borrelia burgdorferi | Spirochaetes | 139 | BORBU | Gene loss |
| Treponema pallidum | Spirochaetes | 160 | TREPA | Gene loss |
Note: Spencer M and Sangaralingam A (ref. 26).
Event matrix for the validated gene content trees from literature for nine species groups.
| Literature | Figure | Tree in | Leptospiraceae | Buchnera | Rickettsia | Mollicutes | Spirochetes | Chlamydiae | Actinobacteria | Cyanobacteria | Chlorobi |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Deeds et al 2005 | Fig. 4 | Tree 1 | 0 | 0 | 0 | 0 | 0 | 0 | −2 | 1 | 0 |
| Lin and Gerstein 2000 | Fig. 2A | Tree 2 | 0 | 0 | 0 | −1 | 0 | 0 | 0 | 1 | 0 |
| Deeds et al 2005 | Fig. 6 | Tree 3 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
| Snel et al 1999 | Fig. 2A | Tree 4 | 0 | 0 | 0 | 1 | 2 | 0 | 0 | 1 | −1 |
| Grishin et al 2000 | Fig. 3 | Tree 5 | 0 | 0 | −1 | 1 | −1 | 0 | −1 | 1 | 0 |
| Ciccarelli et al 2006 | Fig. 2 | Tree 6 | 1 | 1 | 1 | 1 | −1 | −2 | −1 | 1 | −2 |
| Daubin et al 2002 | Fig. 2A | Tree 7 | 0 | 1 | 1 | 1 | −2 | −2 | −2 | −1 | 0 |
| Brown et al 2001 | Fig. 2 | Tree 8 | 0 | 0 | 1 | 1 | −1 | −1 | −1 | 1 | −1 |
| Brown et al 2001 | Fig. 1 | Tree 9 | 0 | 0 | 1 | 1 | −2 | −1 | −1 | 1 | 0 |
| Ma and Zeng 2004 | Fig. 1B | Tree 10 | 0 | 1 | 1 | 1 | 2 | 1 | −2 | 2 | 1 |
| Gevers et al 2004 | Fig. 1 | Tree 11 | 1 | 1 | 1 | 1 | 2 | 1 | −1 | 2 | 1 |
| Yang et al 2005 | Fig. 3 | Tree 12 | 0 | 1 | 1 | 1 | 2 | 1 | −1 | 1 | −1 |
| Muller et al 2010 | Fig. 1 | Tree 13 | 1 | 1 | 1 | 1 | |||||
| Moran et al 2008 | Fig. 1 | Tree 14 | 1 | 1 | 1 | 1 | |||||
| Wu and Eisen 2008 | Fig. 2 | Tree 15 | 1 | 1 | 1 | 1 | −2 | −2 | 1 | −2 | −1 |
| Gophna et al 2005 | Fig. 5 | Tree 16 | 1 | 1 | 1 | 1 | −2 | −2 | 1 | −2 | −1 |
| Dutilh et al 2004 | Fig. 4 | Tree 17 | 0 | 1 | 1 | 1 | −2 | −2 | 2 | −2 | 0 |
| Korbel et al 2002 | Fig. 2 | Tree 18 | 0 | 1 | 1 | 1 | −2 | −2 | 0 | −1 | 0 |
| Ge et al 2005 | Fig. 2 | Tree 19 | 0 | 1 | 1 | 0 | −2 | −2 | 1 | −2 | 0 |
| Clarke et al 2002 | Fig. 5 | Tree 20 | 0 | 1 | 1 | 1 | 1 | −2 | 2 | −2 | 0 |
| Clarke et al 2002 | Fig. 2 | Tree 21 | 0 | 1 | 1 | 1 | 1 | −2 | 2 | −2 | 0 |
| Wolf et al 2001 | Fig. 5 | Tree 22 | 0 | 1 | 1 | 1 | 1 | −2 | 2 | −2 | 0 |
| Sangaralingam et al 2010 | Fig. 2 | Tree 23 | 0 | 1 | 1 | 1 | 1 | −2 | 1 | −2 | 0 |
| Gophna et al 2005 | Fig. 1 | Tree 24 | 1 | 1 | 1 | 1 | 1 | −2 | 2 | −2 | −1 |
| Korbel et al 2002 | Fig. 1 | Tree 25 | 0 | 1 | 1 | 1 | 1 | 1 | 2 | −2 | 0 |
| Dutilh et al 2004 | Fig. 3 | Tree 26 | 0 | 1 | 1 | 1 | 1 | −2 | 2 | 1 | 0 |
| Daubin et al 2002 | Fig. 2B | Tree 27 | 0 | 1 | 1 | 1 | −2 | −2 | 1 | −1 | 0 |
| Henz et al 2005 | Fig. 2 | Tree 28 | 0 | 1 | 1 | 1 | 1 | −1 | −1 | −1 | −2 |
| Tekaia et al 2005 | Fig. S2 | Tree 29 | 0 | 1 | −1 | 1 | −2 | −1 | 2 | −2 | 0 |
| Wolf et al 2001 | Fig. 4 | Tree 30 | 0 | 1 | −1 | 1 | 1 | −2 | 1 | −2 | 0 |
| Lienau et al 2006 | Fig. 6 | Tree 31 | −1 | 1 | 1 | −1 | −2 | −1 | −1 | 2 | 1 |
| Hughes et al 2005 | Fig. 3 | Tree 32 | 0 | 1 | 1 | −1 | −1 | −1 | −1 | 2 | 1 |
| Tekaia et al 1999 | Fig. 2A | Tree 33 | 0 | 0 | −1 | −1 | −1 | −1 | 0 | 1 | 0 |
| Hughes et al 2005 | Fig. 2 | Tree 34 | 0 | −1 | −1 | −1 | −1 | −1 | −2 | 2 | 1 |
| Ma and Zeng 2004 | Fig. 1A | Tree 35 | 0 | −1 | −1 | −1 | −1 | −1 | −2 | 2 | 1 |
| Tekaia et al 2005 | Fig. S3 | Tree 36 | 0 | −1 | −1 | −1 | −1 | −1 | 2 | −1 | 0 |
| Tekaia et al 2005 | Fig. S1 | Tree 37 | 0 | −1 | −1 | −1 | −1 | −1 | 1 | −1 | 0 |
| Sangaralingam et al 2010 | Fig. 1 | Tree 38 | 0 | −1 | −1 | −1 | −1 | −1 | 1 | −2 | 0 |
| Wolf et al 2002 | Fig. 1 | Tree 39 | 0 | −1 | −1 | −1 | −1 | −1 | 1 | −2 | 0 |
| Sangaralingam et al 2010 | Fig. 3 | Tree 40 | 0 | −1 | −1 | −1 | −1 | −1 | 1 | −2 | 0 |
| Spencer et al 2006 | Fig. 4 | Tree 41 | 0 | −1 | −1 | −1 | −1 | −1 | 1 | −2 | 0 |
| Spencer et al 2006 | Fig. 5 | Tree 42 | 0 | −1 | −1 | −1 | −1 | −1 | 1 | −2 | 0 |
| Spencer et al 2009 | Fig. 3 | Tree 43 | 0 | −1 | −1 | −1 | −1 | −1 | 1 | −2 | 0 |
| Gu and Zhang 2004 | Fig. 3 | Tree 44 | 0 | −1 | −1 | −1 | −1 | −1 | 2 | −2 | 0 |
| Tekaia et al 2005 | Fig. 4 | Tree 45 | 0 | −1 | −1 | −1 | −1 | −1 | −2 | −1 | 0 |
| Hong et al 2004 | Fig. 2A | Tree 46 | 0 | 1 | −1 | −1 | −1 | −1 | 1 | −2 | 0 |
| Wolf et al 2001 | Fig. 3 | Tree 47 | 0 | 1 | 0 | −1 | −1 | −1 | 1 | −2 | 0 |
Notes: Abbreviation and definitions for the values used in the matrix and in the clustering can be found in Table 2. Publications are ordered according to the heat map in Figure 2.
Interactive; see also http://eggnog.embl.de/cgi_bin/stats.pl;
Eukaryota and Archaea are ignored;
16S rRNA Tree;
proteobacteria only;
symbionts only;
parasites excluded.
106 completely sequenced species as used for SYSTERS-PhyloMatrix GCT inferences ordered by protein family size.
| Species | Belongs to the parasites | NCBI TaxID | UniProt code | SYSTERS-PhyloMatrix protein family size |
|---|---|---|---|---|
| Guillardia theta | | 55529 | GUITH | 300 |
| Mycoplasma genitalium | Mollicutes | 2097 | MYCGE | 329 |
| Ureaplasma parvum | Mollicutes | 134821 | UREPA | 337 |
| Mycoplasma pneumoniae | Mollicutes | 2104 | MYCPN | 351 |
| Mycoplasma pulmonis | Mollicutes | 2107 | MYCPU | 377 |
| Buchnera aphidicola (Schizaphis graminum) | Buchnera | 98794 | BUCAP | 470 |
| Buchnera aphidicola (Acyrthosiphon pisum) | Buchnera | 118099 | BUCAI | 476 |
| Mycoplasma penetrans | Mollicutes | 28227 | MYCPE | 478 |
| Borrelia burgdorferi | Spirochaetes | 139 | BORBU | 491 |
| Wigglesworthia glossinidia endosymbiont of Glossina brevipalpis | | 36870 | WIGBR | 515 |
| Treponema pallidum | Spirochaetes | 160 | TREPA | 524 |
| Rickettsia prowazekii | Rickettsia | 782 | RICPR | 553 |
| Rickettsia conorii | Rickettsia | 781 | RICCN | 630 |
| Encephalitozoon cuniculi | | 6035 | ENCCU | 644 |
| Chlamydia trachomatis | Chlamydia | 813 | CHLTR | 724 |
| Chlamydia muridarum | Chlamydia | 83560 | CHLMU | 726 |
| Chlamydophila pneumoniae | Chlamydia | 83558 | CHLPN | 740 |
| Thermoplasma volcanium | | 50339 | THEVO | 810 |
| Thermoplasma acidophilum | | 2303 | THEAC | 815 |
| Aeropyrum pernix | | 56636 | AERPE | 868 |
| Aquifex aeolicus | | 63363 | AQUAE | 890 |
| Helicobacter pylori J99 | | 85963 | HELPJ | 891 |
| Helicobacter pylori | | 210 | HELPY | 900 |
| Methanopyrus kandleri | | 2320 | METKA | 901 |
| Bifidobacterium longum | Actinobacteria | 216816 | BIFLO | 942 |
| Pyrobaculum aerophilum | | 13773 | PYRAE | 943 |
| Halobacterium sp. NRC-1 | | 64091 | HALN1 | 988 |
| Methanothermobacter thermautotrophicus str. Delta H | | 187420 | METTH | 1035 |
| Fusobacterium nucleatum subsp. nucleatum | | 76856 | FUSNN | 1036 |
| Methanocaldococcus jannaschii | | 2190 | METJA | 1037 |
| Thermotoga maritima | | 2336 | THEMA | 1039 |
| Chlorobium tepidum | Chlorobia | 1097 | CHLTE | 1043 |
| Mycobacterium leprae | Actinobacteria | 1769 | MYCLE | 1057 |
| Campylobacter jejuni | | 197 | CAMJE | 1076 |
| Sulfolobus tokodaii | | 111955 | SULTO | 1085 |
| Sulfolobus solfataricus | | 2287 | SULSO | 1108 |
| Neisseria meningitidis serogroup B | | 491 | NEIMB | 1164 |
| Archaeoglobus fulgidus | | 2234 | ARCFU | 1172 |
| Neisseria meningitidis serogroup A | | 65699 | NEIMA | 1188 |
| Streptococcus pneumoniae R6 | | 171101 | STRR6 | 1201 |
| Leptospira interrogans | Leptospiraceae | 173 | LEPIN | 1220 |
| Pyrococcus horikoshii | | 53953 | PYRHO | 1221 |
| Streptococcus mutans | | 1309 | STRMU | 1229 |
| Lactococcus lactis subsp. lactis | | 1360 | LACLA | 1235 |
| Pyrococcus abyssi | | 29292 | PYRAB | 1250 |
| Haemophilus influenzae | | 727 | HAEIN | 1251 |
| Streptococcus pneumoniae | | 1313 | STRPN | 1273 |
| Streptococcus pyogenes MGAS8232 | | 186103 | STRP8 | 1282 |
| Streptococcus pyogenes MGAS315 | | 198466 | STRP3 | 1283 |
| Pyrococcus furiosus | | 2261 | PYRFU | 1297 |
| Streptococcus pyogenes | | 1314 | STRPY | 1298 |
| Streptococcus agalactiae serogroup III | | 216495 | STRA3 | 1302 |
| Streptococcus agalactiae serogroup V | | 216466 | STRA5 | 1347 |
| Deinococcus radiodurans | | 1299 | DEIRA | 1360 |
| Thermoanaerobacter tengcongensis | | 119072 | THETN | 1362 |
| Pasteurella multocida | | 747 | PASMU | 1383 |
| Clostridium perfringens | | 1502 | CLOPE | 1404 |
| Methanosarcina mazei | | 2209 | METMA | 1408 |
| Xylella fastidiosa | | 2371 | XYLFA | 1414 |
| Synechococcus elongatus | Cyanobacteria | 32046 | SYNEL | 1474 |
| Corynebacterium efficiens | Actinobacteria | 152794 | COREF | 1490 |
| Methanosarcina acetivorans | | 2214 | METAC | 1513 |
| Corynebacterium glutamicum | Actinobacteria | 1718 | CORGL | 1530 |
| Staphylococcus epidermidis | | 1282 | STAEP | 1551 |
| Listeria monocytogenes | | 1639 | LISMO | 1588 |
| Listeria innocua | | 1642 | LISIN | 1620 |
| Mycobacterium tuberculosis | Actinobacteria | 1773 | MYCTU | 1638 |
| Clostridium acetobutylicum | | 1488 | CLOAB | 1657 |
| Synechocystis sp. PCC 6803 | Cyanobacteria | 1148 | SYNY3 | 1695 |
| Staphylococcus aureus subsp. aureus N315 | | 158879 | STAAN | 1794 |
| Staphylococcus aureus subsp. aureus MW2 | | 196620 | STAAW | 1808 |
| Staphylococcus aureus subsp. aureus Mu50 | | 158878 | STAAM | 1825 |
| Saccharomyces cerevisiae | | 4932 | YEAST | 1848 |
| Caulobacter vibrioides | | 155892 | CAUCR | 1859 |
| Oceanobacillus iheyensis | | 182710 | OCEIH | 1882 |
| Brucella melitensis biovar Suis | | 29461 | BRUSU | 1887 |
| Brucella melitensis | | 29459 | BRUME | 1925 |
| Bacillus halodurans | | 86665 | BACHD | 1980 |
| Bacillus subtilis | | 1423 | BACSU | 2032 |
| Schizosaccharomyces pombe | | 4896 | SCHPO | 2048 |
| Vibrio cholerae | | 666 | VIBCH | 2090 |
| Nostoc sp. PCC 7120 | Cyanobacteria | 103690 | ANASP | 2110 |
| Shewanella oneidensis | | 70863 | SHEON | 2184 |
| Ralstonia solanacearum | | 305 | RALSO | 2208 |
| Streptomyces coelicolor | Actinobacteria | 1902 | STRCO | 2225 |
| Yersinia pestis | | 632 | YERPE | 2229 |
| Vibrio vulnificus | | 672 | VIBVU | 2283 |
| Xanthomonas campestris pv. campestris | | 340 | XANCP | 2307 |
| Xanthomonas axonopodis pv. citri | | 92829 | XANAC | 2365 |
| Agrobacterium tumefaciens str. C58 | | 176299 | AGRT5 | 2561 |
| Pseudomonas aeruginosa | | 287 | PSEAE | 2652 |
| Salmonella typhi | | 601 | SALTI | 2686 |
| Sinorhizobium meliloti | | 382 | RHIME | 2686 |
| Escherichia coli O6 | | 217992 | ECOL6 | 2715 |
| Salmonella typhimurium | | 602 | SALTY | 2771 |
| Mesorhizobium loti | | 381 | RHILO | 2800 |
| Arabidopsis thaliana | | 3702 | ARATH | 2836 |
| Escherichia coli O157:H7 | | 83334 | ECO57 | 2876 |
| Escherichia coli | | 562 | ECOLI | 3046 |
| Caenorhabditis briggsae | | 6238 | CAEBR | 3143 |
| Caenorhabditis elegans | | 6239 | CAEEL | 3353 |
| Drosophila melanogaster | | 7227 | DROME | 4252 |
| Anopheles gambiae | | 7165 | ANOGA | 4514 |
| Takifugu rubripes | | 31033 | FUGRU | 6460 |
| Mus musculus | | 10090 | MOUSE | 6649 |
| Homo sapiens | | 9606 | HUMAN | 6655 |
Notes: Identifiers from the NCBI Taxonomy and the UniProt HAMAP systematics are provided. The protein family size is the number of SYSTERS families that are present in the PhyloMatrix data set.
Explore the taxonomic tree for all 106 species in the PhyloMatrix data set using the ‘taxonomic tree’ link at http://systers.molgen.mpg.de/PhyloMatrix/;
explore the full set of SYSTERS protein families for a species at http://systers.molgen.mpg.de/cgi-bin/selecttaxon.pl; find the respective PhyloMatrix protein family subset by copy and paste using http://systers.molgen.mpg.de/PhyloMatrix/.