Hai Fang, Matt E Oates, Ralph B Pethica, Jenny M Greenwood, Adam J Sardar, Owen J L Rackham, Philip C J Donoghue, Alexandros Stamatakis, David A de Lima Morais, Julian Gough.
Abstract
We report a daily-updated sequenced/species Tree Of Life (sTOL) as a reference for the increasing number of cellular organisms whose genomes have been sequenced. The sTOL builds on a likelihood-based weight calibration algorithm that consolidates NCBI taxonomy information in concert with unbiased sampling of molecular characters from the whole genomes of all sequenced organisms. By quantifying the extent of agreement between taxonomic and molecular data, we observe that many potential improvements could be made to the status quo classification, particularly in the Fungi kingdom; we also see that the current state of many animal genomes is rather poor. To augment the use of sTOL in providing evolutionary context, we integrate an ontology infrastructure and demonstrate its utility for the evolutionary understanding of nuclear receptors, stem cells and eukaryotic genomes. The sTOL (http://supfam.org/SUPERFAMILY/sTOL) provides a binary tree of (sequenced) life and contributes to an analytical platform linking genome evolution, function and phenotype.
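The abstract mentions "molecular characters from whole genomes" without saying what they are. Purely as an illustration, the sketch below assumes the characters are the presence (1) or absence (0) of protein domains in each genome, and shows how per-genome domain sets could be encoded as a binary character matrix in a relaxed-PHYLIP-style layout of the kind maximum-likelihood tree programs read. The function names, genome names and domain names are hypothetical; the sampling and weight calibration steps are not modelled.

```python
from typing import Dict, List, Set, Tuple


def build_character_matrix(
    repertoires: Dict[str, Set[str]],
) -> Tuple[List[str], Dict[str, str]]:
    """Encode each genome as a 0/1 string over the union of all observed domains.

    Columns follow the sorted domain names so the matrix is deterministic.
    """
    characters = sorted(set().union(*repertoires.values()))
    matrix = {
        genome: "".join("1" if c in domains else "0" for c in characters)
        for genome, domains in repertoires.items()
    }
    return characters, matrix


def to_phylip(matrix: Dict[str, str]) -> str:
    """Render the matrix as a header line (taxa, characters), then one taxon per line."""
    n_chars = len(next(iter(matrix.values())))
    lines = [f"{len(matrix)} {n_chars}"]
    lines += [f"{name} {states}" for name, states in sorted(matrix.items())]
    return "\n".join(lines)


# Toy repertoires (hypothetical genome and domain names).
repertoires = {"genomeA": {"d1", "d2"}, "genomeB": {"d2", "d3"}, "genomeC": {"d1"}}
characters, matrix = build_character_matrix(repertoires)
print(to_phylip(matrix))
```

Sorting the columns keeps the encoding reproducible between runs, which matters for a tree that is rebuilt daily.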
Year: 2013 PMID: 23778980 PMCID: PMC6504836 DOI: 10.1038/srep02015
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Schematic flowchart illustrating the reconstruction of sTOL.
Figure 2. The extent of agreement between the NCBI taxonomy and the molecular data.
The circular phylogram displays the NCBI taxonomy; each node is labelled with one of three categories (‘Recovered’ in red, ‘Alternative’ in green, and ‘Others’ in blue) by colour-coding the edge above it. The pie charts illustrate the clade-specific fractions of these three categories for either terminal tips or internal nodes. The clades illustrated in the right panel (from top to bottom) are ‘Cellular organisms’, ‘Eukaryota’, ‘Archaea’ and ‘Bacteria’, and in the bottom panel (from left to right) ‘Metazoa’, ‘Fungi’ and ‘Viridiplantae’.
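The caption does not define how ‘Alternative’ and ‘Others’ are assigned, so the sketch below covers only one assumed reading of ‘Recovered’: a taxonomy node counts as recovered when exactly the same set of genomes also forms a clade in the molecular tree. Trees are toy nested tuples (a leaf is a genome name, an internal node is a tuple of children, so multifurcating taxonomy nodes are allowed); the function names are hypothetical.

```python
from typing import FrozenSet, Set


def clades(tree) -> Set[FrozenSet[str]]:
    """Collect the leaf set of every internal node in a nested-tuple tree."""
    found: Set[FrozenSet[str]] = set()

    def walk(node) -> FrozenSet[str]:
        if isinstance(node, str):  # a leaf is a single genome
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def recovered_fraction(taxonomy, molecular) -> float:
    """Fraction of taxonomy internal nodes whose leaf set is also a molecular clade."""
    tax_clades = clades(taxonomy)
    mol_clades = clades(molecular)
    hits = sum(1 for c in tax_clades if c in mol_clades)
    return hits / len(tax_clades)


# Toy trees: the taxonomy groups (A, B) and (C, D); the molecular tree keeps (A, B) only.
taxonomy = (("A", "B"), ("C", "D"))
molecular = ((("A", "B"), "C"), "D")
print(recovered_fraction(taxonomy, molecular))
```

Restricting the same count to the nodes inside one clade would give a per-clade fraction of the kind the pie charts summarise.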
Figure 3. Detailed inspection of disagreements with the NCBI taxonomy.
The colour-coded tree is the NCBI taxonomy, with alternative topologies suggested by the molecular data inserted close by in black. (A) Metazoa clade, exemplified by an alternative topology (I, in green) suggested by the molecular data, which is likely due to biased genome assembly. (B) Viridiplantae clade containing four alternative topologies (I ~ IV, in green) suggested by the molecular data. (C) Fungi clade, in which three internal nodes (I ~ III, in green) are strongly supported by the molecular data as differing from the NCBI taxonomy.
Figure 4. Presence-absence pattern of the nuclear receptor ligand-binding domain across the eukaryotic species tree of life.
The left panel shows an overview of the eukaryotic tree, with a branch (edge) highlighted in green if the domain is found in all genomes of the clade below that branch. The right panel zooms in on the kingdom Viridiplantae (plants), which contains two clades: embryophytes (land plants) and Chlorophyta (green algae).
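The highlighting rule in the caption (a branch is green only if the domain occurs in every genome under it) is a simple recursive "all descendants" check. The sketch below uses a hypothetical toy tree with placeholder clade and genome names, not the sTOL data; the function name and data layout are assumptions.

```python
from typing import Dict, List, Set, Tuple, Union

# A leaf is a genome name; an internal node is (clade_name, [children]).
Node = Union[str, Tuple[str, List["Node"]]]


def highlighted_branches(root: Node, has_domain: Dict[str, bool]) -> Set[str]:
    """Return the labels of branches whose every descendant genome carries the domain."""
    highlighted: Set[str] = set()

    def walk(node: Node) -> bool:
        if isinstance(node, str):
            label, present = node, has_domain.get(node, False)
        else:
            label, children = node
            # Visit every child (no short-circuit) so nested clades are still recorded.
            results = [walk(child) for child in children]
            present = all(results)
        if present:
            highlighted.add(label)
        return present

    walk(root)
    return highlighted


tree = ("Root", [("CladeA", [("CladeA1", ["g1", "g2"]), ("CladeA2", ["g3"])]),
                 ("CladeB", ["g4", "g5"])])
has_domain = {"g1": True, "g2": True, "g3": False, "g4": True, "g5": True}
print(sorted(highlighted_branches(tree, has_domain)))
```

Genomes missing from `has_domain` are treated as lacking the domain, which is the conservative choice for an "all genomes" rule.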
Figure 5. A list of domains annotated with stem cell maintenance and their distribution over the three kingdoms in eukaryotic evolution.
The diagram in the top panel shows the paths covering the three kingdoms. The bottom panel details their presence (1) and absence (0) patterns at the major branching points of eukaryotic evolution. The last row gives the number of distinct domains (i.e., superfamilies) related to stem cell maintenance.
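A presence/absence table like the one in Figure 5 can be summarised by column sums (the count of distinct domains present at each branching point). The gained/lost helpers below assume a domain is "gained" or "lost" when its state differs between a parent node and a child node; that definition, the function names and the toy node and domain names are all assumptions for illustration.

```python
from typing import Dict, List

# Toy presence (1) / absence (0) table: domain -> {branching point -> state}.
# Domain and node names are hypothetical placeholders, not the Figure 5 data.
Pattern = Dict[str, Dict[str, int]]


def totals(pattern: Pattern, nodes: List[str]) -> Dict[str, int]:
    """Number of distinct domains marked present at each branching point."""
    return {node: sum(row.get(node, 0) for row in pattern.values()) for node in nodes}


def gained(pattern: Pattern, parent: str, child: str) -> List[str]:
    """Domains absent at the parent node but present at the child node."""
    return sorted(d for d, row in pattern.items()
                  if row.get(parent, 0) == 0 and row.get(child, 0) == 1)


def lost(pattern: Pattern, parent: str, child: str) -> List[str]:
    """Domains present at the parent node but absent at the child node."""
    return sorted(d for d, row in pattern.items()
                  if row.get(parent, 0) == 1 and row.get(child, 0) == 0)


pattern = {
    "domain1": {"ancestor": 1, "lineageA": 1, "lineageB": 0},
    "domain2": {"ancestor": 0, "lineageA": 1, "lineageB": 0},
    "domain3": {"ancestor": 1, "lineageA": 1, "lineageB": 1},
}
print(totals(pattern, ["ancestor", "lineageA", "lineageB"]))
# {'ancestor': 2, 'lineageA': 3, 'lineageB': 1}
```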
Top enriched GO terms for the domain repertoires present at Eukaryota, gained in Metazoa and lost in Fungi
| Domain repertoire | GO namespace | SDFO level | GO term | FDR |
|---|---|---|---|---|
| Present at Eukaryota | Biological Process | 1 | cellular metabolic process | 4.8E-30 |
| | | | biosynthetic process | 9.3E-28 |
| | | | primary metabolic process | 9.0E-21 |
| | | 2 | organic acid metabolic process | 2.4E-18 |
| | | | nucleobase-containing small molecule metabolic process | 7.7E-18 |
| | | | organic substance catabolic process | 1.5E-17 |
| | | 3 | coenzyme metabolic process | 7.6E-15 |
| | | | carboxylic acid biosynthetic process | 5.3E-11 |
| | | | ncRNA metabolic process | 1.9E-10 |
| | | 4 | ncRNA processing | 1.6E-07 |
| | | | coenzyme biosynthetic process | 3.7E-07 |
| | | | pyridine-containing compound metabolic process | 3.4E-06 |
| | Molecular Function | 1 | catalytic activity | 1.3E-35 |
| | | 2 | oxidoreductase activity | 5.1E-12 |
| | | | small molecule binding | 3.0E-08 |
| | | | ligase activity | 3.0E-07 |
| | | 3 | lyase activity | 3.4E-12 |
| | | | nucleotidyltransferase activity | 3.1E-06 |
| | | | cofactor binding | 3.6E-06 |
| | | 4 | structural constituent of ribosome | 3.4E-07 |
| | | | carbon-carbon lyase activity | 4.0E-05 |
| | | | electron carrier activity | 2.4E-04 |
| | Cellular Component | 1 | cytoplasmic part | 1.1E-16 |
| | | | intracellular membrane-bounded organelle | 8.4E-09 |
| | | | macromolecular complex | 1.1E-08 |
| | | 2 | mitochondrion | 1.9E-16 |
| | | | organelle membrane | 1.1E-07 |
| | | | endomembrane system | 6.3E-03 |
| | | 3 | mitochondrial part | 1.1E-13 |
| | | | plastid | 1.4E-12 |
| | | | ribonucleoprotein complex | 1.5E-06 |
| | | 4 | chloroplast part | 5.7E-06 |
| | | | ribosomal subunit | 5.8E-06 |
| | | | cytosolic ribosome | 7.3E-06 |
| Gained in Metazoa | Biological Process | 1 | cellular developmental process | 1.3E-04 |
| | | | regulation of metabolic process | 1.3E-04 |
| | | | negative regulation of biological process | 1.9E-04 |
| | | 2 | muscle tissue development | 1.3E-04 |
| | | | negative regulation of developmental process | 1.3E-04 |
| | | | positive regulation of signaling | 1.3E-04 |
| | | 3 | positive regulation of cell proliferation | 4.5E-04 |
| | | | positive regulation of intracellular protein kinase cascade | 5.8E-04 |
| | | | regulation of mitotic cell cycle | 6.4E-04 |
| | | 4 | regulation of binding | 4.1E-04 |
| | | | digestive system development | 8.7E-04 |
| | | | regulation of MAP kinase activity | 2.1E-03 |
| | Molecular Function | 1 | protein binding | 1.7E-03 |
| | | 2 | nucleic acid binding | 1.3E-02 |
| | | | receptor binding | 2.2E-02 |
| | | | enzyme regulator activity | 7.5E-02 |
| | | 3 | carbohydrate derivative binding | 1.1E-02 |
| | | | glycoprotein binding | 1.9E-02 |
| | | | protein dimerization activity | 3.3E-02 |
| | | 4 | growth factor binding | 1.7E-03 |
| | | | extracellular matrix binding | 1.7E-03 |
| | | | collagen binding | 1.7E-03 |
| | Cellular Component | 1 | extracellular matrix | 3.2E-02 |
| | | | macromolecular complex | 5.1E-02 |
| | | | extracellular region | 1.7E-01 |
| | | 2 | cell surface | 2.3E-02 |
| | | | extracellular matrix part | 2.7E-02 |
| | | | vesicle | 5.1E-02 |
| | | 3 | axon | 5.1E-02 |
| | | | synapse | 5.1E-02 |
| | | | perinuclear region of cytoplasm | 9.6E-02 |
| | | 4 | secretory granule | 2.3E-02 |
| | | | receptor complex | 4.2E-02 |
| | | | basolateral plasma membrane | 4.4E-02 |
| Lost in Fungi | Molecular Function | 2 | receptor activity | 7.0E-02 |
| | | | signal transducer activity | 7.0E-02 |
| | | 3 | transmembrane signaling receptor activity | 7.0E-02 |
| | | | enzyme inhibitor activity | 7.0E-02 |
| | | 4 | integrin binding | 7.0E-02 |
| | | | growth factor binding | 8.8E-02 |
| | Cellular Component | 1 | extracellular region | 2.9E-02 |
| | | | extracellular matrix | 2.9E-02 |
| | | 2 | extracellular matrix part | 1.8E-03 |
| | | | cell surface | 1.8E-03 |
| | | | intrinsic to membrane | 8.3E-02 |
| | | 4 | receptor complex | 8.6E-02 |
SDFO stands for structural domain function ontology. For each GO namespace, it defines four levels of increasing granularity: 1, highly general; 2, general; 3, specific; and 4, highly specific.
For each namespace and each SDFO level, the top three GO terms with the lowest FDR (< 0.1) are shown.
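The table does not state how the FDR values were computed. As a hedged illustration only, the sketch assumes per-term enrichment p-values are already available, converts them to FDR with the Benjamini-Hochberg procedure (a common choice, assumed here rather than confirmed by the source), and then applies the selection rule from the note above: keep terms with FDR < 0.1 and report the three lowest per namespace and SDFO level. The term names in the usage example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values (FDR) in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# (namespace, GO term, SDFO level, FDR)
Term = Tuple[str, str, int, float]


def top_terms(terms: List[Term], k: int = 3, cutoff: float = 0.1
              ) -> Dict[Tuple[str, int], List[Tuple[str, float]]]:
    """Group terms by (namespace, level) and keep the k lowest-FDR terms below the cutoff."""
    groups: Dict[Tuple[str, int], List[Tuple[str, float]]] = defaultdict(list)
    for namespace, name, level, fdr in terms:
        if fdr < cutoff:
            groups[(namespace, level)].append((name, fdr))
    return {key: sorted(vals, key=lambda t: t[1])[:k] for key, vals in groups.items()}


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

Ties in FDR (several rows above share identical values) are kept in input order by Python's stable sort, so which tied term is shown depends on the input ordering.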