David López-Escardó, Xavier Grau-Bové, Amy Guillaumet-Adkins, Marta Gut, Michael E Sieracki, Iñaki Ruiz-Trillo.
Abstract
Understanding the origins of animal multicellularity is a fundamental biological question. Recent genome data have unravelled the role that co-option of pre-existing genes played in the origin of animals. However, there were also some important genetic novelties at the onset of Metazoa. To have a clear understanding of the specific genetic innovations and how they appeared, we need the broadest taxon sampling possible, especially among early-branching animals and their unicellular relatives. Here, we take advantage of single-cell genomics to expand our understanding of the genomic diversity of choanoflagellates, the sister-group to animals. With these genomes, we have performed an updated and taxon-rich reconstruction of protein evolution from the Last Eukaryotic Common Ancestor (LECA) to animals. Our new data redefine the origin of some genes previously thought to be metazoan-specific, like the POU transcription factor, which we show appeared earlier in evolution. Moreover, our data indicate that the acquisition of new genes at the stem of Metazoa was mainly driven by duplications and protein domain rearrangement processes. Furthermore, our analysis allowed us to reveal protein domains that are essential to the maintenance of animal multicellularity. Our analyses also demonstrate the utility of single-cell genomics from uncultured taxa to address evolutionary questions. This article is part of a discussion meeting issue 'Single cell ecology'.
Keywords: animal multicellularity; choanoflagellates; protein domain evolution; single-cell genomics
Year: 2019 PMID: 31587642 PMCID: PMC6792448 DOI: 10.1098/rstb.2019.0088
Source DB: PubMed Journal: Philos Trans R Soc Lond B Biol Sci ISSN: 0962-8436 Impact factor: 6.237
Figure 1. Phylogenetic position of the new choanoflagellate SAGs. Phylogenetic tree based on 117 sequences of the 18S rDNA gene, representing all that is known of the molecular diversity of choanoflagellates and unicellular holozoans, including environmental lineages. The phylogeny was inferred by maximum likelihood under the GTR+Γ model with IQ-TREE. Clades marked by a bullet (•) present high statistical split support, with values greater than 80% of SH-aLRT (bootstraps of single branch test) and greater than 95% of ultrafast bootstrap. Both indexes were computed with IQ-TREE. The remaining split supports obtained can be found at Figshare (https://doi.org/10.6084/m9.figshare.7819571.v1) in the tree file. The order and class names given are based on [30,41,42]. Choanoflagellates with transcriptomic data available are depicted with a red asterisk, and those with genomic data available are depicted with a blue hash. The craspedidan clades of choanoflagellates were named according to our phylogenomic analysis (figure 3). Clade 3 nomenclature and nomenclature within Acanthoecida are the same as in [27]. The Acanthoecida picture was taken from [28] and Craspedida pictures were taken in Nicole King's laboratory.
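The bullet criterion in the Figure 1 caption (SH-aLRT greater than 80% and ultrafast bootstrap greater than 95%) can be illustrated with a small sketch. It assumes node labels of the form `SH-aLRT/UFBoot` (for example `98.5/100`), which is how IQ-TREE writes them when both supports are requested; the function name and the toy labels are hypothetical and are not taken from the tree file on Figshare.

```python
def is_well_supported(label, alrt_min=80.0, ufboot_min=95.0):
    """Return True if an 'SH-aLRT/UFBoot' node label exceeds both thresholds.

    The comparison is strict ("greater than"), following the caption.
    Labels that cannot be parsed as two numbers count as unsupported.
    """
    try:
        alrt_text, ufboot_text = label.split("/")
        alrt, ufboot = float(alrt_text), float(ufboot_text)
    except ValueError:
        return False
    return alrt > alrt_min and ufboot > ufboot_min


# Toy labels (hypothetical values, for illustration only).
labels = ["98.5/100", "85/90", "79.9/99", ""]
flags = [is_well_supported(label) for label in labels]
print(dict(zip(labels, flags)))
```

Only the first toy label passes both thresholds, so it is the only one that would receive a bullet under this reading of the caption.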
Figure 3. Summary of protein gains and losses in Opisthokonta, focusing on Choanozoa gains. (a) Schematic of the choanoflagellate phylogeny obtained, including the numbers of protein domain gains and losses in each Opisthokonta clade (depicted in green and red, respectively). Light green numbers represent the protein domain gains in a given ancestor that are retained in over 70% of extant metazoan species. Protein domains from potential bacterial or archaeal contamination were excluded from the analysis (see §2). The ability to form colonies (marked with a colony drawing) is shown on the right, and has been adapted from [27]. Our SAGs (UC1 and UC4) are marked in italic. Next to the tree, a bar chart indicates the percentage of protein domains gained at Choanozoa and judged to be involved in animal multicellular processes (69 domains out of 120) that are retained in each choanoflagellate taxon. Animal data are displayed by phylum instead of species; thus, what is shown is the average and the distribution of domains kept by all the analysed species of each animal phylum. As a control, the retention among extant species of all protein domains present in the choanozoan ancestor is shown in grey. Further to the right, the distribution of the POU protein domain and of the protein domains gained at Choanozoa that are present in our sequenced SAGs UC1 and UC4 is shown. A black dot indicates the presence of each domain in the different taxa/clades. (b) Function of the protein domains gained at Choanozoa. In green, the biochemical roles in which the protein domains are involved. In blue, the biological processes in which the domains have been shown to participate. These two classifications are not exclusive; one protein domain can appear in one or multiple categories. In grey, protein domains with unknown function, or that are contaminants or the product of a horizontal gene transfer (HGT) event.
Summary of the genome statistics of each SAG assembly.
| SAG | taxonomy | scaffolds (a) | largest scaffold (bp) | N50 (bp) | total length (Mb) | GC (%) | CEGMA (%) | BUSCO (%) |
|---|---|---|---|---|---|---|---|---|
| UC1 | Craspedida clade 1 | 3276 | 41 637 | 4928 | 7.74 | 49.8 | 20.1 | 31.7 |
| UC2 | Acanthoecidae | 746 | 32 186 | 1499 | 1.00 | 30.8 | 0.8 | 0.7 |
| UC3 | Stephanocidae | 819 | 11 187 | 2197 | 1.31 | 33.5 | — | 0.3 |
| UC4 | Basal Acanthoecida | 2527 | 72 672 | 11 360 | 7.25 | 40.0 | 14.1 | 13.5 |
(a) Scaffolds larger than 500 bp.
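For readers unfamiliar with the N50 column, the following is a minimal sketch of the standard N50 definition: the length of the scaffold at which the cumulative length, counted from the largest scaffold down, first reaches half of the total assembly length. The 500 bp cut-off mirrors the table footnote; the function name and the toy lengths are hypothetical, and this is not the assembly-statistics tool used by the authors.

```python
def n50(lengths, min_length=500):
    """Compute N50 over scaffolds strictly longer than min_length (in bp)."""
    kept = sorted((n for n in lengths if n > min_length), reverse=True)
    if not kept:
        raise ValueError("no scaffolds above the length cut-off")
    half_total = sum(kept) / 2
    running = 0
    for length in kept:
        running += length
        # The first scaffold that pushes the running sum to half the total is the N50.
        if running >= half_total:
            return length


# Toy scaffold lengths in bp (hypothetical); the 400 bp scaffold is filtered out.
toy_lengths = [4000, 3000, 2000, 1000, 400]
toy_n50 = n50(toy_lengths)
print(toy_n50)
```

In the toy case the retained scaffolds sum to 10 000 bp, and the cumulative length first reaches 5000 bp at the 3000 bp scaffold.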
Genome size estimation of our SAGs (estimated values are marked †) within the choanoflagellate context.
| genome | assembly size (Mb) | genome size (Mb) | no. of annotated genes | total no. of genes |
|---|---|---|---|---|
| UC1 | 7.74 | 29.4† | 3025 | 6039† |
| UC4 | 7.25 | 52.5† | 2518 | 10 075† |
| | — | 55.4 | — | 11 624 |
| | — | 41.6 | — | 9172 |
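As a quick reading aid, the table values imply what fraction of each estimated genome and gene set was actually recovered in the SAG assemblies. The sketch below only divides the observed values by the estimated ones reported in the table; it does not reproduce the estimation procedure itself, which is not described in this record, and the dictionary keys are hypothetical names.

```python
# Values copied from the table above (sizes in Mb; estimates are the daggered values).
sags = {
    "UC1": {"assembly_mb": 7.74, "genome_mb": 29.4, "annotated": 3025, "total_genes": 6039},
    "UC4": {"assembly_mb": 7.25, "genome_mb": 52.5, "annotated": 2518, "total_genes": 10075},
}


def recovered_fraction(observed, estimated):
    """Fraction of the estimated total that is present in the assembly."""
    return observed / estimated


summary = {
    name: {
        "genome": round(recovered_fraction(row["assembly_mb"], row["genome_mb"]), 3),
        "genes": round(recovered_fraction(row["annotated"], row["total_genes"]), 3),
    }
    for name, row in sags.items()
}
print(summary)
```

On these numbers, roughly a quarter of the estimated UC1 genome and about 14% of the estimated UC4 genome are covered by the assemblies, while the annotated genes amount to about half and a quarter of the estimated gene totals, respectively.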
Figure 2. Phylogenomic tree of holozoans. Phylogenomic analysis of 87 single-copy protein domains [14] accounting for 23 364 amino acid positions. Tree topology is the consensus of two Markov chain Monte Carlo chains run for 5660 and 5685 generations, after a burn-in of 13%. Statistical supports are indicated at each node: on the left, non-parametric ML ultrafast-bootstrap (UFBS) values obtained from 1000 replicates using IQ-TREE and the LG+R7+C60 model; on the right, Bayesian posterior probabilities (BPP) under the LG+Γ7+CAT model as implemented in Phylobayes. Nodes with maximum support values (BPP = 1 and UFBS = 100) are indicated with a black bullet. Raw trees are available on Figshare (https://doi.org/10.6084/m9.figshare.7819571.v1) and electronic supplementary material, figure S1 shows the topology and the supports of the ML inference.
Figure 4. Distribution of the probability of retention in the extant species of the protein domains acquired in the different ancestors: Opisthokonta (a), Holozoa (b), Filozoa (c) and Choanozoa (d).
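One simple way to read "probability of retention" is as the fraction of extant species that still carry a domain gained in a given ancestor. The sketch below assumes that reading, which this record does not confirm as the authors' exact calculation; the species and domain names are placeholders, not data from the study.

```python
def retention_probability(domain, species_domains):
    """Fraction of species whose domain set contains the given domain."""
    if not species_domains:
        raise ValueError("at least one species is required")
    present = sum(1 for domains in species_domains.values() if domain in domains)
    return present / len(species_domains)


def retained_at_least(gained, species_domains, threshold):
    """Domains gained in an ancestor that are kept by at least `threshold` of the species."""
    return sorted(
        d for d in gained if retention_probability(d, species_domains) >= threshold
    )


# Placeholder presence/absence data (hypothetical species and domains).
species_domains = {
    "species_1": {"domA", "domB"},
    "species_2": {"domA"},
    "species_3": {"domA", "domB", "domC"},
}
gained = {"domA", "domB", "domC"}

in_all_species = retained_at_least(gained, species_domains, threshold=1.0)
in_most_species = retained_at_least(gained, species_domains, threshold=0.5)
print(in_all_species, in_most_species)
```

Setting the threshold to 1.0 corresponds to keeping only the domains maintained by every species in the comparison, while lower thresholds give progressively more permissive sets.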
Summary of the protein domains acquired before and at the origin of animals that are maintained by all 21 extant metazoan species used in this analysis.
| origin | protein domains retained | protein domain information |
|---|---|---|
| Opisthokonta | transcription factors and DNA binding domains | |
| | | transcription factor involved in animal development |
| | | nuclear effector of Notch signalling |
| | | related to BTD, nuclear effector of Notch signalling |
| | | involved in chromosome migration |
| | | interacts with STAT6 transcription factor |
| | signalling and GTPase interactors | |
| | | involved in the vesicle budding at Golgi apparatus |
| | | signalling integrators with GTPase activity |
| | translational regulator | |
| | | involved in ribosomal binding |
| | unknown function | |
| | | |
| | | |
| | | |
| | | |
| Holozoa | signalling binding-related domains | |
| | | interacts with guanylate kinase-like domain |
| | | phosphotyrosine interacting domain |
| | | known to be present in syntaxin-binding proteins |
| | nuclear membrane protein | |
| | | found in inner nuclear membranes |
| | transcription factor | |
| | | transcription factor related to animal development |
| | unknown function | |
| | | |
| Filozoa | signalling and adhesion | |
| | | extracellular domain of integrins |
| | ubiquitination | |
| | | D domain of beta-TrCP that acts as ubiquitin ligase |
| Choanozoa | transcription factors | |
| | | DNA binding domain of Smad TF |
| | | domain that interacts with Smad TF regulators |
| | | domain related with Homeobox superfamily |
| | extracellular matrix protein domains | |
| | | N terminal domain of laminins, extracellular proteins related to cell adhesion |
| | | domain from prolyl 4-hydroxylase that is important in the post-translational modification of collagen |
| | | C terminal domain of Thrombospondin, an adhesive glycoprotein that mediates cell-to-cell and cell-to-ECM interactions |
| | protein–protein interactions | |
| | | suggested to be involved in protein–protein interactions |
| | lysosomal protein | |
| | | integral membrane proteins of the lysosome with unclear functions |
| Metazoa | signalling | |
| | | Wnt signal transduction pathways |
| | | transforming growth factor beta, regulatory peptides that generate intracellular signals |
| | | C terminal of serine/threonine phosphatase, the domain may provide specificity to the reaction |
| | transcription factors | |
| | | transcription factor involved in multiple processes, cell differentiation, migration, etc. |
| | | ligand-binding domain of nuclear receptors that sense steroid and thyroid hormones |
| | extracellular matrix protease | |
| | | membrane-anchored protease that modifies the ECM |
| | protein–protein interaction | |
| | | interaction protein module. Related with death effector domain and caspase recruitment domain |
Figure 5. POU phylogenetic tree. (a) ML inference of the Homeobox domain using LIM homeobox as an outgroup. LIM domains were downloaded from http://homeodb.zoo.ox.ac.uk/download.get. POU domains were selected from a wide range of metazoans available in our dataset, plus the choanoflagellate sequence (marked in red). Supports are SH-like approximate likelihood ratio test (left) and UFBS (right), calculated with IQ-TREE v. 1.5.1. (b) Maximum-likelihood inference of POU transcription factors using the whole protein, the POU domain and the Homeobox domain. The M. fluctuans POU sequence falls within the POU-2 group (marked in red).