Jason S. Presnell, Christine E. Schnitzler, William E. Browne.
Abstract
The Krüppel-like factor and specificity protein (KLF/SP) genes play key roles in critical biological processes, including stem cell maintenance, cell proliferation, embryonic development, tissue differentiation, and metabolism, and their dysregulation has been implicated in a number of human diseases and cancers. Although many KLF/SP genes have been characterized in a handful of bilaterian lineages, little is known about the KLF/SP gene family in nonbilaterians, and virtually nothing is known outside the metazoans. Here, we analyze and discuss the origins and evolutionary history of the KLF/SP transcription factor family and its associated transactivation/repression domains. We have identified and characterized the complete KLF/SP gene complement from the genomes of 48 species spanning the Eukarya. We have also examined the phylogenetic distribution of transactivation/repression domains associated with this gene family. We report that the origin of the KLF/SP gene family predates the divergence of the Metazoa. Furthermore, the expansion of the KLF/SP gene family is paralleled by a diversification of transactivation domains, through both the acquisition of pre-existing ancient domains and the appearance of novel domains exclusive to this gene family, and is strongly associated with the expansion of cell type complexity.
Keywords: C2H2 zinc fingers; domain architecture; domain co-occurrence network; domain evolution; domain shuffling; low-complexity regions
Year: 2015 PMID: 26232396 PMCID: PMC4558859 DOI: 10.1093/gbe/evv141
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Distribution of C2H2 zinc finger proteins, KLF-DBD-containing proteins, and KLF/SP proteins in representative Eukarya taxa. Rows indicate the representative genomes searched. Columns indicate the total number of protein sequences that contain at least one C2H2 zinc finger identified with the Pfam PF00096 HMM, the total number of protein sequences that contain the archetypal KLF-DBD, the total number of bona fide KLF sequences recovered, and the total number of SP sequences recovered. Phylogeny is based on Adl et al. (2012), Derelle and Lang (2012), Dunn et al. (2008), Ryan et al. (2013), and Sebé-Pedrós et al. (2013).
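To make the first column concrete, the following is a minimal Python sketch of counting proteins with at least one PF00096 hit. It assumes, as an illustration only, that each proteome was searched with HMMER3 `hmmsearch --tblout` against the Pfam zf-C2H2 model; the function name `proteins_with_domain` and the optional E-value cut-off are hypothetical and are not taken from the authors' pipeline.

```python
from typing import Iterable, Optional, Set


def proteins_with_domain(tblout_lines: Iterable[str],
                         pfam_acc: str = "PF00096",
                         max_evalue: Optional[float] = None) -> Set[str]:
    """Return IDs of target sequences with at least one hit to `pfam_acc`.

    Assumes HMMER3 per-sequence tabular output (hmmsearch --tblout):
    whitespace-delimited fields, '#' comment lines, field 1 = target name,
    field 4 = query accession (e.g. 'PF00096.29'), field 5 = full-sequence E-value.
    """
    hits: Set[str] = set()
    for line in tblout_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and HMMER header/footer comments
        fields = line.split()
        target, query_acc, evalue = fields[0], fields[3], float(fields[4])
        # Pfam accessions carry a version suffix; compare only the stem.
        if query_acc.split(".")[0] != pfam_acc:
            continue
        if max_evalue is not None and evalue > max_evalue:
            continue  # hypothetical significance cut-off, not from the paper
        # A set counts each protein once, however many zinc fingers it carries.
        hits.add(target)
    return hits
```

The per-genome count in the table would then be `len(proteins_with_domain(open(path)))` for each genome's output file (`path` is a placeholder). Using a set rather than a running total is what makes this a count of proteins rather than of zinc finger domains.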
Species Used in This Study with Reference to Genome or Transcriptome Database
| Species | Genome/Transcriptome | Reference |
|---|---|---|
| Amorphea | | |
| Metazoa | | |
| Holomycota | | |
| Opisthokonta | | |
| Amorphea | | |
| Bikonta | | |
Combined gene tree estimates for the concatenated KLF/SP data set using Bayesian (MrBayes) and maximum-likelihood (ML; RAxML) criteria. Gray node labels indicate congruent topology with Bayesian posterior probability (BPP) support = 84%. Black node labels indicate congruent topology with BPP support ≥90%. Clades collapsed to triangles indicate congruent topologies with BPP support ≥90%. The single highly divergent ctenophore MleKLFX sequence clusters with nonfilozoan KLF-DBD sequences, presumably owing to long-branch attraction. Bayesian and ML trees with support values and branch lengths are available in supplementary figs. S2 and S3, Supplementary Material online.
Phylogenetic distribution of transactivation/repression domains and LCRs associated with KLF/SP proteins. A + indicates the presence of the corresponding domain or LCR in at least one KLF/SP protein in the indicated taxon. Only filozoan lineages containing bona fide KLF/SP proteins are shown. An asterisk indicates that RNA-seq data were used for that species. Phylogeny is based on Dunn et al. (2008), Ryan et al. (2013), and Sebé-Pedrós et al. (2013).
Inferred relationships between key events during the evolution and expansion of the KLF/SP gene family. The symbol key is at upper left. Colored rectangles represent the origin of particular transactivation/repression domains or LCRs co-occurring with the KLF-DBD (fig. 4). Yellow hexagons represent the origin of specific KLF/SP domain architectures (fig. 5). A black X over a hexagon represents the loss of a specific domain architecture. Colored triangles represent the presence of specific transactivation domain motifs within whole eukaryote genomes, excluding the KLF/SP gene family. (A) We infer the origin of the KLF-DBD in the opisthokont stem lineage prior to the divergence of the Holomycota. However, bona fide KLF gene architectures do not appear until the divergence of the filozoan lineage (KLF origin). The ancient unicellular KLF domain architecture is not recovered in metazoan lineages. Outside the KLF/SP family, the ancient PVDLS, SID, Btd box, and R3 domains were recovered in all eukaryote genomes searched, with one notable exception: the Btd box was not recovered in the Saccharomyces and Encephalitozoon fungal genomes. Our analysis suggests that the SP subfamily originated in the metazoan stem lineage prior to the divergence of the poriferans; it is not present in the ctenophores. In poriferans the SP box motif appears only in SP genes, and it is not found in other genes until the divergence of Trichoplax. The R2 repressor domain appears to be a de novo innovation restricted to KLF genes in the vertebrate stem lineage, contributing to the KLF10/11 architecture class. Composite domain co-occurrence maps for each taxonomic group are shown to the right of the tree. Representative examples of putative domain shuffling events during the evolution and expansion of the KLF/SP gene family are shown below. (B) An ancient Btd box and a metazoan SP gene may have contributed to the origin of the SP gene subfamily early in metazoan evolution.
(C) An ancient SID likely combined with a pre-existing ancestral KLF gene to form the KLF9/13 group, also early in metazoan evolution. (D) An ancient PVDLS domain combined with a pre-existing ancestral KLF gene to form the KLF3/8/12 group. We infer an independent, convergent acquisition of the PVDLS domain within a KLF gene in the Protostomia lineage (see Discussion). Domain icon colors are the same as in figure 5.
KLF/SP protein domain co-occurrence networks. In all networks, each circle represents a transactivation/repression domain or an LCR. A line connecting two domains indicates a co-occurrence of those two domains. Domains are arranged in approximately the same 5′–3′ spatial orientation as they appear encoded in KLF/SP sequences. (A) General network diagram showing connectivity and unidirectional spatial relationships between transactivation domains among filozoan KLF/SPs. Blue arrows represent connectivity upstream of the KLF-DBD; the gold arrow represents connectivity downstream of the KLF-DBD. (B–I) KLF/SP co-occurrence networks from different taxonomic groups. Circle size indicates the relative frequency of occurrence in the network, with the KLF-DBD always representing 100%. Circle color follows the same convention as in figure 3. Repeated domains were counted only once. Lines connecting circles indicate the presence of that specific domain pair co-occurrence in at least one KLF/SP. Line width indicates the frequency of domain pair co-occurrence. Only LCRs found N-terminal to the KLF-DBD are represented in these networks (supplementary fig. S4, Supplementary Material online). (B) Complete filozoan KLF/SP network. (C) Representative unicellular KLF/SP network. (D) KLF/SP network from nonbilaterian metazoans. (E) Invertebrate bilaterian KLF/SP network. (F) Vertebrate KLF/SP network. (G, H) Representative ctenophoran and poriferan KLF/SP networks for comparison with each other and with the network in D. (I) Ciona KLF/SP network for comparison with the networks in E and F. (J, K) Co-occurrence network maps for the KLF and SP subfamilies mapped onto the filozoan phylogeny (Dunn et al. 2008; Ryan et al. 2013) for evolutionary comparison. Each network represents a composite for the taxonomic group indicated. (J) Co-occurrence maps for domains found in the KLF subfamily.
(K) Co-occurrence maps for domains found in the SP subfamily. The unicellular filozoan genomes and ctenophore genomes do not contain SP genes.
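The counting rules in this caption (node size as the fraction of proteins containing a domain, repeats counted once, edge width as the number of proteins containing a given domain pair, and only N-terminal LCRs retained) can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: each protein is assumed to be given as a list of domain labels ordered N- to C-terminal, the label `"KLF-DBD"` and the set of LCR labels are hypothetical, and the directional arrows of panel A are not modeled.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple

DBD = "KLF-DBD"  # hypothetical label for the DNA-binding domain


def cooccurrence_network(proteins: Dict[str, List[str]],
                         lcr_labels: Set[str]) -> Tuple[Dict[str, float], Counter]:
    """Node frequencies and undirected edge weights for a co-occurrence network.

    `proteins` maps a protein ID to its domain labels ordered N- to C-terminal;
    every protein is assumed to contain the KLF-DBD.
    """
    node_counts: Counter = Counter()
    edge_counts: Counter = Counter()
    for domains in proteins.values():
        dbd_pos = domains.index(DBD)
        # Keep an LCR only when it lies N-terminal to the KLF-DBD.
        kept = [d for i, d in enumerate(domains)
                if d not in lcr_labels or i < dbd_pos]
        present = sorted(set(kept))  # a repeated domain counts once per protein
        node_counts.update(present)
        # Each unordered domain pair in this protein adds 1 to that edge's weight;
        # sorting first makes every pair key canonical, e.g. ('KLF-DBD', 'SID').
        edge_counts.update(combinations(present, 2))
    n = len(proteins)
    node_freq = {d: c / n for d, c in node_counts.items()}  # KLF-DBD -> 1.0
    return node_freq, edge_counts
```

Because every KLF/SP protein carries the KLF-DBD, its node frequency is always 1.0, matching the caption's statement that the KLF-DBD always represents 100%.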
Phylogenetic distribution of explicit domain architectures represented among KLF/SP proteins. The key at lower left identifies the LCRs and transactivation/repression domains used to determine domain architectures. The protein schematics at lower right represent the particular combinations of domains and LCRs with the KLF-DBD that define each specific KLF/SP protein architecture. All groups except the recovered ancient unicellular KLF architecture are named after the established human KLF/SP paralogy groups that conform to each specific architecture. The three C-terminal zinc fingers of the KLF-DBD are indicated with gray boxes labeled zf1, zf2, and zf3. Architecture schematics are not to scale.
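The idea that a specific combination of domains and LCRs defines an architecture class can be illustrated by using each protein's ordered domain list as a grouping key. The Python sketch below is hypothetical: it assumes domain labels are already ordered N- to C-terminal, and it collapses consecutive repeats of the same domain, a simplifying choice that the caption does not specify. Naming each resulting group after a human paralogy group (e.g. KLF3/8/12) would need a separate lookup, which is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Architecture = Tuple[str, ...]


def architecture(domains: List[str]) -> Architecture:
    """Ordered N- to C-terminal domain signature, consecutive repeats collapsed."""
    signature: List[str] = []
    for d in domains:
        if not signature or signature[-1] != d:
            signature.append(d)
    return tuple(signature)


def group_by_architecture(proteins: Dict[str, List[str]]) -> Dict[Architecture, List[str]]:
    """Group protein IDs that share the same domain architecture signature."""
    groups: Dict[Architecture, List[str]] = defaultdict(list)
    for pid, domains in proteins.items():
        groups[architecture(domains)].append(pid)
    return dict(groups)
```

Keeping the order in the key (a tuple rather than a set) distinguishes architectures that contain the same domains in a different N- to C-terminal arrangement, which matches how the schematics are drawn.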