Yoshinori Endo1, Ken-Ichiro Kamei2, Miho Inoue-Murayama1,2,3. 1. Wildlife Research Center, Kyoto University, Japan. 2. Institute for Integrated Cell-Material Sciences (WPI-iCeMS), Kyoto University, Japan. 3. Wildlife Genome Collaborative Research Group, National Institute for Environmental Studies, Tsukuba, Ibaraki, Japan.
Abstract
Mammalian pluripotent stem cells (PSCs) have distinct molecular and biological characteristics among species, but to date we lack a comprehensive understanding of regulatory network evolution in mammals. Here, we carried out a comparative genetic analysis of 134 genes constituting the pluripotency gene regulatory network across 48 mammalian species covering all the major taxonomic groups. We report that mammalian genes in the pluripotency regulatory network show a remarkably high degree of evolutionary stasis, suggesting the conservation of fundamental biological process of mammalian PSCs across species. Nevertheless, despite the overall conservation of the regulatory network, we discovered rapid evolution of the downstream targets of the core regulatory elements and specific amino acid residues that have undergone positive selection. Our data indicate development of lineage-specific pluripotency regulating networks that may explain observed variations in some characteristics of mammalian PSCs. We further revealed that positively selected genes could be associated with species' unique adaptive characteristics that were not dedicated to regulation of PSCs. These results provide important insight into the evolution of the pluripotency gene regulatory network underlying variations in characteristics of mammalian PSCs.
Mammalian pluripotent stem cells (PSCs) have distinct molecular and biological characteristics among species, but to date we lack a comprehensive understanding of regulatory network evolution in mammals. Here, we carried out a comparative genetic analysis of 134 genes constituting the pluripotency gene regulatory network across 48 mammalian species covering all the major taxonomic groups. We report that mammalian genes in the pluripotency regulatory network show a remarkably high degree of evolutionary stasis, suggesting the conservation of fundamental biological process of mammalian PSCs across species. Nevertheless, despite the overall conservation of the regulatory network, we discovered rapid evolution of the downstream targets of the core regulatory elements and specific amino acid residues that have undergone positive selection. Our data indicate development of lineage-specific pluripotency regulating networks that may explain observed variations in some characteristics of mammalian PSCs. We further revealed that positively selected genes could be associated with species' unique adaptive characteristics that were not dedicated to regulation of PSCs. These results provide important insight into the evolution of the pluripotency gene regulatory network underlying variations in characteristics of mammalian PSCs.
Pluripotent stem cells (PSCs) have emerged as advanced cellular and genetic material for mammalian research. Despite their interspecies similarities related to their ability to differentiate into other somatic cells and to divide into make more stem cells with maintenance of the undifferentiated state, PSCs from different species show variations in some of their characteristics such as pluripotent state and gene expression. Various studies have shown molecular mechanisms underlying the maintenance of PSC characteristics that constitute a complex gene regulatory network. Unfortunately, such knowledge is largely limited to primates and rodents. This study provides comparative analyses involving pluripotency-regulating genes across mammalian taxa. As such, it helps the understanding of the genetic basis for the evolution of the mammalian PSCs.
Introduction
Pluripotent stem cells (PSCs) are undifferentiated cells that exhibit unlimited self-renewability and pluripotency, the potential to give rise to cells from all three embryonic germ layers. Pluripotent embryonic stem cells are isolated from the inner cell mass of developing preimplantation mouse or humanblastocysts (Evans and Kaufman 1981; Martin 1981; Thomson 1998). Although PSCs of rare animal species represent powerful resources for research and conservation, little equivalent derivations have been attempted with these species for ethical and technical reasons.Recent advances in somatic cell reprogramming into induced pluripotent stem cells (iPSCs) have broadened the possibility of obtaining PSCs from variety of mammals, including endangered species (Ben-Nun et al. 2011). Derivation of iPSCs has been reported from a wide range of taxonomic groups, including Carnivora (Shimada et al. 2009; Verma et al. 2012, 2013; Menzorov et al. 2015), Cetartiodactyla (Ezashi et al. 2009; Han et al. 2011; Liu et al. 2012), Chiroptera (Mo et al. 2014), Lagomorpha (Osteil et al. 2013), Metatheria (Weeratunga et al. 2018), Monotremata (Whitworth et al. 2019), Perissodactyla (Ben-Nun et al. 2011; Breton et al. 2013), Rodentia (Takahashi and Yamanaka 2006; Liao et al. 2009; Miyawaki et al. 2016; Lee et al. 2017), and Primates (Takahashi et al. 2007; Tomioka et al. 2010; Marchetto et al. 2013; Wunderlich et al. 2014; Ramaswamy et al. 2015). However, questions remain about species variation in the properties of the PSCs, such as pluripotent state, reprogramming efficiency, and optimal culture conditions.A highly interconnected pluripotency gene regulatory network (PGRN) controls the characteristics of PSCs (Li and Belmonte 2017). Mammals seem to possess evolutionarily conserved PGRN because iPSCs from different taxonomic groups have been derived using human genetic sequences of identical sets of transcription factors (OCT4, SOX2, KLF4, and MYC; collectively referred to as OSKM). However, various pluripotent states and configurations are observed across different species (Ezashi et al. 2016; Weinberger et al. 2016; Paterson et al. 2018), indicating diversity and specificity of the PGRN among species. However, our understandings of the PGRN are narrow, largely limited to primates and rodents (Manor et al. 2015), and to date the evolutionary history of the PGRN across different mammalian taxonomic groups has not been studied in depth.Comparative approaches can provide insights into the evolution of gene regulatory processes by examining the conservation and divergence of networks (Thompson et al. 2015). Evolutionary conservation and adaptations can be inferred by detecting purifying and positive selection (Nielsen et al. 2007). We have recently described genetic signatures for phenotypic adaptations in cetacean lipid metabolism (Endo et al. 2018), and numerous studies have revealed the effects of natural selection in the development of adaptive characteristics of animals and plants (Lenski 2017). Importantly, effects of changes on a regulatory process depends on the hierarchical position of the changes within the regulatory network (Erwin and Davidson 2009) so an holistic approach is recommended. Evolutionary pattern of the PGRN within the Class Mammalia and how natural selection acts on the genes involved in the PGRN, however, are yet to be described.In this study, we aimed to investigate whether any evolutionary pattern among mammalian taxa could be the result of selection of genes involved in the PGRN. To achieve this we conducted a comparative selection analyses of the PGRN genes across mammals covering all the major taxonomic groups. 1) To assess the evolutionary conservation profiles of the mammalian PGRN, we estimated trends in the stringency of purifying selection. 2) To identify rapidly evolving subcircuits of the PGRN, we compared the evolutionary rate based on the ratio of substitution rates at nonsynonymous and synonymous sites (dN/dS). 3) To detect phylogenetic inference of the variations in the PGRN, we identified genes under positive selection for lineages and investigated concordance in functional regions of the proteins. This paper presents the evolutionary history underlying the conservation and variations in mammalian PGRN, and provides the genetic basis for the evolution of the PGRN in mammals.
Materials and Methods
Data Sampling
Target genes were selected to include all 138 genes involved with “Signaling pathways regulating pluripotency of stem cells” (hsa04550) from KEGG pathway database (Kanehisa et al. 2017). Human coding DNA sequences (CDS) of those genes were used as references and blasted against publicly available genomes of 61 mammals, covering all major taxonomic groups. Gallus gallus, Xenopus tropicalis, and Danio rerio were used as an outgroup. The set of genes was tested for orthologs using OMA stand alone v2.3.0 (Altenhoff et al. 2018). When a CDS was not inferred as an ortholog, another CDS was selected based on its annotation and tested with OMA again. Because the genomes were of varying quality, mammalian species with the same or lower number of orthologs than G. gallus were removed. Genes in which orthologs were found in less than 80% of the study species were also excluded from the further analyses. Orthologs were aligned with using PRANK v.140603 (Löytynoja and Goldman 2005) and the aligned sequences were further tested for their quality with GUIDANCE2 v2.02 (Sela et al. 2015) using the default settings. For the chicken genes, NANOG (Gene ID: 100272166, NP_001139614.1) was included in analyses even though it was not determined as ortholog by OMA because it is well studied and known ortholog of humanNANOG (Fuet and Pain 2017). A consensus tree was prepared based on commonly accepted mammalian phylogenies (Meredith et al. 2011; Perelman et al. 2011) and applied for later natural selection analyses. The gene ontology (GO) and the KEGG orthology (KO) were determined according to the Gene Ontology Annotation and KEGG, respectively.
Testing the Stringency of Natural Selection
Natural selection was tested for each gene using RELAX (Wertheim et al. 2015) in HyPhy (Pond et al. 2005). RELAX is a general hypothesis testing framework that determines whether selective strength, distinguishing purifying or positive selection, was relaxed or intensified in the subsets of interest (Wertheim et al. 2015). RELAX estimates the selection intensity parameter, k, where k ≥ 1 indicates intensification and k < 1 indicates relaxation of natural selection, where ω > 1 represents positive selection and ω < 1 represents purifying selection. Mammalian lineages were set as a test branch and nonmammalian vertebrates as a reference branch. Because RELAX requires reference branches, genes without nonmammalian orthologs were excluded. P values were corrected for false discovery rate (FDR) using the Benjamini–Hochberg method (Benjamini and Hochberg 1995) with a cutoff of 10%.
Evolutionary Rate of Each Gene
Evolutionary rate of each gene was estimated by using the model M0 in CodeML (PAMLv.4.8) (Yang 2007). The synonymous substitution rate (dS), the nonsynonymous substitution rate (dN), and their ratio, dN/dS (also noted as ω) were calculated, where dN/dS < 1 indicates purifying selection, dN/dS = 1 indicates neutral, and dN/dS > 1 indicates positive selection. The dN/dS ratio was estimated for each gene using mammalian orthologs.
Comparison of Conservation Patterns among Subcircuits
The categories of subcircuits were determined according to KEGG descriptions in the “Pathway” category, including Core, JAK–STAT, MAPK, OCT4 (POU5F1), SOX2, and NANOG (OSN) activated, OSN suppressed, PI3K–Akt, TGFβ, and Wnt signaling pathways. As for the Core subcircuits, we included the genes in the core transcriptional network, OSN and the genes which interact with the OSN inside of nucleus. When one gene was tagged in multiple subcircuits, it was grouped in all the related pathways. Because subcircuits contain unequal sample sizes, multiple paired comparisons for the means of dN/dS ratios were statistically tested using ANOVA followed by the Tukey–Kramer post-test with P value of 0.05.
Lineage-Specific Positive Selection
Positively selected genes were identified using the branch-site model (Zhang et al. 2005) implemented in CodeML (PAMLv.4.8) (Yang 2007). The modified model A as an alternative hypothesis was compared with the model A1 as the null. P values were evaluated under the likelihood ratio test by comparing 2Δl of the two models to the 1:1 mixture of 0 and χ2 distribution. Lineage-specific positive selection was tested by setting each contemporary species and ancestral branch as a priori specified target separately and corrected for FDR using the Benjamini–Hochberg method (Benjamini and Hochberg 1995) with a cutoff of 10%.
Functional Impact
Positively selected sites were determined using the Bayes empirical Bayes (BEB) implemented in the branch-site model (Yang et al. 2005) with 95% posterior probability. Lineage-specific amino acid substitutions were explored in positively selected genes identified in ancestral branches as indicative of significant changes in protein function (Tian et al. 2013). The lineage-specific amino acid substitutions were identified manually by looking at a conversion of residue that occurred and fixed in more than 90% of the descendant contemporary species Functional domain structures and regions were referred to UniPort and Pfam using human protein as a reference.
Results
Summary of Sample Data
To investigate the evolutionary pattern of the gene regulatory network, the PGRN, that maintains pluripotency and self-renewability of PSCs, we performed comparative genetic analyses across mammalian species (fig. 1). We collected the protein coding sequences of PGRN-related genes covering all the major taxonomic groups. A total of 61 mammalian species were used in this study including Classes Primates, Rodentia, Cetartiodactyla, Chiroptera, Carnivora, Insectivora, Afrotheria, and Metatheria, with at least two species from each. For each species, we searched orthologs of the 138 genes assigned to functions associated with signaling pathways regulating pluripotency of stem cells. Our filtering strategy retained 134 gene sets from 48 mammalian species for later analyses (see Materials and Methods section, supplementary table S1, Supplementary Material online).
. 1
Flow diagram detailing the strategy to investigate the evolution of the PGRN across mammals. Boxes represent steps of methods in processing order. Key approaches used for each step are indicated in parenthesis.
Flow diagram detailing the strategy to investigate the evolution of the PGRN across mammals. Boxes represent steps of methods in processing order. Key approaches used for each step are indicated in parenthesis.
Evolutionarily Conserved Profiles across Mammals
To assess evolutionary constraint and reduction of selective strength in the mammalian PGRN, we examined the level of purifying selection with each gene. Of the 127 orthologous genes analyzed, we observed intensification of purifying selection in 53 genes (41.7%) as well as relaxation in 9 genes (7.1%) at P < 0.05 FDR <10%, revealing that intensification of purifying selection was more common among the PGRN (fig. 2, supplementary table S2, Supplementary Material online). For the commonly used reprogramming factor, OSKM, we found high degrees of genetic conservation in SOX2, KLF4, and MYC across mammals. We observed relaxation of purifying selection in a number of genes involved in the molecular mechanisms that determine the pluripotent states (APC2, DUSP9, INHBE, LEFTY2, and RIF1) and in a number of oncogenes and tumor suppressor genes across mammals (APC2, FZD6, INHBE, REST, RIF1, WNT4, and WNT8A). Intensification of purifying selection was observed across the PGRN, whereas relaxation appeared to be limited in particular subcircuits. This led to the question whether any conservation patterns among the PGRN subcircuits.
. 2
Trends in the stringency of purifying selection on the PGRN genes. Schematic representation of the PGRN according to KEGG is shown. Only genes with significant shifts in the stringency of purifying selection at 10% FDR or less are indicated: Green rectangle, intensification; lime circle, relaxation. Dashed lines indicate the subcircuits of the PGRN. Horizontal double line indicates the cellular envelop. Blue oval represents the nucleus. The genes under each subcircuit are simplified, see supplementary table S4, Supplementary Material online.
Trends in the stringency of purifying selection on the PGRN genes. Schematic representation of the PGRN according to KEGG is shown. Only genes with significant shifts in the stringency of purifying selection at 10% FDR or less are indicated: Green rectangle, intensification; lime circle, relaxation. Dashed lines indicate the subcircuits of the PGRN. Horizontal double line indicates the cellular envelop. Blue oval represents the nucleus. The genes under each subcircuit are simplified, see supplementary table S4, Supplementary Material online.
Conservation Patterns among the PGRN Subcircuits
We then asked if there are differences in degree of conservation among subcircuits constituting the PGRN. In the purifying selection analysis identified above, we found relatively higher number of genes with intensified purifying selection in JAK–STAT, OSN suppressed, and PI3K–Akt signaling pathways (52.9%, 50%, and 44.8%, respectively), whereas more relaxed genes were found in OSN activated and Wnt signaling pathways (20% and 10%, respectively) (table 1).
Table 1
Comparison of Selective Pressure among the Subcircuits
No. Gene with Significant Shifts in Strength of Purifying Selection
Subcircuits
No. gene
Intensified (%)
Relaxed (%)
Core
11
4 (36.4)
1 (9)
JAK–STAT
17
9 (52.9)
0
MAPK
26
8 (30.8)
1 (3.8)
OSNa activated
15
4 (26.7)
3 (20)
OSNa suppressed
18
9 (50)
0
PI3K–Akt
29
13 (44.8)
0
TGFβ
27
9 (33.3)
2 (7.4)
Wnt
40
16 (40)
4 (10)
OCT4 (POU5F1), SOX2, and NANOG.
Comparison of Selective Pressure among the SubcircuitsOCT4 (POU5F1), SOX2, and NANOG.To gain insights into the main force that has shaped the evolution of mammalian PGRN, we estimated the evolutionary rates, dN/dS, of each gene. The estimated dN/dS values ranged from 0.0036 to 0.3614. All the genes showed dN/dS values lower than 1 and 95.5% lower than 0.2 (fig. 3, supplementary table S3, Supplementary Material online). Mean dN/dS values of each subcircuit were generally low suggesting high degree of evolutionary stasis at level of protein-coding sequence (supplementary table S4, Supplementary Material online). ANOVA indicated that evolutionary rates differ among subcircuits (supplementary table S5, Supplementary Material online), and the following Tukey–Kramer showed that the OSN activated subcircuits, the downstream of the core transcriptional network, had significantly higher evolutionary rate compared with other subcircuits with P < 0.05 (fig. 3 and supplementary table S5, Supplementary Material online).
. 3
Conservation patterns of the PGRN. (A) Histogram of gene counts for dN/dS values showing distribution of evolutionary rates for the PGRN genes. (B) Box plots of the mean dN/dS values for the PGRN subcircuits. Multiple paired comparisons were tested with the Tukey–Kramer method, where subcircuits not assigned by the same latter are significantly different with P value of 0.05; error bars, 95% confidence interval. OSN represents OCT4 (POU5F1), SOX2, and NANOG.
Conservation patterns of the PGRN. (A) Histogram of gene counts for dN/dS values showing distribution of evolutionary rates for the PGRN genes. (B) Box plots of the mean dN/dS values for the PGRN subcircuits. Multiple paired comparisons were tested with the Tukey–Kramer method, where subcircuits not assigned by the same latter are significantly different with P value of 0.05; error bars, 95% confidence interval. OSN represents OCT4 (POU5F1), SOX2, and NANOG.If genes in the OSN activated subdivision experience lower evolutionary constraint across mammals, some species may have developed unique characteristics through changes in these genes over evolutionary time periods. To test this, we performed positive selection analysis for the genes in the OSN activated subdivision on all the ancestral branches and contemporary species in the mammalian taxonomic tree containing our study species. Of the 15 OSN activated genes, we identified 8 positively selected genes in 4 ancestral branches and 22 contemporary species with 10% FDR correction (fig. 4 and supplementary table S6, Supplementary Material online). The four ancestral branches where positively selected genes were found include the common ancestor of Eutheria, Primatomorpha (Primates and the flying lemur), Megachiroptera, and Pan. The number of positive selection events was the highest in the gene TCF3, which is an integral component of the core regulatory circuitry of PSCs (Cole et al. 2008) (one ancestral branch and eight contemporary species). Although three of the four tested epigenetic regulators (KO: 03036) have shown to be under intensified purifying selection across mammals (fig. 2), a relatively high number of positive selection events with KAT6A, SETDB1, and JARID2 were observed in Primates (five events over nine total events in Primates and over 11 events with these genes across mammals).
. 4
Mapping of positively selected genes identified in ancestral branches and contemporary species of mammals. Positively selected genes at 10% FDR or less (red) are indicated on a commonly accepted phylogeny of mammals. Ancestral branches with which positively selected genes were identified are shown in red. Colored boxes represent major taxonomic groups referred to the NCBI taxonomy.
Mapping of positively selected genes identified in ancestral branches and contemporary species of mammals. Positively selected genes at 10% FDR or less (red) are indicated on a commonly accepted phylogeny of mammals. Ancestral branches with which positively selected genes were identified are shown in red. Colored boxes represent major taxonomic groups referred to the NCBI taxonomy.
Functional Impact of Positively Selected Genes at Nucleotide Level
To assess whether the identified positively selected genes have undergone functional modification, we investigated the positively selected sites of the eight positively selected genes for each selected lineage (supplementary table S7, Supplementary Material online). The positions of positively selected sites were compared with protein functional regions using humans as reference. Although the majority of the positively selected sites were found outside of known functional regions, we found sites under positive selection in functional regions of five proteins encoded by KAT6A, LEFTY2, NANOG, SETDB1, and TCF3 (fig. 5). We also found lineage-specific amino acid substitutions in functional regions of the four positively selected genes at ancestral branches, including KAT6A, LEFTY2, RIF1, and TCF3 (fig. 6 and supplementary table S8, Supplementary Material online). These lineage-specific amino acid substitutions were unique to and shared among the descendent contemporary species, indicating functional importance of these genetic mutations (Tian et al. 2013).
. 5
Positively selected sites of the OSN activated genes: (A) KAT6A, (B) LEFTY2, (C) NANOG, (D) SETDB1, and (E) TCF3. Positively selected lineages and their positively selected sites with 95% posterior probability are mapped on the coded protein with the arrangements of functionally important regions. Only genes with significant positively selected sites in at least one lineage and on at least one functional region are shown. The functional region represented with bar is a region of interest that cannot be described in other subsections according to UniPort. Colored boxes represent major taxonomic groups [Afrotheria {orange}, Carnivora {red}, Cetartiodactyla {fuchsia}, Chiroptera {yellow}, Primates/flying lemur {lime}, Rodentia {aqua}, and other {gray}].
. 6
Lineage-specific amino acid substitutions in positively selected genes at ancestral branch: (A) KAT6A, (B) LEFTY2, (C) RIF1, and (D) TCF3. Positively selected lineages and their specific amino acid substitutions are mapped on the coded protein with the arrangements of functionally important regions. The functional region represented with bar is a region of interest that cannot be described in other subsections according to UniPort. Colored boxes represent major taxonomic groups [Eutheria {purple}, Chiroptera {yellow}, and Primates/flying lemur {lime}].
Positively selected sites of the OSN activated genes: (A) KAT6A, (B) LEFTY2, (C) NANOG, (D) SETDB1, and (E) TCF3. Positively selected lineages and their positively selected sites with 95% posterior probability are mapped on the coded protein with the arrangements of functionally important regions. Only genes with significant positively selected sites in at least one lineage and on at least one functional region are shown. The functional region represented with bar is a region of interest that cannot be described in other subsections according to UniPort. Colored boxes represent major taxonomic groups [Afrotheria {orange}, Carnivora {red}, Cetartiodactyla {fuchsia}, Chiroptera {yellow}, Primates/flying lemur {lime}, Rodentia {aqua}, and other {gray}].Lineage-specific amino acid substitutions in positively selected genes at ancestral branch: (A) KAT6A, (B) LEFTY2, (C) RIF1, and (D) TCF3. Positively selected lineages and their specific amino acid substitutions are mapped on the coded protein with the arrangements of functionally important regions. The functional region represented with bar is a region of interest that cannot be described in other subsections according to UniPort. Colored boxes represent major taxonomic groups [Eutheria {purple}, Chiroptera {yellow}, and Primates/flying lemur {lime}].
Discussion
Genetic Conservation of the PGRN
The relatively higher number and prevalence of evolutionary conserved genes across the PGRN indicates overall comparability of the regulating mechanisms for pluripotency and self-renewability of mammalian PSCs (fig. 2). In context of the reprogramming somatic cells into iPSCs, although human sequences are effective (Ben-Nun et al. 2011), the efficient combination of reprogramming factors may differ among species (Tomioka et al. 2010; Verma et al. 2013; Mo et al. 2014; Weeratunga et al. 2018). Evidence of evolutionary conservation in the three commonly used reprogramming factors explains the efficiency of human genetic sequences for reprogramming somatic cells from variety of species (Ezashi et al. 2016). However, low success rate and inefficiency of reprogramming by human OSKM are reported in some species (Verma et al. 2013; Mo et al. 2014; Whitworth et al. 2019). For increasing reprogramming efficiency, additional expression of other pluripotency-associated genes have been shown to enhance reprogramming rate (Takahashi and Yamanaka 2016). We detected significant evidence of purifying selection across mammals for TBX3, which improves reprogramming efficiency in mouse (Han et al. 2010), implying that TBX3 might be a first candidate enhancer for derivation of mammalian iPSCs with low reprogramming efficiency.The characteristics of mammalian PSCs appear to differ among species (Ezashi et al. 2016; Weinberger et al. 2016; Paterson et al. 2018) and reduction in selective constraint highlights evolutionary flexibility and innovation (Lahti et al. 2009; Moczek 2010; Hunt et al. 2011). For example, pluripotent state of PSCs can be described with multiple types such as naive or primed states, and the amenabilities of the pluripotent state in vitro differ between species (Boroviak et al. 2015). Although naive state PSCs are cultured with dual inhibition (2i) of MEK and GSK3 and leukemia inhibitory factor (LIF), human and mouse embryonic stem cells show distinct transcriptional responses to the 2i/LIF (Huang et al. 2014). Our findings of relaxation of purifying selection on genes, such as RIF1, may explain the diversity of the amenable pluripotent state of the species. RIF1 is involved in cellular response to LIF (GO: 1990830). INHBE and LEFTY2 participate in the regulation of MAPK cascade (GO: 0043408), the target of MEK inhibitor, and the protein encoded by DUSP9 is the essential regulator of MAPK (GO: 0000165, 0000187, 0000188). APC2 is known to form a protein complex with the scaffold Axin and the kinases GSK3 and CK1 (Pronobis et al. 2015). Mammals lack correlation between body size or life span and cancer risk which is referred to as “Peto's paradox” (Peto et al. 1975; Caulin and Maley 2011; Peto 2015). Our findings of the relaxation of selective strength for oncogenes and tumor suppressor genes may imply evolutional flexibility in mammaliancancer resistance.
Variable Evolutionary Conservation Patterns among PGRN Subcircuits
The patterns of genetic conservation among the PGRN subcircuits provide insight into the conservation and variations of the characteristics of mammalian PSCs (table 1 and fig. 3). Multiple upstream signaling pathways serve to maintain the self-renewability and pluripotency of PSCs. We found relatively high degree of conservation in JAK–STAT, OSN-suppressed, and PI3K–Akt signaling pathways, suggesting conservation of fundamental biological characteristics and homeostasis in mammalian PSCs. JAK–STAT signaling pathway is stimulated by LIF and essential for self-renewal (Niwa et al. 1998). PI3K signaling pathway plays a crucial role for proliferation in mouse embryonic stem cells (Takahashi et al. 2005) and equivalency of its function in mouse and primate ES cells has been reported (Watanabe et al. 2006). Furthermore, the role of PI3K signaling pathway in the proliferation of PSCs has been demonstrated in the horse (Whitworth et al. 2014). Our findings support the common regulatory mechanisms observed among phylogenetically distant taxonomic groups. However, we observed the relatively higher number of relaxed genes in the Wnt signaling pathway, the downstream target of which, ESRRB, is necessary and sufficient to mediate self-renewal independently of JAK–STAT signaling pathway (Martello et al. 2012), that may suggest evolutionary flexibility in the parallel pathways supporting self-renewal.The core transcriptional network genes, OSN, orchestrate a cascade of regulatory events involving an autoregulatory loop involving the other pluripotency regulators (Cole et al. 2008; Li and Belmonte 2017). We observed relatively high evolutionary rates with the genes activated by OSN suggesting diversity of subsequent regulating mechanisms among mammals. Importantly, changes in the proximal targets of the master regulators could have profound influence on the downstream circuitry (Erwin and Davidson 2009), our observations of relatively high evolutionary rates with the OSN activated subcircuit may reflect the variations of some characteristics among mammalian PSCs. In addition, understanding mammalian diversity in the downstream targets of the core transcriptional network may help elucidate the mechanisms of reprogramming toward pluripotency and increase the reprogramming efficiency (Takahashi and Yamanaka 2016). Thus, we focused our later analyses on the genes in the OSN activated subcircuit for exploring variations in the mammalian PGRN.
Development of Lineage-Specific PGRN
Evidence for positive selection on PGRN genes has implications for development of lineage-specific regulatory networks (fig. 4). We observed evidence of frequent episodes of positive selection with TCF3, implying that alternation of the PGRN have occurred multiple times during mammalian evolution. TCF3 is the integral component of the core regulatory circuitry of PSCs and encodes a member of the E protein family of helix-loop-helix transcription factors and plays a crucial role by binding to the component of the core transcription factors (Cole et al. 2008; Yi et al. 2008). We found that the common ancestor of Eutheria has been positively selected with fixed lineage-specific amino acid substitutions with TCF3, which targets and controls the expression of ESRRB (Yi et al. 2008; Martello et al. 2012). ESRRB has been proposed to be recruited to the Eutheria PGRN after the divergence of marsupials and Eutheria because neither Tasmanian devil nor platypus iPSCs express ESRRB (Weeratunga et al. 2018; Whitworth et al. 2019). Importantly, ESRRB not only regulates self-renewal in PSCs (Martello et al. 2012) but also plays a key role in trophoblast, which forms placenta (Luo et al. 1997; Latos et al. 2015). Our findings imply that functional modification in TCF3 may have caused rewiring Eutherian PGRN to incorporate ESRRB that subsequently led to the acquisition of placenta. We also report genetic evidence of positive selections with TCF3 and its another target LEFTY2 (Cole et al. 2008) in Megachiroptera. Positive selection has happened in the ancestral branch of Megachiroptera with LEFTY2 and the two descendant species, the large flying fox (Pteropus vampyrus) and the rousette (Rousettus aegyptiacus) in TCF3, highlighting the evolutionary history of pre- and postdivergence of these taxa. Possible implications from this observation will be discussed in the following section. TCF3 also regulates the expression of NANOG (Pereira et al. 2006; Cole et al. 2008). The reprogramming efficiency of NANOG varies among species; whereas overexpression of NANOG increases the reprogramming rate in felids (Verma et al. 2012, 2013), it does not increase the reprogramming rate in marmosets (Tomioka et al. 2010). We found that both the leopard and the marmoset have positively selected sites in their TCF3, but at different positions (fig. 5), which may reflect the variations of response to NANOG. This could have implications in development of lineage-specific PGRNs in these taxa.
Potential Drivers of Mammalian PGRN Variation
Our findings of lineage-specific positively selected genes among the PGRN genes provide insight into the influence of species adaptation on the PGRN and the characteristics of mammalian stem cells. The architecture of developmental gene regulatory networks, such as the PGRN, is composed of diverse components. Certain subcircuits of regulatory networks are not dedicated to a particular biological process, but are also used for diverse functions that might have led to species adaptations. The naked mole-rat (Heterocephalus glaber) exhibits extraordinary longevity and cancer resistance (Buffenstein 2008; Delaney et al. 2013). The iPSCs derived from the naked mole-rat have shown to resist tumor formation through the expression of ARF (Miyawaki et al. 2016), in which the Arf/p53 pathway has a protective role from cancer and aging (Matheu et al. 2008). Consistent with this finding, we found NANOG and RIF1 to be positively selected in this species. RIF1 encodes a protein that regulates DNA replication and damage, interacting with 53BP1 (Kumar and Cheok 2014) which enhances p53-dependent transcriptional responses (Cuella-Martin et al. 2016). NANOG is not only one of the core transcriptional network genes, but also exhibits tumorigenic activity through interaction with p53 (Cuella-Martin et al. 2016; Kim et al. 2016). Our finding of positively selected sites on the DNA-binding motifs of NANOG further suggests development of cancer resistance with p53 pathway in addition to ARF (fig. 5).Alternatively, the evidence of positive selection on RIF1 may indicate convergent evolution of species adaptation for perception. We identified that RIF1 has been positively selected in the naked mole-rat, the Cape golden mole (Chrysochloris asiatica), the star-nosed mole (Condylura cristata), and the vampire bat (Desmodus rotundus). RIF1 plays an important role in DNA replication and damage (Kumar and Cheok 2014), and these responses are important for neurogenesis, as represented in human diseases such as Meier-Gorlin syndrome and Wolf-Hirschhorn syndrome (Kerzendorfer et al. 2013). Animals that live underground have developed unique sensory systems such as the somatosensory vibrissa-like body hairs on the body of the naked mole-rat (Crish et al. 2003), the middle ear structure of airborne and seismic stimuli in the Cape golden mole (Willi et al. 2006), and the “star” of the star-nosed mole (Gould et al. 1993). Among Microchiroptera, a group that uses echolocation, the vampire bat has extremely sensitive neurons that detect not only echolocation signals but also low frequency sounds, presumably for foraging prey in the dark (Schmidt et al. 1991). Our results provide insights into additional possible development of cancer resistance in mammals.Differences in sites under positive selection in the protein functional domains may reflect species unique adaptation (fig. 5). As discussed previously, two Megachiroptera species have been identified under positive selection with TCF3, which is also associated with cancer (Patel et al. 2015) and neuronal differentiation (Kuwahara et al. 2014). We observed multiple positively selected sites on the functional motif basic helix-loop-helix (bHLH) of TCF3 in the large flying fox (P. vampyrus), among the largest species of bat with a wingspan of up to 1.5 meter (Kunz and Jones 2000). Potential mechanisms of reducing cancer risk through response to DNA damage has been reported in elephants (Abegglen et al. 2015; Vazquez et al. 2018). Interestingly, we also observed positively selected sites on the bHLH of TCF3 in elephants, implying the resolution to Peto's paradox and the underlying convergent evolution between species that have developed a larger body size compared with their phylogenetic relatives. We identified positively selected sites on the transactivation domain, 9aaTAD in the Rousette bat (R. aegyptiacus), which has been reported to exhibit enhanced infection tolerance (Pavlovich et al. 2018). Whereas the previous study revealed genetic signatures of unique signaling in NK cell receptors (Pavlovich et al. 2018), E protein encoded by TCF3 plays a critical role in B and T lymphocyte development (Engel et al. 2001; Seet et al. 2004), suggesting multiple strategies for antiviral defense in the rousette.The roles of transposable elements in genomic rearrangement, gene regulation, and epigenetics have been extensively studied to understand primate evolution (Lee et al. 2015). Our findings of the relatively frequent positive selection with epigenetic regulators among Primates may reflect the various impacts of transposable elements on the primate genome. Among Primates, we observed that two epigenetic regulators were under positive selection in the gibbon (Nomascus leucogenys), whose chromatin interactions and epigenetic landscape has been remarkably conserved in spite of extensive genomic shuffling (Lazar et al. 2018). Interestingly, we also observed an event of positive selection at the common ancestor of Primates and the flying lemur with LEFTY2, with which DNA methylation plays a critical role during early embryogenesis in vertebrates (Wang et al. 2017), although the results need to be treated with caution.Due to the habitat transition from terrestrial to aquatic environment, cetaceans have achieved a remarkable changes in their morphology (Uhen 2010). The minke whale (Balaenoptera acutorostrata) genome has provided support for genetic changes in HOX genes in this species (Yim et al. 2014), which have an important role in the body plan and embryonic development (Pearson et al. 2005). Consequently, we observed that the minke whale has been under positive selection with KAT6A, which regulates the expression of HOX gene (Voss et al. 2009), implying the morphological adaptation of the whale to the aquatic environment. However, our analyses did not indicate significant evidence of positive selection with KAT6A in other cetacean species. Because KAT6A is also associated with senescence and tumor growth (Baell et al. 2018), our findings with the minke whale may imply the adaptation of longevity and resistance to age-related diseases as illustrated with the genome and transcriptomes of the bowhead whale (Keane et al. 2015).Overall, our data indicated that the PGRN genes positively selected in some species are also involved in their unique adaptations, which may subsequently alter their regulatory function in PSCs. Further efforts, such as in vitro genetic modification and characteristics observation, are necessary to test the functional consequences of the genetic mutations discovered in this study.
Conclusions
Our analyses illustrate the evolutionary patterns in the PGRN, underlying the similarities and variations in characteristics among mammalian PSCs. This study is one of the first to compare the PGRN genes across major taxa. We showed the evolutionary conservation profiles of the mammalian PGRN and uncovered the evolutionary variable PGRN subcircuits. We identified phylogenetic inference of positive selection of genes involved in the PGRN which has enabled insights into development of lineage-specific PGRN and linkage between PGRN genes and species adaptation. These genes and the associated subcircuits will be plausible targets for future investigations exploring mammalian PSCs.
Supplementary Material
Supplementary data are available at Genome Biology and Evolution online.Click here for additional data file.
Authors: Maria C N Marchetto; Iñigo Narvaiza; Ahmet M Denli; Christopher Benner; Thomas A Lazzarini; Jason L Nathanson; Apuã C M Paquola; Keval N Desai; Roberto H Herai; Matthew D Weitzman; Gene W Yeo; Alysson R Muotri; Fred H Gage Journal: Nature Date: 2013-10-23 Impact factor: 49.962
Authors: Stephanie S Pavlovich; Sean P Lovett; Galina Koroleva; Jonathan C Guito; Catherine E Arnold; Elyse R Nagle; Kirsten Kulcsar; Albert Lee; Françoise Thibaud-Nissen; Adam J Hume; Elke Mühlberger; Luke S Uebelhoer; Jonathan S Towner; Raul Rabadan; Mariano Sanchez-Lockhart; Thomas B Kepler; Gustavo Palacios Journal: Cell Date: 2018-04-26 Impact factor: 66.850