Alejandro Gil-Gálvez1, Sandra Jiménez-Gancedo1, Alberto Pérez-Posada1, Martin Franke1, Rafael D Acemel1, Che-Yi Lin2, Cindy Chou2, Yi-Hsien Su2, Jr-Kai Yu2,3, Stephanie Bertrand4, Michael Schubert5, Héctor Escrivá4, Juan J Tena1, José Luis Gómez-Skarmeta1. 1. Centro Andaluz de Biología del Desarrollo, Consejo Superior de Investigaciones Científicas-Universidad Pablo de Olavide-Junta de Andalucía, 41013 Seville, Spain. 2. Institute of Cellular and Organismic Biology, Academia Sinica, 11529 Taipei, Taiwan. 3. Marine Research Station, Institute of Cellular and Organismic Biology, Academia Sinica, 26242 Yilan, Taiwan. 4. Biologie Intégrative des Organismes Marins, Observatoire Océanologique, Sorbonne Université, CNRS, 66650 Banyuls-sur-Mer, France. 5. Laboratoire de Biologie du Développement de Villefranche-sur-Mer, Institut de la Mer de Villefranche, Sorbonne Université, CNRS, 06230 Villefranche-sur-Mer, France.
Abstract
SignificanceIn this manuscript, we address an essential question in developmental and evolutionary biology: How have changes in gene regulatory networks contributed to the invertebrate-to-vertebrate transition? To address this issue, we perturbed four signaling pathways critical for body plan formation in the cephalochordate amphioxus and in zebrafish and compared the effects of such perturbations on gene expression and gene regulation in both species. Our data reveal that many developmental genes have gained response to these signaling pathways in the vertebrate lineage. Moreover, we show that the interconnectivity between these pathways is much higher in zebrafish than in amphioxus. We conclude that this increased signaling pathway complexity likely contributed to vertebrate morphological novelties during evolution.
SignificanceIn this manuscript, we address an essential question in developmental and evolutionary biology: How have changes in gene regulatory networks contributed to the invertebrate-to-vertebrate transition? To address this issue, we perturbed four signaling pathways critical for body plan formation in the cephalochordate amphioxus and in zebrafish and compared the effects of such perturbations on gene expression and gene regulation in both species. Our data reveal that many developmental genes have gained response to these signaling pathways in the vertebrate lineage. Moreover, we show that the interconnectivity between these pathways is much higher in zebrafish than in amphioxus. We conclude that this increased signaling pathway complexity likely contributed to vertebrate morphological novelties during evolution.
During embryonic development, thousands of genes are expressed in a coordinated and tightly regulated manner. This coordination is facilitated by complex hierarchical relationships between genes (1, 2), where the expression of a particular effector gene triggers the transcription of many other genes in a multilevel cascade that can involve hundreds of different targets (3). Signaling pathways control most of these genetic cascades, interconnecting many genes and playing pivotal roles in more complex gene regulatory networks (GRNs). As a consequence, they are key substrates for the generation of morphological diversity during evolution (4). However, the complexity of networks is not only dictated by the number of nodes but also by the number and patterns of interactions among them. In this context, effectors of signaling pathways constitute hubs in GRNs.It has already been demonstrated that after the vertebrate-specific whole genome duplications (WGDs), many duplicated developmental genes were maintained in this group (5). Furthermore, it is also known that regulatory landscapes in general, and especially those of developmental genes, have been expanded in the vertebrate lineage (6). Yet, it remains unclear how these features interplay to generate the highly complex body plans of adult vertebrates. Here, we investigated the contribution of key signaling pathways in the evolutionary transition from invertebrate chordates to vertebrates. To study this question, we compared the effects of interfering with the retinoic acid (RA), Wnt, fibroblast growth factor (FGF), and Nodal pathways during gastrulation in both amphioxus (a cephalochordate) and zebrafish embryos. To do so, we used pharmacological compounds known to act either as agonists of the RA and Wnt or antagonists of the FGF and Nodal pathways (7). We then examined the impact of these treatments on global gene expression by RNA sequencing(RNA-seq) in both species (Fig. 1 and ). Additionally, we performed assay for transposase accessible chromatin using sequencing (ATAC-seq) to identify open chromatin regions (8), including enhancers and promoters, affected by these manipulations (Fig. 1 and ).
Fig. 1.
Experimental design and differential analyses. (A) Overall design of the experiment: zebrafish and amphioxus embryos were treated with four different compounds at the blastula stage and dissociated at the late gastrula stage for RNA-seq and ATAC-seq library preparations. (B) GO term enrichment analysis for orthologous genes that are commonly affected in both amphioxus and zebrafish. Each panel shows GO enrichment visualization obtained with the REVIGO tool for one of the treatments, taking into account only genes significantly modified upon the indicated treatment. Circles represent GO terms, and the X and Y axes map the semantic space: the closer the terms appear in the plot, the more related they are. In order to facilitate direct comparisons, a blue line surrounds GO terms associated with developmental processes. Gray boxes at the right of each panel show some of the developmental genes represented in the plot.
Experimental design and differential analyses. (A) Overall design of the experiment: zebrafish and amphioxus embryos were treated with four different compounds at the blastula stage and dissociated at the late gastrula stage for RNA-seq and ATAC-seq library preparations. (B) GO term enrichment analysis for orthologous genes that are commonly affected in both amphioxus and zebrafish. Each panel shows GO enrichment visualization obtained with the REVIGO tool for one of the treatments, taking into account only genes significantly modified upon the indicated treatment. Circles represent GO terms, and the X and Y axes map the semantic space: the closer the terms appear in the plot, the more related they are. In order to facilitate direct comparisons, a blue line surrounds GO terms associated with developmental processes. Gray boxes at the right of each panel show some of the developmental genes represented in the plot.The WGDs at the base of the vertebrate lineage (5, 9) and the additional WGD in the teleost lineage (10) resulted in gene number imbalances between amphioxus and zebrafish. To overcome this limitation in our analysis, we used previously published data (6) to retrieve all the vertebrate gene family members (i.e., paralogy groups) corresponding to each amphioxus gene affected by the treatments. For zebrafish, we used only affected genes.The RNA-seq analyses revealed hundreds of differentially expressed genes following the different treatments in both amphioxus and zebrafish (). Interestingly, we found transcripts of orthologous genes similarly altered in both species upon the same treatment (Fig. 1). Gene ontology (GO) analysis for these commonly affected transcripts confirmed that they are highly associated with developmental processes (e.g., mesoderm, endoderm, and hindbrain development), which are known to be regulated by the tested signaling pathways (7, 11, 12) (Fig. 1 and Dataset S1). Surprisingly, only a few genes were similarly affected upon Wnt activation in both species (Fig. 1 and Dataset S1). We then examined all the genes affected by each of these treatments, and we observed that the number of genes perturbed in zebrafish is higher than that in amphioxus and, proportionally, more strongly associated with development and signaling terms. Moreover, confirming our experimental approach, the genes altered by these treatments and their corresponding GO terms clearly associated with developmental processes known to be regulated by these pathways (7, 11, 12) ( and Dataset S2).Regarding the ATAC-seq regions modified by the treatments, we observed a very different genomic distribution in both species, with a big proportion of peaks near gene promoters and very few distant enhancers in amphioxus and the opposite trend in zebrafish (), results that were expected from our previous work (6). We next analyzed the ATAC-seq data by searching for motifs enriched in peaks either more accessible after treatment with RA or Wnt agonists or less accessible upon treatment with Nodal or FGF inhibitors. The binding sites for transcription factors (TFs) that mediate signaling of these pathways and/or well-known downstream TFs of these pathways were found for all treatments in both species (13–27) ( and Dataset S3). Overall, these results confirmed that the pharmacological treatments of amphioxus and zebrafish embryos indeed perturbed the targeted pathways in vivo. Using previously published ATAC-seq data at different developmental stages in zebrafish and amphioxus (6), we calculated the ATAC-seq signal dynamics during embryonic development surrounding the differentially accessible regions upon treatments. We observed that, in general, these regions were active during development and showed a dynamic pattern of accessibility ().To better classify the genes that responded to perturbation of the different signaling pathways, we performed a clustering analysis of gene expression, which resulted in groups of genes with similar transcriptional behavior (Fig. 2). We then carried out GO enrichment analyses of these groups. The RNA-seq–derived clusters in zebrafish were mostly associated with GO terms related to embryonic development, while in amphioxus, we also detected many terms related to metabolism and cell homeostasis (Fig. 2 and Dataset S4). In some cases, the association of a single pathway with a specific function (as defined by the GO terms) was well established. For example, in amphioxus, retinol metabolism appeared as a main function in genes up-regulated by RA treatment (dark blue cluster) and muscle cell differentiation in genes down-regulated by Nodal inhibition (orange cluster). In zebrafish, heart formation and somitogenesis were associated with Nodal and FGF inhibition, respectively (dark orange and light blue clusters), while brain development was associated with Wnt pathway activation (light green cluster) (Fig. 2). This cluster analysis also revealed that in both amphioxus and zebrafish, the majority of developmental processes were influenced by several of the studied signaling pathways.
Fig. 2.
Clustering of RNA-seq and ATAC-seq data. (A) RNA-seq–derived gene expression FC in amphioxus (Upper) and zebrafish (Lower). FC values were calculated for each condition versus control samples by means of differential gene expression analyses. Tables on the right show three representative GO terms enriched in the cluster marked with the same color. (B) Clustering of ATAC-seq data in amphioxus (Left) and zebrafish (Right). The top DNA-binding motif found in each cluster is presented on the right, and three representative GO terms enriched in the clusters are shown in the tables below. dev., development; diff., differentiation; path., pathway; Reg., regulation; D/V, dorsal/ventral.
Clustering of RNA-seq and ATAC-seq data. (A) RNA-seq–derived gene expression FC in amphioxus (Upper) and zebrafish (Lower). FC values were calculated for each condition versus control samples by means of differential gene expression analyses. Tables on the right show three representative GO terms enriched in the cluster marked with the same color. (B) Clustering of ATAC-seq data in amphioxus (Left) and zebrafish (Right). The top DNA-binding motif found in each cluster is presented on the right, and three representative GO terms enriched in the clusters are shown in the tables below. dev., development; diff., differentiation; path., pathway; Reg., regulation; D/V, dorsal/ventral.In the case of ATAC-seq peak clustering, we assigned peaks to their putative target genes using the GREAT algorithm (28) in order to derive GO terms. In general, we observed a good correlation between the average ATAC-seq signal around the peaks in the different clusters, the GO terms of their putative target genes, and the TF-binding motifs identified within these peaks (Fig. 2). Furthermore, open chromatin regions altered upon treatments were mainly associated with development, especially in zebrafish (Fig. 2), which was consistent with our RNA-seq analysis. For example, the zebrafish cluster of ATAC-seq peaks that was, on average, more accessible upon Wnt stimulation (blue cluster) was associated with brain development (Fig. 2). In addition, TCF3 motifs, usually bound by effectors of the Wnt pathway, were found within these peaks (Fig. 2). Similarly, in both species, the clusters enriched following RA treatment (dark purple cluster in amphioxus, dark green cluster in zebrafish) were associated with response to RA and hindbrain development (Fig. 2) and contained RAR:RXR motifs (Fig. 2). Importantly, genes belonging to clusters that responded to the same pathways in both species shared similar functions, according to the GO terms associated with them ( and Dataset S4). Nevertheless, clusters of ATAC-seq peaks in zebrafish generally responded to more treatments than in amphioxus, and for this reason, the similarities of clusters found in both species were limited ().The majority of ATAC-seq peak clusters contained peaks that changed in more than one signaling pathway, which was similar to what we observed for the RNA-seq clustering. Several signaling pathways thus seemed to act through the same regulatory elements, suggesting a certain level of interconnection between the different pathways, especially in zebrafish (). The percentages of overlap between sets of ATAC-seq peaks affected by different pathways were variable in both species (from 0 to 72.4% in amphioxus and from 0 to 89.1% in zebrafish; ). Nevertheless, this overlap was significantly higher in zebrafish than in amphioxus ().In order to combine the treatment effect at the transcriptomic and regulatory levels, we intersected the differentially regulated genes (RNA-seq) with the genes associated to differential ATAC-seq peaks for each treatment and species. The resulting intersected gene lists are thereafter referred to as double-selected genes (DSGs). In this analysis, we included both positively and negatively affected genes/regulatory elements. Although the level of transcription of a gene did not necessarily correlate with the degree of accessibility of its promoter, comparisons of the treatment effects at the transcriptomic and regulatory levels in both species showed a slight correlation between them (). The intersection resulted in a total of 2,098 genes and 4,609 ATAC-seq peaks in zebrafish and 481 genes and 853 ATAC-seq peaks in amphioxus (Dataset S5). Although the lower numbers of treatment-affected genes and ATAC-seq peaks in amphioxus might be interpreted as a loss of regulatory information in this species, it is generally accepted that there was a general gain of regulatory input during the invertebrate chordate–to-vertebrate transition (6). It is thus very likely that this significant difference between zebrafish and amphioxus is indicative of a gain of response to these signaling pathways in vertebrates through the incorporation of cis-regulatory elements (CREs).We then clustered the results of the GO enrichment analyses corresponding to DSGs in amphioxus and zebrafish (Fig. 3 and ). We found that the P values associated with the enriched GO terms were, in general, much lower in zebrafish than in amphioxus and that the number of GO terms significantly affected by the different treatments was much higher in zebrafish (Fig. 3). This suggests that the regulatory networks involved in developmental processes are more complex in vertebrates than in invertebrate chordates. The same effect was observed when we analyzed gene families, which were not affected by the overestimation of the number of genes in amphioxus due to the inclusion of whole families of orthologous genes, instead of P values corresponding to specific genes ().
Fig. 3.
Clustering of GO terms. GO terms enriched in differentially regulated genes at the transcriptomic level (based on RNA-seq analysis) that are also associated with differential ATAC-seq peaks, clustered by P values (pval) in amphioxus (Upper) and zebrafish (Lower). BMP, bone morphogenetic protein; D/V, dorsal/ventral.
Clustering of GO terms. GO terms enriched in differentially regulated genes at the transcriptomic level (based on RNA-seq analysis) that are also associated with differential ATAC-seq peaks, clustered by P values (pval) in amphioxus (Upper) and zebrafish (Lower). BMP, bone morphogenetic protein; D/V, dorsal/ventral.These results further supported the notion that there are more genes controlled by the studied signaling pathways in vertebrates than in invertebrate chordates. Among the DSGs identified in zebrafish, we found significant numbers of TFs and regulators of signaling pathways (Dataset S6). We also found that some GO terms in zebrafish were significantly enriched for signaling pathways (e.g., targeting the FGF and Nodal pathways resulted in a significant overlap of development-related terms; Fig. 3). These two facts indicated that, in agreement with the results of our previous clustering analysis, different signaling pathways are interconnected and that the degree of connectivity is higher in vertebrates than in invertebrate chordates. To directly test this hypothesis, we counted the number of genes that were affected by one, two, three, or four different signaling pathways in zebrafish and amphioxus. Interestingly, independently of considering only RNA-seq data or combining RNA-seq and ATAC-seq information, we found that the interconnection between pathways was higher in zebrafish than in amphioxus, with a higher proportion of genes responding to two, three, or four perturbations in zebrafish (Fig. 4 and ). Moreover, we observed an increase of connectivity in vertebrates that was independent of the number of genes retained after the WGD events (). Nevertheless, we also found that the more gene copies were retained, the higher the gain of connectivity ().
Fig. 4.
Gene connectivity to different signaling pathways. (A) Percentage of genes that responded to one, two, three, or four different pathways in amphioxus (green bars) and zebrafish (blue bars), only at the RNA-seq level (Left) or at both the RNA-seq and ATAC-seq levels (Right). (B) Percentage of cells expressing highly connected genes (connectivity ≥3, pink bars) and lowly connected genes (connectivity = 1, light blue bars), according to single-cell RNA-seq experiments carried out in zebrafish (30) in ancient tissues (and thus also present in amphioxus) (Left) and in vertebrate-specific tissues (Right). Boxes represent the first, second, and third quartiles, lines represent the range. (C) Cytoscape plot showing connectivity networks in amphioxus (Left) and zebrafish (Right). Small blue dots represent genes, and big red dots represent signaling pathways. Gray lines mark the responsiveness of each gene to the connected signaling pathway. ns, not significant. *P value < 0.05; **P value < 0.01; ***P value < 0.001.
Gene connectivity to different signaling pathways. (A) Percentage of genes that responded to one, two, three, or four different pathways in amphioxus (green bars) and zebrafish (blue bars), only at the RNA-seq level (Left) or at both the RNA-seq and ATAC-seq levels (Right). (B) Percentage of cells expressing highly connected genes (connectivity ≥3, pink bars) and lowly connected genes (connectivity = 1, light blue bars), according to single-cell RNA-seq experiments carried out in zebrafish (30) in ancient tissues (and thus also present in amphioxus) (Left) and in vertebrate-specific tissues (Right). Boxes represent the first, second, and third quartiles, lines represent the range. (C) Cytoscape plot showing connectivity networks in amphioxus (Left) and zebrafish (Right). Small blue dots represent genes, and big red dots represent signaling pathways. Gray lines mark the responsiveness of each gene to the connected signaling pathway. ns, not significant. *P value < 0.05; **P value < 0.01; ***P value < 0.001.To address the question of whether this gain of connectivity is species-specific or a general distinction of vertebrates and invertebrates, we performed experiments in two additional species, the invertebrate hemichordate Ptychodera flava and another vertebrate, the amphibian Xenopus tropicalis. Embryos from both species were treated during gastrulation with the same antagonists of the FGF and Nodal pathways (). The results of these experiments suggest that increased connectivity is a general tendency of vertebrates compared to invertebrates, since we observed a similar distribution of connectivity in P. flava and amphioxus (P value 0.545, χ2 test), and a higher degree of connectivity in X. tropicalis (P values 0.006 and 0.048 against P. flava and amphioxus, respectively, χ2 test). Although the connectivity in X. tropicalis was lower than in zebrafish, in which the complexity of GRNs might have been further increased due to the extra round of WGD characterizing all teleost, this difference was not statistically significant (P value 0.196, χ2 test).The gain of complexity observed in the vertebrate species could be the product of the addition of new TF binding sites to the available regulatory regions during evolution or the generation of new regulatory regions. In order to shed light on this issue, we interrogated the seven regulatory regions known to be conserved between vertebrates and invertebrates (29), searching for TF-binding sites in the orthologous sequences of both zebrafish and amphioxus. We were unable to identify a significant increase in the number and/or complexity of the binding sites in the sequences of both species (). Pointing in the same direction, we found that newer regulatory regions presented a higher level of connectivity in zebrafish than more ancient enhancers (). Thus, according to these results, the gain in regulatory complexity in vertebrates might have been driven by the WGD-dependent increase in the number of regulatory regions. This hypothesis is coherent with the results of a previous study reporting a higher number of enhancers per gene in zebrafish when compared to amphioxus (6).Using already available single-cell RNA-seq data (30) in zebrafish, we observed that, interestingly, tissues that appeared as novelties during the invertebrate chordate–to-vertebrate transition, such as neural crest or sensory placodes, showed a statistically significant enrichment in the expression of highly connected genes, while in more evolutionary ancient tissues such as muscles or intestine, expression of highly and lowly connected genes was very similar (Fig. 4 and ). Finally, we used the Cytoscape tool (31) to visualize all connections between genes and the different signaling pathways in amphioxus and zebrafish (Fig. 4). This plot clearly showed that developmental GRNs associated with the four targeted signaling pathways were more interconnected in vertebrates.By comparing epigenomic and transcriptomic data in amphioxus, zebrafish, and other vertebrates, we recently showed that the invertebrate chordate–to–vertebrate transition was associated with an increase of regulatory information in the latter lineage and that this increase likely contributed to the spatial and functional specialization of some duplicated genes (6). Here, we go one step farther and demonstrate that some of the vertebrate CREs contribute to a more complex interconnection between the RA, Wnt, FGF, and Nodal signaling pathways. An increased interconnection between these four key developmental signaling pathways likely facilitated the restriction of the expression domains of some duplicated developmental genes, which, in turn, contributed to the increment of tissue complexity required to generate morphological novelties in vertebrates.
Methods
Animal Husbandry and Treatment of Embryos.
Zebrafish (Danio rerio) embryos were manipulated following the protocols that have been approved by the Ethics Committee of the Andalusian Government (license number 182–41106) and the national and European regulations established. All experiments with zebrafish were carried out in accordance with the principles of 3Rs (replacement, reduction, and refinement). Zebrafish embryos at 30% epiboly were treated by adding different drugs to the medium. In the case of SB505124, embryos were treated at the 256-cell stage (SB505124 is an inhibitor of transforming growth factor (TGF)-β type I receptors that blocks signaling through Nodal and other related TGF-β molecules; we will refer to this compound as Nodal inhibitor for simplicity). The final concentrations of drugs were all-trans retinoic acid (ATRA) 0.1 μM (cat no. R2625, Sigma-Aldrich Merck), SU5402 20 μM (cat no. SML0444, Sigma-Aldrich Merck), BIO 14 μM (cat no. B1686, Sigma-Aldrich Merck), and SB505124 30 μM (cat no. S4696, Sigma-Aldrich Merck). These molecules were dissolved in dimethyl sulfoxide (DMSO) (cat no. D2438, Sigma-Aldrich Merck). As a control experiment, the same volume of DMSO was added to the medium. Drugs were removed by changing the medium several times when embryos were at 80% epiboly stage. After that, embryos were carefully transferred to a glass Petri dish, and the chorion was removed using pronase.Amphioxus (Branchiostoma lanceolatum) adults were collected at the Racou beach in Argelès-sur-Mer (France). Spawning was induced as previously described (32, 33), and the fertilization of eggs was carried out in vitro. Amphioxus embryos were manipulated in filtered seawater unless otherwise specified. After fertilization, embryos were dechorionated in a Petri dish covered with agarose (0.8% agarose in filtered seawater) by pipetting them toward the border of the dish and then gently transferred to a small Petri dish. Drugs were dissolved in DMSO, and the following final concentrations were used: ATRA 1 μM (cat no. R2625, Sigma-Aldrich Merck), SU5402 25 μM (cat no. 572630, Sigma-Aldrich Merck), BIO 1 μM (cat no. B1686, Sigma-Aldrich Merck), and SB505124 50 μM (cat no. S4696, Sigma-Aldrich Merck). SB505124 was added to filtered seawater 3 h after fertilization, while the other drugs were added 5 h after fertilization. Control embryos were treated with 0.1% DMSO. At 15 h after fertilization, treated embryos were washed several times with filtered seawater to remove the drugs from the medium.Western clawed frog (X. tropicalis) embryos were treated with the antagonists for the FGF and Nodal pathways from late blastula (stage 9) to late gastrula stage (stage 12). Drugs were dissolved in DMSO, and the following final concentrations were used: 5 μM SU5402 (cat no. SML0444, Sigma-Aldrich Merck) and 100 μM SB505124 (cat no. S4696, Sigma-Aldrich Merck). Control embryos were treated with 0.1% DMSO. After treatment, embryos were washed with medium and subsequently used for total RNA extraction.P. flava adults were collected from Penghu Islands, Taiwan. Methods for spawning induction and embryo culture were described previously (34). P. flava embryos were treated with DMSO (control), SU5402 (20 μM) (35), or SB505124 (10 μM) at the late blastula stage (18 hpf) and collected at the late gastrula stage (43 hpf) for RNA-seq.
ATAC-seq.
ATAC-seq assays were performed in at least two biological replicates. Forty-five amphioxus embryos and 20 zebrafish embryos were dissociated into individual cells. After the number of cells was counted, around 70,000 cells were transferred to another tube to perform the experiment. ATAC-seq experiments were performed as previously described (6, 8). ATAC-seq data analyses were performed using standard pipelines (6, 8). Reads were aligned with Bowtie2 using GRCz10 (danRer10) and Bl71 (6) assemblies for zebrafish and amphioxus samples, respectively. Reads that were separated by more than 2 kb were filtered out of the analysis. The exact position of the Tn5 cutting site was determined as the position −4 (reverse strand) or +5 (forward strand) from the read start, and this position was extended 5 kb in both directions. BED files were transformed into BigWig using the wigToBigWig University of California, Santa Cruz (UCSC) tool. Reads were extended 100 bp in order to visualize the data in the UCSC Genome Browser (36). Macs2 (37) software was used to perform the peak calling in each sample, using the parameters –nomodel, –shift 45, and –extsize 100. The different called peaks of each sample were merged into a unique set of peaks, taking into account the replicates (two per sample). Using Bedtools (38), the number of reads per called peak and per sample in both treatment and control conditions was subsequently computed, and a differential analysis was performed using DESeq2 v1.18.0 in R 3.4.3 (39). A corrected P value <0.05 was set as cutoff for statistical significance of the differential accessibility of the chromatin in ATAC-seq peaks. Motif enrichment was calculated using the program FindMotifsGenome.pl from the Homer tool suite (40). k-means clustering of the ATAC-seq signal was performed using Deeptools 2.0 (41) and seqMiner (42). The assignment of ATAC-seq peaks to genes was done using the GREAT tool (43), with default parameters for basal plus extension regions calculation. We verified that the results were robust regardless of the peak-to-gene association method by comparing the GREAT association using the basal plus extension with an association to the closest genes. The overlap of genes associated by both methods was high (), and their functions were similar. Indeed, we calculated a GO enrichment overlap score of 0.94 out of 1 using the GOSemSim R package. GO analyses were carried out with the TopGO R package, using the elim test for selecting the most specific GO-enriched terms.
RNA-seq.
RNA-seq experiments were performed in three biological replicates for each species. RNA samples were extracted from 15 zebrafish, 100 amphioxus, or 20 western clawed frog embryos following previously published protocols (6). For the data analysis, reads were aligned against GRCz10 (danRer10) and Bl71 assemblies using STAR v2.5.3a (44) and were assigned to genes using the HTSeq toolkit v0.11.2 (45). In the case of P. flava and X. tropicalis, reads were pseudoaligned to the set of longest isoforms of the currently available transcriptome of P. flava (46) and to the set of primary coding DNA sequences of the XenTrop v10.0 transcriptome (47), respectively, using Kallisto with standard parameters (48). Differential gene expression analyses were performed using DESeq2 v1.18.0 (39). A corrected P value <0.05 and an absolute log2 fold change (FC) >1 were used as thresholds for calling differential genes. The enrichment of biological process GO terms was calculated using the TopGO R package. Gene clustering was performed using the Pheatmap 1.0.12 R package, with k-means as clustering method. The integration of differential ATAC-seq peaks and differential expressed genes was performed at the gene level in both species. The gene lists that resulted from these analyses were intersected for a given treatment to identify genes with a differential RNA-seq signal and one or more associated differential ATAC-seq peaks. Since there is currently no functional annotation available for the amphioxus genes, the orthologous zebrafish genes were used for completing the GO term enrichment analysis. Amphioxus versus zebrafish orthologous genes were already available from previous studies (6) (Dataset S7).
Connectivity Analyses.
In order to categorize genes that respond to more than one treatment, we computed a connectivity score based exclusively on genes responding at both the ATAC-seq and the RNA-seq levels. Each gene was assigned a discrete score ranging from 1 to 4 and corresponding to the number of different treatments that it responded to. Cytoscape (31) networks were generated to better represent the connectivity of responsive genes. The connectivity tables are provided in Dataset S7. Statistical analyses were performed using the programming language R.
Single-Cell RNA-seq (scRNA-seq) Analysis.
We selected genes with high connectivity (≥3) and low connectivity (=1) from the previously published scRNA-seq zebrafish developmental atlas (30) and explored their expression levels in a set of ancient and vertebrate-specific novel tissues. A gene was defined as expressed in a cell if its normalized expression in that cell was greater than 0. For a given tissue, we calculated the proportions of cells that expressed a certain gene and aggregated these proportions for genes with high or low connectivity scores in a boxplot. To determine which genes were expressed in each tissue, we defined a threshold of a minimum of 5% of the cells of that specific tissue. Subsequently, for a direct comparison between a set of ancient tissues (present in both zebrafish and amphioxus) and a set of vertebrate-specific tissues, we computed—and represented in barplots—the ratios between genes with high connectivity scores and genes with low connectivity scores expressed in a given tissue. Wilcoxon test (P < 0.05) was used in order to look for statistically significant differences.
Authors: Alexander Dobin; Carrie A Davis; Felix Schlesinger; Jorg Drenkow; Chris Zaleski; Sonali Jha; Philippe Batut; Mark Chaisson; Thomas R Gingeras Journal: Bioinformatics Date: 2012-10-25 Impact factor: 6.937
Authors: Sven Heinz; Christopher Benner; Nathanael Spann; Eric Bertolino; Yin C Lin; Peter Laslo; Jason X Cheng; Cornelis Murre; Harinder Singh; Christopher K Glass Journal: Mol Cell Date: 2010-05-28 Impact factor: 17.970
Authors: Jason D Buenrostro; Paul G Giresi; Lisa C Zaba; Howard Y Chang; William J Greenleaf Journal: Nat Methods Date: 2013-10-06 Impact factor: 28.547
Authors: Michael Hiller; Saatvik Agarwal; James H Notwell; Ravi Parikh; Harendra Guturu; Aaron M Wenger; Gill Bejerano Journal: Nucleic Acids Res Date: 2013-06-27 Impact factor: 16.971
Authors: Fidel Ramírez; Devon P Ryan; Björn Grüning; Vivek Bhardwaj; Fabian Kilpert; Andreas S Richter; Steffen Heyne; Friederike Dündar; Thomas Manke Journal: Nucleic Acids Res Date: 2016-04-13 Impact factor: 16.971
Authors: Eugenio Azpeitia; Stalin Muñoz; Daniel González-Tokman; Mariana Esther Martínez-Sánchez; Nathan Weinstein; Aurélien Naldi; Elena R Álvarez-Buylla; David A Rosenblueth; Luis Mendoza Journal: Sci Rep Date: 2017-02-10 Impact factor: 4.379
Authors: Alejandro Gil-Gálvez; Sandra Jiménez-Gancedo; Alberto Pérez-Posada; Martin Franke; Rafael D Acemel; Che-Yi Lin; Cindy Chou; Yi-Hsien Su; Jr-Kai Yu; Stephanie Bertrand; Michael Schubert; Héctor Escrivá; Juan J Tena; José Luis Gómez-Skarmeta Journal: Proc Natl Acad Sci U S A Date: 2022-03-09 Impact factor: 12.779