Literature DB >> 34145931

The genomics of mimicry: Gene expression throughout development provides insights into convergent and divergent phenotypes in a Müllerian mimicry system.

Adam M M Stuckert^1,2, Mathieu Chouteau³, Melanie McClure³, Troy M LaPolice¹, Tyler Linderoth⁴, Rasmus Nielsen⁴, Kyle Summers², Matthew D MacManes¹.

Abstract

A common goal in evolutionary biology is to discern the mechanisms that produce the astounding diversity of morphologies seen across the tree of life. Aposematic species, those with a conspicuous phenotype coupled with some form of defence, are excellent models to understand the link between vivid colour pattern variations, the natural selection shaping it, and the underlying genetic mechanisms underpinning this variation. Mimicry systems in which multiple species share the same conspicuous phenotype can provide an even better model for understanding the mechanisms of colour production in aposematic species, especially if comimics have divergent evolutionary histories. Here we investigate the genetic mechanisms by which vivid colour and pattern are produced in a Müllerian mimicry complex of poison frogs. We did this by first assembling a high-quality de novo genome assembly for the mimic poison frog Ranitomeya imitator. This assembled genome is 6.8 Gbp in size, with a contig N50 of 300 Kbp R. imitator and two colour morphs from both Ranitomeya fantastica and R. variabilis which R. imitator mimics. We identified a large number of pigmentation and patterning genes that are differentially expressed throughout development, many of them related to melanocyte development, melanin synthesis, iridophore development and guanine synthesis. Polytypic differences within species may be the result of differences in expression and/or timing of expression, whereas convergence for colour pattern between species does not appear to be due to the same changes in gene expression. In addition, we identify the pteridine synthesis pathway (including genes such as qdpr and xdh) as a key driver of the variation in colour between morphs of these species. Finally, we hypothesize that genes in the keratin family are important for producing different structural colours within these frogs.

Entities: Chemical

Keywords: Dendrobatidae; Ranitomeya; amphibians; aposematism; colour pattern; colour production

Mesh：

Year: 2021 PMID： 34145931 PMCID： PMC8457190 DOI： 10.1111/mec.16024

Source DB: PubMed Journal: Mol Ecol ISSN： 0962-1083 Impact factor: 6.622

INTRODUCTION

The diversity of animal coloration in the natural world has long been a focus of investigation in evolutionary biology (Beddard, 1892; Fox, 1936; Gray & McKinnon, 2007; Longley, 1917). Colour phenotypes can be profoundly shaped by natural selection, sexual selection, or both. Further, these colour phenotypes are often under selection from multiple biotic (e.g., competition, predation) and abiotic (e.g., temperature, salinity) factors (Rudh & Qvarnström, 2013). The mechanisms underlying colour and pattern phenotypes are of general interest because they can help explain the occurrence of specific evolutionary patterns, particularly in systems where these phenotypes embody key adaptations driving biological diversification. Adaptive radiations in aposematic species (those species which couple conspicuous phenotypes with a defence) provide examples of the effects of strong selection on such phenotypes (Kang et al., 2017; Ruxton et al., 2004; Sherratt, 2006, 2008). In these biological systems, geographically heterogeneous predation produces rapid diversification of colour and pattern within a species or group of species. This produces a diversity of polytypic phenotypes (defined as distinct defensive warning colour signals in distinct localities) that are maintained geographically, with each population characterized by a unique phenotype that deters predators. This spatial mosaic of local adaptations maintained by the strong stabilizing selection exerted by predators also results in convergence of local warning signals in unrelated species. Examples of impressive diversification within species and mimetic convergence between species have been documented in many biological systems, including Heliconius butterflies (Mallet & Barton, 1989), velvet ants (Wilson et al., 2015), millipedes (Marek & Bond, 2009) and poison frogs (Stuckert, Saporito, et al., 2014; Stuckert, Venegas, et al., 2014; Symula et al., 2001; Twomey et al., 2013), to name only a few of the documented examples of diverse aposematic phenotypes (Briolat et al., 2019). Understanding the genomic architecture underpinning these diversification events has been of substantial importance to the field of evolutionary biology (Hodges & Derieg, 2009). Hence, it is essential to examine these diverse colour pattern phenotypes and determine the mechanisms by which both divergence and convergence occurs at the molecular level. The majority of our knowledge of the genomics of warning signals and mimicry comes from butterflies of the genus Heliconius. In these insects, a small number of key genetic loci of large phenotypic effect control the totality of phenotypic variation observed within populations of the same species. These also provide the genetic underpinnings for mimetic convergence between distinct species, such as WntA (Martin et al., 2012) and optix (Reed et al., 2011; Supple et al., 2013), though there are many others probably involved as well (Kronforst & Papa, 2015). While Heliconius butterflies are excellent subjects for the study of mimicry, characterizing the genetics of mimicry in a phylogenetically distant system is critically important to determining whether adaptive radiations in warning signals and mimitic relationships are generally driven by a handful of key loci. Furthermore, given the dramatic differences in life history and social behaviour between these clades, studying aposematism and mimicry in Ranitomeya is likely to provide general and important insights into convergent mimetic phenotypes and the evolutionary processes that produce them. Here we investigate the genetics of convergent colour phenotypes in Ranitomeya from northern Peru. In this system, the mimic poison frog (Ranitomeya imitator, Dendrobatidae; Schulte, 1986) underwent a rapid adaptive radiation, such that it mimics established congenerics (Ranitomeya fantastica, R. summersi and R. variabilis), thereby “sharing” the cost of educating predators—a classic example of Müllerian mimicry (Stuckert, Saporito, et al., 2014; Stuckert, Venegas, et al., 2014; Symula, Schulte, & Summers, 2001, 2003). This is a powerful system for the evolutionary study of colour patterns, as the different R. imitator colour morphs have undergone an adaptive radiation to converge on shared phenotypes with the species that R. imitator mimics. Furthermore, despite the arrival of new genomic data that provide critical insights into the mechanisms of colour production in amphibians in general (Burgon et al., 2020), and poison frogs in particular (Rodríguez et al., 2020; Stuckert et al., 2019; Twomey et al., 2020), our knowledge of amphibian genomics is critically behind that of other tetrapods, both in terms of genomic resources as well as in our ability to make inferences from these resources. This is largely due to the challenging nature of these genomes, most of which are extremely large (frog genomes range from 1 to 11 Gbp, with an interquartile range of 3–5 Gbp; Funk et al., 2018) and rich with repeat elements that have proliferated throughout the genome (Rogers et al., 2018). This makes many amphibians a nearly intractable system for in‐depth genomic analyses (Funk et al., 2018; Rogers et al., 2018). As a result, there is a relative dearth of publicly available amphibian genomes (14 anuran species as of September 1, 2020). In fact, many of the available genomes are from a single group of frogs with a genome size of <1 Gbp, which is on the lower bound of known amphibian genome sizes (Funk et al., 2018). We investigated the genetics of Müllerian mimicry by first generating a high‐quality 6.8‐Gbp de novo genome assembly for the mimic poison frog, R. imitator (Figure 1). This is an important new resource for amphibian biologists, as it fills a substantial gap in the phylogenetic distribution of available amphibian genomes and enables for more detailed comparative work. A comparison between our R. imitator genome and the Oophaga pumilio genome provides insights into genome evolution within the family Dendrobatidae, particularly the proliferation of repeat content. We highlight these results in this paper. We then utilized this high‐quality Ranitomeya imitator genome to examine gene expression patterns using RNA sequencing of skin tissue from early tadpole development all the way through to the end of metamorphosis in both the mimic (R. imitator) and model species (R. fantastica and R. variabilis). As such, we were able to keep track of patterns of expression in genes responsible for colour throughout development both between colour morphs and between species. We aimed to identify the genes responsible for colour patterning that are convergent between model and mimic, as well as those genes whose role in colour and pattern may be species‐specific or even population‐specific. Colour patterns within these species begin to appear early during development when individuals are still tadpoles, which is consistent with observations that chromatophores (the structural elements in the integument that contain pigments) develop from the neural crest early during embryonic development (DuShane, 1935). This comparative genomic approach allowed us to carefully examine the genes and gene networks responsible for diversification of colour patterns and mimicry in poison frogs.

FIGURE 1

Normative photographs of frogs from the populations used in this study. The centre box represents the four mimetic colour morphs of Ranitomeya imitator. The left exterior box represents the two morphs of R. variabilis that R. imitator mimics, and the rightmost box represents two colour morphs of R. fantastica that R. imitator have a convergent phenotype with. R. imitator photos by A.S., R. fantastica and R. variabilis photos by M.C.

METHODS

Permits

Ranitomeya imitator

Animal use and research comply with East Carolina University’s IACUC (AUP #D281) and the University of New Hampshire’s IACUC (AUP #180705).

Ranitomeya fantastica and R. variabilis

The protocol for R. fantastica and R. variabilis sample collection was approved by the Peruvian Servicio Forestal y de Fauna Silvestre through authorization no. 232‐2016‐ SERFOR/DGGSPFFS and export permit no. 17PE 001718 and authorization from the French Direction de l’Environnement, de l’Agriculture, de l’Alimentation et de la forêt en Guyane no. 973‐ND0073/SP2000116‐13.

Generating a de novo genome for Ranitomeya imitator

Genome sequencing approach

Because all sequencing technologies are known to be biased in both known and unknown ways, we utilized a variety of sequencing technologies to assemble this complex and large genome. At the time of genome construction, both sequencing technologies and assembly algorithms were undergoing rapid change, and as such the generation of an optimal assembly required substantial trial and error. For instance, several of the authors of this paper were involved in the strawberry poison frog genome assembly (Rogers et al., 2018) and the difficulties encountered in the attempt to assemble that genome made it clear that short read data alone are insufficient if the goal is to assemble a highly contiguous and complete genome. To overcome the limitations associated with short read data, we collected linked and long read data from technologies thought to be complementary to one another (Illumina 10X, Oxford Nanopore and Pacific Biosystems).

10X chromium

A single, probably male, subadult R. imitator of the “intermedius” morph from the amphibian pet trade was used to produce a 10X library. This individual had no visible ovaries, and sex chromosomes in poison frogs are currently uncharacterized and therefore are not a mechanism used to identify sex. This frog was euthanized and high‐molecular‐weight DNA was extracted from liver tissue using the QIAGEN Blood & Cell Culture DNA Kit. A 10X Genomics Chromium Genome library (Weisenfeld et al., 2017) was prepared by the DNA Technologies and Expression Analysis Cores at the University of California Davis Genome Center and sequenced on an Illumina HiSeq X by Novogene Corporation (AB Mudd, DS Rokhsar, K Summers, and R Nielsen, unpubl. data).

Long read (Oxford Nanopore and Pacific BioSciences) sequencing

Captive bred subadult frogs from the pet trade that originated from the region near Tarapoto, Peru (green‐spotted morph, Figure 1) were euthanized and the skin and gastrointestinal tract were removed in order to reduce potential contamination from skin and gut microbial communities. To obtain the recommended mass of tissue for genomic DNA extraction, each frog was dissected into eight approximately equal chunks of tissue from the remaining portions of the whole‐body and DNA was extracted using a Qiagen Genomic Tip extraction kit. DNA concentration was quantified with a qubit 3.0 and fragment length was assessed with a TapeStation using a D1000 kit. For Nanopore sequencing we prepared libraries for direct sequencing via Oxford Nanopore using an LSK‐109 kit. Samples were loaded onto either R9 or R10 flowcells, which yielded minimum throughput. We basecalled raw fast5 files from Nanopore sequencing using the “read_fast5_basecaller.py” script in the ONT Albacore Sequencing Pipeline Software version 2.3.4. For Pacific Biosystems (PacBio) sequencing we used a Circulomics short‐read eliminator (Circulomics Inc.) kit to size‐select extracted DNA from 10 kb progressively up to 25 kb. After this, we sent ~15 µg of high‐molecular‐weight DNA to the Genomics Core Facility in the Icahn School of Medicine at Mt. Sinai (New York, USA) for library preparation and sequencing. Here libraries were prepared with 20‐ to 25‐kb inserts and were sequenced on three SMRTcell 8 M cells on a Pacific Biosciences Sequel II.

Genome assembly

We took a multifaceted approach to constructing the R. imitator genome, which contained iterative scaffolding steps. We detail our general approach here, and have a graphical depiction of this in Figure 2. PacBio has an algorithm to classify subreads that pass their minimum quality expectations for subreads as “good” subreads, and all subsequent analyses used only those reads passing this threshold. We used the contig assembler wtdbg2 version 2.5 (Ruan & Li, 2019) to create our initial assembly and create consensus contigs using only the PacBio data. wtdbg2 uses a fuzzy de bruijn graph approach to assemble reads into contigs. Since PacBio reads have, on average, lower per‐base accuracy than Illumina reads, we took several steps to correct individual bases in our assembly. To do this we first mapped the Illumina short reads to our assembly using bwa version 0.7.17‐r1188 (Li & Durbin, 2009). Polishing was then conducted using the software pilon version 1.22 (Walker et al., 2014). We used a flank size of five bases (thus ignoring the last five well‐aligned bases on either end of the alignment) and fixed both bases and gaps with a minimum size of one base.

FIGURE 2

Flowchart of our genome assembly approach. Colours of boxes represent the type of data used in that step (see internal key) and italicized font indicates the program(s) used

Flowchart of our genome assembly approach. Colours of boxes represent the type of data used in that step (see internal key) and italicized font indicates the program(s) used We then used arcs version 3.82 (Coombe et al., 2018) to scaffold our existing assembly using Illumina 10X data. Prior to doing this, we used the “basic” function within the program longranger (Marks et al., 2019) to trim, error correct and identify barcodes in the 10X data. We then used the “arks.mk” makefile provided in the arcs GitHub page (https://github.com/bcgsc/arcs/blob/master/Examples/arcs‐make), to run the arcs software. This makefile aligns the barcoded 10X data from longranger using bwa, then runs arcs to scaffold our assembly. After scaffolding our assembly with the 10X data, we ran one round of cobbler version 0.6.1 (Warren, 2016) to gap‐fill regions of unknown nucleotides (depicted as “N”s) in our assembly. Our input data were the full set of our PacBio and Nanopore data, which was mapped to our assembly using minimap2 version 2.10‐r761 (Li, 2018) with no preset for sequencing platform. We then ran rails version 5.26.1 (Warren, 2016) to scaffold again, with the same input data as for cobbler. We aligned the long‐read data with minimap2 because it maps a higher proportion of long reads than other aligners and we used bwa because it is more accurate for short‐read data (Li, 2018). After this we polished our assembly again by mapping Illumina reads with bwa and pilon. We then attempted to fill in gaps within our assembly using lr gapcloser (Xu et al., 2019). We ran this program iteratively, first using the Nanopore data and specifying the “‐s n” flag. We then filled gaps using lr gapcloser with PacBio reads and the “‐s p” flag. Following this, we again polished our assembly using bwa and pilon. We examined genome quality in two main ways. First, we examined the presence of genic content in our genome using Benchmarking Universal Single‐Copy Orthologs version 3.0.0 (busco; Simão et al., 2015) using the tetrapod database (tetrapoda_odb9; 2016‐02‐13). busco version 4.0 was released during this project, so we also examined our final assembly with busco version 4.0.6 and the tetrapod database (tetrapoda_odb10; 2019‐11‐20). Our second method of genome examination was genome contiguity. We used the assemblathon perl script (https://github.com/KorfLab/Assemblathon/blob/master/assemblathon_stats.pl) to calculate overall numbers of scaffolds and contigs as well as contiguity metrics such as N50 for both.

Repeats and genome annotation

We modelled genomic repeats with repeat modeler version 1.0.8 (Smit & Hubley, 2008–2015) using RepBase database 20170127. We used the classified consensus output from repeat modeler as input to repeat masker version 4.1.0 (Smit et al., 2013–2015) with the Homo sapiens database specified. We used a combined database of Dfam_3.1 and RepBase‐20181026 (Bao et al., 2015). We annotated our genome using maker version 3.01.02 (Campbell et al., 2014). We used transcript evidence from R. imitator to aid in assembly (“est2genome=1”). We produced transcript evidence from gene expression data of various R. imitator tissues and populations; for additional details on transcript evidence used to annotate this genome please see the Supplemental Methods. We annotated the genome using both the draft genome as well as a repeat masked genome from repeat modeler and repeat masker in order to identify the best annotation of our genome. We used busco version 4.0.6 (Simão et al., 2015) on the final version of the R. imitator genome hto locate putative duplicated orthologues within the genome. busco searches a database of universal single copy orthologues to measure genic completeness of the genome. busco identifies whether these single copy orthologues are present in the genome and if they are complete or fragmented. Additionally, it identifies whether these orthologues are present in a single copy or duplicated in the genome. Given the high proportion of duplicated orthologues present in our genome, we proceeded to investigate the read support for each of these duplicated regions across our sequencing technologies. We aligned PacBio data using minimap2 version 2.10 (Li, 2018). After we aligned the data, we used samtools version 1.10 (Li et al., 2009) to generate the depth at each base in the genome for data from each sequencing platform. We wrote a custom Python script (see code availability statement for link to code repository) to extract the depth per base within the duplicated regions of the genome. Additionally, we extracted the genic regions identified as single copy genes by busco in the same fashion. This allowed us to examine if there were any discrepancies between the duplicates and the rest of the genomic sequence. For each region in both the single copy and duplicated genes we calculated mean and standard deviation. We identified regions that were outliers in sequencing depth by identifying genes that had average sequencing depth that was outside of the average genome‐wide sequencing depth plus two standard deviations, or below 10× coverage. We did this because average coverage varied dramatically at a per‐base scale (PacBio genome‐wide coverage: average = 34.5 ± 79.1 SD). We view this as a method of detecting duplicated regions that have plausibly been incorrectly assembled (i.e., spuriously collapsed or duplicated), and is not completely definitive.

Identification of colour pattern candidate genes

Gene expression sample preparation

Samples were prepared differently for the mimic (R. imitator) and the model species (R. fantastica and R. variabilis). During the course of our work, we discovered that there were multiple groups approaching the same questions using collected samples from different species, but at slightly different time points. In light of this, we chose to combine our efforts into a single paper in an attempt at making broader inferences. We acknowledge this, and as a result, the data in this paper are analysed in a manner concordant with these differences. The initial breeding stock of R. imitator was purchased from Understory Enterprises. Frogs used in this project represent captive‐bred individuals sourced from the following wild populations going clockwise from top left in Figure 1: Tarapoto (green‐spotted), Sauce (orange‐banded), Varadero (redheaded) and Baja Huallaga (yellow‐striped). Tadpoles were sacrificed for analyses at 2, 4, 7 and 8 weeks of age. We sequenced RNA from a minimum of three individuals at each time point from the Sauce, Tarapoto and Varadero populations (except for Tarapoto at 8 weeks), and two individuals per time point from the Huallaga population. Individuals within the same time points were sampled from different family groups (Appendix Table 1). Tadpoles were anaesthetized with 20% benzocaine (Orajel), then sacrificed via pithing. Whole skin was removed and stored in RNA later (Ambion) at −20°C until RNA extraction. Whole skin was lysed using a BeadBug (Benchmark Scientific), and RNA was then extracted using a standardized Trizol protocol. RNA was extracted from the whole skin using a standardized Trizol protocol, cleaned with DNAse and RNAsin, and purified using a Qiagen RNEasy mini kit. RNA Libraries were prepared using standard poly‐A tail purification with Illumina primers, and individually barcoded using a New England Biolabs Ultra Directional kit as per the manufacturer's protocol. Individually barcoded samples were pooled and sequenced using 50‐bp paired‐end reads on three lanes of the Illumina HiSeq 2500 at the New York Genome Center.

Ranitomeya fantastica and Ranitomeya variabilis

We set up a captive colony in Peru (see Appendix) consisting of between six and 10 wild collected individuals per locality. We raised the tadpoles on a diet consisting of a 50/50 mix of powdered spirulina and nettle, which they received five times a week. Tadpoles were raised individually in 21‐oz plastic containers, within outside insectaries covered with 50% shading cloth, and water change was performed with rainwater. Three tadpoles per stage (1, 2, 5, 7 and 8 weeks after hatching; see Appendix Table 1) were fixed in an RNA later (Ambion) solution. To do so, tadpoles were first euthanized in a 250 mg/L benzocaine hydrochloride bath, then rinsed with distilled water before the whole tadpole was placed in RNA later and stored at 4°C for 6 h before being frozen at −20°C for long‐term storage. Before RNA extraction, tadpoles were removed from RNA later and the skin was dissected off. Whole skin was lysed using a Bead Bug, and RNA was then extracted using a standardized Trizol protocol. RNA libraries were prepared using standard poly‐A tail purification, prepared using Illumina primers, and individually dual‐barcoded using a New England Biolabs Ultra Directional kit. Individually barcoded samples were pooled and sequenced on four lanes of an Illumina HiSeq X at NovoGene. Reads were paired end and 150 bp in length.

Differential gene expression

We indexed our new R. imitator genome using star version 2.5.4a (Dobin et al., 2013). We removed adaptor sequences from reads using trimmomatic version 0.39 (Bolger et al., 2014). We then aligned our trimmed reads to our genome using star version 2.5.4a (Dobin et al., 2013), allowing 10 base mismatches (‐‐outFilterMismatchNmax 10), a maximum of 20 multiple alignments per read (‐‐outFilterMultimapNmax 20) and discarding reads that mapped at <50% of the read length (‐‐outFilterScoreMinOverLread 0.5). We then counted aligned reads using htseq‐count version 0.11.3 (Anders et al., 2015). Differential expression analyses were conducted in R version 3.6.0 (Team, 2019) using the package DESeq2 (Love et al., 2014). Some genes in our annotated genome are represented multiple times, and thus the alignment is nearly to the gene level with some exceptions. As a result, when we imported data into R we corrected for this by merging counts from htseq‐count into a gene‐level count. We filtered out low‐expression genes by removing any gene with a total experiment‐wide expression level ≤50 total counts. cDNA libraries for R. imitator were sequenced at a different core facility than those of R. fantastica and R. variabilis, so in order to statistically account for batch effects we analysed the data from each species independently (combining all species and batch effects in our data set produces rank deficient models). For each species we compared two models using likelihood ratio tests, one which tested the effect of colour morph and the other which tested the effect of developmental stage. Both models included sequencing lane, developmental stage and colour morph as fixed effects. We used a Benjamini and Hochberg (1995) correction for multiple comparisons and used an alpha value of .01 for significance. We then extracted data from our models for particular a priori colour genes that play a role in colour or pattern production in other taxa. This a priori list was originally used in Stuckert et al. (2019), but was updated by searching for genes that have been implicated in coloration in genomics studies from the last 3 years. Plots in this paper were produced using ggplot2 (Wickham, 2011). Finally, we ran an analysis with the specific intent of identifying genes involved in the production of different colour morphs that are convergent between model (R. fantastica or R. variabilis) and mimic (R. imitator). To do this we conducted a Walds test within species between the spotted and striped morph of R. imitator and R. variabilis that incorporated sequencing lane, tadpole age and colour morph as fixed effects. We then identified the set of genes that are differentially expressed between colour morphs in both species, as well as those that showed species‐specific patterns. We did this the same within species comparison using the banded and redheaded morphs of R. imitator and R. fantastica. To further elucidate potential genes that may influence convergent and divergent phenotypes in multiple species, we examined the list of all differentially expressed genes between colour morphs in R. imitator (via the likelihood ratio test described above) for genes that are differentially expressed between colour morphs in R. fantastica or R. variabilis (via the Walds tests we conducted).

Weighted gene correlation network analysis

To identify networks of genes that interact in response to differences in colour morphs, species or developmental stage, we used weighted gene co‐expression network analysis (WGCNA), an approach used for finding clusters of transcripts with highly correlated expression levels and for relating them to phenotypic traits, using the package WGCNA (Langfelder & Horvath, 2008). Although WGCNA names modules after colours, these name designations are unrelated to any phenotype in our data. For the WGCNA analysis, we used the variance stabilizing transformed data produced by deseq2 at this stage. WGCNA requires filtering out genes with low expression, which we did prior to all differential expression analyses. We estimated a soft threshold power (β) that fits our data, by plotting this value against mean connectivity to determine the minimum value at which mean connectivity asymptotes, which represents scale‐free topology. For our data, we used β = 10, the recommended minimum module size of 30, and we merged modules with a dissimilarity threshold <0.25. We then tested whether the eigenmodules (conceptually equivalent to a first principal component of the modules) were correlated with colour morphs, species or developmental stage at p < .05. While there may be many modules of co‐expressed genes, only a few are correlated with any given phenotype; these are the modules of particular interest to us. To examine gene ontology of these modules we then took module membership for each gene within these modules and ranked them in decreasing order of module membership. We then supplied this single ranked list of module membership to the Gene Ontology enRIchment anaLysis and visuaLizAtion tool (GOrilla), and ran it in fast mode using Homo sapiens set as the organism (Eden et al., 2009).

RESULTS

Ranitomeya imitator genome

Our initial contig assembly was 6.77 Gbp in length, had a contig N50 of 198,779 bp, and contained 92.3% of expected tetrapod core genes after polishing with pilon. After this initial assembly, we conducted two rounds of scaffolding and gap‐filling our genome using our data. Our final genome assembly was 6.79 Gbp in length and consisted of 73,158 scaffolds ranging from 1,019 to 59,494,618 bp in length with a scaffold N50 of 397,629 bp (77,639 total contigs with an N50 of 301,327 bp, ranging from 1,019 to 59,494,618 bp; see Figure 3). A total of 8,149 contigs were placed into scaffolds by our iterative scaffolding and gap‐filling, and on average scaffolds contain 1.1 contigs. Based on our busco analysis, the final genome contained 93.0% of expected tetrapod genes. We assembled 69.8% single copy orthologues and 23.2% duplicated orthologues. An additional 1.8% were fragmented and 5.2% were missing (see Table S1). The predicted transcriptome from maker had 79.1% of expected orthologues, 59.4% of which were single copy and complete, 19.7% of which were duplicated, 7.5% of which were fragmented and 13.4% missing.

FIGURE 3

Summary figures of genome assembly. (a) Distribution of scaffolds by length. (b) The cumulative proportion of the genome represented by scaffolds. Scaffolds were ordered from longest to shortest prior to plotting, and therefore in this panel the longest scaffolds are represented in the left portion of the figure. (c) Violin plot of the distribution of scaffold lengths in the genome. (d) Violin plot of the length of transcripts in the genome as predicted by maker. Red lines represent N50 in (a), (c) and (d) and L50 in (b)

Repetitive elements

Our analyses indicate a high proportion of repeats in the R. imitator genome (see Figure 4). repeat modeler masked 54.14% of total bases in the genome, of which 50.3% consisted of repeat elements (see Table 1). repeat masker was unable to classify most of the masked repeat bases (1.7 Gbp representing 25.67% of the genome). Many of the remaining repeats were retroelements (18.36%), nearly 10% of which were LINEs. Just under 8.5% were LTR elements, including 7.45% GYPSY/DIRS1 repeats. Given the quality of repeat databases and the scarcity of amphibian genomic resources in these databases, our results probably represent an underrepresentation of repeats in the genome as a whole. A fairly large proportion of the genome’s repeat elements are likely to be unassembled and missing in the genome, contributing to our underestimation of repeat content. Additionally, many of these repeat classes were long repeats (see Figure 5). For example, the average length of repeats was >1,000 bp in L1 (1,093.4 ± 1,490.5), L1‐Tx1 (1,417.3 ± 1,372.3), Gypsy (1,349.0 ± 1,619.3) and Pao (1,254.9 ± 1,385.8) repeat elements. Summary statistics on the number of instances, range, average length and standard deviation of range can be found in Table S2.

FIGURE 4

Pie chart of the contents of the Ranitomeya imitator genome. Much of the genome is classified as repeat elements, although we were unable to identify many of those

TABLE 1

Repeat elements classified by repeat masker. Most repeats fragmented by insertions or deletions were counted as a single element

Element type	Number of elements	Total length of elements (bp)	Percentage of genomic sequence
Retroelements	1,108,460	1,247,087,328	18.36
SINEs	5,440	428,468	0.01
Penelope	60,001	32,685,796	0.48
LINEs	655,737	671,087,603	9.88
CRE/SLACS	0	0	0
L2/CR1/Rex	241,795	194,226,476	2.86
R1/LOA/Jockey	0	0	0
R2/R4/NeSL	1,089	644,862	0.01
RTE/Bov‐B	8,249	5,177,385	0.08
L1/CIN4	343,520	436,013,718	6.42
LTR elements	447,283	575,571,257	8.47
BEL/Pao	3,504	2,966,778	0.04
Gypsy/DIRS1	340,203	505,961,953	7.45
Retroviral	85,837	44,644,563	0.66
DNA transposons	703,907	423,320,187	6.23
hobo‐Activator	144,891	93,261,380	1.37
Tc1‐IS630‐Pogo	504,231	296,915,689	4.37
En‐Spm	0	0	0
MuDR‐IS905	0	0	0
PiggyBac	10,239	8,260,072	0.12
Tourist/Harbinger	19,834	1,4807,248	0.22
Other (Mirage, P‐element, Transib)	0	0	0
Rolling‐circles	4,811	2,310,034	0.03
Unclassified	560,5837	1,743,383,014	25.67
Total interspersed repeats		3,413,790,529	50.26
Small RNA	0	0	0
Satellites	0	0	0
Simple repeats	193,0599	246,155,996	3.62
Low complexity	211,720	14,981,007	0.22

FIGURE 5

A violin plot of the length of individual repeats in the Ranitomeya imitator genome. Each point represents a single instance of the repeat

Pie chart of the contents of the Ranitomeya imitator genome. Much of the genome is classified as repeat elements, although we were unable to identify many of those Repeat elements classified by repeat masker. Most repeats fragmented by insertions or deletions were counted as a single element A violin plot of the length of individual repeats in the Ranitomeya imitator genome. Each point represents a single instance of the repeat A large proportion of tetrapod orthologues (i.e., those genes in busco’s tetrapoda database) are present in duplicate copies that are spread throughout scaffolds in the genome (23.2%). On average, scaffolds had 1.92 ± 2.93 duplicated orthologues, with a median and mode of 1. When normalized per 10,000 bp, the average was 0.060 ± 0.108 SD duplicated orthologues per 10,000 bp of a scaffold, with a median of 0.032. Intriguingly, the majority of orthologues identified as single copies by busco had average coverage greater than the average genome‐wide coverage. Duplicated orthologues, on the other hand, had coverage lower than the average genome‐wide coverage. Single copy orthologues had an average coverage of 60.5 ± 29.3 SD, whereas duplicated orthologues had an average coverage of 41.6 ± 31.1 SD, and a two‐tailed Student’s t test revealed a significant difference in coverage between single and duplicate copy orthologues (t 5094 = 23.965, p < .0001; see Figure S1).

Gene expression

We aligned an average of 20.97 million reads (±6.7 SD) per sample. On average, 73.27% of reads were uniquely mapped (±4.75% SD). Mapping rates were slightly higher in R. imitator than in R. fantastica or R. variabilis, because the libraries were of slightly higher quality in R. imitator. To further test if this was an artefact derived from mapping reads from other species to the R. imitator genome, we also mapped these reads to species‐specific transcriptome assemblies. We assembled these using data for each species in this study using the Oyster River Protocol (MacManes, 2018); additional details on this protocol can be found in the Appendix. After mapping our read data to these species‐specific transcriptome assemblies, we found similar results to that of our genome‐guided mapping. Thus, our mapping rates are driven primarily by the slightly lower quality of the R. fantastica and R. variabilis cDNA libraries (as exhibited by slightly lower Phred scores towards the 3’ end of reads) and not by species‐specific differences in coding regions. For data on the number of reads and mapping rates in each sample Table S3. All gene expression count data can be found in the GitHub repository (https://github.com/AdamStuckert/Ranitomeya_imitator_genome/tree/master/GeneExpression/data). Patterns of gene expression are largely driven by developmental stage (principal component 1; 43% of the variation) and species (principal component 2; 13% of the variance; see Figure 6). We note that differences in sample preparation and sequencing localities between R. imitator and R. fantastica/variabilis may be driving a portion of the pattern in principal component 2. However, the pattern found in principal component 2 closely parallels phylogeny, as R. fantastica and R. variabilis are more closely related to each other than either are to R. imitator (Brown et al., 2011). We then conducted a test for the effect of colour morph and developmental stage for each species independently. For a list of all differentially expressed colour genes see Table S4. For a list of all differentially expressed genes at alpha <.01 and alpha <.05 see Tables S5 and S6 respectively.

FIGURE 6

Principal component analysis of gene expression. The axes are labelled with the proportion of the data explained by principal components 1 and 2

Between developmental stage comparisons

In our comparison of developmental stages we found many differentially expressed genes (q < .01) in each species (R. imitator = 2,039, R. fantastica = 2,247, R. variabilis = 2,734; Table 2). Most of these are unlikely to be related to colour and patterning, although a small fraction of differentially expressed genes (average 3.5%) are found in our a priori list of genes that influence the generation of colour or patterning in other taxa (R. imitator = 92, R. fantastica = 106, R. variabilis = 77). Amongst genes that were significantly differentially expressed between developmental stages we identified genes related to carotenoid metabolism (e.g., bco1, retsat, scarb2, ttc39b; Figure 7), the synthesis of pteridines (e.g., gchfr, qdpr, pts, xdh; Figure 7), genes related to melanophore development and melanin synthesis (dct, kit, lef1, mitf, mlph, mreg, notch1, notch2, sfxn1, sox9, sox10, tyr, tyrp1; Figure 8), genes putatively related to the production of iridophores and their guanine platelets (e.g., gart, gas1, paics, pacx2, pax3‐a, pnp, rab27a, rab27b, rab7a, rabggta; Figure 9) and genes related to patterning (notch1, notch2).

TABLE 2

Number of differentially expressed genes for each comparison

Species	LRT between morphs		LRT between developmental stages		Genes differentially expressed between colour morphs and developmental stages
Species	Differentially expressed genes	Differentially expressed genes a priori colour genes	Differentially expressed genes	Differentially expressed genes a priori colour genes
Ranitomeya imitator	1,528	39	2,039	92	249
Ranitomeya fantastica	1,112	46	2,247	106	407
Ranitomeya variabilis	1,065	33	2,734	77	574

Abbreviation: LRT, likelihood ratio test.

FIGURE 7

Gene expression for select carotenoid and pteridine genes

FIGURE 8

Gene expression for select melanin genes

FIGURE 9

Gene expression for select iridophore genes

Number of differentially expressed genes for each comparison Abbreviation: LRT, likelihood ratio test. Gene expression for select carotenoid and pteridine genes Gene expression for select melanin genes Gene expression for select iridophore genes

Between morph comparisons

In our comparison of colour morphs we found many significantly differentially expressed genes in each species (R. imitator = 1,528, R. fantastica = 1,112, R. variabilis = 1,065; Table 2). Most of these are unlikely to be related to colour and patterning, although a small fraction of differentially expressed genes (average 3.3%) are related to the generation of colour or patterning in other taxa (R. imitator = 39, R. fantastica = 46, R. variabilis = 33). Amongst genes that were differentially expressed between colour morphs we identified genes related to carotenoid metabolism or xanthophore production (e.g., aldh1a1, ttc39b; Figure 7), the synthesis of pteridines (e.g., gchfr, qdpr, pts, xdh; Figure 7), genes related to melanophore development and melanin synthesis (kit, lef1, mlph, mreg, sfxn1, sox9, sox10), and genes putatively related to the production of iridophores and their guanine platelets (e.g., atic, dock7, gart, paics, pacx2, pax3‐a, rab27a, rab27b, rab7a, rabggta; Figure 9).

Convergent gene expression patterns between model and mimic

We compared gene expression between morphs for which we could make a direct comparison of expression between model and mimic (i.e., striped vs. spotted, banded vs. redheaded). In the striped vs. spotted comparison, we identified 323 differentially expressed genes in R. imitator and 1,260 in R. variabilis. Of these genes, 47 were shared between the two species and one was in our a priori colour gene list (edaradd). In the banded vs. redheaded comparison, we identified 1,035 differentially expressed genes in R. imitator and 1,335 in R. fantastica. Of these genes, 128 were shared between the two species and two were in our a priori colour gene list (pkc‐3 and qdpr). We additionally further broadened our search to examine any gene differentially expressed between morphs in two or more of our analysed species. Using this method, we identified an additional 10 colour genes (crabp2, dst, edar, erbb3, gchfr, kitlg, krt2, rab27b, rb1, tspan36). We identified 20 individual gene modules in our expression data using WGCNA (see Figure 10 WGCNA modules). Of these, we identified 14 modules that were significantly correlated with developmental stage; green (r = −.330; p < .0001), midnight blue (r = −.446; p < .0001), purple (r = −.442; p < .0001), greenyellow (r = −.652; p < .0001), light cyan (r = −.510; p < .0001), black (r = −.268; p = .0040), grey 60 (r = −.405; p < .0001), red (r = −.585; p < .0001), light green (r = .573; p < .0001), blue (r = .669; p < .0001), cyan (r = .734; p < .0001), magenta (r = .215; p = .022), light yellow (r = −.345; p = .00017), and salmon (r = .431; p < .0001). We identified 10 modules that were significantly correlated with species; green (r = −.280; p = .0025), pink (r = −.566; p < .0001), purple (r = .308; p = .000857), greenyellow (r = .320; p = .000529), light cyan (r = .226; p = .0156), grey 60 (r = .196; p = .0366), cyan (r = −.209; p = .0257), magenta (r = .304; p = .00103), salmon (r = −.308; p = .000869) and grey (r = .656; p < .0001). Finally, we identified three modules that were significantly correlated with colour morph, the grey (r = .403; p < .0001), pink (r = −.372; p < .0001), and purple modules (r = .190; p = .042). Significant GO terms associated with each module that is correlated with species, colour morph and developmental stage are found in Tables S7, S8 and S9 respectively.

FIGURE 10

The relationship between WGCNA modules and treatment groups. Within each cell the top number represents the R value (correlation between the grouping and module expression) and the bottom number represents the adjusted p value. The strength of the colour (i.e., darker red or darker blue) in the heatmap represents the strength of the association

DISCUSSION

The genetic, biochemical, cellular, physiological and morphological mechanisms that control coloration in mimetic systems are of interest because of their substantial impacts on survival. Despite this, these mechanisms are poorly characterized. Further, genetic mechanisms and genomic resources in amphibians are limited and poorly understood, particularly compared to better known groups like mammals and fish. In this study, we examined how gene expression contributes to differential phenotypes within species in a Müllerian mimicry complex of poison frogs. To do this we assembled a high‐quality genome for the mimic poison frog Ranitomeya imitator, which we leveraged to conduct gene expression analyses. Here we describe the resulting R. imitator genome assembly and highlight key pathways and genes that probably contribute to differential colour production within species, illuminating the mechanisms underlying Müllerian mimicry, and providing a rich foundation upon which future research may be built.

Genome

Our newly assembled R. imitator genome is a large, high‐quality genome with high genic content. This assembled genome is 6.8 Gbp in length and contains 93% of the expected genes according to our busco results. Further, our genome is relatively contiguous with a contig N50 of over 300 Kbp. For comparison, the O. pumilio genome produced with short read technologies had a contig N50 of 385 bp and many genic regions were not assembled, presumably because of long intronic regions strewn with repeat elements (Rogers et al., 2018). This dramatic difference in genome contiguity and genic content indicates that long read technologies are, unsurprisingly, critically important and can produce genomes with contiguity spanning large regions, even for species with large genomes containing many long, repetitive regions. Further, the relatively high error rate of current long read technologies does not preclude the ability to assemble and identify genes, as evidenced by our high busco score. This genome is a valuable resource and is well suited to a variety of future work, especially RNA sequencing analyses such as those we present below.

Repeat elements in the Ranitomeya imitator genome

Roughly 50% of our genome assembly was masked as repeats. This is a large proportion of the genome, but not as large as that of the strawberry poison frog (Oophaga pumilio), which was estimated to consist of ~70% repeats (Rogers et al., 2018). In comparison, the genome of Xenopus tropicalis is a little over one‐third repeat elements and Nanorana parkeri is ~48% repeats, which are both comparable to a mammalian genome (Hellsten et al., 2010; Sun et al., 2015). Scattered throughout the R. imitator genome are a large number of repeat elements which represent over half of the assembled genome. While our assembly of this genome is similar in size to the estimated size of the O. pumilio genome, we found a much smaller proportion of particular families of repeat elements (e.g., Gypsy) compared with Rogers et al. (2018). Instead, the R. imitator genome seems to have a higher evenness of repeat element types spread throughout the genome. A number of the repeat elements that we identified had an average length longer than 1,000 bp, including L1 (1,093.4 ± 1,490.5), L1‐Tx1 (1,417.3 ± 1,372.3), Gypsy (1,349.0 ± 1,619.3) and Pao (1,254.9 ± 1,385.8). Additionally, these repeat classes made up a large portion of the genome. Our R. imitator genome assembly has less than half the total content for many of the most abundant families of repeats in the strawberry poison frog, including Gypsy (0.44 Gbp vs. 1 Gbp), Copia (3 Mbp vs. 298 Mbp), hAT (97 Mbp vs. 255 Mbp) and Mariner (0.6 Mbp vs. 197 Mbp). However, R. imitator has much higher total repeat content of Tc1 (298 Mbp vs. 181 Mbp) and a large portion of the R. imitator genome consists of unidentified repeats (25%, 1.87 Gbp). Thus, it seems likely that different families of repeats have proliferated between the two poison frog genomes. However, there are two caveats to this conclusion: first is the proportion of repeats that we were unable to classify in the R. imitator genome, and second is that our two studies used very different input data, assembly algorithms and methods of analysing repeats (repdenovo in O. pumilio and repeatmasker in R. imitator) which are a potential source of bias. Our busco analysis also identified a large number of genes that have been duplicated scattered throughout the genome—23.2% of orthologues expected to be present as a single‐copy are in fact present in duplicate copies. These are spread throughout the scaffolds in the genome, and the same repeated orthologue is found at most three times in the genome. The repeats appear almost exclusively in two copies that very rarely appear on the same scaffold. Both copies of a duplicated orthologue only appeared on the same scaffold four times. All of the other duplicated genes were split across scaffolds. Additionally, we found that most single copy orthologues had higher‐than‐average coverage, whereas most duplicated orthologues had below average coverage. Indeed, relative to single copy orthologues, a larger proportion of duplicated orthologues had particularly low coverage (<10×). There are two plausible explanations for this phenomenon. First, this could be driven by historical duplication events (perhaps even chromosomal), followed by sequence divergence and possibly pseudogenization. This would lead to both multiple copies of orthologues as well as mapping ambiguities due to sequence similarities. Alternatively, this evidence, while preliminary, could indicate that some proportion of the large number of duplicated orthologues found in the genome arise from uncollapsed regions of heterozygosity. This is consistent with duplicated orthologues being present in only two copies that often have low coverage.

Gene expression

While the colours and patterns of Ranitomeya poison frogs are extremely variable, they always consist of vivid colour patches overlying a background that is largely black in most colour morphs. Recent evidence indicates that much of the differences in colour in poison frogs are derived from the structure (thickness) and orientation of iridophore platelets (Twomey, Kain, et al., 2020). Additionally, specific pigments that are deposited in the xanthophores, such as pteridines and carotenoids, interact with these structural elements to influence integumental coloration (in the yellow to red ranges of hue). Black and brown coloration is produced by melanophores and the melanin pigments found within the chromatophores (Bagnara et al., 1968; Duellman & Trueb, 1986. These data are corroborated by new genomic data that seem to highlight the importance of pigment production and modification genes such as those in the melanin synthesis pathway (Stuckert et al., 2019; Posso‐Terranova & Andrés, 2017), pteridine synthesis pathway (Rodríguez et al., 2020; Stuckert et al., 2019) and carotenoid processing pathways (Twomey, Johnson, et al., 2020) for their roles in producing different colour morphs in poison frogs. We conducted a targeted analysis of genes which show convergent expression patterns between the model and mimic. Somewhat surprisingly, there was minimal overlap in genes that were differentially expressed between convergent colour morphs in both the model and mimic. Of those genes that were shared, only three were in our a priori colour gene list (edaradd, pkc‐3 and qdpr). This seems to indicate one of three things: (i) the pattern of differential gene expression affecting convergent colour morph development is largely species‐specific, (ii) the expression patterns of a small number of genes have a very large effect on colour morph divergence, or (iii) we have insufficient power to identify these convergent genes. Our slightly less restrictive analysis of genes differentially expressed between any morph in multiple species only yielded an extra 10 colour genes, which is again fewer than we would have predicted if convergent phenotypes are being driven by the same mechanisms between species. Overall, these results suggest that the convergent colour patterns of these species are likely to have evolved via the expression of distinct underlying genetic and biochemical pathways. These findings are largely corroborated by the gene network analyses (WGNCA), which identified many more gene modules associated with developmental stage (14) or species (10) than with colour morph (three). In sum, colour differences are likely to be driven by expression patterns of a few genes of large effect. As for the genes that were differentially expressed throughout development, many are probably related to body restructuring rather than coloration per se. Nevertheless, we identified a number of very promising candidate colour genes that are likely to play a role in the production of mimetic phenotypes in this system.

Yellow, orange and red coloration

Yellows, oranges and reds are determined in large part by the presence of pigments deposited within the xanthophores, the outermost layer of chromatophores in the skin (Duellman & Trueb, 1986). These pigments are primarily composed of pteridines and carotenoids, and many studies to date have documented that these pigments play a key role in the production of yellows, oranges and reds (Croucher et al., 2013; Grether et al., 2001; Mcgraw et al., 2006; McLean et al., 2017; Obika & Bagnara, 1964). Given the clear importance of xanthophores, pteridines and carotenoids in the production of yellows, oranges and reds, we briefly examine the contribution of genes in all three pathways here, beginning with xanthophore production.

Xanthophores

We identified a number of key genes that are differentially expressed that are required for the production of xanthophores (Figure 7), notably paired box 7 (pax7) and xanthine dehydrogenase (xdh). pax7 is a transcription factor that is required for establishing and differentiating xanthophores in both embryonic and adult zebrafish (Nord et al., 2016). pax7 was differentially expressed between morphs in R. fantastica, notably with higher expression in the orange‐banded morph. xdh is sometimes referred to as a xanthophore differentiation marker, as it is found in xanthoblasts and is required for synthesizing the pteridine pigment xanthopterin (Epperlein & Löfberg, 1990; Parichy et al., 2000; Reaume et al., 1991). In our study, this gene exhibited differential expression across development in both R. imitator and R. fantastica. Additionally, we saw differential expression in this gene between R. imitator colour morphs, with the highest expression in the orange banded morph. Given their roles in xanthophore differentiation, pax7 and xdh are excellent candidates for production of xanthophores across all Ranitomeya, and early differences in expression of these genes could lay the groundwork for markedly different colours and patterns.

Pteridine synthesis genes

Genes and biochemical products in the pteridine pathway are important for pigmentation in the eyes and for vision, as well as for coloration across a wide variety of taxa (e.g., invertebrates, fish, lizards, amphibians), as data from both genetic studies and biochemical assays of pigments point to pteridines as important components of animal coloration. Although a number of genes in the pteridine pathway have been implicated in producing different colour patterns, in general the genetic control of pteridine pigmentation is poorly characterized and largely comes from studies of Drosophila melanogaster (Braasch et al., 2007; Kim et al., 2013). In this study we found a number of key pteridine synthesis genes that were differentially expressed between colour morphs (Figure 7). Prominent amongst these are the aforementioned xanthine dehydrogenase (xdh), quinoid dihydropteridine reductase (qdpr) and 6‐pyruvoyltetrahydropterin synthase (pts). In addition to its role in early xanthophore lineages, xdh appears to be highly conserved and its expression plays a role in the production of pterin‐based coloration in a variety of taxa such as spiders (Croucher et al., 2013), fish (Parichy et al., 2000; Salis et al., 2019), and the dendrobatid frogs D. auratus and O. pumilio (Rodríguez et al., 2020; Stuckert et al., 2019). Experimental inhibition of xdh causes a reduction in the quantity of pterins, resulting in an atypical black appearance in axolotls (Frost, 1978; Frost & Bagnara, 1979; Thorsteinsdottir & Frost, 1986). Additionally, typically green frogs with deficiencies in the xdh gene appear blue due to the lack of pterins in the xanthophores (Frost, 1978; Frost & Bagnara, 1979). As discussed above, xdh had the highest expression in the orange banded morph of R. imitator. Because xdh plays a role in the transformation of pterins into several different yellow pterin pigments such as xanthopterin and isoxanthopterin (Ziegler, 2003), differential expression of xdh is a plausible mechanism for the production of orange coloration in the banded R. imitator. In sum, xdh may function in xanthophore production and/or pterin synthesis and is a likely driver of the production of yellow, orange and green colours in this mimicry system. One of the first genes in the pteridine synthesis pathway is 6‐pyruvoyltetrahydropterin synthase (pts), which is responsible for producing the precursor to both the orange drosopterin pigment as well as yellow sepiapterin pigment. This gene has been implicated in the production of yellow colour phenotypes in a variety of systems (Braasch et al., 2007; McLean et al., 2019; Rodríguez et al., 2020), and has been linked to different colours in the poison frog O. pumilio (Rodríguez et al., 2020). While the precise mechanism by which this gene affects coloration is unknown, as the above studies indicate, differential expression of pts is often found in relation to yellow or orange skin phenotypes. In our study, we identified significant differential expression throughout development in R. fantastica and R. imitator, with overall expression of this gene increasing throughout development. Additionally, we see differential expression in pts between colour morphs of R. variabilis, with higher expression in the yellow striped morph, particularly close to the end of development. We did not find differential expression of pts in R. imitator. This indicates that pts may be a critical gene in developing yellow pigmentation in R. variabilis, but that it may not play the same role in R. imitator. Interestingly, however, Twomey, Kain, et al. (2020) found that the yellow pterin sepiapterin was only present in very small concentrations in the yellow striped morph of R. variabilis. Quinoid dihydropteridine reductase (qdpr) is another gene involved in the pteridine synthesis pathway and is known to alter patterns of production of the yellow pigment sepiapterin (Ponzone et al., 2004). We found differential expression in this gene across developmental stages in all three species in this study. Notably, the expression of qdpr showed a stark decline over development. qdpr showed a convergent pattern of differential expression between colour morphs in both R. imitator and R. fantastica, in both cases with the highest expression in the redheaded morph, although expression was also high in the yellow striped morph of R. imitator. This indicates that qdpr is probably playing a similar role in colour production in both the model and the mimic species. The qdpr gene was also differentially expressed across populations of another species of poison frog, and was only expressed in light blue or green coloured morphs which are likely to derive some of the coloration from pteridines (Stuckert et al., 2019). In combination, these studies indicate that qdpr may be playing a role in the production of pteridine pigmentation in this system and in amphibians in general, particularly since qdpr alters sepiapterin production (Ponzone et al., 2004).

Carotenoid genes

Carotenoids are important for both yellow and red coloration across a diversity of life forms (McGraw et al., 2006; Toews et al., 2017). While carotenoids are clearly an important class of pigments that broadly influence coloration, few known genes contribute to carotenoid based colour differences (e.g., bco2, scarb1, retsat), although this is probably an underestimation of the actual number of genes playing a role in carotenoid synthesis and processing given that we are continuously discovering new genes that play these roles (Emerling, 2018; Twomey, Johnson, et al., 2020). It appears that orange and red coloured Ranitomeya (outside of R. imitator) have slightly higher overall carotenoid concentrations, although these colours are more likely to be derived from the pteridine drosopterin (Twomey, Kain, et al., 2020). However, different colour morphs of R. imitator possess similar quantities of carotenoids (Twomey, Kain, et al., 2020), providing little evidence that carotenoid levels influence coloration in R. imitator, but that they may be important in other Ranitomeya spp. Intriguingly, Crothers et al. (2016) found no relationship between coloration and overall carotenoid abundance in the bright orange Solarte morph of O. pumilio, but did find a relationship between total dorsal reflectance and two carotenoids (xanthophyll and a canary xanthophyll ester). Thus, the evidence seems to indicate that overall carotenoid concentrations are not particularly important for red, orange or yellow colorations, and that instead these colours may be driven by a small subset of carotenoids or pteridines. We found differential expression of a number of carotenoid genes across development (e.g., aldh1a1, aldh1a2, bco1, retsat, scarb2). Although we did not find evidence that many known carotenoid genes are a major contributor to differences between specific colour morphs in our study, we did find differential expression of some carotenoid genes between morphs (e.g., slc2a11, ttc39b). We found that a gene that has recently been putatively linked to carotenoid metabolism, the lipoprotein coding gene tetratricopeptide repeat domain 39B (ttc39b), was differentially expressed between colour morphs of R. variabilis and R. fantastica (note the latter finding depends on lowering our stringent alpha value to .05; Figure 7). Tetratricopeptide repeat domain 39B is upregulated in orange or red skin in a variety of fish species (Ahi et al., 2020; Salis et al., 2019). Hooper et al. (2019) found that this gene is associated with bill colour in the long‐tailed finch, Poephila acuticauda, and they hypothesized that ttc39b is being used to transport hydrophobic carotenoids to their deposition site. We found that this gene was most highly expressed in the yellow–green spotted morph of R. variabilis and the redheaded morph of R. fantastica, consistent with findings from other gene expression studies. Given the repeat occurrence of this gene in pigmentation studies outlined above, functional validation of ttc39b is likely to be informative. While we found very few putative carotenoid genes identified in other taxa that were differentially expressed between our colour morphs, we nevertheless identified the oxidoreductase activity gene ontology as the primary gene ontology term in the purple module associated with colour morphs. In addition to some of the pteridine genes discussed above, this module contains a number of candidates for carotenoid metabolism, including retinol dehydrogenase and a variety of cytochrome p450 genes. A recent study by Twomey, Kain, et al. (2020) identified CYP3A80 as a novel candidate for carotenoid ketolase that is preferentially expressed in the livers of red Ranitomeya sirensis. Our gene ontology analysis results indicate that there may be additional genes strongly influencing carotenoid synthesis and processing in this system that have not been identified in previous studies.

Melanophore genes

Many of the differentially expressed genes in our data set occurred between developmental stages as tadpoles undergo a complete body reorganization as they prepare to metamorphose. In addition to growth, development and metamorphosis, during this time tadpoles are producing both the structural chromatophores and the pigments that will be deposited within them. In many species, the distribution of chromatophores that will regulate patterns are set early during development. Of these, melanin‐based coloration is the best understood aspect of coloration, in no small part because of a long history of genetic analyses in laboratory mice (Hoekstra, 2006; Hubbard et al., 2010). As a result, there are a large number of genes that are known to influence the production of melanin, melanophores and melanosomes. In vertebrates, black coloration is caused by light absorption by melanin in melanophores (Sköld et al., 2016). Melanophores (and the other chromatophores) originate from populations of cells in the neural crest early in development (Park et al., 2009). The four colour morphs of Ranitomeya used in this study have pattern elements on top of a generally black dorsum and legs, and therefore melanin‐related genes are likely to play a key role in colour and pattern, both throughout development and between colour morphs. Given that a large portion of pigmentation arises during development when we sampled individuals, we found that many of our differentially expressed candidate genes are in this pathway. Prominent amongst these genes are dct, kit, lef1, mc1r, mitf, mlph, mreg, notch1, notch2, sox9, sox10, tyr and tyrp1, all of which were differentially expressed across development in at least one species. It seems likely these genes are contributing to colour production across species given the important roles of each of these genes in melanocyte development and melanin synthesis. In fact, the well‐known patterning genes in the notch pathway (e.g., notch1, notch2; Figure 8) seem to be important in all three species, as notch1 was differentially expressed across development in R. imitator and R. variabilis, and notch2 in R. fantastica (Hamada et al., 2014). Expression patterns of melanophore and melanin synthesis genes did not follow a consistent pattern overall and instead were variable. In fact, one of the two a priori colour genes that was differentially expressed between convergent morphs in both R. imitator and R. fantastica showed opposite differential expression patterns. This gene, pkc‐3, has been implicated in melanocyte proliferation, dendricity and melanogenesis (Park and Gilchrest, 1993), indicating that this gene is probably critically important to the production of brown and black coloration. Different directionality of expression of pkc‐3 between R. imitator and R. fantastica is a peculiar finding, potentially indicating that this gene is influencing different pathways that are species‐specific. Our short read lengths in R. imitator preclude adequately testing this hypothesis. We found similar results with edaradd (Ectodysplasin‐A Receptor‐Associated Death Domain), which was differentially expressed between the spotted and striped morphs of R. imitator and R. variabilis. Like pkc‐3, while this gene was differentially expressed in both the model and the mimic, it showed different expression patterns between the two species. This gene is known to play a role in ectodermal dysplasia, a catch‐all term for a suite of similar human diseases that produce abnormalities of ectodermal structures. Phenotypically these appear as sparse hair, abnormal teeth and hair, and, most relevant to our study, abnormally light pigmentation (Cluzeau et al., 2011). Another gene in this pathway, edar (Ectiodysplasin A Receptor), was differentially expressed between colour morphs of both R. fantastica and R. variabilis, potentially indicating the importance of the TNFα‐related signalling pathway (Reyes‐Reali et al., 2018; Cluzeau et al., 2011). The TNFα‐related signalling pathway has not been implicated as a driver of colour and pattern in any context outside of ectodermal dysplasia, and thus this indicates a plausible mechanism for coloration in animals that has not yet been identified.

Iridophore genes

Iridophores are largely responsible for blue (and to a lesser extent green) coloration, which is mainly determined by the reflection of light from iridophores (Bagnara et al., 2007). This depends on the presence and orientation of guanine platelets, where thicker platelets tend to reflect longer wavelengths of light (Ziegler, 2003; Bagnara et al., 2007; Saenko et al., 2013). Iridophores are a key component of white, blue and green coloration. Recently, Twomey, Kain, et al. (2020) found that variation in coloration in Ranitomeya and related poison frogs is largely driven by a combination of the orientation and thickness of the guaninie platelets in iridophores. Using a combination of electron microscopy, biochemical pigment analyses, and modelling of the interaction between structural elements in the integument and pigmentation within chromatophores, they found that much more of the variation in coloration (hue) was driven by differences in the guanine platelet thickness of iridophores than expected. We discuss our findings in light of this recent work, with respect to how they can inform future work in conjunction with the results of Twomey, Kain, et al. (2020). The precise mechanisms underlying the development of iridophores and the size and orientation of the guanine platelets are unknown. However, previous work has suggested that ADP Ribosylation Factors (ARFs), which are ras‐related GTPases that control transport through membranes and organelle structure and Rab GTPases, are probably important in determining the size and orientation of iridophores (Higdon et al., 2013). We found a number of these genes to be differentially expressed between developmental stages (arf6, dct, dgat2, dock7, dst, edn3, erbb3, impdh2, paics, psen1, rab27a, rab27b, rabggta) or colour morphs (anxa1, dock7, dst, erbb3, gart, gne, paics, rab1a, rab27a, rab27b, rab7a, rabggta) in our study (Figure 9). We also found differential expression of a number of genes that are known to impact guanine or purine synthesis throughout development (adsl, gart, gas1, fh, qdpr) and between colour morphs (atic, psat1, qdpr). A number of these genes (adsl, dct, dock7, gart, fh, qdpr, psat1, rabggta) have been implicated in previous work in dendrobatids (Rodríguez et al., 2020; Stuckert et al., 2019) and in other taxa. The genes that have been implicated in previous studies are clearly good candidates for future study. Additionally, the ARFs and Rab GTPases that we identified and that are upregulated in iridophores relative to other chromatophore types in fish (e.g., Higdon et al., 2013) are also good candidates. Unfortunately, our understanding of how these genes affect the development of iridophores is limited, and thus more targeted examinations of iridophore production and pigment production are needed. Notably, a number of epidermis‐structuring genes (such as those in the krt family) have been implicated in the production of structural colours (e.g., krt1, krt2), although more evidence is needed to verify their role in coloration (Burgon et al., 2020; Cui et al., 2016; McGowan et al., 2006; Stuckert et al., 2019). We identified a number of these that are differentially expressed between colour morphs (e.g., krt1, krt2, krt10, krt14, krt5). Genes that influence keratin, and organization of the epidermis generally, are good candidates for the production of different colours, as they may produce structural influences on colour (via reflectance) in a manner that parallels what we see from guanine platelets. Keratins are known to influence the distribution and arrangement of melanosomes, which impact the ultimate colour phenotype of animals (Gu & Coulombe, 2007a; Gu and Coulombe, 2007b). The combination of differential expression between colours and/or colour morphs in multiple expression studies and a plausible mechanism for colour modification suggest that genes in the keratin family may well be important for producing colour differences, although detailed analyses would need to be conducted to confirm this.

Conclusion

In this study we examined the molecular mechanisms by which mimetic phenotypes are produced in a Müllerian mimicry system. Through our efforts, we have produced the first high‐quality poison frog genome, a 6.8‐Gbp, contiguous genome assembly with good genic coverage. We leveraged this to examine gene expression in the skin throughout development of four comimetic morphs from three species of Ranitomeya. We identified a large number of genes related to melanophores, melanin production, iridophore production and guanine synthesis that were differentially expressed throughout development, indicating that many of these are important in the production of pigmentation, albeit not colour morph‐specific coloration. Genes related to xanthophore production, carotenoid pathways, melanin production and melanophore production were rarely differentially expressed between colour morphs, but those genes that were differentially expressed may be critically important in producing polytypic differences within species that drive mimetic phenotypes. Our results indicate that divergence between colour morphs seems to be mainly the result of differences in expression and/or timing of expression, but that convergence for the same colour pattern may not be obtained through the same changes in gene expression between species. We identified the importance of the pteridine synthesis pathway in producing these different yellow, orange and red colour morphs across species. Thus, production of these colours is probably strongly driven by differences in gene expression in genes in the pteridine synthesis pathway, and our data indicate that there may be species‐specific differences in this pathway used in producing similar colours and patterns. Furthermore, we highlight the potential importance of genes in the keratin family for producing differential colour via structural mechanisms.

AUTHOR CONTRIBUTIONS

Designed research: A.M.M.S., M.C., M.M., R.N., K.S., M.D.M. Performed research: A.M.M.S., M.C., M.M., K.S., T.L. Analysed data: A.M.M.S., T.M.L. Contributed to writing: A.M.M.S., M.C., M.M., T.M.L., T.L., R.N., K.S., M.D.M.

Open Research Badges

This article has earned an Open Data Badge for making publicly available the digitally‐shareable data necessary to reproduce the reported results. The data is available at [https://www.ebi.ac.uk/ena/browser/view/PRJEB28312]. Fig S1 Click here for additional data file. Table S1‐S9 Click here for additional data file. Supplementary Material Click here for additional data file.

80 in total

1. North American velvet ants form one of the world's largest known Müllerian mimicry complexes.

Authors: Joseph S Wilson; Joshua P Jahner; Matthew L Forister; Erica S Sheehan; Kevin A Williams; James P Pitts
Journal: Curr Biol Date: 2015-08-17 Impact factor: 10.834

2. Only four genes (EDA1, EDAR, EDARADD, and WNT10A) account for 90% of hypohidrotic/anhidrotic ectodermal dysplasia cases.

Authors: Céline Cluzeau; Smail Hadj-Rabia; Marguerite Jambou; Sourour Mansour; Philippe Guigue; Sahben Masmoudi; Elodie Bal; Nicolas Chassaing; Marie-Claire Vincent; Géraldine Viot; François Clauss; Marie-Cécile Manière; Steve Toupenay; Martine Le Merrer; Stanislas Lyonnet; Valérie Cormier-Daire; Jeanne Amiel; Laurence Faivre; Yves de Prost; Arnold Munnich; Jean-Paul Bonnefont; Christine Bodemer; Asma Smahi
Journal: Hum Mutat Date: 2011-01 Impact factor: 4.878

Review 3. Hypohidrotic ectodermal dysplasia: clinical and molecular review.

Authors: Julia Reyes-Reali; María Isabel Mendoza-Ramos; Efraín Garrido-Guerrero; Claudia F Méndez-Catalá; Adolfo R Méndez-Cruz; Glustein Pozo-Molina
Journal: Int J Dermatol Date: 2018-05-31 Impact factor: 2.736

4. Repbase Update, a database of repetitive elements in eukaryotic genomes.

Authors: Weidong Bao; Kenji K Kojima; Oleksiy Kohany
Journal: Mob DNA Date: 2015-06-02

5. A ketocarotenoid-based colour polymorphism in the Sira poison frog Ranitomeya sirensis indicates novel gene interactions underlying aposematic signal variation.

Authors: Evan Twomey; James D Johnson; Santiago Castroviejo-Fisher; Ines Van Bocxlaer
Journal: Mol Ecol Date: 2020-06-09 Impact factor: 6.185

6. Red carotenoids and associated gene expression explain colour variation in frillneck lizards.

Authors: Claire A McLean; Adrian Lutz; Katrina J Rankin; Adam Elliott; Adnan Moussalli; Devi Stuart-Fox
Journal: Proc Biol Sci Date: 2019-07-17 Impact factor: 5.349

7. The expression of KRT2 and its effect on melanogenesis in alpaca skins.

Authors: Yucong Cui; Yajun Song; Qingling Geng; Zengfeng Ding; Yilong Qin; Ruiwen Fan; Changsheng Dong; Jianjun Geng
Journal: Acta Histochem Date: 2016-06-02 Impact factor: 2.479

8. Evolution of pigment synthesis pathways by gene and genome duplication in fish.

Authors: Ingo Braasch; Manfred Schartl; Jean-Nicolas Volff
Journal: BMC Evol Biol Date: 2007-05-11 Impact factor: 3.260

9. De novo characterization of the gene-rich transcriptomes of two color-polymorphic spiders, Theridion grallator and T. californicum (Araneae: Theridiidae), with special reference to pigment genes.

Authors: Peter J P Croucher; Michael S Brewer; Christopher J Winchell; Geoff S Oxford; Rosemary G Gillespie
Journal: BMC Genomics Date: 2013-12-08 Impact factor: 3.969

10. HTSeq--a Python framework to work with high-throughput sequencing data.

Authors: Simon Anders; Paul Theodor Pyl; Wolfgang Huber
Journal: Bioinformatics Date: 2014-09-25 Impact factor: 6.937

4 in total

1. Neurogenomic divergence during speciation by reinforcement of mating behaviors in chorus frogs (Pseudacris).

Authors: Oscar E Ospina; Alan R Lemmon; Mysia Dye; Christopher Zdyrski; Sean Holland; Daniel Stribling; Michelle L Kortyna; Emily Moriarty Lemmon
Journal: BMC Genomics Date: 2021-10-02 Impact factor: 3.969

2. Gene expression in male and female stickleback from populations with convergent and divergent throat coloration.

Authors: Jeffrey S McKinnon; William Burns Newsome; Christopher N Balakrishnan
Journal: Ecol Evol Date: 2022-04-29 Impact factor: 3.167

3. Mining Amphibian and Insect Transcriptomes for Antimicrobial Peptide Sequences with rAMPage.

Authors: Diana Lin; Darcy Sutherland; Sambina Islam Aninta; Nathan Louie; Ka Ming Nip; Chenkai Li; Anat Yanai; Lauren Coombe; René L Warren; Caren C Helbing; Linda M N Hoang; Inanc Birol
Journal: Antibiotics (Basel) Date: 2022-07-15

4. The genomics of mimicry: Gene expression throughout development provides insights into convergent and divergent phenotypes in a Müllerian mimicry system.

Authors: Adam M M Stuckert; Mathieu Chouteau; Melanie McClure; Troy M LaPolice; Tyler Linderoth; Rasmus Nielsen; Kyle Summers; Matthew D MacManes
Journal: Mol Ecol Date: 2021-07-16 Impact factor: 6.622

4 in total