Alexander S Mikheyev1, Timothy A Linksvayer2. 1. Ecology and Evolution Unit, Okinawa Institute of Science and Technology, Okinawa, Japan. 2. Department of Biology, University of Pennsylvania, Philadelphia, United States.
Abstract
Studies of the genetic basis and evolution of complex social behavior emphasize either conserved or novel genes. To begin to reconcile these perspectives, we studied how the evolutionary conservation of genes associated with social behavior depends on regulatory context, and whether genes associated with social behavior exist in distinct regulatory and evolutionary contexts. We identified modules of co-expressed genes associated with age-based division of labor between nurses and foragers in the ant Monomorium pharaonis, and we studied the relationship between molecular evolution, connectivity, and expression. Highly connected and expressed genes were more evolutionarily conserved, as expected. However, compared to the rest of the genome, forager-upregulated genes were much more highly connected and conserved, while nurse-upregulated genes were less connected and more evolutionarily labile. Our results indicate that the genetic architecture of social behavior includes both highly connected and conserved components as well as loosely connected and evolutionarily labile components.
Studies of the genetic basis and evolution of complex social behavior emphasize either conserved or novel genes. To begin to reconcile these perspectives, we studied how the evolutionary conservation of genes associated with social behavior depends on regulatory context, and whether genes associated with social behavior exist in distinct regulatory and evolutionary contexts. We identified modules of co-expressed genes associated with age-based division of labor between nurses and foragers in the antMonomorium pharaonis, and we studied the relationship between molecular evolution, connectivity, and expression. Highly connected and expressed genes were more evolutionarily conserved, as expected. However, compared to the rest of the genome, forager-upregulated genes were much more highly connected and conserved, while nurse-upregulated genes were less connected and more evolutionarily labile. Our results indicate that the genetic architecture of social behavior includes both highly connected and conserved components as well as loosely connected and evolutionarily labile components.
Entities:
Keywords:
age polyethism; eusociality; evolutionary biology; gene regulatory network; genomics; social evolution
The main conclusion of a decade of sociogenomic research with a range of solitary and
social animal species is that highly conserved genes underpinning core physiological
processes can also influence behavioral state (Amdam et
al., 2004, 2006; Toth and Robinson, 2007; Toth et
al., 2007, 2010; Woodard et al., 2011; Woodard et al., 2014). For example, the insulin signaling pathway,
which mediates an organism's response to its internal nutritional state, also
influences its behavior (Ament et al., 2008).
The genetic toolkit hypothesis and related hypotheses propose that a
conserved set of genes and gene pathways involved in core physiological processes such
as metabolism and reproduction has been repeatedly used in the evolution of complex
social behavior in diverse lineages (West-Eberhard,
1996; Amdam et al., 2004, 2006; Toth and
Robinson, 2007; Toth et al., 2007).
This hypothesis stems from findings in Evolutionary Developmental Biology that
morphological innovation in disparate lineages often involves the convergent use of a
conserved set of genes (e.g., Hox genes) (Carroll et
al., 2001; Toth and Robinson, 2007;
Wilkins, 2013).However, social behavior and other social traits are commonly viewed as having unique
genetic features and evolutionary dynamics, including especially rapid evolution (West-Eberhard, 1983; Tanaka, 1996; Moore et al.,
1997; Wolf et al., 1999; Nonacs, 2011; Bailey and Moore, 2012; Van Dyken and Wade,
2012). Could the molecular mechanisms underlying social interactions (e.g.,
social signal production and response) and social behavior, together with the process of
social evolution result in distinct genetic architectures for social traits compared
with other traits? Recent comparative transcriptomic and genomic studies find low
overlap in genes associated with social behavior in different highly social animals and
instead highlight the importance of novel genes and rapid evolution of social traits
(Johnson and Tsutsui, 2011; Ferreira et al., 2013; Simola et al., 2013; Wissler et
al., 2013; Feldmeyer et al., 2014;
Harpur et al., 2014; Sumner, 2014; Jasper et al.,
2015), in accordance with recent studies emphasizing the ubiquity of
taxonomically restricted genes (Domazet-Loso and
Tautz, 2003; Khalturin et al., 2009;
Tautz and Domazet-Loso, 2011). Perhaps
social evolution does not consistently use sets of highly conserved genes to the same
degree as morphological evolution? The novel social genes hypothesis
proposes that genes underlying social behavior are often novel socially acting genes or
are genes with novel social functions not present in solitary ancestors (Johnson and Linksvayer, 2010; Johnson and Tsutsui, 2011; Sumner, 2014).Research supporting the genetic toolkit hypothesis has stressed the significant signal
of highly conserved genes affecting core physiological processes in transcriptomic data
sets for social behavior (Robinson et al.,
2008; Toth et al., 2010; Fischman et al., 2011; Woodard et al., 2011, 2014; Toth
et al., 2014). In contrast, research supporting the novel social genes
hypothesis has stressed the overall low proportional overlap of genes underlying social
behavior in divergent lineages as well as the apparently general low degree of
transcriptomic and genomic conservation in divergent lineages (Johnson and Tsutsui, 2011; Ferreira et al., 2013; Simola et al.,
2013; Wissler et al., 2013; Feldmeyer et al., 2014; Harpur et al., 2014; Jasper et
al., 2015; Sumner, 2014).We sought to build on these previous results by examining how transcriptional regulatory
context influences evolutionary conservation for genes associated with ant social
behavior, and further whether genes associated with ant social behavior exist in
distinct regulatory and selective contexts compared to the rest of the genome. Research
in a range of model organisms demonstrates that the degree of a gene's
connectivity to the rest of the regulatory network and its level of expression is often
negatively correlated with its rate of molecular evolution (Krylov et al., 2003; Hahn and
Kern, 2005; Jovelin and Phillips,
2009; Ramsay et al., 2009). For
example, highly connected ‘hub’ genes are often highly expressed and
evolutionarily conserved. Previous research has compared rates of molecular evolution
for genes associated with reproductive division of labor in social insects (Hunt et al., 2010, 2013; Harpur et al.,
2014), as well as other conditionally expressed phenotypes in other organisms
(Brisson and Nuzhdin, 2008; Leichty et al., 2012; Purandare et al., 2014), indicating that genes associated with the
expression of worker traits experience elevated rates of molecular evolution. However,
the relationships among molecular evolution, connectivity, and expression have been
little explored in social insects and are generally little understood for genes
associated with social behavior. As a result, it is unclear if observed differences in
rates of molecular evolution are caused by differences in regulatory architecture,
expression, or perhaps result from distinct evolutionary mechanisms such as kin
selection, which may operate differentially on genes associated with social behavior
relative to the rest of the genome (Linksvayer and
Wade, 2009; Hall and Goodisman,
2012). We further sought to identify modules of co-expressed genes that may be
composed of both conserved and novel genes and may contribute to the expression and
evolution of social complexity.We studied the genetic basis and evolution of a fundamental aspect of social insect
behavior, age-based division of labor (age polyethism). Age polyethism involves the
progression of workers from in-nest tasks such as nursing to outside-nest tasks such as
foraging. Because age polyethism is a trait expressed by the functionally sterile worker
caste, it is expected to be shaped primarily through kin selection (Hamilton, 1964). While age polyethism plays a
central role in the functioning of many eusocial systems (Hölldobler and Wilson, 2009), the molecular underpinnings
have only been well studied in the honey beeApis mellifera (Whitfield et al., 2006; Ament et al., 2008; Chandrasekaran et al., 2011), so that the genetic and evolutionary basis of
age polyethism is not generally understood outside of honey bees. We identified
transcriptional modules of co-regulated genes associated with worker age polyethism in
the pharaoh antMonomorium pharaonis; we identified the degree that
these genes overlap with genes involved in age polyethism in two other social insects
(Alaux et al., 2009; Manfredini et al., 2014); and we studied the relationship between
expression level, connectivity and rates of molecular evolution at these genes compared
to the rest of the genome.
Results
Behavioral analysis
We tracked cohorts of age-marked workers and recorded their behavior and location
inside and outside the nest. In order to identify differentially expressed genes
associated with age-based division of labor, we collected age-marked workers and
workers observed performing specific behaviors. The observed location of workers from
different age classes changed with both nest location and behavior (glm with
quasipoisson errors and log link, both p < 0.01) (Figure 1, Figure
1—figure supplements 1, 2). In concordance with the expected pattern
of age polyethism, the average age of workers observed in the different locations
increased as distance from the brood area increased (Figure 1—figure supplement 2). Of the 15 behaviors
observed more than 15 total times (Supplementary file 1), the likelihood of observing workers
performing the behaviors ‘nurse’, ‘groom’,
‘rest’, ‘trophallaxis’, ‘walk’, and
‘forage’ depended on worker age (Figure
1A; glm with binomial errors and logit link, all nominal p < 0.0002,
α = 0.003, controlling for multiple testing). Nursing and foraging were
at the two extremes: the average age of workers observed nursing was 6.94 days and
the average age of workers observed foraging (i.e., in the act of collecting food)
was 13.04 days. There appeared to be a marked transition from nursing to foraging
between 9 and 12 days of age (Figure 1A), with
75% of nursing observations made for workers less than 10 days old and 75% of
foraging observations made for workers over 10 days old (Figure 1—figure supplement 1).
Figure 1.
Behavioral and transcriptional changes associated with worker age and
behavior.
Numbers along the x-axis represent ages of marked worker cohorts,
starting at worker eclosion as an adult (day 0). In all plots dark green
represents greater values, while white represents lower values of the
measure being plotted. (A) Behavioral results. Workers
showed an age-dependent progression of activity, progressing from tasks
such as nursing and grooming in the nest to outside tasks such as walking
and foraging. (B) Heat map of expression levels over the
course of worker aging (higher expression in darker green), for 25 genes
most differentially expressed between nurses and foragers. The red line
separates the samples classified as ‘nurses’ by K-nearest
neighbor classification on the left, from ‘foragers’ on the
right, suggesting a distinct transition between the two categories.
(C) Correlation between patterns of expression in the 14
identified modules across worker age and behavior. The colors of the
boxes are scaled with the value of correlation coefficients, ranging from
white to dark green. On the right side of the heat map are the numbers of
genes in each module and a dendrogram showing the inferred relationships
among modules. The modules show complex patterns of expression, for
example with some most highly expressed at age 0, some showing decreasing
expression over time, and some increasing expression over time.
DOI:
http://dx.doi.org/10.7554/eLife.04775.003
Boxplots show the distribution of age in days for each behavior; white
diamonds and the printed number show the mean age for each behavior; and
the number of observations for each behavior is shown at the bottom of
the graph.
DOI:
http://dx.doi.org/10.7554/eLife.04775.004
Boxplots show the distribution of age with the mean and sample size for
each category.
DOI:
http://dx.doi.org/10.7554/eLife.04775.005
The total number of genes, number of nurse-upregulated genes, and
forager-upregulated genes are shown, along with the proportion of
identified fire ant orthologs and prominent gene ontology terms enriched
for each module (see Supplementary file 4 for the full GO enrichment
profiles for each module). Both the module number and associated module
color are shown on the left.
DOI:
http://dx.doi.org/10.7554/eLife.04775.006
Figure 1—figure supplement 2.
The location of age-marked workers also changed as the workers aged,
from the nest area over the brood to outside the nest.
Boxplots show the distribution of age with the mean and sample size for
each category.
DOI:
http://dx.doi.org/10.7554/eLife.04775.005
Figure 1—figure supplement 1.
The behaviors performed by age-marked workers changed as the workers
aged, from nursing to foraging.
Boxplots show the distribution of age in days for each behavior; white
diamonds and the printed number show the mean age for each behavior; and
the number of observations for each behavior is shown at the bottom of
the graph.
DOI:
http://dx.doi.org/10.7554/eLife.04775.004
Behavioral and transcriptional changes associated with worker age and
behavior.
Numbers along the x-axis represent ages of marked worker cohorts,
starting at worker eclosion as an adult (day 0). In all plots dark green
represents greater values, while white represents lower values of the
measure being plotted. (A) Behavioral results. Workers
showed an age-dependent progression of activity, progressing from tasks
such as nursing and grooming in the nest to outside tasks such as walking
and foraging. (B) Heat map of expression levels over the
course of worker aging (higher expression in darker green), for 25 genes
most differentially expressed between nurses and foragers. The red line
separates the samples classified as ‘nurses’ by K-nearest
neighbor classification on the left, from ‘foragers’ on the
right, suggesting a distinct transition between the two categories.
(C) Correlation between patterns of expression in the 14
identified modules across worker age and behavior. The colors of the
boxes are scaled with the value of correlation coefficients, ranging from
white to dark green. On the right side of the heat map are the numbers of
genes in each module and a dendrogram showing the inferred relationships
among modules. The modules show complex patterns of expression, for
example with some most highly expressed at age 0, some showing decreasing
expression over time, and some increasing expression over time.DOI:
http://dx.doi.org/10.7554/eLife.04775.003
The behaviors performed by age-marked workers changed as the workers
aged, from nursing to foraging.
Boxplots show the distribution of age in days for each behavior; white
diamonds and the printed number show the mean age for each behavior; and
the number of observations for each behavior is shown at the bottom of
the graph.DOI:
http://dx.doi.org/10.7554/eLife.04775.004
The location of age-marked workers also changed as the workers aged,
from the nest area over the brood to outside the nest.
Boxplots show the distribution of age with the mean and sample size for
each category.DOI:
http://dx.doi.org/10.7554/eLife.04775.005
The identified modules vary in expression pattern, composition of
nurse-upregulated and forager-upregulated genes, and the proportion of
conserved genes with identified fire ant orthologs.
The total number of genes, number of nurse-upregulated genes, and
forager-upregulated genes are shown, along with the proportion of
identified fire ant orthologs and prominent gene ontology terms enriched
for each module (see Supplementary file 4 for the full GO enrichment
profiles for each module). Both the module number and associated module
color are shown on the left.DOI:
http://dx.doi.org/10.7554/eLife.04775.006
Genome and transcriptome assembly
There was a trade-off in the assemblies between N50 and overall assembly lengths, as
a function of kmer size. We chose k = 69 as a compromise between these two
metrics, resulting in a scaffolded assembly of 284 mb, with a N50 of 19.0 kb.
Although there is no M. pharaonis genome size estimate, the assembly
is in the range of genome sizes typical of other myrmicine ants (Tsutsui et al., 2008). CEGMA analysis (Parra et al., 2009) found complete sequences
for 92% of the ultra-conserved eukaryotic genes, and partial sequences for 97%. Most
reads (97.6%) could be re-mapped to the genome assembly, resulting in a coverage
estimate of 40×. Cufflinks assembly identified 22,385 transcribed loci. 74.9
± 18% (median 85.1%) of the reads for each sample could be re-mapped to
predicted transcripts extracted from the reference. After the reads were re-mapped to
the assembled transcripts using the RSEM pipeline, each library had 10,602,832
± 2,925,898 expected counts.
Gene expression differences associated with worker behavior
The complete analysis of gene expression data, including R code and output, is
available in the Supplementary
file 2 (with the complete R markdown script as Source code 1), and it is
summarized below. We wished to examine which of the four worker behavioral samples
(nursing larvae, foraging, grooming larvae, and worker–worker trophallaxis
[i.e., exchanging liquid food]) had distinct expression profiles vs all of the
others. We used linear contrasts to determine the number of differentially expressed
genes between the focal behavioral category and the other behaviors. Of these
contrasts, only foragers and nurses had significantly different gene expression
patterns, when compared to the rest, that is, there was no evidence that workers
engaged in grooming and trophallaxis had distinct transcriptional states.
Consequently, we focused subsequent analysis on nurse and forager behavioral
categories, except in the construction of the co-expression networks, where all
behavioral category and age class samples were used (see below). There were 1217
forager-upregulated, 1247 nurse-upregulated transcripts, and 14,907 transcripts that
were not differentially expressed.
Gene expression associated with age polyethism
Qualitatively, gene expression patterns mirrored the behavioral transition from
nursing to foraging that we observed around day 10 (Figure 1A,B). To quantify these observations, we used a supervised
learning approach (K-nearest neighbor classifier or KNN) to check whether genes
differentially expressed in nurses and foragers could be used to differentiate the
age class data into two clusters. After the KNN was trained on nurse and forager
profiles, it clearly separated workers into two distinct classes based on age,
assigning those younger than 12 days into the nurse class, and the rest into the
forager class (Supplementary
file 2 pages 14–15), suggesting a fairly discrete transcriptomic
transition between the two behaviors.
Gene expression conservation analyses
The proportion of genes with identified orthologs in the fire ant Solenopsis
invicta differed between behavioral categories (Manfredini et al., 2014), with forager-upregulated genes having
a higher proportion (0.54) relative to nurse-upregulated (0.43) and
non-differentially expressed (0.43) (multiple comparison Kruskal–Wallis, p
< 0.05). Similarly, the proportion of genes with identified honey beeA. mellifera orthologs was higher for forager-upregulated genes
(0.50), relative to nurse-upregulated (0.38), and non-differentially expressed genes
(0.38) (multiple comparison Kruskal–Wallis, p < 0.05) (note we used a
less conservative BLAST threshold for the honey bee so that the proportions of honey
bee and fire ant orthologs are not directly comparable, see ‘Materials and
methods’). Furthermore, approximately half of non-differentially expressed
(0.51) and nurse-upregulated (0.50) genes did not have orthologs identified in either
the fire ant or honey bee genomes, but this proportion was lower for
forager-upregulated genes (0.39); correspondingly, the proportion of
forager-upregulated genes with orthologs identified from both fire ants and honey
bees was higher (0.43) compared to nurse-upregulated and non-differentially expressed
genes (0.32) (X2 = 71.42, df = 6, p <
10−13).Genes previously detected as upregulated in nurses and foragers of S.
invicta were more likely to have identified M. pharaonis
orthologs up-regulated in these contexts as well (p = 0.0022 and p =
0.040, respectively). However, the actual percentage of genes differentially
expressed in the same context in these two ant data sets was small: 3.8% (47/1247) of
nurse genes and 3.2% (39/1217) of forager genes; or if only considering genes with
orthologs identified in both species, 8.6% (47/549) nurse genes and 5.9% (39/657)
forager genes. While there was low overlap in the lists of differentially expressed
genes, there could still be stronger overlap in genome-wide expression profiles when
comparing nurse and forager samples between S. invicta and
M. pharaonis. Thus, we estimated the correlation in the change of
expression between nurse and forager samples (i.e., log fold change) between the
S. invicta and M. pharaonis datasets for all
genes with identifiable homologs. There was a significant correlation in the change
of expression for nurse and forager samples, but one that explained only 2% of the
variance (Spearman's rho = 0.14, 6324 genes, p <
10−16).In contrast to the fire ant and pharaoh ant comparison, previously identified
forager- and nurse-upregulated honey beeA. mellifera genes (Alaux et al., 2009) were not more likely to have
M. pharaonis orthologs expressed in the same context (p =
0.99, p = 0.98, respectively), consistent with a previous comparison between
S. invicta and A. mellifera (Manfredini et al., 2014). The actual overlap in
honey bee and pharaoh ant gene lists was higher (71 nurse-upregulated genes and 46
forager-upregulated genes) due to the less conservative BLAST threshold we used for
identifying honey bee orthologs, but the honey bee lists were also larger (Alaux et al., 2009) and the overlap was not
significant.
Gene ontology analysis
Nurse-upregulated genes were strongly enriched for a range of GO terms associated
with metabolism (nearly 50 metabolism-related terms with p <
10−5; Supplementary file 3). Forager-upregulated genes had a more diffuse
signal, being relatively more weakly enriched for various GO terms, for example,
associated with signal transduction and gland morphogenesis. Forager-upregulated
genes showed a more consistent signal for underrepresented terms, for example, GO
terms associated with metabolic processes and chromatin modification (Supplementary file 3).
Modules inferred by weighted gene co-expression network analysis (WGCNA)
The number of modules produced by WGCNA can vary based on several thresholding
parameters, which we left as defaults (Supplementary file 2, pages 20–21). These settings
resulted in 14 co-expression modules, ranging in size from 83 to 4218 genes (Figure 1C; Figure 1—figure supplement 3). A module's overall expression
can be characterized by its eigengene. Correlations between eigengenes and traits in
the original data suggest the involvement of corresponding modules in these traits.
Eigengenes in two of the modules—1 and 14, which contained the most nurse and
forager genes, respectively—were strongly correlated with worker age, although
in opposite directions, suggesting their role in aging and age-based division of
labor (r = −0.95, r = 0.91 and with FDR-adjusted p-values
0.0038, 0.023, respectively) (Supplementary file 2, page 24). Other modules showed complex patterns of
age and behavior specific expression, with most of them showing a peak in expression
once or twice during the lifetime of a worker (Supplementary file 2, page
26). Interestingly, most module eigengenes switched signs during the period between 9
and 12 days, corresponding to the behavioral transition from nursing to foraging. In
other words, there appeared to be a major reprogramming step, where modules initially
showing low expression became up-regulated, while modules initially showing high
expression were down-regulated.
Figure 1—figure supplement 3.
The identified modules vary in expression pattern, composition of
nurse-upregulated and forager-upregulated genes, and the proportion of
conserved genes with identified fire ant orthologs.
The total number of genes, number of nurse-upregulated genes, and
forager-upregulated genes are shown, along with the proportion of
identified fire ant orthologs and prominent gene ontology terms enriched
for each module (see Supplementary file 4 for the full GO enrichment
profiles for each module). Both the module number and associated module
color are shown on the left.
DOI:
http://dx.doi.org/10.7554/eLife.04775.006
Forager-upregulated genes were concentrated in just a few modules, with only two
modules containing more than 100 forager-upregulated genes (Figure 1—figure supplement 3). In contrast,
nurse-upregulated genes were more widely distributed, with five modules having more
than 100 nurse-upregulated genes (Figure
1—figure supplement 3). These five modules were mainly enriched for
GO terms associated with metabolism and development (Figure 1—figure supplement 3; Supplementary file 4).
Module 5, which contained 116 nurse-upregulated genes, was also enriched for terms
associated with female gonad development, which is surprising given that M.
pharaonis workers lack ovaries and are completely sterile. The modules
containing forager-upregulated genes were enriched for a broad range of GO terms, for
example associated with regulation of signaling, development and neurogenesis, and
gene expression (Figure 1—figure supplement
3; Supplementary
file 4). The proportion of module genes with identified S.
invicta orthologs ranged from 0.28 to 0.53 (Figure 1—figure supplement 3), suggesting that in
addition to being involved in different functions, the modules are composed of
different proportions of conserved and taxonomically restricted genes.
Relationship between gene behavioral category, expression level, connectivity,
and evolutionary rate
Forager-upregulated genes were much more connected than nurse or non-differentially
expressed genes, while nurse-upregulated genes were less connected than
non-differentially expressed genes (Figure 2A)
(multiple comparison Kruskal–Wallis, p < 0.05). There was a small but
significant difference in evolutionary rate dN/dS (Figure 2C), with nurse-upregulated genes evolving more rapidly than
non-differentially expressed genes (multiple comparison Kruskal–Wallis, p
< 0.05). Nurse and forager genes were also more highly expressed (Figure 2B) than non-differentially expressed
genes (Kruskal–Wallis, p < 0.05), although this last comparison is
likely biased because differential expression is more easily detected in highly
expressed genes.
Figure 2.
Connectivity, expression, and evolutionary rate for nurse-upregulated
(blue), forager-upregulated (red), and non-differentially expressed genes
(gray).
Overall, connectivity and expression are positively correlated
(F) and negatively associated with evolutionary rate
(D and E), as expected. At the same time,
forager-upregulated genes are much more strongly connected while
nurse-upregulated genes are more loosely connected compared to
non-differentially expressed genes (A); Nurse-upregulated genes
have a small but significant increase in evolutionary rate (C);
and both forager- and nurse-upregulated genes are more highly expressed than
non-differentially expressed genes (B). The top panels show
results for all data, while the bottom panels show results only for genes
with S. invicta orthologs that had estimated evolutionary
rates.
DOI:
http://dx.doi.org/10.7554/eLife.04775.007
Connectivity, expression, and evolutionary rate for nurse-upregulated
(blue), forager-upregulated (red), and non-differentially expressed genes
(gray).
Overall, connectivity and expression are positively correlated
(F) and negatively associated with evolutionary rate
(D and E), as expected. At the same time,
forager-upregulated genes are much more strongly connected while
nurse-upregulated genes are more loosely connected compared to
non-differentially expressed genes (A); Nurse-upregulated genes
have a small but significant increase in evolutionary rate (C);
and both forager- and nurse-upregulated genes are more highly expressed than
non-differentially expressed genes (B). The top panels show
results for all data, while the bottom panels show results only for genes
with S. invicta orthologs that had estimated evolutionary
rates.DOI:
http://dx.doi.org/10.7554/eLife.04775.007Co-expression network connectivity and expression level were overall negatively
associated with evolutionary rate, such that highly connected and highly expressed
genes had decreased rates of molecular evolution (Figure 2D,E; evolutionary rate and connectivity, r = −0.15,
p < 2 × 10−16; evolutionary rate and expression,
measured in terms of transcriptional abundance, fragments per million reads mapped,
FPKM, r = −0.12, p < 2 × 10−16); and
connectivity and expression were positively correlated (r = 0.30, p < 2
× 10−16). In a full model considering how a gene's
rate of molecular evolution depended on its gene expression level, network
connectedness, and behavioral category, the largest effects were main effects of
expression (z = −7.42, p = 1.29 ×
10−13) and connectivity (z = −3.69, p =
0.00023).We also studied the effects of gene category (i.e., upregulated in nurses or
foragers, or not differentially expressed), expression level, and connectivity on
whether a given M. pharaonis gene had an identifiable fire antS. invicta and honey beeA. mellifera orthologs.
Overall, genes with orthologs in the fire ant or honey bee had greater connectivity
and expression (Figure 3, Figure 3—figure supplement 1). In considering a model
with both main and interaction effects of behavioral category, expression level, and
connectivity, connectivity had the strongest effect (glm with quasibinomial
residuals: t = 24.5, p < 10−16, for the presence of
S. invicta orthologs; t = 32.2, p <
10−16, for the presence of A. mellifera
orthologs), with more highly connected genes being more likely to have an ortholog.
There were also much smaller interaction effects indicating that nurse-upregulated
genes had fewer orthologs than expected given their connectivities (i.e.,
connectivity had a weaker effect on nurse-upregulated genes than other genes, Figure 3 and Figure 3—figure supplement 1; t = −3.17, p =
0.0015 for S. invicta orthologs; t = −2.76, p =
0.0057 for A. mellifera orthologs), and forager-upregulated genes
had fewer orthologs than expected given their expression (t = −2.33, p
= 0.02 for S. invicta orthologs; t = −2.58, p
= 0.0098 for A. mellifera orthologs; Figure 3 and Figure
3—figure supplement 1).
Figure 3.
Genes with identified fire ant orthologs were more highly connected
and expressed, but this relationship also depended on whether the gene
was nurse-upregulated (blue), forager-upregulated (red), or
non-differentially expressed (NDE, gray).
As shown in Figure 2,
forager-regulated genes were much more highly connected, and overall,
forager-upregulated genes had a higher proportion of identified fire ant
orthologs (0.54) relative to nurse-upregulated and non-differentially
expressed genes (0.43).
DOI:
http://dx.doi.org/10.7554/eLife.04775.008
Forager-upregulated genes had a higher proportion of identified honey bee
orthologs (0.50) relative to nurse-upregulated and non-differentially
expressed genes (0.38).
DOI:
http://dx.doi.org/10.7554/eLife.04775.009
Figure 3—figure supplement 1.
Very similarly to Figure 3,
genes with identified honey orthologs were more highly connected and
expressed, but this relationship also depended on whether the gene was
nurse-upregulated (blue), forager-upregulated (red), or
non-differentially expressed (NDE, gray).
Forager-upregulated genes had a higher proportion of identified honey bee
orthologs (0.50) relative to nurse-upregulated and non-differentially
expressed genes (0.38).
DOI:
http://dx.doi.org/10.7554/eLife.04775.009
Genes with identified fire ant orthologs were more highly connected
and expressed, but this relationship also depended on whether the gene
was nurse-upregulated (blue), forager-upregulated (red), or
non-differentially expressed (NDE, gray).
As shown in Figure 2,
forager-regulated genes were much more highly connected, and overall,
forager-upregulated genes had a higher proportion of identified fire ant
orthologs (0.54) relative to nurse-upregulated and non-differentially
expressed genes (0.43).DOI:
http://dx.doi.org/10.7554/eLife.04775.008
Very similarly to Figure 3,
genes with identified honey orthologs were more highly connected and
expressed, but this relationship also depended on whether the gene was
nurse-upregulated (blue), forager-upregulated (red), or
non-differentially expressed (NDE, gray).
Forager-upregulated genes had a higher proportion of identified honey bee
orthologs (0.50) relative to nurse-upregulated and non-differentially
expressed genes (0.38).DOI:
http://dx.doi.org/10.7554/eLife.04775.009
Discussion
Pharaoh ant workers showed a clearly defined age-based transition from nursing to
foraging, in terms of both behavioral and transcriptional patterns, with nurses and
foragers having strongly differentiated sets of upregulated genes (Figure 1). We recovered the commonly observed genome-wide
relationship between a gene's rate of molecular evolution, its expression level,
and its network connectivity (Krylov et al.,
2003; Hahn and Kern, 2005; Jovelin and Phillips, 2009; Ramsay et al., 2009). Specifically, the rate of molecular
evolution (dN/dS) as well as the likelihood a gene had identified fire ant and honey bee
orthologs was negatively correlated with its expression level and connectivity within
co-expression networks, while expression and connectivity were positively correlated
(Figures 2, 3). In addition to these
genome-wide patterns, nurse- and forager-upregulated genes had distinct regulatory and
evolutionary patterns relative to each other and to the rest of the transcriptome (Figures 2, 3). Most strikingly,
forager-upregulated genes were much more highly connected and correspondingly more
conserved, while nurse-upregulated genes were less connected, and more rapidly evolving
and less conserved.Previous studies of the evolutionary genetic basis of social behavior have focused on
the overlap of genes lists associated with social traits in different lineages. We found
significant but seemingly low (<4%) overlap in lists of differentially expressed
genes and the correlation in genome-wide expression profiles (r = 0.14) when
comparing gene expression in nurse and forager samples between the pharaoh ant and fire
ant, S. invicta. Such low overlap seems surprising, given that these
two ants are in closely related ant genera, having diverged on the order of 50 Mya
(Ward et al., 2014). However, the comparison
is not perfect, given substantial differences between the two studies in methodology
used to characterize the behaviors, and in the technology used to measure gene
expression (i.e., microarray vs RNA sequencing) (Manfredini et al., 2014). We did not find significant overlap between lists
of honey bee and pharaoh ant genes associated with age polyethism, consistent with
results reported by the earlier fire ant study (Manfredini et al., 2014). While we expected decreased overlap given that
honey bees and ants diverged longer ago, ∼170 Mya (Ronquist et al., 2012), and represent independent origins of
eusociality, the ant-honey bee comparison is also more problematic because the honey bee
data are based on brain gene expression profiles whereas the fire ant and pharaoh ant
data are based on whole body gene expression profiles.Past studies have often interpreted significant but similarly low overlap in lists of
genes associated with social behavior from different lineages as supporting the genetic
toolkit hypothesis (Toth et al., 2010, 2014; Woodard et
al., 2014). In contrast, other authors have recently interpreted low overlap
as being consistent with the novel social genes hypothesis, which emphasizes the
importance of taxonomically restricted genes (Ferreira
et al., 2013; Feldmeyer et al., 2014;
Sumner, 2014). The contrasting emphasis of
authors on either conserved or novel genes begs the question: what degree of
conservation in gene lists is necessary for confirmation or rejection of these two
hypotheses? For example, the fact that nurse-upregulated genes in M.
pharaonis are more rapidly evolving than the rest of the genome and that 50%
of nurse-upregulated genes do not have identifiable fire ant or honey bee orthologs
suggests that novel genes may have important nurse-specific functions. At the same time,
the significant overlap of fire ant and pharaoh ant gene lists and the strong enrichment
of nurse-upregulated genes for gene ontology terms associated with metabolism and
development suggests that conserved genes involved in core physiological processes also
play important roles in nurse function and the evolution of division of labor. Thus, our
results are generally consistent with both hypotheses. We suggest that neither of these
two hypotheses has yet been formulated in a way that is readily tested, in part because
it is unclear what precise genes are expected to be included or excluded from a genetic
toolkit (Wilkins, 2013). Furthermore, these
hypotheses are not mutually exclusive, since both conserved and novel genes likely play
roles in the evolution of all new traits (Johnson and
Linksvayer, 2010; Woodard et al.,
2011).We suggest that shifting the focus, from lists of genes to modules of co-expressed genes
in the context of genome-wide transcriptional and evolutionary patterns, can help to
elucidate how social evolution has produced social complexity. In this way, one question
we can ask is whether we see any simple molecular signature of social evolution, for
example due to kin selection? As Monomorium ant workers are obligately
sterile, all worker traits are expected to be shaped exclusively by indirect selection
(i.e., kin selection) (Hamilton, 1964).
All-else-equal, such indirect selection is weaker than direct selection, proportional to
relatedness (Hamilton, 1964), and a priori is
expected to produce relaxed selective constraint and elevated rates of molecular
evolution for all genes associated with worker traits (Linksvayer and Wade, 2009). Past studies have found different rates of
molecular evolution for worker-biased and queen-biased genes, with most studies finding
that worker-biased genes are more rapidly evolving (Ferreira et al., 2013; Feldmeyer et al.,
2014; Harpur et al., 2014; but see;
Hunt et al., 2010). Some researchers have
interpreted different patterns between lineages as being consistent with simple kin
selection predictions based on differences in within-colony relatedness (Hall and Goodisman, 2012), but most studies have
emphasized the association between conditional expression and relaxed selection (Hunt et al., 2011, 2013), as well as genes associated with worker traits simply
experiencing stronger positive selection (Hunt et al.,
2010; Ferreira et al., 2013; Feldmeyer et al., 2014; Harpur et al., 2014). We observed weakly elevated rates of
molecular evolution at nurse-upregulated genes compared to the rest of the genome, but
much more notable was the distinct connectivity and corresponding differences in gene
conservation for forager-upregulated genes relative to nurse-upregulated and
non-differentially expressed genes. These results suggest that social evolution does not
just have simple genome-wide effects such as relaxed effective selection associated with
kin selection, but instead shapes complex social traits while acting within general
systems-level constraints imposed by regulatory architecture.The common perception that social evolution often involves rapid evolutionary dynamics
(West-Eberhard, 1983; Tanaka, 1996; Moore et al.,
1997; Wolf et al., 1999; Nonacs, 2011; Bailey and Moore, 2012; Van Dyken and Wade,
2012) may result from the fact that genes influencing many key social traits
are not only conditionally expressed, but are also located peripherally within
regulatory networks, and so are relatively unconstrained. For example, we expect that
traits associated with social signal production (e.g., pheromone and glandular
secretions) are often located peripherally within regulatory networks and as a result
may be evolutionarily labile (Jasper et al.,
2015), as is the case more generally with secreted proteins (Julenius and Pedersen, 2006; Liao et al., 2010; Nogueira et
al., 2012). More core and conserved components are also certain to be
important to the expression of these traits, but their contribution to trait evolution
may be minimized by virtue of the fact that they are highly connected. These arguments
suggest how both conserved, toolkit genes, as well as rapidly evolving and taxonomically
restricted novel genes, likely play important roles in the evolution of social novelty,
with novel genes being added peripherally to regulatory networks. Our results are
consistent with this interpretation, because M. pharaonis age-based
division of labor seems to have a complex genetic basis with some components that are
highly connected and conserved, and other components that are more loosely connected and
evolutionarily labile.Our findings that nurse-upregulated genes are more rapidly evolving and less conserved
among social insect lineages relative to forager-upregulated genes suggest that nurse
traits have been a major focus of evolutionary innovation between social insect
lineages. This result seems surprising given that foragers of different lineages
experience diverse environments outside the nest compared to the relatively constant
within-nest environment experienced by nurses and could be expected to experience more
diverse selective pressures. One explanation is that the physiological mechanisms
associated with metabolically costly foraging activities and older adult life
(M. pharaonis workers usually only live several weeks [Peacock and Baxter, 1950], so that foragers which
start right before their second week of age may already be senescing) may be relatively
conserved and simple. Nursing behavior, occurring during very early adult life, may
involve more diverse physiological and developmental processes, and nursing itself may
also involve more diverse behaviors and physiological processes, such as food processing
and the synthesis of glandular secretions that are fed to larvae. Perhaps the relatively
more complex genetic architecture (less tightly connected, involving more modules, and
diverse processes) has served as less of a constraint and facilitated more evolutionary
change for nurse-related genes. If so, we predict that nurse-specific functions and
functions for early adult life may be generally more evolutionarily labile as well as
more physiologically and behaviorally labile within and across lineages than
forager-specific functions. Note that this prediction is opposite the typical
expectation that genes acting early in development have more pleiotropic effects and are
thus especially constrained (Roux and
Robinson-Rechavi, 2008; Piasecka et al.,
2013), but obligate sterility may, in part, release workers from these
constraints on the evolution of genes acting early in worker development.We identified two discrete sets of genes with distinct genetic architecture associated
with age-based division of labor. The majority of forager-upregulated genes were
contained within a single gene module (module 14; Figure 1—figure supplement 3) that was significantly positively
associated with age. Another module with expression negatively associated with age
contained the largest number of nurse genes, but nurse genes were also broadly spread
out across a number of other modules with complex expression patterns across age and
behavioral groups. Interestingly, the modules differed in the proportion of constituent
genes which had identifiable S. invicta and A.
mellifera orthologs, indicating that modules vary in the degree to which
they are composed of conserved genes and gene networks vs rapidly evolving genes with
unknown function. That said, the modules were enriched for various gene ontology terms,
providing some insight into their putative functional importance (Supplementary file 4).By explicitly studying regulatory architecture and inferring modules of tightly
connected genes in other species as well as M. pharaonis, it will be
possible to further identify what network components contribute to the expression of
social traits, how rapidly these components are evolving within populations, and how
they have contributed to phenotypic differences between divergent lineages. Building on
the genetic toolkit conceptual framework, it will be possible to ask to what degree
diverse lineages repeatedly use the same modules, and importantly approaches already
exist for quantifying module overlap in the absence of functional information (Oldham et al., 2006; Langfelder et al., 2011). Similarly, after finding non-significant
overlap in lists of genes associated with queen- and worker-caste development in paper
wasps and honey bees, Berens et al. (2014)
recently invoked a ‘looser’ version of the genetic toolkit hypothesis by
examining the overlap of inferred functional enrichment of gene lists (i.e., via gene
ontology analysis). Focus on co-expressed modules may actually improve the feasibility
of inferring the function of co-expressed genes based on observed expression patterns
together with standard functional information inferred from the subset of conserved
annotated genes with identifiable orthologs from model systems. It will also be possible
to determine the relative contribution of conserved vs taxonomically restricted genes to
co-expression modules.
Materials and methods
Colony setup
Two replicate M. pharaonis observation colonies were established,
each with 10 queens, approximately 4000 workers, and 1000 brood, representing a
random subsample of a larger source colony. Each colony was established from a
separate source colony, which came from a stock of approximately 40 colonies that
have been repeatedly mixed across generations so that they are genetically similar.
Observation nests were constructed of two pieces of 5 × 15 cm glass separated
by 1.5 mm thick plastic sheeting. Colonies were given water in cotton-plugged test
tubes, 50% honey solution, beef liver, egg yolk, and mealworms ad
libitum, replaced twice a week. Colonies were maintained at 27 C and 65%
relative humidity in climate controlled rooms at the University of Pennsylvania.Every 3 days, 600 newly eclosed callow workers, which were inferred to be
approximately 0–1 days old, were collected from 8–10 stock colonies.
These callow workers were briefly anesthetized with CO2 and individually
paint marked on the gaster with a unique age cohort color dot using a Sharpie extra
fine oil based paint pen, and then 300 were added to each of the observation
colonies. Five uniquely marked age cohorts were thus added to the colonies on days 1,
4, 7, 10, and 13 of the study. Nestmate recognition is at most weak and transient in
M. pharaonis (Schmidt et al.,
2010), and callows in particular are readily accepted. We also set up a
camera to automatically take images of the nest areas of each colony once every 20
min for the entire period of the study, although we do not further discuss these
images.Previous literature indicates that M. pharaonis workers are expected
to live 9–10 weeks (Peacock and Baxter,
1950), but our preliminary trials with our setup indicated that workers
tend to die or lose their paint marks after several weeks. We ran the study for 1
month, expecting to capture the major age-based transitions in worker behavior (e.g.,
the nursing to foraging transition observed in other species), but it is possible
that we missed late behavioral transitions that occurred towards the end of
workers' lives. In practice, such late transitions are difficult to detect as
sample size necessarily declines as increasing numbers of workers die.
Behavioral analyses
A behavioral scan of each colony was completed once each day for the duration of the
month-long study by recording the instantaneous behavior and location observed for
every visible paint-marked worker. Each behavioral scan was performed at 20×
magnification with a Nikon SMZ800 stereomicroscope. We recorded 30 distinct
behaviors, but only 15 were observed more than 15 total times during the study period
(Supplementary file
1). We defined an individual as foraging if it was observed on a food or
water source or actually carrying food (i.e., foraging included the behaviors
‘on honey’, ‘on liver’, ‘on water’, or
‘carrying food’; Supplementary file 1). Each experimental colony contained four
identifiable locations that were redefined prior to each behavioral scan: brood area,
brood periphery, remaining nest area, and foraging area. The brood area was defined
as the central area within the nest containing all brood and queens (Edwards, 1991). The nest periphery was defined
as the region directly adjacent to the brood area, where workers were dense in
aggregation but not in contact with any of the brood. The nest area was defined as
the sparsely occupied remainder of the space within the nest, not including the brood
area and nest periphery. The foraging area included all areas outside of the nest.
Analyses of behavioral data were conducted in R (www.r-project.org).
Worker sampling, genomic DNA sequencing, mRNA amplification, and RNA library
preparation
Every 3 days, whole bodies of five individuals from each available uniquely paint
marked age cohort were collected, flash frozen in liquid nitrogen, and stored at
−80°C. This sampling scheme resulted in seven groups of individuals of
known age (0, 3, 6, 9, 12, 15, and 18+ days old). 20 individuals of each of
these age category were pooled for whole body RNA extraction for each of the two
replicate observation colonies. In addition, for each of the two replicate
observation colonies, we collected and pooled 20 non-paint marked workers in the act
of the following five behaviors: nursing larvae, grooming larvae, engaged in
trophallaxis with other workers, foraging for protein (collecting egg, mealworm, or
liver), and foraging for carbohydrates (collecting honey solution). RNA was extracted
from pools of worker samples of known age or observed behavior using Qiagen RNeasy
kits with standard protocols. RNA sequencing libraries were constructed at the
University of Arizona Genetics Core (UAGC) with RNA TruSeq library construction kits
following standard protocols. In total there were 24 libraries: 2 colony replicates
× (7 age groups + 5 behavioral groups). RNA sequencing was conducted at
the University of Arizona Genetics Core on an Illumina HiSeq2000 with 100 bp paired
ends reads, with six samples multiplexed per lane, randomly distributed across four
lanes. Sequences were post-processed by cutadapt (Martin, 2011) to remove Illumina adapter sequences and ConDeTri (Smeds and Künstner, 2011) to remove
low-quality bases.
Reference genome sequencing and assembly
DNA from a single haploid male (183 ng) was used to prepare a TruSeq library, which
was sequenced in multiplex on an Illumina HiSeq 2000, yielding 70,894,179 million 100
bp read pairs. Raw genomic reads were quality and adaptor trimmed using ConDeTri and
cutadapt (Martin, 2011; Smeds and Künstner, 2011), producing
57,002,951 read pairs and 8,361,560 single reads (12.3 Gb total). The assembly was
carried out using ABYSS, with a range of kmers from 53 to 91 (Simpson et al., 2009). We then chose the assembly with the
longest N50 as the reference for transcriptome assembly. Genome assembly quality was
evaluated using the CEGMA pipeline (Parra et al.,
2009), and by re-mapping the paired end trimmed reads using bowtie2 (Langmead and Salzberg, 2012).
Reference-based transcriptome assembly, annotation and differential gene
expression analysis
The transcriptome was mapped to the reference using Tophat 2, and assembled into
transcripts using Cufflinks 2.1 (Roberts et al.,
2011; Kim et al., 2013). Gene
expression data were obtained by re-mapping the transcript reads to the extracted
transcripts using RSEM and calculating the expected counts at the gene level (Li and Dewey, 2011). When multiple isoforms of
a single locus were found, only the longest transcript was used for gene annotation.
Assembled transcripts were annotated using BLASTX from the non-redundant NCBI
database with expectation values of E = 10−5. These results
were used to assign Gene Ontology (GO) profiles with Blast2go (Conesa et al., 2005).
Differential gene expression analysis and transcriptional network
analysis
Transcript counts were filtered by abundance, removing those with less than 1
fragment per kilobase mapped (FPKM) in more than half of the libraries (Mortazavi et al., 2008). Differential gene
expression analysis was carried out in edgeR, using a GLM fit to the count data and
identifying differentially expressed genes using planned linear contrasts (Robinson et al., 2010). In order to infer
co-expression modules and gain an insight into network structure of gene
interactions, we performed a weighted gene co-expression network analysis (WGCNA) on
the count data (Langfelder and Horvath,
2008). WGCNA was conducted on the entire transcript set, after filtering out
the low-abundance transcripts. This analysis relies on patterns of gene
co-expression, but has been shown to reconstruct protein–protein interaction
networks with reasonable accuracy (Zhao et al.,
2010; Allen et al., 2012). We used
total connectivity as a measure of gene interaction strength, because it is not as
sensitive to module assignments, and most likely reflects the overall selective
pressures acting on the gene, beyond those imposed by its role in age polyethism. As
with most gene expression analysis, WGCNA provides better estimates for highly
abundant genes, and in particular for genes showing variation in their expression
levels. Consequently, low-abundance and invariant genes will show lower
connectivity.GO term enrichment analysis was performed using the R package GOstats (Falcon and Gentleman, 2007). We report GO terms
as enriched when p < 0.05.
Evolutionary rate and gene expression conservation analyses
Fire ant (S. invicta) orthologs for each gene were determined using
reciprocal best BLASTP, using cutoffs of 10−10. This
parameterization allowed for high specificity, though at the cost of sensitivity,
since paralogs were ignored (Chen et al.,
2007). These results were used to predict the M. pharaonis
coding sequence using ORFPredictor (Min et al.,
2005). We then computed the pairwise dN/dS ratios for each gene using the
branch model in PAML (v. 4.6). Using the list of differentially expressed genes in
foragers vs nest workers in the fire ant (Manfredini et al., 2014), Fisher's exact tests were used to examine
whether genes differentially expressed in these categories of workers were more
likely conserved, than expected by chance. We repeated the analysis above using honey
bee (A. mellifera) genes, except that the BLAST cutoff was lowered
to 10−5 to increase the chance of identifying orthologs in the more
divergent honey bee.To initially study whether evolutionary rate (dN/dS), connectivity (kTotal), and
expression (FPKM) differed between behavioral categories (nurse-upregulated,
forager-upregulated, and non-differentially expressed), we used a
Kruskal–Wallis test, adjusted for multiple comparisons (kruskalmc function in
the R package pgirmess). Finally, to study the main and interaction effects of
connectivity, expression, and behavioral category on evolutionary rate, we used a
linear model log transformed rate as the dependent variable, log transformed
connectivity and expression as continuous predictors, and behavioral category as a
categorical predictor.
Statistical analysis
Statistical analysis was performed with R. Means are presented ± their
standard deviations. p-value cutoffs of 0.05 were used throughout the analysis. In
the case of differential gene expression, data analyses were corrected for multiple
comparisons using the Benjamini-Hochberg (FDR) procedure (Benjamini and Hochberg, 1995).
Code and data availability
Scripts for the bioinformatic analyses, and a README explaining the workflow can be
found at https://github.com/mikheyev/monomorium-polyethism. Most of the
workflow and output is shown in Supplementary file 2, with the corresponding R script shown
in Source code 1. All
behavioral and gene expression data, including a MySQL database for the gene
expression data have been deposited to Dryad, doi:10.5061/dryad.cv0q3 (Mikheyev
and Linksvayer, 2014). Raw reads and the genome assembly are available at
the DNA Data Bank of Japan, DDBJ BioProject PRJDB3164.eLife posts the editorial decision letter and author response on a selection of the
published articles (subject to the approval of the authors). An edited version of the
letter sent to the authors after peer review is shown, indicating the substantive
concerns or comments; minor concerns are not usually shown. Reviewers have the
opportunity to discuss the decision before the letter is sent (see review
process). Similarly, the author response typically shows only responses
to the major concerns raised by the reviewers.Thank you for sending your work entitled “Genes associated with ant social
behavior show distinct transcriptional and evolutionary patterns” for
consideration at eLife. Your article has been favorably evaluated by
Diethard Tautz (Senior editor), a Reviewing Editor, and three reviewers.The Reviewing editor and the reviewers discussed their comments before we reached this
decision, and the Reviewing editor has assembled the following comments to help you
prepare a revised submission.While all three reviewers are enthusiastic about the study, several important concerns
were voiced by all of them. All reviewers pointed out that this paper does not provide
the necessary statistical details to be able to assess the quality of the work. This is
certainly the most important concern, as hypothesis acceptance/rejection—which is
the central message of the study—fully depends on gene expression analysis and
its interpretation. The respective comments are combined below and it will be necessary
to address them before a final decision about acceptance can be reached.More specifically:The authors investigate the “novel social genes” vs the “gene
toolkit” hypotheses. This is certainly interesting and worthwhile. However, one
general issue I have with this (and most other) studies on the topic is that it is not
clear what “exactly” constitutes evidence for one hypothesis or the
other.For example, the authors here state that the percentage of genes differentially
expressed in the same context in the two ants is small (∼3%). However, the
overlap is marginally significant (fifth paragraph of the Results section). So the
question is, what percent overlap exactly would constitute support for the toolkit
hypothesis? 50%? 25%? 10% 5%?The point is that it isn't clear how and when authors reject or fail to reject
the toolkit/novel hypotheses because there is never an explicit significance or overlap
threshold provided. The authors are not alone in having to deal with this issue, and so
I don't want to lay this completely at their feet, but it would be nice if they
stated explicitly what exactly would constitute evidence, or lack thereof, for the
toolkit hypothesis somewhere before they get to the results.Related to this issue, I wanted more detail about the analyses comparing expression
patterns between taxa. Did the authors just ask if the same genes were differentially
expressed between behavioral types in the two taxa to be viewed as
'consistent' with the toolkit model? Or did a particular gene have to be
differentially expressed in the same direction in both taxa? (The latter, I think,
although this wasn't clear.) And did a gene need to be significantly
differentially expressed to be counted? Or was it sufficient that it show the same
directionality in expression, regardless of significance? Directionality without
significance is as important as significance, given that the studies in the taxa had
different power, used different methods, etc. These are not trivial issues and they may
affect the outcome and interpretation of the results. I urge the authors to look into
this more closely.In the third paragraph: The authors state they are interested in the “molecular
mechanisms of social interactions (e.g. social signal production, reception and
response)”. They refer to genes related to social behavior throughout the
manuscript. But are their expression profiles indeed reflecting social information
producing/processing skills? Or morphological changes related to other functions, such
as exposure to exterior environments? They do not state they used whole ants or just ant
heads for transcriptome profiling, which is highly crucial for interpreting the results
(especially given that some other studies have used the animal head).In the third paragraph of the Results section: “Of these contrasts, only foragers
and nurses had significantly different gene expression patterns.” This is not
well explained. Four categories are compared and 2 of these are said to have different
expression. What is the reference here; are the other two categories (grooming and
trophallaxis) not different from these two? Or perhaps I am missing something?This can be partly followed in the Supplementary Material, but should be referred in the
main text, e.g. it seems as if all samples were grouped together in the DE analysis. The
foragers and nurses were most different as they represent the youngest and oldest. I
would have stated this explicitly.I am also concerned somewhat about the PCA in the Supplementary Material: There seems to
be two groups emerging, but this is likely technical (I would guess sample processing
dates). It might be difficult to control for this, but if possible, could improve DE
analysis significantly.In the third paragraph of the Results section: “There were 1217 forager- and 1247
nurse-upregulated genes”. What was the p-value cutoff? How did the authors
control for multiple-testing? (This can also be followed in the Supplementary Material,
but should be referred in the main text.)In the fourth paragraph of the Results section: “(…) it separated workers
into two distinct classes based on age”. If I understand what was done, I think
the authors might be overinterpreting: the algorithm will separate the profiles into 2
classes if k=2, and n classes if k=n. Thus, without additional analysis I
think one cannot decide on the existence of distinct classes. The authors could consider
applying some other test; e.g. check the slope of the expression-age curve.In the fifth and sixth paragraphs of the Results section: In the gene expression
conservation analysis, we are given no information how many genes are used in the
comparisons (i.e, the number of genes showing DE in both this and the other datasets, as
well as background genes). If the numbers are low, they could instead check the effect
size of orthologous genes identified as DE for honeybee, for example. Was the honeybee
data generated by Manfredini et al., 2014? If not, the authors should state that.Most importantly, if the honeybee data was generated from the brain (as done by quite a
few studies) and the data in this study from the whole body, this could also be a reason
for finding limited overlap.In comparisons with the Fisher's exact test, it would be useful to state what the
background is (non-DE genes, genes up-regulated in the other category, or both?).The expression “whether genes differentially expressed in these categories of
workers were more likely conserved” is a bit confusing, as it also implies
sequence conservation, but I think the authors mean conservation with respect to
correlated changes.In the seventh paragraph of the Results section: Connectivity—this could be more
explicitly defined, such as emphasizing that the prediction comes from transcription
data correlations (e.g. not protein-protein interaction data), and that it depends on
how the modules are defined. I think the authors could also discuss potential biases
here. Depending on the signal/noise ratio of a gene and the module size, how would
connectivity be affected? One would want to make sure that these factors are not
influential on the reported result.Figure 2: Would it not be informative to add a
violin plot (similar to A and B) for dN/dS? Especially so, as lower conservation among
up-regulated genes is one of the paper's main points. But no information is given
regarding the magnitude of the effect. The authors could also plot expression versus
connectivity.In the sixth paragraph of the Results section: There is little discussion on the GO
analysis. Does the UV response pathway have to do with sudden exposure to the sun? At
least would one not expect to see the same pathway up-regulated in foragers of other
taxa?Please indicate the p-value cutoffs for the GO analysis. This is also found in the
Supplementary Material, but should be in the main text or Methods.It would be helpful if the authors addressed the following:What is the estimated genome size? What was the CEGMA assembly score for the de novo
genome assembly? What was the average coverage per sample for the genomic and
transcriptomic data?The main conclusion that “genes unregulated in foragers and nurses were on
average less connected and more rapidly evolving” (ninth paragraph of the Results
section) relies heavily on the assumption that they are working with a high-quality
transcriptome and that their orthology assignments are correct.How did they evaluate this? A table with summary statistics would be very useful. How
many transcripts had homology to the fire ant and/or the honey bee? How was the paralog
problem dealt with, particularly with respect to the molecular evolution analyses?Similarly, for the network analyses: Were these co-expression networks calculated only
on significant transcripts or on all transcripts? How was a significant
“network” determined? Two of the modules had > 8000 transcripts in
each of them. Does that mean all 8000 transcripts show tightly-correlated expression
levels?Finally, why didn't the authors include Polistes in their
comparative analyses? There are at least 2 studies on Polistes, both of
which are already cited in this manuscript. This seems like it would be another
independent data point worth discussing.While all three reviewers are enthusiastic about the study, several important
concerns were voiced by all of them. All reviewers pointed out that this paper does
not provide the necessary statistical details to be able to assess the quality of the
work. This is certainly the most important concern, as hypothesis
acceptance/rejection—which is the central message of the
study—fully depends on gene expression analysis and its
interpretation. The respective comments are combined below and it will be necessary
to address them before a final decision about acceptance can be reached.The statistical analysis for this project, as for most other bioinformatics projects, is
quite complex, and we chose to present the complete analytical pipeline and data as a
supplement to make it completely reproducible. We may have relied too much on our
supplement containing the complete analysis, omitting details from the main text. In
this revision we spent a lot of time addressing statistical questions and included the
relevant details in the main text. We also added several additional analyses.More specifically:The authors investigate the “novel social genes” vs the
“gene toolkit” hypotheses. This is certainly interesting and
worthwhile. However, one general issue I have with this (and most other) studies on
the topic is that it is not clear what “exactly” constitutes evidence
for one hypothesis or the other.For example, the authors here state that the percentage of genes differentially
expressed in the same context in the two ants is small (∼3%). However, the
overlap is marginally significant (fifth paragraph of the Results section). So the
question is, what percent overlap exactly would constitute support for the toolkit
hypothesis? 50%? 25%? 10% 5%?The point is that it isn't clear how and when authors reject or fail to
reject the toolkit/novel hypotheses because there is never an explicit significance
or overlap threshold provided. The authors are not alone in having to deal with this
issue, and so I don't want to lay this completely at their feet, but it would
be nice if they stated explicitly what exactly would constitute evidence, or lack
thereof, for the toolkit hypothesis somewhere before they get to the
results.We agree completely. We struggled to come up with a satisfying approach to
quantitatively test these hypotheses, but finally came to the conclusion that, as
currently described in the literature, both the “genetic toolkit” and
“novel social genes” hypotheses are somewhat ambiguous, such that it is
not currently possible to really test these hypotheses. Similarly, a recent book chapter
by Adam Wilkins (2013) points out that the
concept of a developmental genetic toolkit is widely accepted, but it is also very
unclear what actually makes up the genetic toolkit, so that the hypothesis is very
difficult to actually test. Because these two hypotheses are entrenched in the
literature—and the idea of a genetic toolkit underlying social behavior is
frequently cited as the major finding of sociogenomic
research—we describe them in the Introduction to motivate our work. In the
Discussion, we explain how our results fit with these two hypotheses, but then we move
on to our main, and we believe, most exciting point: the genes we identified as being
associated with ant division of labor show different patterns of connectivity and
evolutionary conservation, and thus the genetic architecture of ant division of labor
includes both highly connected and conserved genes as well as more loosely-connected and
evolutionarily labile genes.Related to this issue, I wanted more detail about the analyses comparing
expression patterns between taxa. Did the authors just ask if the same genes were
differentially expressed between behavioral types in the two taxa to be viewed as
'consistent' with the toolkit model? Or did a particular gene have to
be differentially expressed in the same direction in both taxa? (The latter, I think,
although this wasn't clear.) And did a gene need to be significantly
differentially expressed to be counted? Or was it sufficient that it show the same
directionality in expression, regardless of significance? Directionality without
significance is as important as significance, given that the studies in the taxa had
different power, used different methods, etc. These are not trivial issues and they
may affect the outcome and interpretation of the results. I urge the authors to look
into this more closely.Following this suggestion we conducted the suggested analysis using the fire ant data.
When we look at just the direction of expression, or the fold-change in expression
across the genome, we find results very similar to those obtained when we only consider
differentially expressed genes. Whereas we previously found a 3.2-3.8 percent overlap in
differentially expressed genes between the fire ant and pharaoh ant data set, the
analysis based on the correlation of log fold change between nurse and forager samples
in the two species involves thousands of genes, but explains only 2% of the variance.
So, the results of the two analyses are consistent. This significant but seemingly low
overlap is also consistent with previous studies interpreted as supporting the genetic
toolkit hypothesis. For example, Woodard et al.
(2014) found 18 brain-expressed genes associated with feeding behavior in both
the bumblebee Bombus terrestris and the paper waspPolistes
metricus, which was more than expected by chance. Considering only genes
with identifiable homologs between the two study species and A.
mellifera, this corresponds to an overlap of 7.5% (18/239), but considering
all 2,563 identified B. terrestris feeding-associated genes, this
corresponds to an overlap of only 0.7% (18/2563).In the third paragraph: The authors state they are interested in the
“molecular mechanisms of social interactions (e.g. social signal production,
reception and response)”. They refer to genes related to social behavior
throughout the manuscript. But are their expression profiles indeed reflecting social
information producing/processing skills? Or morphological changes related to other
functions, such as exposure to exterior environments? They do not state they used
whole ants or just ant heads for transcriptome profiling, which is highly crucial for
interpreting the results (especially given that some other studies have used the
animal head).We agree that these are very important points. We used whole ant bodies and we emphasize
this point more in the revised text. We also agree that the differentially expressed
genes we observed may be related to a wide range of physiological and behavioral
processes, and not just social signal production and response. The underlying
physiological processes are particularly intermeshed because by definition, age-based
division of labor in social insects means that the individuals are aging as they
transition from nursing to foraging tasks. We have clarified these issues in the revised
text.In the third paragraph of the Results section: “Of these contrasts, only
foragers and nurses had significantly different gene expression patterns.”
This is not well explained. Four categories are compared and 2 of these are said to
have different expression. What is the reference here; are the other two categories
(grooming and trophallaxis) not different from these two? Or perhaps I am missing
something?This can be partly followed in the Supplementary Material, but should be
referred in the main text, e.g. it seems as if all samples were grouped together in
the DE analysis. The foragers and nurses were most different as they represent the
youngest and oldest. I would have stated this explicitly.There were four behavioral categories: foragers, nurses, and workers engaged in
trophallaxis and grooming. We wished to see which of the categories were
transcriptionally distinct from the others, and made comparisons of the focal category
vs all of the others. This section was indeed unclear and we re-wrote it with extra
detail.I am also concerned somewhat about the PCA in the Supplementary Material: There
seems to be two groups emerging, but this is likely technical (I would guess sample
processing dates). It might be difficult to control for this, but if possible, could
improve DE analysis significantly.All the samples were processed simultaneously and in the same facility. We cross-checked
the samples, and the two groups don’t correspond to sequencing lanes or to any
other discernable categories. The within-sample replicates, except for the loosely
defined behavioral category ‘trophallaixis’ are close to each other,
suggesting that the biological signal is strong in the data. Most importantly, the
separation between the clusters does not correlate with any biological signal detected
in the data. Finally, worker samples on the first day of eclosion (age 0) are an outlier
on this plot. As these workers are just beginning their transition to adult life, with
numerous physiological changes, such as cuticle hardening, it is logical to expect that
they would be separate from the others. If we discount these points, there is far less
evidence for the existence of separate groups in the scatter.In the third paragraph of the Results section: “There were 1217 forager-
and 1247 nurse-upregulated genes”. What was the p-value cutoff? How did the
authors control for multiple-testing? (This can also be followed in the Supplementary
Material, but should be referred in the main text.)We followed the standard practice of 0.05 cutoff after using FDR to adjust for multiple
comparisons. We now specify this in the new Statistical Analysis section of the
Methods.In the fourth paragraph of the Results section: “(…) it separated
workers into two distinct classes based on age” If I understand what was done,
I think the authors might be overinterpreting: the algorithm will separate the
profiles into 2 classes if k=2, and n classes if k=n. Thus, without
additional analysis I think one cannot decide on the existence of distinct classes.
The authors could consider applying some other test; e.g. check the slope of the
expression-age curve.We were interested in testing whether our temporal data could be classified into two
groups, corresponding to nurses and foragers, and, if so, where would the transition
point be. In the analysis, k=2 refers not to the number of classes, but to the
number of nearest neighbors considered by the k-nearest neighbor classifier to make the
distinction. We obtain basically the same result with k=2 and k=3, which
is within the range of the rule-of-thumb recommendations (references listed in KNN
section of the complete analysis). We also include a simple cluster diagram as Figure 1—figure supplement 1, to show that
the there is indeed a breakpoint before day 12. We have also clarified the logic behind
this analysis in the text.In the fifth and sixth paragraphs of the Results section: In the gene expression
conservation analysis, we are given no information how many genes are used in the
comparisons (i.e, the number of genes showing DE in both this and the other datasets,
as well as background genes). If the numbers are low, they could instead check the
effect size of orthologous genes identified as DE for honeybee, for example. Was the
honeybee data generated by Manfredini et al., 2014? If not, the authors should state
that.We added the numbers of genes, which were indeed low, and a parallel analysis using
effect sizes (log fold expression change) for fire ants, which produced the same result.
Unfortunately, for the honey bee study, only the list of differentially expressed genes
is publicly available, so that we could not perform a similar analysis for the honey bee
data set. The unavailability of previous data sets is one reason that we have strived to
make our full analysis available and completely transparent, as well as making all of
the data available.Most importantly, if the honeybee data was generated from the brain (as done by
quite a few studies) and the data in this study from the whole body, this could also
be a reason for finding limited overlap.In comparisons with the Fisher's exact test, it would be useful to state
what the background is (non-DE genes, genes up-regulated in the other category, or
both?).The expression “whether genes differentially expressed in these
categories of workers were more likely conserved” is a bit confusing, as it
also implies sequence conservation, but I think the authors mean conservation with
respect to correlated changes.The honeybee data was indeed generated from the brain, and this is certainly a plausible
reason for the low observed overlap. We have added this very important point to our
discussion, qualifying our conclusions in this light. In the revised manuscript we focus
much less on the proportional overlap of the pharaoh ant and fire ant/honey bee data
sets, because of these methodological limitations of the comparisons, and as discussed
above, it is difficult to interpret what exactly these overlaps mean.In the seventh paragraph of the Results section: Connectivity—this could
be more explicitly defined, such as emphasizing that the prediction comes from
transcription data correlations (e.g. not protein-protein interaction data), and that
it depends on how the modules are defined. I think the authors could also discuss
potential biases here. Depending on the signal/noise ratio of a gene and the module
size, how would connectivity be affected? One would want to make sure that these
factors are not influential on the reported result.We added a more detailed description of WGCNA to the Methods section, including
citations to studies evaluating WGCNA performance. For our measure of connectivity, we
used the total connectivity of a gene, which is less sensitive to how modules are
defined, and most likely reflect the overall role of the gene, beyond the modules we
detect in our data set. Although the authors of the WGCNA package suggest that the
method can run on normalized count data, in the course of considering potential biases,
we found the gene lengths varied somewhat among behavioral categories. We then re-ran
the analysis using FPKM data, which are length-standardized. This analysis also captured
major network effects, but had a much better fit to the data. In particular, we had to
use a smaller soft thresholding level before an approximately scale-free topology of the
network was observed (a WGCNA requirement). We were also able to detect many more
modules at the same cutoff levels, suggesting greater network resolution.The most obvious remaining bias from this sort of analysis is that genes with low
expression and low variability will not be detected as differentially expressed. Indeed,
this effect can clearly be seen in Figure 2B,
with the average expression level of non-differentially expressed genes is lower.
However, contrary to what you would expect if the pattern was driven by this bias, the
nurse-upregulated genes show lower connectivity than non-differentially expressed genes
(Figure 2A). We also explicitly control for
expression level by including expression level in our GLM analyses.: Would it not
be informative to add a violin plot (similar to A and B) for dN/dS? Especially so, as
lower conservation among up-regulated genes is one of the paper's main points.
But no information is given regarding the magnitude of the effect. The authors could
also plot expression versus connectivity.We have added each of these suggested plots and have also added plots showing the
effects of connectivity and expression on whether genes had identifiable fire ant and
honey bee orthologs (Figure 3).In the sixth paragraph of the Results section: There is little discussion on the
GO analysis. Does the UV response pathway have to do with sudden
exposure to the sun? At least would one not expect to see the same pathway
up-regulated in foragers of other taxa?Please indicate the p-value cutoffs for the GO analysis. This is also found in
the Supplementary Material, but should be in the main text or Methods.We have added further discussion of the GO analysis and include the full set of GO terms
enriched for both the forager- and nurse-upregulated genes as well as each module
separately. We do not dwell on GO terms since approximately 50% of all genes in our
analysis do not even have identifiable orthologs, so we cannot generally be confident
about inferring function of differentially-expressed genes or gene modules.It would be helpful if the authors addressed the following:What is the estimated genome size? What was the CEGMA assembly score for the de
novo genome assembly? What was the average coverage per sample for the genomic and
transcriptomic data?As we mention in response to Reviewer 1, a genome project was not the goal of this
study. That being said, as the quality of the reference genome assembly is important to
this study, we now include quality control statistics in the Results, such as coverage
statistics and CEGMA, as requested. Unfortunately, there is no independently estimated
size of the M. pharaonis genome.The main conclusion that “genes unregulated in foragers and nurses were
on average less connected and more rapidly evolving” (ninth paragraph of the
Results section) relies heavily on the assumption that they are working with a
high-quality transcriptome and that their orthology assignments are
correct.How did they evaluate this? A table with summary statistics would be very
useful. How many transcripts had homology to the fire ant and/or the honey bee? How
was the paralog problem dealt with, particularly with respect to the molecular
evolution analyses?As with all studies involving gene orthology, particularly with-non models species,
there is no genome-wide gold standard that allows the performance of a method to be
evaluated. Because the data set is based on transcriptomic data, there will necessarily
be false negatives associated with poorly expressed transcripts. That being said, our
choice of reciprocal best hit is well justified, based on comparisons of various
available methods, as it has high specificity, though at the cost of sensitivity, as it
ignores paralogous relationships for genes that have been duplicated in one of the
lineages (Chen et al., 2007).Another important consideration is that we used an independently assembled genome to
estimate evolutionary rates. Consequently, our measurements of nucleotide-level changes
were not dependent on the quality of the transcriptome, except that some regions of the
gene may be missing due to poor transcript coverage. So, transcript quality should not
bias the evolutionary rate estimates.Similarly, for the network analyses: Were these co-expression networks
calculated only on significant transcripts or on all transcripts? How was a
significant “network” determined? Two of the modules had > 8000
transcripts in each of them. Does that mean all 8000 transcripts
show tightly-correlated expression levels?The networks were calculated using all transcripts and all samples. There are no
measures of module significance per se, and the actual module structure may change if
different cutoffs are chosen in the analysis (see Supplementary File 2).
Indeed, in the current analysis, the number of modules is greater than the original
analysis. The effect of cutoff on the number of modules can be seen in the Supplementary File 2. As
described above, we use WGNA to estimate total network connectivity (i.e.
“kTotal” in WGCNA) for each node (i.e. gene), and present the module trait
correlation heat map as a low-dimensional visualization of the underlying
transcriptional changes.As we discuss in the manuscript, the true measure of a module’s validity may be
its repeatability across different studies within and between lineages. Thus, it will be
important to cross-check the modules found in our data with future studies conducted
using comparable methodology. This is another reason we are including the complete data
with our analysis.Finally, why didn't the authors include Polistes in
their comparative analyses? There are at least 2 studies on
Polistes, both of which are already cited in this manuscript. This seems like
it would be another independent data point worth discussing.The Polistespaper wasp studies focus on reproductive division of labor
between queens and workers and do not include data on worker age polyethism, and are
thus not directly comparable with our study. Even though age polyethism is thought to be
widespread in social insects, the transcriptomic basis of age polyethism has only been
studied in the honey beeApis mellifera and the fire antSolenopsis invicta, with the fire ant study only comparing the two
extremes of workers inside the nest and workers outside the nest (Manfredini et al., 2014).
Authors: Daniel F Simola; Lothar Wissler; Greg Donahue; Robert M Waterhouse; Martin Helmkampf; Julien Roux; Sanne Nygaard; Karl M Glastad; Darren E Hagen; Lumi Viljakainen; Justin T Reese; Brendan G Hunt; Dan Graur; Eran Elhaik; Evgenia V Kriventseva; Jiayu Wen; Brian J Parker; Elizabeth Cash; Eyal Privman; Christopher P Childers; Monica C Muñoz-Torres; Jacobus J Boomsma; Erich Bornberg-Bauer; Cameron R Currie; Christine G Elsik; Garret Suen; Michael A D Goodisman; Laurent Keller; Jürgen Liebig; Alan Rawls; Danny Reinberg; Chris D Smith; Chris R Smith; Neil Tsutsui; Yannick Wurm; Evgeny M Zdobnov; Shelley L Berger; Jürgen Gadau Journal: Genome Res Date: 2013-05-01 Impact factor: 9.043
Authors: Benjamin E R Rubin; Beryl M Jones; Brendan G Hunt; Sarah D Kocher Journal: Philos Trans R Soc Lond B Biol Sci Date: 2019-06-03 Impact factor: 6.237
Authors: Christopher B Cunningham; Kathryn VanDenHeuvel; Daven B Khana; Elizabeth C McKinney; Allen J Moore Journal: Biol Lett Date: 2016-04 Impact factor: 3.703
Authors: Solenn Patalano; Anna Vlasova; Chris Wyatt; Philip Ewels; Francisco Camara; Pedro G Ferreira; Claire L Asher; Tomasz P Jurkowski; Anne Segonds-Pichon; Martin Bachman; Irene González-Navarrete; André E Minoche; Felix Krueger; Ernesto Lowy; Marina Marcet-Houben; Jose Luis Rodriguez-Ales; Fabio S Nascimento; Shankar Balasubramanian; Toni Gabaldon; James E Tarver; Simon Andrews; Heinz Himmelbauer; William O H Hughes; Roderic Guigó; Wolf Reik; Seirian Sumner Journal: Proc Natl Acad Sci U S A Date: 2015-10-19 Impact factor: 11.205
Authors: Krista K Ingram; Deborah M Gordon; Daniel A Friedman; Michael Greene; John Kahler; Swetha Peteru Journal: Proc Biol Sci Date: 2016-08-31 Impact factor: 5.349
Authors: Christopher B Cunningham; Lexiang Ji; R Axel W Wiberg; Jennifer Shelton; Elizabeth C McKinney; Darren J Parker; Richard B Meagher; Kyle M Benowitz; Eileen M Roy-Zokan; Michael G Ritchie; Susan J Brown; Robert J Schmitz; Allen J Moore Journal: Genome Biol Evol Date: 2015-10-09 Impact factor: 3.416
Authors: Chris R Smith; Sara Helms Cahan; Carsten Kemena; Seán G Brady; Wei Yang; Erich Bornberg-Bauer; Ti Eriksson; Juergen Gadau; Martin Helmkampf; Dietrich Gotzek; Misato Okamoto Miyakawa; Andrew V Suarez; Alexander Mikheyev Journal: Mol Biol Evol Date: 2015-07-29 Impact factor: 16.240