Literature DB >> 32489603

Caterpillars on a phytochemical landscape: The case of alfalfa and the Melissa blue butterfly.

Matthew L Forister^1,2, Su'ad A Yoon^1,2, Casey S Philbin^2,3, Craig D Dodson^2,3, Bret Hart⁴, Joshua G Harrison⁵, Oren Shelef⁶, James A Fordyce⁷, Zachary H Marion⁸, Chris C Nice⁹, Lora A Richards^1,2, C Alex Buerkle⁵, Zach Gompert¹⁰.

Abstract

Modern metabolomic approaches that generate more comprehensive phytochemical profiles than were previously available are providing new opportunities for understanding plant-animal interactions. Specifically, we can characterize the phytochemical landscape by asking how a larger number of individual compounds affect herbivores and how compounds covary among plants. Here we use the recent colonization of alfalfa (Medicago sativa) by the Melissa blue butterfly (Lycaeides melissa) to investigate the effects of indivdiual compounds and suites of covarying phytochemicals on caterpillar performance. We find that survival, development time, and adult weight are all associated with variation in nutrition and toxicity, including biomolecules associated with plant cell function as well as putative anti-herbivore action. The plant-insect interface is complex, with clusters of covarying compounds in many cases encompassing divergent effects on different aspects of caterpillar performance. Individual compounds with the strongest associations are largely specialized metabolites, including alkaloids, phenolic glycosides, and saponins. The saponins are represented in our data by more than 25 individual compounds with beneficial and detrimental effects on L. melissa caterpillars, which highlights the value of metabolomic data as opposed to approaches that rely on total concentrations within broad defensive classes.

Entities: CellLine Chemical Disease Species

Keywords: Lycaeides melissa; Medicago sativa; metabolomics; plant defense; specialization

Year: 2020 PMID： 32489603 PMCID： PMC7246198 DOI： 10.1002/ece3.6203

Source DB: PubMed Journal: Ecol Evol ISSN： 2045-7758 Impact factor: 2.912

INTRODUCTION

One of the conceptual pillars of trophic ecology is the idea that herbivores must overcome the barrier of plant defensive chemistry before extracting the nutrients necessary for growth and reproduction (Feeny, Rosenthal, & Berenbaum, 1992). The success of this idea is reflected in several areas of research that include coevolution (Agrawal, Petschenka, Bingham, Weber, & Rasmann, 2012), ecological specialization (Dyer, 1995), and nutrient flow in ecosystems (Hättenschwiler & Vitousek, 2000). In most cases, progress has been made by chemical ecologists focusing on small subsets of the specialized metabolites produced by plants and consumed by herbivores. The focus on a few charismatic molecules or classes of compounds, such as furanocoumarins (Berenbaum, 1983) or cardiac glycosides (Zalucki, Brower, & Alonso‐M, 2001), was at least in part necessitated by early methods in natural products chemistry that were targeted and not easily optimized for the discovery of large suites of co‐occurring metabolites (Dyer et al., 2018; Maag, Erb, & Glauser, 2015). As technological limitations have dissipated, the opportunity now exists for a more comprehensive understanding of the challenges faced by herbivores, with the possibility of discovering, among other things, novel compounds and synergistic interactions among compounds (Prince & Pohnert, 2010; Richards, Dyer, Smilanich, & Dodson, 2010; Sardans, Penuelas, & Rivas‐Ubach, 2011). More generally, an important task is to quantify the phytochemical complexity of the antagonistic interaction between plants and herbivores, with an eye toward understanding constraints on the evolution of both players (Fordyce & Nice, 2008; Macel, van Dam, & Keurentjes, 2010) and predicting the formation of new plant‐herbivore interactions (Erbilgin, 2018). Here we use the example of a specialized herbivore and a recently colonized host plant to investigate the phytochemical landscape from the perspective of developing caterpillars. By the "phytochemical landscape" we mean metabolomic variation among individual plants and associated toxic and nutritional effects on, in our case, a focal herbivore (Glassmire et al., 2019; Hunter, 2016; Wu, Wilson, Chang, & Tian, 2019). The Melissa blue butterfly, Lycaeides melissa, is specialized on larval host plants in the pea family (Fabaceae), primarily in the genera Astragalus and Lupinus. Within the last 200 years, L. melissa has colonized introduced alfalfa, Medicago sativa (Fabaceae), at least twice and probably multiple times (Chaturvedi et al., 2018), forming a heterogeneous patchwork of association throughout the range of the butterfly in western North America, often with naturalized or weedy patches of M. sativa. In general, M. sativa is a suboptimal host plant for L. melissa: individuals that feed on the plant have reduced survival and performance relative to individuals feeding on native hosts (Forister, Nice, Fordyce, & Gompert, 2009). M. sativa‐associated populations do, however, show evidence for a slight increase in the ability to develop on the novel resource relative to populations that remain on native plants (Gompert et al., 2015). Additional evolutionary change in populations associated with the novel host is evidenced by reduced female oviposition preference for native hosts (Forister et al., 2012) and reduced caterpillar performance on native hosts (relative to populations that have not shifted to the exotic) (Gompert et al., 2015). The genetic architecture of host use in this system is known to be polygenic and characterized by loci with conditionally neutral (host‐specific) effects and ongoing local adaptation (Gompert et al., 2015). What is needed next is an understanding of which plant traits most affect L. melissa fitness. Previous work has suggested that phytochemical variation among host populations is biologically significant for caterpillars eating M. sativa (Harrison et al., 2016), but the magnitude of these effects and the salient compounds are unclear. Moreover, caterpillars do not encounter compounds in isolation, but in combinations of covarying molecules, and it is unknown how variation among hosts in phytochemical mixtures affects herbivore evolution. For example, will the trajectory of further local adaptation by L. melissa to M. sativa be a matter of evolving the ability to detoxify one or a large number of compounds? A better understanding of how key compounds covary among individual plants could also shed light on the potential for evolutionary response of the plant to herbivores in its introduced North American range. Here we use a common garden approach and caterpillars individually reared in a controlled environment to address these questions while describing the effects of metabolomic variation in M. sativa on L. melissa.

METHODS

Plants and caterpillars

Plants used in this project were grown at the University of Nevada, Reno, Main Station experimental farm. The common garden was planted in 2016 with seeds collected the previous year from 45 plants (previously studied by Harrison et al. (2018)) growing in a fallow field in north‐western Nevada on the western edge of the Great Basin Desert. The focal butterfly, L. melissa, was present in the source field but has not colonized the university farm where experimental plants were grown. The 45 maternal plants each contributed 15 offspring to a randomized grid design in the common garden, irrigated with broadcast sprayers in 2016 and drip in 2017, without supplemental fertilization. A single plant was randomly selected from each maternal family for use in the rearing experiment reported here as a way to capture as much genetic and phenotypic variation as possible. On 17 and 18 July 2017, a total of 45 L. melissa females were collected from an alfalfa‐associated population near Verdi, NV, and confined to oviposition arenas (three females per arena, 500 ml plastic cups) with host plant leaves and mesh lids sprayed with Gatorade®, a sports drink with sugar, water, carbohydrates, salt, and other ingredients that has been used elsewhere for captive butterflies (Mattila & Otis, 2003). After 3 days, eggs were removed from leaves, pooled, and kept at room temperature until hatching, at which time caterpillars were placed individually in Petri dishes (100 × 25 mm) with leaves of a particular M. sativa individual (which became the only plant from which they were fed throughout the experiment). Ten caterpillars were assigned to each of the 45 experimental M. sativa plants (for a total of 450 independently reared caterpillars) and kept in a growth chamber set to 25ºC and a 12 hr light/12 hr dark cycle. Caterpillars were given new, undamaged leaves as needed, approximately every 2–3 days. From each caterpillar we recorded survival to adult, sex, date of eclosion (if successful), and adult weight to the nearest 0.01 mg on a Mettler Toledo XP26 microbalance. Adult weight is taken as a proxy for fitness in L. melissa (Forister et al., 2009).

Phytochemistry and plant traits

Metabolomic variation among individual plants was characterized with liquid chromatography–mass spectrometry (LC‐MS) (Jorge, Mata, & António, 2016) using leaves collected on a single day at the start of the rearing experiment (as described above, one plant was randomly selected from each of 45 maternal lines in a common garden). Leaves were taken haphazardly from four different stems, avoiding the youngest and oldest leaves, and combined in a single paper collection envelope; we also avoided damaged leaves, although plants were exposed to constant, low levels of natural herbivory from insect and small mammal herbivores before and during the experiment (thus the present study does not address plasticity of defense in response to herbivore attack). Vacuum‐dried, ground leaves (10 mg) were extracted in 2 ml of 70% aqueous ethanol, and injected into an Agilent 1,200 analytical high performance liquid chromatograph paired with an Agilent 6,230 Time‐of‐Flight mass spectrometer via an electrospray ionization source. Resulting chromatograms were analyzed using MassHunter Quantitative Analysis (v.B.06.00, Agilent, Santa Clara, CA), and major classes of compounds were identified using characteristic relative mass defects (Ekanayaka, Celiz, & Jones, 2015), as described further in Appendix 1. Leaf protein content was quantified with three replicates (~2 mg each) per plant using the Bicinchoninic acid assay (Pierce Biotechnology, Waltham, MA). Before grinding, five dried leaflets from each sample were weighed to the nearest 0.1 mg, scanned, and area was measured using ImageJ (v.1.52a); specific leaf area (SLA) was calculated as leaf area divided by dry mass. Finally, leaf toughness was measured on fresh material in the common garden, at the start of the experiment (mid‐July, when leaves were also sampled for chemistry and protein) and at the end of the experiment (mid‐August), from three leaves per plant at each date, with a penetrometer (Chatillon 516 Series) through the center of the middle leaflet, as in (Harrison et al., 2018); the three leaves were selected haphazardly, avoiding the oldest and youngest leaves. Leaf toughness (averaged across the three leaves per plant at each collection) was correlated between early and late in the season (r = 0.36), but we focus on the measurements taken at the first time point in subsequent analyses for consistency with samples taken at that time for metabolomics.

Overview of analyses of plant traits and caterpillar performance

Our analytical strategy to understand the association between phytochemical variation and caterpillar performance followed two complementary paths, one focusing on reducing the number of variables (through dimension reduction and feature selection) to produce relatively simple models, and the other on the estimation of effects of all individual compounds on caterpillars (without reducing the number of predictor variables). For the first path, involving dimension reduction, we utilized an approach developed for gene transcription studies that identifies groups or modules of correlated variables with hierarchical clustering (Langfelder & Horvath, 2008); after clustering, we reduced the number of independent variables by selecting among modules and other plant traits (specific leaf area, protein and leaf toughness) using lasso regression (Ogutu, Schulz‐Streeck, & Piepho, 2012). Lasso regression shrinks coefficients for less important variables to zero, and is thus useful for model selection, in contrast to ridge regression which constrains coefficients (providing stable estimates) while not excluding variables. Modules (and other plant traits) selected in the lasso regression step were subsequently analyzed in Bayesian linear models that are useful in this context because they allowed us to quantify our confidence in the sign of effects (positive or negative) as continuous probabilities (as opposed to relying on arbitrary significance cutoffs). For the second analytical path, we utilized ridge regression (Ogutu et al., 2012) to estimate effects for all compounds simultaneously, which allowed us to investigate the distribution of effects among compounds and classes of compounds. Both analytical paths incorporated cross‐validation during the lasso and ridge regressions (further details below in section 2.4.2), and as a means of evaluating the predictive success of the Bayesian models. We also used randomization tests to compare the performance of modules and individual compounds with randomly chosen suites of compounds.

Dimension reduction and feature selection

Clustering of phytochemical variables

We chose an approach (the first set of analyses mentioned above) that reduces the number of independent variables while allowing us to learn about the correlational structure of the data, specifically unsupervised hierarchical clustering as implemented in the blockwiseModules function of the WGCNA package (Langfelder & Horvath, 2008) in R (R Core Development Team, 2016). Among the options in the pipeline, we used positive correlations among variables (“signed” network type), merge cut height at 0.25, and correlations raised to the power of five (which is where the scale free topology index reached a plateau). Through experimentation, we found that our results with LC‐MS data were robust to variation in these choices, including the choice of signed or unsigned networks. After an initial round of clustering, we took a remaining 19 unassigned compounds and put them through a second round of clustering (although the majority of consequential compounds were identified in the first round). One output of the WGCNA procedure is the first eigenvector from each cluster of compounds, which reduced our number of predictor variables by a factor of 10.

Lasso regression and Bayesian models

The resulting eigenvectors plus protein, SLA (specific leaf area) and leaf toughness were then put through the feature reduction step of lasso regression (Ogutu et al., 2012), a penalized regression that allows beta coefficients to be constrained to zero (thus excluding variables). We used the cv.glmnet function of the glmnet package (Friedman, Hastie, Simon, & Tibshirani, 2016) with cross‐validation during error reduction set to leave out one plant (and associated caterpillars) at each iteration. The variables selected by the lasso were then put into a Bayesian linear model to estimate coefficients and associated credible intervals using JAGS (version 3.2.0) run in R with the rjags package (Plummer, 2003). Two Markov chains were run for 10,000 steps for each analysis (no burn in was required) and chain performance was assessed by plotting chain histories, and calculating the Gelman and Rubin convergence diagnostic and effective sample sizes (Brooks & Gelman, 1998; Gelman & Rubin, 1992). For all models, minimally influential priors for the regression coefficients were modeled as a normal distribution with a mean of zero and variance of 100 (variance = 1/precision). We quantified our confidence in the sign of coefficients (positive or negative) as the fraction of the posterior samples that were less than zero (for coefficients with a median negative value) or greater than zero (for coefficients with a median positive value). All analyses were done using the R statistical language (R Core Development Team, 2016) on scaled (z‐transformed) predictor variables, and both the lasso and Bayesian models used binomial (for survival), Poisson (for development time), and Gaussian (for adult weight) errors. The latter two analyses (development time and adult weight) included sex as a factor. The analysis of development time also included adult weight as a covariate; while (reciprocally) the analysis of adult weight included development time as a predictor. These variables are negatively correlated (r = −0.52), and they function as useful covariates of each other, allowing us to investigate the possibility of unique plant effects on weight gain and development time, which could not be discovered if, for example, these variables were combined into a single performance index.

Cross‐validation and resampling to judge model performance

The success of models developed with the dimension reduction and feature selection pipeline was judged in two ways. We used a cross‐validation procedure in which we left out one plant (and associated caterpillars) in each iteration of the Bayesian model and then used the estimated coefficients (for phytochemical variables and other plant traits) to predict the performance of the unobserved caterpillars. After 45 iterations (one for each plant), we calculated a simple correlation coefficient between the observed and predicted performance of caterpillars across plants. In addition, we repeatedly resampled the original LC‐MS data to match the structure of the reduced set of predictor variables to ask to what extent randomly assembled modules could outperform the empirically derived modules (in other words, if a model contained two modules with 15 and 20 compounds, simulated predictors would include modules based on 15 and 20 randomly selected compounds).

Individual compound effects

The second path of our two‐part analytical strategy involved simultaneous estimation of the effects of all individual chemical compounds on caterpillar survival, development time, and adult weight. For this approach, we again used penalized regression (in the glmnet package; Friedman et al., 2016), but this time with ridge regression (instead of lasso) which constrains beta coefficients to avoid variance inflation but does not eliminate variables. As with the analyses above, ridge regression was done using the error structures appropriate to the specific response variables, and included additional covariates where possible (in models of development time and adult weight). The resulting coefficients associated with all individual compounds were examined as a second perspective on the modules examined in the first set of analyses, and were used to ask to what extent individual compound effects could be predicted by the degree to which they vary among individual plants as quantified with the simple coefficient of variation. To assess confidence in the results of ridge regressions, we used a bootstrap approach, repeatedly resampling the data and estimating coefficients 1,000 times, noting the compounds whose bootstrap confidence intervals did or did not overlap zero (Delaney & Chatterjee, 1986). We also allowed for the discovery of interactions among compounds using penalized regression on all individual compounds and all pairwise interactions between compounds. For ease of interpretation, this final analysis of potential interactions used lasso (not ridge) regression, letting the coefficients for many of the individual compounds and pairwise interactions go to zero.

RESULTS

Of the 450 caterpillars that started the experiment, 261 were reared to eclosion as adults (a mortality rate similar to previous work with this system; Gompert et al., 2015) on leaves from 45 individual alfalfa plants that were characterized for protein, leaf toughness, specific leaf area, and 163 individual metabolomic features (see Figure 1 for variation among plants in caterpillar performance and a subset of plant traits, and Table S1 for a list of compounds). Hierarchical clustering identified 14 subsets (or modules) of compounds with generally low correlations among modules and high correlations within modules (see Figures S1 and S2 for correlations within and among modules, and Figure S3 for module variation among plants). The correlational structure of the phytochemical data is illustrated as an adjacency network in Figure 2 (and in Figure S4 colored by compound class instead of module), where it can be seen that some modules (e.g., modules 1 and 2) contain a great diversity of compound types, while other modules are made up of more narrow classes (e.g., modules 7 and 8 are mostly saponins, a class of defensive metabolites; Levin, 1976). From the 14 eigenvectors summarizing variation in the modules, as well as the other plant traits, lasso regression (Ogutu et al., 2012) produced a reduced set of potential predictors which were then used in Bayesian multiple regression models that included between six and seven independent variables (Table 1). The models had reasonably high performance in leave‐one‐out cross‐validation: correlations between the observed and predicted values were between 0.50 and 0.59 (Table 1), and thus model predictions explained between 25% and 35% of the observed variation in caterpillar performance. Resampling analyses were similarly successful (Figure S5), with only a small fraction (never more than 3%) of randomly generated models exceeding the variance explained by the models reported in Table 1.

Figure 1

Figure 2

Illustration of correlational structure among compounds: each node in the network is a compound, and compounds are linked by a line if they are correlated among individual plants at 0.5 or above (links among compounds in modules 12–14 represent weaker correlations, greater than 0.1; see main text for details). Nodes are clustered in space for ease of visualization, but relative distances among nodes (and the relative lengths of lines) convey no additional information. Two letter codes within nodes indicate compound classes, as explained in the legend. Colors of nodes correspond to membership in modules as determined by hierarchical cluster analysis; the color key to the 14 modules is shown in the lower left (also see Figure S4 where nodes are colored by compound class). Not shown are a small number of compounds with weak connections to all other compounds, including two compounds that were not included in any module (shown as module zero in Figure 3)

Table 1

	Survival coefficient (CI; prob.)	Development time coefficient (CI, prob.)	Weight coefficient (CI, prob.)
m2	0.14 (−2.06, 0.48; .80)	−0.01 (−0.04, 0.02; .77)
m3	−0.40 (−0.67, −0.14; >.99)		−0.44 (−0.84, −0.03; .98)
m4			0.29 (−0.14, 0.70; .91)
m6		−0.01 (−0.03, 0.01; .80)
m9	−0.30 (−0.63, 0.03; .96)	0.01 (−0.02, 0.04; .79)
m10	0.35 (0.08, 0.62; >.99)		0.40 (−0.01, 0.82; .98)
m11	0.36 (0.14, 0.58; >.99)
m13	−0.18 (−0.43, 0.06; .93)
m14		0.01 (−0.02, 0.03; .71)
SLA	−0.32 (−0.56, −0.08; >.99)		−0.35 (−0.72, 0.012; .97)
Sex	/	0.06 (0.02, 0.10; >.99)	1.12 (0.40, 1.84; >.99)
mg	/	−0.03 (−0.05, −0.01; >.99)	/
Days	/	/	−1.41 (−1.76, −1.05; >.99)
Intercept	0.34 (0.14, 0.54; >.99)	3.48 (3.45, 3.52; >.99)	10.36 (9.81, 10.91; >.99)
Validation	0.59	0.59	0.50

For each regression coefficient, numbers in parentheses are 95% credible intervals (CI, the first two numbers) and the probability that the coefficient has the estimated sign (e.g., 0.80 for the m2 survival coefficient of 0.14 indicates a 80% probability that the m2 module has a positive effect on survival). Note that negative coefficients for development time indicate faster caterpillar development (fewer days) associated with variation in a particular module. Modules (listed in the left column) are only shown if they were included in one of the three regressions following feature selection using the lasso regression (see main text for additional details). Empty spaces in the Table appear if a particular module was selected through the lasso regression for one or two analyses but not all three (m3, for example, was not selected by lasso regression for development time). Slash marks (/) indicate variables not considered for a particular analysis (e.g., sex, adult weight [mg], and development time [days] were not possible for the survival analysis because they are not observed on dead individuals). Values for “validation” shown in the last row are the correlation between observed and predicted values in cross‐validation (Figure S4).

Variation among plants in caterpillar survival (a), development time (b) adult weight (c), three individual compounds (d–e), and two external plant traits, specific leaf area (g) and leaf toughness (h). The three example compounds shown here (out of the 163 assayed) were among the top five most influential compounds for survival, development time, and adult weight: cpd. 9 is an alkaloid with a negative association with survival, cpd. 94 (a peptide) has a negative association with development time, and cpd. 160 is a phospholipid with a negative association with adult weight. Individual plants in all panels are organized from left to right by decreasing caterpillar survival in the top panel (a). Standard errors are shown for panels b, c, g, and h. The units for d–e are compound relative abundance per dry weight of sample; the units for specific leaf area are cm2/mg, and grams/newton for leaf toughness Illustration of correlational structure among compounds: each node in the network is a compound, and compounds are linked by a line if they are correlated among individual plants at 0.5 or above (links among compounds in modules 12–14 represent weaker correlations, greater than 0.1; see main text for details). Nodes are clustered in space for ease of visualization, but relative distances among nodes (and the relative lengths of lines) convey no additional information. Two letter codes within nodes indicate compound classes, as explained in the legend. Colors of nodes correspond to membership in modules as determined by hierarchical cluster analysis; the color key to the 14 modules is shown in the lower left (also see Figure S4 where nodes are colored by compound class). Not shown are a small number of compounds with weak connections to all other compounds, including two compounds that were not included in any module (shown as module zero in Figure 3)

Figure 3

Results from Bayesian regressions of module eigenvectors and covariates predicting caterpillar survival, development time, and adult weight (as binomial, Poisson, and Gaussian regressions, respectively, with corresponding units in log‐odds, log number of days, and milligrams) For each regression coefficient, numbers in parentheses are 95% credible intervals (CI, the first two numbers) and the probability that the coefficient has the estimated sign (e.g., 0.80 for the m2 survival coefficient of 0.14 indicates a 80% probability that the m2 module has a positive effect on survival). Note that negative coefficients for development time indicate faster caterpillar development (fewer days) associated with variation in a particular module. Modules (listed in the left column) are only shown if they were included in one of the three regressions following feature selection using the lasso regression (see main text for additional details). Empty spaces in the Table appear if a particular module was selected through the lasso regression for one or two analyses but not all three (m3, for example, was not selected by lasso regression for development time). Slash marks (/) indicate variables not considered for a particular analysis (e.g., sex, adult weight [mg], and development time [days] were not possible for the survival analysis because they are not observed on dead individuals). Values for “validation” shown in the last row are the correlation between observed and predicted values in cross‐validation (Figure S4). Variation among plants in the suites of covarying compounds had large effects on the caterpillar performance: for example, the beta coefficient of −0.40 (on the log‐odds scale) associated with module 3 corresponds to a 0.10 reduction in the probability of survival (relative to average) associated with a one unit change in that phytochemical module (Table 1; note that in Table 1 and elsewhere negative coefficients for development time are associated with fewer days, and thus can be thought of as potentially beneficial effects, in contrast to negative coefficients for survival and weight that are detrimental to caterpillars). The phytochemical predictor variables are eigenvectors from clustering analysis, and thus are not entirely straightforward to interpret, especially when the clustering analysis was itself based on z‐transformed data. It is important to note that our LC‐MS data (used in clustering analysis) consists of peak areas divided by the peak of an internal standard, and again divided by the dry weight of the sample (thus, in total, referred to as "relative abundance per dry weight"; see Appendix 1 for additional details including choice of standard). Variation in these numbers reflects variation in concentrations within compounds (among plants), but care should be used in comparing among compounds because of different ionization responses relative to the standard (thus the use of z‐transformation for among‐compound analyses). Nevertheless, the effects reported in Table 1 reflect real variation in suites of compounds, as can be seen in correlations between the eigenvectors and individual compounds in Figure S2, and in variation among plants in average z‐scores in Figure S3. In some cases, modules included in the multiple regression models had common effects across response variables (e.g., the positive association of module 10 with both survival and adult weight or the negative association of module 3 also with survival and weight), while other modules had more specific effects on a single response (e.g., modules 11 and 13 on survival). SLA had a negative association with survival and adult weight, with the coefficients for SLA (−0.32 for survival and −0.35 for weight) being of similar magnitude to some of the phytochemical effects. Neither leaf toughness nor protein had sufficiently strong associations with any of our caterpillar response variables to pass the initial filter of the cross‐validated lasso regressions. Module‐based analyses (as in Table 1) focused on feature reduction with lasso regression; as a complementary analytical approach, we also used ridge regression (Ogutu et al., 2012) on all of the individual compounds (ridge regression estimates effects of compounds without excluding variables as in lasso regression). Analyses of individual compounds by ridge regression (Figure 3) were broadly consistent with the strongest module‐specific effects, as can be seen, for example, with module 10 having positive effects on survival and adult weight in module analyses (Table 1) and in compound‐specific analyses (Figure 3). Similarly, the individual compounds in module 3 had negative compound‐specific effects on survival (Figure 3), and that module had the strongest negative effect on survival in the eigenvector‐based analyses in Table 1. Not surprisingly, the larger modules (with a greater number of covarying compounds, including many primary metabolites) tended to have a more complex mix of positive and negative effects (for example, modules 1 and 2, Figure 3). For ease of interpretation, the coefficients from compound‐specific regressions of survival and development time (in Figures 3 and 4) have been back‐transformed to be on the scales of probability and days (respectively), and displayed as changes relative to intercepts. For example, a compound with a relatively large effect on survival in Figure 3 could be associated with a 0.005 reduction in the probability of survival relative to average survival and while holding other compounds constant.

Figure 4

Violin plots of compound‐specific effects (coefficients from ridge regressions) summarized by chemical classes. Sample sizes for each category are shown above the top panel ("Other" includes one sugar, two pigments, and two halogenated compounds). Categories are arranged from left to right based on the gradient of median positive to negative effects on survival. Coefficients for survival (a) and development time (b) have been back‐transformed from the units of log‐odds and log to probability and days to pupation, respectively, and shown as deviations from the mean or intercept value (as in Figure 3). Note that negative effects for development time correspond to fewer days (more rapid development). Violin plots show medians (black dots) and interquartile ranges (boxes); vertical lines are upper and lower fences (the third quartile + 1.5 * the interquartile range, and the lower quartile −1.5 * the interquartile range, respectively) surrounded by kernel density envelopes

Effects of individual compounds on survival, development time, and adult weight, as estimated by ridge regression (using binomial, Poisson, and Gaussian models, respectively). The strength of effect for each compound is indicated by the horizontal extent of each bar, and compounds are grouped by modules (m1, m2, etc.); the order of compounds along the vertical axis is arbitrary within modules and fixed across columns. Orange colors indicate negative effects on survival, development, and weight, while blue colors are positive effects (note that negative effects for development time correspond to fewer days, or more rapid development). The darker shades of orange and blue mark coefficients whose 95% confidence intervals did not overlap zero in 1,000 bootstrap samples. Values for survival and development time have been back‐transformed from units on the log‐odds and log scales to units of probability and days to pupation, and are shown as changes from the mean or intercept values. For example, a negative (orange) survival coefficient of 0.005 means a reduction of that amount from the average probability of survival associated with variation in a particular compound. The 15 compounds with the largest coefficients (by absolute value) and bootstrap intervals not overlapping zero are labeled by compound classes (see Figure 2 for abbreviations) in each panel. Structural annotations are shown to the right for six compounds based on matches from the METLIN metabolomics database, as follows by compound number: 154 (unidentified sterol); 9 (unidentified alkaloid); 60 (soyasaponin A3); 40 (unidentified saponin); 46 (medicagenic acid 3‐O‐triglucoside); 45 (medinoside E). Those same compounds are identified in parentheses in the main panels next to bars corresponding to their individual effects Violin plots of compound‐specific effects (coefficients from ridge regressions) summarized by chemical classes. Sample sizes for each category are shown above the top panel ("Other" includes one sugar, two pigments, and two halogenated compounds). Categories are arranged from left to right based on the gradient of median positive to negative effects on survival. Coefficients for survival (a) and development time (b) have been back‐transformed from the units of log‐odds and log to probability and days to pupation, respectively, and shown as deviations from the mean or intercept value (as in Figure 3). Note that negative effects for development time correspond to fewer days (more rapid development). Violin plots show medians (black dots) and interquartile ranges (boxes); vertical lines are upper and lower fences (the third quartile + 1.5 * the interquartile range, and the lower quartile −1.5 * the interquartile range, respectively) surrounded by kernel density envelopes We saw some variation among classes of compounds in their effects on caterpillars (Figure 4). All classes included positive and negative effects, with saponins, alkaloids, and phenolic glycosides including some of the stronger negative effects of individual compounds, while lipids and sterols tended toward positive associations with survival and development (Figure 4). We also considered potential pairwise interactions among individual compounds, and found few interactions that passed the filter of the penalized regression (Table S2), at least relative to the large number of potential interactions. Saponins and alkaloids tended to be overrepresented in the interactions that were detected, and phenolic glycosides were involved in stronger negative interactions relative to other compounds (Figure S6). We did not find evidence that more or less variable compounds (among individual plants) had differential effects on caterpillars (Figure S7).

DISCUSSION

The results reported here represent a dissection of the phytochemical landscape facing a specialized insect herbivore attacking a recently colonized host plant (Hunter, 2016). The phytochemical landscape is both physical, referring to variation in compounds among individual plants in a common garden (Figure S3), and hypothetical to the extent that effects of individual compounds on caterpillars are estimated, although compounds are, of course, not encountered in isolation. Our exploration of the phytochemical landscape facing L. melissa on M. sativa is necessarily a first draft based on a single point in time. Despite the snapshot nature of our study, models including suites of covarying metabolites and other plant traits had predictive success for caterpillar performance and suggested different natural products affecting survival, development time and adult weight. Previous work with M. sativa and other insect herbivores has focused on saponins (Levin, 1976), and a simple outcome from our study could have been that one or a small number of saponins have anti‐herbivore properties that reduce fitness of our focal insect. Instead, we find large numbers of compounds with potentially consequential effects on caterpillars (Figure 3), and which were in some cases of similar magnitude or greater than the effects of morphological features, including leaf toughness and SLA (Carmona, Lajeunesse, & Johnson, 2011). We find that prominent classes of specialized metabolites in our focal plants, such as saponins and peptides, include compounds with both positive and negative effects on survival and development of caterpillars. Positive effects of these compounds are potentially associated with feeding stimulation, as has been observed (along with other positive effects) for other specialist herbivores and plant toxins (Seigler & Price, 1976; Smilanich, Fincher, & Dyer, 2016). Negative effects of saponins on insects potentially include disruption of hormone production (Chaieb, 2010), although exact modes of action on L. melissa will await further study. Although many of the compounds with strong effects are specialized metabolites (including alkaloids and phenolic glycosides, as well as saponins and peptides), we also find both positive and negative effects on caterpillar performance associated with primary metabolites (Figure 4), especially phospholipids (Figure 2). These could be direct effects if a compound is suboptimal for development, or they could be associated with nutritional imbalance (Behmer, 2009), such that too much of one nutrient makes it difficult for caterpillars to consume a balanced diet. It has been suggested that the presentation of unbalanced nutrition can be a kind of anti‐herbivore strategy (Berenbaum, 1995), although this has not been studied in the L. melissa‐M. sativa interaction. The finding that our specialist herbivore is affected by a wide range of metabolites that vary greatly even within a single host population has implications for our understanding of heterogeneity in the system, and also for local adaptation of the herbivore to the novel host. Lycaeides melissa typically colonizes weedy or feral patches of M. sativa on roadsides or integrated into natural communities, and previous work has documented dramatic variation among individual alfalfa locations (often in close proximity) in the extent to which they can support caterpillar development (Harrison et al., 2016). Previous phytochemical data with a lower resolution was less successful in explaining that variation (Harrison et al., 2016), but the results reported here suggest that among‐patch variation could be explained by future studies using detailed metabolomic data. The within‐population complexity described in the current study also raises the possibility that the novel host presents a multi‐faceted and potentially ever‐shifting target from the perspective of evolving butterfly populations (Chaturvedi et al., 2018; Gompert et al., 2015; Harrison et al., 2016). In particular, it is possible that M. sativa defense against a specialist herbivore might be realized through different combinations (within and among populations) of individually acting compounds, thus making it less likely that butterflies in any one population possess an effective suite of alleles that improve fitness on M. sativa. The correlational structure of the phytochemical variation that we observed has implications for the evolution of plant defense and the accumulation of insect herbivores on M. sativa. Specifically, correlations among modules (which are themselves composed of a diversity of compound types) should make it possible to hypothesize directions of least resistance for defense evolution. Module 3, including an alkaloid with a prominent effect on caterpillars but also phospholipids and saponins, had a negative effect on survival (Table 1, Figure 3). Module 3 negatively covaried with module 2, which was itself positively associated with caterpillar survival (including a peptide of large effect but many other compound types as well). Thus an increase in module 3 and an associated decrease in 2 would be beneficial for the plant, at least with respect to herbivory by our focal herbivore. Predicting evolutionary response by M. sativa would of course depend on a genetic understanding of the relevant plant traits, which the present study does not include. However, a recent study of M. melissa performance on a related plant, Medicago truncatula, found that genetic variation in the plant explained a substantial proportion of phenotypic variation (between 8% and 57%) in phytochemical and structural traits but also in caterpillar performance (Gompert et al., 2019). Of course, most plants do not have the luxury of optimizing defense against a single herbivore, and it is easy to imagine that improvements in defense against one enemy could lead to increased attraction to another (Salazar et al., 2018), especially given the diversity of effects even within major classes studied here, including saponins and phenolic glycosides. Compounds in the latter class (phenolics) were found to have strong positive and negative effects on assemblages of arthropods associated with the maternal plants from which seeds were collected to start the common garden used in the present study (Harrison et al., 2018). The results reported here raise a number of avenues for future exploration, including the apparent overrepresentation of both saponins and alkaloids in interactions with other compounds (Figure S6). Relative mass defect (RMD) is a useful tool for the categorization of compounds (Table S1), but it has limitations in complex mixtures; we are developing methods that use other data from high‐resolution mass spectrometry to further refine categorization of Medicago metabolites (Philbin & Forister, n.d.). Also, in the present study, we have not attempted to separate constitutive and induced defenses (Jansen et al., 2009) as the plants in the common garden were exposed to natural and continuous levels of herbivory. We also acknowledge that feeding under laboratory conditions is of course not natural, although we found in a previous study that genetic variants (in caterpillars) associated with success in laboratory feeding trials were at least partially predictive of genetic variation associated with alfalfa use by L. melissa in the wild (Chaturvedi et al., 2018). Thus it is clear that metabolomic data, such as those analyzed here, have the potential to both open up new avenues of conceptual development in plant‐insect interactions and to link micro‐evolutionary trajectories across hosts and herbivores.

CONFLICT OF INTEREST

The authors declare no competing interests.

AUTHOR CONTRIBUTIONS

MLF: designed experiment, conducted analyses, wrote first draft. SY: conducted experiment and contributed to experimental design. CSP, CDD, BH: generated and interpreted phytochemistry and protein data. MLF, JGH, OS: developed and maintained common garden. JAF, ZHM, CCN, LAR: contributed to analyses and experimental design. CAB, JAF, ZG, CCN: contributed to experimental design. All authors: contributed to writing. Figures S1‐S8 and Tables S1‐S3 Click here for additional data file.

26 in total

1. The role of polyphenols in terrestrial ecosystem nutrient cycling.

Authors:
Journal: Trends Ecol Evol Date: 2000-06 Impact factor: 17.712

2. Specificity, rank preference, and the colonization of a non-native host plant by the Melissa blue butterfly.

Authors: M L Forister; C F Scholl; J P Jahner; J S Wilson; J A Fordyce; Z Gompert; D R Narala; C Alex Buerkle; C C Nice
Journal: Oecologia Date: 2012-10-04 Impact factor: 3.225

Review 3. Toxic cardenolides: chemical ecology and coevolution of specialized plant-herbivore interactions.

Authors: Anurag A Agrawal; Georg Petschenka; Robin A Bingham; Marjorie G Weber; Sergio Rasmann
Journal: New Phytol Date: 2012-01-31 Impact factor: 10.151

4. METLIN: a metabolite mass spectral database.

Authors: Colin A Smith; Grace O'Maille; Elizabeth J Want; Chuan Qin; Sunia A Trauger; Theodore R Brandon; Darlene E Custodio; Ruben Abagyan; Gary Siuzdak
Journal: Ther Drug Monit Date: 2005-12 Impact factor: 3.681

5. Synergistic effects of amides from two piper species on generalist and specialist herbivores.

Authors: Lora A Richards; Lee A Dyer; Angela M Smilanich; Craig D Dodson
Journal: J Chem Ecol Date: 2010-08-31 Impact factor: 2.626

6. The predictability of genomic changes underlying a recent host shift in Melissa blue butterflies.

Authors: Samridhi Chaturvedi; Lauren K Lucas; Chris C Nice; James A Fordyce; Matthew L Forister; Zachariah Gompert
Journal: Mol Ecol Date: 2018-05-03 Impact factor: 6.185

7. Antagonistic, stage-specific selection on defensive chemical sequestration in a toxic butterfly.

Authors: James A Fordyce; Chris C Nice
Journal: Evolution Date: 2008-07 Impact factor: 3.694

8. Genomic selection using regularized linear regression models: ridge regression, lasso, elastic net and their extensions.

Authors: Joseph O Ogutu; Torben Schulz-Streeck; Hans-Peter Piepho
Journal: BMC Proc Date: 2012-05-21

9. WGCNA: an R package for weighted correlation network analysis.

Authors: Peter Langfelder; Steve Horvath
Journal: BMC Bioinformatics Date: 2008-12-29 Impact factor: 3.169

10. The Many Dimensions of Diet Breadth: Phytochemical, Genetic, Behavioral, and Physiological Perspectives on the Interaction between a Native Herbivore and an Exotic Host.

Authors: Joshua G Harrison; Zachariah Gompert; James A Fordyce; C Alex Buerkle; Rachel Grinstead; Joshua P Jahner; Scott Mikel; Christopher C Nice; Aldrin Santamaria; Matthew L Forister
Journal: PLoS One Date: 2016-02-02 Impact factor: 3.240

4 in total

1. Additive genetic effects in interacting species jointly determine the outcome of caterpillar herbivory.

Authors: Zachariah Gompert; Tara Saley; Casey Philbin; Su'ad A Yoon; Eva Perry; Michelle E Sneck; Joshua G Harrison; C Alex Buerkle; James A Fordyce; Chris C Nice; Craig D Dodson; Sarah L Lebeis; Lauren K Lucas; Matthew L Forister
Journal: Proc Natl Acad Sci U S A Date: 2022-08-29 Impact factor: 12.779

2. Predicting patch occupancy reveals the complexity of host range expansion.

Authors: M L Forister; C S Philbin; Z H Marion; C A Buerkle; C D Dodson; J A Fordyce; G W Forister; S L Lebeis; L K Lucas; C C Nice; Z Gompert
Journal: Sci Adv Date: 2020-11-27 Impact factor: 14.136

3. A continental-scale survey of Wolbachia infections in blue butterflies reveals evidence of interspecific transfer and invasion dynamics.

Authors: Vivaswat Shastry; Katherine L Bell; C Alex Buerkle; James A Fordyce; Matthew L Forister; Zachariah Gompert; Sarah L Lebeis; Lauren K Lucas; Zach H Marion; Chris C Nice
Journal: G3 (Bethesda) Date: 2022-09-30 Impact factor: 3.542

4. Opposing Effects of Ceanothus velutinus Phytochemistry on Herbivore Communities at Multiple Scales.

Authors: Casey S Philbin; Matthew Paulsen; Lora A Richards
Journal: Metabolites Date: 2021-06-07

4 in total