Emma Hine1, Daniel E Runcie2, Scott L Allen1, Yiguan Wang1,3, Stephen F Chenoweth1, Mark W Blows1, Katrina McGuigan1.
Abstract
The interaction of evolutionary processes to determine quantitative genetic variation has implications for contemporary and future phenotypic evolution, as well as for our ability to detect causal genetic variants. While theoretical studies have provided robust predictions to discriminate among competing models, empirical assessment of these predictions has been limited. In particular, theory highlights the importance of pleiotropy in resolving observations of selection and mutation, but empirical investigations have typically been limited to few traits. Here, we applied high-dimensional Bayesian Sparse Factor Genetic modeling to gene expression datasets in 2 species, Drosophila melanogaster and Drosophila serrata, to explore the distribution of genetic variance across high-dimensional phenotypic space. Surprisingly, most of the heritable trait covariation was due to few lines (genotypes) with extreme values [>3 interquartile ranges (IQR) from the median]. Intriguingly, while genotypes extreme for a multivariate factor also tended to have a higher proportion of individual traits that were extreme, we also observed genotypes that were extreme for multivariate factors but not for any individual trait. We observed other consistent differences between heritable multivariate factors with outlier lines and those without extreme values, including differences in gene functions. We use these observations to identify the further data required to advance our understanding of the evolutionary dynamics and nature of standing genetic variation for quantitative traits.
Keywords: Drosophila melanogaster; Drosophila serrata; House of Cards; gene expression; genetic covariance; mutation–selection balance; sparse factor analysis; standing genetic variance
Year: 2022 PMID: 35961029 PMCID: PMC9526065 DOI: 10.1093/genetics/iyac122
Source DB: PubMed Journal: Genetics ISSN: 0016-6731 Impact factor: 4.402
Fig. 1. Quantile–quantile plots of observed data vs simulated normally distributed data. For each dataset (D. serrata, left column; D. melanogaster, right column), each individual gene expression trait was centered and scaled to its own mean and standard deviation (SD scale, top row) or to its own median and IQR (IQR scale, bottom row). All 203,100 phenotypic (grey) or 101,550 genetic (black) values were then pooled and sorted. On both scales, the distributions of the middle 95% of values were in close agreement between the observed data (y-axis) and the simulated normal data (x-axis). Horizontal (vertical) lines demarcate the 2.5–97.5% range of the observed (simulated) data; these quantiles were indistinguishable between phenotypic (solid grey lines) and genotypic (dotted black lines) values. Dashed grey lines demarcate ±3 units on either scale; on the SD scale this corresponds to a common threshold for identifying outliers, and on the IQR scale it is the threshold used in the current study to identify extreme values.
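The two scalings in Fig. 1, and the ±3 IQR rule for calling a value extreme, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the quartile convention (`method="inclusive"`, i.e. linear interpolation between order statistics) is an assumption, since the caption does not say which quantile definition was used.

```python
from statistics import mean, median, quantiles, stdev


def sd_scale(values):
    """Center on the mean and scale by the sample standard deviation (SD scale)."""
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd for v in values]


def iqr_scale(values):
    """Center on the median and scale by the interquartile range (IQR scale)."""
    med = median(values)
    # Assumed quartile convention: linear interpolation ("inclusive" method).
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; values cannot be IQR-scaled")
    return [(v - med) / iqr for v in values]


def extreme_flags(values, threshold=3.0):
    """Flag values lying more than `threshold` IQRs from the median."""
    return [abs(z) > threshold for z in iqr_scale(values)]


# Toy trait (made-up numbers): nine typical line means and one very large one.
trait = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(round(sd_scale(trait)[-1], 2))   # SD-scaled value of the large line mean
print(round(iqr_scale(trait)[-1], 2))  # IQR-scaled value of the same line mean
print(extreme_flags(trait))
```

In this toy case the single large value inflates the SD, so on the SD scale it sits at about 2.84 and would escape a ±3 SD cut-off, whereas on the IQR scale it lies 21 IQRs from the median and is flagged; the median and IQR are barely affected by one extreme value.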
Table 1. Description of datasets and summary of estimated and derived parameters.
| Category | Term and description/relation to model |
|---|---|
| Types of data | |
| Estimated parameters | |
| Significance testing | |
| Data interrogation | |
Prior distributions corresponding to the estimated parameters can be found in Supplementary Table 1.
Fig. 2. Distributions of the 5,727,420 pairwise genetic (line-mean) correlations of gene expression traits in observed and randomized data for D. serrata (left) and D. melanogaster (right). Genetic correlations for the observed (randomized) data correspond to white bars above (below) y = 0. The frequencies shown for the randomized data are averaged across the 100 randomized datasets. Grey bars show the difference in frequency between the observed and randomized data [i.e., grey bars above (below) y = 0 indicate inflation (deflation) in the observed relative to the randomized data]. Vertical lines indicate the 0.005, 0.025, 0.500, 0.975, and 0.995 quantiles for the observed (dotted black lines) and randomized (dashed grey lines) datasets.
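Two quantities in this caption are easy to check or reproduce in outline. With 3,385 traits (the trait count given in the Fig. 3 caption), the number of distinct trait pairs is 3,385 × 3,384 / 2 = 5,727,420, matching the count above. The sketch below computes pairwise line-mean correlations and builds one randomized dataset by shuffling line labels independently within each trait. That permutation scheme is an assumption for illustration: it removes genetic covariance between traits while keeping each trait's distribution of line means, but the caption does not state how the 100 randomized datasets were generated. All function names and data are hypothetical.

```python
import random
from itertools import combinations
from math import sqrt


def n_pairs(n_traits):
    """Number of distinct unordered trait pairs."""
    return n_traits * (n_traits - 1) // 2


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pairwise_correlations(line_means):
    """line_means[t][i] is the mean of trait t in line i (same line order for every trait)."""
    return [pearson(a, b) for a, b in combinations(line_means, 2)]


def randomize(line_means, rng):
    """Shuffle line labels independently within each trait (assumed null model)."""
    shuffled = []
    for trait in line_means:
        copy = list(trait)
        rng.shuffle(copy)
        shuffled.append(copy)
    return shuffled


# Made-up line means: 3 traits measured on the same 3 lines.
observed = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]
print(n_pairs(3385))
print(pairwise_correlations(observed))
print(pairwise_correlations(randomize(observed, random.Random(1))))
```

Under the same assumption, repeating `randomize` 100 times and averaging the histogram counts across datasets would mirror how the lower halves of Fig. 2 are described.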
Fig. 3. Observed trait distributions before and after adjusting for the predicted contribution of latent factors in D. serrata (left panels) and D. melanogaster (right panels). Top row: distribution of the 203,100 (60 observations × 3,385 traits) IQR-scaled values in the observed data (above y = 0) and the adjusted data (defined in Table 1; below y = 0). Bottom row: distribution of the 30 IQR-scaled line means for the 132 traits in D. serrata (3,960 observations) and the 228 traits in D. melanogaster (6,840 observations) that were associated with at least one outlier line (defined in Table 1; Supplementary Table 2). Counts of extreme values (top row) and outlier lines (bottom row) in the observed and adjusted data are shown within each panel. To facilitate comparison of the tails of the distributions, the y-axis range of each panel has been truncated at the maximum count among values more than ±3 IQR from the median (i.e., the bars for values between −3 and 3 on the x-axis extend beyond the shown y-axis limit).
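The exact definition of the adjusted data is given in Table 1, whose body is not reproduced in this record. As an assumption for illustration only, the sketch below treats the adjusted data as the observed values minus the contribution predicted by a latent factor model, Y − FΛᵀ, where F holds each line's latent trait values and Λ the trait loadings, and then counts values more than 3 IQR from each trait's median before and after adjustment. Names and numbers are hypothetical, and this is not the paper's estimation procedure, which fits the factors within a Bayesian sparse factor model.

```python
from statistics import median, quantiles


def predicted(F, loadings):
    """Assumed factor contribution: F (lines x factors) times loadings^T (factors x traits)."""
    return [[sum(f_k * l_k for f_k, l_k in zip(f_row, l_row)) for l_row in loadings]
            for f_row in F]


def adjust(Y, F, loadings):
    """Subtract the predicted factor contribution from observed data Y (lines x traits)."""
    P = predicted(F, loadings)
    return [[y - p for y, p in zip(y_row, p_row)] for y_row, p_row in zip(Y, P)]


def count_extremes(Y, threshold=3.0):
    """Count entries lying more than `threshold` IQRs from their trait's (column's) median."""
    count = 0
    for column in zip(*Y):
        med = median(column)
        q1, _, q3 = quantiles(column, n=4, method="inclusive")
        if q3 == q1:
            raise ValueError("IQR is zero for a trait")
        count += sum(abs(v - med) / (q3 - q1) > threshold for v in column)
    return count


# Toy example (made-up numbers): 8 lines, 2 traits, 1 latent factor.
# The last line has a very large latent trait value.
F = [[0.0], [0.1], [-0.1], [0.2], [-0.2], [0.05], [-0.05], [10.0]]
loadings = [[1.0], [2.0]]
noise = [[0.01, -0.02], [-0.01, 0.02], [0.02, 0.01], [-0.02, -0.01],
         [0.03, 0.0], [0.0, 0.03], [-0.03, 0.015], [0.015, -0.03]]
Y = [[p + e for p, e in zip(p_row, e_row)]
     for p_row, e_row in zip(predicted(F, loadings), noise)]
print(count_extremes(Y), count_extremes(adjust(Y, F, loadings)))
```

In this toy case the one line with a large latent trait value produces an extreme value in both traits, and both disappear once the factor's predicted contribution is removed.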
Fig. 4. Example heritable factors from the D. serrata dataset. The left-hand column shows the distribution of estimated latent trait values of the 30 lines (points) on the IQR scale for each of the 2 replicates per line; the dashed line indicates a 1:1 relationship between the replicates' IQR-scaled latent trait values. The right-hand column shows the corresponding trait loadings for these heritable factors: black (grey) circles depict significant (nonsignificant) trait loadings. Traits are ordered by numerical identifiers that were arbitrarily assigned before analyses. Heritable factor (HF) 26 (top row) had no outlier lines, HF 19 (middle row) had one outlier line (line 8), and HF 6 (bottom row) had 2 outlier lines (lines 23 and 29). Further details on these factors are given in Supplementary Fig. 1 and Supplementary Table 4.
Fig. 5. Prediction of observed trait outliers from latent trait values. For the traits significantly influenced by a specific factor, we calculated for each line the relative frequency of outlier values (y-axis; note that scales differ between panels). This value was plotted against the magnitude of the line's latent trait value (x-axis), revealing a significant correlation (Spearman's correlation statistics are shown in the bottom right of each panel). Plot symbols (numbers) indicate the number of significant trait loadings for the factor, in steps of 10, from “0” for a factor with <10 loadings through to a dot for >100 loadings (Supplementary Tables 4 and 5). Plot colors indicate the number of outlier lines for that factor (see figure for key). For example, a red “3” indicates a factor with 2 extreme line-mean values (red) and between 30 and 39 significant trait loadings (“3”). All lines with nonextreme latent trait values (<3 IQR from the median) are shown in grey, including all 30 lines for factors with no outlier lines and the 27–29 nonoutlier lines for factors with at least one outlier line (Supplementary Figs. 1 and 2).
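The y-axis quantity and the Spearman correlation reported in each panel can be sketched as follows. This is an illustrative outline with hypothetical function names and made-up numbers, assuming the per-line inputs are already IQR-scaled: for one factor, `outlier_frequency` takes each line's IQR-scaled means for the traits with significant loadings on that factor and returns the fraction lying beyond ±3 IQR, and `spearman` correlates those fractions with the absolute latent trait values, giving tied values their average rank. It returns only the coefficient, not the significance test.

```python
from math import sqrt


def ranks(xs):
    """1-based ranks, with tied values given the average of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    result = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def outlier_frequency(scaled_by_line, threshold=3.0):
    """Fraction of each line's IQR-scaled trait means lying beyond +/- threshold."""
    return [sum(abs(v) > threshold for v in row) / len(row) for row in scaled_by_line]


# Made-up example: 4 lines, 3 traits loading on one factor.
latent = [0.2, -0.5, 1.1, 4.0]  # IQR-scaled latent trait value of each line
traits = [[0.1, -0.4, 0.3], [0.6, 3.2, -0.2], [1.5, -3.5, 0.4], [3.4, 5.0, -4.1]]
freq = outlier_frequency(traits)
print(freq)
print(spearman([abs(v) for v in latent], freq))
```

Ranking both variables makes the coefficient depend only on ordering, so a few very large latent trait values cannot dominate it the way they could a Pearson correlation.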
Fig. 6. Comparison of characteristics of heritable factors with and without outliers. The bold line, box, and whiskers represent the median, 1.5 IQR, and 3 IQR, respectively; values exceeding 3 IQR are indicated with an asterisk. Each characteristic was compared between the 2 types of heritable factor (outliers absent or present) using the Wilcoxon rank-sum test (results shown within each panel).
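A minimal version of the Wilcoxon rank-sum comparison, using the large-sample normal approximation for the rank-sum statistic W (mean n₁(n₁ + n₂ + 1)/2, variance n₁n₂(n₁ + n₂ + 1)/12), is sketched below. It is an illustration with made-up numbers and hypothetical names: it applies no tie or continuity correction, and the software used for the published tests may use an exact distribution or such corrections, so p-values can differ, especially for small samples.

```python
from math import sqrt
from statistics import NormalDist


def ranks(xs):
    """1-based ranks, with tied values given the average of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    result = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank-sum test of samples a and b (normal approximation)."""
    n1, n2 = len(a), len(b)
    r = ranks(list(a) + list(b))
    w = sum(r[:n1])  # sum of the ranks belonging to sample a
    expected = n1 * (n1 + n2 + 1) / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - expected) / sd
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, z, p


# Made-up values of one characteristic for factors without vs with outlier lines.
no_outliers = [1, 2, 3]
with_outliers = [4, 5, 6]
print(rank_sum_test(no_outliers, with_outliers))
```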
Fig. 7. Directionality of outlier heritable latent trait values and of the associated gene expression traits. Latent trait line means (on the IQR scale) for outlier lines are shown as whiskers at the top and bottom of each panel, corresponding to positive and negative deviations from the median, respectively. For the subset of observations associated with this set of latent factors and outlier lines, the reflected histograms show the (square-root-transformed) frequency of IQR-scaled individual expression trait line means that deviate above or below the median (at y = 0). The total counts of positive and negative deviations are shown above and below the median, respectively, in each panel.