Jian-Rong Yang, Calum J Maclean, Chungoo Park, Huabin Zhao, Jianzhi Zhang.
Abstract
It is commonly, although not universally, accepted that most intra- and interspecific genome sequence variations are more or less neutral, whereas a large fraction of organism-level phenotypic variations are adaptive. Gene expression levels are molecular phenotypes that bridge the gap between genotypes and corresponding organism-level phenotypes. Yet, it is unknown whether natural variations in gene expression levels are mostly neutral or adaptive. Here we address this fundamental question by genome-wide profiling and comparison of gene expression levels in nine yeast strains belonging to three closely related Saccharomyces species and originating from five different ecological environments. We find that the transcriptome-based clustering of the nine strains approximates the genome sequence-based phylogeny irrespective of their ecological environments. Remarkably, only ∼0.5% of genes exhibit similar expression levels among strains from a common ecological environment, no greater than the fraction observed among strains with comparable phylogenetic relationships but different environments. These and other observations strongly suggest that most intra- and interspecific variations in yeast gene expression levels result from the accumulation of random mutations rather than environmental adaptations. This finding has profound implications for understanding the driving force of gene expression evolution, the genetic basis of phenotypic adaptation, and the general role of stochasticity in evolution.
Keywords: Saccharomyces; adaptation; evolution; genetic drift; transcriptome
Year: 2017 PMID: 28575451 PMCID: PMC5850415 DOI: 10.1093/molbev/msx171
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Fig. 1. Phylogenetic trees of the nine Saccharomyces yeast strains constructed using genome sequence, gene expression, and morphology data, respectively. The three species are indicated by different colors, while the ecological environments where the strains were isolated are shown by different symbols. (A) The genome tree of the nine strains based on the alignment of the coding sequences of 4,325 genes. Bootstrap percentages estimated from 1,000 replications are shown on interior branches. Asterisks indicate >99.5% bootstrap support. The scale bar shows 0.01 nucleotide substitutions per site. (B–D) Transcriptome trees of the nine strains based on standardized Euclidean distances in the expression levels of all 4,325 genes (B), the 75% most highly expressed genes (C), and the 50% most highly expressed genes (D). Bootstrap percentages estimated from 10,000 replications are shown on interior branches. Asterisks indicate >99.5% bootstrap support. The scale bar shows 0.1 unit of the standardized Euclidean distance per gene. (E) The morphology tree of the nine strains based on standardized Euclidean distances in 219 morphological traits. Strains IFO1804, RM11, CLIB219, Y12, and YPS163 are used as proxies for N44, BC187, DBVPG6040, Y9, and YPS606, respectively. Bootstrap percentages estimated from 10,000 replications are shown on interior branches. Asterisks indicate >99.5% bootstrap support. The scale bar shows 0.1 unit of the standardized Euclidean distance per trait. (F) Frequency distributions of topological distances (dT) between the genome tree and random tree topologies (grey), bootstrapped transcriptome trees with all genes (brown), bootstrapped transcriptome trees with the 75% most highly expressed genes (dark purple), bootstrapped transcriptome trees with the 50% most highly expressed genes (light purple), and bootstrapped morphology trees (blue). Each distribution is based on 10,000 random trees or bootstrapped trees.
Arrows indicate the observed dT between the genome tree and various other trees based on the original (rather than bootstrapped) data. The P value is the probability that the dT between the genome tree and a random tree topology is equal to or smaller than the observed dT between the genome tree and the tree being compared. (G) Frequency distributions of topological distances (dT) between the potential environment tree and bootstrapped genome trees (yellow), bootstrapped transcriptome trees with all genes (brown), bootstrapped transcriptome trees with the 75% most highly expressed genes (dark purple), bootstrapped transcriptome trees with the 50% most highly expressed genes (light purple), gene expression trees based on 533 individual GO categories (dark green), and bootstrapped morphology trees (blue), as well as frequency distributions of dT between each of three control environment trees and the 533 GO-based gene expression trees (light green). The dT between a tree and the potential environment tree is defined as the minimal topological distance between that tree and any tree in which the five wild strains form a monophyletic group. In the three control environment trees, one or both S. cerevisiae wild strains in the aforementioned monophyletic group are swapped with their sister strains in the genome tree. Except for the bootstrapped genome trees (1,000 replications) and the GO-based trees (533 GO categories), each distribution is derived from 10,000 bootstrapped trees. Arrows indicate the observed dT between the potential environment tree and various other trees based on the original (rather than bootstrapped) data. The P value is from a Z-test of the null hypothesis that the mean dT between 10,000 bootstrapped morphology trees and the potential environment tree is equal to or larger than that between the bootstrapped trees being compared and the potential environment tree.
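The caption does not spell out how the topological distance dT or the panel F P value is computed. The following is a minimal sketch, assuming dT is the Robinson–Foulds distance (the number of non-trivial bipartitions, or splits, found in one unrooted tree but not the other) and that the P value is the empirical fraction of random-tree distances at or below the observed one. Trees are written as nested Python tuples of leaf labels, and the function names are hypothetical, not taken from the paper.

```python
from typing import FrozenSet, Set, Tuple, Union

# A tree is a leaf label or a tuple of subtrees (hypothetical representation).
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """All leaf labels under a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree: Tree) -> Set[FrozenSet[str]]:
    """Non-trivial bipartitions of an unrooted tree. Each is stored as the side
    that excludes a reference leaf, so the result does not depend on rooting."""
    all_leaves = leaves(tree)
    ref = min(all_leaves)
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> None:
        if isinstance(node, str):
            return
        clade = leaves(node)
        side = all_leaves - clade if ref in clade else clade
        # Skip trivial splits (one leaf vs. the rest, or the whole leaf set).
        if 2 <= len(side) <= len(all_leaves) - 2:
            found.add(side)
        for child in node:
            walk(child)

    walk(tree)
    return found


def rf_distance(t1: Tree, t2: Tree) -> int:
    """Robinson-Foulds distance: number of splits present in exactly one tree."""
    if leaves(t1) != leaves(t2):
        raise ValueError("trees must have the same leaf set")
    return len(splits(t1) ^ splits(t2))


def empirical_p(observed: int, null_distances: list) -> float:
    """Fraction of null (e.g. random-tree) distances <= the observed distance."""
    return sum(d <= observed for d in null_distances) / len(null_distances)


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
t3 = ("A", ("B", ("C", ("D", "E"))))  # same unrooted topology as t1
print(rf_distance(t1, t2), rf_distance(t1, t3))  # 2 0
print(empirical_p(2, [0, 2, 4, 4]))  # 0.5
```

Because the splits of an unrooted tree do not depend on where the nested tuples happen to be rooted, `t1` and `t3` come out at distance 0. Some software reports half of this count; either convention gives the same ranking of trees.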
Fig. 2. Principal component analysis of the (A) genome sequences, (B) gene expression levels, and (C) morphological data of the nine yeast strains. The three species are indicated by different colors, while the ecological environments where the strains were isolated are shown by different symbols. The percentage of variance explained by each principal component is indicated in parentheses. In panel A, the inset shows an enlarged view of the boxed area.
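The percentages on the PCA axes are presumably the standard proportion of variance. For the centered strain-by-feature data matrix, if $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p$ are the eigenvalues of its covariance matrix, principal component $k$ explains

$$
\text{variance explained by PC}_k = 100\% \times \frac{\lambda_k}{\sum_{j=1}^{p} \lambda_j}
$$

With nine strains, at most eight components can have nonzero variance after centering, so these percentages sum to 100% over at most eight components.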
Fig. 3. Little evidence for environmental adaptation from the expression levels of individual genes. (A) Twenty-two genes whose expression levels support a monophyly of the five wild strains. Expression levels of the nine strains for each gene have been scaled to a standard normal distribution for comparison. The three species are indicated by different colors, while the ecological environments where the strains were isolated are shown by different symbols. (B) Number of genes for which the expression tree supports the monophyly of each of the 126 possible five-strain sets. The red dot shows the five-strain set composed of the five wild strains (“Five wild strains”), the three blue symbols show the three control sets in which one (“YPS606←→Y9” and “DBVPG1788←→BC187”) or both (“Swap both”) of the two wild S. cerevisiae strains in the “Five wild strains” set are swapped with their sister nonwild strains, the green circles show all other five-strain sets that include the three non-S. cerevisiae strains (“2 nonwild S.c. + 3 non-S.c.”), and the grey dots show all other five-strain sets (“All others”). (C) Number of genes whose expression tree supports the monophyly of each of the 84 possible six-strain sets. The red dot shows the six-strain set composed of the six S. cerevisiae strains (“Six S.c. strains”), while the grey dots show all other six-strain sets (“All others”). (D) Same as panel B except that coding sequence data instead of gene expression data are used. (E) Same as panel C except that coding sequence data instead of gene expression data are used. (F) Same as panel B except that morphological data instead of gene expression data are used. (G) Same as panel C except that morphological data instead of gene expression data are used.
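The counts of 126 and 84 are the numbers of ways to choose five or six strains out of nine, C(9,5) and C(9,6). Below is a minimal sketch of the tally plotted in panels B to G. It assumes that each gene's tree has already been reduced to its set of splits, and that an unrooted tree "supports the monophyly" of a strain set when it contains the split separating that set from the remaining strains. The strain labels S1 to S9 and the helper names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set

# Hypothetical labels for the nine strains.
STRAINS = [f"S{i}" for i in range(1, 10)]


def canonical_split(side: Iterable[str], all_strains: List[str]) -> FrozenSet[str]:
    """Store a bipartition as the side that excludes a fixed reference strain,
    so 'set | rest' and 'rest | set' map to the same key."""
    side = frozenset(side)
    ref = min(all_strains)
    return side if ref not in side else frozenset(all_strains) - side


def count_support(gene_splits: List[Set[FrozenSet[str]]], k: int,
                  all_strains: List[str]) -> Dict[FrozenSet[str], int]:
    """For every k-strain subset, count the gene trees containing the split that
    separates the subset from the other strains (i.e. supporting its monophyly)."""
    counts = {}
    for subset in combinations(all_strains, k):
        key = canonical_split(subset, all_strains)
        counts[frozenset(subset)] = sum(key in splits for splits in gene_splits)
    return counts


# Toy example: a single gene tree containing the split S1-S4 | S5-S9.
gene_trees = [{canonical_split(["S1", "S2", "S3", "S4"], STRAINS)}]
five = count_support(gene_trees, 5, STRAINS)
six = count_support(gene_trees, 6, STRAINS)
print(len(five), len(six))  # 126 84
print(five[frozenset(["S5", "S6", "S7", "S8", "S9"])])  # 1
```

In a real analysis, `gene_trees` would hold one split set per gene (or per coding sequence or morphological trait), and the entry for the five wild strains would be compared against the other 125 five-strain sets, as in panel B.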
Fig. 4. Gene expression variances among wild strains and among all strains. (A) Frequency distribution of the logarithm of the ratio of the mean expression difference among wild strains to that between wild and nonwild strains (“Wild”, black bars). As controls, the same quantity is plotted when one (“YPS606←→Y9” and “DBVPG1788←→BC187”) or both (“Swap both”) of the wild S. cerevisiae strains are swapped with their sister nonwild strains in the calculation. The P value from a Mann–Whitney U test, measuring the probability that the median value of the observed distribution (black) is equal to or greater than that of a control distribution, is indicated in the same color as the control distribution. (B) Frequency distribution of the logarithm of the ratio of the variance in expression level among the wild strains to that among all strains (“Wild”, black bars). As controls, the same quantity is plotted when one (“YPS606←→Y9” and “DBVPG1788←→BC187”) or both (“Swap both”) of the wild S. cerevisiae strains are swapped with their sister nonwild strains in the calculation. The P value from a Mann–Whitney U test, measuring the probability that the median value of the observed distribution is equal to or greater than that of a control distribution, is indicated in the same color as the control distribution. (C) Same as panel A except that morphological data instead of gene expression data are used. (D) Same as panel B except that morphological data instead of gene expression data are used.
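Below is a minimal sketch of the per-gene quantity in panel B and of a one-sided Mann–Whitney U test, using only the Python standard library. Several assumptions are not stated in the caption: the logarithm is base 10, variances are sample variances, and the P value uses the normal approximation without tie or continuity correction. Exact or tie-corrected versions, as implemented in statistical packages, will differ somewhat for small samples. The function and strain names are hypothetical.

```python
import math
import statistics
from typing import Dict, Iterable, List, Tuple


def log_variance_ratio(expr: Dict[str, float], wild: Iterable[str]) -> float:
    """log10 of (variance among wild strains / variance among all strains)
    for one gene; expr maps strain name -> expression level."""
    var_wild = statistics.variance([expr[s] for s in wild])
    var_all = statistics.variance(list(expr.values()))
    return math.log10(var_wild / var_all)


def mann_whitney_less(x: List[float], y: List[float]) -> Tuple[float, float]:
    """One-sided Mann-Whitney U test of whether values in x tend to be smaller
    than values in y. Normal approximation, no tie or continuity correction."""
    n1, n2 = len(x), len(y)
    # U counts pairs in which the x value exceeds the y value (ties count 0.5).
    u = sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = 0.5 * math.erfc(-z / math.sqrt(2))  # standard normal CDF at z
    return u, p


# Toy gene (hypothetical strain names); strains a and b play the wild strains.
expr = {"a": 0.0, "b": 2.0, "c": 0.0, "d": 2.0}
print(log_variance_ratio(expr, ["a", "b"]))  # log10(1.5), about 0.176

u, p = mann_whitney_less([1, 2, 3], [4, 5, 6])
print(u, p)
```

To mirror panel B, one would compute the ratio for every gene with the true wild strains and again with a control assignment (for example, one wild strain swapped for its sister nonwild strain), then call `mann_whitney_less(observed, control)` to ask whether the observed values tend to be smaller than the control values.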