Casey W Dunn, Felipe Zapata, Catriona Munro, Stefan Siebert, Andreas Hejnol.
Abstract
There is considerable interest in comparing functional genomic data across species. One goal of such work is to provide an integrated understanding of genome and phenotype evolution. Most comparative functional genomic studies have relied on multiple pairwise comparisons between species, an approach that does not incorporate information about the evolutionary relationships among species. The statistical problems that arise from not considering these relationships can lead pairwise approaches to the wrong conclusions and are a missed opportunity to learn about biology that can only be understood in an explicit phylogenetic context. Here, we examine two recently published studies that compare gene expression across species with pairwise methods, and find reason to question the original conclusions of both. One study interpreted pairwise comparisons of gene expression as support for the ortholog conjecture, the hypothesis that orthologs tend to have more similar attributes (expression in this case) than paralogs. The other study interpreted pairwise comparisons of embryonic gene expression across distantly related animals as evidence for a distinct evolutionary process that gave rise to phyla. In each study, distinct patterns of pairwise similarity among species were originally interpreted as evidence of particular evolutionary processes, but instead, we find that they reflect species relationships. These reanalyses concretely show the inadequacy of pairwise comparisons for analyzing functional genomic data across species. It will be critical to adopt phylogenetic comparative methods in future functional genomic work. Fortunately, phylogenetic comparative biology is also a rapidly advancing field with many methods that can be directly applied to functional genomic data.
Keywords: functional genomics; gene expression; hourglass; ortholog conjecture; phylogenetics
Year: 2018 PMID: 29301966 PMCID: PMC5776959 DOI: 10.1073/pnas.1707515115
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 11.205
Fig. 1. Pairwise and phylogenetic comparative approaches illustrated on an example gene tree with multiple genes per species. The internal nodes of the tree are speciation and gene duplication events. (A) Many comparative functional genomic studies rely on pairwise comparisons, where traits of each gene are compared with traits of other genes across species. This leads to many more comparisons than unique observations, making each comparison dependent on others. (B) Comparative phylogenetic methods, including PICs (2), make a smaller number of independent comparisons, where each contrast measures independent changes along different branches. Phylogenetic approaches are rarely used for functional genomic studies.
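The contrast between the two panels can be sketched on a toy tree. The example below is a minimal illustration, not the analysis used in the study: the tree shape, branch lengths, trait values, and the names `TREE`, `TRAITS`, and `independent_contrasts` are all hypothetical, and the contrasts follow the standard independent-contrasts recursion, which assumes Brownian-motion trait evolution on a bifurcating tree with positive tip branch lengths.

```python
import math
from itertools import combinations

# Hypothetical toy tree. A tip is (name, branch_length); an internal node is
# (left_child, right_child, branch_length). All values are illustrative only.
TREE = (
    (("gene_A", 1.0), ("gene_B", 1.0), 1.0),
    (("gene_C", 1.0), ("gene_D", 1.0), 1.0),
    0.0,
)
TRAITS = {"gene_A": 1.0, "gene_B": 3.0, "gene_C": 6.0, "gene_D": 8.0}


def independent_contrasts(node, traits, contrasts):
    """Post-order traversal; returns (estimated value, adjusted branch length)."""
    if isinstance(node[0], str):  # tip: (name, branch_length)
        name, length = node
        return traits[name], length
    left, right, length = node
    x1, v1 = independent_contrasts(left, traits, contrasts)
    x2, v2 = independent_contrasts(right, traits, contrasts)
    # Standardized contrast: difference scaled by its expected standard deviation.
    contrasts.append((x1 - x2) / math.sqrt(v1 + v2))
    # Node estimate: average of the children weighted by inverse branch length.
    value = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)
    # Lengthen the branch below this node to account for estimation uncertainty.
    return value, length + v1 * v2 / (v1 + v2)


contrasts = []
independent_contrasts(TREE, TRAITS, contrasts)
pairwise = [abs(TRAITS[a] - TRAITS[b]) for a, b in combinations(sorted(TRAITS), 2)]

print(len(pairwise), "pairwise differences from", len(TRAITS), "observations")
print(len(contrasts), "independent contrasts:", [round(c, 3) for c in contrasts])
```

With n tips, the pairwise approach yields n(n-1)/2 comparisons while the tree yields only n-1 contrasts, one per internal node, so the gap widens quickly as genes are added.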
Fig. 2. Pairwise (A, C, and E) and phylogenetic (B, D, and F) analyses of the original data (A and B), data simulated under the null hypothesis (C and D), and data simulated under the ortholog conjecture (E and F). In the pairwise plots, each point indicates the correlation coefficient of tau for a set of pairwise comparisons annotated with a specific node name (e.g., Primates) and event type (speciation or duplication, giving rise to orthologs and paralogs, respectively). The phylogenetic plots show the difference between the density distributions for tau phylogenetic contrasts for speciation and duplication events, where a value above zero indicates an excess of speciation contrasts in the indicated interval. A horizontal line at zero would indicate that the density distributions are identical. A reproduces the pattern presented in figure 2A of the work by KMRR (16) of higher correlation across speciation events than duplication events, which they took as evidence of the ortholog conjecture. The recovery of a similar pattern under both simulations (C and E) indicates that this pairwise approach does not make distinct testable predictions. The phylogenetic analysis of the original data (B) does not show an excess of larger contrasts for duplication events and does not reject the null hypothesis, providing no support for the ortholog conjecture. D and F validate the phylogenetic approach by showing that it does not reject the null when data are simulated under the null (D) but does reject the null when data are simulated under the ortholog conjecture (F). OC, ortholog conjecture.
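The quantity plotted in the phylogenetic panels, a difference between two density distributions, can be illustrated roughly as follows. This is a sketch under stated assumptions: a simple histogram stands in for whatever density estimator the figure used, and the contrast values, bin edges, and the helper `histogram_density` are hypothetical.

```python
def histogram_density(values, edges):
    """Density per bin: count / (n * bin width). The last bin includes its right edge."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[i + 1]):
                counts[i] += 1
                break
    n = len(values)
    return [c / (n * (edges[i + 1] - edges[i])) for i, c in enumerate(counts)]


# Hypothetical contrast magnitudes, not data from the study.
speciation = [0.05, 0.10, 0.15, 0.30, 0.45]
duplication = [0.05, 0.20, 0.35, 0.40, 0.45]
edges = [0.0, 0.25, 0.5]

# Positive entries mark intervals with an excess of speciation contrasts.
difference = [
    s - d
    for s, d in zip(histogram_density(speciation, edges),
                    histogram_density(duplication, edges))
]
print([round(x, 2) for x in difference])
```

In this toy input the first interval is positive and the second negative, meaning small contrasts are enriched for speciation events and large contrasts for duplication events. Identical distributions would give zeros in every interval.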
Fig. 3. Distributions of pairwise similarity scores for each phase of development. Pairwise scores for the ctenophore are red. Wilcoxon test values for the significance of the differences between early–mid distributions and late–mid distributions are on the right. Model of variance, which is inversely related to similarity, is on the left. (A) The distributions as published by Levin et al. (19). Low similarity (i.e., high variance) in the midphase of development was interpreted as support for an inverse hourglass model for the evolution of gene expression. The five least similar midphase scores were all from the ctenophore. (Inset) The ctenophore image is by S. Haddock and reproduced from phylopic.org. (B) The distributions after the exclusion of the ctenophore. The early-phase and midphase distributions are not statistically distinct.
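The sensitivity of such a comparison to a single species can be sketched as below. Several things here are assumptions rather than details from the caption: the rank-sum (Mann-Whitney) form of the Wilcoxon test, the normal approximation without tie or continuity correction, the species labels, and every score in `SCORES`. Because pairwise scores share species, they are not independent, so the p-values are descriptive only.

```python
import math
from statistics import NormalDist

# Hypothetical scores: (species_a, species_b, phase, similarity). Not study data.
SCORES = [
    ("cteno", "sp_A", "early", 0.50), ("cteno", "sp_A", "mid", 0.10),
    ("cteno", "sp_B", "early", 0.52), ("cteno", "sp_B", "mid", 0.12),
    ("cteno", "sp_C", "early", 0.54), ("cteno", "sp_C", "mid", 0.14),
    ("sp_A", "sp_B", "early", 0.56), ("sp_A", "sp_B", "mid", 0.55),
    ("sp_A", "sp_C", "early", 0.58), ("sp_A", "sp_C", "mid", 0.57),
    ("sp_B", "sp_C", "early", 0.60), ("sp_B", "sp_C", "mid", 0.59),
]


def rank_sum_p(x, y):
    """Two-sided rank-sum p-value via normal approximation; assumes no tied values."""
    ranked = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = sum(i + 1 for i, (_, group) in enumerate(ranked) if group == 0)
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def early_vs_mid_p(scores, exclude=None):
    """Compare early and mid scores, optionally dropping pairs with one species."""
    kept = [s for s in scores if exclude not in s[:2]]
    early = [s[3] for s in kept if s[2] == "early"]
    mid = [s[3] for s in kept if s[2] == "mid"]
    return rank_sum_p(early, mid), len(mid)


p_all, n_all = early_vs_mid_p(SCORES)
p_without, n_without = early_vs_mid_p(SCORES, exclude="cteno")
print(f"all species: p={p_all:.3f} from {n_all} mid-phase scores")
print(f"without cteno: p={p_without:.3f} from {n_without} mid-phase scores")
```

In this toy input, one species contributes half of the pairwise scores, and dropping it moves the early and mid distributions closer together. The samples are far too small for the normal approximation to be reliable; the point is only the mechanics of the exclusion.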