Xun Gu, Yangyun Zou, Wei Huang, Libing Shen, Zebulun Arendsee, Zhixi Su.
Abstract
Thanks to microarray technology, our understanding of transcriptome evolution at the genome level has advanced considerably over the past decade. Yet further investigation has been hampered by several technical limitations of this technology. Recent innovations in next-generation sequencing, particularly RNA-seq, have helped to resolve this problem. Although a number of statistical and computational methods have been developed to analyze RNA-seq data, an analytical framework specifically designed for evolutionary genomics is still lacking. In this article, we develop a new method for estimating the genome expression distance from RNA-seq data, which has an explicit interpretation under a model of gene expression evolution. Moreover, this distance measure takes data overdispersion, gene length variation, and sequencing depth variation into account, so that it can be applied to multiple genomes from different species. Using mammalian RNA-seq data as an example, we demonstrate that this expression distance is useful in phylogenomic analysis.
Keywords: RNA-seq; genome expression distance; transcriptome evolution
Year: 2013 PMID: 23940099 PMCID: PMC3787673 DOI: 10.1093/gbe/evt121
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Figure. Model of transcriptome evolution between two species. (A) A schematic illustration of a rooted two-gene tree: ρ² refers to the among-gene expression variability at the common ancestor of species X and Y; v²_X and v²_Y measure the among-gene expression variability in lineages X and Y, respectively, since the split from the common ancestor. (B) The variance–covariance matrix of genome expression between the current genomes X and Y. (C) The expression distance U plotted against the evolutionary time t. Expression divergence is an accelerated process under the adaptive model, a constant-rate process under the neutral model, and a decelerated process under the stabilizing model. In particular, when W → 0, U → 2σ²t, i.e., the stabilizing selection model reduces to the neutral model; and when t → ∞, U → 1/W, i.e., the expression divergence approaches a saturation level.
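The two limits stated for panel (C) can be checked numerically. The sketch below assumes a hypothetical saturating curve, U(t) = (1 − exp(−2σ²Wt))/W, chosen only because it reproduces both limits given in the caption; it is not taken from the paper, whose exact expression may differ. The function names and parameter values are illustrative.

```python
import math


def u_neutral(t, sigma2):
    """Neutral model from the caption: U grows linearly, U = 2 * sigma^2 * t."""
    return 2.0 * sigma2 * t


def u_stabilizing(t, sigma2, w):
    """ASSUMED saturating form (not the paper's formula):
    U = (1 - exp(-2 * sigma^2 * W * t)) / W.
    math.expm1 keeps the result accurate when W is very small."""
    return -math.expm1(-2.0 * sigma2 * w * t) / w


sigma2 = 0.05  # illustrative rate parameter
t = 3.0        # illustrative evolutionary time

# Limit 1: as W -> 0 the stabilizing curve approaches the neutral line.
near_neutral = u_stabilizing(t, sigma2, 1e-9)

# Limit 2: as t -> infinity the curve saturates at 1 / W.
w = 2.0
saturated = u_stabilizing(1e6, sigma2, w)

# Deceleration: each unit of time adds less divergence than the previous one.
first_step = u_stabilizing(1.0, sigma2, w) - u_stabilizing(0.0, sigma2, w)
second_step = u_stabilizing(2.0, sigma2, w) - u_stabilizing(1.0, sigma2, w)

print(near_neutral, u_neutral(t, sigma2))
print(saturated, 1.0 / w)
print(first_step > second_step)
```

Any other curve with the same two limits would serve equally well here; the point is only to make the "decelerated" and "saturated" behavior concrete.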
Definitions, Theoretical Expectations, and Formulas of Statistical Estimation for Three Quantities J, J, and J
| Quantity | Expectation | Estimation |
|---|---|---|
a E[.] is short form for expectation.
b Derivation of each expectation can be found in Materials and Methods. See figure 1 and the text for the description of model parameters.
c x (or y) is the mean RNA-seq count of gene i over its biological replicates in genome X (or Y); and n is the number of genes under study.
Figure. Flow chart illustrating the statistical procedure for expression distance estimation.
Summary for the Estimates of Deep-Sequencing Parameters and Overdispersed Parameters in Mammalian Brains and Cerebellums
| Species | ΩX | | | |
|---|---|---|---|---|
| | Brain | Cerebellum | Brain | Cerebellum |
| Human | 0.619 | 1.183 | 0.165 | 0.034 |
| Chimpanzee | 0.660 | 0.831 | 0.102 | 0.049 |
| Gorilla | 1.215 | 1.063 | 0.051 | 0.034 |
| Orangutan | 1.462 | 0.970 | 0.039 | 0.033 |
| Macaque | 0.846 | 0.598 | 0.046 | 0.009 |
| Mouse | 1.439 | 0.876 | 0.162 | 0.054 |
| Opossum | 1.030 | 0.746 | 0.153 | 0.003 |
| Platypus | 1.093 | 0.999 | 0.034 | 0.013 |
Pairwise Tissue Expression Distance (U) Matrix of Brain and Cerebellum in Mammals
| | Human | Chimpanzee | Gorilla | Orangutan | Macaque | Mouse | Opossum | Platypus |
|---|---|---|---|---|---|---|---|---|
| Human | 0 | 0.116 ± 0.038 | 0.174 ± 0.031 | 0.338 ± 0.021 | 0.247 ± 0.025 | 0.248 ± 0.025 | 0.473 ± 0.017 | 0.797 ± 0.012 |
| Chimpanzee | 0.304 ± 0.023 | 0 | 0.191 ± 0.029 | 0.300 ± 0.023 | 0.258 ± 0.025 | 0.333 ± 0.021 | 0.494 ± 0.017 | 0.799 ± 0.012 |
| Gorilla | 0.357 ± 0.021 | 0.329 ± 0.022 | 0 | 0.348 ± 0.021 | 0.299 ± 0.023 | 0.379 ± 0.020 | 0.512 ± 0.016 | 0.890 ± 0.011 |
| Orangutan | 0.523 ± 0.016 | 0.393 ± 0.019 | 0.511 ± 0.016 | 0 | 0.302 ± 0.023 | 0.426 ± 0.018 | 0.535 ± 0.016 | 0.912 ± 0.011 |
| Macaque | 0.468 ± 0.017 | 0.343 ± 0.021 | 0.456 ± 0.018 | 0.459 ± 0.018 | 0 | 0.306 ± 0.023 | 0.464 ± 0.018 | 0.852 ± 0.012 |
| Mouse | 0.493 ± 0.017 | 0.467 ± 0.017 | 0.549 ± 0.016 | 0.680 ± 0.014 | 0.518 ± 0.016 | 0 | 0.361 ± 0.020 | 0.704 ± 0.013 |
| Opossum | 0.810 ± 0.012 | 0.699 ± 0.013 | 0.785 ± 0.012 | 0.821 ± 0.012 | 0.672 ± 0.014 | 0.676 ± 0.014 | 0 | 0.512 ± 0.016 |
| Platypus | 1.010 ± 0.010 | 0.842 ± 0.012 | 0.976 ± 0.010 | 0.992 ± 0.010 | 0.823 ± 0.012 | 0.786 ± 0.012 | 0.777 ± 0.012 | 0 |
Note: Values above the diagonal are for brain and values below the diagonal are for cerebellum; the sampling variance of each expression distance is given as a standard error.
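Because the table packs two tissues into one square matrix, it has to be split before either tissue can be analyzed on its own. The sketch below shows one way to do that in plain Python, using the Human, Chimpanzee, and Gorilla entries copied from the table. The helper name `split_triangles` is hypothetical, and the standard errors are omitted for brevity.

```python
def split_triangles(combined):
    """Split a matrix holding two distance sets into two symmetric matrices.

    Entries above the diagonal go to the first result, entries below the
    diagonal go to the second; both results have zeros on the diagonal.
    """
    n = len(combined)
    upper = [[0.0] * n for _ in range(n)]
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            upper[i][j] = upper[j][i] = combined[i][j]  # above diagonal
            lower[i][j] = lower[j][i] = combined[j][i]  # below diagonal
    return upper, lower


species = ["Human", "Chimpanzee", "Gorilla"]
combined = [
    [0.0,   0.116, 0.174],
    [0.304, 0.0,   0.191],
    [0.357, 0.329, 0.0],
]

# Above the diagonal is brain, below the diagonal is cerebellum.
brain, cerebellum = split_triangles(combined)

for name, row in zip(species, brain):
    print("brain", name, row)
for name, row in zip(species, cerebellum):
    print("cerebellum", name, row)
```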
Figure. Mammalian brain expression phylogeny. (A) Expression phylogeny inferred by the neighbor-joining method from the expression distance matrix of brains. Nodes marked * have bootstrap values >0.95 and nodes marked ** have values >0.99. (B) The result of mapping the expression distances onto a given species tree, extracted from the Tree of Life (http://tolweb.org/, last accessed September 18, 2013).
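As a rough illustration of how neighbor-joining decides which two taxa to join first, the sketch below implements only the pair-selection step, using the standard criterion Q(i, j) = (n − 2)·d(i, j) − r(i) − r(j), where r is a row sum of the distance matrix. This is not the authors' pipeline: a full analysis repeats the step until the tree is resolved and adds bootstrapping. The function name `nj_pick_pair` is hypothetical, and the matrix holds the brain distances for five primates copied from the table above.

```python
def nj_pick_pair(names, dist):
    """Return ((name_i, name_j), q) minimizing the neighbor-joining Q criterion.

    dist must be a symmetric matrix with zeros on the diagonal.
    Ties are resolved in favor of the first pair encountered.
    """
    n = len(names)
    row_sums = [sum(row) for row in dist]
    best_pair, best_q = None, None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if best_q is None or q < best_q:
                best_pair, best_q = (names[i], names[j]), q
    return best_pair, best_q


primates = ["Human", "Chimpanzee", "Gorilla", "Orangutan", "Macaque"]
brain = [
    [0.0,   0.116, 0.174, 0.338, 0.247],
    [0.116, 0.0,   0.191, 0.300, 0.258],
    [0.174, 0.191, 0.0,   0.348, 0.299],
    [0.338, 0.300, 0.348, 0.0,   0.302],
    [0.247, 0.258, 0.299, 0.302, 0.0],
]

pair, q_value = nj_pick_pair(primates, brain)
print("first pair joined:", pair, "Q =", q_value)
```

Note that Q is not the same as picking the smallest raw distance: subtracting the row sums favors pairs that are close to each other and far from everything else, which is what makes the method robust to unequal rates among lineages.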