Bas E Dutilh, Martijn A Huynen, Berend Snel.
Abstract
BACKGROUND: The massive scale of microarray-derived gene expression data allows for a global view of cellular function. Thus far, comparative studies of gene expression between species have been based on the level of expression of the gene across corresponding tissues, or on the co-expression of the gene with another gene.
Year: 2006; PMID: 16423292; PMCID: PMC1382217; DOI: 10.1186/1471-2164-7-10
Source DB: PubMed; Journal: BMC Genomics; ISSN: 1471-2164; Impact factor: 3.969
Inparanoid pairwise orthologous groups between all species pairs for C. elegans (15950 genes), D. melanogaster (4456 genes), H. sapiens (12193 genes), and S. cerevisiae (6199 genes).
| 2393 | 1907 | ||
| 3814 | 2335 | ||
| 2520 | 1516 | ||
| 2739 | 1891 | ||
| 1641 | 1193 | ||
| 2514 | 1580 | ||
| total | 15621 | 10422 | |
Figure 1. Method used to calculate the expression context conservation between gn_A and gn_B. Genes gn_A and gn_B are the query genes in species A and species B, respectively. First, the correlation between the expression levels of the query gene and all 1-1 orthologs over multiple microarray experiments was calculated in both species (a; uncentered correlation). The resulting expression correlation values were then correlated between the two species (b; Pearson's correlation), yielding the expression context conservation between gn_A and gn_B. For an unambiguous comparison between species, we only analyze the expression correlation values of the studied genes with the 1-1 orthologs.
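The two-step calculation described in the Figure 1 caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the data layout (one expression profile per gene, with the 1-1 ortholog profiles listed in the same order in both species) are assumptions made for the example, and the toy profiles are invented.

```python
import math

def uncentered_correlation(x, y):
    """Uncentered correlation (cosine similarity) of two equal-length profiles.

    Assumes neither profile is all zeros.
    """
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm

def pearson_correlation(x, y):
    """Pearson's correlation: the uncentered correlation of mean-centered data.

    Assumes neither input is constant.
    """
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return uncentered_correlation([a - mean_x for a in x],
                                  [b - mean_y for b in y])

def expression_context_conservation(query_a, orthologs_a, query_b, orthologs_b):
    """Score one query gene pair (gn_A, gn_B).

    orthologs_a[i] and orthologs_b[i] must be the profiles of the same
    1-1 orthologous pair; the two species may have different numbers of
    microarray experiments.
    """
    # Step (a): co-expression of each query gene with every 1-1 ortholog.
    context_a = [uncentered_correlation(query_a, p) for p in orthologs_a]
    context_b = [uncentered_correlation(query_b, p) for p in orthologs_b]
    # Step (b): compare the two co-expression vectors between species.
    return pearson_correlation(context_a, context_b)

# Toy data (hypothetical): 4 experiments in species A, 3 in species B.
gn_a = [1.0, 2.0, 3.0, 4.0]
refs_a = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0], [1.0, 0.5, 1.0, 0.5]]
gn_b = [2.0, 1.0, 3.0]
refs_b = [[2.0, 1.0, 2.5], [1.0, 3.0, 0.5], [1.0, 1.0, 2.0]]
score = expression_context_conservation(gn_a, refs_a, gn_b, refs_b)
print(f"expression context conservation: {score:.3f}")
```

Because step (b) compares vectors indexed by ortholog pair rather than by experiment, the two species do not need matching experimental conditions.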
Figure 2. Expression context conservation between different classes of orthologs and random non-orthologous gene pairs. The plots are normalized histograms of the combined data from all species comparisons. For statistical comparison of the histograms see Table 4. The distributions are normal according to a Shapiro-Wilk test (P < 1·10⁻⁴).
Probability that the expression context conservation scores in different classes of orthologs and random non-orthologous gene pairs were drawn from the same distribution (see histograms in Fig. 2; P values, Student's t-test; the distributions are normal according to a Shapiro-Wilk test, P < 1·10⁻⁴). The expression context data are combined over all species comparisons: 1-1 orthologs (n = 10303), all X-X orthologs (n = 27147), most conserved X-X orthologs (n = 5180), less conserved X-X orthologs (n = 21967), and random non-orthologous gene pairs (n = 6000).
| 6.31·10⁻²³³ | 0 | 1.78·10⁻⁷⁰ | 3.55·10⁻²¹ | |
| 9.66·10⁻¹⁷³ | 0 | 0.172 | . | |
| 0 | 0 | . | . | |
| 1.38·10⁻⁵⁷ | . | . | . |
Figure 3. Functional classification of 1-1 orthologs with a conserved expression context (score higher than zero). From all species pairs, all 1-1 orthologs that could be assigned to a KOG were included. The categories are grouped in the four main KOG classes. The horizontal dashed lines are the fraction of genes with a conserved expression context for the entire class. The functional categories are (the number between brackets is the number of genes with a conserved expression context): "Cellular processes and signaling" (D: Cell cycle control, cell division, chromosome partitioning (n = 442), M: Cell wall/membrane/envelope biogenesis (n = 73), N: Cell motility (n = 23), O: Posttranslational modification, protein turnover, chaperones (n = 1330), T: Signal transduction mechanisms (n = 1151), U: Intracellular trafficking, secretion, and vesicular transport (n = 953), V: Defense mechanisms (n = 67), W: Extracellular structures (n = 111), Y: Nuclear structure (n = 96), and Z: Cytoskeleton (n = 378)), "Information storage and processing" (A: RNA processing and modification (n = 823), B: Chromatin structure and dynamics (n = 244), J: Translation, ribosomal structure and biogenesis (n = 1153), K: Transcription (n = 985), and L: Replication, recombination and repair (n = 545)), "Metabolism" (C: Energy production and conversion (n = 486), E: Amino acid transport and metabolism (n = 367), F: Nucleotide transport and metabolism (n = 205), G: Carbohydrate transport and metabolism (n = 452), H: Coenzyme transport and metabolism (n = 131), I: Lipid transport and metabolism (n = 383), P: Inorganic ion transport and metabolism (n = 228), and Q: Secondary metabolites biosynthesis, transport and catabolism (n = 71)) and "Poorly characterized" (R: General function prediction only (n = 1716), S: Function unknown (n = 912), and X: Not categorized by NCBI staff (n = 2)) [10].
Figure 4. Example of an X-X orthologous group between C. elegans and S. cerevisiae. This X-X orthologous group (KOG0054: Multidrug resistance-associated protein/mitoxantrone resistance protein, ABC superfamily) has three genes in C. elegans and two genes in S. cerevisiae. The expression context conservation scores are given in the table. The gene pair with the highest score is the "most conserved X-X orthologous gene pair" (bold, yellow); the rest are the "less conserved X-X orthologs" (blue).
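The selection rule in the Figure 4 caption can be illustrated with a short sketch. The gene labels and scores below are hypothetical placeholders rather than the values shown in the figure, and `split_xx_group` is a name introduced here only for illustration.

```python
def split_xx_group(scores):
    """Split one X-X orthologous group into its most conserved gene pair
    and the remaining, less conserved pairs.

    scores maps (gene_in_species_A, gene_in_species_B) to the expression
    context conservation score of that pair. If two pairs tie for the
    highest score, the one listed first is returned.
    """
    most_conserved = max(scores, key=scores.get)
    less_conserved = [pair for pair in scores if pair != most_conserved]
    return most_conserved, less_conserved

# Hypothetical 3 x 2 group: three genes in species A, two in species B.
group_scores = {
    ("A_gene1", "B_gene1"): 0.12,
    ("A_gene1", "B_gene2"): -0.05,
    ("A_gene2", "B_gene1"): 0.31,
    ("A_gene2", "B_gene2"): 0.08,
    ("A_gene3", "B_gene1"): 0.02,
    ("A_gene3", "B_gene2"): 0.19,
}

best, rest = split_xx_group(group_scores)
print("most conserved pair:", best)
print("less conserved pairs:", len(rest))
```

A group with m genes in one species and n in the other therefore contributes one pair to the "most conserved" class and m·n − 1 pairs to the "less conserved" class.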
Correlation between sequence identity and expression context conservation for 1-1 orthologs between all species pairs. P is the probability that the data set is a sample drawn from a distribution with correlation coefficient zero.
| 0.077 | 8.41·10⁻⁴ | ||
| 0.060 | 4.49·10⁻³ | ||
| 0.121 | 5.14·10⁻⁶ | ||
| 0.092 | 6.27·10⁻⁵ | ||
| 0.050 | 9.01·10⁻² | ||
| 0.061 | 1.46·10⁻² |
Figure 5. Consistency of sequence divergence with divergence in expression context for simple duplications. Consistency or inconsistency of sequence divergence with divergence in expression context for orthologous groups with a single gene duplication (1–2 orthologs). We display both the observed frequencies (plotted are the number of 1–2 orthologous groups; P is the probability of finding at least this number of consistent observations by chance, binomial distribution) and the maximum consistent and minimum inconsistent frequencies expected (horizontal edge of the triangles), based on a completely consistent re-allocation of the expression context conservation scores from the overlapping distributions (see Methods).
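The P value described in the Figure 5 caption is an upper-tail binomial probability. A minimal sketch is given below, assuming that a group is equally likely to be consistent or inconsistent by chance (p = 0.5); this chance probability is an assumption of the example, not a value stated in the caption, and the counts are hypothetical.

```python
from math import comb

def binomial_tail(k, n, p=0.5):
    """Probability of observing at least k successes in n independent
    trials, each succeeding with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical counts: 8 of 10 orthologous groups are consistent.
n_groups = 10
n_consistent = 8
p_value = binomial_tail(n_consistent, n_groups)
print(f"P(at least {n_consistent} of {n_groups} by chance) = {p_value:.4f}")
```

The exact sum is used here instead of a normal approximation because the number of groups in a single comparison may be small.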
Sequence identity and expression context conservation of the two βNAC in-paralogs in S. cerevisiae. The β subunit of the Nascent polypeptide-Associated Complex has two orthologs in S. cerevisiae: Enhanced Gal4 DNA binding protein 1 (EGD1, β1NAC) and Basic Transcription factor Three 1 (BTT1, β3NAC). The three other species in this analysis have only one ortholog: inhibitor of cell death 1 (icd-1 in C. elegans), bicaudal (bic in D. melanogaster) and Basic Transcription Factor 3 (BTF3 in H. sapiens).
| 0.385 | 0.350 | 0.375 | ||
| 0.302 | 0.203 | 0.199 | ||
| 0.300 | 0.305 | 0.340 | ||
| -0.205 | -0.092 | 0.006 |
Figure 6. Consistency of sequence divergence with divergence in expression context for expanded orthologous groups. Consistency (positive correlation) or inconsistency (negative correlation) of sequence divergence with divergence in expression context for all expanded orthologous groups (X-X orthologs, except 1–2 orthologs). Plotted frequencies are the number of X-X orthologous groups with a positive and negative correlation. P is the probability of finding at least this number of positively correlated observations by chance (binomial distribution).