Wen Zhang, Quaid D Morris, Richard Chang, Ofer Shai, Malina A Bakowski, Nicholas Mitsakakis, Naveed Mohammad, Mark D Robinson, Ralph Zirngibl, Eszter Somogyi, Nancy Laurin, Eftekhar Eftekharpour, Eric Sat, Jörg Grigull, Qun Pan, Wen-Tao Peng, Nevan Krogan, Jack Greenblatt, Michael Fehlings, Derek van der Kooy, Jane Aubin, Benoit G Bruneau, Janet Rossant, Benjamin J Blencowe, Brendan J Frey, Timothy R Hughes.
Abstract
BACKGROUND: Large-scale quantitative analysis of transcriptional co-expression has been used to dissect regulatory networks and to predict the functions of new genes discovered by genome sequencing in model organisms such as yeast. Although the idea that tissue-specific expression is indicative of gene function in mammals is widely accepted, it has not been objectively tested nor compared with the related but distinct strategy of correlating gene co-expression as a means to predict gene function.
Year: 2004 PMID: 15588312 PMCID: PMC549719 DOI: 10.1186/jbiol16
Source DB: PubMed Journal: J Biol ISSN: 1475-4924
Figure 1. Expression of previously characterized tissue-specific genes. Genes were identified manually by searching MEDLINE abstracts [66] and XM sequence description fields (see Additional data file 1) for keywords corresponding to the appropriate tissues. Rows and columns were ordered manually.
Figure 2. Validation of expression data by independent confirmation. (a) The P value of Spearman's rank correlations (see Materials and methods) is shown for all possible comparisons among the 13 tissues common to all three studies (ours and those by Su et al. [15] and Bono et al. [17]) and 1,109 genes for which the same isoform is unambiguously represented on the arrays used in each of the studies (see Materials and methods). (b) Microarray data and RT-PCR results for 47 known and predicted XM genes are shown. Genes were selected to represent primarily those without GO Biological Processes (GO-BP) assignment and to encompass expression in all 18 tissues, and were biased towards those with functions predicted by support vector machines (SVMs) in categories of interest (or expressed in tissues of interest). The three columns on the far right show whether each XM gene was uncharacterized (not annotated) in GO-BP, and whether it is represented by a cDNA or EST.
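The rank correlation used in panel (a) can be illustrated with a minimal sketch. It computes Spearman's rho as the Pearson correlation of average ranks; it does not compute the P value shown in the figure, and the function names (`average_ranks`, `pearson`, `spearman_rho`) are illustrative rather than taken from the study.

```python
import math


def average_ranks(values):
    """Assign 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return pearson(average_ranks(x), average_ranks(y))
```

In this setting, `x` and `y` would hold the expression values of the shared genes in one tissue as measured by two different studies.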
Figure 3. Defining whether a gene is expressed, and how many genes are detected as expressed per sample. (a) The curves show the cumulative distribution for negative-control probes (cyan line) and for probes on the array (blue line), over all arrays, to illustrate how genes were defined as expressed. The dotted black line indicates the 99th percentile for the negative control spots. (b) The number of genes expressed in any given number of tissues (between 1 tissue and 55 tissues; for example, there are 4,475 genes detected in only one sample, 171 genes expressed in exactly 27 samples, 1,790 genes detected in all 55 samples, and so on). The genes expressed in each of the 55 tissues were determined as in (a). (c) Number of genes defined as expressed in each of the 55 tissues, using criteria in (a).
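The thresholding idea in panels (a) and (b) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the percentile uses linear interpolation, a probe counts as expressed when it is strictly above the cutoff, and all function names are hypothetical.

```python
def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must be non-empty")
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def call_expressed(probe_intensities, negative_controls, q=99.0):
    """Flag a gene as expressed if its intensity exceeds the q-th
    percentile of the negative-control intensities (assumed rule)."""
    cutoff = percentile(negative_controls, q)
    return {gene: value > cutoff for gene, value in probe_intensities.items()}


def tissues_per_gene(calls_by_tissue):
    """Count, for each gene, the number of tissues in which it was called expressed."""
    counts = {}
    for calls in calls_by_tissue.values():
        for gene, expressed in calls.items():
            counts[gene] = counts.get(gene, 0) + int(expressed)
    return counts
```

A histogram of the values returned by `tissues_per_gene` corresponds to the kind of summary shown in panel (b).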
Figure 4. Correspondence between gene expression patterns and GO-BP annotations. (a) Ratios for the 21,622 expressed genes were grouped by two-dimensional hierarchical agglomerative clustering and diagonalization, using the Pearson correlation coefficient. (b) Negative logs of P values resulting from applying the Wilcoxon-Mann-Whitney (WMW) test to each of the GO-BP categories in each of the tissues are shown. The categories (vertical axis) were clustered and ordered as in (a). (c, d) The 'density' of GO-BP annotations significantly enriched at specific points along the vertical axis at left (genes) is indicated; note that genes are in the same order in (a, b, c).
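A minimal sketch of the kind of test summarized in panel (b): a Wilcoxon-Mann-Whitney comparison of expression values for genes inside a category against all other genes in one tissue. The sketch assumes a one-sided alternative (category genes higher), a normal approximation, and no tie or continuity correction; the caption does not specify these choices, and the function names are illustrative.

```python
import math


def average_ranks(values):
    """Assign 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wmw_one_sided_p(in_category, background):
    """Approximate P value that in_category values tend to exceed background.

    Uses the normal approximation to the Mann-Whitney U statistic,
    without tie or continuity correction (simplifying assumptions).
    """
    n1, n2 = len(in_category), len(background)
    ranks = average_ranks(list(in_category) + list(background))
    rank_sum = sum(ranks[:n1])
    u1 = rank_sum - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u1 - mean_u) / sd_u
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail normal probability


def neg_log10(p):
    """Negative base-10 logarithm, the scale plotted in the figure."""
    return -math.log10(p)
```

Applying `wmw_one_sided_p` to every category and tissue, then `neg_log10` to each result, would give a matrix of the same shape as the one displayed.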
Figure 5. Expression of genes in 17 different functional categories. The categories were ordered manually. The genes within each category were clustered separately from those in other categories. The order of tissues is preserved from previous figures.
Figure 6. Predicting GO-BP categories of mouse genes using microarray data and SVMs. (a) The number of the 992 initial GO-BP categories exceeding the indicated precision value, with recall fixed for each line; for example, at 40% recall (green line), around 100 categories achieve precision of 30%. To estimate the significance of the colored lines, we repeated their calculation after permuting the gene labels in the annotation database. The dotted gray line indicates the maximum number of GO categories that achieve the indicated precision, with recall of 10% or greater. The dotted magenta line indicates the result obtained using 'binary' expression data (expressed/not expressed) in each tissue. (b) The number of genes with predicted GO-BP categories (blue line) or superGO categories (red line) at varying precision values. The individual predictions are given in the Additional data files. (c) Comparison of the overall predictive capacity of three data sets, restricted to the 13 tissues and 1,800 genes shared by all three data sets. Each of the lines corresponds to the 30% recall line in (a). All of the lines are to the lower right of those in (a), since fewer genes and tissues were used. (d) A histogram comparing the precision of predictions derived from lists of tissue-specific genes with the precision of predictions from SVMs. For each category, the tissue-specific list yielding the highest precision value was identified, along with its associated recall value, and the SVM precision for the same category at the same recall value was identified. The difference between the two precision values is plotted for each category, such that instances where the SVM is superior are to the right of center.
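The curves in panel (a) rest on precision measured at a fixed recall. The sketch below shows one plausible way to compute that quantity from classifier scores: genes are ranked by score, and precision is read off at the first cutoff where recall reaches the target. The function names and the handling of tied scores (none) are assumptions, not details from the study.

```python
def precision_at_recall(scores, labels, target_recall):
    """Precision at the smallest ranked cutoff whose recall meets the target.

    scores: classifier outputs, higher means more likely in the category.
    labels: 1 if the gene is annotated to the category, else 0.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("no positive genes in this category")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    for n_called, (_, is_pos) in enumerate(ranked, start=1):
        true_pos += int(is_pos)
        if true_pos / total_pos >= target_recall:
            return true_pos / n_called
    return total_pos / len(ranked)


def categories_exceeding(precisions, threshold):
    """Count categories whose precision is at least the threshold."""
    return sum(1 for p in precisions if p >= threshold)
```

Evaluating `categories_exceeding` over a range of thresholds, with `precision_at_recall` computed per category at one fixed recall, traces a single line of the kind plotted in panel (a).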
Figure 7. Expression patterns of 1,092 unannotated genes predicted to belong to any of 117 'superGO' categories with 50% confidence or higher. The vertical axis was clustered and diagonalized as in Figure 4. The height of each predicted category has been normalized to facilitate display; the number of genes predicted in each category is indicated at the left. The gene order (vertical axis) has been clustered within each category to illustrate that some categories are characterized by multiple patterns. The proportion (%) of predicted genes in each category that have gene-trap ES cell lines available is represented at right (color scale from 0 to 100%).
Domains associated with genes predicted to function in specific biological processes
| Predicted function | Enriched domain | Description of domain | Proportion of genes with this domain | -log10 significance (P) |
|---|---|---|---|---|
| Chromosome organization or DNA packaging | HMG | HMG (high mobility group) box | 13/87 | 10.5 |
| Pregnancy/embryo implantation | Hormone_1 | Somatotropin hormone family | 3/14 | 7.3 |
| Acyl-CoA/fatty acid/peroxisome | FabG | Short-chain alcohol dehydrogenase | 3/11 | 6.8 |
| RNA/ribosome metabolism/processing | RRM | RNA recognition motif | 10/149 | 6.6 |
| Carboxylic acid/amine metabolism | ECH | Enoyl-CoA hydratase/isomerase family | 3/62 | 6.2 |
| Humoral immune response | Sp100 | The function of this domain is unknown | 2/95 | 6.1 |
| Vision | Uteroglobin | The function of this domain is unknown | 3/56 | 5.9 |
| RNA-nucleus import/export | COG5136 | U1 snRNP-specific protein C | 2/13 | 5.7 |
| Microtubule-based process | Smc | Chromosome-segregation ATPases | 4/86 | 5.2 |
P values were calculated using the hypergeometric distribution [48], which compares the observed count against the expectation from random draws among the 15,443 XM genes with encoded domains. Domain names and descriptions are from the NCBI 'COG' database [65].
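The enrichment calculation can be sketched as an upper-tail hypergeometric probability: the chance of seeing at least k domain-containing genes among n predicted genes, when K of the N background genes carry the domain. The table reports k/n for each row and the footnote gives N = 15,443, but K is not given here, so any value of K used with this function is a placeholder. The function name is illustrative.

```python
import math


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which carry the domain."""
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k):
        raise ValueError("invalid hypergeometric parameters")
    total = math.comb(N, n)
    upper = min(n, K)
    tail = sum(
        math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, upper + 1)
    )
    return tail / total


def neg_log10_significance(k, n, K, N):
    """Negative base-10 logarithm of the upper-tail P value."""
    return -math.log10(hypergeom_upper_tail(k, n, K, N))
```

For the HMG row, a call would look like `neg_log10_significance(13, 87, K, 15443)`, with K set to the number of HMG-containing genes in the background, a number that must come from the domain annotation itself.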
Figure 8. PWP1 functions in ribosomal large-subunit biogenesis. (a) The expression pattern of mouse Pwp1 is similar to that of most known RNA-processing proteins. (b) The domain structures of Pwp1 homologs identified by BLASTP searches. Accession numbers and amino-acid lengths are given. We identified a single strong match in each of the species shown. Domains were identified by CDD search [29]. (c) A northern blot showing the accumulation of 35S rRNA precursor (blue arrow), reduction in other rRNA precursors (top panel), and reduction in 25S rRNA (red arrow) in the yeast TetO7-PWP1 mutant (strain TH_2220) in comparison to the parental wild-type strain (R1158) [9]. The U2 spliceosomal RNA is shown for comparison; its apparent abundance is increased because 5 μg RNA was loaded per lane, and the relative proportion of rRNA to snRNA is decreased in the mutant. Blotting procedures and probes were as previously described [9]. (d) Affinity-purification of yeast Pwp1p-TAP reveals association with proteins known to function in ribosomal large-subunit biogenesis (Ebp2p, Nop12p, Brx1p) as well as a subset of ribosomal proteins. The asterisks mark degradation products of Pwp1p-TAP.