Maurizio Pellegrino, Paolo Provero, Lorenzo Silengo, Ferdinando Di Cunto.
Abstract
BACKGROUND: Public repositories of microarray data contain a vast amount of information that is potentially relevant for exploring functional relationships among genes through meta-analysis of expression profiles. However, widespread use of this resource by the scientific community is currently limited by the scarcity of effective analysis tools. Here we describe CLOE, a simple cDNA microarray data mining strategy based on meta-analysis of datasets from pairs of species. The method consists of ranking the EST probes in the datasets of the two species according to the similarity of their expression profiles to those of two EST probes from orthologous genes, and then extracting orthologous EST pairs from a given top interval of the ranked lists. The Gene Ontology annotation of the resulting candidate partners is then analyzed for keyword overrepresentation.
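The ranking-and-intersection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the similarity measure, so Pearson correlation is assumed here, and the function names (`rank_probes`, `top_interval`, `cloe`), the probe IDs, and the toy expression values are all hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def rank_probes(dataset, query):
    """Probe IDs sorted by decreasing similarity to the query probe."""
    ref = dataset[query]
    return sorted(dataset, key=lambda p: pearson(dataset[p], ref), reverse=True)

def top_interval(ranked, fraction):
    """Keep the top `fraction` of a ranked list (at least one probe)."""
    return ranked[:max(1, int(len(ranked) * fraction))]

def cloe(human, mouse, orthologs, human_query, mouse_query, fraction=0.01):
    """Orthologous probe pairs present in the top interval of both ranked lists."""
    top_h = set(top_interval(rank_probes(human, human_query), fraction))
    top_m = set(top_interval(rank_probes(mouse, mouse_query), fraction))
    return [(h, m) for h, m in orthologs if h in top_h and m in top_m]

# Hypothetical toy data: four probes per species, four conditions each.
human = {"hA": [1, 2, 3, 4], "hB": [2, 4, 6, 9], "hC": [4, 3, 2, 1], "hD": [1, 3, 2, 4]}
mouse = {"mA": [1, 2, 3, 5], "mB": [1, 2, 4, 5], "mC": [5, 3, 2, 1], "mD": [2, 1, 4, 3]}
orthologs = [("hA", "mA"), ("hB", "mB"), ("hC", "mC"), ("hD", "mD")]

# A 50% cutoff is used only because the toy datasets are tiny.
pairs = cloe(human, mouse, orthologs, "hA", "mA", fraction=0.5)
print(pairs)
```

In this sketch the query pair itself always survives the intersection, because each query probe ranks first in its own list.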
Year: 2004 PMID: 15550177 PMCID: PMC535557 DOI: 10.1186/1471-2105-5-179
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Schematic representation of the CLOE approach.
Figure 2. The top percentiles of single-organism ranked lists obtained with orthologous probes are strongly enriched in orthologous sequences. 100 orthologous (CLOE) and 100 randomly chosen (Random) EST pairs were used to rank the ESTs in the human and mouse datasets on the basis of expression similarity. The ranked lists were divided into 1% rank intervals, and the average number of human ESTs in a given rank interval with at least one orthologous EST in the corresponding mouse rank interval was determined. The average number of these ESTs in each of the top 10 rank intervals is plotted. Error bars = standard error.
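The per-interval count described in this caption can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used for the figure: the helper names (`interval_index`, `matched_per_interval`) and the toy probe IDs are hypothetical, and the ranked lists are assumed to be already sorted by expression similarity.

```python
def interval_index(ranked, n_intervals):
    """Map each probe to the index of the rank interval it falls in."""
    return {p: i * n_intervals // len(ranked) for i, p in enumerate(ranked)}

def matched_per_interval(ranked_h, ranked_m, orthologs, n_intervals=100):
    """Count, per interval, the human probes that have at least one
    orthologous mouse probe in the corresponding mouse interval."""
    bin_h = interval_index(ranked_h, n_intervals)
    bin_m = interval_index(ranked_m, n_intervals)
    matched = [set() for _ in range(n_intervals)]
    for h, m in orthologs:
        if h in bin_h and m in bin_m and bin_h[h] == bin_m[m]:
            matched[bin_h[h]].add(h)  # a set, so each human probe counts once
    return [len(s) for s in matched]

# Hypothetical toy lists split into two intervals instead of one hundred.
ranked_h = ["h1", "h2", "h3", "h4"]
ranked_m = ["m1", "m2", "m3", "m4"]
orthologs = [("h1", "m2"), ("h2", "m3"), ("h3", "m4"), ("h1", "m1")]
counts = matched_per_interval(ranked_h, ranked_m, orthologs, n_intervals=2)
print(counts)
```

Averaging such counts over many query pairs, once for orthologous queries and once for random ones, would give the two series compared in the figure.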
Percentage of known protein-protein interactions in the lists of candidate partners generated by the different co-expression-based approaches. For every protein found in the three analyzed complexes, and represented by at least one EST probe in both datasets, we selected the most representative human and mouse probes. A CLOE analysis with a top 1% cutoff was performed on these sequences. In parallel, the human dataset was ranked for each human EST, and lists of candidates corresponding to the top 1% ranks were obtained (Single organism). The prevalence of ESTs corresponding to other proteins of the same complexes was then determined for both approaches. Finally, to determine the prevalence of correct predictions by the Multiple Species approach, we determined the ratio between the number of links with other proteins of the same complex and the total number of links for all the complex components (data from [11]).
| Centrosome | 1.4 | 4.2 | 6.2 |
| PSD | 0.9 | 5.5 | 6.5 |
| TNFα/NF-kB | 1.6 | 6.1 | 6.8 |
| Average | 1.3 | 5.7 | 6.6 |
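
A prevalence of this kind can be computed as the percentage of predicted links that point to another member of the same complex. The sketch below is only an illustration of that ratio: the function name `prevalence`, the data layout (a mapping from each protein to its set of candidate partners), and the example proteins are hypothetical and are not taken from the analyzed complexes.

```python
def prevalence(links, complex_members):
    """Percentage of candidate links, summed over all complex components,
    that connect to another protein of the same complex."""
    total = within = 0
    for protein in complex_members:
        for partner in links.get(protein, ()):
            total += 1
            if partner in complex_members and partner != protein:
                within += 1
    return 100.0 * within / total if total else 0.0

# Hypothetical complex {A, B, C} with six candidate links in total,
# two of which (A-B and B-A) stay inside the complex.
links = {"A": {"B", "X", "Y"}, "B": {"A", "Z"}, "C": {"X"}}
value = prevalence(links, {"A", "B", "C"})
print(round(value, 1))
```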
Prevalence of functionally compatible predictions obtained with the three different methods. The percentage of compatible predictions was determined as in the previous table using the functional index described in the text.
| Centrosome | 19.5 | 36 | 26.3 |
| PSD | 33.8 | 47.8 | 41.3 |
| TNFα/NF-kB | 47.2 | 47.4 | 44.8 |
| Average | 33.5 | 43.7 | 37.4 |
List of candidate partners generated by CLOE analysis on the most representative ESTs corresponding to the protein of unknown function FAD104.
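| ID | Human probe | Human gene | Human rank | Mouse probe | Mouse gene | Mouse rank | Average rank |
|---|---|---|---|---|---|---|---|
| 523 | IMAGE:240295 | FAD104 | 1 | H3020H08 | 1600019O04Rik | 1 | 1 |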
| 1668 | IMAGE:343072 | ITGB1 | 2 | IMAGE:1051975 | Itgb1 | 45 | 23.5 |
| 9060 | IMAGE:786680 | ANXA5 | 21 | H3016C05 | Anxa5 | 103 | 62 |
| 8045 | IMAGE:486787 | CNN3 | 102 | H3056D03 | Cnn3 | 29 | 65.5 |
| 9769 | IMAGE:488479 | TPM1 | 107 | 3110002E24 | Tpm1 | 57 | 82 |
| 11683 | IMAGE:487437 | PPIC | 87 | H3028H10 | Ppic | 93 | 90 |
| 6369 | IMAGE:142788 | SERPINH1 | 70 | H3125A07 | Serpinh1 | 129 | 99.5 |
| 899 | IMAGE:469969 | ITGAV | 18 | 1110004F14 | Itgav | 182 | 100 |
| 9579 | IMAGE:345538 | CTSL | 140 | 2600002C17 | Ctsl | 95 | 117.5 |
| 8192 | IMAGE:613056 | RCN1 | 52 | H3027B09 | Rcn | 195 | 123.5 |
| 11615 | IMAGE:230261 | RALA | 78 | H3121E01 | Rala | 198 | 138 |
| 221 | IMAGE:897760 | LAMC1 | 43 | H3113E11 | Lamc1 | 239 | 141 |
| 1306 | IMAGE:897164 | CTNNA1 | 258 | 2210403L09 | Catna1 | 48 | 153 |
| 4123 | IMAGE:840697 | FKBP9 | 83 | H3147A05 | Fkbp9 | 236 | 159.5 |
| 12331 | IMAGE:841664 | CAV1 | 24 | H3089D06 | Cav | 301 | 162.5 |
| 5726 | IMAGE:377384 | NR2F2 | 308 | H3124H07 | Nr2f2 | 26 | 167 |
| 13914 | IMAGE:810485 | ID1 | 5 | H3003F10 | Idb1 | 365 | 185 |
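
In every row checked, the last column of this list matches the arithmetic mean of the two rank columns (fourth and seventh), and the rows are ordered by that mean. Reading these as the human and mouse ranks is an interpretation of the table, not something it states. The short sketch below reproduces the calculation for four rows copied from the list; the variable names are illustrative.

```python
# (gene, rank in the human list, rank in the mouse list), copied from the table.
rows = [
    ("ITGB1", 2, 45),
    ("ANXA5", 21, 103),
    ("CNN3", 102, 29),
    ("FAD104", 1, 1),
]

# Average the two ranks and order candidates by the result (best first).
ranked = sorted(((gene, (h + m) / 2) for gene, h, m in rows), key=lambda r: r[1])
for gene, avg in ranked:
    print(gene, avg)
```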
Gene Ontology keywords overrepresented in the list shown in the previous supplementary table. The results strongly suggest that FAD104 could be involved in some aspects of the functional interaction between the cytoskeleton and the extracellular matrix.
| GO keyword | Ontology | P-value |
|---|---|---|
| Endoplasmic reticulum | Cellular Component | 9.3·10⁻³ |
| Protein binding | Molecular Function | 6.5·10⁻³ |
| Peptidyl-prolyl cis-trans isomerase | Molecular Function | 6.5·10⁻³ |
| Structural constituent of muscle | Molecular Function | 3.4·10⁻³ |
| Collagen binding | Molecular Function | 3.1·10⁻³ |
| Structural molecule | Molecular Function | 1.7·10⁻³ |
| Tropomyosin binding | Molecular Function | 8.9·10⁻⁴ |
| Basement membrane | Cellular Component | 5.7·10⁻⁴ |
| Cytoskeleton | Cellular Component | 5.6·10⁻⁴ |
| Cell adhesion | Biological Process | 6.4·10⁻⁵ |
| Actin binding | Molecular Function | 4.6·10⁻⁸ |
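
The text above does not say which statistical test produced these values. One common way to score keyword overrepresentation is a hypergeometric tail probability, sketched below as an assumption rather than as the authors' method. The function name `hypergeom_tail` and the example counts are hypothetical.

```python
from math import comb

def hypergeom_tail(N, K, n, k):
    """P(X >= k) when n genes are drawn without replacement from N annotated
    genes, K of which carry the keyword."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

# Hypothetical counts: 20 annotated genes, 5 carry the keyword,
# and a candidate list of 4 genes contains 2 of them.
p = hypergeom_tail(N=20, K=5, n=4, k=2)
print(p)
```

A small probability means the keyword appears in the candidate list more often than expected by chance. With many keywords tested at once, a multiple-testing correction would normally be applied as well.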