Petr Klus, Riccardo Delli Ponti, Carmen Maria Livi, Gian Gaetano Tartaglia.
Abstract
BACKGROUND: Comparison between multiple protein datasets requires the choice of an appropriate reference system and a number of variables to describe their differences. Here we introduce an innovative approach to discriminate multiple protein datasets (multiCM) and to measure enrichments in gene ontology terms (cleverGO) using semantic similarities.
Year: 2015 PMID: 26673865 PMCID: PMC4681139 DOI: 10.1186/s12864-015-2280-z
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 RNA-binding abilities of S. cerevisiae chaperone substrates. a The RNA-binding ability of yeast chaperone substrates is visualized in a microarray-like table. Hsp90 and Hsp40 are predicted to have the largest number of nucleic-acid binding partners (Positive set: vertical axis; Negative set: horizontal axis; Green: positive set is enriched with respect to negative set; Red: negative set is enriched with respect to positive set [3]; Yellow: non-significant enrichment; Grey: enrichment not calculable due to strong overlap between the sets). The enrichment is associated with a p-value < 10^-5 calculated with Fisher’s exact test. b GO annotations are shown through an innovative interface that allows clustering through semantic similarity. The largest cluster of Hsp90 interactors is related to the molecular function (MF) RNA/DNA binding (red cluster corresponding to a coverage of 372 out of 877 proteins). Full analysis is available at http://www.tartaglialab.com/cs_multi/confirm/286/d67c93dd10/
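The enrichment test named in the caption can be illustrated with a minimal sketch of a one-sided Fisher's exact test on a 2x2 table. This is not the multiCM implementation: the function name `fisher_exact_greater`, the table layout (binders vs. non-binders in the positive and negative sets) and the counts in the usage lines are all assumptions chosen for illustration.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Hypothetical layout (an assumption, not taken from the paper):
        a = predicted binders in the positive set
        b = non-binders in the positive set
        c = predicted binders in the negative set
        d = non-binders in the negative set
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1 = a + b          # size of the positive set
    col1 = a + c          # total number of binders
    n = a + b + c + d     # all proteins in the table
    upper = min(row1, col1)
    favourable = sum(
        comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1)
    )
    return favourable / comb(n, row1)


# Illustrative counts only: 40/100 binders in one set, 10/100 in the other.
p_value = fisher_exact_greater(40, 60, 10, 90)
print(f"one-sided p-value: {p_value:.3g}")
```

A cell of the table would be coloured as enriched only if such a p-value falls below the chosen threshold; the reverse comparison (negative vs. positive set) can be obtained by swapping the two rows.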
Fig. 2 Physico-chemical determinants of protein insolubility. Comparing low-solubility (LS) and high-solubility (HS) proteins in three eukaryotic cells [15], we found that a LS proteins are structurally disordered in human and mouse (red dots indicate enrichments in LS proteins). b The Boxplotter algorithm indicates that there is a significant difference between the aggregation propensities of the HS and LS groups in yeast (p-value = 10^-11; Mann–Whitney–Wilcoxon test; area under the ROC curve = 0.72), which is c inversely related to protein abundance (p-value = 10^-9; Mann–Whitney–Wilcoxon test; area under the ROC curve = 0.70), in agreement with previous evolutionary observations [30–32]. In all organisms, we find d more nucleic-acid binding in LS fractions. e, f LS proteins are enriched in nucleic-acid binding ability (Additional file 1: Figure S1), as shown with cleverGO analysis on human and yeast. The links to multiCM, Boxplotter and cleverGO analyses are available at http://www.tartaglialab.com/cs_multi/confirm/737/6065feed14/
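The caption reports a Mann–Whitney–Wilcoxon p-value together with an area under the ROC curve; the two are linked because AUC = U / (n1 · n2), where U is the Mann–Whitney statistic. The following is a minimal sketch of that relationship, not the Boxplotter code. The function name `mann_whitney_auc`, the example scores, and the use of a plain normal approximation (no tie or continuity correction) for the p-value are assumptions.

```python
from math import erfc, sqrt


def mann_whitney_auc(group_a, group_b):
    """Return (U, AUC, two-sided p) for two samples of scores.

    U counts the pairs in which a value from group_a exceeds a value
    from group_b (ties count 0.5). AUC = U / (n_a * n_b).
    The p-value uses a normal approximation without tie or continuity
    correction, which is only a rough guide for small or tied samples.
    """
    n_a, n_b = len(group_a), len(group_b)
    u = 0.0
    for x in group_a:
        for y in group_b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    auc = u / (n_a * n_b)
    mean_u = n_a * n_b / 2.0
    sd_u = sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p_two_sided = erfc(abs(z) / sqrt(2.0))
    return u, auc, p_two_sided


# Hypothetical aggregation-propensity scores for LS and HS proteins.
ls_scores = [0.9, 1.4, 0.3, 1.1, 0.8, 1.7]
hs_scores = [0.2, 0.7, -0.1, 0.5, 1.0, 0.4]
u, auc, p = mann_whitney_auc(ls_scores, hs_scores)
print(f"U = {u}, AUC = {auc:.2f}, p = {p:.3g}")
```

An AUC of 0.5 means the two groups are indistinguishable by the score, while values approaching 1 (or 0) indicate increasingly clean separation. The pairwise loop is quadratic in sample size; a rank-based computation would be preferable for large datasets.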
Fig. 3 Protein aggregation and longevity. We used multiCM to analyze insoluble fractions of C. elegans proteins [16]. a Analysis of mass-spectrometry data indicates that in the hsf-1 strain (short-lived) highly enriched proteins (class HSF 4/4) are more aggregation prone than those less enriched (class HSF1 1/4). b In the daf-2 strain (long-lived), highly enriched proteins (DAF2 4/4) show lower aggregation propensities than poorly enriched ones (DAF2 1/4). In these calculations, the insoluble fraction of each strain is divided into 4 equal sets containing proteins with fold enrichments > 1 with respect to the wild-type worm, ranked from low (1/4) to high (4/4) [green dots indicate row vs column enrichments]. c Using the cleverGO algorithm, we analyzed proteins present in the hsf-1 strain (i.e., reported in HSF-1 4/4 and not in DAF-2 4/4) and found enrichments in metabolic pathways, oxidative stress response and mitochondrial function. Links to the analyses are at http://www.tartaglialab.com/cs_multi/confirm/757/9e1710f579/ and http://www.tartaglialab.com/cs_multi/confirm/758/95acfc44da/
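The partition described in the caption (keep proteins with fold enrichment > 1, rank them, and split them into four equal classes from 1/4 to 4/4) can be sketched as follows. The function name `quartile_classes`, the protein identifiers and the fold-enrichment values are hypothetical, and the handling of set sizes not divisible by four is an assumption, since the caption does not specify it.

```python
def quartile_classes(fold_enrichment):
    """Split proteins into four rank-based classes labelled '1/4'..'4/4'.

    fold_enrichment: dict mapping protein name -> fold enrichment
    relative to wild type. Only proteins with fold enrichment > 1 are
    kept; they are sorted from low to high and assigned to classes of
    (nearly) equal size.
    """
    kept = sorted(
        (value, name) for name, value in fold_enrichment.items() if value > 1
    )
    n = len(kept)
    classes = {f"{i}/4": [] for i in range(1, 5)}
    for rank, (_, name) in enumerate(kept):
        quartile = rank * 4 // n + 1  # 1 for the lowest ranks, 4 for the highest
        classes[f"{quartile}/4"].append(name)
    return classes


# Hypothetical fold enrichments; "p9" is dropped because it is not > 1.
example = {
    "p1": 1.2, "p2": 1.5, "p3": 2.0, "p4": 2.5,
    "p5": 3.0, "p6": 4.0, "p7": 5.0, "p8": 8.0, "p9": 0.7,
}
for label, members in quartile_classes(example).items():
    print(label, members)
```

The 4/4 and 1/4 classes produced this way are the kind of sets that could then be compared for a property such as aggregation propensity.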