Jing Li, Lisa J Zimmerman, Byung-Hoon Park, David L Tabb, Daniel C Liebler, Bing Zhang.
Abstract
Protein assembly and biological interpretation of the assembled protein lists are critical steps in shotgun proteomics data analysis. Although most biological functions arise from interactions among proteins, current protein assembly pipelines treat proteins as independent entities. Usually, only individual proteins with strong experimental evidence, that is, confident proteins, are reported, whereas many possible proteins of biological interest are eliminated. We have developed a clique-enrichment approach (CEA) to rescue eliminated proteins by incorporating the relationships among proteins as embedded in a protein interaction network. In several data sets tested, CEA increased protein identification by 8-23% with an estimated accuracy of 85%. Rescued proteins were supported by existing literature or transcriptome profiling studies at levels similar to those of confident proteins and at a significantly higher level than abandoned ones. Applying CEA to a breast cancer data set rescued proteins encoded by well-known breast cancer genes. In addition, CEA generated a network view of the proteins and helped show the modular organization of proteins that may underpin the molecular mechanisms of the disease.
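The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way a clique-enrichment rescue could work, not the authors' implementation. It assumes that maximal cliques of the interaction network are tested for over-representation of confident proteins with a hypergeometric tail probability, and that non-confident candidates inside enriched cliques are rescued. The function names, the `alpha` and `min_size` thresholds, and the toy network are all hypothetical.

```python
from math import comb


def maximal_cliques(adj):
    """Enumerate maximal cliques (Bron-Kerbosch, no pivoting).

    adj maps each protein to the set of its interaction partners.
    """
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n proteins are drawn from N, of which K are confident."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)


def rescue(adj, confident, candidates, alpha=0.05, min_size=3):
    """Return candidates that sit in cliques enriched for confident proteins.

    alpha and min_size are illustrative thresholds, not values from the paper.
    """
    N = len(adj)
    K = len(confident & set(adj))
    rescued = set()
    for clique in maximal_cliques(adj):
        if len(clique) < min_size:
            continue
        k = len(clique & confident)
        if hypergeom_tail(k, len(clique), K, N) < alpha:
            rescued |= clique & candidates
    return rescued


# Hypothetical toy network: a 4-clique (A, B, C, D) plus a chain D-E-...-L.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("F", "G"), ("G", "H"), ("H", "I"),
         ("I", "J"), ("J", "K"), ("K", "L")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

confident = {"A", "B", "C"}   # proteins with strong experimental evidence
candidates = {"D", "H"}       # eliminated proteins with weaker evidence
rescued = rescue(adj, confident, candidates)
print(sorted(rescued))
```

In this toy case only D is rescued: it shares a clique with three confident proteins, whereas H has no such network support. A real analysis would also need multiple-testing correction and a far larger network, both of which are omitted here.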
Year: 2009 PMID: 19690572 PMCID: PMC2736651 DOI: 10.1038/msb.2009.54
Source DB: PubMed Journal: Mol Syst Biol ISSN: 1744-4292 Impact factor: 11.429
Figure 1. ROC curves from cross-validation studies. (A) Comparison among three network-assisted methods: clique-enrichment approach (CEA, red), neighbor-voting (NV, black), and Hopfield (green). Cross-validations using data sets with manually introduced 10% noise are shown in dashed curves. (B) ROC curves from the yeast data set (red) and the mouse organ data set (green and black). YPIN: yeast protein interaction network. MPIN1: mouse protein interaction network 1 that includes only literature-supported interactions. MPIN2: mouse protein interaction network 2 that includes both literature-supported and computationally predicted interactions.
Figure 2. Evaluation of the rescued proteins using relevant gene expression data and publications. (A–C) For the proteins rescued by the clique-enrichment approach (CEA) in mouse brain, placenta, and lung, relevant data sets in microarray (M), EST library studies (E), and publications in PubMed (P) were investigated for supporting evidence. Red, orange, yellow, and white correspond to support from three, two, one, or zero information resources, respectively. (D) Percentage of proteins among all annotated mouse proteins, confident proteins, rescued non-confident proteins, and un-rescued non-confident proteins that are supported by at least two information resources in brain, placenta, and lung, respectively.
Figure 3. Breast cancer-specific sub-networks. Different sub-networks are shown in different colors and identified by IDs from 'a' to 'n'. Proteins shared by multiple sub-networks are colored in red. The most enriched Gene Ontology (GO) biological process annotations for each sub-network are labeled. The IDs accompanying the GO annotations match those of corresponding sub-networks. Triangle vertices represent the proteins rescued by the clique-enrichment approach (CEA). Vertex size represents different levels of publication support: the large size indicates support from breast cancer-related publications, the medium size indicates support from cancer-related publications, and the small size indicates no support from existing cancer-related publications.