Janos X Binder, Sune Pletscher-Frankild, Kalliopi Tsafou, Christian Stolte, Seán I O'Donoghue, Reinhard Schneider, Lars Juhl Jensen.
Abstract
Information on protein subcellular localization is important for understanding the cellular functions of proteins. Currently, such information is manually curated from the literature, obtained from high-throughput microscopy-based screens and predicted from primary sequence. To get a comprehensive view of the localization of a protein, it is thus necessary to consult multiple databases and prediction tools. To address this, we present the COMPARTMENTS resource, which integrates all sources listed above as well as the results of automatic text mining. The resource is automatically kept up to date with source databases, and all localization evidence is mapped onto common protein identifiers and Gene Ontology terms. To facilitate comparison of different types and sources of evidence, we further assign confidence scores to the localization evidence. Finally, we visualize the unified localization evidence for a protein on a schematic cell to provide a simple overview. Database URL: http://compartments.jensenlab.org.
Year: 2014 PMID: 24573882 PMCID: PMC3935310 DOI: 10.1093/database/bau012
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. Visualization of localization evidence. When querying the database for a protein, its localization is visualized on a schematic of a cell. When the user hovers the cursor over a compartment, we also graphically summarize the types of evidence supporting this localization. The confidence of the evidence is color coded, ranging from light green for low confidence to dark green for high confidence. White indicates an absence of localization evidence.
Table 1. Overview of the localization evidence for human proteins
| Compartment | Knowledge | Experiments | Text mining | PSORT | YLoc |
|---|---|---|---|---|---|
| Nucleus | 6082 | 5848 | 2288 | 9600 | 5335 |
| Cytosol | 2538 | 4872 | 577 | 9128 | 4630 |
| Cytoskeleton | 1843 | 1215 | 1257 | 134 | – |
| Peroxisome | 124 | – | 240 | 315 | 262 |
| Lysosome | 386 | – | 262 | 5 | 120 |
| Endoplasmic reticulum | 1382 | 151 | 656 | 281 | 178 |
| Golgi apparatus | 1250 | 814 | 348 | 64 | 313 |
| Plasma membrane | 4440 | 1271 | 1515 | 3681 | 3815 |
| Endosome | 170 | – | 88 | – | – |
| Extracellular space | 2267 | – | 1528 | 4331 | 1625 |
| Mitochondrion | 1156 | 924 | 793 | 2008 | 871 |
We counted protein–compartment associations separately for each of the 11 labeled compartments and for each evidence channel. The only exception is the predictions channel, for which we show the results from the two sequence-based methods (PSORT and YLoc) separately. Dashes denote compartments for which a channel or prediction method cannot provide evidence.
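The counting described in the footnote can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the flat `(protein, compartment, channel)` record layout, the function name `count_associations` and the toy records are all assumptions made for the example.

```python
from collections import defaultdict

def count_associations(records):
    """Count distinct proteins per (compartment, channel).

    `records` is an iterable of (protein, compartment, channel) tuples.
    This layout is hypothetical and chosen only for illustration.
    """
    proteins = defaultdict(set)
    for protein, compartment, channel in records:
        # A set ensures repeated evidence for the same protein counts once.
        proteins[(compartment, channel)].add(protein)
    return {key: len(members) for key, members in proteins.items()}

# Hypothetical toy records, not taken from the resource.
records = [
    ("P1", "Nucleus", "Knowledge"),
    ("P1", "Nucleus", "Text mining"),
    ("P2", "Nucleus", "Knowledge"),
    ("P2", "Nucleus", "Knowledge"),  # duplicate evidence
    ("P2", "Cytosol", "PSORT"),
]
counts = count_associations(records)
```

Keeping PSORT and YLoc as separate channel labels, as in the toy records, mirrors the way the table reports the two prediction methods in separate columns.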
Table 2. Overview of the localization evidence for yeast proteins
| Compartment | Knowledge | Text mining | PSORT | YLoc |
|---|---|---|---|---|
| Nucleus | 2194 | 211 | 3870 | 1476 |
| Cytosol | 422 | 42 | 3242 | 1533 |
| Cytoskeleton | 231 | 108 | 44 | – |
| Peroxisome | 69 | 65 | 20 | 127 |
| Vacuole | 268 | 88 | 0 | 23 |
| Endoplasmic reticulum | 486 | 129 | 42 | 38 |
| Golgi apparatus | 236 | 75 | 12 | 57 |
| Plasma membrane | 457 | 135 | 775 | 350 |
| Endosome | 16 | 18 | – | – |
| Extracellular space | 94 | 69 | 302 | 624 |
| Mitochondrion | 1118 | 162 | 1486 | 422 |
For details refer to the footnote of Table 1.
Figure 2. Overlap between the knowledge, experimental and text-mining evidence for human proteins. The Venn diagram shows the number of proteins with localization evidence from one or more of the three types of evidence. The two sequence-based prediction methods are not included as they are able to provide a prediction for any protein sequence.
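The seven regions of such a three-way Venn diagram follow directly from set operations. The sketch below assumes each evidence type is available as a set of protein identifiers; the function name and the toy identifiers are hypothetical.

```python
def venn_regions(knowledge, experiments, text_mining):
    """Return the size of each of the seven regions of a three-set Venn diagram."""
    k, e, t = set(knowledge), set(experiments), set(text_mining)
    return {
        "knowledge only": len(k - e - t),
        "experiments only": len(e - k - t),
        "text mining only": len(t - k - e),
        "knowledge & experiments": len((k & e) - t),
        "knowledge & text mining": len((k & t) - e),
        "experiments & text mining": len((e & t) - k),
        "all three": len(k & e & t),
    }

# Hypothetical protein identifiers, for illustration only.
regions = venn_regions(
    knowledge={"P1", "P2", "P3", "P4"},
    experiments={"P3", "P4", "P5"},
    text_mining={"P4", "P5", "P6"},
)
```

The region sizes always sum to the number of proteins with evidence from at least one of the three types.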
Figure 3. Benchmark of text-mining results. The performance of the text-mining pipeline on human and yeast proteins is shown as receiver operating characteristic (ROC) curves for each of the 11 compartments. The curves do not reach sensitivity = 1.0 and false-positive rate (FPR) = 1.0 because many of the protein–compartment pairs in the benchmark set are never mentioned together in Medline and therefore have no text-mining score.
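Why the curves stop short of the top-right corner can be seen in a small sketch. It assumes the benchmark is a list of scored pairs plus total counts of positive and negative pairs, some of which have no score. The function name and the toy numbers are hypothetical, and tied scores are not given special handling.

```python
def roc_points(scored, n_pos, n_neg):
    """Return (FPR, sensitivity) points, walking from highest to lowest score.

    `scored` holds (score, is_positive) for pairs that received a score.
    `n_pos` and `n_neg` are the totals in the benchmark set, including
    pairs that have no score at all.
    """
    points = [(0.0, 0.0)]
    tp = fp = 0
    for _score, is_positive in sorted(scored, key=lambda pair: -pair[0]):
        if is_positive:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

# Toy benchmark: 5 positives and 5 negatives, but only 4 pairs were scored.
scored = [(0.9, True), (0.8, True), (0.6, False), (0.4, True)]
points = roc_points(scored, n_pos=5, n_neg=5)
```

Because unscored pairs can never be called at any threshold, the last point of the toy curve lies at FPR 0.2 and sensitivity 0.6 rather than at (1.0, 1.0).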
Figure 4. Compartment relationships derived from shared proteins. Illustrating the usefulness of COMPARTMENTS for global analysis of protein localization, we studied relationships between compartments. Each node represents a single compartment, which is highlighted in green. The number of proteins in the compartment is shown in parentheses. We show an edge between two compartments whenever they share more proteins than expected at random (false discovery rate <0.1%). The number of proteins co-localized to the two compartments is shown next to the edge.
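One way to test whether two compartments share more proteins than expected at random is sketched below. The caption does not name the statistical test, so the one-sided hypergeometric test and the Benjamini-Hochberg correction used here are assumptions, as are the function names and the toy protein sets.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n proteins are drawn from N, of which K are 'marked'."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues

def enriched_edges(compartments, n_proteins, fdr=0.001):
    """List (compartment_a, compartment_b, shared_count) for pairs passing the FDR cut-off."""
    names = sorted(compartments)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    shared = [len(compartments[a] & compartments[b]) for a, b in pairs]
    pvalues = [
        hypergeom_sf(s, n_proteins, len(compartments[a]), len(compartments[b]))
        for (a, b), s in zip(pairs, shared)
    ]
    qvalues = benjamini_hochberg(pvalues)
    return [(a, b, s) for (a, b), s, q in zip(pairs, shared, qvalues) if q < fdr]

# Hypothetical toy proteome of 40 proteins, numbered 0..39.
compartments = {
    "Endoplasmic reticulum": set(range(0, 12)),
    "Golgi apparatus": set(range(2, 14)),
    "Mitochondrion": set(range(20, 32)),
}
edges = enriched_edges(compartments, n_proteins=40)
```

In the toy data only the first two compartments overlap, so a single edge is reported, labeled with the 10 proteins they share. The default `fdr=0.001` corresponds to the 0.1% threshold in the caption.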