Chin-Sheng Yu, Chih-Wen Cheng, Wen-Chi Su, Kuei-Chung Chang, Shao-Wei Huang, Jenn-Kang Hwang, Chih-Hao Lu.
Abstract
CELLO2GO (http://cello.life.nctu.edu.tw/cello2go/) is a publicly available, web-based system for screening various properties of a targeted protein and its subcellular localization. Herein, we describe how this platform is used to obtain brief or detailed gene ontology (GO)-type categories, including subcellular localization(s), for the queried proteins by combining the CELLO localization-predicting and BLAST homology-searching approaches. Given a query protein sequence, CELLO2GO uses BLAST to search for homologous sequences that are GO annotated in an in-house database derived from the UniProt KnowledgeBase database. At the same time, CELLO attempts to predict at least one subcellular localization on the basis of the species in which the protein is found. When homologs for the query sequence have been identified, the numbers of terms found for each of their GO categories, i.e., cellular component, molecular function, and biological process, are summed and presented as pie charts representing possible functional annotations for the queried protein. Even when the experimental subcellular localization of a protein is not known, and thus not annotated, CELLO can confidently suggest a subcellular localization. CELLO2GO should be a useful tool for research involving complex subcellular systems because it combines CELLO and BLAST into one platform and its output is easily manipulated such that user-specific questions may be readily addressed.
Year: 2014 PMID: 24911789 PMCID: PMC4049835 DOI: 10.1371/journal.pone.0099368
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Flowchart for CELLO2GO and examples of the input and output interfaces.
(A) The flowchart for annotation of a protein sequence used by CELLO2GO. The search databases used in the work are modified forms of the UniProtKB/SwissProt and UniProtKB/TrEMBL databases. (B) The CELLO2GO output page for a multiple-sequence query, which provides four pie charts, one for the localization predictions returned by CELLO (upper right) and three for the GO terms returned by BLAST for each query sequence. The list below the pie charts, which can be hidden, presents the CELLO-predicted subcellular localizations and the associated GO annotations in the order in which the sequences were submitted. (C) The CELLO2GO output page for a single-sequence query, which provides four pie charts, one for the CELLO-predicted subcellular localizations (upper right) and three for the GO terms returned by BLAST for the retrieved homologous sequences. The list below the pie charts, which can be hidden, presents the CELLO-predicted subcellular localization(s) and the associated GO annotations in the order in which the homologous sequences were found by BLAST. (D) Clicking on the GO-term list in (B) returns a new list of the submitted sequence entries that share the same GO term.
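The GO-term summation step of this workflow can be sketched in a few lines of Python. This is a minimal illustration, not the CELLO2GO implementation: the `Homolog` record, the `tally_go_terms` and `pie_fractions` functions, and the example hits are all hypothetical, and the BLAST search is assumed to have already produced a list of GO-annotated homologs.

```python
from collections import Counter
from dataclasses import dataclass, field

# The three GO categories used by CELLO2GO.
CATEGORIES = ("cellular_component", "molecular_function", "biological_process")


@dataclass
class Homolog:
    """Hypothetical record for one GO-annotated BLAST hit."""
    accession: str
    go_terms: dict = field(default_factory=dict)  # category -> list of term names


def tally_go_terms(homologs):
    """Sum how often each GO term occurs, per category, across all homologs."""
    counts = {category: Counter() for category in CATEGORIES}
    for hit in homologs:
        for category in CATEGORIES:
            counts[category].update(hit.go_terms.get(category, []))
    return counts


def pie_fractions(counter):
    """Convert raw term counts into the fractions a pie chart would display."""
    total = sum(counter.values())
    return {term: n / total for term, n in counter.items()} if total else {}


# Made-up hits, for illustration only.
hits = [
    Homolog("HIT_1", {"cellular_component": ["cytoplasm"],
                      "molecular_function": ["ATP binding"]}),
    Homolog("HIT_2", {"cellular_component": ["cytoplasm", "plasma membrane"],
                      "molecular_function": ["ATP binding"]}),
    Homolog("HIT_3", {"cellular_component": ["cytoplasm"]}),
]
counts = tally_go_terms(hits)
cc_fractions = pie_fractions(counts["cellular_component"])
```

Here `cc_fractions` holds one slice per cellular-component term, sized by how many homologs carry that term; a category with no terms yields an empty chart rather than an error.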
Figure 2. The frequency distributions for the GO slim of the UniProtKB/SwissProt entries in the in-house database.
(A) Molecular function (x axis) versus cellular component (y axis). (B) Biological process (x axis) versus cellular component (y axis). The size of each sphere is proportional to the number of entries.
Functional annotation returned by CELLO2GO for Gram-negative bacteria sequence dataset PS30GN.
| Localization | Number of Proteins | Molecular Function, SwissProt (%) | Molecular Function, TrEMBL (%) | Biological Process, SwissProt (%) | Biological Process, TrEMBL (%) | Cellular Component, SwissProt (%) | Cellular Component, TrEMBL (%) | CELLO Accuracy (%) |
| Extracellular | 419 | 32.0 | 30.5 | 16.0 | 14.8 | 6.7 | 6.7 | 53.6 |
| Outer Membrane | 541 | 27.7 | 27.5 | 10.4 | 10.2 | 1.5 | 1.3 | 71.4 |
| Periplasmic | 437 | 13.3 | 13.3 | 12.8 | 12.8 | 0.5 | 0.5 | 50.0 |
| Inner Membrane | 1607 | 22.6 | 22.5 | 11.4 | 11.4 | 1.4 | 1.3 | 52.4 |
| Cytoplasmic | 5025 | 1.4 | 1.3 | 1.3 | 1.3 | 1.5 | 1.5 | 95.9 |
| Overall | 8029 | 9.5 | 9.4 | 5.3 | 5.2 | 1.5 | 1.4 | 77.9 |
The percentage of homologous sequences for which GO functional annotations were not found by a BLAST search of the in-house database derived from the UniProtKB/SwissProt database for bacteria.
The percentage of homologous sequences for which GO functional annotations were not found by a BLAST search of the in-house database derived from the UniProtKB/TrEMBL database for bacteria.
The percentage of entries for which GO annotations for cellular components were missing or homologs were not retrieved by BLAST searching of the UniProtKB/TrEMBL databases, but for which CELLO accurately predicted the subcellular localization(s).
The Gram-negative bacterial benchmark dataset found in PSORTb3.0 [23], denoted PS30GN, includes 8029 protein sequences in five subcellular categories: extracellular, outer membrane, periplasmic, inner membrane, and cytoplasmic.
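To make the table's column definitions concrete, the following Python sketch computes both kinds of figure for one localization class: the percentage of proteins for which BLAST returned no cellular-component annotation, and the CELLO accuracy restricted to that unannotated subset. The `Result` record, its field names, and the toy data are assumptions made for illustration; the source does not describe how the authors computed these values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    """Hypothetical per-protein outcome of one CELLO2GO run."""
    true_location: str            # benchmark label
    cello_location: str           # localization predicted by CELLO
    cc_annotation: Optional[str]  # GO cellular component from BLAST;
                                  # None if missing or no homolog was retrieved


def percent_missing(results):
    """Percentage of proteins without a cellular-component term from BLAST."""
    missing = sum(1 for r in results if r.cc_annotation is None)
    return 100.0 * missing / len(results)


def cello_accuracy_on_missing(results):
    """CELLO accuracy over only the proteins lacking a cellular-component term."""
    subset = [r for r in results if r.cc_annotation is None]
    if not subset:
        return None  # nothing to evaluate
    correct = sum(1 for r in subset if r.cello_location == r.true_location)
    return 100.0 * correct / len(subset)


# Toy data, for illustration only.
toy = [
    Result("Periplasmic", "Periplasmic", None),
    Result("Periplasmic", "Cytoplasmic", None),
    Result("Periplasmic", "Periplasmic", "periplasmic space"),
    Result("Periplasmic", "Periplasmic", "periplasmic space"),
]
missing_pct = percent_missing(toy)              # 2 of 4 lack a term
accuracy_pct = cello_accuracy_on_missing(toy)   # 1 of those 2 is predicted correctly
```

Because the accuracy is evaluated only on the unannotated subset, a class in which nearly every protein has a homolog with a cellular-component term rests on very few proteins, which is worth keeping in mind when comparing accuracy values across rows.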
Functional annotation returned by CELLO2GO for Pseudomonas aeruginosa PA01 dataset.
| Localization | Number of Proteins | Molecular Function, SwissProt (%) | Molecular Function, TrEMBL (%) | Biological Process, SwissProt (%) | Biological Process, TrEMBL (%) | Cellular Component, SwissProt (%) | Cellular Component, TrEMBL (%) | CELLO Accuracy (%) |
| Extracellular | 94 | 44.7 | 41.5 | 42.6 | 38.3 | 30.9 | 28.7 | 63.0 |
| Outer Membrane | 194 | 28.9 | 23.7 | 20.1 | 16.5 | 16.0 | 12.4 | 62.5 |
| Outer Membrane Vesicle | 338 | 32.0 | 28.1 | 28.1 | 24.0 | 27.2 | 25.1 | 27.1 |
| Periplasmic | 522 | 24.3 | 20.7 | 18.4 | 15.5 | 24.7 | 22.8 | 51.3 |
| Inner Membrane | 1302 | 38.6 | 33.5 | 29.2 | 24.5 | 24.7 | 23.0 | 82.3 |
| Cytoplasmic | 2629 | 18.6 | 12.5 | 19.8 | 17.2 | 40.0 | 39.3 | 99.1 |
| Unknown Location | 1312 | 68.8 | 54.2 | 68.9 | 61.7 | 74.9 | 70.7 | - |
| Overall | 5572 | 35.9 | 28.3 | 33.7 | 29.5 | 43.1 | 41.2 | - |
The percentage of homologous sequences for which GO functional annotations were not found by a BLAST search of the in-house database derived from the UniProtKB/SwissProt database for bacteria.
The percentage of homologous sequences for which GO functional annotations were not found by a BLAST search of the in-house database derived from the UniProtKB/TrEMBL database for bacteria.
The percentage of entries for which GO annotations for cellular components were missing or homologs were not retrieved by BLAST searching of the UniProtKB/TrEMBL databases, but for which CELLO accurately predicted the subcellular localization(s).
The proteomic sequence data is that of the newly documented Pseudomonas aeruginosa PA01 dataset [31], which contains hypothetical and uncharacterized proteins.
Functional annotation returned by CELLO2GO for the Gram-positive bacteria dataset PS30GP.
| Localization | Number of Proteins | Molecular Function, SwissProt (%) | Molecular Function, TrEMBL (%) | Biological Process, SwissProt (%) | Biological Process, TrEMBL (%) | Cellular Component, SwissProt (%) | Cellular Component, TrEMBL (%) | CELLO Accuracy (%) |
| Extracellular | 312 | 15.1 | 15.1 | 19.2 | 19.2 | 2.2 | 2.2 | 100.0 |
| Cell wall | 82 | 25.6 | 22.0 | 31.7 | 25.6 | 9.8 | 1.2 | 0.0 |
| Membrane | 360 | 14.7 | 14.7 | 4.7 | 4.7 | 0.6 | 0.6 | 100.0 |
| Cytoplasmic | 1822 | 1.4 | 1.3 | 2.0 | 1.9 | 2.4 | 2.4 | 86.0 |
| Overall | 2576 | 5.7 | 5.5 | 5.4 | 5.2 | 2.3 | 2.1 | 86.8 |
The percentage of homologous sequences for which GO functional annotations were not found by a BLAST search of the in-house database derived from the UniProtKB/SwissProt database for bacteria.
The percentage of homologous sequences for which GO functional annotations were not found by a BLAST search of the in-house database derived from the UniProtKB/TrEMBL database for bacteria.
The percentage of entries for which GO annotations for cellular components were missing or homologs were not retrieved by BLAST searching of the UniProtKB/TrEMBL databases, but for which CELLO accurately predicted the subcellular localization(s).
The Gram-positive bacterial benchmark dataset found in PSORTb3.0 [23], denoted PS30GP, includes 2576 protein sequences in four subcellular categories: extracellular, cell wall, membrane, and cytoplasmic.
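As a quick consistency check on the PS30GP table, the short Python sketch below recomputes the final row's first percentage from the per-localization rows. It assumes that the final row is a protein-count-weighted mean of the rows above it and that the first percentage column is the molecular-function result for the SwissProt-derived database; neither assumption is stated in the source.

```python
# (localization, number of proteins, first percentage column of the PS30GP table)
rows = [
    ("Extracellular", 312, 15.1),
    ("Cell wall", 82, 25.6),
    ("Membrane", 360, 14.7),
    ("Cytoplasmic", 1822, 1.4),
]

# Dataset size is the sum of the per-localization counts.
total_proteins = sum(n for _, n, _ in rows)

# Weight each class percentage by its number of proteins.
weighted_percent = sum(n * pct for _, n, pct in rows) / total_proteins

print(total_proteins)              # 2576
print(round(weighted_percent, 1))  # 5.7
```

Both values agree with the table's final row (2576 proteins, 5.7), which is consistent with the weighted-mean reading for this column. The same check does not apply directly to the Pseudomonas aeruginosa table, where the per-localization counts sum to more than the stated total.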
Functional annotation returned by CELLO2GO for archaeal dataset PS30Arch.
| Localization | Number of Proteins | Molecular Function, SwissProt (%) | Molecular Function, TrEMBL (%) | Biological Process, SwissProt (%) | Biological Process, TrEMBL (%) | Cellular Component, SwissProt (%) | Cellular Component, TrEMBL (%) | CELLO Accuracy (%) |
| Extracellular | 27 | 25.9 | 7.4 | 74.1 | 55.6 | 33.3 | 33.3 | 66.7 |
| Cell wall | 18 | 100.0 | 50.0 | 100.0 | 50.0 | 50.0 | 44.4 | 62.5 |
| Membrane | 85 | 27.1 | 24.7 | 8.2 | 7.1 | 4.7 | 4.7 | 75.0 |
| Cytoplasmic | 675 | 0.7 | 0.4 | 5.3 | 5.2 | 0.9 | 0.7 | 100.0 |
| Overall | 805 | 6.6 | 4.3 | 10.1 | 8.1 | 3.5 | 3.2 | 73.1 |
The percentage of homologous sequences for which GO functional annotations were not found by a BLAST search of the in-house database derived from the UniProtKB/SwissProt database for archaea.
The percentage of homologous sequences for which GO functional annotations were not found by a BLAST search of the in-house database derived from the UniProtKB/TrEMBL database for archaea.
The percentage of entries for which GO annotations for cellular components were missing or homologs were not retrieved by BLAST searching of the UniProtKB/TrEMBL databases, but for which CELLO accurately predicted the subcellular localization(s).
The archaeal benchmark dataset found in PSORTb3.0 [23], denoted PS30Arch, includes 805 protein sequences in four subcellular categories: extracellular, cell wall, membrane, and cytoplasmic.