Wen-Lin Huang, Chun-Wei Tung, Shih-Wen Ho, Shiow-Fen Hwang, Shinn-Ying Ho.
Abstract
BACKGROUND: Gene Ontology (GO) annotation, which describes the function of genes and gene products across species, has recently been used to predict protein subcellular and subnuclear localization. Existing GO-based prediction methods for protein subcellular localization use the known accession numbers of query proteins to obtain their annotated GO terms. An accurate method that predicts the subcellular localization of novel proteins without known accession numbers, using only the input sequence, is therefore worth developing.
Year: 2008 PMID: 18241343 PMCID: PMC2262056 DOI: 10.1186/1471-2105-9-80
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Essential GO terms and their definitions
| Compartment | Essential GO term | Definition |
| Centriole | GO:0005814 | A cellular organelle, found close to the nucleus in many eukaryotic cells, consisting of a small cylinder with microtubular walls, 300–500 nm long and 150–250 nm in diameter. It contains nine short, parallel, peripheral microtubular fibrils, each fibril consisting of one complete microtubule fused to two incomplete microtubules. |
| Cytoplasm | GO:0005737 | All of the contents of a cell excluding the plasma membrane and nucleus, but including other subcellular structures. |
| Cytoskeleton | GO:0005856 | Any of the various filamentous elements that form the internal framework of eukaryotic cells, and typically remain after treatment of the cells with mild detergent to remove membrane constituents and soluble components of the cytoplasm. |
| Endoplasmic reticulum | GO:0005783 | The irregular network of unit membranes, visible only by electron microscopy, that occurs in the cytoplasm of many eukaryotic cells. |
| Extracellular | GO:0030198 | A process that is carried out at the cellular level which results in the formation, arrangement of constituent parts, or disassembly of an extracellular matrix. |
| Golgi apparatus | GO:0005794 | A compound membranous cytoplasmic organelle of eukaryotic cells, consisting of flattened, ribosome-free vesicles arranged in a more or less regular stack. ... |
| Lysosome | GO:0005764 | Any of a group of related cytoplasmic, membrane bound organelles that are found in most animal cells and that contain a variety of hydrolases, most of which have their maximal activities in the pH range 5–6. ... |
| Chloroplast | GO:0009507 | A chlorophyll-containing plastid with thylakoids organized into grana and frets, or stroma thylakoids, and embedded in a stroma. |
| Microsome | GO:0005792 | Any of the small, heterogeneous, artifactual, vesicular particles, 50–150 nm in diameter, that are formed when some eukaryotic cells are homogenized and that sediment on centrifugation at 100000 g. |
| Mitochondrion | GO:0005739 | A semiautonomous, self replicating organelle that occurs in varying numbers, shapes, and sizes in the cytoplasm of virtually all eukaryotic cells. It is notably the site of tissue respiration. |
| Nucleus | GO:0005634 | A membrane-bounded organelle of eukaryotic cells in which chromosomes are housed and replicated. ... |
| Peroxisome | GO:0005777 | A small, membrane-bounded organelle that uses dioxygen (O2) to oxidize organic molecules; contains some enzymes that produce and others that degrade hydrogen peroxide (H2O2). |
| Plasma membrane | GO:0005886 | The membrane surrounding a cell that separates the cell from its external environment. It consists of a phospholipid bilayer and associated proteins. |
| Cell wall | GO:0005618 | The rigid or semi-rigid envelope lying outside the cell membrane of plant, fungal, and most prokaryotic cells, maintaining their shape and protecting them from osmotic lysis. |
| Cyanelle | GO:0009842 | Plastid type found in Glaucophyta having unstacked thylakoid membranes bearing phycobilisomes; cyanelles are bound by a double membrane and a peptidoglycan layer. |
| Vacuole | GO:0005773 | A closed structure, found only in eukaryotic cells, that is completely surrounded by unit membrane and contains liquid material. |
| Plastid | GO:0009536 | Any member of a family of organelles found in the cytoplasm of plants and some protists, which are membrane-bounded and contain DNA. |
Data set SCL12. The data set SCL12 consists of SCL12L and SCL12T as the learning and testing data sets, respectively. There are 12 essential GO terms corresponding to subcellular compartments. In the SCL12L column, the number t in parentheses is the number of sequences that are correctly annotated by only one essential GO term.
| Label | Compartment | Essential GO term | Number of sequences in SCL12L | Number of sequences in SCL12T |
| 1 | Centriole | GO:0005814 | 20 (1) | 25 |
| 2 | Cytoplasm | GO:0005737 | 155 (38) | 377 |
| 3 | Cytoskeleton | GO:0005856 | 12 (6) | 14 |
| 4 | Endoplasmic reticulum | GO:0005783 | 28 (19) | 35 |
| 5 | Extracellular | GO:0030198 | 140 (0) | 301 |
| 6 | Golgi apparatus | GO:0005794 | 33 (5) | 42 |
| 7 | Lysosome | GO:0005764 | 32 (27) | 40 |
| 8 | Microsome | GO:0005792 | 7 (0) | 8 |
| 9 | Mitochondrion | GO:0005739 | 125 (111) | 228 |
| 10 | Nucleus | GO:0005634 | 196 (179) | 580 |
| 11 | Peroxisome | GO:0005777 | 18 (16) | 23 |
| 12 | Plasma membrane | GO:0005886 | 153 (23) | 368 |
| Total | | | 919 (425) | 2041 |
Data set SCL16. The data set SCL16 consists of SCL16L and SCL16T as the learning and testing data sets, respectively. There are 15 essential GO terms corresponding to eukaryotic subcellular compartments; GO:0005814 does not appear in the set of n = 2870 GO terms. In the SCL16L column, the number t in parentheses is the number of sequences that are correctly annotated by only one essential GO term.
| Label | Compartment | Essential GO term | Number of sequences in SCL16L | Number of sequences in SCL16T |
| 1 | Centriole | GO:0005814 | 17 (0) | 4 |
| 2 | Cytoplasm | GO:0005737 | 384 (92) | 334 |
| 3 | Cytoskeleton | GO:0005856 | 20 (7) | 5 |
| 4 | Endoplasmic reticulum | GO:0005783 | 91 (83) | 22 |
| 5 | Extracellular | GO:0030198 | 402 (1) | 404 |
| 6 | Golgi apparatus | GO:0005794 | 68 (8) | 17 |
| 7 | Lysosome | GO:0005764 | 37 (32) | 9 |
| 8 | Chloroplast | GO:0009507 | 207 (192) | 51 |
| 9 | Mitochondrion | GO:0005739 | 183 (173) | 45 |
| 10 | Nucleus | GO:0005634 | 474 (395) | 695 |
| 11 | Peroxisome | GO:0005777 | 52 (38) | 12 |
| 12 | Plasma membrane | GO:0005886 | 323 (29) | 90 |
| 13 | Cell wall | GO:0005618 | 20 (16) | 5 |
| 14 | Cyanelle | GO:0009842 | 78 (65) | 19 |
| 15 | Vacuole | GO:0005773 | 36 (30) | 8 |
| 16 | Plastid | GO:0009536 | 31 (1) | 7 |
| Total | | | 2423 (1162) | 1727 |
Results of GO annotation for all sequences in SCL12L and SCL16L
| Data set | Total GO terms | Number of GO terms: smallest | Number of GO terms: largest | Number of GO terms: mean | Number of sequences annotated by | | |
| SCL12L | 1714 | 0 | 35 | 8.3 | 404 | 453 | 62 |
| SCL16L | 2870 | 0 | 50 | 7.7 | 1025 | 1247 | 151 |
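The summary columns above (total GO terms, and the smallest, largest, and mean number of GO terms) can be illustrated with a minimal Python sketch. It assumes the counts are taken per sequence; the `summarize_annotations` helper, the sequence identifiers, and the toy annotations are hypothetical and are not taken from the paper.

```python
from statistics import mean

def summarize_annotations(annotations):
    """Summarize a non-empty dict mapping sequence id -> set of GO term ids."""
    counts = [len(terms) for terms in annotations.values()]
    distinct_terms = set().union(*annotations.values())
    return {
        "total_go_terms": len(distinct_terms),  # distinct GO terms over all sequences
        "smallest": min(counts),                # fewest GO terms on any sequence
        "largest": max(counts),                 # most GO terms on any sequence
        "mean": round(mean(counts), 1),         # mean GO terms per sequence
    }

# Hypothetical toy annotations (not data from SCL12L or SCL16L).
toy_annotations = {
    "seq1": {"GO:0005634", "GO:0005737"},
    "seq2": set(),  # a sequence with no GO annotation
    "seq3": {"GO:0005634", "GO:0005739", "GO:0005886"},
}
summary = summarize_annotations(toy_annotations)
```

A sequence with an empty set corresponds to a "Smallest" value of 0, as reported for both data sets.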
Figure 1. Training accuracies of SVM-IGO and SVM-RBS, obtained by using SVM with a number r of selected informative GO terms.
The m = 44 informative GO terms obtained by applying GOmining to SCL12L. The GO terms in bold style are essential GO terms.
| Rank by MED | GO term | Branch | MED | Rank by MED | GO term | Branch | MED |
| 1 | | C | 390.1 | 23 | GO:0007218 | B | 57.1 |
| 2 | | C | 350.3 | 24 | GO:0042742 | B | 56.3 |
| 3 | GO:0016021 | C | 297.6 | 25 | GO:0005815 | C | 56.3 |
| 4 | GO:0005576 | C | 136.6 | 26 | GO:0005319 | M | 55.2 |
| 5 | GO:0008285 | B | 72.8 | 27 | GO:0020037 | M | 54.7 |
| 6 | | C | 70.2 | 28 | | C | 50.8 |
| 7 | GO:0050909 | B | 69.3 | 29 | | C | 43.0 |
| 8 | GO:0008633 | B | 69.3 | 30 | GO:0005215 | M | 42.5 |
| 9 | GO:0009396 | B | 67.4 | 31 | | C | 41.7 |
| 10 | | B | 66.9 | 32 | GO:0016757 | M | 39.5 |
| 11 | GO:0031227 | C | 66.7 | 33 | | C | 37.3 |
| 12 | GO:0006888* | B | 66.1 | 34 | | C | 36.0 |
| 13 | GO:0005859 | C | 65.4 | 35 | GO:0050896 | B | 33.6 |
| 14 | | C | 64.7 | 36 | GO:0005813 | C | 30.6 |
| 15 | GO:0009596 | B | 64.1 | 37 | GO:0005578 | C | 28.8 |
| 16 | GO:0006421 | B | 63.7 | 38 | GO:0005615 | C | 28.4 |
| 17 | GO:0006941 | B | 63.7 | 39 | GO:0007165 | B | 22.7 |
| 18 | GO:0005622 | C | 63.0 | 40 | GO:0006886 | B | 20.1 |
| 19 | GO:0004356 | M | 62.6 | 41 | GO:0030662 | C | 19.9 |
| 20 | GO:0008484 | M | 62.6 | 42 | GO:0005216 | M | 9.7 |
| 21 | GO:0017119 | C | 62.1 | 43 | | C | 3.8 |
| 22 | GO:0006879 | B | 58.9 | 44 | | C | 1.2 |
The m = 60 informative GO terms obtained by applying GOmining to SCL16L. The GO terms in bold style are essential GO terms.
| Rank by MED | GO term | Branch | MED | Rank by MED | GO term | Branch | MED |
| 1 | | C | 331.7 | 31 | GO:0005525 | M | 56.5 |
| 2 | | C | 244.2 | 32 | GO:0005789 | C | 55.6 |
| 3 | GO:0016020 | C | 148.1 | 33 | GO:0004725 | M | 55.6 |
| 4 | | C | 147.8 | 34 | GO:0008270 | M | 54.6 |
| 5 | GO:0001844 | B | 70.7 | 35 | GO:0015031 | B | 54.1 |
| 6 | GO:0005212 | M | 70.5 | 36 | GO:0005813 | C | 52.2 |
| 7 | | C | 68.5 | 37 | GO:0005524 | M | 51.8 |
| 8 | GO:0006094 | B | 67.9 | 38 | GO:0051536 | M | 51.8 |
| 9 | GO:0045261 | C | 67.5 | 39 | GO:0016702 | M | 51.1 |
| 10 | | C | 66.6 | 40 | GO:0005887 | C | 48.4 |
| 11 | GO:0007010 | B | 65.8 | 41 | GO:0005905 | C | 46.1 |
| 12 | | B | 65.4 | 42 | | C | 43.6 |
| 13 | GO:0009626 | B | 65.4 | 43 | GO:0005622 | C | 43.4 |
| 14 | GO:0005047 | M | 64.6 | 44 | GO:0005759 | C | 43.1 |
| 15 | GO:0017134 | M | 64.1 | 45 | | C | 42.1 |
| 16 | GO:0000287 | M | 63.3 | 46 | GO:0016757 | M | 41.7 |
| 17 | GO:0006888 | B | 61.7 | 47 | | C | 41.3 |
| 18 | GO:0030234 | M | 61.7 | 48 | | C | 35.7 |
| 19 | GO:0007323 | B | 61.0 | 49 | | C | 31.2 |
| 20 | GO:0008083 | M | 60.8 | 50 | | C | 27.2 |
| 21 | GO:0004521 | M | 60.3 | 51 | GO:0006811 | B | 25.3 |
| 22 | GO:0003723 | M | 60.2 | 52 | GO:0006350 | B | 24.3 |
| 23 | GO:0009514 | C | 59.9 | 53 | GO:0016740 | M | 23.1 |
| 24 | GO:0020015 | C | 59.4 | 54 | | C | 20.2 |
| 25 | GO:0000922 | C | 59.0 | 55 | GO:0005829 | C | 19.6 |
| 26 | GO:0005681 | C | 58.9 | 56 | GO:0016798 | M | 15.5 |
| 27 | GO:0030149 | B | 57.9 | 57 | GO:0007186 | B | 14.8 |
| 28 | GO:0000917 | B | 56.9 | 58 | GO:0019843 | M | 13.0 |
| 29 | GO:0009405 | B | 56.8 | 59 | | C | 11.8 |
| 30 | GO:0005615 | C | 56.5 | 60 | | C | 10.2 |
Comparison of prediction accuracy (%) using 10-fold cross-validation (10-CV).
| Data set | Fuzzy | SVM-GO | SVM-RBS | SVM-IGO | |
| SCL12L | 74.3 | 71.4 | 88.5 | 86.5 | 89.8 |
| SCL16L | 66.0 | 59.3 | 84.8 | 83.5 | 86.5 |
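The 10-CV protocol named in the caption above can be sketched as follows. This is a minimal, unstratified illustration in plain Python; the `k_fold_indices` helper, the shuffling seed, and the absence of stratification are assumptions, since the source does not describe how the folds were formed.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # fixed seed for reproducibility
    folds = [indices[i::k] for i in range(k)]   # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Example with the size of SCL12L (919 sequences); each sequence is tested exactly once.
splits = list(k_fold_indices(919, k=10))
```

Setting `k` equal to the number of sequences turns the same routine into leave-one-out cross-validation (LOOCV), the protocol used in the later comparison tables.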
Comparison of prediction accuracy (%) for SCL12. Prediction accuracies (%) using leave-one-out cross-validation (LOOCV) on SCL12L and an independent test on SCL12T are taken from paper [4]. The input data is either sequence only (S) or sequence with accession number (AN).
| Method | Input data | Features | LOOCV SCL12L | Independent test SCL12T |
| ProtLock [8] | S | AAC | 29.7 | 26.4 |
| Least Euclidean distance [9] | S | AAC | 30.1 | 29.5 |
| Ploc [10] | S | AAC and amino acid pairs | 30.5 | 34.3 |
| HSLPred [11] | S | AAC and dipeptide composition | 30.7 | 33.3 |
| ProLoc-GO | S | GO terms using BLAST | 90.0 | 88.1 |
| Hum-PLoc [4] | S with AN | Hybridization of GO terms and Pse-AA | 81.1 | 85.0 |
| ProLoc-GO | S with AN | GO terms (No BLAST) | 91.1 | 90.6 |
Comparison of prediction accuracy (%) for SCL16. Prediction accuracies (%) using LOOCV on SCL16L and an independent test on SCL16T are taken from paper [2]. The input data is either sequence only (S) or sequence with accession number (AN).
| Method | Input data | Features | LOOCV SCL16L | Independent test SCL16T |
| ProtLock [8] | S | AAC | 28.7 | 25.3 |
| Least Euclidean distance [9] | S | AAC | 25.8 | 20.4 |
| Ploc [10] | S | AAC and amino acid pairs | 35.1 | 32.8 |
| HSLPred [11] | S | AAC and dipeptide composition | 33.1 | 34.5 |
| ProLoc-GO | S | GO terms using BLAST | 86.6 | 83.3 |
| Euk-OET-PLoc [2] | S with AN | Hybridization of GO terms and Pse-AA | 81.6 | 83.7 |
| ProLoc-GO | S with AN | GO terms (No BLAST) | 89.0 | 85.7 |
Accuracies and MCC obtained on SCL12
| Label | Compartment | SCL12L (sequence) | SCL12T (sequence) | SCL12T (accession no.) |
| 1 | Centriole | 65.0 (0.803) | 60.0 (0.670) | 60.0 (0.774) |
| 2 | Cytoplasm | 88.4 (0.784) | 82.9 (0.734) | 85.1 (0.790) |
| 3 | Cytoskeleton | 16.7 (0.406) | 0.0 (-0.002) | 50.0 (0.314) |
| 4 | Endoplasmic reticulum | 89.3 (0.804) | 71.4 (0.501) | 85.7 (0.603) |
| 5 | Extracellular | 86.4 (0.871) | 76.4 (0.802) | 79.5 (0.830) |
| 6 | Golgi apparatus | 48.5 (0.630) | 33.3 (0.328) | 44.4 (0.364) |
| 7 | Lysosome | 96.9 (0.952) | 87.5 (0.744) | 87.5 (0.781) |
| 8 | Microsome | 85.7 (0.800) | 100.0 (0.446) | 100.0 (0.446) |
| 9 | Mitochondrion | 99.2 (0.986) | 97.1 (0.978) | 100.0 (0.995) |
| 10 | Nucleus | 95.9 (0.939) | 94.3 (0.930) | 96.9 (0.952) |
| 11 | Peroxisome | 94.4 (0.943) | 100.0 (0.912) | 100.0 (0.912) |
| 12 | Plasma membrane | 96.1 (0.942) | 90.7 (0.893) | 91.6 (0.921) |
| | Overall accuracy % (MCC) | 90.0 (0.822) | 88.1 (0.661) | 90.6 (0.724) |
Accuracies and MCC obtained on SCL16
| Label | Compartment | SCL16L (sequence) | SCL16T (sequence) | SCL16T (accession no.) |
| 1 | Centriole | 61.1 (0.747) | 66.7 (0.729) | 50.0 (0.577) |
| 2 | Cytoplasm | 72.9 (0.706) | 74.6 (0.676) | 72.8 (0.659) |
| 3 | Cytoskeleton | 20.0 (0.363) | 0.0 (-0.002) | 0.0 (-0.003) |
| 4 | Endoplasmic reticulum | 90.1 (0.841) | 77.3 (0.707) | 72.7 (0.678) |
| 5 | Extracellular | 89.6 (0.778) | 84.4 (0.738) | 84.9 (0.796) |
| 6 | Golgi apparatus | 63.2 (0.653) | 41.2 (0.508) | 41.2 (0.486) |
| 7 | Lysosome | 97.3 (0.973) | 100.0 (0.865) | 100.0 (0.865) |
| 8 | Chloroplast | 98.6 (0.989) | 100.0 (0.990) | 100.0 (0.990) |
| 9 | Mitochondrion | 99.5 (0.963) | 91.1 (0.879) | 95.6 (0.877) |
| 10 | Nucleus | 89.0 (0.913) | 86.3 (0.865) | 93.5 (0.924) |
| 11 | Peroxisome | 98.1 (0.971) | 100.0 (0.960) | 100.0 (0.925) |
| 12 | Plasma membrane | 86.4 (0.851) | 88.9 (0.773) | 82.2 (0.798) |
| 13 | Cell wall | 75.0 (0.655) | 60.0 (0.422) | 80.0 (0.631) |
| 14 | Cyanelle | 88.5 (0.939) | 94.7 (0.973) | 100.0 (1.000) |
| 15 | Vacuole | 86.1 (0.847) | 62.5 (0.790) | 50.0 (0.706) |
| 16 | Plastid | 51.6 (0.595) | 42.9 (0.426) | 42.9 (0.311) |
| | Overall accuracy % (MCC) | 86.6 (0.799) | 83.3 (0.706) | 85.7 (0.710) |
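The per-compartment entries in the two tables above pair an accuracy with a Matthews correlation coefficient (MCC). The sketch below shows one common way to compute both, treating each compartment one-vs-rest and taking accuracy as the percentage of that compartment's sequences predicted correctly. The `class_metrics` helper and the toy labels are hypothetical; the source does not spell out its exact formulas, so this is an assumption rather than the authors' procedure.

```python
import math

def class_metrics(y_true, y_pred, label):
    """Return (accuracy %, MCC) for one compartment, treated one-vs-rest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    tn = sum(1 for t, p in pairs if t != label and p != label)
    # Accuracy for the compartment: correctly predicted members / all members.
    accuracy = 100.0 * tp / (tp + fn) if (tp + fn) else 0.0
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)); 0 if undefined.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc

# Hypothetical toy labels (not results from the paper).
y_true = ["nucleus", "nucleus", "nucleus", "cytoplasm", "cytoplasm", "mitochondrion"]
y_pred = ["nucleus", "nucleus", "cytoplasm", "cytoplasm", "cytoplasm", "mitochondrion"]
nucleus_acc, nucleus_mcc = class_metrics(y_true, y_pred, "nucleus")
```

Because MCC also penalizes false positives, a compartment can reach 100.0% accuracy while its MCC stays well below 1, as in the Microsome rows.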
Distribution of the m informative GO terms. Most instructive GO terms (about 80%) are not offspring of any essential GO term: the ratios are 26/32 and 36/45 for SCL12L and SCL16L, respectively.
| | SCL12L (m = 44) | SCL16L (m = 60) |
| Essential GO terms | 12: 1 (B), 11 (C) | 15: 1(B), 14(C) |
| Instructive GO terms: | 32: | 45: |
| (a) offspring of some essential GO term | 4 (C) | 9 (C) |
| (b) between two essential GO terms | 2 (C) | 0 |
| (c) not offspring of any essential GO term | 7(M), 14(B), 5 (C) | 18(M), 13(B), 5(C) |
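The three classes in the table above can be illustrated with a small traversal of the GO graph. The sketch assumes a class (b) term is an offspring of one essential GO term and an ancestor of another; the `ancestors` and `classify` helpers are hypothetical, and the parent links below are a simplified toy chain built around the terms named in the Figure 3 caption, not the real ontology.

```python
def ancestors(term, parents):
    """Collect all proper ancestors of a term in a DAG given a child -> parents map."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen

def classify(term, essential, parents):
    """Return 'a', 'b', or 'c' for an instructive GO term."""
    is_offspring = bool(ancestors(term, parents) & essential)
    is_ancestor = any(term in ancestors(e, parents) for e in essential)
    if is_offspring and is_ancestor:
        return "b"  # lies between two essential GO terms
    if is_offspring:
        return "a"  # offspring of some essential GO term
    return "c"      # not offspring of any essential GO term

# Simplified toy links (assumed for illustration only).
toy_parents = {
    "GO:0005815": ["GO:0005856"],
    "GO:0005813": ["GO:0005815"],
    "GO:0005814": ["GO:0005813"],
}
scl12_essential = {"GO:0005856", "GO:0005814"}
scl16_essential = {"GO:0005856"}  # GO:0005814 is not essential for SCL16L
```

With these toy links, GO:0005813 falls in class (b) for the SCL12L essential set and in class (a) for the SCL16L set, mirroring the shift between data sets described in the Figure 3 caption.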
Figure 2. Some of the selected GO terms that are offspring of essential GO terms. For SCL12L, three GO terms are shown: GO:0031227, GO:0030662 and GO:0017119. For SCL16L, five GO terms are shown: GO:0009514, GO:0005681, GO:0005789, GO:0005759 and GO:0005829.
Figure 3. Some of the selected GO terms that lie between two essential GO terms. For SCL12L, the two instructive GO terms GO:0005815 and GO:0005813 lie between the essential GO terms GO:0005856 and GO:0005814. For SCL16L, GO:0005813 and GO:0000922 are offspring of the essential GO term GO:0005856 and therefore belong to class (a); GO:0005814 is not an essential GO term for SCL16L.
Figure 4. Some of the selected GO terms that are not offspring of any essential GO term. For SCL12L, five instructive GO terms from the cellular component branch are shown: GO:0016021, GO:0005576, GO:0005622, GO:0005578 and GO:0005615. For SCL16L, five GO terms belonging to class (c) are shown: GO:0005622, GO:0005615, GO:0020015, GO:0016020 and GO:0045261. GO:0005905 and GO:0005887 belong to class (a).
Figure 5. Sequence representation and IGA-chromosome encoding method.
Control parameters used for IGA
| Parameter | Value |
| Population size | 50 |
| Selection probability | 0.2 |
| Crossover probability | 0.8 |
| Mutation probability | 0.05 |
| Factor number of orthogonal arrays | 7 |
| Maximum generations | 60 |
Figure 6. Prediction flowchart of ProLoc-GO using both classifiers SVM-IGO and SVM-GO.