Byungkyu Park, Kyungsook Han.
Abstract
BACKGROUND: Keyword matching and ID matching are the most common search methods for large databases of protein-protein interactions. Both are purely syntactic: they retrieve the records in the database that contain a keyword or ID specified in a query. Such syntactic search methods often retrieve too few results, or none at all, even when many potential matches are present in the database.
Year: 2010 PMID: 20122195 PMCID: PMC3009494 DOI: 10.1186/1471-2105-11-S1-S23
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Gödel number representation. Original Gödel numbers cannot represent the Directed Acyclic Graph (DAG) structure of the Gene Ontology.
| Term | Natural number | Relation | Gödel number |
|---|---|---|---|
| is a | 1 | | |
| part of | 2 | R1: Term2 is a Term1 | $2^4 \times 3^1 \times 5^3$ = 6,000 |
| Term1 | 3 | R2: Term3 is part of Term2 | $2^5 \times 3^2 \times 5^4$ = 180,000 |
| Term2 | 4 | R3: Term4 is a Term2 | $2^6 \times 3^1 \times 5^4$ = 120,000 |
| Term3 | 5 | | |
| Term4 | 6 | | |
The following example shows why the original Gödel numbers fail to represent the GO structure. Suppose that terms are represented by unique natural numbers and the relations between them by Gödel numbers. In this example, Term4 is a kind of Term2 by relation R3 and, because Term2 is a kind of Term1 by relation R1, Term4 is also a kind of Term1. This transitive relation cannot be inferred from the representation, because the Gödel number of R3 encodes only Term4, 'is a', and Term2. The original Gödel numbers are therefore not sufficient to represent the DAG structure of GO.
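The values in the table are consistent with the classical Gödel encoding of each relation as a triple (child term, relation, parent term), in which the k-th prime is raised to the code of the k-th symbol. The Python sketch below reproduces the three numbers under that assumed symbol order; the function names are illustrative and not taken from the paper.

```python
def first_primes(n):
    """Return the first n primes by trial division (fine for tiny n)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def godel_number(symbols):
    """Encode a sequence of positive integers as 2^s1 * 3^s2 * 5^s3 * ..."""
    result = 1
    for prime, code in zip(first_primes(len(symbols)), symbols):
        result *= prime ** code
    return result


def decode(number, length=3):
    """Recover the exponents of the first `length` primes."""
    exponents = []
    for prime in first_primes(length):
        count = 0
        while number % prime == 0:
            number //= prime
            count += 1
        exponents.append(count)
    return exponents


# Symbol codes from the table.
CODE = {"is a": 1, "part of": 2, "Term1": 3, "Term2": 4, "Term3": 5, "Term4": 6}

# Assumed symbol order: (child term, relation, parent term).
r1 = godel_number([CODE["Term2"], CODE["is a"], CODE["Term1"]])
r2 = godel_number([CODE["Term3"], CODE["part of"], CODE["Term2"]])
r3 = godel_number([CODE["Term4"], CODE["is a"], CODE["Term2"]])

print(r1, r2, r3)   # 6000 180000 120000
print(decode(r3))   # [6, 1, 4]
```

Decoding R3 gives back only the codes 6, 1, and 4 (Term4, 'is a', Term2). Nothing in the number links Term4 to Term1, which is the limitation described above.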
Modified Gödel number representation.
| Term | Prime number | Relation | Modified Gödel Number |
|---|---|---|---|
| Term1 | 2 | ||
| Term2 | 3 | R1: Term2 is a Term1 | relation(is a) = 3 × 2 = 6 |
| Term3 | 5 | R2: Term3 is part of Term2 | relation(part of) = 5 × 3 × 2 = 30 |
| Term4 | 7 | R3: Term4 is a Term2 | relation(is a) = 7 × 3 × 2 = 42 |
Each term is assigned a prime number instead of an arbitrary natural number, and each relation is denoted by a modified Gödel number, which is the product of the primes representing the term and its ancestors in the ontology hierarchy. For example, relation R3 is denoted by 42, which has the prime factors 2 (Term1, the root node of the hierarchy), 3 (Term2), and 7 (Term4). Using this representation, relation R3 can be easily inferred by the unique factorization of 42 into primes (Term1, Term2, and Term4).
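A minimal Python sketch of this scheme, using the primes from the table. The `PARENTS` map, the function names, and the choice to multiply each ancestor's prime exactly once (which matters when a term has several paths to the root) are assumptions made for illustration, not details confirmed by the source.

```python
# Primes assigned to terms, as in the table.
PRIME = {"Term1": 2, "Term2": 3, "Term3": 5, "Term4": 7}

# Direct parents ('is a' and 'part of' links are treated alike here).
PARENTS = {
    "Term1": [],
    "Term2": ["Term1"],
    "Term3": ["Term2"],
    "Term4": ["Term2"],
}


def term_and_ancestors(term):
    """Collect a term and all of its ancestors in the DAG."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def modified_godel(term):
    """Product of the primes of the term and its ancestors."""
    number = 1
    for t in term_and_ancestors(term):
        number *= PRIME[t]
    return number


def terms_in(number):
    """Recover the encoded terms by testing divisibility by each prime."""
    return sorted(t for t, p in PRIME.items() if number % p == 0)


def is_ancestor(ancestor, term):
    """True if `ancestor` lies above `term` in the hierarchy."""
    return ancestor != term and modified_godel(term) % PRIME[ancestor] == 0


print(modified_godel("Term4"))          # 42
print(terms_in(42))                     # ['Term1', 'Term2', 'Term4']
print(is_ancestor("Term1", "Term4"))    # True
```

The ancestor test is a single modulo operation on the stored number, so no traversal of the hierarchy is needed at query time.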
Reasoning about protein-protein interactions.
| Term | Prime number | Relation | Modified Gödel Number |
|---|---|---|---|
| domainA | 11 | ||
| domainB | 13 | R4: domainA interacts with domainB | relation(interacts) = 11 × 13 = 143 |
| ProteinA | 17 | R5: ProteinA has domainA | relation(has a) = 17 × 11 = 187 |
| ProteinB | 19 | R6: ProteinB has domainB | relation(has a) = 19 × 13 = 247 |
Using our representation, it is possible to infer protein-protein interactions from domain-domain interactions. Suppose that domainA interacts with domainB, that ProteinA has domainA, and that ProteinB has domainB. Simple arithmetic operations such as integer division and modulo are sufficient to infer that ProteinA interacts with ProteinB.
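The exact inference procedure is not spelled out here, so the following Python sketch shows one possible reading that uses only integer division and modulo on the numbers in the table. The function names and the list-based storage of relations are hypothetical.

```python
# Relation numbers from the table.
INTERACTS = [143]      # R4: domainA (11) x domainB (13)
HAS_A = [187, 247]     # R5: 17 x 11, R6: 19 x 13


def domains_of(protein_prime, has_a=HAS_A):
    """Domains of a protein: divide out the protein's prime where it divides."""
    return [n // protein_prime for n in has_a if n % protein_prime == 0]


def domains_interact(d1, d2, interacts=INTERACTS):
    """True if some 'interacts' number equals d1 * d2."""
    return any(n % d1 == 0 and n // d1 == d2 for n in interacts)


def proteins_interact(p, q):
    """Infer a protein-protein interaction from any interacting domain pair."""
    return any(
        domains_interact(d1, d2)
        for d1 in domains_of(p)
        for d2 in domains_of(q)
    )


protein_a, protein_b = 17, 19
print(domains_of(protein_a))                    # [11]
print(domains_of(protein_b))                    # [13]
print(proteins_interact(protein_a, protein_b))  # True
```

Here 187 mod 17 = 0 gives domainA = 187 / 17 = 11, and 247 mod 19 = 0 gives domainB = 247 / 19 = 13. Since 143 mod 11 = 0 and 143 / 11 = 13, the two domains interact, and so do the two proteins.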
Figure 1. User interface of the ontology-based search engine. An example of using the autocomplete functionality for GO terms.
Comparison of search results by two search methods.
| GO term ID | GO term name | ID-matching search | Ontology-based search |
|---|---|---|---|
| Biological process | | | |
| GO:0008150 | biological process | 5 (0.01%) | 36,523 (95.63%) |
| GO:0008152 | metabolic process | 2,862 (7.49%) | 19,828 (51.92%) |
| GO:0044238 | primary metabolic process | 0 (0.00%) | 17,434 (45.65%) |
| GO:0043170 | macromolecule metabolic process | 0 (0.00%) | 7,211 (18.88%) |
| GO:0019538 | protein metabolic process | 5,324 (13.94%) | 5,659 (14.82%) |
| Molecular function | | | |
| GO:0003676 | nucleic acid binding | 10 (0.03%) | 8,733 (22.87%) |
| GO:0003677 | DNA binding | 1,944 (5.09%) | 6,935 (18.16%) |
| GO:0003700 | transcription factor activity | 5,164 (13.52%) | 5,164 (13.52%) |
| Cellular component | | | |
| GO:0005622 | intracellular | 0 (0.00%) | 31,694 (82.99%) |
| GO:0005737 | cytoplasm | 17,312 (45.33%) | 20,990 (54.96%) |
| GO:0005829 | cytosol | 452 (1.18%) | 452 (1.18%) |
The number of protein-protein interactions found in HPRD release 7 by each search method. The numbers in parentheses give the percentage of the 38,190 interactions in HPRD. The ID-matching search often finds more interactions with a specialized GO term than with a less specialized term, since it does not consider the semantic relations between ontology terms.
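The difference between the two search methods can be sketched in Python on a toy data set. The records below are hypothetical and are not HPRD data; the primes and modified Gödel numbers are the ones from the Term1 to Term4 example, and the function names are illustrative.

```python
# Primes and modified Gödel numbers from the Term1..Term4 example.
PRIME = {"Term1": 2, "Term2": 3, "Term3": 5, "Term4": 7}
GODEL = {"Term1": 2, "Term2": 6, "Term3": 30, "Term4": 42}

# Hypothetical records: (interaction id, term annotated on the query protein).
RECORDS = [
    ("ix1", "Term4"),
    ("ix2", "Term3"),
    ("ix3", "Term2"),
    ("ix4", "Term4"),
]


def id_matching_search(query):
    """Syntactic search: keep records annotated with exactly the query term."""
    return [ix for ix, term in RECORDS if term == query]


def ontology_based_search(query):
    """Semantic search: keep records whose annotation is the query term
    or one of its descendants, tested with a single modulo operation."""
    return [ix for ix, term in RECORDS if GODEL[term] % PRIME[query] == 0]


print(id_matching_search("Term1"))      # []
print(ontology_based_search("Term1"))   # ['ix1', 'ix2', 'ix3', 'ix4']
print(id_matching_search("Term4"))      # ['ix1', 'ix4']
print(ontology_based_search("Term4"))   # ['ix1', 'ix4']
```

As in the table, a general term returns few or no records under ID matching but many under the ontology-based search, while the two methods agree for a term that has no descendants.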
Figure 2. Example of the gene ontology hierarchy. A partial view of the three ontologies of the Gene Ontology (GO). The GO terms 'biological process', 'molecular function', and 'cellular component' are the root nodes of the three GO hierarchies. Several intermediate terms between the nodes are not shown for clarity.
Figure 3. Interaction network of human proteins found with 'Nucleotide binding' and 'ATP binding'. Networks 1-11 represent the 70 protein-protein interactions found by the ontology-based search with the query 'Nucleotide binding' in the HPRD data. Networks 7-11 represent the 31 interactions found by the ontology-based search with the query 'ATP binding', which is a more specific term than 'Nucleotide binding'. The ID-matching search found only 5 interactions (networks 5-6) with 'Nucleotide binding' and missed all the others, whereas its results with 'ATP binding' are the same as those of the ontology-based search (networks 7-11). Yellow nodes represent the human proteins explicitly annotated with 'ATP binding', pink nodes represent the human proteins explicitly annotated with 'Nucleotide binding', and white nodes represent the human proteins with no explicit annotation of either 'ATP binding' or 'Nucleotide binding'. The GO term IDs of the proteins found by the search methods are listed in Table 5.
Ontology-based search with 'Nucleotide binding' GO term.
| Query protein HPRD_ID | Query protein function | Partner protein HPRD_ID | Partner protein function |
|---|---|---|---|
| HPRD_02944 | ATP binding | HPRD_02431 | Acyltransferase activity |
| HPRD_01368 | ATP binding | HPRD_02147 | ATP binding |
| HPRD_01368 | ATP binding | HPRD_02300 | ATPase activity |
| HPRD_02147 | ATP binding | HPRD_12171 | Catalytic activity |
| HPRD_01368 | ATP binding | HPRD_02110 | Extracellular matrix structural constituent |
| HPRD_02944 | ATP binding | HPRD_02682 | GTPase activity |
| HPRD_09468 | ATP binding | HPRD_08986 | Phospholipase activity |
| HPRD_01368 | ATP binding | HPRD_02610 | Protein binding |
| HPRD_02944 | ATP binding | HPRD_03913 | Protein binding |
| HPRD_09468 | ATP binding | HPRD_01496 | Protein serine/threonine kinase activity |
| HPRD_09468 | ATP binding | HPRD_02619 | Protein serine/threonine kinase activity |
| HPRD_09468 | ATP binding | HPRD_03479 | Protein serine/threonine kinase activity |
| HPRD_05802 | ATP binding | HPRD_04066 | Protein serine/threonine kinase activity |
| HPRD_09468 | ATP binding | HPRD_05428 | Protein serine/threonine kinase activity |
| HPRD_05802 | ATP binding | HPRD_02963 | Receptor activity |
| HPRD_09468 | ATP binding | HPRD_01158 | Structural constituent of myelin sheath |
| HPRD_02147 | ATP binding | HPRD_01235 | Transcription regulator activity |
| HPRD_09468 | ATP binding | HPRD_00591 | Translation regulator activity |
| HPRD_09468 | ATP binding | HPRD_06774 | Translation regulator activity |
| HPRD_09468 | ATP binding | HPRD_06802 | Translation regulator activity |
| HPRD_09468 | ATP binding | HPRD_09084 | Translation regulator activity |
| HPRD_01368 | ATP binding | HPRD_03051 | Transporter activity |
| HPRD_16742 | FAD binding | HPRD_11762 | DNA binding |
| HPRD_16742 | FAD binding | HPRD_02171 | Hydrolase activity |
| HPRD_04100 | GTP binding | HPRD_07135 | Acyltransferase activity |
| HPRD_04100 | GTP binding | HPRD_01721 | Auxiliary transport protein activity |
| HPRD_04100 | GTP binding | HPRD_01722 | Auxiliary transport protein activity |
| HPRD_04100 | GTP binding | HPRD_04738 | GTP binding |
| HPRD_04738 | GTP binding | HPRD_06716 | GTP binding |
| HPRD_11978 | GTP binding | HPRD_00743 | GTPase activity |
| HPRD_11978 | GTP binding | HPRD_00766 | GTPase activity |
| HPRD_04100 | GTP binding | HPRD_03297 | GTPase activity |
| HPRD_10360 | GTP binding | HPRD_03297 | GTPase activity |
| HPRD_04100 | GTP binding | HPRD_12228 | GTPase activity |
| HPRD_04738 | GTP binding | HPRD_12228 | GTPase activity |
| HPRD_10360 | GTP binding | HPRD_06692 | GTPase activity |
| HPRD_10360 | GTP binding | HPRD_08555 | GTPase activity |
| HPRD_11978 | GTP binding | HPRD_09191 | GTPase activity |
| HPRD_11978 | GTP binding | HPRD_09973 | GTPase activity |
| HPRD_11978 | GTP binding | HPRD_11820 | GTPase activity |
| HPRD_10360 | GTP binding | HPRD_04398 | Protein binding |
| HPRD_11978 | GTP binding | HPRD_01265 | Protein serine/threonine kinase activity |
| HPRD_06419 | GTP binding | HPRD_03384 | Protein serine/threonine phosphatase activity |
| HPRD_04100 | GTP binding | HPRD_06288 | RNA binding |
| HPRD_04738 | GTP binding | HPRD_01853 | Structural constituent of cytoskeleton |
| HPRD_04100 | GTP binding | HPRD_01451 | Structural molecule activity |
| HPRD_06419 | GTP binding | HPRD_01859 | Transcription factor activity |
| HPRD_06419 | GTP binding | HPRD_16515 | Transcription regulator activity |
| HPRD_04100 | GTP binding | HPRD_03967 | Ubiquitin-specific protease activity |
| HPRD_04738 | GTP binding | HPRD_03967 | Ubiquitin-specific protease activity |
| HPRD_09704 | Nucleotide binding | HPRD_03356 | Signal transducer activity |
| HPRD_09704 | Nucleotide binding | HPRD_16544 | Transcription factor activity |
| HPRD_09704 | Nucleotide binding | HPRD_03221 | Transcription regulator activity |
| HPRD_13115 | Nucleotide binding | HPRD_03015 | Unknown |
| HPRD_09704 | Nucleotide binding | HPRD_04484 | Unknown |
In HPRD, the ontology-based search found 70 interactions involving a protein annotated with 'Nucleotide binding'. Only 5 of these interactions have a protein with an explicit annotation of 'Nucleotide binding'. The remaining 65 interactions were inferred by finding a protein annotated with a more specialized term such as 'ATP binding', 'FAD binding', or 'GTP binding'. Due to space limits, 15 interactions (7 self-interactions and 8 interactions involving a protein with unknown function) are not shown here.
Example of searching protein-protein interactions by specifying multiple GO terms on the query protein.
| Query protein GO term 1 | Query protein GO term 2 | ID-matching search | Ontology-based search |
|---|---|---|---|
| Biological process | Cellular component | | |
| GO:0019538 | GO:0005737 | 1,994 (5.22%) | 3,062 (8.02%) |
| GO:0019538 | GO:0005576 | 753 (1.97%) | 769 (2.01%) |
| Molecular function | Cellular component | | |
| GO:0003700 | GO:0005737 | 576 (1.51%) | 592 (1.55%) |
| GO:0003700 | GO:0005576 | 6 (0.02%) | 103 (0.27%) |
Search results when two GO terms are specified on the query protein: one for its biological process or molecular function and another for its cellular component.
Figure 4. Interaction network of HCV-human proteins. The network contains HCV proteins (core, E1, E2, NS2, NS3, NS4A, NS4B, NS5A, NS5B, F and p7) and the human proteins interacting with them. The interaction data were obtained from the literature [19] and the network was visualized with Cytoscape [18]. The GO annotations for the HCV proteins and human proteins in the network are available at [21].