Ruth Isserlin, Rashad A El-Badrawi, Gary D Bader.
Abstract
The Biomolecular Interaction Network Database (BIND) is a major source of curated biomolecular interactions that has been unmaintained for the last few years. This trend will eventually result in the loss of a significant amount of unique biomolecular interaction information, mostly as database identifiers become out of date. To help reverse this trend, we converted BIND to a standard format, Proteomics Standards Initiative-Molecular Interaction (PSI-MI) 2.5, starting from the last curated data release (from 2005), which is available in a custom XML format. We made the core components (interactions and complexes) plus additional valuable curated information available for download (http://download.baderlab.org/BINDTranslation/). Major work during the conversion process was required to update out-of-date molecule identifiers, resulting in a more comprehensive conversion of BIND, by measures including the number of species and interactor types covered, than what is currently accessible elsewhere. This work also highlights issues of data modeling, controlled vocabulary adoption and data cleaning that can serve as a general case study on the future compatibility of interaction databases.
Database URL: http://download.baderlab.org/BINDTranslation/
Year: 2011 PMID: 21233089 PMCID: PMC3021793 DOI: 10.1093/database/baq037
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. Interaction types present in BIND.
Main BIND data types and usage statistics
| Element name | Type | Number of instances |
|---|---|---|
| BIND-interaction | | 206 859 |
| BIND-Interaction_iid | | 206 859 |
| BIND-Interaction_a | | 206 859 |
| | Complex | 2932 |
| | DNA | 3020 |
| | Gene | 10 872 |
| | Not-specified | 15 635 |
| | Protein | 143 800 |
| | RNA | 758 |
| | Small molecule | 29 842 |
| BIND-place | | 4955 |
| BIND-Interaction_b | | 206 859 |
| | Complex | 645 |
| | DNA | 54 027 |
| | Gene | 7337 |
| | Not-specified | 4005 |
| | Photon | 291 |
| | Protein | 131 153 |
| | RNA | 3706 |
| | Small molecule | 5695 |
| BIND-place | | 10 508 |
| BIND-Interaction_descr | | 206 859 |
| BIND-condition | | 299 801 |
| BIND-place | | 739 |
| BIND-action | | 3272 |
| BIND-state (interactor A) | | 2027 |
| BIND-state (interactor B) | | 841 |
| BIND-loc | | 121 064 |
| BIND-descr_intramolecular | | 57 |
| BIND-Interaction_source | | 206 859 |
| BIND-pub-set_disputed | | 37 |
| PubMedId | | 254 191 |
| BIND-Interaction_division | | 206 859 |
| BIND-molecular-complex | | |
| BIND-Molecular-Complex_mcid | | 3703 |
| BIND-Molecular-Complex_descr | | 3703 |
| BIND-Molecular-Complex_sub-units | | 3703 |
| | Complex | 527 |
| | DNA | 64 |
| | Not-specified | 10 |
| | Protein | 20 764 |
| | RNA | 78 |
| | Small molecule | 159 |
| BIND-place | | 8417 |
| BIND-Molecular-Complex_interaction-list | | 3703 |
| BIND-Molecular-Complex_ordered | | 48 = True, 3655 = False |
| BIND-Molecular-Complex_source | | 3703 |
| BIND-pub-set_disputed | | 0 |
| PubMedId | | 4244 |
| BIND-Molecular-Complex_division | | 3699 |
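
The per-type counts listed under BIND-Interaction_a and BIND-Interaction_b each appear to partition the 206 859 interaction records. The short Python sketch below checks this from the numbers in the table; the variable and function names are illustrative only and are not part of BIND or of the translation code.

```python
# Counts copied from the table above; all names here are illustrative.
TOTAL_INTERACTIONS = 206_859

INTERACTOR_A = {
    "Complex": 2932, "DNA": 3020, "Gene": 10_872, "Not-specified": 15_635,
    "Protein": 143_800, "RNA": 758, "Small molecule": 29_842,
}
INTERACTOR_B = {
    "Complex": 645, "DNA": 54_027, "Gene": 7337, "Not-specified": 4005,
    "Photon": 291, "Protein": 131_153, "RNA": 3706, "Small molecule": 5695,
}


def type_shares(counts):
    """Return (type, fraction of total) pairs, most frequent type first."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(name, count / total) for name, count in ranked]


for label, counts in (("A", INTERACTOR_A), ("B", INTERACTOR_B)):
    # If every interaction has exactly one type per interactor slot,
    # the per-type counts must add up to the number of interactions.
    assert sum(counts.values()) == TOTAL_INTERACTIONS, label
    top_name, top_share = type_shares(counts)[0]
    print(f"interactor {label}: most common type is {top_name} ({top_share:.1%})")
```

The same check does not apply to the complex sub-unit counts, since a complex can list several sub-units.
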
Fields present in BIND-descr representing mostly unique information with usage statistics
| BIND-descr | Number of instances | Notes | Reason |
|---|---|---|---|
| BIND-condition | 299 801 | ||
| BIND-condition_action | 1740 | Not translated | No good mapping to PSI-MI |
| BIND-condition_bait-condition | 299 801 | Translated | |
| BIND-condition_descr | 294 187 | Translated | |
| BIND-condition_exp-form-a | 163 673 | Translated | |
| BIND-condition_exp-form-b | 250 924 | Translated | |
| BIND-condition_general | 299 801 | List of possible general experimental conditions: | No good mapping to PSI-MI |
| BIND-condition_genetic-exp | 14 554 | Not translated | Will translate in a future version |
| BIND-condition_negative-result | 26 | Not translated | Will translate in a future version |
| BIND-condition_other-db | 14 | Not translated | Will translate in a future version |
| BIND-condition_site | 63 117 | Not translated | Will translate in a future version |
| BIND-condition_source (individual Pubs) | 292 306 | Translated | |
| BIND-condition_system | 299 801 | Translated | |
| BIND-cons-seq-set | 14 | Not translated | No good mapping to PSI-MI |
| BIND-place | 739 | Translated | |
| BIND-action | 3272 | Not translated | No good mapping to PSI-MI |
| BIND-state (interactor A) | 2027 | Not translated | Partial mapping to PSI-MI |
| BIND-state (interactor B) | 841 | Not translated | Partial mapping to PSI-MI |
| BIND-loc | 121 064 | Not translated | Will translate in a future version |
| BIND-descr_intramolecular | 57 | Translated |
Fields included in the current translation are marked 'Translated'.
Figure 2. Identifier mapping process.
Data cleaning: selected classes of errors, with examples, found in BIND
| Error type | Examples |
|---|---|
| No unified representation for missing information of type character/String | Missing information may be represented as ‘Unknown’, ‘NULL’, ‘unknown’, ‘WP:NULL’, ‘unknown.’, ‘–’, etc. (in addition to the enclosing XML element being omitted altogether at times) |
| No unified representation for missing information of type integer | Missing information may be represented as ‘0’, ‘–1’, etc. |
| Erroneous representation for references to external databases (x-ref) for some interactors | |
| Erroneous internal cross-reference: complexes referencing non-existent (negative) BIND interaction IDs | |
| Erroneous external cross-reference: negative PubMed identifier | PubMed ID ‘–2’ repeated 68 times in the S. cerevisiae file |
| Inconsistent pattern for representing the IDs of some interactor x-refs | SGD identifiers ‘SGD: S000003663’ and ‘S000003663’; MGD identifiers ‘MGI:1890695’ and ‘1890695’ are all used. |
| Wrong x-ref type: listing some IDs as RefSeq identifiers while in fact they are GIs | GI IDs: ‘15643805’ and ‘15644490’ listed as RefSeq IDs. |
| Outdated external cross-references | There are 13 070 interactor GIs used in BIND that are not currently in use in Entrez. |
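
Several of the error classes in the table above lend themselves to simple normalization rules. The following Python sketch illustrates the idea under stated assumptions: the function names, the set of placeholder strings and the choice of the bare accession as the canonical identifier form are all hypothetical and are not taken from the BIND translation code.

```python
import re
from typing import Optional

# Placeholder strings reported in the table above, lower-cased.
# This set is an assumption for illustration and is not exhaustive.
MISSING_STRINGS = {"unknown", "unknown.", "null", "wp:null", "\u2013", "-", ""}


def clean_text(value: Optional[str]) -> Optional[str]:
    """Map the various 'missing' spellings to None; keep real values."""
    if value is None:  # the enclosing XML element was absent
        return None
    stripped = value.strip()
    return None if stripped.lower() in MISSING_STRINGS else stripped


def clean_positive_id(value: int) -> Optional[int]:
    """Treat 0 and negative numbers (e.g. PubMed ID -2) as missing."""
    return value if value > 0 else None


def normalize_xref(db_prefix: str, raw_id: str) -> str:
    """Strip an optional 'PREFIX:' so both spellings compare equal.

    Using the bare accession as the canonical form is an assumption.
    """
    pattern = re.compile(rf"^{re.escape(db_prefix)}\s*:\s*", re.IGNORECASE)
    return pattern.sub("", raw_id.strip())


if __name__ == "__main__":
    print(clean_text("WP:NULL"))                     # missing value
    print(clean_positive_id(-2))                     # invalid PubMed ID
    print(normalize_xref("SGD", "SGD: S000003663"))  # prefixed form
    print(normalize_xref("SGD", "S000003663"))       # bare form
```

Outdated GIs and GIs mislabelled as RefSeq identifiers cannot be repaired by string rules alone; they require a lookup against the source database.
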
Proposed extensions to the PSI-MI ontology to allow CV mappings for currently unmatched BIND terms
| BIND term missing in PSI-MI CVs | Definition | Type of CV term | Web link |
|---|---|---|---|
| Beilstein | Compound database | Database citation (MI:0444) → participant database (MI:0473) | |
| EINECS | Compound database | Database citation (MI:0444) → participant database (MI:0473) | |
| Merck | Compound database | Database citation (MI:0444) → participant database (MI:0473) | |
| dictyBase | Organism Database | Database citation (MI:0444) → participant database (MI:0473) → sequence database (MI:0683) | |
| HGNC | Organism Database | Database citation (MI:0444) → participant database (MI:0473) → sequence database (MI:0683) | |
| PlantGDB | Organism Database | Database citation (MI:0444) → participant database (MI:0473) → sequence database (MI:0683) | |
| RatMap | Organism Database | Database citation (MI:0444) → participant database (MI:0473) → sequence database (MI:0683) | |
| TAIR | Organism Database | Database citation (MI:0444) → participant database (MI:0473) → sequence database (MI:0683) | |
| TIGR | Organism Database | Database citation (MI:0444) → participant database (MI:0473) → sequence database (MI:0683) | |
| ZFIN | Organism Database | Database citation (MI:0444) → participant database (MI:0473) → sequence database (MI:0683) | |
| COG | Protein Family Database | Database citation (MI:0444) → participant database (MI:0473) | |
| Photon | Interactor type | ||
| Equilibrium dialysis | Method to detect interaction between ligand and receptor under equilibrium conditions. | Interaction detection method (MI:0001) → experimental interaction detection (MI:0045) | |
| Membrane filtration | Method of filtration to separate molecules from a liquid | Participant detection method (MI:0002) → experimental participant identification (MI:0661) | |
| Monoclonal antibody-blockade | Method to block a binding site on a molecule, such as a protein, using a monoclonal antibody to test that the binding site is involved in an interaction with another molecule. | This term is probably too general to properly classify as an interaction detection method | |
| Nuclear translocation-assay | Method to detect interaction by inducing nuclear localization of one participant, which would then pull an interacting participant along with it into the nucleus. As both participants are labeled, the difference in nuclear localization between the induced and non-induced states provides an indication of the interaction between the two proteins. PMID: 20615205 | Interaction detection method (MI:0001) → experimental interaction detection (MI:0045) | |
| Transient-coexpression | Method consists of the expression of two proteins in a cell followed by interaction detection using a specific method. | This term is probably too general to properly classify as an interaction detection method | |
| Reconstitution | Method to reconstitute participants of a protein interaction in vitro to test if they bind. Depends on an interaction detection method | This term is probably too general to properly classify as an interaction detection method | |
| ASAP | No valid link available—Term Ignored | ||
| BMDL | No valid link available—Term Ignored | ||
| Locus tag | No valid link available—Term Ignored | ||
| MFCD | No valid link available—Term Ignored | ||
| MDL, MDL # | No valid link available—Term Ignored | ||
| aMAZE | Pathway database | Does not exist anymore |
Figure 3. Union of BIND and IntAct interactions for species Rattus norvegicus (taxid: 10116) as extracted using the PSICQUIC plugin for Cytoscape. Blue edges are interactions from IntAct, red edges are interactions from BIND. Blue nodes are interactors in IntAct only, red nodes are interactors in BIND only and green nodes are interactors shared by the two networks. BIND contains 1103 nodes not in IntAct. IntAct contains 984 nodes not in BIND. The two interaction networks share 217 nodes.
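
The node colouring in Figure 3 amounts to three set operations on the interactor identifiers of the two networks. The Python sketch below shows this on toy data and then derives the network sizes implied by the caption. The identifiers and the function name are hypothetical, and it assumes both networks use the same identifier namespace.

```python
def compare_networks(bind_nodes, intact_nodes):
    """Split interactors into the three colour groups used in the figure."""
    return {
        "bind_only": bind_nodes - intact_nodes,    # red nodes
        "intact_only": intact_nodes - bind_nodes,  # blue nodes
        "shared": bind_nodes & intact_nodes,       # green nodes
    }


# Toy identifiers, for illustration only.
bind = {"P1", "P2", "P3"}
intact = {"P3", "P4"}
groups = compare_networks(bind, intact)
print({name: sorted(members) for name, members in groups.items()})

# Node counts stated in the caption.
BIND_ONLY, INTACT_ONLY, SHARED = 1103, 984, 217
bind_total = BIND_ONLY + SHARED                  # nodes in the BIND network
intact_total = INTACT_ONLY + SHARED              # nodes in the IntAct network
union_total = BIND_ONLY + INTACT_ONLY + SHARED   # nodes in the merged network
print(bind_total, intact_total, union_total)
print(f"shared fraction of union: {SHARED / union_total:.1%}")
```
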