Marco Masseroli, Arif Canakoglu, Massimiliano Quigliatti.
Abstract
BACKGROUND: Increasingly large amounts of heterogeneous and valuable controlled biomolecular annotations are available, but they are far from exhaustive and are scattered across many databases. Several annotation integration and prediction approaches have been proposed, but these issues remain unsolved. We previously created a Genomic and Proteomic Knowledge Base (GPKB) that efficiently integrates large amounts of distributed biomolecular annotation and interaction data from several organisms, including 32,956,102 gene annotations, 273,522,470 protein annotations and 277,095 protein-protein interactions (PPIs).
Year: 2015 PMID: 26046679 PMCID: PMC4460591 DOI: 10.1186/1471-2164-16-S6-S5
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Biomolecular entities, PPIs and annotations with biomedical-molecular characteristics integrated in the Genomic and Proteomic Knowledge Base.
| # of Items | # of Organisms | Total Annotations | Gene Annotations | Protein Annotations |
|---|---|---|---|---|
| 563,760 | 12,904 | - | | |
| 16,199,505 | 14,221 | 32,956,102 | | |
| 8,065,827 | 406 | - | | |
| 56,990,212 | 477,175 | 273,522,470 | | |
| 277,095 | 1,073 | - | - | |
| 5,403 | 7 | 220,964 | - | 220,964 |
| 41,285 | 479,950 | 306,032,538 | 32,841,035 | 273,191,503 |
| 29,459 | 28 | 211,526 | 101,523 | 110,003 |
| 7,853 | 1 | 13,430 | 13,430 | - |
| 63 | 1 | 114 | 114 | - |
Figure 1. GPKB feature network and protein annotation types considered for the transitive-relationship-based transfer of annotations. Solid lines: types of available and transferred annotations; dotted lines: types of transferred-only annotations; bold red lines: types of available protein annotations considered for annotation transfer; bold blue lines: types of gene annotations transferred.
Annotations transferred by transitive relationship, with the related feature items and annotations integrated in the GPKB on which the transfer is based.
| Target Entity | Intermediate Entity | Annotation Type | # of Distinct | # of Distinct | # of Distinct Available Annotations | # of Distinct Transferred Annotations | % of Distinct Transferred Annotations |
|---|---|---|---|---|---|---|---|
| Genes: | Proteins: | Pathways: | 12,031,396 | 104,416 | 98,316 | 3,144 | 3.20 % |
| Genes: | Proteins: | Biological | 12,031,396 | 704,382 | 1,044,857 | 21,942 | 2.10 % |
| Genes: | Proteins: | Enzymes: | 12,031,396 | 200,964 | - | 211,305 | ALL |
| Genes: | Proteins: | Transcripts: 8,065,827 | 12,031,396 | 80,680 | 7,644,482 | 6,793 | 0.09 % |
| Genes: | Proteins: | DNA Sequences: | 12,031,396 | 163,396 | 16,107,408 | 7,690 | 0.05 % |
| Proteins: | Genes: | Genetic Disorders: | 12,031,396 | 12,013 | - | 15,344 | ALL |
| PPIs | Genes: | Genetic Disorders: | 50,863 | 12,013 | - | 1,027 | ALL |
Percentages of transferred annotations are relative to the annotations of the same type already available in the GPKB; ALL: all annotations of that type were transferred, since none were available in the GPKB.
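The percentage column can be recomputed from the counts in the table. This short sketch assumes, consistently with the footnote, that the sixth column holds the annotations of that type already available in the GPKB and the seventh the transferred ones; the function name `transferred_percentage` is hypothetical.

```python
# Counts copied from the table above: (annotation type, available, transferred).
# Assumption: "available" is the sixth column (annotations of that type already
# in the GPKB) and "transferred" the seventh; None marks a "-" cell.
ROWS = [
    ("Pathways", 98_316, 3_144),
    ("Biological", 1_044_857, 21_942),
    ("Enzymes", None, 211_305),
    ("Transcripts", 7_644_482, 6_793),
    ("DNA Sequences", 16_107_408, 7_690),
]


def transferred_percentage(available, transferred):
    """Percentage of transferred annotations relative to the available ones,
    or "ALL" when no annotation of that type was available in the GPKB."""
    if not available:
        return "ALL"
    return f"{100 * transferred / available:.2f} %"


for name, available, transferred in ROWS:
    print(f"{name}: {transferred_percentage(available, transferred)}")
```

Rounded to two decimals, these reproduce the 3.20 %, 2.10 %, 0.09 % and 0.05 % values and the ALL entry reported in the table.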
Figure 2. GPKB Web interface: Search page. Through an intuitive Web interface, the user can search and retrieve any of the annotations integrated in the GPKB, whether downloaded from multiple well-known databases or transferred by transitive relationship.
Figure 3. GPKB Web interface: Search result page. The newly transferred annotation of the Insulin-like growth factor 2 (somatomedin A) (IGF2) human gene to the Insulin-like growth factor binding biological function is shown. (Notice the external links on all IDs, the "Show new transitive relationships only" button and the "Download" icon.) Like all annotations in the GPKB that are transferred by the transitive relationship method, it is clearly marked with the value "TRANSITIVE_RELATIONSHIP" in its Inferred attribute. Clicking the icon next to it (see the arrow) opens the "Transitive relationship full provenance" window, which shows all the available known association data on which the transitive relationship transfer was based. In the example shown, these are the ENCODES associations (provided by the Entrez Gene database) between the IGF2 human gene and the proteins with EMBL IDs AAL55889 and AAY40360, both of which are associated by similarity with the protein with UniProt AC P09565, which (according to the GOA database) is annotated to the Insulin-like growth factor binding biological function with NAS (Non-traceable Author Statement) evidence.
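The transfer in Figure 3 follows a chain of known associations: gene, encoded protein, similar protein, annotated function. The following minimal Python sketch walks that chain using the IDs from the example. The dictionaries, the function name `transfer_by_transitive_relationship`, and the rule of considering both the encoded protein and the proteins similar to it are assumptions for illustration, not the GPKB implementation.

```python
# Hypothetical sketch of annotation transfer by transitive relationship,
# using the associations from the Figure 3 example. Not the GPKB code.
ENCODES = {"IGF2": {"AAL55889", "AAY40360"}}                    # gene -> encoded proteins (Entrez Gene)
SIMILAR_TO = {"AAL55889": {"P09565"}, "AAY40360": {"P09565"}}  # protein -> similar proteins
PROTEIN_ANNOTATIONS = {                                          # protein -> {(term, evidence)} (GOA)
    "P09565": {("Insulin-like growth factor binding", "NAS")},
}
GENE_ANNOTATIONS = {"IGF2": set()}                               # gene -> already known terms


def transfer_by_transitive_relationship(encodes, similar_to, protein_ann, gene_ann):
    """Return {gene: {term: [provenance path, ...]}} for terms reachable via
    gene -> encoded protein (-> similar protein) -> annotated term that are not
    already known for the gene. Each path records the full provenance."""
    transferred = {}
    for gene, proteins in encodes.items():
        known = gene_ann.get(gene, set())
        for protein in sorted(proteins):
            # Assumption: consider the encoded protein itself and every protein similar to it.
            paths = [(protein,)] + [(protein, s) for s in sorted(similar_to.get(protein, ()))]
            for path in paths:
                for term, evidence in sorted(protein_ann.get(path[-1], ())):
                    if term in known:
                        continue  # only new annotations are transferred
                    provenance = (gene, *path, f"{term} [{evidence}]")
                    transferred.setdefault(gene, {}).setdefault(term, []).append(provenance)
    return transferred


new_annotations = transfer_by_transitive_relationship(
    ENCODES, SIMILAR_TO, PROTEIN_ANNOTATIONS, GENE_ANNOTATIONS)
for gene, terms in new_annotations.items():
    for term, paths in terms.items():
        print(f"{gene} -> {term} (Inferred: TRANSITIVE_RELATIONSHIP)")
        for path in paths:
            print("    " + " -> ".join(path))
```

The sketch keeps every supporting path rather than only the inferred term. This mirrors the provenance window: here two distinct chains, through AAL55889 and AAY40360, support the same IGF2 annotation.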
Figure 4. Co-functional evaluation of genes with pathway annotations transferred by transitive relationship and with known GO annotations. Upper histograms, MaxLGi and AvgLGi: maximum and average of the levels, in the Gene Ontology (GO) hierarchy, of the lowest common known GO functional annotations shared between each gene with a transferred pathway annotation and the genes known to be involved in that pathway. Lower histograms, MaxLGj and AvgLGj: as MaxLGi and AvgLGi, respectively, but computed between each gene known to be involved in a pathway with a transferred gene annotation and all the other genes known to be involved in that pathway. GO level 0 corresponds to a shared annotation to the ontology root; higher GO levels correspond to more specific shared GO annotations. Level category N represents gene pathway annotations (transferred or known) whose gene has no GO annotation.
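The MaxLG and AvgLG measures can be sketched on a toy GO fragment. The caption does not fix every detail, so the sketch makes these assumptions: annotations are propagated to their ancestor terms, a term's level is its shortest distance from the root, and the lowest common annotation of two genes is their deepest shared term. The GO fragment, the gene names G1 to G5, and the function names are all hypothetical.

```python
from collections import deque

# Hypothetical toy GO fragment: term -> parent terms (the single root has none).
GO_PARENTS = {
    "root": set(),
    "binding": {"root"},
    "catalytic activity": {"root"},
    "protein binding": {"binding"},
    "growth factor binding": {"protein binding"},
}


def levels(parents):
    """Level of each term = shortest distance from the root (an assumption;
    with multiple inheritance other depth definitions are possible)."""
    children = {t: set() for t in parents}
    for term, ps in parents.items():
        for p in ps:
            children[p].add(term)
    roots = [t for t, ps in parents.items() if not ps]
    level = {r: 0 for r in roots}
    queue = deque(roots)
    while queue:  # breadth-first search from the root(s)
        t = queue.popleft()
        for c in children[t]:
            if c not in level:
                level[c] = level[t] + 1
                queue.append(c)
    return level


def with_ancestors(terms, parents):
    """Propagate a set of annotations upward to all ancestor terms."""
    closed, stack = set(), list(terms)
    while stack:
        t = stack.pop()
        if t not in closed:
            closed.add(t)
            stack.extend(parents[t])
    return closed


def cofunctional_levels(gene, pathway_genes, annotations, parents):
    """Return (MaxLG, AvgLG) of `gene` against the other annotated pathway
    genes, or None (category N) when `gene` has no GO annotation."""
    if not annotations.get(gene):
        return None
    lvl = levels(parents)
    mine = with_ancestors(annotations[gene], parents)
    shared_levels = []
    for other in pathway_genes:
        if other == gene or not annotations.get(other):
            continue
        shared = mine & with_ancestors(annotations[other], parents)
        # Deepest shared term; with a single root, the root is always shared.
        shared_levels.append(max(lvl[t] for t in shared))
    if not shared_levels:
        return None  # no other annotated gene to compare with
    return max(shared_levels), sum(shared_levels) / len(shared_levels)


ANNOTATIONS = {  # hypothetical pathway genes and their known GO terms
    "G1": {"growth factor binding"},
    "G2": {"growth factor binding"},
    "G3": {"protein binding"},
    "G4": {"catalytic activity"},
    "G5": set(),
}
PATHWAY = ["G1", "G2", "G3", "G4", "G5"]
print(cofunctional_levels("G1", PATHWAY, ANNOTATIONS, GO_PARENTS))
print(cofunctional_levels("G5", PATHWAY, ANNOTATIONS, GO_PARENTS))
```

In this toy case, G1 shares its most specific term with G2 (level 3), a parent term with G3 (level 2), and only the root with G4 (level 0). The maximum is therefore high, and the lower average reflects the weaker overlap with G4. G5 has no GO annotation, so it falls into category N.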
Figure 5. Co-functional evaluation of genes with pathway annotations transferred by transitive relationship and with known or transferred GO annotations. Same as Figure 4, but also considering the GO functional annotations transferred to the genes by transitive relationship, not only the genes' known GO functional annotations. When these are included, more than half of the genes without known functional annotations turn out to have specific transferred GO functional annotations, i.e., annotations at high GO levels.
Figure 6. Some Gene Ontology annotations available for the . The yellow upper boxes represent five of the most specific Gene Ontology molecular function annotations available for the Insulin-like growth factor 2 (somatomedin A) (IGF2) human gene; the arrow indicates the newly detected annotation.
Figure 7. The seven pairwise interactions between eight human proteins that have been detected as candidates associated with the .