Jakub Galgonek, Jiří Vondrášek.
Abstract
BACKGROUND: There are many databases of small molecules focused on different aspects of research and its applications. Some tasks may require integration of information from various databases. However, determining which entries from different databases represent the same compound is not straightforward. Integration can be based, for example, on automatically generated cross-reference links between entries. Another approach is to use the manually curated links stored directly in databases. This study employs well-established InChI identifiers to measure the consistency and completeness of the manually curated links by comparing them with the automatically generated ones.
Year: 2014 PMID: 24742140 PMCID: PMC4005828 DOI: 10.1186/1758-2946-6-15
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
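The comparison described in the abstract can be illustrated with a minimal Python sketch. It assumes that an automatically generated link joins two entries whose InChI strings are identical, that "inconsistency" is the share of curated links not confirmed this way, and that "completeness" is the share of generated links that also appear among the curated ones. These definitions, the function names, and the entry IDs are illustrative assumptions, not the authors' implementation.

```python
from typing import Dict, Set, Tuple

Link = Tuple[str, str]  # (source entry ID, target entry ID)


def generate_links(source: Dict[str, str], target: Dict[str, str]) -> Set[Link]:
    """Link each source entry to every target entry with an identical InChI."""
    by_inchi: Dict[str, Set[str]] = {}
    for target_id, inchi in target.items():
        by_inchi.setdefault(inchi, set()).add(target_id)
    return {
        (source_id, target_id)
        for source_id, inchi in source.items()
        for target_id in by_inchi.get(inchi, ())
    }


def compare_links(curated: Set[Link], generated: Set[Link]) -> Dict[str, float]:
    """Assumed metrics: share of curated links without InChI support, and
    share of InChI-based links that are also present as curated links."""
    unsupported = curated - generated
    covered = generated & curated
    return {
        "inconsistency": len(unsupported) / len(curated) if curated else 0.0,
        "completeness": len(covered) / len(generated) if generated else 0.0,
    }


# Toy data: hypothetical entry IDs with the standard InChIs of
# ethanol, methanol, and water.
ETHANOL = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
METHANOL = "InChI=1S/CH4O/c1-2/h2H,1H3"
WATER = "InChI=1S/H2O/h1H2"

db_a = {"A1": ETHANOL, "A2": METHANOL, "A3": WATER}
db_b = {"B1": ETHANOL, "B2": METHANOL, "B3": WATER}

# One correct curated link and one that points at the wrong compound.
curated = {("A1", "B1"), ("A2", "B3")}
generated = generate_links(db_a, db_b)
result = compare_links(curated, generated)
print(sorted(generated))
print(result)
```

In this toy case one of the two curated links is unsupported by the identifiers, and only one of the three InChI-based links has a curated counterpart. Real databases additionally need the identifiers to be generated consistently, which is what the tables on tool inconsistencies and collisions address.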
Figure 1. An example of an InChI identifier.
Figure 2. An example of a multicomponent InChI identifier.
Figure 3. Influence of an undefined chiral center conformation on the inverted tetrahedral sub-layer.
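Since the figures themselves are not reproduced here, the layered structure they refer to can be sketched in Python. An InChI string consists of a version, a chemical formula, and a sequence of layers separated by `/`, each introduced by a one-letter prefix (for example `c` for connections, `h` for hydrogens, `t`, `m`, and `s` for tetrahedral stereochemistry). In a multicomponent identifier the components are separated by `.` in the formula and by `;` within the other layers. The helper name `split_layers` is hypothetical, and the sketch ignores details such as multiplier notation for repeated components.

```python
from typing import List, Tuple


def split_layers(inchi: str) -> List[Tuple[str, str]]:
    """Split an InChI string into (layer name, content) pairs, in order.

    Simplified sketch: any part starting with a lowercase letter is treated
    as a prefixed layer; any other part is treated as the formula.
    """
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    version, *rest = inchi[len("InChI="):].split("/")
    layers = [("version", version)]
    for part in rest:
        if not part:
            continue  # skip empty segments
        if part[0].islower():
            layers.append((part[0], part[1:]))
        else:
            layers.append(("formula", part))
    return layers


# Standard InChI of L-alanine, including the /t, /m and /s stereo sub-layers.
ALANINE = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
# Standard InChI of sodium chloride, a two-component identifier.
SODIUM_CHLORIDE = "InChI=1S/ClH.Na/h1H;/q;+1/p-1"

for name, content in split_layers(ALANINE):
    print(f"{name:8s} {content}")

components = dict(split_layers(SODIUM_CHLORIDE))["formula"].split(".")
print(components)
```

The `m` layer printed for L-alanine is the inverted tetrahedral sub-layer that the Figure 3 caption mentions.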
Number of entries in various databases
| | |||||||
|---|---|---|---|---|---|---|---|
| ChEBI | 50500 | 29852 | 26502 | 26487 | 26494 | 678 | 17 |
| DrugBank | 6714 | 6516 | 6509 | 6403 | 6441 | 327 | 2 |
| PDBeChem | 15446 | 15445 | 15439 | 15439 | 15439 | 120 | 2 |
| HMDB | 40278 | 40233 | 40220 | 40219 | 40220 | 177 | 14 |
| NPC | 14814 | 8027 | 8013 | 8013 | 8001 | 95 | 1 |
Inconsistencies between the inchi tool and the molconvert tool
| | | ||||||
|---|---|---|---|---|---|---|---|
| ChEBI | 537 | 9 | 68 | 3 | 19 | 0 | 636 |
| DrugBank | 30 | 0 | 8 | 0 | 0 | 5 | 43 |
| PDBeChem | 0 | 0 | 27 | 68 | 0 | 0 | 95 |
| HMDB | 134 | 1 | 99 | 6 | 8 | 0 | 248 |
| NPC | 395 | 0 | 16 | 12 | 9 | 0 | 431 |
Figure 4. Inconsistencies in the interpretations of chiral center conformations.
Inconsistencies between database InChI identifiers and identifiers generated by the tools
| | ||||||
|---|---|---|---|---|---|---|
| ChEBI | 154 | 24 | 0 | 154 | 31 | 636 |
| DrugBank | 47 | 2 | 45 | 7 | 0 | 2 |
| PDBeChem | 4 | 539 | 54 | 4 | 539 | 146 |
| HMDB | 39 | 0 | 243 | 38 | 0 | 5 |
Numbers of collisions
| | |||
|---|---|---|---|
| ChEBI | 459 | 62 | 32 |
| DrugBank | 145 | 143 | 142 |
| PDBeChem | 74 | 48 | 34 |
| HMDB | 39 | 39 | 39 |
| NPC | 42 | 42 | 42 |
Numbers of cross-reference links
| | |||
|---|---|---|---|
| HMDB → PDBeChem | 1247 | 1192 | 1188 |
| HMDB → DrugBank | 1601 | 1589 | 1546 |
| HMDB → ChEBI | 4795 | 3848 | 3727 |
| ChEBI → PDBeChem | 2112 | 2020 | 1950 |
| ChEBI → DrugBank | 2462 | 2459 | 2336 |
| ChEBI → HMDB | 729 | 725 | 721 |
| DrugBank → PDBeChem | 5201 | 5153 | 5029 |
| DrugBank → ChEBI | 1826 | 1808 | 1724 |
| NPC → DrugBank | 1340 | 1340 | 1333 |
Inconsistencies in cross-reference links
| | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| HMDB → PDBeChem | 1.26% | 0.93% | 0.93% | 0.93% | 0.84% | 0.59% | 1.26% | 0.93% | 0.93% | 0.93% | 0.84% | 0.59% |
| HMDB → DrugBank | 7.12% | 5.11% | 4.46% | 4.46% | 1.88% | 1.55% | 7.05% | 5.05% | 4.40% | 4.40% | 1.81% | 1.49% |
| HMDB → ChEBI | 13.84% | 7.06% | 6.79% | 6.79% | 4.51% | 3.33% | 13.71% | 6.92% | 6.63% | 6.63% | 4.29% | 3.09% |
| ChEBI → PDBeChem | 9.64% | 8.21% | 6.51% | 6.51% | 4.62% | 1.69% | 9.64% | 8.21% | 6.46% | 6.46% | 4.56% | 1.54% |
| ChEBI → DrugBank | 33.18% | 28.85% | 27.95% | 27.91% | 18.96% | 17.85% | 22.99% | 17.77% | 13.74% | 13.70% | 4.54% | 3.25% |
| ChEBI → HMDB | 25.80% | 10.40% | 6.38% | 6.38% | 3.05% | 2.91% | 25.10% | 9.71% | 5.55% | 5.55% | 2.22% | 2.08% |
| DrugBank → PDBeChem | 27.04% | 25.49% | 25.17% | 25.17% | 4.04% | 2.11% | 27.02% | 25.47% | 25.13% | 25.13% | 4.00% | 2.01% |
| DrugBank → ChEBI | 18.33% | 15.08% | 13.98% | 13.98% | 3.13% | 1.91% | 18.10% | 14.85% | 13.69% | 13.69% | 2.84% | 1.57% |
| NPC → DrugBank | 23.26% | 13.13% | 12.90% | 12.90% | 6.90% | 5.63% | 22.73% | 12.45% | 11.25% | 11.25% | 4.65% | 3.15% |
Figure 5. Inconsistencies in cross-reference links.
Completeness of manually curated links
| | ||||||
|---|---|---|---|---|---|---|
| HMDB → PDBeChem | 1251 | 1173 | 93.8% | 1613 | 1177 | 73.0% |
| HMDB → DrugBank | 1771 | 1436 | 81.1% | 1900 | 1467 | 77.2% |
| HMDB → ChEBI | 3634 | 3211 | 88.4% | 4584 | 3464 | 75.6% |
| ChEBI → PDBeChem | 2773 | 1737 | 62.6% | 2961 | 1765 | 59.6% |
| ChEBI → DrugBank | 1989 | 1561 | 78.5% | 2078 | 1662 | 80.0% |
| ChEBI → HMDB | 3634 | 535 | 14.7% | 4584 | 646 | 14.1% |
| DrugBank → PDBeChem | 3938 | 3669 | 93.2% | 4073 | 3747 | 92.0% |
| DrugBank → ChEBI | 1989 | 1408 | 70.8% | 2078 | 1464 | 70.5% |
| NPC → DrugBank | 1299 | 1021 | 78.6% | 1480 | 1155 | 78.0% |
| ChEBI → DrugBank | 1989 | 1563 | 78.6% | 2078 | 1665 | 80.1% |
| ChEBI → HMDB | 3634 | 3322 | 91.4% | 4584 | 3653 | 79.7% |
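The column headers of this table did not survive, but the numbers suggest how to read it: in each row the third value equals the second divided by the first, and the sixth equals the fifth divided by the fourth. The short Python check below reproduces two of the printed percentages under that reading. Interpreting the first value as the number of expected (automatically generated) links and the second as the number of those found among the curated links is an assumption based on the arithmetic alone.

```python
def completeness(found: int, expected: int) -> float:
    """Percentage of expected links that were found, rounded to one decimal."""
    return round(100.0 * found / expected, 1)


# (expected, found) pairs copied from the first three columns of two rows.
rows = {
    "HMDB → PDBeChem": (1251, 1173),
    "ChEBI → HMDB": (3634, 535),
}

percentages = {
    pair: completeness(found, expected)
    for pair, (expected, found) in rows.items()
}
print(percentages)
```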
Number of errors in the back conversions of InChI identifiers
| | ||
|---|---|---|
| Inchi | N/A | 97 |
| Molconvert | 1559 | 601 |
| Molconvert-direct | 773 | 595 |
Additional rules: the number of collisions
| | |||
|---|---|---|---|
| ChEBI | 1440 | 62 | 32 |
| DrugBank | 162 | 143 | 142 |
| PDBeChem | 175 | 48 | 34 |
| HMDB | 98 | 39 | 39 |
| NPC | 50 | 42 | 42 |
Additional rules: the inconsistencies of cross-reference links
| | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| HMDB → PDBeChem | 1.26% | 0.93% | 0.93% | 0.93% | 0.84% | 0.59% | 1.26% | 0.93% | 0.93% | 0.93% | 0.84% | 0.59% |
| HMDB → DrugBank | 6.86% | 4.92% | 4.27% | 4.27% | 1.81% | 1.55% | 6.79% | 4.85% | 4.20% | 4.20% | 1.75% | 1.49% |
| HMDB → ChEBI | 12.07% | 6.82% | 6.55% | 6.55% | 4.40% | 3.33% | 11.94% | 6.68% | 6.39% | 6.39% | 4.19% | 3.09% |
| ChEBI → PDBeChem | 9.28% | 8.05% | 6.26% | 6.26% | 4.46% | 1.69% | 9.28% | 8.05% | 6.21% | 6.21% | 4.41% | 1.54% |
| ChEBI → DrugBank | 32.02% | 28.21% | 27.23% | 27.18% | 18.84% | 17.85% | 21.66% | 17.08% | 12.97% | 12.93% | 4.41% | 3.25% |
| ChEBI → HMDB | 22.19% | 10.26% | 6.24% | 6.24% | 3.05% | 2.91% | 21.50% | 9.57% | 5.41% | 5.41% | 2.22% | 2.08% |
| DrugBank → PDBeChem | 24.78% | 23.34% | 23.03% | 23.03% | 3.92% | 2.11% | 24.76% | 23.32% | 22.97% | 22.97% | 3.86% | 2.01% |
| DrugBank → ChEBI | 17.29% | 14.50% | 13.34% | 13.34% | 3.13% | 1.91% | 17.05% | 14.27% | 13.05% | 13.05% | 2.84% | 1.57% |
| NPC → DrugBank | 20.63% | 11.48% | 11.25% | 11.25% | 5.93% | 5.63% | 20.11% | 10.80% | 9.60% | 9.60% | 3.68% | 3.15% |
Additional rules: the completeness of manually curated links
| | ||||||
|---|---|---|---|---|---|---|
| HMDB → PDBeChem | 1376 | 1173 | 85.2% | 1700 | 1177 | 69.2% |
| HMDB → DrugBank | 1825 | 1440 | 78.9% | 1944 | 1470 | 75.6% |
| HMDB → ChEBI | 4076 | 3277 | 80.4% | 4935 | 3473 | 70.4% |
| ChEBI → PDBeChem | 3217 | 1744 | 54.2% | 3389 | 1768 | 52.2% |
| ChEBI → DrugBank | 2173 | 1588 | 73.1% | 2254 | 1677 | 74.4% |
| ChEBI → HMDB | 4076 | 561 | 13.8% | 4935 | 647 | 13.1% |
| DrugBank → PDBeChem | 4102 | 3783 | 92.2% | 4229 | 3855 | 91.2% |
| DrugBank → ChEBI | 2173 | 1426 | 65.6% | 2254 | 1474 | 65.4% |
| NPC → DrugBank | 1359 | 1056 | 77.7% | 1524 | 1177 | 77.2% |
| ChEBI → DrugBank | 2173 | 1590 | 73.2% | 2254 | 1680 | 74.5% |
| ChEBI → HMDB | 4076 | 3391 | 83.2% | 4935 | 3662 | 74.2% |