Abstract
BACKGROUND: In recent years, advances in techniques for protein structure analysis have led to a vast number of published articles on protein structure and function. A method for finding specific publications in such a large pool of articles is needed. In this paper, we propose a method that searches for related articles on protein structure analysis by using an article itself as the query.
Year: 2015 PMID: 25952498 PMCID: PMC4423583 DOI: 10.1186/1471-2105-16-S7-S4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Coordinated use of databases. An article is converted to a set of concepts on the concept hierarchy through the coordinated use of databases.
Figure 2. Integration of GO and InterPro. The directed acyclic graph shows an example of the integration of GO (gray nodes) and InterPro (white nodes), constructed by using InterPro2GO cross-references.
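The kind of integration described in the Figure 2 caption can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the identifiers are made-up toy values, edges are stored as child-to-parents maps, and each InterPro2GO cross-reference is assumed to attach the InterPro entry beneath the referenced GO term.

```python
from collections import defaultdict


def integrate(go_parents, interpro_parents, interpro2go):
    """Merge two child->parents maps into one DAG and link InterPro entries to GO terms."""
    parents = defaultdict(set)
    for source in (go_parents, interpro_parents):
        for child, ps in source.items():
            parents[child].update(ps)
    # Assumption: a cross-reference makes the GO term a parent of the InterPro entry.
    for ipr, go_terms in interpro2go.items():
        parents[ipr].update(go_terms)
    return dict(parents)


def ancestors(node, parents):
    """Return every node reachable by following parent edges upward from `node`."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for p in parents.get(current, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


# Hypothetical toy identifiers, not real GO or InterPro accessions.
go_parents = {"GO:B": {"GO:A"}, "GO:C": {"GO:A"}}
interpro_parents = {"IPR:2": {"IPR:1"}}
interpro2go = {"IPR:1": {"GO:B"}}

graph = integrate(go_parents, interpro_parents, interpro2go)
print(sorted(ancestors("IPR:2", graph)))  # ['GO:A', 'GO:B', 'IPR:1']
```

In the merged graph, an InterPro entry inherits the GO ancestors of any entry above it, which is what lets concepts from both resources be compared on a single hierarchy.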
Figure 3. Overview of the proposed method. The input query consists of two articles, a primary article and an additional article. The concept hierarchy graph is modified according to the degree of attention calculated from the two input articles. The similarity between the input article and each search target article is then evaluated on the modified graph.
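The flow in the Figure 3 caption can be illustrated with a deliberately simplified sketch. The caption does not give the attention or similarity formulas, so everything here is an assumption: articles are reduced to flat concept sets (the hierarchy is ignored), the "degree of attention" is modeled as a hypothetical `boost` added to concepts that the primary and additional articles share, and the similarity is a weighted Jaccard score swapped in for illustration.

```python
def attention_weights(primary, additional, boost=1.0):
    """Weight 1.0 for each primary concept, plus `boost` if the additional article shares it."""
    return {c: 1.0 + (boost if c in additional else 0.0) for c in primary}


def weighted_similarity(primary, target, weights):
    """Weighted Jaccard: weight of shared concepts divided by weight of the union."""
    union = primary | target
    if not union:
        return 0.0

    def weight(concept):
        # Concepts outside the primary article keep the default weight of 1.0.
        return weights.get(concept, 1.0)

    shared = sum(weight(c) for c in primary & target)
    total = sum(weight(c) for c in union)
    return shared / total


def rank(primary, additional, targets):
    """Order target article ids by descending similarity, breaking ties by id."""
    weights = attention_weights(primary, additional)
    scores = {tid: weighted_similarity(primary, concepts, weights)
              for tid, concepts in targets.items()}
    return sorted(scores, key=lambda tid: (-scores[tid], tid))


# Hypothetical concept sets for illustration only.
primary = {"kinase", "ATP binding", "membrane"}
additional = {"kinase", "zinc finger"}
targets = {
    "t1": {"kinase", "nucleus"},    # shares the attended concept
    "t2": {"membrane", "nucleus"},  # shares an unattended concept
}
print(rank(primary, additional, targets))  # ['t1', 't2']
```

Without the boost, both targets would score 0.25. With it, `t1` scores 0.4 and `t2` scores 0.2, which shows how a second input article can steer the ranking toward the aspect of the primary article that the user cares about.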
Input articles in dataset 1.
| primary | additional | primary | additional | primary | additional |
|---|---|---|---|---|---|
| 10558980 | 9497353 | 10966114 | 10558980 | 11961546 | 10558980 |
| 10558980 | 10966114 | 10966114 | 11099048 | 11961546 | 10966114 |
| 10558980 | 11591345 | 10966114 | 11591345 | 11961546 | 11099048 |
| 10558980 | 11853669 | 10966114 | 11961546 | 11961546 | 12553912 |
| 10558980 | 12535537 | 10966114 | 12535537 | 11961546 | 12820959 |
| 10558980 | 9261152 | 10966114 | 10350465 | 11961546 | 15537541 |
| 10558980 | 15660128 | 10966114 | 16365295 | 11961546 | 10205047 |
Input articles in dataset 2.
| primary | additional | primary | additional | primary | additional |
|---|---|---|---|---|---|
| 9497353 | 10558980 | 10558980 | 10966114 | 10558980 | 11961546 |
| 10966114 | 10558980 | 11099048 | 10966114 | 10966114 | 11961546 |
| 11591345 | 10558980 | 11591345 | 10966114 | 11099048 | 11961546 |
| 11853669 | 10558980 | 11961546 | 10966114 | 12553912 | 11961546 |
| 12535537 | 10558980 | 12535537 | 10966114 | 12820959 | 11961546 |
| 9261152 | 10558980 | 10350465 | 10966114 | 15537541 | 11961546 |
| 15660128 | 10558980 | 16365295 | 10966114 | 10205047 | 11961546 |
Input articles in dataset 3.
| primary | additional | primary | additional | primary | additional |
|---|---|---|---|---|---|
| 10700286 | 12121650 | 8611559 | 11727989 | 16083905 | 14572476 |
| 10700286 | 10467136 | 8611559 | 9174344 | 16083905 | 16873374 |
| 7966328 | 10562565 | 15327768 | 11707392 | 17070542 | 16406071 |
| 12297050 | 12297049 | 15327768 | 15327769 | 16740718 | 15507431 |
| 12297050 | 12620237 | 15294895 | 10504728 | 16740718 | 17718712 |
| 12517337 | 12086620 | 15294895 | 15070734 | 16732283 | 15931224 |
| 12517337 | 15274926 | 15126499 | 17038310 | 16732283 | 17643372 |
Mean Average Precision with or without additional article and with or without estimation of attended category.
| dataset | MAP with additional article | | |
|---|---|---|---|
| 1 | 0.568 | 0.529 | 0.545 |
| 2 | 0.478 | 0.450 | 0.441 |
| 3 | 0.521 | 0.514 | 0.519 |
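For readers unfamiliar with the metric, Mean Average Precision can be computed as in the sketch below. It uses the common convention of dividing by the size of the correct set, so relevant articles that are never retrieved contribute zero. Whether the paper uses exactly this convention is an assumption, and the rankings are toy data, not the paper's results.

```python
def average_precision(ranked, relevant):
    """Mean of precision@k over the ranks k at which a relevant item appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / k  # precision at rank k
    return total / len(relevant)


def mean_average_precision(queries):
    """Average the per-query AP over a list of (ranked, relevant) pairs."""
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)


# Toy queries: relevant hits at ranks 1 and 4, then a single hit at rank 2.
queries = [
    (["a", "b", "c", "d"], {"a", "d"}),  # AP = (1/1 + 2/4) / 2 = 0.75
    (["x", "y", "z"], {"y"}),            # AP = (1/2) / 1 = 0.5
]
print(mean_average_precision(queries))  # 0.625
```

Because precision is accumulated only at the ranks of relevant items, a correct article placed near the top of the list raises the score much more than one found further down.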
An example of highly ranked similar articles.
| rank | articles | included in correct set | publication year |
|---|---|---|---|
| 1 | 10966114 | yes | 2000 |
| 2 | 15931224 | no | 2005 |
| 3 | 16307917 | no | 2005 |
| 4 | 11591345 | yes | 2001 |