Judita Preiss, Mark Stevenson.
Abstract
BACKGROUND: Literature based discovery (LBD) automatically infers missed connections between concepts in literature. It is often assumed that LBD generates more information than can be reasonably examined.
Keywords: Biomedical text; Data mining; Literature based discovery in the biomedical domain
Year: 2017 PMID: 28617217 PMCID: PMC5471938 DOI: 10.1186/s12859-017-1641-9
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Top: open discovery (only A specified); bottom: closed discovery (both A and C specified)
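The two modes named in the Fig. 1 caption can be illustrated with a minimal sketch of the A-B-C pattern. The toy graph, the concept names and the function names below are hypothetical and are not taken from the paper's data or code; the sketch only shows how open discovery starts from A alone while closed discovery looks for linking terms between a given A and C.

```python
from typing import Dict, Set

# Hypothetical toy graph: concept -> concepts it is directly linked to in the literature.
links: Dict[str, Set[str]] = {
    "A": {"B1", "B2"},
    "B1": {"A", "C1"},
    "B2": {"A", "C1", "C2"},
    "C1": {"B1", "B2"},
    "C2": {"B2"},
}


def open_discovery(a: str) -> Dict[str, Set[str]]:
    """Only A is given: map each candidate C to the B terms that link it to A."""
    direct = links.get(a, set())
    found: Dict[str, Set[str]] = {}
    for b in direct:
        for c in links.get(b, set()):
            # Keep only indirect connections: C is not A and not already linked to A.
            if c != a and c not in direct:
                found.setdefault(c, set()).add(b)
    return found


def closed_discovery(a: str, c: str) -> Set[str]:
    """Both A and C are given: return the linking B terms they share."""
    return links.get(a, set()) & links.get(c, set())
```

In this toy graph, `closed_discovery("A", "C2")` yields the single linking term `B2`, whereas `open_discovery("A")` proposes both `C1` and `C2` as candidates.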
Information about synonym classes related to the number of sources supporting the synonymy
| Min sources | Num classes | Max | >20 | >10 | >7 | >5 | >3 | Mean |
|---|---|---|---|---|---|---|---|---|
| 1 | 11,030 | 74 | 21 | 174 | 401 | 850 | 1,938 | 2.91 |
| 2 | 1,542 | 73 | 1 | 1 | 3 | 5 | 26 | 2.14 |
| 3 | 6 | 3 | 0 | 0 | 0 | 0 | 0 | 3 |
Fig. 2 CUIs remaining after removing semantic types
Fig. 3 Pseudocode for creating a stoplist by identifying common linking terms
Number of linking terms yielded in replication of existing discoveries
| Filtering | Mig–Mg | RD–fsh | Som–Arg | Mg–ND | AD–est | Sc–iPL | AD–INN |
|---|---|---|---|---|---|---|---|
| Number of linking terms found after a single step | |||||||
| unfiltered | 81 | 2 | 232 | 101 | 490 | 0 | 365 |
| sy | 76 | 2 | 221 | 97 | 485 | 0 | 356 |
| manual | 58 | 0 | 157 | 57 | 362 | 0 | 258 |
| Y-Y&P | 0 | 2 | 0 | 1 | 1 | 0 | 0 |
| clt | 54 | 0 | 147 | 55 | 350 | 0 | 248 |
| break | 47 | 0 | 114 | 38 | 0 | 0 | 0 |
| Number of linking terms found after two steps | |||||||
| unfiltered | 49,877 | 1,386 | 132,514 | 69,669 | 424,712 | 9 | 386,098 |
| sy | 46,551 | 1,333 | 126,178 | 65,845 | 408,875 | 9 | 371,354 |
| manual | 25,921 | 510 | 67,206 | 27,046 | 217,834 | 9 | 203,616 |
| Y-Y&P | 0 | 602 | 44 | 121 | 396 | 0 | 0 |
| clt | 23,258 | 453 | 59,317 | 25,227 | 200,720 | 8 | 187,936 |
| break | 17,659 | 361 | 35,323 | 11,173 | 0 | 8 | 0 |
Fig. 4 Percentage of original relations remaining after filtering
Timeslice evaluation
| Filtering | Total | Correct | Precision (%) | Recall (%) | F-measure | Average |
|---|---|---|---|---|---|---|
| Performance after a single step | ||||||
| sy | 1,049,250,170 | 526,363 | 0.05 | 44.10 | 1.00e-03 | 11,131 |
| manual | 386,952,997 | 268,327 | 0.07 | 22.48 | 1.38e-03 | 6,099 |
| Y-Y&P | 243,218,893 | 190,072 | 0.08 | 15.93 | 1.56e-03 | 4,952 |
| clt | 387,603,836 | 269,003 | 0.07 | 22.54 | 1.38e-03 | 6,103 |
| break | 131,199,050 | 213,193 | 0.16 | 17.86 | 3.22e-03 | 2,232 |
| Performance after two steps | ||||||
| sy | 3,733,002,802 | 534,301 | 0.01 | 44.77 | 2.86e-04 | 39,602 |
| manual | 1,638,685,466 | 274,544 | 0.02 | 23.00 | 3.35e-04 | 25,828 |
| Y-Y&P | 994,744,004 | 194,749 | 0.02 | 16.32 | 3.91e-04 | 20,257 |
| clt | 1,641,987,567 | 275,230 | 0.02 | 23.06 | 3.35e-04 | 25,857 |
| break | 1,085,230,979 | 227,998 | 0.02 | 19.10 | 4.20e-04 | 18,467 |
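The columns of the timeslice table are related by simple arithmetic, which the sketch below reproduces for the single-step `sy` row. It assumes, based on the reported numbers, that Precision and Recall are given as percentages while F-measure is given as a fraction; the function names are illustrative only, and Recall is taken from the table because the size of the gold standard is not stated here.

```python
def precision_pct(correct: int, total: int) -> float:
    """Precision as a percentage: correct hypotheses over all generated hypotheses."""
    return 100.0 * correct / total


def f_measure(precision_percent: float, recall_percent: float) -> float:
    """Harmonic mean of precision and recall, returned as a fraction (not a percentage)."""
    p = precision_percent / 100.0
    r = recall_percent / 100.0
    return 2 * p * r / (p + r)


# Single-step "sy" row: Total = 1,049,250,170, Correct = 526,363, Recall = 44.10 %.
p = precision_pct(526_363, 1_049_250_170)
f = f_measure(p, 44.10)
print(f"precision = {p:.2f} %, F-measure = {f:.2e}")
```

Because precision is several orders of magnitude smaller than recall in every row, the F-measure is dominated by precision and stays close to twice its fractional value.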