Chao Pang1,2, Fleur Kelpin1, David van Enckevort1, Niina Eklund3, Kaisa Silander3, Dennis Hendriksen1, Mark de Haan1, Jonathan Jetten1, Tommy de Boer1, Bart Charbon1, Petr Holub4, Hans Hillege2, Morris A Swertz1,2.
Abstract
MOTIVATION: Biobanks are indispensable for large-scale genetic/epidemiological studies, yet it remains difficult for researchers to determine which biobanks contain data matching their research questions.
Year: 2017 PMID: 29036577 PMCID: PMC5870622 DOI: 10.1093/bioinformatics/btx478
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Overview of the BiobankUniverse system. Users upload/add biobank attributes to the universe. TagGenerator is automatically triggered to create ontology representations of the uploaded biobank's attributes. These are then used in AttributeMatcher to generate attribute matches with any of the other biobanks. A cosine similarity score is computed for each attribute match pair to prioritize the candidate list, and a strict matching criterion is applied to remove false positives. A biobank similarity is also calculated for each pair of biobanks by computing the cosine of the angle between their ontology representations in the semantic space.
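The cosine similarity mentioned in the caption can be illustrated with a minimal sketch. It assumes each ontology representation is a sparse vector of term weights; the dictionary layout, the term names and the weights below are hypothetical and are not taken from BiobankUniverse itself.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors given as {term: weight} dicts."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty representation is treated as sharing nothing
    return dot / (norm_a * norm_b)

# Hypothetical ontology-term weights for two biobanks (illustration only).
biobank_a = {"hypertension": 2.0, "body mass index": 1.0}
biobank_b = {"hypertension": 1.0, "smoking": 1.0}

score = cosine_similarity(biobank_a, biobank_b)
```

A score near 1 means the two representations point in almost the same direction in the term space, and a score of 0 means they share no weighted terms.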
Fig. 2. User interface for discovering biobanks. Users can choose various network options to visualize the ‘universe’: the biobank similarity, the number of matches generated by the system or the number of matches curated by the user. The nodes represent biobanks in the universe, and their sizes are proportional to the number of attributes in the corresponding biobanks. The connecting lines represent the similarities (defined as the number of matches or the biobank similarities) between biobanks; the more similar two biobanks are, the closer together they appear in the universe. The online version is dynamic, so the numbers can be seen more clearly.
Fig. 3. Curating candidate matches by data owners. Users can curate all generated matches available in the universe. Users first choose a leading ‘target’ biobank, from which a match table is generated (any biobank can be the target because matches are pairwise). Users then go through each cell in the table and decide whether to accept the generated matches.
Recall and precision performance for the HOP project (0–100)
| | Lifelines | | Mitchelstown | | Prevend | | Total | | BiobankConnect | |
|---|---|---|---|---|---|---|---|---|---|---|
| Rank | R | P | R | P | R | P | R | P | R | P |
| 1 | 23 | 64 | 23 | 87 | 39 | 41 | 25 | 66 | 24 | 58 |
| 2 | 39 | 55 | 33 | 66 | 61 | 38 | 38 | 55 | 37 | 45 |
| 3 | 45 | 45 | 42 | 58 | 70 | 34 | 46 | 47 | 45 | 39 |
| 4 | 52 | 41 | 48 | 52 | 71 | 32 | 52 | 44 | 50 | 35 |
| 5 | 56 | 38 | 56 | 50 | 73 | 30 | 58 | 42 | 54 | 32 |
| 6 | 59 | 35 | 58 | 46 | 74 | 30 | 60 | 39 | 57 | 30 |
| 7 | 64 | 34 | 62 | 44 | 74 | 29 | 64 | 37 | 60 | 29 |
| 8 | 66 | 32 | 66 | 43 | 74 | 28 | 67 | 36 | 63 | 27 |
| 9 | 68 | 30 | 69 | 42 | 77 | 29 | 69 | 35 | 65 | 26 |
| 10 | 70 | 29 | 72 | 41 | 77 | 29 | 71 | 34 | 67 | 25 |
| 20 | 85 | 25 | 81 | 36 | 77 | 28 | 82 | 30 | 76 | 19 |
| 50 | 88 | 20 | 85 | 34 | 77 | 28 | 85 | 26 | 77 | 16 |
Note: P, precision; R, recall.
Recall and precision performance for the FINRISK project (including 550 manual matches)
| Rank | Recall | Precision | Retrieved |
|---|---|---|---|
| 1 | 0.813 | 0.592 | 755 |
| 2 | 0.878 | 0.325 | 1486 |
| 3 | 0.891 | 0.223 | 2197 |
| 4 | 0.898 | 0.171 | 2889 |
| 5 | 0.904 | 0.139 | 3563 |
| 6 | 0.911 | 0.119 | 4214 |
| 7 | 0.913 | 0.104 | 4834 |
| 8 | 0.915 | 0.092 | 5438 |
| 9 | 0.918 | 0.084 | 6032 |
| 10 | 0.922 | 0.077 | 6614 |
| 20 | 0.929 | 0.044 | 11605 |
| 50 | 0.938 | 0.027 | 19088 |
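How the three columns relate can be shown with a short sketch. It assumes that "rank k" means keeping the top k candidates for every source attribute, which is an interpretation rather than something the table states; the function name and the toy data are hypothetical.

```python
def precision_recall_at_k(ranked_candidates, relevant, k):
    """Return (precision, recall, retrieved) when keeping the top-k candidates per attribute.

    ranked_candidates: {source_attribute: [target attributes, best first]}
    relevant: set of (source_attribute, target_attribute) pairs judged correct by hand
    """
    retrieved = [
        (source, target)
        for source, targets in ranked_candidates.items()
        for target in targets[:k]
    ]
    hits = sum(1 for pair in retrieved if pair in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall, len(retrieved)

# Toy data (illustration only, not from the FINRISK experiment).
ranked = {"a1": ["b1", "b2"], "a2": ["b3", "b4"]}
manual = {("a1", "b2"), ("a2", "b3"), ("a2", "b9")}

at_rank_1 = precision_recall_at_k(ranked, manual, 1)
at_rank_2 = precision_recall_at_k(ranked, manual, 2)
```

Under the usual definitions, the rank 1 row is internally consistent: 0.592 × 755 retrieved matches gives about 447 correct matches, and 447 / 550 manual matches gives a recall of about 0.813.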
The overall performance comparison while enabling and disabling the matching criteria from the HOP experiment (including 633 manual matches)
| | Matching criteria enabled | | | Matching criteria disabled | | |
|---|---|---|---|---|---|---|
| Rank | R | P | RE | R | P | RE |
| 1 | 0.25 | 0.66 | 240 | 0.24 | 0.56 | 268 |
| 2 | 0.38 | 0.55 | 443 | 0.36 | 0.44 | 516 |
| 3 | 0.46 | 0.47 | 613 | 0.43 | 0.37 | 735 |
| 4 | 0.52 | 0.44 | 753 | 0.50 | 0.34 | 931 |
| 5 | 0.58 | 0.42 | 877 | 0.54 | 0.31 | 1089 |
| 6 | 0.60 | 0.39 | 987 | 0.58 | 0.30 | 1235 |
| 7 | 0.64 | 0.37 | 1085 | 0.61 | 0.28 | 1373 |
| 8 | 0.67 | 0.36 | 1173 | 0.63 | 0.26 | 1506 |
| 9 | 0.69 | 0.35 | 1250 | 0.65 | 0.25 | 1630 |
| 10 | 0.71 | 0.34 | 1320 | 0.68 | 0.25 | 1751 |
| 20 | 0.82 | 0.30 | 1724 | 0.76 | 0.18 | 2723 |
| 50 | 0.85 | 0.26 | 2054 | 0.80 | 0.13 | 3848 |
Note: P, precision; R, recall; RE, number of retrieved matches.