Nicholas D Pattengale, Corey M Hudson.
Abstract
BACKGROUND: One of the tasks in the iDASH Secure Genome Analysis Competition in 2018 was to develop blockchain-based immutable logging and querying for a cross-site genomic dataset access audit trail. The specific challenge was to design a time- and space-efficient structure and mechanism for storing and retrieving genomic data access logs, based on MultiChain version 1.0.4 (https://www.multichain.com/).
Keywords: Algorithms; Blockchain; Blockchain based querying; Decentralized ledger; Genomics
Year: 2020 PMID: 32693795 PMCID: PMC7372871 DOI: 10.1186/s12920-020-0720-3
Source DB: PubMed Journal: BMC Med Genomics ISSN: 1755-8794 Impact factor: 3.063
Key/values inserted for one log line
| stream | key | value |
|---|---|---|
| logdata | UID_2_28 | {timestamp:1522000126703, node:2, id:28, ref-id:17, user:3, activity:FILE_ACCESS, resource:GTEx} |
| timestamp | TIMESTAMP_1522000126703 | UID_2_28 |
| node | NODE_2 | UID_2_28 |
| id | ID_28 | UID_2_28 |
| ref-id | REF-ID_17 | UID_2_28 |
| user | USER_3 | UID_2_28 |
| activity | ACTIVITY_FILE_ACCESS | UID_2_28 |
| resource | RESOURCE_GTEx | UID_2_28 |
| node2timestamps | 1522000126703 |
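
The mapping in the table above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `index_entries`, the `FIELDS` list, and the use of JSON for the stored value are assumptions made here, and nothing is written to MultiChain. The `node2timestamps` row is left out because the table does not show its key.

```python
import json

# Fields that each get their own index stream (taken from the table above).
FIELDS = ["timestamp", "node", "id", "ref-id", "user", "activity", "resource"]

def index_entries(record):
    """Return (stream, key, value) tuples for one log line.

    Hypothetical helper: the full record is stored once under a unique ID,
    and every field gets a reverse-index entry pointing back to that ID.
    """
    uid = f"UID_{record['node']}_{record['id']}"
    # Assumption: the record is serialized as JSON for storage.
    entries = [("logdata", uid, json.dumps(record))]
    for field in FIELDS:
        key = f"{field.upper()}_{record[field]}"
        entries.append((field, key, uid))
    return entries

record = {
    "timestamp": 1522000126703,
    "node": 2,
    "id": 28,
    "ref-id": 17,
    "user": 3,
    "activity": "FILE_ACCESS",
    "resource": "GTEx",
}

for stream, key, value in index_entries(record):
    print(stream, key, value)
```

Storing the record once and indexing it by reference keeps each index entry small, at the cost of one extra lookup in `logdata` per matching UID.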
Fig. 1 Control flow of query processing. Complex queries follow the logical parsing steps shown in this decision tree. Depending on whether a query contains multiple clauses, a timestamp range, multiple column constraints, or a sort request, it is processed with liststreamkeys, a binary search over timestamps, constraint filtering, and the sorted function before the resulting data structure is returned
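
The control flow described in the Fig. 1 caption can be approximated with in-memory stand-ins. The sketch below is an assumption-laden illustration, not the authors' code: the sample `LOGS`, the helpers `build_indexes` and `run_query`, the order of the branches, and the inclusive treatment of range endpoints are all hypothetical. A dictionary lookup stands in for the MultiChain key lookup, and Python's `bisect` module provides the binary search over timestamps.

```python
from bisect import bisect_left, bisect_right

# Hypothetical sample data: uid -> log record (stand-in for the logdata stream).
LOGS = {
    "UID_1_1": {"timestamp": 100, "ref-id": 3, "user": 5, "activity": "FILE_ACCESS"},
    "UID_1_2": {"timestamp": 200, "ref-id": 4, "user": 6, "activity": "FILE_ACCESS"},
    "UID_2_1": {"timestamp": 300, "ref-id": 2, "user": 6, "activity": "FILE_ACCESS"},
}

def build_indexes(logs):
    """Build a (field, value) -> [uid] index and a timestamp-sorted list."""
    by_key = {}
    for uid, rec in logs.items():
        for field, value in rec.items():
            by_key.setdefault((field, value), []).append(uid)
    by_time = sorted((rec["timestamp"], uid) for uid, rec in logs.items())
    return by_key, by_time

def run_query(logs, clauses=None, ts_range=None, sort_by=None):
    """Return matching UIDs. Indexes are rebuilt per call only for brevity."""
    clauses = dict(clauses or {})
    by_key, by_time = build_indexes(logs)
    if ts_range is not None:
        # Binary search for the slice of records inside [lo, hi] (inclusive).
        lo, hi = ts_range
        times = [t for t, _ in by_time]
        window = by_time[bisect_left(times, lo):bisect_right(times, hi)]
        candidates = [uid for _, uid in window]
    elif clauses:
        # Use one clause for the index lookup; the rest are filtered below.
        field, value = clauses.popitem()
        candidates = list(by_key.get((field, value), []))
    else:
        candidates = list(logs)
    # Filter the candidates against every remaining column constraint.
    hits = [uid for uid in candidates
            if all(logs[uid].get(f) == v for f, v in clauses.items())]
    if sort_by is not None:
        hits.sort(key=lambda uid: logs[uid][sort_by])
    return hits

print(run_query(LOGS, clauses={"user": 6}, ts_range=(100, 300), sort_by="ref-id"))
```

Narrowing the candidate set with the index or the timestamp window first means the filtering and sorting steps only touch records that already satisfy one constraint.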
Fig. 2 Memory utilization. The maximum amount of main memory used during insertion, measured via GNU time
Fig. 3 Disk usage. The amount of storage used by MultiChain with our indexed representation. This value was calculated by measuring file system utilization before and after insertion via the Linux df command
Fig. 4 Query scaling. A summary of query performance. For each dataset, our test infrastructure generates a set of nine queries that cover the three query modalities (a single clause, multiple clauses, and timestamp ranges). The template of each of the nine queries is fixed, but the generator chooses the constraint values randomly. For each dataset, we generated five of these nine-query benchmarks and averaged the observed query times across the five samples. This plot shows four of the nine query types. Examples of the four queries displayed are as follows: “QUERY user=5” clause(user), “QUERY resource=resD activity=activityE” conjunction, “QUERY timestamprange=[99171676,102561181]” timestamp, and “QUERY user=6 timestamprange=[32226847,82574461] sortby=Ref-ID” all three(sorted)
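
The example queries in the Fig. 4 caption share a simple `name=value` shape, which the following sketch splits into its three parts. The function `parse_query` and its return layout are hypothetical. The sketch assumes that tokens are separated by whitespace and that a timestamp range contains no spaces, as in the examples shown. The full query grammar is not given here, so anything beyond these examples is untested guesswork.

```python
import re

def parse_query(text):
    """Split a query string into (clauses, ts_range, sort_by).

    Hypothetical parser covering only the forms shown in the caption.
    """
    tokens = text.split()
    if not tokens or tokens[0] != "QUERY":
        raise ValueError("query must start with QUERY")
    clauses, ts_range, sort_by = {}, None, None
    for token in tokens[1:]:
        name, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"expected name=value, got {token!r}")
        if name == "timestamprange":
            match = re.fullmatch(r"\[(\d+),(\d+)\]", value)
            if match is None:
                raise ValueError(f"bad timestamp range: {value!r}")
            ts_range = (int(match.group(1)), int(match.group(2)))
        elif name == "sortby":
            sort_by = value
        else:
            # Every other token is treated as a column constraint.
            clauses[name] = value
    return clauses, ts_range, sort_by

print(parse_query("QUERY user=6 timestamprange=[32226847,82574461] sortby=Ref-ID"))
```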