Shen Xu¹, Toby Rogers², Elliot Fairweather¹, Anthony Glenn², James Curran², Vasa Curcin¹.
Abstract
Data provenance is a technique that describes the history of digital objects. In health data settings, it can be used to deliver auditability and transparency, and to establish trust in a software system. However, implementing data provenance in enterprise-level analytics software presents a different set of challenges from those of the research environments where data provenance was originally devised. This paper presents the challenges of reporting provenance information to the user. Provenance captured from analytics software can be large and complex, and visualizing a series of tasks over a long period can overwhelm even a domain expert. This calls for visual aggregation mechanisms that fit the complex human cognitive activities involved in the process. This research studied how provenance-based reporting can be integrated into health data analytics software, using the Atmolytics visual reporting tool as an example.
Year: 2018 PMID: 29888084 PMCID: PMC5961786
Source DB: PubMed Journal: AMIA Jt Summits Transl Sci Proc
Comparison of analytics software data provenance and scientific data provenance.
| Features | Scientific data provenance | Analytics software data provenance |
|---|---|---|
| Design patterns (multiple) | N/A | Many design patterns can be involved, for instance SOA and/or MVC, which are common in data-driven web applications |
| Services | N/A | Users combine methods in different ways to meet different aims and goals |
| Security | Traditionally utilises security mechanisms such as covert channels, digital signatures, encryption, and kernel authentication for security and authorisation | Same problems tackled, though within an SOA system context, particularly regarding content-based data routing |
| Data classification | Generally static | Potentially dynamic |
| Data literacy | High exposure level | Assorted |
Provenance questions (interpreted at different levels).
| Provenance query | User scenario | Levels | Level of granularity |
|---|---|---|---|
| "By what means was the object created?" | As a … I wish to determine, as a … | What were the reasoning steps, and what assumptions were made by experts? [Level of …] | Coarse-grain |
| | | What were the data extraction processes? [Level of …] | |
| | | Is there a change in data entry? [Level of …] | |
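The coarse-grained question "By what means was the object created?" can be answered by walking a provenance graph backwards from the object. The sketch below is a minimal illustration under assumptions: it reduces a W3C PROV-style graph to two relations, `wasGeneratedBy` and `used`; all entity and activity names are hypothetical; and the optional `limit` parameter is just one simple way to return a coarser answer (only the most recent steps).

```python
from collections import deque

# Hypothetical provenance graph using two W3C PROV relations.
# generated_by: entity -> the activity that generated it (wasGeneratedBy)
# used: activity -> the entities that activity consumed (used)
generated_by = {
    "report": "build_chart",
    "cohort": "filter_patients",
    "extract": "extract_records",
}
used = {
    "build_chart": ["cohort"],
    "filter_patients": ["extract"],
    "extract_records": ["source_db"],
}

def how_created(entity, limit=None):
    """Answer 'By what means was the object created?' by walking
    wasGeneratedBy/used edges backwards from `entity` (breadth-first).
    Activities are returned nearest-first; `limit` caps how many are kept."""
    steps, queue, seen = [], deque([entity]), set()
    while queue:
        if limit is not None and len(steps) >= limit:
            break
        current = queue.popleft()
        activity = generated_by.get(current)
        if activity is None or activity in seen:
            continue  # a raw input with no generating activity, or already visited
        seen.add(activity)
        steps.append(activity)
        queue.extend(used.get(activity, []))
    return steps

print(how_created("report"))           # ['build_chart', 'filter_patients', 'extract_records']
print(how_created("report", limit=1))  # ['build_chart']
```

A finer-grained answer would keep every step (and possibly the entities between them), while a coarse one stops after the first few activities.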
Suggested mechanisms.
| Item | Feature | Type | Analytics software requirements | Proposed mechanisms |
|---|---|---|---|---|
| A | Design patterns (multiple) | MVC/Entity Framework/SOA, and others | Should permit flexible means of capture | Provenance abstracted activity/template (W3C standard compliant) |
| B | Services (flexible combination) | Potential recombination of methods to meet different ends | Flexible capture mechanism required | Provenance grafting/template method |
| C | Security | Disguising data | Should allow encryption of data sent across boundaries | Pre-processing encrypt and decrypt/data |
| D | | | Authentication mechanism needed to guarantee data is taken from correct sources | Data/template accessibility capturing provenance |
| E | | | Different access needs to be provided to different users so as to guarantee data security | Provenance capturing/template agent node security |
| F | Data volume and assorted data literacy | Confidence/trust judgement concerning reasoning level | Representation should follow the logical steps taken by the user | Logical temporal stamp (1); Input justification (2) |
| G | | Judgement of confidence/trust above that of analysis level | Support should be provided for logical steps | Feature list (3) + Highlighted difference (4) |
| H | | Judgement of confidence/trust above that of data level | Support should be provided for data sources | Summary of linked system activity |
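Items A and B propose template-based capture and provenance grafting. The sketch below shows one possible reading, under stated assumptions, and is not the authors' implementation. A template is a list of (subject, relation, object) triples whose `var:` terms are placeholders, loosely modelled on the placeholder convention of the PROV-Template approach. Each captured action instantiates the template with runtime bindings, and grafting merges the resulting fragment into the existing graph so that shared identifiers link successive steps. `TEMPLATE`, `instantiate`, `graft`, and all bound values are hypothetical.

```python
# Hypothetical provenance template: each triple is (subject, relation, object);
# terms starting with "var:" are placeholders bound at capture time.
TEMPLATE = [
    ("var:output", "wasGeneratedBy", "var:activity"),
    ("var:activity", "used", "var:input"),
    ("var:activity", "wasAssociatedWith", "var:user"),
]

def instantiate(template, bindings):
    """Replace every 'var:' placeholder with its runtime value."""
    def bind(term):
        if term.startswith("var:"):
            return bindings[term[len("var:"):]]  # KeyError if a variable is unbound
        return term
    return [(bind(s), rel, bind(o)) for s, rel, o in template]

def graft(graph, fragment):
    """Merge a new fragment into an existing provenance graph (a set of triples).
    Identifiers shared with earlier records link the fragment to them."""
    return graph | set(fragment)

# Two user actions captured with the same template, then grafted together:
# "cohort" is generated by the first step and used by the second.
graph = set()
graph = graft(graph, instantiate(TEMPLATE, {
    "output": "cohort", "activity": "filter_patients",
    "input": "extract", "user": "analyst_1"}))
graph = graft(graph, instantiate(TEMPLATE, {
    "output": "report", "activity": "build_chart",
    "input": "cohort", "user": "analyst_1"}))
print(len(graph))  # 6
```

Because the template is fixed while the bindings vary, methods can be recombined freely (item B) without writing new capture code for each combination.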
Figure 1. Activity-driven representation of data provenance.
Figure 2. Simplified representation of web application processes and provenance capture (adapted from Xu et al. 2016 [30]).
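Provenance capture inside a web application's processes, as in Figure 2, is often done by wrapping service or controller methods so that each call is recorded as an activity. The sketch below illustrates that idea with a Python decorator. It is an assumption-laden illustration, not the Atmolytics implementation: `capture_provenance`, `PROVENANCE_LOG`, and `filter_patients` are hypothetical, and the in-memory list stands in for whatever store a real system would use.

```python
import functools
import time

PROVENANCE_LOG = []  # hypothetical in-memory store; a real system would persist records

def capture_provenance(activity_name):
    """Decorator that records each call of a service method as an activity:
    its inputs (used), its output (generated), and wall-clock start/end times."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            started = time.time()
            result = func(*args, **kwargs)
            PROVENANCE_LOG.append({
                "activity": activity_name,
                "used": {"args": args, "kwargs": kwargs},
                "generated": result,
                "started": started,
                "ended": time.time(),
            })
            return result
        return wrapper
    return decorator

@capture_provenance("filter_patients")
def filter_patients(records, min_age):
    """Hypothetical service method: keep records at or above `min_age`."""
    return [r for r in records if r["age"] >= min_age]

result = filter_patients([{"age": 30}, {"age": 70}], min_age=65)
print(result)                          # [{'age': 70}]
print(PROVENANCE_LOG[-1]["activity"])  # filter_patients
```

Keeping capture in a wrapper leaves the business logic untouched, which suits architectures where many design patterns (MVC, SOA) coexist.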
Figure 3. User experience review mock-up of the user interface (https://invis.io/NQDCUN6DS shows the second-round user interface). F1 is the logical temporal stamp; F2 is the justification of activities; G3 is the pre-chosen feature list; G4 underlines the changes; and H is the linked system activity summary (see Table 3).
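The F1, F2, and G4 elements in Figure 3 can be read as three small data operations. The sketch below assumes a single-user session in which each recorded action receives the next value of a logical counter (so ordering follows the user's sequence of steps rather than wall-clock time), carries a free-text justification, and stores the set of selected features. `changes` then reports which features were added or removed between two steps. `AnalysisSession`, `record`, `changes`, and the feature names in the usage are hypothetical.

```python
from itertools import count

class AnalysisSession:
    """Hypothetical sketch: stamp each user action with a logical step number (F1)
    and a justification (F2), and report feature changes between steps (G4)."""

    def __init__(self):
        self._clock = count(1)  # logical clock: 1, 2, 3, ... one tick per action
        self.steps = []

    def record(self, action, features, justification=""):
        self.steps.append({
            "step": next(self._clock),
            "action": action,
            "features": frozenset(features),
            "justification": justification,
        })

    def changes(self, earlier, later):
        """Return (added, removed) features between two logical steps."""
        before = self.steps[earlier - 1]["features"]  # steps are numbered from 1
        after = self.steps[later - 1]["features"]
        return sorted(after - before), sorted(before - after)

session = AnalysisSession()
session.record("select cohort", ["age", "diagnosis"], "Asthma patients only")
session.record("refine cohort", ["age", "diagnosis", "medication"],
               "Include inhaler prescriptions")
print(session.changes(1, 2))  # (['medication'], [])
```

A logical counter keeps the display ordered by the user's reasoning steps even when actions are spread over a long period, which is the aggregation problem raised in the abstract.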
Summary of user feedback (5 = strongly agree, 4 = agree, 3 = neutral, 2 = disagree, 1 = strongly disagree).
| # | Question | A | B | C | D | E |
|---|---|---|---|---|---|---|
| 1 | The provenance information displayed helps you understand how the result is produced. | 4 | 4 | 5 | 4 | 4 |
| 2 | The provenance information displayed provides transparency to the processes of producing outputs. | 4 | 4 | 5 | 3 | 4 |
| 3 | The provenance information displayed improves the trust/confidence of outputs. | 4 | 2 | 4 | 3 | 4 |
| 4 | The provenance feature captures the decisions involved in producing an output. | 4 | 5 | 3 | 4 | 4 |
| 5 | The provenance feature is useful to your work. | 4 | 4 | 5 | 4 | 4 |
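The feedback table can be summarised per question. The sketch below treats columns A to E as individual respondents (the table implies this but does not state it) and computes each question's mean score and the number of respondents who agreed (score of 4 or 5). The scores are copied from the table; `SCORES` and `summarise` are illustrative names.

```python
# Scores from the user feedback table: keys are questions 1-5,
# list positions are columns A-E (5 = strongly agree ... 1 = strongly disagree).
SCORES = {
    1: [4, 4, 5, 4, 4],
    2: [4, 4, 5, 3, 4],
    3: [4, 2, 4, 3, 4],
    4: [4, 5, 3, 4, 4],
    5: [4, 4, 5, 4, 4],
}

def summarise(scores):
    """Per question: (mean score rounded to 1 d.p., count of scores >= 4)."""
    return {
        q: (round(sum(v) / len(v), 1), sum(1 for s in v if s >= 4))
        for q, v in scores.items()
    }

for q, (mean, agreed) in summarise(SCORES).items():
    print(f"Q{q}: mean {mean}, agreed {agreed}/5")
# Q1: mean 4.2, agreed 5/5
# Q2: mean 4.0, agreed 4/5
# Q3: mean 3.4, agreed 3/5
# Q4: mean 4.0, agreed 4/5
# Q5: mean 4.2, agreed 5/5
```

On these five responses, question 3 (trust/confidence in outputs) has the lowest mean, while understanding (Q1) and usefulness (Q5) score highest.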