| Literature DB >> 32540846 |
Christine Suver, Adrian Thorogood, Megan Doerr, John Wilbanks, Bartha Knoppers.
Abstract
Developing or independently evaluating algorithms in biomedical research is difficult because of restrictions on access to clinical data. Access is restricted because of privacy concerns, the proprietary treatment of data by institutions (fueled in part by the cost of data hosting, curation, and distribution), concerns over misuse, and the complexities of applicable regulatory frameworks. The use of cloud technology and services can address many of the barriers to data sharing. For example, researchers can access data in high-performance, secure, and auditable cloud computing environments without the need for copying or downloading. An alternative path to accessing data sets requiring additional protection is the model-to-data approach. In model-to-data, researchers submit algorithms to run on secure data sets that remain hidden. Model-to-data is designed to enhance security and local control while enabling communities of researchers to generate new knowledge from sequestered data. Model-to-data has not yet been widely implemented, but pilots have demonstrated its utility when technical or legal constraints preclude other methods of sharing. We argue that model-to-data can make a valuable addition to our data-sharing arsenal, with 2 caveats. First, model-to-data should only be adopted where necessary to supplement rather than replace existing data-sharing approaches, given that it requires significant resource commitments from data stewards and limits scientific freedom, reproducibility, and scalability. Second, although model-to-data reduces concerns over data privacy and loss of local control when sharing clinical data, it is not an ethical panacea. Data stewards will remain hesitant to adopt model-to-data approaches without guidance on how to do so responsibly. To address this gap, we explored how commitments to open science, reproducibility, security, respect for data subjects, and research ethics oversight must be re-evaluated in a model-to-data context.
©Christine Suver, Adrian Thorogood, Megan Doerr, John Wilbanks, Bartha Knoppers. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 28.07.2020.
Keywords: data management; data science; ethics, research; machine learning; privacy
Year: 2020 PMID: 32540846 PMCID: PMC7420687 DOI: 10.2196/18087
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 5.428
Figure 1. Common Data Sharing Models.
Comparison of technical data access models.
| Characteristics | Copy-and-download | Researcher-to-data | Model-to-data |
| --- | --- | --- | --- |
| Costs of data curation | Shared between data steward and data user | Primarily borne by data steward | Completely borne by data steward |
| Cost of computing infrastructure | Borne by data user | Borne by data steward (users may be charged for compute) | Borne by data steward (users may be charged for compute) |
| Researcher freedom (eg, methods and tools) | Greatest degree of freedom. Limited by terms of use | Limited freedom. Limited by computing infrastructure and terms of use | Least freedom. Limited by APIa structure and computing infrastructure |
| Security and confidentiality of data | Weak. Many copies of data shared with many users, possibly in many countries, subject only to data access agreements. Difficult to audit. Difficult to withdraw data once shared | Strong. Data shared with third-party users within a secure and auditable computing environment. Data can be easily withdrawn | Very strong. Data remains hidden from users. Data can be withdrawn at any time |
| Data privacy protection | Weak. Relies on deidentification and a data access agreement limiting reidentification | Strong. Individual-level data may be viewable but not downloadable. Only results are released. Outputs may need to be deidentified | Very strong. Individual-level data are not downloadable or viewable. Only results are released |
| Security and confidentiality of researchers’ methods | Very strong. Researchers submit proposals but maintain control over methods | Medium. Researchers’ activities are supervised/audited | Weak. Researchers must share query/workflow with data steward |
| Informed consent | Consent may be needed for research use, sharing, and cross-border transfer | Consent may be needed for research use only | Consent may be needed for research use only |
| Research ethics oversight | Data user may need research ethics approval | Data steward may need to provide research ethics approval | Data steward may need to provide research ethics approval |
| Scalability to multiple resources | Straightforward through a distributed data commons, with some shared infrastructure (eg, access portal) | Can only be done indirectly through an individual-level meta-analysis | Difficult but theoretically possible through a federated data system |
| Legal agreements | Data access/transfer agreement and data use agreement | Data use agreement and computing environment terms of use | Data use agreement only |
aAPI: application programming interface.