Abstract
A popular model for global scientific repositories is the data commons, which pools or connects many datasets alongside supporting infrastructure. A data commons must establish legal interoperability between datasets to ensure researchers can aggregate and reuse them. This is usually achieved by establishing a shared governance structure. Unfortunately, governance often takes years to negotiate and involves a trade-off between data inclusion and data availability. It can also be difficult for repositories to modify governance structures in response to changing scientific priorities, data sharing practices, or legal frameworks. This problem has been laid bare by the sudden shock of the COVID-19 pandemic. This paper proposes a rapid and flexible strategy for scientific repositories to achieve legal interoperability: the policy-aware data lake. This strategy draws on technical concepts of modularity, metadata, and data lakes. Datasets are treated as independent modules, which can be subject to distinctive legal requirements. Each module must, however, be described using standard legal metadata. This allows legally compatible datasets to be rapidly combined and made available on a just-in-time basis to certain researchers for certain purposes. Global scientific repositories increasingly need such flexibility to manage scientific, organizational, and legal complexity, and to improve their responsiveness to global pandemics.
Keywords: big data; data commons; data governance; data sharing; legal interoperability; modularity
Year: 2020 PMID: 33005429 PMCID: PMC7454728 DOI: 10.1093/jlb/lsaa065
Source DB: PubMed Journal: J Law Biosci ISSN: 2053-9711
Figure 1. Data Commons: Data Inclusion vs. Availability. Black line: limit of uses permitted by the commons' governance structure; red arrow: uncontroversial scientific project (3 datasets available); green arrow: somewhat controversial scientific project (3 datasets available); blue arrow: controversial scientific project (0 datasets available); red stripes: datasets subject to restrictive legal requirements that must be excluded from the commons; blue waves: legally permitted uses prohibited by the governance structure.
Legal Interoperability Models for Transborder Research Projects
| Data Commons | Policy-Aware Data Lake |
|---|---|
| A research resource that pools or connects datasets together to make them available to researchers | A research resource that pools or connects datasets together. Each dataset can have distinct legal data governance. Researchers can aggregate and re-use all datasets legally compatible with their context and purpose |
| A scientific community establishes a shared legal data governance structure through up-front negotiation | Data contributors describe each dataset using explicit, standard, and accurate legal metadata |
| The shared legal data governance structure must respect the legal requirements associated with each dataset | The legal requirements associated with the dataset provide a reasonable likelihood of aggregation and re-use |
| All datasets within the commons are available for research uses that respect the legal requirements of the most-restrictive dataset included in the commons | All datasets that are legally available for a proposed research use |
| Limited. The common legal data governance structure must be re-negotiated | Data contributors are free at any time to update their legal metadata |
| Yes, by definition | Yes, though some datasets may be subject to distinct data localization rules |
| Spectrum from centralized to fully distributed | Spectrum from centralized to fully distributed |
Figure 2. Policy-Aware Data Lake: Data Inclusion vs. Availability. Red arrow: uncontroversial scientific project (5 datasets available); green arrow: somewhat controversial scientific project (3 datasets available); blue arrow: controversial scientific project (2 datasets available).
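
The contrast between the two models can be illustrated with a minimal Python sketch. It is not drawn from the paper: the metadata fields (`permitted_purposes`, `permitted_regions`), the dataset names, and the all-or-nothing reading of the commons' most-restrictive-dataset rule are simplifying assumptions made purely for illustration. Real legal metadata would likely rely on a standard vocabulary and cover far more conditions.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class LegalMetadata:
    """Hypothetical legal metadata for one dataset (field names are assumptions)."""
    permitted_purposes: frozenset[str]
    permitted_regions: frozenset[str]  # illustrative stand-in for data localization rules


@dataclass(frozen=True)
class Dataset:
    name: str
    legal: LegalMetadata


@dataclass(frozen=True)
class Request:
    """A researcher's proposed use: a purpose and the region of access."""
    purpose: str
    region: str


def is_compatible(dataset: Dataset, request: Request) -> bool:
    """True if the request respects this dataset's own legal requirements."""
    return (request.purpose in dataset.legal.permitted_purposes
            and request.region in dataset.legal.permitted_regions)


def data_lake_available(datasets: list[Dataset], request: Request) -> list[str]:
    """Policy-aware data lake: each dataset is evaluated independently."""
    return [d.name for d in datasets if is_compatible(d, request)]


def data_commons_available(datasets: list[Dataset], request: Request) -> list[str]:
    """Data commons (simplified): a use is permitted only if it respects the
    requirements of every included dataset, i.e. the most restrictive one."""
    if all(is_compatible(d, request) for d in datasets):
        return [d.name for d in datasets]
    return []


# Hypothetical datasets; names and permissions are invented for illustration.
datasets = [
    Dataset("cohort_a", LegalMetadata(frozenset({"covid19", "general_health"}),
                                      frozenset({"EU", "CA", "US"}))),
    Dataset("cohort_b", LegalMetadata(frozenset({"covid19"}),
                                      frozenset({"EU"}))),
    Dataset("cohort_c", LegalMetadata(frozenset({"general_health", "commercial"}),
                                      frozenset({"EU", "CA", "US"}))),
]

request = Request(purpose="covid19", region="EU")
print(data_lake_available(datasets, request))     # ['cohort_a', 'cohort_b']
print(data_commons_available(datasets, request))  # []
```

In this sketch the data lake releases the two datasets whose metadata permits the request, whereas the simplified commons releases nothing because one included dataset does not permit the purpose. A contributor who later changes a dataset's permissions only needs to replace that dataset's `LegalMetadata`; no other dataset is affected.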