Jonathan Lawson1, Moran N Cabili1, Giselle Kerry2, Tiffany Boughtwood3, Adrian Thorogood4,5, Pinar Alper5, Sarion R Bowers6, Rebecca R Boyles7, Anthony J Brookes8, Matthew Brush9, Tony Burdett2, Hayley Clissold6, Stacey Donnelly1, Stephanie O M Dyke10, Mallory A Freeberg2, Melissa A Haendel9, Chihiro Hata11, Petr Holub12, Francis Jeanson13, Aina Jene14, Minae Kawashima15, Shuichi Kawashima16, Melissa Konopko17, Irene Kyomugisha18, Haoyuan Li19, Mikael Linden20, Laura Lyman Rodriguez21, Mizuki Morita22, Nicola Mulder23, Jean Muller24,25, Satoshi Nagaie26, Jamal Nasir27, Soichi Ogishima26, Vivian Ota Wang28, Laura D Paglione29, Ravi N Pandya30, Helen Parkinson2, Anthony A Philippakis1, Fabian Prasser31, Jordi Rambla14, Kathy Reinold1, Gregory A Rushton1, Andrea Saltzman1, Gary Saunders17, Heidi J Sofia32, John D Spalding2, Morris A Swertz33, Ilia Tulchinsky34, Esther J van Enckevort33, Susheel Varma35, Craig Voisin34, Natsuko Yamamoto36, Chisato Yamasaki36, Lyndon Zass23, Jaime M Guidry Auvil28, Tommi H Nyrönen20, Mélanie Courtot2.
Abstract
Human biomedical datasets that are critical for research and clinical studies to benefit human health also often contain sensitive or potentially identifying information of individual participants. Thus, care must be taken when they are processed and made available to comply with ethical and regulatory frameworks and informed consent data conditions. To enable and streamline data access for these biomedical datasets, the Global Alliance for Genomics and Health (GA4GH) Data Use and Researcher Identities (DURI) work stream developed and approved the Data Use Ontology (DUO) standard. DUO is a hierarchical vocabulary of human and machine-readable data use terms that consistently and unambiguously represents a dataset's allowable data uses. DUO has been implemented by major international stakeholders such as the Broad and Sanger Institutes and is currently used in annotation of over 200,000 datasets worldwide. Using DUO in data management and access facilitates researchers' discovery and access of relevant datasets. DUO annotations increase the FAIRness of datasets and support data linkages using common data use profiles when integrating the data for secondary analyses. DUO is implemented in the Web Ontology Language (OWL) and, to increase community awareness and engagement, hosted in an open, centralized GitHub repository. DUO, together with the GA4GH Passport standard, offers a new, efficient, and streamlined data authorization and access framework that has enabled increased sharing of biomedical datasets worldwide.
Keywords: FAIR; GA4GH; automated data access; consent; controlled access; data access; data restrictions; ontology; secondary data use; standard
Year: 2021 PMID: 34820659 PMCID: PMC8591903 DOI: 10.1016/j.xgen.2021.100028
Source DB: PubMed Journal: Cell Genom ISSN: 2666-979X
Count of datasets annotated with DUO by data custodian as of February 2021
| Data custodian | Datasets annotated with DUO |
|---|---|
| Broad Institute | 225 |
| Sanger Institute | 700 |
| EGA | 1,021 |
| HDR UK | 568 |
| BBMRI-ERIC | In progress. Manual for data managers with guidance for DUO annotations released: |
| AMED Biobank Network (GEM Japan) | 203,900 |
| Australian Genomics | 14 |
| H3Africa | 16 |
A census of datasets annotated with DUO in February 2021 highlights widespread adoption of the standard. Early implementers such as EGA are now requiring DUO annotation upon dataset submission. New partners such as BBMRI-ERIC are only starting the annotation process. AMED Biobank has made a very large number of DUO annotations, as they consider each sample to be its own dataset. An example implementation in the EGA is described in supplemental information.
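As a quick arithmetic check on the table above, the following minimal Python sketch sums the reported counts and computes the share contributed by the AMED Biobank Network. The counts are copied from the table; the variable names are illustrative, and BBMRI-ERIC is omitted because its annotation was still in progress.

```python
# Dataset counts per data custodian, copied from the February 2021 table.
# BBMRI-ERIC is left out because no count was reported ("In progress").
duo_dataset_counts = {
    "Broad Institute": 225,
    "Sanger Institute": 700,
    "EGA": 1021,
    "HDR UK": 568,
    "AMED Biobank Network (GEM Japan)": 203900,
    "Australian Genomics": 14,
    "H3Africa": 16,
}

total = sum(duo_dataset_counts.values())
amed_share = duo_dataset_counts["AMED Biobank Network (GEM Japan)"] / total

print(f"Total annotated datasets: {total:,}")
print(f"AMED share of total: {amed_share:.1%}")
```

The total is consistent with the "over 200,000 datasets" figure in the abstract, and almost all of it comes from AMED, which counts each sample as its own dataset.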
Figure 1. Data Use Ontology permissions and modifiers
DUO is a hierarchical vocabulary of data use terms most often used to denote secondary usage conditions for controlled access datasets. DUO does not aim to represent all possible data use terms, consent phrases, or complex logical permutations of permissions, limitations, or requirements. As of June 2021, DUO contains 25 terms representing two types of data use terms, permissions and modifiers. Permissions such as General Research Use (GRU), Health or Medical or Biomedical use (HMB), Disease Specific research (DS), and Population Origins and Ancestry research (POA) standardize allowed usage of the datasets. Modifiers are used to further qualify main categories of controlled access.
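The split between permissions and modifiers can be sketched as a small data structure. This is a minimal Python illustration, not the way any DUO implementer stores annotations. The four permission abbreviations come from the text; the two modifier codes, the class name `DataUseAnnotation`, and the one-permission-per-annotation rule are assumptions made for the example.

```python
from dataclasses import dataclass, field

# Permission abbreviations named in the text.
PERMISSIONS = {
    "GRU": "General Research Use",
    "HMB": "Health or Medical or Biomedical use",
    "DS": "Disease Specific research",
    "POA": "Population Origins and Ancestry research",
}

# Illustrative modifier codes (assumption: the text does not list modifiers).
MODIFIERS = {
    "IRB": "ethics approval required",
    "NPU": "not-for-profit use only",
}


@dataclass(frozen=True)
class DataUseAnnotation:
    """Hypothetical record: one permission plus zero or more modifiers."""

    permission: str
    modifiers: frozenset = field(default_factory=frozenset)

    def __post_init__(self):
        # Reject codes outside the two controlled lists above.
        if self.permission not in PERMISSIONS:
            raise ValueError(f"unknown permission: {self.permission}")
        unknown = set(self.modifiers) - set(MODIFIERS)
        if unknown:
            raise ValueError(f"unknown modifiers: {sorted(unknown)}")

    def describe(self) -> str:
        # Human-readable form: the permission first, then sorted modifiers.
        parts = [PERMISSIONS[self.permission]]
        parts += [MODIFIERS[m] for m in sorted(self.modifiers)]
        return "; ".join(parts)


annotation = DataUseAnnotation("HMB", frozenset({"IRB"}))
print(annotation.describe())
```

Keeping the codes in controlled dictionaries mirrors the idea of a fixed vocabulary: free-text data use conditions are rejected rather than silently stored.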
Figure 2. Browsing the Data Use Ontology
The DUO OWL file has been loaded into human-friendly browsers such as the Ontology Lookup Service (OLS). This enables interactive navigation through the hierarchy and display of additional properties such as definition, comment, or relations to other terms. For example, the entry for the “disease specific research” DUO term, http://purl.obolibrary.org/obo/DUO_0000007, clarifies that the term should be used in conjunction with a term from a disease ontology. The “Preferred root terms” button (middle, active green checkbox) shows the user the top DUO classes instead of the complex upper-level BFO hierarchy (accessible by selecting “All terms”).
Figure 3. Current implementations of the Data Use Ontology
DUO has been implemented to annotate genomics datasets worldwide. As of November 2021, implementers include repositories, databases, and projects in North America, Europe, Africa, Asia, and Australia.
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| ELK reasoner | Kazakov et al., 2014 | |
| Ontology Lookup Service | Jupp et al., 2015 | |
| Ontology Development Kit | | |
| DUO GitHub repository | This manuscript | |
| Released DUO file | This manuscript | |