Charles Vesteghem, Rasmus Froberg Brøndum, Mads Sønderkær, Mia Sommer, Alexander Schmitz, Julie Støve Bødker, Karen Dybkær, Tarec Christoffer El-Galaly, Martin Bøgsted.
Abstract
Compelling research has recently shown that cancer is so heterogeneous that single research centres cannot produce enough data to fit prognostic and predictive models of sufficient accuracy. Data sharing in precision oncology is therefore of utmost importance. The Findable, Accessible, Interoperable and Reusable (FAIR) Data Principles have been developed to define good practices in data sharing. Motivated by the ambition of applying the FAIR Data Principles to our own clinical precision oncology implementations and research, we have performed a systematic literature review of potentially relevant initiatives. For clinical data, we suggest using the Genomic Data Commons model as a reference as it provides a field-tested and well-documented solution. Regarding classification of diagnosis, morphology and topography and drugs, we chose to follow the World Health Organization standards, i.e. ICD10, ICD-O-3 and Anatomical Therapeutic Chemical classifications, respectively. For the bioinformatics pipeline, the Genome Analysis ToolKit Best Practices using Docker containers offer a coherent solution and have therefore been selected. Regarding the naming of variants, we follow the Human Genome Variation Society's standard. For the IT infrastructure, we have built a centralized solution to participate in data sharing through federated solutions such as the Beacon Networks.
Keywords: FAIR Data Principles; data sharing; genomics; precision oncology; standards
Year: 2020 PMID: 31263868 PMCID: PMC7299292 DOI: 10.1093/bib/bbz044
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622
Literature review search queries
Figure 1. Flow chart of the reference selection process. The light grey part describes the collection of references from PubMed, Scopus and Web of Science and the removal of duplicate entries. The dark grey part describes the filtering of references based on titles. The black part describes the filtering of references based on abstracts. 'Not cancer' means 'Not directly applicable to cancer', 'Legal' means 'Treating mainly legal and/or ethics and/or privacy and/or policy issues', 'Not data sharing' means 'Not directly applicable to data sharing' and 'Wrong data' means 'Not directly applicable to clinical and/or genomic data in human'. 'Inaccessible' refers to references whose full content could not be accessed or that were not in English, Danish or French.
Figure 2. Comparison of centralized versus federated architectures. In the centralized architecture, each institution must upload their data to a centralized web server while in the federated architecture, data stay at their respective institutions, but each institution must implement an interface to make the data findable but not necessarily accessible.
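The difference between the two architectures can be illustrated with a minimal Python sketch. It is not the Beacon API or any implementation described in the article: the `Institution` class, the `has_variant` interface, the two query functions and the variant coordinates are all hypothetical and chosen only for illustration. In the federated case each institution answers only a yes/no question about whether it holds a variant, so the data are findable without the records being handed over.

```python
from dataclasses import dataclass, field

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele).
Variant = tuple[str, int, str, str]


@dataclass
class Institution:
    """A data holder in the federated model (illustrative, not a real API)."""
    name: str
    variants: set[Variant] = field(default_factory=set)

    def has_variant(self, variant: Variant) -> bool:
        # The interface reveals only presence or absence, never the records.
        return variant in self.variants


def centralized_query(central_store: set[Variant], variant: Variant) -> bool:
    """Centralized model: all institutions have uploaded into one store."""
    return variant in central_store


def federated_query(institutions: list[Institution], variant: Variant) -> dict[str, bool]:
    """Federated model: ask every institution and collect yes/no answers."""
    return {inst.name: inst.has_variant(variant) for inst in institutions}


# Made-up example data; the coordinates carry no biological meaning.
site_a = Institution("site_a", {("1", 1000, "A", "T")})
site_b = Institution("site_b", {("2", 2000, "G", "C")})
query = ("2", 2000, "G", "C")

# Centralized: the data leave the institutions and are pooled.
central_store = site_a.variants | site_b.variants
print(centralized_query(central_store, query))

# Federated: the data stay in place; only the answers travel.
print(federated_query([site_a, site_b], query))
```

In this sketch the federated answer also tells the requester which institution to contact, and getting access to the underlying records would then be a separate step.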