Petros Kalendralis1, Zhenwei Shi1, Alberto Traverso1, Ananya Choudhury1, Matthijs Sloep1, Ivan Zhovannik1,2, Martijn P A Starmans3,4, Detlef Grittner5, Peter Feltens5, Rene Monshouwer2, Stefan Klein3,4, Rianne Fijten1, Hugo Aerts6,7, Andre Dekker1, Johan van Soest1, Leonard Wee1. 1. Department of Radiation Oncology (Maastro), GROW School for Oncology, Maastricht University Medical Centre+, Maastricht, 6229 ET, The Netherlands. 2. Department of Radiation Oncology, Radboud University Medical Center, Nijmegen, 6525 GC, The Netherlands. 3. Department of Radiology and Nuclear Medicine, Erasmus Medical Center, Rotterdam, 3015 GD, The Netherlands. 4. Department of Medical Informatics, Erasmus Medical Center, Rotterdam, 3015 GD, The Netherlands. 5. SOHARD Software GmbH, Fuerth, 90766, Germany. 6. Artificial Intelligence in Medicine (AIM) Program, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, United States. 7. Radiology and Nuclear Medicine, CARIM & GROW School for Oncology, Maastricht University, Maastricht, 6211 LK, The Netherlands.
Abstract
PURPOSE: One of the most frequently cited radiomics investigations showed that features automatically extracted from routine clinical images could be used in prognostic modeling. These images have been made publicly accessible via The Cancer Imaging Archive (TCIA). There have been numerous requests for additional explanatory metadata on the following datasets - RIDER, Interobserver, Lung1, and Head-Neck1. To support repeatability, reproducibility, generalizability, and transparency in radiomics research, we publish the subjects' clinical data, extracted radiomics features, and digital imaging and communications in medicine (DICOM) headers of these four datasets with descriptive metadata, in order to be more compliant with findable, accessible, interoperable, and reusable (FAIR) data management principles. ACQUISITION AND VALIDATION METHODS: Overall survival time intervals were updated using a national citizens registry after internal ethics board approval. Spatial offsets of the primary gross tumor volume (GTV) regions of interest (ROIs) associated with the Lung1 CT series were improved on the TCIA. GTV radiomics features were extracted using the open-source Ontology-Guided Radiomics Analysis Workflow (O-RAW). We reshaped the output of O-RAW to map features and extraction settings to the latest version of Radiomics Ontology, so as to be consistent with the Image Biomarker Standardization Initiative (IBSI). Digital imaging and communications in medicine metadata was extracted using a research version of Semantic DICOM (SOHARD, GmbH, Fuerth; Germany). Subjects' clinical data were described with metadata using the Radiation Oncology Ontology. All of the above were published in Resource Descriptor Format (RDF), that is, triples. Example SPARQL queries are shared with the reader to use on the online triples archive, which are intended to illustrate how to exploit this data submission. DATA FORMAT: The accumulated RDF data are publicly accessible through a SPARQL endpoint where the triples are archived. The endpoint is remotely queried through a graph database web application at http://sparql.cancerdata.org. SPARQL queries are intrinsically federated, such that we can efficiently cross-reference clinical, DICOM, and radiomics data within a single query, while being agnostic to the original data format and coding system. The federated queries work in the same way even if the RDF data were partitioned across multiple servers and dispersed physical locations. POTENTIAL APPLICATIONS: The public availability of these data resources is intended to support radiomics features replication, repeatability, and reproducibility studies by the academic community. The example SPARQL queries may be freely used and modified by readers depending on their research question. Data interoperability and reusability are supported by referencing existing public ontologies. The RDF data are readily findable and accessible through the aforementioned link. Scripts used to create the RDF are made available at a code repository linked to this submission: https://gitlab.com/UM-CDS/FAIR-compliant_clinical_radiomics_and_DICOM_metadata.
PURPOSE: One of the most frequently cited radiomics investigations showed that features automatically extracted from routine clinical images could be used in prognostic modeling. These images have been made publicly accessible via The Cancer Imaging Archive (TCIA). There have been numerous requests for additional explanatory metadata on the following datasets - RIDER, Interobserver, Lung1, and Head-Neck1. To support repeatability, reproducibility, generalizability, and transparency in radiomics research, we publish the subjects' clinical data, extracted radiomics features, and digital imaging and communications in medicine (DICOM) headers of these four datasets with descriptive metadata, in order to be more compliant with findable, accessible, interoperable, and reusable (FAIR) data management principles. ACQUISITION AND VALIDATION METHODS: Overall survival time intervals were updated using a national citizens registry after internal ethics board approval. Spatial offsets of the primary gross tumor volume (GTV) regions of interest (ROIs) associated with the Lung1 CT series were improved on the TCIA. GTV radiomics features were extracted using the open-source Ontology-Guided Radiomics Analysis Workflow (O-RAW). We reshaped the output of O-RAW to map features and extraction settings to the latest version of Radiomics Ontology, so as to be consistent with the Image Biomarker Standardization Initiative (IBSI). Digital imaging and communications in medicine metadata was extracted using a research version of Semantic DICOM (SOHARD, GmbH, Fuerth; Germany). Subjects' clinical data were described with metadata using the Radiation Oncology Ontology. All of the above were published in Resource Descriptor Format (RDF), that is, triples. Example SPARQL queries are shared with the reader to use on the online triples archive, which are intended to illustrate how to exploit this data submission. DATA FORMAT: The accumulated RDF data are publicly accessible through a SPARQL endpoint where the triples are archived. The endpoint is remotely queried through a graph database web application at http://sparql.cancerdata.org. SPARQL queries are intrinsically federated, such that we can efficiently cross-reference clinical, DICOM, and radiomics data within a single query, while being agnostic to the original data format and coding system. The federated queries work in the same way even if the RDF data were partitioned across multiple servers and dispersed physical locations. POTENTIAL APPLICATIONS: The public availability of these data resources is intended to support radiomics features replication, repeatability, and reproducibility studies by the academic community. The example SPARQL queries may be freely used and modified by readers depending on their research question. Data interoperability and reusability are supported by referencing existing public ontologies. The RDF data are readily findable and accessible through the aforementioned link. Scripts used to create the RDF are made available at a code repository linked to this submission: https://gitlab.com/UM-CDS/FAIR-compliant_clinical_radiomics_and_DICOM_metadata.
Authors: Kenneth Clark; Bruce Vendt; Kirk Smith; John Freymann; Justin Kirby; Paul Koppel; Stephen Moore; Stanley Phillips; David Maffitt; Michael Pringle; Lawrence Tarbox; Fred Prior Journal: J Digit Imaging Date: 2013-12 Impact factor: 4.056
Authors: Zhenwei Shi; Kieran G Foley; Juan Pablo de Mey; Emiliano Spezi; Philip Whybra; Tom Crosby; Johan van Soest; Andre Dekker; Leonard Wee Journal: Front Oncol Date: 2019-12-16 Impact factor: 6.244
Authors: Alex Zwanenburg; Martin Vallières; Mahmoud A Abdalah; Hugo J W L Aerts; Vincent Andrearczyk; Aditya Apte; Saeed Ashrafinia; Spyridon Bakas; Roelof J Beukinga; Ronald Boellaard; Marta Bogowicz; Luca Boldrini; Irène Buvat; Gary J R Cook; Christos Davatzikos; Adrien Depeursinge; Marie-Charlotte Desseroit; Nicola Dinapoli; Cuong Viet Dinh; Sebastian Echegaray; Issam El Naqa; Andriy Y Fedorov; Roberto Gatta; Robert J Gillies; Vicky Goh; Michael Götz; Matthias Guckenberger; Sung Min Ha; Mathieu Hatt; Fabian Isensee; Philippe Lambin; Stefan Leger; Ralph T H Leijenaar; Jacopo Lenkowicz; Fiona Lippert; Are Losnegård; Klaus H Maier-Hein; Olivier Morin; Henning Müller; Sandy Napel; Christophe Nioche; Fanny Orlhac; Sarthak Pati; Elisabeth A G Pfaehler; Arman Rahmim; Arvind U K Rao; Jonas Scherer; Muhammad Musib Siddique; Nanna M Sijtsema; Jairo Socarras Fernandez; Emiliano Spezi; Roel J H M Steenbakkers; Stephanie Tanadini-Lang; Daniela Thorwarth; Esther G C Troost; Taman Upadhaya; Vincenzo Valentini; Lisanne V van Dijk; Joost van Griethuysen; Floris H P van Velden; Philip Whybra; Christian Richter; Steffen Löck Journal: Radiology Date: 2020-03-10 Impact factor: 29.146