Amrita Basu¹, Denise Warzel², Aras Eftekhari³, Justin S Kirby⁴, John Freymann⁴, Janice Knable⁵, Ashish Sharma⁶, Paula Jacobs².
¹University of California, San Francisco, San Francisco, CA.
²National Cancer Institute, Bethesda, MD.
³Attain, McLean, VA.
⁴Frederick National Laboratory for Cancer Research, Rockville, MD.
⁵Science Applications International Corporation, Reston, VA.
⁶Emory University, Atlanta, GA.
Abstract
PURPOSE: Data sharing can reduce costs, supports data aggregation, and facilitates reproducibility, which helps ensure quality research; however, data from heterogeneous systems require retrospective harmonization. This is a major hurdle for researchers who seek to leverage existing data. Efforts to achieve data interoperability center largely on the use of standards but ignore the problems of competing standards and the value of existing data. Interoperability therefore still relies on retrospective harmonization, and approaches to reduce this burden are needed. METHODS: The Cancer Imaging Archive (TCIA) is an example of an imaging repository that accepts data from diverse sources. It contains medical images from investigators worldwide as well as substantial nonimage data. Digital Imaging and Communications in Medicine (DICOM) standards enable querying across images, but TCIA does not enforce other standards for describing nonimage supporting data, such as treatment details and patient outcomes. In this study, we used 9 TCIA lung and brain nonimage files containing 659 fields to explore retrospective harmonization for cross-study query and aggregation, with the Genomic Data Commons (GDC) data elements as the target standards. Identifying the 41 fields that overlapped in 3 or more files and transforming 31 of them took 329.5 hours (about 2.3 months of effort) spread over 6 months. RESULTS: We characterized the issues encountered and developed recommendations for reducing the burden of retrospective harmonization. After harmonizing the data, we also developed a Web tool for exploring the harmonized collections. CONCLUSION: Although prospective use of standards can support interoperability, several issues complicate this goal. Our work identifies the retrospective harmonization issues that arise when reusing existing data and recommends national infrastructure to address them.
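To make the "overlapping fields in 3 or more files" criterion concrete, here is a minimal Python sketch, not the authors' method. It assumes each file's column headers are available as a list, and it uses a hypothetical hand-built synonym table that maps normalized local names to target elements. The file names, headers, and synonym table are invented for illustration; the target names are modeled loosely on GDC property names and are not a verified mapping.

```python
import re
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical column headers from three nonimage files (not real TCIA data).
FILES: Dict[str, List[str]] = {
    "collection_a.csv": ["Patient ID", "Gender", "Age at Diagnosis", "Survival (days)"],
    "collection_b.csv": ["patient_id", "sex", "age_dx", "Vital Status"],
    "collection_c.csv": ["PatientID", "Sex", "AgeAtDiagnosis", "vital status", "Histology"],
}

# Hypothetical synonym table: normalized local name -> target element name.
# Target names are illustrative, modeled loosely on GDC property names.
SYNONYMS: Dict[str, str] = {
    "patientid": "submitter_id",
    "gender": "gender",
    "sex": "gender",
    "ageatdiagnosis": "age_at_diagnosis",
    "agedx": "age_at_diagnosis",
    "vitalstatus": "vital_status",
}


def normalize(name: str) -> str:
    """Lowercase a header and drop every character except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def map_field(name: str) -> Optional[str]:
    """Return the target element for a local header, or None if it is unmapped."""
    return SYNONYMS.get(normalize(name))


def overlapping_fields(files: Dict[str, List[str]], min_files: int = 3) -> Dict[str, int]:
    """Count how many files contain each target element; keep those in >= min_files files."""
    counts: Counter = Counter()
    for headers in files.values():
        # Use a set so a file counts once even if two of its columns map to the same element.
        targets = {t for t in (map_field(h) for h in headers) if t is not None}
        counts.update(targets)
    return {target: n for target, n in counts.items() if n >= min_files}


if __name__ == "__main__":
    print(overlapping_fields(FILES))
```

Counting by target element rather than by raw header is what lets "Gender" and "sex" register as one overlapping field. The sketch deliberately leaves the expensive part, building the synonym table and transforming the values themselves, to a human curator.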