Kathleen D Muenzen, Laura M Amendola, Tia L Kauffman, Kathleen F Mittendorf, Jeannette T Bensen, Flavia Chen, Richard Green, Bradford C Powell, Mark Kvale, Frank Angelo, Laura Farnan, Stephanie M Fullerton, Jill O Robinson, Tianran Li, Priyanka Murali, James M J Lawlor, Jeffrey Ou, Lucia A Hindorff, Gail P Jarvik, David R Crosslin.
Abstract
Integrating data across heterogeneous research environments is a key challenge in multi-site, collaborative research projects. While it is important to allow for natural variation in data collection protocols across research sites, it is also important to achieve interoperability between datasets in order to reap the full benefits of collaborative work. However, there are few standards to guide the data coordination process from project conception to completion. In this paper, we describe the experiences of the Clinical Sequencing Evidence-Generating Research (CSER) consortium Data Coordinating Center (DCC), which coordinated harmonized survey and genomic sequencing data from seven clinical research sites from 2020 to 2022. Using input from multiple consortium working groups and from CSER leadership, we first identify 14 lessons learned from CSER in the categories of communication, harmonization, informatics, compliance, and analytics. We then distill these lessons into 11 recommendations for future research consortia in the areas of planning, communication, informatics, and analytics. We recommend that planning and budgeting for data coordination activities occur as early as possible during consortium conceptualization and development to minimize downstream complications. We also find that clear, reciprocal, and continuous communication between consortium stakeholders and the DCC is as important as maintaining a secure and centralized informatics ecosystem for pooling data. Finally, we discuss the importance of actively interrogating current approaches to data governance, particularly for research studies that straddle the research-clinical divide.
Keywords: clinical research; data coordination; data governance; data harmonization; data management; data sharing; medical genomics; research collaboration; research informatics
Year: 2022 PMID: 35707062 PMCID: PMC9190054 DOI: 10.1016/j.xhgg.2022.100120
Source DB: PubMed Journal: HGG Adv ISSN: 2666-2477
Figure 1. CSER phase 2 data coordination and analysis timeline
Data coordination lessons learned in the CSER consortium
| Category | Lessons learned |
|---|---|
| Communication | Identify primary points of contact for addressing different data coordination requirements (e.g., technical infrastructure, data mapping, and consortium policy) using existing communication patterns among working groups and sites; Define the unique roles of different working groups in the data coordination process and use those roles to guide inter-group communication; Send periodic update emails with consolidated information (progress, resources, and action items) to key data coordination stakeholders |
| Harmonization | Provide data managers with standardized data collection instruments (templates) and specifications for mapping variables to those instruments (data dictionaries); Deploy rigorous version-control methods for data coordination resources that change over time and ensure that data managers are informed of changes; Implement standardized protocols and timelines for making changes to data collection instruments; Engage a multidisciplinary group of consortium members to develop and approve standardized data models |
| Informatics | Consolidate informatics tools and resources within a secure, centralized platform; Utilize available information technology (IT) expertise and resources at participating institutions; Prioritize security of informatics tools and disseminate security information to consortium members |
| Compliance | Engage a multidisciplinary group of consortium members to develop a harmonized set of data sharing consent categories; Use multiple data-sharing specifications (e.g., institutional certifications, informed consents, and data use letters) to map site-level consent groups to consortium-level consent categories |
| Analytics | Document data-quality issues and unique aspects of the harmonized dataset and plan to distribute documentation to both current and future data users; Facilitate access to onboarding resources for users of shared data analysis platforms like the AnVIL |
Figure 2. Methods of communication between groups involved in CSER data coordination
Figure 3. Sample harmonization process for one variable in the communication satisfaction measure, across all seven CSER projects
To map participant responses to the Participant Post-Return of Results (RoR) Follow-Up no. 1 harmonized import template, each site created a local mapping between the site-level variable name and the harmonized variable name (comsat1_pfu1 for pediatric surveys and comsat1_afu1 for adult surveys) and documented any changes in question wording. Some sites were also required to map alternate response encodings to the harmonized response scale. For example, site 2 administered the question with a reversed response scale (where 1 is “very satisfied” on the harmonized scale and 4 is “very satisfied” on the site scale) and recoded its responses to the harmonized scale accordingly (1 = 4, 2 = 3, 3 = 2, and 4 = 1). Similarly, site 5 administered the question with an additional response option and was instructed to map these responses to blank values (5 = "").
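The recoding described in this caption can be sketched in a few lines of Python. This is an illustrative sketch only, not the DCC's actual tooling: the site keys (`site2`, `site5`), the function names, and the choice of `None` to represent a blank value are all assumptions; only the harmonized variable names and the recoding rules come from the caption.

```python
from typing import Optional

# Hypothetical per-site recoding tables for one harmonized variable.
# Keys are site-level response codes; values are harmonized codes.
# None stands in for a blank value in the harmonized template (assumption).
IDENTITY = {1: 1, 2: 2, 3: 3, 4: 4}
SITE_RECODES = {
    "site2": {1: 4, 2: 3, 3: 2, 4: 1},           # reversed response scale
    "site5": {1: 1, 2: 2, 3: 3, 4: 4, 5: None},  # extra option mapped to blank
}

# Harmonized variable names given in the caption.
HARMONIZED_NAMES = {"pediatric": "comsat1_pfu1", "adult": "comsat1_afu1"}


def harmonize_response(site: str, response: Optional[int]) -> Optional[int]:
    """Map a site-level response code to the harmonized scale."""
    if response is None:
        return None  # a missing answer stays blank
    recode = SITE_RECODES.get(site, IDENTITY)
    if response not in recode:
        raise ValueError(f"unexpected response {response!r} for {site}")
    return recode[response]


def harmonize_record(site: str, survey_type: str, response: Optional[int]) -> dict:
    """Return one harmonized field keyed by the harmonized variable name."""
    name = HARMONIZED_NAMES[survey_type]
    return {name: harmonize_response(site, response)}


if __name__ == "__main__":
    print(harmonize_record("site2", "adult", 1))
    print(harmonize_record("site5", "pediatric", 5))
```

Keeping each site's recoding in an explicit lookup table, rather than in arithmetic such as `5 - response`, makes every mapping easy to review and rejects unexpected codes instead of silently passing them through.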
Figure 4. Movement of harmonized survey data (green) and sequence data (purple) between CSER data platforms
CDH, CSER Data Hub; CLI, command line interface; DCC, Data Coordinating Center; M&O, measures and outcomes; QPR, quarterly progress report; WS, web services.
Recommendations for consortium data coordination
| Category | Needs | Recommendations |
|---|---|---|
| Planning | clear expectations for internal and external data sharing | 1. Build data sharing expectations into expected scope of work in funding announcements (NIH |
| | sufficient financial resources and time for data coordination | 2. Budget for data coordination, management, and reporting at individual research sites (NIH |
| | integration between DCC and consortium | 3. Establish DCC at start of funding period, if not before (NIH |
| Communication | consolidation of communication channels | 4. Consolidate lines of communication from DCC to working groups and assign action items appropriately (DCC |
| | technical specifications for data sharing | 5. Maximize transparency of data coordination expectations and resources (NIH |
| | efficient use of diverse expertise available within the consortium | 6. Facilitate translation of critical information between stakeholder groups (DCC |
| Informatics | consolidation of informatics platforms for data coordination | 7. Deploy a secure, centralized web resource for data coordination (DCC |
| | flexibility in response to unforeseen events and changing analysis plans | 8. Build flexibility into central databases and data-management software (DCC |
| | correct implementation of site-level security and privacy agreements | 9. Prioritize data privacy and security during platform design (DCC |
| Analytics | high-quality and reliable data from heterogeneous sources | 10. Provide clear and detailed documentation of shared data resources (DCC |
| | integration of research and clinical practice; enhanced protection of data from vulnerable populations | 11. Document approaches to data governance (DCC |
The entities that should be responsible for each recommendation are listed in parentheses.