Andrew McDavid, Anthony M Corbett, Jennifer L Dutra, Andrew G Straw, David J Topham, Gloria S Pryhuber, Mary T Caserta, Steven R Gill, Kristin M Scheible, Jeanne Holden-Wiltse.
Abstract
INTRODUCTION: In clinical and translational research, data science is often, and fortuitously, integrated with data collection. This contrasts with the typical position of data scientists in other settings, where they are isolated from data collectors. Because of this integration, effective use of data science techniques to resolve translational questions requires innovation in the organization and management of these data.
Keywords: Data analysis; bioinformatics; data management; data science; databases; pediatric; research informatics; systems biology
Year: 2020 PMID: 33948240 PMCID: PMC8057476 DOI: 10.1017/cts.2020.501
Source DB: PubMed Journal: J Clin Transl Sci ISSN: 2059-8661
Fig. 1. A data science workflow in clinical and translational teams. The lifecycle of a Team Data Science project begins with data collection and proceeds in a nonlinear and iterative fashion until conclusions are communicated and data and models are available for reuse (1a). Study personnel will interact in varying degrees with different aspects of the data science lifecycle (1b), while a data scientist visits all phases. Bolded interactions highlight a primary use of a role, while dashed lines indicate ancillary uses.
Eight practices to implement team data science
| Practice | Example |
|---|---|
| 1. Active collaboration | Data engineers and analysts meet regularly with data collectors and domain experts |
| 2. Consistent schema, field names, and identifiers | Data engineers introduce appropriate names and formats for study variables |
| 3. Continuous quality control | Data evaluated for internal and external consistency and quality continually and automatically |
| 4. Versioning, access control, and auditing | Users have differential privileges to read and change data. Changes are tracked and can be replayed. |
| 5. User-driven data exploration | Charting tools are provided for quick and independent exploration of data |
| 6. Import of derived variables | Variables derived by team members are published in the central database |
| 7. Defined data export format and programming interfaces | Data are available easily and scriptably in open formats |
| 8. Online documentation | Documentation for data and pipelines is placed close to the means of accessing them |
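
As an illustration of practices 2 and 3, the following is a minimal sketch of an automated check that could run each time new data arrive. It is not the authors' implementation: the snake_case naming rule, the one-row-per-participant layout, the field names (`participant_id`, `gestational_age_weeks`), and the plausible range are all hypothetical, chosen only to show the shape of such a check.

```python
import re
from dataclasses import dataclass

# Hypothetical naming convention: lowercase snake_case field names.
FIELD_NAME = re.compile(r"^[a-z][a-z0-9_]*$")


@dataclass(frozen=True)
class Issue:
    row: int      # index of the offending record
    field: str    # field that failed the check
    message: str  # human-readable description


def check_records(records, required, ranges):
    """Return the list of Issues found in a list of dict records.

    required: field names every record must contain.
    ranges:   {field: (low, high)} plausible numeric bounds.
    Assumes one record per participant, keyed by "participant_id".
    """
    issues = []
    seen_ids = set()
    for i, rec in enumerate(records):
        # Practice 2: consistent field names.
        for name in rec:
            if not FIELD_NAME.match(name):
                issues.append(Issue(i, name, "field name is not snake_case"))
        for name in required:
            if rec.get(name) is None:
                issues.append(Issue(i, name, "required field is missing"))
        # Practice 2: consistent identifiers (no duplicates in this table).
        pid = rec.get("participant_id")
        if pid is not None:
            if pid in seen_ids:
                issues.append(Issue(i, "participant_id", "duplicate identifier"))
            seen_ids.add(pid)
        # Practice 3: internal consistency via plausible value ranges.
        for name, (low, high) in ranges.items():
            value = rec.get(name)
            if value is not None and not (low <= value <= high):
                issues.append(
                    Issue(i, name, f"value {value} outside [{low}, {high}]")
                )
    return issues


# Illustrative records only; these are not study data.
records = [
    {"participant_id": "P001", "gestational_age_weeks": 39},
    {"participant_id": "P002", "gestational_age_weeks": 93},
    {"participant_id": "P002", "GestAge": 38},
]
issues = check_records(
    records,
    required=["participant_id", "gestational_age_weeks"],
    ranges={"gestational_age_weeks": (22, 44)},
)
for issue in issues:
    print(issue)
```

Returning a list of issues, rather than raising on the first failure, lets such a report be shared with data collectors in one pass, which fits the active collaboration described in practice 1.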
Fig. 2. A high-level overview of how study personnel interact with the BLIS data management platform. Clinicians, technicians, and experimentalists generate data for different aspects of the study. Data engineers implement the centralized study portal using the BLIS data management platform, with responsibility to connect all elements of the workflow and interact continuously with all study team members.