Tim Hulsen, Saumya S Jamuar, Alan R Moody, Jason H Karnes, Orsolya Varga, Stine Hedensted, Roberto Spreafico, David A Hafler, Eoin F McKinney
Abstract
For over a decade the term "Big data" has been used to describe the rapid increase in volume, variety and velocity of information available, not just in medical research but in almost every aspect of our lives. As scientists, we now have the capacity to rapidly generate, store and analyse data that, only a few years ago, would have taken many years to compile. However, "Big data" no longer means what it once did. The term has expanded and now refers not just to large data volume, but to our increasing ability to analyse and interpret those data. Tautologies such as "data analytics" and "data science" have emerged to describe approaches to the volume of available information as it grows ever larger. New methods dedicated to improving data collection, storage, cleaning, processing and interpretation continue to be developed, although not always by, or for, medical researchers. Exploiting new tools to extract meaning from large-volume information has the potential to drive real change in clinical practice, from personalized therapy and intelligent drug design to population screening and electronic health record mining. As ever, where new technology promises "Big Advances," significant challenges remain. Here we discuss both the opportunities and challenges posed to biomedical research by our increasing ability to tackle large datasets. Important challenges include the need for standardization of data content, format, and clinical definitions, a heightened need for collaborative networks with sharing of both data and expertise and, perhaps most importantly, a need to reconsider how and when analytic methodology is taught to medical researchers. We also set "Big data" analytics in context: recent advances may appear to promise a revolution, sweeping away conventional approaches to medical science. However, their real promise lies in their synergy with, not replacement of, classical hypothesis-driven methods.
The generation of novel, data-driven hypotheses based on interpretable models will always require stringent validation and experimental testing. Thus, hypothesis-generating research founded on large datasets adds to, rather than replaces, traditional hypothesis-driven science. Each can benefit from the other, and it is by using both that we can improve clinical practice.
Keywords: big data; big data analytics; data science; precision medicine; translational medicine
Year: 2019 PMID: 30881956 PMCID: PMC6405506 DOI: 10.3389/fmed.2019.00034
Source DB: PubMed Journal: Front Med (Lausanne) ISSN: 2296-858X
Figure 1. The synergistic cycle of hypothesis-driven and data-driven experimentation.
FAIR principles for data management and stewardship.
Findability: (meta)data are assigned a globally unique and persistent identifier; data are described with rich metadata; metadata clearly and explicitly include the identifier of the data it describes; (meta)data are registered or indexed in a searchable resource.
Accessibility: (meta)data are retrievable by their identifier using a standardized communications protocol; this protocol is open, free, and universally implementable, and allows for an authentication and authorization procedure, where necessary; metadata are accessible, even when the data are no longer available.
Interoperability: (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation; they use vocabularies that follow FAIR principles; they include qualified references to other (meta)data.
Reusability: (meta)data are richly described with a plurality of accurate and relevant attributes; they are released with a clear and accessible data usage license; they are associated with detailed provenance; they meet domain-relevant community standards.
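As a rough illustration of what these criteria ask of a dataset description, the following sketch encodes a hypothetical metadata record as a Python dataclass and runs crude presence checks grouped by principle. The field names (`identifier`, `access_protocol`, `vocabularies`, and so on), the assumed set of open protocols, and the helper `fair_gaps` are illustrative assumptions; they are not defined by the FAIR principles or by any repository schema.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical metadata record: field names are illustrative assumptions,
# not a schema defined by the FAIR principles or any repository.
@dataclass
class DatasetMetadata:
    identifier: str = ""       # globally unique, persistent identifier
    description: str = ""      # stand-in for "rich metadata"
    access_protocol: str = ""  # protocol used to retrieve the data
    vocabularies: List[str] = field(default_factory=list)  # shared vocabularies
    license: str = ""          # data usage license
    provenance: str = ""       # where the data came from

# Assumed set of open, standardized retrieval protocols (illustrative only).
OPEN_PROTOCOLS = {"https", "http", "ftp"}

def fair_gaps(meta: DatasetMetadata) -> List[str]:
    """Return the FAIR-inspired presence checks that `meta` fails."""
    gaps = []
    # Findable: an identifier plus descriptive metadata.
    if not meta.identifier:
        gaps.append("findable: missing identifier")
    if not meta.description:
        gaps.append("findable: missing description")
    # Accessible: retrievable through an open, standard protocol.
    if meta.access_protocol.lower() not in OPEN_PROTOCOLS:
        gaps.append("accessible: no open standard protocol")
    # Interoperable: described with shared vocabularies.
    if not meta.vocabularies:
        gaps.append("interoperable: no shared vocabulary")
    # Reusable: a clear license and recorded provenance.
    if not meta.license:
        gaps.append("reusable: missing license")
    if not meta.provenance:
        gaps.append("reusable: missing provenance")
    return gaps

complete = DatasetMetadata(
    identifier="hypothetical-id-0001",
    description="Synthetic example record",
    access_protocol="https",
    vocabularies=["hypothetical-ontology"],
    license="CC-BY-4.0",
    provenance="Created for illustration",
)
sparse = DatasetMetadata(identifier="hypothetical-id-0002")

print(fair_gaps(complete))     # []
print(len(fair_gaps(sparse)))  # 5
```

Passing every check does not make a record FAIR; presence tests only flag obviously missing elements, whereas a real assessment also judges the quality of identifiers, vocabularies, licenses, and provenance.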
Figure 2. The concept of overfitting and model regularization.
Figure 3. Model performance evaluation.
Key proposed principles when assessing scientists.
Addressing societal needs is an important goal of scholarship.
Assessment of faculty should be based on responsible indicators that fully reflect the contribution to the scientific enterprise.
Publishing all research completely and transparently, regardless of the results, should be rewarded.
The culture of Open Research needs to be rewarded.
It is important to fund research that can provide an evidence base to inform optimal ways to assess science and faculty.
Funding out-of-the-box ideas needs to be valued in promotion and tenure decisions.