Amke Caliebe, Friedhelm Leverkus, Gerd Antes, Michael Krawczak.
Abstract
BACKGROUND: Use of big data is becoming increasingly popular in medical research. Since big data-based projects differ notably from classical research studies, both in terms of scope and quality, a debate is warranted as to whether big data require new approaches to scientific reasoning, different from those established in statistics and philosophy of science.
MAIN TEXT: The progressive digitalization of our societies generates vast amounts of data that are also becoming available for medical research. Here, the big promise of big data is to facilitate major improvements in the treatment, diagnosis and prevention of diseases. An ongoing examination of the idiosyncrasies of big data is therefore essential to ensure that the field stays congruent with the principles of evidence-based medicine. We discuss the inherent challenges and opportunities of big data in medicine from a methodological point of view, particularly highlighting the relative importance of causality and correlation in commercial and medical research settings. We make a strong case for upholding the distinction between exploratory data analysis, which facilitates hypothesis generation, and confirmatory approaches, which involve hypothesis validation. Independent verification of research results will become ever more important in the context of big data, where data quality is often hampered by a lack of standardization and structuring.
Keywords: Big data; Causality; Correlation; Data quality; Digitalization; Hypothesis generation; Scientific methodology; Validation
Year: 2019 PMID: 31208367 PMCID: PMC6580448 DOI: 10.1186/s12874-019-0774-0
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
Fig. 1 Comparison of big data studies and controlled clinical trials. Whereas big data studies (left inset) usually benefit from the opportunistic use of existing data resources, controlled clinical trials (CCT, right inset) follow a hypothesis-driven study design that determines the type, amount and provenance of the data to be collected. The subsequent data analyses may be methodologically similar or even identical, but the results of the two study types serve rather different purposes: the outcome of a big data study is, at best, a new hypothesis that would require verification in a CCT or controlled experiment to count as ‘scientific’ (dotted arrow). A CCT, by contrast, allows validation (i.e. falsification or verification) of the initial hypothesis, potentially stimulating further studies geared towards the solidification, modification or diversification of this hypothesis.
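The distinction between hypothesis generation and hypothesis validation can be illustrated with a small simulation. This is not taken from the article; it is a minimal sketch under loudly stated assumptions. The data are simulated, with 1000 hypothetical candidate features of which only feature 0 is genuinely associated with the outcome. An exploratory screen of one data set, run without any multiplicity correction, flags many chance correlations. Only the pre-specified candidates are then tested on an independent data set, with a Bonferroni correction. The helper names (`simulate`, `explore`, `confirm`), the effect size, the sample sizes and the choice of Bonferroni are all illustrative assumptions. The p-values use Fisher's z-transform, which is an approximation.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def p_value(r, n):
    """Approximate two-sided p-value for H0: rho = 0 (Fisher z-transform)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(n, n_features, seed):
    """Hypothetical data: only feature 0 truly influences the outcome."""
    rng = random.Random(seed)
    features = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_features)]
    outcome = [features[0][i] + rng.gauss(0, 1) for i in range(n)]
    return features, outcome


def explore(features, outcome, alpha=0.05):
    """Exploratory screen of every feature, no multiplicity correction.
    The result is a list of candidate hypotheses, not of findings."""
    n = len(outcome)
    return [j for j, x in enumerate(features)
            if p_value(pearson_r(x, outcome), n) < alpha]


def confirm(candidates, features, outcome, alpha=0.05):
    """Confirmatory test of the pre-specified candidates only, on independent
    data, Bonferroni-corrected for the number of candidates tested."""
    if not candidates:
        return []
    n = len(outcome)
    level = alpha / len(candidates)
    return [j for j in candidates
            if p_value(pearson_r(features[j], outcome), n) < level]


# Discovery and validation data are drawn independently (different seeds).
disc_x, disc_y = simulate(n=200, n_features=1000, seed=1)
val_x, val_y = simulate(n=200, n_features=1000, seed=2)

candidates = explore(disc_x, disc_y)
confirmed = confirm(candidates, val_x, val_y)
print(f"exploratory hits: {len(candidates)}, confirmed: {confirmed}")
```

In this setup, the exploratory step typically flags around 50 features, which is roughly the 5% of null features expected by chance plus the true one. Independent validation typically retains only feature 0. Note also that the simulation builds a correlation in by construction. In real observational data, even a replicated correlation need not reflect causation.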