Choong Ho Lee, Hyung-Jin Yoon.
Abstract
The concept of big data, commonly characterized by volume, variety, velocity, and veracity, goes far beyond the data type and includes aspects of data analysis, such as being hypothesis-generating rather than hypothesis-testing. Big data focuses on the temporal stability of an association rather than on causal relationships, and underlying probability distribution assumptions are frequently not required. Medical big data, as material to be analyzed, has various features that are distinct not only from big data in other disciplines but also from traditional clinical epidemiology. Big data technology has many areas of application in healthcare, such as predictive modeling and clinical decision support, disease or safety surveillance, public health, and research. Big data analytics frequently exploits analytic methods developed in data mining, including classification, clustering, and regression. Medical big data analyses are complicated by many technical issues, such as missing values, the curse of dimensionality, and bias control, and they share the inherent limitations of observational studies, namely the inability to test causality because of residual confounding and reverse causation. Recently, propensity score analysis and instrumental variable analysis have been introduced to overcome these limitations, and they have accomplished a great deal. Many challenges, such as the absence of evidence of practical benefits of big data, methodological issues including legal and ethical issues, and clinical integration and utility issues, must be overcome to realize the promise of medical big data as the fuel of a continuous learning healthcare system that will improve patient outcomes and reduce waste in areas including nephrology.
Keywords: Big data; Data mining; Epidemiology; Healthcare; Statistics
Year: 2017 PMID: 28392994 PMCID: PMC5331970 DOI: 10.23876/j.krcp.2017.36.1.3
Source DB: PubMed Journal: Kidney Res Clin Pract ISSN: 2211-9132
Figure 1. A continuous learning healthcare system.
Medical big data analysis vs. classical statistical analysis
| | Medical big data analysis | Classical statistical analysis |
|---|---|---|
| Application | Hypothesis-generating | Hypothesis-testing |
| Questions of interest | Overcoming the limitation of locally or temporally stable associations by continually updating the data and algorithm | Trying to prove causal relationships |
| Domain knowledge | More important in interpretation of the results | Important both in collection of data and interpretation of the results |
| Sources of data | Any kind of sources; frequently multiple sources | Carefully specified collection of data; usually single source |
| Data collection | Recording without the direct supervision of a human | Human-based measurement recording |
| Coverage of data to be analyzed | Substantial fraction of entire population | Small data samples from a specific population with some assumptions of their distribution |
| Data size | Frequently huge | Relatively small |
| Nature of data | Unstructured and structured | Mainly structured |
| Data quality | Rarely clean | Quality controlled |
| Research questions of data analysis | May be different from those of data collection | Same as those of data collection |
| Underlying assumption of the model | Frequently absent | Based on various underlying probability distribution functions |
| Analytic tools | Frequently automated with data mining algorithms | Performed manually by experts using classical statistics |
| Main outputs of analysis | Prediction, models, patterns identified | Statistical score contrasted against random chance |
| Privacy & ethics | Concerns about privacy and ethical issues | Data collection according to the pre-approved protocol; informed consent from the participants |
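
The abstract names propensity score analysis as one way to address confounding in observational data. The sketch below is a minimal illustration of that idea, not the authors' method. It builds a small synthetic cohort with one binary confounder and estimates the propensity score as the treated fraction within each confounder stratum. This is a simplification assumed here in place of the logistic regression model that is more commonly used. It then compares a naive difference in means with an inverse-probability-weighted difference. All function names, the cohort counts, and the effect sizes are hypothetical.

```python
from collections import defaultdict


def make_cohort():
    """Build a synthetic cohort of (confounder, treated, outcome) tuples.

    Assumption for illustration: the outcome is 2 * confounder + 1 * treated,
    so the true treatment effect is 1, and treatment is more common when the
    confounder is present.
    """
    rows = []
    # (confounder value, number treated, number untreated)
    for c, n_treated, n_untreated in [(0, 20, 80), (1, 80, 20)]:
        for t, n in [(1, n_treated), (0, n_untreated)]:
            rows.extend([(c, t, 2.0 * c + 1.0 * t)] * n)
    return rows


def naive_difference(rows):
    """Unadjusted difference in mean outcome, treated minus untreated."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def propensity_by_stratum(rows):
    """Estimate P(treated | confounder) as the treated fraction per stratum."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [n_treated, n_total]
    for c, t, _ in rows:
        counts[c][0] += t
        counts[c][1] += 1
    return {c: n_t / n for c, (n_t, n) in counts.items()}


def ipw_difference(rows):
    """Inverse-probability-weighted difference in mean outcome.

    Treated subjects are weighted by 1 / e and untreated subjects by
    1 / (1 - e), where e is the estimated propensity score. Weights are
    normalized within each arm.
    """
    e = propensity_by_stratum(rows)
    num_t = den_t = num_c = den_c = 0.0
    for c, t, y in rows:
        if t == 1:
            w = 1.0 / e[c]
            num_t += w * y
            den_t += w
        else:
            w = 1.0 / (1.0 - e[c])
            num_c += w * y
            den_c += w
    return num_t / den_t - num_c / den_c


if __name__ == "__main__":
    cohort = make_cohort()
    print("naive difference:", naive_difference(cohort))
    print("weighted difference:", ipw_difference(cohort))
```

In this synthetic cohort the naive difference is about 2.2, because treated patients carry the confounder more often, while the weighted difference is close to the assumed true effect of 1. The adjustment works only because the single confounder is measured; unmeasured confounders would leave the residual confounding that the abstract describes.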