Yue Zhang, Shu-Li Guo, Li-Na Han, Tie-Ling Li.
Abstract
OBJECTIVE: To review theories and technologies of big data mining and their application in clinical medicine. DATA SOURCES: Literature published in English or Chinese on the theories and technologies of big data mining and on concrete applications of data mining technology in clinical medicine was obtained from PubMed and the Chinese Hospital Knowledge Database, covering 1975 to 2015. STUDY SELECTION: Original articles on big data mining theory or technology and on applications of big data mining in the medical field were selected.
Year: 2016 PMID: 26960378 PMCID: PMC4804421 DOI: 10.4103/0366-6999.178019
Source DB: PubMed Journal: Chin Med J (Engl) ISSN: 0366-6999 Impact factor: 2.628
Technical features and application of various data mining theories
| Data mining theories | Advantage | Weakness | Application examples | Author | Reference |
|---|---|---|---|---|---|
| Fuzzy theory | Deals with incomplete data; does not need a complex mathematical model; it is easy to understand and use | Not thorough | Predicts the prognosis of prostate cancer; can classify and recognize medical images | Kuo | [ |
| Rough set theory | No prior information is required to process data; is able to handle data that cannot be distinguished from available properties; is simple and easy to operate | Difficult to deal with continuous attributes directly; unable to obtain sufficient support for objective facts | Helps individual patients identify homogeneous subgroups | Gil-Herrera | [ |
| Cloud theory | Provides a model for quantitative and qualitative analysis of uncertainty; is characterized by virtualization, high scalability, and service diversity | Data security and privacy protection are questionable | Provides a new health management model based on a cloud platform | Liu and Xiao | [ |
| Dempster-Shafer theory | Requires weaker conditions than Bayesian probability theory; has the ability to express uncertainty directly | Fusing conflicting evidence may yield counterintuitive conclusions | Assesses cancer treatment outcomes | Lian and Denoeux | [ |
| Artificial neural network | Has a strong ability to deal with uncertain information; can deal with both categorical and continuous variables; good robustness, self-adaptability, parallel processing, distributed storage, and high fault tolerance | Not suitable for high-dimensional variables; the learning and decision-making process of the network is opaque and difficult to understand | Assesses prognoses of heart health | Sunkaria | [ |
| Genetic algorithm | Can handle all types of data in parallel; is easy to combine with other models; solves global optimization problems with a solution that is independent of the initial conditions | Too many parameters are needed; encoding is difficult and the amount of calculation is large; guarantees only a trend toward the global optimum and cannot guarantee that the optimal result is reached | Predicts progress in Alzheimer’s disease | Johnson | [ |
| Inductive learning theory | Classifies quickly; is suitable for large databases without the need for prior knowledge | Will be misled by implicit inductive bias when training data is insufficient | Tests difference control hypothesis | Birnbaum | [ |
| Bayesian network | Has conditional independence; simplifies problem solving; complexity of knowledge acquisition and reasoning is low | Uses acyclic assumption and applies to static analysis only | Predicts the risk of radiation pneumonitis | Lee | [ |
| Decision tree | Uses the decision tree diagram; is intuitive, simple, and clear; has high speed of classification; the decision-making process is visible and suitable for large-scale data processing | Difficult to express complex concepts; insufficient emphasis on the relationship between the same characters; its noise immunity is poor | Assesses risk of public health events | Yang | [ |
| Pattern recognition | Easily identified; reflects pattern structures; has a strong anti-interference ability | Has rejection rate and error rate; difficult to select base unit when interference is encountered | Denoises magnetic resonance images | Shi and Luo | [ |
| High-performance computing | High computing power | High cost; programs are difficult to design | A new method for high-performance computing is proposed: DAIRRy-BLUP | De Coninck | [ |
| Statistical analysis | The most basic data mining technology; operation is simple with less workload | Poor accuracy and reliability; high requirement for data integrity | Provides treatment plans for children with epilepsy | Hortigüela-Saeta | [ |
BLUP: Best linear unbiased prediction.
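
The weakness listed for Dempster-Shafer theory in the table (fusing conflicting evidence can give counterintuitive conclusions) can be illustrated with a small sketch of Dempster's rule of combination. This is not taken from the reviewed studies: the function name `combine`, the hypothesis labels `A`, `B`, `C`, and the mass values are all illustrative assumptions, chosen so that two sources disagree almost completely.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses (a focal element)
    to its mass. Returns the fused mass function and the conflict K,
    i.e. the total mass assigned to pairs with an empty intersection.
    """
    fused = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            fused[common] = fused.get(common, 0.0) + wa * wb
        else:
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Normalise by the non-conflicting mass (1 - K).
    return {k: v / (1.0 - conflict) for k, v in fused.items()}, conflict


# Hypothetical evidence: source 1 strongly supports A, source 2 strongly
# supports C, and both give B only a very small mass.
m1 = {frozenset({"A"}): 0.99, frozenset({"B"}): 0.01}
m2 = {frozenset({"C"}): 0.99, frozenset({"B"}): 0.01}

fused, conflict = combine(m1, m2)
for hypothesis, mass in fused.items():
    print(sorted(hypothesis), round(mass, 6))
print("conflict:", round(conflict, 6))
```

In this setup the only non-empty intersection is `B`, so after normalisation `B` receives essentially all of the fused mass even though both sources considered it very unlikely. The conflict value is close to 1, which is a common signal that the combined result should be treated with caution.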