Sri Venkat Gunturi Subrahmanya, Dasharathraj K Shetty, Vathsala Patil, B M Zeeshan Hameed, Rahul Paul, Komal Smriti, Nithesh Naik, Bhaskar K Somani.
Abstract
Data science is an interdisciplinary field that extracts knowledge and insights from structured and unstructured data using scientific methods, data mining techniques, machine-learning algorithms, and big data. The healthcare industry generates large datasets of useful information on patient demographics, treatment plans, results of medical examinations, insurance, and more. Data collected from Internet of Things (IoT) devices is also attracting the attention of data scientists. Data science helps process, manage, analyze, and assimilate the large quantities of fragmented, structured, and unstructured data created by healthcare systems. These data require effective management and analysis to yield reliable results. This article reviews and discusses the processes of data cleansing, data mining, data preparation, and data analysis used in healthcare applications. It provides insight into the status and prospects of big data analytics in healthcare, highlights the advantages, describes the frameworks and techniques used, outlines the current challenges, and discusses viable solutions. Data science and big data analytics can provide practical insights and aid strategic decision-making concerning the health system. They help build a comprehensive view of patients, consumers, and clinicians. Data-driven decision-making opens up new possibilities to boost healthcare quality.
Keywords: Big data; Data analytics; Data mining; Healthcare; Healthcare informatics
Year: 2021 PMID: 34398394 PMCID: PMC9308575 DOI: 10.1007/s11845-021-02730-z
Source DB: PubMed Journal: Ir J Med Sci ISSN: 0021-1265 Impact factor: 2.089
Open-source big data platforms and their utilities

| Big data tools | Utilities |
|---|---|
| Apache Hadoop | Hadoop is designed to scale from single servers up to thousands of machines, each of which offers local storage. The framework lets users easily build and validate distributed systems, and it distributes data and operates across machines automatically. |
| Apache Spark | Spark works flexibly with the Hadoop Distributed File System (HDFS) and other data stores. It offers integrated application programming interfaces (APIs) that let users write applications in different languages. |
| Apache Cassandra | Cassandra is highly flexible: additional hardware can be added on demand to handle more data and users. It accommodates unstructured, structured, and semi-structured data and supports features such as Atomicity, Consistency, Isolation, and Durability (ACID). |
| Apache Storm | Apache Storm integrates easily with any programming language and supports use cases such as real-time analytics, online machine learning, and computation. It runs parallel calculations across a cluster of machines. |
| RapidMiner | RapidMiner provides a variety of products for data mining. It offers an integrated environment for data preparation, machine learning, text mining, visualization, predictive analysis, statistical modeling, application development, prototype validation, implementation, and deployment. |
| Cloudera | Users can spin up clusters, terminate them, and pay only for what they need. Cloudera Enterprise can be deployed and run on AWS and Google Cloud Platform. |

Fig. 1 Sources of big data in healthcare
Fig. 2 Various applications of data science in healthcare
Fig. 3 The disease analysis system
Fig. 4 Role of big data in accelerating the treatment process
Fig. 5 Elemental structure of patient-centric healthcare and ecosystem