Big Data Analytics in Medicine and Healthcare.

Abstract

This paper surveys big data, with emphasis on big data analytics in medicine and healthcare. The characteristics of big data (value, volume, velocity, variety, veracity and variability) are described. Big data analytics in medicine and healthcare covers the integration and analysis of large amounts of complex heterogeneous data, such as various -omics data (genomics, epigenomics, transcriptomics, proteomics, metabolomics, interactomics, pharmacogenomics, diseasomics), biomedical data and electronic health record data. We underline the challenging issues of big data privacy and security. With regard to these characteristics, we give some directions for using suitable and promising open-source distributed data processing software platforms.

Keywords: Big Data Analytics; Data Mining; Health Informatics; Healthcare Information Systems

Year: 2018 PMID: 29746254 PMCID: PMC6340124 DOI: 10.1515/jib-2017-0030

Source DB: PubMed Journal: J Integr Bioinform ISSN: 1613-4516

Introduction

To provide the best services and care for patients, healthcare organizations in many countries have proposed various models of healthcare information systems. These models for personalized, predictive, participatory and preventive medicine are based on electronic health records (EHRs) and huge amounts of complex biomedical data and high-quality -omics data [1]. Contemporary genomics and postgenomics technologies produce huge amounts of raw data about complex biochemical and regulatory processes in living organisms [2]. These -omics data are heterogeneous and are very often stored in different data formats. Like the -omics data, EHR data are also heterogeneous in format: they can be structured, semi-structured or unstructured, and discrete or continuous. Big data in healthcare and medicine refers to these various large and complex data, which are difficult to analyse and manage with traditional software or hardware [3], [4]. Big data analytics covers integration of heterogeneous data, data quality control, analysis, modeling, interpretation and validation [5]. Applying big data analytics enables comprehensive knowledge discovery from the huge amount of available data. In particular, big data analytics in medicine and healthcare enables analysis of large datasets from thousands of patients, identification of clusters and correlations between datasets, and development of predictive models using data mining techniques [2]. Big data analytics in medicine and healthcare integrates analyses from several scientific areas, such as bioinformatics, medical imaging, sensor informatics, medical informatics and health informatics. A survey of big data cases in medical and healthcare institutions and organizations is given in [6]. The new knowledge discovered by big data analytics techniques should provide comprehensive benefits to patients, clinicians and health policy makers [7].
The remainder of the paper is organized as follows. Related work is described in the second section. Section 3 describes the characteristics of big data, and big data analytics is presented in the section after it. The next section explains some challenging issues in big data analytics techniques, while big data privacy and security are described in Section 6. The last section concludes the paper with a discussion and future work.

Related Work

The rapid development of emerging information technologies, experimental technologies and methods, cloud computing, the Internet of Things and social networks means that the amount of generated data is growing tremendously in numerous research fields [8]. In this context, contemporary genomics and postgenomics technologies produce huge amounts of raw data about complex biochemical and regulatory processes in living organisms [2]. These high-throughput -omics data provide comprehensive insight into different kinds of molecular profiles, changes and interactions, such as knowledge related to the genome, epigenome, transcriptome, proteome, metabolome, interactome, pharmacogenome, diseasome, etc. [9]. These -omics data are heterogeneous and very often stored in different data formats. The main aims and characteristics of the different -omics disciplines are listed in Table 1.

Table 1: The main aims of the various -omics disciplines.

– omics	The aim of study
Genomics	Study of the set of all genes in an organism; in a broader context, non-coding parts of DNA are also a subject of study
Epigenomics	Study of all epigenomic modifications on the genetic material within a cell
Transcriptomics	Study of the expression level of all RNAs in a particular cell or cell population
Proteomics	Study of the complete set of proteins expressed by a genome in a given cell type or organism, under defined conditions and at a given time, including all possible interactions that a protein can present
Metabolomics	Study of the whole set of the metabolites (small-molecule compounds) within a cell, an organelle, a tissue, an organ or an organism
Interactomics	Study of the entire set of interactions (both physical and indirect) between and among proteins and other molecules within a particular cell, and of the consequences of those interactions. These interactions are displayed as graphs and called biological networks
Pharmacogenomics	Study that combines pharmacology and genomics in order to analyse the role of the genome in an individual’s drug response
Diseasomics	Study of all diseases and disorders of an organism, often focusing on those diseases and disorders caused by genetic modifications

Like these -omics data, EHR data are also stored in heterogeneous formats. EHR data, which can be structured, semi-structured or unstructured, and discrete or continuous, contain patients’ personal data, clinical notes, diagnoses, administrative data, charts, tables, prescriptions, procedures, lab tests and medical images such as magnetic resonance imaging (MRI), ultrasound and computed tomography (CT) data. Some of these data are acquired from wearable sensors or captured from medical monitoring devices at different collection frequencies [5], which gives the data complex features and high dimensionality [10]. Dealing with the noisiness and incompleteness of EHRs is still a challenging task, and these shortcomings should be considered when applying data mining techniques [11]. These growing amounts of various -omics data need to be collected, cleaned, stored, transformed, transferred, visualized and delivered in a form suitable for presentation to clinicians [12]. The processing of these big data in medicine and healthcare can be accelerated by using cloud computing and powerful multicore central processing units (CPUs), graphics processing units (GPUs) and field-programmable gate arrays (FPGAs) with parallel processing methods.

Big Data Characteristics

The term big data is described by the following characteristics: value, volume, velocity, variety, veracity and variability, denoted as the 6 “Vs” [13], [14] and shown in Figure 1. Some authors have defined additional properties beyond these 6 “Vs” to describe big data [15].

Figure 1: The 6 V’s of big data.

The volume of health and medical data is expected to rise dramatically in the years ahead, and is usually measured in terabytes, petabytes or even yottabytes [14], [16]. Volume refers to the amount of data, while velocity refers to data in motion as well as to the speed and frequency of data creation, processing and analysis. Variety refers to the complexity and heterogeneity of multiple datasets, which can be structured, semi-structured and unstructured. Veracity refers to data quality, relevance, uncertainty, reliability and predictive value [14], while variability concerns the consistency of the data over time. The value of big data refers to their coherent analysis, which should be valuable to patients and clinicians. Considering these characteristics, together with the requirements of data searching, storage and analysis, an appropriate and promising software platform for developing applications that can handle big data in medicine and healthcare is the open-source distributed data processing platform Apache Hadoop MapReduce [1], [17], which is based on data-intensive computing and NoSQL data modeling techniques [18].
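The MapReduce programming model named above can be illustrated with a minimal, single-process sketch in Python. It does not use Hadoop or any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`), the record layout and the diagnosis codes are assumptions made purely for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit a (diagnosis_code, 1) pair for every diagnosis in one record."""
    return [(code, 1) for code in record["diagnoses"]]


def shuffle(pairs):
    """Group intermediate values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts collected for one diagnosis code."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce sequentially in a single process."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


# Hypothetical toy records; a real job would read from distributed storage.
records = [
    {"patient": "p1", "diagnoses": ["E11", "I10"]},
    {"patient": "p2", "diagnoses": ["I10"]},
    {"patient": "p3", "diagnoses": ["E11", "I10", "J45"]},
]
counts = run_job(records)
```

In a distributed setting the map and reduce functions would run in parallel on different nodes, with the framework handling the grouping step; the sketch only shows how the computation is decomposed.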

Big Data Analytics

Applications of big data analytics can improve patient-based services, detect spreading diseases earlier, generate new insights into disease mechanisms, monitor the quality of medical and healthcare institutions and provide better treatment methods [19], [20], [21]. Data mining techniques applied to EHRs, web and social media data make it possible to identify optimal practical guidelines in hospitals, identify association rules in EHRs [22] and reveal disease monitoring and health-based trends. Moreover, integration and analysis of data of different natures, such as social and scientific data, can lead to new knowledge and intelligence, the exploration of new hypotheses and the identification of hidden patterns [14]. Nowadays, smartphones are excellent platforms for delivering personal messages to patients, engaging them in behavioral changes that improve their wellbeing and health. Mobile phone messages can substitute for other means of delivering medical and motivational advice to patients [14].
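As an illustration of what an association rule over EHR data measures, the following minimal sketch computes the standard support and confidence of a rule. The helper names and the toy patient records are assumptions made for illustration; they are not taken from the cited work.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from the transactions."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / base


# Hypothetical toy data: each set lists the conditions recorded for one patient.
patients = [
    {"diabetes", "hypertension"},
    {"diabetes", "hypertension", "obesity"},
    {"hypertension"},
    {"diabetes", "obesity"},
]

rule_support = support(patients, {"diabetes", "hypertension"})
rule_confidence = confidence(patients, {"diabetes"}, {"hypertension"})
```

Here the rule "diabetes implies hypertension" holds in two of the four records, and in two of the three records that mention diabetes. Practical rule mining would additionally search over candidate itemsets and apply minimum support and confidence thresholds.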

Challenges in Big Data Analytics

Several challenging issues should be considered when collecting large amounts of data. Obtaining high-throughput -omics data is tied to the cost of experimental measurements. Given the heterogeneity of the data sources, the noise in experimental -omics data and the variety of experimental techniques, environmental conditions and biological nature should be considered before integrating these heterogeneous data and before applying data mining methods. Different data mining techniques can be applied to these heterogeneous biomedical datasets, such as anomaly detection, clustering, classification and association rules, as well as summarization and visualization of the big datasets. The noise and heterogeneity of the data might make some data points unreliable, for example through missing values or outliers. Beyond these drawbacks of the -omics data, EHR data are strongly influenced by the staff who enter the patients’ data, which can lead to missing values and to incorrect data as a result of mistakes, misunderstanding or wrong interpretation of the original data [5]. Integration of data from various databases and standardization of laboratory protocols and values remain challenging issues [10]. High dimensionality of the -omics data means that there are many more dimensions or features than samples; together with the EHR data, which relate to individual patients, this makes applying data mining techniques a more challenging task. The subsequent stage is data pre-processing, which usually involves handling noisy data, outliers and missing values, as well as data transformation and normalization. This pre-processing enables statistical techniques and data mining methods to be applied, and thus can improve the quality and outcomes of big data analytics and lead to the discovery of novel knowledge.
This novel knowledge, obtained by integrating -omics and EHR data, should improve the healthcare delivered to patients and support better decision making by healthcare policy makers.
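The pre-processing steps listed above (missing values, outliers, normalization) can be sketched with the Python standard library. The specific choices here, median imputation, clipping to 1.5 times the interquartile range, and z-score scaling, are common defaults assumed for illustration; the paper does not prescribe particular methods, and the glucose readings are hypothetical.

```python
import statistics


def impute_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def clip_outliers(values, k=1.5):
    """Clip values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


def z_score(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


# Hypothetical lab readings with one missing value and one implausible entry.
glucose = [5.1, None, 4.8, 5.4, 30.0, 5.0, 5.2, 4.9]

imputed = impute_missing(glucose)
clipped = clip_outliers(imputed)
normalized = z_score(clipped)
```

The order of the steps matters: imputing before clipping keeps the sample size stable, and clipping before scaling prevents a single extreme reading from dominating the mean and standard deviation.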

Big Data Privacy and Security

Two important issues for big data in healthcare and medicine are the security and privacy of individuals/patients [14], [23]. All medical data are very sensitive, and different countries consider these data to be legally owned by the patients [2]. To address these security and privacy challenges, big data analytics software solutions should use advanced encryption algorithms and pseudo-anonymization of personal data. These solutions should provide network-level security and authentication for all involved users, guarantee privacy and security, and establish good governance standards and practices.
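One common way to pseudo-anonymize patient identifiers is a keyed hash. The sketch below uses HMAC-SHA256 from the Python standard library; the function names, the record fields and the inline key are assumptions made for illustration, and a real deployment would need managed key storage and a broader de-identification review.

```python
import hashlib
import hmac


def pseudonymize(patient_id: str, secret_key: bytes) -> str:
    """Map an identifier to a stable pseudonym using HMAC-SHA256."""
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def strip_identifiers(record, secret_key, direct_identifiers=("name", "address")):
    """Drop direct identifiers and replace the patient ID with a pseudonym."""
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in direct_identifiers and key != "patient_id"
    }
    cleaned["pseudonym"] = pseudonymize(record["patient_id"], secret_key)
    return cleaned


# Hypothetical key and record; never hard-code a real key in source code.
key = b"example-key-for-illustration-only"
record = {
    "patient_id": "MRN-0001",
    "name": "Jane Doe",
    "address": "1 Example Street",
    "diagnosis": "I10",
}
safe_record = strip_identifiers(record, key)
```

Because the same key always yields the same pseudonym, records for one patient can still be linked across datasets, while anyone without the key cannot recompute the mapping from an identifier. Pseudonymized data can remain re-identifiable through other attributes, so this step complements encryption and access control rather than replacing them.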

Discussion and Future Work

Big data analytics in medicine and healthcare is a very promising process of integrating, exploring and analysing large amounts of complex heterogeneous data of different natures: biomedical data, experimental data, electronic health record data and social media data. Integrating such diverse data means that big data analytics intertwines several fields, such as bioinformatics, medical imaging, sensor informatics, medical informatics, health informatics and computational biomedicine. As future work, the big data characteristics provide an appropriate basis for using promising software platforms to develop applications that can handle big data in medicine and healthcare. One such platform is the open-source distributed data processing platform Apache Hadoop MapReduce, which uses massively parallel processing (MPP) [20], [24]. These applications should enable data mining techniques to be applied to these heterogeneous and complex data to reveal hidden patterns and novel knowledge. Recent hardware innovations in processor technology and newer kinds of memory and network architecture will minimize the time spent moving data from storage to the processor in a distributed setting [25].
