Pui Pui Tang1, I Lam Tam1, Yongliang Jia2,3, Siu-Wai Leung4,5.
Abstract
INTRODUCTION: Big data technologies have been talked up in the fields of science and medicine. The V-criteria (volume, variety, velocity and veracity, etc) for defining big data are well known and are quoted in most research articles; however, big data research into public health is often misrepresented because of certain common misconceptions. Such misrepresentations and misconceptions can mislead study designs, research findings and healthcare decision-making. This study aims to identify the V-eligibility of studies, and of the technologies they apply, in environmental health and health services research that explicitly claim to be big data studies.
METHODS AND ANALYSIS: Our protocol follows the Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols (PRISMA-P). A scoping review and/or systematic review will be conducted. The results will be reported using PRISMA for Scoping Reviews (PRISMA-ScR), or PRISMA 2020 and the Synthesis Without Meta-analysis guideline. Web of Science, PubMed, Medline and ProQuest Central will be searched for articles from database inception to 2021. Two reviewers will independently select eligible studies and extract specified data. The numeric data will be analysed with R statistical software. The text data will be analysed with NVivo wherever applicable.
ETHICS AND DISSEMINATION: This study will review the literature of big data research related to both environmental health and health services. Ethics approval is not required as all data are publicly available and do not involve confidential personal data. We will disseminate our findings in a peer-reviewed journal.
PROSPERO REGISTRATION NUMBER: CRD42021202306.
© Author(s) (or their employer(s)) 2022. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ.
Keywords: biotechnology & bioinformatics; health services administration & management; occupational & industrial medicine; public health
Year: 2022 PMID: 35318232 PMCID: PMC8943752 DOI: 10.1136/bmjopen-2021-053447
Source DB: PubMed Journal: BMJ Open ISSN: 2044-6055 Impact factor: 2.692
V-criteria for big data, particularly data generation and processing
| Item | Details |
| Volume | Terabytes, petabytes, or above |
| Variety | In various forms (ie, structured, semi-structured, and unstructured data) and from various sources of data |
| Velocity | In real time or near real time |
| Veracity | Highly consistent, traceable, and reliable data |
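As a rough illustration of how the criteria in the table above might be applied to the data described in a single study, the Python sketch below checks each V in turn. The `StudyData` fields, the decimal-terabyte cut-off (10^12 bytes), the reading of "variety" as more than one data form and more than one source, and the `"batch"` label are all assumptions made for this example; the protocol itself does not specify an implementation.

```python
from dataclasses import dataclass, field

# Assumption: "terabytes or above" is read as at least one decimal terabyte.
TERABYTE = 10**12


@dataclass
class StudyData:
    """Hypothetical summary of the data described in one study."""
    size_bytes: int                     # total data size reported
    data_forms: set = field(default_factory=set)  # e.g. {"structured", "unstructured"}
    n_sources: int = 1                  # number of distinct data sources
    timeliness: str = "batch"           # "real time", "near real time" or "batch"
    reliable: bool = False              # consistent, traceable and reliable data


def v_eligibility(study):
    """Return a dict mapping each V to True or False for one study."""
    return {
        "volume": study.size_bytes >= TERABYTE,
        # Assumption: variety needs several forms AND several sources.
        "variety": len(study.data_forms) > 1 and study.n_sources > 1,
        "velocity": study.timeliness in {"real time", "near real time"},
        "veracity": study.reliable,
    }


def count_vs(study):
    """Summarise the number of criteria met as a label such as '3Vs'."""
    met = sum(v_eligibility(study).values())
    return f"{met}Vs"
```

A call such as `count_vs(StudyData(size_bytes=2 * TERABYTE, ...))` would then give a label in the same style as the "V(s)" column of the extraction form, although how reviewers actually judge each criterion is a matter for the protocol, not this sketch.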
Figure 1. Flowchart diagram for study selection.
Information extraction form (example)
| Title | Author(s) | Year | Country | Journal | Type of study | Subgroup | V(s) | Volume (number of records) | Volume per record | Volume (total sample size) | Variety (type(s) of data) | Velocity (frequency of generation) | Velocity (frequency of handling, recording, publishing) | Veracity | Data analysis |
| … | … | 2020 | … | … | Original research article | Big data application: environmental pollution | 3Vs | … | … | … | Structured/unstructured | Real time | Daily | N/A | … |
| … | … | 2013 | … | … | Original research article | Big data application: work environment and health | 4Vs | … | … | … | Unstructured | Near real time | At time | … | … |
| … | … | 2017 | … | … | Methodology/method | Big data techniques | 2Vs | … | … | … | Structured | N/A | N/A | N/A | … |
| … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … |
N/A, not available from the papers.
Descriptive statistics of the articles on big data in environmental health or health services
| Environmental health/health services | |
| Publication information | |
| Years of publication | …(…) (…) |
| JCR quartiles (Q1–Q4) | |
| Countries of corresponding authors | |
| Institutions | |
| Departments (medical and IT, etc) | |
| Funders (universities, governments, companies, etc) | |
| Volume | |
| Number of records | …(…) (…) |
| Sample size/number of participants | …(…) (…) |
| Number of variables | …(…) (…) |
| Time points for longitudinal studies | |
| Data size (terabytes) | |
| Variety (type(s) of data), N (%) | |
| Structured | … (…%) |
| Semi-structured | … (…%) |
| Unstructured | … (…%) |
| … | … |
JCR, Journal Citation Reports.
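The "N (%)" cells in the table above are simple counts with percentages. The protocol states that numeric data will be analysed with R; purely as an illustration, the Python sketch below shows one way such cells could be computed from a list of per-study labels. The function name `n_percent` and the assumption that each study carries exactly one data-type label are hypothetical.

```python
from collections import Counter


def n_percent(labels):
    """Format the count and percentage of each label as 'N (P%)'.

    Assumption: `labels` holds one data-type label per included study.
    An empty list yields an empty dict.
    """
    counts = Counter(labels)
    total = len(labels)
    return {
        label: f"{n} ({100 * n / total:.1f}%)"
        for label, n in counts.items()
    }


if __name__ == "__main__":
    # Made-up labels, only to show the output format.
    example = ["structured", "structured", "unstructured", "semi-structured"]
    for label, cell in n_percent(example).items():
        print(label, cell)
```

Studies that use several data types at once would need a different denominator (for example, counting each study once per type), which this sketch does not attempt.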
Examples of data synthesis
| Type of study | Title | Authors | Year | Categories | V(s) | Volume (number of records) | Volume per record | Volume (total sample size) | Variety (type(s) of data) | Velocity (frequency of generation) | Velocity (frequency of handling, recording, publishing) | Veracity | Data analysis |
| Original research article | … | … | 2020 | Big data application: environmental pollution | 3Vs | Positive | Unclear | Positive | Positive | Positive | Negative | N/A | Negative |
| Original research article | … | … | 2013 | Big data application: work environment and health | 4Vs | Positive | Positive | Positive | Positive | Positive | Positive | Positive | Negative |
| Methodology/method | … | … | 2017 | Big data techniques | 2Vs | Positive | Negative | Positive | Negative | N/A | N/A | N/A | Negative |
| … | … | … | … | … | … | … | … | … | … | … | … | … | … |
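One way to summarise a synthesis table of this kind is to tally, for each assessed item, how many studies were rated Positive, Negative, Unclear or N/A. The Python sketch below is a minimal example under that assumption; the function `tally_ratings`, the dictionary layout of the rows and the sample ratings are illustrative and are not taken from the protocol's analysis plan.

```python
from collections import Counter

# Rating labels as they appear in the example synthesis table.
RATINGS = ("Positive", "Negative", "Unclear", "N/A")


def tally_ratings(rows, column):
    """Count how often each rating occurs in one column across studies.

    `rows` is a list of dicts keyed by column name (a hypothetical layout).
    Ratings that never occur are reported with a count of 0.
    """
    counts = Counter(row[column] for row in rows)
    return {rating: counts.get(rating, 0) for rating in RATINGS}


if __name__ == "__main__":
    # Illustrative rows only; real rows would come from the extraction form.
    rows = [
        {"Veracity": "N/A", "Data analysis": "Negative"},
        {"Veracity": "Positive", "Data analysis": "Negative"},
        {"Veracity": "N/A", "Data analysis": "Negative"},
    ]
    print(tally_ratings(rows, "Veracity"))
```

Keeping zero counts in the result makes it easier to line the tallies up across columns when several items are summarised side by side.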
Application of big data analytics techniques for environmental health
| Analytics techniques | Area | Environmental health | Sources |
| Machine learning | Water | … | … |
| | Air | … | … |
| Graph analytics | Food | … | … |
| | Soil | … | … |
| | Wastes | … | … |
| … | … | … | … |
Application of big data analytics techniques for health services
| Analytics techniques | Area | Healthcare services | Sources |
| Machine learning (ML) | Neurology | … | … |
| Neural network | Epidemiology | … | … |
| … | … | … | … |
Descriptive statistics of the characteristics of the included studies
| Field of techniques | Number of papers |
| General description | |
| AI only, N (%) | …(…%) |
| Traditional statistics only, N (%) | …(…%) |
| Both AI and traditional methods, N (%) | …(…%) |
| AI methods, N (%) | …(…%) |
| Deep learning, N | … |
| Classic ML, N | … |
| Linear regression | … |
| Logistic regression | … |
| Naïve Bayes | … |
| … | … |
| Traditional methods, N (%) | …(…%) |
| Regression methods, N | … |
| … | … |
Classic ML: linear regression, logistic regression, naïve Bayes, decision tree, k-nearest neighbour, random forest, discriminant analysis, support vector machine and neural network.
ML, machine learning.
Summary statistics of the included studies on different fields of application for environmental health
| Field of application | Number of papers, N (%) |
| Environmental pollution, N (%) | …(…%) |
| Water | …(…%) |
| Air | …(…%) |
| Food | …(…%) |
| Soil | …(…%) |
| Wastes | …(…%) |
| Environmental health hazards, N (%) | …(…%) |
| Chemical | …(…%) |
| Climate | …(…%) |
| Biological | …(…%) |
| Environmental exposure, N (%) | …(…%) |
| Monitoring | …(…%) |
| Exposure assessment | …(…%) |
| Environmental illness, N (%) | …(…%) |
| Cancer | …(…%) |
| Respiratory | …(…%) |
| Birth defects and developmental diseases | …(…%) |
| Work environment and health, N (%) | …(…%) |
| Occupational exposure | …(…%) |
| Occupational disease | …(…%) |
| … | … |
| … |
Summary statistics of the included studies on different fields of application for health services
| Field of application | Number of papers, N (%) |
| Medical specialties, N (%) | …(…%) |
| Imaging | …(…%) |
| Neurology | …(…%) |
| Public health, N (%) | …(…%) |
| Public health | …(…%) |
| Epidemiology | …(…%) |
| … | … |
| … |
Environmental health or health services data sources
| Data sources | Data types | Sources |
| Clinical data | EHRs, clinical trial data, etc | … |
| Wearable and sensor data | Personal vital signs, ECG, etc | … |
| … | … | … |
EHRs, electronic health records.