
Promise and pitfalls in the application of big data to occupational and environmental health.

David M Stieb, Cécile R Boot, Michelle C Turner


Year:  2017        PMID: 28482822      PMCID: PMC5422881          DOI: 10.1186/s12889-017-4286-8

Source DB:  PubMed          Journal:  BMC Public Health        ISSN: 1471-2458            Impact factor:   3.295


Is “big data” merely a catchphrase, or does the approach hold real promise in informing occupational and environmental health? Can challenges related to messy and unrepresentative data and spurious findings be overcome?

Promise

The potential power of big data to inform public health decision-making has been widely recognized [1, 2]. However, there is a paucity of published primary research employing these methods, in this journal and elsewhere [3, 4]. The American Journal of Public Health has encouraged new research in this area and recently appointed an inaugural associate editor for digital health [3]. Big data are typically defined in relation to the “three Vs”: volume, velocity and variety (and, more recently, variability, veracity and value) [5]. Other defining characteristics include the emergence of new data sources and providers, such as social media, mobile applications and wearable technology such as fitness trackers (the “quantified self” [6]); the need for new analytical methods such as machine learning; non-traditional multidisciplinary partnerships; and real-time analysis and forecasting [7]. Along similar lines, sharing of clinical trial and other study data has been advocated as a means of broadening access to data and more fully exploiting their collective power. Pooling data increases statistical power, which could facilitate earlier detection of small signals, something that may be particularly important in environmental health; it also enhances the ability to examine heterogeneity between diverse populations and allows consideration of novel hypotheses not tested by the original investigators [8]. Data sharing initiatives must overcome barriers, including the need to protect original investigators, particularly those in low-resource countries [9], and issues related to data ownership, privacy and security [8]. The Healthy Birth, Growth, and Development–Knowledge Integration initiative is an example of a data sharing initiative that has navigated many of these issues [8].
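The statistical-power argument for pooling can be made concrete with the standard sample-size approximation for comparing two group means. This is a simplified sketch under assumptions introduced here for illustration: k equally sized studies, each with n participants per group, a common standard deviation σ, a two-sided significance level α, power 1 − β, and no between-study heterogeneity.

```latex
\delta_{\min} = \left(z_{1-\alpha/2} + z_{1-\beta}\right)\sigma\sqrt{\frac{2}{n}}
\qquad\Longrightarrow\qquad
\delta_{\min}^{\mathrm{pooled}} = \left(z_{1-\alpha/2} + z_{1-\beta}\right)\sigma\sqrt{\frac{2}{kn}} = \frac{\delta_{\min}}{\sqrt{k}}
```

With α = 0.05 and 80% power (z values of 1.96 and 0.84, summing to 2.80), a single study with 500 participants per group can detect a difference of about 2.80 × √(2/500) ≈ 0.18σ, while pooling four such studies lowers this to about 0.09σ. Real pooled analyses must also model heterogeneity between populations, which this approximation ignores.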
Barriers to the international sharing of routinely collected public health data, including technical, motivational, economic, political, legal and ethical factors, also need to be addressed [10].

Exposure analysis is the keystone of occupational and environmental health. As a result, the concept of big data in this context is closely linked to that of the exposome, the totality of human environmental, occupational and other exposures from conception to death [11]. These exposures interact with other determinants of internal dose and health effects, which are characterized by their own data-rich “omes” (the genome, metabolome, lipidome, transcriptome and proteome, among others), all of which require novel data analysis methods [11–14]. The exposome may be characterized using a vast array of methods, including measurement of exogenous and endogenous biomarkers in biological specimens, direct environmental monitoring using dedicated sensors, and indirect sources such as operational data from metering and energy use and facilities management data [12, 15–17].

Pitfalls

As a counterpoint to the potential of big data, one of the primary concerns is the potential for spurious findings (described at their worst as “fanciful rubbish” or “big error”) generated by employing “much bigger and messier data” [2, 7]. Related to these limitations are epistemological issues concerning how big data are analyzed and how knowledge is generated from them. Some have gone so far as to argue that big data analytics allow the data to “speak for themselves,” free of a priori hypotheses and, by extension, of investigator bias; others have countered that, whether desirable or not, this is unattainable, since all data are framed by the methods and constructs under which they are collected [2, 18]. A hybrid approach has been advanced in which big data analysis, machine learning or “knowledge discovery” is guided by theory and practical experience, including a more selective choice of appropriate data sources and analysis methods, and ultimately the testing of hypotheses generated from initial analyses [2, 18]. An additional concern is that, to the extent that big data rely on consumer “data trails,” mobile devices, wearable technology or electronic medical records, they may exclude people with limited data footprints owing to barriers related to age, race, socioeconomic status, access to care or health literacy [5]. This has the potential to amplify environmental injustice concerns to the extent that it further disadvantages populations who already experience a disproportionate health burden related to environmental exposures [19].
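One common safeguard against spurious findings when many candidate exposures are screened at once, offered here as an illustration rather than a method the editorial prescribes, is to control the false discovery rate with the Benjamini–Hochberg procedure before carrying any hits forward to confirmatory hypothesis testing. A minimal Python sketch, assuming the p-values for each candidate exposure have already been computed:

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the (sorted) indices of hypotheses rejected by the
    Benjamini–Hochberg step-up procedure at false discovery rate q."""
    m = len(p_values)
    # Indices ordered from smallest to largest p-value
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that p_(k) <= (k / m) * q
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values
    return sorted(order[:k_max])
```

For example, `benjamini_hochberg([0.04, 0.001, 0.3, 0.012])` returns `[1, 3]`, whereas a naive p < 0.05 cut-off would also flag index 0. Hits that survive screening are candidates for testing in independent data, not confirmed findings.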

Application to occupational and environmental health

Notwithstanding these important caveats, the potential for big data to inform public health, and occupational and environmental health more specifically, has been recognized by several funding agencies. The National Institute of Environmental Health Sciences is part of a National Institutes of Health-wide data science initiative, “Big Data to Knowledge” (BD2K), which aims to facilitate wide use of data; develop methods, software and tools; build capacity through training; and support data infrastructure [20]. The European Commission recently issued a call for proposals on “Big data supporting Public Health Policies,” focusing on “how to better acquire, manage, share, model, process and exploit” big data for public health purposes and highlighting the opportunities they may provide to identify interactions between environmental, genetic and behavioral determinants of health [21]. Funded initiatives include the European Exposome Cluster [22], the US Health and Exposome Research Center: Understanding Lifetime Exposures (HERCULES) [23], and the CANadian Urban Environmental (CANUE) Health Research Consortium [24].

Research in both occupational and environmental health has made widespread use of large datasets for many years, and it is instructive to consider how this research has been transformed by the increasing application of big data and data sharing. In environmental health, air pollution epidemiology has a long history of combining routinely available administrative health or vital statistics data with environmental monitoring data, particularly to examine the effects of short-term variability in exposure using time-series or case-crossover analysis [25]. This approach was subsequently applied to the effects of long-term exposure by linking an existing cohort, the American Cancer Society cohort [26], to routinely available environmental data, replicating relatively inexpensively the findings of a dedicated cohort study, the Six Cities Study [27].
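The case-crossover design compares each person's exposure on the day of a health event with their own exposure on nearby control (“referent”) days, so that characteristics that do not change over the study window are controlled by design. The sketch below shows one commonly used referent scheme, the time-stratified approach (same weekday within the same calendar month); the scheme, the function names and the daily exposure mapping are assumptions for illustration, and the conditional logistic regression used to estimate effects is not shown.

```python
from __future__ import annotations

from datetime import date, timedelta


def time_stratified_referents(event_day: date) -> list[date]:
    """Return referent days for a time-stratified case-crossover design:
    every other day in the same calendar month falling on the same weekday
    as the event day (an assumed, commonly used referent scheme)."""
    # First day of the month that shares the event day's weekday
    first = date(event_day.year, event_day.month, 1)
    day = first + timedelta(days=(event_day.weekday() - first.weekday()) % 7)
    referents = []
    while day.month == event_day.month:
        if day != event_day:
            referents.append(day)
        day += timedelta(days=7)
    return referents


def case_and_referent_exposures(event_day: date, daily_exposure: dict[date, float]):
    """Pair the exposure on the case (event) day with exposures on its
    referent days. `daily_exposure` is a hypothetical mapping from date to a
    daily pollutant concentration; referent days with no data are skipped."""
    case_exposure = daily_exposure.get(event_day)
    referent_exposures = [
        daily_exposure[d]
        for d in time_stratified_referents(event_day)
        if d in daily_exposure
    ]
    return case_exposure, referent_exposures
```

For an event on Wednesday 10 May 2017, the referent days are 3, 17, 24 and 31 May. Matching on weekday and month controls for day-of-week and seasonal patterns by design, which is one reason this scheme is widely used with routinely collected monitoring data.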
This approach has since been applied to many other cohorts and extended further by creating synthetic cohorts that link census or tax data to vital statistics data and incorporate spatially comprehensive exposure data combining ground-based monitoring, satellite observations, chemical/meteorological models and land use patterns [28, 29]. Clinical trial data have also been exploited to examine associations with air pollution unrelated to the original study hypothesis, for example by linking clinical data on carotid intima-media thickness, a measure of the development of atherosclerosis, to air pollution exposure [30]. Although social media have been dismissed as a “frivolous” source of big data, they have been used not only to track communicable disease for surveillance purposes but also in chronic disease and environmental health, for example to develop predictive models of asthma using Twitter, Google searches and air monitoring data [31]. Asthma exacerbations are well documented in relation to air pollution exposure, and asthma also lends itself to “self-quantification” through tracking of lung function and symptoms. Licksai et al. [32] developed a mobile application that combines these features of asthma with air quality forecasts and advice.

Similarly, in occupational health, workplace injury and illness data from physician reporting, employer records and workers' compensation claims have long been a resource for research and surveillance. Recently, the US Occupational Safety and Health Administration strengthened reporting requirements and improved public access to these data, motivated partly by the aim of increasing their utility for research [33].
In Europe, investigators used 20 physician reporting and compensation claim datasets from 10 countries to examine trends in occupational disease incidence, accounting for the differences in data collection methods between countries, and demonstrated the potential of data sharing in this area [34]. A key aim of exploiting these data is to improve the capacity to predict and prevent injury and disease in the workplace [35].

Evaluating longer-term sequelae of workplace disease and injury requires different types of data. Scandinavia has a long tradition of linking cohort studies to register data to gain insight into predictors of sick leave and work disability [36]. Because the social security system determines the content of registers, there may be important differences between countries. In Scandinavia, the social security system takes over sick leave benefits relatively early in the process, whereas in the Netherlands the employer is responsible for payment of salary during the first two years of sick leave. As a result, there is no national registration of sick leave in the Netherlands, and employers have little incentive to keep valid company registrations, which reduces the validity of company-registered sick leave as a measure. Nonetheless, first attempts are being made in the Netherlands to link occupational health cohort data to national registers that are a reliable source of measures related to source of income [37]. Social security data have also been widely used to examine work disability benefits and transitions from work to retirement.
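At its simplest, linking cohort records to register data, as described for Scandinavia, amounts to joining two datasets on a shared, typically pseudonymised, person identifier. A minimal Python sketch, assuming a hypothetical sick-leave register supplied as (person ID, first day, last day) tuples; the function name and record layout are invented for illustration, and real linkages run under legal and ethical agreements and must handle identifier errors and overlapping spells, which this sketch does not.

```python
from datetime import date


def total_sick_leave_days(cohort_ids, register_spells):
    """Sum registered sick-leave days for each cohort participant.

    cohort_ids: iterable of (hypothetical) pseudonymised person IDs.
    register_spells: iterable of (person_id, first_day, last_day) tuples
    from a hypothetical sick-leave register.
    Participants without spells get 0; spells for people outside the cohort
    are ignored. Overlapping spells are not merged, so they are double counted.
    """
    totals = {pid: 0 for pid in cohort_ids}
    for pid, first_day, last_day in register_spells:
        if pid in totals:
            # Both the first and the last day count as sick-leave days
            totals[pid] += (last_day - first_day).days + 1
    return totals


spells = [
    ("A", date(2020, 1, 1), date(2020, 1, 10)),
    ("A", date(2020, 3, 1), date(2020, 3, 2)),
    ("C", date(2020, 1, 1), date(2020, 1, 5)),
]
print(total_sick_leave_days(["A", "B"], spells))  # {'A': 12, 'B': 0}
```

Assigning zero days to participants who never appear in the register assumes the register captures all relevant spells; where no national sick-leave register exists, as in the Netherlands, that assumption does not hold and other outcome sources are needed.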

Conclusions

Big data and data sharing have the potential to inform occupational and environmental health by exploiting innovations related to non-traditional data sources or providers and novel partnerships. Promising applications include real-time analysis and forecasting, and innovative analyses of clinical trial or observational data originally collected for other purposes. However, to support these innovations, advances are also required in data curation, protection of privacy and security, and data analysis methods. Challenges related to messy and unrepresentative data and spurious findings, as well as epistemological issues and equity considerations, must also be addressed.
References

1.  Trends in incidence of occupational asthma, contact dermatitis, noise-induced hearing loss, carpal tunnel syndrome and upper limb musculoskeletal disorders in European countries from 2000 to 2012.

Authors:  S Jill Stocks; Roseanne McNamee; Henk F van der Molen; Christophe Paris; Pavel Urban; Giuseppe Campo; Riitta Sauni; Begoña Martínez Jarreta; Madeleine Valenty; Lode Godderis; David Miedinger; Pascal Jacquetin; Hans M Gravseth; Vincent Bonneterre; Maylis Telle-Lamberton; Lynda Bensefa-Colas; Serge Faye; Godewina Mylle; Axel Wannag; Yogindra Samant; Teake Pal; Stefan Scholz-Odermatt; Adriano Papale; Martijn Schouteden; Claudio Colosio; Stefano Mattioli; Raymond Agius
Journal:  Occup Environ Med       Date:  2015-01-09       Impact factor: 4.402

2.  The effect of ill health and socioeconomic status on labor force exit and re-employment: a prospective study with ten years follow-up in the Netherlands.

Authors:  Merel Schuring; Suzan J W Robroek; Ferdy W J Otten; Coos H Arts; Alex Burdorf
Journal:  Scand J Work Environ Health       Date:  2012-09-07       Impact factor: 5.024

3.  Informatics and Data Analytics to Support Exposome-Based Discovery for Public Health.

Authors:  Arjun K Manrai; Yuxia Cui; Pierre R Bushel; Molly Hall; Spyros Karakitsios; Carolyn J Mattingly; Marylyn Ritchie; Charles Schmitt; Denis A Sarigiannis; Duncan C Thomas; David Wishart; David M Balshaw; Chirag J Patel
Journal:  Annu Rev Public Health       Date:  2016-12-23       Impact factor: 21.981

4.  An association between air pollution and mortality in six U.S. cities.

Authors:  D W Dockery; C A Pope; X Xu; J D Spengler; J H Ware; M E Fay; B G Ferris; F E Speizer
Journal:  N Engl J Med       Date:  1993-12-09       Impact factor: 91.245

5.  Toward Greater Implementation of the Exposome Research Paradigm within Environmental Epidemiology.

Authors:  Jeanette A Stingone; Germaine M Buck Louis; Shoji F Nakayama; Roel C H Vermeulen; Richard K Kwok; Yuxia Cui; David M Balshaw; Susan L Teitelbaum
Journal:  Annu Rev Public Health       Date:  2017-01-06       Impact factor: 21.981

6.  Mining the Quantified Self: Personal Knowledge Discovery as a Challenge for Data Science.

Authors:  Tom Fawcett
Journal:  Big Data       Date:  2015-12       Impact factor: 2.128

7.  Trends in work disability with mental diagnoses among social workers in Finland and Sweden in 2005-2012.

Authors:  O Rantonen; K Alexanderson; J Pentti; L Kjeldgård; J Hämäläinen; E Mittendorfer-Rutz; M Kivimäki; J Vahtera; P Salo
Journal:  Epidemiol Psychiatr Sci       Date:  2016-09-09       Impact factor: 6.892

8.  Ambient air pollution and atherosclerosis in Los Angeles.

Authors:  Nino Künzli; Michael Jerrett; Wendy J Mack; Bernardo Beckerman; Laurie LaBree; Frank Gilliland; Duncan Thomas; John Peters; Howard N Hodis
Journal:  Environ Health Perspect       Date:  2005-02       Impact factor: 9.031

9.  Epidemiological time series studies of PM2.5 and daily mortality and hospital admissions: a systematic review and meta-analysis.

Authors:  R W Atkinson; S Kang; H R Anderson; I C Mills; H A Walton
Journal:  Thorax       Date:  2014-04-04       Impact factor: 9.139

10.  Biomonitoring in the Era of the Exposome.

Authors:  Kristine K Dennis; Elizabeth Marder; David M Balshaw; Yuxia Cui; Michael A Lynes; Gary J Patti; Stephen M Rappaport; Daniel T Shaughnessy; Martine Vrijheid; Dana Boyd Barr
Journal:  Environ Health Perspect       Date:  2016-07-06       Impact factor: 9.031
