Christina Vogel, Stephen Zwolinsky, Claire Griffiths, Matthew Hobbs, Emily Henderson, Emma Wilkins.
Abstract
BACKGROUND: 'Big data' has great potential to help address the global health challenge of obesity. However, lack of clarity with regard to the definition of big data and frameworks for effectively using big data in the context of obesity research may be hindering progress. The aim of this study was to establish agreed approaches for the use of big data in obesity-related research.
Year: 2019 PMID: 30655580 PMCID: PMC6892733 DOI: 10.1038/s41366-018-0313-9
Source DB: PubMed Journal: Int J Obes (Lond) ISSN: 0307-0565 Impact factor: 5.095
Fig. 1 Flow diagram illustrating the three survey rounds of the Delphi study. *One statement that appeared in Round 1 was removed, and a new clarified version was added in Round 2
Demographic characteristics of Delphi participants
| | Round 1 | Round 2 | Round 3 |
|---|---|---|---|
| Gender | |||
| Male | 52.8% | 55.2% | 53.8% |
| Female | 47.2% | 44.8% | 46.2% |
| Mean age in years (SD) | 43.9 (10.9) | 42.3 (9.8) | 41.5 (9.7) |
| Country of residence | |||
| UK | 72.2% | 75.9% | 80.8% |
| USA | 11.1% | 6.9% | 7.7% |
| Netherlands | 8.3% | 10.3% | 7.7% |
| New Zealand | 2.8% | 3.4% | 3.8% |
| Australia | 2.8% | 3.4% | 0.0% |
| Ireland | 2.8% | 0.0% | 0.0% |
| Current role | |||
| Professor | 30.6% | 27.6% | 26.9% |
| Associate Professor | 13.9% | 13.8% | 15.4% |
| Lecturer | 16.7% | 17.2% | 15.4% |
| Research Fellow | 27.8% | 31.0% | 30.8% |
| PhD student | 5.6% | 6.9% | 7.7% |
| Other | 5.6% | 3.4% | 3.8% |
| Education | |||
| Doctoral Degree | 91.7% | 93.1% | 92.3% |
| Professional Fellow | 2.8% | 0.0% | 0.0% |
| Master’s Degree | 5.6% | 6.9% | 7.7% |
| Years working in the field | |||
| 10+ years | 52.8% | 44.8% | 46.2% |
| 6–9 years | 14.0% | 17.2% | 15.3% |
| 4–5 years | 19.4% | 20.7% | 19.2% |
| 1–3 years | 13.9% | 17.2% | 19.2% |
Summary of grouped statements by domain
| Statement domain | Number of statements, Round 1 | Number of statements, Round 2 | Number of statements, Round 3 | Consensus achieved, Round 1, % (number) | Consensus achieved, Round 2, % (number) | Consensus achieved, Round 3, % (number) |
|---|---|---|---|---|---|---|
| Definition of Big Data | 14 | 15 | 15 | 64.3% (9) | 80.0% (12) | 100.0% (15) |
| Data Acquisition | 13ᵃ | 16 | 16 | 38.5% (5) | 68.8% (11) | 81.3% (13) |
| Ethics | 13 | 15 | 15 | 61.5% (8) | 80.0% (12) | 93.3% (14) |
| Data Governanceᵇ | 5ᵃ | 5 | 5 | 80.0% (4) | 100.0% (5) | 100.0% (5) |
| Training and Infrastructureᵇ | 11 | 12 | 12 | 63.6% (7) | 75.0% (9) | 75.0% (9) |
| Reporting and Transparencyᵇ | 9 | 11 | 11 | 77.8% (7) | 90.9% (10) | 90.9% (10) |
| Quality and Inferenceᵇ | 11 | 11 | 11 | 81.8% (9) | 90.9% (10) | 100.0% (11) |
| Totals | 76 | 85 | 85 | 64.5% (49) | 81.2% (69) | 90.6% (77) |
Note: Consensus was achieved when at least 70% of participants strongly agreed/agreed or strongly disagreed/disagreed with a statement
ᵃ In this round, this domain included statements for which ‘don’t know’ exceeded 30% of total responses
ᵇ Stability of consensus (<10% variation) was achieved between Round 2 and Round 3
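The two rules in the notes above (the 70% consensus threshold and the <10% stability criterion) can be sketched in a few lines of Python. This is an illustrative reading, not the authors' analysis code: the function names are hypothetical, the threshold is assumed to be inclusive, and "variation" is assumed to mean the absolute change, in percentage points, of a domain's consensus proportion between Round 2 and Round 3. The counts are copied from the domain summary table.

```python
# Assumed interpretation of the rules stated in the table notes.
CONSENSUS_THRESHOLD = 70.0  # percent; assumed inclusive
STABILITY_LIMIT = 10.0      # percentage points; assumed meaning of "<10% variation"

# domain -> (number of statements, consensus count Round 2, consensus count Round 3)
# Values copied from the domain summary table; statement counts are equal in both rounds.
DOMAINS = {
    "Definition of Big Data": (15, 12, 15),
    "Data Acquisition": (16, 11, 13),
    "Ethics": (15, 12, 14),
    "Data Governance": (5, 5, 5),
    "Training and Infrastructure": (12, 9, 9),
    "Reporting and Transparency": (11, 10, 10),
    "Quality and Inference": (11, 10, 11),
}


def has_consensus(agree_pct: float, disagree_pct: float) -> bool:
    """A statement reaches consensus when either side meets the threshold."""
    return max(agree_pct, disagree_pct) >= CONSENSUS_THRESHOLD


def is_stable(n_statements: int, consensus_r2: int, consensus_r3: int) -> bool:
    """Stable when the consensus proportion moves by less than the limit."""
    pct_r2 = 100.0 * consensus_r2 / n_statements
    pct_r3 = 100.0 * consensus_r3 / n_statements
    return abs(pct_r3 - pct_r2) < STABILITY_LIMIT


stable_domains = [name for name, counts in DOMAINS.items() if is_stable(*counts)]
print(stable_domains)
```

Under these assumptions the stable domains are the same four that carry the stability footnote marker in the table, which suggests the percentage-point reading is at least consistent with the reported results.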
Responses to statements included in the Definition of Big Data domain
| Big Data… | Round 1 | Round 2 | Round 3 |
|---|---|---|---|
| 1. Always has a large sample size | 22.2% | 24.1% | 11.5% |
| 2. Always requires additional computing power | 68.6% / 31.4% | 69.0% / 31.0% | 19.2% |
| 3. Is never collected for research purposes (i.e. there is no a priori research question) | 25.7% | 21.4% | 15.4% |
| 4. Is always observational | 40.0% / 60.0% | 22.2% | 23.1% |
| 5. Does not require specialist mathematical or data science analytical skills | 12.1% | 7.1% | 8.0% |
| 6. Does not require specialist knowledge of database management | 15.6% | 14.8% | 12.5% |
| 7. Does not require knowledge of computer programming | 42.4% / 57.6% | 37.0% / 63.0% | 25.0% |
| 8. Is always digital | 61.8% / 38.2% | 66.7% / 33.3% | 28.0% |
| 9. Does not include qualitative data | 35.3% / 64.7% | 17.9% | 23.1% |
| 10. Includes government data sets | 5.7% | 3.4% | 3.8% |
| 11. Includes cohort data sets | 13.9% | 7.1% | 3.8% |
| 12. Includes commercial data sets | 2.8% | 3.4% | 3.8% |
| 13. Includes routine data sets | 5.6% | 3.4% | 0.0% |
| 14. Always includes more than one data set | 16.7% | 17.2% | 15.4% |
| 15. Big data always has at least one of: large volume (e.g. in terms of sample size, number of variables or measurement occasions), variety (e.g. in terms of the types of variable), or velocity (e.g. is generated at speed) | – | 6.9% | 7.7% |
Note: Consensus required at least 70% agreement or disagreement. A cell with two values gives agree % / disagree % for a round in which consensus was not achieved. A cell with one value marks a round in which consensus was achieved; only the minority percentage is shown. – indicates that the statement was not presented in that round
Responses to statements included in the six domains which sought agreed approaches to using big data in obesity research
| Statement | Round 1 | Round 2 | Round 3 |
|---|---|---|---|
| Data Acquisition | | | |
| 1. There is not equal access to big datasets for all academic researchers | 2.9% | 3.4% | 0.0% |
| 2. There is not equal access to big datasets across academic institutions or non-academic researchers | 3.0% | 3.4% | 0.0% |
| 3. I don’t know what big data are available to use for research purposes | 58.3% / 41.7% | 24.1% | 23.1% |
| 4. I don’t know how to access big data for research purposes | 47.2% / 52.8% | 48.3% / 51.7% | 57.7% / 42.3% |
| 5. Accessing big data for research purposes takes too long | 25.0% | 4.5% | 4.8% |
| 6. Timescales for access to big data limit their utility for obesity research | 55.2% / 44.8% | 28.0% | 26.1% |
| 7. Negotiating access to big data for obesity research is a challenge | 5.9% | 3.4% | 3.8% |
| 8. Access to big data should be provided via a third party centre/organisation that is independent both from the data owner and the researcher | 24.0%ᵃ | 16.7% | 17.4% |
| 9. Third party organisations (i.e. those outside of a university) should be responsible for promoting the awareness of big data for use in obesity research | 46.2% / 53.8% | 20.8% | 25.0% |
| 10. It is the responsibility of data owners to make their data available | 65.7% / 34.3% | 69.0% / 31.0% | 26.9% |
| 11. Data owners are responsible for making others aware of the availability of their data | 48.5% / 51.5% | 35.7% / 64.3% | 36.0% / 64.0% |
| 12. It is the responsibility of individual research institutions to identify and negotiate access to big data sources | 56.7% / 43.3% | 63.0% / 37.0% | 25.0% |
| 13. The cost attached to the use of big data is a major barrier to its use | 62.1% / 37.9% | 20.8% | 19.0% |
| 14. Data protection regulations unduly restrict the use of big data in obesity research | – | 50.0% / 50.0% | 42.1% / 57.9% |
| 15. Government legislation is needed to encourage commercial organisations to share their data for obesity research | – | 19.2% | 16.0% |
| 16. Big data should be made available via third party organisations who should be responsible for protecting both commercially sensitive and individually sensitive data | – | 16.7% | 13.0% |
| Ethics | | | |
| 1. It is unethical to use big data in obesity research when consent has not been obtained for this purpose | 12.9% | 11.1% | 7.7% |
| 2. Consent is a major ethical challenge for big data in obesity research | 22.6% | 14.8% | 16.0% |
| 3. Big data from commercial sources is a potential conflict of interest | 64.7% / 35.3% | 21.4% | 19.2% |
| 4. Ethical processes need reviewing in light of using big data in obesity research | 5.7% | 3.4% | 3.8% |
| 5. Ethical processes unduly restrict the use of big data for obesity research | 46.4% / 53.6% | 36.4% / 63.6% | 30.0% |
| 6. There are high confidentiality risks when using big data for obesity research | 38.2% / 61.8% | 26.9% | 20.8% |
| 7. It is the responsibility of individual research institutions to ensure that big data is used ethically | 5.6% | 0.0% | 0.0% |
| 8. It is the responsibility of individual researchers to ensure that big data is used ethically | 2.8% | 0.0% | 0.0% |
| 9. It is the responsibility of data owners to ensure that big data is used ethically | 5.6% | 6.9% | 7.7% |
| 10. It is unethical of commercial companies to withhold big data sets that could be used to identify determinants of obesity and opportunities for intervention | 48.5% / 51.5% | 39.9% / 60.7% | 38.5% / 61.5% |
| 11. Using big data for obesity research doesn’t cause harm because no further contact with individuals or communities is made | 58.6% / 41.4% | 26.1% | 23.8% |
| 12. An ethical framework is required to review big data research proposals through formal research processes | 6.1% | 6.9% | 3.8% |
| 13. An ethical framework should be developed by independent bodies with no conflicts of interest | 20.6% | 13.8% | 7.7% |
| 14. Ethical processes should distinguish between open data already in the public domain and secondary data not already in the public domain, which may contain both commercially and individually sensitive data | – | 7.1% | 4.0% |
| 15. It is unethical NOT to use big data where it is available, even when informed consent has not been provided, if it will help address obesity | – | 30.4% / 69.6% | 14.3% |
| Data Governance | | | |
| 1. The data governance requirements associated with using big data in obesity research are clear | 17.2% | 16.0% | 16.7% |
| 2. Data governance processes are clear for data controllers | 34.8%ᵃ / 65.2%ᵃ | 13.6% | 15.0% |
| 3. Data governance processes are clear for researchers | 25.8% | 12.0% | 12.0% |
| 4. Data governance processes are clear for data owners | 20.8%ᵃ | 13.6% | 15.8% |
| 5. Ownership of big data can be ambiguous (e.g. for wearables/activity tracking technology the owner could be taken to be the organisation who collates/manages the data, or the individual people the data relates to) | 5.7% | 3.4% | 3.8% |
| Training and Infrastructure | | | |
| 1. Big data requires novel/non-traditional analysis techniques | 20.0% | 7.1% | 4.0% |
| 2. Researchers need specialist training to link big data | 14.7% | 7.1% | 8.0% |
| 3. Researchers need specialist training to manage big data | 11.4% | 10.7% | 8.0% |
| 4. Researchers need specialist training to analyse big data | 16.7% | 10.3% | 11.5% |
| 5. There is insufficient training available to me, regarding the handling of big data and analysis | 59.4% / 40.6% | 61.5% / 38.5% | 59.1% / 40.9% |
| 6. The cost of training courses in big data analysis techniques prevents me from using these datasets | 23.3% | 19.2% | 17.4% |
| 7. My institution has limited equipment/systems necessary for handling big data (i.e. computer memory, secure networked systems etc.) | 41.9% / 58.1% | 37.0% / 63.0% | 37.5% / 62.5% |
| 8. It is the responsibility of individual universities to improve their training and infrastructure to use big data in obesity research | 19.4% | 6.9% | 11.5% |
| 9. It is the responsibility of professional organisations, including funding organisations, to provide more training around big data | 17.1% | 13.8% | 11.5% |
| 10. The time involved in preparing big datasets for analysis prevents me from using these datasets | 40.0% / 60.0% | 48.3% / 51.7% | 48.0% / 52.0% |
| 11. There are no training or infrastructure issues that prevent me from using big data for obesity research | 41.2% / 58.8% | 25.9% | 20.8% |
| 12. Collaboration that draws on varied skill sets is needed to appropriately handle big data in obesity research | – | 6.9% | 7.7% |
| Reporting and Transparency | | | |
| 1. The provenance (source and date of collection) of big data is adequately reported in peer-reviewed literature | 25.0% | 12.5% | 4.2% |
| 2. The methods originally used to collect big data are adequately reported in peer-reviewed literature | 29.4% | 7.1% | 7.7% |
| 3. Procedures used to clean and process (e.g. re-code) big data are adequately reported in peer-reviewed literature | 8.6% | 7.1% | 8.0% |
| 4. The content of big data sources is adequately reported in peer-reviewed literature | 20.6% | 7.4% | 12.0% |
| 5. The processes used to link big data sources (e.g. geocoding techniques) are adequately reported in peer-reviewed literature | 19.4% | 11.1% | 8.3% |
| 6. Inadequate reporting of big data and associated methods in peer-reviewed literature means study findings cannot be usefully interpreted | 65.7% / 34.3% | 21.4% | 15.4% |
| 7. The costs associated with obtaining big data should be reported in peer-reviewed literature | 51.6% / 48.4% | 51.9% / 48.1% | 62.5% / 37.5% |
| 8. To improve big data related obesity research, standardised reporting frameworks are required | 15.2% | 10.7% | 7.7% |
| 9. Academic journals have a responsibility to enforce the use of reporting frameworks for big data | 17.1% | 13.8% | 7.7% |
| 10. Where contractual restrictions exist around the reporting of data, these should be noted when disseminating research findings | – | 0.0% | 0.0% |
| 11. Reporting needs to be independent of the data owner to reduce potential conflicts of interest | – | 28.0% | 20.8% |
| Quality and Inference | | | |
| 1. Big data from commercial organisations results in an increased risk of bias | 58.8% / 41.2% | 26.9% | 20.0% |
| 2. Standardised quality checks of the data [i.e. how data was collected, missing data] are required from the data provider | 8.6% | 10.7% | 3.8% |
| 3. Big data should be used irrespective of quality in obesity research | 19.4% | 13.8% | 11.5% |
| 4. It is important to acknowledge methodological limitations of big data used in obesity research | 0.0% | 6.9% | 0.0% |
| 5. Statistically significant results need to be interpreted with caution when using big datasets in obesity research | 8.8% | 3.6% | 4.0% |
| 6. Outputs from research using big data are rarely misinterpreted | 11.1% | 8.3% | 9.1% |
| 7. There is an over reliance on big data in obesity research despite its potential bias | 17.2% | 12.0% | 16.7% |
| 8. The emergence of big data has negatively impacted the use of traditional data sources | 20.0% | 14.3% | 16.7% |
| 9. Big data is having an unhealthy steer on the obesity-related research agenda | 13.8% | 14.3% | 15.4% |
| 10. Researchers have a responsibility to ensure that their results are correctly interpreted in view of any limitations | 0.0% | 0.0% | 0.0% |
| 11. Big data obesity research should always consider inequalities in health or health behaviours as a measure of quality | 57.6% / 42.4% | 69.2% / 30.8% | 26.1% |
Note: Consensus required at least 70% agreement or disagreement. A cell with two values gives agree % / disagree % for a round in which consensus was not achieved. A cell with one value marks a round in which consensus was achieved; only the minority percentage is shown. – indicates that the statement was not presented in that round
ᵃ Proportion of ‘don’t know’ responses to this statement exceeded 30%
Fig. 2 Challenges, solutions and agents of change for effective use of big data in obesity research