Big data and large sample size: a cautionary note on the potential for bias.

Robert M Kaplan, David A Chambers, Russell E Glasgow.

Abstract

A number of commentaries have suggested that large studies are more reliable than smaller studies and there is a growing interest in the analysis of "big data" that integrates information from many thousands of persons and/or different data sources. We consider a variety of biases that are likely in the era of big data, including sampling error, measurement error, multiple comparisons errors, aggregation error, and errors associated with the systematic exclusion of information. Using examples from epidemiology, health services research, studies on determinants of health, and clinical trials, we conclude that it is necessary to exercise greater caution to be sure that big sample size does not lead to big inferential errors. Despite the advantages of big studies, large sample size can magnify the bias associated with error resulting from sampling or study design. Published 2014. This article is a U.S. Government work and is in the public domain in the USA.
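The abstract's closing claim, that a large sample can magnify the consequences of a biased design, can be illustrated with a small simulation. The sketch below is not from the article: the population mean and standard deviation, the acceptance probabilities, and the function names (`biased_sample`, `mean_and_ci`, `run`) are all hypothetical choices made for illustration. It assumes a selection mechanism in which above-average values are more likely to enter the sample, then reports the sample mean and a naive 95% confidence interval at increasing sample sizes.

```python
import math
import random
import statistics

# Hypothetical population parameters (assumptions, not from the article).
TRUE_MEAN = 50.0
TRUE_SD = 10.0


def biased_sample(n, rng):
    """Draw n values, accepting above-average values more readily.

    The acceptance probabilities (0.9 vs 0.6) are an assumed stand-in
    for any selection mechanism that depends on the outcome.
    """
    sample = []
    while len(sample) < n:
        x = rng.gauss(TRUE_MEAN, TRUE_SD)
        accept_prob = 0.9 if x > TRUE_MEAN else 0.6
        if rng.random() < accept_prob:
            sample.append(x)
    return sample


def mean_and_ci(sample):
    """Return the sample mean and the naive 95% CI half-width."""
    m = statistics.fmean(sample)
    half = 1.96 * statistics.stdev(sample) / math.sqrt(len(sample))
    return m, half


def run(sizes=(100, 10_000, 100_000), seed=0):
    """Return (n, mean, half_width, covers_truth) for each sample size."""
    rng = random.Random(seed)
    results = []
    for n in sizes:
        m, half = mean_and_ci(biased_sample(n, rng))
        covers = (m - half) <= TRUE_MEAN <= (m + half)
        results.append((n, m, half, covers))
    return results


if __name__ == "__main__":
    for n, m, half, covers in run():
        print(f"n={n:>7}  mean={m:6.2f}  "
              f"95% CI half-width={half:5.2f}  covers truth: {covers}")
```

Under these assumptions the selection bias does not shrink as the sample grows; only the interval around the biased estimate narrows. The largest sample therefore excludes the true mean with the greatest apparent precision, which is the pattern the abstract warns about.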

Keywords:  bias; big data; research methods; sampling

Year:  2014        PMID: 25043853      PMCID: PMC5439816          DOI: 10.1111/cts.12178

Source DB:  PubMed          Journal:  Clin Transl Sci        ISSN: 1752-8054            Impact factor:   4.689


