Understanding the limits of large datasets.

Catherine M Sanders, Sidney L Saltzstein, Matthew M Schultzel, Duy H Nguyen, Helen Shi Stafford, Georgia Robins Sadler.

Abstract

Many health professionals use large datasets to answer behavioral, translational, or clinical questions. Understanding the impact of missing data in large databases, such as disease registries, can avoid erroneous interpretations of these data. Using the California Cancer Registry, the authors selected seven common cancers, seven sociodemographic and clinical variables, and the top three reporting sources, as examples of the type of data that would be deemed critical to most studies. The gender variable had no missing data, followed by age (<0.1 % missing), ethnicity (1.7 %), stage (9.8 %), differentiation (39.1 %), and birthplace (41.1 %). Reports from hospitals and clinics had the lowest percentages of missing data. Users of large datasets should anticipate the limitations of missing data to prevent methodological flaws and misinterpretations of research findings. Knowledge of what and how much data may be missing in large datasets can help prevent errors in research conclusions, while better guiding treatment modalities and public health policies and programs.

Year:  2012        PMID: 22729362      PMCID: PMC4153382          DOI: 10.1007/s13187-012-0383-7

Source DB:  PubMed          Journal:  J Cancer Educ        ISSN: 0885-8195            Impact factor:   2.037


