Catherine M Sanders, Sidney L Saltzstein, Matthew M Schultzel, Duy H Nguyen, Helen Shi Stafford, Georgia Robins Sadler.
Abstract
Many health professionals use large datasets to answer behavioral, translational, or clinical questions. Understanding the impact of missing data in large databases, such as disease registries, can help researchers avoid erroneous interpretations of these data. Using the California Cancer Registry, the authors selected seven common cancers, seven sociodemographic and clinical variables, and the top three reporting sources as examples of the type of data that would be deemed critical to most studies. The gender variable had no missing data, followed by age (<0.1% missing), ethnicity (1.7%), stage (9.8%), differentiation (39.1%), and birthplace (41.1%). Reports from hospitals and clinics had the lowest percentages of missing data. Users of large datasets should anticipate the limitations of missing data to prevent methodological flaws and misinterpretations of research findings. Knowledge of what and how much data may be missing in large datasets can help prevent errors in research conclusions while better guiding treatment modalities and public health policies and programs.
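The abstract's core measurement, the percentage of records missing a value for each variable, both overall and by reporting source, can be illustrated with a minimal sketch. The records, field names, and source labels below are hypothetical and are not drawn from the California Cancer Registry. The sketch assumes a missing value is stored as `None` or the field is absent; a real registry may record unknowns with dedicated codes that would first need to be mapped to "missing".

```python
from collections import defaultdict

# Hypothetical toy records: field names and values are illustrative only,
# not actual California Cancer Registry data or its coding scheme.
RECORDS = [
    {"source": "hospital", "age": 64, "stage": "II", "birthplace": "CA"},
    {"source": "hospital", "age": 71, "stage": None, "birthplace": "CA"},
    {"source": "clinic", "age": 58, "stage": "I", "birthplace": None},
    {"source": "lab", "age": None, "stage": None, "birthplace": None},
]

VARIABLES = ["age", "stage", "birthplace"]


def percent_missing(records, variable):
    """Percentage of records where `variable` is absent or None."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if r.get(variable) is None)
    return 100.0 * missing / len(records)


def percent_missing_by_source(records, variable):
    """Percent missing for `variable`, computed separately per reporting source."""
    groups = defaultdict(list)
    for r in records:
        groups[r["source"]].append(r)
    return {src: percent_missing(rows, variable) for src, rows in groups.items()}


if __name__ == "__main__":
    # Overall missingness per variable across all toy records.
    for var in VARIABLES:
        print(f"{var}: {percent_missing(RECORDS, var):.1f}% missing")
    # Missingness of one variable broken down by (hypothetical) reporting source.
    print(percent_missing_by_source(RECORDS, "stage"))
```

Grouping by source before computing the percentage mirrors the comparison of reporting sources described in the abstract; with real data, the same tabulation would show which sources contribute most of the missing values for a given variable.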
Year: 2012 PMID: 22729362 PMCID: PMC4153382 DOI: 10.1007/s13187-012-0383-7
Source DB: PubMed Journal: J Cancer Educ ISSN: 0885-8195 Impact factor: 2.037