
Data Acquisition and Preprocessing in Studies on Humans: What Is Not Taught in Statistics Classes?

Yeyi Zhu, Ladia M Hernandez, Peter Mueller, Yongquan Dong, Michele R Forman

Abstract

The aim of this paper is to address issues in research that may be missing from statistics classes and important for (bio-)statistics students. In the context of a case study, we discuss data acquisition and preprocessing steps that fill the gap between research questions posed by subject matter scientists and statistical methodology for formal inference. Issues include participant recruitment, data collection training and standardization, variable coding, data review and verification, data cleaning and editing, and documentation. Despite the critical importance of these details in research, most of these issues are rarely discussed in an applied statistics program. One reason for the lack of more formal training is the difficulty in addressing the many challenges that can possibly arise in the course of a study in a systematic way. This article can help to bridge this gap between research questions and formal statistical inference by using an illustrative case study for a discussion. We hope that reading and discussing this paper and practicing data preprocessing exercises will sensitize statistics students to these important issues and achieve optimal conduct, quality control, analysis, and interpretation of a study.

Keywords:  Applied statistics courses; Data cleaning; Data code book; Data collection; Data dictionary; Quality control; Statistical education

Year: 2013; PMID: 24511148; PMCID: PMC3912269; DOI: 10.1080/00031305.2013.842498

Source DB: PubMed; Journal: Am Stat; ISSN: 0003-1305; Impact factor: 8.710


