Shersten Killip¹, Ziyad Mahfoud, Kevin Pearce. ¹Department of Family Practice and Community Medicine, University of Kentucky, Lexington, KY, USA. skill2@email.uky.edu
Abstract
BACKGROUND: Primary care research often involves clustered samples in which subjects are randomized at a group level but analyzed at an individual level. Analyses that do not take this clustering into account may report significance where none exists. This article explores the causes, consequences, and implications of clustered data.
METHODS: Using a case study with accompanying equations, we show that clustered samples are not as statistically efficient as simple random samples.
RESULTS: Similarity among subjects within preexisting groups or clusters reduces the variability of responses in a clustered sample, which erodes the power to detect true differences between study arms. This similarity is expressed by the intracluster correlation coefficient, or ρ (rho), which compares the within-group variance with the between-group variance. Rho is used in equations along with the cluster size and the number of clusters to calculate the effective sample size (ESS) in a clustered design. The ESS should be used to calculate power in the design phase of a clustered study. Appropriate accounting for similarities among subjects in a cluster almost always results in a net loss of power, requiring increased total subject recruitment. Increasing the number of clusters enhances power more efficiently than does increasing the number of subjects within a cluster.
CONCLUSIONS: Primary care research frequently uses clustered designs, whether consciously or unconsciously. Researchers must recognize and understand the implications of clusters to avoid costly sample size errors.
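The relationship between rho, cluster size, and the number of clusters can be sketched in a few lines of Python. This is a minimal illustration, assuming equal-sized clusters and the commonly used design-effect form DE = 1 + ρ(m − 1), with ESS = mk / DE, where m is the number of subjects per cluster and k is the number of clusters. The function names and the value ρ = 0.05 are hypothetical choices for illustration, not figures taken from the study.

```python
def design_effect(cluster_size: float, rho: float) -> float:
    """Variance inflation from clustering, assuming equal cluster sizes.

    DE = 1 + rho * (m - 1), where m is the number of subjects per cluster.
    """
    return 1.0 + rho * (cluster_size - 1.0)


def effective_sample_size(n_clusters: int, cluster_size: int, rho: float) -> float:
    """ESS = (m * k) / DE, with k clusters of m subjects each."""
    total_subjects = n_clusters * cluster_size
    return total_subjects / design_effect(cluster_size, rho)


if __name__ == "__main__":
    rho = 0.05  # illustrative value only, not from the article

    # Baseline design: 10 clusters of 20 subjects (200 subjects in total).
    baseline = effective_sample_size(10, 20, rho)

    # Two ways to double total recruitment to 400 subjects.
    more_clusters = effective_sample_size(20, 20, rho)    # add clusters
    bigger_clusters = effective_sample_size(10, 40, rho)  # enlarge clusters

    print(f"baseline ESS:         {baseline:.1f}")
    print(f"twice the clusters:   {more_clusters:.1f}")
    print(f"twice the subjects:   {bigger_clusters:.1f}")
```

Under these assumptions, doubling the number of clusters doubles the ESS, while doubling the subjects per cluster raises it by a smaller amount, which is consistent with the abstract's point that adding clusters is the more efficient route to power.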