Darrel P Francis. International Centre for Circulatory Health, National Heart and Lung Institute, Imperial College London, 59-61 North Wharf Road, London W2 1LA, UK. d.francis@imperial.ac.uk
Abstract
BACKGROUND: When reported correlation coefficients seem too high to be true, does investigative verification of source data provide suitable reassurance? This study tests how easily omission of patients, or selection amongst irreproducible measurements, generates fictitious strong correlations without any data fabrication.
METHOD AND RESULTS: Two forms of manipulation are applied to a pair of normally-distributed, uncorrelated variables: first, exclusion of the patients least favourable to a hypothesised association and, second, making multiple poorly-reproducible measurements per patient and choosing the most supportive. Excluding patients raises correlations powerfully, from 0.0 ± 0.11 (no patients omitted) to 0.40 ± 0.11 (one-fifth omitted), 0.59 ± 0.08 (one-third omitted) and 0.78 ± 0.05 (half omitted). Study size offers no protection: omitting just one-fifth of 75 patients (i.e. publishing 60) makes 92% of correlations statistically significant. Worse, simply selecting the most favourable amongst several measurements raises correlations from 0.0 ± 0.12 (single measurement of each variable) to 0.73 ± 0.06 (best of 2), and 0.90 ± 0.03 (best of 4). 100% of correlation coefficients become statistically significant. Scatterplots may reveal a telltale "shave sign" or "bite sign". Simple statistical tests are presented for these suspicious signatures in single or multiple studies.
CONCLUSION: Correlations are vulnerable to data manipulation. Cardiology is especially vulnerable to patient deletion (because we cardiologists might ourselves completely control enrolment and measurement) and to selection of "best" measurements (because alternative heartbeats are numerous, and some modalities are poorly reproducible). Source data verification cannot detect these manipulations, but the proposed tests might highlight suspicious data and, by aggregating across studies, unreliable laboratories or research fields. Cardiological correlation research needs adequately-informed planning and guarantees of integrity, with teeth.
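The two manipulations described in the abstract can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the study's own procedure: the abstract does not say how "least favourable" patients or "most supportive" measurements were identified, so here a point's favourability is assumed to be its closeness to the line y = x (for a hypothesised positive association), measurements are assumed to have no reproducibility at all, and "best of k" is assumed to mean k measurements of each variable with the closest of the k × k pairings kept. The function names, trial counts and seeds are hypothetical, and the numbers produced need not match those reported above.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient of two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


def drop_least_favourable(x, y, fraction):
    """Omit the given fraction of patients lying farthest from the line y = x.

    Assumption: distance |x - y| is used as the measure of how unfavourable
    a patient is to a hypothesised positive association.
    """
    n_drop = int(round(fraction * len(x)))
    order = np.argsort(np.abs(x - y))      # most supportive patients first
    keep = order[: len(x) - n_drop]
    return x[keep], y[keep]


def pick_best_of_k(rng, n_patients, k):
    """Take k pure-noise measurements of each variable per patient and keep
    the (x, y) pairing that lies closest to the line y = x.

    Assumption: all k * k pairings are eligible for selection.
    """
    xs = rng.standard_normal((n_patients, k))
    ys = rng.standard_normal((n_patients, k))
    gap = np.abs(xs[:, :, None] - ys[:, None, :])         # shape (n, k, k)
    flat = gap.reshape(n_patients, -1).argmin(axis=1)     # best pairing per patient
    i, j = np.unravel_index(flat, (k, k))
    rows = np.arange(n_patients)
    return xs[rows, i], ys[rows, j]


def r_after_omission(fraction, n_patients=75, n_trials=300, seed=0):
    """Mean and SD of r across simulated studies after omitting patients."""
    rng = np.random.default_rng(seed)
    rs = []
    for _ in range(n_trials):
        x = rng.standard_normal(n_patients)
        y = rng.standard_normal(n_patients)   # generated independently of x
        rs.append(pearson_r(*drop_least_favourable(x, y, fraction)))
    return float(np.mean(rs)), float(np.std(rs))


def r_best_of_k(k, n_patients=75, n_trials=300, seed=0):
    """Mean and SD of r across simulated studies after best-of-k selection."""
    rng = np.random.default_rng(seed)
    rs = [pearson_r(*pick_best_of_k(rng, n_patients, k)) for _ in range(n_trials)]
    return float(np.mean(rs)), float(np.std(rs))


if __name__ == "__main__":
    for fraction in (0.0, 0.2, 1 / 3, 0.5):
        mean, sd = r_after_omission(fraction)
        print(f"omit {fraction:.2f} of patients: r = {mean:.2f} ± {sd:.2f}")
    for k in (1, 2, 4):
        mean, sd = r_best_of_k(k)
        print(f"best of {k}: r = {mean:.2f} ± {sd:.2f}")
```

In this sketch no value is ever altered or invented: every retained data point is a genuine draw from uncorrelated distributions, which is why checking reported values against source records would not reveal either manipulation.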
Authors: Andreas Kyriacou; Matthew E Li Kam Wa; Punam A Pabari; Beth Unsworth; Resham Baruah; Keith Willson; Nicholas S Peters; Prapa Kanagaratnam; Alun D Hughes; Jamil Mayet; Zachary I Whinnett; Darrel P Francis Journal: Int J Cardiol Date: 2012-03-27 Impact factor: 4.164
Authors: Andreas Kyriacou; Punam A Pabari; Jamil Mayet; Nicholas S Peters; D Wyn Davies; P Boon Lim; David Lefroy; Alun D Hughes; Prapa Kanagaratnam; Darrel P Francis; Zachary I Whinnett Journal: Int J Cardiol Date: 2013-10-16 Impact factor: 4.164
Authors: Alexandra N Nowbar; Michael Mielewczik; Maria Karavassilis; Hakim-Moulay Dehbi; Matthew J Shun-Shin; Siana Jones; James P Howard; Graham D Cole; Darrel P Francis Journal: BMJ Date: 2014-04-28
Authors: Michela Moraldo; Fabrizio Cecaro; Matthew Shun-Shin; Punam A Pabari; Justin E Davies; Xiao Y Xu; Alun D Hughes; Charlotte Manisty; Darrel P Francis Journal: Int J Cardiol Date: 2012-12-11 Impact factor: 4.164
Authors: Zachary I Whinnett; Darrel P Francis; Arnaud Denis; Keith Willson; Patrizio Pascale; Irene van Geldorp; Maxime De Guillebon; Sylvain Ploux; Kenneth Ellenbogen; Michel Haïssaguerre; Philippe Ritter; Pierre Bordachar Journal: Int J Cardiol Date: 2013-03-05 Impact factor: 4.164