Parash Mani Bhandari (1), Brooke Levis (2), Dipika Neupane (1), Scott B Patten (3), Ian Shrier (4), Brett D Thombs (5), Andrea Benedetti (6). 1. Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Quebec, Canada; Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Quebec, Canada. 2. Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Quebec, Canada; Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Quebec, Canada; Centre for Prognosis Research, School of Medicine, Keele University, Staffordshire, UK. 3. Department of Community Health Sciences, University of Calgary, Calgary, Alberta, Canada. 4. Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Quebec, Canada; Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Quebec, Canada; Department of Family Medicine, McGill University, Montréal, Québec, Canada. 5. Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Quebec, Canada; Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Quebec, Canada; Department of Medicine, McGill University, Montréal, Québec, Canada; Department of Psychiatry, McGill University, Montréal, Québec, Canada; Department of Psychology, McGill University, Montréal, Québec, Canada; Department of Educational and Counselling Psychology, McGill University, Montréal, Québec, Canada; Biomedical Ethics Unit, McGill University, Montréal, Québec, Canada. Electronic address: brett.thombs@mcgill.ca. 6. Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Quebec, Canada; Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Quebec, Canada; Department of Medicine, McGill University, Montréal, Québec, Canada. Electronic address: andrea.benedetti@mcgill.ca.
Abstract
OBJECTIVE: To evaluate, across multiple sample sizes, the degree to which data-driven methods result in (1) optimal cutoffs that differ from the population optimal cutoff and (2) bias in accuracy estimates. STUDY DESIGN AND SETTING: For each sample size of 100, 200, 500, and 1,000, a total of 1,000 samples were randomly drawn to simulate studies of different sample sizes from a database (n = 13,255) synthesized to assess Edinburgh Postnatal Depression Scale (EPDS) screening accuracy. Optimal cutoffs were selected by maximizing Youden's J (sensitivity + specificity - 1). Optimal cutoffs and accuracy estimates in simulated samples were compared to population values. RESULTS: Optimal cutoffs in simulated samples ranged from ≥ 5 to ≥ 17 for n = 100, ≥ 6 to ≥ 16 for n = 200, ≥ 6 to ≥ 14 for n = 500, and ≥ 8 to ≥ 13 for n = 1,000. The percentage of simulated samples identifying the population optimal cutoff (≥ 11) was 30% for n = 100, 35% for n = 200, 53% for n = 500, and 71% for n = 1,000. On average, sensitivity was overestimated and specificity was underestimated: mean differences from population values were 6.5 percentage points (pp) and -1.3 pp for n = 100, 4.2 pp and -1.1 pp for n = 200, 1.8 pp and -1.0 pp for n = 500, and 1.4 pp and -1.0 pp for n = 1,000. CONCLUSIONS: Small accuracy studies may identify inaccurate optimal cutoffs and overstate accuracy estimates when data-driven methods are used.
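The resampling design described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code and does not use the study database: the population is a hypothetical stand-in generated from assumed score distributions and an assumed prevalence, and the function names (`youden_optimal_cutoff`, `make_population`, `simulate`) are invented for illustration. Two further assumptions are made where the abstract is silent: ties in Youden's J are broken in favor of the lowest cutoff, and each sample's estimates at its own optimal cutoff are compared with population values at the population optimal cutoff.

```python
import random

CUTOFFS = range(0, 31)  # EPDS total scores run from 0 to 30


def youden_optimal_cutoff(scores, labels, cutoffs=CUTOFFS):
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J.

    A score >= cutoff counts as screen-positive. Ties are broken in favor
    of the lowest cutoff (an assumption, not stated in the abstract).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for c in cutoffs:
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= c)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < c)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1  # Youden's J
        if best is None or j > best[0]:
            best = (j, c, sens, spec)
    return best[1], best[2], best[3]


def make_population(size, prevalence, rng):
    """Hypothetical stand-in population; NOT the study's database."""
    scores, labels = [], []
    for _ in range(size):
        case = 1 if rng.random() < prevalence else 0
        mean, sd = (14, 5) if case else (6, 4)  # assumed distributions
        scores.append(min(30, max(0, round(rng.gauss(mean, sd)))))
        labels.append(case)
    return scores, labels


def simulate(scores, labels, n, reps, rng):
    """Draw `reps` samples of size n and compare them with the population."""
    pop_cut, pop_sens, pop_spec = youden_optimal_cutoff(scores, labels)
    cuts, sens_diff, spec_diff = [], [], []
    while len(cuts) < reps:
        pick = rng.sample(range(len(scores)), n)  # without replacement
        s = [scores[i] for i in pick]
        y = [labels[i] for i in pick]
        if sum(y) in (0, n):  # need at least one case and one non-case
            continue
        c, se, sp = youden_optimal_cutoff(s, y)
        cuts.append(c)
        sens_diff.append(se - pop_sens)
        spec_diff.append(sp - pop_spec)
    return {
        "population_cutoff": pop_cut,
        "min_cutoff": min(cuts),
        "max_cutoff": max(cuts),
        "share_matching": sum(c == pop_cut for c in cuts) / reps,
        "mean_sens_diff_pp": 100 * sum(sens_diff) / reps,
        "mean_spec_diff_pp": 100 * sum(spec_diff) / reps,
    }


if __name__ == "__main__":
    rng = random.Random(2021)  # fixed seed so the run is repeatable
    pop_scores, pop_labels = make_population(5000, 0.12, rng)
    for n in (100, 500):
        print(n, simulate(pop_scores, pop_labels, n=n, reps=200, rng=rng))
```

Because the population here is synthetic, the printed figures will not match those reported in the abstract; the sketch only shows the mechanics of selecting a cutoff by Youden's J in repeated samples and measuring how far the results drift from population values.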
Authors: Elsa-Lynn Nassar; Brooke Levis; Marieke A Neyer; Danielle B Rice; Linda Booij; Andrea Benedetti; Brett D Thombs Journal: Int J Methods Psychiatr Res Date: 2022-04-01 Impact factor: 4.182