Pavel Atanasov, Andreas Diamantaras, Amanda MacPherson, Esther Vinarov, Daniel M Benjamin, Ian Shrier, Friedemann Paul, Ulrich Dirnagl, Jonathan Kimmelman.
From Pytho LLC (P.A.), Brooklyn, NY; Department of Neurology (A.D.), Inselspital, Bern University Hospital, University of Bern, Switzerland; Biomedical Ethics Unit, Department of Social Studies of Medicine (A.M., E.V., D.M.B., J.K.), and Centre for Clinical Epidemiology, Lady Davis Institute, Jewish General Hospital (I.S.), McGill University, Montreal, Canada; Max Delbrueck Center for Molecular Medicine (F.P.), Berlin; Department of Neurology (F.P.), NeuroCure Clinical Research Center and Experimental and Clinical Research Center, Charité-Universitätsmedizin Berlin; Humboldt-Universität zu Berlin (U.D.), Charité-Universitätsmedizin Berlin, corporate member of Freie Universität Berlin; and Department of Experimental Neurology and Center for Stroke Research Berlin and QUEST Center for Transforming Biomedical Research (U.D.), Berlin Institute of Health, Germany. jonathan.kimmelman@mcgill.ca.
Abstract
OBJECTIVE: To explore the accuracy of combined neurology expert forecasts in predicting primary endpoints for trials.
METHODS: We identified one major randomized trial each in stroke, multiple sclerosis (MS), and amyotrophic lateral sclerosis (ALS) that was closing within 6 months. After recruiting a sample of neurology experts for each disease, we elicited forecasts for the primary endpoint outcomes in the trial placebo and treatment arms. Our main outcome was the accuracy of averaged predictions, measured using ordered Brier scores. Scores were compared against an algorithm that offered noncommittal predictions.
RESULTS: Seventy-one neurology experts participated. Combined forecasts of experts were less accurate than a noncommittal prediction algorithm for the stroke trial (pooled Brier score = 0.340, 95% subjective probability interval [sPI] 0.340 to 0.340 vs 0.185 for the uninformed prediction), and approximately as accurate for the MS study (pooled Brier score = 0.107, 95% confidence interval [CI] 0.081 to 0.133 vs 0.098 for the noncommittal prediction) and the ALS study (pooled Brier score = 0.090, 95% CI 0.081 to 0.185 vs 0.090). The 95% sPIs of individual predictions contained actual trial outcomes among 44% of experts. Only 18% showed prediction skill exceeding the noncommittal prediction. Independent experts and coinvestigators achieved similar levels of accuracy.
CONCLUSION: In this first-of-kind exploratory study, averaged expert judgments rarely outperformed noncommittal forecasts. However, experts at least anticipated the possibility of effects observed in trials. Our findings, if replicated in different trial samples, caution against the reliance on simple approaches for combining expert opinion in making research and policy decisions.
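As a rough illustration of the scoring described in METHODS, the sketch below computes an ordered Brier score for a forecast over ordered outcome bins and averages several experts' forecasts before scoring. It assumes one common formulation (a binary Brier score at each cumulative split between adjacent bins, averaged over the splits). The abstract does not specify the bins, the exact scoring variant, or how the noncommittal algorithm was constructed, so the function names, the three-bin example values, and the uniform forecast standing in for a noncommittal prediction are all hypothetical and are not data from the study.

```python
from itertools import accumulate


def ordered_brier(probs, outcome_index):
    """Ordered Brier score for one forecast over ordered bins.

    Assumed formulation: for each of the K-1 cut points between adjacent
    bins, collapse the forecast into "at or below the cut" vs "above the
    cut", take the two-outcome Brier score 2 * (p - o)**2, then average
    over cut points. 0 is a perfect forecast, 2 is the worst possible.
    """
    k = len(probs)
    if k < 2:
        raise ValueError("need at least two ordered bins")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    if not 0 <= outcome_index < k:
        raise ValueError("outcome_index out of range")

    cumulative = list(accumulate(probs))
    total = 0.0
    for cut in range(k - 1):
        p_low = cumulative[cut]                        # forecast mass at or below the cut
        o_low = 1.0 if outcome_index <= cut else 0.0   # did the outcome land there?
        total += 2.0 * (p_low - o_low) ** 2
    return total / (k - 1)


def average_forecasts(forecasts):
    """Simple unweighted average of several experts' probability vectors."""
    n = len(forecasts)
    return [sum(column) / n for column in zip(*forecasts)]


# Hypothetical example: three experts, three ordered outcome bins.
experts = [
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
    [0.0, 0.4, 0.6],
]
pooled = average_forecasts(experts)
noncommittal = [1 / 3, 1 / 3, 1 / 3]  # assumed stand-in for a noncommittal forecast

observed_bin = 0  # hypothetical: the trial result fell in the lowest bin
pooled_score = ordered_brier(pooled, observed_bin)
baseline_score = ordered_brier(noncommittal, observed_bin)
print(f"pooled: {pooled_score:.3f}, noncommittal: {baseline_score:.3f}")
```

Lower scores are better under this formulation, so a pooled forecast shows skill only when its score falls below that of the noncommittal baseline on the same outcome.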