Gerrit Hirschfeld (1), Pedro Emmanuel Alvarenga Americano do Brasil (2)
(1) German Pediatric Pain Center, Children's Hospital, Dr.-Friedrich-Steiner Str. 5, 45711 Datteln, Germany; Children's Pain Therapy and Paediatric Palliative Care, Witten/Herdecke University, 45711 Datteln, Germany. Electronic address: gerrit.hirschfeld@gmail.com.
(2) Instituto de Pesquisa Clínica Evandro Chagas, Fundação Oswaldo Cruz, Av. Brasil 4365, CEP 21040-360, Rio de Janeiro, Brazil.
Abstract
OBJECTIVES: Many diagnostic studies are aimed at defining "optimal" thresholds. Here, we evaluate the performance of empirically defined optimal thresholds (1) in the sample in which they were defined and (2) in the population from which the sample was drawn.
STUDY DESIGN AND SETTING: We simulated test results for 120,000 samples varying the number of people without a disease (n between 20 and 500), number of people with a disease (m between 20 and 500), the magnitude of the difference between group means [effect size (ES) between 0.5 and 4], and distributions (normal and log-normal). The thresholds associated with the maximal Youden index were defined as optimal. Performance was defined as the percentage of correct classifications in the sample and when applied to the whole population.
RESULTS: At the population level, the thresholds defined for the four ESs (0.5, 0.8, 2, and 4) yielded a median of 59%, 65%, 83%, and 97% correct classifications, respectively. At the sample level, the samples with similar characteristics yielded widely varying estimates of the performance that were systematically higher than at the population level.
CONCLUSION: Researchers need to be careful defining cut points for mean differences that are traditionally considered "large" (ES = 0.8). The diagnostic utility of optimal thresholds needs to be assessed in prospective studies.
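The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' simulation code, and several details are assumptions made here for illustration: test values are drawn from normal distributions with unit standard deviation (the log-normal condition is omitted), the ES is taken as the mean difference in standard deviation units, a value at or above the cut point counts as a positive test, and the population prevalence is set equal to the proportion of diseased people in the sample. The function names `normal_cdf`, `youden_threshold`, and `simulate_once` are hypothetical.

```python
import math
import numpy as np


def normal_cdf(x, mean=0.0, sd=1.0):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def youden_threshold(healthy, diseased):
    """Return (cut, J) for the cut point maximising the Youden index
    J = sensitivity + specificity - 1 in the sample.
    Assumption: a test value >= cut is classified as diseased."""
    candidates = np.unique(np.concatenate([healthy, diseased]))
    best_cut, best_j = float(candidates[0]), -1.0
    for cut in candidates:
        sensitivity = np.mean(diseased >= cut)
        specificity = np.mean(healthy < cut)
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_cut, best_j = float(cut), float(j)
    return best_cut, best_j


def simulate_once(rng, n, m, effect_size):
    """Draw one sample, define the optimal cut point in it, and return
    (cut, accuracy in the sample, accuracy in the population)."""
    healthy = rng.normal(0.0, 1.0, size=n)           # n people without the disease
    diseased = rng.normal(effect_size, 1.0, size=m)  # m people with the disease
    cut, _ = youden_threshold(healthy, diseased)

    # Proportion of correct classifications in the sample itself.
    correct = np.sum(healthy < cut) + np.sum(diseased >= cut)
    sample_accuracy = float(correct) / (n + m)

    # Proportion correct in the population, computed from the true
    # distributions. Assumption: prevalence equals m / (n + m).
    prevalence = m / (n + m)
    pop_specificity = normal_cdf(cut, 0.0)
    pop_sensitivity = 1.0 - normal_cdf(cut, effect_size)
    population_accuracy = (prevalence * pop_sensitivity
                           + (1.0 - prevalence) * pop_specificity)
    return cut, sample_accuracy, population_accuracy


if __name__ == "__main__":
    rng = np.random.default_rng(2024)
    for es in (0.5, 0.8, 2.0, 4.0):
        runs = [simulate_once(rng, 50, 50, es) for _ in range(500)]
        sample_acc = np.median([r[1] for r in runs])
        pop_acc = np.median([r[2] for r in runs])
        print(f"ES={es}: median sample accuracy={sample_acc:.3f}, "
              f"median population accuracy={pop_acc:.3f}")
```

Under these assumptions, the sample accuracy is computed on the same data that were used to choose the cut point, so it tends to be optimistic, whereas the population accuracy with equal group sizes cannot exceed the value obtained at the midpoint between the two group means. Exact figures depend on the assumptions above and are not expected to reproduce the published medians.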
Keywords:
Computer simulation; Diagnostic techniques and procedures; Epidemiologic research design; Optimal threshold; Sensitivity and specificity; Youden index