R Christopher Sheldrick1, James C Benneyan2, Ivy Giserman Kiss3, Margaret J Briggs-Gowan4, William Copeland5, Alice S Carter3. 1. Developmental-Behavioral Pediatrics, Tufts University School of Medicine, Boston, MA, USA. 2. Healthcare Systems Engineering Institute, Colleges of Engineering and Health Sciences, Northeastern University, Boston, MA, USA. 3. Department of Psychology, University of Massachusetts Boston, Boston, MA, USA. 4. Department of Psychiatry, University of Connecticut Health Center, Farmington, CT, USA. 5. Department of Psychiatry and Behavioral Sciences, Duke University School of Medicine, Durham, NC, USA.
Abstract
BACKGROUND: The accuracy of any screening instrument designed to detect psychopathology among children is ideally assessed through rigorous comparison to 'gold standard' tests and interviews. Such comparisons typically yield estimates of what we refer to as 'standard indices of diagnostic accuracy', including sensitivity, specificity, positive predictive value (PPV), and negative predictive value. However, whereas these statistics were originally designed to detect binary signals (e.g., diagnosis present or absent), screening questionnaires commonly used in psychology, psychiatry, and pediatrics typically result in ordinal scores. Thus, a threshold or 'cut score' must be applied to these ordinal scores before accuracy can be evaluated using such standard indices. To better understand the tradeoffs inherent in choosing a particular threshold, we discuss the concept of 'threshold probability'. In contrast to PPV, which reflects the probability that a child whose score falls at or above the screening threshold has the condition of interest, threshold probability refers specifically to the likelihood that a child whose score is equal to a particular screening threshold has the condition of interest. METHOD: The diagnostic accuracy and threshold probability of two well-validated behavioral assessment instruments, the Child Behavior Checklist Total Problem Scale and the Strengths and Difficulties Questionnaire total scale were examined in relation to a structured psychiatric interview in three de-identified datasets. RESULTS: Although both screening measures were effective in identifying groups of children at elevated risk for psychopathology in all samples (odds ratios ranged from 5.2 to 9.7), children who scored at or near the clinical thresholds that optimized sensitivity and specificity were unlikely to meet criteria for psychopathology on gold standard interviews. 
CONCLUSIONS: Our results are consistent with the view that screening instruments should be interpreted probabilistically, with attention to where along the continuum of positive scores an individual falls.
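The distinction the abstract draws between PPV and threshold probability can be illustrated with a short simulation. The sketch below is purely illustrative and uses hypothetical numbers (base rate, score distributions, and cut score are assumptions, not values from the study): PPV conditions on scoring *at or above* the cut, so it is pulled upward by children scoring far above it, whereas threshold probability conditions on scoring *exactly at* the cut.

```python
import random

random.seed(0)

# Illustrative dataset: each child gets an ordinal screening score (0-40)
# and a binary diagnosis. Affected children tend to score higher, but the
# score distributions overlap, as on real instruments. All parameters here
# are hypothetical.
def simulate_child():
    has_condition = random.random() < 0.15           # assumed 15% base rate
    mean = 22 if has_condition else 10
    score = max(0, min(40, int(random.gauss(mean, 6))))
    return score, has_condition

children = [simulate_child() for _ in range(10_000)]
threshold = 17  # hypothetical cut score

tp = sum(1 for s, d in children if s >= threshold and d)
fp = sum(1 for s, d in children if s >= threshold and not d)
fn = sum(1 for s, d in children if s < threshold and d)
tn = sum(1 for s, d in children if s < threshold and not d)

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)              # P(condition | score >= threshold)

# Threshold probability: P(condition | score == threshold). Typically lower
# than PPV, because PPV pools in children scoring well above the cut.
at_cut = [d for s, d in children if s == threshold]
threshold_prob = sum(at_cut) / len(at_cut)

print(f"sensitivity={sensitivity:.2f}  specificity={specificity:.2f}")
print(f"PPV={ppv:.2f}  threshold probability={threshold_prob:.2f}")
```

In this simulated setting the threshold probability comes out well below the PPV, mirroring the paper's finding that children scoring at or near an optimized cut score are less likely to meet diagnostic criteria than the aggregate PPV suggests.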