Sarah Hjorth, Rebecca Bromley, Eivind Ystrom, Angela Lupattelli, Olav Spigset, Hedvig Nordeng.
Abstract
In recent years there has been increased attention to child neurodevelopment in studies on medication safety in pregnancy. Neurodevelopment is a multifactorial outcome that can be assessed by various assessors, using different measures. This has given rise to a debate on the validity of various measures of neurodevelopment. The aim of this review was twofold. First, we aimed to give an overview of studies on child neurodevelopment after prenatal exposure to medications acting on the central nervous system, using psychotropics and analgesics as examples, with a special focus on the use and validity of outcome measures. Second, we aimed to give guidance on how to conduct and interpret medication safety studies with neurodevelopmental outcomes. We conducted a systematic review in the MEDLINE, Embase, PsycINFO, Web of Science, Scopus, and Cochrane databases from inception to April 2019, including controlled studies on prenatal exposure to psychotropics or analgesics and child neurodevelopment, measured with standardised psychometric instruments or by diagnosis of neurodevelopmental disorder. The review management tool Covidence was used for data extraction. Outcomes were grouped as motor skills, cognition, behaviour, emotionality, or "other". We identified 110 eligible papers (psychotropics, 82 papers, analgesics, 29 papers). A variety of neurodevelopmental outcome measures were used, including 27 different psychometric instruments administered by health care professionals, 15 different instruments completed by parents, and 13 different diagnostic categories. In 23 papers, no comments were made on the validity of the outcome measure. In conclusion, establishing neurodevelopmental safety includes assessing a wide variety of outcomes important for the child's daily functioning, including motor skills, cognition, behaviour, and emotionality, with valid and reliable measures from infancy through to adolescence.
Consensus is needed in the scientific community on how neurodevelopment should be assessed in studies of medication safety in pregnancy. Review registration number: CRD42018086101 in the PROSPERO database.
Year: 2019 PMID: 31295318 PMCID: PMC6622545 DOI: 10.1371/journal.pone.0219778
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. PRISMA flowchart.
WOS: Web of Science.
Fig 2. Domains of neurodevelopment evaluated and data sources used in medication safety papers on psychotropics.
Some papers had outcomes from more than one domain and assessments from more than one type of assessor. HCP: Health care professionals.
Fig 3. Domains of neurodevelopment evaluated and data sources used in medication safety papers on analgesics.
Some papers had outcomes from more than one domain and assessments from more than one type of assessor. HCP: Health care professionals.
Validity and reliability of outcome measures as reported by the authors of the study, papers on antidepressants.
| Reference | Outcome | Reliability | Validity |
|---|---|---|---|
| Suri 2011 [ | BNBAS | Assessors were trained and certified to 0.90 reliability | Sensitive to medication effects. Limited normative base. Results that can be influenced by numerous subtle factors |
| Mortensen 2003 [ | Boel test | Not mentioned | Test was performed at children’s home, not under standard settings in which the test was developed. Surroundings could be associated with both exposure and outcome |
| Batton 2013 [ | Bayley Infant Neurodevelopment Screener | Not mentioned | Validated using clinical and normative standardisation samples |
| Weikum 2013a [ | BSID, unspecified edition | Not mentioned | Not mentioned |
| Gustafsson 2018 [ | BSID-II | Administrators had completed several practice administrations with repeated feedback prior to engaging in study administration of the BSID | Not mentioned |
| Oberlander 2004 [ | BSID-II | Not mentioned | Not mentioned |
| Reebye 2002 [ | BSID-II | Not mentioned | Not mentioned |
| Reebye 2012 [ | BSID-II | Not mentioned | Not mentioned |
| Santucci 2014 [ | BSID-II | Good reliability for infants from 1 to 42 months | Good concurrent validity for infants from 1 to 42 months |
| Austin 2013 [ | BSID-III | BSID is considered gold standard. Good reliability and internal consistency | Good validity. Used US norms. BSID is more predictive, the older the child |
| Hanley 2013 [ | BSID-III | Not mentioned | Not mentioned |
| Heikkinen 2002 [ | Gesell Development scales | Psychometric properties not mentioned (Comment from review authors: the paper does not state which psychometric instrument was used; we have taken this information from the 2003 paper) | |
| Heikkinen 2003 [ | Gesell Development scales | Not mentioned | Not mentioned |
| Johnson 2012 [ | Infant Neurological International Battery | Interrater reliability 0.97, test-retest 0.95 | Sensitivity 90%, specificity 83%, PPV 79%, NPV 93% |
| de Vries 2013 [ | Psychomotor assessment according to Prechtl | Interrater reliability of general movement assessment is high (89% to 93%) | Abnormal general movement quality is highly predictive of later neurological impairment, but might not be sensitive enough to detect minor neurological sequelae. The Motor Optimality Score and the concurrent repertoire might be more revealing. Monotonous movement is related to minor neurological dysfunction in preterm infants |
| Nulman 2002 [ | BSID-II, Reynell developmental language scale, MSCA | Not mentioned | Not mentioned |
| Nulman 1997 [ | BSID-II | Not mentioned | Not mentioned |
| Casper 2003 [ | BSID-II | The two psychologist assessors were reliability certified on the BSID-II annually | Not mentioned |
| Batton 2013 [ | BSID-III | Not mentioned | Not mentioned |
| Galbally 2011 [ | BSID-III | Test-retest reliability for total score is 0.9 at 12 months | Regarded as the reference standard in the assessment of infant and toddler development. Only moderate predictive validity |
| Hurault-Delarue 2016 [ | Compulsory medical exam | Not mentioned | Not mentioned |
| Schechter 2017 [ | DAS | Fidelity checks every 6-months by a licensed clinical psychologist | Standardised, age-normed, well-validated measure of cognitive ability |
| Johnson 2016 [ | DAS-II, Test of early language development, 3rd edition (TELD-3) | Not mentioned | DAS-II: Measure is normalised. TELD-3: Not mentioned |
| Galbally 2015 [ | Movement ABC, WPPSI-III | WPPSI, see validity. Motor ABC, reliability not mentioned | WPPSI is regarded as the gold standard for assessing cognitive ability. Motor ABC, validity not mentioned |
| Mattson 2002 [ | WPPSI-R | Not mentioned | Not mentioned |
| Nulman 1997 [ | Reynell developmental language scale, MSCA | Not mentioned | Not mentioned |
| El Marroun 2017 [ | SON-R (shortened), NEPSY-II | SON-R: Reliability was 0.73 for Mosaics and 0.71 for Categories. NEPSY-II: Not mentioned | SON-R: Validated in Dutch. Correlation to full scale, r = 0.86. NEPSY-II: Not mentioned |
| Hermansen 2016 [ | NEPSY-II (shortened), WPPSI-R | Not mentioned | Not mentioned |
| Nulman 2012 [ | WPPSI-III | Not mentioned | Child IQ levels were considerably higher than the normative mean of the general population. High IQs may reflect the Flynn effect. Mothers who contacted Motherisk may have had higher IQs than those who did not. Canadian norm for full-scale IQ is five points higher than the usual norm |
| Nulman 2015 [ | WPPSI-III | Not mentioned | Not mentioned |
| Mattson 2002 [ | WISC-III | Not mentioned | Not mentioned |
| Brandlistuen 2015 [ | CBCL | Reliability was adequate, Cronbach’s α 0.62 | The subset of items used in the MoBa study was found to be representative, with a correlation of 0.92 with the full scale |
| Reebye 2002 [ | Early infant temperament questionnaire | Not mentioned | Not mentioned |
| Netsi 2015 [ | Infant Characteristic Questionnaire, Brief Infant Sleep Questionnaire | Measures are reliable | Questionnaires relied heavily on maternal report and are subject to maternal bias |
| Nulman 1997 [ | Toddler temperament scale | Not mentioned | Not mentioned |
| Nulman 2002 [ | Toddler temperament scale | Not mentioned | Not mentioned |
| Handal 2016a [ | ASQ | Not mentioned | Validation study was done. For gross motor the estimated Spearman correlation was 0.25 with Mullen scores. For fine motor the correlation was 0.40 |
| El Marroun 2017 [ | BRIEF | Good test-retest reliability | Good content validity |
| Galbally 2015 [ | CBCL, BRIEF | CBCL, see validity. BRIEF, reliability not mentioned | CBCL is reliable, valid and widely used. BRIEF, validity not mentioned |
| Brandlistuen 2015 [ | CBCL | Reliability was adequate, Cronbach’s α 0.74 | The subset of items used in the MoBa study was found to be representative, with a correlation of 0.92 with the full scale |
| Hanley 2015 [ | CBCL | Not mentioned | Maternal report poses some risk of differential outcome reporting |
| Johnson 2016 [ | CBCL | Not mentioned | CBCL is not a diagnostic tool, but children scoring high on the PDD subscale are more likely to evidence behaviours commonly associated with ASD |
| Misri 2006 [ | CBCL | Test-retest reliability r = 0.85, interrater reliability r = 0.61 | Not mentioned |
| Nulman 2002 [ | CBCL | Not mentioned | Not mentioned |
| Oberlander 2007 [ | CBCL | Not mentioned | Widely used and well validated. Close relationship between maternal depression and maternal report of child behaviour. Increased level of aggression could represent lower maternal thresholds for tolerating preschool behaviour |
| Oberlander 2010 [ | CBCL | Not mentioned | Widely used and well validated |
| Lupattelli 2018 [ | CBCL, EAS | CBCL: Not mentioned in the article, but in the supplement internal consistency is reported quantitatively for each subscale. EAS: Moderate internal consistency of Norwegian version. In the supplement internal consistency is reported quantitatively for each subscale | Due to parent reporting misclassification cannot be ruled out, but was probably non-differential. CBCL: Widely used and validated. Predictive validity for adolescent psychiatric disorders showed a sensitivity of 0.71 and a specificity of 0.92. EAS: The short version used was highly correlated to full scale |
| Handal 2016b [ | Intelligibility/Complexity of 3-year-old Children’s Utterances | Not mentioned | Validation study showed that children categorised by their mothers as having no language delay achieved a higher score on the communication domain of the Vineland Adaptive Behavior Scale than the children categorised as having severe language delay by mothers (Comment from review authors: the Vineland is a structured interview of parents by the HCP) |
| Skurtveit 2014 [ | Intelligibility/Complexity of 3-year-old Children’s Utterances | Not mentioned | Parental self-report is generally a good measure of early expressive vocabulary, especially for severe language delay. The validity of the language grammar rating scale has been described earlier |
| Pedersen 2013 [ | SDQ | Not mentioned | Validated in Danish. Not intended to predict underlying psychiatric disorder. Relatively low sensitivity based on single informants |
| Hutchison 2019 [ | BRIEF | Not mentioned | Maternal perception of her child may have been negatively affected by depression, yet study using performance based measures of executive function found similar results |
| Hermansen 2016 [ | CBCL | See validity | Widely used and standardised. Excellent reliability and validity |
| Nulman 1997 [ | CBCL | Not mentioned | Not mentioned |
| Nulman 2012 [ | CBCL, CPRS | Not mentioned | Maternal perception of her child may have been negatively affected by depression and its associated anxiety and stress |
| Nulman 2015 [ | CBCL, CPRS-R | Not mentioned | Severity of maternal depression may influence responses when filling out questionnaires. Bias minimised as mothers evaluate both her exposed and unexposed children simultaneously (Comment from review authors: the mother still knows that one child was exposed and the other was not) |
| El Marroun 2014 [ | CBCL and Social responsiveness scale (SRS) | Correlations between the CBCL measurements at different ages fell in the expected range, based on a mean correlation (r = 0.60). Cronbach’s α indicated high inter-item reliability for the SRS (α = 0.79) | CBCL: Validated in Dutch. Good predictive validity to identify preschoolers at risk of autism spectrum disorder. Scales could not be normalised, so scores were dichotomised. The SRS correlated well with the pervasive developmental problems scale of the CBCL, r = 0.59 |
| Hanley 2015 [ | HBQ-P | See validity | Maternal report poses some risk of differential outcome reporting. The HBQ-P has strong psychometric properties |
| Weikum 2013b [ | HBQ-P | See validity | The HBQ-P has strong psychometric properties. Mental health scales discriminate groups of children with and without signs of early psychopathology |
| Grzeskowiak 2016 [ | SDQ | Not mentioned | Validated in Danish. Excellent discrimination for the identification of emotional (AUC 0.80) and behavioural (AUC 0.89) disorders |
| Johnson 2016 [ | CBCL (other caregiver) | Not mentioned | CBCL is not a diagnostic tool, but children scoring high on the PDD subscale are more likely to evidence behaviours commonly associated with ASDs |
| Misri 2006 [ | CBCL (teacher) | Test-retest reliability r = 0.85, interrater reliability r = 0.61 | Not mentioned |
| Oberlander 2007 [ | CBCL (teacher) | Not mentioned | Widely used and well validated. Close relationship between maternal depression and maternal report of child behaviour. Increased level of aggression could represent lower maternal thresholds for tolerating preschool behaviour |
| Boukhris 2017 [ | ADHD | Not mentioned | Diagnosis defined from hospital diagnosis or redemption of a prescription for ADHD medication. Diagnoses of ADHD in the cohort were not validated. Sensitivity analysis on children with a diagnosis confirmed by neurologists and psychiatrists was consistent with those of the main analyses |
| Figueroa 2010 [ | ADHD | Not mentioned | ADHD identified from diagnoses or treatment from practice, not from any formalised test or direct observation. This could lead to false-positive and false-negative errors. The young age when ADHD was diagnosed represents a limitation. Possible that only the most severe cases of ADHD were identified or that other behavioural problems that resemble ADHD are included |
| Laugesen 2013 [ | ADHD | Not mentioned | Detection bias could have led to an overestimation of the association. Children with ADHD were identified based on hospital diagnoses and drug prescriptions. Patients with ADHD diagnosed by private psychiatrists or general practitioners and not prescribed drug treatment would be misclassified as not having ADHD |
| Man 2017 [ | ADHD | Not mentioned | The registry contains information from publicly funded healthcare medical records. Does not include data from private medical practitioners or hospitals |
| Castro 2016 [ | ASD and ADHD | Not mentioned | ICD-9 codes have a high sensitivity and specificity for ASD and ADHD versus generally healthy control. Case definition previously validated and included scores from autism diagnostic observation scale |
| Clements 2015 [ | ASD and ADHD | Not mentioned | For ASD diagnosis in registry, sensitivity was 1.00, specificity 0.91. For ADHD, sensitivity was 0.84, specificity 0.90 |
| Sujan 2017 [ | ASD and ADHD | Not mentioned | Previous research has validated the diagnoses in used registries. Associations were estimated excluding offspring with diagnoses before age 2 years to address concerns about validity of early neurodevelopmental diagnoses. This did not markedly alter results |
| Wibroe 2017 [ | ASD and ADHD | Not mentioned | Not mentioned |
| Malm 2016 [ | ASD, depression, anxiety, ADHD | Not mentioned | Quality of the registry has been validated and is good for psychiatric diagnoses |
| Liu 2017 [ | ASD, F30-39, F40-49, F70-79, F90-99 | Not mentioned | Cannot rule out detection bias, but similar associations were observed for all disorders, irrespective of age at onset. Cases were redefined as at least two hospital contacts for psychiatric disorders, but similar results were obtained |
| Boukhris 2016 [ | ASD | Not mentioned | Diagnoses of ASD in the cohort were not validated. Sensitivity analysis on children with a diagnosis of ASD confirmed by neurologists and psychiatrists was consistent with those of the main analyses |
| Brown 2017 [ | ASD | Not mentioned | ASD was defined as 2 or more outpatient diagnoses by either a paediatrician or psychiatrist, 1 or more diagnoses in hospital databases, after the age of 2 years. A similar definition using US insurance data had a positive predictive value of 87.4% |
| Croen 2011 [ | ASD | Not mentioned | Validation study of diagnoses in the registry against the Autism Diagnostic Interview–Revised and the Autism Diagnostic Observation Schedule–Generic; 94% of cases met criteria for ASD on both instruments, and 100% on at least one. A full review of diagnostic information recorded in cohort medical records demonstrated that at least 90% of ASD cases in the cohort meet DSM-IV criteria |
| Gidaya 2014 [ | ASD | Not mentioned | Register diagnosis of childhood autism was confirmed 94% of the time with an additional 3% classified with another ASD in validation study |
| Hviid 2013 [ | ASD | Not mentioned | A previous study showed that 94% of children registered with autism spectrum disorder diagnoses met diagnostic criteria in chart review. Not all children in the study have been followed throughout childhood. Some may receive a diagnosis of ASD at older ages. Detection bias cannot be ruled out |
| Janecka 2018 [ | ASD | Not mentioned | Not mentioned |
| Rai 2017 [ | ASD | Not mentioned | Previous validation studies found high validity of the diagnoses recorded in the registers |
| Sorensen 2013 [ | ASD, infantile autism | Not mentioned | The quality of the infantile autism diagnosis in the registry has been validated. 94% met the criteria for correct diagnosis |
| Viktorin 2017b [ | ASD | Not mentioned | The ASD diagnoses in the register have previously been validated |
| Harrington 2014 [ | ASD, developmental delay (DD) | Not mentioned | Diagnoses of ASD and DD were confirmed by trained clinicians using validated standardised instruments. Controls were screened and reclassified if appropriate |
| Brown 2016 [ | Disorders of speech/language, motor skills, and scholastic skills | Not mentioned | Diagnostic data were based on specialised health services rather than primary care. Some proportion of children with mild dysfunction may have been missed |
| Simon 2002 [ | Developmental delay of motor skills, developmental delay of speech | Not mentioned | Diagnosis required both a physician diagnosis and confirmation by a formal developmental evaluation. Examinations at outpatient paediatric visits are relatively crude screening measures. Use of these data may reduce bias in ascertainment of developmental delay, but it sacrifices sensitivity by limiting analyses to abnormalities detected during routine medical care |
| Viktorin 2017a [ | Intellectual disability | Not mentioned | Children with ID without clinical care are not captured, so the prevalence estimate of ID in the study may be an underestimate of the true prevalence in the population. Detection bias cannot be ruled out |
ADHD: Attention Deficit Hyperactivity Disorder, ASD: Autism Spectrum Disorder, ASQ: Ages and Stages Questionnaire, BNBAS: Brazelton Neonatal Behavioral Assessment Scale, BRIEF: Behaviour Rating Inventory of Executive Function, BSID: Bayley Scales of Infant Development, CBCL: Child Behaviour Checklist, CPRS: Conners Parent Rating Scale, DAS: Differential ability scales, EAS: Emotionality, Activity, Sociability Temperament Survey, HBQ-P: MacArthur Health and Behaviour Questionnaire, MSCA: McCarthy’s scales of children’s abilities, SDQ: Strengths and Difficulties Questionnaire, SON-R: Snijders–Oomen Niet-verbale intelligentie Test–Revisie, WISC: Wechsler Intelligence Scale for Children, WPPSI: Wechsler Preschool and Primary Scale of Intelligence.
Validity and reliability of outcome measures as reported by the authors of the study, papers on analgesics.
| Reference | Exposure | Outcome measure | Reliability | Validity |
|---|---|---|---|---|
| Salokorpi 1996 [ | Indomethacin | Autti-Rämö neurodevelopmental test battery | Not mentioned | Standardised for Finnish children |
| Amin 2008 [ | Indomethacin | BSID-II, mental development index | Not mentioned | Not mentioned |
| Al-Alaiyan 1996 [ | Indomethacin | Gesell development scales, revised | Not mentioned | Not mentioned |
| Avella-Garcia 2016 [ | Paracetamol | BSID, unspecified ed. | Not mentioned in the paper, but in supplement. Cronbach’s α 0.70 (good to moderate) | Not mentioned |
| Barr 1990 [ | ASA | Items from Gross Motor Scale (University of Oregon Medical School), Gesell and Bayley Scales, Wisconsin Fine Motor Steadiness Battery, and Halstead Reitan Neuropsychological Battery | Monthly reliability checks revealed good interrater reliability | For three variables, the best predictors were exam conditions. None of these measures are normally used with children this young |
| Klebanoff 1988 [ | ASA | Stanford Binet Intelligence Scale | 3 months test-retest, and in addition inter-rater, reliability was 0.83 | Test focuses on verbal abilities. Highly correlated to school performance |
| Streissguth 1987 [ | ASA | WPPSI | Not mentioned | IQ scores by age 4 years have a good predictive validity for later intellectual function |
| Avella-Garcia 2016 [ | Paracetamol | McCarthy Scales of Children’s Abilities, CAST* | McCarthy Scale of Children’s Abilities: Cronbach’s α 0.90. CAST: Cronbach’s α 0.64 (good to moderate) | Only mentioned for CAST: Sensitivity 100%, specificity 97% for ASD |
| Bornehag 2018 [ | Paracetamol | Swedish language development scale† | Not mentioned | Validated in Sweden |
| Liew 2016b [ | Paracetamol | TEACh-5 | The authors do not specify the reliability or validity but refer readers interested in psychometric properties to two papers on the psychometrics of the instrument | |
| Liew 2016c [ | Paracetamol | WPPSI-R, shortened | Not mentioned | Full-scale IQ was derived from the selected items. Danish WPPSI-R norms were not available, so Swedish norms were used. Therefore the distribution of IQ scores in the sample does not have a mean of 100 and SD of 15 |
| Markovic 2019 [ | NSAIDs | SON-R | Reliability of the tests used in the study was 0.73 and 0.71 | Correlation to full scale was r = 0.86. Validated in Dutch |
| Laue 2019 [ | Paracetamol | WISC-IV | Not mentioned | WISC is objective and less biased than parental report |
| Vlenterie 2016 [ | Paracetamol | Motor milestone questionnaire, ASQ, CBCL, EAS | Motor milestone is believed to be objective and therefore a reliable maternal report on motor development. ASQ: Not mentioned. CBCL: Not mentioned. EAS: The short form was as reliable and precise as the full scale | Motor milestones: Not mentioned. ASQ: Validated in Norway. CBCL and EAS: Parent-reported behaviour outcomes can suffer from differential misclassification |
| Wood 2016a [ | Triptans | ASQ, CBCL, EAS | ASQ: The questions had excellent test–retest reliability and agreement between parents and professional examiners. CBCL: Not mentioned. EAS: In a Norwegian sample, internal consistency (α) within each scale ranged from 0.48 to 0.79 | The ASQ is predictive of school performance. The short version has been validated in Norway and in young children. CBCL: The shortened CBCL has been validated in Norway and in young children. Parents are better reporters of externalising symptoms, whereas children are better reporters of internalising symptoms. EAS has been validated in children as young as those studied |
| Markovic 2019 [ | NSAIDs | CBCL | Internal consistency (α) 0.68 | Good validity, validated in Dutch. The subscales had good fit in international studies in diverse societies |
| Brandlistuen 2013 [ | NSAIDs and paracetamol | Motor milestone questionnaire, ASQ, CBCL, EAS | Maternal reports of gross motor milestone attainment have been reported to be highly reliable. ASQ: Not mentioned. CBCL: Not mentioned. EAS: The short form was as reliable and precise as the full scale | Motor milestones: Not mentioned. ASQ: Validated in Norway. Average factor loading 0.61 for fine motor and 0.75 for gross motor items, adequate reliability. Average factor loading 0.82 for communication, good reliability. (Comment from review authors: According to COSMIN principles, factor loadings are considered part of construct validity). CBCL: Correlation to CBCL full scale was 0.92. Average factor loading 0.58 for externalising and 0.52 for internalising behaviour scales, adequate reliability. EAS: Factor loading for emotionality 0.71, activity 0.68, for sociability 0.58, for shyness 0.69 |
| Liew 2014 [ | Paracetamol | SDQ | Reliable screening tool | With the cut-off used, the scale has high specificity for ADHD-like behaviours and 17% of children with problems on the SDQ have received a diagnosis of HKD |
| Liew 2016b [ | Paracetamol | BRIEF | The authors do not specify the reliability or validity but refer readers interested in psychometric properties to two papers on the psychometrics of the instrument | |
| Skovlund 2017 [ | Paracetamol and opioids | Intelligibility/Complexity of 3-year-old Children’s Utterances & ASQ | Not mentioned | Parental report is considered a valid measure and validation against clinical assessment has been described for the instrument in the cohort |
| Wood 2016b [ | Triptans | CBCL | Not mentioned | The shortened CBCL has been validated in Norway. The domains used have been shown to predict later psychopathology in children and adolescents |
| Wood 2016c [ | Triptans | ASQ, EAS | Not mentioned | The ASQ is predictive of school performance. This short version has been validated in Norway. EAS: Not mentioned |
| Harris 2018 [ | Triptans | ASQ, CBCL, EAS | Not mentioned in the article, but in the supplement internal consistency is reported quantitatively for each subscale | Not mentioned |
| Stergiakouli 2016 [ | Paracetamol | SDQ | SDQ is a validated and reliable screening instrument | |
| Tovo-Rodrigues 2018 [ | Paracetamol | SDQ | Not mentioned | Validated for a Brazilian population and for the studied age group |
| Thompson 2014 [ | Paracetamol and ASA | SDQ, CPRS:R-L (only for paracetamol) | SDQ has a test-retest reliability of 0.62 after 4 to 6 months. Internal consistencies of the subscales range from 0.62 to 0.75 | Self-reported problem behaviour has been shown to be a more valid indicator of mental and physical health than parent-reported problems |
| Ruisch 2018 [ | Paracetamol and ASA | Development and Well-Being Assessment | Inter-rater differences between maternal and teacher assessments could reflect different behaviours in different settings or bias in the assessment | Good validity |
| Avella-Garcia 2016 [ | Paracetamol | California Preschool Social Competence Scale, ADHD, DSM-IV form list | California Preschool Social Competence Scale: Cronbach’s α 0.89. ADHD, DSM-IV form list: Cronbach’s α 0.90 | Not mentioned |
| Liew 2016b [ | Paracetamol | BRIEF | The authors do not specify the reliability or validity but refer readers interested in psychometric properties to two papers on the psychometrics of the instrument | |
| Markovic 2019 [ | NSAIDs | CBCL | Not mentioned | Not mentioned |
| Thompson 2014 [ | Paracetamol and ASA | SDQ (children) | SDQ has a test-retest reliability of 0.62 after 4 to 6 months. Internal consistencies of the subscales range from 0.62 to 0.75 | Self-reported problem behaviour has been shown to be a more valid indicator of mental and physical health than parent-reported problems |
| Ruisch 2018 [ | Paracetamol and ASA | Development and Well-Being Assessment | Inter-rater differences between maternal and teacher assessments could reflect different behaviours in different settings or bias in the assessment | Good validity |
| Rubenstein 2019 [ | Opioids | Developmental delay and ASD | Not mentioned | All children in the study were screened for ASD and had general developmental evaluations by a clinician. Children with positive screenings had comprehensive ASD evaluations |
| Janecka 2018 [ | Triptans and paracetamol | ASD | Not mentioned | Not mentioned |
| Liew 2016a [ | Paracetamol | Infantile autism and ASD | Not mentioned | Diagnoses of ASD were ascertained from the general and psychiatric hospital registries in Denmark using standardised diagnostic criteria. Diagnoses of infantile autism in the psychiatric registry have previously been shown to have high validity |
| Liew 2014 [ | Paracetamol | HKD | Not mentioned | Children who received diagnoses solely prior to 5 years of age were not considered as having HKD due to higher diagnostic uncertainty at younger ages. 79% of all children diagnosed with HKD had redeemed medications at least twice |
| Liew 2019 [ | Paracetamol | ADHD | Diagnoses of ADHD were ascertained through maternal report. This method is reliable | In a validation study, all girls with maternal report of ADHD scored above 90% on ADHD Rating Scale-IV, as did 64% of boys. ADHD prevalence in the study was comparable to estimates by Centers for Disease Control and Prevention |
| Ystrom 2017 [ | Paracetamol | ADHD | Not mentioned | Children who received diagnosis before 3 years of age were excluded. Cases were children with one or more diagnoses from specialist health care |
*Structured interview of parents by the health care professional
† Mixture of parental questionnaires and nurse observation
ADHD: Attention Deficit Hyperactivity Disorder, ASA: Acetylsalicylic acid, ASD: Autism Spectrum Disorder, ASQ: Ages and Stages Questionnaire, BRIEF: Behaviour Rating Inventory of Executive Function, BSID: Bayley Scales of Infant Development, CAST: Childhood Autism Spectrum Test, CBCL: Child Behaviour Checklist, CPRS:R-L: Conners’ Parent Rating Scale, revised, long format, EAS: Emotionality, Activity, Sociability Temperament Survey, HKD: Hyperkinetic disorder, SDQ: Strengths and Difficulties Questionnaire, SON-R: Snijders–Oomen Niet-verbale intelligentie Test–Revisie, TEACh-5: Test of everyday attention, 5 years, WISC: Wechsler Intelligence Scale for Children, WPPSI: Wechsler Preschool and Primary Scale of Intelligence.
Fig 4. Psychometric properties of the neurodevelopment outcome measure mentioned in medication safety papers on psychotropics and analgesics.
*Some papers commented on both validity and reliability, and those papers that commented on specific types of validity or reliability could comment on more than one type. One study used both diagnoses and psychometric instruments. Therefore the numbers add up to more than the total number of papers.
Validity and reliability of outcome measures as reported by the authors of the study, papers on psychotropics except antidepressants.
| Reference | Exposure | Outcome measure | Reliability | Validity |
|---|---|---|---|---|
| Platt 1989 [ | Antipsychotics | BSID | Not mentioned | Measures obtained at different ages may differ in sensitivity. The categorical measure reported was not used in original examination, but derived by combining relevant motor items which showed some variance in study population, and dichotomising resulting scores to maximise drug effects |
| Peng 2013 [ | Antipsychotics | BSID-III | Not mentioned | The scale is widely used and has potential to provide clinically relevant information on early neurodevelopment |
| Johnson 2012 [ | Antipsychotics | Infant Neurological International Battery | Interrater reliability 0.97, test-retest 0.95 | Sensitivity 90%, specificity 83%, PPV 79%, NPV 93% |
| Mortensen 2003 [ | Antipsychotics, anxiolytics | Boel test | Not mentioned | The test was performed in the children’s homes, not under the standard settings in which it was developed. The surroundings could be associated with both exposure and outcome |
| Oberlander 2004 [ | Anxiolytics (clonazepam combined with SSRI) | BSID-II | Not mentioned | Not mentioned |
| Reebye 2002 [ | Anxiolytics (clonazepam combined with SSRI) | BSID-II | Not mentioned | Not mentioned |
| Reebye 2012 [ | Anxiolytics (clonazepam combined with SSRI) | BSID-II | Not mentioned | Not mentioned |
| Viggedal 1993 [ | Anxiolytics | Griffiths’ mental development scale I, Neuropsychological assessment | Griffiths’: Not mentioned. Neuropsychological assessment: Deviations from normal activity and attention were diagnosed when present at two independent observations | Griffiths’: The high IQ in the reference group is considered normal as the average IQ nowadays is about 110. Neuropsychological assessment: Not mentioned |
| Gidai 2008a [ | Anxiolytics (alprazolam) | Hungarian development test, Behavioural style questionnaire | Not mentioned | Not mentioned |
| Gidai 2008b [ | Anxiolytics (medazepam) | Hungarian development test, Behavioural style questionnaire | Not mentioned | Not mentioned |
| Gidai 2008c [ | Anxiolytics (chlordiazepoxide) | Hungarian development test, Behavioural style questionnaire | Not mentioned | Not mentioned |
| Timmermann 2008b [ | Anxiolytics (meprobamate) | Hungarian development test, Behavioural style questionnaire | Not mentioned | Not mentioned |
| Laegreid 1992 [ | Anxiolytics | Touwen Neurologic Assessment, Clinical neurologic assessment | Not mentioned | The test showed a fair differentiation and an evident developmental sequence |
| Petik 2008a [ | Hypnotics (glutethimide) | Hungarian development test, Behavioural style questionnaire | Not mentioned | Not mentioned |
| Petik 2008b [ | Hypnotics (amobarbital) | Hungarian development test, Behavioural style questionnaire | Not mentioned | Not mentioned |
| Timmermann 2008a [ | Hypnotics (barbital, hexobarbital, butobarbital) | Hungarian development test, Behavioural style questionnaire | Not mentioned | Not mentioned |
| Hurault-Delarue 2016 [ | Antipsychotics, anxiolytics and hypnotics | Compulsory medical exam | Not mentioned | Not mentioned |
| Schechter 2017 [ | Antipsychotics, anxiolytics, hypnotics | DAS | Fidelity checks were done approximately every 6 months by a licensed clinical psychologist | The DAS is a standardised, age-normed, well-validated measure of cognitive ability |
| Hartz 1975 [ | Anxiolytics (meprobamate, chlordiazepoxide) | Stanford Binet Intelligence Scale | Not mentioned | Not mentioned |
| Mattson 2002 [ | Anxiolytics | WPPSI-R | Not mentioned | Not mentioned |
| Platt 1989 [ | Antipsychotics | Paediatric neurologic assessment | Not mentioned | Not mentioned |
| Mattson 2002 [ | Anxiolytics | WISC-III | Not mentioned | Not mentioned |
| Reebye 2002 [ | Anxiolytics (clonazepam combined with SSRI) | Early Infancy Temperament Questionnaire | Not mentioned | Not mentioned |
| Lupattelli 2019 [ | Anxiolytics, hypnotics | ASQ, CPRS-R | ASQ: Internal consistency was 0.6 to 0.7. CPRS-R: Internal consistency 0.9 | Widely used and validated. In the supplement, the authors present associations between ASQ and diagnosis of motor delay (Beta 1.96 to 4.48) or language impairment (Beta 2.06 to 2.45), and between CPRS-R and parental ADHD symptoms (Beta 0.12 to 0.30 for paternal symptoms and 0.41 to 0.76 for maternal symptoms) |
| Brandlistuen 2017 [ | Anxiolytics, hypnotics | CBCL (shortened) | Not mentioned | Validated, representative of full scale, the domains predict later psychopathology. High factor loadings |
| Misri 2006 [ | Anxiolytics (clonazepam combined with SSRI) | CBCL | Test-retest reliability r = 0.85, interrater reliability r = 0.61 | Not mentioned |
| Odsbu 2015 [ | Anxiolytics | Intelligibility/Complexity of 3-year-old Children’s Utterances | Not mentioned | Parental self-report is a good measure of early expressive vocabulary, especially for severe language delay. Validity of the language grammar rating scale in the cohort has been described previously |
| Radojčić 2017 [ | Anxiolytics | CBCL | Cronbach's alphas for all scales were the same in 6-year-old children and in older children, indicating that problems were reliably measured in children older than 6 years of age | Validated in the Netherlands |
| Misri 2006 [ | Anxiolytics (clonazepam combined with SSRI) | CBCL | Test-retest reliability r = 0.85, interrater reliability r = 0.61 | Not mentioned |
| Radojčić 2017 [ | Anxiolytics | CBCL | Cronbach's alphas for all scales were the same in 6-year-old children and in older children, indicating that problems were reliably measured in children older than 6 years of age | Validated in the Netherlands |
| Figueroa 2010 [ | Anxiolytics | ADHD | Not mentioned | ADHD identified from diagnoses or treatment from practice, not from any formalised test or direct observation. This could lead to false-positive and false-negative errors. The young age when ADHD was diagnosed represents a limitation. It is possible that only the most severe cases of ADHD were identified or that other behavioural problems that resemble ADHD are included |
| Janecka 2018 [ | Lithium | ASD | Not mentioned | Not mentioned |
ADHD: Attention Deficit Hyperactivity Disorder, ASD: Autism Spectrum Disorder, ASQ: Ages and Stages Questionnaire, BSID: Bayley Scales of Infant Development, CBCL: Child Behaviour Checklist, CPRS-R: Conners’ Parent Rating Scale, revised, DAS: Differential Ability Scales, NPV: Negative predictive value, PPV: Positive predictive value, SSRI: Selective serotonin reuptake inhibitor, WISC: Wechsler Intelligence Scale for Children, WPPSI: Wechsler Preschool and Primary Scale of Intelligence.
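The table reports sensitivity, specificity, PPV and NPV side by side for one instrument. Sensitivity and specificity are properties of the instrument, whereas PPV and NPV also depend on how common the condition is in the population studied. The sketch below is illustrative only: it applies Bayes' theorem to the sensitivity (90%) and specificity (83%) quoted in the table, while the prevalence values and the function name `predictive_values` are hypothetical assumptions and are not taken from any of the reviewed papers.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) computed with Bayes' theorem.

    All three arguments are proportions between 0 and 1.
    """
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")

    # Expected proportions of the four cells of the 2x2 table
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)

    if true_pos + false_pos == 0 or true_neg + false_neg == 0:
        raise ValueError("predictive values are undefined for these inputs")

    ppv = true_pos / (true_pos + false_pos)  # P(condition | positive test)
    npv = true_neg / (true_neg + false_neg)  # P(no condition | negative test)
    return ppv, npv


if __name__ == "__main__":
    # Hypothetical prevalences, chosen only to show the dependence
    for prevalence in (0.05, 0.20, 0.40):
        ppv, npv = predictive_values(0.90, 0.83, prevalence)
        print(f"prevalence={prevalence:.2f}  PPV={ppv:.2f}  NPV={npv:.2f}")
```

With sensitivity and specificity held fixed, PPV rises and NPV falls as prevalence increases. Predictive values reported in a validation sample may therefore not carry over to a cohort in which the outcome is rarer or more common.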