Literature DB >> 30525112

A systematic review and meta-analysis of effects of early life non-cognitive skills on academic, psychosocial, cognitive and health outcomes.

Lisa G Smithers^1,2, Alyssa C P Sawyer^1,2, Catherine R Chittleborough^1,2, Neil M Davies^3,4, George Davey Smith^3,4, John W Lynch^5,6,7.

Abstract

Success in school and the labour market relies on more than high intelligence. Associations between "non-cognitive" skills in childhood, such as attention, self-regulation, and perseverance, and later outcomes have been widely investigated. In a systematic review of this literature, we screened 9553 publications, reviewed 554 eligible publications, and interpreted results from 222 better quality publications. Better quality publications comprised randomised experimental and quasi-experimental studies (EQIs), and observational studies that made reasonable attempts to control confounding. For academic achievement outcomes there were 26 EQI publications but only 14 were available for meta-analysis with effects ranging from 0.16 to 0.37SD. However, within sub-domains effects were heterogeneous. The 95% prediction interval for literacy was consistent with negative, null and positive effects (-0.13 to 0.79). Similarly heterogeneous findings were observed for psychosocial, cognitive and language, and health outcomes. Funnel plots of EQIs and observational studies showed asymmetric distributions and potential for small study bias. There is some evidence that non-cognitive skills associate with improved outcomes. However, there is potential for small study and publication bias that may over-estimate true effects, and heterogeneity of effect estimates spanned negative, null and positive effects. The quality of evidence from EQIs under-pinning this field is lower than optimal and more than a third of observational studies made little or no attempt to control confounding. Interventions designed to develop children's non-cognitive skills could potentially improve outcomes. The inter-disciplinary researchers interested in these skills should take a more strategic and rigorous approach to determine which interventions are most effective.

Entities: Chemical Disease Gene Species

Year: 2018 PMID： 30525112 PMCID： PMC6277013 DOI： 10.1038/s41562-018-0461-x

Source DB: PubMed Journal: Nat Hum Behav ISSN： 2397-3374

Introduction

It is over forty years since economists Bowles and Gintis1, in their critique of the US education system, pointed to the importance of skills for labour market success beyond those captured by intelligence, abstract reasoning and academic achievement in literacy and numeracy. They used the term “non-cognitive personality traits” (p. 116) and pointed to motivation, orientation to authority, internalization of work norms, discipline, temperament, and perseverance as characteristics that influenced life success. While it may be intuitive that there is more to success in life than high intelligence, there has been no attempt to systematically assess the research evidence on effects of improving different types of non-cognitive skills. We recognize there is no neat conceptual dichotomy separating cognitive from some non-cognitive skills, but for the purposes of this review we collectively label the diverse set of factors represented in the literature as “non-cognitive” skills. This literature includes studies that either manipulated non-cognitive skills through randomised controlled trials (RCTs) and quasi-experimental designs, or used observed differences in non-cognitive skills through longitudinal or cross-sectional studies. In observational (correlational) data, results from comparing outcomes for higher and lower levels of non-cognitive skills is often used as evidence for their importance in the same way as results from experimental studies. These non-cognitive skills include attention, executive function, inhibitory control, self-control, self-regulation, effortful control, emotion regulation, delay of gratification, and temperament (see Table 1 in Methods for our conceptualizations of these constructs). The importance of social skills for labour market success has been demonstrated2 but this review does not directly include improving social skills in early life as a non-cognitive ability, although the range of psychosocial outcomes includes social skills constructs. We sought to provide a systematic representation of research into non-cognitive abilities and behaviours. The need for such a systematic review is driven by the fact that these abilities are being considered by policy makers to underpin early life interventions3, beyond cognitive abilities (intelligence or IQ) and academic achievement (literacy, numeracy).

Table 1

Distribution of publications (n=554) by outcome domain, study type and quality*

		Outcome Domains
	Number of publications (%)	Academic achievement	Psycho-social	Cognitive and language	Physical health
1. ‘Better’ evidence	222/554(40%)
RCTs	41/222(18%)	22	27	18	2
Quasi experimental interventions	8/222(4%)	4	5	5	1
Twin studies (longitudinal or cross-sectional)	12/222(5%)	4	5	6	0
Observational longitudinal	127/222(57%)	58	52	14	23
Observational cross-sectional	34/222(15%)	14	19	9	5
2. ‘Weak’ evidence	119/554(21%)
Observational longitudinal	73/119(61%)	16	49	5	13
Observational cross-sectional	46/119(39%)	20	28	1	3
3. ‘Poor’ evidence	213/554(38%)
RCTs	1/213(<1%)	0	0	1	0
Observational longitudinal	79/213(37%)	25	46	6	15
Observational cross-sectional	123/213(62%)	29	80	28	16

Individual publications generated multiple outcomes. For example, there were 222 publications considered as ‘Better’ evidence that examined 293 outcomes.

The policy motivation to improve early life non-cognitive skills

This body of research spans disciplines including psychology, sociology, economics, health and education. It is also of great policy interest to governments in many countries3,4, who wish to sustain future economic productivity and social inclusion, by investing significant resources to bolster the development of human capabilities in early life5, especially for disadvantaged children. The investment logic is that children who develop cognitive and non-cognitive skills early in life have better outcomes later in life. The policy outcomes of most interest are longer-term including labour market success, welfare dependency, social relationships, better mental and physical health that ultimately lead to a more skilled, healthy and productive workforce. However, data on effects of early life cognitive skills on these kinds of later life outcomes are very limited. These generative processes are thought to involve initial investments begetting skills that enable future skills, given sustained investments. Non-cognitive skills, such as being able to sustain attention, may be especially important in this regard because they can scaffold later development of cognitive and non-cognitive abilities. It is argued that if these skills are not developed early in life, then it can be extremely difficult and expensive to compensate later in life, and this reduces returns on later investments6.

Diversity of non-cognitive skills

Since 2000, there has been a 400% increase in publications using keywords describing a variety of non-cognitive skills (Supplementary Figure 1). Several constructs comprise the set of non-cognitive skills reflected in this literature, including academic motivation7, responsibility and persistence8, temperament, sociability and behaviour problems9, locus of control and self-esteem10, and attention and socio-emotional skills11. Executive functions12 or cognitive control skills (e.g., aspects of how children deploy their cognitive abilities through inhibitory control and attention) may be closely related to cognitive skills, but are also distinguished from IQ, literacy and numeracy13. Personality traits such as self-esteem, patterns of thoughts, feelings, and behaviours that include perseverance, motivation, self-control, and conscientiousness have also been considered as non-cognitive or quasi-cognitive characteristics14. The term “character skills”15 has been used to promote the potential malleability of non-cognitive skills in contrast to the notion of personality traits that are thought to be more stable. Heckman and Kautz label these as “soft skills”7. Despite the conceptual complexity and potential overlap of some constructs, many different non-cognitive or personality or character or soft skills are represented in the literature. They have been the target of interventions, especially in early life when these traits are thought to be especially malleable16, and for disadvantaged children, who may benefit most6. Interventions to improve non-cognitive skills may directly improve outcomes7,15, or indirectly, through cognitive ability or other mechanisms. For instance, our own longitudinal analyses in three large population-based cohorts in the UK and Australia showed both cognitive abilities and non-cognitive skills were important in explaining socioeconomic inequalities in academic achievement early in life and that non-cognitive skills were only weakly associated with cognitive ability17.

Examples of the evidence for effects of early life non-cognitive skills

Non-cognitive abilities have been associated with a number of shorter and longer-term outcomes including mental health18,19, physical health20, school readiness and academic achievement21,22, crime23, employment and income10, and mortality24. Evidence from RCTs suggests that preschool interventions that improve school readiness may do so in part by increasing children’s ability to self-regulate their attention, emotion and behaviour25. Heckman has argued that interventions to develop these skills, especially in disadvantaged young children have the potential for high rates of return due to their positive effects in multiple life domains6. It is widely accepted that children’s cognitive ability (i.e., intelligence or IQ) associates with academic achievement and later success in adulthood26–29. However, the HighScope Perry Preschool Program started in 1962 suggests other mechanisms may be involved30,31. The intervention provided an active learning program based on Piagetian principles, for disadvantaged 3.5 year old African American children who had IQ scores on the Stanford Binet Test < 8532. In analysing the long term outcomes of the trial, Heckman et al.31, reported that while initially the intervention increased IQ these increases were not maintained to age 7-8 years. Despite this, children who received the intervention went on to enjoy more successful lives in adulthood including greater labour market success, reduced crime involvement and better health30,33,34. While we can find no evidence that the Perry Preschool Program deliberately set out to influence non-cognitive abilities, Heckman and colleagues argued that the intervention resulted in better outcomes for the participants not as a result of increasing their intelligence, but through fostering non-intelligence based socio-emotional “personality” skills31 (p. 2503). It should be noted that the program also improved maths, reading and language through age 14 and adult literacy so there may be an array of mechanisms operating through non-cognitive processes as well as IQ and/or aspects of academic achievement. Nevertheless, the argument proposed as to why the Perry Preschool program ‘worked’ is not dissimilar to the observations of Bowles and Gintis1 forty years ago. They argued that schooling does not make children more intelligent, rather, it socializes them into, and rewards, certain characteristics and behaviours that are valued in the labour market. The aim of this review was to systematically assess all published evidence concerning effects of non-cognitive skills among children up to age 12 on later outcomes. We do not review intervention studies that did not specifically aim to improve non-cognitive skills. Thus, some interventions such as the Perry Preschool30 and Abecedarian35 programs are not formally reviewed here because we could find no documented evidence that these programs specifically set out to improve non-cognitive abilities, and so were not eligible. We screened eligible publications and report results on associations between non-cognitive skills up to age 12. We grouped publications into four outcome domains - academic achievement (including literacy, numeracy and school readiness), cognitive and language development (including intelligence and language), psychosocial well-being (including mental health problems such as internalising and externalising problems, hyperactivity, social skills, and classroom behaviour), and health (including anthropometry and injury). In this manuscript we only report results from those publications we judged to be “better” evidence derived from RCTs and quasi-experimental studies grouped as experimental and quasi-experimental intervention studies (EQIs), and observational studies that made reasonable attempts to control for confounding (endogeneity) bias. However, all eligible publications were fully reviewed and for completeness are presented in Supplementary Tables 7 and 8.

Results

The systematic search identified 9553 articles from electronic and hand-searched sources. After removing duplicates and assessing eligibility, 554 articles were included and presented in a PRISMA36 flowchart (Figure 1). There were 49 (9%) publications involving RCTs and non-randomised quasi-experimental interventions that reported 85 outcomes, 69% of which were in the academic achievement and psychosocial outcome domains. (Table 1). Below we report this group of studies as Experimental/Quasi-experimental studies (EQIs). Observational studies (including twin studies) accounted for the other 91% of all publications, also dominated by publications in the academic achievement and psychosocial outcome domains. Individual studies and publications may have reported multiple outcomes across the domains.

Figure 1

Flow of publications through different stages of the systematic review

Table 1 shows that of the 554 eligible studies, only 40% (n=222) were rated as “better” evidence, 21.5% classified as weak and 38.5% as poor, where there was effectively no attempt to control confounding. The better evidence category does not imply that all of the publications in this category would be considered “strong” evidence in terms of their design and analysis. For example, some of the EQIs included in better evidence did not receive high quality ratings according to the Risk of Bias tool (Supplementary Table 6). We extracted and reported results separately for EQIs and observational publications included in the 222 better quality evidence publications (Supplementary Tables 2-5). This information is summarised in Figure 2 and Supplementary Figures 2a-19b, and 24-31, which display all studies where an effect size and standard error could be calculated.

Figure 2

Effect sizes from studies presenting “better quality” evidence according to outcome.

a, Experimental and quasi-experimental studies. b, Observational studies. NE, not estimable; Effect sizes were calculated from random effects meta-analysis with inverse variance weighting.

Academic Achievement Outcomes

Academic achievement outcomes mostly comprised reading, writing and numeracy, and were most commonly measured by the Woodcock Johnson (WJ) psycho-educational battery. For EQIs, Figure 2 (panel A) shows effect sizes ranged from 0.16SD (95%CI -0.02 to 0.34) for academic achievement and school readiness to 0.37 (95%CI 0.16 to 0.57) for numeracy. The 95% prediction interval for the 11 literacy studies available for meta-analysis was consistent with negative, null and positive effects (-0.13 to 0.79). For observational studies, Figure 2 (panel B) shows effect sizes ranged from 0.16 (95%CI 0.12 to 0.20) for literacy and 0.22 (95%CI 0.14 to 0.31) for academic achievement and school readiness. Prediction intervals were consistent with negative, null and positive effects, ranging from -0.01 to 0.33 for literacy and -0.07 to 0.52 for school readiness. Details of these publications are presented in Supplementary Table 2. Meta-analysis and forest plots are presented in Supplementary Figures 2a-4b. Supplementary Figures 24-25 graph effect size, age and length of follow up.

EQIs

There were 26 publications reporting ten cluster (school or class) RCTs, eleven individual RCTs, one study where the unit of randomisation was unclear, and four quasi-experimental intervention studies. These EQIs involved interventions delivered in usual preschool classes, special classes and groups additional to usual curriculum, at home, or a combination of these. Interventions ranged from training specific abilities (e.g. executive functions) to interventions that included several components. The interventions included teacher-delivered curriculum, teacher training to improve classroom behavioural management, and training parents in game-based activities. There was about twice as many EQI publications concerning teacher-delivered curricula than EQIs including both parent and teacher components. Median age at the time of intervention was 4.5 y. The median follow-up time was under 1 year. The oldest age at follow up was 20 y, from an intervention conducted in 1962, but no effect sizes were reported. The four largest cluster RCTs for literacy and numeracy ranged in effect sizes from 0.09 to 0.49SD (Supplementary Figures 2a-4b). The individually randomised trials were generally smaller and demonstrated effect sizes up to 0.81SD but were more heterogeneous with a 95% prediction interval for literacy consistent with negative and positive effects ranging from -0.91 to 1.79.

Observational

There were four publications of twin studies, 58 longitudinal (including four fixed effects analysis) and 14 cross-sectional publications, with three publications reporting results from multiple cohort studies. Non-cognitive abilities were measured at median age 5.0 y and median follow up of 1.5 years. The oldest age at follow up was 16 y. Study sizes ranged from 41 to 21,260. The measures of non-cognitive abilities included attention, executive function, inhibitory control, self-control, self-regulation and effortful control assessed via teacher-report, parent report and objective tests such as the Continuous Performance Task, Head Toes Knees Shoulders (HTKS) task and Stroop-like tasks. Effect sizes across observational publications were generally smaller than EQIs. Supplementary Figures 2a-4b show effect sizes ranging from negative effects (-0.57SD), to null, to 0.77SD for numeracy and similarly for literacy up to 0.80SD. However, 95% prediction intervals were generally narrower than for EQIs (e.g. -0.04 to 0.37 for numeracy). There was little evidence to conclude that any one measurement tool, measurement method (objective or subjective) or underlying non-cognitive construct was consistently associated with academic achievement.

Psychosocial Outcomes

Psychosocial outcomes included mental health problems (internalising and externalising behaviour), social skills, and aspects of school readiness, such as learning engagement. For EQIs, Figure 2 (panel A) shows effect sizes ranged from 0.23SD (95%CI 0.15 to 0.30) for externalising behaviour to 0.46SD (95%CI 0.31 to 0.61) for social skills. For observational studies, Figure 2 (panel B) shows effect sizes ranged from 0.13SD (95%CI 0.07 to 0.18) for social skills to 0.21SD (95%CI 0.15 to 0.28) for externalizing behaviour. The 95% prediction interval for all psychosocial outcomes were consistent with negative, null and positive effects. For example, the 95% prediction interval for externalising behaviour was -0.08 to 0.51. Details of these publications are presented in Supplementary Table 3. Meta-analysis and forest plots are presented in Supplementary Figures 5a-9b. Supplementary Figures 26-27 graph effect size, age and length of follow-up. Studies were not consistent in scoring of psychosocial outcomes, i.e. higher scores could indicate worse or better functioning. To aid reader’s interpretation of the results, we have converted all effects to be in the same direction so that positive effects indicate better psychosocial outcomes. However, Supplementary Table 3 presents the results as originally reported in individual publications. There were 32 publications reporting 15 cluster RCTs in classrooms, 12 individual RCTs, and five quasi-experimental intervention studies where the intervention was delivered in schools, sports classes, at home, or in community-based settings. Content of the interventions was diverse and included teacher-delivered curriculum sometimes specifically targeting self-regulatory abilities, parent-teacher engagement, teacher training to improve classroom behaviour, training parents in game-based activities, parental Motivational Interviewing and behaviour management, and martial arts. Median age at the time of intervention was 4.5 years with median follow up time less than one year. The oldest age at follow up was 13.5 y from a non-randomised intervention. Intervention groups ranged in size from n=16 to 314 for the individually-randomised trials and n=20 to ~3,350 for cluster RCTs (the largest RCT did not report the exact intervention number). For externalising outcomes, the 95% prediction interval for cluster RCTs was 0.10 to 0.37SD and -0.15 to 0.61SD for individual RCTs. Across RCTs there was no consistent evidence favouring one mode of intervention delivery over another. The three largest cluster RCTs that trialled well-known interventions (PATHS, ParentCorps, Incredible Years) and had both a teacher and parent engagement component37–39, only reported effects where p≤0.05 for three of the eleven outcomes studied. There were five publications of twin studies, 52 longitudinal and 19 cross-sectional publications. The five reasonably-sized twin studies that combined MZ and DZ twins (n ranged from 209-410 pairs) of children aged ~2-8 reported phenotypic correlations between non-cognitive abilities and internalising problems of 0 to -0.3, and -0.1 to -0.6 for fewer externalising problems. The longitudinal studies ranged in size from 49 to 12,158, and cross-sectional studies ranged from 42 to 2,978. Non-cognitive skills were measured at median age 5.0 years with median follow up of 8.2 years. The oldest age at follow up was 19.5 years. Exposures included attention, executive function, inhibitory control, self-regulation, emotion regulation, delay of gratification, effortful control, impulsivity, self-control and temperament, and were assessed by teacher-report, parent report and objective tests. Supplementary Figures 5a to 9b show effects from observational studies consistent with ~0.1 to 0.2SD but all 95% prediction intervals included the null. Observational studies of psychosocial outcomes were the most heterogeneous in terms of measuring exposures and outcomes, complicating interpretations of overall effect estimates. There was little evidence that attention, executive function and delay of gratification affected psychosocial outcomes. For inhibitory control, self-regulation, emotional regulation, impulsivity, self-control and temperament, there was some evidence of effects (0.1 to 0.7SD) on social skills and mental health problems. For effortful control, evidence was mixed, ranging from null to 0.85SD on externalizing behaviour.

Cognitive & Language Outcomes

Cognitive and language outcomes were typically assessed by measures of overall intelligence (such as the Wechsler suite of intelligence tests), verbal and performance intelligence, and language development including expressive and receptive vocabulary (such as the Peabody Picture Vocabulary Test). For EQIs, Figure 2 (panel A) shows the effect sizes ranged from 0.27SD (95%CI 0.01 to 0.53) for expressive vocabulary to 0.56SD (95%CI 0.14 to 0.99) for general cognitive development. No 95% prediction intervals could be calculated as there were fewer than three studies in each subdomain. For observational studies, Figure 2 (panel B) shows effect sizes ranged from 0.08SD (95%CI -0.01 to 0.17) for general cognitive development to 0. 20SD (95%CI 0.11 to 0.30) for total IQ. The 95% prediction interval could only be calculated for receptive vocabulary (-0.17 to 0.50) and general language skills (-0.12 to 0.33). Details of these publications are presented in Supplementary Table 4. Meta-analysis and forest plots are presented in Supplementary Figures 10a-16b. Supplementary Figures 28-29 graph effect size, age and length of follow up. There were 23 publications reporting 18 RCTs (two interventions were reported in six publications) and five quasi-experimental intervention studies. Of the RCTs, six were cluster (school or class) RCTs, one where the unit of randomisation was unclear, and eleven individual RCTs, involving programs delivered in schools or classrooms, at home, in a laboratory setting or a combination of classes and home. Three quasi-experimental interventions involved preschool programs and two involved computerised working memory and inhibitory control training. The content of the interventions was diverse in both delivery and specific focus on non-cognitive ability. Interventions ranged from narrow focused computer-based training to broader content and delivery by teachers in schools plus home visiting with parents. Median age at intervention was 4.3 years, with median follow up of less than one year, extending to 16 years. One RCT inconsistently reported effects of 0.15 and 0.25SD on the same language outcome, using the same sample at age five40,41 and an effect of 0.10SD at age 6 in a different publication42. There were six publications of twin studies, 14 longitudinal (including one fixed effect) and nine cross-sectional publications. The six twin studies that combined MZ and DZ twins (n ranged from 40-901 pairs) of children reported phenotypic correlations between non-cognitive abilities and intelligence of -0.36 to 0.23SD. The longitudinal and cross-sectional publications ranged in effect size from -0.38 (a cross-sectional convenience sample n=77 examining attention) to 0.56SD (a cross-sectional convenience-sample n=80 examining executive function). Exposure was measured at median age 4.5 years. The median duration of follow up for the longitudinal studies was less than one year and the longest follow up was to 12.4 years. Exposures included attention, executive function, self-regulation, effortful control, inhibitory control and temperament assessed via parent- and teacher-report questionnaires such as the Child Behavior Questionnaire, and objective tests such as the Continuous Performance Task and the HTKS task. There was no compelling evidence of effects of attention on cognitive and language outcomes from observational studies. For executive function effects ranged from a detrimental -0.36 to 0.52SD, but the evidence is predominantly from convenience samples. There were too few studies to make any judgments about the effects of effortful control and temperament. Most studies of self-regulation used the HTKS task and showed some effects on vocabulary.

Health Outcomes

There were two small RCTs, one quasi-experimental intervention, 23 longitudinal and five cross-sectional publications that ranged in size from 105 to >26,000. Details of these publications are presented in Supplementary Table 5. Meta-analysis and forest plots are presented in Supplementary Figures 17a-19b. Outcomes included anthropometry, injury, diet, substance use and health behaviours. There were three publications reporting one cluster RCT, one individual RCT and one quasi-experimental intervention study assessing effects on physical development, teen parenthood, and anthropometry. One quasi-experimental study reported an effect of 0.79SD, but this effect is difficult to interpret because of an inadequate description of the control group and the outcome. Median age at intervention was 4.4 years. Median follow up time was less than one year, with the oldest age at follow up of 17 years. Of the observational studies, the median age at exposure was 9.3 years. The median follow up time was 4.2 years and the oldest age at follow up was 55 years. Of the 28 observational studies, 12 involved various outcomes related to substance use but it is difficult to summarise these because studies either did not report effect sizes, or reported unstandardised effects or odds ratios. Observational studies showed little evidence for associations with any of these outcomes.

Assessment of Small Study (Publication) Bias

The funnel plots in Supplementary Figures 20a-23b depict effect sizes for experimental and observational studies separately, according to the standard error of the effect size. These include all publications where effect sizes were reported or able to be calculated, and reported exact p values or P<0.0543. Thus, all studies that reported P greater than some threshold were excluded. Funnel plots for both experimental and observational studies were positively skewed and consistent with smaller studies having larger effects. Egger regression coefficient p values were all <0.01. There was little evidence for differential small study bias comparing EQIs and observational studies.

Fade Out

Supplementary Figure 32 attempted to examine fade out effects44 by graphing reported effects at the end of intervention (or as close to end line as was reported) and at later follow-up. There were only four studies that could be included in this analysis, so interpretive caution is warranted with no clear pattern to support evidence of fade out effects.

Discussion

We reviewed 554 publications and provided interpretation of 222 (40%) better quality publications comprising RCTs, quasi-experimental (EQIs), fixed effects (including twin studies), longitudinal and some cross-sectional designs (observational studies). We set out to systematically examine the published literature on effects of non-cognitive skills up to age 12 on outcomes as they have been presented in the literature. We put no time limit on when outcomes were measured and we grouped them in domains of academic achievement, psychosocial, cognitive and language, and health. This review can say little about longer-term effects that are of central policy interest such as effects of non-cognitive skills on labour market experience because studies eligible for this review do not have data on longer-term outcomes or do not report it. Nor can this review say anything about the importance of non-cognitive skills on later outcomes that are developed as part of normal social interaction and/or the hidden curriculum of more general interventions where children indirectly develop a variety of non-cognitive skills and behavioural styles. We were limited to reporting what might be termed ‘proxy’ or ‘intermediate’ outcomes. While outcomes like academic achievement are clearly related to employment and labour market experience, this review cannot directly inform the role of non-cognitive ability on important outcomes later in life. Despite the policy enthusiasm and discussion of the importance of non-cognitive skills, the current body of evidence is severely limited given median follow-up periods for EQIs of only about one year. We must search elsewhere for evidence on longer-term outcomes because it is precisely in the realm of the labour market that non-cognitive skills may be most beneficial and rewarded. Overall, there is evidence from published EQIs supporting a role for non-cognitive skills in better academic achievement, psychosocial, and cognitive and language outcomes ranging from approximately 0.2 to 0.5SD depending on outcome as shown in Figure 2 Panel A.. We urge some caution in interpreting our results. Analysis of funnel plots clearly demonstrate asymmetry of effect size and the potential of small study bias43. Additionally, forest plots and 95% prediction intervals show large heterogeneity of reported effect sizes generally including the null. This suggests the overall meta-analysed effects from EQIs reported here may be over-estimates that include a null effect. Presenting the analysis in Figure 2 by separating EQIs (Panel A) and observational publications (Panel B) shows larger effects from EQIs than found in higher quality observational studies which ranged from approximately 0.06 to 0.22SD. This is the opposite of what is often seen where observational studies over-estimate effects found in large, well designed RCTs. This over-estimation is often due to residual and/or unmeasured confounding introduced by using observations of exposures rather than experimental manipulation of exposures45. Furthermore, (as pointed out by a reviewer) effect sizes from EQIs and observational studies would only be comparable if the EQI induced a SD change in the particular non-cognitive skill. In reality, effects of interventions on the target non-cognitive skill might be closer to 0.2 to 0.5 SD. So at 0.25 and with no bias, effects found in observational data would be expected to be four times larger than experimental impacts. Franco et al.46 found that among rigorously reviewed social science publications in the Time-Sharing in the Social Sciences National Sciences Foundation database that “strong” results were 40 percentage points more likely to be published than null results and 60 percentage points more likely to be written up. They argued this provided direct evidence of publication bias when researchers choose which results should be written up and presented for publication. It is possible that the published EQIs favour stronger statistically significant results if these are selected by researchers based on p-values. If the published EQIs are dominated by smaller studies with lower power, the overall EQI evidence may provide inflated meta-analysed effect estimates. However, we found little evidence of differences in potential small study and publication bias between EQIs and observational studies. Nevertheless, in academic achievement and psychosocial outcome domains, larger sample, cluster RCTs tended to generate smaller effects than individually randomized small RCTs. A recent meta-analysis of observational studies of over 14,000 children47 showed a mean effect size of 0.27 for inhibitory control on academic achievement. However, this meta-analysis did not exclude poor quality studies, and did not explore potential for small study bias. We deliberately selected higher quality observational studies with more stringent controls for confounding, so it is possible that true effects of non-cognitive skills are actually closer to those from higher quality observational studies that may include a null effect.

Main findings

Academic achievement outcomes

Intervention studies focussed on improving children’s non-cognitive skills at around 4 years of age with median follow-up under one year. These studies were generally consistent with 0.2 to 0.4SD short-term effects on academic achievement but effects were heterogeneous with 95% prediction intervals including negative, null and positive effects. Larger, higher quality RCTs showed effects from 0 to 0.3SD25,48–50. These larger higher-quality RCTs spanned child-focussed interventions on specific domains of non-cognitive skills (e.g. Tools of the Mind), to more teacher-focussed curricula (e.g. Chicago School Readiness), to more multi-dimensional content interventions that included parent, child and teacher (e.g. PATHS). Observational studies on academic achievement generally showed effects around 0.2SD but all 95% prediction intervals included the null. This is consistent with one higher quality observational publication11 which examined six different cohorts with longer follow-up of 5.5 years and reported effects from 0 to 0.2SD. Overall, there was insufficient evidence upon which to base a conclusion about the relative effectiveness of different modes and mechanisms of intervention on non-cognitive skills. Even within the same study, effect sizes differed according to which aspects of academic achievement were measured. For example, one RCT showed an effect on numeracy but not literacy49. Similarly, another RCT showed that effects on literacy depended on the component of literacy that was measured40,41 and effects on some outcomes faded after one year42.

Psychosocial outcomes

For psychosocial outcomes, the evidence from RCTs was dominated by studies of externalising problems, with fewer RCTs on social skills and internalising problems. Average age at the time of intervention was around 4 years with median follow up time under one year. Effects on externalising problems for EQIs was 0.23 (95% CI 0.15-0.30) with a 95% prediction interval of 0.07 to 0.39. Higher quality RCTs examining externalising outcomes, reported positive5152 and null effects in the largest of the RCTs37. These variable effects could be due to differences in the focus of intervention, mode of delivery (parent, teacher or both), or problems with implementation fidelity in larger trials. Similarly inconsistent results were reported for EQIs with social skills outcomes. The heterogeneity of effects is mirrored in the twin, longitudinal and cross-sectional studies. A good example of this is the inconsistent results reported in five publications that all used the same data source53–57. Across these five publications, interpretation of the effects of self-regulatory abilities depended on how the exposure (attention, delayed gratification, and inhibitory control) and outcome (social skills, withdrawal, and aggression) was measured. The different measures of attention had different associations with the same social skills outcome. Inhibitory control was associated with social skills and aggression, but not social withdrawal, whereas the effects of delayed gratification on social skills depended on whether the outcomes were directly observed or from maternal report. The psychosocial outcome studies were the most diverse in interventions (ranging from martial arts, to Motivational Interviewing and Tools of the Mind), and exposure and outcome measurement. This diversity reflected different approaches to improving children’s psychosocial outcomes, such as supporting parents or helping teachers to manage classroom behaviour. Each approach points to different conceptualisations of where psychosocial problems arise and for how, where and whom to intervene (e.g. teachers, psychologists, community nurses or social workers).

Cognitive and language outcomes

The relatively small number of studies in this outcome domain (n=23) produced a wide range of effects. Three reasonably-sized cluster RCTs provided the best estimate of the effect of non-cognitive skills on language and cognitive outcomes25,49. They found effects of ~0.1 to 0.2SD. The largest effect sizes were from a well-designed regression discontinuity study (0.44SD)58, a non-randomised intervention (0.55-0.73SD)59 and a small, low quality randomised trial60. However, all these studies were small (ranging from n=12 to 64) and reported effects that attenuated over time or were inconsistent at different ages. The observational studies provide little evidence that the effects are likely to be bigger than ~0.1SD, with seven of nine longitudinal studies showing few differences and cross-sectional studies reporting mixed effects (-0.38 to 0.56SD). The longitudinal studies were dominated by non-cognitive skills measured using the HTKS and the WJ Picture Vocabulary as the outcome, and despite the popularity of these measurement tools, the results indicate no effects on vocabulary outcomes. Thus, non-cognitive abilities appear to have effects on cognitive and language outcomes of ≤0.2SD.

Physical health outcomes

It is difficult to draw conclusions for physical health outcomes. There were only 3 EQIs reporting diverse outcomes. Outcomes reported across the 28 better quality observational studies were diverse ranging from anthropometry, to injury to physiological characteristics and were consistent with effects ranging from 0.06 to 0.14SD.

Limitations of this review

The compilation of 554 publications was systematic but our assessment of the quality of the evidence is based on our judgement of the potential for bias. Here we follow the approach of others who have argued for limiting systematic reviews and meta-analyses to higher quality evidence61,62. We a priori created criteria for bias based on well-established procedures including quality appraisal tools, evidence hierarchies, directed acyclic graphs and content knowledge about potential sources of confounding and selection bias. While this involves an element of subjective judgement, we are confident that any other reasonable assessment of the quality of evidence would not change the overall conclusions presented here. In the interests of transparency we have disclosed all of the subjective choices we have made in the Supplementary materials and text. It is possible that some relevant articles were not included in this review, even though we undertook an extensive search that included multiple databases, numerous search terms, contacting authors of potentially-relevant papers, and hand searching reference lists of published papers. Studies of systematic review methods have shown that the most difficult to find articles are in the ‘grey literature’, sometimes smaller, of poorer quality and the results unlikely to unduly influence the findings in an already large systematic review62.

The value of a systematic review

While there have been reviews of some aspects of non-cognitive skills,3,4,14,15,47,63 none have been systematic in covering the entire literature, or included screening for evidence quality. It has long been recognised in health and medical research that non-systematic reviews of research enable the selective use of evidence to support a particular argument62. For evidence consumers, who are often not evidence-quality specialists, competing claims about effects of non-cognitive abilities based on particular studies are hard to reconcile without the safety net of a systematic review. We have paid particular attention in this review to issues of quality of the primary evidence. There is little point in summarising evidence that includes obviously flawed studies that can only distort the overall results and reduce the value of the systematic nature of the review61,62,64. This review covers the entire published inter-disciplinary research field describing intentional efforts and observational analogues of interventions to improve development of non-cognitive skills, albeit with most evidence coming from rich countries, especially the USA. The scope of the review should minimise ‘cherry picking’ of results to bolster a particular concept, theory or intervention. This is necessary to advance knowledge given the multidisciplinary nature of this field and is central to informing interventions to boost life chances for disadvantaged children. In health sciences, major advances have been made by coming to agreement and attempting, where possible, to harmonise methods for measuring exposures, outcomes, synthesising and reporting of outcomes. This work includes collaborative efforts such as the EQUATOR network (http://www.equator-network.org/). Such efforts are needed to reduce waste in research64,65, and improve reproducibility of scientific findings66–68.

Implications for future research

What are the active ingredients of non-cognitive skills?

Research that has examined non-cognitive skills in childhood has spanned many disciplines and research traditions, leading to a large number of constructs and tasks being investigated that are sometimes similar in their definition and operationalisation69,70. In 1927, Kelley labelled this the “jangle” fallacy71 (p. 64) where constructs are given different names but in fact are virtually identical. This idea has been recently raised in regard to the construct validity of the concept of “grit”72. It was not uncommon for the same objective tasks to be used as measures of different conceptualisations of non-cognitive abilities. For example the Continuous Performance Task (aka “Go/No Go” task) is used in executive functioning research as a measure of sustained attention and inhibitory control, but it is also used as a measure of effortful control70. Similarly, the “Head Toes Knees Shoulders Task” has been used to measure both behavioural self-regulation73 and executive functioning74. The interventions we reviewed attempted to influence many different facets of non-cognitive skills. Policy makers and researchers ideally need to know what the ‘active ingredients’ are, in order to enhance children’s non-cognitive skills, and ultimately, the relative effectiveness of different interventions and different intervention doses. There are no strong scientific reasons to favour a specific skill over another, but nevertheless it remains important to better understand what the active ingredient(s) underlying non-cognitive skills might be, if we want to support their development.

Mechanisms of action

Theoretically, we might expect that interventions involving both parents and teachers might have larger effects on children’s outcomes. However, a recent meta-analysis of early childhood education programs found little evidence that those with parenting involvement produced larger effects, unless it involved a high dose of home visits75. Of the academic outcomes reviewed here, over half involved only preschool teachers. In our review, there is little evidence to decide which mode of delivery is best and we can find no evidence of attempts at purposive testing of which way to intervene (e.g. teacher, student, parent or various combinations). Purposive testing of delivery mode has been usefully deployed in the design of an RCT in regard to the nurse home visiting literature showing better effects using trained nurses compared to para-professionals76. Interventions that trained children in more specific skills such as executive function, generally showed small effects (e.g., Tools of the Mind)49. Other studies imply that non-specific interventions seem to generate better generalised outcomes31, which may suggest that more holistic programs including multi-dimensional content, may better support overall child development and broad-based benefits.

Head-to-Head Comparisons of Interventions

Comparative effectiveness research has been widely promoted in health and medical science as an important contribution to knowing which treatments are the most effective77,78. The potential for interventions on non-cognitive skills to influence outcomes may be enhanced by similar approaches. We could find almost no evidence of these sorts of purposeful comparative studies in this field. Exceptions were: Barnett et al.49 and Blair and Raver79 who examined effects of Tools of the Mind intervention in cluster RCTs and both found small effects of ~0.1SD for vocabulary. This exception highlights the potential value of these comparisons.

Designing studies for effect modification

In assessing the potential importance of non-cognitive skills for improving life chances, it is obvious that a combination of both high cognitive and non-cognitive ability would be desirable. If that expectation is correct then the effects of interest lie in a test of effect measure modification or interaction depending on what effect is of interest80. We found no publications attempting to test this theory, despite its obvious importance for judging how non-cognitive skills might influence later life outcomes. It is also of interest to test for differential effects of developing non-cognitive skills according to different characteristics of children such as age or socioeconomic background, and of intervention type and setting. However, we urge some caution in investigating differential effects in sub-groups when the basic evidence for effects of non-cognitive skills on outcomes such as academic achievement and psychosocial outcomes is already highly heterogeneous and consistent with null effects.

Long-term follow-up

There is a paucity of literature with long-term follow-up. Studies typically began at age 4-5, with median follow-up of about one year, and with very few studies with follow-up beyond age 10, there is very little evidence addressing effects on medium to longer-term outcomes. This is no doubt due to funding constraints. However, it is frequently argued that non-cognitive skills developed in childhood have major impacts on long-term adult outcomes6. Thus, interventions which have short term effects, but few detectable long-term effects are unlikely to be cost-effective. Therefore, longer-term follow-up of RCTs is especially important and are being supported by several funding agencies in education and elsewhere. Nevertheless, until such longer-term studies are reported, many of the claims in the literature that early interventions on a specific trait or with a particular intervention have major long-term effects are supported by very little empirical evidence.

Fade out

Recent concerns about the fade out of initially promising effects is crucial to consider in regard to the likely value of interventions early in life. Bailey et al.44 argue that interventions should target what they term “trifecta skills” (p.8). These skills are malleable, fundamental, and would not have developed eventually in the absence of the intervention. There were only four studies in which we could assess evidence for fade out and results were inconclusive. This seems another important facet to develop within the research portfolio around non-cognitive skills. Studies could be specifically designed to test the fade out hypothesis in rigorous ways.

Small Study Effects

Larger effects observed in smaller RCTs may be due to several factors including publication bias favouring positive results, true heterogeneity of effects due to differing baseline risks in different intervention populations, implementation difficulties in maintaining intervention intensity and fidelity in larger community settings, and poorer methodological design of smaller studies81. If smaller studies were better able to implement the intervention then larger effects might be real due to greater fidelity to the intervention as designed. On the other hand, publication bias favouring more positive results would mean larger effects from smaller studies would bias true effects upward. This is an important issue for practice and policy as it suggests that effects found in RCTs of small convenience samples may be overestimated or even non-existent. For example, when studies are scaled-up the results can be inconsistent or attenuate towards the null, perhaps suggesting that fidelity is harder because an expert is no longer delivering the intervention and/or that larger scale studies are unable to deliver as intensive interventions as small studies. A useful framework for considering such variation in intervention effects across different scales, contexts and population groups is presented in Weiss et al.82

Heterogeneity of Effects

This review clearly demonstrated large between-study heterogeneity from 95% prediction intervals that were consistent with negative, null and positive effects among sub-domains such as literacy and numeracy. It is possible to argue that this was inevitable in a field where there are many dimensions of non-cognitive skills being investigated in largely convenience samples, against a wide variety of measures of broad constructs such as literacy and numeracy. Perhaps that is so, but the field is nevertheless presented somewhat monolithically in the application of this science to broad-scale intervention and policy practice3. Quantifying the amount of heterogeneity is valuable in providing a baseline from which future research can investigate potential sources of this heterogeneity. For instance we sought to examine whether studies that used more representative population-based samples tended to generate smaller effect estimates, but the number of population-based samples in this field is actually rather small. For instance, of the 11 literacy EQIs able to be included in the meta-analysis only three were population based. For externalising behaviours, of 13 EQIs, only one was in a population-based sample.

Evidence Quality

Citing Practices

We reviewed recent RCTs to count the number of previous RCTs they cited. There were seven RCTs published from 2014-2016. There were 27 previous RCTs of non-cognitive skills on academic achievement and psychosocial outcome domains available to be cited. The highest number of citations of previous RCTs referenced in any of the RCTs published from 2014-16 was four79. Several RCTs published between 2014 and 2016 referenced no previous RCTs. It could perhaps be argued that these RCTs intervened on different non-cognitive skills so should not necessarily cite studies of other non-cognitive skills. Nevertheless, attention regulation and self-control were common ingredients of almost all of these interventions (Supplementary Table 2-5), so the impression is that new RCTs were not being explicitly justified on the basis of what was already known from existing RCTs.

Quality of RCTs

The quality of RCTs was not ideal and reporting of some details was poor or even absent. No RCTs had a formal pre-registered protocol and two thirds did not explicitly identify primary outcomes (See Supplementary Table 6 on Risk of Bias Tool83). This can allow cherry-picking of significant results within studies rather than focus on a single or small number of pre-stated main outcome(s) that the intervention is theoretically, or empirically (based on previous evidence) meant to most influence84. Over one-quarter of RCTs may have had other potential biases, for example, differential participation in the control and intervention groups, and unclear processes for selection of control participants. Ninety-two percent of RCTs did not adequately report randomisation procedures, 81% did not report concealment of allocation processes and participant flow, and most failed to address missing data. It was common for cluster RCTs to have too few clusters to achieve balance between intervention and control groups and in some it was unclear whether clustering was adequately dealt with in the analysis85. Poor reporting made it difficult to fully assess study quality and we strongly encourage researchers, journal editors and reviewers to use tools such as the CONSORT statement (http://www.consort-statement.org/) for reporting, and for RCTs to be pre-registered. These are now mandatory requirements for publishing in most leading health and medical science journals. However, it is possible that research practice regarding pre-registration is already changing and those pre-registered studies are yet to be published.

Quality of observational studies

More than 90% of all research in this field comes from observational studies. Of the 504 observational studies reviewed here, 66% were judged as ‘weak’ or ‘poor’ quality. Of all observational studies, 42% made little or no attempt to adjust for even basic confounding i.e., common causes of non-cognitive ability and the outcome. Problems of endogeneity and confounding are well known and may result in substantial bias of the association of non-cognitive skills and later outcomes. One regrettable consequence of the relatively low quality of much of the research effort in this field, is that it is not able to shed much light on the question of whether improving non-cognitive skills positively influences outcomes. To advance understanding of non-cognitive skills in children and their effects on outcomes later in life, there is little point in amassing more small-scale86, biased observational or experimental studies that have higher likelihood of failing to be replicated65,66,87 and are unable to contribute to evidence triangulation which is central for stronger causal inference88,89. The recommendations we make here to improve evidence quality in this field are not controversial. A 2018 Annual Review of Psychology paper called for more sophisticated power analyses, better statistical practices, study design specific to addressing effect modification, and better disclosure of non-significant as well as significant findings90.

Implications of Sub-optimal Reporting Practices of Effect Sizes and P-values

In order to be included in a meta-analysis studies needed to report or have the information available to calculate an effect size and the standard error. Where standard errors were not available, we calculated standard error from an exact p-value or where the p-value was reported as P

p. Consistent with recommended practice, this meant studies were excluded where an effect size and/or a standard error could not be calculated91. Excluding studies reporting P>p provides a more conservative estimate of the precision of studies. These exclusions were on top of excluding studies where an effect size was either not reported or could not be calculated. We illustrate the effect of this 2-layer exclusion for literacy outcomes. The literature reported 49 literacy related outcomes in 17 EQIs. Excluding outcomes where an effect size could not be calculated reduced the number of available literacy outcomes to 42 outcomes from 14 EQIs. Further excluding results where the p-value was reported as P>p meant the meta-analysis and funnel plots could only include 33 literacy outcomes from 11 EQIs. Thus, this 2-layer exclusion of reported results (due to sub-optimal reporting practices) meant we could only include 67% of the literacy outcomes actually presented in the literature. This also meant the meta-analysed effect size for literacy increased from 0.22 (including studies with P>p) to 0.33 (excluding studies with P>p) for EQIs because of the exclusion of studies with smaller effect sizes.

Interpreting Effect Sizes

We have avoided labelling effect sizes as “small (~0.2SD)”, “medium (~0.5SD)”, or “large (~0.8SD)” according to Cohen’s suggestions92. Even though these metrics are widely, often ritualistically, used as reference points, Cohen did not intend them to be used as absolutes. He cautioned that such generic application of sizes of effects to all research fields was "an operation fraught with many dangers"92(p. 12). Deciding if an effect is “big” is not straightforward in any field. Effect sizes are nothing more than mean differences between intervention (exposed) and control (unexposed) groups on some scale of outcome measurement divided by the standard deviation of the outcome. Use of such standardized effect measures has been criticized in several disciplines. In epidemiology, Greenland et al.93, have argued that the process of standardizing effects, rather than making them more comparable across studies, simply serves to confound that comparison by the observed standard deviation, which is often an artefact of the study sample, particularly for homogenous convenience samples. In political science, King argued that if apples and oranges cannot be meaningfully compared on the original outcome measurement scale then this lack of comparability is not improved by comparing standardized fruit94. Size of effects must be judged within the context of the field, the methods used in the study95 and, importantly, linked to some normative understanding of what weak or strong effects look like in a particular field. For example, if the best interventions available to improve a particular outcome, found reliable effects of 0.2SD when trialled in large population based samples, then a novel intervention finding the same effect might be considered large. Another way of norming effect size may be to consider the size of intervention effects against secular change in an outcome over time. Lipsey, et. al. present a sophisticated understanding of interpreting effect sizes96. For example, they show that the secular growth in reading from kindergarten to grade one in the US is estimated to be about 1.5 SD. By grades 4-5 this growth has declined to about 0.4 SD per year. How should an effect of a non-cognitive skills intervention in kindergarten on reading in grade one of 0.2 SD be judged? Such an intervention has generated about 13% greater improvement than the natural growth in reading during that time. Deciding whether an intervention is worth implementing will depend not only on its benefits, but also its costs, discount rate, scalability and a range of other potential considerations. Interventions that have small effects on average across the population and that are cheap could be very cost effective, particularly if they influence long term outcomes in adulthood. Therefore, the traditional labelling of an intervention as having “small” effects (~0.2SD) is inappropriate because it fails to consider the research, policy and practice context within which the intervention is situated.

Conclusion

So, after all the voluminous research included in this systematic review and meta-analysis, do intentional (from EQI evidence) or implied efforts (from observational evidence) to improve early life non-cognitive skills influence outcomes? Overall, yes, there is some evidence supporting a role for non-cognitive skills in better academic achievement, psychosocial, cognitive and language, but these effects are highly heterogeneous as they relate to the shorter-term outcomes examined in this review. We urge caution in interpreting this overall finding as unequivocally positive, given the potential for small study (publication) bias that may over-estimate the true effects, and the underlying heterogeneity of effect estimates as shown in 95% prediction intervals that were generally consistent with negative, null and positive effects. Thus, a true null effect of non-cognitive skills on these outcomes cannot be ruled out. We urgently need more robust evidence about which skills may be the active ingredient(s) and which outcomes they affect in the longer-term. That may come from studies which are funded for long-term follow-up of some of the more promising interventions reviewed here. These results suggest profitable pathways forward to help improve influences on life success beyond the traditional focus on reading, writing and arithmetic, and IQ. However, the research community interested in these diverse aspects of non-cognitive skills needs higher quality, adequately powered studies, and a strategically integrated, rigorous scientific focus to help answer the policy-relevant questions.97.

Methods

The systematic review protocol was preregistered with the International Prospective Register for Systematic Reviews (PROSPERO, CRD42013006566) in December 2013 and is available at: http://www.crd.york.ac.uk/PROSPERO/. This original protocol included children to age 8. Reviewers suggested extending this to age 12 hence the protocol was updated in September 2017.

Inclusion criteria

Publications were eligible if they involved non-cognitive abilities of children aged up to 12 years, including executive function (working memory, cognitive flexibility, inhibitory control and attention), effortful control, emotional regulation (emotional reactivity), persistence, conscientiousness, attention, self-control, impulsivity and delay of gratification. See Table 2 for a glossary of terms. Interventions that had general developmental goals were included if they specifically stated an aim related to improving any non-cognitive abilities. Only publications reporting original research were included. Publications involving non-cognitive characteristics in clinical subgroups (e.g., those already diagnosed with problems such as attention-deficit/hyperactivity disorder) were excluded because we were interested in effects of non-cognitive characteristics among developmentally normal healthy children.

Table 2

Glossary*

Attention	A state of awareness in which the senses are focused selectively on aspects of the environment and the central nervous system is in a state of readiness to respond to stimuli105.
Cognitive flexibility	This refers to a capacity for objective appraisal of and appropriate, flexible action. It involves adaptability, objectivity and fair-mindedness106.
Conscientiousness	The tendency to be organized, responsible, and hardworking, construed as a dimension of individual differences in the Big Five and Five-Factor Personality Models105.
Delay of gratification	The ability to forgo immediate reward for the sake of greater, future reward based on the original definitions by Mischel106.
Effortful control	Includes the abilities to voluntarily manage attention (attentional regulation) and inhibit (inhibitory control) or activate (activational control) behaviour as needed to adapt, especially when the child does not particularly want to do so107.
Emotional reactivity	The extent to which an individual experiences emotions (a) in response to a wide array of stimuli (emotion sensitivity) (b) strongly or intensely (emotion intensity), and (c) for a prolonged period of time before returning to a baseline level of arousal (emotion persistence)108.
Emotion regulation	The ability of an individual to modulate an emotion or set of emotions. Techniques of conscious emotional regulation can include learning to construe situations differently in order to manage them better and recognizing how different behaviours can be used in the service of a given emotional state105.
Executive function	Higher level cognitive processes that organise and order behaviour, such as judgement, abstraction and concept formation, logic and reasoning, problem solving, planning and sequencing of actions105.
Impulsivity	Behaviour characterised by little or no forethought, reflection or consideration of the consequences105.
Inhibitory control	The ability to suppress a pre-potent response, interrupt and ongoing response and resist distraction from external stimuli109.
Persistence	The quality or state of maintaining a course of action or keeping at a task and finishing it despite the obstacles (such as opposition or discouragement) or the effort involved105.
Self-control	The ability to be in command of one’s behaviour (overt, covert, emotional or physical) and to restrain or inhibit one’s impulses105.
Self-regulation	The control of one’s own behavior through the use of self-monitoring (keeping a record of behavior), self-evaluation (assessing the information obtained during self-monitoring), and self-reinforcement (rewarding oneself for appropriate behaviour or for attaining a goal)105.
Temperament	The basic foundation of personality, usually assumed to be biologically determined and present early in life, including such characteristics as energy level, emotional responsiveness, demeanour, mood, response tempo, and willingness to explore105.
Working memory	A multi-compartment model of short-term memory that has a phonological (or articulatory) loop to retain verbal information, a visuospatial scratchpad to retain visual information, and a central executive to deploy attention between them105.

This glossary has been compiled from several sources as there was no single source that contained definitions of all the non-cognitive constructs included in the systematic review. However, there are also inconsistent definitions across different sources. We reviewed various sources and selected explanations of non-cognitive abilities that were consistent with their usage in the literature included in this systematic review.

Literature Search

We searched four electronic databases for articles published from database conception until December 2016: PubMed, PsycINFO, Embase, and Business Source Complete. These databases were chosen because of their broad coverage of psychological, education, health and economic literature. The search strategy for each database is included in Supplementary Table 1. Search terms were tailored to each database and pilot tested. Study outcomes were not included as search terms to capture all published outcomes associated with non-cognitive abilities. Searches were not restricted by language. Authors of non-English articles were contacted for details or translations. Authors of conference abstracts, editorials and theses were contacted to obtain full text articles. Hand searching of relevant reviews16,98–100, our own libraries, and references cited in all RCTs and quasi-experimental interventions were conducted to identify further studies.

Screening

The titles and abstracts of all articles were screened for eligibility (by AS, LS, CC and TN). To ensure consistency of searching, the first 300 references were searched as a group by all authors and subsequent references were searched independently (Kappa values for agreement were >0.80). Where eligibility was not able to be determined by the title or abstract the full text was reviewed, and when eligibility was unclear this was resolved by group consensus.

Data extraction

The following information was systematically extracted from each article using a standardised form created by the authors. It included: study design, population-based or convenience sample, age of participants at exposure and outcome measurement, sample size and loss to follow up, measurement of exposure and outcome, type of intervention and comparison group, confounding adjustment and results. To be categorised as a population-based study the publication needed to report some intent and procedure to sample from a defined population base. Where studies did not report age but did report school grade, ages were approximated on the basis of knowledge of school attendance age in the country of interest. LS, JL, CC, AS, TG and TN extracted data from articles. ND independently (i.e., blinded to assessments of other authors) reviewed the data extraction for 15% of all studies, including all intervention studies, and consensus was reached for the very small number of discrepancies. Where possible we extracted a standardised ‘beta’ coefficient or standardised effect size to have a unit free way of comparing effects across exposures and outcomes (i.e., the difference in SD units between intervention and control groups, or the effect of a 1SD increase in exposure on an outcome in observational studies). When unstandardised coefficients were reported, where possible we calculated standardised effect size to allow comparability of effects across the studies. When a standardised effect size could not be calculated (i.e., SDs for exposure and outcome were not reported) we reported the unstandardised effect sizes.

Screening to assess risk of bias

The authors JL, LS and AS reviewed all eligible studies and rated their evidence quality as ‘better, weak, or poor’ on the basis of study design and confounding adjustment (Table 1). For RCTs, the risk of bias was assessed using the Cochrane Collaboration Risk of Bias Assessment Tool83 (Supplementary Table 6). We adopted a “potential outcomes approach” to conceptualizing confounding where the interpretation of a ‘causal’ effect of an exposure estimated from observational data relies on several assumptions101. One of the key assumptions is conditional exchangeability between exposed and unexposed. This corresponds to the idea that the estimate is reasonably free from “confounding” by poorly measured or unmeasured characteristics. This is called endogeneity bias in economics. Thus, our assessment of better quality evidence relied on a subjective judgment of the risk of bias from confounding. Publications that made no attempt to statistically control for common causes of exposure and outcome were rated as ‘poor’ because the likelihood of confounding (endogeneity) bias was high, and so these publications could not inform any assessment of likely causal effects of non-cognitive skills on outcomes. On the other hand, observational studies using fixed-effects regression (i.e., twins, siblings, and within-individual change) or adjustment for strong common causes of the exposure – outcome association (including proxies for these such as baseline measures of the outcome, or child’s cognitive ability) were rated as better evidence. Here we only report results from studies that met the definition of ‘Better evidence’. However, all weak and poor evidence studies were reviewed and appear in Supplementary Tables 7 and 8.

Data synthesis

Meta-analysis and forest plots

We used effect sizes as reported in the original study or, where possible, used information presented to calculate effect sizes as Hedges’ g. This may mean some differences exist in how different studies calculated effect sizes in terms of how they included information on standard deviations of the outcome. We synthesised the information on effect sizes by undertaking random effects meta-analysis using inverse variance weighting. When no measure of variance was reported we calculated confidence intervals from p values102. It was common for studies to not report variance or exact p values. To overcome this problem for conducting meta-analyses using inverse variance weighting we were forced to make assumptions about p values to calculate confidence intervals. If p was reported as less than a specific value we assumed p equalled that value, e.g. if p was reported as p<0.01 we assumed p=0.01 for the purpose of calculating confidence intervals. Where p was reported as greater than a specific value, we followed the Cochrane Review Handbook which recommends removing any estimates where p is reported as greater than some value91. The main summary of results is shown in Figures 2a (EQIs) and 2b (observational studies). We show the meta-analysed average effect size (and its 95% confidence interval) in each sub-domain of academic achievement, psychosocial, cognitive and language, and health outcome. The 95% confidence interval informs how precisely the mean effect size has been estimated. On unlimited repetitions of sampling, and assuming there is no effect (i.e., the null is true), then 95% of all the confidence intervals calculated would include the true population mean – in this case the effect size. We also present the 95% prediction interval which indicates the heterogeneity of effects across the population of studies that generated the meta-analysis effect size. The prediction interval estimates where the true effects are to be expected for 95% of similar studies that might be conducted in the future103,104. More detailed analyses showing individual publications in each of the subdomains (e.g. literacy) are presented in Supplementary Figures 2a-19b according to study design (EQIs versus observational, and then by cluster, individual, quasi-experimental, longitudinal and cross-sectional). To reduce bias that may have arisen from studies reporting multiple measures of the same outcome, we obtained an overall estimate across all of the reported measures. For example, if a publication reported three different measures of literacy we meta-analysed those three estimates to get an overall effect. These are the estimates shown in the Supplementary Figures 2a-19b. These figures show the meta-analysed effect size (95% confidence interval), Tau2 (a measure of variation in true effects among studies), the I statistic which describes the proportion of observed variability that can be attributed to among-study heterogeneity104, and the 95% prediction intervals.

Funnel plots and Egger regression

We examined asymmetry of the published evidence by generating funnel plots of effect size against inverse of study size separately for EQIs and observational studies (Supplementary Figures 20a-23b) and calculated the summary Egger regression coefficient and p value indicating the degree of asymmetry81. The coefficient from the Egger regression tests whether the y intercept is zero. The expectation is that the y intercept is zero if there is an even spatial spread of studies within the funnel. The coefficient is the effect size normalized by dividing by the standard error (x-axis) against the reciprocal of the standard error of the estimate (y axis). Small p values on the Egger regression coefficient suggest the presence of small study bias that may produce larger effects.

Length of follow up

To include information on length of follow up, we graphed each publication according to length of follow up, effect size and study size (Supplementary Figures 24-31). The size of the icon in Supplementary Figures 24-S31 corresponds with small (n<100), medium (n=100-500) and large (n>500) studies. The length of the line displays the duration of follow-up. Supplementary Figure 32 specifically compares end of intervention (or as closely as we could approximate) and follow up effects for studies where it could be calculated.

78 in total

1. Fine motor skills and executive function both contribute to kindergarten achievement.

Authors: Claire E Cameron; Laura L Brock; William M Murrah; Lindsay H Bell; Samantha L Worzalla; David Grissmer; Frederick J Morrison
Journal: Child Dev Date: 2012-04-26

2. Cluster randomized trials with a small number of clusters: which analyses should be used?

Authors: Clémence Leyrat; Katy E Morgan; Baptiste Leurent; Brennan C Kahan
Journal: Int J Epidemiol Date: 2018-02-01 Impact factor: 7.196

Review 3. Best evidence synthesis: an intelligent alternative to meta-analysis.

Authors: R E Slavin
Journal: J Clin Epidemiol Date: 1995-01 Impact factor: 6.437

4. Show me the child at seven II: Childhood intelligence and later outcomes in adolescence and young adulthood.

Authors: David M Fergusson; L John Horwood; Elizabeth M Ridder
Journal: J Child Psychol Psychiatry Date: 2005-08 Impact factor: 8.982

5. A gradient of childhood self-control predicts health, wealth, and public safety.

Authors: Terrie E Moffitt; Louise Arseneault; Daniel Belsky; Nigel Dickson; Robert J Hancox; Honalee Harrington; Renate Houts; Richie Poulton; Brent W Roberts; Stephen Ross; Malcolm R Sears; W Murray Thomson; Avshalom Caspi
Journal: Proc Natl Acad Sci U S A Date: 2011-01-24 Impact factor: 11.205

6. School readiness and later achievement.

Authors: Greg J Duncan; Chantelle J Dowsett; Amy Claessens; Katherine Magnuson; Aletha C Huston; Pamela Klebanov; Linda S Pagani; Leon Feinstein; Mimi Engel; Jeanne Brooks-Gunn; Holly Sexton; Kathryn Duckworth; Crista Japel
Journal: Dev Psychol Date: 2007-11

7. Promoting academic and social-emotional school readiness: the head start REDI program.

Authors: Karen L Bierman; Celene E Domitrovich; Robert L Nix; Scott D Gest; Janet A Welsh; Mark T Greenberg; Clancy Blair; Keith E Nelson; Sukhdeep Gill
Journal: Child Dev Date: 2008 Nov-Dec

8. Preventing conduct problems and improving school readiness: evaluation of the Incredible Years Teacher and Child Training Programs in high-risk schools.

Authors: Carolyn Webster-Stratton; M Jamila Reid; Mike Stoolmiller
Journal: J Child Psychol Psychiatry Date: 2008-01-21 Impact factor: 8.982

9. How to increase value and reduce waste when research priorities are set.

Authors: Iain Chalmers; Michael B Bracken; Ben Djulbegovic; Silvio Garattini; Jonathan Grant; A Metin Gülmezoglu; David W Howells; John P A Ioannidis; Sandy Oliver
Journal: Lancet Date: 2014-01-08 Impact factor: 79.321

10. Closing the achievement gap through modification of neurocognitive and neuroendocrine function: results from a cluster randomized controlled trial of an innovative approach to the education of children in kindergarten.

Authors: Clancy Blair; C Cybele Raver
Journal: PLoS One Date: 2014-11-12 Impact factor: 3.240

20 in total

1. Delayed tracking and inequality of opportunity: Gene-environment interactions in educational attainment.

Authors: Antonie Knigge; Ineke Maas; Kim Stienstra; Eveline L de Zeeuw; Dorret I Boomsma
Journal: NPJ Sci Learn Date: 2022-05-04

2. Impact of child development at primary school entry on adolescent health-protocol for a participatory systematic review.

Authors: Michelle Black; Amy Barnes; Mark Strong; David Taylor-Robinson
Journal: Syst Rev Date: 2021-05-07

3. Training Executive Functions to Improve Academic Achievement: Tackling Avenues to Far Transfer.

Authors: Catherine Gunzenhauser; Matthias Nückles
Journal: Front Psychol Date: 2021-05-24

4. The controlled direct effect of temperament at 2-3 years on cognitive and academic outcomes at 6-7 years.

Authors: Shiau Yun Chong; Catherine Ruth Chittleborough; Tess Gregory; John Lynch; Murthy Mittinty; Lisa Gaye Smithers
Journal: PLoS One Date: 2019-06-04 Impact factor: 3.240

5. What Technology Can and Cannot Do to Support Assessment of Non-cognitive Skills.

Authors: Vanessa R Simmering; Lu Ou; Maria Bolsinova
Journal: Front Psychol Date: 2019-09-25

6. Investigating the genetic architecture of noncognitive skills using GWAS-by-subtraction.

Authors: Perline A Demange; Margherita Malanchini; Travis T Mallard; Pietro Biroli; Simon R Cox; Andrew D Grotzinger; Elliot M Tucker-Drob; Abdel Abdellaoui; Louise Arseneault; Elsje van Bergen; Dorret I Boomsma; Avshalom Caspi; David L Corcoran; Benjamin W Domingue; Kathleen Mullan Harris; Hill F Ip; Colter Mitchell; Terrie E Moffitt; Richie Poulton; Joseph A Prinz; Karen Sugden; Jasmin Wertz; Benjamin S Williams; Eveline L de Zeeuw; Daniel W Belsky; K Paige Harden; Michel G Nivard
Journal: Nat Genet Date: 2021-01-07 Impact factor: 38.330

7. Editorial: Bridging the Gap Between Research and Policy in Fostering Social and Emotional Skills.

Authors: Javier Suarez-Alvarez; Rubén Fernández-Alonso; Patrick Charles Kyllonen; Filip De Fruyt; Eduardo Fonseca-Pedrero; José Muñiz
Journal: Front Psychol Date: 2020-03-25

8. Cognitive and noncognitive predictors of success.

Authors: Angela L Duckworth; Abigail Quirk; Robert Gallop; Rick H Hoyle; Dennis R Kelly; Michael D Matthews
Journal: Proc Natl Acad Sci U S A Date: 2019-11-04 Impact factor: 11.205

9. Examining the relationships among adolescent health behaviours, prefrontal function, and academic achievement using fNIRS.

Authors: Mia Papasideris; Hasan Ayaz; Adrian B Safati; Plinio P Morita; Peter A Hall
Journal: Dev Cogn Neurosci Date: 2021-07-01 Impact factor: 6.464

10. Multivariable two-sample Mendelian randomization estimates of the effects of intelligence and education on health.

Authors: Neil Martin Davies; W David Hill; Emma L Anderson; Eleanor Sanderson; Ian J Deary; George Davey Smith
Journal: Elife Date: 2019-09-17 Impact factor: 8.140