The utility of twins in developmental cognitive neuroscience research: How twins strengthen the ABCD research design.

William G Iacono¹, Andrew C Heath², John K Hewitt³, Michael C Neale⁴, Marie T Banich³, Monica M Luciana⁵, Pamela A Madden², Deanna M Barch², James M Bjork⁴.

Abstract

The ABCD twin study will elucidate the genetic and environmental contributions to a wide range of mental and physical health outcomes in children, including substance use, brain and behavioral development, and their interrelationship. Comparisons within and between monozygotic and dizygotic twin pairs, further powered by multiple assessments, provide information about genetic and environmental contributions to developmental associations, and enable stronger tests of causal hypotheses, than do comparisons involving unrelated children. Thus a sub-study of 800 pairs of same-sex twins was embedded within the overall Adolescent Brain and Cognitive Development (ABCD) design. The ABCD Twin Hub comprises four leading centers for twin research in Minnesota, Colorado, Virginia, and Missouri. Each site is enrolling 200 twin pairs, as well as singletons. The twins are recruited from registries of all twin births in each State during 2006-2008. Singletons at each site are recruited following the same school-based procedures as the rest of the ABCD study. This paper describes the background and rationale for the ABCD twin study, the ascertainment of twin pairs and implementation strategy at each site, and the details of the proposed analytic strategies to quantify genetic and environmental influences and test hypotheses critical to the aims of the ABCD study.

Keywords: Brain function; Brain structure; Environment; Heritability; Substance use; Twins

Year: 2017 PMID: 29107609 PMCID: PMC5847422 DOI: 10.1016/j.dcn.2017.09.001

Source DB: PubMed Journal: Dev Cogn Neurosci ISSN: 1878-9293 Impact factor: 6.464

“Almost any experiment that one might think of doing with human subjects will be more interesting and yield more valuable results if one does it with twins.” David Lykken (Lykken, 1982, p. 361)

Introduction

Twin births represent an intriguing experiment of nature through which individual differences in key psychological traits can be ascribed to genetic and environmental variation. The classical twin study design contrasts the similarity of genetically identical (monozygotic, MZ) pairs to that of fraternal (dizygotic, DZ) pairs – the latter no more closely related than full sibling pairs − and has been used for many decades to establish genetic contributions to normal human variation and to risk of clinical outcomes (e.g., Kaij and Rosenthal, 1961, Partanen et al., 1966). The proportion of population variation in a trait that is due to genetic influences is referred to as heritability. A recent analysis of 2748 twin studies reported an average heritability of 49% across thousands of medical and behavioral phenotypes (Polderman et al., 2015). Besides providing testimony to the value of twins to science, this report showed that genetic influences are an important component of variation for almost all human traits. Applications of the twin study design to determine causal interrelationships between brain structure and function (Blokland et al., 2012, Jansen et al., 2015), neuropsychological performance (Blokland et al., 2017), and adolescent substance use (Hopfer et al., 2003, Lynskey et al., 2010) are particularly compelling, though such designs can help address causality in a number of other relationships as well (e.g., maltreatment and the development of child neuropsychiatric disorders, Dinkler et al., 2017). The ABCD Twin Hub was inspired by the potential for twin designs to help separate cause from consequence when investigating typical development. In addition, it is well poised to examine the effects of substances and other environmental adversities on neurocognitive development, emotion development, mental health, and physical health. 
The Twin Hub comprises four participating sites, each with over 30 years of experience conducting twin research: University of Colorado Boulder; University of Minnesota, Minneapolis; Virginia Commonwealth University, Richmond; Washington University, St. Louis, Missouri. The twin design allows estimation of the separate contributions to variance in a trait of genetic effects and shared (including family) environmental effects, as well as non-shared environmental effects (i.e., those differences in environmental exposures that contribute to twin pair differences even in MZ pairs). The design is based on the testable assumption that there is no higher correlation for trait-relevant environmental exposures (for example, peer influences on behavior) in MZ than in DZ pairs except in so far as this is the result of genetic differences, such as when an individual with a high genetic predisposition to substance use seeks environments in which drugs are widely used. This assumption can be evaluated on a trait by trait basis. Even more informative are newer multivariate approaches that go beyond assessment of heritable versus environmental contributors to an individual trait, to examining how these contributors govern how two different traits co-vary within an individual, or govern how a single trait (such as a brain morphometric feature) may differ across developmental time points. It is now recognized that the most powerful genetic applications are usually multivariate, rather than merely focused on univariate analyses of individual traits. 
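For the univariate case described above, the way the MZ/DZ contrast separates the three variance components can be sketched with Falconer's classical approximation. This is a simplified stand-in for the model fitting used in practice, not the analysis proposed for ABCD; the function name `falconer_ace` and the example correlations are hypothetical, and the formulas assume additive genetic effects, equal trait-relevant environments across zygosity, and no assortative mating.

```python
def falconer_ace(r_mz: float, r_dz: float) -> dict:
    """Approximate ACE variance components from twin-pair correlations.

    Assumes MZ pairs share all additive genetic effects, DZ pairs share
    half on average, and both zygosities share family environment equally.
    """
    a2 = 2.0 * (r_mz - r_dz)   # additive genetic variance (heritability)
    c2 = 2.0 * r_dz - r_mz     # shared environmental variance
    e2 = 1.0 - r_mz            # non-shared environment plus measurement error
    return {"A": a2, "C": c2, "E": e2}


# Hypothetical correlations, for illustration only.
estimates = falconer_ace(r_mz=0.80, r_dz=0.50)
print({name: round(value, 2) for name, value in estimates.items()})
```

By construction the three components sum to 1, so a negative value for C signals that the simple additive model does not fit the observed correlations.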
Multivariate approaches illuminate how genetic differences between individuals contribute to how one trait may relate to another within-individuals, such as between traits: (i) within a measurement domain (e.g., multiple measures of brain structure and function), (ii) across measurement domains (e.g., brain measures with clinical outcomes or with indices of neuropsychological functioning or other measures of normal variation), and (iii) over time; e.g., across different stages of development (Neale and Cardon, 1992c). Such applications supplement information about the similarity of MZ and DZ twin pairs estimated from correlations observed for individual traits (for example, MZ versus DZ correlations for trait A, and for trait B), with cross-trait correlations both within individuals (the usual phenotypic correlation of trait A with trait B) and across the members of a pair (trait A in first twin with trait B in cotwin, and vice versa, estimated separately by zygosity). To the extent that genetic effects contribute to the covariation of two traits, the twin pair cross-trait correlations should be higher in MZ than in DZ pairs (Martin and Eaves, 1977). As in the univariate case, estimation of shared and non-shared environmental contributions to trait covariation may also be obtained. Measurement of environmental influences that may be shared by members of a twin pair, including family, school, neighborhood or peer influences, as well as exposures for which a pair may be discordant, will allow testing of hypotheses about the “interplay” of genetic and environmental effects. These processes include: (i) GXE (Gene by Environment) interaction effects and (ii) GE correlation effects. GXE interaction effects are those genetic effects that may be moderated by environmental exposures. In the simplest case, GXE can be determined by evaluating the degree to which the strength of genetic effect varies by presence or absence of an environmental exposure. 
For example, Hicks et al. (2009) found that the genetic contribution to risk for externalizing disorders was especially evident in those experiencing high environmental adversity. As another example, Chiang et al. (2011) found that heritability of diffusion measures of white matter was greater in individuals from a higher socio-economic background. GE correlation is present when genetic effects result in phenotypic differences that in turn affect environmental exposures within individuals and, in some extended twin-family designs, within families (Dolan et al., 2014, Eaves et al., 1977). For example, genetically-regulated temperament may prompt a child to seek deviant peers, and evidence for GE correlation has been found with regard to the relationship between specific types of stressful life events (those that could be influenced by the individual, e.g., Bemmels et al., 2008) and psychotic-like experiences (Shakoor et al., 2016).
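Returning to the cross-trait correlations described earlier in this Introduction, a minimal sketch of how they partition the phenotypic correlation between two traits is given here. It extends the Falconer-style approximation to the bivariate case and is an illustration under the same simplifying assumptions, not the multivariate model fitting the authors refer to; the function name and all input values are hypothetical.

```python
def bivariate_decomposition(r_phenotypic: float, ctct_mz: float, ctct_dz: float) -> dict:
    """Split the within-person correlation of traits A and B into parts.

    ctct_mz / ctct_dz are cross-twin cross-trait correlations
    (trait A in one twin with trait B in the cotwin) by zygosity.
    """
    genetic = 2.0 * (ctct_mz - ctct_dz)      # covariance due to shared genes
    shared_env = 2.0 * ctct_dz - ctct_mz     # covariance due to shared environment
    nonshared_env = r_phenotypic - ctct_mz   # covariance unique to the individual
    return {"genetic": genetic, "shared_env": shared_env, "nonshared_env": nonshared_env}


# Hypothetical correlations, for illustration only.
r_ab = 0.40
parts = bivariate_decomposition(r_phenotypic=r_ab, ctct_mz=0.30, ctct_dz=0.20)
genetic_share = parts["genetic"] / r_ab   # fraction of the A-B correlation due to genes
print({name: round(value, 2) for name, value in parts.items()}, round(genetic_share, 2))
```

A higher cross-trait correlation in MZ than in DZ pairs yields a positive genetic part, which is the pattern the text describes as evidence for genetic contributions to trait covariation.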

How ABCD’s inclusion of the Twin Hub strengthens the quality of causal inference

To illustrate how a genetic analysis can inform how experiences affect neurodevelopment, we use the example of substance use, though this is only one example among many possible mental and physical health outcomes that will be examined as part of the ABCD study. Youth substance use is not a random event. Youth who abuse substances differ from those who do not on a wide range of risk factors, many of which predate use (Bailey et al., 2014, Hicks et al., 2014, McGue and Iacono, 2005). An association between substance use and a disadvantageous outcome may reflect a causal consequence of use, or it could be a product of pre-existing genetic and environmental risk factors that predispose an individual to both drug use and some other adverse psychosocial outcome. That is, a substance use-outcome association may be confounded by etiologic influences that existed before use began. Natural experiments such as twin difference designs afford innovative and powerful opportunities to evaluate such confounds. Consider the case of MZ twins. Because both members of the pair share their genetic endowment as well as their rearing environment (both pre- and post-natal), the twin in a pair who uses substances less than his or her cotwin serves as an ideal control for genetic predisposition and shared environmental adversity. The twin design thus offers a unique “what-if” neurodevelopmental scenario that is unavailable in studies of singletons. Suppose longitudinal assessments of drug-using teens indicate a flattening of some brain developmental trajectory that continues unabated in same-age non-using peers. One cannot confidently assume that the drug use caused the “flattening” (or delayed maturation) given the possibility of genetic programming for the deviant trajectory. Mathematically leveraging a non-using twin enables an approximation of the “what if” the drug-using teen never used, to better isolate causal effects of substance exposure as a unique environmental event. 
Put differently, the lesser-using twin provides insight into what the brain of the twin who uses substances more heavily could have been expected to be like had the heavier using twin used less. If the heavy-using twin shows a neurocognitive deficit not observable in the lesser using co-twin, the results are consistent with the possibility that use caused the deficit. If, despite one twin’s heavy use, members of the pair have the same neurocognitive outcome, then predisposing familial risk or other shared risk, not substance use, better explains the association between substance use and outcome. The inclusion of DZ pairs, whose shared environmental experiences are like those of MZ twins (Kendler et al., 1993), provides information about the relative contribution of genetic and shared environmental effects to any observed familial risk. By taking into account risk factors that could account for why one twin in a pair uses more than the other, it is possible to strengthen the inference that it is use, and not a third variable, that accounts for any observed within-pair difference in the outcome. The use of twin pairs discordant for an environmental exposure, in an effort to achieve more robust causal inference, has a decades-long history (Gesell, 1942, McGue et al., 2010). In the modern era, it is recognized that quantitative assessments of environmental exposure can provide additional information and greater statistical power. So how is causality inferred from a twin design? More formally, consider a strong causal hypothesis: the correlation between Exposure A and Outcome B arises because Exposure A has a causal effect on Outcome B (direction-of-causation modeling: Heath et al., 1993a, Neale and Cardon, 1992b). In twin pair data, in the absence of third-variable effects, a strong prediction can be made for the cross-correlations between exposure A in one twin, and outcome B in the other twin. 
Thus twin pairs concordant as well as discordant for exposure contribute information. For each twin zygosity group, these cross-correlations are predicted to be the product of the twin pair correlation for A, and the correlation of B with A, a prediction that can be tested by model fitting; i.e., model fitting provides a test of the strong causal hypothesis, A causes B, that assumes no third-variable influences. Here, as with simpler analyses of trait covariation, multivariate analyses using multiple indicators of trait A and outcome B are likely to be especially persuasive because they can control for measurement error (Heath et al., 1993a).
Recent work illustrates application of the twin pair design to evaluate causal effects of adolescent marijuana use on cognitive ability. Meier et al. (2012) examined the relationship between marijuana use and change in cognitive performance from pre-adolescence to age 38 in approximately 1000 (non-twin) individuals enrolled in the Dunedin Longitudinal Study. Greater marijuana use appeared to be associated with greater cognitive decline over this interval, and the effects were particularly strong for adolescent use. Other longitudinal studies have also suggested negative effects of adolescent cannabis use, although there is inconsistency regarding how strongly the results hold after accounting for pre-adolescent performance and relevant covariates (Fried et al., 2005, Mokrysz et al., 2016). Although longitudinal studies like these ostensibly support the hypothesis that teen marijuana use causes cognitive decline, they cannot rule out the possibility that the cognitive decline would have occurred even in the absence of cannabis use. Twin studies can help to resolve this question. Using longitudinal data from twins followed through adolescence and into young adulthood, Jackson et al. (2016) evaluated the effects of teen marijuana use on IQ assessed at ages 9–12 and 17–20. Consistent with Meier et al. (2012), use was associated with IQ decline for measures of crystallized intelligence. However, employing the co-twin control design, it was found that non-using members of monozygotic twin pairs discordant for use showed the same IQ decline as users, indicating that familial risk, and not use per se, accounted for the decline. This study included over 3000 twins from projects in Minnesota and California, and the results replicated across sites. Of note, this prospective study also found that marijuana use during adolescence was associated with IQ decline, but because twins were incorporated into the longitudinal design, the results suggested the IQ decline was not a consequence of use. Similar results have been reported by Meier et al. (2017), who found that IQ at age five was lower in twins who became cannabis dependent as teenagers, and that non-dependent co-twins of cannabis dependent 18-year-old twins matched their IQ at that age.
Resolution of this type of question is essential so that public health messaging is accurate and so that appropriate interventions can be structured at the proper time period in the lifespan. These studies suggest that the focus of intervention and prevention should be on trait factors that render some individuals vulnerable to substance misuse. With its 90-min MRI assessment and accompanying comprehensive neuropsychological and mental health test batteries, ABCD’s coverage of neurocognitive and emotional outcomes is much broader than what was possible in previous studies. This makes ABCD’s inclusion of twins particularly valuable for evaluating the degree to which premorbid emotional and behavioral traits contribute to adolescent substance use or other mental and physical health outcomes, which in turn contribute to brain health. 
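The logic of the co-twin control comparison in these studies can be sketched with a toy calculation. The example below is a minimal illustration, not the analysis used by Jackson et al. (2016) or planned for ABCD: the function names and the data are hypothetical, and the data are constructed so that a familial factor drives the outcome in both twins while exposure differs within each MZ pair.

```python
from statistics import mean


def individual_slope(pairs):
    """Regression slope of outcome on exposure, ignoring family structure."""
    people = [person for pair in pairs for person in pair]
    mean_x = mean(x for x, _ in people)
    mean_y = mean(y for _, y in people)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in people)
    sxx = sum((x - mean_x) ** 2 for x, _ in people)
    return sxy / sxx


def within_pair_slope(pairs):
    """Slope of within-pair outcome differences on within-pair exposure differences."""
    diffs = [(x1 - x2, y1 - y2) for (x1, y1), (x2, y2) in pairs]
    sxy = sum(dx * dy for dx, dy in diffs)
    sxx = sum(dx * dx for dx, _ in diffs)
    return sxy / sxx


# Hypothetical MZ pairs as ((exposure, outcome), (exposure, outcome)).
# Familial liability f raises both twins' outcome score equally;
# exposure differs within each pair but has no effect of its own.
pairs = [((f + 1, 2 * f), (f - 1, 2 * f)) for f in range(5)]
print(round(individual_slope(pairs), 2), within_pair_slope(pairs))
```

In this constructed example the individual-level slope is positive while the within-pair slope is zero, the pattern that points to familial risk rather than exposure as the source of the association.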
Combining the twin design with a prospective, longitudinal investigation in which risk factors are comprehensively assessed prior to initiation of use provides one of the most powerful designs available for making causal inferences. Indeed, because the ABCD study cohort will be assessed first at ages 9–10, prior to the onset of substance use, twin study methods can be fruitfully utilized during the first few years of assessment to identify the traits and other characteristics that are most likely to be altered by environmental (e.g., substance use, stressful life events, trauma, family factors) circumstances. The utility of the twin design for strengthening causal inference regarding the consequences associated with substance use is just one of many questions that can be answered in the ABCD study using this approach. For example, there is a relatively large body of literature showing genetic influences on aspects of brain structure, including both gray matter (Eyler et al., 2012, Eyler et al., 2011, Peper et al., 2009, Shen et al., 2016, Yoon et al., 2011) and white matter (Chiang et al., 2009, Kochunov et al., 2015, Koenis et al., 2015, Shen et al., 2014), with some evidence for genetic influences on at least some aspects of task related brain activity (Pinel and Dehaene, 2013) and functional connectivity (Fu et al., 2015, Sinclair et al., 2015, van den Heuvel et al., 2013, Xu et al., 2016, Yang et al., 2016). Further, there is intriguing evidence that there are genetic effects on patterns of brain change over time (Anokhin et al., 2017, Brouwer et al., 2014, Brouwer et al., 2017, van Soelen et al., 2013), and evidence that the magnitude of relative genetic versus environmental influences may increase over the course of development (Jansen et al., 2015, Lenroot et al., 2009, Schmitt et al., 2014, Wallace et al., 2006), though not all studies have found this (Swagerman et al., 2014) and some have found a decrease (Wallace et al., 2006). 
However, we know relatively little about environmental factors that may have a causal influence on patterns of brain development, such as early stressful life events and/or trauma, family factors, neighborhood, and school factors. This is because many of the current studies on the relationship between early environmental factors and brain outcomes are not able to rule out either GE correlations or third variables that are correlated with both the environment and brain factors (Hanson et al., 2011, Luby et al., 2016, Noble et al., 2015). As such, the twin design used in the ABCD study, coupled with the longitudinal nature of the study, will provide crucial data relevant to these important questions about causal influences of the environment on brain development (an approach that has been used in at least one recent study, Dinkler et al., 2017). Moreover, this approach will further inform interpretation of ABCD’s longitudinal singleton data.
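The cross-correlation prediction of the strong causal hypothesis stated earlier in this section can be written out in a few lines. The sketch below is an illustration only: formal direction-of-causation tests rely on model fitting, whereas the fixed tolerance used here is an arbitrary assumption, and the function names and input correlations are hypothetical.

```python
def predicted_cross_correlation(r_twin_a: float, r_ab: float) -> float:
    """Cross-twin cross-trait correlation implied by 'A causes B' with no
    third-variable influences: r(A_twin1, B_twin2) = r(A_twin1, A_twin2) * r(A, B)."""
    return r_twin_a * r_ab


def consistent_with_causal_model(observed: float, r_twin_a: float, r_ab: float,
                                 tol: float = 0.05) -> bool:
    """Crude check: is the observed cross-correlation within tol of the prediction?"""
    return abs(observed - predicted_cross_correlation(r_twin_a, r_ab)) <= tol


# Hypothetical inputs by zygosity (not ABCD data):
# (zygosity, twin-pair correlation for A, observed cross-correlation).
r_ab = 0.50
for zygosity, r_twin_a, observed in [("MZ", 0.60, 0.31), ("DZ", 0.30, 0.28)]:
    predicted = predicted_cross_correlation(r_twin_a, r_ab)
    print(zygosity, round(predicted, 2), consistent_with_causal_model(observed, r_twin_a, r_ab))
```

In this made-up case the MZ value matches the prediction but the DZ value exceeds it, the kind of departure that would suggest influences beyond a direct effect of A on B.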

Overview of the Twin Hub aims and rationale

The longitudinal twin design can answer a broad range of questions on the interplay between adverse environmental impacts and mental health outcomes in emerging adulthood. However, in light of expanding decriminalization of cannabis in the US, one of the primary goals of the ABCD Twin Hub is to deepen our understanding of the effects of adolescent substance use (SU) on the brain, cognitive functioning, and behavior through the integration of developmentally and genetically informative data. As just noted, determinants and consequences of SU are often confounded. For example, executive deficits have been implicated in both the liability to substance use and, through cross-sectional research, its consequences. ABCD’s powerful twin method: (a) provides effective control for genetic factors, demographic background, and shared familial environment, (b) permits the isolation of the substance exposure effects; (c) can be extended to quantitative differences in substance exposure (such as quantity/frequency of consumption or biochemical markers of exposure); and (d) can integrate longitudinal, twin and family data. Additionally, twin data can assess the heritability of brain structure and function, the shared genetic and environmental etiology of brain and behavioral phenotypes, genetic influences on individual differences in neurodevelopmental trajectories and mental illness, the detrimental effects of SU and other risky behaviors on the brain, and assessments of SU-induced epigenetic and metagenomic (e.g., microbiome) changes. 
Analyses of longitudinal twin data can also test a prevailing hypothesis that: (i) the relative immaturity of the cognitive control regions of the brain during adolescence and peak activity within reward-related regions result in heightened impulsivity, heightened reward sensitivity, and increased risk for SU initiation; and (ii) SU itself may further compromise inhibitory control, thus creating a “vicious circle” of SU progression and self-control deficits. Our first aim is to assess genetic and environmental contributors to a range of mental and physical health outcomes, including substance use liability. As outlined above, we will use classic methods and co-twin control designs to study genetic versus environmental contributions to adolescent brain and neurocognitive development, evaluating how these contributions affect propensity towards the development of mental health relevant outcomes, including SU. Using genetically informed linear and non-linear latent growth curve and mixture models, we will test genetic versus environmental contributions to these developmental trajectories across puberty via neuroimaging and behavioral markers. We aim to predict who will initiate SU and/or develop mental health challenges, considering key domains such as hormonal change and the timing of cortical, subcortical, and psychological maturation. A second aim is to examine the impact of substance use, as well as other environmental factors and risk-taking behaviors, on subsequent neurodevelopment, as well as internalizing and externalizing symptomatology. These various behaviors may cause permanent changes in brain structure, connectivity, and function, detectable by MRI. Again, we use SU as an example. We hypothesize that SU interacts with genetic effects that alter developmental brain trajectories and outcomes. This will be tested by modeling GXE interactions within growth trajectory models. 
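As a much simpler stand-in for GXE modeling within growth trajectory models, the simplest form of moderation noted in the Introduction (genetic effects that differ by presence or absence of an exposure) can be sketched by estimating heritability separately within exposure strata. The Falconer approximation, the function names, and the correlations below are illustrative assumptions, not the planned ABCD analysis.

```python
def heritability(r_mz: float, r_dz: float) -> float:
    """Falconer approximation to heritability from twin-pair correlations."""
    return 2.0 * (r_mz - r_dz)


def heritability_by_exposure(strata: dict) -> dict:
    """Map each exposure stratum to its heritability estimate."""
    return {name: heritability(r_mz, r_dz) for name, (r_mz, r_dz) in strata.items()}


# Hypothetical (r_MZ, r_DZ) correlations within exposure strata (not study data).
strata = {"unexposed": (0.60, 0.40), "exposed": (0.70, 0.35)}
h2 = heritability_by_exposure(strata)
moderation = h2["exposed"] - h2["unexposed"]   # difference in heritability across strata
print({name: round(value, 2) for name, value in h2.items()}, round(moderation, 2))
```

A non-zero difference between strata is only suggestive; whether it exceeds sampling error would have to be judged by fitting and comparing models.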
We will test whether individuals’ substance use on one occasion affects their environment on subsequent occasions (a GE correlation reflecting niche selection), within and across substances and imaging phenotypes. We will further evaluate how gateway effects – use of one drug class influencing probability of progression to use of another drug class – are mediated by genetic and environmental influences. Finally, the Twin Hub is central to ABCD’s efforts to leverage biospecimen data for the understanding of risk. We will collect biospecimens for future studies of genomic, epigenomic, metabolomic, and microbiome changes that may influence drug use and abuse and their consequences for physical and mental health. We will obtain baseline and follow-up serum, saliva, and in some cases gut microbiota (from stool) and urine samples.

Rationale for the ABCD Twin Hub recruitment strategy

Although twin births are far from rare (now approximately 1.5% of births or 3% of the population), standard survey research methods (sampling of households and individuals within households; random digit dialing; school-based ascertainment) would not be cost-effective for the ascertainment of twin pairs in a narrow age range. Most twin research has either been conducted using volunteer samples (e.g., Heath et al., 1997, Lynskey et al., 2003), with the significant limitation of uncertain generalizability of findings, or, preferably, locating twin pairs from birth records, with first assessment beginning at an age (e.g., 12), closer to first use of a substance (e.g., Iacono et al., 2006, Waldron et al., 2013). In the USA, the regulation of vital records is at the state rather than federal level. Some states are relatively supportive of the use of vital records for public health research, and others prohibitive, which poses a challenge in capturing the race/ethnic and sociodemographic diversity of the US population. Individual states may have adequate representation of some but not all groups. We therefore sought to approximate this diversity by forming a consortium of four twin-research sites that have long-established relationships with state vital records departments/divisions, in order to collectively provide reasonable coverage of minority as well as white non-Hispanic twin pairs. We decided to target only like-sex twin pairs, given the common, though not universal (Vink et al., 2012), finding of lower correlations for DZ unlike-sex pairs (Polderman et al., 2015), possibly due to differences in gene expression between the sexes (c.f., Eaves et al., 1978). Without this exclusion, given relatively modest sample sizes, estimates of the magnitude of genetic effects might be inflated. 
We also targeted all possible eligible twin pairs available within each birth cohort, approximating a random sample of these within the practical limits of real-world subject tracing and voluntary recruitment, and did not attempt to select a subset of high risk individuals or pairs; to have done the latter would distort our genetic and environmental analyses in ways that are difficult to assess without a much larger twin study. Table 1 uses maternal race/ethnicity data from the Centers for Disease Control and Prevention (CDC; https://wonder.cdc.gov/) on all births of twins to determine the race/ethnic breakdown for the total US population of twin pairs born 2006–2008, and suggests a breakdown approximating these proportions that would be achievable by the four-site twin consortium. Rare minority groups are excluded because of concern that in an Open Science project with wide data-sharing, participants might be too easily identifiable. The Twin Hub is targeting a minimum of 10 pairs per race/ethnic group per site (to maximize protection of confidentiality) and approximately equal numbers of white non-Hispanic pairs per site (to balance recruitment challenges associated with locating and engaging minority pairs across sites).

Table 1

US twin pair births and projected Twin Hub targets based on maternal race/ethnicity.

Race/Ethnicity	US Twin Pairs Born 2006–08 (N)	US Twin Pairs Born 2006–08 (%)	Target Sample, N = 800 pairs (N)	Target Sample, N = 800 pairs (%)
Non-Hispanic:
White	250,500	60.9	495	61.9
Black	68,882	16.8	136	17.0
American Indian	3,170	0.8	--	--^a
Asian	20,162	4.9	40	5.0
Hispanic	65,014	15.8	129	16.0

^a Excluded due to potential identifiability (small N).
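One way to arrive at target numbers like those in Table 1 is proportional allocation among the included groups. The sketch below is an assumption about how such targets could be derived, not a description of the authors' procedure; the function name and group labels are hypothetical, and the counts are copied from the table.

```python
# US twin-pair births 2006-08 by maternal race/ethnicity (counts from Table 1).
us_twin_pairs = {
    "White (non-Hispanic)": 250_500,
    "Black (non-Hispanic)": 68_882,
    "American Indian (non-Hispanic)": 3_170,
    "Asian (non-Hispanic)": 20_162,
    "Hispanic": 65_014,
}
excluded = {"American Indian (non-Hispanic)"}   # small N, identifiability concern
total_target = 800                              # pairs across the four sites


def proportional_targets(counts: dict, excluded: set, total: int) -> dict:
    """Allocate target pairs in proportion to counts among included groups."""
    included = {group: n for group, n in counts.items() if group not in excluded}
    denominator = sum(included.values())
    return {group: round(total * n / denominator) for group, n in included.items()}


targets = proportional_targets(us_twin_pairs, excluded, total_target)
print(targets, sum(targets.values()))
```

With these counts the rounded allocations are 495, 136, 40, and 129 pairs, which sum to 800; independent rounding does not guarantee an exact total in general.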

The generalizability of twin data

If twin data generalize to singletons, any causal relationships between environmental factors, including drug use, and brain morphology detected using this analytic approach will in turn inform the interpretation of longitudinal relationships between such factors and the brain detected in the much larger ABCD consortium singleton sample. We cannot know a priori the limits to the external validity (i.e., generalizability) of twin data for the major assessment domains of clinical or cognitive neuroscience (nor indeed of individuals from any other family structure), given the relative lack of large-sample neuroimaging data for twins and singletons in this age-group (by large sample we mean many thousands, given the high likelihood of false positive conclusions in underpowered studies, c.f., Ioannidis, 2005). However, the ABCD sample size will, in time, speak to generalizability. We know that since monozygotic twinning occurs randomly (Bulmer, 1970), MZ pairs should be representative of genetic variation in the general population, and a simple test for MZ-DZ pair mean differences will flag measurement domains where DZ pairs may be less representative, requiring greater caution in generalizations of findings. There are few compelling examples of such mean differences by zygosity in same-sex pairs in the literature (Kendler, 2001). We can learn more about twin-singleton differences that might limit generalizability from twin data with respect to maternal sociodemographic characteristics, such as may be derived from birth record data. The likely sociodemographic generalizability of the Twin Hub sample to consortium singletons was investigated by the Missouri site of the Twin Hub, using data on singleton versus twin pair births. Comparisons were limited to white non-Hispanic pairs and African-American pairs because of small sample sizes for twin pairs from other race/ethnic groups. 
Table 2 summarizes significant or near significant associations found by fitting a multiple logistic regression model predicting twin pair versus singleton birth. Independent variables include: maternal age at childbirth (dummy variables for categories shown in the table), educational level (categories ≤11 years, 12 years, 13–14 years, 15–16 years, 17+ years), marital status at childbirth (unmarried mother with unnamed partner, unmarried mother with named partner, married mother) and maternal place of origin (born in state versus elsewhere). The only strong effects observed are the reduced probabilities of twin births among teenagers and (to a lesser degree) in 20–25-year-old white non-Hispanic women. With these modest exceptions, sociodemographic characteristics of mothers of twins and mothers of singletons at the time of childbirth are very comparable.

Table 2

Maternal sociodemographic predictors of twin pair versus singleton births in Missouri, 2006–07, to white non-Hispanic and African-American mothers.

	White non-Hispanic		African-American
	Adjusted OR	95% CI	Adjusted OR	95% CI
Maternal Age
≤19	0.44	0.34–0.58	0.38	0.23–0.62
20–25	0.69	0.60–0.80	0.86 ^NS	0.62–1.18
26–30	1.00	-- --	1.00	-- --
31–34	1.18	1.01–1.38	1.10 ^NS	0.73–1.66
>35	1.42	1.18–1.72	1.04 ^NS	0.61–1.66
Mother Missouri-born	1.12 ^NS	0.99–1.26	0.88 ^NS	0.66–1.17
Maternal Education 17+ years	1.18	1.01–1.38	1.20 ^NS	0.70–2.07

Significance is indicated by the 95% confidence intervals shown in the table; NS = not significant.
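How the confidence intervals in Table 2 indicate significance can be made explicit with a short check: an adjusted odds ratio differs from 1 at the 5% level when its 95% interval excludes 1. The sketch below uses three rows copied from the white non-Hispanic columns; the helper names are hypothetical, and recovering a standard error assumes Wald-type intervals that are symmetric on the log scale, which the source does not state.

```python
import math


def excludes_one(ci_low: float, ci_high: float) -> bool:
    """True when a 95% CI for an odds ratio lies entirely above or below 1."""
    return ci_low > 1.0 or ci_high < 1.0


def log_scale_se(ci_low: float, ci_high: float, z: float = 1.96) -> float:
    """Standard error of log(OR) implied by a Wald-type 95% CI (assumption)."""
    return (math.log(ci_high) - math.log(ci_low)) / (2.0 * z)


# (label, adjusted OR, CI low, CI high), white non-Hispanic mothers, from Table 2.
rows = [
    ("Maternal age <=19", 0.44, 0.34, 0.58),
    ("Maternal age 20-25", 0.69, 0.60, 0.80),
    ("Mother Missouri-born", 1.12, 0.99, 1.26),
]
for label, odds_ratio, low, high in rows:
    print(label, odds_ratio, excludes_one(low, high), round(log_scale_se(low, high), 3))
```

Only the Missouri-born row has an interval spanning 1, in line with its NS marker in the table.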

A final respect in which twin pairs differ from singletons concerns prematurity, birth weight, and fetal and infant death. Twin pair pregnancies are considered high-risk because of the increased associated risks of fetal death, premature birth and low birth weight (Morrison, 2005). However, behavioral and psychiatric literatures have generally failed to find important twin-singleton differences in later life (e.g., Barnes and Boutwell, 2013, Christensen et al., 2006, Johnson et al., 2002, Kendler et al., 1995, Ordaz et al., 2010, Sadeghi et al., 2017, Tsou et al., 2008), suggesting that with appropriate exclusionary criteria for prematurity, extreme perinatal complications, and low birth-weight, twins are very much like singletons.

Geographic mobility and the external validity of birth cohort data

A second challenge to the implementation of any US-based study that relies on birth-record recruitment is posed by population mobility. How well does a birth cohort of children in a state capture the socioeconomic and race/ethnic diversity of all children living in that state, since birth records exclude children who have moved into the state after their birth? The answer will likely vary as a function of immigration rates into, and emigration rates from, different states. Therefore, we need to understand more about population mobility patterns. We analyzed US Census Public Use Microdata Series (PUMS) data (Ruggles et al., 2015) to examine the percentages of all children aged 9–12 who were born in state, in the metropolitan areas of Denver, Colorado; Minneapolis-St Paul, Minnesota; St Louis, Missouri; and Richmond, Virginia. Respective percentages born in state were: Denver, 77%; Minneapolis-St Paul, 82%; St Louis, 84%; and Richmond, 79%. In other words, a birth cohort will capture 75–85% of the major race/ethnic group children resident in state at ages 9–12 for the consortium sites. Thus reliance on birth cohorts of twin pairs should have minimal effects on the generalizability of findings.

Twin Hub sites and participant ascertainment

The four sites in the Twin Hub each have over 25 years of experience studying behavioral development in twins, adoptees, and nuclear families, with a focus on the genetic contributions to substance use (SU). Twin study findings from member sites have addressed a wide range of substantive topics, including reports that have examined how specific and generalized risk are associated with the development of adolescent SU (Krueger et al., 2002, Maes et al., 2004, Palmer et al., 2012, Palmer et al., 2013b, Palmer et al., 2009, Young et al., 2006, Young et al., 2000), the strong relationship of antisocial behavior to alcohol and drug use (Button et al., 2006, Button et al., 2007, Button et al., 2009, Hicks et al., 2013, Hopfer et al., 2013), insights into G-E interplay in adolescent SUD development (Hicks et al., 2010, Hicks et al., 2012, Hicks et al., 2014, Irons et al., 2015, Maes et al., 2017, Vrieze et al., 2012), the relationship between SU and brain integrity (Anokhin and Golosheykin, 2016, Botteron et al., 2002, Carlson et al., 2002, Carlson et al., 2004a, Carlson et al., 2004b, Carlson et al., 2007, Gustavson et al., 2017, Harper et al., 2016, Isen et al., 2014, Malone et al., 2014a, Pagliaccio et al., 2015, Palmer et al., 2013a, Perlman et al., 2009, Prom-Wormley et al., 2015, Sparks et al., 2014, Wilson et al., 2015, Yoon et al., 2015, Young et al., 2009), and the molecular genetic bases of SUDs (Agrawal et al., 2012, Clark et al., 2017, Derringer et al., 2015, Maes et al., 2016, McGue et al., 2013, Samek et al., 2016) and associated endophenotypes (Iacono, 2014, Iacono et al., 2014a, Iacono et al., 2014b, Liu et al., 2017, Malone et al., 2016, Malone et al., 2014b, Vaidyanathan et al., 2014a, Vaidyanathan et al., 2014b; Vaidyanathan et al., 2014c, Vrieze et al., 2014). 
Member sites have also led the field in the development of methods for the analysis of data from twins (Neale and Cardon, 1992c, Neale et al., 2006), and pioneered research on the human gut microbiome (Faith et al., 2013, Turnbaugh et al., 2009), including human-to-mouse gut microbiota transplantation studies (Ridaura et al., 2013).

University of Colorado – Boulder

Participation in the Colorado Twin Registry (CTR) is achieved through longstanding collaboration between the Institute for Behavioral Genetics (IBG) and the Colorado Department of Health (CDH), and is described in Rhea et al. (2006, 2013). Briefly, in 1984 the CDH began mailing inquiry letters on behalf of IBG to parents of living twins born from 1982 forward. In 1999, the CDH adopted a ‘negative consent’ process whereby data were released for twin births unless the parents returned a card specifically prohibiting the CDH from doing so. The CDH does not release information if inquiry letters are returned as undeliverable, ‘return to sender’. In preparation for the ABCD project, we collaborated with the CDH to update addresses of the original undeliverable ‘return to senders’ from the birth years 2006–2008 and tried again to contact them. We also used our proven tracking methods to update all contact information for the target birth cohorts. According to the CDH, 3217 twin pairs were born in Colorado during the target years 2006–2008. The CTR initially had consent to contact and access birth record information for 82% of those (2647 pairs). Among those, 1691 pairs are same sex, and we estimate, based on same-sex and opposite-sex numbers and using Weinberg’s formula, that 625 (38.2%) of those are MZ, and 1011 (61.8%) are DZ. For the birth cohort as a whole, the estimates of the proportions of MZ and DZ pairs among the same-sex pairs are 39.2% and 60.8% respectively, not significantly different from the CTR ascertained pairs. The increase in the proportion of DZ pairs compared to two decades earlier (see e.g., Hur et al., 1995) is common across all sites in the Twin Hub, and is largely a function of increasing average maternal age and the higher DZ twinning rate associated with older mothers (Rhea et al., 2017). As described above, additional contact effort added a further 243 pairs to our target sample. 
The approximate race/ethnicity breakdown, from CDH reports based on birth record information is 25% Hispanic or Latino (75% non-Hispanic), 96% white, 3% black or African American, <1% each of American Indian/Alaskan, Asian, and Native Hawaiian or Other Pacific Islander.

University of Minnesota

In Minnesota, prospective ABCD twin pairs are drawn from the MCTFR twin registry (Iacono and McGue, 2002, Iacono et al., 2006). Using publicly available Minnesota Department of Health birth records, which include a multiple birth check-off and the address of the parents at the time of twin births, we begin by identifying the records for live-born twins from birth years of interest (e.g., from 2006 to 2008 for ABCD). Using this information, other publicly available resources, and commercial search software, the MCTFR updates the addresses and obtains parent phone numbers. We then contact families and, if they still reside in Minnesota, ask them to become part of the MCTFR registry where they can expect to be contacted at some future time to participate in a research project. For ABCD, the MCTFR recruits families who live in the metropolitan Twin Cities region (approximately 50% of the statewide population) from this registry during the year the twins reach age 9 or 10. Twin eligibility for participation requires that families satisfy the same inclusion and exclusion criteria applied consortium-wide to ABCD singletons. For any given ABCD-relevant birth year, there are approximately 730 metropolitan twin births, 270 of which are opposite-sex dizygotic pairs who are not recruited for ABCD. From this information, and using Weinberg’s formula, the MCTFR can estimate that of the 460 same-sex pairs born in the greater Minneapolis area each year, approximately 40% are monozygotic and 60% dizygotic. According to the State birth certificate data, about 74% of the families are white non-Hispanic, 6% are black, 3% are Asian, and 17% are of other, mixed, or unknown ethnicity. Six percent of the parents identify as Hispanic.
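
The Weinberg calculation referred to above can be sketched in a few lines of Python. This is only an illustration, not the registry's actual procedure: it applies Weinberg's differential rule, which assumes that DZ pairs are equally likely to be same-sex or opposite-sex, so the number of same-sex DZ pairs is taken to equal the number of opposite-sex pairs. The function name is hypothetical, and the inputs are the rounded annual metropolitan counts quoted in this section, treated here as counts of pairs.

```python
def weinberg_same_sex_split(total_pairs, opposite_sex_pairs):
    """Estimate MZ and DZ counts among same-sex pairs (Weinberg's differential rule)."""
    same_sex_pairs = total_pairs - opposite_sex_pairs
    # Assumption of the rule: same-sex DZ pairs are as numerous as opposite-sex pairs.
    dz_same_sex = opposite_sex_pairs
    mz = same_sex_pairs - dz_same_sex
    return {
        "same_sex_pairs": same_sex_pairs,
        "mz": mz,
        "dz_same_sex": dz_same_sex,
        "mz_share": mz / same_sex_pairs,
        "dz_share": dz_same_sex / same_sex_pairs,
    }


# Rounded counts quoted in the text: about 730 pairs per year, 270 opposite-sex.
estimate = weinberg_same_sex_split(total_pairs=730, opposite_sex_pairs=270)
print(estimate)
```

With these rounded inputs the estimated MZ share of same-sex pairs is a little over 41%, in line with the approximate 40%/60% split reported above.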

Virginia Commonwealth University

In Virginia, prospective ABCD twin pairs are drawn from the Mid-Atlantic Twin Registry (MATR; Anderson et al., 2002, Lilley and Silberg, 2013) of Virginia Commonwealth University (VCU). Using Department of Health birth records from VA and NC, the MATR begins by identifying the records for live-born twins from birth years of interest (e.g., from 2006 to 2008 for ABCD). This information is then combined with other publicly available resources and commercial search software to update addresses and obtain parents' phone numbers. We then contact families and ask them to become part of the MATR, which allows us to keep track of them and invite them to take part in research studies. The ABCD study recruits families from the MATR who live within a three-hour drive of Richmond during the year the twins reach age 9 or 10. Eligibility for participation requires that families satisfy the same inclusion and exclusion criteria applied consortium-wide to ABCD singletons. For the ABCD-relevant birth years of 2006–2008 in Virginia, there are approximately 5432 twin births, 1968 of which are opposite sex dizygotic pairs who are not recruited for ABCD. From this information, and using Weinberg’s formula, the MATR can estimate that of the 3464 same sex pairs born in the state during these years, approximately 27.5% are monozygotic and 72.5% dizygotic. According to State birth certificate data for potentially eligible twin families in VA and NC whose twins are aged 9–10 during the ABCD recruitment timeframe, about 65% of the families are white non-Hispanic, 3% are white with Hispanic ethnicity, 22% are black, 6% are Asian, and 4% are of other, mixed, or unknown ethnicity.

Washington University

At the Washington University site, after review by the state DHSS IRB and approval of stringent data security measures, we were granted access to all birth records for years 2006–2008, with the exception of births where an adoption had occurred (because of a restriction in state law). This authorization, in addition to identifiers, included standard sociodemographic variables (such as maternal educational level, marital status during pregnancy/childbirth, age at childbirth, maternal state/country of birth). Such variables are useful for the characterization of sampling biases occurring through failure to locate a twin pair, or through non-response of a family to invitations to participate. We identified like-sex twin pairs, and excluded pairs with known fetal or other death of either or both twins, and pairs not meeting study eligibility criteria because of prematurity or low birth weight. We merged maternal data with a cumulative data-base of state driver’s license/state ID records for a secondary data analysis phase (negative screen), to exclude pairs whose mother appeared either to have moved out of state, or to be not resident within Missouri within one hundred miles of the Medical Center. However, because Hispanic families in Missouri are disproportionately resident in Kansas City, some 250 miles from the Medical Center, these families were retained in our sampling frame. Pairs were then assigned for tracing using standard fee-for-service commercial data-bases. In a small proportion of pairs the mother could not be linked to driver’s license/state ID data (2.2%). 
Major predictors of inability to link the mother were (i) mother born out-of-state (Spearman rhos = −0.17 for white non-Hispanics, −0.12 for African-Americans): 64% of linked versus 7% of unlinked white non-Hispanics born out-of-state, 74% versus 38% for African-Americans; (ii) low maternal educational level (Spearman rho of −0.14 in white non-Hispanics, with 37% of linked versus 75% of unlinked having high school education or less, but no significant association in African Americans); (iii) mother married, more likely to be reported by linked than unlinked white non-Hispanics (76% versus 53%, Spearman rho = 0.07) but less likely to be reported by linked than unlinked African Americans (28% versus 53%, Spearman rho = −0.04). Given the relatively small absolute percentage unlinked, we would not expect the exclusion of these pairs from our sampling frame to negatively impact external validity.

Data analysis overview

Consistent with the ABCD Data Analysis and Informatics Core, the primary statistical analysis framework for modeling twin data is the hugely popular statistical programming language R (R Core Team, 2012). This standardization around open source software conveys enormous benefits for transparency, rigor and reproducibility between the twin and non-twin analyses. The most widely used methodological framework for analyses of twin data is structural equation modeling (SEM; Bollen, 1989), which encompasses a huge array of statistical methods, including regression, mixed models, multilevel and factor analysis. Fortunately, it is possible to fit these models in R, which makes for a seamless integration of the twin and non-twin data analyses. The following sections describe in more detail the precise analyses that yield the interpretations described in Sections 1 and 2.

Blending twin data with singleton data across the ABCD consortium

Data from the ABCD twins can be analyzed in two main ways. First, the intent is that all data will be shared with the ABCD Data Analysis and Informatics Center (DAIC) for incorporation in all non-twin analyses. Linear modeling – such as may be used to predict substance use at a later time from earlier neurocognitive and demographic variables – will include clustering by twin pair to correct parameter standard errors which may be biased by the non-independence of the members of a twin pair. Thus twin pairs will play an important role in almost all ‘non-twin’ analyses in the ABCD study. Second, the twin data can be analyzed to yield inferences about the relative impacts of genetic and environmental factors, be these measured or inferred (i.e., latent). In light of how twin subjects undergo the same assessments as singletons, these analyses will also include singletons, because more accurate estimates of the within-person means, variances and covariances increase the statistical power to make inferences about individual differences in cognition, development, psychopathology and SU. Importantly, twins located through school-based recruitment at singleton ABCD sites will be welcomed into the ABCD study, and will be pooled with Twin Hub twin data in analyses where their non-random (non-birth-record) inclusion (such as referred by a schoolmate) would not be a confound.
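
To illustrate why clustering by twin pair matters for standard errors, the following Python sketch compares a naive standard error of a sample mean with a pair-clustered (sandwich) standard error for the simplest possible linear model, an intercept only. It is a minimal sketch under stated assumptions: the data are made-up toy values, the function name is hypothetical, no small-sample correction is applied, and the consortium's actual analyses (planned in R) may use different estimators.

```python
import math


def naive_and_clustered_se(pairs):
    """Return (naive SE, pair-clustered SE) for the mean of a trait measured in twin pairs."""
    values = [y for pair in pairs for y in pair]
    n = len(values)
    mean = sum(values) / n
    # Naive SE: treats every individual as an independent observation.
    sample_var = sum((y - mean) ** 2 for y in values) / (n - 1)
    naive_se = math.sqrt(sample_var / n)
    # Clustered SE: residuals are summed within each pair before squaring,
    # so within-pair similarity inflates the estimated uncertainty.
    meat = sum(sum(y - mean for y in pair) ** 2 for pair in pairs)
    clustered_se = math.sqrt(meat) / n
    return naive_se, clustered_se


# Toy (invented) trait values for five twin pairs with strong within-pair similarity.
toy_pairs = [(1.0, 1.2), (2.1, 1.9), (3.0, 3.3), (3.9, 4.1), (5.2, 4.8)]
naive_se, clustered_se = naive_and_clustered_se(toy_pairs)
print(f"naive SE = {naive_se:.3f}, pair-clustered SE = {clustered_se:.3f}")
```

Because co-twins in the toy data resemble each other, the clustered standard error is larger than the naive one, which is the direction of bias that clustering is meant to correct.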

Univariate analysis

A statistical model for the classical twin design is shown as a path diagram in Fig. 1. There is a one-to-one relationship between a path diagram and a set of linear model equations that describe it (Neale and Cardon, 1992a). Latent variables (A, C, E and D) are shown in circles, and observed measurements as rectangles. With MZ and DZ twins it is possible to estimate paths (regression coefficients) from the latent additive genetic (a), individual specific environment (e) and either genetic dominance (d) or common/shared environment (c) factors. The model is identified by separating the twins into MZ and DZ groups, in which the covariances between twins’ latent variables differ according to genetic theory. MZ twins’ genotypes are essentially identical, so their additive genetic factors are set to correlate 1.0, whereas those of DZ twins are set to 0.5, consistent with a model of many loci on the genome being (weakly) associated with outcomes. Repeated application of this univariate (monophenotype) model to, e.g., voxel-wise cortical thickness data, enables construction of maps of the impact of genetic and environmental factors on the brain (e.g., Schmitt et al., 2007a). These will be generated separately according to gender, with statistical control for age, research site, scanner type and sociodemographic variables as needed. SU, psychopathology and risk factors will be analyzed similarly. Such baseline analyses are important to validate methodology; results may be compared against previous findings and can provide ‘sanity checking’ of more complex analyses.
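
The covariance expectations implied by this path model can be written out directly. The Python sketch below covers the ACE case only (no dominance) and uses illustrative path values chosen so that the total variance is 1. The second function applies the familiar moment-based (Falconer-type) arithmetic to twin correlations; it is offered as a back-of-envelope check, not as the maximum likelihood SEM estimation described in this paper. Function names are hypothetical.

```python
def expected_twin_covariances(a, c, e):
    """Expected phenotypic variance and MZ/DZ twin covariances under the ACE model."""
    variance = a ** 2 + c ** 2 + e ** 2
    cov_mz = a ** 2 + c ** 2        # A correlates 1.0 and C correlates 1.0 in MZ pairs
    cov_dz = 0.5 * a ** 2 + c ** 2  # A correlates 0.5 and C correlates 1.0 in DZ pairs
    return variance, cov_mz, cov_dz


def moment_estimates(r_mz, r_dz):
    """Standardized a2, c2, e2 obtained by solving the ACE expectations for correlations."""
    a2 = 2.0 * (r_mz - r_dz)
    c2 = 2.0 * r_dz - r_mz
    e2 = 1.0 - r_mz
    return a2, c2, e2


# Illustrative paths: a^2 = 0.5, c^2 = 0.2, e^2 = 0.3 (total variance 1).
variance, cov_mz, cov_dz = expected_twin_covariances(a=0.5 ** 0.5, c=0.2 ** 0.5, e=0.3 ** 0.5)
a2, c2, e2 = moment_estimates(cov_mz / variance, cov_dz / variance)
print(round(cov_mz, 3), round(cov_dz, 3))
print(round(a2, 3), round(c2, 3), round(e2, 3))
```

Feeding the model-implied correlations back through the moment formulas recovers the variance components that generated them, which is the sense in which separating MZ and DZ groups identifies the model.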

Fig. 1

Path diagram for resemblance between the observed variables of Twin 1 and Twin 2 in a pair. Latent additive genetic (A), common environment (C), dominance genetic (D), and individual-specific environment (E) variables cause variation.

Multivariate analysis

Similar to the partitioning of variation within MRI measures or SU, covariation between them can be divided into genetic and environmental sources. These multivariate analyses (see Fig. 2), when applied to measures such as voxelwise data of cortical thickness or surface area, enable the drawing of genetic correlation maps either among brain measures or between them and relevant cognitive or substance use outcomes (Prom-Wormley et al., 2015, Schmitt et al., 2007a, Schmitt et al., 2009, Schmitt et al., 2007b, Wallace et al., 2010). Although very high dimensional models (there being many tens of thousands of voxels per hemisphere in a typical scan) cannot be fitted with only 800 pairs of twins, it is possible to assemble such large genetic or environmental correlation matrices from the repeated application of bivariate analyses. These large matrices can then be explored for genetically homogeneous regions, such as were reported by Twin Hub and UCSD researchers in Science (Chen et al., 2012) and PNAS (Chen et al., 2013). Of particular interest in the present study are associations between MRI assessments and risk factors or outcomes such as SU phenotypes and mental health.

Fig. 2

Multivariate genetic factor model for endophenotypes (End1–End3) and behavioral measures (Beh4–Beh7). Latent additive genetic (A), common environment (C), and individual-specific environment (E) sources of variation affect the factors and the residual variance specific to each measure.

An important issue with many phenotypes is that data of interest is not yet available on those who have not yet expressed the phenotype of interest. For example, abuse or dependence are not available among those who have yet to initiate use. For some purposes, such as estimating the dose-response relationship of SU on brain structure or function, it is appropriate to code non-initiators as zero. However, in cases where an individual’s propensity to heavy use or dependence (Conway et al., 2010, Vanyukov et al., 2012) is of interest, ‘probe’ items concerning quantity of use, symptoms of abuse or addiction are best coded as missing in the absence of initiation. Maximum likelihood methods are robust to data of this type, which may be considered as ‘missing by design.’ Importantly, the relationship between liability to initiate use and liability to express symptoms of, say, addiction cannot be measured in unrelated individuals. Effectively, there is no variation in the initiation measure among those who have been measured on addiction (i.e., all youth with measures for problems of addiction have initiated use), so it is not possible to estimate covariance between initiation and addiction. Data from twins, however, overcome this limitation by using the cotwin’s data. Conceptually, we can imagine that average liability to initiate would be higher among twins concordant for initiation than in discordant pairs. If liability to initiate is related to that for addiction, then higher rates of addiction should be observed in concordant than in discordant pairs. This is an issue the Twin Hub is poised to address as the ABCD pre-adolescent sample ages into adolescence. 
Statistical models for this type of effect are known as conditional-causal-common pathway models (Kendler et al., 1999, Koopmans et al., 1999, Neale et al., 2006). These alternative models provide a way to distinguish between the effects of actual use as compared to the propensity to use. More fine-tuned resolution can be effected using comorbidity models (Neale and Kendler, 1995) which can be distinguished with adequate statistical power under many scenarios (Rhee et al., 2004).
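
The two coding conventions for 'probe' items described above (zero for dose-response questions, missing for liability questions) can be made concrete with a short Python sketch. The function name, the `purpose` labels, and the example records are all hypothetical and are used purely for illustration.

```python
def code_probe_item(initiated, raw_value, purpose):
    """Code a substance-use probe item according to the analytic purpose.

    purpose="dose_response": non-initiators are coded 0 (no exposure).
    purpose="liability": non-initiators are coded None (missing by design).
    """
    if initiated:
        return raw_value
    if purpose == "dose_response":
        return 0
    if purpose == "liability":
        return None
    raise ValueError(f"unknown purpose: {purpose!r}")


# Invented records: (has the child initiated use?, reported quantity of use)
records = [(True, 4), (False, None), (True, 1), (False, None)]
dose = [code_probe_item(i, v, "dose_response") for i, v in records]
liability = [code_probe_item(i, v, "liability") for i, v in records]
print(dose)       # [4, 0, 1, 0]
print(liability)  # [4, None, 1, None]
```

Keeping the two codings as separate derived variables makes it explicit which question a given analysis is addressing.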

Analyzing direction of causation with data from twins

Standard multivariate models for twin data (e.g., Fig. 2), which cannot speak to causation, are essentially common factor models that separate both factor and residual variances into A, E and C/D components (McArdle and Goldsmith, 1990). Direction of causation models replace one or more of the common factor components with regression paths directly from one phenotype to another, i.e., multiple linear regression (e.g., Fig. 3). In the absence of common factors, a model with direct causal paths from every observed variable to every other observed variable is identified with data from relatives. In practice, such causal models are sensitive to the proportion of measurement error in each phenotype (Heath et al., 1993a), but bias from this issue can be corrected when either multiple indicators or repeated measures are available − as is the case in the ABCD study. These methods can be applied to either continuous measures (e.g., neuroimaging) or ordinal ones (e.g., substance initiation or quantity). Hybrid models that include both common factors and direct phenotype-to-phenotype causation are an active area of methodological development; such models should be available for future ABCD analyses.

Fig. 3

Multivariate direction of causation model for five observed variables (x1–x5) with additive genetic (A), common environment (C), and individual-specific environment (E) sources of variation specific to each measure.

Longitudinal models for twin data

Randomized experimental designs are considered the gold standard for inferring causation, but they are unsuitable for epidemiological studies of substance use, because the experimental conditions depart from those in nature, and it is, clearly, unethical to give drugs to children. Both longitudinal and twin study data permit some causal inference, and their combination provides internal validation. The four Twin Hub sites have a long history of conducting longitudinal twin studies, and of developing methods to analyze them. The Markov model coauthored by Heath (Eaves et al., 1986) permits estimation of genetic and environmental contributions to developmental change of a single trait, e.g., hippocampal volume (see Fig. 4).
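
As a rough illustration of how variance can accumulate over occasions in a Markov-type developmental model, the Python sketch below computes the genetic covariance matrix for a first-order autoregressive process. This is a simplified sketch, not the full model of Eaves et al. (1986): it assumes a single carry-over coefficient `g` that is constant across occasions, independent occasion-specific innovations, and no constant genetic factor. The function name and numerical values are hypothetical.

```python
def autoregressive_genetic_covariance(innovation_sds, g):
    """Genetic covariance matrix across occasions for a first-order autoregressive model.

    A_1 = a_1 * z_1 and A_t = g * A_(t-1) + a_t * z_t, with independent unit-variance z_t.
    """
    n = len(innovation_sds)
    variances = []
    for t, a_t in enumerate(innovation_sds):
        carried = (g ** 2) * variances[t - 1] if t > 0 else 0.0
        variances.append(carried + a_t ** 2)
    # For s <= t, Cov(A_s, A_t) = g^(t - s) * Var(A_s).
    return [
        [(g ** abs(t - s)) * variances[min(s, t)] for t in range(n)]
        for s in range(n)
    ]


cov = autoregressive_genetic_covariance(innovation_sds=[1.0, 0.5, 0.5], g=0.8)
for row in cov:
    print([round(value, 4) for value in row])
```

The same recursion can be written for environmental components, and the resulting matrices then enter the MZ and DZ covariance expectations with the usual genetic weights.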

Fig. 4

Genetic model of developmental change over time, following Eaves et al. (1986). The effects of occasion-specific (a1…at) and constant genetic factors (a) may accumulate over time through paths g. Similar processes may occur for environmental components.

A straightforward extension is to permit moderation of, e.g., genetic innovation by substance use status, the prediction being that new sources of genetic variation would emerge as a direct function of substance use. This approach is essentially a longitudinal data extension of the genotype by environment interaction (GXE) model described by Purcell (2002) and van der Sluis et al. (2012), with the important addition that substance use can elicit new sources of genetic variance (increase values of a2…a4) as well as change the effect size of factors operating in the absence of substance use (a1). GXE interaction models in the presence of G-E covariance present special challenges (Rathouz et al., 2008, Van Hulle and Rathouz, 2015, Zheng and Rathouz, 2015; Zheng et al., 2015); some are met by careful structural equation modeling, while others require complexities such as numerical integration of the likelihood over the range of hypothetical values of an unmeasured genetic or environmental factor.

Latent growth curve (LGC) models (Meredith and Tisak, 1990, Muthen and Asparouhov, 2015, Muthén et al., 2011, Nylund et al., 2008) are popular for the analysis of longitudinal data, partly because they require relatively few parameters regardless of the number of measurement occasions. They also make predictions about means and covariances over time, and across relatives. Most applications feature variance in level, slope, quadratic and residual components, and these random components can be partitioned into the genetic and environmental factors using the twin study design. 
We note that beyond descriptive polynomial growth curves, it is possible to compare the fit of many mathematical curves, such as the Gompertz, exponential or logistic (McArdle and Hamagami, 2003, Neale and McArdle, 2000). Variable inter-test intervals are also readily modeled, as Schmitt et al. (2014) showed in analyses of twins and siblings in the NIMH intramural study of brain development. To address whether substance use directly affects developmental trajectories, or that they covary due to shared common risk factors, we will use multivariate extensions of the LGC model. Analogous to the single occasion models, both common factor and direction of causation models can be implemented within the LGC framework. We will also test for heterogeneity of developmental trajectories (Lubke and Muthen, 2005, Muthén et al., 2011), and patterns of switching between them, using growth mixture modeling (Dolan et al., 2005, Raijmakers et al., 2001). Patterns of substance use over time vary considerably both within and between persons; individuals may oscillate between periods of drug use and remission. To investigate such irregular variation, we can employ dynamical systems models. These are especially useful for long time series (e.g., from fMRI or ecological momentary assessments such as might be obtained from Fitbit or other personal devices), and here a more individual-specific approach may be taken (Boker et al., 2014, Boker et al., 2009, Boker and Nesselroade, 2002, Hu et al., 2014). Parameter estimates from such models may then be explored to identify ‘types’ of substance use pattern, via methods such as latent class analysis. These models are again suitable for application to multivariate and twin data. They can offer novel insights into processes such as relapse and rebound effects, known phenomena in substance use patterns over time.
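
The way a latent growth curve model generates predictions about covariances over time can be shown with a small Python sketch. It covers only a linear model (level and slope, no quadratic term) for a single individual, with residuals assumed uncorrelated across occasions and of equal variance. The function name and the numerical values are hypothetical; in a twin analysis each of the variance terms would in turn be partitioned into genetic and environmental parts.

```python
def linear_growth_covariance(times, var_level, var_slope, cov_level_slope, var_residual):
    """Model-implied covariance matrix for y_t = level + slope * t + residual_t."""
    matrix = []
    for i, s in enumerate(times):
        row = []
        for j, t in enumerate(times):
            value = var_level + (s + t) * cov_level_slope + s * t * var_slope
            if i == j:
                value += var_residual  # residual variance only on the diagonal
            row.append(value)
        matrix.append(row)
    return matrix


# Unequally spaced occasions are handled simply by supplying the actual times.
implied = linear_growth_covariance(
    times=[0.0, 1.0, 2.0],
    var_level=1.0,
    var_slope=0.25,
    cov_level_slope=0.1,
    var_residual=0.5,
)
for row in implied:
    print([round(value, 3) for value in row])
```

Five parameters generate the full covariance matrix however many occasions are supplied, which is the parsimony referred to above.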

Models for data collected from parents

While the twin study is a powerful design to partition genetic and environmental factors, it has its limitations, and can benefit greatly from the addition of data collected from other relatives (Keller et al., 2009, Medland et al., 2009). Measures of psychopathology and substance use history from the parents can provide valuable insights into the environmental and genetic transmission of substance use liability. In addition, spousal resemblance for substance use has been found to be substantial. In a twin study, the genetic effects of parental assortative mating can be confounded with those of the shared environment, as both processes increase equally the phenotypic variance and MZ and DZ covariances. Modeling data from parents permits resolution between mechanisms of marital resemblance (Eaves and Heath, 1981), and expanded twin models that include parents can differentiate between genetic and cultural transmission over generations (Neale and Fulker, 1984).

Statistical power

The advantages of data from twins described above are clear, but the choice of 800 pairs is less so. To maximize statistical power, one might choose to study 5500 pairs (i.e., make the entire consortium sample a twin sample), but this would not be feasible for many reasons, and the representativeness of the sample and the generalization of findings to non-twins might be questioned. Instead, ABCD is designed to assess and control for any twin-non-twin differences, by ascertaining both types at each of the four Twin Hub sites. Each Twin Hub site will ascertain and image 200 pairs, along with 150 or 200 non-twins, within a single two-year timeframe for baseline data collection. The resulting study of 800 twin pairs is the largest longitudinal prospective neuroimaging study of twins to date. Because DZ twins outnumber MZ pairs by about 2:1, to optimize power to detect both genetic and shared environmental effects, the Twin Hub plans to over-recruit MZ pairs such that the final sample is more balanced by zygosity. The power to detect additive genetic effects depends on the proportion of shared environmental effects, and vice versa (Visscher, 2004, Visscher et al., 2008). For continuous measures (neuroimaging phenotypes, height, weight, factor scores derived from many test items), power to detect heritable variation (a2) consistently exceeds 0.8 when a2 > 0.3 regardless of the proportion of shared environmental effects, c2. The converse is true for detecting c2. Power to detect a2 or c2 in binary measures (e.g., initiation of marijuana) is lower (Neale et al., 1994) and depends on the proportion who meet criteria. Maximum power is available when half the sample meets criteria, for which case our sample size yields >0.8 power to find a2 > 0.5 significant at the 0.05 level; for prevalences of 25, 10 and 5%, a2 of 0.6, 0.7 and 0.8, respectively, suffice. Power increases when shared environmental variance is non-zero. 
Considerably more power is available to detect variation in traits measured at the ordinal level; a2 > 0.4 with a 5-category scale where the lowest category is 50%. These results emphasize the need to use at least ordinal and preferably continuous level measurement wherever possible. Multivariate analyses also increase power in many circumstances (Evans, 2002). Power to identify direction of causation hypotheses increases with the difference in the pattern of twin correlations of the two traits. Heath et al. (1993b, see their Table IV) show that 800 pairs can be adequate for traits correlating as little as 0.15 if their twin correlations differ substantially. The more strongly the traits correlate, the higher the statistical power. We expect to find highly heritable structural neuroimaging traits, but lower values for functional imaging measures, substance use, psychopathology and risk factors, which benefits power for this type of analysis. It would be optimistic to expect correlations exceeding 0.3 across domains.

Summary and conclusions

The study of twins has provided compelling evidence of the ubiquitous influence of genetics, as well as environments, on important human traits ranging from anthropometric characteristics, to physiological and biochemical traits, to diseases, and psychological and behavioral traits and disorders (Polderman et al., 2015). Recent advances in molecular biology and neuroscience have brought into even sharper focus the value of the classical twin study in understanding the biological pathways, including those involving epigenetics, the metabolome, the microbiome, and brain structure and function, that underlie complex human traits (van Dongen et al., 2012). If twins were rare or unusual in some respect, this conceptual elegance would not be of much practical advantage. However, twin births are relatively frequent (now approximately 3% of children) and their twin status carries no stigma or barrier to their participation in research. Indeed, it has been the uniform experience of the ABCD Twin Hub investigators, and that of researchers around the world, that twins are exceptionally willing to participate in research studies (Martin et al., 1997) and that they show strong levels of commitment to longitudinal assessment. This has allowed us to develop twin registries involving thousands of pairs of twins and their family members who are representative of the population as a whole, and who are willing to participate in multiple research assessments over long periods of time (Anderson et al., 2002, Iacono et al., 2006, Rhea et al., 2006, Rhea et al., 2013). These fortuitous circumstances have made twins and their families ideal participants for an intensive and extensive ten-year longitudinal study like the ABCD. The ABCD twin study will allow us to assess the role of genetics and environment on deep and detailed phenotypic assessments, such as functional MRI data, that would be hard to achieve any other way. 
By comparison, in genome wide association analysis (GWAS) and its related estimation of ‘SNP heritability’, very large samples are needed to obtain reasonable statistical precision. Having such large samples would preclude the level of fine grained and extensive phenotypic analysis required by ABCD. Establishing the extent of the genetic contribution to the outcome is a critical first step, especially when considering novel phenotypes in developmental neuroscience, like fMRI data during complex cognitive tasks. But the real power of twin studies, and their careful and rigorous statistical analysis, is that they permit us to address critical questions about the associations among traits and even the causal relationships underlying them. Through bivariate and multivariate analyses, we can test hypotheses about the causes of comorbidity or correlation between two or more traits. Thus although two traits may be associated, they may be so because of genetic pleiotropy when a common genetic cause influences two or more phenotypes; the classic early example in psychiatric genetics is that depression and anxiety share a common genetic vulnerability (Kendler et al., 1992). Alternatively, they may be associated because one trait, e.g. substance use, directly alters the second trait, e.g. brain development. Conceptually, the most striking application of the twin design in distinguishing phenotypic causation from genetic pleiotropy is through the use of discordant MZ twins. MZ co-twin designs are particularly powerful for testing hypotheses about specific risks because MZ twins are perfectly matched for age, sex, and genetic background, and partially matched for environmental background (van Dongen et al., 2012). 
Given this, the association of a risk phenotype (like marijuana use), that is present in one member of a pair and not the other, with an outcome (like IQ or brain function), would provide clear evidence of causation that cannot be explained away as mere genetic correlation. As has been described in this paper, this conceptually elegant idea can be extended to quantitative analyses of the twins’ resemblance for the traits. The signature characteristic of genetic pleiotropy as distinct from phenotypic causality is that there is a genetic correlation but no environmental correlation. The signature characteristic of phenotypic causation is that the outcome is predicted to the same degree by both the genetic and environment variation influencing the causal phenotype. This distinction can be assessed through appropriate bivariate or multivariate genetic modeling of twin data (e.g., De Moor et al., 2008, Jackson et al., 2016, van Beek et al., 2014). In the absence of twin data, ABCD’s epidemiological survey of individual children would leave unresolved some critical questions about the interpretation of observed associations among substance use or other environmental exposures and outcomes for the developing brain or cognitive performance. Outlining, in hypothetical terms, the utility of twins in developmental neuroscience, versus making an intensive and extensive study like the ABCD a reality, are quite different things. Fortunately, the ABCD Twin Hub comprises four study sites that each represent decades of experience in ascertaining and recruiting twins, implementing intensive assessment protocols, and successfully retaining families in their studies over long periods of time, in some cases over decades from infancy to adulthood. 
Accomplishing this requires highly experienced study teams; an investment of time and funds in establishing relationships with government offices and in maintaining databases and family contacts over long periods of time through such things as annual newsletters and holiday cards; considerable effort devoted to regulatory compliance; a substantial dose of dogged persistence; and, perhaps above all, enthusiasm for the value of twin studies in making a real contribution to psychological science in general and, now, to ABCD in particular.
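
The co-twin control logic discussed above can be illustrated with a small simulation. The sketch below is not the ABCD analysis plan or any fitted genetic model; it is a toy example under stated assumptions: a continuous exposure and outcome, a single genetic factor shared fully by MZ co-twins, independent individual-specific environments, and hypothetical parameter names (`causal_effect`, `genetic_overlap`) and function names introduced only for illustration. It compares the exposure-outcome slope among individuals with the slope among within-pair differences. Under pure genetic pleiotropy the within-pair slope should be near zero, whereas under phenotypic causation it should match the individual-level slope.

```python
import random
from statistics import mean


def simulate_mz_pairs(n_pairs, causal_effect, genetic_overlap, seed=0):
    """Simulate (exposure, outcome) for both members of MZ twin pairs.

    Assumed toy model (hypothetical, for illustration only):
      exposure = G + unique environment
      outcome  = causal_effect * exposure + genetic_overlap * G + unique environment
    G is identical for both twins in a pair because MZ twins share their genotype.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        g = rng.gauss(0, 1)  # genetic factor shared by both co-twins
        twins = []
        for _ in range(2):
            exposure = g + rng.gauss(0, 1)
            outcome = (causal_effect * exposure
                       + genetic_overlap * g
                       + rng.gauss(0, 1))
            twins.append((exposure, outcome))
        pairs.append(twins)
    return pairs


def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def individual_slope(pairs):
    """Slope treating every twin as an unrelated individual."""
    xs = [twin[0] for pair in pairs for twin in pair]
    ys = [twin[1] for pair in pairs for twin in pair]
    return slope(xs, ys)


def within_pair_slope(pairs):
    """Slope of within-pair outcome differences on exposure differences.

    Differencing removes everything the co-twins share, including G.
    """
    dx = [pair[0][0] - pair[1][0] for pair in pairs]
    dy = [pair[0][1] - pair[1][1] for pair in pairs]
    return slope(dx, dy)


if __name__ == "__main__":
    scenarios = {
        "genetic pleiotropy": dict(causal_effect=0.0, genetic_overlap=0.5),
        "phenotypic causation": dict(causal_effect=0.5, genetic_overlap=0.0),
    }
    for name, params in scenarios.items():
        pairs = simulate_mz_pairs(5000, seed=1, **params)
        print(name,
              "individual:", round(individual_slope(pairs), 2),
              "within-pair:", round(within_pair_slope(pairs), 2))
```

In this sketch, differencing within MZ pairs removes only what co-twins share; an unmeasured individual-specific factor that affects both exposure and outcome would still bias the within-pair slope, which is one reason the formal bivariate and multivariate twin models cited above are used alongside the discordant-pair comparison.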

Conflict of Interest

None.

170 in total

1. Heritability of volumetric brain changes and height in children entering puberty.

Authors: Inge L C van Soelen; Rachel M Brouwer; G Caroline M van Baal; Hugo G Schnack; Jiska S Peper; Lei Chen; René S Kahn; Dorret I Boomsma; Hilleke E Hulshoff Pol
Journal: Hum Brain Mapp Date: 2011-12-03 Impact factor: 5.038

2. Neural Correlates of Response Inhibition in Adolescents Prospectively Predict Regular Tobacco Smoking.

Authors: Andrey P Anokhin; Simon Golosheykin
Journal: Dev Neuropsychol Date: 2016 Jan-Mar Impact factor: 2.253

3. Power of the classical twin design revisited.

Authors: Peter M Visscher
Journal: Twin Res Date: 2004-10

4. Genetic topography of brain morphology.

Authors: Chi-Hua Chen; Mark Fiecas; E D Gutiérrez; Matthew S Panizzon; Lisa T Eyler; Eero Vuoksimaa; Wesley K Thompson; Christine Fennema-Notestine; Donald J Hagler; Terry L Jernigan; Michael C Neale; Carol E Franz; Michael J Lyons; Bruce Fischl; Ming T Tsuang; Anders M Dale; William S Kremen
Journal: Proc Natl Acad Sci U S A Date: 2013-09-30 Impact factor: 11.205

5. The heritability of life events: an adolescent twin and adoption study.

Authors: Heather R Bemmels; S Alexandra Burt; Lisa N Legrand; William G Iacono; Matt McGue
Journal: Twin Res Hum Genet Date: 2008-06 Impact factor: 1.587

Review 6. The continuing value of twin studies in the omics era.

Authors: Jenny van Dongen; P Eline Slagboom; Harmen H M Draisma; Nicholas G Martin; Dorret I Boomsma
Journal: Nat Rev Genet Date: 2012-07-31 Impact factor: 53.242

7. Heritability of brain volume change and its relation to intelligence.

Authors: Rachel M Brouwer; Anna M Hedman; Neeltje E M van Haren; Hugo G Schnack; Rachel G H Brans; Dirk J A Smit; Rene S Kahn; Dorret I Boomsma; Hilleke E Hulshoff Pol
Journal: Neuroimage Date: 2014-05-09 Impact factor: 6.556

8. Efficient calculation of empirical P-values for genome-wide linkage analysis through weighted permutation.

Authors: Sarah E Medland; James E Schmitt; Bradley T Webb; Po-Hsiu Kuo; Michael C Neale
Journal: Behav Genet Date: 2008-09-23 Impact factor: 2.805

9. Genetics of brain fiber architecture and intellectual performance.

Authors: Ming-Chang Chiang; Marina Barysheva; David W Shattuck; Agatha D Lee; Sarah K Madsen; Christina Avedissian; Andrea D Klunder; Arthur W Toga; Katie L McMahon; Greig I de Zubicaray; Margaret J Wright; Anuj Srivastava; Nikolay Balov; Paul M Thompson
Journal: J Neurosci Date: 2009-02-18 Impact factor: 6.167

10. GE covariance through phenotype to environment transmission: an assessment in longitudinal twin data and application to childhood anxiety.

Authors: Conor V Dolan; Johanna M de Kort; Toos C E M van Beijsterveldt; Meike Bartels; Dorret I Boomsma
Journal: Behav Genet Date: 2014-05-01 Impact factor: 2.805
