Justin M Luningham, Daniel B McArtor, Anne M Hendriks, Catharina E M van Beijsterveldt, Paul Lichtenstein, Sebastian Lundström, Henrik Larsson, Meike Bartels, Dorret I Boomsma, Gitta H Lubke.
Abstract
Parallel meta-analysis is a popular approach for increasing the power to detect genetic effects in genome-wide association studies across multiple cohorts. Consortia studying the genetics of behavioral phenotypes often face systematic differences in phenotype measurement across cohorts, which introduce heterogeneity into the meta-analysis and reduce statistical power. This study investigated integrative data analysis (IDA) as an approach for jointly modeling the phenotype across multiple datasets. We put forth a bi-factor integration model (BFIM) that provides a single common phenotype score and accounts for sources of study-specific variability in the phenotype. To capitalize on this modeling strategy, a phenotype reference panel was used as a supplemental sample with complete data on all behavioral measures. A simulation study showed that a mega-analysis of genetic variant effects in a BFIM was more powerful than a meta-analysis of genetic effects on a cohort-specific sum score of items. Saving the factor scores from the BFIM and using those as the outcome in meta-analysis was also more powerful than the sum score in most simulation conditions, but this approach introduced a small degree of bias. The reference panel was necessary to realize these power gains. An empirical demonstration used the BFIM to harmonize aggression scores in 9-year-old children across the Netherlands Twin Register and the Child and Adolescent Twin Study in Sweden, providing a template for applying the BFIM to a range of different phenotypes. A supplemental data collection in the Netherlands Twin Register served as a reference panel for phenotype modeling across both cohorts. Our results indicate that model-based harmonization for the study of complex traits is a useful step within genetic consortia.
Keywords: consortia; data integration; genome-wide association studies; latent variable modeling; phenotype harmonization
Year: 2019 PMID: 31921287 PMCID: PMC6914843 DOI: 10.3389/fgene.2019.01227
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. Example of the bi-factor integration model. T represents the general trait. Q1 and Q2 are the specific factors corresponding to questionnaire 1 and questionnaire 2, respectively. λ represents the factor loading, and γ represents the effect of the single-nucleotide polymorphism on the general trait. Thresholds and error terms are not depicted.
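The symbols in this caption can be tied together in equation form. The following is a sketch of a generic bi-factor measurement model with a SNP predictor, not necessarily the authors' exact specification. The indices, the latent response $y^{*}_{ij}$, the superscripts on λ, and the residual terms $\varepsilon_{ij}$ and $\zeta_i$ are notational assumptions introduced here.

```latex
\begin{align}
  y^{*}_{ij} &= \lambda^{(T)}_{j}\, T_i + \lambda^{(Q)}_{j}\, Q_{k(j),i} + \varepsilon_{ij}, \\
  T_i        &= \gamma\, \mathrm{SNP}_i + \zeta_i .
\end{align}
```

Here $i$ indexes individuals, $j$ indexes items, and $k(j) \in \{1, 2\}$ is the questionnaire that item $j$ belongs to. Under this reading, each item loads on the general trait T and on exactly one specific factor, and the SNP affects the items only through T. Observed ordinal responses would follow from cutting $y^{*}_{ij}$ at the item thresholds that the figure omits.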
Various data-generating models and sample sizes used in simulations, resulting in 20 simulation conditions.
| | Sample sizes | Data-generating models |
|---|---|---|
| Varying simulation conditions | N1: cohort 1 = 5,000, cohort 2 = 5,000, Ref = 400 | Model 1: same measurement across item sets |
| | N2: cohort 1 = 2,500, cohort 2 = 2,500, Ref = 400 | Model 2: different levels of item set reliability |
| | N3: cohort 1 = 7,500, cohort 2 = 7,500, Ref = 400 | Model 3: mean and variance differences in cohort-specific factor |
| | N4: cohort 1 = 2,500, cohort 2 = 7,500, Ref = 400 | Model 4: true model is a higher-order model (bi-factor model is misspecified) |
| | N5: cohort 1 = 4,500, cohort 2 = 4,500, Ref = 1,000 | |
Figure 2. Path diagrams of the data-generating models for the simulation. T represents the general trait. Q1 and Q2 are the specific factors corresponding to questionnaire 1 and questionnaire 2, respectively. (A) depicts model 1, representing the ideal measurement case with equal reliabilities for both questionnaires. (B) depicts model 2, in which the items on the second questionnaire have lower reliabilities than those on the first questionnaire, represented by shading. (C) depicts model 3, in which the Q2 factor is shaded, representing an increased mean and variance along with reliability differences. (D) depicts model 4, a second-order model in which the general factor summarizes the covariance among two questionnaire-specific factors and has additional direct effects on select items.
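To make the ideal-case design concrete, here is a minimal simulation sketch in the spirit of model 1 with sample size condition N1. It is not the authors' simulation code. The loadings, SNP effect, allele frequency, and number of items are illustrative assumptions, items are simulated as continuous rather than ordinal for brevity, and the assignment of questionnaire 1 to cohort 1 and questionnaire 2 to cohort 2 is likewise assumed. Only the reference panel has complete data on all items.

```python
import numpy as np

def simulate_bifactor(n1=5000, n2=5000, n_ref=400, items_per_q=4,
                      lam_t=0.6, lam_q=0.4, gamma=0.05, maf=0.3, seed=0):
    """Simulate item scores under a simple bi-factor model.

    All parameter values are hypothetical, chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    n = n1 + n2 + n_ref

    snp = rng.binomial(2, maf, size=n).astype(float)  # additive coding 0/1/2
    t = gamma * snp + rng.normal(size=n)              # general trait T
    q = rng.normal(size=(n, 2))                       # specific factors Q1, Q2

    # Equal loadings for both questionnaires (equal reliabilities).
    resid_sd = np.sqrt(1.0 - lam_t**2 - lam_q**2)
    items = np.empty((n, 2 * items_per_q))
    for k in range(2):
        cols = slice(k * items_per_q, (k + 1) * items_per_q)
        noise = rng.normal(scale=resid_sd, size=(n, items_per_q))
        items[:, cols] = lam_t * t[:, None] + lam_q * q[:, [k]] + noise

    # Missing by design: each cohort answered only one questionnaire,
    # while the reference panel answered both.
    group = np.repeat(["cohort1", "cohort2", "ref"], [n1, n2, n_ref])
    items[group == "cohort1", items_per_q:] = np.nan
    items[group == "cohort2", :items_per_q] = np.nan
    return snp, items, group

snp, items, group = simulate_bifactor()
print(items.shape)                              # (10400, 8)
print(int(np.isnan(items).any(axis=1).sum()))   # 10000 rows with missing items
```

Only the 400 reference-panel rows link the two item sets directly, which is why the covariance between questionnaires can be estimated at all in this design.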
Figure 3. Power to detect the single-nucleotide polymorphism effect for each analysis approach, relative to complete data power. Results are presented across the different data-generating models and the different sample size conditions detailed in the table of simulation conditions. The analysis approaches were sum score meta-analysis, factor score meta-analysis, mega-analysis with the bi-factor integration model, and multiple imputation of the missing items. SEM, structural equation modeling; Imp., imputation.
Empirical power and type I error results with different analysis methods under the four data-generating models and five sample size conditions.
| Model | N | Complete data power | Mega SEM | FS meta | SS meta | Impute power | Full T1 | Mega T1 | FS T1 | SS T1 | Impute T1 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Model1 | N1 | 0.760 | 0.567 | 0.557 | 0.519 | 0.496 | 0.046 | 0.046 | 0.044 | 0.040 | 0.037 |
| Model2 | N1 | 0.710 | 0.517 | 0.481 | 0.445 | 0.446 | 0.051 | 0.044 | 0.043 | 0.055 | 0.039 |
| Model3 | N1 | 0.678 | 0.479 | 0.399 | 0.356 | 0.275 | 0.045 | 0.047 | 0.071 | 0.067 | 0.037 |
| Model4 | N1 | 0.760 | 0.549 | 0.486 | 0.531 | 0.505 | 0.049 | 0.047 | 0.047 | 0.059 | 0.047 |
| Model1 | N2 | 0.480 | 0.358 | 0.329 | 0.323 | 0.295 | 0.046 | 0.040 | 0.057 | 0.051 | 0.033 |
| Model2 | N2 | 0.444 | 0.305 | 0.277 | 0.280 | 0.266 | 0.045 | 0.057 | 0.064 | 0.057 | 0.047 |
| Model3 | N2 | 0.384 | 0.241 | 0.219 | 0.213 | 0.151 | 0.048 | 0.045 | 0.045 | 0.044 | 0.043 |
| Model4 | N2 | 0.466 | 0.327 | 0.264 | 0.285 | 0.287 | 0.047 | 0.051 | 0.050 | 0.044 | 0.051 |
| Model1 | N3 | 0.878 | 0.743 | 0.723 | 0.666 | 0.637 | 0.055 | 0.054 | 0.042 | 0.043 | 0.034 |
| Model2 | N3 | 0.858 | 0.675 | 0.658 | 0.619 | 0.596 | 0.045 | 0.049 | 0.054 | 0.048 | 0.044 |
| Model3 | N3 | 0.816 | 0.596 | 0.554 | 0.526 | 0.424 | 0.049 | 0.046 | 0.049 | 0.047 | 0.032 |
| Model4 | N3 | 0.886 | 0.719 | 0.665 | 0.681 | 0.665 | 0.048 | 0.061 | 0.054 | 0.066 | 0.042 |
| Model1 | N4 | 0.772 | 0.576 | 0.553 | 0.551 | 0.410 | 0.045 | 0.046 | 0.055 | 0.058 | 0.049 |
| Model2 | N4 | 0.719 | 0.549 | 0.508 | 0.497 | 0.380 | 0.062 | 0.047 | 0.064 | 0.061 | 0.053 |
| Model3 | N4 | 0.653 | 0.509 | 0.487 | 0.457 | 0.203 | 0.048 | 0.041 | 0.057 | 0.052 | 0.029 |
| Model4 | N4 | 0.738 | 0.549 | 0.498 | 0.513 | 0.420 | 0.051 | 0.050 | 0.049 | 0.050 | 0.042 |
| Model1 | N5 | 0.747 | 0.588 | 0.572 | 0.535 | 0.512 | 0.036 | 0.044 | 0.044 | 0.039 | 0.039 |
| Model2 | N5 | 0.711 | 0.535 | 0.515 | 0.488 | 0.478 | 0.064 | 0.066 | 0.052 | 0.046 | 0.028 |
| Model3 | N5 | 0.626 | 0.450 | 0.398 | 0.399 | 0.318 | 0.051 | 0.049 | 0.055 | 0.058 | 0.040 |
| Model4 | N5 | 0.713 | 0.525 | 0.504 | 0.494 | 0.497 | 0.048 | 0.049 | 0.056 | 0.052 | 0.048 |
FS, factor score meta-analysis; Mega, BFIM SEM mega-analysis; SS, mean score meta-analysis; T1, type I error rate.
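As a small worked example, the power of each method can be expressed relative to the complete data power, assuming "relative" means the simple ratio of the two values. The numbers are the Model1, N1 row of the table above; the function and variable names are hypothetical.

```python
def relative_power(method_power, complete_power):
    """Ratio of a method's empirical power to the complete-data power."""
    return method_power / complete_power

# Row "Model1, N1" copied from the table above.
row = {
    "complete": 0.760,
    "Mega SEM": 0.567,
    "FS meta": 0.557,
    "SS meta": 0.519,
    "Impute": 0.496,
}

rel = {
    name: round(relative_power(power, row["complete"]), 3)
    for name, power in row.items()
    if name != "complete"
}

for name, value in rel.items():
    print(f"{name}: {value:.3f}")
# Mega SEM: 0.746
# FS meta: 0.733
# SS meta: 0.683
# Impute: 0.653
```

On this row the mega-analysis retains roughly three quarters of the complete data power, ahead of the factor score and sum score meta-analyses.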
Figure 4. Empirical bias of the single-nucleotide polymorphism effect for each analysis approach, relative to the true effect size. Results are presented across the different data-generating models and the different sample size conditions detailed in the table of simulation conditions. The analysis approaches were sum score meta-analysis, factor score meta-analysis, mega-analysis with the bi-factor integration model, and multiple imputation of the missing items. SEM, structural equation modeling; Imp., imputation.
Relative bias, coverage rates, type I error rates, and standard errors computed with different analysis methods under the four data-generating models and five sample size conditions.
| Model | N | Mega bias | FS bias | SS bias | Impute bias | Mega coverage | FS coverage | SS coverage | Impute coverage |
|---|---|---|---|---|---|---|---|---|---|
| Model1 | N1 | −0.011 | 0.063 | −0.001 | −0.036 | 0.944 | 0.939 | 0.962 | 0.972 |
| Model2 | N1 | 0.014 | 0.030 | −0.017 | −0.041 | 0.958 | 0.940 | 0.944 | 0.950 |
| Model3 | N1 | 0.026 | −0.062 | 0.016 | −0.093 | 0.959 | 0.903 | 0.948 | 0.966 |
| Model4 | N1 | 0.029 | 0.022 | 0.004 | −0.015 | 0.949 | 0.941 | 0.956 | 0.955 |
| Model1 | N2 | 0.016 | 0.070 | 0.018 | −0.021 | 0.941 | 0.926 | 0.937 | 0.947 |
| Model2 | N2 | 0.005 | 0.029 | −0.022 | −0.039 | 0.941 | 0.934 | 0.934 | 0.949 |
| Model3 | N2 | −0.006 | −0.069 | −0.019 | −0.121 | 0.941 | 0.910 | 0.944 | 0.963 |
| Model4 | N2 | 0.027 | 0.011 | −0.011 | −0.012 | 0.948 | 0.933 | 0.943 | 0.942 |
| Model1 | N3 | 0.005 | 0.062 | −0.015 | −0.057 | 0.952 | 0.929 | 0.949 | 0.962 |
| Model2 | N3 | 0.011 | 0.034 | −0.017 | −0.049 | 0.944 | 0.944 | 0.953 | 0.952 |
| Model3 | N3 | −0.004 | −0.053 | 0.025 | −0.073 | 0.941 | 0.883 | 0.951 | 0.968 |
| Model4 | N3 | 0.031 | 0.027 | −0.007 | −0.022 | 0.958 | 0.944 | 0.950 | 0.966 |
| Model1 | N4 | −0.009 | 0.070 | −0.002 | −0.042 | 0.954 | 0.934 | 0.956 | 0.966 |
| Model2 | N4 | 0.004 | 0.062 | 0.062 | −0.010 | 0.957 | 0.938 | 0.949 | 0.960 |
| Model3 | N4 | 0.014 | −0.007 | 0.092 | −0.052 | 0.946 | 0.917 | 0.945 | 0.965 |
| Model4 | N4 | 0.039 | 0.034 | 0.017 | −0.017 | 0.945 | 0.921 | 0.945 | 0.947 |
| Model1 | N5 | 0.011 | 0.090 | 0.012 | −0.031 | 0.954 | 0.917 | 0.953 | 0.952 |
| Model2 | N5 | 0.013 | 0.060 | −0.011 | −0.009 | 0.955 | 0.929 | 0.950 | 0.958 |
| Model3 | N5 | −0.004 | −0.036 | 0.032 | −0.060 | 0.938 | 0.917 | 0.936 | 0.961 |
| Model4 | N5 | 0.035 | 0.042 | −0.011 | −0.034 | 0.945 | 0.938 | 0.934 | 0.956 |
FS, factor score meta-analysis; Mega, BFIM SEM mega-analysis; SS, mean score meta-analysis; Impute, multiple imputation.
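The quantities in this table can be computed from simulation replicates roughly as follows. This sketch uses the conventional definitions of relative bias and 95% confidence interval coverage, which may differ in detail from the ones the authors used. The function names and the toy replicate values are hypothetical.

```python
import numpy as np

def relative_bias(estimates, true_effect):
    """(mean estimate - true effect) / true effect across replicates."""
    estimates = np.asarray(estimates, dtype=float)
    return (estimates.mean() - true_effect) / true_effect

def coverage_rate(estimates, std_errors, true_effect, z=1.96):
    """Share of replicates whose Wald interval contains the true effect."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    lower = est - z * se
    upper = est + z * se
    return float(np.mean((lower <= true_effect) & (true_effect <= upper)))

# Toy replicates, made up purely for illustration.
est = [0.09, 0.11, 0.10, 0.14]
se = [0.01, 0.01, 0.01, 0.01]

print(round(relative_bias(est, true_effect=0.10), 3))  # 0.1
print(coverage_rate(est, se, true_effect=0.10))        # 0.75
```

With a nominal 95% interval, coverage well below 0.95 signals biased estimates or standard errors that are too small.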
Results for sample size condition 1 when no reference panel data were included.
| Model | N | Mega rel. power | FS rel. power | Mega bias | FS bias |
|---|---|---|---|---|---|
| Model1 | N1 | 0.690 | 0.465 | −0.378 | 0.012 |
| Model2 | N1 | 0.620 | 0.449 | −0.426 | 0.158 |
| Model3 | N1 | 0.531 | 0.410 | −0.497 | 0.212 |
| Model4 | N1 | 0.662 | 0.483 | −0.228 | 0.189 |
Overt/physical aggression items in Aggression in Children: Unraveling gene-environment interplay to inform Treatment and InterventiON strategies.
| Item code | Item |
|---|---|
| ATAC63 | Has there ever been a time when he/she would be angry to the extent that he/she cannot be reached? |
| ATAC65 | Does he/she often tease others by deliberately doing things that are perceived as provocative? |
| ATAC70 | Has he/she ever deliberately been physically cruel to anybody? |
| ATAC71 | Does he/she often get into fights? |
| CBCL016 | Cruelty, bullying, or meanness to others |
| CBCL020 | Destroys his/her own things |
| CBCL021 | Destroys things belonging to his/her family or others |
| CBCL023 | Disobedient at school |
| CBCL037 | Gets in many fights |
| CBCL057 | Physically attacks people |
| CBCL094 | Teases a lot |
| CBCL095 | Temper tantrums or hot temper |
Figure 5. Path diagram of the bi-factor integration model applied to the Aggression in Children: Unraveling gene-environment interplay to inform Treatment and InterventiON strategies data. Agg, overt aggression factor; CBCL, Child Behavior Checklist; ATAC, autism-tics, attention-deficit hyperactivity disorder, and other comorbidities scale.