Reproduction and In-Depth Evaluation of Genome-Wide Association Studies and Genome-Wide Meta-analyses Using Summary Statistics.

Yao-Fang Niu1, Chengyin Ye2, Ji He3, Fang Han4, Long-Biao Guo1, Hou-Feng Zheng5,6, Guo-Bo Chen7.   

Abstract

In line with open-source genetics, we report a novel linear regression technique for genome-wide association studies (GWAS), called Open GWAS algoriTHm (OATH). When individual-level data are not available, OATH can not only completely reproduce reported results from an experimental model, but also recover underreported results from other alternative models with a different combination of nuisance parameters using naïve summary statistics (NSS). OATH can also reliably evaluate all reported results in-depth (e.g., p-value variance analysis), as demonstrated for 42 Arabidopsis phenotypes under three magnesium (Mg) conditions. In addition, OATH can be used for consortium-driven genome-wide association meta-analyses (GWAMA), and can greatly improve the flexibility of GWAMA. A prototype of OATH is available in the Genetic Analysis Repository (https://github.com/gc5k/GEAR).
Copyright © 2017 Niu et al.

Keywords:  Arabidopsis; GEAR; GWAMA; GWAS; magnesium; meta-analyses; naïve summary statistics; reproducibility; transparency

Year:  2017        PMID: 28122950      PMCID: PMC5345724          DOI: 10.1534/g3.116.038877

Source DB:  PubMed          Journal:  G3 (Bethesda)        ISSN: 2160-1836            Impact factor:   3.154


Reproducibility and transparency are the cornerstones of scientific integrity. In addition to artifacts that may compromise a study, analysis itself is becoming more complicated and poses another obstacle to reproducing discoveries. For big data studies involving high-throughput computation, such as GWAS, the reported findings are subject to criticism, as the results may differ among models even when the experimental design is sound. Therefore, the choice of a model as well as its conclusion (i.e., false positive or false negative) are often justified by an analyst’s prior knowledge (Aschard ; Day ). However, given practical constraints, such as data sharing policies and computational burden, it is not feasible to present all possible results found under alternative models. Although many consortia encourage open-source genetics and have released GWAS summary statistics, including the Genetic Investigation of Anthropometric Traits (GIANT) Consortium and the Psychiatric Genomic Consortia (PGC), it is still difficult to thoroughly evaluate a published study. Consequently, reproducibility and the success rate of subsequent studies are hampered. What kind of method and set of summary statistics are needed to fully reproduce results and to explore studies using unreported analyses? Statistical analyses can be reproduced in the absence of individual-level data; this is possible due to the theory of sufficient statistics (Fisher 1921). In this study, we propose a complementary method to reproduce each GWAS hit in the absence of shared original data. We report an algorithm called OATH that works directly on summary statistics. When individual-level data are not available, OATH can not only completely reproduce the reported results from an experimental model but also recover underreported results from other alternative models using only summary statistics. 
The utility of OATH will be demonstrated for 42 phenotypes: 14 traits of 295 Arabidopsis inbred lines grown under three Mg conditions. Furthermore, as OATH is based on linear regression, it can be applied to other analyses as long as linear regression is employed. For example, OATH can be embedded into consortium-driven GWAMA. Without loss of generality, a literature-driven meta-analysis can be considered a “retrospective” study, which is often an irreversible process in which the analyst conducting the meta-analysis can rarely customize the summary statistics. In contrast, a consortium-driven GWAMA can be a “prospective” study; quality control can be conducted more thoroughly (Chen ) and the summary statistics from each cohort can be customized at the request of the consortium. As demonstrated below with two Chinese GWAS cohorts, a consortium-driven GWAMA can adjust covariates more efficiently using OATH.

Materials and Methods

We begin this section with a brief explanation of the OATH algorithm; a more detailed description can be found in the Supplemental Material. To demonstrate the use of OATH, an introduction of Arabidopsis GWAS data under Mg treatments and two Chinese GWAS cohorts will follow.

OATH

For a saturated GWAS analysis, the multiple regression model is written as (for ease of discussion, all variables are centered, but the method can also be applied to data that are not centered)

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e} \qquad (1)$$

in which $\mathbf{y}$ is the observed phenotype of the individuals and $\mathbf{e}$ is the residual. In $\mathbf{X}$, the first column codes the counts of the reference alleles at the locus and each remaining column is a covariate. In $\mathbf{b}$, the first element is the effect size of the marker and each remaining element is the partial regression coefficient of a covariate. The least-squares estimator is $\hat{\mathbf{b}} = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{y}$. Both $\mathbf{X}$ and $\mathbf{y}$ are individual-level data in this estimator. The least-squares estimator for $\mathbf{b}$ can also be expressed in a form that requires no individual-level data (Equation 2; see Supplemental Material; hereafter referred to as OATH), and the same holds for the variance–covariance matrix of $\hat{\mathbf{b}}$ (Equation 3). The information [known as sufficient statistics for data reduction (Fisher 1921)] required for Equations 2 and 3 is contained in the variance–covariance matrix of all variables in Equation 1; no individual-level data are needed. As illustrated in Figure 1, all elements for Equations 2 and 3 can be extracted from this matrix. Rather than summary statistics from complicated models, the matrix involves variances and covariances only; therefore, we call them NSS in the text below. Of note, as the second row/column of the matrix is locus-specific, only the locus-specific part should be provided for each locus (Figure 1 and Supplemental Material, File S1).
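The following derivation is a sketch of why a variance–covariance matrix can stand in for individual-level data. It uses the standard least-squares identities for centered data rather than the authors' own notation for Equations 2 and 3, which is given in the Supplemental Material. The symbols $\mathbf{S}_{XX}$, $\mathbf{s}_{Xy}$, $s_{yy}$, $n$ (sample size), and $k$ (number of columns of $\mathbf{X}$) are introduced here for illustration only.

```latex
\begin{align*}
\mathbf{S}_{XX} &= \tfrac{1}{n-1}\,\mathbf{X}^{T}\mathbf{X}, \qquad
\mathbf{s}_{Xy} = \tfrac{1}{n-1}\,\mathbf{X}^{T}\mathbf{y}, \qquad
s_{yy} = \tfrac{1}{n-1}\,\mathbf{y}^{T}\mathbf{y} \\[4pt]
\hat{\mathbf{b}} &= (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{y}
                  = \mathbf{S}_{XX}^{-1}\,\mathbf{s}_{Xy} \\[4pt]
\hat{\sigma}^{2}_{e} &= \frac{\mathbf{y}^{T}\mathbf{y} - \hat{\mathbf{b}}^{T}\mathbf{X}^{T}\mathbf{y}}{n-1-k}
                      = \frac{(n-1)\,\bigl(s_{yy} - \mathbf{s}_{Xy}^{T}\hat{\mathbf{b}}\bigr)}{n-1-k} \\[4pt]
\widehat{\mathrm{var}}(\hat{\mathbf{b}}) &= \hat{\sigma}^{2}_{e}\,(\mathbf{X}^{T}\mathbf{X})^{-1}
                      = \frac{\hat{\sigma}^{2}_{e}}{n-1}\,\mathbf{S}_{XX}^{-1}
\end{align*}
```

Every quantity on the right-hand side is an entry of the variance–covariance matrix of $(\mathbf{y}, \mathbf{X})$, apart from the sample size $n$, which also has to be reported. The degrees of freedom $n-1-k$ account for the intercept that centering removes.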
Figure 1

Schematic illustration of the Open GWAS algoriTHm (OATH). The first row is the OATH equation. The second row shows (A) the sufficient statistics, a symmetric matrix, which (B) can be split into generic (red) and locus-specific (yellow) parts. The third row shows how the elements of this matrix can be extracted to build the quantities shown in (C), (D), and (E), respectively. GWAS, genome-wide association studies; OATH, Open GWAS algoriTHm.

In general, Equation 2 can be written with a subscript indicating the set of covariates included. If any covariates are dropped from the model, Equations 2 and 3 can be tailored to generate the corresponding estimate of the target marker effect. Thus, recovering underreported results for any combination of covariates is possible if the NSS are provided. One possible application of OATH is GWAMA. If each cohort sends its NSS to the central hub, the whole GWAMA gains flexibility because the central hub will be able to customize the GWAS model to any combination of covariates. The technical details on how to integrate OATH into GWAMA can be found in File S1.
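The idea of tailoring the model to any subset of covariates can be illustrated with a minimal Python sketch. It is not the authors' implementation (their prototype is in GEAR and their scripts are in R); it relies on the textbook identity that least-squares estimates depend on the data only through the covariance matrix and the sample size. The function names `ols_from_cov` and `ols_from_individuals`, the sample size of 500, and the way the phenotype is simulated are all assumptions made for this example.

```python
from itertools import combinations

import numpy as np


def ols_from_cov(S, n, keep):
    """Least-squares fit of y on the selected predictors, using only the
    covariance matrix S (ordered as [y, marker, covariate 1, ...]) and n."""
    idx = [1] + [2 + j for j in keep]          # marker plus chosen covariates
    S_xx = S[np.ix_(idx, idx)]                 # covariance among predictors
    s_xy = S[idx, 0]                           # covariance of predictors with y
    b = np.linalg.solve(S_xx, s_xy)            # partial regression coefficients
    df = n - 1 - len(idx)                      # intercept absorbed by centering
    sigma2 = (n - 1) * (S[0, 0] - s_xy @ b) / df
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(S_xx)) / (n - 1))
    return b[0], se[0]                         # marker effect and its SE


def ols_from_individuals(y, x, C, keep):
    """Reference fit from individual-level data, with an explicit intercept."""
    X = np.column_stack([np.ones(len(y)), x] + [C[:, j] for j in keep])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return b[1], se[1]


rng = np.random.default_rng(1)
n = 500                                            # assumed sample size
x = rng.binomial(2, 0.23, size=n).astype(float)    # allele counts, MAF 0.23
C = rng.standard_normal((n, 3))                    # three simulated covariates
y = 0.3 * C[:, 0] + rng.standard_normal(n)         # assumed model; marker effect is zero

S = np.cov(np.column_stack([y, x, C]), rowvar=False)  # the only summary that is shared

# All covariate subsets, including the empty one (2**3 = 8 models).
models = [k for r in range(4) for k in combinations(range(3), r)]
max_diff = 0.0
for keep in models:
    b_s, se_s = ols_from_cov(S, n, keep)
    b_i, se_i = ols_from_individuals(y, x, C, keep)
    max_diff = max(max_diff, abs(b_s - b_i), abs(se_s - se_i))
print(len(models), max_diff < 1e-8)
```

Dropping a covariate amounts to deleting its row and column from the covariance matrix, so one shared matrix serves every model. The sketch also fits the model with no covariates, which is why it enumerates eight models rather than the seven that include at least one covariate.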

Arabidopsis GWAS data

The seeds of all 295 lines were acquired from the Arabidopsis Biological Resources Center stock. Then, 234 accessions were sampled from 1307 worldwide accessions, which were genotyped using a 250 K single nucleotide polymorphism (SNP) chip (Horton ), and 61 were extracted from the Arabidopsis 1001 Genomes Project (http://1001genomes.org) (Figure S1A in File S2). The geographical distribution of the 295 lines was consistent with the Arabidopsis lines collected in RegPanel (http://regmap.uchicago.edu) (see Figure S1B in File S2). After quality control [triallelic or tetra-allelic loci, minor allele frequency (MAF) < 0.05, genotyping rate < 0.998, and homozygosity rate < 0.99 were removed], 156,744 biallelic loci remained for 42 GWAS (Figure S2 in File S2). Genetic relatedness was estimated using these 156,744 markers, resulting in a genetic relationship matrix (GRM). The eigenvectors were estimated in the GRM. The 295 inbred lines were grown under three Mg conditions: the low, normal, and high conditions contained 1, 1000, and 10,000 µM MgSO4, respectively, which was in accordance with the concentrations of Mg2+ in soil solutions (Hariadi and Shabala 2004). Fourteen traits were investigated under each treatment: seven were morphological traits and seven were nutrient concentration traits (Table 1). Under the three treatments, there were 42 total phenotypes for each line (Figure S3 in File S2). To reduce environmental influences, the median value of biological replicates was used as the phenotypic value. To reduce the maternal effects prior to phenotyping, inbred lines were grown for one generation under controlled greenhouse conditions at Zhejiang University (N30°18′25, E120°04′54), Hangzhou, Zhejiang Province, China, in 2015. For the ease of analysis, each phenotype was standardized (Figure S4 and Figure S5 in File S2). See the supplementary notes in File S1 for more details on these traits.
Table 1

Fourteen Arabidopsis traits investigated under three magnesium (Mg) treatments

| Trait category | Trait identifier | Full name of trait | Unit^a | Trait description |
|---|---|---|---|---|
| MT | RGT | Days to root germination | d | The number of days from seeding until emergence, with more than half of seedlings having a first radicle |
| MT | PRL | Primary root length | cm | After 8 d of growth under the treatments, plants were flattened directly on agar and imaged using a camera |
| MT | LRN | Lateral root number | cm | After 8 d of growth under the treatments, plants were flattened directly on agar and imaged using a camera |
| MT | SGT | Days to shoot germination | d | Normal |
| MT | EL | Epicotyl length | cm | After 8 d of growth under the treatments, plants were flattened directly on agar and imaged using a camera |
| MT | RL | Rosette width length | cm | After 8 d of growth under the treatments, plants were flattened directly on agar and imaged using a camera |
| MT | Biomass | Fresh weight of plants | mg | All fully expanded and nonlesioned seedlings were collected from four plants for each accession and weighed to obtain fresh weight measurements. The results represent average values across all available replicates |
| NCT | K | Potassium concentration per plant | mg/g | Elemental analysis was performed with an ICP-MS (Agilent 7500a). All samples were normalized to calculated weights as previously described. The results represent average values across all available replicates |
| NCT | Ca | Calcium concentration per plant | mg/g | Same as K |
| NCT | Mg | Magnesium concentration per plant | mg/g | Same as K |
| NCT | S | Sulfur concentration per plant | mg/g | Same as K |
| NCT | Fe | Iron concentration per plant | mg/g | Same as K |
| NCT | Mn | Manganese concentration per plant | mg/g | Same as K |
| NCT | Na | Sodium concentration per plant | mg/g | Same as K |

Annotation (RGT, PRL, LRN): Root germination and lateral root number data were shown as the value obtained in low- or high-Mg treatment minus those under normal-Mg treatment. The primary root values for the low-Mg or high-Mg treatment were then divided by values obtained from normal-Mg treatment.

Annotation (SGT, EL, RL): Shoot germination data were shown as the value obtained in low- or high-Mg treatment minus those under normal-Mg treatment. The epicotyl length and rosette width values for low-Mg or high-Mg treatment were then divided by values obtained from normal-Mg treatment.

Annotation (Biomass and NCT): Biomass and nutrient concentration data were calculated as the ratio of the treatment value (low Mg or high Mg) divided by the normal, in which seeds were germinated in normal Mg.

MT, morphological traits; NCT, nutrition concentration traits; ICP-MS, inductively-coupled plasma mass spectrometry.

^a For NCT, the units are measured for fresh weight.

Two Chinese GWAS cohorts

Two Chinese GWAS cohorts, NA (Han ) and SLE (Han ), were used to demonstrate the application of OATH to consortium-driven meta-analyses. The NA cohort was originally recruited for the study of narcolepsy, an autoimmune disorder affecting hypocretin (orexin) neurons; 3191 samples were genotyped. The SLE cohort was recruited for the study of systemic lupus erythematosus in the Chinese population; 2309 samples were genotyped. In order to mimic a consortium-driven GWAMA, these two GWAS cohorts provided the required NSS to the central hub. Using the meta-PCA technique (Chen ), the general genotyping quality of these two cohorts was validated by the GWAMA central hub, based only on the reported allele frequencies; individual-level data were not required (Figure S6 in File S2).

Data availability

The authors state that all data necessary for confirming the conclusions presented in the article are represented fully within the article.

Results

An OATH simulation example

In order to demonstrate the OATH kernel, a single-locus analysis is shown. The MAF of the biallelic locus was 0.23, and the effect size was set to zero. Three covariates, each sampled from the standard normal distribution, were simulated together with the phenotype. Including one, two, or three covariates generated seven possible models. The partial regression coefficients estimated by OATH agreed well with those estimated from the individual-level data (Figure 2). An R script is available for this example at https://github.com/gc5k/OATH.
Figure 2

Demonstration of OATH for a single locus. Three covariates were simulated, and OATH generated seven models by including one, two, or three covariates. The subtitle for each plot indicates the set of covariates included. The x-axis represents the regression coefficients estimated from OATH, whereas the y-axis shows the regression coefficients for the corresponding model but estimated from the individual-level data. The vertical and horizontal lines across each point indicate the SE of the partial regression coefficient estimated via OATH and individual-level data, respectively. The subtitle in each panel indicates the covariates included. OATH, Open GWAS algoriTHm.


Two Arabidopsis GWAS models

For these 295 Arabidopsis lines, we conducted a GWAS for each of the 42 phenotypes under the saturated GWAS model (sGWAS), which included the top five eigenvectors (Figure S7 in File S2). In contrast, we also conducted naïve/simple linear regressions (i.e., no covariates) for these phenotypes, denoted as naïve GWAS (nGWAS) (Figure S8 in File S2). The mean of the genomic-control inflation factor, a metric measuring population stratification (Devlin and Roeder 1999), differed between the 42 sGWAS and the 42 nGWAS, indicating the role of covariate adjustment in differentiating GWAS outcomes. For each phenotype, the correlations of the estimated β (additive genetic effect) and of the −log10(p) between the sGWAS and the nGWAS are shown in Figure S9 in File S2. Using a nominal genome-wide significance threshold based on the 156,744 markers, the sGWAS had 284 hits in total; 84, 84, and 116 under the low-, normal-, and high-Mg conditions, respectively. The nGWAS had 397 hits in total; 89, 188, and 120 under the low-, normal-, and high-Mg conditions, respectively. Between the sGWAS and the nGWAS, 206 hits were shared (Figure 3). As demonstrated in this example, an alternative model could lead to different results, which might cause controversy over reproducibility.
Figure 3

sGWAS and nGWAS hits for 14 traits under three Mg conditions. The x-axis shows the chromosomal coordinates for Arabidopsis; the y-axis represents the 14 traits. The GWAS hits observed in the saturated model (sGWAS), which was adjusted by the top five eigenvectors, are presented in the top panel; the hits observed in the naïve model (nGWAS) are presented in the bottom panel. GWAS hits are represented by black circles, and those hits shared by both nGWAS and sGWAS are filled with color. A total of 51, 74, and 81 GWAS hits were shared under low-, normal-, and high-Mg conditions, respectively. A GWAS hit was defined using a significance threshold based on the number of markers, which was 156,744. GWAS, genome-wide association studies; Mg, magnesium; nGWAS, naïve GWAS; sGWAS, saturated GWAS.
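Two quantities used in this comparison can be made concrete with a short sketch: the genomic-control inflation factor and a marker-count-based significance threshold. The sketch assumes the usual definition of the inflation factor (median association chi-square divided by the median of a 1-df chi-square distribution) and a Bonferroni threshold of 0.05 divided by 156,744 markers. Neither formula is spelled out in this text, although the threshold agrees with the cut-off of about 6.5 on the −log10(p) scale quoted later. The function names and the simulated z statistics are hypothetical.

```python
import math
import random
import statistics

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df


def lambda_gc(z_scores):
    """Genomic-control inflation factor from per-marker z statistics."""
    return statistics.median(z * z for z in z_scores) / CHI2_1DF_MEDIAN


def bonferroni_neglog10(alpha, n_tests):
    """-log10 of the Bonferroni-corrected p-value threshold."""
    return -math.log10(alpha / n_tests)


random.seed(7)
null_z = [random.gauss(0.0, 1.0) for _ in range(156_744)]   # simulated, no inflation
inflated_z = [1.2 * z for z in null_z]                       # simulated, inflated tests

lam_null = lambda_gc(null_z)
lam_inflated = lambda_gc(inflated_z)
threshold = bonferroni_neglog10(0.05, 156_744)
print(round(lam_null, 2), round(lam_inflated, 2), round(threshold, 2))
```

An inflation factor close to 1 suggests little residual stratification, while larger values suggest that test statistics are inflated genome-wide, which is the pattern covariate adjustment is meant to remove.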

Reproducing sGWAS for Arabidopsis

In order to reproduce the sGWAS results for each SNP in the absence of shared original data, the following NSS were used for each phenotype: the variance–covariance matrix of the phenotype and the five eigenvectors, together with the locus-specific information for each of the 156,744 loci. Of note, the covariance matrix of the five eigenvectors was a diagonal matrix because the eigenvectors were mutually orthogonal. As expected, OATH synthesized the NSS prepared above to reproduce the 42 sGWAS with high precision, as illustrated for days to root germination (RGT) (Figure 4), as well as for the other 41 phenotypes (Figure S10 in File S2). For the 14 traits under the normal-Mg condition, the estimated β from OATH were highly consistent with those from the sGWAS, and OATH found the same 284 hits that were found in the sGWAS. This indicated that, even without access to the individual-level data, OATH could retrospectively scrutinize the reported results. Furthermore, we also conducted individual-level data GWAS for these 295 Arabidopsis lines by including the top 10 eigenvectors, leading to $2^{10} = 1024$ possible outcomes for the association between a phenotype and a marker. OATH also almost perfectly reproduced the results (data not shown).
Figure 4

Reproducibility of sGWAS results using OATH with NSS for RGT under three Mg conditions. Each column represents β (top) and p-values (bottom) under low-, normal-, and high-Mg conditions. The y-axis represents the statistics from OATH synthesized from NSS, and the x-axis those from the sGWAS; the red points are QTL detected in the sGWAS. Correlations are shown in the top left corner of each panel. GWAS, genome-wide association studies; Mg, magnesium; NSS, naïve summary statistics; OATH, Open GWAS algoriTHm; QTL, quantitative trait loci; RGT, days to root germination; sGWAS, saturated GWAS.

Recovering underreported results for Arabidopsis

Given all possible combinations of the five eigenvectors, $2^{5} = 32$ models could be generated for these 295 Arabidopsis lines. With the inclusion or exclusion of certain eigenvectors, OATH was capable of synthesizing another 30 GWAS, in addition to the sGWAS and the nGWAS, each of which had at least one of the five eigenvectors as a covariate. For each of the 42 phenotypes, an OATH hit was claimed if a SNP reached the genome-wide significance threshold in any of its 32 models. OATH found 637 hits for the 42 phenotypes; 163 hits were not found by either the sGWAS or the nGWAS. Of these 163 new hits, 25 also passed a more stringent threshold, indicating a nominal overall significance under 42 phenotypes and 32 models. We validated these OATH hits by implementing their exact models using individual-level data from the 295 Arabidopsis lines; the β and the −log10(p) were highly consistent between the two (Figure 5). Therefore, OATH found all possible underreported results with high consistency. These 637 OATH hits were found on 575 unique SNPs, of which 430 were within genes and 145 were in intergenic regions.
Figure 5

Validation of OATH hits by their exact models. A total of 637 OATH hits that passed the genome-wide significance threshold were found; 25 loci also passed the overall threshold across 42 phenotypes and 32 models. Within each panel, the positive part of the y-axis represents the −log10(p) observed in OATH; the negative part of the y-axis is the corresponding −log10(p) evaluated by their exact models using individual-level data from 295 Arabidopsis lines. OATH, Open GWAS algoriTHm.

In-depth evaluation of the GWAS hits for Arabidopsis

In the experimental design theory established by R. Fisher, a single high or low value, such as productivity in a field experiment, is often confounded by a combination of other factors (Fisher 1926), and is of little interest in isolation from the values observed under different factors. Therefore, we further investigated whether the combination of the eigenvectors influenced each OATH hit. For those 637 OATH hits, the smallest range of the 32 −log10(p), from 7.01 to 7.14, was found for SNP 3_8965883 (chromosome 3, 8,965,883 bp, and MAF = 0.0508) associated with sulfur under the low-Mg condition. SNP 3_8965883 was located within RASPBERRY 3 (RSY3), a gene related to embryogenesis (Apuya ) (Table 2). Across the 32 models, its βs and SEs remained relatively stable (Figure 6).
Table 2

Three single nucleotide polymorphism (SNP) examples from Arabidopsis inbred lines

| SNP | A1 | Freq. | Covariates^a (conservative) | β1 (conservative) | σ1 (conservative) | −log10(p) (conservative) | Covariates^a (powerful) | β1 (powerful) | σ1 (powerful) | −log10(p) (powerful) | Treatment | Trait | Group # | F-statistic | Gene |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3_8965883 | A | 0.0508 | +++ | 0.692 | 0.127 | 7.01 | ++ | 0.703 | 0.127 | 7.18 | Low Mg | S | 1 | | RSY3 |
| 4_6353940 | T | 0.0578 | + | 0.538 | 0.123 | 4.79 | ++++ | 0.636 | 0.121 | 6.54 | High Mg | RGT | 2 | 5168 (p < 1e−16) | AT4G10200 |
| 5_20010406 | T | 0.0508 | ++ | 0.449 | 0.13 | 3.19 | +++ | 1.430 | 0.186 | 12.69 | High Mg | K | 3 | 434 (p < 1e−16) | AT5G49350 |

Each SNP had 32 models evaluated by Open GWAS algoriTHm (OATH) via naïve summary statistics. The models with the smallest (conservative) and the largest (powerful) −log10(p) were tabulated. SNP, single nucleotide polymorphism; Freq., frequency; S, sulfur; RGT, days to root germination; K, potassium.

^a “+” and “−” indicate inclusion and exclusion of the jth covariate.

Figure 6

Evaluation of the modeling for three OATH hits. The top four rows represent the estimates obtained from the 32 possible OATH models given five covariates for these three SNPs, in ascending order according to their −log10(p). In the top row, the corresponding OATH models are denoted by colored squares, indicating the inclusion or exclusion of covariates; there are two bars, gray and pink, in each cluster, representing the results with or without adjustment. An asterisk indicates that this SNP is significant under the corresponding model without adjustment; three asterisks indicate that this SNP is significant under the corresponding model, both with and without adjustment. OATH, Open GWAS algoriTHm; SNP, single nucleotide polymorphism.

In contrast, the largest range of the 32 −log10(p), from 3.19 to 12.69 (Table 2), was found for SNP 5_20010406 (chromosome 5, 20,010,406 bp, and MAF = 0.058) associated with K under the high-Mg condition. SNP 5_20010406 was located within AT5G49350, a gene encoding glycine-rich protein (Tabata ) (Table 2). Of its 32 −log10(p), 16 were > 6.5. We sorted its 32 −log10(p) and split them into different groups wherever two neighboring values differed by at least one unit. In this way, its 32 −log10(p) could be split into three groups (F-statistic = 434.06 and p-value < 1e−16). The four OATH models in the highest group included the first, second, and fourth eigenvectors (Figure 6). Its βs were increased in the highest group but the corresponding SEs decreased, resulting in a much higher −log10(p). In another example, SNP 4_6353940, associated with RGT under the high-Mg condition, had its 32 −log10(p) partitioned into two groups via inclusion or exclusion of the second eigenvector (Figure 6). SNP 4_6353940 had a MAF of 0.0507 and was located within AT4G10200, a gene related to TTF-type zinc finger proteins with a HAT dimerization domain (Mayer ) (Table 2). Inclusion or exclusion of the second eigenvector also resulted in two groups for the β. Among the 637 OATH hits, this SNP had the most significant difference between its groups, with an F-statistic of 5168.142 (p-value < 1e−16). An R script is available at https://github.com/gc5k/OATH for the demonstrated Arabidopsis analyses with OATH.
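The grouping step described above can be sketched in a few lines of Python. The sketch assumes that a new group starts wherever two neighboring sorted values differ by one unit or more, and that the reported F-statistic is a one-way analysis-of-variance F across the resulting groups; the text does not state the test explicitly. The function names and the eight input values are hypothetical and are not taken from the paper.

```python
def split_by_gap(values, min_gap=1.0):
    """Sort values and start a new group wherever neighbours differ by min_gap or more."""
    ordered = sorted(values)
    groups = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev >= min_gap:
            groups.append([])
        groups[-1].append(cur)
    return groups


def anova_f(groups):
    """One-way ANOVA F statistic: between-group over within-group mean squares."""
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = len(groups) - 1, n - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical -log10(p) values for one SNP across eight models (not from the paper).
neglog10_p = [3.2, 3.4, 3.3, 6.6, 6.8, 6.7, 12.5, 12.7]
groups = split_by_gap(neglog10_p)
f_stat = anova_f(groups)
print([len(g) for g in groups], round(f_stat, 1))
```

A large F indicates that the choice of covariates, rather than noise, separates the models into distinct significance levels. The F statistic is undefined when only one group is found, as for a SNP whose values stay within a narrow range.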

Application of OATH to GWAMA

Two Chinese GWAS datasets, the NA (Han ) and SLE (Han ) cohorts, were used to confirm the utility of OATH for meta-analyses. A total of 9124 common variants on chromosome 1 that were present in both cohorts were analyzed in NA (3191 samples) and SLE (2309 samples). For both cohorts, the SNPs were aligned on the same reference alleles. SNP rs4144542 was set as the causal locus explaining 5% of the total phenotypic variation. Three eigenvectors were used as covariates. In order to mimic a real consortium-driven GWAMA, one author (HFZ) generated NSS for these two GWAS cohorts; another author (GBC), who was blind to the individual-level data, ran OATH and the meta-analyses. After receiving the NSS, the central hub synthesized the seven corresponding sets of estimates for each cohort, one for each non-empty combination of the three eigenvectors; consequently, meta-analyses could be implemented for each locus. As demonstrated in Figure 7, rs4144542 was successfully identified in all seven GWAMA analyses. Other loci had very similar estimated effects under these seven models.
Figure 7

Genome-wide association meta-analyses (GWAMA) of the NA and SLE cohorts. The subtitle in each panel indicates a customized GWAMA; for example, a subtitle naming the first and third eigenvectors indicates that these two eigenvectors are covariates for each of the two cohorts. The dashed line indicates the chromosome-wise significance threshold.

An R script is available for this GWAMA demonstration at https://github.com/gc5k/OATH.
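Once the central hub has an effect estimate and SE for each cohort under a chosen covariate set, the per-locus combination step might look like the sketch below. It assumes a fixed-effect inverse-variance-weighted meta-analysis, a common choice that this text does not confirm; the procedure the authors actually used is described in File S1. The function name and the two per-cohort estimates are hypothetical.

```python
import math


def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis of one locus."""
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal p-value
    return beta, se, z, p


# Hypothetical per-cohort estimates for one SNP under one covariate set.
beta, se, z, p = inverse_variance_meta(betas=[0.30, 0.20], ses=[0.05, 0.10])
print(round(beta, 3), round(se, 4), round(z, 2))
```

Because each cohort's estimates can be regenerated from its NSS for any covariate set, the hub can rerun this step for every model without asking the cohorts for new results.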

Discussion

The scientific community is seeking reproducibility, and efforts have been made to improve reproducibility as well as transparency. Reproducibility may vary among studies; however, false discovery due to controversial or improper modeling can be monitored and even avoided, as demonstrated for the 295 Arabidopsis lines. Since the establishment of experimental design theory for field experiments (Fisher 1926), it has been known that a single outcome may be confounded by other factors, such as nutrition level. A high or low outcome makes little sense when it departs from its context, such as the conditions that led to the observed extreme values. In particular, as justification for the inclusion of covariates is controversial, variation in studies due to modeling makes reproducibility challenging (Aschard ). As GWAS results are often reported using a particular model, the interpretation of a GWAS hit should be reasonably scrutinized, as demonstrated in this study. We developed OATH and demonstrated its utility in GWAS of 295 Arabidopsis inbred lines. OATH successfully reproduced the GWAS results generated from a model with five covariates. In addition, underreported results, possibly generated by alternative models, were recovered. Given these comprehensive results, we could evaluate GWAS hits more thoroughly. As OATH is based on summary statistics, this implementation is compatible with GWAS data sharing policies, including those involving human subjects. For Arabidopsis, a typical admixed population, a linear mixed model technique provides an alternative solution (Korte ); however, the complicated statistical properties of linear mixed models (Chen 2014, 2016; de los Campos ) may be beyond the capabilities of OATH’s linear regression model. Given the many possible ways to utilize OATH, GWAMA would most likely benefit from OATH integration.
Using OATH, GWAMA would be more efficient at switching from one GWAS model to another whenever necessary, a procedure that often leads to logistical burden under a conventional GWAMA design. Many consortia that encourage open-source genetics have released GWAS summary statistics, such as GIANT and PGC. If those consortia would also release the naïve summary data required by OATH, efficiency and reproducibility can be dramatically boosted and the utility of the GWAS data maximized because the recovery of underreported GWAS discoveries becomes possible, as demonstrated in our study. In summary, in line with the open-source movement, we believe that reproducibility, transparency, and in-depth evaluation of GWAS are possible or can be improved using the proposed method. OATH as a solution is simple and easily embedded into other applications, and the information technology seems mature enough for implementation. To facilitate application of the proposed method, we deposited OATH in Genetic Analysis Repository (GEAR; https://github.com/gc5k/GEAR). Three “one-click-for-all” R scripts for the demonstrated examples are available at https://github.com/gc5k/OATH.

Supplementary Material

Supplemental material is available online at www.g3journal.org/lookup/suppl/doi:10.1534/g3.116.038877/-/DC1.
  14 in total

1.  Genomic control for association studies.

Authors:  B Devlin; K Roeder
Journal:  Biometrics       Date:  1999-12       Impact factor: 2.571

2.  Sequence and analysis of chromosome 4 of the plant Arabidopsis thaliana.

Authors:  K Mayer; C Schüller; R Wambutt; G Murphy; G Volckaert; T Pohl; A Düsterhöft; W Stiekema; K D Entian; N Terryn; B Harris; W Ansorge; P Brandt; L Grivell; M Rieger; M Weichselgartner; V de Simone; B Obermaier; R Mache; M Müller; M Kreis; M Delseny; P Puigdomenech; M Watson; T Schmidtheini; B Reichert; D Portatelle; M Perez-Alonso; M Boutry; I Bancroft; P Vos; J Hoheisel; W Zimmermann; H Wedler; P Ridley; S A Langham; B McCullagh; L Bilham; J Robben; J Van der Schueren; B Grymonprez; Y J Chuang; F Vandenbussche; M Braeken; I Weltjens; M Voet; I Bastiaens; R Aert; E Defoor; T Weitzenegger; G Bothe; U Ramsperger; H Hilbert; M Braun; E Holzer; A Brandt; S Peters; M van Staveren; W Dirske; P Mooijman; R Klein Lankhorst; M Rose; J Hauf; P Kötter; S Berneiser; S Hempel; M Feldpausch; S Lamberth; H Van den Daele; A De Keyser; C Buysshaert; J Gielen; R Villarroel; R De Clercq; M Van Montagu; J Rogers; A Cronin; M Quail; S Bray-Allen; L Clark; J Doggett; S Hall; M Kay; N Lennard; K McLay; R Mayes; A Pettett; M A Rajandream; M Lyne; V Benes; S Rechmann; D Borkova; H Blöcker; M Scharfe; M Grimm; T H Löhnert; S Dose; M de Haan; A Maarse; M Schäfer; S Müller-Auer; C Gabel; M Fuchs; B Fartmann; K Granderath; D Dauner; A Herzl; S Neumann; A Argiriou; D Vitale; R Liguori; E Piravandi; O Massenet; F Quigley; G Clabauld; A Mündlein; R Felber; S Schnabl; R Hiller; W Schmidt; A Lecharny; S Aubourg; F Chefdor; R Cooke; C Berger; A Montfort; E Casacuberta; T Gibbons; N Weber; M Vandenbol; M Bargues; J Terol; A Torres; A Perez-Perez; B Purnelle; E Bent; S Johnson; D Tacon; T Jesse; L Heijnen; S Schwarz; P Scholler; S Heber; P Francs; C Bielke; D Frishman; D Haase; K Lemcke; H W Mewes; S Stocker; P Zaccaria; M Bevan; R K Wilson; M de la Bastide; K Habermann; L Parnell; N Dedhia; L Gnoj; K Schutz; E Huang; L Spiegel; M Sehkon; J Murray; P Sheet; M Cordes; J Abu-Threideh; T Stoneking; J Kalicki; T Graves; G Harmon; J Edwards; P Latreille; L Courtney; J Cloud; A Abbott; K Scott; D Johnson; P Minx; D Bentley; B Fulton; N Miller; T Greco; K Kemp; J Kramer; L Fulton; E Mardis; M Dante; K Pepin; L Hillier; J Nelson; J Spieth; E Ryan; S Andrews; C Geisel; D Layman; H Du; J Ali; A Berghoff; K Jones; K Drone; M Cotton; C Joshu; B Antonoiu; M Zidanic; C Strong; H Sun; B Lamar; C Yordan; P Ma; J Zhong; R Preston; D Vil; M Shekher; A Matero; R Shah; I K Swaby; A O'Shaughnessy; M Rodriguez; J Hoffmann; S Till; S Granat; N Shohdy; A Hasegawa; A Hameed; M Lodhi; A Johnson; E Chen; M Marra; R Martienssen; W R McCombie
Journal:  Nature       Date:  1999-12-16       Impact factor: 49.962

3.  On the reconciliation of missing heritability for genome-wide association studies.

Authors:  Guo-Bo Chen
Journal:  Eur J Hum Genet       Date:  2016-07-20       Impact factor: 4.246

4.  Sequence and analysis of chromosome 5 of the plant Arabidopsis thaliana.

Authors:  S Tabata; T Kaneko; Y Nakamura; H Kotani; T Kato; E Asamizu; N Miyajima; S Sasamoto; T Kimura; T Hosouchi; K Kawashima; M Kohara; M Matsumoto; A Matsuno; A Muraki; S Nakayama; N Nakazaki; K Naruo; S Okumura; S Shinpo; C Takeuchi; T Wada; A Watanabe; M Yamada; M Yasuda; S Sato; M de la Bastide; E Huang; L Spiegel; L Gnoj; A O'Shaughnessy; R Preston; K Habermann; J Murray; D Johnson; T Rohlfing; J Nelson; T Stoneking; K Pepin; J Spieth; M Sekhon; J Armstrong; M Becker; E Belter; H Cordum; M Cordes; L Courtney; W Courtney; M Dante; H Du; J Edwards; J Fryman; B Haakensen; E Lamar; P Latreille; S Leonard; R Meyer; E Mulvaney; P Ozersky; A Riley; C Strowmatt; C Wagner-McPherson; A Wollam; M Yoakum; M Bell; N Dedhia; L Parnell; R Shah; M Rodriguez; L H See; D Vil; J Baker; K Kirchoff; K Toth; L King; A Bahret; B Miller; M Marra; R Martienssen; W R McCombie; R K Wilson; G Murphy; I Bancroft; G Volckaert; R Wambutt; A Düsterhöft; W Stiekema; T Pohl; K D Entian; N Terryn; N Hartley; E Bent; S Johnson; S A Langham; B McCullagh; J Robben; B Grymonprez; W Zimmermann; U Ramsperger; H Wedler; K Balke; E Wedler; S Peters; M van Staveren; W Dirkse; P Mooijman; R K Lankhorst; T Weitzenegger; G Bothe; M Rose; J Hauf; S Berneiser; S Hempel; M Feldpausch; S Lamberth; R Villarroel; J Gielen; W Ardiles; O Bents; K Lemcke; G Kolesov; K Mayer; S Rudd; H Schoof; C Schueller; P Zaccaria; H W Mewes; M Bevan; P Fransz
Journal:  Nature       Date:  2000-12-14       Impact factor: 49.962

5.  Genome-wide association study in a Chinese Han population identifies nine new susceptibility loci for systemic lupus erythematosus.

Authors:  Jian-Wen Han; Hou-Feng Zheng; Yong Cui; Liang-Dan Sun; Dong-Qing Ye; Zhi Hu; Jin-Hua Xu; Zhi-Ming Cai; Wei Huang; Guo-Ping Zhao; Hong-Fu Xie; Hong Fang; Qian-Jin Lu; Jian-Hua Xu; Xiang-Pei Li; Yun-Feng Pan; Dan-Qi Deng; Fan-Qin Zeng; Zhi-Zhong Ye; Xiao-Yan Zhang; Qing-Wen Wang; Fei Hao; Li Ma; Xian-Bo Zuo; Fu-Sheng Zhou; Wen-Hui Du; Yi-Lin Cheng; Jian-Qiang Yang; Song-Ke Shen; Jian Li; Yu-Jun Sheng; Xiao-Xia Zuo; Wei-Fang Zhu; Fei Gao; Pei-Lian Zhang; Qing Guo; Bo Li; Min Gao; Feng-Li Xiao; Cheng Quan; Chi Zhang; Zheng Zhang; Kun-Ju Zhu; Yang Li; Da-Yan Hu; Wen-Sheng Lu; Jian-Lin Huang; Sheng-Xiu Liu; Hui Li; Yun-Qing Ren; Zai-Xing Wang; Chun-Jun Yang; Pei-Guang Wang; Wen-Ming Zhou; Yong-Mei Lv; An-Ping Zhang; Sheng-Quan Zhang; Da Lin; Yi Li; Hui Qi Low; Min Shen; Zhi-Fang Zhai; Ying Wang; Feng-Yu Zhang; Sen Yang; Jian-Jun Liu; Xue-Jun Zhang
Journal:  Nat Genet       Date:  2009-10-18       Impact factor: 38.330

6.  RASPBERRY3 gene encodes a novel protein important for embryo development.

Authors:  Nestor R Apuya; Ramin Yadegari; Robert L Fischer; John J Harada; Robert B Goldberg; John H Harada
Journal:  Plant Physiol       Date:  2002-06       Impact factor: 8.340

7.  A Robust Example of Collider Bias in a Genetic Association Study.

Authors:  Felix R Day; Po-Ru Loh; Robert A Scott; Ken K Ong; John R B Perry
Journal:  Am J Hum Genet       Date:  2016-02-04       Impact factor: 11.025

8.  Genome-wide patterns of genetic variation in worldwide Arabidopsis thaliana accessions from the RegMap panel.

Authors:  Matthew W Horton; Angela M Hancock; Yu S Huang; Christopher Toomajian; Susanna Atwell; Adam Auton; N Wayan Muliyati; Alexander Platt; F Gianluca Sperone; Bjarni J Vilhjálmsson; Magnus Nordborg; Justin O Borevitz; Joy Bergelson
Journal:  Nat Genet       Date:  2012-01-08       Impact factor: 38.330

9.  Genomic heritability: what is it?

Authors:  Gustavo de Los Campos; Daniel Sorensen; Daniel Gianola
Journal:  PLoS Genet       Date:  2015-05-05       Impact factor: 5.917

10.  Across-cohort QC analyses of GWAS summary statistics from complex traits.

Authors:  Guo-Bo Chen; Sang Hong Lee; Matthew R Robinson; Maciej Trzaskowski; Zhi-Xiang Zhu; Thomas W Winkler; Felix R Day; Damien C Croteau-Chonka; Andrew R Wood; Adam E Locke; Zoltán Kutalik; Ruth J F Loos; Timothy M Frayling; Joel N Hirschhorn; Jian Yang; Naomi R Wray; Peter M Visscher
Journal:  Eur J Hum Genet       Date:  2016-08-24       Impact factor: 4.246

