
PBAT: a comprehensive software package for genome-wide association analysis of complex family-based studies.

Kristel Van Steen, Christoph Lange.

Abstract

The PBAT software package (v2.5) provides a unique set of tools for complex family-based association analysis at a genome-wide level. PBAT can handle nuclear families with missing parental genotypes, extended pedigrees with missing genotypic information, analysis of single nucleotide polymorphisms (SNPs), haplotype analysis, quantitative traits, multivariate/longitudinal data and time-to-onset phenotypes. The data analysis can be adjusted for covariates and gene/environment interactions. Haplotype-based features include sliding windows and the reconstruction of the haplotypes of the probands. PBAT's screening tools allow the user to handle the multiple comparisons problem successfully at a genome-wide level, even for 100,000 SNPs or more. Moreover, PBAT is computationally fast: a genome scan of 300,000 SNPs in 2,000 trios takes 4 central processing unit (CPU)-days. PBAT is available for Linux, Sun Solaris and Windows XP.


Year:  2005        PMID: 15814068      PMCID: PMC3525120          DOI: 10.1186/1479-7364-2-1-67

Source DB:  PubMed          Journal:  Hum Genomics        ISSN: 1473-9542            Impact factor:   4.639


Genetic association studies take advantage of the fact that we can measure genotypes directly via either protein electrophoretic or molecular genetic methods. The goal is to explain the variation in the disease trait of interest using an individual's genotype as a genetic marker. There are two basic types of study design that are used in genetic association analysis: standard (population-based, case-control or cohort) and family-based. Analytical methods appropriate for these two designs are quite different.

The family-based design is attractive for many reasons. For one, the design protects against a finding of spurious association due to population admixture or stratification. The reason for this robustness is that the analysis uses parental genotypes to determine the distribution of the test statistic. The analysis cannot be biased by admixture or stratification because the case and control alleles are drawn from the same subjects; therefore, they have the same genetic background.

The other key advantage of family-based studies is the way the multiple testing problem can be handled. Using the conditional mean model approach, [1-3] the data are first analysed in a 'screening step'. The analysis of the screening step does not bias the significance level of subsequently computed tests. In this screening step, the scientist can look at all possible associations between the markers and traits and select a subset of 'promising' marker-trait combinations, typically five [3]. Only the selected subset is then put forward to the hypothesis-testing step.

A general paradigm for testing the association between a response variable (disease trait) and a predictor (genotype as a marker) is a regression analysis, since this can accommodate all types of outcomes and all types of predictors.
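
The screening and testing steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not PBAT's implementation: the function `two_stage_test`, the screening scores and the p-values are all hypothetical, and the screening scores are assumed to be statistically independent of the family-based test p-values. The point it shows is that the multiple-testing correction is applied only to the few selected combinations.

```python
from typing import Dict, List, Tuple


def two_stage_test(
    screening_scores: Dict[str, float],
    test_p_values: Dict[str, float],
    top_k: int = 5,
    alpha: float = 0.05,
) -> List[Tuple[str, float, bool]]:
    """Select the top_k combinations by screening score, then test only those.

    screening_scores: hypothetical scores from the screening step (higher
        means more promising); assumed independent of the test p-values.
    test_p_values: hypothetical p-values from a family-based association test.
    """
    # Rank every marker-trait combination by its screening score.
    ranked = sorted(screening_scores, key=screening_scores.get, reverse=True)
    selected = ranked[:top_k]
    # Bonferroni correction over the selected subset only, not over all markers.
    threshold = alpha / len(selected) if selected else alpha
    return [
        (name, test_p_values[name], test_p_values[name] <= threshold)
        for name in selected
    ]


# Hypothetical example with four marker-trait combinations and top_k = 2.
scores = {"snp1:trait": 0.9, "snp2:trait": 0.2, "snp3:trait": 0.7, "snp4:trait": 0.1}
p_values = {"snp1:trait": 0.004, "snp2:trait": 0.0001, "snp3:trait": 0.03, "snp4:trait": 0.5}
for name, p, significant in two_stage_test(scores, p_values, top_k=2):
    print(name, p, significant)
```

In this toy input the combination with the smallest p-value (`snp2:trait`) is never tested because its screening score is low. That is the price of the approach, and the reason the quality of the screening step matters.
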
Although regression analysis has many advantages and is widely used in epidemiological investigations, it does require specifying a model for how the trait depends upon the genotype. If the model is incorrect, the power may be reduced. Depending upon study design and analysis, there may also be consequences for the validity. Cordell and Clayton [4] have described a unified approach to performing genetic association analysis with nuclear families (or case/control data) in a regression context. Case-parent trios are analysed via conditional logistic regression using the case and three pseudo-controls derived from the untransmitted parental alleles. The beauty of the method is that it can be performed using standard statistical software and that additional effects, such as parent-of-origin effects, can be included. The major drawback is that, to date, the technique has not been adapted to include extended pedigrees without splitting them up into nuclear families.

A large number of computer programs are available for family-based association tests, including AFBAC, [5] QTDT, [6] FBAT, [7-11] TRANSMIT [12] and PDT [13]. These software packages primarily focus on the computation of various test statistics, whereas the PBAT software package also offers pre- and post-analysis features. The PBAT software can be downloaded from http://www.biostat.harvard.edu/~clange/default.htm.

PBAT is an interactive software package that provides tools for the design and data analysis of family-based association studies. It is available for Windows XP, Linux and UNIX operating systems. The newest version of PBAT (v2.5) includes many features that were not available in earlier versions, [14] such as haplotype analysis tools that can be invoked in batch mode or through the user interface, more flexible specifications in power calculations and allowance for discrete trait distributions when applicable.
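
The pseudo-control construction mentioned above for case-parent trios can be illustrated with a short sketch. It assumes a single marker with known, unphased parental genotypes; the function name `case_and_pseudo_controls` and the allele labels are hypothetical, and the conditional logistic regression itself is not shown.

```python
from itertools import product
from typing import List, Tuple

Genotype = Tuple[str, str]


def case_and_pseudo_controls(
    father: Genotype, mother: Genotype, child: Genotype
) -> Tuple[Genotype, List[Genotype]]:
    """Return (case, pseudo_controls) for one case-parent trio at one marker."""

    def norm(genotype):
        # Genotypes are unordered, so sort the two alleles.
        return tuple(sorted(genotype))

    # Four equally likely Mendelian transmissions: one allele from each parent.
    possible = [norm((pa, ma)) for pa, ma in product(father, mother)]
    case = norm(child)
    if case not in possible:
        raise ValueError("child genotype is inconsistent with the parents")
    pseudo = list(possible)
    pseudo.remove(case)  # drop exactly one copy: the observed transmission
    return case, pseudo


# Both parents heterozygous, child heterozygous.
case, pseudo = case_and_pseudo_controls(("A", "a"), ("A", "a"), ("A", "a"))
print(case)    # ('A', 'a')
print(pseudo)  # [('A', 'A'), ('A', 'a'), ('a', 'a')]
```

Each trio then contributes one matched set of four genotypes (the case plus three pseudo-controls) to a conditional logistic regression fitted with standard software.
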
In particular, PBAT incorporates the features of the family-based tests of association (FBAT) package (http://www.biostat.harvard.edu/fbat/fbat.htm) but provides many additional options for designing association/linkage studies and analysing data with multiple continuous traits. Perhaps the most striking feature, which gives PBAT a unique advantage over most available software in the field, is its implementation of the screening techniques (that is, the conditional mean model approach [1,2]) that allow the user to handle the multiple comparison problem at a genome-wide level [3]. Further advantages of PBAT are the analytical power and sample size calculations for family-based association tests [15,16]. PBAT is especially well suited to analysing quantitative traits while accounting for important predictors.

The cornerstone of the package is the unified approach to FBAT, introduced by Rabinowitz and Laird [17] and Laird et al. [10]. FBAT builds on the original Transmission Disequilibrium Test (TDT) method, [18] in which alleles transmitted to affected offspring are compared with the expected distribution of alleles among offspring. It has been generalised so that tests of different genetic models, tests of different sampling designs, tests involving different disease phenotypes, tests with missing parents and tests of different null hypotheses are all in the same framework. In particular, the FBAT statistic is based on a linear combination of offspring genotypes and traits, $S = \sum_{i}\sum_{j} T_{ij} X_{ij}$, which is standardised using its expectation and its variance $V = \mathrm{Var}(S)$ under the null hypothesis. Here $T_{ij}$ represents the coded phenotype (ie the phenotype adjusted for any covariates) of the j-th offspring in family i, and $X_{ij}$ denotes that offspring's coded genotype at the locus being tested, which depends on the genetic model under consideration.
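
To make the statistic concrete, the sketch below computes a standardised FBAT-type statistic for the simplest possible setting: independent trios, a bi-allelic marker, both parents genotyped and additive genotype coding (X is the number of copies of the tested allele). The standardisation Z = (S - E[S]) / sqrt(V) is the form commonly given in the FBAT literature and is assumed here; the function name `fbat_z` and the input layout are hypothetical, and this is not PBAT code. With every trait value set to 1 (affected offspring only), Z squared reduces to the classical TDT statistic (b - c)^2 / (b + c), where b and c count transmissions and non-transmissions of the tested allele from heterozygous parents.

```python
from math import sqrt
from typing import Iterable, Tuple

# One trio: (father_count, mother_count, child_count, coded_trait), where each
# count is the number of copies (0, 1 or 2) of the allele being tested.
Trio = Tuple[int, int, int, float]


def fbat_z(trios: Iterable[Trio]) -> float:
    """Z = (S - E[S]) / sqrt(V), with S the sum of T * X over all trios."""
    s_minus_expectation = 0.0
    variance = 0.0
    for father, mother, child, trait in trios:
        # Mendelian expectation of the child's allele count given the parents.
        expected = (father + mother) / 2.0
        # Each heterozygous parent contributes variance 1/4; homozygous parents none.
        var_x = 0.25 * ((father == 1) + (mother == 1))
        s_minus_expectation += trait * (child - expected)
        variance += trait ** 2 * var_x
    if variance == 0.0:
        raise ValueError("no informative trios (no heterozygous parents)")
    return s_minus_expectation / sqrt(variance)


# Five affected-offspring trios: heterozygous parents transmit the tested
# allele 3 times (b = 3) and fail to transmit it 2 times (c = 2).
trios = [
    (1, 0, 1, 1.0),
    (1, 2, 2, 1.0),
    (1, 1, 0, 1.0),
    (1, 0, 1, 1.0),
    (2, 0, 1, 1.0),  # no heterozygous parent: uninformative
]
z = fbat_z(trios)
print(z ** 2)  # compare with the TDT value (3 - 2) ** 2 / (3 + 2)
```

Missing parents, multiple offspring per family and other genetic models need the more general conditioning on sufficient statistics that the text goes on to describe, which this sketch deliberately leaves out.
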
The expected distribution is derived using Mendel's law of segregation and conditioning on the sufficient statistics for any nuisance parameters under the null hypothesis, the null hypothesis being 'no linkage and no association' or 'no association, in the presence of linkage'.

PBAT provides methods for a wide range of situations that arise in family-based association studies using FBAT statistics. More specifically, there are two main components: tools for the planning of family-based association studies and data analysis tools. In terms of study planning, PBAT computes the power for study designs that consist of different family types with varying numbers of offspring, under different ascertainment conditions and allowing for missing parental genotypes. The data analysis tools available in PBAT provide options to test linkage or association in the presence of linkage, using (bi-allelic or multi-allelic) marker or haplotype data, single or multiple traits (eg measurements recorded repeatedly over time) that may be quantitative, qualitative or time-to-onset, with nuclear families as well as extended pedigrees. PBAT easily handles covariates and gene/covariate interactions in all computed FBAT statistics. Furthermore, PBAT can also be used for post-study power calculations and construction of the most powerful test statistic.

For situations in which multiple traits and markers are given, PBAT's screening tools reduce the large pool of traits and markers and select the most promising combinations in terms of the FBAT statistic. Using PBAT's screening tools the present authors have shown that genome-wide association studies using families are realisable in terms of data analysis [3]. The key concept of the implemented screening techniques is the conditional mean model approach, [1,2] for which the data space is partitioned into two independent testing sets.
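
One way to picture the partition just described is to split each offspring genotype into a between-family part, E[X | parents], and a within-family part, X - E[X | parents]. The sketch below rests on a stated assumption: that the screening step draws on the between-family part while the FBAT statistic draws on the within-family part, which is how the conditional mean model is usually presented but is not spelled out in this text. It checks by exact enumeration over Mendelian transmissions that the two parts are uncorrelated. The function names and the allele frequency are hypothetical, and zero covariance is a weaker property than the full independence argument in the cited papers.

```python
from fractions import Fraction
from itertools import product


def offspring_distribution(father: int, mother: int):
    """Mendelian distribution of the child's allele count given parental counts."""
    pf = Fraction(father, 2)  # probability that the father transmits the tested allele
    pm = Fraction(mother, 2)  # same for the mother
    return {
        0: (1 - pf) * (1 - pm),
        1: pf * (1 - pm) + (1 - pf) * pm,
        2: pf * pm,
    }


def between_within_covariance(mating_type_probs):
    """Cov(E[X|P], X - E[X|P]) over a distribution of parental mating types."""
    mean_between = mean_within = mean_product = Fraction(0)
    for (father, mother), p_mating in mating_type_probs.items():
        between = Fraction(father + mother, 2)  # E[X | parents]
        for child, p_child in offspring_distribution(father, mother).items():
            within = child - between  # X - E[X | parents]
            weight = p_mating * p_child
            mean_between += weight * between
            mean_within += weight * within
            mean_product += weight * between * within
    return mean_product - mean_between * mean_within


# Hypothetical population: allele frequency 0.3, random mating.
q = Fraction(3, 10)
genotype_probs = {0: (1 - q) ** 2, 1: 2 * q * (1 - q), 2: q ** 2}
mating = {
    (f, m): genotype_probs[f] * genotype_probs[m]
    for f, m in product(genotype_probs, repeat=2)
}
print(between_within_covariance(mating))  # prints 0
```

Because the within-family deviation has mean zero for every parental mating type, the covariance is zero whatever mating-type distribution is supplied, not only under random mating.
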
Partitioning the data in this way allows one to control the type I error rates and to overcome one of the most important statistical hurdles when analysing genome-wide association studies with thousands of markers: the multiple comparison problem. The screening technique maintains its protective character for extended datasets with a few hundred thousand SNPs. It should be noted that, in general, adding more SNPs comes at the cost of power loss when corrections for multiple testing need to be applied (eg Bonferroni-type corrections to control type I error). By contrast, PBAT's screening methods are hardly affected by adding 'non-causal' SNPs. In addition, they are robust against effects of population stratification and admixture, since the final decision in the screening process is based on FBATs, which guard against these confounding factors.

Finally, PBAT's screening tools are most successful in detecting common disease susceptibility loci. This is particularly attractive in the light of the HapMap project, [19] which aims to describe the common patterns of genetic variation in humans. The problem of detecting rare disease-associated SNPs remains; however, this is a general problem rather than a problem specifically related to the screening techniques of PBAT. Applying the authors' screening tools using the haplotype features of PBAT (eg using sliding windows acknowledging the linkage disequilibrium structures present in the data) may be more beneficial in this setting. This is work in progress.

TRANSMIT [12] is another program for transmission disequilibrium testing that uses marker haplotypes based on several closely linked markers. By contrast with PBAT, however, TRANSMIT leads to elevated false-positive rates in the presence of population admixture and does not handle quantitative traits [20]. Moreover, it has no built-in functions for performing screening on a genome-wide level.

PBAT's data analysis tools have been extensively validated.
The validated components include the tools for univariate and multivariate traits, [21] multivariate/longitudinal FBAT models, [22] time-to-onset traits (Su; personal communication), haplotype analysis (Randolph; personal communication) and genomic screening [3].

PBAT is under constant development. Future developments include refined screening tools and guidelines that apply to haplotype-based genomic screening, power calculations for haplotype analysis, and further work towards a PBAT compendium of commands and extensive documentation for its users.
  19 in total

1.  A general test of association for quantitative traits in nuclear families.

Authors:  G R Abecasis; L R Cardon; W O Cookson
Journal:  Am J Hum Genet       Date:  2000-01       Impact factor: 11.025

2.  A unified stepwise regression procedure for evaluating the relative effects of polymorphisms within a gene using case/control or family data: application to HLA in type 1 diabetes.

Authors:  Heather J Cordell; David G Clayton
Journal:  Am J Hum Genet       Date:  2001-11-21       Impact factor: 11.025

3.  Family-based tests for associating haplotypes with general phenotype data: application to asthma genetics.

Authors:  Steve Horvath; Xin Xu; Stephen L Lake; Edwin K Silverman; Scott T Weiss; Nan M Laird
Journal:  Genet Epidemiol       Date:  2004-01       Impact factor: 2.135

4.  Power calculations for a general class of family-based association tests: dichotomous traits.

Authors:  Christoph Lange; Nan M Laird
Journal:  Am J Hum Genet       Date:  2002-08-12       Impact factor: 11.025

5.  PBAT: tools for family-based association studies.

Authors:  Christoph Lange; Dawn DeMeo; Edwin K Silverman; Scott T Weiss; Nan M Laird
Journal:  Am J Hum Genet       Date:  2004-02       Impact factor: 11.025

6.  Using the noninformative families in family-based association tests: a powerful new testing strategy.

Authors:  Christoph Lange; Dawn DeMeo; Edwin K Silverman; Scott T Weiss; Nan M Laird
Journal:  Am J Hum Genet       Date:  2003-09-18       Impact factor: 11.025

7.  A new powerful non-parametric two-stage approach for testing multiple phenotypes in family-based association studies.

Authors:  Christoph Lange; Helen Lyon; Dawn DeMeo; Benjamin Raby; Edwin K Silverman; Scott T Weiss
Journal:  Hum Hered       Date:  2003       Impact factor: 0.444

8.  The family based association test method: strategies for studying general genotype--phenotype associations.

Authors:  S Horvath; X Xu; N M Laird
Journal:  Eur J Hum Genet       Date:  2001-04       Impact factor: 4.246

9.  A discordant-sibship test for disequilibrium and linkage: no need for parental data.

Authors:  S Horvath; N M Laird
Journal:  Am J Hum Genet       Date:  1998-12       Impact factor: 11.025

10.  Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM).

Authors:  R S Spielman; R E McGinnis; W J Ewens
Journal:  Am J Hum Genet       Date:  1993-03       Impact factor: 11.025

