
Trinculo: Bayesian and frequentist multinomial logistic regression for genome-wide association studies of multi-category phenotypes.

Luke Jostins, Gilean McVean

Abstract

MOTIVATION: For many classes of disease, the same genetic risk variants underlie many related phenotypes or disease subtypes. Multinomial logistic regression provides an attractive framework for analyzing multi-category phenotypes and exploring the genetic relationships between these phenotype categories. We introduce Trinculo, a program that implements a wide range of multinomial analyses in a single fast package designed to be easy to use for users of standard genome-wide association study software.
AVAILABILITY AND IMPLEMENTATION: An open source C implementation, with code and binaries for Linux and Mac OS X, is available for download at http://sourceforge.net/projects/trinculo
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
CONTACT: lj4@well.ox.ac.uk.
© The Author 2016. Published by Oxford University Press.


Year:  2016        PMID: 26873930      PMCID: PMC4908321          DOI: 10.1093/bioinformatics/btw075

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

Of the many associations discovered by genome-wide association studies, a large number are shared across multiple traits (Parkes et al., 2013), and many more predict different subtypes of the same disease (Taylor et al., 2011). A significant challenge for researchers is finding statistical techniques that leverage this genetic sharing to increase power and discover new biology. Multinomial logistic regression provides a powerful and flexible framework for carrying out association analyses across multiple traits. The frequentist version has been used in studies of disease subphenotypes (Morris et al., 2010) and in cross-disorder association studies (Smoller et al., 2013). Bayesian extensions have been used to select between different models of genetic sharing across multiple traits (Rockett et al., 2014). However, these studies have fitted the models in an ad hoc manner, using inefficient R or Stata packages. The lack of a single, flexible tool for multinomial logistic regression has made these methods difficult for most users to apply, especially compared with the availability of fast, user-friendly tools for binary logistic regression such as PLINK (Purcell et al., 2007). To address this, we provide a software package that implements a wide range of multi-category logistic analyses in a single efficient and user-friendly program.

2 Functionality

2.1 User interface

Trinculo uses a command-line interface that is designed to be familiar to users of standard human genetics tools. It uses a PLINK-style format to enter commands and specify input and output files, and reads data in standard formats, including binary PLINK and dosage formats for genotypes and standard text formats for phenotypes and covariates. Sample IDs are automatically matched across different input files, so the user can combine multiple sources of information. Documentation and detailed examples are included with the software.
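A minimal sketch of what automatic sample matching can look like, assuming that "matching" means keeping only the samples present in every input file and ordering them as in the first file. The helpers `match_samples` and `align` are hypothetical, and Trinculo's own handling of duplicated or missing IDs may differ. Because PLINK identifies a sample by family and individual ID, the keys are shown as `(FID, IID)` tuples.

```python
def match_samples(*id_lists):
    """Return the IDs present in every input list, in the order of the first list."""
    common = set(id_lists[0]).intersection(*id_lists[1:])
    return [sample for sample in id_lists[0] if sample in common]


def align(ids, values, keep):
    """Reorder one file's per-sample values to follow the matched ID order `keep`."""
    position = {sample: i for i, sample in enumerate(ids)}
    return [values[position[sample]] for sample in keep]


# Hypothetical genotype (.fam), phenotype and covariate files listing samples differently.
fam_ids = [("F1", "A"), ("F2", "B"), ("F3", "C"), ("F4", "D")]
pheno_ids = [("F3", "C"), ("F1", "A"), ("F4", "D")]
pheno_vals = ["UC", "CD", "control"]
covar_ids = [("F4", "D"), ("F1", "A"), ("F3", "C"), ("F5", "E")]

keep = match_samples(fam_ids, pheno_ids, covar_ids)
print(keep)                                # [('F1', 'A'), ('F3', 'C'), ('F4', 'D')]
print(align(pheno_ids, pheno_vals, keep))  # ['CD', 'UC', 'control']
```

Ordering by the first (genotype) file avoids reshuffling the largest input; only the smaller phenotype and covariate tables are reindexed.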

2.2 Use modes

Trinculo can carry out a wide range of common multi-category analyses, including:

2.2.1 Frequentist multinomial logistic regression

Calculates a combined (omnibus) P-value of association for each variant across all categories using a likelihood ratio test.
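The omnibus test can be sketched in Python with NumPy, assuming a baseline-category model in which controls are category 0 and each case category has its own per-allele log-odds ratio. The SNP adds one parameter per case category, so the likelihood-ratio statistic is compared with a chi-square distribution on K − 1 degrees of freedom for K categories. The fit uses plain Newton's method (the fitting approach named in Section 2.3) without any of the second-derivative optimizations the authors mention; all function names are hypothetical and this is not Trinculo's code.

```python
import math
import numpy as np


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df (closed form)."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return math.exp(-h) * total
    total = math.erfc(math.sqrt(h))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-h) * h ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def _ll_grad_hess(beta, X, Y):
    """Log-likelihood, gradient and Hessian of a baseline-category multinomial logit."""
    k, p = beta.shape
    eta = X @ beta.T                                     # (n, k) non-baseline logits
    m = np.maximum(eta.max(axis=1, keepdims=True), 0.0)  # overflow guard (baseline logit is 0)
    expo = np.exp(eta - m)
    denom = np.exp(-m) + expo.sum(axis=1, keepdims=True)
    P = expo / denom                                     # fitted non-baseline probabilities
    ll = float(np.sum(Y * eta) - np.sum(m + np.log(denom)))
    grad = ((Y - P).T @ X).ravel()                       # d ll / d beta, flattened row-wise
    H = np.zeros((k * p, k * p))
    for j in range(k):
        for l in range(k):
            w = P[:, j] * (float(j == l) - P[:, l])
            H[j * p:(j + 1) * p, l * p:(l + 1) * p] = -(X * w[:, None]).T @ X
    return ll, grad, H


def fit_multinomial(X, y, n_cat, max_iter=100, tol=1e-8):
    """Maximum likelihood by Newton's method; category 0 (e.g. controls) is the baseline."""
    n, p = X.shape
    k = n_cat - 1
    Y = np.zeros((n, k))
    cases = np.nonzero(y > 0)[0]
    Y[cases, y[cases] - 1] = 1.0                         # one-hot labels, baseline dropped
    beta = np.zeros((k, p))
    for _ in range(max_iter):
        _, grad, H = _ll_grad_hess(beta, X, Y)
        step = np.linalg.solve(H, grad).reshape(k, p)    # Newton update: beta - H^{-1} grad
        beta = beta - step
        if np.max(np.abs(step)) < tol:
            break
    return _ll_grad_hess(beta, X, Y)[0], beta


def omnibus_test(g, y, n_cat, covariates=None):
    """Likelihood-ratio test of one SNP across all case categories jointly (df = n_cat - 1)."""
    n = len(y)
    X0 = np.ones((n, 1)) if covariates is None else np.column_stack([np.ones(n), covariates])
    X1 = np.column_stack([X0, g])
    ll0, _ = fit_multinomial(X0, y, n_cat)
    ll1, _ = fit_multinomial(X1, y, n_cat)
    stat = 2.0 * (ll1 - ll0)
    return stat, chi2_sf(stat, n_cat - 1)


# Toy data: controls (0) plus two case categories; the SNP affects category 1 only.
rng = np.random.default_rng(0)
n = 4000
g = rng.binomial(2, 0.3, size=n).astype(float)
logits = np.column_stack([np.zeros(n), -0.5 + 0.6 * g, np.full(n, -0.5)])
prob = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
y = np.minimum((rng.random(n)[:, None] > prob.cumsum(axis=1)).sum(axis=1), 2)
stat, p_value = omnibus_test(g, y, 3)
print(f"LRT statistic {stat:.1f}, omnibus P = {p_value:.2e}")
```

Principal components or other covariates enter through the `covariates` argument, which appears in both the null and the full model so that only the SNP effects are tested.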

2.2.2 Bayesian multinomial logistic regression

Calculates a single Bayes factor for each variant that summarizes the evidence of association across all categories. Users can specify their own prior covariance on effect sizes, or use an independent-effects prior (the default) or an empirical prior estimated across all variants.
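One common way to turn a prior covariance into a Bayes factor is a multivariate version of Wakefield's approximate Bayes factor. If the estimated per-category effects β̂ are approximately N(β, V) and the prior is β ~ N(0, W), the Bayes factor for association against no association is N(β̂; 0, V + W) / N(β̂; 0, V). The sketch below assumes this Gaussian approximation, which may differ from how Trinculo computes its marginal likelihoods; the function name, the prior standard deviation of 0.2 and the input numbers are hypothetical. An independent-effects prior corresponds to a diagonal W.

```python
import numpy as np


def log_abf(beta_hat, V, W):
    """Natural-log approximate Bayes factor for H1: beta ~ N(0, W) versus H0: beta = 0.

    beta_hat: estimated per-category log-odds ratios; V: their sampling covariance.
    """
    b = np.asarray(beta_hat, dtype=float)
    V = np.asarray(V, dtype=float)
    S = V + np.asarray(W, dtype=float)          # marginal covariance of beta_hat under H1
    _, logdet_V = np.linalg.slogdet(V)
    _, logdet_S = np.linalg.slogdet(S)
    quad = b @ (np.linalg.solve(V, b) - np.linalg.solve(S, b))
    return 0.5 * (logdet_V - logdet_S + quad)


# Two case categories sharing one control group, so the estimates are correlated.
beta_hat = np.array([0.15, 0.02])
V = np.array([[0.0025, 0.0005],
              [0.0005, 0.0030]])
W_indep = 0.2 ** 2 * np.eye(2)                  # independent-effects prior, sd 0.2 (assumed)
print(f"log BF = {log_abf(beta_hat, V, W_indep):.2f}")
```

With one category this reduces to Wakefield's single-SNP formula, and with diagonal V and W the log Bayes factor is the sum of the per-category values.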

2.2.3 Bayesian model selection

Generates a marginal likelihood for each possible sharing model, where a sharing model specifies which categories the variant is and is not associated with. The module can calculate Bayes factors for or against a variant being shared across categories or uniquely associated with one (see the example use in Section 3), or posterior probabilities of particular sharing models (if provided with prior probabilities on models).
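A sketch of sharing-model comparison under the same Gaussian approximation to the marginal likelihood (an assumption; Trinculo's own calculation may differ). Each sharing model is encoded by zeroing the prior covariance W for the categories the variant is not associated with; comparing "category 0 only" with "both categories" gives a phenotype-specificity Bayes factor of the kind used in Section 3. Function names, effect estimates and model priors are hypothetical.

```python
import itertools
import numpy as np


def log_abf(beta_hat, V, W):
    """Gaussian approximate log Bayes factor of beta ~ N(0, W) against beta = 0."""
    b = np.asarray(beta_hat, dtype=float)
    S = V + W
    quad = b @ (np.linalg.solve(V, b) - np.linalg.solve(S, b))
    return 0.5 * (np.linalg.slogdet(V)[1] - np.linalg.slogdet(S)[1] + quad)


def sharing_model_log_bfs(beta_hat, V, W):
    """Log BF (vs. no association) for every sharing model, i.e. every non-empty
    subset of case categories that the variant is allowed to affect."""
    k = len(beta_hat)
    out = {}
    for r in range(1, k + 1):
        for model in itertools.combinations(range(k), r):
            mask = np.zeros(k)
            mask[list(model)] = 1.0
            # Zero prior variance (and covariance) outside the model's categories.
            out[model] = log_abf(beta_hat, V, W * np.outer(mask, mask))
    return out


def model_posteriors(log_bfs, prior):
    """Posterior over models; the null model () has log BF 0 by definition."""
    logs = {m: np.log(prior[m]) + (0.0 if m == () else log_bfs[m]) for m in prior}
    top = max(logs.values())
    weights = {m: np.exp(v - top) for m, v in logs.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}


beta_hat = np.array([0.20, 0.01])                 # strong in category 0, ~0 in category 1
V = np.array([[0.0020, 0.0004], [0.0004, 0.0025]])
W = 0.04 * np.array([[1.0, 0.7], [0.7, 1.0]])    # correlated prior (assumed values)
lbf = sharing_model_log_bfs(beta_hat, V, W)
print("log BF, category-0-specific vs shared:", lbf[(0,)] - lbf[(0, 1)])
prior = {(): 0.5, (0,): 0.2, (1,): 0.2, (0, 1): 0.1}
print(model_posteriors(lbf, prior))
```

The number of sharing models grows as 2^K − 1, which is manageable for the handful of categories typical of subphenotype studies.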

2.2.4 Multi-category simulation

Efficiently simulates genotypes from a multinomial model under ascertainment for given sample sizes and allele frequency. This allows the user to undertake power calculations for the above analyses.

All of these modes can include principal components (to control for population stratification) or other covariates, and can include other SNPs as covariates to test for independent effects or carry out stepwise regression. More details on these use modes, and technical details on their implementation, can be found in Supplementary Materials.
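For the simulation mode, a minimal sketch of ascertained sampling, assuming an additive per-allele effect and control genotypes in Hardy-Weinberg proportions at a given control allele frequency. Under a multinomial logit, P(G = g | category k) / P(G = g | controls) is proportional to exp(β_k g), so each case category's genotype distribution is the control distribution reweighted by that factor. This need not match Trinculo's simulator; the function names and parameter values are hypothetical.

```python
import numpy as np


def genotype_probs(freq, log_or):
    """P(G = 0, 1, 2 | category) given the control allele frequency and a per-allele log-OR.

    Controls (log_or = 0) follow Hardy-Weinberg proportions; a case category's
    distribution is the control distribution reweighted by exp(log_or * g).
    """
    hwe = np.array([(1 - freq) ** 2, 2 * freq * (1 - freq), freq ** 2])
    w = hwe * np.exp(log_or * np.arange(3))
    return w / w.sum()


def simulate_ascertained(sample_sizes, freq, log_ors, rng):
    """Draw genotypes for fixed per-category sample sizes (category 0 = controls)."""
    genotypes, labels = [], []
    for k, (n_k, b_k) in enumerate(zip(sample_sizes, log_ors)):
        genotypes.append(rng.choice(3, size=n_k, p=genotype_probs(freq, b_k)))
        labels.append(np.full(n_k, k))
    return np.concatenate(genotypes), np.concatenate(labels)


rng = np.random.default_rng(42)
# 2000 controls and 2000 cases in each of two categories; the SNP affects category 1 only.
g, y = simulate_ascertained([2000, 2000, 2000], freq=0.3, log_ors=[0.0, 0.5, 0.0], rng=rng)
for k in range(3):
    print(k, g[y == k].mean() / 2)   # observed risk-allele frequency per category
```

Power for a given design would then be estimated as the fraction of simulated replicates in which an association test, such as the likelihood ratio test described above, reaches the chosen significance threshold.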

2.3 Implementation and speed

Trinculo is written in C and is supported on Linux and Mac OS X. Models are fitted using Newton's method, which, after optimization of the second-derivative calculation, we find to be much faster than the BFGS method used by other implementations. Like other genetic association software, Trinculo lends itself readily to parallelization, either by splitting the data into chunks and running each chunk on a separate core, or through the software's built-in multithreading. Trinculo can carry out an omnibus frequentist multinomial association scan for a reasonably sized genome-wide association study (1 M SNPs; 4000 cases spread evenly across two categories, plus 2000 controls; five principal components) on a laptop (1.7 GHz Intel Core i7) in under 10 h. A very large study (100 000 samples across five categories, with 50 000 controls) would take 16 h on 24 cores. The fastest R implementation, nnet (Venables and Ripley, 2002), would take 48 h and 5.8 days for these two analyses, respectively, and the Python implementation statsmodels (Seabold and Perktold, 2010) would take 24 h and 31 h, respectively.
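The chunking strategy can be sketched as splitting the SNP index range into contiguous, near-equal pieces, one per core or cluster job. The helpers below are hypothetical, and the wall-time estimate assumes every SNP costs the same and ignores start-up and merging overhead; the per-SNP cost in the example is a hypothetical input, not a measured Trinculo figure.

```python
import math


def chunk_ranges(n_snps, n_chunks):
    """Split SNP indices [0, n_snps) into n_chunks contiguous, near-equal half-open ranges."""
    base, extra = divmod(n_snps, n_chunks)
    ranges, start = [], 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)   # the first `extra` chunks take one more SNP
        ranges.append((start, start + size))
        start += size
    return ranges


def ideal_wall_hours(n_snps, n_cores, seconds_per_snp):
    """Wall time if each core processes one chunk and all SNPs cost the same."""
    largest = math.ceil(n_snps / n_cores)
    return largest * seconds_per_snp / 3600.0


# Each (start, end) range would be handed to a separate process or cluster job.
for start, end in chunk_ranges(1_000_000, 4):
    print(f"job for SNPs {start}..{end - 1}")
# Hypothetical cost of 0.036 s per SNP per core.
print(f"{ideal_wall_hours(1_000_000, 4, 0.036):.1f} h on 4 cores (ideal)")
```

Contiguous ranges keep each job's reads sequential within the genotype file, and the per-chunk results can simply be concatenated afterwards.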

3 Example use: analysis of inflammatory bowel disease data

We applied the Bayesian model selection mode to data from 193 inflammatory bowel disease (IBD) risk variants (Jostins et al., 2012). The data came from two IBD phenotypes, Crohn's disease (CD) and ulcerative colitis (UC), with 17 379 CD cases, 13 458 UC cases and 22 442 controls. We used an empirical prior, which estimated the correlation in effect size between the two diseases as ρ = 0.739. The disease-specific Bayes factor for each variant (i.e. the ratio of the marginal likelihood for a model in which the variant is associated with only one phenotype to that for a model in which it is associated with both) is shown in Figure 1, with the variants showing the strongest evidence of phenotype specificity highlighted.
Fig. 1.

Phenotype specificity Bayes factors for the 193 IBD risk variants. Dots to the left and right of the vertical line show stronger evidence of CD and UC specificity, respectively. Colors show classification by P-value (a single-disease frequentist association test using binomial logistic regression); dashed lines mark low-certainty assignments (1/4 < BF < 4). BFs are capped at 200 and 1/200 for visibility.
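The paper does not describe how the empirical prior is estimated. One simple empirical-Bayes estimator, assumed here purely for illustration (it may differ from Trinculo's, and it is not what produced the ρ = 0.739 quoted above), is a method of moments: if true effects β ~ N(0, W) and estimates β̂ | β ~ N(β, V_i), then E[β̂β̂ᵀ] = W + E[V_i], so W can be estimated by subtracting the average sampling covariance from the average outer product of the estimates. The function name and all simulated values are hypothetical.

```python
import numpy as np


def empirical_prior(beta_hats, Vs):
    """Moment estimate of the prior covariance W, and its correlation, for k = 2 phenotypes.

    beta_hats: (n_variants, 2) effect estimates; Vs: (n_variants, 2, 2) sampling covariances.
    """
    B = np.asarray(beta_hats, dtype=float)
    W = B.T @ B / len(B) - np.mean(Vs, axis=0)      # E[bb^T] minus mean sampling noise
    rho = W[0, 1] / np.sqrt(W[0, 0] * W[1, 1])      # prior effect-size correlation
    return W, rho


# Simulated check with a known prior correlation of 0.74 (assumed value).
rng = np.random.default_rng(7)
W_true = 0.04 * np.array([[1.0, 0.74], [0.74, 1.0]])
V = 0.0004 * np.eye(2)                               # same sampling covariance for all variants
n_var = 5000
beta = rng.multivariate_normal(np.zeros(2), W_true, size=n_var)
beta_hat = beta + rng.multivariate_normal(np.zeros(2), V, size=n_var)
W_est, rho_est = empirical_prior(beta_hat, np.repeat(V[None], n_var, axis=0))
print(np.round(W_est, 4), round(rho_est, 3))
```

Subtracting the sampling noise matters: the raw correlation of the estimates is pulled towards zero whenever V is not negligible relative to W.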

4 Discussion

Trinculo is a fast, flexible and easy-to-use tool for multi-category genetic association studies. By providing a wide range of use options, it allows users to tailor their analysis to their data and experimental design. For instance, if a user wishes to carry out model selection at a risk variant while accounting for the effect of a second risk variant in linkage disequilibrium, Trinculo's conditional regression option handles this automatically. Other use cases not discussed here, such as multinomial fine-mapping or ordinal logistic regression, are also supported by the software. We hope that these features will allow researchers to use multinomial logistic regression to answer their own biological questions as easily as they currently use binary logistic regression.
References

1.  PLINK: a tool set for whole-genome association and population-based linkage analyses.

Authors:  Shaun Purcell; Benjamin Neale; Kathe Todd-Brown; Lori Thomas; Manuel A R Ferreira; David Bender; Julian Maller; Pamela Sklar; Paul I W de Bakker; Mark J Daly; Pak C Sham
Journal: Am J Hum Genet    Date: 2007-07-25

2.  Genetic insights into common pathways and complex relationships among immune-mediated diseases.

Authors:  Miles Parkes; Adrian Cortes; David A van Heel; Matthew A Brown
Journal: Nat Rev Genet    Date: 2013-08-06

3.  Reappraisal of known malaria resistance loci in a large multicenter study.

Journal: Nat Genet    Date: 2014-09-28

4.  Identification of risk loci with shared effects on five major psychiatric disorders: a genome-wide analysis.

Journal: Lancet    Date: 2013-02-28

5.  Risk alleles for systemic lupus erythematosus in a large case-control collection and associations with clinical subphenotypes.

Authors:  Kimberly E Taylor; Sharon A Chung; Robert R Graham; Ward A Ortmann; Annette T Lee; Carl D Langefeld; Chaim O Jacob; M Ilyas Kamboh; Marta E Alarcón-Riquelme; Betty P Tsao; Kathy L Moser; Patrick M Gaffney; John B Harley; Michelle Petri; Susan Manzi; Peter K Gregersen; Timothy W Behrens; Lindsey A Criswell
Journal: PLoS Genet    Date: 2011-02-17

6.  Host-microbe interactions have shaped the genetic architecture of inflammatory bowel disease.

Authors:  Luke Jostins; Stephan Ripke; Rinse K Weersma; Richard H Duerr; Dermot P McGovern; Ken Y Hui; James C Lee; L Philip Schumm; Yashoda Sharma; Carl A Anderson; Jonah Essers; Mitja Mitrovic; Kaida Ning; Isabelle Cleynen; Emilie Theatre; Sarah L Spain; Soumya Raychaudhuri; Philippe Goyette; Zhi Wei; Clara Abraham; Jean-Paul Achkar; Tariq Ahmad; Leila Amininejad; Ashwin N Ananthakrishnan; Vibeke Andersen; Jane M Andrews; Leonard Baidoo; Tobias Balschun; Peter A Bampton; Alain Bitton; Gabrielle Boucher; Stephan Brand; Carsten Büning; Ariella Cohain; Sven Cichon; Mauro D'Amato; Dirk De Jong; Kathy L Devaney; Marla Dubinsky; Cathryn Edwards; David Ellinghaus; Lynnette R Ferguson; Denis Franchimont; Karin Fransen; Richard Gearry; Michel Georges; Christian Gieger; Jürgen Glas; Talin Haritunians; Ailsa Hart; Chris Hawkey; Matija Hedl; Xinli Hu; Tom H Karlsen; Limas Kupcinskas; Subra Kugathasan; Anna Latiano; Debby Laukens; Ian C Lawrance; Charlie W Lees; Edouard Louis; Gillian Mahy; John Mansfield; Angharad R Morgan; Craig Mowat; William Newman; Orazio Palmieri; Cyriel Y Ponsioen; Uros Potocnik; Natalie J Prescott; Miguel Regueiro; Jerome I Rotter; Richard K Russell; Jeremy D Sanderson; Miquel Sans; Jack Satsangi; Stefan Schreiber; Lisa A Simms; Jurgita Sventoraityte; Stephan R Targan; Kent D Taylor; Mark Tremelling; Hein W Verspaget; Martine De Vos; Cisca Wijmenga; David C Wilson; Juliane Winkelmann; Ramnik J Xavier; Sebastian Zeissig; Bin Zhang; Clarence K Zhang; Hongyu Zhao; Mark S Silverberg; Vito Annese; Hakon Hakonarson; Steven R Brant; Graham Radford-Smith; Christopher G Mathew; John D Rioux; Eric E Schadt; Mark J Daly; Andre Franke; Miles Parkes; Severine Vermeire; Jeffrey C Barrett; Judy H Cho
Journal: Nature    Date: 2012-11-01

7.  A powerful approach to sub-phenotype analysis in population-based genetic association studies.

Authors:  Andrew P Morris; Cecilia M Lindgren; Eleftheria Zeggini; Nicholas J Timpson; Timothy M Frayling; Andrew T Hattersley; Mark I McCarthy
Journal: Genet Epidemiol    Date: 2010-05
