
cgmisc: enhanced genome-wide association analyses and visualization.

Marcin Kierczak1, Jagoda Jabłońska2, Simon K G Forsberg3, Matteo Bianchi2, Katarina Tengvall2, Mats Pettersson1, Veronika Scholz2, Jennifer R S Meadows2, Patric Jern2, Örjan Carlborg3, Kerstin Lindblad-Toh4.   

Abstract

High-throughput genotyping and sequencing technologies facilitate studies of complex genetic traits and provide new research opportunities. The increasing popularity of genome-wide association studies (GWAS) has led to the discovery of new associated loci. It has also improved our understanding of the genetic architecture underlying not only diseases but also other monogenic and complex phenotypes. Several software packages are available for performing GWAS, and the R environment is one of them.
RESULTS: We present cgmisc, an R package for enhanced analysis and visualization of GWAS results. The package contains several utilities and modules that complement and extend the functionality of existing software. It also provides tools for advanced visualization of genomic data and uses the R language to help prepare publication-quality figures. Some of the package functions are specific to data from the domestic dog (Canis familiaris).
AVAILABILITY AND IMPLEMENTATION: The package is operating system-independent and is available from: https://github.com/cgmisc-team/cgmisc
CONTACT: marcin.kierczak@imbim.uu.se
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author 2015. Published by Oxford University Press.

Year:  2015        PMID: 26249815      PMCID: PMC4653382          DOI: 10.1093/bioinformatics/btv426

Journal:  Bioinformatics        ISSN: 1367-4803


1 Introduction

High-throughput genotyping and sequencing have opened new opportunities to study complex genetic traits. Genome-wide association studies (GWAS) are a popular way to analyse genotyping data from segregating populations. Widely used GWAS software includes PLINK (Purcell et al.), EMMAX (Kang et al.), GCTA (Yang et al.) and GenABEL (Aulchenko et al.). A single software package is rarely complete enough to cover all aspects of a typical genome-wide analysis pipeline, and transferring data between different programs is often laborious. GenABEL is often the software of choice because, in addition to GWAS-specific functionality, it gives access to the R language (R Development Core Team, 2008) and its community-contributed packages. Here, we developed a number of algorithms and solutions for several common GWAS tasks. Some of these solutions aim to make it easier to produce publication-quality visualizations of data and results (see, e.g. Fig. 1). Several cgmisc functions have already been used to produce results and visualizations for peer-reviewed publications (e.g. Tengvall et al., Owczarek-Lipska et al. and Olsson et al.). Here, we present these and further functions as the documented and supported R package cgmisc.
Fig. 1.

An example figure generated using the cgmisc package: p-values from Fisher’s exact test on allele counts highlight the most divergent regions between two populations. The colour of each point corresponds to its LD (r²) with the most significant marker (the reference).
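
The two per-marker quantities behind Fig. 1 can be illustrated with the minimal sketch below. It is written in Python rather than R and is not taken from cgmisc. It assumes that allele counts are already tabulated per marker as a 2×2 table, with populations as rows and reference/alternative allele counts as columns. It also assumes that LD is measured as the squared Pearson correlation of 0/1/2 allele dosages, a common genotype-based approximation of r². The function names `fisher_exact_two_sided` and `dosage_r2` are hypothetical, and cgmisc's own implementation may differ.

```python
from math import comb
from typing import List


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are populations, columns are reference/alternative allele counts.
    With all margins fixed, the count in cell a is hypergeometric; the
    p-value sums the probabilities of all tables no more likely than the
    observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # probability that population 1 carries x of the col1 reference alleles
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # small relative tolerance so tables tied with the observed one are included
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


def dosage_r2(g1: List[int], g2: List[int]) -> float:
    """Squared Pearson correlation of two allele-dosage vectors (0/1/2)."""
    n = len(g1)
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((x - m1) * (y - m2) for x, y in zip(g1, g2))
    v1 = sum((x - m1) ** 2 for x in g1)
    v2 = sum((y - m2) ** 2 for y in g2)
    if v1 == 0 or v2 == 0:
        return float("nan")  # monomorphic marker: r2 is undefined
    return cov * cov / (v1 * v2)


# 8 reference / 2 alternative alleles in population A vs 1 / 5 in population B
print(fisher_exact_two_sided(8, 2, 1, 5))           # ~0.035
print(dosage_r2([0, 1, 2, 2, 0], [2, 1, 0, 0, 2]))  # perfectly anti-correlated -> 1.0
```

In a plot like Fig. 1, the Fisher p-value would be computed for every marker, and `dosage_r2` would be evaluated between each marker and the top marker to choose its colour.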

2 Description

cgmisc (ver. 2.9.10) provides 34 functions for the analysis and visualization of GWAS data. A few functions in the package are tailored to data from the domestic dog (Canis familiaris) but are easy to adapt to other species. Some functions rely on third-party software that is freely available for research purposes. Internally, cgmisc functions use the data structures implemented in the GenABEL package. All functions and parameters in the package follow the period-separated naming convention (Bååth, 2012). The functions provided by cgmisc can be grouped into the following categories:

Analyses of population structure. Population strata can be compared based on their allele-frequency differences, using either (i) the fixation index FST or (ii) Fisher’s exact test comparing the observed reference-allele count with the count expected under the null hypothesis of no population structure.

Tools related to association scans. An enhanced quantile–quantile (qq) plot shows (i) theoretical and (ii) empirical confidence intervals as well as (iii) empirical significance thresholds. We also implemented an extended version of the Manhattan plot. It adds colour-coded information on linkage disequilibrium (LD) between a selected marker and its neighbours, plus a minor-allele frequency panel. The package provides easy interfaces to the variance GWAS (vGWAS; Shen et al.) and bigRR (BLUP, ridge regression; Shen et al.) packages, as well as simple visualization of the per-genotype distribution of phenotypic values. We also complement the standard tests for association with a basic scan for gene–gene interaction (epistasis).

Heterozygosity analyses. We provide functions for detecting and visualizing runs of homozygosity along the genome. These help detect suggestive selective sweeps and highlight regions that may be challenging for standard association mapping tools.

Analyses and visualization of linkage structure.
The cgmisc package provides tools for assessing average haplotype lengths by visualizing LD decay as a function of the distance between markers. The package also offers export functions that enable haplotype phasing using PHASE (Stephens et al.) and haplotype visualization using Haploview (Barrett et al.). In addition, we implemented the marker-clumping procedure used by PLINK.

Improved annotation. The package provides functions for genome annotation in the domestic dog (Canis familiaris, canFam3.1 assembly). It offers direct interaction with the UCSC Genome Browser (Kuhn et al.) and improved analyses of the pseudo-autosomal regions on the X chromosome. We also provide a convenient method for retrieving and plotting information on endogenous retroviral sequences identified by the RetroTector software (Sperber et al.).

Data subsetting, manipulation and visualization. cgmisc can generate windows for sliding-window (optionally overlapping) and jumping-window analyses. A number of convenience functions enable users to, e.g. retrieve information about LD or chromosome start and end coordinates.

All functions were designed to be user-friendly, with attention to the quality of the visualizations. Complete documentation is available once cgmisc is installed. To facilitate package usage, we included a quick tutorial (the package vignette, in the supplementary information) that takes the user through all the steps needed to use each package function. The tutorial is based on the included example dataset. The vignette and documentation describe the methods and algorithms used by the functions in detail.
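
The per-marker FST comparison of two population strata can be sketched as follows. This minimal Python illustration assumes two populations and 0/1/2 reference-allele dosages, with `None` for missing calls. It uses the simple unweighted estimator FST = (HT - HS) / HT, computed from expected heterozygosities. The function names are hypothetical, and cgmisc may use a different estimator, for example one corrected for sample size.

```python
from typing import List, Optional, Sequence


def ref_allele_freq(dosages: Sequence[Optional[int]]) -> float:
    """Reference-allele frequency from 0/1/2 dosages, ignoring missing (None) calls."""
    called = [g for g in dosages if g is not None]
    if not called:
        return float("nan")  # no genotype calls at this marker
    return sum(called) / (2 * len(called))


def fst_two_pops(p1: float, p2: float) -> float:
    """Unweighted FST = (HT - HS) / HT for one biallelic marker in two populations."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of the pooled population
    if h_t == 0:
        return 0.0  # monomorphic in both strata: no differentiation
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
    return (h_t - h_s) / h_t


def per_marker_fst(pop_a: List[List[Optional[int]]],
                   pop_b: List[List[Optional[int]]]) -> List[float]:
    """pop_a / pop_b: one list of individual dosages per marker."""
    return [fst_two_pops(ref_allele_freq(a), ref_allele_freq(b))
            for a, b in zip(pop_a, pop_b)]


# Marker 1 is fixed for different alleles; marker 2 has identical frequencies
print(per_marker_fst([[2, 2, 2], [0, 1, 2]], [[0, 0, None], [1, 1, 1]]))  # [1.0, 0.0]
```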
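
A theoretical confidence band for a qq plot can be derived from order statistics. Under the null hypothesis, the i-th smallest of n independent p-values follows a Beta(i, n - i + 1) distribution. For integer parameters, its CDF equals the binomial tail P(Binomial(n, x) ≥ i). The Python sketch below uses that identity, together with bisection, to obtain pointwise bounds on the -log10 scale. The function names are hypothetical, and this is not necessarily how cgmisc computes its band. Its cost grows roughly quadratically with n, so it is meant for illustration only. It also does not cover the empirical intervals or significance thresholds.

```python
import math
from typing import List, Tuple


def beta_cdf_int(n: int, i: int, x: float) -> float:
    """CDF of Beta(i, n - i + 1) at x, computed as P(Binomial(n, x) >= i)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_x, log_1mx = math.log(x), math.log1p(-x)
    log_nfact = math.lgamma(n + 1)
    total = 0.0
    for k in range(i, n + 1):
        # log of C(n, k) x^k (1 - x)^(n - k), summed in linear space
        log_term = (log_nfact - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                    + k * log_x + (n - k) * log_1mx)
        total += math.exp(log_term)
    return min(total, 1.0)


def beta_quantile_int(n: int, i: int, q: float, iters: int = 60) -> float:
    """Invert beta_cdf_int by bisection (the CDF is increasing in x)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if beta_cdf_int(n, i, mid) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def qq_band(n: int, level: float = 0.95) -> List[Tuple[float, float, float]]:
    """(expected, lower, upper) -log10(p) for ranks i = 1..n (smallest p first)."""
    alpha = 1.0 - level
    rows = []
    for i in range(1, n + 1):
        expected = -math.log10(i / (n + 1))  # mean of Beta(i, n-i+1) is i/(n+1)
        p_low = beta_quantile_int(n, i, alpha / 2)  # small p -> upper bound on -log10 scale
        p_high = beta_quantile_int(n, i, 1 - alpha / 2)
        rows.append((expected, -math.log10(p_high), -math.log10(p_low)))
    return rows


for expected, lower, upper in qq_band(5):
    print(f"{expected:.2f}  [{lower:.2f}, {upper:.2f}]")
```

Observed -log10(p) values, sorted in decreasing order, would be plotted against `expected`, with the band drawn between `lower` and `upper`.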
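
A naive form of run-of-homozygosity detection scans one individual's calls along a chromosome. The Python sketch below assumes sorted marker positions, 0/1/2 dosages with `None` for missing calls, and two hypothetical thresholds, `min_snps` and `min_bp`. It allows no heterozygous call inside a run, unlike more tolerant window-based scanners, and it is not necessarily the method cgmisc uses.

```python
from typing import List, Optional, Sequence, Tuple


def runs_of_homozygosity(
    positions: Sequence[int],
    genotypes: Sequence[Optional[int]],
    min_snps: int = 3,
    min_bp: int = 0,
) -> List[Tuple[int, int, int]]:
    """Return (start_bp, end_bp, n_markers) for runs of consecutive homozygous calls.

    positions: sorted marker coordinates on one chromosome.
    genotypes: one individual's 0/1/2 dosages; a heterozygous call (1) ends a run,
    a missing call (None) is skipped without ending it.
    """
    def qualifies(start, end, count):
        # count > 0 is checked first so start/end are never None here
        return count > 0 and count >= min_snps and end - start >= min_bp

    runs = []
    start = end = None
    count = 0
    for pos, g in zip(positions, genotypes):
        if g is None:
            continue
        if g == 1:
            if qualifies(start, end, count):
                runs.append((start, end, count))
            start, end, count = None, None, 0
        else:
            if count == 0:
                start = pos
            end, count = pos, count + 1
    if qualifies(start, end, count):
        runs.append((start, end, count))
    return runs


print(runs_of_homozygosity([100, 200, 300, 400, 500, 600, 700],
                           [0, 2, 2, 1, 0, None, 0]))  # [(100, 300, 3)]
```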
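
The sketch below shows greedy LD-based clumping in the spirit of the PLINK procedure. Markers are visited from most to least significant. Each not-yet-clumped marker with p ≤ `p1` becomes an index marker. It then absorbs every not-yet-clumped marker within `kb` kilobases that has p ≤ `p2` and r² ≥ `r2_min` with it. This Python sketch assumes a single chromosome and a caller-supplied `r2(i, j)` function. The parameter names echo PLINK's clumping options, but the default values here are illustrative. Details such as tie handling may differ from both PLINK and cgmisc.

```python
from typing import Callable, Dict, List, Sequence


def clump(
    positions: Sequence[int],
    pvalues: Sequence[float],
    r2: Callable[[int, int], float],
    p1: float = 1e-4,
    p2: float = 1e-2,
    r2_min: float = 0.5,
    kb: float = 250,
) -> Dict[int, List[int]]:
    """Greedy LD-based clumping on one chromosome; returns {index_marker: members}."""
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])  # most significant first
    clumped = set()
    clumps: Dict[int, List[int]] = {}
    for idx in order:
        if pvalues[idx] > p1:
            break  # remaining markers are too weak to be index markers
        if idx in clumped:
            continue  # already absorbed by a stronger index marker
        members = [
            j for j in range(len(pvalues))
            if j != idx and j not in clumped
            and abs(positions[j] - positions[idx]) <= kb * 1000
            and pvalues[j] <= p2
            and r2(idx, j) >= r2_min
        ]
        clumped.add(idx)
        clumped.update(members)
        clumps[idx] = members
    return clumps


# Hypothetical, symmetric table of pairwise r2 values
R2 = {frozenset((0, 1)): 0.8, frozenset((0, 2)): 0.2, frozenset((1, 2)): 0.9}
result = clump([1_000, 2_000, 3_000, 400_000], [1e-6, 1e-3, 1e-5, 1e-5],
               lambda i, j: R2.get(frozenset((i, j)), 0.0))
print(result)  # {0: [1], 2: [], 3: []}
```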
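
Window generation for sliding-window and jumping-window analyses can be sketched in base-pair coordinates. This Python example assumes that windows are defined by a size and a step. A step equal to the size gives jumping windows, and a smaller step gives overlapping sliding windows. It also assumes that marker positions are sorted. The function names are hypothetical, and cgmisc may instead define windows by marker counts.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence, Tuple


def make_windows(chrom_start: int, chrom_end: int, size: int,
                 step: Optional[int] = None) -> List[Tuple[int, int]]:
    """Half-open [start, end) windows; step == size gives jumping windows,
    step < size gives overlapping sliding windows."""
    step = size if step is None else step
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    windows = []
    start = chrom_start
    while start < chrom_end:
        windows.append((start, min(start + size, chrom_end)))
        if start + size >= chrom_end:
            break  # last window already reaches the chromosome end
        start += step
    return windows


def markers_per_window(positions: Sequence[int],
                       windows: Sequence[Tuple[int, int]]) -> List[int]:
    """Count markers (sorted bp positions) falling into each [start, end) window."""
    return [bisect_left(positions, end) - bisect_left(positions, start)
            for start, end in windows]


print(make_windows(0, 100, 40))      # [(0, 40), (40, 80), (80, 100)]
print(make_windows(0, 100, 40, 20))  # [(0, 40), (20, 60), (40, 80), (60, 100)]
```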

Funding

M.K. was supported by the Swedish Foundation for Strategic Research. J.R.S.M., M.K. and Ö.C. received support from FORMAS. M.K., S.F. and Ö.C. were supported by the Swedish Research Council. K.L.-T. and M.K. were supported by the European Research Council. J.J. was supported by the European Commission, Erasmus mobility grant. Conflict of Interest: none declared.
References

1.  A new statistical method for haplotype reconstruction from population data.

Authors:  M Stephens; N J Smith; P Donnelly
Journal:  Am J Hum Genet       Date:  2001-03-09

2.  Haploview: analysis and visualization of LD and haplotype maps.

Authors:  J C Barrett; B Fry; J Maller; M J Daly
Journal:  Bioinformatics       Date:  2004-08-05

3.  GCTA: a tool for genome-wide complex trait analysis.

Authors:  Jian Yang; S Hong Lee; Michael E Goddard; Peter M Visscher
Journal:  Am J Hum Genet       Date:  2010-12-17

4.  GenABEL: an R library for genome-wide association analysis.

Authors:  Yurii S Aulchenko; Stephan Ripke; Aaron Isaacs; Cornelia M van Duijn
Journal:  Bioinformatics       Date:  2007-03-23

5.  A novel generalized ridge regression method for quantitative genetics.

Authors:  Xia Shen; Moudud Alam; Freddy Fikse; Lars Rönnegård
Journal:  Genetics       Date:  2013-01-18

6.  Inheritance beyond plain heritability: variance-controlling genes in Arabidopsis thaliana.

Authors:  Xia Shen; Mats Pettersson; Lars Rönnegård; Örjan Carlborg
Journal:  PLoS Genet       Date:  2012-08-02

7.  Two loci on chromosome 5 are associated with serum IgE levels in Labrador retrievers.

Authors:  Marta Owczarek-Lipska; Béatrice Lauber; Vivianne Molitor; Sabrina Meury; Marcin Kierczak; Katarina Tengvall; Matthew T Webster; Vidhya Jagannathan; Yvette Schlotter; Ton Willemse; Anke Hendricks; Kerstin Bergvall; Ake Hedhammar; Göran Andersson; Kerstin Lindblad-Toh; Claude Favrot; Petra Roosje; Eliane Marti; Tosso Leeb
Journal:  PLoS One       Date:  2012-06-15

8.  Genome-wide analysis in German shepherd dogs reveals association of a locus on CFA 27 with atopic dermatitis.

Authors:  Katarina Tengvall; Marcin Kierczak; Kerstin Bergvall; Mia Olsson; Marcel Frankowiack; Fabiana H G Farias; Gerli Pielberg; Örjan Carlborg; Tosso Leeb; Göran Andersson; Lennart Hammarström; Åke Hedhammar; Kerstin Lindblad-Toh
Journal:  PLoS Genet       Date:  2013-05-09

9.  The UCSC genome browser and associated tools.

Authors:  Robert M Kuhn; David Haussler; W James Kent
Journal:  Brief Bioinform       Date:  2012-08-20

10.  Thorough investigation of a canine autoinflammatory disease (AID) confirms one main risk locus and suggests a modifier locus for amyloidosis.

Authors:  Mia Olsson; Linda Tintle; Marcin Kierczak; Michele Perloski; Noriko Tonomura; Andrew Lundquist; Eva Murén; Max Fels; Katarina Tengvall; Gerli Pielberg; Caroline Dufaure de Citres; Laetitia Dorso; Jérôme Abadie; Jeanette Hanson; Anne Thomas; Peter Leegwater; Åke Hedhammar; Kerstin Lindblad-Toh; Jennifer R S Meadows
Journal:  PLoS One       Date:  2013-10-09