Pipit: visualizing functional impacts of structural variations.

Ryo Sakai, Matthieu Moisse, Joke Reumers, Jan Aerts.

Abstract

SUMMARY: Pipit is a gene-centric interactive visualization tool designed to study structural genomic variations. By focusing on individual genes as the functional unit, researchers can study and generate hypotheses about the biological impact of different structural variations, for instance, the deletion of dosage-sensitive genes or the formation of fusion genes. Pipit is a cross-platform Java application that visualizes structural variation data from Genome Variation Format files. AVAILABILITY: Executables, source code, sample data, documentation and a screencast are available at https://bitbucket.org/biovizleuven/pipit.

Year:  2013        PMID: 23803468      PMCID: PMC3740631          DOI: 10.1093/bioinformatics/btt367

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

Structural variation is defined as a change in genomic DNA greater than 1 kb in size and can be either balanced or unbalanced (Stankiewicz and Lupski, 2010). A structural variation may be benign, it may influence phenotypes, it may predispose to or cause disease, and it may be transmitted to subsequent generations. In addition, when breakpoints disrupt gene structures, it may result in the formation of new transcripts through gene fusion or exon skipping (Feuk et al., 2006). Understanding structural changes in the genome, as well as their functional impact, is critical for studying phenotypic variation and genetic disease in humans and model organisms (Korbel et al., 2007). When structural variations are studied, their structure is usually visualized by encoding breakpoints on a linear or circular layout (Herbig et al., 2012; Krzywinski et al., 2009). Other visual encodings, such as dot plots and graph representations, show the changes to the reference genome introduced by the structural rearrangement (Nielsen and Wong, 2012). These visual encodings pose two challenges for the analysis of structural variation data. First, they focus on the size and position of a rearrangement rather than on its consequence. Second, they introduce an implicit correlation between size and effect, whereas small structural variations may have more severe effects than some large ones: a deletion of a few kilobases that removes an essential exon can disrupt a gene, while a much larger deletion in a gene-poor region may have no detectable effect. An effective visual encoding of structural and copy-number variations based on functional units is therefore essential for gaining insight into their impact on human health and disease. We introduce an exploratory visualization tool, named Pipit, which uses a novel gene-centric visual encoding to examine how individual gene structures are affected by structural variants. Pipit takes a Genome Variation Format file as input and encodes each affected gene structure as a disk, showing how it is modified and how it is oriented relative to other genes.
By means of this data abstraction, the focus of structural variant analysis shifts from genomic distance to the biologically relevant features, i.e. the underlying genes. This encoding enables swift visual inspection of the genes affected by structural variants, which can then be examined further if deemed interesting. Because it is neither constrained nor biased by variant size, Pipit permits in-depth analysis and exploration of structural variation data.
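The gene-centric abstraction amounts to asking which genes a variant overlaps, independently of how long the variant is. The following is a minimal Python sketch of that idea, not Pipit's own code (Pipit is written in Processing/Java). It assumes 1-based, fully closed coordinates as in GFF-style files, and the names `Interval`, `affected_genes`, `GENE_A` and `GENE_B` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A genomic interval in 1-based, fully closed coordinates (GFF-style)."""
    chrom: str
    start: int
    end: int

    def overlaps(self, other: Interval) -> bool:
        # Closed intervals on the same chromosome overlap unless one ends before the other starts.
        return self.chrom == other.chrom and self.start <= other.end and other.start <= self.end


def affected_genes(variant: Interval, genes: dict[str, Interval]) -> list[str]:
    """Return the names of genes overlapped by a variant, ordered by gene start.

    Only overlap matters: the variant's length plays no role in the result.
    """
    hits = [(g.start, name) for name, g in genes.items() if variant.overlaps(g)]
    return [name for _, name in sorted(hits)]


if __name__ == "__main__":
    # Hypothetical toy annotation, not real gene coordinates.
    genes = {
        "GENE_A": Interval("chr6", 10_000, 30_000),
        "GENE_B": Interval("chr6", 40_000, 60_000),
    }
    small = Interval("chr6", 12_000, 14_000)    # 2 kb, inside GENE_A
    large = Interval("chr6", 100_000, 600_000)  # 500 kb, gene-free in this toy table
    print(affected_genes(small, genes))  # ['GENE_A']
    print(affected_genes(large, genes))  # []
```

In this toy example the 2 kb variant is reported because it hits a gene, while the 500 kb variant is not, which is the size-independence the encoding aims for. A real tool would use an interval index rather than a linear scan over all genes.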

2 FEATURES

Pipit is an interactive visualization tool developed in Processing, an open-source programming language and integrated development environment based on Java. Executables are available for Linux, Mac OS X and Windows. The input file is in Genome Variation Format, a well-standardized format for genomic structural variation data extended from the Generic Feature Format (Reese et al., 2010); both the European Bioinformatics Institute and the National Center for Biotechnology Information (NCBI) curate, archive and make data publicly available in this format via DGVa (http://www.ebi.ac.uk/dgva) and dbVar (http://www.ncbi.nlm.nih.gov/dbvar), respectively (Lappalainen et al., 2013). Pipit also uses gene track, cytoband and Gene Ontology (GO) information obtained from the UCSC Table Browser (Karolchik et al., 2004). The current version supports human data (NCBI builds 36 and 37) and mouse data (NCBI build 37/mm9 and GRCm build 38/mm10), and users can load data for other model organisms. In addition, users can load a comma-separated values file containing Ensembl gene IDs together with ordinal or categorical information, such as haploinsufficiency scores (Huang et al., 2010) or known oncogenes (Fig. 1 and Supplementary Material).
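Because Genome Variation Format (GVF) extends the Generic Feature Format (GFF3), each data line has nine tab-separated columns, with `key=value` attributes separated by semicolons in the last column. The Python sketch below reads such lines and also loads a gene-category CSV. It is illustrative only: `GvfRecord`, `parse_gvf_line` and `load_gene_categories` are hypothetical names, and the CSV is assumed to be headerless with the Ensembl gene ID in the first column and the category in the second. Pipit's actual loaders may differ.

```python
from __future__ import annotations

import csv
from collections.abc import Iterable
from dataclasses import dataclass, field
from urllib.parse import unquote


@dataclass
class GvfRecord:
    seqid: str         # column 1: chromosome / sequence ID
    source: str        # column 2: data source
    variant_type: str  # column 3: variant type, e.g. a Sequence Ontology term
    start: int         # column 4: 1-based start
    end: int           # column 5: 1-based, inclusive end
    attributes: dict[str, str] = field(default_factory=dict)  # column 9


def parse_gvf_line(line: str) -> GvfRecord | None:
    """Parse one GVF data line; return None for pragma (##), comment (#) and blank lines."""
    line = line.rstrip("\r\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    attrs: dict[str, str] = {}
    for pair in cols[8].split(";"):
        if pair:
            key, _, value = pair.partition("=")
            # GFF3-style attributes percent-encode reserved characters such as ';' and '='.
            attrs[unquote(key)] = unquote(value)
    return GvfRecord(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]), attrs)


def load_gene_categories(rows: Iterable[str]) -> dict[str, str]:
    """Map Ensembl gene ID -> category from headerless two-column CSV lines (assumed layout)."""
    return {rec[0].strip(): rec[1].strip() for rec in csv.reader(rows) if len(rec) >= 2}
```

Both functions accept lines rather than paths, so a file can be passed directly, e.g. `with open(path, newline="") as fh: categories = load_gene_categories(fh)`.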
Fig. 1.

Pipit visualizing the structural variation data of the mouse genome (estd118). (A) The deletion of the Met gene on chromosome 6 is selected. (B) The associated genomic information of the region is shown in the bottom panel. (C) Categories of known mouse oncogenes are listed, and the Oncogenes category is selected. GO terms associated with the affected genes are listed below, in the right panel.

Each affected gene is represented as a disk, filled according to which part of its structure is affected by a structural variation (Fig. 1; see Supplementary Material). Structural variant types are taken from the data file and colour-coded as shown in the right panel. Unaffected genes are compressed into a line connecting the affected genes. The promoter length (the region upstream of the gene sequence) can be set when loading the data.

There are four layouts for exploring the structural variation data. The default view is the collapsed ordered gene view (Fig. 1), in which a coloured disk may represent either a single affected gene or several consecutively ordered genes affected by the same type of structural variation. In the expanded view, all affected genes are visualized individually. The chromosome position view shows affected variants mapped to their genomic positions. Lastly, the unit plot view groups affected genes by the type of structural variant event, such as deletion or tandem duplication (see Supplementary Material).

When a disk is selected (Fig. 1A), the underlying genes and structural variation events are shown in the bottom panel, along with the chromosome and its cytobands, and the transcripts with their exonic regions coloured dark grey (Fig. 1B). The gene name shown in this panel links to the Ensembl browser and displays the corresponding genomic region. In the right panel (Fig. 1C), the coloured square box for each structural variant type serves as a button to hide or show that type of variant. The text field below searches for a specific gene among the affected genes. GO terms associated with the affected genes are listed, and, conversely, selecting a GO term highlights the associated genes in the main view. A screenshot can be saved as a PDF by pressing the ‘p’ key.
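The disk fill and the collapsed ordered gene view rest on two operations: deciding which part of a gene a variant touches, and merging runs of consecutive genes hit by the same variant type. The Python sketch below illustrates both under stated assumptions; it is not Pipit's Processing/Java implementation. The category labels (`whole_gene`, `promoter`, `exon`, `intron`), the `Gene` layout and the 2 kb default promoter length are hypothetical, since Pipit's actual fill categories are described in its Supplementary Material.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int                          # 1-based, inclusive
    end: int                            # 1-based, inclusive
    strand: str                         # "+" or "-"
    exons: tuple[tuple[int, int], ...]  # (start, end) pairs within start..end


def _overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    return a_start <= b_end and b_start <= a_end


def affected_parts(gene: Gene, chrom: str, sv_start: int, sv_end: int,
                   promoter_len: int = 2000) -> set[str]:
    """Label which parts of a gene a variant touches (hypothetical categories)."""
    if chrom != gene.chrom:
        return set()
    if sv_start <= gene.start and sv_end >= gene.end:
        return {"whole_gene"}
    parts: set[str] = set()
    # The promoter lies upstream of the gene, so its side depends on the strand.
    if gene.strand == "+":
        promoter = (max(1, gene.start - promoter_len), gene.start - 1)
    else:
        promoter = (gene.end + 1, gene.end + promoter_len)
    if promoter_len > 0 and _overlap(sv_start, sv_end, *promoter):
        parts.add("promoter")
    exons = sorted(gene.exons)
    if any(_overlap(sv_start, sv_end, s, e) for s, e in exons):
        parts.add("exon")
    # Introns are the non-empty gaps between consecutive exons.
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        lo, hi = prev_end + 1, next_start - 1
        if lo <= hi and _overlap(sv_start, sv_end, lo, hi):
            parts.add("intron")
            break
    return parts


def collapse_runs(ordered: list[tuple[str, str | None]]) -> list[tuple[str, list[str]]]:
    """Merge consecutive genes hit by the same variant type into one disk.

    `ordered` lists (gene name, variant type or None) in genomic order; None marks an
    unaffected gene, which breaks a run and produces no disk.
    """
    disks = []
    for sv_type, run in groupby(ordered, key=lambda item: item[1]):
        names = [name for name, _ in run]
        if sv_type is not None:
            disks.append((sv_type, names))
    return disks
```

With `itertools.groupby`, only adjacent items with equal keys are grouped, which matches the "consecutively ordered" condition: two deletion runs separated by an unaffected gene stay as two disks.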

3 DISCUSSION

Pipit introduces a novel visualization paradigm and user interaction method for examining structural variants based on the affected gene regions. It facilitates the study of structural variants from a gene-centric perspective, for instance to investigate how known dosage-sensitive genes are affected or whether gene fusions are formed. Future work includes extending the functional unit to encompass important regulatory elements, such as those characterized by the ENCODE project (Birney et al., 2007). Additionally, functions to compare multiple samples across data formats, such as the Variant Call Format (Danecek et al., 2011), and options to incorporate conventional linear or circular representations would be essential for a more comprehensive study of structural variants.

Funding: iMinds [SBO 2012], University of Leuven Research Council [SymBioSys PFV/10/016, GOA/10/009] and European Union Framework Programme 7 [HEALTH-F2-2008-223040 ‘CHeartED’].

Conflict of Interest: none declared.
REFERENCES

1.  The UCSC Table Browser data retrieval tool.

Authors:  Donna Karolchik; Angela S Hinrichs; Terrence S Furey; Krishna M Roskin; Charles W Sugnet; David Haussler; W James Kent
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

2.  Structural variation in the human genome.

Authors:  Lars Feuk; Andrew R Carson; Stephen W Scherer
Journal:  Nat Rev Genet       Date:  2006-02       Impact factor: 53.242

3.  Structural variation in the human genome and its role in disease.

Authors:  Paweł Stankiewicz; James R Lupski
Journal:  Annu Rev Med       Date:  2010       Impact factor: 13.739

4.  Circos: an information aesthetic for comparative genomics.

Authors:  Martin Krzywinski; Jacqueline Schein; Inanç Birol; Joseph Connors; Randy Gascoyne; Doug Horsman; Steven J Jones; Marco A Marra
Journal:  Genome Res       Date:  2009-06-18       Impact factor: 9.043

5.  Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project.

Authors:  Ewan Birney; John A Stamatoyannopoulos; Anindya Dutta; Roderic Guigó; Thomas R Gingeras; Elliott H Margulies; Zhiping Weng; Michael Snyder; Emmanouil T Dermitzakis; Robert E Thurman; Michael S Kuehn; Christopher M Taylor; Shane Neph; Christoph M Koch; Saurabh Asthana; Ankit Malhotra; Ivan Adzhubei; Jason A Greenbaum; Robert M Andrews; Paul Flicek; Patrick J Boyle; Hua Cao; Nigel P Carter; Gayle K Clelland; Sean Davis; Nathan Day; Pawandeep Dhami; Shane C Dillon; Michael O Dorschner; Heike Fiegler; Paul G Giresi; Jeff Goldy; Michael Hawrylycz; Andrew Haydock; Richard Humbert; Keith D James; Brett E Johnson; Ericka M Johnson; Tristan T Frum; Elizabeth R Rosenzweig; Neerja Karnani; Kirsten Lee; Gregory C Lefebvre; Patrick A Navas; Fidencio Neri; Stephen C J Parker; Peter J Sabo; Richard Sandstrom; Anthony Shafer; David Vetrie; Molly Weaver; Sarah Wilcox; Man Yu; Francis S Collins; Job Dekker; Jason D Lieb; Thomas D Tullius; Gregory E Crawford; Shamil Sunyaev; William S Noble; Ian Dunham; France Denoeud; Alexandre Reymond; Philipp Kapranov; Joel Rozowsky; Deyou Zheng; Robert Castelo; Adam Frankish; Jennifer Harrow; Srinka Ghosh; Albin Sandelin; Ivo L Hofacker; Robert Baertsch; Damian Keefe; Sujit Dike; Jill Cheng; Heather A Hirsch; Edward A Sekinger; Julien Lagarde; Josep F Abril; Atif Shahab; Christoph Flamm; Claudia Fried; Jörg Hackermüller; Jana Hertel; Manja Lindemeyer; Kristin Missal; Andrea Tanzer; Stefan Washietl; Jan Korbel; Olof Emanuelsson; Jakob S Pedersen; Nancy Holroyd; Ruth Taylor; David Swarbreck; Nicholas Matthews; Mark C Dickson; Daryl J Thomas; Matthew T Weirauch; James Gilbert; Jorg Drenkow; Ian Bell; XiaoDong Zhao; K G Srinivasan; Wing-Kin Sung; Hong Sain Ooi; Kuo Ping Chiu; Sylvain Foissac; Tyler Alioto; Michael Brent; Lior Pachter; Michael L Tress; Alfonso Valencia; Siew Woh Choo; Chiou Yu Choo; Catherine Ucla; Caroline Manzano; Carine Wyss; Evelyn Cheung; Taane G Clark; James B Brown; Madhavan Ganesh; Sandeep Patel; Hari Tammana; Jacqueline Chrast; Charlotte N Henrichsen; Chikatoshi Kai; Jun Kawai; Ugrappa Nagalakshmi; Jiaqian Wu; Zheng Lian; Jin Lian; Peter Newburger; Xueqing Zhang; Peter Bickel; John S Mattick; Piero Carninci; Yoshihide Hayashizaki; Sherman Weissman; Tim Hubbard; Richard M Myers; Jane Rogers; Peter F Stadler; Todd M Lowe; Chia-Lin Wei; Yijun Ruan; Kevin Struhl; Mark Gerstein; Stylianos E Antonarakis; Yutao Fu; Eric D Green; Ulaş Karaöz; Adam Siepel; James Taylor; Laura A Liefer; Kris A Wetterstrand; Peter J Good; Elise A Feingold; Mark S Guyer; Gregory M Cooper; George Asimenos; Colin N Dewey; Minmei Hou; Sergey Nikolaev; Juan I Montoya-Burgos; Ari Löytynoja; Simon Whelan; Fabio Pardi; Tim Massingham; Haiyan Huang; Nancy R Zhang; Ian Holmes; James C Mullikin; Abel Ureta-Vidal; Benedict Paten; Michael Seringhaus; Deanna Church; Kate Rosenbloom; W James Kent; Eric A Stone; Serafim Batzoglou; Nick Goldman; Ross C Hardison; David Haussler; Webb Miller; Arend Sidow; Nathan D Trinklein; Zhengdong D Zhang; Leah Barrera; Rhona Stuart; David C King; Adam Ameur; Stefan Enroth; Mark C Bieda; Jonghwan Kim; Akshay A Bhinge; Nan Jiang; Jun Liu; Fei Yao; Vinsensius B Vega; Charlie W H Lee; Patrick Ng; Atif Shahab; Annie Yang; Zarmik Moqtaderi; Zhou Zhu; Xiaoqin Xu; Sharon Squazzo; Matthew J Oberley; David Inman; Michael A Singer; Todd A Richmond; Kyle J Munn; Alvaro Rada-Iglesias; Ola Wallerman; Jan Komorowski; Joanna C Fowler; Phillippe Couttet; Alexander W Bruce; Oliver M Dovey; Peter D Ellis; Cordelia F Langford; David A Nix; Ghia Euskirchen; Stephen Hartman; Alexander E Urban; Peter Kraus; Sara Van Calcar; Nate Heintzman; Tae Hoon Kim; Kun Wang; Chunxu Qu; Gary Hon; Rosa Luna; Christopher K Glass; M Geoff Rosenfeld; Shelley Force Aldred; Sara J Cooper; Anason Halees; Jane M Lin; Hennady P Shulha; Xiaoling Zhang; Mousheng Xu; Jaafar N S Haidar; Yong Yu; Yijun Ruan; Vishwanath R Iyer; Roland D Green; Claes Wadelius; Peggy J Farnham; Bing Ren; Rachel A Harte; Angie S Hinrichs; Heather Trumbower; Hiram Clawson; Jennifer Hillman-Jackson; Ann S Zweig; Kayla Smith; Archana Thakkapallayil; Galt Barber; Robert M Kuhn; Donna Karolchik; Lluis Armengol; Christine P Bird; Paul I W de Bakker; Andrew D Kern; Nuria Lopez-Bigas; Joel D Martin; Barbara E Stranger; Abigail Woodroffe; Eugene Davydov; Antigone Dimas; Eduardo Eyras; Ingileif B Hallgrímsdóttir; Julian Huppert; Michael C Zody; Gonçalo R Abecasis; Xavier Estivill; Gerard G Bouffard; Xiaobin Guan; Nancy F Hansen; Jacquelyn R Idol; Valerie V B Maduro; Baishali Maskeri; Jennifer C McDowell; Morgan Park; Pamela J Thomas; Alice C Young; Robert W Blakesley; Donna M Muzny; Erica Sodergren; David A Wheeler; Kim C Worley; Huaiyang Jiang; George M Weinstock; Richard A Gibbs; Tina Graves; Robert Fulton; Elaine R Mardis; Richard K Wilson; Michele Clamp; James Cuff; Sante Gnerre; David B Jaffe; Jean L Chang; Kerstin Lindblad-Toh; Eric S Lander; Maxim Koriabine; Mikhail Nefedov; Kazutoyo Osoegawa; Yuko Yoshinaga; Baoli Zhu; Pieter J de Jong
Journal:  Nature       Date:  2007-06-14       Impact factor: 49.962

6.  A standard variation file format for human genome sequences.

Authors:  Martin G Reese; Barry Moore; Colin Batchelor; Fidel Salas; Fiona Cunningham; Gabor T Marth; Lincoln Stein; Paul Flicek; Mark Yandell; Karen Eilbeck
Journal:  Genome Biol       Date:  2010-08-26       Impact factor: 13.583

7.  Characterising and predicting haploinsufficiency in the human genome.

Authors:  Ni Huang; Insuk Lee; Edward M Marcotte; Matthew E Hurles
Journal:  PLoS Genet       Date:  2010-10-14       Impact factor: 5.917

8.  Paired-end mapping reveals extensive structural variation in the human genome.

Authors:  Jan O Korbel; Alexander Eckehart Urban; Jason P Affourtit; Brian Godwin; Fabian Grubert; Jan Fredrik Simons; Philip M Kim; Dean Palejev; Nicholas J Carriero; Lei Du; Bruce E Taillon; Zhoutao Chen; Andrea Tanzer; A C Eugenia Saunders; Jianxiang Chi; Fengtang Yang; Nigel P Carter; Matthew E Hurles; Sherman M Weissman; Timothy T Harkins; Mark B Gerstein; Michael Egholm; Michael Snyder
Journal:  Science       Date:  2007-09-27       Impact factor: 47.728

9.  The variant call format and VCFtools.

Authors:  Petr Danecek; Adam Auton; Goncalo Abecasis; Cornelis A Albers; Eric Banks; Mark A DePristo; Robert E Handsaker; Gerton Lunter; Gabor T Marth; Stephen T Sherry; Gilean McVean; Richard Durbin
Journal:  Bioinformatics       Date:  2011-06-07       Impact factor: 6.937

10.  GenomeRing: alignment visualization based on SuperGenome coordinates.

Authors:  A Herbig; G Jäger; F Battke; K Nieselt
Journal:  Bioinformatics       Date:  2012-06-15       Impact factor: 6.937
