GenVisR: Genomic Visualizations in R.

Zachary L Skidmore, Alex H Wagner, Robert Lesurf, Katie M Campbell, Jason Kunisaki, Obi L Griffith, Malachi Griffith.

Abstract

Visualizing and summarizing data from genomic studies continues to be a challenge. Here, we introduce the GenVisR package to address this challenge by providing highly customizable, publication-quality graphics focused on cohort-level genome analyses. GenVisR provides a rapid and easy-to-use suite of genomic visualization tools, while maintaining a high degree of flexibility by leveraging the abilities of ggplot2 and Bioconductor.
AVAILABILITY AND IMPLEMENTATION: GenVisR is an R package available via Bioconductor (https://bioconductor.org/packages/GenVisR) under GPLv3. Support is available via GitHub (https://github.com/griffithlab/GenVisR/issues) and the Bioconductor support website.
CONTACTS: obigriffith@wustl.edu or mgriffit@wustl.edu
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author 2016. Published by Oxford University Press.

Year:  2016        PMID: 27288499      PMCID: PMC5039916          DOI: 10.1093/bioinformatics/btw325

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

The continued development of massively parallel sequencing technologies has led to an exponential growth in the amount of genomic data produced (Kodama et al.). This growth has in turn enabled scientists to investigate increasingly large, cohort-level genomic datasets. Generating intuitive visualizations is a critical component in recognizing patterns and investigating underlying biological properties in cohorts under study. A significant bottleneck exists, however, between data generation and subsequent visualization and interpretation (Good et al.). Additionally, generating publication-quality figures for effective communication of these data typically requires ad hoc methods such as manual creation or extensive graphic manipulation with third-party software. This process is both time-intensive and difficult to automate or reproduce. Further, the absence of software supporting multiple species can make this process even more demanding. Here, we present GenVisR, a Bioconductor package to address these issues. GenVisR provides a user-friendly, flexible and comprehensive suite of tools for visualizing complex genomic data in three categories (small variants, copy number alterations and data quality) for multiple species of interest.

2 Visualization of small variants

The identification of small variants (SNVs and indels) within a genomic context is of paramount importance for the elucidation of the genetic basis of disease. Numerous tools and resources have been created to identify variants in sequencing data (Wang et al.). Conversely, few tools exist to visually display and summarize these variants across sample cohorts. Given a gene of interest, it is often useful to view variant occurrences in the context of the translated protein product (Zhang et al.). A variety of options exist to achieve this; however, tools that offer both automation and flexibility to perform this task are lacking (Supplementary Table S1) (Griffith et al.; Leiserson et al.; Nilsen et al.; Yin et al.; Zhou et al.). The function lolliplot was developed to allow for precise control over visualization options (Fig. 1A). This includes the ability to choose Ensembl annotation databases for protein domain displays and to plot multiple tracks of mutations above and below the protein representation. Another common objective of genomic studies is to identify variant recurrence across multiple genes within a cohort. The GenVisR function waterfall was developed to calculate and rapidly illustrate the mutational burden of variants on both a gene and sample level, and it further differentiates between variant types (Fig. 1B) (Krysiak et al.; Ma et al.; Wagner et al.). Mutually exclusive genomic events at the variant level are emphasized in this visualization by arranging samples in a hierarchical fashion such that samples with mutations in the most recurrently mutated genes are ranked first. Finally, it is often informative to investigate the rate of transition and transversion mutations observed across a set of cases. For example, lung tumors originating from patients with a history of tobacco smoke exposure display a pattern of enrichment for C to A or G to T transversions (Govindan et al.). The function TvTi (transversion/transition) was developed to improve recognition of these types of patterns within a cohort.
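As a rough illustration of the classification behind a transition/transversion summary, the base R sketch below labels single-nucleotide substitutions and tabulates per-sample proportions. The helper name classify_tvti and the toy data are assumptions made for illustration; they are not part of the GenVisR API, and TvTi itself may handle inputs and plotting differently.

```r
# Hypothetical helper (not part of GenVisR): label each SNV as a transition
# (purine<->purine or pyrimidine<->pyrimidine) or a transversion.
classify_tvti <- function(ref, alt) {
  purines <- c("A", "G")
  pyrimidines <- c("C", "T")
  stopifnot(length(ref) == length(alt),
            all(ref %in% c(purines, pyrimidines)),
            all(alt %in% c(purines, pyrimidines)),
            all(ref != alt))
  # A substitution is a transition when both bases belong to the same class.
  same_class <- (ref %in% purines) == (alt %in% purines)
  ifelse(same_class, "transition", "transversion")
}

# Toy cohort, one row per SNV (made-up values).
snvs <- data.frame(
  sample = c("s1", "s1", "s1", "s2", "s2"),
  ref    = c("C", "G", "A", "C", "T"),
  alt    = c("A", "T", "G", "T", "G"),
  stringsAsFactors = FALSE
)
snvs$class <- classify_tvti(snvs$ref, snvs$alt)

# Per-sample proportion of each class (each row sums to 1).
tvti_prop <- prop.table(table(snvs$sample, snvs$class), margin = 1)
print(tvti_prop)
```

In this toy cohort, the C to A and G to T changes in sample s1 are both counted as transversions, the kind of enrichment described above for smoking-associated lung tumors.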
Fig. 1.

Selected representation of GenVisR visualizations. (A) Output from lolliplot for select TCGA breast cancer samples (Cancer Genome Atlas Network, 2012) shows two mutational hotspots in PIK3CA within the accessory and catalytic kinase domains. (B) Output from waterfall shows mutations for five genes across 50 select TCGA breast cancer samples with mutation type indicated by colour in the grid and per sample/gene mutation rates indicated in the top and left sidebars. (C) Output from genCov displays coverage (bottom plots) showing focal deletions in sample A (last exon) and B (second intron) within a gene of interest. GC content (top plot) is encoded via a range of colours for each exon. (D) Output from lohSpec for HCC1395 (Griffith et al.), HCC38 and HCC1143 (Daemen et al.) breast cancer cell lines shows LOH events, across all chromosomes, shaded as dark blue. (E) Output from covBars shows cumulative coverage for 10 samples indicating that for each sample, at least ∼75% of targeted regions were covered at ≥ 35× depth. (F) Output from compIdent for the HCC1395 breast cancer cell line (tumor and normal) shows variant coverage (bottom plot) and SNP allele fraction (main plot) indicating highly related samples. Note that 4/24 positions are discrepant and likely result from extensive LOH in this cell line.


3 Visualization of copy number alterations

Copy number alterations occurring within the genome are implicated in a variety of diseases (Beroukhim et al.). The function genCov illustrates amplifications and deletions across one or more samples in a genomic region of interest (Fig. 1C). A key feature of genCov is the effective use of plot space, especially for large regions of interest, via the differential compression of various features (introns, exons, UTRs) within the region of interest. For a broader view, the function cnView plots copy number calls, and the corresponding ideogram, for an individual sample at the chromosome level. The function cnSpec displays amplifications and deletions on a still larger scale via copy number segment calls. This information is displayed as a heat map arranged in a grid indexed by chromosomes and samples. Alternatively, cnFreq displays the frequency of samples within a cohort that are observed to have copy number gains or losses at specific genomic loci. In addition to copy number changes, loss of heterozygosity (LOH) often plays an important role in genomic diseases. For example, in acute myeloid leukemia, copy-neutral LOH has been associated with shorter complete remission and worse overall survival (Gronseth et al.). The function lohSpec displays LOH regions observed within a cohort (Fig. 1D) by plotting a sliding window mean difference in variant allele fractions for tumor and normal germline variants.
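The sliding-window idea behind an LOH display can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the helper name sliding_vaf_diff, the window and step sizes, and the toy positions are hypothetical, and lohSpec's own defaults and implementation may differ.

```r
# Hypothetical helper (not part of GenVisR): mean absolute tumor-normal VAF
# difference in sliding windows along a single chromosome.
sliding_vaf_diff <- function(pos, tumor_vaf, normal_vaf,
                             window = 1000, step = 500) {
  stopifnot(length(pos) == length(tumor_vaf),
            length(pos) == length(normal_vaf),
            window > 0, step > 0)
  diff_abs <- abs(tumor_vaf - normal_vaf)
  starts <- seq(min(pos), max(pos), by = step)
  means <- vapply(starts, function(s) {
    in_win <- pos >= s & pos < s + window
    # Windows without any variant are reported as NA rather than 0.
    if (any(in_win)) mean(diff_abs[in_win]) else NA_real_
  }, numeric(1))
  data.frame(start = starts, end = starts + window, mean_diff = means)
}

# Toy heterozygous germline sites (made-up values): the normal VAF stays near
# 0.5 everywhere, while the tumor loses heterozygosity around 1500-1800.
pos        <- c(100, 400, 900, 1500, 1800, 2600)
normal_vaf <- c(0.5, 0.5, 0.5, 0.5, 0.5, 0.5)
tumor_vaf  <- c(0.5, 0.5, 0.5, 1.0, 0.0, 0.5)

loh <- sliding_vaf_diff(pos, tumor_vaf, normal_vaf, window = 1000, step = 1000)
print(loh)
```

Windows whose mean difference approaches 0.5 correspond to regions where the tumor has lost one allele at sites that are heterozygous in the normal sample.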

4 Visualization of data quality

In genomic studies, the quality of sequencing data is of critical importance to the proper interpretation of observed variations. Therefore, we provide a suite of functions focused on data quality assessment and visualization. The first of these, covBars, provides a framework for displaying the sequencing coverage achieved for targeted bases in a study (Fig. 1E). A second function, compIdent, aids in the identification of mix-ups among samples that are thought to originate from the same individual (Fig. 1F). This is achieved by displaying the variant allele fraction of SNPs in relation to each sample. By default, 24 biallelic ‘identity SNPs’ (Pengelly et al.) are used to determine sample identity.
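To make the notion of cumulative coverage concrete, the base R sketch below computes, for each sample, the percentage of targeted bases covered at or above a set of depth thresholds. The helper name cumulative_coverage, the thresholds and the per-base depths are assumptions chosen for illustration; the input format that covBars expects is not reproduced here.

```r
# Hypothetical helper (not part of GenVisR): percentage of targeted bases
# covered at or above each depth threshold for one sample.
cumulative_coverage <- function(depth, thresholds) {
  stopifnot(is.numeric(depth), length(depth) > 0)
  vapply(thresholds, function(d) 100 * mean(depth >= d), numeric(1))
}

# Toy per-base depths for two samples (made-up values).
depths <- list(
  sampleA = c(10, 20, 35, 40, 50, 60, 80, 100),
  sampleB = c(0, 5, 35, 36, 37, 38, 90, 120)
)
thresholds <- c(1, 20, 35, 50)

# One row per sample, one column per depth threshold.
cov_pct <- t(sapply(depths, cumulative_coverage, thresholds = thresholds))
colnames(cov_pct) <- paste0(">=", thresholds, "x")
print(cov_pct)
```

Read this way, a statement such as "at least 75% of targeted regions were covered at ≥ 35× depth" corresponds to the ">=35x" column being 75 or higher for every sample.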

5 Example usage

GenVisR was developed with the naïve R user in mind. Functions are well documented and have reasonable defaults set for optional parameters. To illustrate, creating Figure 1B was as simple as executing the waterfall function call on a standard MAF (version 2.4) file containing variant mutation data and choosing which genes to plot:

    genes = c("PIK3CA", "TP53", "USH2", "MLL3", "BRCA1")
    GenVisR::waterfall(x=maf_file, plotGenes=genes)

The MAF file format, originally developed for The Cancer Genome Atlas project (Cancer Genome Atlas Research Network, 2008), is the default file format accepted by waterfall. This format was chosen based on its simplicity and accessibility. A number of resources exist to convert VCF files, common to most variant callers, to MAF format. In the interest of maintaining flexibility, waterfall and other GenVisR functions are able to accept alternative file types as input.
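The hierarchical sample ordering described for waterfall in Section 2 can be approximated in base R as follows. This is a sketch of the general idea only, not GenVisR's implementation: the toy table uses the standard MAF column names Tumor_Sample_Barcode and Hugo_Symbol, the sample identifiers are made up, and the sorting approach is an assumption.

```r
# Toy MAF-like table (made-up samples; only two MAF columns are needed here).
maf <- data.frame(
  Tumor_Sample_Barcode = c("s1", "s2", "s2", "s3", "s4", "s4"),
  Hugo_Symbol          = c("TP53", "PIK3CA", "TP53", "PIK3CA", "PIK3CA", "BRCA1"),
  stringsAsFactors = FALSE
)

# Sample-by-gene presence/absence matrix.
mut <- unclass(table(maf$Tumor_Sample_Barcode, maf$Hugo_Symbol)) > 0

# Genes ordered from most to least recurrently mutated.
gene_order <- names(sort(colSums(mut), decreasing = TRUE))

# Sort samples hierarchically: mutated in the most recurrent gene first,
# then break ties using the next gene, and so on.
sort_keys <- lapply(gene_order, function(g) mut[, g])
sample_order <- rownames(mut)[do.call(order, c(sort_keys, list(decreasing = TRUE)))]

# Reordered matrix; TRUE cells form the characteristic "waterfall" staircase.
print(mut[sample_order, gene_order])
```

Sorting on one key per gene, in order of recurrence, is what pushes mutually exclusive events into a visible staircase pattern across the sample axis.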

6 Conclusion

GenVisR provides features and functions for many popular genomic visualizations not otherwise available in a single convenient package (Supplementary Table S1). By leveraging the abilities of ggplot2 (Wickham, 2009), it confers a level of customizability not previously possible. Virtually any aspect of a plot can be changed simply by adding an additional layer onto the graphical object. Thus, GenVisR allows for publication-quality figures with a minimal amount of required input and data manipulation while maintaining a high degree of flexibility and customizability.

References (first 10 of 19 shown)

1.  MAGI: visualization and collaborative annotation of genomic aberrations.

Authors:  Mark D M Leiserson; Connor C Gramazio; Jason Hu; Hsin-Ta Wu; David H Laidlaw; Benjamin J Raphael
Journal:  Nat Methods       Date:  2015-06       Impact factor: 28.547

2.  Prognostic significance of acquired copy-neutral loss of heterozygosity in acute myeloid leukemia.

Authors:  Christine M Gronseth; Scott E McElhone; Barry E Storer; Kathleen A Kroeger; Vicky Sandhu; Matthew L Fero; Frederick R Appelbaum; Elihu H Estey; Min Fang
Journal:  Cancer       Date:  2015-05-29       Impact factor: 6.860

3.  Exploring genomic alteration in pediatric cancer using ProteinPaint.

Authors:  Xin Zhou; Michael N Edmonson; Mark R Wilkinson; Aman Patel; Gang Wu; Yu Liu; Yongjin Li; Zhaojie Zhang; Michael C Rusch; Matthew Parker; Jared Becksfort; James R Downing; Jinghui Zhang
Journal:  Nat Genet       Date:  2016-01       Impact factor: 38.330

4.  Genome Modeling System: A Knowledge Management Platform for Genomics.

Authors:  Malachi Griffith; Obi L Griffith; Scott M Smith; Avinash Ramu; Matthew B Callaway; Anthony M Brummett; Michael J Kiwala; Adam C Coffman; Allison A Regier; Ben J Oberkfell; Gabriel E Sanderson; Thomas P Mooney; Nathaniel G Nutter; Edward A Belter; Feiyu Du; Robert L Long; Travis E Abbott; Ian T Ferguson; David L Morton; Mark M Burnett; James V Weible; Joshua B Peck; Adam Dukes; Joshua F McMichael; Justin T Lolofie; Brian R Derickson; Jasreet Hundal; Zachary L Skidmore; Benjamin J Ainscough; Nathan D Dees; William S Schierding; Cyriac Kandoth; Kyung H Kim; Charles Lu; Christopher C Harris; Nicole Maher; Christopher A Maher; Vincent J Magrini; Benjamin S Abbott; Ken Chen; Eric Clark; Indraniel Das; Xian Fan; Amy E Hawkins; Todd G Hepler; Todd N Wylie; Shawn M Leonard; William E Schroeder; Xiaoqi Shi; Lynn K Carmichael; Matthew R Weil; Richard W Wohlstadter; Gary Stiehr; Michael D McLellan; Craig S Pohl; Christopher A Miller; Daniel C Koboldt; Jason R Walker; James M Eldred; David E Larson; David J Dooling; Li Ding; Elaine R Mardis; Richard K Wilson
Journal:  PLoS Comput Biol       Date:  2015-07-09       Impact factor: 4.475

5.  Detecting somatic point mutations in cancer genome sequencing data: a comparison of mutation callers.

Authors:  Qingguo Wang; Peilin Jia; Fei Li; Haiquan Chen; Hongbin Ji; Donald Hucks; Kimberly Brown Dahlman; William Pao; Zhongming Zhao
Journal:  Genome Med       Date:  2013-10-11       Impact factor: 11.117

6.  Modeling precision treatment of breast cancer.

Authors:  Anneleen Daemen; Obi L Griffith; Laura M Heiser; Nicholas J Wang; Oana M Enache; Zachary Sanborn; Francois Pepin; Steffen Durinck; James E Korkola; Malachi Griffith; Joe S Hur; Nam Huh; Jongsuk Chung; Leslie Cope; Mary Jo Fackler; Christopher Umbricht; Saraswati Sukumar; Pankaj Seth; Vikas P Sukhatme; Lakshmi R Jakkula; Yiling Lu; Gordon B Mills; Raymond J Cho; Eric A Collisson; Laura J van't Veer; Paul T Spellman; Joe W Gray
Journal:  Genome Biol       Date:  2013       Impact factor: 13.583

7.  ggbio: an R package for extending the grammar of graphics for genomic data.

Authors:  Tengfei Yin; Dianne Cook; Michael Lawrence
Journal:  Genome Biol       Date:  2012-08-31       Impact factor: 13.583

8.  Copynumber: Efficient algorithms for single- and multi-track copy number segmentation.

Authors:  Gro Nilsen; Knut Liestøl; Peter Van Loo; Hans Kristian Moen Vollan; Marianne B Eide; Oscar M Rueda; Suet-Feung Chin; Roslin Russell; Lars O Baumbusch; Carlos Caldas; Anne-Lise Børresen-Dale; Ole Christian Lingjaerde
Journal:  BMC Genomics       Date:  2012-11-04       Impact factor: 3.969

9.  Organizing knowledge to enable personalization of medicine in cancer.

Authors:  Benjamin M Good; Benjamin J Ainscough; Josh F McMichael; Andrew I Su; Obi L Griffith
Journal:  Genome Biol       Date:  2014-08-27       Impact factor: 13.583

10.  DGIdb 2.0: mining clinically relevant drug-gene interactions.

Authors:  Alex H Wagner; Adam C Coffman; Benjamin J Ainscough; Nicholas C Spies; Zachary L Skidmore; Katie M Campbell; Kilannin Krysiak; Deng Pan; Joshua F McMichael; James M Eldred; Jason R Walker; Richard K Wilson; Elaine R Mardis; Malachi Griffith; Obi L Griffith
Journal:  Nucleic Acids Res       Date:  2015-11-03       Impact factor: 16.971

