karyoploteR: an R/Bioconductor package to plot customizable genomes displaying arbitrary data.

Bernat Gel, Eduard Serra

Abstract

MOTIVATION: Data visualization is a crucial tool for data exploration, analysis and interpretation. For the visualization of genomic data, there is no tool to create customizable non-circular plots of whole genomes from any species.
RESULTS: We have developed karyoploteR, an R/Bioconductor package to create linear chromosomal representations of any genome with genomic annotations and experimental data plotted along them. The plot creation process is inspired by R base graphics: a main function creates karyoplots with no data, and multiple additional functions, including custom functions written by the end user, add data and other graphical elements. This approach allows the creation of highly customizable plots from arbitrary data with complete freedom in data positioning and representation.
AVAILABILITY AND IMPLEMENTATION: karyoploteR is released under the Artistic-2.0 License. Source code and documentation are freely available through Bioconductor (http://www.bioconductor.org/packages/karyoploteR), and examples and a tutorial are available at https://bernatgel.github.io/karyoploter_tutorial. CONTACT: bgel@igtp.cat.
© The Author(s) 2017. Published by Oxford University Press.

Year: 2017; PMID: 28575171; PMCID: PMC5870550; DOI: 10.1093/bioinformatics/btx346

Source DB: PubMed; Journal: Bioinformatics; ISSN: 1367-4803; Impact factor: 6.937


1 Introduction

Data visualization is an important part of data analysis. It efficiently summarizes complex data, facilitates exploration and can reveal non-obvious patterns in the data. A natural way to represent genomic data is to position it along the genome, next to the ideograms of the different chromosomes. This type of representation is especially useful for identifying relations between different types of experimental data and genomic annotations. Various genomic visualization tools are available. Circos (Krzywinski et al., 2009) produces highly customizable, high-quality circular plots, as does its R counterpart RCircos (Zhang et al., 2013). Other R packages can plot whole-genome diagrams. These include ggbio (Yin et al., 2012), which is based on the grammar of graphics and can produce different plot types, including ideogram and karyogram plots. IdeoViz (Pai and Ren, 2014) plots binned data along the genome as either lines or bars, and chromPlot (Oróstica and Verdugo, 2016) plots up to four datasets given in a predefined format. These packages are either limited in the amount or type of data they can plot (IdeoViz and chromPlot) or have limited customization options (ggbio). In addition, the Bioconductor package Gviz (Hahne and Ivanek, 2016) is a powerful tool to create track-based plots of diverse biological data, but it does not produce plots of the whole genome. R therefore lacks a tool to create non-circular whole-genome plots that can display arbitrary data for any organism and that offers ample customization capabilities. Here we present karyoploteR, an extendable and customizable R/Bioconductor package to plot genome ideograms and genomic data positioned along them. It is inspired by R base graphics and builds plots with multiple successive calls to simple plotting functions.

2 Features

The interface of karyoploteR, and the process to create a complete plot, are very similar to those of base R graphics. We first create a simple or even empty plot with an initializing function and then add graphical elements with successive calls to other plotting functions. The first call creates and initializes the graphical device and returns a karyoplot object with all the information needed to add data to it. The karyoplot object contains a coordinate change function that maps genomic coordinates into plotting coordinates, and all plotting functions use it. Plotting functions are classified into three groups: functions that add non-data elements to the plot, and two groups of data plotting functions, low-level and high-level. karyoploteR also takes some ideas from Circos. For example, it does not define fixed tracks. Instead, it gives the user complete freedom over data positioning through the r0 and r1 parameters, which set the vertical start and end of each plotted element within the data panel. All non-data elements in the karyoplot (main title, chromosome names, …) are drawn by specific functions. These functions accept standard graphical parameters, and they can also be replaced with custom functions if a higher level of customization is needed.
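The coordinate change function and the r0/r1 parameters are the two mechanisms that let every plotting function place data freely. The Python sketch below illustrates the idea under simplifying assumptions. Chromosomes are stacked top to bottom, each with a single data panel above its ideogram, and r0 and r1 are fractions of that panel's height. The `Karyoplot` class, `coord_change` and `to_panel` are hypothetical names used only for illustration. This is not karyoploteR's implementation, which is written in R.

```python
from dataclasses import dataclass


@dataclass
class Karyoplot:
    """Hypothetical stand-in for a karyoplot object (not karyoploteR's R class)."""

    chrom_lengths: dict           # chromosome name -> length in bp, in plotting order
    plot_width: float = 1.0       # horizontal space given to the longest chromosome
    ideogram_height: float = 0.2  # height of every ideogram
    panel_height: float = 1.0     # height of the data panel drawn above each ideogram

    def __post_init__(self):
        # One shared scale keeps chromosome lengths comparable across rows.
        self.bp_per_unit = max(self.chrom_lengths.values()) / self.plot_width
        # Stack chromosomes top to bottom; the baseline is the bottom of each ideogram.
        step = self.ideogram_height + self.panel_height
        self.baseline = {c: -i * step for i, c in enumerate(self.chrom_lengths)}

    def coord_change(self, chrom, x, y=0.0):
        """Map (chromosome, genomic position, panel-relative y) to plot coordinates."""
        if chrom not in self.chrom_lengths:
            raise KeyError(f"unknown chromosome: {chrom}")
        if not 0 <= x <= self.chrom_lengths[chrom]:
            raise ValueError(f"position {x} lies outside {chrom}")
        panel_bottom = self.baseline[chrom] + self.ideogram_height
        return x / self.bp_per_unit, panel_bottom + y * self.panel_height


def to_panel(value, ymin, ymax, r0=0.0, r1=1.0):
    """Rescale a data value from [ymin, ymax] into the [r0, r1] band of a data panel."""
    return r0 + (value - ymin) / (ymax - ymin) * (r1 - r0)


kp = Karyoplot({"chrA": 1000, "chrB": 500})
# Two "tracks" share chrA's panel just by using different r0/r1 bands.
lower = to_panel(5, 0, 10, r0=0.0, r1=0.5)  # 0.25
upper = to_panel(5, 0, 10, r0=0.5, r1=1.0)  # 0.75
print(kp.coord_change("chrA", 500, lower))
print(kp.coord_change("chrA", 500, upper))
```

The two "tracks" differ only in the r0/r1 band they request within the same panel. No fixed track structure is needed, which is the Circos-inspired design choice described above.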

2.1 Ideogram plotting

Ideogram plotting is the basic functionality of karyoploteR. Default ideograms can be plotted with a single function call (Fig. 1A). They can also be customized in several ways: positioning the chromosomes in different arrangements, representing just a subset of chromosomes, or changing whether and how cytobands are represented. It is also possible to create different data plotting regions either above or below the ideograms, and to customize all sizes and margins by changing the values stored in plot.params.
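The layout options listed above amount to simple arithmetic on a set of size parameters. The function below is a rough Python sketch of that arithmetic. It stacks a subset of chromosomes vertically and reserves either one data panel above each ideogram or one panel above and one below. The parameter names (`ideogram_height`, `data1_height`, `data2_height`, `top_margin`, `bottom_margin`) and the `layout` function are hypothetical and only loosely inspired by plot.params.

```python
def layout(chromosomes, params, subset=None, panels="above"):
    """Compute vertical extents for a stacked ideogram layout (hypothetical sketch).

    panels: "above" (data1 panel only) or "both" (data1 above, data2 below).
    Returns ({chrom: {element: (y_bottom, y_top)}}, total_height); y grows upwards.
    """
    if panels not in ("above", "both"):
        raise ValueError("panels must be 'above' or 'both'")
    if subset is not None:
        missing = set(subset) - set(chromosomes)
        if missing:
            raise ValueError(f"unknown chromosomes: {sorted(missing)}")
        chromosomes = [c for c in chromosomes if c in subset]  # keep genome order
    below = params["data2_height"] if panels == "both" else 0.0
    ideo = params["ideogram_height"]
    slot = below + ideo + params["data1_height"]  # height used by one chromosome
    rows, y = {}, params["bottom_margin"]
    # Iterating in reverse puts the first chromosome at the top of the plot.
    for chrom in reversed(chromosomes):
        row = {}
        if below:
            row["data2"] = (y, y + below)
        row["ideogram"] = (y + below, y + below + ideo)
        row["data1"] = (y + below + ideo, y + slot)
        rows[chrom] = row
        y += slot
    return rows, y + params["top_margin"]


PARAMS = {"ideogram_height": 1.0, "data1_height": 4.0, "data2_height": 4.0,
          "top_margin": 2.0, "bottom_margin": 1.0}
rows, height = layout(["chr1", "chr2", "chr3"], PARAMS,
                      subset={"chr3", "chr1"}, panels="both")
print(height)  # 21.0
for chrom, row in rows.items():
    print(chrom, row)
```

Changing a single size in `PARAMS` shifts every element consistently. This is the benefit of keeping all sizes and margins in one parameter set rather than hard-coding them in each plotting function.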
Fig. 1

(A) The complete human GRCh38 genome. This plot is created with the single command ‘plotKaryotype(genome="hg38")’. (B) An example of a figure generated by karyoploteR representing different data types plotted in human chromosomes 1 and 2.

2.2 Not only human

karyoploteR is not restricted to human data in any way. Other organisms can be specified when creating a karyoplot. Genome data for a small set of organisms is included with the package. For other genomes, karyoploteR uses functionality from regioneR (Gel et al., 2016) to retrieve the data from UCSC or Bioconductor. If an organism is not available from any of these sources, it can still be plotted by providing its genome information. Therefore, if required, custom genomes can be created for specific purposes.
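Conceptually, a custom genome only needs the extent of each chromosome. The Python sketch below assumes a hypothetical tab-separated format (name, start, end). It shows the two checks such a definition implies: chromosome names must be unique with valid extents, and only data lying on the defined chromosomes can be placed. The functions `parse_genome` and `on_genome` are illustrative names. They do not reproduce how karyoploteR or regioneR actually read genome information.

```python
def parse_genome(text):
    """Parse hypothetical 'name<TAB>start<TAB>end' lines into {name: (start, end)}."""
    genome = {}
    for lineno, line in enumerate(text.strip().splitlines(), start=1):
        name, start, end = line.split("\t")
        start, end = int(start), int(end)
        if name in genome:
            raise ValueError(f"line {lineno}: duplicate chromosome {name}")
        if start >= end:
            raise ValueError(f"line {lineno}: start must be smaller than end")
        genome[name] = (start, end)
    return genome


def on_genome(genome, features):
    """Keep only (chrom, position) features that fall inside the custom genome."""
    return [(chrom, pos) for chrom, pos in features
            if chrom in genome and genome[chrom][0] <= pos <= genome[chrom][1]]


GENOME_TXT = """\
contig1\t0\t5000
contig2\t0\t1200
"""
genome = parse_genome(GENOME_TXT)
data = [("contig1", 100), ("contig2", 1500), ("chrX", 10)]
print(on_genome(genome, data))  # [('contig1', 100)]
```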

2.3 Data plotting

Data plotting functions are divided into two groups: low-level and high-level. Low-level data plotting functions plot graphical primitives such as points, lines and polygons. Except for the additional chr parameter, they mimic the behaviour of their base graphics counterparts, including support for most of the standard graphical parameters. These plotting functions have a flexible signature and are completely data agnostic: they know nothing about biological concepts, giving the user total freedom in how to use them. High-level functions, in contrast, create more complex data representations. They understand some basic concepts such as ‘genomic region’ and usually perform some computation before plotting the data (Fig. 1B).
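The split between the two groups can be sketched in Python as follows. `low_level_lines` is data agnostic and only pairs up coordinates. `high_level_density` understands genomic regions: it first computes a count per window and then passes the drawing to the primitive. Both functions are hypothetical illustrations. Assigning each region to the window that contains its start is a simplifying assumption and is not necessarily what a karyoploteR high-level function does.

```python
from collections import Counter


def low_level_lines(chrom, x, y):
    """Data-agnostic primitive: pairs up x/y coordinates on one chromosome."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return [("line", chrom, xi, yi) for xi, yi in zip(x, y)]


def high_level_density(regions, chrom, chrom_length, window):
    """Count regions per window (by start position), then draw the counts as a line."""
    counts = Counter(start // window for c, start, _end in regions if c == chrom)
    n_windows = -(-chrom_length // window)  # ceiling division
    # Window midpoints, clamped so the last point stays on the chromosome.
    mids = [min((i + 0.5) * window, chrom_length) for i in range(n_windows)]
    values = [counts.get(i, 0) for i in range(n_windows)]
    return low_level_lines(chrom, mids, values)


regions = [("chr1", 10, 20), ("chr1", 15, 30), ("chr1", 120, 150), ("chr2", 5, 9)]
print(high_level_density(regions, "chr1", 200, 100))
# [('line', 'chr1', 50.0, 2), ('line', 'chr1', 150.0, 1)]
```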

2.4 Customization and extensibility

In addition to customizing sizes and margins and using custom genomes, karyoploteR can be extended with custom plotting functions. All internal functions, including the main coordinate change function, are exported and documented in the package vignette. This makes it possible to create custom plotting functions adapted to specific data types and formats.
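Because the coordinate change function is exported, a user-written plotting function needs no knowledge of the layout itself. The Python sketch below models this with a hypothetical `make_coord_change` factory, which assumes chromosomes placed side by side. A hypothetical custom function, `plot_gene_arrows`, uses only that mapping to draw one strand-aware arrow per gene inside an r0/r1 band.

```python
def make_coord_change(chrom_offsets, bp_per_unit):
    """Hypothetical exported mapping: (chrom, genomic position) -> plot x.

    Chromosomes are assumed to sit side by side, each starting at its offset.
    """
    def coord_change(chrom, pos):
        return chrom_offsets[chrom] + pos / bp_per_unit
    return coord_change


def plot_gene_arrows(coord_change, genes, r0=0.0, r1=1.0):
    """Hypothetical user-defined plotting function built only on coord_change.

    genes: iterable of (chrom, start, end, strand); arrows point 5' to 3'.
    """
    y = (r0 + r1) / 2  # draw every arrow in the middle of the [r0, r1] band
    arrows = []
    for chrom, start, end, strand in genes:
        x_from, x_to = coord_change(chrom, start), coord_change(chrom, end)
        if strand == "-":
            x_from, x_to = x_to, x_from  # minus-strand genes point leftwards
        arrows.append({"from": (x_from, y), "to": (x_to, y)})
    return arrows


cc = make_coord_change({"chr1": 0.0, "chr2": 2.0}, bp_per_unit=100)
genes = [("chr1", 0, 100, "+"), ("chr2", 50, 150, "-")]
for arrow in plot_gene_arrows(cc, genes, r0=0.2, r1=0.4):
    print(arrow)
```

Swapping `make_coord_change` for a different arrangement would leave `plot_gene_arrows` unchanged. This decoupling is what exporting the mapping allows.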

3 Conclusion

We have developed an R/Bioconductor package, karyoploteR, to plot arbitrary genomes with data positioned on them. It offers a flexible API inspired by R base graphics, with low-level functions to plot graphical primitives and high-level functions to plot complex data. The plots are highly customizable in data positioning and appearance, and the package functionality can be extended with custom plotting functions. karyoploteR requires R ≥ 3.4 and Bioconductor ≥ 3.5. More information and examples can be found on the package's Bioconductor page and at https://bernatgel.github.io/karyoploter_tutorial.

References

1. Martin Krzywinski; Jacqueline Schein; Inanç Birol; Joseph Connors; Randy Gascoyne; Doug Horsman; Steven J Jones; Marco A Marra. Circos: an information aesthetic for comparative genomics. Genome Res, 2009-06-18.

2. Florian Hahne; Robert Ivanek. Visualizing Genomic Data Using Gviz and Bioconductor. Methods Mol Biol, 2016.

3. Karen Y Oróstica; Ricardo A Verdugo. chromPlot: visualization of genomic data in chromosomal context. Bioinformatics, 2016-03-09.

4. Tengfei Yin; Dianne Cook; Michael Lawrence. ggbio: an R package for extending the grammar of graphics for genomic data. Genome Biol, 2012-08-31.

5. Bernat Gel; Anna Díez-Villanueva; Eduard Serra; Marcus Buschbeck; Miguel A Peinado; Roberto Malinverni. regioneR: an R/Bioconductor package for the association analysis of genomic regions based on permutation tests. Bioinformatics, 2015-09-30.

6. Hongen Zhang; Paul Meltzer; Sean Davis. RCircos: an R package for Circos 2D track plots. BMC Bioinformatics, 2013-08-10.
