shinyChromosome: An R/Shiny Application for Interactive Creation of Non-circular Plots of Whole Genomes.

Yiming Yu, Wen Yao, Yuping Wang, Fangfang Huang.

Abstract

Non-circular plots of whole genomes are natural representations of genomic data aligned along all chromosomes. Currently, there is no specialized graphical user interface (GUI) designed to produce non-circular whole genome diagrams, and existing tools require considerable coding effort from users. Moreover, such tools also need improvement, including the addition of new functionalities. To address these issues, we developed a new R/Shiny application, named shinyChromosome, as a GUI for the interactive creation of non-circular whole genome diagrams. shinyChromosome can be easily installed on personal computers for personal use as well as on local or public servers for community use. Publication-quality images can be readily generated and annotated from user input using diverse widgets. shinyChromosome is deployed at http://150.109.59.144:3838/shinyChromosome/, http://shinyChromosome.ncpgr.cn, and https://yimingyu.shinyapps.io/shinyChromosome for online use. The source code and manual of shinyChromosome are freely available at https://github.com/venyao/shinyChromosome.
Copyright © 2019 The Authors. Published by Elsevier B.V. All rights reserved.

Keywords:  Genomic data visualization; Graphical user interface; Non-circular whole genome plot; Shiny application; shinyChromosome

Year:  2020        PMID: 31931182      PMCID: PMC7056921          DOI: 10.1016/j.gpb.2019.07.003

Source DB:  PubMed          Journal:  Genomics Proteomics Bioinformatics        ISSN: 1672-0229            Impact factor:   7.691


Introduction

Biological data analysis is a challenging task in the post-genomic era. Data visualization is frequently used to convey concepts, communicate new discoveries, summarize and analyze data, and develop hypotheses. Circos plots are a common method of visualizing genomic data in a circular format, and dozens of tools have been developed to generate them [1], [2], [3], [4]. Linear representations of whole genome data along all chromosomes are another common genome visualization format, used to display the relationship between experimental data and genome annotation in a variety of species. Although several tools have been developed to create non-circular plots, the number of such tools is much lower than that of tools for creating circular plots. chromPlot and IdeoViz are two R packages designed to visualize whole genome data along all chromosomes in a non-circular format [5], [6]. However, chromPlot and IdeoViz can produce only a limited number of plot types with few customization options [7]. ggbio is a powerful R package that can visualize local or global genomic data in both circular and non-circular formats [8]. However, to create non-circular whole genome diagrams with multiple data panels using ggbio, users must set the position and size of each data panel themselves. Developed using R base graphics, karyoploteR is a versatile R package that can also create non-circular genome plots [7]. Typically, an ideogram is first created by karyoploteR, and other datasets are then added sequentially to create different plots, which can be displayed in either the same or different panels. The regions of different panels are defined by the r0 and r1 parameters, which are inspired by the min and max radius parameters used to define different data tracks in Circos plots.
karyoploteR restricts chromosomes to alignment along the horizontal axis, despite the frequent need to align all chromosomes along the vertical axis when visualizing genomic data. Comparing data across two genomes, with one genome aligned along the horizontal axis and the other along the vertical axis, is widely used to demonstrate the regulation of gene expression by expression quantitative trait loci (eQTL), the interactions between different genomic regions identified by Hi-C sequencing, and the synteny between different genome assemblies [9], [10], [11]. However, none of the tools mentioned above can create two-genome plots. Moreover, all of these tools are difficult for users without coding experience, since they all require users to write their own code. Although commonly used as graphical interfaces to create non-circular plots, the Integrative Genomics Viewer (IGV) and the University of California at Santa Cruz (UCSC) Genome Browser are mainly used to visualize genomic datasets only in specific genomic regions [12], [13]. Here, we present shinyChromosome, a new R/Shiny application with a graphical user interface (GUI) designed to facilitate the interactive creation of non-circular whole genome plots of any species. Users can also make use of the diverse widgets in shinyChromosome to customize the appearance of output plots.

Method

R is a widely used programming language for biological data analysis, graphic representation, statistics, and data reporting (https://www.R-project.org/) [14]. shinyChromosome is written completely in R, so R users can modify or extend its code to fit their own needs. The shinyChromosome application consists of two functional parts, ui.R and server.R. The former (ui.R) defines the interface of shinyChromosome, including the widgets that accept input data and options from the user. The latter (server.R) creates the plots based on the input data and options. ggplot2, a major graphics package in R, is used in shinyChromosome to produce non-circular whole genome plots [15]. Typical input data for a non-circular whole genome plot contain values across many genomic regions or genomic positions within the same genome. The input data can be represented graphically in different formats, including scatter plots, line plots, bar charts, heatmaps, and many others. These plots can be easily created using ggplot2 and combined to produce compound plots. The Shiny package is used to build the graphical interface of shinyChromosome. The shinyChromosome application contains five main menus (Figure 1). The “Single-genome plot” and “Two-genome plot” menus provide the two main functionalities of shinyChromosome and are responsible for producing the non-circular whole genome plots. The “Gallery” menu displays 65 example figures that can be generated using shinyChromosome. The “Help” menu provides instructions for the installation and usage of shinyChromosome, as well as input data formatting requirements and a comprehensive user manual. The “About” menu provides a brief introduction to shinyChromosome and a list of the R packages it uses.
Figure 1

Overview of shinyChromosome and a single-genome plot created with shinyChromosome

A. The main menus of shinyChromosome. B. The control panel of shinyChromosome. C. Diverse widgets to customize the appearance of generated plots. D. Options provided to customize the overall appearance of plots. E. Ten example datasets (Data 1–10) are distributed into six tracks to create an example plot.


Results

shinyChromosome was developed using ggplot2, which is a modern data visualization package based on the grammar of graphics in R [15]. The GUI of shinyChromosome was designed using Shiny, which is an R package for building interactive web applications using pure R code. shinyChromosome can create single-genome plots by aligning genome data along all chromosomes of a single genome and can create two-genome plots to compare data from two genomes (Figure 1). For plots aligned along a single genome, a dataset with two columns, giving the IDs and lengths of all chromosomes and separated by commas, tabs, or other delimiters, is required to define the frame of the plot (Figure 1). Then, 1–10 non-overlapping tracks can be created and aligned along all chromosomes. Up to 10 datasets can then be uploaded and distributed to one or more tracks. Based on the nature of the dataset and user-specified inputs, these tracks can then be displayed as different plot types, including scatter plots, line plots, bar charts, rectangles, and heatmaps, as well as segments, text, and chromosome ideograms (Figure 1). Combinations of different types of plots can be created in the same track to produce complex linear representations of the genomic data. The required formats of input datasets for the different plot types are described in the “Input data format” menu of the shinyChromosome application. Users can choose to arrange all chromosomes separately or to concatenate all chromosomes in sequential order, and to align all chromosomes along the horizontal or vertical axis. Widgets are provided to tune the height of each track and the distances between different tracks. For two-genome plots, all chromosomes of one genome are concatenated in sequential order and aligned to the horizontal axis, while all chromosomes of the other genome are concatenated in sequential order and aligned to the vertical axis (Figure 2).
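The track stacking described above for single-genome plots (1–10 non-overlapping tracks with adjustable heights and spacing) can be pictured as dividing the plot height into bands. The following is a minimal Python sketch of that idea only; it is not shinyChromosome's R code, and the function name `track_bands` and the normalization to a unit-height panel are assumptions made for illustration.

```python
from typing import List, Tuple


def track_bands(heights: List[float], gaps: List[float]) -> List[Tuple[float, float]]:
    """Stack non-overlapping tracks from top to bottom in a unit-height panel.

    heights[i] is the relative height of track i; gaps[i] is the relative
    distance between track i and track i + 1. Returns one (bottom, top) pair
    per track, scaled so that the whole stack spans the interval [0, 1].
    """
    if not 1 <= len(heights) <= 10:
        raise ValueError("expected between 1 and 10 tracks")
    if len(gaps) != len(heights) - 1:
        raise ValueError("need exactly one gap between each pair of tracks")
    if any(h <= 0 for h in heights) or any(g < 0 for g in gaps):
        raise ValueError("heights must be positive and gaps non-negative")

    total = sum(heights) + sum(gaps)  # normalizing constant
    bands = []
    top = 1.0
    for i, height in enumerate(heights):
        bottom = top - height / total
        bands.append((bottom, top))
        if i < len(gaps):
            # The next track starts below this one, separated by the gap.
            top = bottom - gaps[i] / total
    return bands


if __name__ == "__main__":
    # Three tracks: the first twice as tall as the others, equal spacing.
    for index, (bottom, top) in enumerate(track_bands([2, 1, 1], [0.5, 0.5]), start=1):
        print(f"track {index}: {bottom:.2f} to {top:.2f}")
```

Each (bottom, top) pair plays a role loosely comparable to the r0 and r1 panel bounds that karyoploteR uses, although here the bounds are derived from relative heights and gaps rather than set directly.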
Two datasets are required to define the two genomes aligned to the horizontal and vertical axes, respectively. Both datasets should be formatted in the same way as the dataset used to define the frame of a single-genome plot, with one column for the IDs and another for the lengths of all chromosomes. Another dataset can then be uploaded to create specific plots that demonstrate the synteny between two genomes or the interactions between different genomic regions of the two genomes. Each row of this dataset defines a pair of positions: one in the genome aligned along the horizontal axis and the other in the genome aligned along the vertical axis. Previously, we identified 70,858 quantitative trait loci (QTL) that regulated the expression of 66,649 small RNAs in an F2 population of rice [9]. Using this dataset (https://doi.org/10.5061/dryad.9d030), we employed shinyChromosome to produce a scatter plot demonstrating the regulation of the expression of this set of small RNAs by the list of QTL (Figure 2). Concatenation of all chromosomes of each genome, adjustment of chromosome positions of all genomes, coloration of all points, and addition of chromosome labels along both axes were accomplished by shinyChromosome automatically.
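The coordinate handling behind such a two-genome plot can be sketched as follows: each chromosome receives a start offset equal to the summed lengths of the chromosomes before it, and every within-chromosome position is shifted by that offset. This is a Python illustration under stated assumptions, not the application's R implementation; the function names, the 1-based position convention, and the four-field row layout (x chromosome, x position, y chromosome, y position) are hypothetical.

```python
import re
from typing import Dict, List, Tuple

Frame = List[Tuple[str, int]]


def read_frame(text: str) -> Frame:
    """Parse a two-column table of chromosome IDs and lengths.

    Columns may be separated by commas, tabs, semicolons, or spaces.
    Leading rows whose second field is not an integer are skipped as headers.
    """
    frame: Frame = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        fields = re.split(r"[,\t; ]+", line)
        if len(fields) != 2:
            raise ValueError(f"expected two columns, got {line!r}")
        chrom, length = fields
        if not length.isdigit():
            if frame:
                raise ValueError(f"non-numeric length in {line!r}")
            continue  # header row before any data row
        frame.append((chrom, int(length)))
    return frame


def offsets(frame: Frame) -> Dict[str, Tuple[int, int]]:
    """Map each chromosome to (start offset, length) after concatenation."""
    layout = {}
    start = 0
    for chrom, length in frame:
        layout[chrom] = (start, length)
        start += length
    return layout


def place(layout: Dict[str, Tuple[int, int]], chrom: str, pos: int) -> int:
    """Convert a 1-based within-chromosome position to an axis coordinate."""
    start, length = layout[chrom]
    if not 1 <= pos <= length:
        raise ValueError(f"position {pos} is outside {chrom} (length {length})")
    return start + pos


def two_genome_points(x_frame: Frame, y_frame: Frame,
                      pairs: List[Tuple[str, int, str, int]]) -> List[Tuple[int, int]]:
    """Place each (x chrom, x pos, y chrom, y pos) row on the two axes."""
    x_layout, y_layout = offsets(x_frame), offsets(y_frame)
    return [(place(x_layout, xc, xp), place(y_layout, yc, yp))
            for xc, xp, yc, yp in pairs]


if __name__ == "__main__":
    x_frame = read_frame("chr,size\nchr1,100\nchr2,50")
    y_frame = read_frame("chrA\t80\nchrB\t20")
    rows = [("chr1", 30, "chrA", 40), ("chr2", 10, "chrB", 5)]
    print(two_genome_points(x_frame, y_frame, rows))  # [(30, 40), (110, 85)]
```

The start offsets returned by `offsets` also give the positions at which separator lines between chromosomes would be drawn on each axis.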
Figure 2

A two-genome plot created using shinyChromosome

The X axis shows the physical position of sQTL along 12 chromosomes of the rice genome. The Y axis shows the physical position of sRNAs along 12 chromosomes of the rice genome. Different chromosomes are separated by vertical and horizontal black lines. Point color represents the LOD values of the QTL. QTL, quantitative trait loci; sRNA, small RNA; sQTL, QTL regulating the expression of sRNA; LOD, logarithm of odds ratio.

Diverse widgets can be used to customize the appearance of the generated plots, including the main plot color and color transparency, point symbol and size, width and type of different lines, shading colors used to fill the areas under lines, and border colors of bars, rectangles, and heatmaps. The titles and tick labels of both axes can also be edited easily. In addition, a legend can be added on the right or at the bottom of the plot generated for each dataset. The height and width of the created plot can also be modified. Furthermore, 18 different themes are provided for the generated plots. A theme is a set of predefined figure options that allows changing the overall appearance of a plot with a single command. Moreover, R scripts to reproduce plots created by shinyChromosome are provided to users for additional modifications, and these scripts can also be integrated with other scripts for further downstream analysis.
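The notion of a theme as a bundle of predefined options applied in one step can be illustrated with a small Python sketch. The option names and theme names below are hypothetical and do not correspond to the 18 themes offered by shinyChromosome or to any ggplot2 theme.

```python
from typing import Any, Dict

# Hypothetical figure options, chosen only for illustration.
DEFAULT_OPTIONS: Dict[str, Any] = {
    "background": "white",
    "grid": True,
    "font_size": 11,
    "axis_color": "black",
}

# Each hypothetical theme overrides a subset of the figure options.
THEMES: Dict[str, Dict[str, Any]] = {
    "minimal": {"grid": False},
    "dark": {"background": "black", "axis_color": "white"},
}


def apply_theme(options: Dict[str, Any], name: str) -> Dict[str, Any]:
    """Return a copy of the options with the named theme's values applied."""
    if name not in THEMES:
        raise KeyError(f"unknown theme: {name}")
    return {**options, **THEMES[name]}


if __name__ == "__main__":
    styled = apply_theme(DEFAULT_OPTIONS, "dark")
    print(styled["background"], styled["grid"])  # black True
```

A single call changes several options at once while leaving the options the theme does not mention untouched, which is the behavior the paragraph above attributes to themes.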

Discussion

shinyChromosome is a user-friendly GUI that lets users with limited programming experience interactively create non-circular plots of whole genomes. The design philosophy of shinyChromosome is similar to that of karyoploteR: all chromosomes are aligned along an axis, and other datasets are then added. karyoploteR is implemented using the R base graphics system, whereas shinyChromosome is implemented in R using the ggplot2 system. Unlike karyoploteR, shinyChromosome permits the creation of two-genome plots, as shown in Figure 2. No more than 10 datasets can be input into shinyChromosome, which is its major limitation at present. Nevertheless, we believe that 10 input datasets are adequate to create a non-circular plot for most current studies. Moreover, karyoploteR is provided as an R package, while shinyChromosome is provided as a GUI. As a result, karyoploteR is intended for users with significant R coding experience, while shinyChromosome caters for users without any coding experience. To further extend the application of shinyChromosome, we built an R package named shinyChromosomeR (https://github.com/venyao/shinyChromosomeR), utilizing the core scripts of shinyChromosome. Users with significant R coding experience can choose to use the shinyChromosomeR package to create non-circular whole genome diagrams with more than 10 input datasets. Sixty-five example figures generated by shinyChromosome are provided in the “Gallery” menu. These figures demonstrate the functionalities and range of usage of shinyChromosome. The input data files used to create each example figure are provided, with file names indicating the track index and the plot type of each input file. shinyChromosome can be used to rapidly create non-circular whole genome diagrams from scratch with default parameters and randomly assigned colors. Moreover, with the various widgets provided, publication-quality figures can be readily created by shinyChromosome.
shinyChromosome can be used online at http://150.109.59.144:3838/shinyChromosome/, http://shinychromosome.ncpgr.cn/, and https://yimingyu.shinyapps.io/shinyChromosome/ without installation. Users can also install and run shinyChromosome on their own computers without uploading data to online servers. Advanced users can also deploy shinyChromosome on local or public web servers to provide online use to other users.

Availability

The source code of shinyChromosome and example datasets are available at https://github.com/venyao/shinyChromosome. The dataset used to create Figure 2 was from Supplementary file 7 of our previous study [9] and is available in Dryad at https://doi.org/10.5061/dryad.9d030.

Authors’ contributions

WY conceived the project. YY and WY developed the software with help from YW and FH. WY wrote the manuscript with contributions from YY, YW, and FH. All authors read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.
  12 in total

1.  The human genome browser at UCSC.

Authors:  W James Kent; Charles W Sugnet; Terrence S Furey; Krishna M Roskin; Tom H Pringle; Alan M Zahler; David Haussler
Journal:  Genome Res       Date:  2002-06       Impact factor: 9.043

2.  BioCircos.js: an interactive Circos JavaScript library for biological data visualization on web applications.

Authors:  Ya Cui; Xiaowei Chen; Huaxia Luo; Zhen Fan; Jianjun Luo; Shunmin He; Haiyan Yue; Peng Zhang; Runsheng Chen
Journal:  Bioinformatics       Date:  2016-01-27       Impact factor: 6.937

3.  Circos: an information aesthetic for comparative genomics.

Authors:  Martin Krzywinski; Jacqueline Schein; Inanç Birol; Joseph Connors; Randy Gascoyne; Doug Horsman; Steven J Jones; Marco A Marra
Journal:  Genome Res       Date:  2009-06-18       Impact factor: 9.043

4.  circlize Implements and enhances circular visualization in R.

Authors:  Zuguang Gu; Lei Gu; Roland Eils; Matthias Schlesner; Benedikt Brors
Journal:  Bioinformatics       Date:  2014-06-14       Impact factor: 6.937

5.  shinyCircos: an R/Shiny application for interactive creation of Circos plot.

Authors:  Yiming Yu; Yidan Ouyang; Wen Yao
Journal:  Bioinformatics       Date:  2018-04-01       Impact factor: 6.937

6.  Extensive sequence divergence between the reference genomes of two elite indica rice varieties Zhenshan 97 and Minghui 63.

Authors:  Jianwei Zhang; Ling-Ling Chen; Feng Xing; David A Kudrna; Wen Yao; Dario Copetti; Ting Mu; Weiming Li; Jia-Ming Song; Weibo Xie; Seunghee Lee; Jayson Talag; Lin Shao; Yue An; Chun-Liu Zhang; Yidan Ouyang; Shuai Sun; Wen-Biao Jiao; Fang Lv; Bogu Du; Meizhong Luo; Carlos Ernesto Maldonado; Jose Luis Goicoechea; Lizhong Xiong; Changyin Wu; Yongzhong Xing; Dao-Xiu Zhou; Sibin Yu; Yu Zhao; Gongwei Wang; Yeisoo Yu; Yijie Luo; Zhi-Wei Zhou; Beatriz Elena Padilla Hurtado; Ann Danowitz; Rod A Wing; Qifa Zhang
Journal:  Proc Natl Acad Sci U S A       Date:  2016-08-17       Impact factor: 11.205

7.  Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration.

Authors:  Helga Thorvaldsdóttir; James T Robinson; Jill P Mesirov
Journal:  Brief Bioinform       Date:  2012-04-19       Impact factor: 11.622

8.  Genetic basis of sRNA quantitative variation analyzed using an experimental population derived from an elite rice hybrid.

Authors:  Jia Wang; Wen Yao; Dan Zhu; Weibo Xie; Qifa Zhang
Journal:  Elife       Date:  2015-03-30       Impact factor: 8.140

Review 9.  Software tools for visualizing Hi-C data.

Authors:  Galip Gürkan Yardımcı; William Stafford Noble
Journal:  Genome Biol       Date:  2017-02-03       Impact factor: 13.583

10.  karyoploteR: an R/Bioconductor package to plot customizable genomes displaying arbitrary data.

Authors:  Bernat Gel; Eduard Serra
Journal:  Bioinformatics       Date:  2017-10-01       Impact factor: 6.937

