
QCanvas: An Advanced Tool for Data Clustering and Visualization of Genomics Data.

Nayoung Kim, Herin Park, Ningning He, Hyeon Young Lee, Sukjoon Yoon.

Abstract

We developed a user-friendly, interactive program to simultaneously cluster and visualize omics data, such as DNA and protein array profiles. This program provides diverse algorithms for the hierarchical clustering of two-dimensional data. The clustering results can be interactively visualized and optimized on a heatmap. The tool does not require any prior knowledge of scripting languages to carry out data clustering and visualization. Furthermore, the heatmaps allow the selective display of data points satisfying user-defined criteria. For example, a clustered heatmap of experimental values can be differentially visualized based on statistical values, such as p-values. With its diverse menu-based display options, QCanvas provides a convenient graphical user interface for pattern analysis and visualization with high-quality graphics.


Keywords:  data clustering; genomics; heatmap visualization; microarray analysis; pattern recognition

Year:  2012        PMID: 23346040      PMCID: PMC3543928          DOI: 10.5808/GI.2012.10.4.263

Source DB:  PubMed          Journal:  Genomics Inform        ISSN: 1598-866X


Introduction

Genomics and proteomics data are typically analyzed by hierarchical clustering, followed by visualization with heatmaps [1-3]. Various algorithms have been implemented in the data clustering procedure [4]. The visualization of clustered data includes tree-based hierarchical clustering patterns and heatmaps of experimental values [5]. Carrying out clustering and visualization together on a single platform makes it convenient to choose an appropriate clustering algorithm and to find patterns in the resulting heatmaps. Previously, bioinformaticians used programmable tools, such as R and MATLAB, and commercial data-mining packages to analyze their data. A simple, integrated program allows experimental scientists to intuitively identify meaningful patterns in a large dataset without knowledge of scripting languages or statistical theory. Herein, we introduce a user-friendly tool, QCanvas, which integrates diverse clustering algorithms and an interactive heatmap display interface (Fig. 1). This program directly imports raw experimental data in a matrix format and displays these data in a heatmap. Various clustering methods can be applied to two-dimensional data, with the real-time generation of clustered heatmaps. Furthermore, subsets of heatmap data can be selectively displayed, based on user-defined filters. QCanvas is an easy-to-use and powerful tool for fast data analysis and interpretation by bench scientists. Without any knowledge of scripting languages and without any graphics-editing software, one can generate and customize tree-clustered heatmaps with high-quality graphics.
Fig. 1

Graphical user interface of QCanvas. Data retrieving, processing, and visualization can be carried out through interactive, user-friendly menus. QCanvas provides additional tools for the optimization of size, color, and shape of trees and heatmaps.

QCanvas: Implementation and Functions

Data clustering

QCanvas provides a total of eight popular measures for generating the similarity matrix, i.e., Correlation uncenter, Correlation center, Absolute corr-uncenter, Absolute corr-center, Spearman rank, Kendall's tau, Euclidean distance, and City-block distance. All of these measures have typically been included among the data clustering methods of previous tools [4]. In QCanvas, the similarity matrix is calculated independently for the x-axis and the y-axis of the data, so a different measure can be selected for each axis. Hierarchical clustering is then carried out on both axes based on the established similarity matrices. QCanvas provides diverse algorithms for hierarchical clustering, such as the average method, centroid method, single method, and complete method. QCanvas uses a standard window-based graphical user interface (GUI), providing multiple windows to comparatively visualize patterns of various combinations of similarity matrices and hierarchical clustering methods. This program provides quantitative trees for displaying clustering patterns and similarity measures together.
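
The two-step procedure described above (a similarity matrix per axis, then hierarchical clustering on each axis) can be illustrated with a minimal Python sketch. This is not the QCanvas implementation, which is a GUI program whose source is not described here. The function names (`pearson_distance`, `euclidean_distance`, `hierarchical_order`, `cluster_matrix`) and the toy matrix are hypothetical, only two of the eight measures are shown, and the centroid method is omitted for brevity.

```python
import math
from itertools import combinations


def pearson_distance(u, v):
    """1 - centered Pearson correlation (assumes neither vector is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return 1.0 - cov / (su * sv)


def euclidean_distance(u, v):
    """Straight-line distance between two profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# How the distance between two clusters is derived from pairwise distances.
LINKAGE = {
    "single": min,                            # closest pair
    "complete": max,                          # farthest pair
    "average": lambda ds: sum(ds) / len(ds),  # mean over all pairs
}


def hierarchical_order(vectors, metric, method="average"):
    """Agglomerative clustering; returns the leaf order of the final tree."""
    link = LINKAGE[method]
    n = len(vectors)
    # Pairwise distance matrix, stored for index pairs (i, j) with i < j.
    dist = {(i, j): metric(vectors[i], vectors[j])
            for i, j in combinations(range(n), 2)}

    def cluster_distance(a, b):
        return link([dist[min(i, j), max(i, j)] for i in a for j in b])

    clusters = [[i] for i in range(n)]  # each cluster lists original indices
    while len(clusters) > 1:
        # Merge the two closest clusters under the chosen linkage.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: cluster_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters[0]


def cluster_matrix(matrix, row_metric, col_metric, method="average"):
    """Cluster rows and columns independently, then reorder the matrix."""
    columns = [list(col) for col in zip(*matrix)]
    row_order = hierarchical_order(matrix, row_metric, method)
    col_order = hierarchical_order(columns, col_metric, method)
    reordered = [[matrix[r][c] for c in col_order] for r in row_order]
    return row_order, col_order, reordered


# Toy data (hypothetical): 4 genes (rows) x 3 samples (columns).
toy = [
    [1.0, 2.0, 3.0],
    [-1.0, -2.0, -3.0],
    [1.1, 2.1, 3.2],
    [-0.9, -2.2, -2.9],
]
row_order, col_order, reordered = cluster_matrix(
    toy, row_metric=pearson_distance, col_metric=euclidean_distance
)
print(row_order, col_order)
```

In this sketch the rows use a correlation-based distance while the columns use Euclidean distance, mirroring the independent choice of measure per axis. Changing `method` to `"single"` or `"complete"` changes only how the distance between clusters is computed.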

Heatmap optimization for pattern recognition

QCanvas software recognizes text-based data in a matrix format. For demonstration purposes, a small microarray gene expression dataset is included in the software package and can be downloaded from the website (http://compbio.sookmyung.ac.kr/~qcanvas). Once the input data are imported into the QCanvas window, a heatmap of the non-clustered data is displayed (Fig. 2A). The user can easily test various data-clustering and tree-building methods on the raw data and interactively select appropriate heatmaps with tree structures (Fig. 2B). The GUI provides various menu-based options to optimize the display of heatmaps, trees, and annotations. The colors, locations, and sizes of the trees and the annotations can be customized in a flexible manner. The scale and color scheme of the heatmaps can also be adjusted in an interactive window. The node colors can be customized for positive, negative, missing, or zero values. The color contrast between nodes can also be interactively adjusted. The overall vertical or horizontal size of each figure component can be customized, and the figure can be saved in PostScript format for high image quality.
Fig. 2

Illustration of data clustering and filtering by QCanvas. The microarray data for 20 selected genes in five cancer samples were used for this demonstration. Red represents up-regulated gene expression, and green represents down-regulated gene expression. The scale is log2 (fold-change). (A) Input matrix data are visualized using a heatmap. (B) The clustering of both cancer samples and genes is interactively carried out using the graphical user interface. (C) The selective display of genes with high or low expression in cancer samples (2-fold changes). (D) The selective display of genes with significant changes (p < 0.01).

Data filtering for the selection of major markers

Heatmaps that are based on data clustering display the overall profiles of the experimental values for the given samples. QCanvas provides a data-filtering option to selectively display data nodes satisfying a given threshold. In the example shown in Fig. 2C, data points with a 2-fold change (increase or decrease) in gene expression are selectively displayed. In many cases, a dataset includes experimental values and statistical confidence levels together. The option for data filtering in QCanvas is useful for analyzing patterns in the experimental data that are statistically significant. One can filter the heatmap profiles using statistical confidence data that are included in a separate file. In the example shown in Fig. 2D, the gene expression data are filtered based on the p-values for the fold-change. QCanvas can import two separate files together for simultaneous data clustering and filtering. The GUI menu for data filtering enables the pattern analysis to be performed easily, without the need for manual data processing or the use of scripting languages.
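
The two filters illustrated in Fig. 2C and 2D can be sketched in a few lines of Python. This is a minimal illustration rather than QCanvas code: the function `filter_heatmap`, its parameters, and the example values are hypothetical. It assumes the heatmap values are log2 fold-changes (as in Fig. 2), so that a 2-fold change corresponds to an absolute value of at least 1, and that the p-values come from a second matrix of the same shape. Whether QCanvas treats values exactly at the threshold as passing is not stated in the text.

```python
import math


def filter_heatmap(values, p_values=None, min_abs_log2=None, max_p=None):
    """Return a copy of `values` in which cells failing a filter become None.

    values       : matrix (list of rows) of log2 fold-changes
    p_values     : optional matrix of the same shape holding p-values
    min_abs_log2 : keep a cell only if |value| >= this threshold
    max_p        : keep a cell only if its p-value < this threshold
    """
    if max_p is not None and p_values is None:
        raise ValueError("max_p requires a p_values matrix")
    filtered = []
    for i, row in enumerate(values):
        out_row = []
        for j, v in enumerate(row):
            keep = True
            if min_abs_log2 is not None and abs(v) < min_abs_log2:
                keep = False
            if max_p is not None and p_values[i][j] >= max_p:
                keep = False
            out_row.append(v if keep else None)  # None marks a hidden node
        filtered.append(out_row)
    return filtered


# Hypothetical example: 2 genes x 3 samples.
log2_fc = [[1.5, -0.3, 2.0],
           [-1.2, 0.8, -2.5]]
p_vals = [[0.001, 0.5, 0.02],
          [0.005, 0.2, 0.0001]]

# Fold-change filter (as in Fig. 2C): 2-fold up or down, i.e. |log2 FC| >= 1.
by_fold = filter_heatmap(log2_fc, min_abs_log2=math.log2(2))

# Significance filter (as in Fig. 2D): p < 0.01.
by_p = filter_heatmap(log2_fc, p_values=p_vals, max_p=0.01)

print(by_fold)
print(by_p)
```

Hidden cells are represented by `None` so that the matrix keeps its shape and the clustered row and column order is unaffected by filtering.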

Conclusion

This report introduces QCanvas, a program that provides a convenient and powerful interface for the pattern analysis of large-scale omics data. This program enables the user to conduct data clustering, data filtering, and graphics editing simultaneously on an integrated platform. These steps are typically performed on omics data, such as DNA (or protein) microarray data. All essential functionalities were integrated into the user-friendly interface of QCanvas. The simple and intuitive nature of this tool meets the practical needs of research scientists working on omics data who do not have expertise in bioinformatics approaches. The program is freely available with demo data and a step-by-step tutorial through the website (http://compbio.sookmyung.ac.kr/~qcanvas).
References

1.  Java Treeview--extensible visualization of microarray data.

Authors:  Alok J Saldanha
Journal:  Bioinformatics       Date:  2004-06-04       Impact factor: 6.937

2.  Systematic analysis of genotype-specific drug responses in cancer.

Authors:  Nayoung Kim; Ningning He; Changsik Kim; Fan Zhang; Yiling Lu; Qinghua Yu; Katherine Stemke-Hale; Joel Greshock; Richard Wooster; Sukjoon Yoon; Gordon B Mills
Journal:  Int J Cancer       Date:  2012-03-29       Impact factor: 7.396

3.  Oncogenic pathway signatures in human cancers as a guide to targeted therapies.

Authors:  Andrea H Bild; Guang Yao; Jeffrey T Chang; Quanli Wang; Anil Potti; Dawn Chasse; Mary-Beth Joshi; David Harpole; Johnathan M Lancaster; Andrew Berchuck; John A Olson; Jeffrey R Marks; Holly K Dressman; Mike West; Joseph R Nevins
Journal:  Nature       Date:  2005-11-06       Impact factor: 49.962

4.  Molecular classification of prostate cancer using curated expression signatures.

Authors:  Elke K Markert; Hideaki Mizuno; Alexei Vazquez; Arnold J Levine
Journal:  Proc Natl Acad Sci U S A       Date:  2011-11-28       Impact factor: 11.205

5.  Cluster analysis and display of genome-wide expression patterns.

Authors:  M B Eisen; P T Spellman; P O Brown; D Botstein
Journal:  Proc Natl Acad Sci U S A       Date:  1998-12-08       Impact factor: 11.205
