Chromatic: WebAssembly-Based Cancer Genome Viewer.

Richard Finney, Daoud Meerzaman.

Abstract

Chromatic is a novel web-browser tool that enables researchers to visually inspect genomic variations identified through next-generation sequencing of cancer data sets to determine whether such calls are, in fact, valid. It is the first cancer bioinformatics tool developed using WebAssembly technology, which comprises a portable, low-level byte code format that provides for the rapid execution of programs within supported web browsers. It has been designed expressly for ease of use by scientists without extensive expertise in bioinformatics.

Keywords:  WebAssembly; cancer; mutation; viewer

Year:  2018        PMID: 29881254      PMCID: PMC5987889          DOI: 10.1177/1176935118771972

Source DB:  PubMed          Journal:  Cancer Inform        ISSN: 1176-9351


Introduction

High-throughput next-generation sequencing (NGS) technologies allow researchers to identify millions of genomic variations, but they have important weaknesses that can undermine the validity of a study’s results. Automated filtering removes many erroneous variants, yet many calls remain problematic, and visual inspection is still the best way to distinguish real variants from artifacts, which may include false positives and false negatives. Several visualization tools are available, but they are generally time-consuming and difficult to use, especially for scientists who lack expertise in bioinformatics tools. Here, we report the development and release of Chromatic, a tool that lets users inspect variants simply and quickly in a web browser.
Chromatic is written in portable C and compiled for the browser with Emscripten. Emscripten uses a C compiler to create LLVM[1] (low-level virtual machine) byte code, which is ultimately translated into JavaScript or WebAssembly; no executable “EXE” file or other native binary is created. WebAssembly,[2] a byte code that executes in supported browsers without browser add-ons, is a new technology that is much faster than JavaScript, although not as fast as native binary executables. Current implementations of WebAssembly cannot manipulate the browser’s Document Object Model (DOM) directly and rely on JavaScript to update the screen. Because the browser is the target, code originally written for native platforms can be ported without the added difficulty of adapting to different operating systems. Installation is as simple as loading a webpage, and security is the same as for any other webpage running within the browser’s sandboxed environment.

Features and Methods

Chromatic is designed to let users review variant calls, including mutations and small insertions or deletions (indels). It provides access to public cancer data sets via a webserver at the National Cancer Institute (NCI). The initial release provides access to 4 publicly available cancer projects: various samples from the Texas Cancer Research Biobank (TCRB),[3] whole genome data for liver samples from the Korean Genome Research Foundation,[4] whole genome liver samples from Beijing Genomics Institute (BGI-Shenzhen),[5,6] and esophageal squamous cell carcinoma exome data from Fudan University.[7] The TCRB data were downloaded from the TCRB website. The other 3 data sets were downloaded from the public section of the National Center for Biotechnology Information’s Sequence Read Archive (SRA). All data were processed using novoalign (http://novocraft.com) on the National Institutes of Health High Performance Computing (NIH HPC) Biowulf cluster (http://hpc.nih.gov). Chromatic also provides access to protected data from The Cancer Genome Atlas (TCGA) after users acquire a secure token from the NCI’s Genomic Data Commons (GDC).[7]
Users operate the program through simple menus and dashboard interfaces to view images of genomic regions and navigate through tens of thousands of cancer samples. A batch-mode feature supports scripting of long-running operations and creates a slideshow that can be saved to the user’s local storage. Batch-mode output is stored in tape archive (TAR) files, which can be extracted and viewed in a web browser. The resulting slideshow HTML file and PNG image files help in reviewing a large number of genomic locations, substantially reducing the workload involved in examining many variant calls. Chromatic source code is available on the website, https://chromatic.nci.nih.gov/chromatic.html.
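The batch-mode output described above, a TAR archive holding a slideshow HTML file and PNG images, could be assembled roughly as follows. This is a minimal Python sketch for illustration only, not Chromatic's C implementation; the function names, the file names (`slideshow.html`, `slide0001.png`), and the HTML layout are all assumptions.

```python
import html
import io
import tarfile


def build_slideshow_tar(slides):
    """Pack a slideshow into an in-memory TAR archive.

    `slides` is a list of (label, png_bytes) pairs. The member names used
    here are hypothetical, not Chromatic's actual naming scheme.
    """
    html_lines = ["<html><body>"]
    members = []
    for i, (label, png_bytes) in enumerate(slides, start=1):
        name = f"slide{i:04d}.png"
        members.append((name, png_bytes))
        # Escape the label so it cannot inject markup into the page.
        html_lines.append(f'<h3>{html.escape(label)}</h3><img src="{name}">')
    html_lines.append("</body></html>")
    page = "\n".join(html_lines).encode("utf-8")
    members.insert(0, ("slideshow.html", page))

    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, data in members:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)  # TAR headers must record the exact size
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def list_tar_members(tar_bytes):
    """Return the member names of a TAR archive in stored order."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as archive:
        return archive.getnames()
```

After extracting such an archive, opening the HTML file in a browser would show each image in sequence, because the `img` tags reference the PNG files by relative name.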
Chromatic reuses some of the image processing and binary sequence alignment map (BAM) processing code from the Alview BAM file viewing project.[8] BAM files are read by a custom implementation written from the BAM file specification[9] and refined to reproduce the quirks of the samtools library. Open, lightweight code for PNG image processing, ZLIB[10] compression, and TAR archiving was taken from various sources and is compiled and linked as regular C files rather than as linked libraries. Integrating these third-party files directly simplifies compilation and makes the code easier to port to other systems. Most of the code created for Chromatic, including the core functionality, is in the public domain. The third-party C source files for TAR archives, gzip decompression, and PNG image manipulation all carry permissive Massachusetts Institute of Technology (MIT)-style licenses (ie, no restrictions other than maintaining attributions), so developers may modify the code and use it as they wish.
Chromatic communicates with 2 support programs on the host server from which it is downloaded. One program, called “slicer,” is a simple proxy that obtains a subset of the BAM file from the GDC website (https://gdc.cancer.gov) or from server local storage. The second support program, called “srvdna,” is a common gateway interface (CGI) program[11] that provides selected portions of the human genome reference sequence, sparing researchers the large amount of time it would take to download the entire sequence. The source code of “slicer” and “srvdna” is provided as part of the base source release of Chromatic for those who want to customize and host their own instance. Limiting server operations to simple proxy duty and serving up sequence data simplifies the role of the server.
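To illustrate what reading a BAM file from the specification involves, the sketch below parses only the BAM header: the magic bytes, the header text, and the reference sequence dictionary. It is an assumption-laden Python illustration, not Chromatic's C reader. The function names are hypothetical, the whole input is decompressed at once (reasonable only for small inputs), and the helper that builds test data writes plain gzip rather than true BGZF blocks.

```python
import gzip
import struct


def parse_bam_header(bam_bytes):
    """Return (header_text, [(ref_name, ref_length), ...]) from BAM bytes."""
    # BGZF is a series of concatenated gzip members, so gzip can decode it.
    data = gzip.decompress(bam_bytes)
    if data[:4] != b"BAM\x01":
        raise ValueError("not a BAM file: bad magic bytes")
    offset = 4
    (l_text,) = struct.unpack_from("<I", data, offset)  # header text length
    offset += 4
    text = data[offset:offset + l_text].decode("ascii")
    offset += l_text
    (n_ref,) = struct.unpack_from("<I", data, offset)  # number of references
    offset += 4
    refs = []
    for _ in range(n_ref):
        (l_name,) = struct.unpack_from("<I", data, offset)
        offset += 4
        # l_name counts the trailing NUL, which is dropped here.
        name = data[offset:offset + l_name - 1].decode("ascii")
        offset += l_name
        (l_ref,) = struct.unpack_from("<I", data, offset)
        offset += 4
        refs.append((name, l_ref))
    return text, refs


def make_bam_header(text, refs):
    """Build a header-only BAM payload for testing (plain gzip, not BGZF)."""
    body = b"BAM\x01" + struct.pack("<I", len(text)) + text.encode("ascii")
    body += struct.pack("<I", len(refs))
    for name, length in refs:
        raw = name.encode("ascii") + b"\x00"
        body += struct.pack("<I", len(raw)) + raw + struct.pack("<I", length)
    return gzip.compress(body)
```

All integers in BAM are little-endian, which is why every `struct` format string starts with `<`. A full reader would go on to decode alignment records block by block instead of decompressing the file in one call.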
Because the BAM file and sequence are processed within the user’s browser, the burdens of complex session management and data processing on the server are eliminated. Moving the work to the browser also significantly simplifies security on the server, which only needs to serve data and proxy requests for short read data. A disadvantage of Chromatic is that it generates more network traffic than a server program that renders the images and serves them up would. Also, a true native binary version would run faster than the WebAssembly version.
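The narrow server role described above, in which “srvdna” returns only a requested portion of the reference sequence, might look like the following Python sketch. The on-disk layout (bases concatenated without line breaks), the offset table, and the function name are assumptions made for illustration; the source does not describe how “srvdna” stores or indexes the genome.

```python
import io


def fetch_reference_slice(handle, chrom_offsets, chrom, start, end):
    """Return bases start..end (1-based, inclusive) of one chromosome.

    `handle` is a seekable binary file of concatenated bases with no line
    breaks. `chrom_offsets` maps a chromosome name to a tuple of
    (byte_offset, length). Both are hypothetical structures.
    """
    if chrom not in chrom_offsets:
        raise KeyError(f"unknown chromosome: {chrom}")
    if start < 1 or end < start:
        raise ValueError("invalid range")
    offset, length = chrom_offsets[chrom]
    if start > length:
        return ""
    end = min(end, length)  # clip requests that run past the chromosome end
    handle.seek(offset + start - 1)
    # Only the requested bases are read, never the whole genome.
    return handle.read(end - start + 1).decode("ascii")
```

Seeking to a computed offset keeps each request cheap regardless of genome size, which fits the goal of limiting the server to simple data-serving duty.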

Summary

Chromatic is an easy-to-use web-browser tool that provides fast, near-native central processing unit (CPU) performance. WebAssembly bypasses the often onerous difficulties of obtaining and setting up a native binary executable. Program invocation is reduced to opening a webpage. The program provides easy access to short read data regardless of which operating system a given user runs. Developers may reengineer and distribute Chromatic as long as they maintain the attribution notices present in portions of the source code.

1.  Genome-wide survey of recurrent HBV integration in hepatocellular carcinoma.

Authors:  Wing-Kin Sung; Hancheng Zheng; Shuyu Li; Ronghua Chen; Xiao Liu; Yingrui Li; Nikki P Lee; Wah H Lee; Pramila N Ariyaratne; Chandana Tennakoon; Fabianus H Mulawadi; Kwong F Wong; Angela M Liu; Ronnie T Poon; Sheung Tat Fan; Kwong L Chan; Zhuolin Gong; Yujie Hu; Zhao Lin; Guan Wang; Qinghui Zhang; Thomas D Barber; Wen-Chi Chou; Amit Aggarwal; Ke Hao; Wei Zhou; Chunsheng Zhang; James Hardwick; Carolyn Buser; Jiangchun Xu; Zhengyan Kan; Hongyue Dai; Mao Mao; Christoph Reinhard; Jun Wang; John M Luk
Journal:  Nat Genet       Date:  2012-05-27       Impact factor: 38.330

2.  The Sequence Alignment/Map format and SAMtools.

Authors:  Heng Li; Bob Handsaker; Alec Wysoker; Tim Fennell; Jue Ruan; Nils Homer; Gabor Marth; Goncalo Abecasis; Richard Durbin
Journal:  Bioinformatics       Date:  2009-06-08       Impact factor: 6.937

3.  Toward a Shared Vision for Cancer Genomic Data.

Authors:  Robert L Grossman; Allison P Heath; Vincent Ferretti; Harold E Varmus; Douglas R Lowy; Warren A Kibbe; Louis M Staudt
Journal:  N Engl J Med       Date:  2016-09-22       Impact factor: 91.245

4.  SLC15A2 genomic variation is associated with the extraordinary response of sorafenib treatment: whole-genome analysis in patients with hepatocellular carcinoma.

Authors:  Yeon-Su Lee; Bo Hyun Kim; Byung Chul Kim; Aesun Shin; Jin Sook Kim; Seung-Hyun Hong; Jung-Ah Hwang; Jung Ahn Lee; Seungyoon Nam; Sung Hoon Lee; Jong Bhak; Joong-Won Park
Journal:  Oncotarget       Date:  2015-06-30

5.  Comparative genomic analysis of esophageal squamous cell carcinoma between Asian and Caucasian patient populations.

Authors:  Jiaying Deng; Hu Chen; Daizhan Zhou; Junhua Zhang; Yun Chen; Qi Liu; Dashan Ai; Hanting Zhu; Li Chu; Wenjia Ren; Xiaofei Zhang; Yi Xia; Menghong Sun; Huiwen Zhang; Jun Li; Xinxin Peng; Liang Li; Leng Han; Hui Lin; Xiujun Cai; Jiaqing Xiang; Shufeng Chen; Yihua Sun; Yawei Zhang; Jie Zhang; Haiquan Chen; Shijian Zhang; Yi Zhao; Yun Liu; Han Liang; Kuaile Zhao
Journal:  Nat Commun       Date:  2017-11-16       Impact factor: 14.919

6.  Alview: Portable Software for Viewing Sequence Reads in BAM Formatted Files.

Authors:  Richard P Finney; Qing-Rong Chen; Cu V Nguyen; Chih Hao Hsu; Chunhua Yan; Ying Hu; Massih Abawi; Xiaopeng Bian; Daoud M Meerzaman
Journal:  Cancer Inform       Date:  2015-09-13

7.  An open access pilot freely sharing cancer genomic data from participants in Texas.

Authors:  Lauren B Becnel; Stacey Pereira; Jennifer A Drummond; Marie-Claude Gingras; Kyle R Covington; Christie L Kovar; Harsha Vardhan Doddapaneni; Jianhong Hu; Donna Muzny; Amy L McGuire; David A Wheeler; Richard A Gibbs
Journal:  Sci Data       Date:  2016-02-16       Impact factor: 6.444


