
TASUKE: a web-based visualization program for large-scale resequencing data.

Masahiko Kumagai, Jungsok Kim, Ryutaro Itoh, Takeshi Itoh.

Abstract

SUMMARY: Because an enormous amount of sequence data is being collected, a method to effectively display sequence variation information is urgently needed. TASUKE is a web application that visualizes large-scale resequencing data generated by next-generation sequencing technologies and is suitable for rapid data release to the public on the web. The variation and read depths of multiple genomes, as well as annotations, can be shown simultaneously at various scales. We demonstrate the use of TASUKE by applying it to 50 rice and 100 human genome resequencing datasets.
AVAILABILITY AND IMPLEMENTATION: The TASUKE program package and user manual are available from http://tasuke.dna.affrc.go.jp/.
CONTACT: taitoh@affrc.go.jp.

Year: 2013 PMID: 23749962 PMCID: PMC3702261 DOI: 10.1093/bioinformatics/btt295

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 INTRODUCTION

Recent advances in next-generation sequencing (NGS) technologies have allowed the rapid production of a tremendous amount of genomic sequence data at a low cost. This has naturally led to the resequencing of hundreds or thousands of genomes, as in the 1000 Genomes Project in humans (http://www.1000genomes.org/) and the 1001 Genomes Project in Arabidopsis (http://www.1001genomes.org/). A method for comparing dozens of genomes in an effective manner is, therefore, urgently needed. Although a few stand-alone programs for comparative genome visualization have been developed (Fiume et al.; Preston et al.; Thorvaldsdóttir et al.), to our knowledge, there is no web-based application that can handle dozens or more resequencing datasets of large genomes from higher eukaryotes. The basic requirements of a visualization program for genome-wide resequencing data are as follows. First, a large amount of data obtained from tens or hundreds of samples of a species with a genome of >100 Mb needs to be displayed smoothly. An overview of NGS read mapping results needs to be shown so that users can grasp the read coverage of a genome at the hundred- to million-base scale at a glance. Second, the storage and memory resources used by the data browser should be small enough to be handled by an average computer server. It is not realistic to load an enormous amount of mapped read data from individual samples one by one with a stand-alone program on a PC. Therefore, a client-server system, in which an efficient program runs on the server, is preferred. Third, it should be possible to share the data with collaborators or the public. In general, data from resequencing studies that are published as figures and tables in an article are not sufficient to reproduce a study’s results, whereas raw short reads registered in sequence read archive databases and the resultant single-nucleotide polymorphism (SNP) data are informative.
For experimental researchers who seek polymorphisms at the genome-wide level, a browser that can effectively handle hundreds of genomes and display a large number of polymorphic sites is needed. Here, we present TASUKE, a web application for the visualization of large-scale resequencing data obtained from at least 100 genomes. This application allows users to rapidly release their own data on the web. Variant frequencies, read coverage and gene annotation information are shown simultaneously at various scales. TASUKE uses a window analysis so that users can get a bird’s-eye view of the SNP density.

2 FUNCTIONS AND APPLICATION

For ease of use, TASUKE was designed as a web application implemented in HTML5. In this way, researchers can easily share data via general web browsers using a graphical interface. The input files required are as follows: a reference genome in FASTA format, Variant Call Format (VCF) files (Danecek et al.) and depth files created by the ‘depth’ command of SAMtools (Li et al.). Annotation files in General Feature Format (GFF, http://www.sanger.ac.uk/resources/software/gff/) are optional. A MySQL database is also required for the website’s backend data management. TASUKE helps bioinformatics researchers in genome-wide resequencing projects to visualize a large number of polymorphisms on multiple genomes and to release the data to the public. On the upper pane, the reference genome and annotation information are displayed (Fig. 1b and c). Users can choose a specific position by clicking on the selected chromosome or moving a slider in the upper right region. Alternatively, the top menu bar provides users with a search function to find identifiers or genomic positions (Fig. 1a). Nucleotide variations (SNPs and length polymorphisms) and the depth of mapped reads are presented in the lower main pane, which can be dragged to the left or right (Fig. 1e). The depth information is needed to distinguish regions with no SNPs from regions with no mapped reads, because neither is generally recorded in a VCF file. The frequency of variation occurrence and/or the average depth are shown in a block that corresponds to a region scalable from 1 bp to 100 kb with colored gradations: blue for SNPs, red for insertions/deletions and gray or yellow for depth (Fig. 1f). The maximum number of blocks displayed in a window is 200, so that up to 20 Mb (200 blocks × 100 kb) can be viewed. At the most precise level, individual nucleotides and translated amino acids can be shown (Fig. 1B). Clicking on a block opens a pop-up window with detailed information about nucleotide variations and depth (Fig. 1l).
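The block summary described above can be illustrated with a minimal Python sketch. It is not the TASUKE implementation; the function name `summarize_blocks`, its parameters and the three block states are hypothetical and introduced only for illustration. The sketch assumes depth lines in the three-column, tab-delimited layout written by the SAMtools ‘depth’ command (chromosome, 1-based position, depth), with positions lacking reads absent from the file, and a list of variant positions already extracted from one sample’s VCF file.

```python
MAX_BLOCKS = 200  # maximum number of blocks per window, as stated in the text


def summarize_blocks(depth_lines, variant_positions, chrom, start, block_size,
                     n_blocks=MAX_BLOCKS):
    """Summarize one sample into fixed-size blocks (hypothetical helper).

    depth_lines: iterable of 'chrom<TAB>pos<TAB>depth' strings.
    variant_positions: iterable of 1-based variant positions on `chrom`.
    Returns one dict per block with mean depth, variant count and a state.
    """
    if not 1 <= n_blocks <= MAX_BLOCKS:
        raise ValueError("n_blocks must be between 1 and 200")
    end = start + block_size * n_blocks  # exclusive end of the window
    depth_sum = [0] * n_blocks
    variants = [0] * n_blocks

    for line in depth_lines:
        name, pos, depth = line.rstrip("\n").split("\t")
        pos = int(pos)
        if name == chrom and start <= pos < end:
            depth_sum[(pos - start) // block_size] += int(depth)

    for pos in variant_positions:
        if start <= pos < end:
            variants[(pos - start) // block_size] += 1

    blocks = []
    for i in range(n_blocks):
        if depth_sum[i] == 0:
            state = "no_reads"      # nothing mapped: absence of SNPs is uninformative
        elif variants[i] == 0:
            state = "no_variants"   # reads present, no variant called
        else:
            state = "variants"
        blocks.append({
            "start": start + i * block_size,
            # Positions missing from the depth file count as depth 0.
            "mean_depth": depth_sum[i] / block_size,
            "variants": variants[i],
            "state": state,
        })
    return blocks


if __name__ == "__main__":
    depth = ["chr1\t1\t10\n", "chr1\t2\t20\n", "chr1\t7\t4\n"]
    for block in summarize_blocks(depth, [2], "chr1", start=1, block_size=2, n_blocks=4):
        print(block)
```

With a block size of 100 kb and 200 blocks, the window spans 200 × 100 000 bp = 20 Mb, which matches the maximum view stated above. Keeping the ‘no reads’ state separate from ‘no variants’ mirrors the reason given in the text for requiring depth files in addition to VCF files.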
To help find mutations that possibly affect phenotypes, the effect of each variant, such as a non-synonymous change or a frame shift, can be added to a VCF file by snpEff (Cingolani et al.) and is shown by selecting ‘snpEFF’ in the menu bar (Fig. 1l and m). If a sample name is clicked, the reference genome is reset to the selected sample and variant frequencies are recalculated for all genomes. This reference switch function is useful for examining variations derived from different origins. From the ‘Tools’ menu (Fig. 1a), users can export a list of variant information as a tab-delimited file for a specified region of up to 200 kb. An image file of the displayed area is also downloadable.
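The idea behind the reference switch can be sketched as follows. This is a simplified illustration and not the method used in TASUKE: it considers SNPs only, ignores read depth and insertions/deletions, and assumes that a sample carries the original reference base wherever it has no call. The names `allele_at` and `switch_reference` and the data layout are hypothetical.

```python
def allele_at(calls, ref_bases, pos):
    """Return the sample's base at pos, falling back to the original reference."""
    return calls.get(pos, ref_bases[pos])


def switch_reference(samples, ref_bases, new_ref):
    """Recompute SNP sites of every sample relative to the sample `new_ref`.

    samples: dict mapping sample name -> {position: alternative base},
             where positions are sites differing from the original reference.
    ref_bases: dict mapping position -> original reference base.
    Returns a dict of the same shape, expressed relative to `new_ref`.
    """
    base_calls = samples[new_ref]
    recalculated = {}
    for name, calls in samples.items():
        # Only sites variant in either sample can differ between the two.
        sites = sorted(set(calls) | set(base_calls))
        recalculated[name] = {
            pos: allele_at(calls, ref_bases, pos)
            for pos in sites
            if allele_at(calls, ref_bases, pos) != allele_at(base_calls, ref_bases, pos)
        }
    return recalculated


if __name__ == "__main__":
    ref = {100: "A", 200: "C", 300: "G"}
    samples = {"S1": {100: "T"}, "S2": {100: "T", 200: "G"}, "S3": {}}
    print(switch_reference(samples, ref, "S1"))
```

In this toy example, switching the reference to S1 leaves S1 without variants, reduces S2 to the single site it does not share with S1, and gives S3 (identical to the original reference) a variant at the site where S1 differs. Per-block variant frequencies would then be recounted from the recalculated sites.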

Fig. 1.

Screenshots of TASUKE. (A) A view showing variants of 500 bp/block scale. (a) Menu bar for various functions. (b) Chromosomal positions. (c) Annotation tracks. (d) Sample names and related information. (e) Main panel for variant frequencies of block regions. (f) Magnified view of blocks. Blocks without reads are yellow. (g) Indicator for variant frequency and depth. (h) Overall SNP density. (B) A view showing variants and depth of 1 bp/block scale. (i) Amino acids and nucleotides on a reference genome. (j) Variants and their effects. (k) Indicator of levels of variant effects. (l) Variants and average depth information are shown by clicking on a block. (m) Variant effects are shown by clicking on the sub-window of (l).

As a demonstration, we applied TASUKE to resequencing data from rice and human samples so that users can experience its functions. First, we used resequencing data from 50 rice genomes at ∼15× coverage (Xu et al.), which were downloaded from the DDBJ Sequence Read Archive (Kodama et al.). The rice short reads were mapped to the reference genome (Sakai et al.) by BWA (Li and Durbin, 2009). Second, alignments of human genome resequencing data generated by the 1000 Genomes Project were downloaded (ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/), and 100 individuals who represent subpopulations were arbitrarily selected. Variant and read depth information was obtained from both datasets by using SAMtools. The annotations from the Rice Annotation Project Database (Sakai et al.) and Ensembl (Flicek et al.) were also stored in MySQL databases. These datasets are accessible through TASUKE at http://tasuke.dna.affrc.go.jp/.

3 CONCLUSION

TASUKE is designed for the visualization and rapid release of large-scale resequencing data on the web. This application allows users to see variant frequencies, read depth and annotation information in a scalable and smooth manner. We demonstrated its functionality by applying it to resequencing data from the rice and human genomes. The application should also be useful for the analysis of other genome-wide NGS data obtained from large numbers of samples. In the future, we will further improve TASUKE to cope with growing resequencing data as well as RNA-seq and other NGS data.

REFERENCES

1. Savant: genome browser for high-throughput sequencing data.

Authors: Marc Fiume; Vanessa Williams; Andrew Brook; Michael Brudno
Journal: Bioinformatics Date: 2010-06-20 Impact factor: 6.937

2. Biological databases at DNA Data Bank of Japan in the era of next-generation sequencing technologies.

Authors: Yuichi Kodama; Eli Kaminuma; Satoshi Saruhashi; Kazuho Ikeo; Hideaki Sugawara; Yoshio Tateno; Yasukazu Nakamura
Journal: Adv Exp Med Biol Date: 2010 Impact factor: 2.622

3. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3.

Authors: Pablo Cingolani; Adrian Platts; Le Lily Wang; Melissa Coon; Tung Nguyen; Luan Wang; Susan J Land; Xiangyi Lu; Douglas M Ruden
Journal: Fly (Austin) Date: 2012 Apr-Jun Impact factor: 2.160

4. The Sequence Alignment/Map format and SAMtools.

Authors: Heng Li; Bob Handsaker; Alec Wysoker; Tim Fennell; Jue Ruan; Nils Homer; Gabor Marth; Goncalo Abecasis; Richard Durbin
Journal: Bioinformatics Date: 2009-06-08 Impact factor: 6.937

5. VarB: a variation browsing and analysis tool for variants derived from next-generation sequencing data.

Authors: Mark D Preston; Magnus Manske; Neil Horner; Samuel Assefa; Susana Campino; Sarah Auburn; Issaka Zongo; Jean-Bosco Ouedraogo; Francois Nosten; Tim Anderson; Taane G Clark
Journal: Bioinformatics Date: 2012-09-13 Impact factor: 6.937

6. Rice Annotation Project Database (RAP-DB): an integrative and interactive database for rice genomics.

Authors: Hiroaki Sakai; Sung Shin Lee; Tsuyoshi Tanaka; Hisataka Numa; Jungsok Kim; Yoshihiro Kawahara; Hironobu Wakimoto; Ching-chia Yang; Masao Iwamoto; Takashi Abe; Yuko Yamada; Akira Muto; Hachiro Inokuchi; Toshimichi Ikemura; Takashi Matsumoto; Takuji Sasaki; Takeshi Itoh
Journal: Plant Cell Physiol Date: 2013-01-07 Impact factor: 4.927

7. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration.

Authors: Helga Thorvaldsdóttir; James T Robinson; Jill P Mesirov
Journal: Brief Bioinform Date: 2012-04-19 Impact factor: 11.622

8. The variant call format and VCFtools.

Authors: Petr Danecek; Adam Auton; Goncalo Abecasis; Cornelis A Albers; Eric Banks; Mark A DePristo; Robert E Handsaker; Gerton Lunter; Gabor T Marth; Stephen T Sherry; Gilean McVean; Richard Durbin
Journal: Bioinformatics Date: 2011-06-07 Impact factor: 6.937

9. Fast and accurate short read alignment with Burrows-Wheeler transform.

Authors: Heng Li; Richard Durbin
Journal: Bioinformatics Date: 2009-05-18 Impact factor: 6.937

10. Ensembl 2013.

Authors: Paul Flicek; Ikhlak Ahmed; M Ridwan Amode; Daniel Barrell; Kathryn Beal; Simon Brent; Denise Carvalho-Silva; Peter Clapham; Guy Coates; Susan Fairley; Stephen Fitzgerald; Laurent Gil; Carlos García-Girón; Leo Gordon; Thibaut Hourlier; Sarah Hunt; Thomas Juettemann; Andreas K Kähäri; Stephen Keenan; Monika Komorowska; Eugene Kulesha; Ian Longden; Thomas Maurel; William M McLaren; Matthieu Muffato; Rishi Nag; Bert Overduin; Miguel Pignatelli; Bethan Pritchard; Emily Pritchard; Harpreet Singh Riat; Graham R S Ritchie; Magali Ruffier; Michael Schuster; Daniel Sheppard; Daniel Sobral; Kieron Taylor; Anja Thormann; Stephen Trevanion; Simon White; Steven P Wilder; Bronwen L Aken; Ewan Birney; Fiona Cunningham; Ian Dunham; Jennifer Harrow; Javier Herrero; Tim J P Hubbard; Nathan Johnson; Rhoda Kinsella; Anne Parker; Giulietta Spudich; Andy Yates; Amonida Zadissa; Stephen M J Searle
Journal: Nucleic Acids Res Date: 2012-11-30 Impact factor: 16.971
