TREAT: a bioinformatics tool for variant annotations and visualizations in targeted and exome sequencing data.

Yan W Asmann, Sumit Middha, Asif Hossain, Saurabh Baheti, Ying Li, High-Seng Chai, Zhifu Sun, Patrick H Duffy, Ahmed A Hadad, Asha Nair, Xiaoyu Liu, Yuji Zhang, Eric W Klee, Krishna R Kalari, Jean-Pierre A Kocher.

Abstract

TREAT (Targeted RE-sequencing Annotation Tool) is a tool for easy navigation and mining of variants from both targeted resequencing and whole exome sequencing. It integrates a rich set of publicly available and in-house developed annotations and visualizations for variants, variant-hosting genes and host-gene pathways.
AVAILABILITY AND IMPLEMENTATION: TREAT is freely available to non-commercial users either as a stand-alone annotation and visualization tool or as a comprehensive workflow integrating sequence alignment and variant calling. The executables, instructions and Amazon Cloud Images of TREAT can be downloaded from http://ndc.mayo.edu/mayo/research/biostat/stand-alone-packages.cfm.

Year: 2011 PMID: 22088845 PMCID: PMC3259432 DOI: 10.1093/bioinformatics/btr612

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 INTRODUCTION

Next-generation sequencing (NGS) offers the promise of scientific discovery but poses the challenge of interpreting the results (Schuster, 2008). A single experiment, such as exome sequencing, can generate tens of thousands of single nucleotide variants (SNVs) and small insertions or deletions (INDELs), which must be interpreted in the search for disease-associated mutations (Ansorge, 2009; Metzker, 2010). Whole exome sequencing is an application of NGS that has been used successfully to identify disease-associated variants in several monogenic disorders (Gilissen et al.; Lupski et al.; Ng et al., 2010) and complex diseases (Bonnefond et al.; Harbour et al.). While these studies demonstrated the power of NGS, they also highlighted the challenge of efficiently sifting through thousands of variants to identify the subset that is potentially clinically relevant. Bioinformatics tools that facilitate the filtering and interpretation of human sequence variation data are beginning to be released (Nix et al.; Sana et al.; Shetty et al.; Wang et al.). We developed TREAT to extend the functionality of these tools by integrating structured, sortable reports with embedded hyperlinks to sequence alignment, gene specificity and gene pathway visualizations. In addition, to enable broad accessibility, we have fully deployed TREAT to the Amazon Cloud. TREAT is optionally offered as part of a complete workflow for exome or targeted sequencing, providing users with a convenient method for integrated sequence alignment, mutation detection and results interpretation. We believe this tool offers investigators an accessible and convenient means of annotating and visualizing sequencing data and of efficiently identifying variants of interest.

2 METHODS AND RESULTS

2.1 Variant annotation

TREAT provides four categories of variant annotations (Supplementary Figure S1): (i) general variant annotations, which provide the physical locations of variants and the dbSNP IDs and allele frequencies of known variants from HapMap and the 1000 Genomes Pilot Project in the Caucasian (CEU), Yoruba (YRI) and East Asian (CHB/JPT) populations; (ii) sample-specific read depths supporting the A, C, G and T bases at each variant position, together with the quality scores for base calls and read mappings (these annotations are available only when users choose to run read alignment and variant calling within TREAT); (iii) publicly available annotations from SIFT (Kumar et al.) and SeattleSeq (http://gvs.gs.washington.edu/SeattleSeqAnnotation/), including variant classifications (synonymous, missense, nonsense, frameshift, etc.) and SIFT and PolyPhen2 predictions of the functional impact of the variants; and (iv) novel in-house annotations, including tissue expression specificity measures for variant-hosting genes (detailed in Supplementary Data S2) and the identification of variants adjacent to exon–intron boundaries that potentially disrupt known splice sites. TREAT also hyperlinks each variant-hosting gene to its associated KEGG pathway(s) (http://www.genome.jp/kegg) and Gene Ontology terms (http://www.geneontology.org/).
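The splice-site annotation in category (iv) amounts to an interval-proximity test against known exon boundaries. The Python sketch below illustrates one way to do this and is not TREAT's implementation: the `Exon` record, the gene names and the 2-bp `window` (chosen to cover the canonical GT/AG intronic dinucleotides) are all hypothetical. For simplicity it also treats the outer ends of the first and last exons as splice sites.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Exon:
    """A hypothetical exon record; coordinates are 1-based and inclusive."""
    gene: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def splice_site_hits(chrom: str, pos: int, exons: List[Exon],
                     window: int = 2) -> List[Tuple[str, str]]:
    """Return (gene, site) pairs for exon-intron boundaries lying within
    `window` bases of `pos` on either side of the boundary.

    window=2 is an illustrative assumption, not TREAT's documented setting.
    """
    hits = []
    for ex in exons:
        if ex.chrom != chrom:
            continue
        # Boundary between ex.start - 1 (intron) and ex.start (exon).
        near_left = ex.start - window <= pos <= ex.start + window - 1
        # Boundary between ex.end (exon) and ex.end + 1 (intron).
        near_right = ex.end - window + 1 <= pos <= ex.end + window
        # On the + strand the left boundary is the acceptor (3' splice site)
        # and the right boundary is the donor (5' splice site); reversed on -.
        left_site, right_site = (("acceptor", "donor") if ex.strand == "+"
                                 else ("donor", "acceptor"))
        if near_left:
            hits.append((ex.gene, left_site))
        if near_right:
            hits.append((ex.gene, right_site))
    return hits


# Usage with a hypothetical exon chr1:1000-1100 on the + strand.
exons = [Exon("GENE1", "chr1", 1000, 1100, "+")]
print(splice_site_hits("chr1", 999, exons))   # [('GENE1', 'acceptor')]
print(splice_site_hits("chr1", 1050, exons))  # []
```

A production version would index exons by chromosome (or use an interval tree) instead of scanning the full list for every variant.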

2.2 Reporting and visualization

TREAT automatically creates a single, easy-to-navigate HTML page that provides the project description, QC reports, target coverage and sequencing depth information, descriptions of the annotations provided by TREAT, and links to the SNV and INDEL reports. The SNV and INDEL reports, in Microsoft Excel format, summarize the annotations of each variant in a single row. Each variant is hyperlinked to the Integrative Genomics Viewer (IGV) (Robinson et al.) for visualization of the read alignments and variant calling information at the variant position. The functions of the variant-hosting genes are illustrated via hyperlinks to their KEGG pathways and Gene Ontology terms, and to a tissue expression specificity graph.
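Per-variant hyperlinks of this kind can be built with Excel's `HYPERLINK` formula, which Excel also evaluates when it opens a CSV file. The sketch below is an illustration under assumptions rather than TREAT's report code. It assumes a locally running IGV listening on its default batch port (60151) and accepting the `goto?locus=` URL form, which should be checked against the IGV documentation for the installed version. The column set and the 50-bp flank are hypothetical.

```python
import csv
import io

# IGV's batch-command port defaults to 60151; the "goto?locus=" URL form is
# an assumption to verify against the IGV documentation.
IGV_PORT = 60151


def igv_hyperlink(chrom: str, pos: int, flank: int = 50) -> str:
    """Excel HYPERLINK formula asking a running IGV to jump to the variant."""
    start, end = max(1, pos - flank), pos + flank
    url = f"http://localhost:{IGV_PORT}/goto?locus={chrom}:{start}-{end}"
    return f'=HYPERLINK("{url}","{chrom}:{pos}")'


def snv_report_csv(variants: list) -> str:
    """Render one row per variant; the columns are hypothetical."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Variant", "Gene", "Ref", "Alt", "Class"])
    for v in variants:
        # csv.writer quotes the formula and doubles its inner quotes,
        # which Excel unescapes when it reads the file.
        writer.writerow([igv_hyperlink(v["chrom"], v["pos"]),
                         v["gene"], v["ref"], v["alt"], v["class"]])
    return buf.getvalue()


print(snv_report_csv([{"chrom": "chr1", "pos": 12345, "gene": "GENE1",
                       "ref": "A", "alt": "G", "class": "missense"}]))
```

Writing formulas into a plain CSV keeps the sketch dependency-free. A native `.xlsx` writer would additionally allow per-column formatting and sorting filters.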

2.3 Access

TREAT is deployed in two formats: a standalone annotation application and an integrated version for end-to-end analysis of exome or targeted sequencing data. The standalone annotation tool takes a list of called variants as input, which lets users generate the variants with alignment and variant calling tools of their own choosing. The integrated version accepts either FASTQ or BAM files as input and carries out sequence alignment using BWA (Li and Durbin, 2009) or Bowtie (Langmead et al.), local sequence re-alignment (GATK; McKenna et al.) and variant calling (GATK or SNVMix; Goya et al.), providing users with a convenient end-to-end solution. Both TREAT versions can be downloaded for local runs or launched on the Amazon Elastic Compute Cloud (EC2) (http://en.wikipedia.org/wiki/Amazon_Elastic_Compute_Cloud) using the Amazon Machine Images provided at our website. The Machine Images are preloaded with all the open-source tools and annotation files needed to run TREAT directly. Run time and cost estimates for the TREAT Cloud version are provided in the Supplementary Data.
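The two deployment modes differ mainly in which stages run for a given input. The Python sketch below shows how that choice could be made from the input type. It is not TREAT's actual dispatcher: the stage labels are hypothetical, BAM input is assumed to be already aligned (so alignment is skipped), and the "list of called variants" is assumed to arrive as VCF.

```python
from pathlib import Path
from typing import List

ALIGNERS = {"bwa", "bowtie"}
CALLERS = {"gatk", "snvmix"}


def plan_stages(input_path: str, aligner: str = "bwa",
                caller: str = "gatk") -> List[str]:
    """Choose pipeline stages from the input file type (hypothetical labels)."""
    if aligner not in ALIGNERS:
        raise ValueError(f"unknown aligner: {aligner}")
    if caller not in CALLERS:
        raise ValueError(f"unknown caller: {caller}")
    name = Path(input_path).name.lower()
    if name.endswith((".fastq", ".fq", ".fastq.gz", ".fq.gz")):
        # Raw reads: run the full integrated workflow.
        return [f"align:{aligner}", "realign:gatk", f"call:{caller}", "annotate"]
    if name.endswith(".bam"):
        # Assumption: BAM input is already aligned, so alignment is skipped.
        return ["realign:gatk", f"call:{caller}", "annotate"]
    if name.endswith((".vcf", ".vcf.gz")):
        # Assumption: called variants arrive as VCF; annotation only.
        return ["annotate"]
    raise ValueError(f"unsupported input type: {input_path}")


print(plan_stages("sample_R1.fastq.gz", aligner="bowtie"))
print(plan_stages("calls.vcf"))
```

Keeping the aligner and caller as validated parameters mirrors the choices the text lists (BWA or Bowtie; GATK or SNVMix) without hard-wiring any one tool.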

3 DISCUSSION

We have developed a bioinformatics tool, TREAT, which addresses current challenges in analyzing and interpreting targeted and whole exome sequencing data. The annotations provided by TREAT were carefully evaluated and selected from a pool of available open-source tools and databases, and complemented by additional in-house developed annotations (details at the TREAT website). For each reported SNV and INDEL, the Excel-format variant reports provide clickable hyperlinks to visualizations of the sequence alignment at the variant position and of the pathways and expression specificity of the variant-hosting gene. In addition, the summary of the targeted resequencing results is stored in a centralized HTML report with links to the TREAT website, the targeted region coverage report, the read QC report, the description of the TREAT workflow, and the websites of the annotation tools and databases. For maximum flexibility, two versions of TREAT were implemented: an annotation-only version, and a version integrating read alignment, variant calling and annotation. Both versions are available as local installations or as Amazon Cloud images, which makes TREAT accessible to users without local bioinformatics infrastructure. By serving a broad range of user groups and enabling rapid integration of emerging analytic methods, we believe that TREAT provides a sustainable NGS analytic workflow with wide applicability to the research community. We plan to continue adding functionality to make TREAT a comprehensive tool for targeted and exome analysis. This includes an in-house variant database that collects all variants detected in hundreds of individuals with various types of diseases using exome and whole genome sequencing. This database will provide critical annotation of whether observed variants are truly ‘novel’ or disease specific.
In addition, we are in the process of extending TREAT to whole genome sequencing data analysis, which requires adding annotation tracks for non-coding regions, such as conservation and regulatory domains. In summary, the rich set of annotations provided by TREAT, its easy-to-use centralized HTML summary report, and its Excel-format variant reports with hyperlinked visualization utilities enable filtering of detected variants based on their functional characteristics. Together they allow researchers to navigate, filter and interpret tens of thousands of variants and focus on potentially disease-associated variants.
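The filtering described above, together with the planned in-house database for judging novelty and disease specificity, can be pictured as a few set and threshold checks. The Python sketch below is illustrative only and is not part of TREAT. The in-house database is modeled as a hypothetical mapping from a (chrom, pos, ref, alt) key to the diseases of the individuals carrying the variant. The set of "damaging" classes, the 1% allele-frequency cut-off and the novelty labels are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

VariantKey = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)

# Hypothetical set of classes treated as potentially damaging.
DAMAGING = {"missense", "nonsense", "frameshift", "splice-site"}


@dataclass
class Variant:
    key: VariantKey
    gene: str
    cls: str
    dbsnp_id: Optional[str] = None
    pop_af: Dict[str, float] = field(default_factory=dict)  # e.g. {"CEU": 0.02}


def novelty(v: Variant, inhouse: Dict[VariantKey, Set[str]]) -> str:
    """Label a variant against dbSNP and the hypothetical in-house database."""
    if v.dbsnp_id is None and v.key not in inhouse:
        return "novel"
    if v.dbsnp_id is None:
        return "in-house only"
    return "known"


def prioritize(variants: List[Variant], inhouse: Dict[VariantKey, Set[str]],
               disease: str, max_af: float = 0.01) -> List[Variant]:
    """Keep damaging, rare variants not carried by individuals with other
    diseases. The 1% cut-off is an illustrative assumption."""
    kept = []
    for v in variants:
        if v.cls not in DAMAGING:
            continue
        # Variants with no population frequency data are treated as rare.
        if any(af >= max_af for af in v.pop_af.values()):
            continue
        # Skip variants also observed in individuals with other diseases.
        if inhouse.get(v.key, set()) - {disease}:
            continue
        kept.append(v)
    return kept


# Usage with hypothetical variants and disease labels.
a = Variant(("chr1", 100, "A", "G"), "GENE1", "missense")
b = Variant(("chr2", 300, "G", "A"), "GENE2", "nonsense", "rsEXAMPLE", {"CEU": 0.2})
inhouse = {("chr1", 100, "A", "G"): {"diseaseA"}}
print([v.gene for v in prioritize([a, b], inhouse, "diseaseA")])  # ['GENE1']
print(novelty(a, inhouse))  # in-house only
```

Treating missing frequencies as rare errs toward keeping variants for review, which suits a discovery setting but would need revisiting for clinical reporting.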

REFERENCES (19 in total; 10 listed)

1. Next-generation sequencing transforms today's biology.

Authors: Stephan C Schuster
Journal: Nat Methods Date: 2007-12-19 Impact factor: 28.547

2. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm.

Authors: Prateek Kumar; Steven Henikoff; Pauline C Ng
Journal: Nat Protoc Date: 2009-06-25 Impact factor: 13.491

3. Sequencing technologies - the next generation.

Authors: Michael L Metzker
Journal: Nat Rev Genet Date: 2009-12-08 Impact factor: 53.242

4. Whole-genome sequencing in a patient with Charcot-Marie-Tooth neuropathy.

Authors: James R Lupski; Jeffrey G Reid; Claudia Gonzaga-Jauregui; David Rio Deiros; David C Y Chen; Lynne Nazareth; Matthew Bainbridge; Huyen Dinh; Chyn Jing; David A Wheeler; Amy L McGuire; Feng Zhang; Pawel Stankiewicz; John J Halperin; Chengyong Yang; Curtis Gehman; Danwei Guo; Rola K Irikat; Warren Tom; Nick J Fantin; Donna M Muzny; Richard A Gibbs
Journal: N Engl J Med Date: 2010-03-10 Impact factor: 91.245

5. SNVMix: predicting single nucleotide variants from next-generation sequencing of tumors.

Authors: Rodrigo Goya; Mark G F Sun; Ryan D Morin; Gillian Leung; Gavin Ha; Kimberley C Wiegand; Janine Senz; Anamaria Crisan; Marco A Marra; Martin Hirst; David Huntsman; Kevin P Murphy; Sam Aparicio; Sohrab P Shah
Journal: Bioinformatics Date: 2010-02-03 Impact factor: 6.937

6. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome.

Authors: Ben Langmead; Cole Trapnell; Mihai Pop; Steven L Salzberg
Journal: Genome Biol Date: 2009-03-04 Impact factor: 13.583

7. Integrative genomics viewer.

Authors: James T Robinson; Helga Thorvaldsdóttir; Wendy Winckler; Mitchell Guttman; Eric S Lander; Gad Getz; Jill P Mesirov
Journal: Nat Biotechnol Date: 2011-01 Impact factor: 54.908

8. Targeted capture and massively parallel sequencing of 12 human exomes.

Authors: Sarah B Ng; Emily H Turner; Peggy D Robertson; Steven D Flygare; Abigail W Bigham; Choli Lee; Tristan Shaffer; Michelle Wong; Arindam Bhattacharjee; Evan E Eichler; Michael Bamshad; Deborah A Nickerson; Jay Shendure
Journal: Nature Date: 2009-08-16 Impact factor: 49.962

9. Exome sequencing identifies the cause of a mendelian disorder.

Authors: Sarah B Ng; Kati J Buckingham; Choli Lee; Abigail W Bigham; Holly K Tabor; Karin M Dent; Chad D Huff; Paul T Shannon; Ethylin Wang Jabs; Deborah A Nickerson; Jay Shendure; Michael J Bamshad
Journal: Nat Genet Date: 2009-11-13 Impact factor: 38.330

10. Fast and accurate short read alignment with Burrows-Wheeler transform.

Authors: Heng Li; Richard Durbin
Journal: Bioinformatics Date: 2009-05-18 Impact factor: 6.937
