UNLABELLED: TREAT (Targeted RE-sequencing Annotation Tool) is a tool for facile navigation and mining of the variants from both targeted resequencing and whole exome sequencing. It provides a rich integration of publicly available as well as in-house developed annotations and visualizations for variants, variant-hosting genes and host-gene pathways. AVAILABILITY AND IMPLEMENTATION: TREAT is freely available to non-commercial users as either a stand-alone annotation and visualization tool, or as a comprehensive workflow integrating sequencing alignment and variant calling. The executables, instructions and the Amazon Cloud Images of TREAT can be downloaded at the website: http://ndc.mayo.edu/mayo/research/biostat/stand-alone-packages.cfm.
Next-generation sequencing (NGS) offers the promise of scientific discovery together with the challenge of interpreting the results (Schuster, 2008). A single experiment such as exome sequencing can generate tens of thousands of single nucleotide variants (SNVs) and small insertions or deletions (INDELs), which must be examined in the search for disease-associated mutations (Ansorge, 2009; Metzker, 2010). Whole exome sequencing is an application of NGS that has been successfully used to identify disease-associated variants in several monogenic disorders (Gilissen ; Lupski ; Ng , 2010) and complex diseases (Bonnefond ; Harbour ). While these studies demonstrated the power of NGS, they also highlighted the challenge of efficiently sifting through thousands of variants to identify a subset that is potentially clinically relevant. Bioinformatics solutions that address this challenge and facilitate filtering and interpretation of human sequence variation data are beginning to be released (Nix ; Sana ; Shetty ; Wang ). We developed TREAT to extend the functionality of these tools by directly integrating structured and sortable report formats with embedded hyperlinks to sequence alignment, gene specificity and gene pathway visualizations. In addition, to enable broad accessibility, we have fully deployed TREAT to the Amazon Cloud. TREAT is optionally offered as part of a complete workflow for exome or targeted sequencing, providing users with a convenient method for integrated sequence alignment, mutation detection and results interpretation. We believe this tool offers investigators an accessible and convenient method for annotating and visualizing sequencing data and a means of efficiently identifying variants of interest.
2 METHODS AND RESULTS
2.1 Variant annotation
TREAT provides four categories of variant annotations (Supplementary Figure S1): (i) general variant annotations, which provide the physical locations, the dbSNP IDs and the allele frequencies of known variants from HapMap and the 1000 Genomes Pilot Project in Caucasian (CEU), Yoruban (YRI) and East Asian (CHB/JPT) populations; (ii) sample-specific read depths supporting the A, C, G and T bases at each variant position, together with the quality scores for base calls and read mappings (these annotations are available only when users choose to use TREAT for read alignment and variant calling); (iii) publicly available annotations from SIFT (Kumar ) and SeattleSeq (http://gvs.gs.washington.edu/SeattleSeqAnnotation/), which include variant classifications (synonymous, missense, nonsense, frame-shift, etc.) and predictions of the functional impact of the variants from SIFT and PolyPhen2; and (iv) novel annotations developed in-house, including tissue expression specificity measures for variant-hosting genes (detailed in Supplementary Data S2) and the identification of variants adjacent to exon–intron boundaries that potentially disrupt known splice sites. An additional novel function of TREAT is the hyperlinking of each variant-hosting gene to its associated KEGG pathway(s) (http://www.genome.jp/kegg) and Gene Ontology terms (http://www.geneontology.org/).
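As an illustration of the splice-site annotation in category (iv), the following minimal Python sketch flags variants that lie within a fixed window of an exon boundary. The 2 bp window, the function name `near_splice_site` and the coordinate convention (1-based, inclusive exon intervals) are assumptions made for this example; the text does not specify the rule TREAT applies.

```python
from typing import List, Tuple

# (start, end) of an exon, 1-based and inclusive; a hypothetical convention.
Exon = Tuple[int, int]


def near_splice_site(pos: int, exons: List[Exon], window: int = 2) -> bool:
    """Return True if pos lies within `window` bp of any exon boundary.

    Distance is measured to the first and last base of each exon, so
    intronic positions just outside an exon and exonic positions just
    inside it are both flagged.
    """
    for start, end in exons:
        if abs(pos - start) <= window or abs(pos - end) <= window:
            return True
    return False


# Toy gene model with two exons (illustrative coordinates only).
exons = [(100, 200), (300, 400)]
flags = {pos: near_splice_site(pos, exons) for pos in (98, 150, 201, 250, 403)}
print(flags)
```

In this toy example, positions 98 and 201 are flagged because they fall within 2 bp of an exon edge, whereas 150, 250 and 403 are not. A production annotator would also need to account for strand and for transcript-specific exon models.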
2.2 Reporting and visualization
TREAT automatically creates output in one easy-to-navigate HTML page, which provides the project description, QC reports, target coverage and sequencing depth information, descriptions of the annotations provided by TREAT, and links to the SNV and INDEL reports. The Microsoft Excel formatted SNV and INDEL reports provide row-based synopses of per-variant annotation. Each variant is hyperlinked to Integrative Genomics Viewer (IGV) (Robinson ) for the visualization of read alignments and variant calling information at the variant position. The functions of the variant hosting genes are illustrated via hyperlinks to the KEGG pathways and Gene Ontology terms, and the tissue expression specificity graph.
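One way such per-variant IGV hyperlinks could be built is sketched below in Python. The sketch relies on IGV's local HTTP command port (60151 by default), which accepts a `goto` request with a `locus` parameter when IGV is running with that port enabled. The function names, the 50 bp flanking window and the use of an Excel `HYPERLINK` formula are assumptions for illustration; the text does not describe how TREAT generates its links.

```python
from urllib.parse import urlencode

IGV_PORT = 60151  # IGV's default port for local HTTP commands


def igv_goto_url(chrom: str, pos: int, flank: int = 50) -> str:
    """Build a URL asking a locally running IGV to jump to a variant.

    The view is centred on `pos` with `flank` bp on each side; the start
    coordinate is clamped to 1.
    """
    start = max(1, pos - flank)
    end = pos + flank
    # Keep ':' unescaped so the locus stays human-readable in the report.
    query = urlencode({"locus": f"{chrom}:{start}-{end}"}, safe=":")
    return f"http://localhost:{IGV_PORT}/goto?{query}"


def excel_hyperlink(url: str, label: str) -> str:
    """Wrap a URL in an Excel HYPERLINK formula for a report cell."""
    return f'=HYPERLINK("{url}", "{label}")'


url = igv_goto_url("chr1", 1000)
print(url)
print(excel_hyperlink(url, "IGV"))
```

Writing the formula string into a spreadsheet cell yields a clickable link per variant row, provided that IGV is already open on the user's machine with the relevant BAM file loaded.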
2.3 Access
TREAT is deployed in two formats: a standalone annotation application and an integrated version for end-to-end analysis of exome or targeted sequencing data. The standalone annotation tool takes a list of called variants as input, which gives users the flexibility to generate the variants with alignment and variant calling tools of their own choosing. The integrated version accepts either FASTQ or BAM files as input and carries out sequence alignment using BWA (Li and Durbin, 2009) or Bowtie (Langmead ), local sequence re-alignment (GATK; McKenna ) and variant calling (GATK or SNVMix; Goya ), providing users with a convenient solution to their informatics needs. Both TREAT versions can be downloaded for local runs, or can be launched on the Amazon Elastic Compute Cloud (EC2) (http://en.wikipedia.org/wiki/Amazon_Elastic_Compute_Cloud) using the Amazon Machine Images provided at our website. The Machine Images are loaded with all the open-source tools and annotation files necessary for the direct execution of TREAT. The run time and cost estimates for the TREAT Cloud version are provided in the Supplementary Data.
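The relationship between input type and the workflow steps that are run can be sketched as follows. This is a hypothetical illustration only: the stage names, the file-extension checks, and the assumption that BAM input skips the alignment step are not taken from the TREAT documentation.

```python
from pathlib import Path
from typing import List

# Hypothetical stage names mirroring the steps described in the text.
ALL_STAGES = ["alignment", "local_realignment", "variant_calling", "annotation"]


def stages_for_input(path: str) -> List[str]:
    """Choose which stages to run from the input file name (assumed rule)."""
    name = Path(path).name.lower()
    if name.endswith((".fastq", ".fq", ".fastq.gz", ".fq.gz")):
        return ALL_STAGES[:]   # raw reads: run the full workflow
    if name.endswith(".bam"):
        return ALL_STAGES[1:]  # aligned reads: assume alignment is skipped
    return ["annotation"]      # otherwise treat as a list of called variants


for example in ("sample_R1.fastq.gz", "sample.bam", "variants.txt"):
    print(example, "->", stages_for_input(example))
```

The final branch corresponds to the standalone annotation mode, in which the variants have already been called with tools of the user's choosing.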
3 DISCUSSIONS
We have developed a bioinformatics tool, TREAT, which addresses the current challenges in analyzing and interpreting targeted and whole exome sequencing data. The annotations provided by TREAT have been carefully evaluated and selected from a pool of available open-source tools and databases, and are complemented by additional annotations developed in-house (details at the TREAT website). The variant reports in Excel format integrate visualizations of the sequence alignment at variant positions, and of the pathways and expression specificity of the variant-hosting genes, via clickable hyperlinks for each reported INDEL and SNV. In addition, the summary of the targeted resequencing results is stored in a centralized HTML report with links to the TREAT website, the targeted region coverage report, the read QC report, the description of the TREAT workflow, and the websites of the annotation tools and databases.
For maximum flexibility, two versions of TREAT were implemented: an annotation-only version, and a version integrating read alignment, variant calling and annotation. Both versions can be downloaded as local installations or as Amazon Cloud images, which makes TREAT available to users with no access to local bioinformatics infrastructure. By targeting all user groups and enabling rapid integration of emerging analytic methods, we believe that TREAT provides a sustainable NGS analytic workflow with wide applicability to the research community.
We plan to continue adding new functionality and features to TREAT to make it a comprehensive tool for targeted and exome analysis. These include the development of an in-house variant database that collects all variants detected by exome and whole genome sequencing in hundreds of individuals with various types of diseases. This database will provide critical annotation of whether the observed variants are truly ‘novel’ or disease specific. In addition, we are in the process of making TREAT applicable to whole genome sequencing data analysis, which will require adding annotation tracks for non-coding regions, such as conservation and regulatory domains.
In summary, the rich set of annotations provided by TREAT, the easy-to-use, centralized HTML summary report, and the Excel-formatted variant reports with hyperlinked visualization utilities enable the filtering of detected variants based on their functional characteristics, and allow researchers to navigate, filter and interpret tens of thousands of variants in order to focus on potential disease-associated variant(s).
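The kind of filtering on functional characteristics described above can be sketched in Python as follows. The `Variant` fields, the 1% allele-frequency cut-off and the set of retained classifications are assumptions chosen for illustration; they are not TREAT's report schema or recommended thresholds.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Variant:
    """Minimal stand-in for one row of a variant report (hypothetical fields)."""
    gene: str
    classification: str           # e.g. "missense", "synonymous"
    allele_freq: Optional[float]  # None if not seen in reference populations


def filter_candidates(
    variants: Iterable[Variant],
    max_freq: float = 0.01,
    classes: Tuple[str, ...] = ("missense", "nonsense", "frame-shift"),
) -> List[Variant]:
    """Keep rare or unobserved variants with a potentially damaging class."""
    kept = []
    for v in variants:
        if v.classification not in classes:
            continue  # drop classes outside the chosen set
        if v.allele_freq is not None and v.allele_freq > max_freq:
            continue  # drop variants common in reference populations
        kept.append(v)
    return kept


rows = [
    Variant("GENE_A", "missense", None),
    Variant("GENE_B", "synonymous", 0.001),
    Variant("GENE_C", "nonsense", 0.20),
    Variant("GENE_D", "frame-shift", 0.005),
]
candidates = filter_candidates(rows)
print([v.gene for v in candidates])
```

Here GENE_A and GENE_D are retained, while GENE_B is dropped for its classification and GENE_C for its population frequency. In practice the same logic can be applied interactively with the sort and filter functions of the Excel-formatted reports.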
Authors: James R Lupski; Jeffrey G Reid; Claudia Gonzaga-Jauregui; David Rio Deiros; David C Y Chen; Lynne Nazareth; Matthew Bainbridge; Huyen Dinh; Chyn Jing; David A Wheeler; Amy L McGuire; Feng Zhang; Pawel Stankiewicz; John J Halperin; Chengyong Yang; Curtis Gehman; Danwei Guo; Rola K Irikat; Warren Tom; Nick J Fantin; Donna M Muzny; Richard A Gibbs Journal: N Engl J Med Date: 2010-03-10 Impact factor: 91.245
Authors: Rodrigo Goya; Mark G F Sun; Ryan D Morin; Gillian Leung; Gavin Ha; Kimberley C Wiegand; Janine Senz; Anamaria Crisan; Marco A Marra; Martin Hirst; David Huntsman; Kevin P Murphy; Sam Aparicio; Sohrab P Shah Journal: Bioinformatics Date: 2010-02-03 Impact factor: 6.937
Authors: Sarah B Ng; Emily H Turner; Peggy D Robertson; Steven D Flygare; Abigail W Bigham; Choli Lee; Tristan Shaffer; Michelle Wong; Arindam Bhattacharjee; Evan E Eichler; Michael Bamshad; Deborah A Nickerson; Jay Shendure Journal: Nature Date: 2009-08-16 Impact factor: 49.962
Authors: Sarah B Ng; Kati J Buckingham; Choli Lee; Abigail W Bigham; Holly K Tabor; Karin M Dent; Chad D Huff; Paul T Shannon; Ethylin Wang Jabs; Deborah A Nickerson; Jay Shendure; Michael J Bamshad Journal: Nat Genet Date: 2009-11-13 Impact factor: 38.330