
ClinVAP: a reporting strategy from variants to therapeutic options.

Bilge Sürün1, Charlotta P I Schärfe1, Mathew R Divine1, Julian Heinrich1, Nora C Toussaint2, Lukas Zimmermann3, Janina Beha4, Oliver Kohlbacher1,4.   

Abstract

MOTIVATION: Next-generation sequencing has become routine in oncology and opens up new avenues of therapy, particularly in the personalized oncology setting. An increasing number of cases also implies a need for more robust, automated and reproducible processing of long lists of variants for cancer diagnosis and therapy. While solutions for the large-scale analysis of somatic variants have been implemented, existing solutions often have issues with reproducibility, scalability and interoperability.
RESULTS: Clinical Variant Annotation Pipeline (ClinVAP) is an automated pipeline which annotates, filters and prioritizes somatic single nucleotide variants provided in variant call format. It augments the variant information with documented or predicted clinical effects. These annotated variants are prioritized based on driver gene status and druggability. ClinVAP is available as a fully containerized, self-contained pipeline, which maximizes reproducibility and scalability and allows the analysis of larger-scale data. The resulting JSON-based report is suited for automated downstream processing, but ClinVAP can also automatically render the information into a user-defined template to yield a human-readable report.
AVAILABILITY AND IMPLEMENTATION: ClinVAP is available at https://github.com/PersonalizedOncology/ClinVAP.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2019. Published by Oxford University Press.

Year:  2020        PMID: 31830259      PMCID: PMC7141851          DOI: 10.1093/bioinformatics/btz924

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

Understanding the genetic profile of a patient’s tumor to assess clinical actionability is key to establishing personalized targeted therapies. Large volumes of genomic data from cancer patients have become available due to the ever-decreasing costs of sequencing. Strategies need to be developed to gain insights from the data that can be used to support treatment decisions for individual patients. Identification of somatic variants that render a tumor either susceptible or resistant to treatment, as well as detection of the genes driving a specific cancer, can be essential for therapeutic decision making. Although there are publicly available databases to annotate genetic variants with respect to their actionability, it is time-consuming to query these resources manually. Furthermore, sending patient-related information to web services for therapeutic variant annotation undermines privacy preservation, which hinders the use of services such as PharmGKB (Whirl-Carrillo et al., 2012). On the other hand, local instances of clinical annotation (Wendl et al., 2011) and reporting (Perera-Bel et al., 2018) pipelines can be difficult to use. Common problems are complex command line interfaces, the necessity to modify the source code, the requirement of non-standard variant file formats, or the lack of structured output. Here, we introduce our Clinical Variant Annotation Pipeline (ClinVAP), which annotates somatic single nucleotide variants given as a standard VCF file with their driver gene type and relevant drug information. This information is prioritized based on actionability and severity of gene disruption. To this end, we supply the user with affected driver genes and direct as well as indirect drug targets by cross-referencing the observed variants with evidence from several public repositories. The resulting report, provided as a JSON, Microsoft Word, or PDF file, can be helpful for discussing otherwise overlooked therapeutic options.

2 Materials and methods

2.1 Data integration

A clinical annotation knowledge base implemented as a MongoDB database forms the annotation source of the pipeline. It was built based on the full set of genes contained in the HGNC and UniProt databases. Information on drug targets as well as genes initiating tumorigenesis was collated within the database. The knowledge base is queried by the reporting application for each mutated gene found by Ensembl Variant Effect Predictor (VEP).

Annotation of driver genes. A catalog of 1998 driver genes was assembled from databases and from the literature (Supplementary Material SA, Section 2.1). A confidence score was calculated for each driver gene by counting the number of considered sources that include the corresponding gene, in order to present a simple assessment of its significance as a driver in the literature.

Mechanistic drug-target relations. Genes were further annotated with drug target data compiled from databases and from a manually curated dataset of molecular drug targets (Supplementary Material SA, Section 2.1). Analogously to the cancer driver data, a confidence score was generated that represents the number of sources supporting the drug-gene association.
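The source-counting confidence score described above can be illustrated with a minimal Python sketch. This is not the ClinVAP implementation (the pipeline itself uses an R application and a MongoDB knowledge base); the source names, gene lists and the function name `confidence_scores` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def confidence_scores(sources):
    """Return, for each gene, the number of sources that list it.

    `sources` maps a (hypothetical) source name to the genes it reports.
    """
    counts = Counter()
    for genes in sources.values():
        # Use a set so a gene listed twice within one source counts once.
        counts.update(set(genes))
    return dict(counts)


# Hypothetical source names and gene lists, for illustration only.
driver_sources = {
    "source_a": ["TP53", "KRAS", "BRAF"],
    "source_b": ["TP53", "KRAS"],
    "source_c": ["TP53", "TP53", "EGFR"],
}

scores = confidence_scores(driver_sources)

# Print genes from highest to lowest score, ties broken alphabetically.
for gene, score in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
    print(gene, score)
```

The same counting would apply to drug-gene associations if each source contributed (drug, gene) pairs instead of gene symbols.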

2.2 Reporting application

We devised a fully automated and containerized pipeline which takes a VCF file (version 4.0+) as input, which by default should contain the somatic variants of a tumor sample, and creates a clinical report. The multi-step process builds on Ensembl VEP for variant annotation, queries the knowledge base and processes the resulting annotated file in an R application to render the results into a machine- and/or human-readable report.

Variant Effect Prediction. The first step of the pipeline is to annotate the variants using Ensembl VEP v93 in offline mode (McLaren et al., 2016). The annotations are based on the user-provided genome assembly version, i.e. GRCh37 or GRCh38. Functional effects of variants on the canonical transcripts are predicted using SIFT and PolyPhen (Adzhubei et al., 2010; Kumar et al., 2009).

Variant Annotation. In this step, descriptive and interpretive information on the variants, such as genomic position and variant effect, is retrieved from the VEP-annotated VCF file by the R-based reporting application. Where VEP reports several annotation blocks for a variant because of alternative splicing, only the block that VEP selects for that variant is used in the next steps. Variants that did not pass quality control in the variant calling pipeline used to produce the input VCF, as well as variants that were predicted by VEP as non-coding or low-effect, are excluded. Furthermore, the variants predicted as ‘tolerated’ or ‘tolerated low confidence’ by SIFT and ‘benign’ by PolyPhen are removed. Using HGNC identifiers of the remaining variants, the knowledge base is queried to provide information about driver genes and affected drug targets. Clinical evidence summaries from the CIViC database are further incorporated to report the variants with a direct impact on actionability (Griffith et al., 2017). CIViC’s scoring schema is adopted in the application to provide a quick overview of the confidence of the provided association.

Report Generation. The variants that (i) occur in a known cancer driver gene, (ii) have been observed previously in the context of altered treatment response, or (iii) fall in the coding region of the mechanistic target gene of a cancer therapeutic, are distributed into four categories and saved as a JSON file (see tables in Supplementary Material SC). If desired, the JSON report can be rendered into a user-provided template (in Microsoft DOCX format) to obtain a human-readable document (Supplementary Material SB).
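The exclusion rules of the Variant Annotation step can be sketched in Python as follows. This is an illustrative reading, not the ClinVAP code: the record layout and field names are hypothetical, "non-coding or low-effect" is assumed to correspond to the VEP impact classes `MODIFIER` and `LOW`, and the SIFT/PolyPhen rule is assumed to require both predictions to be benign-like before a variant is dropped.

```python
# Assumed mapping of "non-coding or low-effect" onto VEP impact classes.
LOW_EFFECT_IMPACTS = {"LOW", "MODIFIER"}
# SIFT prediction terms treated as tolerated.
SIFT_TOLERATED = {"tolerated", "tolerated_low_confidence"}


def keep_variant(variant):
    """Return True if a (hypothetical) variant record survives filtering."""
    # 1. Drop variants that failed quality control in the variant caller.
    if variant["filter"] != "PASS":
        return False
    # 2. Drop variants predicted as non-coding or low-effect.
    if variant["impact"] in LOW_EFFECT_IMPACTS:
        return False
    # 3. Drop variants that SIFT tolerates AND PolyPhen calls benign
    #    (assumed reading: both conditions must hold).
    if variant.get("sift") in SIFT_TOLERATED and variant.get("polyphen") == "benign":
        return False
    return True


# Hypothetical records; gene names are placeholders.
variants = [
    {"gene": "GENE_A", "filter": "PASS", "impact": "MODERATE",
     "sift": "deleterious", "polyphen": "probably_damaging"},
    {"gene": "GENE_B", "filter": "LowQual", "impact": "HIGH",
     "sift": None, "polyphen": None},
    {"gene": "GENE_C", "filter": "PASS", "impact": "MODIFIER",
     "sift": None, "polyphen": None},
    {"gene": "GENE_D", "filter": "PASS", "impact": "MODERATE",
     "sift": "tolerated", "polyphen": "benign"},
    {"gene": "GENE_E", "filter": "PASS", "impact": "MODERATE",
     "sift": "tolerated", "polyphen": "possibly_damaging"},
]

kept = [v["gene"] for v in variants if keep_variant(v)]
print(kept)
```

Under these assumptions only GENE_A and GENE_E remain; their HGNC identifiers would then be used to query the knowledge base for driver and drug-target information.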

2.3 Deployment and benchmarking

ClinVAP is available as self-contained Docker and Singularity images (Supplementary Material SA, Section 3) (Kurtzer et al., 2017; Merkel, 2014). Containerized execution of the pipeline ensures easier versioning, full reproducibility of results and convenient execution on large-scale datasets. In order to test the robustness and performance of ClinVAP, we processed 500 VCF files from 430 donors, including simple somatic mutations from ICGC cancer projects (ICGC, 2010). Average runtime was approximately 7 min per file on current hardware. The median number of driver genes per report was five, with individual donors having up to 200 driver genes. We identified therapeutic suggestions for 65.2% of the cases, where the CIViC evidence level is restricted to either A, B, or C. In an additional 28.8% of cases, predicted effects of variants were annotated, but no conclusive therapeutic option was identified. In only 6% of the cases could ClinVAP not provide any helpful information at all.

3 Conclusion

We introduce a fully automated, fast and robust annotation pipeline designed to equip Molecular Tumor Boards with evidence-based patient reports. ClinVAP reports reveal the molecular driving forces in cancer formation along with actionable therapeutic targets from the respective tumor’s set of somatic variants. The container technologies Docker and Singularity allow for easy deployment and reproducibility. The pipeline is run locally and does not require any patient data to be analyzed by external web sites. Therefore, use of ClinVAP conforms to standard privacy and data security regulations.
References

1.  PathScan: a tool for discerning mutational significance in groups of putative cancer genes.

Authors:  Michael C Wendl; John W Wallis; Ling Lin; Cyriac Kandoth; Elaine R Mardis; Richard K Wilson; Li Ding
Journal:  Bioinformatics       Date:  2011-04-14       Impact factor: 6.937

2.  Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm.

Authors:  Prateek Kumar; Steven Henikoff; Pauline C Ng
Journal:  Nat Protoc       Date:  2009-06-25       Impact factor: 13.491

3.  Pharmacogenomics knowledge for personalized medicine.

Authors:  M Whirl-Carrillo; E M McDonagh; J M Hebert; L Gong; K Sangkuhl; C F Thorn; R B Altman; T E Klein
Journal:  Clin Pharmacol Ther       Date:  2012-10       Impact factor: 6.875

4.  A method and server for predicting damaging missense mutations.

Authors:  Ivan A Adzhubei; Steffen Schmidt; Leonid Peshkin; Vasily E Ramensky; Anna Gerasimova; Peer Bork; Alexey S Kondrashov; Shamil R Sunyaev
Journal:  Nat Methods       Date:  2010-04       Impact factor: 28.547

5.  International network of cancer genome projects.

Authors:  Thomas J Hudson; Warwick Anderson; Axel Artez; Anna D Barker; Cindy Bell; Rosa R Bernabé; M K Bhan; Fabien Calvo; Iiro Eerola; Daniela S Gerhard; Alan Guttmacher; Mark Guyer; Fiona M Hemsley; Jennifer L Jennings; David Kerr; Peter Klatt; Patrik Kolar; Jun Kusada; David P Lane; Frank Laplace; Lu Youyong; Gerd Nettekoven; Brad Ozenberger; Jane Peterson; T S Rao; Jacques Remacle; Alan J Schafer; Tatsuhiro Shibata; Michael R Stratton; Joseph G Vockley; Koichi Watanabe; Huanming Yang; Matthew M F Yuen; Bartha M Knoppers; Martin Bobrow; Anne Cambon-Thomsen; Lynn G Dressler; Stephanie O M Dyke; Yann Joly; Kazuto Kato; Karen L Kennedy; Pilar Nicolás; Michael J Parker; Emmanuelle Rial-Sebbag; Carlos M Romeo-Casabona; Kenna M Shaw; Susan Wallace; Georgia L Wiesner; Nikolajs Zeps; Peter Lichter; Andrew V Biankin; Christian Chabannon; Lynda Chin; Bruno Clément; Enrique de Alava; Françoise Degos; Martin L Ferguson; Peter Geary; D Neil Hayes; Thomas J Hudson; Amber L Johns; Arek Kasprzyk; Hidewaki Nakagawa; Robert Penny; Miguel A Piris; Rajiv Sarin; Aldo Scarpa; Tatsuhiro Shibata; Marc van de Vijver; P Andrew Futreal; Hiroyuki Aburatani; Mónica Bayés; David D L Botwell; Peter J Campbell; Xavier Estivill; Daniela S Gerhard; Sean M Grimmond; Ivo Gut; Martin Hirst; Carlos López-Otín; Partha Majumder; Marco Marra; John D McPherson; Hidewaki Nakagawa; Zemin Ning; Xose S Puente; Yijun Ruan; Tatsuhiro Shibata; Michael R Stratton; Hendrik G Stunnenberg; Harold Swerdlow; Victor E Velculescu; Richard K Wilson; Hong H Xue; Liu Yang; Paul T Spellman; Gary D Bader; Paul C Boutros; Peter J Campbell; Paul Flicek; Gad Getz; Roderic Guigó; Guangwu Guo; David Haussler; Simon Heath; Tim J Hubbard; Tao Jiang; Steven M Jones; Qibin Li; Nuria López-Bigas; Ruibang Luo; Lakshmi Muthuswamy; B F Francis Ouellette; John V Pearson; Xose S Puente; Victor Quesada; Benjamin J Raphael; Chris Sander; Tatsuhiro Shibata; Terence P Speed; Lincoln D Stein; Joshua M Stuart; Jon W Teague; Yasushi Totoki; 
Tatsuhiko Tsunoda; Alfonso Valencia; David A Wheeler; Honglong Wu; Shancen Zhao; Guangyu Zhou; Lincoln D Stein; Roderic Guigó; Tim J Hubbard; Yann Joly; Steven M Jones; Arek Kasprzyk; Mark Lathrop; Nuria López-Bigas; B F Francis Ouellette; Paul T Spellman; Jon W Teague; Gilles Thomas; Alfonso Valencia; Teruhiko Yoshida; Karen L Kennedy; Myles Axton; Stephanie O M Dyke; P Andrew Futreal; Daniela S Gerhard; Chris Gunter; Mark Guyer; Thomas J Hudson; John D McPherson; Linda J Miller; Brad Ozenberger; Kenna M Shaw; Arek Kasprzyk; Lincoln D Stein; Junjun Zhang; Syed A Haider; Jianxin Wang; Christina K Yung; Anthony Cros; Anthony Cross; Yong Liang; Saravanamuttu Gnaneshan; Jonathan Guberman; Jack Hsu; Martin Bobrow; Don R C Chalmers; Karl W Hasel; Yann Joly; Terry S H Kaan; Karen L Kennedy; Bartha M Knoppers; William W Lowrance; Tohru Masui; Pilar Nicolás; Emmanuelle Rial-Sebbag; Laura Lyman Rodriguez; Catherine Vergely; Teruhiko Yoshida; Sean M Grimmond; Andrew V Biankin; David D L Bowtell; Nicole Cloonan; Anna deFazio; James R Eshleman; Dariush Etemadmoghadam; Brooke B Gardiner; Brooke A Gardiner; James G Kench; Aldo Scarpa; Robert L Sutherland; Margaret A Tempero; Nicola J Waddell; Peter J Wilson; John D McPherson; Steve Gallinger; Ming-Sound Tsao; Patricia A Shaw; Gloria M Petersen; Debabrata Mukhopadhyay; Lynda Chin; Ronald A DePinho; Sarah Thayer; Lakshmi Muthuswamy; Kamran Shazand; Timothy Beck; Michelle Sam; Lee Timms; Vanessa Ballin; Youyong Lu; Jiafu Ji; Xiuqing Zhang; Feng Chen; Xueda Hu; Guangyu Zhou; Qi Yang; Geng Tian; Lianhai Zhang; Xiaofang Xing; Xianghong Li; Zhenggang Zhu; Yingyan Yu; Jun Yu; Huanming Yang; Mark Lathrop; Jörg Tost; Paul Brennan; Ivana Holcatova; David Zaridze; Alvis Brazma; Lars Egevard; Egor Prokhortchouk; Rosamonde Elizabeth Banks; Mathias Uhlén; Anne Cambon-Thomsen; Juris Viksna; Fredrik Ponten; Konstantin Skryabin; Michael R Stratton; P Andrew Futreal; Ewan Birney; Ake Borg; Anne-Lise Børresen-Dale; Carlos Caldas; John A Foekens; 
Sancha Martin; Jorge S Reis-Filho; Andrea L Richardson; Christos Sotiriou; Hendrik G Stunnenberg; Giles Thoms; Marc van de Vijver; Laura van't Veer; Fabien Calvo; Daniel Birnbaum; Hélène Blanche; Pascal Boucher; Sandrine Boyault; Christian Chabannon; Ivo Gut; Jocelyne D Masson-Jacquemier; Mark Lathrop; Iris Pauporté; Xavier Pivot; Anne Vincent-Salomon; Eric Tabone; Charles Theillet; Gilles Thomas; Jörg Tost; Isabelle Treilleux; Fabien Calvo; Paulette Bioulac-Sage; Bruno Clément; Thomas Decaens; Françoise Degos; Dominique Franco; Ivo Gut; Marta Gut; Simon Heath; Mark Lathrop; Didier Samuel; Gilles Thomas; Jessica Zucman-Rossi; Peter Lichter; Roland Eils; Benedikt Brors; Jan O Korbel; Andrey Korshunov; Pablo Landgraf; Hans Lehrach; Stefan Pfister; Bernhard Radlwimmer; Guido Reifenberger; Michael D Taylor; Christof von Kalle; Partha P Majumder; Rajiv Sarin; T S Rao; M K Bhan; Aldo Scarpa; Paolo Pederzoli; Rita A Lawlor; Massimo Delledonne; Alberto Bardelli; Andrew V Biankin; Sean M Grimmond; Thomas Gress; David Klimstra; Giuseppe Zamboni; Tatsuhiro Shibata; Yusuke Nakamura; Hidewaki Nakagawa; Jun Kusada; Tatsuhiko Tsunoda; Satoru Miyano; Hiroyuki Aburatani; Kazuto Kato; Akihiro Fujimoto; Teruhiko Yoshida; Elias Campo; Carlos López-Otín; Xavier Estivill; Roderic Guigó; Silvia de Sanjosé; Miguel A Piris; Emili Montserrat; Marcos González-Díaz; Xose S Puente; Pedro Jares; Alfonso Valencia; Heinz Himmelbauer; Heinz Himmelbaue; Victor Quesada; Silvia Bea; Michael R Stratton; P Andrew Futreal; Peter J Campbell; Anne Vincent-Salomon; Andrea L Richardson; Jorge S Reis-Filho; Marc van de Vijver; Gilles Thomas; Jocelyne D Masson-Jacquemier; Samuel Aparicio; Ake Borg; Anne-Lise Børresen-Dale; Carlos Caldas; John A Foekens; Hendrik G Stunnenberg; Laura van't Veer; Douglas F Easton; Paul T Spellman; Sancha Martin; Anna D Barker; Lynda Chin; Francis S Collins; Carolyn C Compton; Martin L Ferguson; Daniela S Gerhard; Gad Getz; Chris Gunter; Alan Guttmacher; Mark Guyer; D Neil Hayes; 
Eric S Lander; Brad Ozenberger; Robert Penny; Jane Peterson; Chris Sander; Kenna M Shaw; Terence P Speed; Paul T Spellman; Joseph G Vockley; David A Wheeler; Richard K Wilson; Thomas J Hudson; Lynda Chin; Bartha M Knoppers; Eric S Lander; Peter Lichter; Lincoln D Stein; Michael R Stratton; Warwick Anderson; Anna D Barker; Cindy Bell; Martin Bobrow; Wylie Burke; Francis S Collins; Carolyn C Compton; Ronald A DePinho; Douglas F Easton; P Andrew Futreal; Daniela S Gerhard; Anthony R Green; Mark Guyer; Stanley R Hamilton; Tim J Hubbard; Olli P Kallioniemi; Karen L Kennedy; Timothy J Ley; Edison T Liu; Youyong Lu; Partha Majumder; Marco Marra; Brad Ozenberger; Jane Peterson; Alan J Schafer; Paul T Spellman; Hendrik G Stunnenberg; Brandon J Wainwright; Richard K Wilson; Huanming Yang
Journal:  Nature       Date:  2010-04-15       Impact factor: 49.962

6.  Singularity: Scientific containers for mobility of compute.

Authors:  Gregory M Kurtzer; Vanessa Sochat; Michael W Bauer
Journal:  PLoS One       Date:  2017-05-11       Impact factor: 3.240

7.  CIViC is a community knowledgebase for expert crowdsourcing the clinical interpretation of variants in cancer.

Authors:  Malachi Griffith; Nicholas C Spies; Kilannin Krysiak; Joshua F McMichael; Adam C Coffman; Arpad M Danos; Benjamin J Ainscough; Cody A Ramirez; Damian T Rieke; Lynzey Kujan; Erica K Barnell; Alex H Wagner; Zachary L Skidmore; Amber Wollam; Connor J Liu; Martin R Jones; Rachel L Bilski; Robert Lesurf; Yan-Yang Feng; Nakul M Shah; Melika Bonakdar; Lee Trani; Matthew Matlock; Avinash Ramu; Katie M Campbell; Gregory C Spies; Aaron P Graubert; Karthik Gangavarapu; James M Eldred; David E Larson; Jason R Walker; Benjamin M Good; Chunlei Wu; Andrew I Su; Rodrigo Dienstmann; Adam A Margolin; David Tamborero; Nuria Lopez-Bigas; Steven J M Jones; Ron Bose; David H Spencer; Lukas D Wartman; Richard K Wilson; Elaine R Mardis; Obi L Griffith
Journal:  Nat Genet       Date:  2017-01-31       Impact factor: 38.330

8.  The Ensembl Variant Effect Predictor.

Authors:  William McLaren; Laurent Gil; Sarah E Hunt; Harpreet Singh Riat; Graham R S Ritchie; Anja Thormann; Paul Flicek; Fiona Cunningham
Journal:  Genome Biol       Date:  2016-06-06       Impact factor: 13.583

9.  From somatic variants towards precision oncology: Evidence-driven reporting of treatment options in molecular tumor boards.

Authors:  Júlia Perera-Bel; Barbara Hutter; Christoph Heining; Annalen Bleckmann; Martina Fröhlich; Stefan Fröhling; Hanno Glimm; Benedikt Brors; Tim Beißbarth
Journal:  Genome Med       Date:  2018-03-15       Impact factor: 11.117
