HitWalker: variant prioritization for personalized functional cancer genomics.

Daniel Bottomly, Beth Wilmot, Jeffrey W Tyner, Christopher A Eide, Marc M Loriaux, Brian J Druker, Shannon K McWeeney.

Abstract

SUMMARY: Determining the functional relevance of identified sequence variants in cancer is a prerequisite to ultimately matching specific therapies with individual patients. This level of mechanistic understanding requires integration of genomic information with complementary functional analyses to identify oncogenic targets and relies on the development of computational frameworks to aid in the prioritization and visualization of these diverse data types. In response to this, we have developed HitWalker, which prioritizes patient variants relative to their weighted proximity to functional assay results in a protein-protein interaction network. It is highly extensible, allowing incorporation of diverse data types to refine prioritization. In addition to a ranked list of variants, we have also devised a simple shortest path-based approach of visualizing the results in an intuitive manner to provide biological interpretation.
AVAILABILITY AND IMPLEMENTATION: The program, documentation and example data are available as an R package from www.biodevlab.org/HitWalker.html.

Year: 2013 PMID: 23303510 PMCID: PMC3570211 DOI: 10.1093/bioinformatics/btt003

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 INTRODUCTION

The advent of next-generation sequencing technology, such as that available from Illumina, provides an unprecedented ability to interrogate individual genomes (Metzker, 2010). Accompanying technologies, such as the ability to multiplex samples and efficient sequence capture (Ng et al.), further enable targeted regions of the genome to be re-sequenced at a reasonable cost per sample. In this manner, specific genes or whole exomes can be interrogated to identify variants potentially having a deleterious impact on protein coding regions. Efficacy of this approach for clinical research has been shown through the discovery of variants underlying simple Mendelian disorders, as well as more complicated variants/mutations involved in cancer and potentially complex traits (Kiezun et al.). Because many variants are produced for a given sample, mechanistic and population genetic assumptions are often used to reduce the set of variants (Ng et al.). For instance, limiting attention to non-synonymous single-nucleotide variations and using variant databases, such as dbSNP (Sherry et al.) and the 1000 Genomes Project (The 1000 Genomes Project Consortium, 2010), enable researchers to focus on low-frequency variants that are potentially damaging. However, even after these filters, a researcher is rarely left with a manageable set of variants for biological validation. For cancer cells derived from a given patient, functional assays can be performed that allow researchers to determine the relevance of genes to cell viability, such as through the use of targeted siRNA screens (Tyner et al.). These gene targets can be scored using a binary or quantitative encoding that indicates outlier status relative to other samples. Similarly, screens that measure the sensitivity of a patient’s cells to a panel of small molecules can be scored relative to genes using the known gene targets of each compound (Tyner et al.).
These functional assays by themselves provide some information to researchers on the identity of the signalling pathways that are required for cancer growth and survival. However, they do not necessarily indicate the mutated gene(s) leading to dysregulation of these signalling pathways, as the true causative variant(s) could be found in any gene with capacity to regulate the pathway. To prioritize variants detected in our cohort of leukaemia patients relative to our functional assay results, we have devised the R and SQL framework HitWalker using a protein–protein interaction (PPI) network as a backbone. In addition to a ranked list of variants, HitWalker also allows for simple visualization of the relevant subnetwork, as well as overlaying relevant meta-data.

2 DESCRIPTION OF SOFTWARE

2.1 Variant prioritization

Variants are related to genes implicated from the functional assays through a user-defined PPI network, such as STRING (Szklarczyk et al.), which provides known and hypothetical links between proteins that are weighted by a confidence score. Variants and the functional assay results are mapped to the proteins of the network using pre-supplied meta-data. For example, variants can be mapped to transcripts, which in turn can be mapped to proteins (as well as gene symbols). This hierarchy is managed through the specification of ‘core’ or ‘summary’ IDs to be used in the internal functions. By default, prioritization is performed using a random walk with restarts (RWR) algorithm (Köhler et al.; Tong et al.). In HitWalker, RWR provides a measure of weighted proximity between a set of proteins associated with functional assay hits and a set of proteins containing variants. Variants are prioritized based on the resulting RWR association score attributed to the protein. See the Supplementary Material for more information. We note that any number of approaches related to our RWR implementation could easily be applied as well, including corrections for degree (Erten et al.).
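To make the prioritization step concrete, the following is a minimal Python sketch of a random walk with restarts on a small weighted network. It is not HitWalker's R implementation. It assumes the commonly used update p(t+1) = (1 - r) W p(t) + r p(0), where W is the column-normalized weighted adjacency matrix, p(0) places equal probability on the functional assay hits (the seeds) and r is the restart probability. The node names, edge weights, restart value of 0.7 and convergence tolerance are all hypothetical choices made for illustration.

```python
def random_walk_with_restart(edges, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Return steady-state RWR scores for every node of a weighted, undirected graph.

    edges: iterable of (node_u, node_v, confidence) tuples (hypothetical toy input).
    seeds: nodes where the walk restarts, e.g. proteins hit in a functional assay.
    """
    # Build a symmetric weighted adjacency map.
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    nodes = sorted(adj)

    seeds = [s for s in seeds if s in adj]
    if not seeds:
        raise ValueError("no seed is present in the network")

    # Total edge weight per node, used to column-normalize the transitions.
    strength = {n: sum(adj[n].values()) for n in nodes}

    # Restart vector: uniform over the seeds.
    p0 = {n: 0.0 for n in nodes}
    for s in seeds:
        p0[s] = 1.0 / len(seeds)

    p = dict(p0)
    for _ in range(max_iter):
        # Restart term r * p0 ...
        nxt = {n: restart * p0[n] for n in nodes}
        # ... plus the walk term (1 - r) * W * p.
        for u in nodes:
            for v, w in adj[u].items():
                nxt[v] += (1.0 - restart) * p[u] * w / strength[u]
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy network: one assay hit and two variant-bearing proteins.
edges = [
    ("hit_A", "p1", 0.9), ("p1", "var_X", 0.8),
    ("hit_A", "p2", 0.4), ("p2", "p3", 0.5), ("p3", "var_Y", 0.6),
]
scores = random_walk_with_restart(edges, seeds=["hit_A"])
ranked = sorted(["var_X", "var_Y"], key=scores.get, reverse=True)
print(ranked)
```

In this toy network, var_X lies two high-confidence steps from the seed and var_Y three weaker steps away, so var_X receives the larger score and is ranked first.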

2.2 Visualization

The relevant subnetworks can be visualized by approximating the RWR algorithm using the unweighted shortest paths between the top queries and seeds. Shortest paths connecting the functional hit and variant genes are also displayed for additional context. For the shortest path calculations, the path with the highest overall path score is chosen if multiple equivalent shortest paths exist. The user can define functions mapping meta-data in the database to attributes of the network. For instance, genes can be represented using different shapes and colours. An example of such a figure is shown in Figure 1 and Supplementary Figure S1. In a patient with acute myeloid leukaemia, an S451F amino acid change in the FLT3 gene was the top-ranking variant, which has been previously characterized and catalogued in COSMIC (Forbes et al.). Non-synonymous single-nucleotide variants in ZAK and PRKCE were the next highest-ranked variants, and all three genes are candidates based on the observed drug response.
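The shortest path selection described above can be sketched in Python as follows. This is an illustration, not the package's code. A breadth-first search finds the unweighted hop distance from a hit to every node, and among paths of minimal length the one with the highest score is kept. The text does not define the 'overall path score', so the sketch assumes it is the sum of edge confidence scores along the path. The function name, node names and weights are hypothetical.

```python
from collections import deque


def best_shortest_path(edges, source, target):
    """Return (score, path) for the highest-scoring unweighted shortest path.

    The path score is ASSUMED to be the sum of edge confidences along the path.
    Returns None if either node is missing or the target is unreachable.
    """
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    if source not in adj or target not in adj:
        return None

    # Breadth-first search: hop distance from the source, ignoring weights.
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    if target not in dist:
        return None

    # Process nodes layer by layer; for each node keep the best-scoring
    # shortest path arriving from any neighbour one hop closer to the source.
    best = {source: (0.0, [source])}
    for v in sorted(dist, key=dist.get):
        if v == source:
            continue
        candidates = [
            (best[u][0] + w, best[u][1] + [v])
            for u, w in adj[v].items()
            if dist[u] == dist[v] - 1
        ]
        best[v] = max(candidates, key=lambda c: c[0])
    return best[target]


# Toy network: two 2-hop routes and one longer, higher-scoring 3-hop route.
edges = [
    ("hit_A", "p1", 0.9), ("p1", "var_X", 0.8),
    ("hit_A", "p2", 0.4), ("p2", "var_X", 0.7),
    ("hit_A", "p3", 0.5), ("p3", "p4", 0.9), ("p4", "var_X", 0.9),
]
score, path = best_shortest_path(edges, "hit_A", "var_X")
print(path)
```

Here the route through p1 is selected: it ties with the route through p2 on length but has the higher summed confidence, while the three-hop route through p3 and p4 is ignored despite its larger total because it is not a shortest path.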

Fig. 1.

Modified visualization output from HitWalker displaying the top three assay hits (EPHA4, JAK3 and FRK) and variants (FLT3, ZAK and PRKCE) for an acute myeloid leukaemia patient. Note that other hits are pulled out and annotated, as they are on the shortest path. Gene names are provided for each node. For nodes containing variants (blue), frequency information is reported in terms of the patient cohort counts (F), as well as the RWR rank (R). Red and green nodes indicate siRNA and gene target hits, respectively. Dotted borders indicate absence of capture probes for a given gene. Dashed borders indicate functional assay targets whose inhibition did not significantly alter cell viability. Confidence scores for the interaction between two genes are reported near the line connecting them.

3 DISCUSSION

Our software provides a flexible framework for prioritizing variants relative to functional assay outcomes and a PPI network or other association graph using an RWR algorithm. In addition, we include visualization of key subnetwork members to aid in biological interpretation. Prioritized variants can then be followed up using Sanger sequencing, as well as additional experiments to categorize their phenotypic effect. Software developers can easily modify queries and R functions for their in-house databases and functional assays. Although MySQL is our database of choice, any other R/DBI-supported database would work with minor modifications to the R code.

REFERENCES

1. dbSNP: the NCBI database of genetic variation.

Authors: S T Sherry; M H Ward; M Kholodov; J Baker; L Phan; E M Smigielski; K Sirotkin
Journal: Nucleic Acids Res Date: 2001-01-01 Impact factor: 16.971

2. Exome sequencing and the genetic basis of complex traits.

Authors: Adam Kiezun; Kiran Garimella; Ron Do; Nathan O Stitziel; Benjamin M Neale; Paul J McLaren; Namrata Gupta; Pamela Sklar; Patrick F Sullivan; Jennifer L Moran; Christina M Hultman; Paul Lichtenstein; Patrik Magnusson; Thomas Lehner; Yin Yao Shugart; Alkes L Price; Paul I W de Bakker; Shaun M Purcell; Shamil R Sunyaev
Journal: Nat Genet Date: 2012-05-29 Impact factor: 38.330

3. Walking the interactome for prioritization of candidate disease genes.

Authors: Sebastian Köhler; Sebastian Bauer; Denise Horn; Peter N Robinson
Journal: Am J Hum Genet Date: 2008-03-27 Impact factor: 11.025

4. Sequencing technologies - the next generation.

Authors: Michael L Metzker
Journal: Nat Rev Genet Date: 2009-12-08 Impact factor: 53.242

5. A map of human genome variation from population-scale sequencing.

Authors: Gonçalo R Abecasis; David Altshuler; Adam Auton; Lisa D Brooks; Richard M Durbin; Richard A Gibbs; Matt E Hurles; Gil A McVean
Journal: Nature Date: 2010-10-28 Impact factor: 49.962

6. RNAi screen for rapid therapeutic target identification in leukemia patients.

Authors: Jeffrey W Tyner; Michael W Deininger; Marc M Loriaux; Bill H Chang; Jason R Gotlib; Stephanie G Willis; Heidi Erickson; Tibor Kovacsovics; Thomas O'Hare; Michael C Heinrich; Brian J Druker
Journal: Proc Natl Acad Sci U S A Date: 2009-05-11 Impact factor: 11.205

7. Kinase pathway dependence in primary human leukemias determined by rapid inhibitor screening.

Authors: Jeffrey W Tyner; Wayne F Yang; Armand Bankhead; Guang Fan; Luke B Fletcher; Jade Bryant; Jason M Glover; Bill H Chang; Stephen E Spurgeon; William H Fleming; Tibor Kovacsovics; Jason R Gotlib; Stephen T Oh; Michael W Deininger; Christian Michel Zwaan; Monique L Den Boer; Marry M van den Heuvel-Eibrink; Thomas O'Hare; Brian J Druker; Marc M Loriaux
Journal: Cancer Res Date: 2012-10-18 Impact factor: 12.701

8. COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer.

Authors: Simon A Forbes; Nidhi Bindal; Sally Bamford; Charlotte Cole; Chai Yin Kok; David Beare; Mingming Jia; Rebecca Shepherd; Kenric Leung; Andrew Menzies; Jon W Teague; Peter J Campbell; Michael R Stratton; P Andrew Futreal
Journal: Nucleic Acids Res Date: 2010-10-15 Impact factor: 16.971

9. DADA: Degree-Aware Algorithms for Network-Based Disease Gene Prioritization.

Authors: Sinan Erten; Gurkan Bebek; Rob M Ewing; Mehmet Koyutürk
Journal: BioData Min Date: 2011-06-24 Impact factor: 2.522

10. Targeted capture and massively parallel sequencing of 12 human exomes.

Authors: Sarah B Ng; Emily H Turner; Peggy D Robertson; Steven D Flygare; Abigail W Bigham; Choli Lee; Tristan Shaffer; Michelle Wong; Arindam Bhattacharjee; Evan E Eichler; Michael Bamshad; Deborah A Nickerson; Jay Shendure
Journal: Nature Date: 2009-08-16 Impact factor: 49.962
