Joanna Roder (1), Laura Maguire (2), Robert Georgantas (2), Heinrich Roder (2)
1. Biodesix, Inc., 2970 Wilderness Place, Ste 100, Boulder, CO 80301, USA. joanna.roder@biodesix.com
2. Biodesix, Inc., 2970 Wilderness Place, Ste 100, Boulder, CO 80301, USA.
Abstract
BACKGROUND: Machine learning (ML) can be an effective tool to extract information from attribute-rich molecular datasets for the generation of molecular diagnostic tests. However, the way in which the resulting scores or classifications are produced from the input data may not be transparent. Algorithmic explainability or interpretability has become a focus of ML research. Shapley values, first introduced in game theory, can provide explanations of the result generated from a specific set of input data by a complex ML algorithm.
METHODS: For a multivariate molecular diagnostic test in clinical use (the VeriStrat® test), we calculate exact Shapley values and discuss their interpretation. We also employ some standard approximation techniques for Shapley value computation (methods based on local interpretable model-agnostic explanations (LIME) and SHapley Additive exPlanations (SHAP)) and compare the results with the exact Shapley values.
RESULTS: Exact Shapley values calculated for data collected from a cohort of 256 patients showed that the relative importance of attributes for test classification varied by sample. While all eight features used in the VeriStrat® test contributed equally to classification for some samples, other samples showed more complex patterns of attribute importance. Exact Shapley values and Shapley-based interaction metrics provided interpretable classification explanations at the sample or patient level, and patient subgroups could be defined by comparing Shapley value profiles between patients. LIME and SHAP approximation approaches, even those seeking to include correlations between attributes, produced results that were quantitatively and, in some cases, qualitatively different from the exact Shapley values.
CONCLUSIONS: Shapley values can be used to determine the relative importance of input attributes to the result generated by a multivariate molecular diagnostic test for an individual sample or patient. Patient subgroups defined by Shapley value profiles may motivate translational research. However, correlations inherent in molecular data and the typically small ML training sets available for molecular diagnostic test development may cause some approximation methods to produce approximate Shapley values that differ both qualitatively and quantitatively from exact Shapley values. Hence, caution is advised when using approximate methods to evaluate Shapley explanations of the results of molecular diagnostic tests.
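To make the notion of an exact Shapley value concrete, the following is a minimal sketch of exact computation by enumerating every coalition of features, which is feasible for a small attribute set such as eight features (2^8 = 256 coalitions). It is not the procedure used for the VeriStrat® test: the abstract does not specify how the value of a coalition is defined, so the sketch assumes that features outside a coalition are replaced by a fixed reference value. The names `exact_shapley`, `make_value_function`, and `toy_model`, and the three-feature toy model itself, are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def exact_shapley(value, n_features):
    """Exact Shapley values by enumerating all coalitions of the other features.

    `value` maps a frozenset of feature indices to a number.
    """
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            # Shapley weight: |S|! * (n - |S| - 1)! / n!
            weight = (
                factorial(size)
                * factorial(n_features - size - 1)
                / factorial(n_features)
            )
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition S
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def make_value_function(model, sample, reference):
    """ASSUMPTION: features absent from a coalition take a fixed reference value.

    This is one simple choice of value function, not the one used in the paper.
    """
    def value(coalition):
        x = [
            sample[j] if j in coalition else reference[j]
            for j in range(len(sample))
        ]
        return model(x)
    return value


def toy_model(x):
    """Hypothetical model: one additive term plus one two-feature interaction."""
    return 2.0 * x[0] + x[1] * x[2]


sample = [1.0, 2.0, 3.0]
reference = [0.0, 0.0, 0.0]
phi = exact_shapley(make_value_function(toy_model, sample, reference), 3)
print([round(p, 6) for p in phi])
```

In this toy case the additive feature receives its full contribution and the two interacting features split their joint contribution equally, so the values sum to the difference between the model output for the sample and for the reference. With correlated attributes, the choice of value function matters a great deal, which is one reason approximation methods can diverge from exact values.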